Implement CLI --total functionality #147

Closed
2 tasks done
andrewtavis opened this issue Jun 7, 2024 · 6 comments

Comments

@andrewtavis
Member

andrewtavis commented Jun 7, 2024

Terms

Description

This issue would implement the --total (-t) functionality of the Scribe-Data CLI. The command would query Wikidata for the total number of lexemes for a given combination of languages and word types. Usage would be:

scribe-data total -l German -wt nouns  # number of German noun lexemes
scribe-data total -l German  # number of German lexemes
scribe-data total -wt nouns  # number of noun lexemes
  • Note: it would be good to allow the user to pass nouns or noun, etc, in order to avoid unneeded errors :)
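As a rough illustration of the usage above, the following is a minimal argparse sketch of a `total` subcommand where both arguments are optional. The `build_parser` helper and the long option names `--language` and `--word-type` are assumptions made for this example; only `-l` and `-wt` appear in the issue, and the actual structure in cli/main.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser with a `total` subcommand."""
    parser = argparse.ArgumentParser(prog="scribe-data")
    subparsers = parser.add_subparsers(dest="command")

    total = subparsers.add_parser("total", help="Count Wikidata lexemes.")
    # Both arguments are optional so that either one can be passed alone.
    # The long option names are assumptions for this sketch.
    total.add_argument("-l", "--language", help="Language name, e.g. German.")
    total.add_argument("-wt", "--word-type", help="Word type, e.g. nouns or noun.")
    return parser


# Mirrors: scribe-data total -l German -wt nouns
args = build_parser().parse_args(["total", "-l", "German", "-wt", "nouns"])
print(args.command, args.language, args.word_type)
```

Leaving both options without `required=True` keeps all three usages valid; an argument that is not passed is simply `None`.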

The following Python code could serve as the starting point for most of the functionality we need here. The word_type argument should also be supported so that the count can be filtered by lexical category :) The result of this function would then be returned to the user with a message that includes the given language and/or word types.

from SPARQLWrapper import SPARQLWrapper, JSON

def get_total_lexemes(language, word_type=None):
    """Count Wikidata lexemes for a language QID, optionally limited to a word type QID."""
    endpoint_url = "https://query.wikidata.org/sparql"
    sparql = SPARQLWrapper(endpoint_url)

    # SPARQL query template. Doubled braces become literal braces after str.format().
    query_template = """
    SELECT
        (COUNT(DISTINCT ?lexeme) as ?total)

    WHERE {{
      VALUES ?language {{ wd:{language} }}
      ?lexeme dct:language ?language ;
              wikibase:lexicalCategory ?category .
      {word_type_filter}
    }}
    """

    # Only restrict the lexical category if a word type was passed.
    word_type_filter = f"FILTER(?category IN (wd:{word_type}))" if word_type else ""

    # Fill in the language QID and the optional word type filter.
    query = query_template.format(language=language, word_type_filter=word_type_filter)

    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    return int(results["results"]["bindings"][0]["total"]["value"])
  • Note: the above function also needs to be able to accept lists, so it should be languages and word_types :)
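One possible way to accept lists, sketched here under assumptions: build the query text with one VALUES clause per non-empty list, then pass the result to SPARQLWrapper as in the function above. The helper name `build_total_query` is hypothetical, and the inputs are assumed to be Wikidata QIDs rather than names. The sketch only builds the query string, so it can be checked without a network call.

```python
from typing import Optional, Sequence


def build_total_query(
    languages: Optional[Sequence[str]] = None,
    word_types: Optional[Sequence[str]] = None,
) -> str:
    """Build a SPARQL count query for lists of language and word type QIDs."""
    clauses = []
    if languages:
        qids = " ".join(f"wd:{qid}" for qid in languages)
        clauses.append(f"VALUES ?language {{ {qids} }}")
    if word_types:
        qids = " ".join(f"wd:{qid}" for qid in word_types)
        clauses.append(f"VALUES ?category {{ {qids} }}")

    # The triple pattern is always present; VALUES only narrows it.
    clauses.append("?lexeme dct:language ?language ;")
    clauses.append("        wikibase:lexicalCategory ?category .")

    body = "\n  ".join(clauses)
    return f"SELECT (COUNT(DISTINCT ?lexeme) AS ?total)\nWHERE {{\n  {body}\n}}"


# Q188 and Q150 are German and French; Q1084 is noun.
print(build_total_query(["Q188", "Q150"], ["Q1084"]))
```

Using VALUES for both variables keeps the two filters symmetric, and an empty or missing list just leaves that variable unconstrained.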

Contribution

@mhmohona will be working on this as a part of GSoC 2024! ☀️ Please write in here so I can assign, and let us know if there's anything we can do to support!

@andrewtavis andrewtavis added the feature New feature or request label Jun 7, 2024
@mhmohona
Collaborator

mhmohona commented Jun 7, 2024

Thank you for the detailed explanation. 😄

@andrewtavis
Member Author

Very welcome! 🥳🥳

@andrewtavis andrewtavis changed the title from "Implement CLI --poll functionality" to "Implement CLI --total functionality" Jun 8, 2024
@andrewtavis
Member Author

andrewtavis commented Jun 9, 2024

One thing to note here, we should likely allow the user to pass either noun or nouns, etc. Just so it's easier :) Adding this to the issue 😊

@andrewtavis
Member Author

> One thing to note here, we should likely allow the user to pass either noun or nouns, etc. Just so it's easier :) Adding this to the issue 😊

We can use the following for this, @mhmohona:

def correct_word_type(word_type: str) -> str:

I think that working on this one would be a great next step, @mhmohona! This would give you a bit of Wikidata experience as well :) I'll add in the files and the section for the CLI now!
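Only the signature of correct_word_type is shown in this thread, so the following is a hypothetical sketch of what such a helper might do, not the project's actual implementation. It assumes that word types are stored in plural form and that `WORD_TYPES` (an illustrative subset invented for this example) lists the supported values.

```python
# Hypothetical, illustrative subset of supported word types.
WORD_TYPES = ["nouns", "verbs", "prepositions"]


def correct_word_type(word_type: str) -> str:
    """Return the plural word type, accepting singular or plural input."""
    normalized = word_type.strip().lower()

    if normalized in WORD_TYPES:
        return normalized

    # Accept the singular form by checking its simple "s" plural.
    plural = f"{normalized}s"
    if plural in WORD_TYPES:
        return plural

    raise ValueError(
        f"Unknown word type {word_type!r}. Expected one of: {', '.join(WORD_TYPES)}"
    )


print(correct_word_type("noun"))
print(correct_word_type("Nouns"))
```

Raising a `ValueError` that lists the valid options is one way to turn the remaining failures into a helpful message rather than an unexplained error.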

@andrewtavis
Member Author

3736222 adds in the basics for this, @mhmohona :) The work for this command can be in cli/total.py, and the command structure has already been added into cli/main.py. I think we should be able to work with the Python code in the issue text and SPARQLWrapper to make this work. Happy to discuss further!
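The issue text mentions returning the total to the user with a message that includes the given language and/or word types. A minimal sketch of such a message builder is below; the function name `format_total_message` and the exact wording of the output are assumptions for illustration, not what cli/total.py necessarily does.

```python
from typing import Optional


def format_total_message(
    total: int,
    language: Optional[str] = None,
    word_type: Optional[str] = None,
) -> str:
    """Describe a lexeme total, naming only the filters that were passed."""
    parts = []
    if language:
        parts.append(f"language: {language}")
    if word_type:
        parts.append(f"word type: {word_type}")

    # Fall back to a generic scope when no filter was given.
    scope = ", ".join(parts) if parts else "all languages and word types"
    return f"Total lexemes ({scope}): {total}"


print(format_total_message(10, language="German", word_type="nouns"))
```

Keeping the message separate from the query function means the same text can be reused however the total is obtained.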

@andrewtavis
Member Author

Closed via #162 🥳 Thanks for this, @mhmohona! Amazing progress, and looking forward to when the rest of the PRs and functionality are in!
