Add workflow to check queries #339

@andrewtavis

Description

This issue would create a new workflow in .github/workflows called check_query_identifiers.yaml. The workflow would call a Python script that checks all queries within the language_data_extraction directory to make sure that the identifiers used in them are appropriate. We can put these scripts in a new directory called check (src/scribe_data/check, going by the path below). The scripts would be:

  • /src/scribe_data/check/check_query_identifiers.py would check all queries in the language_data_extraction directory for two things:
    • Is the language QID in ?lexeme dct:language wd:Q12345 appropriate for the directory that the query is in?
    • Given the data type of the query (e.g. nouns, verbs, etc.), does the QID for this data type appear in wikibase:lexicalCategory wd:Q12345?
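The two checks above could be sketched roughly as follows. This is only an illustration, not the planned implementation: the function names, the regex patterns, and the assumption that each identifier directly follows its predicate separated by whitespace are all hypothetical, and Q67890 is a placeholder QID just like Q12345.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for the two triples named in the checks above.
# They assume the QID directly follows the predicate, separated by whitespace.
LANGUAGE_PATTERN = re.compile(r"dct:language\s+wd:(Q\d+)")
DATA_TYPE_PATTERN = re.compile(r"wikibase:lexicalCategory\s+wd:(Q\d+)")


def extract_language_qid(query_text: str) -> Optional[str]:
    """Return the QID used with dct:language, or None if it is absent."""
    match = LANGUAGE_PATTERN.search(query_text)
    return match.group(1) if match else None


def extract_data_type_qid(query_text: str) -> Optional[str]:
    """Return the QID used with wikibase:lexicalCategory, or None if absent."""
    match = DATA_TYPE_PATTERN.search(query_text)
    return match.group(1) if match else None


def check_query(
    query_text: str, expected_language_qid: str, expected_data_type_qid: str
) -> Tuple[bool, bool]:
    """Return (language_ok, data_type_ok) for a single query."""
    return (
        extract_language_qid(query_text) == expected_language_qid,
        extract_data_type_qid(query_text) == expected_data_type_qid,
    )


# Placeholder query: Q12345 and Q67890 are not real identifiers.
example_query = """
SELECT ?lexeme WHERE {
  ?lexeme dct:language wd:Q12345 ;
    wikibase:lexicalCategory wd:Q67890 .
}
"""
language_ok, data_type_ok = check_query(example_query, "Q12345", "Q67890")
```

Matching on the predicate rather than on the first wd:Q value in the file keeps the two checks independent, so a query can fail one without affecting the other. Where the expected QIDs for each language and data type would come from is left open here.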

Queries that fail these conditions should be added to a list and shown to the user in an output of the script and thus the workflow. Something like:

There are queries that have incorrect language or data type identifiers.

Queries with incorrect language QIDs are:
- English/nouns/query_nouns.sparql
- ...

Queries with incorrect data type QIDs are:
- English/nouns/query_nouns.sparql  # i.e. a single file should be able to appear in both
- French/verbs/query_verbs_1.sparql
- ...
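One possible way to build that output is sketched below. The helper names are hypothetical, and the sketch assumes that query paths look like English/nouns/query_nouns.sparql relative to the language_data_extraction directory, as in the example output above.

```python
from pathlib import Path
from typing import List, Tuple


def split_query_path(query_path: Path, base_dir: Path) -> Tuple[str, str]:
    """Return (language, data_type) from a path such as
    base_dir/English/nouns/query_nouns.sparql (assumed layout)."""
    relative = query_path.relative_to(base_dir)
    return relative.parts[0], relative.parts[1]


def format_report(bad_languages: List[str], bad_data_types: List[str]) -> str:
    """Build the report text; return an empty string when nothing failed."""
    if not bad_languages and not bad_data_types:
        return ""

    lines = [
        "There are queries that have incorrect language or data type identifiers."
    ]
    if bad_languages:
        lines += ["", "Queries with incorrect language QIDs are:"]
        lines += [f"- {path}" for path in bad_languages]
    if bad_data_types:
        lines += ["", "Queries with incorrect data type QIDs are:"]
        lines += [f"- {path}" for path in bad_data_types]
    return "\n".join(lines)


# The same file may appear in both lists.
report = format_report(
    ["English/nouns/query_nouns.sparql"],
    ["English/nouns/query_nouns.sparql", "French/verbs/query_verbs_1.sparql"],
)
print(report)
```

If the script exits with a non-zero status whenever the report is non-empty (for example via sys.exit(1)), the workflow step that runs it would fail and surface the lists in its log.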

A code snippet that could help with this comes from #330:

import re
from pathlib import Path
from typing import Optional


def extract_qid_from_sparql(file_path: Path) -> Optional[str]:
    """
    Extract the first QID from the specified SPARQL file.

    Args:
        file_path (Path): Path to the SPARQL file.

    Returns:
        Optional[str]: The extracted QID, or None if not found or unreadable.
    """
    try:
        with open(file_path, "r", encoding="utf-8") as file:
            content = file.read()
    except (OSError, UnicodeDecodeError):
        # The file is missing or unreadable, so there is no QID to report.
        return None

    # Use regex to find the first QID (e.g., wd:Q34311).
    match = re.search(r"wd:Q\d+", content)
    return match.group(0).replace("wd:", "") if match else None

Contribution

Happy to support, answer questions and review as needed!

CC @DeleMike and @catreedle :)

Metadata

Labels

feature (New feature or request), hacktoberfest (Included as a part of Hacktoberfest), help wanted (Extra attention is needed)

Projects

Status

Done
