Add workflow to check queries #339

### Terms

- [X] I have searched [open and closed feature requests](https://github.com/scribe-org/Scribe-Data/issues?q=is%3Aissue+label%3Afeature)
- [X] I agree to follow Scribe-Data's [Code of Conduct](https://github.com/scribe-org/Scribe-Data/blob/main/.github/CODE_OF_CONDUCT.md)

### Description

This issue would create a new workflow in [.github/workflows](https://github.com/scribe-org/Scribe-Data/tree/main/.github/workflows) called `check_query_identifiers.yaml`. The workflow would call a Python script that checks all queries within the `language_data_extraction` directory to make sure that the identifiers used within them are appropriate. We can put these scripts in a new directory called `check` within [src/scribe_data](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data). The scripts would be:

- `/src/scribe_data/check/check_query_identifiers.py` would check all queries in the `language_data_extraction` directory for two things:
    - Is the language within `?lexeme dct:language wd:Q12345` in the query appropriate given the directory that it's in?
    - Given the data type of the query - i.e. nouns, verbs, etc - does the QID for this data type appear in `wikibase:lexicalCategory wd:Q12345`?
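
A minimal sketch of how the two checks above might be expressed, assuming the language and data type can be read from the first two components of a query's path relative to `language_data_extraction`. The lookup tables and the `expected_qids` and `check_query` names are hypothetical and only cover a few illustrative entries; the QIDs found in the query are passed in as plain strings:

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Illustrative lookup tables (an assumption, not taken from Scribe-Data).
LANGUAGE_QIDS = {"English": "Q1860", "French": "Q150"}
DATA_TYPE_QIDS = {"nouns": "Q1084", "verbs": "Q24905"}


def expected_qids(relative_path: str) -> tuple[str | None, str | None]:
    """Derive the expected (language, data type) QIDs from a path such as
    'English/nouns/query_nouns.sparql'."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) < 3:
        # The path does not follow the assumed <language>/<data_type>/<file> layout.
        return None, None

    language, data_type = parts[0], parts[1]
    return LANGUAGE_QIDS.get(language), DATA_TYPE_QIDS.get(data_type)


def check_query(
    relative_path: str,
    found_language_qid: str | None,
    found_data_type_qid: str | None,
) -> tuple[bool, bool]:
    """Return (language_ok, data_type_ok) for a single query file."""
    expected_language, expected_data_type = expected_qids(relative_path)

    # A missing QID on either side counts as a failed check.
    language_ok = found_language_qid is not None and found_language_qid == expected_language
    data_type_ok = found_data_type_qid is not None and found_data_type_qid == expected_data_type
    return language_ok, data_type_ok
```

Returning the two results separately lets a single file be reported under both headings.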

Queries that fail these conditions should be added to a list and shown to the user in an output of the script and thus the workflow. Something like:

```bash
There are queries that have incorrect language or data type identifiers.

Queries with incorrect language QIDs are:
- English/nouns/query_nouns.sparql
- ...

Queries with incorrect data type QIDs are:
- English/nouns/query_nouns.sparql  # i.e. a single file should be able to appear in both
- French/verbs/query_verbs_1.sparql
- ...
```
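
As a rough sketch, output along these lines could be built from two lists of failing paths. The `format_report` and `exit_code` names are hypothetical, and returning a non-zero exit code so that the workflow step fails is an assumption about the desired behavior rather than something decided in this issue:

```python
from __future__ import annotations


def format_report(incorrect_languages: list[str], incorrect_data_types: list[str]) -> str:
    """Build the report text, or return an empty string if nothing failed."""
    if not incorrect_languages and not incorrect_data_types:
        return ""

    lines = ["There are queries that have incorrect language or data type identifiers."]

    if incorrect_languages:
        lines += ["", "Queries with incorrect language QIDs are:"]
        lines += [f"- {path}" for path in sorted(incorrect_languages)]

    if incorrect_data_types:
        lines += ["", "Queries with incorrect data type QIDs are:"]
        lines += [f"- {path}" for path in sorted(incorrect_data_types)]

    return "\n".join(lines)


def exit_code(incorrect_languages: list[str], incorrect_data_types: list[str]) -> int:
    """Print the report and return 1 if any query failed, else return 0."""
    report = format_report(incorrect_languages, incorrect_data_types)
    if report:
        print(report)
        return 1  # Assumed: a non-zero exit code makes the workflow step fail.
    return 0
```

A script entry point could pass the result to `sys.exit` so that the workflow run surfaces the failure.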

A code snippet that could help with this comes from #330:

```py
import re
from pathlib import Path
from typing import Optional


def extract_qid_from_sparql(file_path: Path) -> Optional[str]:
    """
    Extract the first QID from the specified SPARQL file.

    Args:
        file_path (Path): Path to the SPARQL file.

    Returns:
        str | None: The extracted QID or None if not found.
    """
    try:
        with open(file_path, "r", encoding="utf-8") as file:
            content = file.read()
            # Use regex to find the first QID (e.g., wd:Q34311).
            match = re.search(r"wd:Q\d+", content)
            if match:
                return match.group(0).replace("wd:", "")  # Drop the "wd:" prefix.

    except OSError:
        # Unreadable files are treated the same as files without a QID.
        pass

    return None  # Return None if not found or an error occurs.
```
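
Note that the snippet above returns only the first `wd:Q...` match in a file, so it cannot tell the language QID from the data type QID on its own. One possible adaptation, sketched here with hypothetical function names, is to anchor the pattern on the predicate that precedes each QID. This assumes the QIDs are written inline after `dct:language` and `wikibase:lexicalCategory`, so queries that bind them through variables or `VALUES` would need extra handling:

```python
from __future__ import annotations

import re

# Each pattern captures the QID that directly follows the given predicate.
LANGUAGE_PATTERN = re.compile(r"dct:language\s+wd:(Q\d+)")
DATA_TYPE_PATTERN = re.compile(r"wikibase:lexicalCategory\s+wd:(Q\d+)")


def extract_language_qid(query_text: str) -> str | None:
    """Return the QID used with dct:language, or None if absent."""
    match = LANGUAGE_PATTERN.search(query_text)
    return match.group(1) if match else None


def extract_data_type_qid(query_text: str) -> str | None:
    """Return the QID used with wikibase:lexicalCategory, or None if absent."""
    match = DATA_TYPE_PATTERN.search(query_text)
    return match.group(1) if match else None
```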

### Contribution

Happy to support, answer questions and review as needed!

CC @DeleMike and @catreedle :)
