Closed
Labels
feature, hacktoberfest, help wanted
Terms
- I have searched open and closed feature requests
- I agree to follow Scribe-Data's Code of Conduct
Description
This issue would create a new workflow in .github/workflows called check_query_identifiers.yaml that would call a Python script that would check all queries within the language_data_extraction directory to make sure that the identifiers used within them are appropriate. We can put these scripts in a new .github/workflows directory called check. The scripts would be:
- /src/scribe_data/check/check_query_identifiers.py would check all queries in the language_data_extraction directory for two things:
  - Is the language within ?lexeme dct:language wd:Q12345 in the query appropriate given the directory that it's in?
  - Given the data type of the query - i.e. nouns, verbs, etc - does the QID for this data type appear in wikibase:lexicalCategory wd:Q12345?
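As a rough illustration of the two checks, here is a minimal sketch. It assumes each query's path relative to language_data_extraction has the form Language/data_type/file.sparql, and it uses two small hypothetical mappings (LANGUAGE_QIDS and DATA_TYPE_QIDS) in place of whatever metadata the project would actually provide. The names find_qid and check_query are illustrative and not part of Scribe-Data.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical mappings for illustration only; a real script would load the
# expected QIDs from the project's own language and data type metadata.
LANGUAGE_QIDS = {"English": "Q1860", "French": "Q150"}
DATA_TYPE_QIDS = {"nouns": "Q1084", "verbs": "Q24905"}

# Capture the QID that follows each predicate named in the issue.
LANGUAGE_PATTERN = re.compile(r"\?lexeme\s+dct:language\s+wd:(Q\d+)")
DATA_TYPE_PATTERN = re.compile(r"wikibase:lexicalCategory\s+wd:(Q\d+)")


def find_qid(pattern: "re.Pattern[str]", query_text: str) -> Optional[str]:
    """Return the first QID captured by pattern, or None if nothing matches."""
    match = pattern.search(query_text)
    return match.group(1) if match else None


def check_query(relative_path: Path, query_text: str) -> Tuple[bool, bool]:
    """
    Check one query, given its path relative to language_data_extraction
    (assumed layout: Language/data_type/file.sparql).

    Returns (language_ok, data_type_ok).
    """
    language, data_type = relative_path.parts[0], relative_path.parts[1]

    expected_language = LANGUAGE_QIDS.get(language)
    expected_data_type = DATA_TYPE_QIDS.get(data_type)

    # An unknown directory name or a missing QID counts as a failure.
    language_ok = (
        expected_language is not None
        and find_qid(LANGUAGE_PATTERN, query_text) == expected_language
    )
    data_type_ok = (
        expected_data_type is not None
        and find_qid(DATA_TYPE_PATTERN, query_text) == expected_data_type
    )
    return language_ok, data_type_ok
```

A caller could run check_query over every .sparql file and collect the paths where either value is False into two separate lists, so that a single file can appear in both.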
Queries that fail these conditions should be added to a list and shown to the user in an output of the script and thus the workflow. Something like:
There are queries that have incorrect language or data type identifiers.
Queries with incorrect language QIDs are:
- English/nouns/query_nouns.sparql
- ...
Queries with incorrect data type QIDs are:
- English/nouns/query_nouns.sparql # i.e. a single file should be able to appear in both
- French/verbs/query_verbs_1.sparql
- ...
A code snippet that could help with this comes from #330:
import re
from pathlib import Path
from typing import Optional


def extract_qid_from_sparql(file_path: Path) -> Optional[str]:
    """
    Extract the first QID from the specified SPARQL file.

    Args:
        file_path (Path): Path to the SPARQL file.

    Returns:
        str | None: The extracted QID or None if not found.
    """
    try:
        with open(file_path, "r", encoding="utf-8") as file:
            content = file.read()

        # Use regex to find the first QID in the file (e.g., wd:Q34311).
        match = re.search(r"wd:Q\d+", content)
        if match:
            return match.group(0).replace("wd:", "")  # Return the found QID

    except Exception as e:
        # Errors are deliberately ignored so that None is returned below.
        # print(f"Error reading {file_path}: {e}")
        pass

    return None  # Return None if not found or an error occurs
Contribution
Happy to support, answer questions and review as needed!
CC @DeleMike and @catreedle :)