-
Notifications
You must be signed in to change notification settings - Fork 87
fix: refiine cli to resolve languages data fetch incosistency #330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: refiine cli to resolve languages data fetch incosistency #330
Conversation
- Implemented checks for non-existent languages and data types in the total command. - Added informative error messages guiding users to update or set their language metadata. - Enhanced feedback for improved usability of the CLI.
Thank you for the pull request!The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :) If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you! Maintainer checklist
|
|
Hi @DeleMike, Just a couple of things to consider:
Let me know your thoughts when you get a chance. :) |
|
I was browsing through the issues and found one that might be related to this. #293 |
Yes, we might need to revisit it after it has been merged. Thanks @catreedle! concerning the ability to allow users (the developer) to update the language metadata is a good option IMHO. In Flutter and other systems, there's always a need to sometimes update your dependencies or refetch them( I will think of ways soon and drop them in this discussion. |
hmm... did not see that. One thing, this issue originated when the # when we want to run `scribe-data t -l Latin` and Latin doesn't exist in the metadata hence we cannot provide a QID for it
language_qid = get_qid_by_input(language)
data_type_qid = get_qid_by_input(data_type)
if not language_qid:
print(
"The specified language does not exist. Please update your language_metadata.json file by using:\n"
"`scribe-data update --metadata`\n"
"Alternatively, you can manually set it with:\n"
"`scribe-data set-metadata -lang [your_language] -qid [your_qid]`.\n\n"
"This will ensure that you can fetch the correct data."
)
returnWe will basically inform the user that the specified language does not exist and they should update the metadata file. We also terminate the whole process early with the return keyword so that we exit from the procedure gracefully after telling the user what to do. Output of running The specified language does not exist. Please update your language_metadata.json file by using:
`scribe-data update --metadata`
Alternatively, you can manually set it with:
`scribe-data set-metadata -lang [your_language] -qid [your_qid]`.
This will ensure that you can fetch the correct data. |
|
Ok @DeleMike, I've had a bit of time to look this over. Big thing is that this functionality is useful now, but shouldn't be in the future as we shouldn't be releasing Scribe-Data or merging in language querying files if the language isn't in the metadata file. @OmarAI2003 will be expanding the language metadata file in #293, so then this should be fine 🤔 I agree with @catreedle that we don't necessarily want the user to have control over these internal files (edit: I'm unsure on this now). There is definitely some value here though 😊 Specifically maybe we can change this so that it does the following:
How does this sound? Edit: there's even a more expansive suggestion for how to use this code here. Really interesting possibilities! |
|
|
||
| with LANGUAGE_METADATA_FILE.open("r", encoding="utf-8") as file: | ||
| language_metadata = json.load(file) | ||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These try-excepts are fine, but can we not set language_metadata = {"languages": []} at the end of them (and the same for data types). If these files are not being loaded in then we have serious problems. If there's not a test for it already - that both of them are read and are in a specific location - then we should write one!
CC @DeleMike and @catreedle for checking for this test :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the reaction here, @catreedle, do you want to check the tests to see if we have some for those two files being accessible, and if not you can open a PR with tests for it? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, allow me some time as I’m still getting familiar with testing. I'll check and get back to you later. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I see, there aren’t any specific tests to verify whether both language_metadata.json and data_type_metadata.json are being read, but the tests depend on these files. Deleting one causes an error during test runs. I've written tests to check the accessibility of these files, but they won’t run if the files are missing. I’m unsure about adding these tests and could use some help with this. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I could put it in a new test file for this if it's still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I guess you could but wait for a final go-ahead :)
EDIT: We definitely need the test because before submitting this PR, I thought about it, "what if the files are not loaded ??".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have two minor tests just like you described, @catreedle :) Feel free to put them in an appropriate directory/file that's for the whole application. I'll move it if needed 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a PR here. Looking forward to the review. :)
src/scribe_data/cli/cli_utils.py
Outdated
| } | ||
|
|
||
|
|
||
| def get_available_languages() -> list[tuple[str, str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we'd just return the languages for the check_language_metadata.py file :) There's a degree of hard coding here with the query_verbs.sparql that we should avoid.
Maybe there is some value with letting a Scribe-Data developer set the metadata file with a language and a QID of their choice though 🤔 They could also use the QID during testing, but then that has specific return functionality now that's getting all data types by default instead of what's in Scribe-Data already. This would allow us to do a simple command in the docs to update the metadata file, but we should make this clear that it's for development only. We don't want the end user to add a language-qid pair to the metadata as they wouldn't have the ability to write the queries (if say they install with PIP).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I thought about it only returning language names but I felt we could do better or create like a "standard". i.e. We will have to ensure that we have query_verbs.sparql exists for all directories. And I've checked, the French directory hasquery_verbs_1.sparql and query_verbs_2sparql, so I guess it won't be helpful.
src/scribe_data/cli/cli_utils.py
Outdated
| return available_languages | ||
|
|
||
|
|
||
| def extract_qid_from_sparql(file_path: Path) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extract_qid_from_sparql, check_and_update_languages and update_language_metadata wouldn't be needed as this would be covered by the check workflow on PRs that will assure that the metadata file is appropriate :) I'm still on the fence when it comes to set_metadata though 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree with why we should move them. Great point 😄
I thought that the set_metadata would be useful if, for some reason, we could not automatically update the metadata(that is, when it was still in use), then we can call scribe-data set-metadata -lang French -qid Q150 for example, and this updates our metadata file.
src/scribe_data/cli/main.py
Outdated
| ) | ||
| TOTAL_DESCRIPTION = "Check Wikidata for the total available data for the given languages and data types." | ||
| CONVERT_DESCRIPTION = "Convert data returned by Scribe-Data to different file types." | ||
| UPDATE_DESCRIPTION = "Update the metadata file with available languages and QIDs." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main would need the update functionality removed as it'll be covered by the new workflow on PRs :)
src/scribe_data/cli/total.py
Outdated
|
|
||
| if not language_qid: | ||
| print( | ||
| "The specified language does not exist. Please update your language_metadata.json file by using:\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to check these outputs once the updates are done :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sure! Thanks @andrewtavis !
src/scribe_data/cli/cli_utils.py
Outdated
| print(f"Error updating language metadata: {e}") | ||
|
|
||
|
|
||
| def set_metadata(language_name: str, qid: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do keep set_metadata, then we should include an iso argument :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also @DeleMike, it'd be great if for those functions that are kept that you check some of the other docstrings and use similar formatting. As the function docstrings are written now, they won't be parsed correctly for the readthedocs documentation :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh! I did not know about this. will update :)
|
Specifically of interest to me now though, @DeleMike and @catreedle, and maybe something that you two would like to take on and maybe @DeleMike can take the lead: We could repurpose this code as well to have another check of the actual queries. This idea came from
We'd be able to use the language and data type metadata files for this. For the first test we're seeing if the first QID is the correct language via parsing Let me know how the above sounds! I think that the above serves to make use of the code here in intuitive ways so the efforts aren't wasted, and we get a great PR testing setup that checks the metadata files and queries to make sure that we're not introducing things that don't work 😊 |
This is very thoughtful. I'm willing to work/collaborate on it. 😊 |
|
Hey, @andrewtavis and @catreedle thanks folks for the review😄. I just wanted to drop my initial thoughts. From all I have read, we can generate a set of new issues from the works in this PR. I will be dropping my comments! |
|
I think we should definitely get this down to a point where we can merge and use the code from it as example codes within the issues we make 😊 Let me know if I should make the issues and tag you both! |
Lovely! I would love to improve this code. From the quote, I understand that:
|
yes, please. we would love to work on the issues created! :) |
|
Ok, let me know how you want to proceed here, @DeleMike :) I'm happy to clean this PR up with the code we want for now, and then also make the issues at the same time using the current functions as snippets. Let me know if you'd like to do any changes before - so I'll wait for a confirmation here and then finalize this one 😊 |
|
Yeah, true! or is there something I could quickly do so that it does not affect our codebase? the docstrings format issue? |
|
I can get to the docstrings really quick, @DeleMike :) No stress at all. Just wanted to note it for the next time! |
|
I'll do a quick fix of this one soon then and then make issues for the workflows 🚀 |
oh okay! noted 😄 |
I'll be waiting for them :) Thanks @andrewtavis for the support! and thank you @catreedle. Let's get ready!! 💪😄 |
|
alright, checking... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided against doing set metadata as well, @DeleMike, as I think we need to focus on making sure that the CLI options are for the end users and not for the development team. Thanks so much for these suggestions though! I think that the workflows are going to improve the project so much, and all of this was the catalyst 😊
That's okay :-) |
Contributor checklist
Description
This PR enhances the Scribe-Data CLI by validating user input for languages and data types. If a non-existent language is specified, the user will receive a clear message informing them that the language does not exist, along with instructions on how to update their language_metadata.json file using:
Alternatively, users can manually set the metadata with:
These enhancements aim to improve user experience and ensure accurate data retrieval in the Scribe-Data CLI.
Related issue
This issue is closely related to #295