-
Notifications
You must be signed in to change notification settings - Fork 74
Tweaks and improvements to file handling #601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add tests for extension extraction.
WalkthroughThe changes update error handling and utility logic in the codebase. In the file handling pipeline, an assertion that checked for non-empty temporary files is replaced with explicit conditional logic to raise a specific Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Utils
Caller->>Utils: extract_path_ext(path, default_ext)
Utils->>Utils: Parse URL to get path component
Utils->>Utils: Extract extension from path
alt Extension found
Utils->>Caller: Return extension (lowercase)
else No extension, default_ext provided
Utils->>Caller: Return default_ext (lowercase)
else No extension, no default_ext
Utils->>Caller: Raise ValueError
end
sequenceDiagram
participant FileHandler
participant FileSystem
FileHandler->>FileSystem: Write to temp file
FileHandler->>FileSystem: Check if temp file is non-empty
alt File is empty
FileHandler->>FileHandler: Raise InvalidFileException
else File is not empty
FileHandler->>FileSystem: Proceed with file handling
end
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (3)
🧰 Additional context used🧬 Code Graph Analysis (2)ricecooker/utils/pipeline/file_handler.py (1)
tests/utils/test_extensions.py (1)
🔇 Additional comments (13)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
marcellamaki
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests all seem to align with my understanding of them. Added just a couple of comments just for clarity
|
|
||
| from ricecooker.utils.utils import extract_path_ext | ||
|
|
||
| # Tests generated by Claude Sonnet 3.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this is just the first time i'm fully registering it, but I think using this as a convention across our code bases for generated tests (even if we then review them) is would be a helpful change to make
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first time I've explicitly added it, at least partly because it's the first time I have used tests verbatim from LLM output, because they were good enough.
| assert ( | ||
| extract_path_ext("https://api.domain.org/data.json?id=123&token=abc") == "json" | ||
| ) | ||
| assert extract_path_ext("http://site.com/download.tar.gz?download=true") == "gz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is genuinely me not knowing, but is this just a "gz" as the file extension, or is some part of the tar also required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I googled and it was not illuminating)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think generally the convention is that a TAR file by itself is not compressed, it is just an archive of files, so that would just be .tar if it is then also compressed you add the extension .gz. So if we were dealing with these types of files in ricecooker, we would use the new file pipeline to handle the .gz file extension, then spit out a .tar file that would be handled by another handler.
Until we do need to handle them though, I think this is mostly academic.
Summary
References
Issues discovered during learningequality/kolibri-library#19
Reviewer guidance
Read tests carefully and ensure they are doing what they assert - they were generated by AI, and while I have read through them to ensure the same, a second look is helpful.
Summary by CodeRabbit
Bug Fixes
Tests