Add binary file detection and large file handling #256
Binary files (.so, .pyd, .dylib, etc.) inside wheel archives were being decoded as text without error, producing MB+ garbage strings that got fed into jsdiff's createTwoFilesPatch - extremely slow on large inputs. Fix: detect binary files by extension, check for null bytes in content, and cap text files at 512KB for diffing. https://claude.ai/code/session_01Ngz7FcMyqbhzvCoVH7xPjm
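The two detection steps described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function and constant names are taken from the PR summary, but the bodies are assumptions. In particular, the extension list is partial (the PR says "etc."), and the content check is assumed to run on raw bytes in a `Uint8Array` rather than on an already-decoded string.

```javascript
// Partial list: only the extensions named in the PR text. The real set is longer.
const BINARY_EXTENSIONS = new Set([
  '.so', '.pyd', '.dylib', '.dll', '.png', '.zip', '.whl',
]);

// Extension check: cheap, and can run before any file content is read.
function isBinaryFilename(filename) {
  const base = filename.slice(filename.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  if (dot <= 0) return false; // no extension, or a dotfile such as ".gitignore"
  return BINARY_EXTENSIONS.has(base.slice(dot).toLowerCase());
}

// Content check: a null byte anywhere in the first 8KB is treated as binary.
// Assumes `bytes` is a Uint8Array of the raw file content.
function looksLikeBinary(bytes) {
  const limit = Math.min(bytes.length, 8192);
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}
```

Checking the extension first avoids reading a compiled module at all. The null-byte scan then catches binary files whose extension is not in the set.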
Claude finished @simonw's task
PR Review Complete ✅
I've thoroughly reviewed the pull request for binary file detection and large file handling improvements. Here's my comprehensive feedback:
Todo List
Code Quality and Best Practices ✅
Excellent implementation overall! The code follows JavaScript best practices:
Performance Considerations ✅
Strong performance optimizations implemented:
Potential Issues Found
pypdfium2 no longer hangs.
The prefix check used '[Large file' instead of '[Large file:' — functionally equivalent but imprecise. Match the actual marker format. https://claude.ai/code/session_01Ngz7FcMyqbhzvCoVH7xPjm
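A small sketch of what matching the full marker prefix might look like. The `'[Large file:'` prefix comes from the commit message above. The `isPlaceholder` helper and the `'[Binary file'` prefix are hypothetical names used only for illustration, since the PR text does not show the binary marker's wording.

```javascript
// '[Large file:' is the marker prefix named in the commit message.
const LARGE_FILE_PREFIX = '[Large file:';
// Hypothetical: the PR text does not show the binary marker's exact format.
const BINARY_FILE_PREFIX = '[Binary file';

// Returns true when `content` is a placeholder marker rather than real file text.
function isPlaceholder(content) {
  return (
    content.startsWith(LARGE_FILE_PREFIX) ||
    content.startsWith(BINARY_FILE_PREFIX)
  );
}
```

With the colon included, `isPlaceholder('[Large file: 600kB]')` matches, while unrelated text that merely begins with `[Large file`, such as `[Large filesystem notes]`, does not.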
Summary
Enhanced the wheel contents viewer to properly detect and handle binary files and large text files, preventing them from being displayed in diffs, which improves performance and readability.
Key Changes
- BINARY_EXTENSIONS set containing common binary file extensions (.so, .pyd, .dll, .png, .zip, .whl, etc.)
- looksLikeBinary() function that checks for null bytes in the first 8KB of file content as a reliable binary indicator
- isBinaryFilename() function to quickly identify binary files by extension before attempting to read them
- MAX_FILE_SIZE_FOR_DIFF constant (512KB) to skip files exceeding this threshold, displaying a message with file size instead
- startsWith() checks instead of exact equality, allowing proper handling of the new "[Large file: XXkB]" format
Implementation Details
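As an illustration of how the size cap and the size marker might fit together, here is a hedged sketch. `MAX_FILE_SIZE_FOR_DIFF` and the `[Large file: XXkB]` format come from the summary above. The `contentForDiff` helper is hypothetical, and measuring size in raw bytes and rounding to whole kilobytes are assumptions rather than confirmed behavior.

```javascript
const MAX_FILE_SIZE_FOR_DIFF = 512 * 1024; // 512KB, as stated in the PR

// Hypothetical helper: returns decoded text for normal files, or a short
// size marker for files over the threshold so they never reach the differ.
// Assumes `bytes` is a Uint8Array of the raw file content.
function contentForDiff(bytes) {
  if (bytes.length > MAX_FILE_SIZE_FOR_DIFF) {
    const kb = Math.round(bytes.length / 1024); // assumed rounding
    return `[Large file: ${kb}kB]`;
  }
  return new TextDecoder('utf-8').decode(bytes);
}
```

Because the marker embeds the file size, two versions of an oversized file produce different marker strings whenever their sizes differ, which is why the downstream comparison needs a prefix check rather than exact equality.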
https://claude.ai/code/session_01Ngz7FcMyqbhzvCoVH7xPjm