New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfixes #10
Bugfixes #10
Conversation
secure_filename is used for a reason. I suggest adding an 'original_filename' field instead. |
|
The filename is provided by the client and is used when versions of the file are written to disk later. There is nothing stopping the client from sending a filename like "../hello.txt" and thus write the file to a different location. This can have some bad consequences. (See http://flask-russian-docs.readthedocs.io/ru/latest/patterns/fileuploads.html) It does sound a bit weird if Flask doesn't support Cyrillic filenames. One option is to not use the filename as base for written files, but then they become harder to work with later. Another is to transliterate the filename before calling secure_filename: from transliterate import detect_language, translit
language = detect_language(file.name)
transliterated = translit(file.name, language, reversed=True)
filename = secure_filename(transliterated) |
What do you think about replacing '\0' and '/' on *nix systems and '\' with other illegal characters on windows? And escaping special file names: '.'. '..', 'CON', ... |
I'm not if I understand you correctly, but I don't think doing the sanitisation ourselves is a good idea. |
I have found a library that can sanitize filenames: https://github.com/thombashi/pathvalidate. It is more permissive than |
Is there anything wrong with using both a transliterated + secured filename, as well as adding a new field containing the original filename? The UI may display the original name. To me this seems a bit cleaner. Something like this in def get_document(path, parent=None):
...
language = detect_language(path)
transliterated = translit(path, language, reversed=True)
doc.path = secure_filename(transliterated)
doc.original_path = path
... Or let 'path' be the original and have a 'secure_path' property we use when writing files to disk. |
Transliteration is not universal and depends on dictionaries for particular languages. Moreover, detecting the language using a file path could be not accurate in case of mixing latin chars with non-latin. I'll make |
better that 7z, add p7zip-rar dependency to extract rar archives
After all, I have to admit that your variant securing filenames is better. Now I have an issue with travis as it uses old version of Ubuntu. I`ll add mock script for pdfimage executable, it should fix tests. |
Great stuff! Thanks! |
Various bugfixes and improvements:
unzip
for ZIP archives as it can detect non-ascii filenames encoding