Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Report] Sanitize paths (folder/directories) for downloads from plugins #212

Closed
Luro02 opened this issue Aug 12, 2021 · 10 comments · Fixed by #224 or #232
Closed

[Bug Report] Sanitize paths (folder/directories) for downloads from plugins #212

Luro02 opened this issue Aug 12, 2021 · 10 comments · Fixed by #224 or #232
Labels
bug Something isn't working

Comments

@Luro02
Copy link

Luro02 commented Aug 12, 2021

Describe the bug
I tried to download a manga through a plugin, and it worked, except that the created folder looks something like this:
"HOT\n\t\t\t\t Manga"

In theory, Linux does not have restrictions on the names of directories, but this causes problems with for example commands. (could not cd into the directory or remove it).

I was able to remove it, by using python, where I did something like this:

>>> import os
>>> import shutil
>>> os.listdir(".") # then copied the broken folder name
>>> shutil.rmtree("and pasted it in here")

This solved the issue for me, but I am familiar with programming and I do not think that everyone is able to do this. (most people will end up with folders they can not remove?)

I would suggest applying the following rules to the directory/file-names:

  • remove any kind of ASCII control character (0x00 to 0x1F, see ASCII Table)
  • remove the following printable characters from directories: " * : < > ? / \ | ~ # { }
  • strip all whitespace from the start and end of the directory/file-name
  • remove trailing dots (.)

Source 1
Source 2

I think some above-mentioned rules are already performed by your code.

I would apply the rules by replacing each forbidden character with a space and then applying this regex:
string.replace("\\s\\s+", " ")

@Luro02 Luro02 added the bug Something isn't working label Aug 12, 2021
@hkalexling
Copy link
Member

I agree and thanks for the suggestion! Let me work on it when I have the time, and PRs are always welcome.

@hkalexling
Copy link
Member

This is the filename pre-processing rule I ended up using, which replaces whitespaces, dots (.), slashes (/), and ASCII control characters with _.

https://github.com/hkalexling/Mango/blob/60a126024c8f4fa383fa08f5ecdcbfc843df6278/src/plugin/downloader.cr#L26-L30

I don't think the other special characters you listed matter too much, because when accessing the files/folders (e.g., through cd as in your example), you are supposed to put the paths in quotes anyway.

@Leeingnyo
Copy link
Member

It seems a little bit harsh(?) to replace all special characters to underscores. How about using https://github.com/szTheory/zaru_crystal instead? Hope there would be an option to turn off Windows filename filter.

@hkalexling
Copy link
Member

@Leeingnyo Sorry not sure I understand. Isn't the zaru shard even "harsher"? It sanitizes more special characters than we do, and it does so by simply removing them.

https://github.com/szTheory/zaru_crystal/blob/d4a08e1cc030a075b139a8c0b5ab4b8651450eae/src/zaru.cr#L37-L39

@Leeingnyo
Copy link
Member

Actually, I just want to keep whitespaces as whitespaces :) and dots as dots, too.

They remove \:*?"<>| because Windows don't allow them at filenames. trailing dots are also illegal in Windows.
I would suggest to:

  • trim whitespaces at start and end.
    • They are legal but not necessary, I think.
  • remove unprintable(?) ascii codes (\000-\01F)
    • \000 is illegal. others are fine (in Linux) but using them in a filename is not an ordinary situation.
  • remove /, or replace it
    • I think this character could be used in titles on the Internet like The Foo Bar Manga! [1/2], The Foo Bar Manga! [2/2]. An underscore could be an alternative
  • optional rules for Windows
    • remove \:*?"<>|
    • remove trailing dots

The option sanitize_windows_filename has a value nil, true, false. This is needed for the people who run the Mango on WSL linux like me :) I'm not sure that there is a way to detect an os in the Crystal.

  • nil: default. automatically detect its os
  • true: apply optional rules
  • false: only apply basic rules

or (It's getting complicated)

sanitize_filename_rules:
- filter-windows-illegal
- trim whitespaces at start and end
- collapse-whitespaces
- [other-optional-rule]

@hkalexling
Copy link
Member

hkalexling commented Sep 8, 2021

The problem with dots is that for example you can have .. as the manga title and Mango would save the chapters in the parent directory. This can be very dangerous. When the manga title is ., the chapters will be saved in the library folder directly and not inside a nested folder. Also we don't want chapters to have hidden filenames (e.g., .file.cbz) because Mango ignores hidden files

https://github.com/hkalexling/Mango/blob/8d84a3c50210ac37cb82ac45142810210fd05f55/src/library/title.cr#L30

For WSL, does the standard UNIX filename rules apply? Or does it use the Windows rules?

@hkalexling
Copy link
Member

But I guess we can keep whitespaces. Just need to trim and collapse them, and replace them with the standard whitespace ( )

@Leeingnyo
Copy link
Member

Yes I agree that we should avoid '.' and '..', and /^\./ in filename. I mean replacing all dots to underscores isn't necessary. For example, a title [No.1 Comics] You still my No.1.cbz could be preserved instead of translating to [No_1 Comics] You still my No_1.cbz.

the WSL follows UNIX's one in their environment, but the files can be accessed by Windows Explorer (seems as a broken file, causes some strange actions in Windows). Since I use symlinks NTFS drives for the Mango library running in WSL2, downloaded files by plugin can be easily accessed in Windows.

@hkalexling
Copy link
Member

@Leeingnyo Makes sense 👍 I will update the PR.

@mango-assistant
Copy link

Hi there! The issue has been fixed in v0.24.0. Thanks for the bug report!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants