Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sanitize arbitrary room display names before creating directories from them #26

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

HarHarLinks
Copy link

I ran into this problem with a room name containing a /, but there may be more cases worth preventing. Stuff like simple .. is prevented by parts of the string being hardcoded.

There may be more and i'm still testing this, so WIP for now.

Feel free to modify the style of replaced characters to your liking. I like having a _ where another character would have been.

@CicadaCinema
Copy link
Contributor

This feels like a dangerous change to make because you are losing information in the process. I think it would be better to use something along the lines of base32 or base64 encoding (with some characters replaced in the case of base64).

@HarHarLinks
Copy link
Author

this is true, and I had considered this, but I went for this route because it seemed more important being able to browse the downloaded archive human readably. we could just write the unchanged room display name into an additional file?

@CicadaCinema
Copy link
Contributor

Whatever approach is taken, I would argue that there should be absolutely no confusion between the true name and a sanitised form of it. So if you're going to write to a new file for the room name, you shouldn't have an alternative version of the same name anywhere in your archive. It should be clear to the user where the true information lies.

@HarHarLinks
Copy link
Author

HarHarLinks commented Feb 8, 2021

more notes/points to discuss:

  • Windows/NTFS(?) does not support *, ? and : (probably more, should look up a list) in file/directory names.
    => keeping the room ID perfectly intact is not an option either.

  • emoji should get their alias in textform (using some lookup, there are probably libs for that) added on to enable better searching

  • the export should be restructured (unless the current flat structure makes more sense for some reason?): ./[chats/]room.display_name_roomid:homeserver/{messages/,avatars/,messages.json,metadata.txt}

@russelldavies
Copy link
Owner

@HarHarLinks Thanks for the contribution but I'm in agreement with @ColonisationCaptain that this would cause unwanted information loss. Rather than having to deal with filesystem peculiarities, I think a better approach is to use a content addressable storage approach. So the filename will be a hash of the file contents and the original filename will be stored in some metadata.

I don't see any advantage either of having separate directories anymore for avatars and other room media so I will likely combine them into a single directory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants