Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved archive escaping of group and label in file/path names #1046

Merged
merged 4 commits into from
Jan 17, 2017

Conversation

jlstevens
Copy link
Contributor

The PR aims to address issue #1021. I now know where to implement the escaping fix as illustrated in the first commit. We should discuss how to make this prettier/more robust (I don't like the look of the escaped output yet):

image

I suppose that instead of using '-sep-' I should see if Python offers proper escaping of filenames/paths on Unix/Windows. As I said, this PR is still WIP!

@jbednar
Copy link
Member

jbednar commented Jan 10, 2017

Shouldn't the escaping be done using a whitelist (opt in), not a blacklist (opt out)? I.e., shouldn't it be sanitizing by allowing only certain characters through, rather than specifically removing os.sep? Maybe I'm confused about the purpose.

@jlstevens
Copy link
Contributor Author

jlstevens commented Jan 10, 2017

Shouldn't the escaping be done using a whitelist (opt in), not a blacklist (opt out)

I came to the same conclusion which is why I decided to sanitize in same style as the identifier sanitizer but only for specific characters that are problematic for filesystems.

Here is the updated example:

image

I think this PR is now ready to review.

@jbednar
Copy link
Member

jbednar commented Jan 10, 2017

Looks fine to me, though I personally would rather have "," instead of "slash" and "backslash". To each his own.

@jlstevens
Copy link
Contributor Author

If you can suggest a similar replacement for : and a way to distinguish slash and backslash, I would be happy to change it! Apparently backtick is ASCII character...but I expect it introduces its own problems...

@jlstevens
Copy link
Contributor Author

jlstevens commented Jan 10, 2017

Perhaps we should not worry about 'bad' filenames and just address the specific issue mentioned in #1021 i.e slashes causing problems with paths. Maybe we should just sanitize slash to , and backslash to ` and leave the filenames alone otherwise - there are a lot of characters that can cause problems in some situations (one site I've seen recommends avoiding all of #<$+%>`&*'|{?"=}/:\@!)

@jbednar
Copy link
Member

jbednar commented Jan 10, 2017

I guess I can't think of any case where one would want to distinguish between slash and backslash, so I'm happy for those to map to the same thing. slash will come up for any mathematical expressions with division, which I think a lot of people will want to plot. It's hard to see how backslash would be useful in that context, and even harder to see how it would need to be distinguished from slash if it does somehow come up. So I'd favor being concise, which is very likely to be desirable, over distinguishing between cases I don't think would ever come up. In the same vein, I'm happy to map ":" to "=", as again I don't think those would need to be distinguished. But it's up to you; those are just my opinions and gut feelings, and you should go with your own on such ill-defined issues.

@jlstevens
Copy link
Contributor Author

In the end, I've opted to simply replace each of: , \ and / with an underscore _.

@jlstevens
Copy link
Contributor Author

The tests have passed so I think this PR is now ready to merge.

@@ -31,6 +31,15 @@
from .util import unique_iterator, group_sanitizer, label_sanitizer


def sanitizer(name, replacements={':':'_', '/':'_', '\\':'_'}):
Copy link
Member

@philippjfr philippjfr Jan 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth sorting the replacement items so this is deterministic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'll use a list of tuples to make sure the order is fixed.

@jlstevens
Copy link
Contributor Author

I think this is now ready to merge once the tests pass.

Note that I didn't test the last commit directly. It is a simple change and I'm confident it is correct but it never hurts to double check!

@philippjfr
Copy link
Member

May have to rebase or merge master to get the tests passing.

@philippjfr
Copy link
Member

The fact that the PR build passed but the push build didn't is confusing me. Seems to indicate that some interaction with master is making this misbehave.

@jlstevens
Copy link
Contributor Author

jlstevens commented Jan 17, 2017

I've rebased against latest master. We'll see what happens now!

@philippjfr
Copy link
Member

Looks good and tests now passing!

@philippjfr philippjfr merged commit 56deba7 into master Jan 17, 2017
@philippjfr philippjfr deleted the archive_escaping_fix branch January 27, 2017 02:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants