New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved archive escaping of group and label in file/path names #1046

Merged
merged 4 commits into from Jan 17, 2017

Conversation

Projects
None yet
3 participants
@jlstevens
Member

jlstevens commented Jan 10, 2017

The PR aims to address issue #1021. I now know where to implement the escaping fix as illustrated in the first commit. We should discuss how to make this prettier/more robust (I don't like the look of the escaped output yet):

image

I suppose that instead of using '-sep-' I should see if Python offers proper escaping of filenames/paths on Unix/Windows. As I said, this PR is still WIP!

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 10, 2017

Shouldn't the escaping be done using a whitelist (opt in), not a blacklist (opt out)? I.e., shouldn't it be sanitizing by allowing only certain characters through, rather than specifically removing os.sep? Maybe I'm confused about the purpose.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 10, 2017

Shouldn't the escaping be done using a whitelist (opt in), not a blacklist (opt out)

I came to the same conclusion which is why I decided to sanitize in same style as the identifier sanitizer but only for specific characters that are problematic for filesystems.

Here is the updated example:

image

I think this PR is now ready to review.

@jlstevens jlstevens requested a review from philippjfr Jan 10, 2017

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 10, 2017

Looks fine to me, though I personally would rather have "," instead of "slash" and "backslash". To each his own.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 10, 2017

If you can suggest a similar replacement for : and a way to distinguish slash and backslash, I would be happy to change it! Apparently backtick is ASCII character...but I expect it introduces its own problems...

@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 10, 2017

Perhaps we should not worry about 'bad' filenames and just address the specific issue mentioned in #1021 i.e slashes causing problems with paths. Maybe we should just sanitize slash to , and backslash to ` and leave the filenames alone otherwise - there are a lot of characters that can cause problems in some situations (one site I've seen recommends avoiding all of #<$+%>`&*'|{?"=}/:\@!)

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 10, 2017

I guess I can't think of any case where one would want to distinguish between slash and backslash, so I'm happy for those to map to the same thing. slash will come up for any mathematical expressions with division, which I think a lot of people will want to plot. It's hard to see how backslash would be useful in that context, and even harder to see how it would need to be distinguished from slash if it does somehow come up. So I'd favor being concise, which is very likely to be desirable, over distinguishing between cases I don't think would ever come up. In the same vein, I'm happy to map ":" to "=", as again I don't think those would need to be distinguished. But it's up to you; those are just my opinions and gut feelings, and you should go with your own on such ill-defined issues.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 10, 2017

In the end, I've opted to simply replace each of: , \ and / with an underscore _.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 10, 2017

The tests have passed so I think this PR is now ready to merge.

@@ -31,6 +31,15 @@
from .util import unique_iterator, group_sanitizer, label_sanitizer
def sanitizer(name, replacements={':':'_', '/':'_', '\\':'_'}):

This comment has been minimized.

@philippjfr

philippjfr Jan 13, 2017

Member

Might be worth sorting the replacement items so this is deterministic.

This comment has been minimized.

@jlstevens

jlstevens Jan 13, 2017

Member

Agreed. I'll use a list of tuples to make sure the order is fixed.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 16, 2017

I think this is now ready to merge once the tests pass.

Note that I didn't test the last commit directly. It is a simple change and I'm confident it is correct but it never hurts to double check!

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 16, 2017

May have to rebase or merge master to get the tests passing.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 16, 2017

The fact that the PR build passed but the push build didn't is confusing me. Seems to indicate that some interaction with master is making this misbehave.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 17, 2017

I've rebased against latest master. We'll see what happens now!

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 17, 2017

Looks good and tests now passing!

@philippjfr philippjfr merged commit 56deba7 into master Jan 17, 2017

4 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.4%) to 77.268%
Details
s3-reference-data-cache Test data is cached.
Details

@philippjfr philippjfr deleted the archive_escaping_fix branch Jan 27, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment