Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache: checksum collision if there is dos2unix in the pipeline #992

Closed
efiop opened this issue Aug 9, 2018 · 6 comments
Closed

cache: checksum collision if there is dos2unix in the pipeline #992

efiop opened this issue Aug 9, 2018 · 6 comments
Labels
bug Did we break something? p3-nice-to-have It should be done this or next sprint
Milestone

Comments

@efiop
Copy link
Contributor

efiop commented Aug 9, 2018

As @shcheklein noticed, if user will have dvc add data && dvc run -d data -o data.unix dos2unix data data.unix in his pipeline, we will get a cache collision since we use dos2unix to compute checksum in the first place.

@efiop efiop added this to the 1.0.0 milestone Aug 9, 2018
@efiop efiop self-assigned this Aug 9, 2018
@ghost
Copy link

ghost commented Feb 4, 2019

@efiop , @shcheklein , should we this to our documentation? or is there any other way to address this issue?

@shcheklein
Copy link
Member

@MrOutis I think the idea was that we'll migrate to a regular md5 calculation as part of the 1.0 release since it requires breaking the compatibility. May be we can migrate to sha1 as was discussed in #1568 along the way. What do think @efiop ?

@shcheklein
Copy link
Member

It is also related to this one iterative/dvc.org#68

@efiop
Copy link
Contributor Author

efiop commented Feb 4, 2019

@shcheklein well, the problem with crlf would arise in any checksum, so it won't be solved by migrating to sha1 or any other algorithm. Most likely, we'll simply need to introduce an additional suffix that would mark that that file is dos2unix one. Need to take a closer look at it sometime.

@MrOutis As @shcheklein noticed, we already have an issue for it at iterative/dvc.org#68 , so we might eventually get to writing a doc for it in the future.

@shcheklein
Copy link
Member

@efiop yep, I understand the crlf issue. My point was if we go with this change (switch to a regular md5) and decide to break compatibility, maybe it makes sense to migrate to sha1 or whatnot along the way.

@efiop
Copy link
Contributor Author

efiop commented Oct 5, 2020

Closing in favor of #4658

@efiop efiop closed this as completed Oct 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Did we break something? p3-nice-to-have It should be done this or next sprint
Projects
None yet
Development

No branches or pull requests

2 participants