Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.dvcignore is broken on negation when blacklisting all #4103

Closed
skshetry opened this issue Jun 24, 2020 · 5 comments · Fixed by #4166
Closed

.dvcignore is broken on negation when blacklisting all #4103

skshetry opened this issue Jun 24, 2020 · 5 comments · Fixed by #4166
Labels
bug Did we break something? p2-medium Medium priority, should be done, but less important research

Comments

@skshetry
Copy link
Member

Bug Report

Please provide information about your setup

I am not sure how far this extends to, but a lot of trials for .dvcignore failed when I blacklisted all and tried to whitelist some:

*
!scripts
*
!/scripts
/*
!/scripts/
/*
!/scripts/*
/*
!/scripts/**
/*
!scripts/

What worked:

/*
!/scripts

Why do I feel suddenly the spelling of scripts is wrong? 😄

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Jun 24, 2020
@efiop
Copy link
Member

efiop commented Jun 24, 2020

CC @pared is this expected?

@pared
Copy link
Contributor

pared commented Jun 24, 2020

Considering that we have been designing .dvcignore to comply with .gitignore, I will refer to original:
It seems to me that we have some disrepancies in behaviour between git and dvc. I prepared a script to illustrate that:

#!/bin/bash

rm -rf repo
mkdir repo

pushd repo

git init --quiet
dvc init --quiet

git commit -am "ble"

mkdir data
echo 1 >> data/1
echo 2 >> data/2
echo 3 >> data/3
echo not_ignored >> data/file
mkdir data/dir
echo not_ignored >> data/dir/file

echo 'dvci*' >> .gitignore 

echo unrelated >> dvci0

echo '*' > dvci1

echo '*' > dvci2
echo '!/dir' >> dvci2
echo '!/file' >> dvci2

echo '*' > dvci3
echo 'dir' >> dvci3
echo 'file' >> dvci3

echo '/*' > dvci4
echo '!/dir/' >> dvci4
echo '!/file/' >> dvci4

echo '/*' > dvci5
echo '!/dir/*' >> dvci5

echo '/*' > dvci6
echo '!/dir/**' >> dvci6

echo '/*' > dvci7
echo '!dir/' >> dvci7
echo '!file/' >> dvci7

echo '/*' > dvci8
echo '!/dir' >> dvci8
echo '!/file' >> dvci8


for i in {0..8}
do
	echo '### START ###'
	echo 'ignore content:'
	cat dvci$i
	echo '----------------'
	cp dvci$i data/.dvcignore
	echo 'dvc results:' 
	python -c "from dvc.repo import Repo;print(list(Repo().tree.walk_files('data')))"
	

	# git test
	echo ''
	echo 'git results:'
	cp dvci$i data/.gitignore
	res=()
	for value in "data/1" "data/2" "data/3" "data/dir/file" "data/file"
	do
		if [[ ! $(git check-ignore $value) ]]; then
			res+="'$value', "
		fi
	done
	echo "[${res[@]}]"

	echo '### STOP ###'
	echo ''
	echo ''
done

And the two differences occur in following situations:

ignore content:
/*
!/dir/
!/file/
----------------
dvc results:
[]

git results:
['data/dir/file', ]

and

ignore content:
/*
!dir/
!file/
----------------
dvc results:
[]

git results:
['data/dir/file', ]

I also compared the results from before latest dvc optimization, #3967 results were the same as for master.

So, in conclusion, I think its a bug.

@pared pared added the bug Did we break something? label Jun 24, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Jun 24, 2020
@efiop efiop added p2-medium Medium priority, should be done, but less important research triage Needs to be triaged labels Jun 24, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Jun 24, 2020
@karajan1001
Copy link
Contributor

I compared the result to #4120 and there is no difference.

@skshetry
Copy link
Member Author

@karajan1001, I think this issue is with the upstream library, as this issue looks similar to: cpburnz/python-pathspec#19, but I haven't looked closely to verify.

@karajan1001
Copy link
Contributor

karajan1001 commented Jun 28, 2020

@karajan1001, I think this issue is with the upstream library, as this issue looks similar to: cpburnz/python-path-specification#19, but I haven't looked closely to verify.

@skshetry I agreed.

I had looked into the PathSpec code and the following result is weird to me. It is very different from the logic I learned from the PathSpec code.

ignore content:
/*
!dir/
!file/


git results:
['data/dir/file', ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Did we break something? p2-medium Medium priority, should be done, but less important research
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants