Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pooch paths don't work with GitHub Actions. #5267

Closed
mkcor opened this issue Mar 10, 2021 · 13 comments
Closed

Pooch paths don't work with GitHub Actions. #5267

mkcor opened this issue Mar 10, 2021 · 13 comments
Labels
🤖 type: Infrastructure CI, packaging, tools and automation

Comments

@mkcor
Copy link
Member

mkcor commented Mar 10, 2021

Hello @BierretA,

It definitely looks like an issue with Pooch via GitHub Actions (or vice versa). Tests run fine locally for me as well. Looking into the first failure (TestColorconv.test_lab2xyz), we can see that, over GitHub Actions, Pooch is attempting the following

Downloading file 'color/tests/data/lab_array_a_10.npy' from 'https://github.com/scikit-image/scikit-image/raw/main/skimage/color/tests/data/lab_array_a_10.npy' to '/home/runner/.cache/scikit-image/main'.

Unsurprisingly, the expected file doesn't live at https://github.com/scikit-image/scikit-image/raw/main/skimage/color/tests/data/ (yet) since skimage/color/tests/data/lab_array_a_10.npy ships with this PR!

I know I didn't run into any issue with Pooch when I added data files (#4939 merged on Sep 2, 2020). Actually, it is the first time (with this PR) we've added files in the data registry since we started running tests with GitHub Actions. Indeed, the time sequence goes like this:

I need to dig deeper to find out how Pooch builds paths, unless someone else at @scikit-image/core knows it right away. Thanks!

Originally posted by @mkcor in #5234 (comment)

@stefanv
Copy link
Member

stefanv commented Mar 10, 2021

I think this is to be expected; how could Pooch know where to find the data otherwise? The solution may be to make a single PR to add the data, and another to utilize the data.

@hmaarrfk
Copy link
Member

i suspect the file was changed, and the hash was left unchanged in the registry.

pooch should use the local version for builds installed in development mode.

@mkcor
Copy link
Member Author

mkcor commented Mar 10, 2021

I think this is to be expected; how could Pooch know where to find the data otherwise? The solution may be to make a single PR to add the data, and another to utilize the data.

Oh, right... I was trying to imagine something dynamic with branch names! 🤦‍♀️

@BierretA could you please submit a separate PR only with the new data files (and updated data registry)? Let me know if there's anything you need help with. Thanks!

@stefanv
Copy link
Member

stefanv commented Mar 10, 2021

Looks like @hmaarrfk has a better answer; thanks!

@hmaarrfk
Copy link
Member

can you confirm that the hash of the file in question matches?

@BierretA
Copy link
Contributor

Hello @hmaarrfk ,

The file color/tests/data/lab_array_a_10.npy was added with the PR. I added the sha256 hash in the registry data/_registry.py by hand. Is there another way to do it? I am not familiar with Pooch at all.
I just checked the hash in the registry and it matches the current file.

To me, it looks like Pooch used the local version of the file when the test was run locally, but not on Github Actions.

@mkcor : I can do the separate PR if it proves useful

@mkcor
Copy link
Member Author

mkcor commented Mar 15, 2021

can you confirm that the hash of the file in question matches?

@hmaarrfk the respective hashes of all data files (added with the PR) do match.

@BierretA you did the right thing -- there's just this data file, skimage/color/tests/data/lab_B_2.npy, which does not appear in the data registry...

I can do the separate PR if it proves useful

In my current understanding, this would resolve the following failures:

FAILED color/tests/test_colorconv.py::TestColorconv::test_lab2xyz - requests....
FAILED color/tests/test_colorconv.py::TestColorconv::test_luv2xyz - requests....
FAILED color/tests/test_colorconv.py::TestColorconv::test_xyz2lab - requests....
FAILED color/tests/test_colorconv.py::TestColorconv::test_xyz2luv - requests....

but not these:

FAILED data/tests/test_data.py::test_download_all_with_pooch - requests.excep...
FAILED io/tests/test_collection.py::TestImageCollection::test_custom_load_func_w_kwarg

pooch should use the local version for builds installed in development mode.

@hmaarrfk well, it looks for the data files on "main" where they will ultimately live, indeed (once the PR is merged). By 'should' do you mean it's Pooch's expected/default behaviour or should we configure it to ensure "it uses the local version for builds installed in development mode?"

Thanks!

@hmaarrfk
Copy link
Member

@mkcor I think the only test that should fail is the one marked with INSTALL_FROM_SDIST where tests, and everything is installed from the source distribution.

All others should be able to use the local builds.

If only the sdist test is failing, I would merge and move on.

@hmaarrfk
Copy link
Member

hmaarrfk commented Mar 17, 2021

Its also possible that the build procedure got changed more than I kept up with.

It used to be that the tests were run in editable mode installs by default. In those cases it was able to find the local versions of the files.

For

pip install . -vv

would require files to be pulled from github

pip install -e . -vv

would be able to pull the files from the local git repo. <- edit added the word local

I'm not sure which command is being used in the github actions.

It might be simplest to merge the files directly before this particular PR.

@mkcor
Copy link
Member Author

mkcor commented Mar 18, 2021

In my current understanding, this would resolve the following failures: [...] but not these: [...]

Confirming this: https://github.com/scikit-image/scikit-image/actions/runs/664335330

I think the only test that should fail is the one marked with INSTALL_FROM_SDIST where tests, and everything is installed from the source distribution. [...]
I'm not sure which command is being used in the github actions.

@hmaarrfk if not INSTALL_FROM_SDIST, it's the command you except that is used in GitHub Actions:

pip install -vv -e .;

Nonetheless, the tests (other than sdist) apparently pull the files from GitHub (not from the local repo), again:

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://github.com/scikit-image/scikit-image/raw/main/skimage/color/tests/data/lab_array_a_10.npy
[...]
Downloading file 'color/tests/data/lab_array_a_10.npy' from 'https://github.com/scikit-image/scikit-image/raw/main/skimage/color/tests/data/lab_array_a_10.npy' to '/home/runner/.cache/scikit-image/main'.

from https://github.com/scikit-image/scikit-image/runs/2139138616 for example.

So, in PR #5276, I reverted the update of the data registry, including only the new data files. Once #5276 is merged, we can update the data registry and tests such as test_download_all_with_pooch will pass.

@stefanv
Copy link
Member

stefanv commented Mar 18, 2021

Thank you for investigating @mkcor. When we're done, it would be good to add a note about this process to the developer guide too.

@grlee77
Copy link
Contributor

grlee77 commented Apr 5, 2021

@mkcor, can this issue be closed or is there still something remaining to be resolved?

@grlee77 grlee77 added the 🤖 type: Infrastructure CI, packaging, tools and automation label Apr 5, 2021
@mkcor
Copy link
Member Author

mkcor commented Apr 5, 2021

@grlee77 it can be closed, right, thanks for keeping tabs! Closing now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖 type: Infrastructure CI, packaging, tools and automation
Projects
None yet
Development

No branches or pull requests

5 participants