
Pandas package not found #59

Closed · leehart opened this issue Aug 27, 2019 · 17 comments

leehart (Collaborator) commented Aug 27, 2019

The install-conda.sh script is failing for me [version c668a73], reporting:

```
conda.exceptions.ResolvePackageNotFound:
  - pandas==0.25.0=py36hb3f55d8_0
```

which appears to be set in environment-pinned-linux.yml.

Have I forgotten a step?
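(For reference, a quick sketch of how to check whether that exact pinned build is still resolvable, assuming the conda-forge channel:)

```bash
# Ask conda whether the exact pinned build still exists on conda-forge;
# no match suggests the build has been removed or re-labelled.
conda search --override-channels -c conda-forge "pandas=0.25.0=py36hb3f55d8_0"

# Dropping the build string shows which 0.25.0 builds are still available.
conda search --override-channels -c conda-forge "pandas=0.25.0"
```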

leehart closed this as completed Aug 27, 2019
leehart (Collaborator, Author) commented Aug 27, 2019

The install script succeeds if I delete environment-pinned-linux.yml first, which regenerates the file with pandas=0.25.1=py36hb3f55d8_0 instead (amongst other changes).
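(For context, a minimal sketch of that regeneration step, assuming install-conda.sh re-solves from environment.yml and re-exports when the pinned file is missing; the environment name binder is illustrative:)

```bash
# Remove the stale pinned file so the next install re-solves from environment.yml.
rm environment-pinned-linux.yml

# Re-solve and re-export the pins (environment name "binder" is an assumption).
conda env create -n binder -f environment.yml
conda env export -n binder > environment-pinned-linux.yml
```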

leehart reopened this Aug 27, 2019
leehart (Collaborator, Author) commented Aug 27, 2019

I see in the README that the environment-pinned-* configs are meant to be generated by Travis CI, rather than via a PR from a branch, e.g. https://github.com/malariagen/binder/compare/59-update-pandas-package

leehart (Collaborator, Author) commented Aug 28, 2019

This issue is also causing Travis CI on vector-ops to fail.
To complicate matters, we suspect environment.yml might have been removed accidentally.
I gather we might need to reinstate environment.yml and regenerate the pinned environments.
However, the current scheme is evidently quite fragile because certain pinned versions might become unavailable, e.g. pandas==0.25.0=py36hb3f55d8_0.
We could perhaps determine which packages guarantee semantic versioning and pin them loosely enough to allow non-API-breaking minor upgrades. We could also automatically regenerate the pinned configs on a regular schedule to pick up minor version upgrades. Otherwise, if install-conda.sh fails because of the pinned configs, it's quite a blocker!

tnguyensanger commented:

> However, the current scheme is evidently quite fragile because certain pinned versions might become unavailable

Would this imply that conda is a poor choice for reproducibility in general? If package versions frequently disappear, what's the best way to ensure that everyone is using the same package version?

leehart (Collaborator, Author) commented Aug 28, 2019

@slejdops and I pondered some options, and we'll see what @alimanfoo reckons.
I guess the original assumption was that they wouldn't disappear, or if they did then we'd just regenerate the files, maybe.
And I guess the reason pandas==0.25.0=py36hb3f55d8_0 disappeared, and pandas=0.25.1=py36hb3f55d8_0 appeared instead, was because it's just a "patch" version upgrade (in semantic versioning terms https://semver.org), so there was no point maintaining it.
If any packages guarantee semantic versioning, then I wonder whether we could allow patch version changes, instead of requiring strict version pinning. Or, if we require strict version pinning, then we could mitigate this problem by regenerating the pinned files (and the pinned versions) automatically and regularly.
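To make that concrete, here's a rough sketch of the pinning styles we'd be choosing between in an environment YAML file (the specs are illustrative and shown together only for comparison; a real file would pick one style per package):

```yaml
dependencies:
  - pandas=0.25.0=py36hb3f55d8_0   # strict pin: exact version and build string
  - pandas=0.25.0                  # exact version, any build
  - pandas=0.25.*                  # allow patch releases (0.25.1, 0.25.2, ...)
  - pandas>=0.25,<0.26             # allow any 0.25.x release
```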
Are there any alternatives to using conda? I can't imagine us ditching it because of this.

leehart (Collaborator, Author) commented Aug 28, 2019

It occurs to me that even if we could figure out how to regenerate the pinned files regularly, and update the binder repo every time there was a package shift, this might in turn generate regular downstream chores, such as updating the vector-ops repo's binder submodule, re-running the install-conda.sh script, etc.
There's value in keeping everything up to date, but also value in not having to update constantly. My main concern here, though, is keeping things stable and robust, and avoiding flaky, fragile dependencies. Ideally our house of cards shouldn't fall down just because the folks at Pandas decide to release an upgrade from 0.25.0 to 0.25.1.

leehart (Collaborator, Author) commented Aug 29, 2019

Turns out the AWOL environment.yml was a red herring (docs need updating).
Plan:

- [ ] I'll update the README wrt environment.yml and requirements-*.txt files.
- [ ] I'll regenerate the pinned files without the build strings (e.g. py36hb3f55d8_0) and then submit a PR for that as a temporary fix while we... [we're keeping the build strings]
- [ ] Investigate why pandas==0.25.0=py36hb3f55d8_0 is not available. Should it be? (Is there an error upstream?) Otherwise, what is the rationale for its absence? Then discuss how to proceed.

alimanfoo (Member) commented Aug 29, 2019

Hi @leehart, I found this documentation on conda-forge regarding fixing broken packages. It says that packages are generally never removed from the conda-forge channel, but a package's label may sometimes be changed from "main" to "broken" if the package is found to be broken in some way. Labels are a way of organising packages within a channel; we generally never notice them because we're always using the default "main" label.

I also found that the conda-forge pandas-0.25.0-py36hb3f55d8_0.tar.bz2 package has indeed been moved to the broken label, so it must have been found to be broken in some way, although I can't find any record of why/how it was broken.

So I think this means we can keep the build strings in the pinned environment files, because packages should stay in the conda-forge channel and only get moved if broken.
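(For anyone who wants to verify this themselves, a sketch using conda's label-qualified channel syntax, assuming the "broken" label that the conda-forge docs describe:)

```bash
# The exact build is no longer visible under the default "main" label...
conda search --override-channels -c conda-forge "pandas=0.25.0=py36hb3f55d8_0"

# ...but it should still appear under the "broken" label.
conda search --override-channels -c conda-forge/label/broken "pandas=0.25.0"
```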

alimanfoo (Member) commented:

Btw, here are all the pandas files currently on the conda-forge channel; again, this shows that pandas-0.25.0-py36hb3f55d8_0.tar.bz2 has been labelled broken.

alimanfoo (Member) commented:

Also btw, conda env export --no-builds will export an environment without the build strings. We don't need that here if we're keeping build strings, but just in case anyone needs that for some other reason.
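(A sketch of that export, for reference; the environment name binder is just an illustrative assumption:)

```bash
# Export the named environment without build strings, so the pins survive
# a rebuild even if a package's build string changes on conda-forge.
conda env export -n binder --no-builds > environment-pinned-linux.yml
```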

leehart (Collaborator, Author) commented Aug 29, 2019

Here's the upstream issue: conda-forge/pandas-feedstock#69

leehart (Collaborator, Author) commented Aug 29, 2019

Current plan:

As for things breaking whenever a dependency is labelled as broken, I think the current plan is still to regenerate the pinned files/version whenever something breaks, which still feels vulnerable, so it would be nice to cook up something to mitigate this, imo.

alimanfoo (Member) commented:

> As for things breaking whenever a dependency is labelled as broken, I think the current plan is still to regenerate the pinned files/version whenever something breaks, which still feels vulnerable, so it would be nice to cook up something to mitigate this, imo.

I know what you mean; I keep going back and forth on this. For now I think we can just live with it and keep pinning to the full build string. Ideally we'd encourage @conda-forge to only move something to broken if it is seriously (i.e., unusably) broken, and to do so very quickly after a package is published. That way, (a) us pinning to a package that then gets moved to broken is very unlikely, and (b) if it does ever happen, it's good that we find out about it.

leehart (Collaborator, Author) commented Aug 29, 2019

Is there a way to allow our systems to use broken versions regardless? I wonder if that's a way forward. Then if it breaks, it breaks, and it always was broken, in a sense, so nothing has changed, and we preserve reproducibility. If that's possible and desirable, we could maybe just emit a warning and carry on as normal.
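(If we did want that, one possible sketch would be to add conda-forge's broken label as a fallback channel in the pinned environment file, so previously solvable builds keep resolving after being re-labelled; the channel ordering and the single dependency shown are illustrative:)

```yaml
channels:
  - conda-forge                 # default "main" label
  - conda-forge/label/broken    # fall back to builds later marked as broken
dependencies:
  - pandas=0.25.0=py36hb3f55d8_0
```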

leehart (Collaborator, Author) commented Aug 29, 2019

^ If only we could filter on the type of breakage, though: packages marked broken for security reasons should still be avoided.

leehart (Collaborator, Author) commented Sep 2, 2019

Closing because pandas 0.25.0 has since been unbroken, and we can now run install-conda.sh successfully.

leehart closed this as completed Sep 2, 2019
alimanfoo (Member) commented:

For future reference, I found this: jupyterhub/repo2docker#731. We are not the first people to hit reproducibility problems due to conda-forge moving packages to the broken label.
