Create an installation package for conda package manager #120

Closed
gvyshnya opened this issue Jul 24, 2017 · 36 comments
Assignees
Labels
enhancement Enhances DVC good first issue hacktoberfest help wanted p2-medium Medium priority, should be done, but less important

Comments

@gvyshnya
Contributor

Anaconda (https://www.continuum.io/what-is-anaconda) is the leading Python distribution for data science today. It has its own package manager, conda (https://conda.io/docs/index.html), which is a rival to the well-known pip.

Since Anaconda and its lightweight, Python-only version, Miniconda (https://conda.io/miniconda.html), are gaining more and more traction within the data science community these days, porting the DVC installer to conda may be a good step toward streamlining DVC usage across industrial analytics circles.

@gvyshnya gvyshnya added the enhancement Enhances DVC label Jul 24, 2017
@efiop
Contributor

efiop commented Jul 30, 2017

Hi @gvyshnya !

Thank you for your feedback! Anaconda was actually our first guess when we were developing installers for dvc (you can see traces of it in the git log), but since dvc is currently more of a standalone utility, we opted for pyinstaller to create a standalone binary for dvc and distribute it in the usual packages (rpm, deb, exe), and for pip to distribute it as a Python package. That being said, we were thinking of creating an anaconda/miniconda package in the future, when dvc is more fit to be used as a library. We can now see that there is clear demand for it and will try to deliver it in the near future.

@Casyfill

Casyfill commented Apr 7, 2018

looking forward to conda support!

@efiop efiop self-assigned this Apr 7, 2018
@efiop efiop added this to the dvc-0.9.6 milestone Apr 7, 2018
@efiop efiop modified the milestones: dvc-0.9.6, dvc-0.9.7 Apr 15, 2018
@efiop
Contributor

efiop commented Apr 15, 2018

Fixed in 79d7100: the download_url/url fields in our package info kept me from running conda skeleton pypi dvc on 0.9.5. This fix will be released in 0.9.6, and I'll be sure to get back to creating the conda package right after 0.9.6 is published on PyPI.

@efiop
Contributor

efiop commented May 15, 2018

Creating a conda package for dvc requires creating packages for all of its dependencies, because meta.yaml doesn't support pip dependencies for conda packages, only for environments. That makes creating a conda package for dvc time-consuming and tedious. If anyone from the community feels like working on it, please feel free to do so. For now, considering that we provide (among others) a pip package, which can be specified in a conda env as a dependency, I don't see a real need for a conda package right now and might revisit this issue in releases after 0.9.7.

@efiop efiop removed this from the dvc-0.9.7 milestone May 15, 2018
@ghost ghost added the hacktoberfest label Oct 17, 2018
@efiop
Contributor

efiop commented Feb 25, 2019

Closing as stale. Please feel free to reopen if you feel like working on this.

@efiop efiop closed this as completed Feb 25, 2019
@yfarjoun
Contributor

Conda seems to have better support for creating identical and consistent environments on different platforms. For example, my development env is OSX (my laptop), but production is Ubuntu Linux. I need to make sure that there are no differences in the packages installed on the two environments and that I can easily spin up a new machine with the same packages...

@tfenne

tfenne commented Mar 18, 2019

I agree with @yfarjoun. There are a few reasons why it would be really nice to have a recipe for dvc in one of the main conda channels:

  1. Convenience. In my projects I prefer to create reproducible environments with conda. While one can obviously install packages using pip into an environment created by conda, that's both significantly less convenient (and more awkward to automate) and makes it much harder to generate reproducible environments.
  2. Reproducibility. Conda was, as I understand it, largely invented because the existing package management solutions in the Python space (including pip) did not provide ways to make fully reproducible environments. Conda now includes many non-Python packages, and is largely the default way to install native (e.g. C, C++) bioinformatics packages as well as Python installations and packages. It is much tougher to make a reproducible environment where conda does 90% of the setup and pip then installs packages. This is particularly difficult when the pip packages drag in a lot of dependencies and some of those are shared with packages already installed via conda. Since running pip install dvc[s3] in a bare environment installs 38 packages, that's quite challenging.
  3. Irony? Sorry if this is too tongue-in-cheek, but it just seems ironic to me that a package whose goals are to provide reproducibility in data science is installed in ways that make reproducibility of the installation difficult!
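
The shared-dependency problem from point 2 can be illustrated with a minimal sketch. It assumes the conda-installed packages and the pip pins have already been collected into two name-to-version mappings; the function name `shared_conflicts` and all sample versions are hypothetical and are not taken from any real environment.

```python
def shared_conflicts(conda_pkgs, pip_pins):
    """Return {name: (conda_version, pip_version)} for packages that appear in
    both mappings with different versions."""
    return {
        name: (conda_pkgs[name], version)
        for name, version in pip_pins.items()
        if name in conda_pkgs and conda_pkgs[name] != version
    }


# Hypothetical sample data, for illustration only.
conda_pkgs = {"requests": "2.21.0", "urllib3": "1.24.1", "numpy": "1.16.2"}
pip_pins = {"requests": "2.21.0", "urllib3": "1.23", "dvc": "0.32.1"}

print(shared_conflicts(conda_pkgs, pip_pins))
# {'urllib3': ('1.24.1', '1.23')}
```

An empty result would mean that every package shared between the two installers agrees on its version, which is the situation the environment has to stay in to remain reproducible.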

@efiop
Contributor

efiop commented Mar 18, 2019

@yfarjoun @tfenne Thank you guys for all the feedback! We really appreciate it! Reopening this issue 🙂

Guys, btw, could you elaborate on why using

dependencies:
  - pip:
    - dvc==0.32.1

in your conda env is not reproducible?

@tfenne

tfenne commented Mar 18, 2019

Thanks @efiop. This is essentially the strategy I'm using, but it's a bit more complicated than that. What that section actually ends up looking like is more like this:

  - pip:
    - appdirs==1.4.3
    - asciimatics==1.10.0
    - boto3==1.7.4
    - botocore==1.10.84
    - chardet==3.0.4
    - colorama==0.4.1
    - configobj==5.0.6
    - configparser==3.7.3
    - contextlib2==0.5.5
    - decorator==4.4.0
    - distro==1.4.0
    - docutils==0.14
    - dvc==0.32.1
    - future==0.17.1
    - gitdb2==2.0.5
    - gitpython==2.1.11
    - grandalf==0.6
    - idna==2.8
    - jmespath==0.9.4
    - jsonpath-rw==1.4.0
    - msgpack==0.6.0
    - nanotime==0.5.2
    - networkx==2.2
    - ply==3.11
    - pyasn1==0.4.5
    - pyfiglet==0.8.post1
    - requests==2.21.0
    - s3transfer==0.1.13
    - schema==0.7.0
    - smmap2==2.0.5
    - urllib3==1.24.1
    - wcwidth==0.1.7
    - zc.lockfile==1.4

... because without pinning the versions of all the dependencies, it's hard to guarantee reproducibility. Currently this works because, wherever dvc requires a package that was previously installed by conda (in my env), the installed version satisfies the requirement. But if dvc required an earlier or later version, that would start to be difficult to manage.
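
A quick way to confirm that a pip section like the one above is fully pinned is to flag every entry that lacks an exact `==` version. This is only a sketch: it assumes the pip section has already been read into a list of strings, and the helper name `unpinned` and the sample list are hypothetical.

```python
import re

# Accepts "name==version", optionally with extras such as "dvc[s3]==0.32.1".
PIN_RE = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[A-Za-z0-9._!+-]+$")


def unpinned(requirements):
    """Return the requirement strings that are not pinned with '=='."""
    return [r for r in requirements if not PIN_RE.match(r.strip())]


# Hypothetical pip section: two exact pins, one range, one bare name.
pip_section = ["dvc==0.32.1", "boto3==1.7.4", "requests>=2.21", "networkx"]

print(unpinned(pip_section))
# ['requests>=2.21', 'networkx']
```

Such a check only catches missing pins; it does not tell you whether the pinned versions are compatible with what conda has already installed.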

@efiop efiop reopened this Mar 18, 2019
@J0
Contributor

J0 commented Mar 22, 2019

@efiop just curious, is anyone actively working on this issue? If not, it seems like something I wouldn't mind working on over the next week.

@efiop
Contributor

efiop commented Mar 22, 2019

@J0 That would be amazing! 🙂 No, no one is working on it right now. Thank you so much for looking into this!

@shcheklein shcheklein assigned J0 and unassigned efiop Mar 22, 2019
@efiop efiop added the p2-medium Medium priority, should be done, but less important label Mar 25, 2019
@brbarkley
Contributor

FYI, the outstanding DVC dependencies that do not have a conda build are:

  • grandalf
  • inflect
  • jsonpath-ng
  • nanotime
  • schema
  • treelib

For DVC to provide a conda build, I believe the above packages will also need a conda build. See the contributing packages guidelines on conda-forge. The process for porting a PyPI package to conda-forge is becoming increasingly streamlined but is still not a trivial task.

I would like to see DVC on conda but currently do not have the time to assist on this issue.
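
A list of outstanding dependencies like the one above can be derived by comparing a project's requirement names with the package names already published on a channel. The sketch below assumes both name lists have been gathered beforehand by some other means; `missing_from_channel`, the simplified name normalization, and the sample `channel_names` are all hypothetical, and this is not necessarily how the list above was produced.

```python
def normalize(name):
    """Lowercase and treat '_' and '.' like '-' so names compare consistently.
    This is a simplification, not a full packaging-name normalization."""
    return name.lower().replace("_", "-").replace(".", "-")


def missing_from_channel(required, available):
    """Return the required names with no match among the available names."""
    available_norm = {normalize(n) for n in available}
    return sorted(r for r in required if normalize(r) not in available_norm)


# Hypothetical sample data, for illustration only.
required = ["grandalf", "schema", "requests", "nanotime", "networkx"]
channel_names = ["requests", "networkx", "boto3"]

print(missing_from_channel(required, channel_names))
# ['grandalf', 'nanotime', 'schema']
```

Note that a package can be published on a channel under a different name than on PyPI, so any real comparison would need a manual review of the result.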

@ei-grad
Contributor

ei-grad commented May 17, 2019

Started to work on this. A basic meta.yaml for dvc is here - https://github.com/ei-grad/staged-recipes/blob/dvc/recipes/dvc/meta.yaml.

About dependencies:

@ei-grad: It is a bit unclear: if I want to add a package with dependencies that are not already on conda-forge, should I put these dependencies in the same pull request as the package I want to add? Or should there be a separate PR for each dependency?
@chrisburr: Both will work, but you should consider:
  • If the recipes are complex, a separate PR will be easier to review
  • If you do it in one PR, the first feedstock build will fail due to missing dependencies, so you'll have to restart it ~an hour later
  • Multiple PRs can take longer to get reviewed

I guess it is better to put them in the same PR as DVC.

Btw, @brbarkley, could you please share how you got the list of outstanding dependencies?

@ei-grad ei-grad self-assigned this May 17, 2019
@efiop
Contributor

efiop commented May 21, 2019

looks like there's already a version on conda cloud: https://anaconda.org/derickl/dvc 👀

that guy seems to have packaged everything that is needed. Including grandalf https://anaconda.org/derickl/grandalf .

@ghost

ghost commented May 21, 2019

@efiop , those are outside conda-forge (don't know if this is like the official distribution or something)

@ryokugyu

@J0 That would be amazing! 🙂 No, no one is working on it right now. Thank you so much for looking into this!

Any update on it? @J0

@efiop
Contributor

efiop commented May 22, 2019

Hi @derickl ! We've found your conda package for dvc, and we were wondering if you would be willing to contribute your scripts to create an official dvc repo that we could help maintain and keep up to date?

@PeterFogh
Contributor

PeterFogh commented May 22, 2019

Thanks to all of you working on this. It would be awesome to have a conda dvc package, as I mainly use conda as my package manager. However, I would prefer, if possible, to have the dvc package in the main or conda-forge channel.

@GildedHonour
Contributor

Help is needed on this, right? Whom can I discuss that with?

@yfarjoun
Contributor

I'm happy to talk as a user.

@shcheklein
Member

@GildedHonour we actually have a guy who is looking into this right now. Are you interested in helping us with this specific task, or do you just want to be involved and help DVC in general? Would be happy to discuss and find more stuff where we need more hands :)

@GildedHonour
Contributor

@shcheklein in general too. Yes, let's discuss.

@shcheklein
Member

@GildedHonour Alex, can you find me and/or Ruslan on dvc.org/chat (ivan and ruslan)? would be happy to chat.

@GildedHonour
Contributor

@shcheklein just done

@efiop
Contributor

efiop commented Jul 29, 2019

conda-forge/staged-recipes#8963 was merged. Dvc should be available through conda-forge now https://github.com/conda-forge/dvc-feedstock , unless I'm missing something. Big thanks to @Maxris 🎉

@efiop efiop closed this as completed Jul 29, 2019
@maxhora
Contributor

maxhora commented Jul 29, 2019

@efiop unfortunately, the dvc package will only be uploaded to the conda-forge channel once we have the first successful CI build in the feedstock repo's master https://github.com/conda-forge/dvc-feedstock/commits/master (so far the build has failed because of other dependencies that are not yet uploaded).

Another important thing is that only Python 2.7 and 3.6 builds are enabled for the dvc feedstock. To enable Python 3.7 builds, we will need to remove the restriction at https://github.com/conda-forge/dvc-feedstock/blob/master/recipe/meta.yaml#L14 , but before we can do that, Python 3.7 builds are required for all of DVC's dependencies.

@efiop
Contributor

efiop commented Jul 29, 2019

@Maxris Thanks for the clarification! Let's keep this open for now then.

@efiop efiop reopened this Jul 29, 2019
@efiop efiop assigned maxhora and unassigned ei-grad Jul 29, 2019
@maxhora
Contributor

maxhora commented Jul 29, 2019

DVC 0.53.2 for Python 2.7 and 3.6 is available through conda-forge now!

conda install -c conda-forge dvc

@maxhora
Contributor

maxhora commented Jul 31, 2019

Python 3.7 build of dvc is available now!

The odd thing is that on Windows 10 I'm getting the following error when trying to run the dvc installed from conda-forge:

Fatal error in launcher: Unable to create process using '"c:\bld\dvc_1564563047081\_h_env\python.exe"  "C:\Users\max\Miniconda3\Scripts\dvc.exe" '

Will try to investigate this more.

@maxhora
Contributor

maxhora commented Aug 10, 2019

Finally, dvc 0.54.1 build 1 with all extra deps is available in conda-forge

@shcheklein
Member

@Maxris awesome stuff! Thanks. The only thing left before we close this ticket (finally) is the doc on how we support/update it in the future.

@shcheklein
Member

k, thanks, @Maxris, we have all the docs ready now - https://github.com/iterative/dvc/wiki/Maintenance-of-Anaconda-package-in-conda-forge-channel

@efiop please take a look, and let's update our release checklist to include a step to upgrade requirements if necessary.

I think we are ready to close this issue at last 🎉

@efiop
Contributor

efiop commented Aug 12, 2019

@shcheklein
Member

thanks @efiop 🙏 :)
