Switch all dependencies to pure python and release ArchiveBox pip package #177

Closed
5 tasks
pirate opened this issue Mar 14, 2019 · 5 comments
Labels: size: hard; status: wip (Work is in-progress / has already been partially completed); touches: configuration; touches: data/schema/architecture

Comments

@pirate
Member

pirate commented Mar 14, 2019

I originally thought moving to Python-only dependencies would be intractable, but after some more research I now realize this is quite straightforward.

  • apt install curl -> pip install requests archivenow (requests docs, archivenow docs)
  • apt install wget -> pip install wpull pywb (wpull docs, pywb docs)
  • apt install git -> pip install GitPython (docs)
  • apt install youtube-dl -> pip install youtube-dl (docs)
  • apt install chromium-browser -> pip install pyppeteer (docs)

Then we won't need users to install any system dependencies anymore, and we can move to using only requirements.txt and setup.py to install ArchiveBox via pip.
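As a rough illustration of the plan above, the mapping from system packages to pip packages could be turned into requirements.txt content along these lines. This is only a sketch: `PIP_REPLACEMENTS` and `render_requirements` are hypothetical names, the package list is copied from the bullets above, and version pins are deliberately left out because the issue does not specify any.

```python
# System package -> proposed pip replacement(s), copied from the list above.
# The dict name and the helper below are hypothetical, for illustration only.
PIP_REPLACEMENTS = {
    "curl": ["requests", "archivenow"],
    "wget": ["wpull", "pywb"],
    "git": ["GitPython"],
    "youtube-dl": ["youtube-dl"],
    "chromium-browser": ["pyppeteer"],
}


def render_requirements(replacements):
    """Return requirements.txt content: one unpinned package name per line."""
    # Flatten the per-tool lists, drop duplicates, sort case-insensitively.
    names = sorted(
        {pkg for pkgs in replacements.values() for pkg in pkgs},
        key=str.lower,
    )
    return "\n".join(names) + "\n"
```

Writing the result of `render_requirements(PIP_REPLACEMENTS)` to a file would give a requirements.txt that `pip install -r` can consume. A real release would probably want pinned versions.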

@pirate pirate added status: idea-phase Work is tentatively approved and is being planned / laid out, but is not ready to be implemented yet touches: data/schema/architecture labels Mar 14, 2019
@pirate pirate changed the title from "Switch all dependencies to pure python and release pip version of ArchiveBox" to "Switch all dependencies to pure python and release ArchiveBox pip package" Mar 14, 2019
@pirate pirate pinned this issue Mar 14, 2019
@anarcat

anarcat commented Mar 15, 2019

awesome, can't wait to see that one fly! :) let me know if you need help testing the stuff or get stuck.

@007

007 commented Mar 15, 2019

Anything you're fetching with curl should be replaced with wget or vice versa, and that'll cut down on some dependencies in the pip translation.
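One way to picture this suggestion is a single helper that every fetch goes through, so that only one of the two binaries is ever required. This is a sketch under assumptions: `build_fetch_command` and `fetch` are hypothetical names and do not reflect how ArchiveBox actually invokes curl or wget.

```python
import subprocess


def build_fetch_command(url, tool="wget"):
    """Return the argv list that fetches `url` to stdout with the chosen tool."""
    if tool == "wget":
        # -q: quiet, -O -: write the response body to stdout
        return ["wget", "-q", "-O", "-", url]
    if tool == "curl":
        # -s: silent, -S: still show errors, -L: follow redirects
        return ["curl", "-sSL", url]
    raise ValueError(f"unsupported fetch tool: {tool!r}")


def fetch(url, tool="wget", timeout=60):
    """Run the single configured tool and return the response body as bytes."""
    result = subprocess.run(
        build_fetch_command(url, tool),
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return result.stdout
```

With every call site using `fetch`, switching between the two tools (or later swapping in a Python library) is a one-line change to the default.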

@makew0rld

wpull only officially supports Python 3.4 and 3.5, even now it seems. The most recent commit was in Oct. 2019, and the version on PyPI is still outdated. It's a cool tool, but I would not recommend using it, and it doesn't seem to be well maintained.

If you still want to use it anyway, you can install it from Git, and then use a Python dependency manager to only use Python 3.5 for it, but I would not recommend that.

Git install:

pip3 install git+https://github.com/ArchiveTeam/wpull.git@v2.0.3#egg=wpull
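For illustration, a caller could guard against the version restriction described above before trying to use wpull. The sketch below assumes only what this comment states (official support for Python 3.4 and 3.5); `WPULL_SUPPORTED` and `wpull_supported` are hypothetical names, not part of wpull.

```python
import sys

# Per the comment above, wpull officially supports only Python 3.4 and 3.5.
WPULL_SUPPORTED = {(3, 4), (3, 5)}


def wpull_supported(version_info=sys.version_info):
    """Return True if the given interpreter version is one wpull supports."""
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) in WPULL_SUPPORTED
```

A caller could check `wpull_supported()` at startup and fall back to another archiving method when it returns False.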

@pirate
Member Author

pirate commented Aug 10, 2020

Yeah I looked at wpull recently and came to the same conclusion. Wget2 looks more promising than wpull.

I think I'm going to close this issue for now. As we start to expand the suite of archiving methods, it's looking more and more like many of them will be Node-based. Considering that we already support pip install archivebox to get the bulk of ArchiveBox's functionality, and that we offer all the methods out of the box via Docker, making everything Python-only is no longer a priority.

@makew0rld

The other issue I see with this is managing conflicting versions of Python dependencies for these tools. I would personally recommend Poetry for that, as it's popular and I've had great experiences with it, but whatever you choose, I still think it's an important step. Apologies if you were already going to do this.

I also don't see the value in replacing git with a Python version.

@pirate pirate closed this as completed Aug 10, 2020
@pirate pirate unpinned this issue Aug 10, 2020

4 participants