Hello from bookmark-archiver! #20

Closed
pirate opened this issue Nov 23, 2018 · 4 comments

@pirate
commented Nov 23, 2018

Hi! I maintain https://github.com/pirate/bookmark-archiver, and I just learned today that this project exists, from HN and the LWN post about archiving sites!

You have a lot of good ideas in this repo, very similar to how I've been planning to improve bookmark-archiver in the coming months:

  • django
  • dramatiq instead of celery
  • stable mysql db of archived sites with migrations
  • json/csv/xml output for the index
  • warc/html/pdf/screenshot/youtubedl/git output for sites
  • 1, 2, and 3-link deep crawling with https://github.com/internetarchive/brozzler

You're welcome to use any of the code from bookmark-archiver of course, and I may take inspiration from your repo as well for the UI and the NLTK automatic tagging and summarization; we've had tickets open for that for a while.

Best of luck! Please hit me up on twitter: @theSquashSH if you ever want to chat or cooperate on stuff. I just added a link to reminiscence at the bottom of the BA readme.

P.S. I may meet up with the author of the LWN article in Montreal at some point. I'll talk to him about Reminiscence as well.

@kanishka-linux

Owner

commented Nov 23, 2018

Hi, I've looked into bookmark-archiver, and it is a great tool for archiving already-bookmarked collections from a variety of services.

> You're welcome to use any of the code from bookmark-archiver of course,

Sure! You're also free to use any code from reminiscence. The code related to tagging/summarization using NLTK is highly modular, and you can use it as is in bookmark-archiver.

> Please hit me up on twitter: @theSquashSH if you ever want to chat or cooperate on stuff

I do not use twitter, but I will certainly try to contact you via email to discuss various things related to archiving. You have a pretty good list of improvements for BA, such as WARC, git, deep crawling, etc., which I'm also interested in. I look forward to having more interesting conversations with you on various archiving-related topics. You're also free to contact me via email.

> I just added a link to reminiscence at the bottom of the BA readme.

That's great! Thanks!

By the way, is there any reason to use dramatiq instead of celery? Reminiscence has a built-in task queue manager, which is sufficient for regular users, but I think celery is a lot more reliable, which is why an option to use it has been provided.

@pirate

Author

commented Nov 23, 2018

Great! Email is fine too, of course. My git email usually goes to spam, though, so you can use kanishka-linux@sweeting.me.

Having used Celery at decent scale for several years, I found moving to dramatiq a breath of fresh air, and so far it has been much more reliable. Dramatiq provides guarantees that keep it from losing tasks that Celery would routinely have dropped, and I find it easier to manage across multiple servers.
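
To make the "never loses tasks" point concrete, here is a minimal sketch of one mechanism that can produce this difference: whether a worker acknowledges a message before or after it finishes processing it. This is an assumption about what the comment refers to, not something stated in the thread. The sketch uses a toy in-memory queue rather than either library, and `ToyQueue`, `run_worker`, `make_flaky_handler`, and `simulate` are hypothetical names chosen for illustration.

```python
from collections import deque


class ToyQueue:
    """Toy in-memory broker: a fetched message stays in flight until acked."""

    def __init__(self, messages):
        self.pending = deque(messages)
        self.in_flight = []

    def fetch(self):
        msg = self.pending.popleft()
        self.in_flight.append(msg)
        return msg

    def ack(self, msg):
        self.in_flight.remove(msg)

    def recover(self):
        # Simulates redelivery of unacknowledged messages after a worker dies.
        while self.in_flight:
            self.pending.appendleft(self.in_flight.pop())


def run_worker(queue, handler, ack_early):
    """Process messages until the queue is empty or the handler 'crashes'."""
    done = []
    while queue.pending:
        msg = queue.fetch()
        if ack_early:
            queue.ack(msg)  # acknowledged before the work is done
        try:
            handler(msg)
        except RuntimeError:
            return done  # the worker died in the middle of this message
        if not ack_early:
            queue.ack(msg)  # acknowledged only after the work succeeded
        done.append(msg)
    return done


def make_flaky_handler(fail_on):
    """Return a handler that crashes the first time it sees `fail_on`."""
    failed = set()

    def handler(msg):
        if msg == fail_on and msg not in failed:
            failed.add(msg)
            raise RuntimeError("worker crashed")

    return handler


def simulate(ack_early):
    queue = ToyQueue(["a", "b", "c"])
    handler = make_flaky_handler("b")
    done = run_worker(queue, handler, ack_early)
    queue.recover()  # a replacement worker picks up whatever was not acked
    done += run_worker(queue, handler, ack_early)
    return done


if __name__ == "__main__":
    print("ack before processing:", simulate(ack_early=True))
    print("ack after processing: ", simulate(ack_early=False))
```

With early acknowledgement, the message being processed when the worker crashes is already gone from the queue and is never retried. With late acknowledgement it is redelivered, at the cost that a handler may see the same message more than once and so should be safe to re-run.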

I'll close this issue to get it off your open issues list, but I look forward to chatting in the future!

@pirate closed this Nov 23, 2018

@pirate

Author

commented Nov 23, 2018

Oh btw you should definitely file a PR to get Reminiscence added to https://github.com/iipc/awesome-web-archiving @kanishka-linux

I just got off the phone with Mark Graham @ archive.org too. He's super friendly, and I'm sure he would love to discuss Reminiscence with you. You should reach out if you're interested in joining the Archive.org circle of folk who build archiving tools!

@kanishka-linux

Owner

commented Nov 24, 2018

Hey, thanks for the list. It is pretty useful.

> I just got off the phone with Mark Graham @ archive.org too, he's super friendly and I'm sure would love to discuss Reminiscence with you. You should reach out if you're interested in joining the Archive.org circle of folk who build archiving tools!

That sounds great! Once I get some free time, I'll certainly try contacting them.
