Scrape Pinboard users and links

pinscrapy

A crawler that scrapes Pinboard bookmarks recursively. The algorithm works as follows:

  1. It starts with a user and scrapes all of that user's bookmarks.
  2. For each bookmark, identified by its `url_slug`, it finds all users who have also saved the same bookmark.
  3. For each user found in step 2, it repeats the process (go to step 1).
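The three steps above can be read as a traversal of a bipartite user/bookmark graph. The following is a minimal plain-Python sketch of that loop, not the project's Scrapy code: `fetch_bookmarks` and `fetch_savers` are hypothetical callables standing in for the HTTP requests and page parsing, the toy data is invented, and the `max_users` cap is an assumption added so the crawl cannot grow without bound.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Set, Tuple


def crawl(
    seed_user: str,
    fetch_bookmarks: Callable[[str], Iterable[str]],  # hypothetical: user -> url_slugs
    fetch_savers: Callable[[str], Iterable[str]],  # hypothetical: url_slug -> users
    max_users: int = 100,  # hypothetical safety cap, not from the project
) -> Tuple[Dict[str, List[str]], Dict[str, Set[str]]]:
    """Breadth-first crawl of the user <-> bookmark graph.

    Returns (bookmarks_by_user, users_by_slug).
    """
    bookmarks_by_user: Dict[str, List[str]] = {}
    users_by_slug: Dict[str, Set[str]] = {}
    queue: Deque[str] = deque([seed_user])
    queued: Set[str] = {seed_user}  # users already scheduled, so each is scraped once

    while queue and len(bookmarks_by_user) < max_users:
        user = queue.popleft()
        # Step 1: scrape every bookmark (url_slug) saved by this user.
        slugs = list(fetch_bookmarks(user))
        bookmarks_by_user[user] = slugs
        for slug in slugs:
            if slug in users_by_slug:
                continue  # savers of this bookmark were already fetched
            # Step 2: find all users who saved the same bookmark.
            savers = set(fetch_savers(slug))
            users_by_slug[slug] = savers
            # Step 3: schedule new users so step 1 repeats for them.
            for other in savers:
                if other not in queued:
                    queued.add(other)
                    queue.append(other)
    return bookmarks_by_user, users_by_slug


# Hypothetical toy data standing in for scraped Pinboard pages.
TOY_BOOKMARKS = {"alice": ["a1", "b2"], "bob": ["b2", "c3"], "carol": ["c3"]}
TOY_SAVERS = {"a1": ["alice"], "b2": ["alice", "bob"], "c3": ["bob", "carol"]}


def toy_bookmarks(user: str) -> List[str]:
    return TOY_BOOKMARKS.get(user, [])


def toy_savers(slug: str) -> List[str]:
    return TOY_SAVERS.get(slug, [])


by_user, by_slug = crawl("alice", toy_bookmarks, toy_savers)
```

Starting from `alice`, the sketch reaches `bob` through the shared bookmark `b2` and then `carol` through `c3`, fetching each user's bookmarks and each bookmark's savers only once. The breadth-first order and explicit visited sets are choices made for this sketch; a Scrapy spider may instead lean on Scrapy's built-in duplicate-request filtering.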

The output consists of two kinds of items: one that contains information about each bookmark, and a second that contains the list of users who pinned each of those bookmarks. These items are stored either locally or on S3 as JSON or Parquet files, or in a MongoDB collection.
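As an illustration of the two item kinds, here is a minimal sketch using standard-library dataclasses rather than the project's own item classes. The field names (`url`, `title`, `tags`, `users`) are assumptions, not taken from the project; both items share `url_slug` so they can be joined later. Only a local JSON Lines serialization is shown; the S3, Parquet, and MongoDB backends are not reproduced here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Iterable, List, Union


@dataclass
class BookmarkItem:
    """Hypothetical item describing a single bookmark."""
    url_slug: str
    url: str
    title: str
    tags: List[str] = field(default_factory=list)


@dataclass
class BookmarkUsersItem:
    """Hypothetical item listing the users who pinned a bookmark."""
    url_slug: str  # join key shared with BookmarkItem
    users: List[str] = field(default_factory=list)


Item = Union[BookmarkItem, BookmarkUsersItem]


def to_json_lines(items: Iterable[Item]) -> str:
    # One JSON object per line, with sorted keys for stable output.
    return "\n".join(json.dumps(asdict(item), sort_keys=True) for item in items)


bookmark = BookmarkItem(url_slug="b2", url="https://example.com/", title="Example", tags=["python"])
savers = BookmarkUsersItem(url_slug="b2", users=["alice", "bob"])
output = to_json_lines([bookmark, savers])
```

Keeping the user lists in a separate item avoids repeating a bookmark's full metadata for every user who saved it.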
