pinboogle

Full text search for pinboard.in pins

Requirements

You should have Docker and docker-compose installed. We built this using Python 3; it may also work on earlier versions. You should be using virtualenv anyway.

Executing spider

To install the Scrapy project's Python dependencies:

$ pip install -r requirements.txt

Then run the spider:

$ scrapy crawl --logfile=data/spider.log --loglevel=INFO -o data/data.json -t json -a user=[PINBOARD_USERNAME] -a after=[NUMBER] pinboard

Where PINBOARD_USERNAME is your Pinboard username and NUMBER is the timestamp from which you want to start fetching Pinboard links (use 1 to start from the oldest).
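If you prefer to launch the crawl from a Python script, the same invocation can be assembled programmatically. This is a minimal sketch: `build_crawl_command` and `run_crawl` are hypothetical helpers, not part of this repository, and they only reproduce the flags shown above. Scrapy passes each `-a key=value` pair to the spider as a keyword argument.

```python
import shlex
import subprocess


def build_crawl_command(user: str, after: int,
                        output: str = "data/data.json",
                        logfile: str = "data/spider.log") -> list[str]:
    """Return the argv for the pinboard spider run described above.

    Hypothetical helper: it mirrors the README command line exactly.
    """
    return [
        "scrapy", "crawl",
        f"--logfile={logfile}",
        "--loglevel=INFO",
        "-o", output,
        "-t", "json",
        "-a", f"user={user}",    # becomes the spider's `user` argument
        "-a", f"after={after}",  # becomes the spider's `after` argument
        "pinboard",
    ]


def run_crawl(user: str, after: int) -> None:
    """Run the crawl; requires Scrapy and this project's scrapy.cfg in the cwd."""
    cmd = build_crawl_command(user, after)
    print(shlex.join(cmd))  # echo the exact command before running it
    subprocess.run(cmd, check=True)
```

For example, `run_crawl("your_username", 1)` would fetch everything starting from the oldest pin.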

Solr

Once the scraped data is in the data folder, start the Solr container with:

$ docker-compose up -d

Solr will start and precreate a core named 'pinboard' by default. The ./data folder is also available inside the container at /var/data. If you want to explore the container, open a shell in it with:

$ docker exec -ti pinboogle_solr_1 /bin/bash

You can also access the Solr admin site at http://{DOCKER_HOST}:8983/solr
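To confirm from a script that the 'pinboard' core was actually created, you can ask Solr's CoreAdmin API for its status. This is a sketch under assumptions: `SOLR_BASE` uses `localhost` as a stand-in for your DOCKER_HOST, and the helper names are hypothetical. Solr returns an empty object for a core name it does not know, which is what `core_is_loaded` checks.

```python
import json
import urllib.parse
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: replace localhost with your DOCKER_HOST


def core_status_url(base: str, core: str) -> str:
    """Build a CoreAdmin STATUS request URL for a single core."""
    query = urllib.parse.urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"{base}/admin/cores?{query}"


def core_is_loaded(status: dict, core: str) -> bool:
    """True if the STATUS payload has a non-empty entry for `core`."""
    return bool(status.get("status", {}).get(core))


def check_core(base: str = SOLR_BASE, core: str = "pinboard") -> bool:
    """Query a running Solr instance over HTTP and report whether the core exists."""
    with urllib.request.urlopen(core_status_url(base, core), timeout=5) as resp:
        return core_is_loaded(json.load(resp), core)
```

With the container up, `check_core()` should return True.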

Importing data

With the Solr container running, use the following command to create all the fields needed for the Pinboard JSON produced by the scraping task:

$ docker exec -ti pinboogle_solr_1 /var/data/schema_migration.sh
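This README does not show what schema_migration.sh contains. Solr's Schema API accepts JSON `add-field` commands, so a migration of this kind could look like the sketch below. The field names and types in `FIELDS` are hypothetical placeholders, not necessarily what the script creates, and `SOLR_BASE` assumes `localhost` in place of your DOCKER_HOST.

```python
import json
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: use your DOCKER_HOST

# Hypothetical fields; the real ones are defined by schema_migration.sh.
FIELDS = [
    {"name": "title", "type": "text_general", "stored": True},
    {"name": "description", "type": "text_general", "stored": True},
    {"name": "url", "type": "string", "stored": True},
]


def add_fields_request(base: str, core: str, fields: list[dict]) -> urllib.request.Request:
    """Build one Schema API POST that adds every field in `fields`."""
    # A JSON object cannot repeat the "add-field" key, so all field
    # definitions are sent as a list under a single "add-field" command.
    body = json.dumps({"add-field": fields}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{core}/schema",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(add_fields_request(SOLR_BASE, "pinboard", FIELDS))` would apply the change; Solr rejects the request if one of the fields already exists.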

Then use the next command to import the JSON file:

$ docker exec -it --user=solr pinboogle_solr_1 bin/post -c pinboard /var/data/[JSON_FILE]
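bin/post is a convenience wrapper around Solr's update handler. As an alternative sketch, assuming the scraped file is the JSON array that Scrapy's JSON exporter writes, you can send the documents directly to the core's /update handler with `commit=true` so they become searchable right away. `update_request`, `import_file` and the `localhost` base URL are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: use your DOCKER_HOST


def update_request(base: str, core: str, docs: list[dict]) -> urllib.request.Request:
    """Build a POST that indexes `docs` and commits them in one request."""
    query = urllib.parse.urlencode({"commit": "true"})
    return urllib.request.Request(
        f"{base}/{core}/update?{query}",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def import_file(path: str, base: str = SOLR_BASE, core: str = "pinboard") -> int:
    """Load a JSON array of documents from `path`, post it, return the HTTP status."""
    with open(path, encoding="utf-8") as fh:
        docs = json.load(fh)
    with urllib.request.urlopen(update_request(base, core, docs), timeout=30) as resp:
        return resp.status
```

Run from the host, `import_file("data/data.json")` targets the same core as the bin/post command above.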

Search Frontend

The search interface is implemented with Flask and is located in the ./web folder.

When you run docker-compose up, a second container is built for the frontend. To access it, point your browser at:

http://{DOCKER_HOST}:5000
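This README does not describe how the Flask app queries Solr. One plausible approach, sketched here as an assumption rather than the app's actual code, is a GET to the core's /select handler, reading the matches from `response.docs` in the JSON reply. The helper names and the `localhost` base URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: use your DOCKER_HOST


def select_url(base: str, core: str, text: str, rows: int = 10) -> str:
    """Build a /select query URL that asks for a JSON response."""
    params = {"q": text, "rows": rows, "wt": "json"}
    return f"{base}/{core}/select?{urllib.parse.urlencode(params)}"


def extract_docs(payload: dict) -> list[dict]:
    """Pull the matching documents out of a Solr JSON response."""
    return payload.get("response", {}).get("docs", [])


def search(text: str, base: str = SOLR_BASE, core: str = "pinboard") -> list[dict]:
    """Run a query against a live Solr core and return the matching documents."""
    with urllib.request.urlopen(select_url(base, core, text), timeout=5) as resp:
        return extract_docs(json.load(resp))
```

Whether a bare query such as `search("docker")` matches depends on the core's default search field; with a different schema you may need a field-qualified query instead.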

Any change you make in the web folder is reflected in the container. If you change or add a dependency, rebuild the container with:

$ docker-compose down
$ docker-compose up -d --build

Disclaimer

We built this to experiment with Scrapy, Solr, Python and Docker. Pinboard offers full-text search of your links as part of a paid subscription; if you want high-quality results, subscribe to it.