Visual scraping for Scrapy

Portia

Portia is a tool that lets you visually scrape websites without any programming knowledge. With Portia you annotate a web page to identify the data you wish to extract, and Portia uses these annotations to work out how to scrape data from similar pages.

Try it out

To try Portia for free without installing anything, sign up for an account at Scrapinghub and use our hosted version.

Running Portia

The easiest way to run Portia is with Docker:

docker run -v ~/portia_projects:/app/data/projects:rw -p 9001:9001 scrapinghub/portia

For more detailed instructions, and alternatives to using Docker, see the Installation docs.
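
If you want to keep projects in a different host directory or expose Portia on a different host port, the docker command above can be parameterized. The following is a minimal sketch, not part of the official instructions: the variable names `PROJECTS_DIR` and `HOST_PORT` and the helper `portia_cmd` are made up for illustration. It relies only on Docker's standard `-v host_path:container_path:mode` and `-p host_port:container_port` syntax, and it only prints the command so you can inspect it before running it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Portia): prints the docker command
# for a chosen host directory and host port without running anything.

portia_cmd() {
    # Host directory where projects are stored (default matches the command above).
    projects_dir="${PROJECTS_DIR:-$HOME/portia_projects}"
    # Host port to publish; the container side stays 9001 as in the command above.
    host_port="${HOST_PORT:-9001}"
    printf 'docker run -v %s:/app/data/projects:rw -p %s:9001 scrapinghub/portia\n' \
        "$projects_dir" "$host_port"
}

# Print the command for the current settings.
portia_cmd
```

With the defaults, the printed line is equivalent to the command above. Setting, for example, `HOST_PORT=8080` changes only the host side of the port mapping. Paths containing spaces would need quoting before the printed line is pasted into a terminal.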

Documentation

Documentation can be found here. Source files can be found in the docs directory.