Crawling web pages (old and useless project)
OK, I'm adding a README to this repo for legacy purposes only. Hello, reader from the year <insert a year after 2015 here>! How is life in the th century? Are we all dead yet?
Anyway, this repo was intended to build a graph of web pages (nodes) and the links between them (edges). It relies on a Java library (JSoup) to do the parsing, effectively killing a fly with a nuclear missile.
The program works, but it gathers an impressive amount of data, most of it not interesting. To do in future releases (if any): add filters.
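
For the curious, here is a rough sketch of the idea, not the actual code in this repo. It builds the page graph breadth-first and shows one way the missing filters could plug in. Every name in it (`LinkGraphSketch`, `build`, `fetchLinks`, `keep`, `maxPages`) is made up for illustration. The `fetchLinks` function stands in for the JSoup parsing step, which would presumably be something along the lines of `Jsoup.connect(url).get().select("a[href]")` with `absUrl("href")` on each element. It is left pluggable here so the sketch runs on the standard library alone.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names come from the original repo.
class LinkGraphSketch {

    /**
     * Crawls breadth-first from {@code start} and returns an adjacency map:
     * each fetched page (node) maps to the set of pages it links to (edges).
     *
     * @param fetchLinks assumed stand-in for the HTML parsing step; returns the
     *                   link targets found on a page
     * @param keep       assumed filter; links that fail it are neither recorded
     *                   nor followed
     * @param maxPages   upper bound on the number of pages fetched
     */
    static Map<String, Set<String>> build(String start,
                                          Function<String, List<String>> fetchLinks,
                                          Predicate<String> keep,
                                          int maxPages) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);

        while (!queue.isEmpty() && graph.size() < maxPages) {
            String page = queue.poll();
            if (graph.containsKey(page)) {
                continue; // queued more than once, already fetched
            }
            Set<String> edges = new LinkedHashSet<>();
            for (String target : fetchLinks.apply(page)) {
                if (!keep.test(target)) {
                    continue; // filtered out: this is the "add filters" part
                }
                edges.add(target);
                if (!graph.containsKey(target)) {
                    queue.add(target);
                }
            }
            graph.put(page, edges);
        }
        return graph;
    }

    public static void main(String[] args) {
        // A tiny in-memory "web" so the example needs no network access.
        Map<String, List<String>> fakeWeb = Map.of(
                "a", List.of("b", "c", "ads"),
                "b", List.of("a"),
                "c", List.of("b"));

        Map<String, Set<String>> graph = build(
                "a",
                page -> fakeWeb.getOrDefault(page, List.of()),
                url -> !url.equals("ads"),
                10);

        System.out.println(graph); // prints {a=[b, c], b=[a], c=[b]}
    }
}
```

The `maxPages` cap and the `keep` predicate are the two knobs that would keep the amount of gathered data under control. Pages that are linked to but never fetched (because the cap was hit) show up only as edge targets, not as keys in the map.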