
crawl4neo

This tutorial shows how to combine Jsoup, Neo4j, Spring Data, and several other technologies.

I developed a very similar system in the past. I needed a map of how all the pages on a 20,000-page site linked to one another. More importantly, I wanted to find the boilerplate HTML and exclude it during content extraction. To do that, the system normalized and stored the DOM tree of every page. The details of just how it works will unfold over time.
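
To give a feel for the boilerplate idea before the details arrive, here is a minimal sketch in plain Java. It is not the code from this project. It assumes each page has already been reduced to a list of normalized block strings (for example, one string per DOM subtree), and it treats any block that shows up on most pages as boilerplate. The class name `BoilerplateSketch`, its methods, and the `minShare` threshold are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from the crawl4neo source.
class BoilerplateSketch {

    // Counts on how many pages each normalized block appears.
    // A block repeated within one page is counted once for that page.
    static Map<String, Integer> pageFrequency(List<List<String>> pages) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> page : pages) {
            Set<String> unique = new HashSet<>(page);
            for (String block : unique) {
                counts.merge(block, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the blocks that appear on at least minShare of all pages.
    // minShare is an assumed tuning knob, e.g. 0.9 for "90% of pages".
    static Set<String> boilerplate(List<List<String>> pages, double minShare) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Integer> e : pageFrequency(pages).entrySet()) {
            if (e.getValue() >= minShare * pages.size()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Keeps the blocks of one page that are not boilerplate, in original order.
    static List<String> content(List<String> page, Set<String> boilerplate) {
        List<String> kept = new ArrayList<>();
        for (String block : page) {
            if (!boilerplate.contains(block)) {
                kept.add(block);
            }
        }
        return kept;
    }
}
```

Calling `boilerplate(pages, 0.9)` on a crawl where every page shares the same navigation and footer blocks would flag those two blocks, and `content(page, flagged)` would then return only the page-specific blocks. How the blocks are normalized is the hard part, and that is left open here.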

Please follow our ScrumBucket tutorial as we build out our project.