Sample code for turning a website into a JSON API using modern Java

Java Webscraper API example

This project is an example of how to use modern Java libraries to quickly and easily create a web API for any existing third-party web page (similar to what sites like kimonolabs and import.io do).

Specifically, we're going to create a web service that scrapes techmeme.com and serves the tech headlines from any given date as JSON.

Libraries used

  • Immutables: Used to create our models. With a handful of powerful annotations and code generation, we get immutable objects and builders to represent our data models.
  • Jsoup: Used for retrieving HTML and parsing it. This is an older library that has stood the test of time. Simply pass in CSS selectors to get the relevant HTML sections.
  • Pippo: Used as our web framework. This is a relatively new web framework for Java that combines a very simple interface with a minimal footprint and a high degree of customizability. It reminds me of Dropwizard with a simpler interface.
  • Java 8: We'll be making use of Java 8 streams and optionals to process the incoming data.
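
As a rough illustration of the last bullet, here is a minimal sketch of stream-and-Optional processing using only the JDK. The `Headline` class, the choice to make `reporter` optional, and the three-element row format are assumptions made for this example; they are not the project's actual models, which are generated by Immutables.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical hand-written immutable model (the real project uses Immutables). */
final class Headline {
    private final String title;
    private final String url;
    private final Optional<String> reporter; // assumed optional for this sketch

    Headline(String title, String url, Optional<String> reporter) {
        this.title = Objects.requireNonNull(title);
        this.url = Objects.requireNonNull(url);
        this.reporter = Objects.requireNonNull(reporter);
    }

    String title() { return title; }
    String url() { return url; }
    Optional<String> reporter() { return reporter; }
}

final class HeadlineStreams {
    /**
     * Converts raw scraped rows into headlines. Each row is {title, url, reporter};
     * rows with a missing or blank title or URL are dropped.
     */
    static List<Headline> toHeadlines(List<String[]> rows) {
        return rows.stream()
                .filter(r -> r.length == 3)
                .filter(r -> isPresent(r[0]) && isPresent(r[1]))
                .map(r -> new Headline(r[0].trim(), r[1].trim(), clean(r[2])))
                .collect(Collectors.toList());
    }

    private static boolean isPresent(String s) {
        return s != null && !s.trim().isEmpty();
    }

    /** Turns null or blank text into Optional.empty(), otherwise a trimmed value. */
    static Optional<String> clean(String s) {
        return Optional.ofNullable(s).map(String::trim).filter(t -> !t.isEmpty());
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.<String[]>asList(
                new String[] {"U.K. police allegedly arrest Lizard Squad hacker",
                        "http://www.dailydot.com/crime/lizard-squad-vinnie-omari-arrested/",
                        "William Turton"},
                new String[] {"  ", "http://example.com/", null}); // dropped: blank title
        toHeadlines(rows).forEach(h ->
                System.out.println(h.reporter().orElse("unknown") + ": " + h.title()));
    }
}
```

Wrapping the reporter in an `Optional` forces callers to decide what to do when a story has no byline, instead of passing `null` through to the JSON layer.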

Setup Instructions

  1. Be sure to have Java 8 and Maven installed.
  2. Compile the source code: mvn package
  3. Run the server: java -jar target/apiweb-1.0-SNAPSHOT.jar
  4. Make a request to the server (I've set the server port to 8081 in application.properties).
    • Let's get the tech headlines from New Year's Day in 2015: http://localhost:8081/headlines?date=2015-01-01
    • Let's get the tech headlines for today: http://localhost:8081/headlines
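
The two requests in step 4 can also be made from code. The following is a minimal client sketch using only the JDK; the `HeadlinesClient` class and its method names are hypothetical and not part of the project. It assumes the server from step 3 is running on port 8081 and that the `date` parameter takes the `yyyy-MM-dd` form shown above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical client for the /headlines endpoint described in the setup steps. */
final class HeadlinesClient {
    // Base URL taken from the setup steps (server port 8081).
    static final String BASE = "http://localhost:8081/headlines";

    /** Builds the request URL; an empty date means "today's headlines". */
    static String requestUrl(Optional<LocalDate> date) {
        return date
                .map(d -> BASE + "?date=" + d.format(DateTimeFormatter.ISO_LOCAL_DATE))
                .orElse(BASE);
    }

    /** Sends a GET request and returns the response body as a string. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return in.lines().collect(Collectors.joining("\n"));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Requires the server to be running locally.
        System.out.println(fetch(requestUrl(Optional.of(LocalDate.of(2015, 1, 1)))));
        System.out.println(fetch(requestUrl(Optional.empty())));
    }
}
```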

Sample output:

Request: http://localhost:8081/headlines?date=2015-01-01

[
  {
    "reporter": "Sarah Frier",
    "source": "Bloomberg",
    "title": "Snapchat raises $485.6M at $10B+ valuation from 23 investors",
    "summary": "  —  Snapchat Raises $485.6 Million to Close Out Big Fundraising Year  —  Snapchat Inc., among a pack of elite technology startups that has attained a valuation of $10 billion or more, capped the year with a filing that disclosed it raised $485.6 million.",
    "url": "http://www.bloomberg.com/news/2015-01-01/snapchat-raises-485-6-million-to-close-out-big-fundraising-year.html"
  },
  {
    "reporter": "William Turton",
    "source": "The Daily Dot",
    "title": "U.K. police allegedly arrest Lizard Squad hacker",
    "summary": "… Lizard Squad took credit for the Dec. 25 distributed denial-of-service (DDoS) attacks against the PlayStation Network and Xbox Live.  DDoS attacks overwhelm a network with too much traffic, leaving targeted networks inaccessible for legitimate users.",
    "url": "http://www.dailydot.com/crime/lizard-squad-vinnie-omari-arrested/"
  }
]