The examples in this repository support the book Spark in Action, 2nd edition, by Jean-Georges Perrin, published by Manning. Find out more about the book on Manning's website.
Welcome to Spark in Action, 2nd edition, chapter 99. This chapter covers all the material we would have loved to include in the book but could not, because the book is already more than 600 pages long.
This code is designed to work with Apache Spark v3.0.0.
Data quality labs are located in the dq sub package.
This lab mixes machine learning and data quality to predict the revenues of a party of 40 people at a restaurant.
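The idea behind this lab can be sketched in a few lines: first apply a data quality rule to discard suspicious rows, then fit a simple regression on what remains and predict the revenue for 40 guests. The lab itself relies on Apache Spark; the sketch below deliberately uses plain Java with no Spark dependency so that it runs on its own. It is not the lab's code. The class name `PartyRevenueSketch`, the rule, the threshold of 20, and the sample data are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class PartyRevenueSketch {

    // Hypothetical data quality rule: keep a row {guests, revenue} only if
    // the guest count is positive and the revenue is at least minRevenue.
    public static List<double[]> clean(double[][] rows, double minRevenue) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : rows) {
            if (row[0] > 0 && row[1] >= minRevenue) {
                kept.add(row);
            }
        }
        return kept;
    }

    // Ordinary least squares fit of revenue = intercept + slope * guests.
    // Returns {intercept, slope}.
    public static double[] fit(List<double[]> rows) {
        if (rows.size() < 2) {
            throw new IllegalArgumentException("Need at least two rows");
        }
        double meanX = 0;
        double meanY = 0;
        for (double[] r : rows) {
            meanX += r[0];
            meanY += r[1];
        }
        meanX /= rows.size();
        meanY /= rows.size();

        double sxy = 0;
        double sxx = 0;
        for (double[] r : rows) {
            sxy += (r[0] - meanX) * (r[1] - meanY);
            sxx += (r[0] - meanX) * (r[0] - meanX);
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("Guest counts must not all be equal");
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    // Applies the fitted model to a guest count.
    public static double predict(double[] model, double guests) {
        return model[0] + model[1] * guests;
    }

    public static void main(String[] args) {
        // Made-up sample data: {guests, revenue}. The last two rows are
        // intentionally bad (negative revenue, zero guests).
        double[][] raw = {
            {10, 230}, {20, 460}, {30, 690}, {25, -5}, {0, 100}
        };
        double[] model = fit(clean(raw, 20.0));
        System.out.println("Predicted revenue for 40 guests: " + predict(model, 40));
    }
}
```

Cleaning before fitting matters here: with the two bad rows left in, the regression line would be pulled away from the trend in the valid rows, and the prediction for a large party would drift with it.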
Located in the covid19 package.
The data ingested by those labs comes from the Center for Systems Science and Engineering (CSSE), part of the Whiting School of Engineering of Johns Hopkins University (JHU). The data is shared on GitHub at https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data.
Simple data ingestion.
Located in the misc package.
Bunch of stuff in progress, please ignore.
Lots of datasets in this repo, which will be cleaned soon!
Notes:
- This repository only contains Java examples.
Follow me on Twitter to get updates about the book and Apache Spark: @jgperrin. Join the book's community on Facebook or on Manning's live site.