This project walks through how you can create recommendations using Apache Spark machine learning. There are a number of Jupyter notebooks that you can run on IBM Data Science Experience, and there is a live demo of a movie recommendation web application you can interact with. The demo also uses IBM Message Hub (Kafka) to push application events to a topic, where they are consumed by a Spark Streaming job running on IBM BigInsights (Hadoop).
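The event flow can be pictured as a producer and a consumer sharing a topic: the web application serializes each event and publishes it, and the streaming job reads events off the topic in small micro-batches. The sketch below is an in-process illustration only. A `queue.Queue` stands in for the Kafka topic, and the event fields (`user_id`, `movie_id`, `rating`) and the function names are hypothetical, not the demo's actual message format. A real deployment would use a Kafka client on the producer side and Spark Streaming on the consumer side.

```python
import json
import queue

# Hypothetical stand-in for a Message Hub (Kafka) topic: an in-memory FIFO of raw bytes.
topic = queue.Queue()


def publish_rating(user_id, movie_id, rating):
    """Web-app side: serialize one rating event as JSON bytes and push it to the topic."""
    event = {"user_id": user_id, "movie_id": movie_id, "rating": rating}  # hypothetical fields
    topic.put(json.dumps(event).encode("utf-8"))


def consume_batch(max_events=100):
    """Streaming-job side: drain up to max_events messages as one micro-batch.

    Returns an empty list when the topic has nothing waiting.
    """
    batch = []
    while len(batch) < max_events:
        try:
            raw = topic.get_nowait()
        except queue.Empty:
            break
        batch.append(json.loads(raw.decode("utf-8")))
    return batch
```

Decoupling the two sides through a topic means the web application never waits on the analytics job; events simply accumulate until the consumer's next batch.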
There is an overview video on YouTube.
This project is a demo movie recommender application. The demo is preloaded with approximately 4,000 movies and 500,000 randomly generated ratings. The web application lets users search for movies, rate movies, and receive movie recommendations based on their ratings.
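Recommending movies "based on their ratings" is collaborative filtering: a user's likely rating of an unseen movie is predicted from the ratings of users with similar taste. The minimal sketch below illustrates that idea in plain Python. It swaps in simple user-based cosine similarity rather than the matrix-factorization (ALS) approach that Spark MLlib provides for collaborative filtering, it runs in memory rather than on Spark, and the `recommend` function, the rating layout, and the sample data are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two users' rating dicts, over co-rated movies only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Predict ratings for movies `user` has not rated, as a similarity-weighted average.

    ratings: hypothetical layout {user_id: {movie_id: rating}}.
    Returns up to top_n (movie_id, predicted_rating) pairs, best first.
    """
    mine = ratings[user]
    weighted_sum, weight_total = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore users with no overlap in taste
        for movie, r in theirs.items():
            if movie in mine:
                continue  # only recommend movies the user has not rated yet
            weighted_sum[movie] = weighted_sum.get(movie, 0.0) + sim * r
            weight_total[movie] = weight_total.get(movie, 0.0) + sim
    predicted = {m: weighted_sum[m] / weight_total[m] for m in weighted_sum}
    # Highest predicted rating first; ties broken by movie id for stable output.
    return sorted(predicted.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


ratings = {
    "alice": {"Heat": 5, "Alien": 4, "Up": 1},
    "bob": {"Heat": 5, "Alien": 5, "Ronin": 4},
    "carol": {"Up": 5, "Cars": 3, "Heat": 1},
}
print([m for m, _ in recommend(ratings, "alice")])  # ['Ronin', 'Cars']
```

This neighbourhood approach compares every pair of users, which does not scale well to hundreds of thousands of ratings; that cost is one reason a distributed engine such as Spark is used for the real workload.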
Start with the Introduction to read more about this project.
You can import these notebooks into IBM Data Science Experience. I have occasionally experienced issues when trying to load from a URL. If that happens to you, try cloning or downloading this repo and importing the notebooks as files.
The overall architecture looks like this:
The technologies used in this demo are:
Core components (Web Application)
- Python Flask application
- IBM Bluemix for hosting the web application and services
- IBM Cloudant NoSQL database for storing movies, ratings, user accounts, and recommendations
- IBM Data Science Experience (DSX) and Spark as a Service
Optional components (Hadoop Warehouse)
The core demo can run without these components.
- IBM Compose Redis for maintaining an atomic increment counter for user account IDs. Use this if you want integer user account IDs rather than the UUIDs generated by Cloudant.
- IBM Message Hub, so the web application can send ratings as a stream as users enter them.
- IBM BigInsights on Cloud, running a Spark Streaming job that ingests data from Message Hub and exposes it via Hive.
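With the optional Redis component, each new account takes its ID from a shared counter. Redis `INCR` increments a key and returns the new value atomically, so concurrent sign-ups never receive the same number (in redis-py the call is `redis.Redis(...).incr(key)`). The sketch below mimics that behaviour in-process with a lock so it runs without a Redis server; the `AtomicCounter` class and `sign_up_many` helper are hypothetical. Note that a lock only protects a single process, whereas Redis provides the guarantee across every application instance, which is why an external counter is used.

```python
import threading


class AtomicCounter:
    """Hypothetical in-process stand-in for a Redis key updated with INCR."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def incr(self):
        # Read-modify-write under a lock, so two threads can never get the same value.
        with self._lock:
            self._value += 1
            return self._value


counter = AtomicCounter()
issued = []


def sign_up_many(n):
    """Hypothetical helper: simulate n account sign-ups, each taking the next integer ID."""
    for _ in range(n):
        issued.append(counter.incr())


threads = [threading.Thread(target=sign_up_many, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(set(issued)))  # 4000
```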
Click on the link below, then follow the instructions. Note that this step can take a long time (around 30 minutes).
- CAUTION: A Python Flask application instance with 128 MB of memory and an instance of Cloudant 'Lite' will be deployed, and you may be charged for these services. Please check the charges before deploying. Note that Redis, Message Hub and BigInsights are not deployed by default. If you wish to deploy the solution with these optional components, follow the instructions here.
After deploying to Bluemix, you need to create a new DSX project and import the notebooks. The Step 07 notebook creates the recommendations and saves them to Cloudant. You will not get any recommendations until you have set up this notebook with your Cloudant credentials and run it from DSX.
The screenshot below shows some movies being rated by a user.
The screenshot below shows movie recommendations provided by Spark machine learning.