Created a movie recommendation system on Azure utilizing Spark SQL by analyzing the MovieLens dataset.

sakethmukkanti/Movielens-Dataset-Analysis-Azure-Data-Engineering-Project

Building a Movie Recommender System

This is a data engineering project that produces movie suggestions from the raw MovieLens dataset. It is built with the following Azure services:

  1. Azure Blob Storage
  2. Azure Data Lake Storage Gen2
  3. Azure Data Factory
  4. Azure Databricks
  5. Azure Synapse Analytics

The architecture diagram for this project is shown below:


I have used Azure Data Factory as the orchestration tool for building and executing the data pipeline. The main tasks are:
  1. Data cleaning with ADF's data flow: duplicate rows and null values are removed, and the cleaned data is ingested into Azure Data Lake Storage Gen2 in Parquet format.
  2. Data transformation in Azure Databricks: Bayesian average ratings and the top 5 tags for each movie are calculated using Spark SQL.
  3. Data analysis in Azure Synapse Analytics, including finding the best movies by genre or by rating.
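
The transformation in step 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual notebook code. The README does not give the formula or prior used, so the sketch assumes a common form of the Bayesian average, `(C * m + n * mean) / (C + n)`, where `m` is the global mean rating, `n` is the movie's rating count and `C` is a hypothetical prior weight (`PRIOR_WEIGHT`). The sample rows are made up, and Python's built-in `sqlite3` is used only so the queries run without a cluster. The SQL sticks to CTEs and `ROW_NUMBER()`, which Spark SQL also supports, but the `:c` parameter binding is SQLite's style and would likely need adapting in Databricks.

```python
import sqlite3

# Hypothetical prior weight: the number of "virtual" ratings at the global
# mean that every movie starts with. The README does not state a value.
PRIOR_WEIGHT = 5

# Made-up sample rows in the MovieLens column layout (not real data).
RATINGS = [  # (userId, movieId, rating)
    (1, 10, 5.0), (2, 10, 4.0), (3, 10, 5.0), (4, 10, 4.0),
    (1, 20, 5.0),
    (2, 30, 2.0), (3, 30, 3.0),
]
TAGS = [  # (userId, movieId, tag)
    (1, 10, "space"), (2, 10, "space"), (3, 10, "robots"),
    (1, 20, "heist"),
]

# Bayesian average: pull each movie's mean toward the global mean,
# more strongly when the movie has few ratings.
BAYESIAN_SQL = """
WITH movie_stats AS (
    SELECT movieId, COUNT(*) AS n, AVG(rating) AS mean_rating
    FROM ratings
    GROUP BY movieId
),
global_stats AS (
    SELECT AVG(rating) AS global_mean FROM ratings
)
SELECT m.movieId,
       m.n,
       (:c * g.global_mean + m.n * m.mean_rating) / (:c + m.n) AS bayes_avg
FROM movie_stats AS m
CROSS JOIN global_stats AS g
ORDER BY bayes_avg DESC, m.movieId
"""

# Top 5 tags per movie: count tag uses, rank within each movie, keep rank <= 5.
TOP_TAGS_SQL = """
WITH tag_counts AS (
    SELECT movieId, LOWER(tag) AS tag, COUNT(*) AS uses
    FROM tags
    GROUP BY movieId, LOWER(tag)
),
ranked AS (
    SELECT movieId, tag, uses,
           ROW_NUMBER() OVER (PARTITION BY movieId ORDER BY uses DESC, tag) AS rn
    FROM tag_counts
)
SELECT movieId, tag, uses
FROM ranked
WHERE rn <= 5
ORDER BY movieId, rn
"""


def build_db():
    """Create an in-memory database holding the sample ratings and tags."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ratings (userId INTEGER, movieId INTEGER, rating REAL)")
    conn.execute("CREATE TABLE tags (userId INTEGER, movieId INTEGER, tag TEXT)")
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", RATINGS)
    conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", TAGS)
    return conn


def bayesian_averages(conn, prior_weight=PRIOR_WEIGHT):
    """Return (movieId, rating_count, bayes_avg) rows, best score first."""
    return conn.execute(BAYESIAN_SQL, {"c": prior_weight}).fetchall()


def top_tags(conn):
    """Return (movieId, tag, uses) rows, at most five tags per movie."""
    return conn.execute(TOP_TAGS_SQL).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for movie_id, n, score in bayesian_averages(conn):
        print(movie_id, n, round(score, 3))
    for row in top_tags(conn):
        print(row)
```

With this sample, movie 20 has a single 5.0 rating but ranks below movie 10, which has four ratings averaging 4.5. That is the point of the Bayesian average: a movie with very few ratings is pulled toward the global mean, so it cannot top the list on one enthusiastic vote.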



I used the following resources in the Azure portal to build this movie recommender project end to end:

  1. Key vault
  2. Synapse workspace
  3. Azure Databricks Service
  4. Data factory (V2)
  5. Storage account
  6. Storage account
