rudacaya/sparkify

Sparkify: Churn Prediction Project

For subscription-based companies, customer churn is one of the most important metrics to track. It is the number of customers who stop using the company's services. Understanding why customers churn helps find the best ways to prevent it, and identifying customers with a high churn probability makes it possible to act in time to retain them.

In this repo we work with the fictitious music streaming company Sparkify, where, as in Spotify or Pandora, users can have a free account or a paid one. The main purpose is to predict when a customer will cancel their subscription, so that action can be taken in advance to prevent it.

Our job is to explore the user data and predict which users will churn. For this we developed different supervised machine learning models, all of them using Spark.

Main Files

  • The Notebook: Sparkify_Final.ipynb

Necessary Packages:

  • Python 3.6
  • Data Wrangling and cleaning libraries: PySpark, PySpark SQL, pandas, numpy
  • Data Visualization: matplotlib
  • ML library: PySpark ML (pyspark.ml)
  • Jupyter Lab

References:

https://www.udacity.com/course/data-scientist-nanodegree--nd025

Medium Article:

https://rdcastillo.medium.com/sparkify-churn-prediction-with-spark-e82bc87b738e
