Udacity Nanodegree Capstone Project - Sparkify

Installation

You will need PySpark (the SQL and ML modules), plus Pandas, Matplotlib, and Seaborn. The code should run with no issues on Python 3.x.

Project Motivation

This is Udacity's capstone project, which uses Spark to analyze user behavior data from the music app Sparkify. The main goal is to predict churn from user log data, using a tiny subset (128MB) of the full dataset (12GB). Each log entry contains some basic information about the user as well as information about a single action. A user can have many entries. In the data provided, some of the users have churned, which can be identified by the cancellation of their account.
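As a rough illustration of how churn could be derived from cancellation events, here is a minimal plain-Python sketch. The field names `userId` and `page`, the page value `"Cancellation Confirmation"`, and the sample entries are assumptions made for this example. The notebook itself works on Spark DataFrames and may define churn differently.

```python
from collections import defaultdict

# Hypothetical log entries; field names and page values are assumptions.
log = [
    {"userId": "1", "page": "NextSong"},
    {"userId": "1", "page": "Cancellation Confirmation"},
    {"userId": "2", "page": "NextSong"},
    {"userId": "2", "page": "Home"},
]


def label_churn(entries, cancel_page="Cancellation Confirmation"):
    """Return {userId: 1 if the user has any cancellation entry, else 0}."""
    labels = defaultdict(int)
    for entry in entries:
        user = entry["userId"]
        is_cancel = int(entry["page"] == cancel_page)
        # A single cancellation entry marks the whole user as churned.
        labels[user] = max(labels[user], is_cancel)
    return dict(labels)


churn_labels = label_churn(log)
print(churn_labels)
```

Because a user can have many entries, the label is aggregated per user rather than per log line.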

File Descriptions

Sprakify.ipynb: the main file of the project, a Jupyter notebook containing the exploratory data analysis, feature engineering, and modeling used to predict churn. It is also exported as Sparkify Project.html.

Models

The following models are used to classify users: Logistic Regression, Decision Trees, and Gradient Boosted Trees.

Result

The data provided is the service's user log, which includes demographic information, user activities, timestamps, and so on. We analyze the log and build a model to identify customers who are highly likely to stop using the service, so that marketing offers can be sent to them to prevent churn. We use the F1 score to measure model performance because both precision and recall matter: we don't want to miss too many customers who are likely to churn, and we don't want to waste too much on those who are not. The model we built has an F1 score of 0.85 on the training dataset and 0.8 on the test dataset. You can find an article about this project posted here.
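To make the metric concrete, here is a minimal plain-Python sketch of the binary F1 score as the harmonic mean of precision and recall. The labels and predictions below are made up for illustration and are not results from this project. The notebook may compute the metric differently, for example as a class-weighted average through Spark's evaluator.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # offers wasted
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only: 3 churned users, 5 retained users.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```

In this toy case one churner is missed and one offer is wasted, so precision and recall are both 2/3, and so is F1.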

Acknowledgement

Credit goes to Udacity for the project.
