Description

This repository contains the Capstone Project (the Spark project 'Sparkify') of the Udacity Data Science Nanodegree.

Blog Post

The Capstone Project is documented in this Medium Blog Post: https://medium.com/@t.mw/predicting-churn-with-apache-spark-and-pyspark-ml-429c3ad79670

Installation

Nothing to install. Just run the Jupyter Notebook Sparkify.ipynb.

Libraries

The notebook imports and uses the following libraries:

numpy
pandas
pyspark.sql
pyspark.ml
scipy.stats
matplotlib.pyplot
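
Outside the Udacity workspace, it may help to confirm that these libraries are available before opening the notebook. The following is a minimal sketch, not part of the notebook itself: the helper name `missing_packages` is hypothetical, and the check uses the top-level package names (for example `pyspark` rather than `pyspark.sql`).

```python
from importlib.util import find_spec

# Top-level packages behind the imports listed above.
REQUIRED_PACKAGES = ["numpy", "pandas", "pyspark", "scipy", "matplotlib"]


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

If the script reports missing packages, install them with your usual package manager before running the notebook.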

Data

The notebook uses the mini data set 'mini_sparkify_event_data.json' (size 128 MB) that comes with the Udacity workspace. The data is not contained in this repository. It is expected to be in the same folder as the notebook.

Project Motivation

The Spark project was chosen as the Capstone Project because it is an opportunity to get to know Apache Spark and Big Data machine learning methods that have not been covered by the course so far.

File Descriptions

Jupyter Notebook: Sparkify.ipynb
Readme
License

How to use

Provide the dataset 'mini_sparkify_event_data.json' in the same folder as the notebook.
Run the notebook and step through the cells.
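
The first step can be verified before running the notebook. The sketch below is an illustration, not the notebook's own code: the helper names `find_dataset` and `load_events` and the application name "Sparkify" are hypothetical, and reading the file with `spark.read.json` assumes the data set is stored as line-delimited JSON.

```python
from pathlib import Path

DATASET_NAME = "mini_sparkify_event_data.json"


def find_dataset(folder="."):
    """Return the path to the data set in `folder`, or raise if it is absent."""
    path = Path(folder) / DATASET_NAME
    if not path.is_file():
        raise FileNotFoundError(
            f"{DATASET_NAME} not found in {Path(folder).resolve()}; "
            "place it in the same folder as the notebook."
        )
    return path


def load_events(path):
    """Load the event data into a Spark DataFrame (assumes line-delimited JSON)."""
    # Imported here so the path check above works without PySpark installed.
    from pyspark.sql import SparkSession

    # The application name is an arbitrary, hypothetical choice.
    spark = SparkSession.builder.appName("Sparkify").getOrCreate()
    return spark.read.json(str(path))
```

A typical use would be `events = load_events(find_dataset())`, run from the folder that holds Sparkify.ipynb.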

Licensing

This repository and the notebook are licensed under the GNU General Public License v3.0.
