lugalbandaw/dsnd-4

Description

This repository contains the capstone project (the Spark project 'Sparkify') of the Udacity Data Science Nanodegree.

Blog Post

The Capstone Project is documented in this Medium Blog Post: https://medium.com/@t.mw/predicting-churn-with-apache-spark-and-pyspark-ml-429c3ad79670

Installation

There is nothing to install; just run the Jupyter notebook Sparkify.ipynb.

Libraries

The notebook imports and uses the following libraries:

  • numpy
  • pandas
  • pyspark.sql
  • pyspark.ml
  • scipy.stats
  • matplotlib.pyplot
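
Before opening the notebook, it can help to confirm that the libraries listed above are importable in the current environment. The following is a minimal sketch, not part of the notebook itself; the helper name `missing_libraries` is hypothetical and uses only the Python standard library.

```python
import importlib.util

# Module names taken from the list above.
REQUIRED = [
    "numpy",
    "pandas",
    "pyspark.sql",
    "pyspark.ml",
    "scipy.stats",
    "matplotlib.pyplot",
]


def missing_libraries(names=REQUIRED):
    """Return the subset of `names` that cannot be located for import.

    Hypothetical helper for illustration; it does not appear in the notebook.
    """
    missing = []
    for name in names:
        try:
            # For dotted names, find_spec imports the parent package first and
            # raises ModuleNotFoundError if that parent is not installed.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


print("Missing libraries:", missing_libraries())
```

An empty list means every module the notebook imports can be found.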

Data

The notebook uses the mini data set 'mini_sparkify_event_data.json' (128 MB) that comes with the Udacity workspace. The data is not included in this repository; it is expected to be in the same folder as the notebook.
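
Since the data file has to be supplied separately, a quick check that it sits where the notebook expects it can save a failed run. The sketch below is an assumption about how the file could be located and loaded, not the notebook's actual code; the helper names `find_dataset` and `load_events` and the app name "Sparkify" are hypothetical.

```python
from pathlib import Path

# File name as given in this README.
DATASET_NAME = "mini_sparkify_event_data.json"


def find_dataset(folder="."):
    """Return the path to the data set in `folder`, or raise if it is absent."""
    path = Path(folder) / DATASET_NAME
    if not path.is_file():
        raise FileNotFoundError(
            f"{DATASET_NAME} not found in {Path(folder).resolve()}; "
            "copy it next to Sparkify.ipynb first."
        )
    return path


def load_events(folder="."):
    """Read the event log into a Spark DataFrame (assumed loading step)."""
    # Imported lazily so that find_dataset works even without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("Sparkify").getOrCreate()
    return spark.read.json(str(find_dataset(folder)))
```

Calling `load_events()` from the notebook's folder would return a DataFrame of events, provided PySpark is available and the file is present.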

Project Motivation

The Spark project was chosen as the capstone project because it is an opportunity to get to know Apache Spark and big-data machine learning methods that have not been covered by the course so far.

File Descriptions

  • Jupyter Notebook: Sparkify.ipynb
  • Readme
  • License

How to use

  1. Place the dataset 'mini_sparkify_event_data.json' in the same folder as the notebook.
  2. Open the notebook and run the cells in order.

Licensing

This repository and the notebook are licensed under the GNU General Public License v3.0.

About

Udacity Data Science Nanodegree - Capstone Project
