gustavo-leandro/sparkify-project: A Spark project from the Udacity Data Science Nanodegree

Project Motivation

This project predicts which users churn from the Sparkify platform.

Starting from a log file of user activity, I explored the data, cleaned and transformed it, and tested several machine learning models to make this prediction.
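Churn prediction first needs a churn label per user. The README does not say how churn was defined, so the following is only a minimal sketch that assumes each log event has `userId` and `page` fields and that a churn is marked by a visit to a "Cancellation Confirmation" page. Both the field names and the page name are assumptions, and `label_churn` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical churn marker: assumes a cancellation is logged as a visit
# to this page (not stated in the README).
CHURN_PAGE = "Cancellation Confirmation"


def label_churn(events):
    """Return {userId: 1 if the user churned else 0} from a list of event dicts.

    Each event is assumed to carry "userId" and "page" keys; events with an
    empty userId (e.g. logged-out visitors) are skipped.
    """
    labels = defaultdict(int)
    for event in events:
        user = event.get("userId")
        if not user:
            continue
        # The first event seen for a user creates a 0 entry; max() turns it
        # into 1 as soon as any churn event for that user appears.
        labels[user] = max(labels[user], int(event.get("page") == CHURN_PAGE))
    return dict(labels)


sample = [
    {"userId": "10", "page": "NextSong"},
    {"userId": "10", "page": "Cancellation Confirmation"},
    {"userId": "11", "page": "Home"},
    {"userId": "", "page": "Login"},
]
print(label_churn(sample))  # {'10': 1, '11': 0}
```

The resulting per-user labels are what a classifier would then be trained to predict.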

Installation

This project uses Python 3 and the following libraries:

datetime, pandas, pyspark.sql, pyspark.ml
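The `datetime` library listed above is typically used to turn raw log timestamps into readable dates. A minimal sketch, assuming the log stores timestamps as epoch milliseconds (an assumption, not something the README states); `ms_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def ms_to_datetime(ts_ms):
    """Convert an epoch timestamp in milliseconds to a timezone-aware UTC datetime.

    Assumes the log's timestamp field is in epoch milliseconds.
    """
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)


print(ms_to_datetime(1538352117000).isoformat())  # 2018-10-01T00:01:57+00:00
```

Using an explicit UTC timezone keeps the result independent of the machine's local time settings.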

File Descriptions

The project uses two files: Sparkify.ipynb, the notebook with all the code and analysis, and mini_sparkify_event_data.json, a JSON file with the Sparkify usage logs.
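For a quick look at the log outside Spark, a plain-Python reader is enough. This is a minimal sketch that assumes mini_sparkify_event_data.json stores one JSON object per line (JSON Lines); the README does not describe the file's internal layout, and `read_event_log` is a hypothetical helper.

```python
import json
import os
import tempfile


def read_event_log(path):
    """Yield one event dict per non-blank line of a JSON Lines file.

    Assumes one JSON object per line; blank lines are skipped.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Demo on a tiny temporary file instead of the real log.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        tmp.write('{"userId": "10", "page": "Home"}\n\n{"userId": "11", "page": "NextSong"}\n')
    events = list(read_event_log(tmp.name))
    os.remove(tmp.name)
    print(len(events))  # 2
```

Inside the notebook, PySpark's `spark.read.json` would be the natural equivalent, since it reads line-delimited JSON by default.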

Results

The Random Forest model achieved the best result among the models tested. You can find more details in the post available here.
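The README does not say which metric decided the "best result". For churn, where churned users are usually the minority class, F1 is a common choice. The sketch below assumes F1 and ranks two models using made-up labels and predictions; the model names and numbers are hypothetical and are not this project's results.

```python
def f1_score(y_true, y_pred):
    """Binary F1 score with 1 as the positive (churned) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is 0, so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 0, 0, 1, 0, 1]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 1],
    "model_b": [1, 0, 0, 1, 0, 0],
}
scores = {name: f1_score(y_true, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # model_b 0.8
```

With `pyspark.ml`, `MulticlassClassificationEvaluator(metricName="f1")` computes a label-weighted F1 over a predictions DataFrame, which is one way such a comparison could be done at scale.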

Licensing, Authors, Acknowledgements

The data belongs to Udacity. Feel free to use the code here as you would like!
