This project predicts which users will churn from the Sparkify platform.
Starting from a log file of user activity, I explored and cleaned the data, then trained and compared several ML models to make this prediction.
This project uses Python 3 and the following libraries:
datetime, pandas, pyspark.sql, pyspark.ml
There are two files: Sparkify.ipynb, a notebook with all the code and analysis, and mini_sparkify_event_data.json, a JSON file with the Sparkify usage logs.
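As a rough illustration of how a churn label might be derived from a log like mini_sparkify_event_data.json, here is a minimal sketch. It is not the notebook's code: the notebook uses pyspark, while this uses only the standard library to stay self-contained. The field names `userId` and `page`, the `"Cancellation Confirmation"` page value, and the function name `label_churn` are all assumptions made for the example.

```python
import json

# Assumption: each log line is one JSON object with "userId" and "page" fields,
# and a visit to this page marks the user as churned.
CHURN_PAGE = "Cancellation Confirmation"


def label_churn(lines):
    """Return {user_id: 1 if churned else 0} from an iterable of JSON lines."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        event = json.loads(line)
        user_id = event.get("userId")
        if not user_id:
            continue  # skip events with a missing or empty user id
        labels.setdefault(user_id, 0)  # register the user as not churned
        if event.get("page") == CHURN_PAGE:
            labels[user_id] = 1
    return labels


# Hypothetical sample records, not taken from the real data set.
sample = [
    '{"userId": "10", "page": "NextSong"}',
    '{"userId": "11", "page": "NextSong"}',
    '{"userId": "", "page": "Home"}',
    '{"userId": "11", "page": "Cancellation Confirmation"}',
]
labels = label_churn(sample)
```

If the log file really is stored one JSON object per line, the same function could be called on an open file handle instead of the sample list.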
The Random Forest model obtained the best result among the models tested. More details are in the post available here.
The data belongs to Udacity. Feel free to use the code here as you would like!