This is a GitHub repository created to submit the fourth homework of the Algorithmic Methods for Data Mining (ADM) course for the MSc. in Data Science at the Sapienza University of Rome.
- `README.md`: A markdown file that explains the content of the repository.
- `main.ipynb`: A Jupyter Notebook containing all the relevant exercises and reports for the homework questions, the Command Line Question, and the Algorithmic Question.
- `modules/`: A folder including 4 Python modules used to solve the exercises in `main.ipynb`. The files included are:
  - `__init__.py`: An init file that allows us to import the modules into our Jupyter Notebook.
  - `data_handler.py`: A Python file including a `DataHandler` class designed to handle data cleaning and feature engineering on Kaggle's Netflix Clicks Dataset.
  - `recommender.py`: A Python file including a `Recommender` class designed to build a Recommendation Engine with LSH using user data obtained from Kaggle's Netflix Clicks Dataset.
  - `cluster.py`: A Python file including three classes: `FAMD`, `KMeans`, and `KMeans++`, designed to perform Factor Analysis of Mixed Data on Kaggle's Netflix Clicks Dataset and then perform parallelized k-Means and k-Means++ clustering using PySpark.
  - `plotter.py`: A Python file including a `Plotter` class designed to build auxiliary plots for the written report in `main.ipynb`.
- `commandline.sh`: A bash script including the code to solve the Command Line Question.
- `images/`: A folder containing a screenshot of the successful execution of the `commandline.sh` script.
- `.gitignore`: A predetermined `.gitignore` file that tells Git which files or folders to ignore in a Python project.
- `LICENSE`: A file containing an MIT permissive license.
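
To give a rough idea of the LSH technique mentioned for `recommender.py`, here is a minimal MinHash sketch in plain Python. It is not the code in the `Recommender` class: the function names, the toy `watched` data, and the banding parameters are all hypothetical and chosen only for illustration. Each user is treated as a set of integer item ids, and users whose signatures agree on at least one band become candidate neighbours.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # large prime used by the illustrative hash family


def minhash_signature(items, hash_params):
    """Return one min-hash value per (a, b) pair for a set of integer ids."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_params]


def band_keys(signature, bands, rows):
    """Split a signature into `bands` chunks of `rows` values each."""
    return [
        (band, tuple(signature[band * rows:(band + 1) * rows]))
        for band in range(bands)
    ]


def candidate_neighbors(user, signatures, bands, rows):
    """Return users sharing at least one LSH bucket with `user`."""
    buckets = defaultdict(set)
    for other, sig in signatures.items():
        for key in band_keys(sig, bands, rows):
            buckets[key].add(other)

    found = set()
    for key in band_keys(signatures[user], bands, rows):
        found |= buckets[key]
    found.discard(user)  # a user is not their own neighbour
    return found


# Toy data (hypothetical): user -> set of integer ids of watched items.
watched = {"u1": {1, 2, 3, 4}, "u2": {1, 2, 3, 4}, "u3": {100, 200}}

rng = random.Random(0)
hash_params = [
    (rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(20)
]
signatures = {u: minhash_signature(s, hash_params) for u, s in watched.items()}

print(candidate_neighbors("u1", signatures, bands=5, rows=4))
```

With 20 hash functions split into 5 bands of 4 rows, users with identical sets always collide, while users with disjoint sets never do under this hash family. Using more rows per band makes the filter stricter, and using more bands makes it more permissive.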
In this homework we worked with Kaggle's predefined Netflix Clicks Dataset.
If the Notebook doesn't load through GitHub, please try one of these steps:
- Try rendering the Notebook through NBViewer.
- Try downloading the Notebook and opening it on your local computer.
Author: Miguel Angel Sanchez Cortes
Email: sanchezcortes.2049495@studenti.uniroma1.it
MSc. in Data Science, Sapienza University of Rome