sleepypioneer/machine-learning-zoomcamp

Machine Learning Bookcamp

Notes and exercises from following alexeygrigorev's Machine Learning Zoomcamp.

Notes & project links

Week one: Intro to Machine Learning

Week two: Regression

Week three: Classification (churn prediction)

Week four: model evaluation

Week five: model deployment

Week six: decision trees

Week seven: decision trees

Week eight: deep learning

Week nine: serverless

Week ten: kubernetes

Running the local dev environment

Running scripts locally in a virtual environment

Create a virtual environment

python3.9 -m venv machine-learning-zoomcamp

Activate the virtual environment

# on Windows
machine-learning-zoomcamp\Scripts\activate
# or on Linux/macOS
source machine-learning-zoomcamp/bin/activate
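
To confirm that activation took effect, a quick check along the following lines may help. This is a minimal sketch for a POSIX shell; `venv_status` is a hypothetical helper name, not something shipped with this repository. It relies only on the `VIRTUAL_ENV` variable that the standard `venv` activate script exports.

```shell
# venv_status: hypothetical helper (not part of this repo).
# Prints whether a virtual environment is active in the current shell,
# based on the VIRTUAL_ENV variable exported by the venv activate script.
venv_status() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: $VIRTUAL_ENV"
  else
    echo "inactive"
  fi
}

venv_status
```

If the output is `inactive`, the activate script was most likely run in a different shell session or not sourced.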

Start a Jupyter server for notebooks 📓

To spin up a Docker container with Jupyter Notebook libraries installed, run:

make dev

Download data

The first time you collect data, run the following to create the data directory:

mkdir data

To download all data, run make get-data. To download individual sets, run the commands below; each one starts with cd data, so run it from the repository root.

Car data (for week one homework and week two notebook)

cd data && mkdir car_data && cd car_data && wget https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-02-car-price/data.csv

New York City Airbnb Open Data (for weeks two and three homework)

cd data && mkdir airbnb_data && cd airbnb_data && wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/AB_NYC_2019.csv

UCI Student Performance Data (for week two homework)

cd data && mkdir student_performance_data && cd student_performance_data && wget https://archive.ics.uci.edu/ml/machine-learning-databases/00320/student.zip && unzip student.zip

Customer churn data

cd data && mkdir telco_customer_churn && cd telco_customer_churn && wget https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-03-churn-prediction/WA_Fn-UseC_-Telco-Customer-Churn.csv

Credit risk data

cd data && mkdir credit_risk && cd credit_risk && wget https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-06-trees/CreditScoring.csv

Clothing dataset (subset)

cd data && git clone git@github.com:alexeygrigorev/clothing-dataset-small.git
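
The per-dataset commands above stop at `mkdir` when the target directory already exists, so re-running them fails. One way to make the downloads repeatable is a small wrapper like the sketch below. `fetch_dataset` is a hypothetical helper, not part of this repository's Makefile; it assumes `wget` is installed and that it is called from the repository root.

```shell
# fetch_dataset DIR URL: hypothetical helper (not part of this repo).
# Downloads URL into data/DIR, creating the directory if needed and
# skipping the download when the file is already present.
fetch_dataset() {
  dir="data/$1"
  url="$2"
  mkdir -p "$dir"                      # no error if the directory exists
  file="$dir/$(basename "$url")"
  if [ -f "$file" ]; then
    echo "skip: $file already present"
  else
    wget -q -P "$dir" "$url"           # -P sets the download directory
  fi
}
```

For example, `fetch_dataset car_data https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-02-car-price/data.csv` would fetch the car data on the first call and print a skip message on later calls.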

Adding new libraries

Libraries that are not yet available in the Jupyter Notebook container should be added to the Dockerfile.
