K-Means Clustering with Python and Olympian Data
This repository contains a Jupyter notebook for a workshop on k-means clustering. It uses the "120 years of Olympic history: athletes and results" dataset available on Kaggle.
- You will need Miniconda (or the full Anaconda) for Python 3.7. Allow the installer to prepend the install location to your PATH.
- (Don't forget to source your .bash_profile so bash can find the
- Clone this repo
- Using the environment.yml file, create a new conda environment:
conda env create -f environment.yml
- To activate the environment, run source activate myenv.
- To test that everything works, run jupyter notebook and navigate to localhost:8888/ in your browser. You should see the Jupyter Notebook interface.
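If the notebook server starts but imports fail later, a quick check from inside the activated environment can help. The following is only a sketch: the package names in ASSUMED_PACKAGES are guesses (the contents of environment.yml are not listed here), and missing_packages is a hypothetical helper, so edit the list to match your environment.yml.

```python
# Sanity check for the conda environment (a sketch, not part of the repo).
import importlib.util

# ASSUMPTION: these package names are guesses; edit to match environment.yml.
ASSUMED_PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib"]


def missing_packages(names):
    """Return the names that Python's import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(ASSUMED_PACKAGES)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

Run it with the environment active. If anything is reported as not importable, the environment was probably not created or activated correctly.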
Working with the Jupyter Notebook
There are two versions of this notebook:
- olympic_kmeans_follow_along.ipynb lets you follow along, filling in the code as you go.
- olympic_kmeans.ipynb is the full notebook, with answers if you get stuck.
Click on the notebook you wish to run.
Each notebook is made up of cells. When interacting with a cell, you are in one of two modes:
- Edit Mode (green border) for editing cells. Selecting a cell and hitting ENTER will put you in Edit Mode.
- Command Mode (blue border) for running cells. Hitting ESCAPE on a cell in Edit Mode will put you back in Command Mode.
To run a selected cell, either click the "Run" button in the top menu bar or press Shift+Enter.
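To try out running a cell before opening the workshop notebooks, you could paste something like the following into a new notebook cell. It is a minimal pure-Python sketch of the k-means idea (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not the code from the notebooks, which may differ, and the kmeans function and toy_points values are made up for illustration rather than taken from the Olympic dataset.

```python
# Minimal k-means (Lloyd's algorithm) in pure Python, for illustration only.
# The points below are made-up toy values, not rows from the Olympic dataset.
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k of the input points, chosen at random, as centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, labels


toy_points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
              (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(toy_points, k=2)
print("centroids:", centroids)
print("labels:", labels)
```

Select the cell and run it. The first three points should share one label and the last three the other, though which group is labelled 0 depends on the random initialisation.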