A Jupyter notebook for k-means clustering on Olympic athlete data

K-Means Clustering with Python and Olympian Data

This repository contains a Jupyter notebook for a workshop on k-means clustering. It uses the "120 years of Olympic history: athletes and results" dataset, available on Kaggle.


  • You will need Miniconda (or the full Anaconda distribution) for Python 3.7. Allow the installer to prepend the install location to your PATH.
  • (Don't forget to source your .bash_profile afterwards so bash can find the conda binary!)
  • Clone this repo.
  • Using the environment.yml file, create a new conda environment: conda env create -f environment.yml
  • To activate the environment, run source activate myenv. On conda 4.4 or later, conda activate myenv does the same thing.
  • To test that everything works, run jupyter notebook and navigate to localhost:8888/ in your browser. You should see an interface like this:

Jupyter Notebook Screenshot
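
If the notebook server starts but imports fail later, a quick check from a Python prompt inside the activated environment can help narrow things down. The sketch below is not part of this repository, and the package names in ASSUMED_PACKAGES are guesses at what environment.yml installs; adjust them to match the file.

```python
import importlib.util
import sys

# Hypothetical package list: the authoritative one is in environment.yml.
ASSUMED_PACKAGES = ["notebook", "numpy", "pandas", "sklearn"]


def missing_packages(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show which interpreter is active, then report anything not importable.
    print("Python", sys.version.split()[0])
    missing = missing_packages(ASSUMED_PACKAGES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All assumed packages are importable.")
```

If anything is reported as not importable, the environment is probably not activated, or it was created from a different environment.yml.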

Working with the Jupyter Notebook

There are two versions of this notebook:

  • olympic_kmeans_follow_along.ipynb lets you follow along, filling in the code as you go.
  • olympic_kmeans.ipynb is the full notebook, with answers in case you get stuck.

Click on the notebook you wish to run.

Each notebook is made up of cells. When interacting with a cell, you are in one of two modes:

  • Edit Mode (green border) for editing cells. Selecting a cell and hitting ENTER will put you in Edit Mode.

Edit Mode

  • Command Mode (blue border) for running cells. Hitting ESCAPE on a cell in Edit Mode will put you back in Command Mode.

Command Mode

To run a selected cell, either click the "Run" button in the top menu bar or press Shift+Enter in Command Mode.
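
To try running a cell before opening the workshop material, something like the following can be pasted into a new cell. It is a minimal pure-Python sketch of k-means (Lloyd's algorithm), not the code used in the notebooks; the function name kmeans, its parameters, and the height/weight values are all made up for illustration and do not come from the Kaggle dataset.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster equal-length numeric tuples into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Made-up (height in cm, weight in kg) pairs, for illustration only.
sample = [(150, 45), (152, 48), (155, 50), (190, 95), (193, 100), (195, 98)]
centroids, labels = kmeans(sample, k=2)
print(centroids)
print(labels)
```

The fixed iteration count keeps the sketch short; a fuller implementation would normally stop once the labels no longer change.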

