This is the repository detailing the data workflow of Olymvis, a visualization project by Zihui (Chris) Fang and Hongtao Hao (equal contribution), completed as the term paper for Professor Yong-Yeol Ahn's Data Visualization course in Fall 2019.
The main data set used in this project is Olympic_history by rgriff23.
Other complementary data sets:
- continent.csv is used to extract the ISO-3166 three-letter country codes and the corresponding continent names.
- host_countries.csv is used in vis 3.
- continent_4.xlsx is used to extract the IOC (International Olympic Committee) country codes and the ISO-3166 three-letter country codes. We merge continent_4 with continent to get the continent name for each IOC code. We then merge the result with summer.csv to produce continent_percentage_tidy and continent_percentage_untidy, both of which we used to visualize changes in female participation in the Olympics across continents.
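The two-step lookup described above (IOC code to ISO-3166 code, then ISO-3166 code to continent) can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the column names `iso3` and `continent`, the sample rows, and the helper `build_ioc_to_continent` are all assumptions, and the real headers in continent.csv and continent_4.xlsx may differ.

```python
import csv
import io

# Hypothetical stand-in for continent.csv; the real column names may differ.
CONTINENT_CSV = """iso3,continent
USA,Americas
DEU,Europe
CHN,Asia
"""

# Hypothetical stand-in for continent_4.xlsx rows: (IOC code, ISO-3166 code).
IOC_ROWS = [
    ("USA", "USA"),
    ("GER", "DEU"),  # IOC and ISO codes can differ for the same country
    ("CHN", "CHN"),
]


def build_ioc_to_continent(continent_csv_text, ioc_rows):
    """Join IOC codes to continent names through the shared ISO-3166 code."""
    reader = csv.DictReader(io.StringIO(continent_csv_text))
    iso_to_continent = {row["iso3"]: row["continent"] for row in reader}
    # Inner join: IOC codes whose ISO code has no continent entry are dropped.
    return {
        ioc: iso_to_continent[iso]
        for ioc, iso in ioc_rows
        if iso in iso_to_continent
    }


ioc_to_continent = build_ioc_to_continent(CONTINENT_CSV, IOC_ROWS)
```

With a mapping like this, each row of summer.csv could be tagged with a continent through its IOC code before the per-continent percentages are computed.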
We used these scripts to produce the results in output from data sources.
- summer.csv was produced by keeping only the Summer Olympics rows in athelet_events.csv. It was produced by extract_summer.py.
- year_sex_percentage.csv lists the percentages of male and female participants in each Summer Olympic Games. It was produced by prepare_year_sex_percentage.py.
- continent_percentage_tidy.csv & continent_percentage_untidy.csv were used to visualize changes in female participation in the Olympics across continents. They were produced by prepare_continent_percentage.py.
- medal_summary.csv was used to visualize home-field advantage. It was produced by prepare_medal_summary.py.
- medal_efficiency.csv was used to visualize medal efficiency for each participating country or region. It was produced by prepare_medal_efficiency.py.
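As a rough illustration of the first two steps above (filtering to the Summer Games, then computing the share of each sex per year), here is a minimal sketch using only the standard library. It is not the code in extract_summer.py or prepare_year_sex_percentage.py. The column names `ID`, `Sex`, `Year`, and `Season` are assumed from the Olympic_history data set, the sample rows are made up, and counting each athlete once per year is an assumption about how the percentages are defined.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows using the assumed column names.
SAMPLE = """ID,Sex,Year,Season
1,M,1900,Summer
2,F,1900,Summer
3,M,1900,Summer
3,M,1900,Summer
4,F,1924,Winter
5,F,2016,Summer
6,M,2016,Summer
"""


def summer_rows(csv_text):
    """Keep only the rows whose Season column is 'Summer'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["Season"] == "Summer"]


def sex_percentage_by_year(rows):
    """Return {year: {'M': pct, 'F': pct}}, counting each athlete ID once per year."""
    athletes = defaultdict(lambda: {"M": set(), "F": set()})
    for row in rows:
        athletes[int(row["Year"])][row["Sex"]].add(row["ID"])
    result = {}
    for year, by_sex in sorted(athletes.items()):
        total = len(by_sex["M"]) + len(by_sex["F"])
        result[year] = {sex: 100 * len(ids) / total for sex, ids in by_sex.items()}
    return result


summer = summer_rows(SAMPLE)
percentages = sex_percentage_by_year(summer)
```

Athlete 3 appears twice in 1900 (for example, two events) but is counted once, so the percentages reflect people rather than event entries. If the scripts count entries instead, the sets would be replaced with plain counters.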
We used these notebooks to produce our visualizations.
MIT
Contact Hongtao or Chris if you have any suggestions or questions!