# Converting PDF results of APMO 2011 to CSV

In the past, the results for APMO were reported in `PDF` format. We want to convert these to a more friendly `CSV` format for display and analysis.

In the case of APMO 2011, after copying and pasting the ranked table from the PDF into the text file `results_pre2016/apmo2011_res_text.txt` and a bit of tweaking, we get the following:

```
1 KOREA 10 289 1 2 4 3
2 USA 10 269 1 2 4 3
...
34 SYRIA 10 5 0 0 0 0
35 QATAR 7 1 0 0 0 0
```

In other words, each line is a row of space-separated values, which we can read directly using `pandas.read_csv()`. We do this and perform some standardization. Then we save the result as `CSV`.
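
As a minimal sketch of this reading step, the snippet below parses three of the rows from the excerpt above out of an in-memory string rather than the real file. The `sample` string and the `sample_df` name are illustrative only; the column names are the ones assigned later in this notebook.

```python
import io

import pandas as pd

# Three rows copied from the excerpt above; the notebook itself reads the full file.
sample = """1 KOREA 10 289 1 2 4 3
2 USA 10 269 1 2 4 3
34 SYRIA 10 5 0 0 0 0
"""

columns = ['Rank', 'Country', '# of Contestants', 'Total Score',
           'Gold Awards', 'Silver Awards', 'Bronze Awards', 'Honorable Mentions']

# sep=" " splits on single spaces; header=None because the text has no header row.
sample_df = pd.read_csv(io.StringIO(sample), sep=" ", header=None, names=columns)
print(sample_df.dtypes)
```

Splitting on a single space only works because multi-word country names are written with underscores in the text file (for example `HONG_KONG`), so every line has exactly eight fields.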

In [5]:
import pandas as pd

# We use a year variable for reusability.
year=2011

We read the file, using a single space as the separator, and name the columns.

In [6]:
df=pd.read_csv('results_pre2016/apmo%s_res_text.txt' % year, sep=" ", header=None)
df.columns=['Rank', 'Country', '# of Contestants', 'Total Score', 'Gold Awards', 'Silver Awards', 'Bronze Awards', 'Honorable Mentions']
df

Unnamed: 0,Rank,Country,# of Contestants,Total Score,Gold Awards,Silver Awards,Bronze Awards,Honorable Mentions
0,1,KOREA,10,289,1,2,4,3
1,2,USA,10,269,1,2,4,3
2,3,THAILAND,10,228,1,2,4,3
3,4,PERU,10,223,1,2,4,3
4,5,TAIWAN,10,220,1,2,4,3
5,6,JAPAN,10,208,1,2,4,3
6,7,RUSSIA,10,205,1,2,4,3
7,8,SINGAPORE,10,205,1,2,4,3
8,9,BRAZIL,10,190,1,2,4,3
9,10,HONG_KONG,10,173,1,2,4,3


From the output above, we see that some country names need to be changed to match the standard country names that we are using. We title-case the names, replace underscores with spaces, and then fix the remaining names by row index.

In [7]:
df['Country']=df['Country'].str.title().str.replace('_', ' ')
df.loc[0,'Country']='Republic of Korea'
df.loc[1,'Country']='United States of America'
df.loc[29,'Country']='Trinidad and Tobago'
df.loc[31,'Country']="Côte d'Ivoire"
df

Unnamed: 0,Rank,Country,# of Contestants,Total Score,Gold Awards,Silver Awards,Bronze Awards,Honorable Mentions
0,1,Republic of Korea,10,289,1,2,4,3
1,2,United States of America,10,269,1,2,4,3
2,3,Thailand,10,228,1,2,4,3
3,4,Peru,10,223,1,2,4,3
4,5,Taiwan,10,220,1,2,4,3
5,6,Japan,10,208,1,2,4,3
6,7,Russia,10,205,1,2,4,3
7,8,Singapore,10,205,1,2,4,3
8,9,Brazil,10,190,1,2,4,3
9,10,Hong Kong,10,173,1,2,4,3
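
The cell above fixes names by row position, which ties it to this particular year's ranking. A possible alternative, sketched here under the assumption that a name-to-name mapping is acceptable, is to replace values by name instead. The `overrides` dictionary and the `names` series are hypothetical and cover only a few of the countries.

```python
import pandas as pd

# Hypothetical mapping from title-cased names to the standard names used here.
overrides = {
    'Korea': 'Republic of Korea',
    'Usa': 'United States of America',
}

# A few raw names in the same style as the text file.
names = pd.Series(['KOREA', 'USA', 'HONG_KONG', 'PERU'])

# Title-case, turn underscores into spaces, then swap in the standard names.
cleaned = names.str.title().str.replace('_', ' ', regex=False).replace(overrides)
print(cleaned.tolist())
# ['Republic of Korea', 'United States of America', 'Hong Kong', 'Peru']
```

Because the lookup is by value, the same mapping would keep working if the ranking order changed in another year.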


Now we add the three-letter ISO code that we use for navigation on the website. We load the codes from the `iso-alpha-3.csv` file.

When we perform the merge, pandas reorders the rows. This is undesirable, so we sort by rank again. To do this, we first need to convert the rank column type to `int`.

In [8]:
codes=pd.read_csv('iso-alpha-3.csv')
data_coded=pd.merge(codes,df,left_on='country', right_on='Country', how='right').drop('country', axis=1)
data_coded['Rank']=data_coded.Rank.astype(int)
data_coded=data_coded.sort_values('Rank')
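# Swap the first two columns so that 'Rank' comes before 'code'.
cols=data_coded.columns.tolist()
data_coded=data_coded[[cols[1],cols[0]]+cols[2:]]
# Capitalize the code column name to match the other headers.
data_coded.rename(columns={'code':'Code'}, inplace=True)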
data_coded

Unnamed: 0,Rank,Code,Country,# of Contestants,Total Score,Gold Awards,Silver Awards,Bronze Awards,Honorable Mentions
16,1,KOR,Republic of Korea,10,289,1,2,4,3
33,2,USA,United States of America,10,269,1,2,4,3
30,3,THA,Thailand,10,228,1,2,4,3
21,4,PER,Peru,10,223,1,2,4,3
28,5,TWN,Taiwan,10,220,1,2,4,3
14,6,JPN,Japan,10,208,1,2,4,3
24,7,RUS,Russia,10,205,1,2,4,3
26,8,SGP,Singapore,10,205,1,2,4,3
4,9,BRA,Brazil,10,190,1,2,4,3
12,10,HKG,Hong Kong,10,173,1,2,4,3
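
A right merge silently leaves the code empty for any country whose name does not appear in the code table. One way to catch this, sketched below on small made-up frames, is the `indicator=True` option of `pd.merge`. The column names `country` and `code` follow the cell above, while the sample rows (including the unmatched name `Atlantis`) are invented for illustration.

```python
import pandas as pd

# Tiny stand-ins for the code table and the ranked table.
codes = pd.DataFrame({'code': ['BRA', 'KOR', 'PER'],
                      'country': ['Brazil', 'Republic of Korea', 'Peru']})
ranked = pd.DataFrame({'Rank': [1, 2, 3],
                       'Country': ['Republic of Korea', 'Peru', 'Atlantis']})

# indicator=True adds a '_merge' column saying where each row came from.
merged = pd.merge(codes, ranked, left_on='country', right_on='Country',
                  how='right', indicator=True)

# Rows tagged 'right_only' had no match in the code table.
unmatched = merged.loc[merged['_merge'] == 'right_only', 'Country'].tolist()
print(unmatched)
# ['Atlantis']
```

An empty `unmatched` list suggests that every country name was standardized correctly before the merge.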


Now the information is exactly in the form that we need. We save the work.

In [9]:
data_coded.to_csv('reports/by_country_ranked_%s.csv' % year,index=False)
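
As a quick sanity check on the save step, the sketch below writes a small made-up table with `index=False` and reads it back. It uses a temporary directory rather than the real `reports/` folder, and the `table` frame is a hypothetical stand-in for `data_coded`.

```python
import os
import tempfile

import pandas as pd

year = 2011
# Hypothetical two-row stand-in for the final table.
table = pd.DataFrame({'Rank': [1, 2],
                      'Code': ['KOR', 'USA'],
                      'Total Score': [289, 269]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'by_country_ranked_%s.csv' % year)
    # index=False keeps the DataFrame index out of the file.
    table.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.columns.tolist())
```

With `index=False`, the reloaded frame has only the data columns; without it, the old row index would come back as an extra `Unnamed: 0` column.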