## Python Programming Practice Exam

```
Scoring:
============
 0-11: fail (1),
12-14: pass (2),
15-17: satisfactory (3),
18-20: good (4),
21-24: excellent (5).
```

### Problem 1
<p style="text-align: right">(4 points)</p>

In [1]:
words = 'őszbe,csavarodott,a,természet,feje,dérré,vált,a,harmat,hull,a,fák,levele'

The string `words` contains comma-separated words. Write a program that prints the longest word (the one with the most characters). The program should work not only for the given `words` string but also for arbitrary input of the same format.

In [2]:
# solution 1 (old school)
maxlen = -1
for word in words.split(','):
    l = len(word)
    if l > maxlen:
        maxlen = l
        longest = word
longest, maxlen

('csavarodott', 11)

In [3]:
# solution 2 ("Pythonic")
max(words.split(','), key=len)

'csavarodott'
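
Both solutions report only the first longest word when several words share the maximum length: the loop uses a strict `>`, and `max` keeps the first maximal item it meets. The following is a minimal sketch that collects every tied word instead. The function name `longest_words` and the sample string are illustrative assumptions, not part of the exam.

```python
def longest_words(text, sep=','):
    """Return every word of maximal length, in input order."""
    parts = text.split(sep)
    maxlen = max(len(w) for w in parts)
    return [w for w in parts if len(w) == maxlen]

sample = 'alma,körte,szilva,barack'  # hypothetical input with a tie
print(longest_words(sample))             # ['szilva', 'barack']
print(max(sample.split(','), key=len))   # szilva (first of the tied words)
```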

### Problem 2
<p style="text-align: right">(8 points)</p>

The file [points.txt](points.txt) contains the coordinates of points in the plane. Write a program that finds the two points closest to each other and prints them along with their distance. The program should work not only for the given [points.txt](points.txt) but also for arbitrary input of the same format. The Euclidean distance between two points $(x_1, y_1)$ and $(x_2, y_2)$ is

$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$

In [4]:
# solution 1 (vanilla Python)

# load data into a dictionary: name -> (x, y)
points = {}
with open('points.txt') as f:  # the with block closes the file afterwards
    for line in f:
        tok = line.split()
        points[tok[0]] = float(tok[1]), float(tok[2])

# compare every pair
names = sorted(points)
result = []
for i in range(len(points)):
    for j in range(i + 1, len(points)):
        x1, y1 = points[names[i]]
        x2, y2 = points[names[j]]
        dist = ((x1 - x2)**2 + (y1 - y2)**2)**0.5
        result.append((dist, names[i], names[j]))

# get closest pair
min(result)

(0.44721359549995715, 'G', 'K')

In [5]:
# solution 2 (based on pandas and itertools)
import pandas as pd
import itertools
points = pd.read_csv('points.txt', sep=' ', names=['name', 'x', 'y'], index_col='name')
result = []
for (n1, p1), (n2, p2) in itertools.combinations(points.iterrows(), 2):
    dist = ((p1.x - p2.x)**2 + (p1.y - p2.y)**2)**0.5
    result.append((dist, n1, n2))
min(result)

(0.44721359549995715, 'G', 'K')
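
Since [points.txt](points.txt) is not reproduced here, the pairwise search can also be tried on in-memory data. The sketch below assumes the `name x y` layout implied by the solutions above; the sample points and the helper name `closest_pair` are hypothetical. It uses `math.dist`, which requires Python 3.8 or later.

```python
import io
import itertools
import math

# Hypothetical sample in the assumed "name x y" layout
sample = io.StringIO("A 0 0\nB 3 4\nC 1 1\nD 1 2\n")

points = {}
for line in sample:
    tok = line.split()
    if tok:  # skip blank lines
        points[tok[0]] = (float(tok[1]), float(tok[2]))

def closest_pair(points):
    """Return (distance, name1, name2) for the closest pair of points."""
    return min(
        (math.dist(points[a], points[b]), a, b)
        for a, b in itertools.combinations(sorted(points), 2)
    )

dist, n1, n2 = closest_pair(points)
print(n1, n2, dist)  # C D 1.0
```

Like both solutions above, this compares every pair, so the running time grows quadratically with the number of points.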

### Problem 3
<p style="text-align: right">(12 points)</p>

The file [epsom.txt](epsom.txt) contains data about the winners of the Epsom Derby. Write a program that reads the data and answers the following questions:

**(a)** Is it true that no horse won the derby two times?<br>
**(b)** What was the year of the fastest time?<br>
**(c)** Who are the 7 best jockeys based on the number of victories?

In [8]:
import pandas as pd
df = pd.read_csv('epsom.txt', sep='\t', comment='#')
df

| | Year | Winner | Jockey | Trainer | Owner | Dist. | Time |
|---|---|---|---|---|---|---|---|
| 0 | 1780 | Diomed | Sam Arnull | R. Teasdale | Sir Charles Bunbury | | |
| 1 | 1781 | Young Eclipse | Charles Hindley | | Dennis O'Kelly | | |
| 2 | 1782 | Assassin | Sam Arnull | Frank Neale | 3rd Earl of Egremont | | |
| 3 | 1783 | Saltram | Charles Hindley | Frank Neale | John Parker | | |
| 4 | 1784 | Serjeant | John Arnull | | Dennis O'Kelly | | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 236 | 2015 | Golden Horn | Frankie Dettori | John Gosden | Anthony Oppenheimer | 3½ | 2:32.32 |
| 237 | 2016 | Harzand | Pat Smullen | Dermot Weld | Aga Khan IV | 1½ | 2:40.09 |
| 238 | 2017 | Wings of Eagles | Padraig Beggy | Aidan O'Brien | Smith / Magnier / Tabor | ¾ | 2:33.02 |
| 239 | 2018 | Masar | William Buick | Charlie Appleby | Godolphin | 1½ | 2:34.93 |


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 241 entries, 0 to 240
Data columns (total 7 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Year     241 non-null    int64 
 1   Winner   241 non-null    object
 2   Jockey   241 non-null    object
 3   Trainer  239 non-null    object
 4   Owner    241 non-null    object
 5   Dist.    202 non-null    object
 6   Time     175 non-null    object
dtypes: int64(1), object(6)
memory usage: 13.3+ KB


In [11]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Year | 241.0 | 1899.435685 | 69.288625 | 1780.0 | 1840.0 | 1899.0 | 1959.0 | 2019.0 |


In [16]:
# (a) Is it true that no horse won the derby two times?
(df['Winner'].value_counts() == 1).all()

True

In [36]:
# (b) What was the year of the fastest time?
se = df['Time']
df[se == se[se.notnull()].min()]['Year'].values[0]

1945
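
The solution above compares the `Time` values as strings, so it relies on every entry having the same textual layout. A more robust variant converts each time to seconds first. The sketch below assumes the `M:SS.ff` layout seen in the table output; the helper name `to_seconds` and the sample values are hypothetical.

```python
def to_seconds(time_str):
    """Convert a 'minutes:seconds' string such as '2:32.32' to seconds."""
    minutes, seconds = time_str.split(':')
    return int(minutes) * 60 + float(seconds)

# Hypothetical values; None stands in for a missing time
times = {'a': '2:33.27', 'b': None, 'c': '10:01.5', 'd': '2:33.9'}

valid = {key: to_seconds(t) for key, t in times.items() if t is not None}
fastest = min(valid, key=valid.get)
print(fastest)  # a

# Plain string comparison picks the wrong entry, because '1' sorts before '2'
print(min(t for t in times.values() if t is not None))  # 10:01.5
```

With the DataFrame, the same conversion could be applied with `df['Time'].dropna().map(to_seconds)`, provided every non-missing entry follows the assumed layout.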

In [45]:
# (c) Who are the 7 best jockeys based on the number of victories?
# alternative: df.groupby('Jockey').size().sort_values(ascending=False).head(7)
df['Jockey'].value_counts().head(7)

Lester Piggott    9
Jem Robinson      6
Steve Donoghue    6
Bill Clift        5
John Arnull       5
Fred Archer       5
Frank Buckle      5
Name: Jockey, dtype: int64
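
A fixed cutoff at 7 can split a group of jockeys with equal win counts, and which of them appear is then a matter of ordering. The following sketch does the counting in vanilla Python with `collections.Counter` and extends the cutoff to include ties. The helper name `top_n_with_ties` and the sample names are hypothetical.

```python
from collections import Counter

# Hypothetical list of winning jockeys, one entry per race
jockeys = ['Ann', 'Bob', 'Ann', 'Cid', 'Bob', 'Ann', 'Dee']

wins = Counter(jockeys)
print(wins.most_common(2))  # [('Ann', 3), ('Bob', 2)]

def top_n_with_ties(counter, n):
    """Return the n most common items plus any items tied with the n-th."""
    ranked = counter.most_common()
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]  # count of the n-th item
    return [(name, count) for name, count in ranked if count >= cutoff]

print(top_n_with_ties(wins, 3))
# [('Ann', 3), ('Bob', 2), ('Cid', 1), ('Dee', 1)]
```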

In [46]:
df[df['Jockey'] == 'Lester Piggott']

| | Year | Winner | Jockey | Trainer | Owner | Dist. | Time |
|---|---|---|---|---|---|---|---|
| 175 | 1954 | Never Say Die | Lester Piggott | Joseph Lawson | Robert Sterling Clark | 2 | 2:35.8 |
| 178 | 1957 | Crepello | Lester Piggott | Noel Murless | Sir Victor Sassoon | 1½ | 2:35.4 |
| 181 | 1960 | St. Paddy | Lester Piggott | Noel Murless | Sir Victor Sassoon | 3 | 2:35.8 |
| 189 | 1968 | Sir Ivor | Lester Piggott | Vincent O'Brien | Raymond R. Guest | 1½ | 2:38.73 |
| 191 | 1970 | Nijinsky | Lester Piggott | Vincent O'Brien | Charles W. Engelhard, Jr. | 2½ | 2:34.68 |
| 193 | 1972 | Roberto | Lester Piggott | Vincent O'Brien | John W. Galbreath | shd | 2:36.09 |
| 197 | 1976 | Empery | Lester Piggott | Maurice Zilber | Nelson Bunker Hunt | 3 | 2:35.69 |
| 198 | 1977 | The Minstrel | Lester Piggott | Vincent O'Brien | Robert Sangster | nk | 2:36.44 |
| 204 | 1983 | Teenoso | Lester Piggott | Geoff Wragg | Eric Moller | 3 | 2:49.07 |
