# Libraries

In [1]:
import pandas as pd
import seaborn as sns
sns.set_style('darkgrid')
import matplotlib.pyplot as plt
%matplotlib inline

# Data

In [1]:
# Convert csv to DataFrame
data19=pd.read_csv('../input/popularity-of-cambridge-university-colleges/Cam_college_stats/Cam_stats_2019.csv')

# Print number of rows and number of columns in dataset
print(data19.shape)

# Have a look at the first 5 entries
data19.head()

In [1]:
# Remove the last row (index label 31)
data19.drop([31], axis=0, inplace=True)
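
Dropping a hard-coded label with `inplace=True` raises a `KeyError` if the cell is run a second time, because label 31 is already gone. Below is a minimal sketch of a re-runnable variant. It uses a small made-up frame in place of `data19`; the values and the `EXTRA` row are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for data19; all values here are invented for illustration
toy = pd.DataFrame({'Code': ['A', 'B', 'C', 'EXTRA'],
                    '1st choice': [100, 80, 60, 240]})

last_label = 3  # label of the unwanted last row (31 in the notebook above)

# errors='ignore' turns the drop into a no-op when the label is already gone
toy = toy.drop(index=[last_label], errors='ignore')
toy = toy.drop(index=[last_label], errors='ignore')  # safe to repeat

print(toy.shape)
```

Reassigning the result rather than using `inplace=True` is a stylistic choice here; either form accepts `errors='ignore'`.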

# Analysis

First, we'll plot how many times each college was put down as an applicant's first choice.

In [1]:
# Figure size
plt.figure(figsize=(12,5))

# Barchart
sns.barplot(x='Code', y='1st choice', data=data19, 
            order=data19.sort_values('1st choice',ascending=False).Code)

# Title
plt.title('Most popular Cambridge colleges for graduate students in 2019')

* From this we see **Trinity, King's and St John's** are the three colleges with the most people selecting them as their first preference. 
* In contrast, **Murray Edwards, Robinson and Girton** get the lowest number of 1st preferences.

We can also look at how *difficult* each college is to get into by computing an admission rate: 'No of applicants admitted to their 1st choice' divided by the number of 1st preferences. A lower rate means a more competitive college.

In [1]:
# Admission rate = admitted to 1st choice / number of 1st-choice applications
data19['admission rate'] = (data19['No of applicants admitted to their 1st choice']
                            / data19['1st choice'])

# Figure size
plt.figure(figsize=(12,5))

# Barplot
sns.barplot(x='Code', y='admission rate', data=data19,
           order=data19.sort_values('admission rate',ascending=False).Code)

# Title
plt.title('Admission rate for Cambridge colleges in 2019')

* From this we see that **Homerton, Sidney Sussex and Wolfson** were the colleges most likely to accept students who put them down as their first choice (in 2019).
* In contrast, **Trinity and King's** were the hardest colleges to get into (in 2019). This is not surprising, as they are also the two most popular colleges.
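
The ranking behind these bullets can also be read off as a table instead of a plot. The sketch below works through the admission-rate arithmetic on three invented colleges (`A`, `B`, `C`). The numbers are made up for illustration; only the column names match the dataset.

```python
import pandas as pd

# Invented numbers; only the column names mirror the real dataset
toy = pd.DataFrame({
    'Code': ['A', 'B', 'C'],
    '1st choice': [200, 100, 50],
    'No of applicants admitted to their 1st choice': [40, 50, 30],
})

# Same formula as above: admitted to 1st choice / 1st-choice applications
toy['admission rate'] = (toy['No of applicants admitted to their 1st choice']
                         / toy['1st choice'])

# Highest admission rate (easiest to get into) first
ranked = toy.sort_values('admission rate', ascending=False)[['Code', 'admission rate']]
print(ranked)
```

In this toy frame, college `A` has the most first-choice applications and the lowest admission rate (40 / 200 = 0.2), which is the same pattern as described for Trinity and King's.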

Finally, we'll look at how 'big' each college is, i.e. how many students it admitted in 2019. We'll keep the same ordering as in the first plot to see whether popularity is correlated with college size.

In [1]:
# Figure size
plt.figure(figsize=(12,5))

# Barplot
sns.barplot(x='Code', y='Total admitted', data=data19,
           order=data19.sort_values('1st choice',ascending=False).Code)

# Title
plt.title('Number of postgrad students admitted to Cambridge colleges in 2019')
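
The plot compares popularity and size by eye. One way to put a number on that comparison is a correlation coefficient. The sketch below uses invented values, not the real data, and computes both the Pearson correlation and a rank-based (Spearman-style) correlation, the latter obtained by correlating the ranks.

```python
import pandas as pd

# Invented numbers; only the column names mirror the real dataset
toy = pd.DataFrame({
    '1st choice': [300, 200, 150, 100, 50],
    'Total admitted': [90, 100, 60, 70, 40],
})

# Pearson correlation (the default for Series.corr): linear association
pearson = toy['1st choice'].corr(toy['Total admitted'])

# Rank correlation: Pearson correlation of the ranks, which is Spearman's rho
spearman = toy['1st choice'].rank().corr(toy['Total admitted'].rank())

print(f'Pearson: {pearson:.2f}, rank-based: {spearman:.2f}')
```

Applied to the notebook's data, the equivalent call would be `data19['1st choice'].corr(data19['Total admitted'])`. A value near 1 would support the idea that more popular colleges also admit more students, while a value near 0 would suggest little relationship.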