# Another allotment problem using Pandas.

## Problem at hand.

Suppose a national-level examination is to be held. The test centres are scattered across the country, and each requires a set of invigilators.
The volunteers are asked to submit a set of choices for the cities they are fine going to.
The task is to assign volunteers to test centres such that each centre gets the invigilators it needs and the volunteers are also happy with their allotment.

### Scenario

One such all-India exam is GATE. Volunteers at IIT Bombay were asked to give a set of 10 choices of cities they are ready to go to for invigilation. GATE is held in 2 slots, i.e. over 2 weekends, and volunteers can also state which weekend they are ready to go for invigilation.

Constraints:
- Each volunteer can be allotted to only one invigilator post.
- A centre can only be allotted a fixed number of volunteers.
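
To make the two constraints concrete, here is a minimal greedy sketch. It is not the method used for the actual GATE allotment. The toy data, the `Name` column, the `capacity` dictionary and the `greedy_allot` helper are all hypothetical; only the `GATE_City1`, `GATE_City2` naming pattern follows the data used later. Weekend slots are ignored to keep it short.

```python
import pandas as pd

# Hypothetical toy data: two ranked city choices per volunteer.
volunteers = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "GATE_City1": ["Goa", "Goa", "Mumbai", "Goa"],
    "GATE_City2": ["Mumbai", "Pune", "Goa", "Pune"],
})

# Hypothetical capacities: number of invigilators each centre needs.
capacity = {"Goa": 1, "Mumbai": 1, "Pune": 2}


def greedy_allot(volunteers, capacity, choice_cols):
    """Allot each volunteer at most one city, never exceeding a centre's capacity."""
    remaining = dict(capacity)   # seats still open at each centre
    allotment = {}               # volunteer name -> allotted city
    for col in choice_cols:      # first choices are served before second choices
        for _, row in volunteers.iterrows():
            name, city = row["Name"], row[col]
            if name in allotment:
                continue         # constraint 1: one post per volunteer
            if remaining.get(city, 0) > 0:
                allotment[name] = city
                remaining[city] -= 1   # constraint 2: fixed seats per centre
    return allotment, remaining


allotment, remaining = greedy_allot(volunteers, capacity, ["GATE_City1", "GATE_City2"])
print(allotment)
print(remaining)
```

A greedy pass like this favours volunteers who appear earlier in the file, so it is only a starting point; a fairer scheme could shuffle the rows first or treat the task as a matching problem.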

### Data available

We have 2 files available:
- The first contains a list of volunteers along with their choices for the centres they are fine going to.
- The second file has a list of centres and the number of invigilators required for each of them.
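
The second file is not read anywhere in this notebook, so its layout is unknown. As a sketch only, assuming it is a CSV with two hypothetical columns named `Centre` and `Required`, it could be loaded and turned into a lookup like this (an in-memory string stands in for the real file):

```python
import io

import pandas as pd

# Hypothetical stand-in for the centres file; the real column names may differ.
centres_csv = io.StringIO(
    "Centre,Required\n"
    "Goa,2\n"
    "Mumbai,5\n"
    "Hyderabad,3\n"
)

centresDF = pd.read_csv(centres_csv)

# Total number of invigilators needed across all centres.
totalRequired = centresDF["Required"].sum()

# Dictionary lookup: centre name -> invigilators required.
requiredByCentre = centresDF.set_index("Centre")["Required"].to_dict()

print(totalRequired)
print(requiredByCentre)
```

Comparing `totalRequired` with the number of volunteers is a quick check on whether a full allotment is possible at all.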

### Let's Play!!!
Let's read in the data available.
Pandas makes it very easy to read CSV files into an object called a `DataFrame`.


In [None]:
# Import Pandas
import pandas as pd
# Reading in the volunteer data and taking a peek.
volunteerDF = pd.read_csv('GATEprefs.csv')
volunteerDF.head()

In [None]:
# Now we have the data as a DataFrame, which can be easily indexed to retrieve data.
# Let's pull out the row with index label 3 (the 4th volunteer, since the default index starts at 0).
volunteerDF.loc[3]

In [None]:
# Counting the number of volunteers in the dataset.
volunteerCount = len(volunteerDF.index)
volunteerCount

In [None]:
# Extracting the set of people ready to go on the first weekend.
volunteerDF[volunteerDF['GATE_1st Weekend'] == "yes"].head()

In [None]:
# Let's see how many people picked Goa as their first choice. :P
len(volunteerDF[volunteerDF['GATE_City1'] == "Goa"])

In [None]:
# Rendering plots inline (pandas' .plot() draws with Matplotlib) and importing NumPy.
%matplotlib inline
import numpy as np

In [None]:
# Distribution of volunteers ready to go on 1st weekend
volunteerDF['GATE_1st Weekend'].value_counts().plot(kind='pie')

In [None]:
# Distribution of volunteers ready to go on 2nd weekend
volunteerDF['GATE_2nd Weekend'].value_counts().plot(kind='pie')

In [None]:
# Quick stats on the first choice of volunteers.
volunteerDF.GATE_City1.describe()

In [None]:
# Bar chart of first choices, which gives a rough measure of each city's popularity.
volunteerDF.GATE_City1.value_counts().plot(figsize=(20,5), kind='bar')

In [None]:
# The same bar chart for second choices.
volunteerDF.GATE_City2.value_counts().plot(figsize=(20,5), kind='bar')

In [None]:
# Making a dataframe of volunteers opting for Mumbai as their first choice.
mumbaiVolunteersDF = volunteerDF[volunteerDF['GATE_City1'] == "Mumbai"]

In [None]:
# Viewing the new dataframe.
mumbaiVolunteersDF.tail()

In [None]:
# Plotting the popularity of second-choice cities among volunteers who chose Mumbai first.
mumbaiVolunteersDF.GATE_City2.value_counts().plot(figsize=(10,5), kind='bar', alpha=0.5)

In [None]:
# Making a dataframe of volunteers opting for Hyderabad as their first choice.
hyderabadVolunteersDF = volunteerDF[volunteerDF['GATE_City1'] == "Hyderabad"]

In [None]:
# Viewing the new dataframe.
hyderabadVolunteersDF.tail()

In [None]:
# Counting the Hyderabad-first volunteers, grouped by their second-choice city.
hyderabadVolunteersDF.groupby('GATE_City2').count()
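
A note on the cell above: `groupby(...).count()` reports the number of non-null values in every remaining column, so the same group can show different counts in different columns. If only the group sizes are wanted, `size()` gives one number per group. A small sketch with hypothetical toy data (the `Name` column is an assumption):

```python
import pandas as pd

# Hypothetical toy data with one missing name.
toyDF = pd.DataFrame({
    "Name": ["A", "B", "C", None],
    "GATE_City2": ["Goa", "Pune", "Goa", "Goa"],
})

# count(): non-null values per column within each group.
perColumn = toyDF.groupby("GATE_City2").count()

# size(): number of rows in each group, missing values included.
perGroup = toyDF.groupby("GATE_City2").size()

print(perColumn)
print(perGroup)
```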

In [None]:
# Plotting the popularity of second-choice cities among volunteers who chose Hyderabad first.
hyderabadVolunteersDF.GATE_City2.value_counts().plot(figsize=(10,5), kind='bar', alpha=0.5)

In [None]:
# Saving the Mumbai-first volunteers to a CSV file.
mumbaiVolunteersDF.to_csv('mumbaiVolunteersPrefs.csv')

## Conclusion
* Large datasets can be loaded, filtered and summarised in a few lines.
* Pandas gives us a very simple and intuitive interface to deal with tabular data.
* Pandas operations are vectorised on top of NumPy, so they are typically much faster than looping over lists or other default Python language constructs.
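
To illustrate the last point, the sketch below counts first-choice "Goa" entries twice, once with a plain Python loop and once with a vectorised comparison. The toy Series is hypothetical, and no timings are claimed here; on data this small the difference would not be noticeable.

```python
import pandas as pd

# Hypothetical toy column of first choices.
cities = pd.Series(["Goa", "Mumbai", "Goa", "Hyderabad", "Goa"], name="GATE_City1")

# Plain Python: visit every element one by one.
loopCount = 0
for city in cities:
    if city == "Goa":
        loopCount += 1

# Vectorised: compare the whole column at once, then sum the boolean mask.
vectorCount = int((cities == "Goa").sum())

print(loopCount, vectorCount)
```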

# THANK YOU!