https://www.dataquest.io/mission/113/challenge-summarizing-data/

## 2: College Majors And Employment

The American Community Survey is a survey run by the US Census Bureau that collects data on everything from the affordability of housing to employment rates for different industries. For this challenge, you'll be using data derived from the American Community Survey for the years 2010-2012. The team at FiveThirtyEight has cleaned the dataset and made it available on their [GitHub repo](https://github.com/fivethirtyeight/data/tree/master/college-majors).

Here's a quick overview of the files we'll be working with:

[all-ages.csv](https://github.com/fivethirtyeight/data/blob/master/college-majors/all-ages.csv) - employment data by major for all ages <br />
[recent-grads.csv](https://github.com/fivethirtyeight/data/blob/master/college-majors/recent-grads.csv) - employment data by major for recent college graduates only <br />

In [3]:
%sh
# download source file
wget https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/all-ages.csv
wget https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv
ls -l

In [4]:
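import pandas as pd

# Employment data by major for all ages
all_ages = pd.read_csv("all-ages.csv")
print(all_ages.columns)
print(all_ages.head(3))

# Employment data by major for recent college graduates only
recent_grads = pd.read_csv("recent-grads.csv")
print(recent_grads.columns)
print(recent_grads.head(3))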

## 3: Summarizing Major Categories
  
In both of these datasets, majors are grouped into categories. There are multiple rows with a common value for `Major_category` but different values for `Major`. We would like to know the total number of people in each `Major_category` for both datasets.

In [6]:
def calculate_major_cat_totals(df):
  # Map each major category to the summed "Total" of the majors in it
  counts_dictionary = {}
  for cat in df["Major_category"].unique():
    counts_dictionary[cat] = df.loc[df["Major_category"] == cat, "Total"].sum()
  return counts_dictionary

all_ages_major_categories = calculate_major_cat_totals(all_ages)
recent_grads_major_categories = calculate_major_cat_totals(recent_grads)

print(all_ages_major_categories)
print(recent_grads_major_categories)
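
The per-category loop above can also be expressed as a single `groupby`. The following is a minimal sketch, not part of the original challenge solution; it uses a small made-up DataFrame (`toy`) in place of the real CSV files so that it runs on its own, and it assumes Python 3.

```python
import pandas as pd

# Hypothetical stand-in for all-ages.csv; the values are invented for illustration.
toy = pd.DataFrame({
    "Major": ["A", "B", "C", "D"],
    "Major_category": ["Engineering", "Engineering", "Arts", "Arts"],
    "Total": [100, 50, 30, 20],
})

# Sum "Total" within each category, then turn the resulting Series into a dict
# keyed by category name.
toy_totals = toy.groupby("Major_category")["Total"].sum().to_dict()
print(toy_totals)
```

Applied to `all_ages` or `recent_grads`, the same expression should give the same category-to-total mapping as the loop, with pandas doing the row selection once instead of once per category.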

## 4: Low Wage Jobs Rates

The [press likes to talk a lot](http://bit.ly/1fNLmaT) about how many college grads are unable to get higher-wage, skilled jobs and end up working lower-wage, unskilled jobs instead. As a data person, it is your job to be skeptical of any broad claims and analyze relevant data to obtain a more nuanced view. Let's run some basic calculations to explore that idea further.

In [8]:
# Share of all recent grads in low-wage jobs (a proportion between 0 and 1, not a percentage)
low_wage_percent = recent_grads["Low_wage_jobs"].astype(float).sum() / recent_grads["Total"].sum()

print(low_wage_percent)
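
The calculation above pools all graduates before dividing, which is not the same as averaging each major's own low-wage rate. Here is a small sketch of the difference, using a made-up two-row DataFrame (`toy`) rather than the real data, and assuming Python 3.

```python
import pandas as pd

# Hypothetical data: one large major with few low-wage jobs, one small major with many.
toy = pd.DataFrame({
    "Major": ["A", "B"],
    "Total": [1000, 100],
    "Low_wage_jobs": [50, 50],
})

# Pooled share: every graduate counts equally (100 of 1100 people).
overall_share = toy["Low_wage_jobs"].astype(float).sum() / toy["Total"].sum()

# Mean of per-major shares: every major counts equally (0.05 and 0.5).
mean_of_shares = (toy["Low_wage_jobs"].astype(float) / toy["Total"]).mean()

print(overall_share)
print(mean_of_shares)
```

The pooled share answers "what fraction of graduates hold low-wage jobs?", while the mean of shares answers "what is the typical low-wage rate across majors?". Small majors with unusual rates can pull the second figure well away from the first.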

## 5: Comparing Datasets

Both `all_ages` and `recent_grads` datasets have 173 rows, corresponding to the 173 college major codes. This enables us to do some comparisons between the two datasets and perform some initial calculations to see how similar or different the statistics of recent college graduates are from those of the entire population.

In [10]:
# All majors, common to both DataFrames
majors = recent_grads['Major'].unique()
recent_grads_lower_unemp_count = 0
all_ages_lower_unemp_count = 0

for major in majors:
  recent_unemp = recent_grads["Unemployment_rate"][recent_grads["Major"] == major].values[0]
  all_unemp = all_ages["Unemployment_rate"][all_ages["Major"] == major].values[0]
  # A lower unemployment rate means that group fares better; ties count for neither
  if recent_unemp < all_unemp:
    recent_grads_lower_unemp_count += 1
  elif recent_unemp > all_unemp:
    all_ages_lower_unemp_count += 1

print("Recent grads fare better: {}".format(recent_grads_lower_unemp_count))
print("All ages fare better: {}".format(all_ages_lower_unemp_count))
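
As an alternative to looking up each major in a loop, the two tables can be joined on `Major` and compared column-wise. This is a sketch under stated assumptions rather than the challenge's expected solution: `toy_recent` and `toy_all` are made-up stand-ins for the real DataFrames, the suffixes are an arbitrary naming choice, and the code assumes Python 3.

```python
import pandas as pd

# Hypothetical stand-ins for recent_grads and all_ages; the rates are invented.
toy_recent = pd.DataFrame({
    "Major": ["A", "B", "C"],
    "Unemployment_rate": [0.05, 0.08, 0.06],
})
toy_all = pd.DataFrame({
    "Major": ["A", "B", "C"],
    "Unemployment_rate": [0.06, 0.07, 0.06],
})

# Inner join on Major; the suffixes disambiguate the two Unemployment_rate columns.
merged = toy_recent.merge(toy_all, on="Major", suffixes=("_recent", "_all"))

# Boolean comparisons sum to counts; ties are counted for neither group.
recent_lower = int((merged["Unemployment_rate_recent"] < merged["Unemployment_rate_all"]).sum())
all_lower = int((merged["Unemployment_rate_recent"] > merged["Unemployment_rate_all"]).sum())

print(recent_lower)
print(all_lower)
```

Because an inner join keeps only majors present in both tables, comparing `len(merged)` with the row count of each input is a quick way to confirm that no major was dropped.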