For this analysis, I will investigate what kinds of schools generally provide a better return on investment in different regions.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed.
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# Load the packages used in this analysis.

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder

# Input data files are available in the "../input/" directory.
# Read the two salary tables: one lists each school's type, the other its region.
college_type = pd.read_csv('../input/salaries-by-college-type.csv')
college_region = pd.read_csv('../input/salaries-by-region.csv')

The first place to start with any data analysis is to look at the data to see what there is to work with.

In [None]:
college_type.head(5)

In [None]:
college_region.head(5)

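The two input files, "salaries-by-college-type.csv" and "salaries-by-region.csv", contain very similar information, so let's combine them into one data frame with `pd.merge`, using 'School Name' as the key. Only the 'School Name' and 'Region' columns are taken from the region file, and the inner join keeps only schools that appear in both files.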

In [None]:
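# Combine the college data: add each school's Region to the school-type table.
# An inner join keeps only schools whose 'School Name' appears in both files.
cols = ['School Name', 'Region']
college_combined = pd.merge(left=college_type, right=college_region[cols], how='inner', on='School Name')
college_combined.head()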
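An inner join silently drops any school whose name is missing, or spelled differently, in one of the two files. A minimal sketch for checking what was dropped, assuming only that both tables have a 'School Name' column; the helper `report_unmatched` and the toy frames below are hypothetical and not part of the original notebook.

```python
import pandas as pd

def report_unmatched(left: pd.DataFrame, right: pd.DataFrame, key: str = "School Name") -> pd.DataFrame:
    """Return the key values that an inner join on `key` would drop."""
    # indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'
    audit = pd.merge(left[[key]], right[[key]], how="outer", on=key, indicator=True)
    return audit[audit["_merge"] != "both"]

# Hypothetical toy frames standing in for college_type and college_region
toy_type = pd.DataFrame({"School Name": ["School A", "School B"], "School Type": ["State", "Party"]})
toy_region = pd.DataFrame({"School Name": ["School A", "School C"], "Region": ["Western", "Southern"]})

unmatched = report_unmatched(toy_type, toy_region)
print(unmatched)
```

On the real data, `report_unmatched(college_type, college_region)` would list the schools lost in the merge. Passing `validate='many_to_one'` to `pd.merge` is another safeguard: it raises a `MergeError` if a school name appears more than once in the region table.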

Next, let's look at how starting and mid-career salaries vary across colleges by region and school type.

Thanks to [this kernel](https://www.kaggle.com/cdelany7/exploration-of-college-salaries-by-major) for showing how to convert the salary strings into numeric values.

In [None]:
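# The salary columns are read as strings containing "$" and ","; strip them and convert to numbers.
dollar_cols = ['Starting Median Salary', 'Mid-Career Median Salary']

for x in dollar_cols:
    # regex=False treats "$" as a literal character, not a regex end-of-string anchor
    college_combined[x] = college_combined[x].str.replace("$", "", regex=False)
    college_combined[x] = college_combined[x].str.replace(",", "", regex=False)
    college_combined[x] = pd.to_numeric(college_combined[x])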
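Because `$` is an end-of-string anchor in regular expressions, and the default of `regex` in `Series.str.replace` only became `False` in pandas 2.0, it is safest to state the intent explicitly. A hedged alternative to the loop above, assuming the salary strings use `$` and `,` only as decoration: the hypothetical helper `dollars_to_numeric` strips both characters with one character-class regex and coerces anything unparseable to `NaN`.

```python
import pandas as pd

def dollars_to_numeric(s: pd.Series) -> pd.Series:
    """Convert dollar strings such as '$46,000.00' to floats.

    Assumes '$' and ',' are the only non-numeric decoration; any other
    unparseable value becomes NaN.
    """
    # Inside a character class, '$' is a literal character rather than an anchor
    cleaned = s.astype(str).str.replace(r"[$,]", "", regex=True)
    # errors="coerce" turns unparseable entries (e.g. 'N/A') into NaN instead of raising
    return pd.to_numeric(cleaned, errors="coerce")

print(dollars_to_numeric(pd.Series(["$46,000.00", "$1,234", "N/A"])).tolist())
# → [46000.0, 1234.0, nan]
```

Inside the loop it would be used as `college_combined[x] = dollars_to_numeric(college_combined[x])`. Note that `errors='coerce'` trades a loud failure for silent `NaN`s, so it is worth checking `college_combined[x].isna().sum()` afterwards.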


In [None]:
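# Mean starting median salary for each Region x School Type combination
# (pivot_table averages by default; blank cells mean no school of that type in the region)
pivotinfo = pd.pivot_table(college_combined, index=['Region'], columns=['School Type'], values=['Starting Median Salary'])
# fmt='.0f' prints whole dollars instead of seaborn's default '.2g' (e.g. 5.8e+04)
sns.heatmap(pivotinfo, annot=True, fmt='.0f')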

Engineering schools look like a good investment in every region, particularly in California.

Overall, the differences among liberal arts, party, and state schools are smaller. Party schools slightly beat out state schools in every region, and liberal arts schools look least attractive in the Midwestern and Western regions.
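A claim such as "party schools slightly beat out state schools in every region" can also be checked directly from the pivot table instead of being read off the heatmap. A minimal sketch, assuming the 'Region' and 'School Type' column names used above; the helpers `region_type_means` and `beats_in_every_region` are hypothetical. Since `pd.pivot_table` aggregates with the mean, each cell is a mean of per-school medians.

```python
import pandas as pd

def region_type_means(df: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Region x School Type table of the mean of `value_col`."""
    # values is a single column name (not a list), so the columns are a flat
    # Index of school types rather than a MultiIndex
    return pd.pivot_table(df, index="Region", columns="School Type",
                          values=value_col, aggfunc="mean")

def beats_in_every_region(pivot: pd.DataFrame, a: str, b: str) -> bool:
    """True if type `a` has a strictly higher mean than type `b` in every
    region that has both types (vacuously True if no region has both)."""
    both = pivot[[a, b]].dropna()  # drop regions missing either school type
    return bool((both[a] > both[b]).all())

# On the notebook's data this would be called as:
# beats_in_every_region(region_type_means(college_combined, "Starting Median Salary"), "Party", "State")
```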

It would be interesting to combine this insight with cost-of-living data to see how much "value" each region really offers once cost of living is factored in.
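One simple way to do this would be to divide each region's salaries by a cost-of-living index. This sketch assumes a hypothetical Region-to-index mapping scaled so that 100 is average; no such data is included in this notebook, so the index values would have to come from an external source. The helper `cost_adjusted` is likewise hypothetical.

```python
import pandas as pd

def cost_adjusted(pivot: pd.DataFrame, cost_index: dict) -> pd.DataFrame:
    """Divide each region's row of salaries by a cost-of-living index.

    `cost_index` is a hypothetical {Region: index} mapping where 100 means
    average cost of living; it is not part of the salary data.
    """
    index = pd.Series(cost_index, dtype=float)
    # Align the index with the pivot's Region rows; unknown regions become NaN
    factor = index.reindex(pivot.index) / 100.0
    # axis=0 divides row-wise, so every school type in a region shares one factor
    return pivot.div(factor, axis=0)
```

Because the division is row-wise, it works on either heatmap's `pivotinfo` table, whatever its column layout.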

In [None]:
# Mean mid-career median salary for each Region x School Type combination
pivotinfo = pd.pivot_table(college_combined, index=['Region'], columns=['School Type'], values=['Mid-Career Median Salary'])
sns.heatmap(pivotinfo, annot=True, fmt='.0f')

Interestingly, mid-career salaries show much larger differences between school types. Going to a state school results in the lowest mid-career salary in almost every region.
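One way to put a number on "much larger differences" is the range (highest minus lowest) of the school-type means within each region, computed for both salary columns. This is a hedged sketch: the helper `spread_by_region` is hypothetical, and the range is only one possible measure of spread.

```python
import pandas as pd

SALARY_COLS = ["Starting Median Salary", "Mid-Career Median Salary"]

def spread_by_region(df: pd.DataFrame) -> pd.DataFrame:
    """Per region, the gap between the highest and lowest school-type mean salary."""
    spreads = {}
    for col in SALARY_COLS:
        pivot = pd.pivot_table(df, index="Region", columns="School Type",
                               values=col, aggfunc="mean")
        # max/min skip NaN, so school types absent from a region are ignored
        spreads[col] = pivot.max(axis=1) - pivot.min(axis=1)
    return pd.DataFrame(spreads)

# On the notebook's data: spread_by_region(college_combined)
```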

The earning potential of an Ivy League education also becomes apparent later in a career. It would be interesting to see more data on why each group is successful (e.g., better networking, school name recognition).
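To see which school types gain the most between the start and middle of a career, one could compare the relative growth of mean salary per school type. A minimal sketch with a hypothetical helper `growth_by_type`, assuming the salary columns have already been converted to numbers as above.

```python
import pandas as pd

def growth_by_type(df: pd.DataFrame) -> pd.Series:
    """Relative rise from starting to mid-career median salary, per school type."""
    means = df.groupby("School Type")[["Starting Median Salary",
                                       "Mid-Career Median Salary"]].mean()
    # Ratio of the group means minus 1: 0.5 means mid-career pay is 50% above starting pay
    growth = means["Mid-Career Median Salary"] / means["Starting Median Salary"] - 1
    return growth.sort_values(ascending=False)

# On the notebook's data: growth_by_type(college_combined)
```

This uses the ratio of group means; averaging each school's own growth ratio instead would weight every school equally and could give a slightly different ranking.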

# Conclusion
Engineering schools generally lead to the highest salaries in every region, both early and later in a career. Liberal arts and party schools also lead to strong salaries, though the advantage may not appear until later in a career. Schools in California generally lead to higher salaries than schools in other regions.
