**Import libraries**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns

**Loading dataset**

In [None]:
college_raw = pd.read_csv("../input/forbes-americas-top-colleges-2019/ForbesAmericasTopColleges2019.csv")

## **1. Understanding the data**

In [None]:
college_raw.shape

There are 650 colleges with 17 different attributes (before cleaning).

In [None]:
college_raw.head()

In [None]:
college_raw.describe(include = 'all')

In [None]:
college_raw.nunique()

## **2. Cleaning the data**

Next, we remove the columns we will not focus on and check for null values.

In [None]:
college_raw.isnull().sum()

The columns that are not relevant to this analysis are City, State, Website, and the SAT and ACT score ranges. I will also drop Undergraduate Population and focus on the total Student Population instead.

In [None]:
college_raw2 = college_raw.drop(
    columns=['City', 'State', 'SAT Lower', 'SAT Upper', 'ACT Lower', 'ACT Upper',
             'Website', 'Undergraduate Population'])

In [None]:
college_raw2.describe(include = 'all')

In [None]:
college_raw2.isnull().sum()

Now we drop the rows with null values so that the count is the same across all columns.

In [None]:
college = college_raw2.dropna(axis = 0)
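Because `dropna(axis=0)` discards a row if *any* of its columns is null, the order of the two cleaning steps matters: dropping the unused columns first means a college is not lost just because, for example, its SAT scores are missing. The small illustration below uses a made-up toy frame (not the Forbes data) to compare both orderings.

```python
import numpy as np
import pandas as pd

# Made-up toy data; "SAT Lower" stands in for a column we plan to drop anyway
toy = pd.DataFrame({
    "Rank": [1, 2, 3, 4],
    "Alumni Salary": [150000, np.nan, 120000, 110000],
    "SAT Lower": [1500, 1450, np.nan, np.nan],
})

# Dropping null rows first also loses rows that are only missing the unused SAT column
drop_rows_first = toy.dropna(axis=0).drop(columns=["SAT Lower"])

# Dropping the unused column first keeps those rows
drop_cols_first = toy.drop(columns=["SAT Lower"]).dropna(axis=0)

print(len(drop_rows_first), len(drop_cols_first))  # 1 3
```

An equivalent alternative is to call `dropna(subset=[...])` with only the columns you intend to keep.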

In [None]:
college.describe(include = 'all')

In [None]:
college.shape

We are now left with the columns we will analyze: 628 rows of data and 9 columns.

## **3. Analysis**

### **3.1** What is the relationship between Rank and Student Population?

In [None]:
sns.lmplot(x="Rank", y="Student Population", hue="Public/Private",
             data=college)

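Some information we can gather from this graph (Rank 1 is the top college, so a larger Rank number means a lower-ranked college):
* Student Population tends to decrease as the Rank number increases, i.e. higher-ranked colleges tend to have larger student populations
* Overall, public colleges have larger student populations than private colleges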
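The regression lines give a visual impression; one way to put a number on the trend for each school type is Spearman's rank correlation between Rank and Student Population. The sketch below computes it as the Pearson correlation of the ranked values, which avoids a SciPy dependency. `rank_spearman_by_type` is a hypothetical helper, and the toy values are made up purely to show the mechanics; on the notebook's data it would be called as `rank_spearman_by_type(college, "Student Population")`.

```python
import pandas as pd

def rank_spearman_by_type(df, column, group="Public/Private"):
    """Spearman correlation between Rank and `column` within each group.

    Spearman's rho is the Pearson correlation of the ranks, so both columns
    are ranked first (tied values receive their average rank).
    """
    result = {}
    for name, sub in df.groupby(group):
        ranked = sub[["Rank", column]].rank()
        result[name] = ranked["Rank"].corr(ranked[column])  # Pearson on ranks
    return pd.Series(result, name=f"spearman(Rank, {column})")

# Made-up toy data, only to show the mechanics
toy = pd.DataFrame({
    "Rank": [1, 5, 9, 2, 3, 7],
    "Student Population": [40000, 30000, 20000, 8000, 9000, 2000],
    "Public/Private": ["Public", "Public", "Public", "Private", "Private", "Private"],
})
print(rank_spearman_by_type(toy, "Student Population"))
```

A negative value means the population tends to shrink as the Rank number grows (i.e. for lower-ranked colleges); a value near 0 means no monotonic trend within that group.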

### **3.2** How does a college's Rank and Acceptance Rate relate?

In [None]:
sns.lineplot(x="Rank", y="Acceptance Rate",
             hue="Public/Private", 
             data=college)

This shows that lower-ranked colleges (larger Rank numbers) tend to have higher acceptance rates, and that this pattern differs between private and public colleges.
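A line plot over 600+ individual ranks is noisy. A minimal sketch of a smoother comparison is to group colleges into Rank bands and take the median Acceptance Rate per band and school type. `acceptance_by_rank_band` and the band width of 100 are illustrative choices, not part of the original analysis, and the toy acceptance rates are made up (the real column's scale, fraction or percent, is not assumed here).

```python
import pandas as pd

def acceptance_by_rank_band(df, band_width=100):
    """Median Acceptance Rate per Rank band, split by Public/Private."""
    # Band edges 0, band_width, 2*band_width, ... extended far enough to cover the largest Rank
    edges = list(range(0, int(df["Rank"].max()) + band_width, band_width))
    banded = df.assign(rank_band=pd.cut(df["Rank"], bins=edges))  # intervals like (0, 100]
    return banded.pivot_table(
        index="rank_band",
        columns="Public/Private",
        values="Acceptance Rate",
        aggfunc="median",
        observed=True,  # keep only bands that actually contain colleges
    )

# Made-up toy data
toy = pd.DataFrame({
    "Rank": [10, 50, 150, 160, 20, 170],
    "Public/Private": ["Public", "Public", "Public", "Private", "Private", "Private"],
    "Acceptance Rate": [0.5, 0.7, 0.8, 0.6, 0.1, 0.4],
})
print(acceptance_by_rank_band(toy))
```

Medians are used instead of means so that a few outlier colleges in a band do not dominate the comparison.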

### **3.3** What relationship can we find between Rank and Cost?

In [None]:
sns.lmplot(y='Total Annual Cost', x='Rank', hue="Public/Private", data=college)

Overall, higher-ranked colleges tend to have a higher Total Annual Cost, and cost decreases as we move right along the x-axis (that is, as colleges become lower-ranked).
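By default, `lmplot` draws a straight-line (first-order) least-squares fit for each hue level. The slopes of those lines can be computed directly with `np.polyfit`, which gives a single number per school type: the change in cost per one-position increase in Rank. `ols_slope_by_type` is a hypothetical helper and the toy costs are made up; on the notebook's data it would be called as `ols_slope_by_type(college, "Total Annual Cost")`, and the slopes should match the lines in the plot.

```python
import numpy as np
import pandas as pd

def ols_slope_by_type(df, y, x="Rank", group="Public/Private"):
    """Least-squares slope of y on x within each group (change in y per +1 in x)."""
    return pd.Series({
        # np.polyfit with deg=1 returns [slope, intercept]
        name: np.polyfit(sub[x], sub[y], deg=1)[0]
        for name, sub in df.groupby(group)
    })

# Made-up toy data
toy = pd.DataFrame({
    "Rank": [1, 3, 5, 2, 4, 6],
    "Total Annual Cost": [30000, 28000, 26000, 70000, 65000, 60000],
    "Public/Private": ["Public"] * 3 + ["Private"] * 3,
})
print(ols_slope_by_type(toy, "Total Annual Cost"))
```

A negative slope means cost falls as colleges become lower-ranked; comparing the two slopes shows in which group that decline is steeper.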

### **3.4** How do alumni salaries and college rank relate?

In [None]:
sns.lmplot(y='Alumni Salary', x='Rank', hue="Public/Private", data=college)

We can draw two conclusions from the scatterplot above:
1. Higher-ranked schools tend to produce higher-earning alumni.
2. Private schools tend to report higher alumni salaries than public schools in the US.
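To check the comparison between school types numerically, one option is to compute, for each type, the share of colleges whose Alumni Salary is above the overall median. The median threshold and the helper name `share_above_overall_median` are illustrative assumptions, and the toy salaries are made up; on the notebook's data it would be called as `share_above_overall_median(college)`.

```python
import pandas as pd

def share_above_overall_median(df, column="Alumni Salary", group="Public/Private"):
    """Fraction of colleges in each group whose `column` exceeds the overall median."""
    threshold = df[column].median()
    above = df[column] > threshold          # boolean Series, one entry per college
    return above.groupby(df[group]).mean()  # mean of booleans = share of True

# Made-up toy data
toy = pd.DataFrame({
    "Alumni Salary": [90, 100, 110, 120, 130, 80],
    "Public/Private": ["Public"] * 3 + ["Private"] * 3,
})
print(share_above_overall_median(toy))
```

Using a share rather than a count keeps the comparison fair if one school type has many more colleges in the dataset than the other.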

### **3.5** How do Average Grant Aid and Rank relate?

In [None]:
sns.lmplot(x='Rank', y='Average Grant Aid', hue="Public/Private", data=college)

From the scatterplot, higher-ranked colleges tend to provide more grant aid, with private colleges leading.

## **4. Conclusion**

We've analyzed the relationships between several attributes in this dataset. In particular, we looked at how a college's rank relates to its student population, acceptance rate, alumni earning potential, cost, and student aid.
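As a compact summary of these relationships, one could compute the Spearman rank correlation between Rank and every numeric column in one table. This is a minimal sketch assuming the attributes are stored with numeric dtypes after cleaning (a column read as text would be silently excluded by `select_dtypes`); `spearman_with_rank` is a hypothetical helper and the toy values are made up. On the notebook's data it would be called as `spearman_with_rank(college)`.

```python
import pandas as pd

def spearman_with_rank(df, rank_col="Rank"):
    """Spearman correlation of every numeric column with `rank_col`."""
    numeric = df.select_dtypes(include="number")  # drops text columns such as Name
    # DataFrame.corr supports method="spearman" directly
    return numeric.corr(method="spearman")[rank_col].drop(rank_col)

# Made-up toy data
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Rank": [1, 2, 3, 4],
    "Alumni Salary": [150, 140, 120, 100],
    "Acceptance Rate": [0.05, 0.10, 0.30, 0.60],
})
print(spearman_with_rank(toy))
```

Because Rank 1 is the top college, a negative value means the attribute tends to be higher at higher-ranked colleges, and a positive value means it tends to be higher at lower-ranked ones.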