# America's Top College Rankings 2019 (Forbes): Analysis Using SQL

## About the dataset

Every year since 2008, Forbes magazine has published a list of America's best colleges. Schools are ranked on alumni salary (20%), student satisfaction (20%), debt (20%), American leaders (15%), on-time graduation rate (12.5%), and academic success (12.5%).
___

## About this notebook

This notebook contains a brief analysis and visualization of the dataset. Most of the analysis compares <code>Public vs Private</code> institutions.

### Tasks performed in this notebook

1. Importing the CSV file into a SQL database
2. Running SQL queries to analyze the data
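
The two steps above can be sketched end to end on a toy table. This is only an illustration under stated assumptions: the rows are made up (they are not values from the Forbes dataset), and the names `sample`, `college_demo`, `demo_conn`, and `demo_result` are hypothetical. An in-memory database is used so that nothing is written to disk.

```python
import sqlite3

import pandas as pd

# Made-up rows for illustration only; not taken from the Forbes dataset.
sample = pd.DataFrame(
    {
        "Name": ["College A", "College B", "College C"],
        "Public/Private": ["Private", "Public", "Private"],
        "Total Annual Cost": [60000, 30000, 50000],
    }
)

# ":memory:" creates a temporary database that disappears on close.
demo_conn = sqlite3.connect(":memory:")

# Step 1: load the DataFrame into a SQL table.
sample.to_sql("college_demo", demo_conn, index=False)

# Step 2: analyze the table with a SQL query.
demo_result = pd.read_sql(
    'select "Public/Private" as college_type, '
    'avg("Total Annual Cost") as avg_cost '
    'from college_demo group by "Public/Private" order by college_type;',
    demo_conn,
)
demo_conn.close()
print(demo_result)
```

The same load-then-query pattern is what the cells in this notebook apply to the full CSV file.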

### Packages/libraries used

| S.No | Package/library | Use |
| --- | --- | --- |
| 1. | Pandas | Data manipulation and analysis |
| 2. | SQLite | Lightweight SQL database |
| 3. | Plotly | Graphs and charts |

In [None]:
import pandas as pd
import sqlite3
import plotly.express as px
import os

In [None]:
# Reading the CSV file

data = pd.read_csv("../input/forbes-americas-top-colleges-2019/ForbesAmericasTopColleges2019.csv")

# Establishing the SQL connection

conn = sqlite3.connect("college.db")

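# Writing the DataFrame to the database as the "college" table
# if_exists="replace" lets the cell be re-run without a "table already exists" error

data.to_sql("college", conn, if_exists="replace")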

In [None]:
pd.read_sql('select * from college;', conn)
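
Several column names in this table contain spaces or a slash (for example `Public/Private`), which is why the queries in this notebook wrap them in double quotes. One way to check the exact column names before writing a query is SQLite's `PRAGMA table_info`. The sketch below runs it on a hypothetical two-column table (`schema_demo`, with made-up values) rather than on the real dataset.

```python
import sqlite3

import pandas as pd

# Hypothetical two-column table with made-up values, used only to show the idea.
demo = pd.DataFrame({"Public/Private": ["Public"], "Alumni Salary": [100000]})
schema_conn = sqlite3.connect(":memory:")
demo.to_sql("schema_demo", schema_conn, index=False)

# PRAGMA table_info returns one row per column:
# cid, name, type, notnull, dflt_value, pk.
schema = pd.read_sql("PRAGMA table_info(schema_demo);", schema_conn)
column_names = schema["name"].tolist()
schema_conn.close()
print(column_names)
```

In the notebook itself, the equivalent call would presumably be `pd.read_sql("PRAGMA table_info(college);", conn)`.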

In [None]:
# Private vs Public

pr_pu = pd.read_sql('select "Public/Private" as "College Type", count(*) as "Colleges in list" from college group by "College Type";', conn)
fig_1 = px.pie(pr_pu, values="Colleges in list", names="College Type", title="Private vs Public Colleges")
fig_1.update_traces(textposition="inside", textinfo="label+percent")
fig_1.show()

In [None]:
# Average cost of studying in Public vs Private

cost = pd.read_sql('select "Public/Private" as "College Type", avg("Total Annual Cost") as "Average Annual Cost" from college group by "College Type";', conn)
px.bar(cost, x="College Type", y="Average Annual Cost", title="Average cost of study: Public vs Private", color="Average Annual Cost")

In [None]:
# States with maximum number of colleges

state = pd.read_sql('select State, count(*) as "Number of colleges" from college group by "State" order by "Number of colleges" desc;', conn)
px.bar(state, x="State", y="Number of colleges", title="Number of colleges from each state", color="Number of colleges")

In [None]:
# Average acceptance rate Public Vs Private

acc = pd.read_sql('select "Public/Private" as "College type", avg("Acceptance Rate") as "Average acceptance rate" from college group by "College type";', conn)
fig_2 = px.pie(acc, values="Average acceptance rate", names="College type", title="Average acceptance rate Public Vs Private")
fig_2.update_traces(textposition="inside", textinfo="label+percent")
fig_2.show()

In [None]:
# Average alumni salary Public Vs Private

alu_sal = pd.read_sql('select "Public/Private" as "College type", avg("Alumni Salary") as "Average Alumni Salary" from college group by "College type";', conn)
px.bar(alu_sal, y="Average Alumni Salary", x="College type", title="Average alumni salary Public Vs Private", color="Average Alumni Salary")


In [None]:
# Student population Public Vs Private

stu_pop = pd.read_sql('select "Public/Private" as "College type", avg("Student Population") as "Average Student Population" from college group by "College type";', conn)
px.bar(stu_pop, x="College type", y="Average Student Population", title="Average Student Population Public Vs Private", color="Average Student Population")

In [None]:
# Minimum average SAT score required for admission Public vs Private

sat = pd.read_sql('select "Public/Private" as "College type", avg("SAT Lower") as "Minimum average SAT score required for admission" from college group by "College type";', conn)
px.bar(sat, x="College type", y="Minimum average SAT score required for admission", color="Minimum average SAT score required for admission", title="Minimum average SAT score required for admission Public Vs Private")

In [None]:
# Average grant aid Public vs Private

aid = pd.read_sql('select "Public/Private" as "College type", avg("Average Grant Aid") as "Average Grant Aid" from college group by "College type";', conn)
px.bar(aid, x="College type", y="Average Grant Aid", color="Average Grant Aid", title="Average Grant Aid Public Vs Private")
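
The per-type averages in the cells above are each computed by a separate query. As an alternative, they could be gathered in a single `group by` query, which gives one summary table to inspect before plotting. The sketch below is an assumption-laden illustration: the table `summary_demo` and its values are made up, and only two of the notebook's columns are included to keep it short.

```python
import sqlite3

import pandas as pd

# Made-up values for illustration only; not taken from the Forbes dataset.
rows = pd.DataFrame(
    {
        "Public/Private": ["Private", "Private", "Public"],
        "Alumni Salary": [120000, 100000, 90000],
        "Average Grant Aid": [40000, 30000, 10000],
    }
)
summary_conn = sqlite3.connect(":memory:")
rows.to_sql("summary_demo", summary_conn, index=False)

# One query returns the count and several averages per college type.
summary = pd.read_sql(
    """
    select "Public/Private" as "College type",
           count(*) as "Colleges in list",
           avg("Alumni Salary") as "Average Alumni Salary",
           avg("Average Grant Aid") as "Average Grant Aid"
    from summary_demo
    group by "Public/Private"
    order by "Public/Private";
    """,
    summary_conn,
)
summary_conn.close()
print(summary)
```

Each column of `summary` can then be passed to `px.bar` in the same way as the individual query results.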

In [None]:
# Student population vs college rank (concentration of students)

college = pd.read_sql('select * from college;', conn)

px.scatter(college, x="Rank", y="Student Population", color="Public/Private", size="Student Population", title="Student Population vs College ranking")

In [None]:
# Closing the connection

conn.close()