#**Guide to Picking a College Major**

#**Importing Necessary Python Modules**

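Python supports a variety of open-source add-ins called **modules** that extend what the language can do. The modules imported below are used throughout this notebook to load, summarize, and plot the data.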

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
from IPython.display import Image
import warnings
warnings.simplefilter('ignore', FutureWarning)

#**Context**

This assignment uses the data behind the FiveThirtyEight story *The Economic Guide To Picking A College Major*. The data comes from the American Community Survey 2010-2012 Public Use Microdata Series. With this dataset, you can explore college majors and their graduates and build stories of your own!

In [2]:
# Replace 'image_url' with the URL of the image you want to display
image_url = 'https://media.istockphoto.com/id/1470208665/photo/multi-ethnic-group-of-latin-and-african-american-college-students-smiling-diversity-portrait.jpg?s=2048x2048&w=is&k=20&c=zicp2F74iFTRKjJUwFBgs_Mb_Xd5vvkvdmYSVoekb1I='

# Display the image
Image(url=image_url)

#**About the Dataset**

This dataset contains 173 rows, one per college major, each summarizing a random sample of people with at least some college. The variables provided are listed below:

**Variables**

| Column name                 | Description                                                |
|-----------------------------|------------------------------------------------------------|
| Index                       | The row number assigned to each major (Integer)            |
| Major_code                  | The code associated with the major (Integer)               |
| Major                       | The specific major of the field of study (String)          |
| Major_category              | The category of the major (String)                         |
| Grad_total                  | The total number of graduates from the major (Integer)     |
| Grad_sample_size            | The sample size of graduates from the major (Integer)      |
| Grad_employed               | The number of graduates employed (Integer)                 |
| Grad_full_time_year_round   | The number of graduates employed full-time year-round (Integer) |
| Grad_unemployed             | The number of graduates unemployed (Integer)               |
| Grad_unemployment_rate      | The unemployment rate of graduates (Float)                 |
| Grad_median                 | The median salary of graduates (Integer)                   |
| Grad_P25                    | The 25th percentile salary of graduates (Integer)          |
| Grad_P75                    | The 75th percentile salary of graduates (Integer)          |
| Nongrad_total               | The total number of non-graduates from the major (Integer) |
| Nongrad_employed            | The number of non-graduates employed (Integer)             |
| Nongrad_full_time_year_round| The number of non-graduates employed full-time year-round (Integer) |
| Nongrad_unemployed          | The number of non-graduates unemployed (Integer)           |
| Nongrad_unemployment_rate   | The unemployment rate of non-graduates (Float)             |
| Nongrad_median              | The median salary of non-graduates (Integer)               |
| Nongrad_P25                 | The 25th percentile salary of non-graduates (Integer)      |
| Nongrad_P75                 | The 75th percentile salary of non-graduates (Integer)      |
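| Grad_share                  | The share of graduates in the major: `Grad_total / (Grad_total + Nongrad_total)` (Float) |
| Grad_premium                | The relative difference between the median salaries of graduates and non-graduates: `(Grad_median - Nongrad_median) / Nongrad_median` (Float) |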
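The last two columns are derived from other columns, so their definitions can be checked with a little arithmetic. The sketch below is a minimal check, assuming the formulas `Grad_total / (Grad_total + Nongrad_total)` and `(Grad_median - Nongrad_median) / Nongrad_median`; the three rows in `sample` are copied from the notebook's displayed output and are not the full dataset.

```python
import pandas as pd

# Three rows copied from the displayed DataFrame output (not the full dataset).
sample = pd.DataFrame({
    "Major": ["CONSTRUCTION SERVICES",
              "COMMERCIAL ART AND GRAPHIC DESIGN",
              "HOSPITALITY MANAGEMENT"],
    "Grad_total": [9173, 53864, 24417],
    "Nongrad_total": [86062, 461977, 179335],
    "Grad_median": [75000.0, 60000.0, 65000.0],
    "Nongrad_median": [65000.0, 48000.0, 50000.0],
    "Grad_share": [0.096320, 0.104420, 0.119837],
    "Grad_premium": [0.153846, 0.250000, 0.300000],
})

# Assumed formula: graduates as a fraction of everyone holding the major.
share = sample["Grad_total"] / (sample["Grad_total"] + sample["Nongrad_total"])

# Assumed formula: gap between the medians, relative to the non-graduate median.
premium = (sample["Grad_median"] - sample["Nongrad_median"]) / sample["Nongrad_median"]

# Put the recomputed values next to the published ones for comparison.
check = pd.DataFrame({
    "Grad_share (file)": sample["Grad_share"],
    "Grad_share (recomputed)": share.round(6),
    "Grad_premium (file)": sample["Grad_premium"],
    "Grad_premium (recomputed)": premium.round(6),
})
print(check)
```

With the full dataset loaded, the same two expressions can be applied to `df` in place of `sample`.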


*Attribution:  FiveThirtyEight.com*

We can view a snippet of the data by first importing it directly from the URL below.

**Data**

In [3]:
file_path = "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/grad-students.csv"
df = pd.read_csv(file_path)


Next, we can display the data by typing the name of the DataFrame. To ensure we can see all columns, we'll first call the `pd.set_option` function.

In [4]:
# Set display options to show all columns
pd.set_option('display.max_columns', None)
df

Unnamed: 0,Major_code,Major,Major_category,Grad_total,Grad_sample_size,Grad_employed,Grad_full_time_year_round,Grad_unemployed,Grad_unemployment_rate,Grad_median,Grad_P25,Grad_P75,Nongrad_total,Nongrad_employed,Nongrad_full_time_year_round,Nongrad_unemployed,Nongrad_unemployment_rate,Nongrad_median,Nongrad_P25,Nongrad_P75,Grad_share,Grad_premium
0,5601,CONSTRUCTION SERVICES,Industrial Arts & Consumer Services,9173,200,7098,6511,681,0.087543,75000.0,53000,110000.0,86062,73607,62435,3928,0.050661,65000.0,47000,98000.0,0.096320,0.153846
1,6004,COMMERCIAL ART AND GRAPHIC DESIGN,Arts,53864,882,40492,29553,2482,0.057756,60000.0,40000,89000.0,461977,347166,250596,25484,0.068386,48000.0,34000,71000.0,0.104420,0.250000
2,6211,HOSPITALITY MANAGEMENT,Business,24417,437,18368,14784,1465,0.073867,65000.0,45000,100000.0,179335,145597,113579,7409,0.048423,50000.0,35000,75000.0,0.119837,0.300000
3,2201,COSMETOLOGY SERVICES AND CULINARY ARTS,Industrial Arts & Consumer Services,5411,72,3590,2701,316,0.080901,47000.0,24500,85000.0,37575,29738,23249,1661,0.052900,41600.0,29000,60000.0,0.125878,0.129808
4,2001,COMMUNICATION TECHNOLOGIES,Computers & Mathematics,9109,171,7512,5622,466,0.058411,57000.0,40600,83700.0,53819,43163,34231,3389,0.072800,52000.0,36000,78000.0,0.144753,0.096154
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
168,5203,COUNSELING PSYCHOLOGY,Psychology & Social Work,51812,724,38468,28808,1420,0.035600,50000.0,36000,65000.0,16781,12377,8502,835,0.063200,40000.0,25000,50000.0,0.755354,0.250000
169,5202,CLINICAL PSYCHOLOGY,Psychology & Social Work,22716,355,16612,12022,782,0.044958,70000.0,47000,95000.0,6519,4368,3033,357,0.075556,46000.0,30000,70000.0,0.777014,0.521739
170,6106,HEALTH AND MEDICAL PREPARATORY PROGRAMS,Health,114971,1766,78132,58825,1732,0.021687,135000.0,70000,294000.0,26320,16221,12185,1012,0.058725,51000.0,35000,87000.0,0.813718,1.647059
171,2303,SCHOOL STUDENT COUNSELING,Education,19841,260,11313,8130,613,0.051400,56000.0,42000,70000.0,2232,1328,980,169,0.112892,42000.0,27000,51000.0,0.898881,0.333333


#**ASSIGNMENT 1 - Descriptive Statistics: Graphical and Numerical Summary**

**INSTRUCTIONS**

Use Python to analyze the data set and complete each of the following. As appropriate, copy the Python output and paste it in the correct part below. For problems that require a written response, type the answer below.

##**QUESTION 1**

Determine whether the three variables below are qualitative or quantitative. If they are quantitative, specify whether they are continuous or discrete.

| Variable         | Qual or Quant | Dis, Con, or Neither |
|------------------|---------------|----------------------|
| **Major**              | Qual or Quant  | Dis, Con, or Neither              |
| **Grad_unemployment_rate**           | Qual or Quant   | Dis, Con, or Neither              |
| **Grad_median**           | Qual or Quant  | Dis, Con, or Neither           |
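As a starting point for this question, it can help to look at how pandas stores each variable. The sketch below is only a hint, not an answer key: it uses a three-row `sample` copied from the displayed output, and the labels it prints come from the dtype alone.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Three rows copied from the displayed DataFrame output (not the full dataset).
sample = pd.DataFrame({
    "Major": ["CONSTRUCTION SERVICES",
              "COMMERCIAL ART AND GRAPHIC DESIGN",
              "HOSPITALITY MANAGEMENT"],
    "Grad_unemployment_rate": [0.087543, 0.057756, 0.073867],
    "Grad_median": [75000.0, 60000.0, 65000.0],
})

# Label each column from its dtype: numeric columns are candidates for
# "quantitative", everything else is treated as "qualitative".
results = {}
for column in ["Major", "Grad_unemployment_rate", "Grad_median"]:
    numeric = is_numeric_dtype(sample[column])
    results[column] = "quantitative" if numeric else "qualitative"
    print(f"{column}: dtype={sample[column].dtype}, looks {results[column]}")
```

A numeric dtype is not proof that a variable is quantitative (`Major_code` is stored as an integer but acts as a label), and the dtype says nothing about discrete versus continuous, so the final classification still needs your own judgment.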


##**QUESTION 2**

Construct a frequency table, relative frequency table, and relative frequency bar chart to describe the distribution of Major_category. State any fact that jumps out to you.

In [5]:
#Frequency table
freq_table = df['Major_category'].value_counts()
freq_table

Engineering                            29
Education                              16
Humanities & Liberal Arts              15
Biology & Life Science                 14
Business                               13
Health                                 12
Computers & Mathematics                11
Agriculture & Natural Resources        10
Physical Sciences                      10
Social Science                          9
Psychology & Social Work                9
Arts                                    8
Industrial Arts & Consumer Services     7
Law & Public Policy                     5
Communications & Journalism             4
Interdisciplinary                       1
Name: Major_category, dtype: int64

In [6]:
#Relative frequency table
freq_table/len(df)

Engineering                            0.167630
Education                              0.092486
Humanities & Liberal Arts              0.086705
Biology & Life Science                 0.080925
Business                               0.075145
Health                                 0.069364
Computers & Mathematics                0.063584
Agriculture & Natural Resources        0.057803
Physical Sciences                      0.057803
Social Science                         0.052023
Psychology & Social Work               0.052023
Arts                                   0.046243
Industrial Arts & Consumer Services    0.040462
Law & Public Policy                    0.028902
Communications & Journalism            0.023121
Interdisciplinary                      0.005780
Name: Major_category, dtype: float64
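Dividing the counts by `len(df)` works, and pandas can also compute the proportions directly with `value_counts(normalize=True)`. The sketch below shows both tables side by side; the nine category values are taken from the rows visible in the displayed output, so the numbers differ from the full-data tables above.

```python
import pandas as pd

# Major_category values of the nine rows visible in the displayed output.
categories = pd.Series([
    "Industrial Arts & Consumer Services", "Arts", "Business",
    "Industrial Arts & Consumer Services", "Computers & Mathematics",
    "Psychology & Social Work", "Psychology & Social Work",
    "Health", "Education",
], name="Major_category")

# Counts per category, and the same counts expressed as proportions.
freq = categories.value_counts()
rel_freq = categories.value_counts(normalize=True)

# Combine both into one table; pandas aligns the two Series on category name.
table = pd.DataFrame({"Frequency": freq, "Relative frequency": rel_freq})
print(table)
```

On the full dataset, `df['Major_category'].value_counts(normalize=True)` should reproduce the relative frequency table shown above.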

In [12]:
# Convert the frequency table to a DataFrame before displaying it
dfrf = pd.DataFrame(freq_table)
dfrf

Unnamed: 0,Major_category
Engineering,29
Education,16
Humanities & Liberal Arts,15
Biology & Life Science,14
Business,13
Health,12
Computers & Mathematics,11
Agriculture & Natural Resources,10
Physical Sciences,10
Social Science,9


In [17]:
dfrf = pd.DataFrame(freq_table)
# Select the count column by position: pandas names it 'Major_category'
# or 'count' depending on the installed version
fig = px.bar(x=dfrf.index, y=dfrf.iloc[:, 0],
             title='Frequency Distribution Bar Chart')
fig.show()

ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

In [18]:
!pip install --upgrade nbformat



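Question 2 asks for a *relative* frequency bar chart, while the chart above plots raw counts. One possible way to build the relative version is sketched below. The counts are copied from the frequency table output above; `relative_frequency_chart` and the column labels are hypothetical names chosen for this example.

```python
import pandas as pd

# Category counts copied from the frequency table output in this notebook.
counts = pd.Series({
    "Engineering": 29, "Education": 16, "Humanities & Liberal Arts": 15,
    "Biology & Life Science": 14, "Business": 13, "Health": 12,
    "Computers & Mathematics": 11, "Agriculture & Natural Resources": 10,
    "Physical Sciences": 10, "Social Science": 9,
    "Psychology & Social Work": 9, "Arts": 8,
    "Industrial Arts & Consumer Services": 7, "Law & Public Policy": 5,
    "Communications & Journalism": 4, "Interdisciplinary": 1,
}, name="Frequency")

# Convert counts to proportions and reshape into a two-column DataFrame.
rel_freq = (counts / counts.sum()).rename("Relative frequency")
rel_df = rel_freq.rename_axis("Major category").reset_index()


def relative_frequency_chart(data):
    """Build (but do not show) a Plotly bar chart of relative frequencies."""
    # Imported here so the table above can be built without Plotly installed.
    import plotly.express as px
    return px.bar(data, x="Major category", y="Relative frequency",
                  title="Relative Frequency of Major Category")


print(rel_df)
```

In the notebook, `relative_frequency_chart(rel_df).show()` would display the chart, provided Plotly and `nbformat` are installed.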

## **QUESTIONS 3-6**

For questions 3-6, find the variable assigned to your last name in the table below and use it in every answer.

| Last Name | Variable                  |
|-----------|---------------------------|
| A-F       | Grad_unemployment_rate    |
| G-M       | Grad_median               |
| N-S       | Nongrad_unemployment_rate |
| T-Z       | Nongrad_median            |


###**QUESTION 3**

Construct a histogram for your variable. Use Number of Intervals = 12.

In [None]:
# Replace 'Grad_unemployment_rate' with the variable assigned to your last name
fig = px.histogram(x=df['Grad_unemployment_rate'], nbins=12)
fig.show()
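The plotting library may not draw exactly the number of bins you request, so it can be useful to compute the 12 intervals yourself and compare them with the chart. The sketch below uses `numpy.histogram` with 12 equal-width bins; the nine rates are taken from the rows visible in the displayed output, so they stand in for the full column.

```python
import numpy as np

# Grad_unemployment_rate values of the nine rows visible in the displayed output.
rates = np.array([0.087543, 0.057756, 0.073867, 0.080901, 0.058411,
                  0.035600, 0.044958, 0.021687, 0.051400])

# Split the range [min, max] into exactly 12 equal-width intervals.
counts, edges = np.histogram(rates, bins=12)

# Print each interval with its count. Every interval is half-open
# except the last one, which also includes its right edge.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.4f} to {right:.4f}: {count}")
```

For the full data, pass `df['Grad_unemployment_rate']` (or your assigned variable) in place of `rates`.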

###**QUESTION 4**

Construct a boxplot for your variable.  

In [None]:
px.box(x=df['Grad_unemployment_rate'])

###**QUESTION 5**

Calculate the following summary statistics for your variable: minimum, maximum, mean, median, standard deviation, Q1, and Q3. Paste the output below.
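One way to collect these seven statistics in a single output is sketched below. It runs on nine rates taken from the rows visible in the displayed output, so the numbers are illustrative only; the labels in `summary` are names chosen for this example.

```python
import pandas as pd

# Grad_unemployment_rate values of the nine rows visible in the displayed output.
rates = pd.Series([0.087543, 0.057756, 0.073867, 0.080901, 0.058411,
                   0.035600, 0.044958, 0.021687, 0.051400],
                  name="Grad_unemployment_rate")

# Gather the requested statistics in one labeled Series.
summary = pd.Series({
    "minimum": rates.min(),
    "Q1": rates.quantile(0.25),
    "median": rates.median(),
    "mean": rates.mean(),
    "Q3": rates.quantile(0.75),
    "maximum": rates.max(),
    "standard deviation": rates.std(),  # sample standard deviation (ddof=1)
})
print(summary)
```

`rates.describe()` reports the same quantities under the labels `min`, `25%`, `50%`, `mean`, `75%`, `max`, and `std`. Note that pandas interpolates linearly between data points when computing quartiles, so Q1 and Q3 can differ slightly from values found with a textbook hand method.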

###**QUESTION 6**

Use information from questions #3, #4, and #5 to describe your variable in terms of shape, center, spread, and outliers. Interpret your findings.
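To support the discussion of spread and outliers with numbers, the common 1.5 × IQR rule can be applied directly. The sketch below is one way to do that, again on the nine rates visible in the displayed output; the comparison of mean and median is only a rough hint about skew, not a formal test.

```python
import pandas as pd

# Grad_unemployment_rate values of the nine rows visible in the displayed output.
rates = pd.Series([0.087543, 0.057756, 0.073867, 0.080901, 0.058411,
                   0.035600, 0.044958, 0.021687, 0.051400],
                  name="Grad_unemployment_rate")

# Quartiles and interquartile range (spread of the middle 50%).
q1, q3 = rates.quantile([0.25, 0.75])
iqr = q3 - q1

# Fences from the 1.5 * IQR rule; values outside them are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = rates[(rates < lower_fence) | (rates > upper_fence)]

print(f"IQR: {iqr:.6f}")
print(f"Fences: {lower_fence:.6f} to {upper_fence:.6f}")
print(f"Number of flagged outliers: {len(outliers)}")

# A mean well above the median suggests right skew; well below, left skew.
print(f"Mean minus median: {rates.mean() - rates.median():.6f}")
```

Compare the flagged values with the points drawn beyond the whiskers of your boxplot from Question 4 before describing them as outliers.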

##**QUESTION 7**

Calculate and state the unemployment rate for graduates, the median salary for graduates, the unemployment rate for non-graduates, and the median salary for non-graduates for your major or intended major. Compare the results.
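These four values can be read straight from the row for your major. The sketch below shows one way to look them up; `my_major` is a hypothetical choice, and the two rows in `sample` are copied from the displayed output in place of the full dataset.

```python
import pandas as pd

# Two rows copied from the displayed DataFrame output (not the full dataset).
sample = pd.DataFrame({
    "Major": ["COUNSELING PSYCHOLOGY", "CLINICAL PSYCHOLOGY"],
    "Grad_unemployment_rate": [0.035600, 0.044958],
    "Grad_median": [50000.0, 70000.0],
    "Nongrad_unemployment_rate": [0.063200, 0.075556],
    "Nongrad_median": [40000.0, 46000.0],
})

# Hypothetical major; names in the file are upper case, so normalize the query.
my_major = "Clinical Psychology"
row = sample.loc[sample["Major"] == my_major.upper()].iloc[0]

# Differences between graduates and non-graduates for this major.
median_gap = row["Grad_median"] - row["Nongrad_median"]
rate_gap = row["Grad_unemployment_rate"] - row["Nongrad_unemployment_rate"]

print(row)
print(f"Median salary gap (grad minus non-grad): {median_gap:,.0f}")
print(f"Unemployment rate gap (grad minus non-grad): {rate_gap:.4f}")
```

With the full dataset, replace `sample` with `df`. If the lookup returns no rows, check the exact spelling of the major with `df['Major'].unique()`.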

##**QUESTION 8**

Generate a paragraph of at least 100 words to address one of the following questions:

### **QUESTION 8a**

Discuss how analyzing your chosen data set using statistical methods could help you become better prepared for future courses in your major.

### **QUESTION 8b**

Discuss how analyzing your chosen data set using statistical methods could help you become better prepared for your future career.