<h1>Salary of Different Majors in 2020</h1>

<img src="https://images.unsplash.com/photo-1607013407627-6ee814329547?ixlib=rb-1.2.1&ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&auto=format&fit=crop&w=1144&q=80" width="300" height="400">

University is very expensive. Some students choose a major for the pay, while others pursue careers to make a difference. Does the major you pick pay you back? In 2020, PayScale conducted a survey to find out the pay of different majors and whether alumni feel their work makes the world a better place.


Description of the columns

Major: Name of the major

Degree: The type of degree awarded for the major

Early-Career Pay: Median salary of alumni with 0-5 years of experience

Mid-Career Pay: Median salary of alumni with 10+ years of experience

% High Meaning: Percentage of alumni who say their work makes the world a better place

In [1]:
import pandas as pd #import pandas library
import matplotlib.pyplot as plt #import matplotlib library
import seaborn as sns #import seaborn library
import plotly.express as px #import plotly library


In [2]:
df = pd.read_csv('major_data.csv') #Save the csv file to variable

In [3]:
df.head() #Examine the first 5 elements

Unnamed: 0,Major,Degree,Early-Career Pay,Mid-Career Pay,% High Meaning
0,Petroleum Engineering,Bachelors,"$92,300","$182,000%",69%
1,Electrical Engineering & Computer Science (EECS),Bachelors,"$101,200","$152,300%",46%
2,Applied Economics and Management,Bachelors,"$60,900","$139,600%",67%
3,Operations Research,Bachelors,"$78,400","$139,600%",52%
4,Public Accounting,Bachelors,"$60,000","$138,800%",49%


In [4]:
print(f"There are {df.shape[0]} rows and {df.shape[1]} columns")

There are 834 rows and 5 columns


In [5]:
df.columns

Index(['Major', 'Degree', 'Early-Career Pay', 'Mid-Career Pay',
       '% High Meaning'],
      dtype='object')

Cleaning Data

In [6]:
df.isna().any() #Check for any missing data

Major               False
Degree              False
Early-Career Pay    False
Mid-Career Pay      False
% High Meaning      False
dtype: bool

In [7]:
df.duplicated().any() #Check for any duplicated values

False

In [8]:
df.info() #Check if the data type is correct

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 834 entries, 0 to 833
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Major             834 non-null    object
 1   Degree            834 non-null    object
 2   Early-Career Pay  834 non-null    object
 3   Mid-Career Pay    834 non-null    object
 4   % High Meaning    834 non-null    object
dtypes: object(5)
memory usage: 32.7+ KB


In [9]:
#Early-Career Pay, Mid-Career Pay, and % High Meaning need to be converted to numeric
#regex=False makes '$' a literal character instead of a regex end-of-string anchor
df['Early-Career Pay'] = pd.to_numeric(df['Early-Career Pay'].astype(str).str.replace('$','', regex=False).str.replace(',','', regex=False))
df['Mid-Career Pay'] = pd.to_numeric(df['Mid-Career Pay'].astype(str).str.replace('$','', regex=False).str.replace(',','', regex=False).str.replace('%','', regex=False))
#Entries recorded as '-' are recoded as 0 before the conversion
df['% High Meaning'] = pd.to_numeric(df['% High Meaning'].astype(str).str.replace('%','', regex=False).str.replace('-','0', regex=False))/100
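
The conversion above turns '-' entries into 0, which makes them indistinguishable from a real 0% rating. As an alternative, here is a minimal sketch that keeps such entries as NaN instead. The helper name `to_number` is hypothetical, and the '-' row is an illustrative placeholder rather than a specific row from the file; the other values are copied from the table above.

```python
import pandas as pd

def to_number(series: pd.Series) -> pd.Series:
    """Strip '$', ',' and '%', then convert to float.

    Anything that still cannot be parsed (such as '-') becomes NaN.
    """
    cleaned = series.astype(str).str.replace(r"[$,%]", "", regex=True).str.strip()
    return pd.to_numeric(cleaned, errors="coerce")

# Values copied from the raw table above, plus one illustrative '-' entry
raw = pd.DataFrame({
    "Mid-Career Pay": ["$182,000%", "$152,300%", "$139,600%"],
    "% High Meaning": ["69%", "46%", "-"],
})

mid_pay = to_number(raw["Mid-Career Pay"])
meaning = to_number(raw["% High Meaning"]) / 100

print(mid_pay.tolist())
print(meaning.isna().tolist())
```

With NaN in place of 0, later rankings and averages skip the unreported majors by default instead of counting them as the lowest values.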


In [10]:
df

Unnamed: 0,Major,Degree,Early-Career Pay,Mid-Career Pay,% High Meaning
0,Petroleum Engineering,Bachelors,92300,182000,0.69
1,Electrical Engineering & Computer Science (EECS),Bachelors,101200,152300,0.46
2,Applied Economics and Management,Bachelors,60900,139600,0.67
3,Operations Research,Bachelors,78400,139600,0.52
4,Public Accounting,Bachelors,60000,138800,0.49
...,...,...,...,...,...
829,Early Childhood Education,Bachelors,34100,43300,0.78
830,Mental Health,Bachelors,35200,42500,0.00
831,Medical Assisting,Bachelors,35100,42300,0.00
832,Addictions Counseling,Bachelors,38800,42200,0.00


<h1>Which jobs have the highest pay in their early-career?</h1>
    
The majors that pay the most straight out of university are mainly engineering majors. These majors tend to focus on either natural resources or research in medicine.

In [25]:
top_10_early = df[['Major','Early-Career Pay']].sort_values('Early-Career Pay', ascending=False).head(10)
fig = px.bar(top_10_early, x=top_10_early['Major'], y=top_10_early['Early-Career Pay'], 
             title='Top 10 Early Career Pay' , color=top_10_early['Major'])
fig.update_layout(xaxis_title='Major', 
                          yaxis_title='Pay')
fig.show()
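
The same sort-and-slice pattern is repeated for every ranking in this notebook. One possible way to avoid the repetition is a small helper; the sketch below assumes a hypothetical function name `rank_majors` and uses only the five rows shown in `df.head()` above.

```python
import pandas as pd

def rank_majors(frame, column, n=10, ascending=False):
    """Return the n majors with the highest (or lowest) values in `column`."""
    return frame[["Major", column]].sort_values(column, ascending=ascending).head(n)

# The five rows shown by df.head() after cleaning
sample = pd.DataFrame({
    "Major": [
        "Petroleum Engineering",
        "Electrical Engineering & Computer Science (EECS)",
        "Applied Economics and Management",
        "Operations Research",
        "Public Accounting",
    ],
    "Early-Career Pay": [92300, 101200, 60900, 78400, 60000],
    "Mid-Career Pay": [182000, 152300, 139600, 139600, 138800],
})

top_early = rank_majors(sample, "Early-Career Pay", n=3)
bottom_mid = rank_majors(sample, "Mid-Career Pay", n=1, ascending=True)
print(top_early)
print(bottom_mid)
```

The returned frame can be passed to `px.bar` with `x='Major'` and the ranked column as `y`, in the same way as the plotting cells in this notebook.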

<h1>Which jobs have the highest pay in their mid-career?</h1>
 

In [12]:
top_10_mid = df[['Major','Mid-Career Pay']].sort_values('Mid-Career Pay', ascending=False).head(10)
fig = px.bar(top_10_mid, x=top_10_mid['Major'], y=top_10_mid['Mid-Career Pay'], 
             title='Top 10 Mid Career Pay', color=top_10_mid['Major'])
fig.update_layout(xaxis_title='Major', 
                          yaxis_title='Pay')
fig.show()

<h1>Which majors have the least amount of pay when starting their early career?</h1>
The majors with the lowest early-career pay lead to jobs that require communication with patients or interaction with young people. Perhaps some of these majors require additional education or experience before pay increases.

In [26]:
bottom_10_early = df[['Major','Early-Career Pay']].sort_values('Early-Career Pay', ascending=True).head(10)

fig = px.bar(bottom_10_early, x=bottom_10_early['Major'], y=bottom_10_early['Early-Career Pay'], 
             title='Least Amount of Early Career Pay', color=bottom_10_early['Major'])
fig.update_layout(xaxis_title='Major', 
                          yaxis_title='Pay')
fig.show()

In [32]:
df[['Major', 'Early-Career Pay', 'Mid-Career Pay']].sort_values('Early-Career Pay', ascending=True).head(10)

Unnamed: 0,Major,Early-Career Pay,Mid-Career Pay
707,Developmental Psychology,31000,62000
571,Painting & Printmaking,32800,71100
814,Voice & Opera,32900,50800
829,Early Childhood Education,34100,43300
828,Child & Family Studies,34100,43600
785,Rehabilitation Services,34100,55700
815,Elementary Teaching,34400,50700
823,Educational Psychology,34600,47600
805,Psychology & Human Services,35000,53300
831,Medical Assisting,35100,42300


<h1>Least amount paid mid-career</h1>
The majors with the lowest mid-career pay are metalsmithing and majors related to counseling. Unlike the early-career ranking in the previous question, psychology majors are no longer among the lowest paid.

In [17]:

bottom_10_mid = df[['Major','Mid-Career Pay']].sort_values('Mid-Career Pay', ascending=True).head(10)

fig = px.bar(bottom_10_mid, x=bottom_10_mid['Major'], y=bottom_10_mid['Mid-Career Pay'], 
             title='Bottom 10 Mid Career Pay', color=bottom_10_mid['Major'])
fig.update_layout(xaxis_title='Major', 
                          yaxis_title='Pay')
fig.show()

<h1>Which majors have the highest satisfaction?</h1>
The majors with the highest satisfaction are mainly within the medical field.

In [58]:

top_10_meaning = df[['Major','% High Meaning']].sort_values('% High Meaning', ascending=False).head(10)

fig = px.bar(top_10_meaning, x=top_10_meaning['Major'], y=top_10_meaning['% High Meaning'], 
             title='Top 10 Highly Satisfying Majors',color=top_10_meaning['Major'])
fig.update_layout(xaxis_title='Major', 
                          yaxis_title='Satisfaction')
fig.show()


<h1>Which jobs have the least satisfaction?</h1>

In [62]:
bottom_10_meaning = df[['Major','% High Meaning']].sort_values('% High Meaning', ascending=True).head(10)

fig = px.bar(bottom_10_meaning, x=bottom_10_meaning['Major'], y=bottom_10_meaning['% High Meaning'], 
             title='Least Satisfying Major')

fig.update_layout(xaxis_title='Major', 
                          yaxis_title='Satisfaction')
fig.show()


Many majors show a 0% High Meaning. These are unlikely to be genuine 0% ratings: the cleaning step replaced '-' entries with 0, so a 0 here most likely means that PayScale did not report a value for that major.

In [20]:
df[df['% High Meaning']==0]

Unnamed: 0,Major,Degree,Early-Career Pay,Mid-Career Pay,% High Meaning
12,Aerospace Studies,Bachelors,50300,130300,0.0
38,Instrumentation & Control Engineering,Bachelors,69000,120500,0.0
47,Bioscience,Bachelors,48400,118300,0.0
57,Materials Science,Bachelors,62700,116200,0.0
96,Automotive Engineering,Bachelors,60500,108700,0.0
123,Robotics & Automation,Bachelors,66500,104100,0.0
155,Organic Chemistry,Bachelors,46000,100700,0.0
156,Business Operations,Bachelors,60500,100600,0.0
157,Business Computer Science,Bachelors,50200,100500,0.0
159,Safety Science,Bachelors,58800,100300,0.0
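
If the zeros are treated as unreported values, they should be excluded before ranking the least satisfying majors. The sketch below shows one way to do that on a handful of rows copied from the tables in this notebook; reading 0 as "not reported" is an assumption, not something the data file states.

```python
import pandas as pd

# Rows copied from the tables above
sample = pd.DataFrame({
    "Major": [
        "Petroleum Engineering",
        "Aerospace Studies",
        "Early Childhood Education",
        "Mental Health",
        "Metalsmithing",
    ],
    "% High Meaning": [0.69, 0.0, 0.78, 0.0, 0.32],
})

# Assumption: a 0 means the value was not reported, so turn it into NaN
reported = sample.copy()
reported["% High Meaning"] = reported["% High Meaning"].mask(reported["% High Meaning"] == 0)

missing_count = int(reported["% High Meaning"].isna().sum())

# Rank only the majors that have a reported value
least_meaning = (
    reported.dropna(subset=["% High Meaning"])
    .sort_values("% High Meaning", ascending=True)
    .head(2)
)

print(missing_count)
print(least_meaning)
```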


Since we have the early career pay and mid career pay, we can analyze the spread of the pay and determine which majors have a wide or narrow range.

In [21]:
df['Spread'] = df['Mid-Career Pay'] - df['Early-Career Pay']

In [22]:
df

Unnamed: 0,Major,Degree,Early-Career Pay,Mid-Career Pay,% High Meaning,Spread
0,Petroleum Engineering,Bachelors,92300,182000,0.69,89700
1,Electrical Engineering & Computer Science (EECS),Bachelors,101200,152300,0.46,51100
2,Applied Economics and Management,Bachelors,60900,139600,0.67,78700
3,Operations Research,Bachelors,78400,139600,0.52,61200
4,Public Accounting,Bachelors,60000,138800,0.49,78800
...,...,...,...,...,...,...
829,Early Childhood Education,Bachelors,34100,43300,0.78,9200
830,Mental Health,Bachelors,35200,42500,0.00,7300
831,Medical Assisting,Bachelors,35100,42300,0.00,7200
832,Addictions Counseling,Bachelors,38800,42200,0.00,3400
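
As a quick check of the arithmetic, the sketch below recomputes the spread for three rows taken from the tables in this notebook. The `Relative Spread` column is a hypothetical addition that is not part of the original analysis; it expresses the spread as a fraction of early-career pay, which can rank majors differently from the absolute spread.

```python
import pandas as pd

# Rows copied from the tables in this notebook
sample = pd.DataFrame({
    "Major": ["Petroleum Engineering", "Public Accounting", "Metalsmithing"],
    "Early-Career Pay": [92300, 60000, 38300],
    "Mid-Career Pay": [182000, 138800, 38400],
})

# Absolute spread, as defined above
sample["Spread"] = sample["Mid-Career Pay"] - sample["Early-Career Pay"]

# Hypothetical extra column: spread relative to the starting pay
sample["Relative Spread"] = sample["Spread"] / sample["Early-Career Pay"]

print(sample[["Major", "Spread", "Relative Spread"]])
```

In this sample Petroleum Engineering has the larger absolute spread, while Public Accounting has the larger spread relative to its starting pay.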


<h1>Which majors have the widest spread?</h1>

In [67]:
wide_spread = df[['Major','Spread']].sort_values('Spread', ascending=False).head(10)
wide_spread

Unnamed: 0,Major,Spread
0,Petroleum Engineering,89700
12,Aerospace Studies,80000
4,Public Accounting,78800
2,Applied Economics and Management,78700
27,Building Science,71400
47,Bioscience,69900
30,Foreign Affairs,69600
11,Actuarial Mathematics,68600
6,Quantitative Business Analysis,68300
7,Pharmacy,66900


In [70]:
wide_10_spread = df[['Major','Spread']].sort_values('Spread', ascending=False).head(10)

fig = px.bar(wide_10_spread, x=wide_10_spread['Major'], y=wide_10_spread['Spread'], 
             title='Majors with the widest spreads', color=wide_10_spread['Major'])

fig.update_layout(xaxis_title='Major', 
                          yaxis_title='Spread')
fig.show()

<h1>Which majors have the narrowest spread?</h1>
The major with the narrowest spread is metalsmithing, followed by majors that focus on counseling and rehabilitation.

In [24]:
df.tail()

Unnamed: 0,Major,Degree,Early-Career Pay,Mid-Career Pay,% High Meaning,Spread
829,Early Childhood Education,Bachelors,34100,43300,0.78,9200
830,Mental Health,Bachelors,35200,42500,0.0,7300
831,Medical Assisting,Bachelors,35100,42300,0.0,7200
832,Addictions Counseling,Bachelors,38800,42200,0.0,3400
833,Metalsmithing,Bachelors,38300,38400,0.32,100
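
`df.tail()` returns the last rows in file order, which here appears to follow mid-career pay rather than spread. A more direct way to find the narrowest spreads is to rank on the `Spread` column itself, sketched below with the five rows shown above.

```python
import pandas as pd

# The five rows shown by df.tail(), with their original index labels
tail_rows = pd.DataFrame(
    {
        "Major": [
            "Early Childhood Education",
            "Mental Health",
            "Medical Assisting",
            "Addictions Counseling",
            "Metalsmithing",
        ],
        "Spread": [9200, 7300, 7200, 3400, 100],
    },
    index=[829, 830, 831, 832, 833],
)

# nsmallest ranks by the column value regardless of row order
narrowest = tail_rows.nsmallest(3, "Spread")
print(narrowest)
```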


In [73]:
narrow_10_spread = df[['Major','Spread']].sort_values('Spread', ascending=True).head(10)

fig = px.bar(narrow_10_spread, x=narrow_10_spread['Major'], y=narrow_10_spread['Spread'], 
             title='Majors with the narrowest spreads', color=narrow_10_spread['Major'])

fig.update_layout(xaxis_title='Major', 
                          yaxis_title='Spread')
fig.show()