# Demographic Data Analysis Project  
This project uses pandas to analyze demographic data from `adult-data.csv` and answer the following questions about race, education, salary, and occupation:

* How many people of each race are represented in this dataset? The answer should be a pandas Series with race names as the index labels (use the `race` column).  
* What is the average age of men?  
* What is the percentage of people who have a Bachelor's degree?  
* What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?  
* What percentage of people without advanced education make more than 50K?  
* What is the minimum number of hours a person works per week?  
* What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?  
* What country has the highest percentage of people that earn >50K and what is that percentage?  
* Identify the most popular occupation for those who earn >50K in India.  

In [1]:
import pandas as pd

In [3]:
df = pd.read_csv('adult-data.csv')
df.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   age             32561 non-null  int64 
 1   workclass       32561 non-null  object
 2   fnlwgt          32561 non-null  int64 
 3   education       32561 non-null  object
 4   education-num   32561 non-null  int64 
 5   marital-status  32561 non-null  object
 6   occupation      32561 non-null  object
 7   relationship    32561 non-null  object
 8   race            32561 non-null  object
 9   sex             32561 non-null  object
 10  capital-gain    32561 non-null  int64 
 11  capital-loss    32561 non-null  int64 
 12  hours-per-week  32561 non-null  int64 
 13  native-country  32561 non-null  object
 14  salary          32561 non-null  object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB
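
`df.info()` reports no null values, but several outputs further down show `?` in `workclass`, `occupation`, and `native-country`, so missing data appears to be encoded as a string placeholder. A minimal sketch for counting such placeholders per column is shown below. The `sample` frame is made up for illustration and stands in for `adult-data.csv`.

```python
import pandas as pd

# Tiny made-up sample standing in for adult-data.csv (illustrative values only).
sample = pd.DataFrame({
    "workclass": ["Private", "?", "State-gov", "?"],
    "occupation": ["Sales", "?", "Adm-clerical", "Tech-support"],
    "native-country": ["India", "United-States", "?", "Cuba"],
})

# Count "?" placeholders per column; info() would list all of these as non-null.
placeholder_counts = (sample == "?").sum()
print(placeholder_counts)
```

If the file stores the marker exactly as `?`, passing `na_values="?"` to `pd.read_csv` is one way to load these cells as real missing values. Whether that is appropriate depends on how each question should treat unknown entries.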


How many people of each race are represented in this dataset? The answer should be a pandas Series with race names as the index labels (use the `race` column).

In [5]:
race_count = df['race'].value_counts()
race_count

race
White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: count, dtype: int64

What is the average age of men?

In [6]:
mens_df = df[df['sex'] == 'Male']
average_age_men = round(mens_df['age'].mean(), 1)
average_age_men

np.float64(39.4)

What is the percentage of people who have a Bachelor's degree?

In [7]:
bachelors = df.loc[df['education'] == 'Bachelors']
percentage_bachelors = round((len(bachelors) / len(df)) * 100, 1)
percentage_bachelors

16.4
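
The same percentage can be computed without an intermediate DataFrame, because the mean of a boolean mask equals the fraction of `True` values. A minimal sketch on made-up data follows. The `sample` frame is hypothetical and is not the project's dataset.

```python
import pandas as pd

# Made-up sample; only the education column is needed here.
sample = pd.DataFrame(
    {"education": ["Bachelors", "HS-grad", "Masters", "Bachelors", "11th"]}
)

# True where the row has a Bachelor's degree, False otherwise.
is_bachelors = sample["education"] == "Bachelors"

# mean() of a boolean Series is the share of True values.
pct_bachelors = round(is_bachelors.mean() * 100, 1)
print(pct_bachelors)  # 2 of 5 rows -> 40.0
```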

What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?

In [8]:
higher_education = df[df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])]
higher_education.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
5,37,Private,284582,Masters,14,Married-civ-spouse,Exec-managerial,Wife,White,Female,0,0,40,United-States,<=50K
8,31,Private,45781,Masters,14,Never-married,Prof-specialty,Not-in-family,White,Female,14084,0,50,United-States,>50K


In [9]:
higher_education_rich = higher_education[higher_education['salary'] == '>50K']
higher_education_rich.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
8,31,Private,45781,Masters,14,Never-married,Prof-specialty,Not-in-family,White,Female,14084,0,50,United-States,>50K
9,42,Private,159449,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,5178,0,40,United-States,>50K
11,30,State-gov,141297,Bachelors,13,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,0,0,40,India,>50K
19,43,Self-emp-not-inc,292175,Masters,14,Divorced,Exec-managerial,Unmarried,White,Female,0,0,45,United-States,>50K
20,40,Private,193524,Doctorate,16,Married-civ-spouse,Prof-specialty,Husband,White,Male,0,0,60,United-States,>50K


In [10]:
# The name is reused: the filtered DataFrame is replaced by the rounded percentage (a float).
higher_education_rich = round((len(higher_education_rich) / len(higher_education)) * 100, 1)
higher_education_rich

46.5

What percentage of people without advanced education make more than 50K?

In [11]:
lower_education = df[~(df['education'].isin(['Bachelors', 'Masters', 'Doctorate']))]
lower_education.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
6,49,Private,160187,9th,5,Married-spouse-absent,Other-service,Not-in-family,Black,Female,0,0,16,Jamaica,<=50K
7,52,Self-emp-not-inc,209642,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
10,37,Private,280464,Some-college,10,Married-civ-spouse,Exec-managerial,Husband,Black,Male,0,0,80,United-States,>50K


In [12]:
lower_education_rich = lower_education[lower_education['salary'] == '>50K']
lower_education_rich.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
7,52,Self-emp-not-inc,209642,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
10,37,Private,280464,Some-college,10,Married-civ-spouse,Exec-managerial,Husband,Black,Male,0,0,80,United-States,>50K
14,40,Private,121772,Assoc-voc,11,Married-civ-spouse,Craft-repair,Husband,Asian-Pac-Islander,Male,0,0,40,?,>50K
27,54,?,180211,Some-college,10,Married-civ-spouse,?,Husband,Asian-Pac-Islander,Male,0,0,60,South,>50K
38,31,Private,84154,Some-college,10,Married-civ-spouse,Sales,Husband,White,Male,0,0,38,?,>50K


In [13]:
# The name is reused: the filtered DataFrame is replaced by the rounded percentage (a float).
lower_education_rich = round((len(lower_education_rich) / len(lower_education)) * 100, 1)
lower_education_rich

17.4
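
The two education questions use complementary masks, so both percentages can also be obtained in one pass by grouping a boolean ">50K" indicator by education group. This is a sketch on made-up data, not the notebook's method. The group labels `"advanced"` and `"no advanced"` are names chosen here for illustration.

```python
import pandas as pd

# Made-up sample with the two columns the question needs.
sample = pd.DataFrame({
    "education": ["Bachelors", "Masters", "HS-grad", "11th", "Doctorate", "Some-college"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K"],
})

# Boolean masks: advanced education and earning more than 50K.
advanced = sample["education"].isin(["Bachelors", "Masters", "Doctorate"])
earns_over_50k = sample["salary"] == ">50K"

# Map the mask to readable labels (hypothetical names) and average per group.
group = advanced.map({True: "advanced", False: "no advanced"})
pct_by_group = (earns_over_50k.groupby(group).mean() * 100).round(1)
print(pct_by_group)
```

In this sample, 2 of 3 rows with advanced education and 1 of 3 rows without it earn more than 50K, giving 66.7 and 33.3.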

What is the minimum number of hours a person works per week?

In [14]:
min_work_hours = df['hours-per-week'].min()
min_work_hours

np.int64(1)

What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?

In [15]:
peo_min_hours = df[df['hours-per-week'] == min_work_hours]
peo_min_hours.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
189,58,State-gov,109567,Doctorate,16,Married-civ-spouse,Prof-specialty,Husband,White,Male,0,0,1,United-States,>50K
1036,66,Self-emp-inc,150726,9th,5,Married-civ-spouse,Exec-managerial,Husband,White,Male,1409,0,1,?,<=50K
1262,69,?,195779,Assoc-voc,11,Widowed,?,Not-in-family,White,Female,0,0,1,United-States,<=50K
5590,78,?,363134,HS-grad,9,Widowed,?,Not-in-family,White,Female,0,0,1,United-States,<=50K
5632,45,?,189564,Masters,14,Married-civ-spouse,?,Wife,White,Female,0,0,1,United-States,<=50K


In [17]:
peo_min_hours_rich = peo_min_hours[peo_min_hours['salary'] == '>50K']
peo_min_hours_rich

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
189,58,State-gov,109567,Doctorate,16,Married-civ-spouse,Prof-specialty,Husband,White,Male,0,0,1,United-States,>50K
20072,65,?,76043,HS-grad,9,Married-civ-spouse,?,Husband,White,Male,0,0,1,United-States,>50K


In [18]:
rich_percentage = round((len(peo_min_hours_rich) / len(peo_min_hours)) * 100, 1)
rich_percentage

10.0
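
The filter-then-divide pattern is repeated for several questions in this notebook. One way to avoid that repetition is a small helper, sketched below on made-up data. The function name `percent_over_50k` is hypothetical and does not appear in the original notebook.

```python
import pandas as pd


def percent_over_50k(frame: pd.DataFrame) -> float:
    """Return the percentage of rows with salary '>50K', rounded to one decimal."""
    if frame.empty:
        # No rows to measure; avoid dividing by zero.
        return float("nan")
    share = (frame["salary"] == ">50K").mean()
    return round(float(share) * 100, 1)


# Made-up sample: four people work the minimum of 1 hour, one of them earns >50K.
sample = pd.DataFrame({
    "hours-per-week": [1, 1, 40, 1, 1, 20],
    "salary": [">50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K"],
})

min_hours = sample["hours-per-week"].min()
min_hours_rows = sample[sample["hours-per-week"] == min_hours]
rich_pct = percent_over_50k(min_hours_rows)
print(rich_pct)  # 1 of 4 rows -> 25.0
```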

What country has the highest percentage of people that earn >50K and what is that percentage?

In [19]:
country_salary = df.groupby('native-country')['salary'].value_counts(normalize=True).unstack()
country_salary

native-country,<=50K,>50K
?,0.749571,0.250429
Cambodia,0.631579,0.368421
Canada,0.677686,0.322314
China,0.733333,0.266667
Columbia,0.966102,0.033898
Cuba,0.736842,0.263158
Dominican-Republic,0.971429,0.028571
Ecuador,0.857143,0.142857
El-Salvador,0.915094,0.084906
England,0.666667,0.333333


In [20]:
country_salary['>50K_percentage'] = country_salary['>50K'] * 100
country_salary.head()

native-country,<=50K,>50K,>50K_percentage
?,0.749571,0.250429,25.042882
Cambodia,0.631579,0.368421,36.842105
Canada,0.677686,0.322314,32.231405
China,0.733333,0.266667,26.666667
Columbia,0.966102,0.033898,3.389831


In [21]:
highest_earning_country = country_salary['>50K_percentage'].idxmax()
highest_earning_country

'Iran'

In [22]:
highest_earning_country_percentage = round(country_salary['>50K_percentage'].max(), 1)
highest_earning_country_percentage

np.float64(41.9)
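
A per-country percentage is sensitive to how many rows each country has, and the `?` placeholder is treated like any other country in the table above. The sketch below, on made-up data, reports the group size next to the percentage and optionally excludes `?` and very small groups. The threshold `MIN_ROWS` is an arbitrary assumption for illustration. The original analysis applies no such filter.

```python
import pandas as pd

# Made-up sample: country "B" has a single row, "?" marks an unknown country.
sample = pd.DataFrame({
    "native-country": ["A", "A", "A", "A", "B", "?", "?"],
    "salary": [">50K", "<=50K", ">50K", "<=50K", ">50K", ">50K", ">50K"],
})

# Drop rows whose country is unknown.
known = sample[sample["native-country"] != "?"]

# Per country: share of >50K earners and number of rows.
stats = (
    known.assign(over_50k=known["salary"] == ">50K")
    .groupby("native-country")["over_50k"]
    .agg(pct="mean", n="size")
)
stats["pct"] = (stats["pct"] * 100).round(1)

MIN_ROWS = 2  # hypothetical cutoff, not part of the original analysis
eligible = stats[stats["n"] >= MIN_ROWS]
top_country = eligible["pct"].idxmax()
print(stats)
print(top_country)
```

In this sample, "B" has the highest percentage (100.0) but only one row, so the cutoff selects "A" instead.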

Identify the most popular occupation for those who earn >50K in India.

In [23]:
rich_in_india = df[(df['native-country'] == 'India') & (df['salary'] == '>50K')]
rich_in_india.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
11,30,State-gov,141297,Bachelors,13,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,0,0,40,India,>50K
968,48,Private,164966,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,Asian-Pac-Islander,Male,0,0,40,India,>50K
1327,52,Private,168381,HS-grad,9,Widowed,Other-service,Unmarried,Asian-Pac-Islander,Female,0,0,40,India,>50K
7258,42,State-gov,102343,Prof-school,15,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,0,0,72,India,>50K
7285,54,State-gov,93449,Masters,14,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,0,0,40,India,>50K


In [24]:
occup_of_rich_in_india = rich_in_india.groupby('occupation')['native-country'].count()
occup_of_rich_in_india

occupation
Adm-clerical         1
Exec-managerial      8
Other-service        2
Prof-specialty      25
Sales                1
Tech-support         2
Transport-moving     1
Name: native-country, dtype: int64

In [25]:
top_IN_occupation = occup_of_rich_in_india.idxmax()
top_IN_occupation

'Prof-specialty'
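
Counting occupations through `groupby(...)['native-country'].count()` works, but it leaves the result named `native-country`. An equivalent approach is `value_counts()` on the `occupation` column itself, sketched here on made-up data.

```python
import pandas as pd

# Made-up sample with the three columns involved in this question.
sample = pd.DataFrame({
    "native-country": ["India", "India", "India", "Cuba", "India"],
    "salary": [">50K", ">50K", ">50K", ">50K", "<=50K"],
    "occupation": ["Prof-specialty", "Exec-managerial", "Prof-specialty", "Sales", "Sales"],
})

# Rows for people from India who earn more than 50K.
mask = (sample["native-country"] == "India") & (sample["salary"] == ">50K")

# value_counts() sorts by frequency, most common first.
occupation_counts = sample.loc[mask, "occupation"].value_counts()
top_occupation = occupation_counts.idxmax()
print(top_occupation)  # Prof-specialty
```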

In [26]:
print("Number of each race:\n", race_count) 
print("Average age of men:", average_age_men)
print(f"Percentage with Bachelors degrees: {percentage_bachelors}%")
print(f"Percentage with higher education that earn >50K: {higher_education_rich}%")
print(f"Percentage without higher education that earn >50K: {lower_education_rich}%")
print(f"Min work time: {min_work_hours} hours/week")
print(f"Percentage of rich among those who work fewest hours: {rich_percentage}%")
print("Country with highest percentage of rich:", highest_earning_country)
print(f"Highest percentage of rich people in country: {highest_earning_country_percentage}%")
print("Top occupation in India among those who earn >50K:", top_IN_occupation)

Number of each race:
 race
White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: count, dtype: int64
Average age of men: 39.4
Percentage with Bachelors degrees: 16.4%
Percentage with higher education that earn >50K: 46.5%
Percentage without higher education that earn >50K: 17.4%
Min work time: 1 hours/week
Percentage of rich among those who work fewest hours: 10.0%
Country with highest percentage of rich: Iran
Highest percentage of rich people in country: 41.9%
Top occupation in India among those who earn >50K: Prof-specialty
