## Demographic Data Analyzer

### Introduction


In this challenge, I analyzed demographic data using Pandas. The dataset was extracted from the 1994 Census database.

I used Pandas to answer the following questions:

- How many people of each race are represented in this dataset? This should be a Pandas Series with race names as the index labels (`race` column).
- What is the average age of men?
- What is the percentage of people who have a Bachelor's degree?
- What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
- What percentage of people without advanced education make more than 50K?
- What is the minimum number of hours a person works per week?
- What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?
- What country has the highest percentage of people that earn >50K, and what is that percentage?
- Identify the most popular occupation for those who earn >50K in India.

In [7]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# Read data from file
df = pd.read_csv('adult.data.csv')

In [3]:
df.head()

|  | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |


#### How many people of each race are represented in this dataset?

In [18]:
race_count = df['race'].value_counts()
race_count




White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: race, dtype: int64
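
Raw counts are often easier to compare as shares of the total. As a minimal sketch, `value_counts(normalize=True)` returns fractions instead of counts. The five-row `sample` frame below is made up for illustration and stands in for `df`; the name `race_share` is my own and not part of the challenge.

```python
import pandas as pd

# Made-up sample standing in for the 'race' column of the census data
sample = pd.DataFrame({'race': ['White', 'White', 'Black', 'White', 'Other']})

# Absolute counts per race, as in the cell above
race_count = sample['race'].value_counts()

# normalize=True divides each count by the total; scale to percent and round
race_share = (sample['race'].value_counts(normalize=True) * 100).round(1)

print(race_count.to_dict())
print(race_share.to_dict())
```

Both results keep the race names as index labels, so a single race can be looked up with `race_share['White']`.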

#### What is the average age of men?

In [37]:
average_age_men = df[df['sex'] == 'Male']['age'].mean()
round(average_age_men, 1)

39.4
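
To see the average age for every value of `sex` at once, the boolean filter can be swapped for a `groupby`. This is a sketch on a made-up four-row `sample` frame, not the census file, and `mean_age_by_sex` is a name I chose for illustration.

```python
import pandas as pd

# Made-up sample standing in for the 'sex' and 'age' columns
sample = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female'],
    'age': [30, 40, 41, 20],
})

# One mean per distinct value of 'sex', rounded to one decimal place
mean_age_by_sex = sample.groupby('sex')['age'].mean().round(1)

# The filter-based form used above gives the same number for men
average_age_men = round(sample[sample['sex'] == 'Male']['age'].mean(), 1)

print(mean_age_by_sex.to_dict())
print(average_age_men)
```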

#### What is the percentage of people who have a Bachelor's degree?

In [35]:
num_of_bach = len(df[df["education"]=='Bachelors']) 
total_num = len(df) 
percentage_bachelors = (num_of_bach/total_num) * 100
percentage_bachelors

16.44605509658794
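
The same percentage can also be computed without `len`: comparing a column to a value gives a boolean Series, and the mean of that Series is the fraction of `True` values. A minimal sketch, assuming a made-up four-row `sample` frame in place of `df`:

```python
import pandas as pd

# Made-up sample standing in for the 'education' column
sample = pd.DataFrame({'education': ['Bachelors', 'HS-grad', 'Masters', 'Bachelors']})

# True where the person holds a Bachelor's degree
is_bachelor = sample['education'] == 'Bachelors'

# The mean of a boolean Series is the share of True values
percentage_bachelors = round(is_bachelor.mean() * 100, 1)

# Count-based form from the cell above, for comparison
count_based = len(sample[is_bachelor]) / len(sample) * 100

print(percentage_bachelors, count_based)
```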

#### What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?

#### What percentage of people without advanced education make more than 50K?

In [42]:
# Split the rows by whether education is Bachelors, Masters, or Doctorate
higher_education = df[df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])]
lower_education = df[~df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])]
lower_education.head()

|  | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 6 | 49 | Private | 160187 | 9th | 5 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0 | 0 | 16 | Jamaica | <=50K |
| 7 | 52 | Self-emp-not-inc | 209642 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 45 | United-States | >50K |
| 10 | 37 | Private | 280464 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Husband | Black | Male | 0 | 0 | 80 | United-States | >50K |


In [48]:
# Number of people with salary >50K in each group
num_higher_rich = len(higher_education[higher_education['salary'] == '>50K'])
num_lower_rich = len(lower_education[lower_education['salary'] == '>50K'])

# Percentage of each group with salary >50K
higher_education_rich = round(num_higher_rich / len(higher_education) * 100, 1)
lower_education_rich = round(num_lower_rich / len(lower_education) * 100, 1)
higher_education_rich

46.5

In [50]:
lower_education_rich

17.4
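
Both percentages can be produced in one step by grouping a ">50K" flag by an "advanced education" flag and taking the mean of each group. This is only a sketch under assumptions: the six-row `sample` frame is made up, and the group labels `'advanced'` and `'other'` are names I introduced for illustration.

```python
import pandas as pd

# Made-up sample standing in for the 'education' and 'salary' columns
sample = pd.DataFrame({
    'education': ['Bachelors', 'Masters', 'HS-grad', '11th', 'Doctorate', 'HS-grad'],
    'salary': ['>50K', '<=50K', '<=50K', '<=50K', '>50K', '>50K'],
})

# Boolean flags for advanced education and for salary >50K
advanced = sample['education'].isin(['Bachelors', 'Masters', 'Doctorate'])
earns_over_50k = sample['salary'] == '>50K'

# Readable group labels (hypothetical names) instead of True/False
group = advanced.map({True: 'advanced', False: 'other'})

# The mean of the >50K flag within each group is that group's share of high earners
rich_share = (earns_over_50k.groupby(group).mean() * 100).round(1)

higher_education_rich = rich_share['advanced']
lower_education_rich = rich_share['other']
print(higher_education_rich, lower_education_rich)
```

Compared with building two filtered frames, this keeps the education list in one place, at the cost of being a little less explicit.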

#### What is the minimum number of hours a person works per week (`hours-per-week` feature)?

In [52]:
min_work_hours = df['hours-per-week'].min()
min_work_hours

1

#### What percentage of the people who work the minimum number of hours per week have a salary of >50K?

In [58]:
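# Rows for the people who work the minimum number of hours per week
num_min_workers = df[df['hours-per-week'] == min_work_hours]

# Percentage of those people with salary >50K
rich_percentage = len(num_min_workers[num_min_workers['salary'] == '>50K']) / len(num_min_workers) * 100

rich_percentage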

10.0

#### What country has the highest percentage of people that earn >50K, and what is that percentage?

In [67]:
country_count = df['native-country'].value_counts()
country_rich_count = df[df['salary'] == '>50K']['native-country'].value_counts()

# Percentage of people earning >50K in each country, computed once and reused
country_rich_percentage = country_rich_count / country_count * 100
highest_earning_country = country_rich_percentage.idxmax()
highest_earning_country_percentage = country_rich_percentage.max()


In [68]:
highest_earning_country

'Iran'

In [69]:
highest_earning_country_percentage

41.86046511627907
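
One detail of the division above is worth spelling out: pandas aligns the two Series on their country labels, so a country with no >50K rows is missing from the numerator and ends up as `NaN`. `idxmax` and `max` skip `NaN` by default, so the answer is unaffected. The sketch below shows this on a made-up six-row `sample` frame (not the census data), together with an equivalent `groupby` form; `ratio` and `by_country` are names I chose for illustration.

```python
import pandas as pd

# Made-up sample: Cuba has no >50K rows
sample = pd.DataFrame({
    'native-country': ['Iran', 'Iran', 'Cuba', 'India', 'India', 'India'],
    'salary': ['>50K', '<=50K', '<=50K', '>50K', '>50K', '<=50K'],
})

country_count = sample['native-country'].value_counts()
country_rich_count = sample[sample['salary'] == '>50K']['native-country'].value_counts()

# Division aligns on country labels; Cuba is absent from the numerator, so it becomes NaN
ratio = country_rich_count / country_count * 100

# idxmax and max ignore NaN by default
highest_earning_country = ratio.idxmax()
highest_earning_country_percentage = ratio.max()

# Alternative: mean of the >50K flag per country, where Cuba is 0.0 instead of NaN
by_country = (sample['salary'] == '>50K').groupby(sample['native-country']).mean() * 100

print(highest_earning_country, round(highest_earning_country_percentage, 1))
print(by_country.round(1).to_dict())
```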

#### Identify the most popular occupation for those who earn >50K in India.

In [79]:
people_in_india = df[(df['native-country'] == 'India') & (df['salary'] == '>50K')]
top_IN_occupation = people_in_india['occupation'].value_counts().idxmax()
top_IN_occupation

'Prof-specialty'