In this challenge you must analyze demographic data using Pandas. You are given a dataset extracted from the 1994 Census database.
You must use Pandas to answer the following questions:

* How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (race column)
* What is the average age of men?
* What is the percentage of people who have a Bachelor's degree?
* What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?
* What percentage of people without advanced education make more than 50K?
* What is the minimum number of hours a person works per week?
* What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?
* What country has the highest percentage of people that earn >50K and what is that percentage?
* Identify the most popular occupation for those who earn >50K in India.

Use the starter code in the file demographic_data_analyzer. Update the code so all variables set to "None" are set to the appropriate calculation or code. Round all decimals to the nearest tenth.
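Nearly every question below comes down to one pattern: the share of rows that satisfy a condition, expressed as a percentage and rounded to the nearest tenth. The sketch below wraps that pattern in a small helper. It assumes conditions are written as boolean Series built from the DataFrame's columns. The name pct and the toy data are hypothetical and are not part of the starter code. Also note that Python's built-in round() rounds exact halves to the nearest even digit (round(2.5) == 2), so a value that lands exactly on a half may not round up.

```python
from typing import Optional

import pandas as pd


def pct(mask: pd.Series, within: Optional[pd.Series] = None) -> float:
    """Hypothetical helper: percentage of True values in `mask`, rounded to a tenth.

    If `within` is given, only rows where `within` is True are considered.
    """
    if within is not None:
        mask = mask[within]
    # The mean of a boolean Series is the fraction of True values.
    return round(mask.mean() * 100, 1)


# Toy rows for illustration only; these are not values from the census file.
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Male"],
    "salary": [">50K", "<=50K", "<=50K", ">50K"],
})
rich = toy["salary"] == ">50K"

print(pct(rich))                        # 50.0  (2 of 4 rows)
print(pct(rich, toy["sex"] == "Male"))  # 66.7  (2 of 3 men)
```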

In [3]:
import numpy as np
import pandas as pd

In [8]:
df = pd.read_csv('adult.data.csv')

In [17]:
sample_size = len(df)
print(sample_size)

32561


In [9]:
df.head()

   age         workclass  fnlwgt  education  education-num      marital-status         occupation   relationship   race     sex  capital-gain  capital-loss  hours-per-week  native-country  salary
0   39         State-gov   77516  Bachelors             13       Never-married       Adm-clerical  Not-in-family  White    Male          2174             0              40   United-States   <=50K
1   50  Self-emp-not-inc   83311  Bachelors             13  Married-civ-spouse    Exec-managerial        Husband  White    Male             0             0              13   United-States   <=50K
2   38           Private  215646    HS-grad              9            Divorced  Handlers-cleaners  Not-in-family  White    Male             0             0              40   United-States   <=50K
3   53           Private  234721       11th              7  Married-civ-spouse  Handlers-cleaners        Husband  Black    Male             0             0              40   United-States   <=50K
4   28           Private  338409  Bachelors             13  Married-civ-spouse     Prof-specialty           Wife  Black  Female             0             0              40            Cuba   <=50K


In [10]:
df.describe()

             age     fnlwgt  education-num  capital-gain  capital-loss  hours-per-week
count    32561.0    32561.0        32561.0       32561.0       32561.0         32561.0
mean   38.581647   189778.4      10.080679   1077.648844      87.30383       40.437456
std    13.640433   105550.0        2.57272   7385.292085    402.960219       12.347429
min         17.0    12285.0            1.0           0.0           0.0             1.0
25%         28.0   117827.0            9.0           0.0           0.0            40.0
50%         37.0   178356.0           10.0           0.0           0.0            40.0
75%         48.0   237051.0           12.0           0.0           0.0            45.0
max         90.0  1484705.0           16.0       99999.0        4356.0            99.0
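By default describe() summarizes only the numeric columns, so race, sex, education and the other text columns do not appear in the table above. The following minimal sketch uses hypothetical toy rows to show how to ask for the non-numeric columns instead. For those columns pandas reports count, unique, top (the most frequent value) and freq (how often top occurs).

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows mirroring a few census columns.
toy = pd.DataFrame({
    "age": [39, 50, 38],
    "race": ["White", "Black", "White"],
    "sex": ["Male", "Male", "Female"],
})

# Exclude numeric dtypes so only the text (categorical) columns are summarized.
summary = toy.describe(exclude=[np.number])
print(summary)
print(summary.loc["top", "race"])    # White
print(summary.loc["freq", "race"])   # 2
print(summary.loc["unique", "sex"])  # 2
```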


How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (race column)

In [50]:
df['race'].value_counts()

White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: race, dtype: int64
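value_counts() already returns the shape the question asks for. The result is a Series whose index labels are the distinct race names and whose values are row counts, sorted from most to least frequent. The sketch below checks those properties on hypothetical toy values. One version note: from pandas 2.0 the result Series is named count and the index takes the column name, so newer versions print "Name: count" where the output above shows "Name: race".

```python
import pandas as pd

# Hypothetical toy stand-in for the census 'race' column.
toy = pd.DataFrame({"race": ["White", "Black", "White", "Other", "White"]})

race_count = toy["race"].value_counts()  # Series: race name -> number of rows

print(race_count.index[0])           # White  (most frequent label comes first)
print(race_count["Black"])           # 1      (lookup by race name)
print(race_count.sum() == len(toy))  # True   (each non-missing row counted once)
```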

What is the average age of men?

In [74]:
round(df.loc[df['sex'] == 'Male', 'age'].mean(), 1)

39.4
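A boolean filter answers the question for one group. groupby() instead computes the average for every group at once, which makes it easy to compare men and women side by side. Here is a minimal sketch on hypothetical toy ages:

```python
import pandas as pd

# Hypothetical toy rows; not taken from the census file.
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female"],
    "age": [30, 40, 50, 20],
})

# One mean per distinct value of 'sex', rounded to the nearest tenth.
mean_age_by_sex = toy.groupby("sex")["age"].mean().round(1)
print(mean_age_by_sex)
print(mean_age_by_sex["Male"])    # 40.0
print(mean_age_by_sex["Female"])  # 30.0
```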

What is the percentage of people who have a Bachelor's degree?

In [138]:
round((df['education'] == 'Bachelors').mean() * 100, 1)

16.4
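value_counts(normalize=True) returns proportions instead of counts. The share of every education level, Bachelors included, can then be read from a single Series. A sketch on hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy values for the 'education' column.
toy = pd.DataFrame({"education": [
    "Bachelors", "HS-grad", "Bachelors", "Masters",
    "HS-grad", "HS-grad", "Bachelors", "Some-college",
]})

# Proportions per education level, converted to percentages and rounded to a tenth.
education_pct = (toy["education"].value_counts(normalize=True) * 100).round(1)
print(education_pct)
print(education_pct["Bachelors"])  # 37.5  (3 of 8 rows)
```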

What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?

In [150]:
advanced = df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])
round((df.loc[advanced, 'salary'] == '>50K').mean() * 100, 1)

46.5
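The numeric education-num column is tempting for this filter, but it is an ordinal code. In the UCI Adult data, Prof-school (15) sits between Masters (14) and Doctorate (16), so a range test such as education-num >= 13 quietly adds Prof-school to the "advanced" group. Matching the names listed in the question with isin() avoids that. A sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows pairing education names with their education-num codes.
toy = pd.DataFrame({
    "education": ["Bachelors", "Prof-school", "Masters", "HS-grad", "Doctorate"],
    "education-num": [13, 15, 14, 9, 16],
})

by_name = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])
by_range = toy["education-num"] >= 13  # also matches Prof-school (15)

# Rows the range test includes but the question excludes.
print(toy.loc[by_range & ~by_name, "education"].tolist())  # ['Prof-school']
```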

What percentage of people without advanced education make more than 50K?

In [68]:
advanced = df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])
round((df.loc[~advanced, 'salary'] == '>50K').mean() * 100, 1)

17.4
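The two percentages above are the same statistic computed on complementary groups. They can also be produced together by grouping the >50K flag by an advanced-education label. A minimal sketch on hypothetical toy rows; the group labels "advanced" and "other" are illustrative names:

```python
import pandas as pd

# Hypothetical toy rows; not taken from the census file.
toy = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "HS-grad", "Doctorate", "11th"],
    "salary": [">50K", "<=50K", "<=50K", ">50K", ">50K", "<=50K"],
})
advanced = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])
rich = toy["salary"] == ">50K"

# Label each row by group, then take the mean of the >50K flag per group.
group = advanced.map({True: "advanced", False: "other"})
rich_pct = (rich.groupby(group).mean() * 100).round(1)
print(rich_pct["advanced"])  # 66.7  (2 of 3 advanced rows)
print(rich_pct["other"])     # 33.3  (1 of 3 other rows)
```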

What is the minimum number of hours a person works per week?

In [77]:
df['hours-per-week'].min()

1

What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?

In [85]:
min_work_hours = df['hours-per-week'].min()
round((df.loc[df['hours-per-week'] == min_work_hours, 'salary'] == '>50K').mean() * 100, 1)

10.0

What country has the highest percentage of people that earn >50K and what is that percentage?

In [142]:
rich_by_country = df.loc[df['salary'] == '>50K', 'native-country'].value_counts()
res = (rich_by_country / df['native-country'].value_counts()).sort_values(ascending=False)
print(round(res.iloc[0] * 100, 1))
print(res.index[0])

41.9
Iran
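Dividing the two value_counts() Series aligns them on country name. A country with no >50K earners is absent from the numerator, so its ratio comes out as NaN rather than 0. Grouping the >50K flag by country gives 0.0 for such countries, and idxmax() returns the label with the largest share directly. A sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows; not taken from the census file.
toy = pd.DataFrame({
    "native-country": ["Iran", "Iran", "Cuba", "Cuba", "Cuba", "Peru"],
    "salary": [">50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K"],
})
rich = toy["salary"] == ">50K"

# Share of >50K earners per country (a boolean mean is a fraction).
rich_pct = rich.groupby(toy["native-country"]).mean() * 100
print(rich_pct.idxmax())         # Iran
print(round(rich_pct.max(), 1))  # 50.0
print(rich_pct["Peru"])          # 0.0

# The value_counts() ratio leaves Peru as NaN instead.
ratio = toy.loc[rich, "native-country"].value_counts() / toy["native-country"].value_counts()
print(ratio.isna()["Peru"])      # True
```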


Identify the most popular occupation for those who earn >50K in India.

In [134]:
df['occupation'][(df['salary'] == '>50K') & (df['native-country'] == 'India')].value_counts().index[0]

'Prof-specialty'
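value_counts().index[0] raises an IndexError when the filtered subset is empty, for example if the country name is misspelled. The sketch below wraps the same lookup in a hypothetical helper, top_occupation, that returns None in that case. It uses idxmax(), which returns the first label with the highest count. The data is toy data.

```python
from typing import Optional

import pandas as pd


def top_occupation(frame: pd.DataFrame, country: str) -> Optional[str]:
    """Hypothetical helper: most common occupation among >50K earners
    from `country`, or None when there are no such rows."""
    mask = (frame["salary"] == ">50K") & (frame["native-country"] == country)
    occupations = frame.loc[mask, "occupation"]
    if occupations.empty:
        return None
    return occupations.value_counts().idxmax()


# Hypothetical toy rows; not taken from the census file.
toy = pd.DataFrame({
    "native-country": ["India", "India", "India", "Cuba"],
    "salary": [">50K", ">50K", ">50K", "<=50K"],
    "occupation": ["Prof-specialty", "Exec-managerial", "Prof-specialty", "Sales"],
})

print(top_occupation(toy, "India"))  # Prof-specialty
print(top_occupation(toy, "Peru"))   # None
```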