### Assignment

# Demographic Data Analyzer

In this challenge you must analyze demographic data using Pandas. You are given a dataset extracted from the 1994 Census database. Here is a sample of what the data looks like:

|    |   age | workclass        |   fnlwgt | education   |   education-num | marital-status     | occupation        | relationship   | race   | sex    |   capital-gain |   capital-loss |   hours-per-week | native-country   | salary   |
|---:|------:|:-----------------|---------:|:------------|----------------:|:-------------------|:------------------|:---------------|:-------|:-------|---------------:|---------------:|-----------------:|:-----------------|:---------|
|  0 |    39 | State-gov        |    77516 | Bachelors   |              13 | Never-married      | Adm-clerical      | Not-in-family  | White  | Male   |           2174 |              0 |               40 | United-States    | <=50K    |
|  1 |    50 | Self-emp-not-inc |    83311 | Bachelors   |              13 | Married-civ-spouse | Exec-managerial   | Husband        | White  | Male   |              0 |              0 |               13 | United-States    | <=50K    |
|  2 |    38 | Private          |   215646 | HS-grad     |               9 | Divorced           | Handlers-cleaners | Not-in-family  | White  | Male   |              0 |              0 |               40 | United-States    | <=50K    |
|  3 |    53 | Private          |   234721 | 11th        |               7 | Married-civ-spouse | Handlers-cleaners | Husband        | Black  | Male   |              0 |              0 |               40 | United-States    | <=50K    |
|  4 |    28 | Private          |   338409 | Bachelors   |              13 | Married-civ-spouse | Prof-specialty    | Wife           | Black  | Female |              0 |              0 |               40 | Cuba             | <=50K    |


You must use Pandas to answer the following questions:
* How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (`race` column)
* What is the average age of men?
* What is the percentage of people who have a Bachelor's degree?
* What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
* What percentage of people without advanced education make more than 50K?
* What is the minimum number of hours a person works per week?
* What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?
* What country has the highest percentage of people that earn >50K and what is that percentage?
* Identify the most popular occupation for those who earn >50K in India. 

Use the starter code in the file `demographic_data_analyzer`. Update the code so all variables set to `None` are set to the appropriate calculation or code. Round all decimals to the nearest tenth.
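Every question above reduces to a few boolean masks over the DataFrame. The sketch below is not the freeCodeCamp starter code. `summarize_demographics`, its dictionary keys, and the six-row `sample` frame are hypothetical, chosen so the results can be checked by hand. It assumes the column names shown in the sample table.

```python
import pandas as pd

ADVANCED = ["Bachelors", "Masters", "Doctorate"]


def summarize_demographics(df: pd.DataFrame) -> dict:
    """Hypothetical helper: answers the assignment's questions for `df`."""
    advanced = df["education"].isin(ADVANCED)
    rich = df["salary"] == ">50K"

    def pct(part: pd.Series, group: pd.Series) -> float:
        # Percentage of rows in `group` that also satisfy `part`, to one decimal
        return round(100 * (part & group).sum() / group.sum(), 1)

    everyone = pd.Series(True, index=df.index)
    min_hours = df["hours-per-week"].min()
    works_min = df["hours-per-week"] == min_hours
    # Mean of a boolean per country = fraction of that country's rows earning >50K
    country_rich = rich.groupby(df["native-country"]).mean() * 100
    india_rich = (df["native-country"] == "India") & rich

    return {
        "race_count": df["race"].value_counts(),
        "average_age_men": round(df.loc[df["sex"] == "Male", "age"].mean(), 1),
        "percentage_bachelors": pct(df["education"] == "Bachelors", everyone),
        "higher_education_rich": pct(rich, advanced),
        "lower_education_rich": pct(rich, ~advanced),
        "min_work_hours": min_hours,
        "rich_percentage": pct(rich, works_min),
        "highest_earning_country": country_rich.idxmax(),
        "highest_earning_country_percentage": round(country_rich.max(), 1),
        "top_india_occupation": df.loc[india_rich, "occupation"].value_counts().idxmax(),
    }


# Invented rows, for illustration only
sample = pd.DataFrame({
    "age": [39, 50, 38, 53, 28, 45],
    "education": ["Bachelors", "Bachelors", "HS-grad", "11th", "Masters", "HS-grad"],
    "race": ["White", "White", "White", "Black", "Black", "White"],
    "sex": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "hours-per-week": [40, 13, 40, 40, 13, 60],
    "native-country": ["United-States", "United-States", "India",
                       "United-States", "India", "India"],
    "occupation": ["Adm-clerical", "Exec-managerial", "Prof-specialty",
                   "Handlers-cleaners", "Prof-specialty", "Sales"],
    "salary": ["<=50K", ">50K", ">50K", "<=50K", ">50K", "<=50K"],
})
result = summarize_demographics(sample)
print(result["higher_education_rich"], result["lower_education_rich"])
```

The `pct` helper keeps the numerator and denominator visible as separate masks, which makes it harder to divide by the wrong group.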

Unit tests are written for you under `test_module.py`.

### Development

For development, you can use `main.py` to test your functions. Click the "run" button and `main.py` will run.

### Testing 

We imported the tests from `test_module.py` to `main.py` for your convenience. The tests will run automatically whenever you hit the "run" button.

### Submitting

Copy your project's URL and submit it to freeCodeCamp.

### Dataset Source

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

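```python
import pandas as pd

df = pd.read_csv('adult.data.csv')
df.head(5)
```

|    |   age | workclass        |   fnlwgt | education   |   education-num | marital-status     | occupation        | relationship   | race   | sex    |   capital-gain |   capital-loss |   hours-per-week | native-country   | salary   |
|---:|------:|:-----------------|---------:|:------------|----------------:|:-------------------|:------------------|:---------------|:-------|:-------|---------------:|---------------:|-----------------:|:-----------------|:---------|
|  0 |    39 | State-gov        |    77516 | Bachelors   |              13 | Never-married      | Adm-clerical      | Not-in-family  | White  | Male   |           2174 |              0 |               40 | United-States    | <=50K    |
|  1 |    50 | Self-emp-not-inc |    83311 | Bachelors   |              13 | Married-civ-spouse | Exec-managerial   | Husband        | White  | Male   |              0 |              0 |               13 | United-States    | <=50K    |
|  2 |    38 | Private          |   215646 | HS-grad     |               9 | Divorced           | Handlers-cleaners | Not-in-family  | White  | Male   |              0 |              0 |               40 | United-States    | <=50K    |
|  3 |    53 | Private          |   234721 | 11th        |               7 | Married-civ-spouse | Handlers-cleaners | Husband        | Black  | Male   |              0 |              0 |               40 | United-States    | <=50K    |
|  4 |    28 | Private          |   338409 | Bachelors   |              13 | Married-civ-spouse | Prof-specialty    | Wife           | Black  | Female |              0 |              0 |               40 | Cuba             | <=50K    |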


How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (race column)

```python
df.groupby(['race']).race.count().sort_values(ascending=False)
```

```
race
White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: race, dtype: int64
```
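`Series.value_counts()` produces the same counts in one call and sorts them in descending order by default. A minimal comparison on a made-up `race` column (the rows are invented for illustration):

```python
import pandas as pd

# Toy stand-in for the `race` column
df = pd.DataFrame({"race": ["White", "Black", "White", "Asian-Pac-Islander", "White", "Black"]})

# groupby/count as in the cell above, sorted by descending count
via_groupby = df.groupby("race")["race"].count().sort_values(ascending=False)

# value_counts() counts each distinct label and sorts descending by default
via_value_counts = df["race"].value_counts()

print(via_value_counts)
```

In pandas 2.x the Series returned by `value_counts()` is named `count` instead of `race`; the index labels and counts are the same.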

What is the average age of men?

```python
round(df[df['sex'] == 'Male'].age.mean(), 1)
```

```
39.4
```
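When the average is wanted for every value of `sex`, not only `Male`, a single `groupby` computes all of them at once. A small sketch with invented ages:

```python
import pandas as pd

df = pd.DataFrame({"sex": ["Male", "Female", "Male", "Female"], "age": [39, 28, 50, 45]})

# Mean age for each distinct value of `sex`, rounded to one decimal
mean_age_by_sex = df.groupby("sex")["age"].mean().round(1)
print(mean_age_by_sex["Male"])
```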

What is the percentage of people who have a Bachelor's degree?

```python
round(df[df['education'] == 'Bachelors'].education.count() * 100 / df.education.count(), 1)
```

```
16
```
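`value_counts(normalize=True)` returns each category's share of the non-null values, so the Bachelor's percentage can be read off directly. Like `.count()` in the cell above, it ignores missing values by default. Sketched on a hypothetical five-row column:

```python
import pandas as pd

df = pd.DataFrame({"education": ["Bachelors", "HS-grad", "Bachelors", "Masters", "HS-grad"]})

# normalize=True returns proportions (summing to 1) instead of raw counts
shares = df["education"].value_counts(normalize=True)
percentage_bachelors = round(shares["Bachelors"] * 100, 1)
print(percentage_bachelors)
```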

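What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?

First, as a share of the whole dataset (people with advanced education who earn >50K, divided by all people):

```python
advanced = df.education.isin(('Bachelors', 'Masters', 'Doctorate'))
round(df[advanced & (df['salary'] == '>50K')].age.count() * 100 / df.age.count(), 1)
```

```
10.7
```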

Restricting the denominator to people with advanced education gives the percentage the question asks for:

```python
advanced = df.education.isin(('Bachelors', 'Masters', 'Doctorate'))
rich = df['salary'] == '>50K'
round(df[advanced & rich].age.count() * 100 / df[advanced].age.count(), 1)
```

```
46.5
```
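The 10.7 and 46.5 results differ only in their denominator. With named boolean masks the distinction is explicit: dividing by `len(df)` gives a share of everyone, while taking the mean of `rich` inside a group gives the conditional percentage. A sketch on made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    "education": ["Bachelors", "Masters", "HS-grad", "Doctorate", "11th"],
    "salary": [">50K", "<=50K", ">50K", ">50K", "<=50K"],
})
advanced = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
rich = df["salary"] == ">50K"

# Denominator = everyone: share of all rows that are advanced AND rich
share_of_all = round(100 * (advanced & rich).sum() / len(df), 1)
# Denominator = one group: mean of a boolean = fraction of True values
higher_education_rich = round(100 * rich[advanced].mean(), 1)
lower_education_rich = round(100 * rich[~advanced].mean(), 1)
print(share_of_all, higher_education_rich, lower_education_rich)
```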

What percentage of people without advanced education make more than 50K?


```python
not_advanced = ~df.education.isin(('Bachelors', 'Masters', 'Doctorate'))
rich = df['salary'] == '>50K'
round(df[not_advanced & rich].age.count() * 100 / df[not_advanced].age.count(), 1)
```

```
17.4
```

Share of all people earning >50K who do not have advanced education:

```python
not_advanced = ~df.education.isin(('Bachelors', 'Masters', 'Doctorate'))
rich = df['salary'] == '>50K'
# Divide by a scalar row count; DataFrame.count() would return one count per column
round(len(df[not_advanced & rich]) * 100 / len(df[rich]), 1)
```

```
55.5
```
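`DataFrame.count()` returns the number of non-null values in each column, so dividing a scalar by it broadcasts the result over every column and produces a per-column Series. `len()` returns a single row count. A minimal illustration with a hypothetical two-column frame:

```python
import pandas as pd

df = pd.DataFrame({"age": [39, 50, 38, 53], "salary": [">50K", "<=50K", ">50K", ">50K"]})
rich = df[df["salary"] == ">50K"]

per_column = rich.count()  # Series indexed by column name: non-null values per column
row_count = len(rich)      # plain int: number of rows

print(100 * 2 / per_column)  # one value per column (age, salary)
print(100 * 2 / row_count)   # a single number
```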

What is the minimum number of hours a person works per week?


```python
df['hours-per-week'].min()
```

```
1
```

What fraction of all people work the minimum number of hours per week? (The result is a proportion of the whole dataset, not a percentage.)

```python
round(df[df['hours-per-week'] == df['hours-per-week'].min()].age.count() / df.age.count(), 5)
```

```
0.00061
```

What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?


```python
min_hours = df['hours-per-week'].min()
# Parenthesise each comparison: `&` binds more tightly than `==`
rich_min = df[(df['hours-per-week'] == min_hours) & (df['salary'] == '>50K')]
all_min = df[df['hours-per-week'] == min_hours]
# Divide by a scalar row count (not DataFrame.count()) to get a single number
round(len(rich_min) * 100 / len(all_min), 1)
```

```
10.0
```
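Python's `&` binds more tightly than `==`, so `a == b & c` is parsed as `a == (b & c)`. Each comparison in a pandas mask therefore needs its own parentheses. In this dataset the unparenthesised form most likely still selected the right rows only because the minimum is 1, which compares equal to `True`. A sketch with invented rows where the minimum is 10 instead:

```python
import pandas as pd

df = pd.DataFrame({
    "hours-per-week": [10, 10, 40, 10, 60],
    "salary": [">50K", "<=50K", ">50K", "<=50K", ">50K"],
})
min_hours = df["hours-per-week"].min()

# Each comparison is parenthesised before combining with &
rich_min = df[(df["hours-per-week"] == min_hours) & (df["salary"] == ">50K")]
all_min = df[df["hours-per-week"] == min_hours]
rich_percentage = round(len(rich_min) * 100 / len(all_min), 1)
print(min_hours, rich_percentage)
```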

What country has the highest percentage of people that earn >50K and what is that percentage?


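```python
cntry = pd.DataFrame()
# Number of people per country
cntry['ppl'] = df.groupby('native-country').age.count()
# Number of >50K earners per country (NaN for countries with none)
cntry['>50K'] = df[df['salary'] == '>50K'].groupby('native-country').salary.count()
cntry['percent'] = cntry['>50K'] * 100 / cntry['ppl']
# idxmax() skips NaN and returns the country label with the largest percentage
cntry.loc[cntry['percent'].idxmax()]
```

```
ppl        43.000000
>50K       18.000000
percent    41.860465
Name: Iran, dtype: float64
```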
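An alternative that avoids the helper DataFrame: the mean of a boolean Series within each group is the fraction of `True` values in that group. Unlike the cell above, countries with no >50K earners get 0 rather than `NaN`, which does not change the `idxmax()` result. Sketched on invented rows:

```python
import pandas as pd

df = pd.DataFrame({
    "native-country": ["Iran", "Iran", "Cuba", "Cuba", "Cuba", "India"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", ">50K", "<=50K"],
})

# Fraction of >50K earners per country, as a percentage
rich_share = (df["salary"] == ">50K").groupby(df["native-country"]).mean() * 100
highest_earning_country = rich_share.idxmax()
highest_earning_country_percentage = round(rich_share.max(), 1)
print(highest_earning_country, highest_earning_country_percentage)
```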

Identify the most popular occupation for those who earn >50K in India.

```python
df[(df['native-country'] == 'India') & (df['salary'] == '>50K')].groupby('occupation').age.count().idxmax()
```

```
'Prof-specialty'
```
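`value_counts()` on the filtered `occupation` column yields the same per-occupation counts as the `groupby(...).count()` above, and `idxmax()` picks the most frequent label. A sketch on made-up rows; `top_india_occupation` is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({
    "native-country": ["India", "India", "India", "India", "Iran"],
    "occupation": ["Prof-specialty", "Sales", "Prof-specialty", "Exec-managerial", "Sales"],
    "salary": [">50K", ">50K", ">50K", "<=50K", ">50K"],
})
mask = (df["native-country"] == "India") & (df["salary"] == ">50K")

# Count occupations among high earners in India and take the most frequent one
top_india_occupation = df.loc[mask, "occupation"].value_counts().idxmax()
print(top_india_occupation)
```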