### Analyzing Missing Values in a Dataset of US Legislators
#### We have a dataset of US legislators in which the 'type' column marks each row as a Representative (rep) or a Senator (sen). Some values are missing, and we want to identify which features have missing values within each 'type' category. We'll use pandas .groupby() to group the data by 'type' and then count the missing values for each feature within each group.

In [51]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

In [28]:
df = pd.read_csv(
    './datasets/legislators-historical.csv',
    usecols=['gender', 'last_name', 'first_name', 'birthday', 'type', 'state', 'party'],
)
df.head(4)

,last_name,first_name,birthday,gender,type,state,party
0,Bassett,Richard,1745-04-02,M,sen,DE,Anti-Administration
1,Bland,Theodorick,1742-03-21,M,rep,VA,
2,Burke,Aedanus,1743-06-16,M,rep,SC,
3,Carroll,Daniel,1730-07-22,M,rep,MD,


#### We have 10,146 rows categorized as Representative (rep) and 1,829 rows categorized as Senator (sen). Some of these rows have missing entries in other columns.

In [42]:
# Group by type
group_by_type = df.groupby('type')

# Bird's-eye view of missing values: rows per type, then non-null counts per column
print(df['type'].value_counts(), '\n\n')
group_by_type.count()

type
rep    10146
sen     1829
Name: count, dtype: int64 




type,last_name,first_name,birthday,gender,state,party
rep,10146,10146,9653,10146,10146,9920
sen,1829,1829,1769,1829,1829,1821
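
Because `.count()` reports only non-null entries, the number of missing values in a column is the group size minus that count. For example, for `rep` birthdays, 10146 - 9653 = 493. The sketch below shows the same subtraction on a small made-up DataFrame; the `toy` data and its values are assumptions for illustration, not rows from the legislators file.

```python
import pandas as pd

# Toy data (hypothetical, not taken from the legislators file)
toy = pd.DataFrame({
    'type': ['rep', 'rep', 'rep', 'sen', 'sen'],
    'birthday': ['1745-04-02', None, None, '1730-07-22', None],
    'party': ['A', 'B', None, 'A', 'B'],
})

grouped = toy.groupby('type')

# size() counts every row in a group; count() counts only non-null entries
group_sizes = grouped.size()
non_null = grouped.count()

# rsub computes group_sizes - non_null, aligned on the 'type' index
missing = non_null.rsub(group_sizes, axis=0)
print(missing)
```

Each cell of `missing` holds the number of null entries for that column within that group.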


#### Counting the missing values

In [53]:
# For each type group, count the null entries in every column
missing_values_by_type = group_by_type.apply(lambda x: x.isnull().sum())
missing_values_by_type

type,last_name,first_name,birthday,gender,type,state,party
rep,0,0,493,0,0,0,226
sen,0,0,60,0,0,0,8
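
The same per-group counts can be obtained without `.apply()` by building a boolean mask of missing cells first and summing it within each group. This is a sketch of an alternative, not the approach used above. It also avoids the warning that recent pandas releases may emit when `.apply()` operates on the grouping column. The `toy` DataFrame and its values are made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical, not taken from the legislators file)
toy = pd.DataFrame({
    'type': ['rep', 'rep', 'rep', 'sen', 'sen'],
    'state': ['VA', 'SC', 'MD', 'DE', 'NY'],
    'birthday': ['1745-04-02', None, None, '1730-07-22', None],
    'party': ['A', 'B', None, 'A', 'B'],
})

# True where a cell is missing; the grouping column itself is left out
mask = toy.drop(columns='type').isna()

# Summing booleans within each 'type' group gives the missing-value counts
missing_by_type = mask.groupby(toy['type']).sum()

# Keep only the features that have at least one missing value in some group
features_with_missing = missing_by_type.loc[:, missing_by_type.any()]
print(features_with_missing)
```

Here `state` is dropped from the final table because it has no missing values in either group, which leaves only the features that need attention.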
