# US - Baby Names

### Introduction:

We are going to use a subset of [US Baby Names](https://www.kaggle.com/kaggle/us-baby-names) from Kaggle.  
The file contains names from 2004 to 2014.


### Step 1. Import the necessary libraries

```python
import pandas as pd
```

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names/US_Baby_Names_right.csv). 

### Step 3. Assign it to a variable called baby_names.

```python
baby_names = pd.read_csv('https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names/US_Baby_Names_right.csv')
baby_names.head()
```

|   | Unnamed: 0 | Id | Name | Year | Gender | State | Count |
|---|---|---|---|---|---|---|---|
| 0 | 11349 | 11350 | Emma | 2004 | F | AK | 62 |
| 1 | 11350 | 11351 | Madison | 2004 | F | AK | 48 |
| 2 | 11351 | 11352 | Hannah | 2004 | F | AK | 46 |
| 3 | 11352 | 11353 | Grace | 2004 | F | AK | 44 |
| 4 | 11353 | 11354 | Emily | 2004 | F | AK | 41 |
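The `Unnamed: 0` column looks like a leftover index that was written out with the CSV. A minimal sketch, assuming the file's first column is that unnamed index, is to pass `index_col=0` so pandas uses it as the row index instead of creating a data column. To keep the snippet runnable offline, it reads a hypothetical two-row excerpt (`SAMPLE_CSV`) built from the rows shown above; the exact header layout of the real file is an assumption.

```python
import io

import pandas as pd

# Hypothetical excerpt in the assumed file layout: an unnamed first column
# (the leftover index) followed by the real columns.
SAMPLE_CSV = """,Id,Name,Year,Gender,State,Count
11349,11350,Emma,2004,F,AK,62
11350,11351,Madison,2004,F,AK,48
"""

# index_col=0 turns the unnamed first column into the row index,
# so no 'Unnamed: 0' column is created.
baby_names = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col=0)
print(baby_names.columns.tolist())  # ['Id', 'Name', 'Year', 'Gender', 'State', 'Count']
```

With the real URL, the same keyword would apply to the `pd.read_csv` call in Step 3.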


### Step 4. See the first 10 entries

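```python
baby_names.head(10)
```

|   | Unnamed: 0 | Id | Name | Year | Gender | State | Count |
|---|---|---|---|---|---|---|---|
| 0 | 11349 | 11350 | Emma | 2004 | F | AK | 62 |
| 1 | 11350 | 11351 | Madison | 2004 | F | AK | 48 |
| 2 | 11351 | 11352 | Hannah | 2004 | F | AK | 46 |
| 3 | 11352 | 11353 | Grace | 2004 | F | AK | 44 |
| 4 | 11353 | 11354 | Emily | 2004 | F | AK | 41 |
| 5 | 11354 | 11355 | Abigail | 2004 | F | AK | 37 |
| 6 | 11355 | 11356 | Olivia | 2004 | F | AK | 33 |
| 7 | 11356 | 11357 | Isabella | 2004 | F | AK | 30 |
| 8 | 11357 | 11358 | Alyssa | 2004 | F | AK | 29 |
| 9 | 11358 | 11359 | Sophia | 2004 | F | AK | 28 |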


### Step 5. Delete the columns 'Unnamed: 0' and 'Id'

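```python
# Drop the leftover index column and the Id column
baby_names = baby_names.drop(columns=['Unnamed: 0', 'Id'])
baby_names.head()
```

|   | Name | Year | Gender | State | Count |
|---|---|---|---|---|---|
| 0 | Emma | 2004 | F | AK | 62 |
| 1 | Madison | 2004 | F | AK | 48 |
| 2 | Hannah | 2004 | F | AK | 46 |
| 3 | Grace | 2004 | F | AK | 44 |
| 4 | Emily | 2004 | F | AK | 41 |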
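Re-running the drop cell in a notebook raises a `KeyError`, because the columns are already gone. One way to make the step safe to repeat is `errors='ignore'`, which skips labels that are not present. The sketch below uses a tiny made-up frame (`df`) with the dataset's column names rather than the real data.

```python
import pandas as pd

# Tiny made-up frame with the same column names as the dataset.
df = pd.DataFrame({
    'Unnamed: 0': [11349], 'Id': [11350], 'Name': ['Emma'],
    'Year': [2004], 'Gender': ['F'], 'State': ['AK'], 'Count': [62],
})

# Run the same drop twice: the second call finds nothing to drop,
# and errors='ignore' keeps it from raising KeyError.
for _ in range(2):
    df = df.drop(columns=['Unnamed: 0', 'Id'], errors='ignore')

print(df.columns.tolist())  # ['Name', 'Year', 'Gender', 'State', 'Count']
```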


### Step 6. Are there more male or female names in the dataset?

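```python
# Count the rows recorded for each gender
male_cnt = baby_names[baby_names['Gender'] == 'M']['Gender'].count()
female_cnt = baby_names[baby_names['Gender'] == 'F']['Gender'].count()
print(male_cnt, female_cnt)
```

```
457549 558846
```

There are more female rows than male rows (558,846 vs. 457,549). Note that these are row counts, not counts of distinct names.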
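`value_counts` gives the same row counts in one call, and counting distinct names per gender answers a different reading of the question. A sketch on made-up rows (`df` stands in for `baby_names`; the names and genders are illustrative only):

```python
import pandas as pd

# Made-up rows for illustration only; in the notebook this would be baby_names.
df = pd.DataFrame({
    'Name': ['Emma', 'Emma', 'Jacob', 'Madison'],
    'Gender': ['F', 'F', 'M', 'F'],
})

# Rows per gender (what the cell above counts).
rows_per_gender = df['Gender'].value_counts()

# Distinct names per gender, an alternative reading of "more names".
names_per_gender = df.groupby('Gender')['Name'].nunique()

print(rows_per_gender.to_dict())
print(names_per_gender.to_dict())
```

On this toy frame, `F` has 3 rows but only 2 distinct names, which is why the two measures can disagree.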


### Step 7. Group the dataset by name and assign to names

```python
names = baby_names.groupby('Name')
```

### Step 8. How many different names exist in the dataset?

```python
# Each group is one distinct name, so the number of groups answers the question
len(names)
```

```
17632
```
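Counting the groups of a `groupby` object gives the number of distinct keys, which here is the number of distinct names. A sketch on made-up rows shows that `len`, `ngroups` and `Series.nunique` agree:

```python
import pandas as pd

# Made-up rows; 'Emma' appears twice, so there are 3 distinct names.
df = pd.DataFrame({
    'Name': ['Emma', 'Emma', 'Jacob', 'Madison'],
    'Count': [62, 50, 40, 48],
})

names = df.groupby('Name')

# Number of groups == number of distinct names.
print(len(names), names.ngroups, df['Name'].nunique())  # 3 3 3
```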


### Step 9. What is the name with the most occurrences?

```python
names.agg({'Count': 'sum'}).sort_values(by='Count', ascending=False)
```

| Name | Count |
|---|---|
| Jacob | 242874 |
| Emma | 214852 |
| Michael | 214405 |
| Ethan | 209277 |
| Isabella | 204798 |
| ... | ... |
| Eniola | 5 |
| Atlantis | 5 |
| Marci | 5 |
| Simarpreet | 5 |

Jacob has the most occurrences, with 242,874.
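Sorting the whole table works, but when only the top name is needed, `idxmax` on the summed counts returns its label directly. A sketch on made-up rows (the counts are not from the real dataset):

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    'Name': ['Jacob', 'Emma', 'Jacob', 'Emma', 'Michael'],
    'Count': [100, 90, 80, 70, 60],
})

# Total occurrences per name, then the label of the largest total.
totals = df.groupby('Name')['Count'].sum()
top_name = totals.idxmax()
print(top_name, totals[top_name])  # Jacob 180
```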


### Step 10. How many different names have the least occurrences?

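```python
# Total occurrences per name
totals = names.agg({'Count': 'sum'})
# Names whose total equals the smallest total (5 in this dataset, per Step 9)
least = totals[totals['Count'] == totals['Count'].min()]
least
```

| Name | Count |
|---|---|
| Aadarsh | 5 |
| Aadin | 5 |
| Aaima | 5 |
| Aalaya | 5 |
| Aaminah | 5 |
| ... | ... |
| Zyien | 5 |
| Zyire | 5 |
| Zykeriah | 5 |
| Zykierra | 5 |

The number of names tied for the fewest occurrences is the number of rows in this result, `len(least)`.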
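To get the answer as a number without hard-coding the minimum, one option is to compare each total against `min()` and count the matches. A sketch on made-up rows (`least` is an illustrative name):

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    'Name': ['Jacob', 'Aadarsh', 'Aaima', 'Emma'],
    'Count': [100, 5, 5, 7],
})

totals = df.groupby('Name')['Count'].sum()

# Keep every name tied at the smallest total, then count them.
least = totals[totals == totals.min()]
print(len(least), sorted(least.index))  # 2 ['Aadarsh', 'Aaima']
```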


### Step 11. What is the median name occurrence?

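```python
names.Count.median()
```

```
Name
Aaban       6.0
Aadan       5.5
Aadarsh     5.0
Aaden      10.0
Aadhav      6.0
           ... 
Zyra        6.0
Zyrah       5.5
Zyren       6.0
Zyria       6.0
Zyriah      6.0
Name: Count, Length: 17632, dtype: float64
```

Because `names` is grouped by name, this returns one median per name (the median of that name's `Count` values across its rows), not a single overall figure.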
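If the question is read as the median of each name's total occurrences (a single number), one way is to sum per name first and then take the median of those totals. A sketch on made-up rows; the names and counts are illustrative only:

```python
import pandas as pd

# Made-up rows: Emma totals 12, Jacob 10, Zyra 9.
df = pd.DataFrame({
    'Name': ['Emma', 'Emma', 'Jacob', 'Zyra', 'Zyra', 'Zyra'],
    'Count': [5, 7, 10, 3, 3, 3],
})

# One median per name (what the cell above computes).
per_name_median = df.groupby('Name')['Count'].median()

# A single median over the per-name totals.
totals = df.groupby('Name')['Count'].sum()
overall_median = totals.median()
print(overall_median)  # 10.0
```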

### Step 12. What is the standard deviation of names?

```python
# Select the numeric columns explicitly; Gender and State are text
names[['Year', 'Count']].std()
```

| Name | Year | Count |
|---|---|---|
| Aaban | 0.707107 | 0.000000 |
| Aadan | 2.872281 | 0.957427 |
| Aadarsh | NaN | NaN |
| Aaden | 2.044322 | 21.154974 |
| Aadhav | NaN | NaN |
| ... | ... | ... |
| Zyra | 2.115701 | 1.154701 |
| Zyrah | 1.414214 | 0.707107 |
| Zyren | NaN | NaN |
| Zyria | 2.685351 | 0.737865 |
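Names that occur in a single row, such as Aadarsh, come out as `NaN` because pandas' `std` defaults to `ddof=1` (the sample standard deviation), which is undefined for one value. If a population standard deviation is acceptable, `ddof=0` returns `0.0` instead. The rows below are assumed from the Aaban (two rows of 6) and Aadarsh (one row of 5) counts reported in this notebook.

```python
import pandas as pd

# Rows consistent with the Aaban and Aadarsh counts in the notebook output.
df = pd.DataFrame({
    'Name': ['Aaban', 'Aaban', 'Aadarsh'],
    'Count': [6, 6, 5],
})

# Default ddof=1: single-row groups give NaN.
sample_std = df.groupby('Name')['Count'].std()

# ddof=0: population standard deviation, 0.0 for a single value.
population_std = df.groupby('Name')['Count'].std(ddof=0)

print(sample_std)
print(population_std)
```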


### Step 13. Get a summary with the mean, min, max, std and quartiles.

```python
names.describe()
```

| Name | Year count | Year mean | Year std | Year min | Year 25% | Year 50% | Year 75% | Year max | Count count | Count mean | Count std | Count min | Count 25% | Count 50% | Count 75% | Count max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aaban | 2.0 | 2013.500000 | 0.707107 | 2013.0 | 2013.25 | 2013.5 | 2013.75 | 2014.0 | 2.0 | 6.000000 | 0.000000 | 6.0 | 6.00 | 6.0 | 6.00 | 6.0 |
| Aadan | 4.0 | 2009.750000 | 2.872281 | 2008.0 | 2008.00 | 2008.5 | 2010.25 | 2014.0 | 4.0 | 5.750000 | 0.957427 | 5.0 | 5.00 | 5.5 | 6.25 | 7.0 |
| Aadarsh | 1.0 | 2009.000000 | NaN | 2009.0 | 2009.00 | 2009.0 | 2009.00 | 2009.0 | 1.0 | 5.000000 | NaN | 5.0 | 5.00 | 5.0 | 5.00 | 5.0 |
| Aaden | 196.0 | 2010.015306 | 2.044322 | 2005.0 | 2008.00 | 2010.0 | 2011.00 | 2014.0 | 196.0 | 17.479592 | 21.154974 | 5.0 | 6.00 | 10.0 | 20.00 | 158.0 |
| Aadhav | 1.0 | 2014.000000 | NaN | 2014.0 | 2014.00 | 2014.0 | 2014.00 | 2014.0 | 1.0 | 6.000000 | NaN | 6.0 | 6.00 | 6.0 | 6.00 | 6.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Zyra | 7.0 | 2012.142857 | 2.115701 | 2008.0 | 2011.50 | 2013.0 | 2013.50 | 2014.0 | 7.0 | 6.000000 | 1.154701 | 5.0 | 5.00 | 6.0 | 6.50 | 8.0 |
| Zyrah | 2.0 | 2012.000000 | 1.414214 | 2011.0 | 2011.50 | 2012.0 | 2012.50 | 2013.0 | 2.0 | 5.500000 | 0.707107 | 5.0 | 5.25 | 5.5 | 5.75 | 6.0 |
| Zyren | 1.0 | 2013.000000 | NaN | 2013.0 | 2013.00 | 2013.0 | 2013.00 | 2013.0 | 1.0 | 6.000000 | NaN | 6.0 | 6.00 | 6.0 | 6.00 | 6.0 |
| Zyria | 10.0 | 2008.900000 | 2.685351 | 2005.0 | 2007.25 | 2008.0 | 2010.50 | 2014.0 | 10.0 | 5.900000 | 0.737865 | 5.0 | 5.25 | 6.0 | 6.00 | 7.0 |
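Since summary statistics of `Year` are of limited use here, one option is to call `describe()` on the `Count` column alone, which yields the same eight statistics in a narrower table. The rows below are assumed from the Aaban (6, 6) and Zyrah (5, 6) counts in the summary above.

```python
import pandas as pd

# Rows consistent with the Aaban and Zyrah Count statistics above.
df = pd.DataFrame({
    'Name': ['Aaban', 'Aaban', 'Zyrah', 'Zyrah'],
    'Count': [6, 6, 5, 6],
})

# describe() on one column: count, mean, std, min, quartiles and max per name.
count_summary = df.groupby('Name')['Count'].describe()
print(count_summary)
```

For Zyrah this reproduces the 25%, 50% and 75% values of 5.25, 5.5 and 5.75 shown in the full table.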
