# US - Baby Names

### Introduction:

We are going to use a subset of [US Baby Names](https://www.kaggle.com/kaggle/us-baby-names) from Kaggle.  
The file contains names from 2004 through 2014.


### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names/US_Baby_Names_right.csv). 

### Step 3. Assign it to a variable called baby_names.

In [2]:
# Assumes the CSV from Step 2 has been downloaded to the working directory
baby_names = pd.read_csv('US_Baby_Names_right.csv')

### Step 4. See the first 10 entries

In [4]:
baby_names.head(10)

,Unnamed: 0,Id,Name,Year,Gender,State,Count
0,11349,11350,Emma,2004,F,AK,62
1,11350,11351,Madison,2004,F,AK,48
2,11351,11352,Hannah,2004,F,AK,46
3,11352,11353,Grace,2004,F,AK,44
4,11353,11354,Emily,2004,F,AK,41
5,11354,11355,Abigail,2004,F,AK,37
6,11355,11356,Olivia,2004,F,AK,33
7,11356,11357,Isabella,2004,F,AK,30
8,11357,11358,Alyssa,2004,F,AK,29
9,11358,11359,Sophia,2004,F,AK,28


### Step 5. Delete the columns 'Unnamed: 0' and 'Id'

In [16]:
# Returns a new DataFrame; baby_names itself is not modified
baby_names.drop(columns=['Unnamed: 0', 'Id'])

,Name,Year,Gender,State,Count
0,Emma,2004,F,AK,62
1,Madison,2004,F,AK,48
2,Hannah,2004,F,AK,46
3,Grace,2004,F,AK,44
4,Emily,2004,F,AK,41
...,...,...,...,...,...
1016390,Seth,2014,M,WY,5
1016391,Spencer,2014,M,WY,5
1016392,Tyce,2014,M,WY,5
1016393,Victor,2014,M,WY,5
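Because the result of `drop` is not assigned, `baby_names` keeps both columns, which is why they reappear in the later outputs. A minimal sketch of making the deletion stick, using a small made-up frame (`toy` and its values are illustrative stand-ins, not the real file):

```python
import pandas as pd

# Hypothetical stand-in with the same column layout as baby_names.
toy = pd.DataFrame({
    "Unnamed: 0": [11349, 11350],
    "Id": [11350, 11351],
    "Name": ["Emma", "Madison"],
    "Year": [2004, 2004],
    "Gender": ["F", "F"],
    "State": ["AK", "AK"],
    "Count": [62, 48],
})

# drop() returns a new DataFrame, so keep the result in a variable.
dropped = toy.drop(columns=["Unnamed: 0", "Id"])

print(list(toy.columns))      # the original still has all seven columns
print(list(dropped.columns))  # the result has the two columns removed
```

In the notebook itself this would amount to writing `baby_names = baby_names.drop(...)`.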


### Step 6. Are there more male or female names in the dataset?

In [18]:
baby_names.Gender.value_counts()

F    558846
M    457549
Name: Gender, dtype: int64

### Step 7. Group the dataset by name and assign to names

In [19]:
names = baby_names.groupby('Name')

### Step 8. How many different names exist in the dataset?

In [20]:
len(names)

17632

### Step 9. What is the name with most occurrences?

In [27]:
baby_names.Name.value_counts().head()

Riley     1112
Avery     1080
Jordan    1073
Peyton    1064
Hayden    1049
Name: Name, dtype: int64
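`value_counts` counts rows, and judging by the columns each row appears to be one name in one state, year, and gender. If "occurrences" is instead read as the number of babies, the `Count` column has to be summed. The sketch below contrasts the two readings on made-up data (the `toy` frame is hypothetical):

```python
import pandas as pd

# Hypothetical data: Riley appears in more rows, Emma has more babies.
toy = pd.DataFrame({
    "Name": ["Riley", "Riley", "Riley", "Emma", "Emma"],
    "Count": [5, 6, 7, 62, 48],
})

# Reading 1: number of rows in which each name appears.
rows_per_name = toy["Name"].value_counts()

# Reading 2: total number of babies recorded for each name.
babies_per_name = toy.groupby("Name")["Count"].sum()

print(rows_per_name.idxmax())    # name with the most rows
print(babies_per_name.idxmax())  # name with the most babies
```

The two readings can disagree, so it is worth deciding which one the question means before reporting an answer.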

### Step 10. How many different names have the least occurrences?

In [28]:
baby_names.Name.value_counts().sort_values(ascending=False).tail()

Zailynn      1
Zailee       1
Dominga      1
Mohamadou    1
Coalton      1
Name: Name, dtype: int64
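The `tail()` above shows some of the rarest names but not how many there are. One possible way to get that number is to compare every count with the minimum and sum the matches, sketched here on made-up data (`toy_names` is hypothetical):

```python
import pandas as pd

# Hypothetical column of names; two of them appear exactly once.
toy_names = pd.Series(
    ["Riley", "Riley", "Avery", "Avery", "Avery", "Zailee", "Coalton"]
)

name_counts = toy_names.value_counts()

# True for each name whose count equals the smallest count; summing counts the Trues.
n_least = (name_counts == name_counts.min()).sum()

print(n_least)
```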

### Step 11. What is the median name occurrence?

In [44]:
# Count rows per name once, then keep the names whose count equals the median
name_counts = baby_names.Name.value_counts().sort_values(ascending=False)
median_count = name_counts.median()
name_counts[name_counts == median_count]

Anely      8
Camiya     8
Nechuma    8
Avree      8
Maysen     8
          ..
Kaliana    8
Nalleli    8
Jeilyn     8
Kaelee     8
Jakori     8
Name: Name, Length: 360, dtype: int64

### Step 12. What is the standard deviation of names?

In [45]:
names.std()

Name,Unnamed: 0,Id,Year,Count
Aaban,1.463004e+03,1.463004e+03,0.707107,0.000000
Aadan,2.181189e+06,2.181189e+06,2.872281,0.957427
Aadarsh,,,,
Aaden,1.615565e+06,1.615565e+06,2.044322,21.154974
Aadhav,,,,
...,...,...,...,...
Zyra,2.273476e+06,2.273476e+06,2.115701,1.154701
Zyrah,3.097243e+06,3.097243e+06,1.414214,0.707107
Zyren,,,,
Zyria,1.704179e+06,1.704179e+06,2.685351,0.737865
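The standard deviation of the `Unnamed: 0` and `Id` columns carries little meaning, since both look like row identifiers. The empty rows are names that occur in a single row: pandas computes the sample standard deviation (`ddof=1`) by default, which is undefined for one observation. A minimal sketch restricted to `Count`, on made-up data (the `toy` frame is hypothetical):

```python
import pandas as pd

# Hypothetical data: Aadarsh has only one row.
toy = pd.DataFrame({
    "Name": ["Aaban", "Aaban", "Aadan", "Aadan", "Aadarsh"],
    "Count": [6, 6, 5, 7, 5],
})

# Sample standard deviation of Count within each name.
count_std = toy.groupby("Name")["Count"].std()

print(count_std)
```

Selecting the column before aggregating also sidesteps non-numeric columns such as `Gender` and `State`, which some pandas versions refuse to aggregate with `std`.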


### Step 13. Get a summary with the mean, min, max, std and quartiles.

In [46]:
names.describe()

,Unnamed: 0,Unnamed: 0,Unnamed: 0,Unnamed: 0,Unnamed: 0,Unnamed: 0,Unnamed: 0,Unnamed: 0,Id,Id,...,Year,Year,Count,Count,Count,Count,Count,Count,Count,Count
Name,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
Aaban,2.0,3.866900e+06,1.463004e+03,3865866.0,3866383.25,3866900.5,3867417.75,3867935.0,2.0,3.866902e+06,...,2013.75,2014.0,2.0,6.000000,0.000000,6.0,6.00,6.0,6.00,6.0
Aadan,4.0,1.789515e+06,2.181189e+06,691905.0,694243.50,702439.0,1797710.75,5061278.0,4.0,1.789516e+06,...,2010.25,2014.0,4.0,5.750000,0.957427,5.0,5.00,5.5,6.25,7.0
Aadarsh,1.0,1.728030e+06,,1728030.0,1728030.00,1728030.0,1728030.00,1728030.0,1.0,1.728031e+06,...,2009.00,2009.0,1.0,5.000000,,5.0,5.00,5.0,5.00,5.0
Aaden,196.0,2.831898e+06,1.615565e+06,147897.0,1364269.50,2867679.5,4155574.75,5618550.0,196.0,2.831899e+06,...,2011.00,2014.0,196.0,17.479592,21.154974,5.0,6.00,10.0,20.00,158.0
Aadhav,1.0,7.096060e+05,,709606.0,709606.00,709606.0,709606.00,709606.0,1.0,7.096070e+05,...,2014.00,2014.0,1.0,6.000000,,6.0,6.00,6.0,6.00,6.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zyra,7.0,2.505571e+06,2.273476e+06,556978.0,563486.00,1069863.0,4924247.00,4936691.0,7.0,2.505572e+06,...,2013.50,2014.0,7.0,6.000000,1.154701,5.0,5.00,6.0,6.50,8.0
Zyrah,2.0,2.743536e+06,3.097243e+06,553455.0,1648495.75,2743536.5,3838577.25,4933618.0,2.0,2.743538e+06,...,2012.50,2013.0,2.0,5.500000,0.707107,5.0,5.25,5.5,5.75,6.0
Zyren,1.0,5.074229e+06,,5074229.0,5074229.00,5074229.0,5074229.00,5074229.0,1.0,5.074230e+06,...,2013.00,2013.0,1.0,6.000000,,6.0,6.00,6.0,6.00,6.0
Zyria,10.0,2.978703e+06,1.704179e+06,1238175.0,1467783.50,2139380.0,4910147.50,4918965.0,10.0,2.978704e+06,...,2010.50,2014.0,10.0,5.900000,0.737865,5.0,5.25,6.0,6.00,7.0
