# Top Baby Names by U.S. State
Codecademy Portfolio Project by Leah Fulmer ([GitHub](https://github.com/leahmfulmer))<br>
With gratitude to Mitchell Newberry ([GitHub](https://github.com/mnewberry))

#### Preamble:

Over the past couple of years, I have become obsessed with people's names. To me, names are both **gifts** that we take with us for our entire lives (or for however long we want them) and **prophecies** that reveal our inherent gifts and/or life lessons. Because of this, I've wrestled with my own name. <br>

Leah in the Bible is deeply unloved by everyone in her life, especially her husband. Through this hardship, Leah learns to put her faith in God alone because He loves her in ways that her community, especially her husband, cannot. Leah's story teaches me to give my heart to God completely and reminds me of the gratitude I feel for the deeply loving relationships in my life. I hope to treat the names in this project as **sacred intentions** for our relationships with God.

For the purposes of this project, I'm going to focus on trends related to Bible names.

*This Jupyter Notebook prepares (e.g., wrangles, combines) the data for use in Tableau Public.*

#### Project Objectives:

* Identify, clean, and analyze any dataset of interest.
* Combine multiple datasets for meaningful understanding.
* Present interactive visual analysis through a [Tableau Dashboard](https://public.tableau.com/app/profile/leahmfulmer/viz/BiblicalBabyNames_17144097669840/BiblicalBabyNames).

#### Table of Contents:
[Section 1: Loading and Examining the Data](#data)<br>
[Section 2: Wrangling and Tidying the Data](#tidy)<br>
[Section 3: Include Biblical Name Meanings](#meanings)<br>
[Section 4: Questions About the Data](#questions)<br>
[Section 5: Playing with Groups, Pivot Tables](#play)<br>
[Section 6: Conclusions](#conclusions)<br>

### Section 1: Loading and Examining the Data <a id="data"></a>

In [15]:
# Import modules

import pandas as pd

In [16]:
# Load and examine top_names_by_state

top_names_by_state = pd.read_csv("TopBabyNamesbyState.csv")
top_names_by_state.head()

Unnamed: 0,State,Gender,Year,Top Name,Occurences
0,AK,F,1910,Mary,14
1,AK,F,1911,Mary,12
2,AK,F,1912,Mary,9
3,AK,F,1913,Mary,21
4,AK,F,1914,Mary,22


In [17]:
# Load and examine bible_names

bible_names = pd.read_csv("bible-names.txt")
bible_names_list = bible_names.Name.tolist()
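The Bible-name check in the next section relies on exact string matches, so a quick sanity check on the name column may be worthwhile. The sketch below uses a small made-up frame in place of `bible-names.txt`. The `Name` column comes from this notebook, while the sample values and the `normalize_names` helper are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for bible-names.txt; the values are made up for illustration.
sample = pd.DataFrame({"Name": [" Mary", "john", "Michael ", "Mary"]})

def normalize_names(names: pd.Series) -> list:
    """Strip whitespace, title-case, and drop duplicates (hypothetical helper)."""
    cleaned = names.str.strip().str.title()
    return cleaned.drop_duplicates().tolist()

sample_list = normalize_names(sample["Name"])
print(sample_list)
```

If the real file is already clean, this step changes nothing and can be skipped.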

### Section 2: Wrangling and Tidying the Data<a id="tidy"></a>

In [18]:
# Rename columns and identify Bible names within top_names_by_state

top_names = top_names_by_state.rename(columns={"Top Name": "TopName"})
top_names["Bible"] = top_names["TopName"].isin(bible_names_list)
# top_names.to_csv("top_names_bible.csv")
top_names.head()

Unnamed: 0,State,Gender,Year,TopName,Occurences,Bible
0,AK,F,1910,Mary,14,True
1,AK,F,1911,Mary,12,True
2,AK,F,1912,Mary,9,True
3,AK,F,1913,Mary,21,True
4,AK,F,1914,Mary,22,True


In [19]:
# Isolate Bible names
# Note: This step removes nicknames derived from Biblical names
#       (e.g., "Eliza" is removed although derived from "Elizabeth").

bible_only = top_names[top_names.Bible]
bible_only.head(100)
bible_only.to_csv("popular_bible_names.csv")

### Section 3: Include Biblical Name Meanings<a id="meanings"></a>

In [20]:
# Identify unique bible names among top names

unique_bible_names_list = bible_only.TopName.unique()
# Build the DataFrame directly rather than naming a variable `dict`, which shadows the built-in
unique_bible_names = pd.DataFrame({"TopName": unique_bible_names_list})

# Insert name meanings, summarized from Google
# Note: The meanings are listed in the same order as unique_bible_names_list.
unique_bible_names["Meaning"] = [
    "beloved", "favor", "God is gracious",
    "closely follows God", "beloved",
    "who is like God?", "healer",
    "closely follows God", "perennial, enduring",
    "God is my salvation", "increaser",
    "God is my judge", "gift of God", "bee", "princess",
    "life, lively", "green herb, tender shoot",
    "comfort, rest", "of the plain",
    "God has heard", "the Lord is my God",
    "son of the right hand", "praised",
    "manly, brave", "God is my salvation", "friend",
]

# Write to file
# Note: The suffix "updated" was added after choosing more succinct descriptions.
unique_bible_names.to_csv("name_meanings_updated.csv")
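The meanings above are assigned by position, so they depend on the order in which `unique()` returns the names. One alternative is to key the meanings by name and let pandas do the lookup. This is a minimal sketch on a toy frame, and the three name-to-meaning pairs are an illustrative subset rather than the full pairing used in this notebook.

```python
import pandas as pd

# Illustrative subset only; the full name-to-meaning pairing is not reproduced here.
meanings = {
    "Michael": "who is like God?",
    "John": "God is gracious",
    "Andrew": "manly, brave",
}

names = pd.DataFrame({"TopName": ["John", "Andrew", "Michael"]})
# Series.map looks each name up in the dict, so row order no longer matters.
names["Meaning"] = names["TopName"].map(meanings)
print(names)
```

Any name missing from the dictionary would come back as `NaN`, which makes gaps easy to spot.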

### Section 4: Questions About the Data<a id="questions"></a>

#### What regions does the study cover?

In [7]:
unique_regions = top_names.State.unique()
print("The data covers a total of {} regions: 50 states, \
plus the District of Columbia.".format(len(unique_regions)))

The data covers a total of 51 regions: 50 states, plus the District of Columbia.


#### How many unique names are there? How many of those names are Bible names?

In [8]:
unique_names = top_names.TopName.unique()

# Split the unique names into Bible and non-Bible names.
# These lists get their own names so the unique_bible_names DataFrame
# from Section 3 is not overwritten.
bible_name_set = set(bible_names_list)
unique_bible = [name for name in unique_names if name in bible_name_set]
unique_nonbible = [name for name in unique_names if name not in bible_name_set]

print("There are {} unique names in this study. {} of them are Bible names, {} are not in the Bible."
      .format(len(unique_names), len(unique_bible), len(unique_nonbible)))


There are 95 unique names in this study. 26 of them are Bible names, 69 are not in the Bible.


#### What are the most popular names throughout the study? Are they Bible names?

In [9]:
# Count how often each name appears as a top name; label the columns "Name" and "Count"
name_counts = top_names.TopName.value_counts().rename_axis("Name").reset_index(name="Count")
name_counts["Bible"] = name_counts["Name"].isin(bible_names_list)
name_counts.to_csv("total_name_counts.csv")
name_counts.head(10)

Unnamed: 0,Name,Count,Bible
0,Mary,1935,True
1,Michael,1404,True
2,James,899,True
3,Robert,885,False
4,Jennifer,677,False
5,John,535,True
6,Lisa,360,False
7,Jacob,359,True
8,Ashley,322,False
9,Linda,300,False


**Analysis:** <br>

According to the table above, half of the ten names that most often ranked first (by state, gender, and year, 1910-2012) are Bible names. Note that these counts tally how often a name was the top name, not how many babies received it. This analysis also does not account for names that have biblical origins but do not appear in the Bible themselves. For example, "Lisa" is not in the Bible, but it is likely derived from "Elizabeth", which is a Bible name.
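One way to account for derived names would be a small lookup from nicknames to their Biblical roots. The sketch below is hypothetical: the `derived_from` dictionary and the `BibleOrigin` column are not part of this notebook, the frame is toy data, and only the two derivations mentioned in the text are included.

```python
import pandas as pd

# Hypothetical lookup from derived names to Biblical root names.
# Only the two pairs mentioned in this notebook are included.
derived_from = {"Lisa": "Elizabeth", "Eliza": "Elizabeth"}

toy = pd.DataFrame({
    "Name": ["Mary", "Lisa", "Robert"],
    "Bible": [True, False, False],
})
# A name has Biblical origin if it is a Bible name or appears in the lookup.
toy["BibleOrigin"] = toy["Bible"] | toy["Name"].isin(list(derived_from))
print(toy)
```

Keeping `Bible` and `BibleOrigin` as separate columns preserves the strict definition used elsewhere in this notebook.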

### Section 5: Playing with Groups, Pivot Tables<a id="play"></a>

In [10]:
# Separate names by gender

# Identify separate datasets
girls = top_names[top_names.Gender == "F"]
boys = top_names[top_names.Gender == "M"]

# Write to file
girls.to_csv("girls.csv")
boys.to_csv("boys.csv")

# Examine
boys.head()

Unnamed: 0,State,Gender,Year,TopName,Occurences,Bible
103,AK,M,1910,John,8,True
104,AK,M,1911,John,15,True
105,AK,M,1912,John,16,True
106,AK,M,1913,John,19,True
107,AK,M,1914,John,17,True


In [11]:
# Group names by gender and year

# Sum occurrences across states; a name is flagged as a Bible name if any of its rows is
girls_by_year = (girls.groupby(["TopName", "Year"])
                 .agg(Occurences=("Occurences", "sum"), Bible=("Bible", "any"))
                 .reset_index())
girls_by_year.to_csv("girls_by_year.csv")

boys_by_year = (boys.groupby(["TopName", "Year"])
                .agg(Occurences=("Occurences", "sum"), Bible=("Bible", "any"))
                .reset_index())
boys_by_year.to_csv("boys_by_year.csv")

# Test
test = boys_by_year[boys_by_year.TopName == "James"]
test.head()

Unnamed: 0,TopName,Year,Occurences,Bible
141,James,1910,4631,True
142,James,1911,4888,True
143,James,1912,7990,True
144,James,1913,9114,True
145,James,1914,13329,True


In [12]:
# Make a pivot table to show individual name changes over time

over_time = pd.pivot_table(top_names, columns='Year', index="TopName", values='Occurences', aggfunc='sum', fill_value=0).reset_index()
over_time["Bible"] = over_time["TopName"].isin(bible_names_list)
over_time.to_csv("individual_names_over_time.csv")
over_time.head(10)

Year,TopName,1910,1911,1912,1913,1914,1915,1916,1917,1918,...,2004,2005,2006,2007,2008,2009,2010,2011,2012,Bible
0,Addison,0,0,0,0,0,0,0,0,0,...,0,0,0,396,0,128,0,0,0,False
1,Aiden,0,0,0,0,0,0,0,0,0,...,0,0,0,58,0,140,0,0,0,False
2,Alexander,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1341,2283,914,1133,181,False
3,Alexis,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,False
4,Alyssa,0,0,0,0,0,0,0,0,0,...,112,117,0,0,0,0,0,0,0,False
5,Amanda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,False
6,Andrew,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,True
7,Angel,0,0,0,0,0,0,0,0,0,...,0,618,642,655,0,0,0,0,0,False
8,Angela,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,False
9,Anthony,0,0,0,0,0,0,0,0,0,...,252,223,277,1685,824,302,0,192,0,False
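The pivot table above uses `fill_value=0`, so a year in which a name was not the top name in any state looks the same as a year with zero babies given that name. If that distinction matters, one option is to leave those cells missing. The sketch below uses made-up toy rows, not the real dataset.

```python
import pandas as pd

# Toy rows (made up): each row is one top-name record for a given year.
toy = pd.DataFrame({
    "TopName": ["Mary", "Mary", "Linda"],
    "Year": [1946, 1947, 1947],
    "Occurences": [100, 90, 120],
})

# Omitting fill_value leaves NaN where a name was not the top name that year.
sparse = pd.pivot_table(toy, index="TopName", columns="Year",
                        values="Occurences", aggfunc="sum")
print(sparse)
```

Here Linda has no value for 1946 rather than a zero, which signals "not reported" instead of "no occurrences".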


In [13]:
# Examine all data for a selected name

print(top_names[top_names.TopName == 'Addison'])

     State Gender  Year  TopName  Occurences  Bible
3393    KS      F  2007  Addison         238  False
6071    NE      F  2007  Addison         158  False
6073    NE      F  2009  Addison         128  False


In [14]:
# Group by name and year

by_year = (top_names.groupby(["TopName", "Year"])
           .agg(Occurences=("Occurences", "sum"), Bible=("Bible", "any"))
           .reset_index())
by_year.to_csv("individual_names_by_year.csv")

# Test
test = by_year[by_year.TopName == "Addison"]
test.head()

Unnamed: 0,TopName,Year,Occurences,Bible
0,Addison,2007,396,False
1,Addison,2009,128,False
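As a sanity check, the grouped totals can be reproduced by hand from the three Addison rows printed earlier: 238 + 158 = 396 for 2007, and 128 for 2009. The sketch below rebuilds just those rows as a small frame and repeats the grouping.

```python
import pandas as pd

# The three Addison rows printed earlier in this notebook.
addison = pd.DataFrame({
    "State": ["KS", "NE", "NE"],
    "Gender": ["F", "F", "F"],
    "Year": [2007, 2007, 2009],
    "TopName": ["Addison"] * 3,
    "Occurences": [238, 158, 128],
    "Bible": [False] * 3,
})

# Sum occurrences across states for each name and year.
totals = addison.groupby(["TopName", "Year"])["Occurences"].sum()
print(totals)
```

The totals agree with both the grouped table and the 2007 and 2009 columns of the pivot table.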


### Section 6: Conclusions<a id="conclusions"></a>

* **The dataset is informative, but limited.** Only the most popular name per state, gender, and year is reported. Therefore, while one can easily report how many babies were named Mary in Arkansas in 1920 (provided Mary was the top name there that year), it is impossible to report how many babies were named Mary nationwide throughout the 1920s. If Mary were only the second-most popular name in a particular state for a particular year, the number of occurrences would not be reported. This can skew the analysis of a name's popularity over time by reporting zero occurrences when there may have been many -- *but not the most* -- occurrences of that name in reality.<br><br>
* It's time to take these data to Tableau Public for some [*interactive visualizations!*](https://public.tableau.com/app/profile/leahmfulmer/viz/BiblicalBabyNames_17144097669840/BiblicalBabyNames)