
# Grand Questions:
1. How does your name at your birth year compare to its use historically?
2. If you talked to someone named Brittany on the phone, what is your guess of their age? What ages would you not guess?
3. Mary, Martha, Peter, and Paul are all Christian names. From 1920 - 2000, compare the name usage of each of the four names.
4. Think of a unique name from a famous movie. Plot that name and see how increases line up with the movie release.


In [4]:
# Imports
import altair as alt
import numpy as np
import pandas as pd

## 1. How does your name at your birth year compare to its use historically?

In [5]:
# Read in data
url = 'https://github.com/byuidatascience/data4names/raw/master/data-raw/names_year/names_year.csv'
df = pd.read_csv(url)

In [6]:
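# Preview the raw data
df.head()
# Convert the integer year to a datetime so Altair treats it as a temporal axis
df.year = pd.to_datetime(df.year, format='%Y')
# Group by name (exploratory; the later cells query df directly)
df_agg = df.groupby('name')
df_agg.head()
# Keep only the rows for my name
my_name = df.query("name == 'Isabel'")
my_name.head()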

Unnamed: 0,name,year,AK,AL,AR,AZ,CA,CO,CT,DC,...,TN,TX,UT,VA,VT,WA,WI,WV,WY,Total
168720,Isabel,1910-01-01,0.0,0.0,0.0,12.0,20.0,0.0,6.0,0.0,...,0.0,23.0,0.0,8.0,0.0,0.0,8.0,0.0,0.0,314.0
168721,Isabel,1911-01-01,0.0,0.0,5.0,0.0,20.0,7.0,6.0,0.0,...,0.0,24.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,376.0
168722,Isabel,1912-01-01,0.0,0.0,0.0,0.0,42.0,9.0,5.0,0.0,...,5.0,29.0,0.0,12.0,0.0,0.0,14.0,0.0,0.0,499.0
168723,Isabel,1913-01-01,0.0,7.0,0.0,11.0,35.0,6.0,11.0,0.0,...,0.0,62.0,0.0,10.0,0.0,0.0,23.0,0.0,0.0,607.0
168724,Isabel,1914-01-01,0.0,0.0,0.0,12.0,31.0,5.0,7.0,0.0,...,0.0,30.5,0.0,10.0,0.0,8.0,17.0,8.0,0.0,691.5


In [7]:
numOfIsabelByBirthYear = df.query('name == "Isabel" & year == 2001') # 3486 Isabels in 2001
numOfIsabelByBirthYear.head()

Unnamed: 0,name,year,AK,AL,AR,AZ,CA,CO,CT,DC,...,TN,TX,UT,VA,VT,WA,WI,WV,WY,Total
168811,Isabel,2001-01-01,6.0,17.0,19.0,97.0,768.0,76.0,32.0,26.0,...,35.0,365.0,49.0,51.0,8.0,104.0,78.0,0.0,9.0,3486.0


In [8]:
isabel_chart = (alt.Chart(my_name, title="Popularity")
    .encode(
        alt.X('year(year):T', title = "Year"),
        alt.Y('Total:Q',title="Number of Names")
    )
    .mark_bar().properties(width=700,height=350)
)

In [9]:
my_year = pd.DataFrame({
    'year' : [2001],
    'Total' : [my_name.query("year == 2001").Total.values[0]],
    'label' : ["Birth Year"]})

In [10]:
my_year.Total = my_year.Total.astype("int64",copy=True)
my_year.year = pd.to_datetime(my_year.year,format='%Y')
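
Both `df.year` and `my_year.year` are converted with the same `format='%Y'` call, presumably so that the bar chart and the overlay layers share one temporal x-axis. The short sketch below shows what that conversion does to integer years; the `years` series is a stand-in for illustration, not the notebook's data.

```python
import pandas as pd

# Stand-in for the integer year column (illustrative values only)
years = pd.Series([1910, 2001])

# '%Y' parses each value as a four-digit year; month and day default to January 1
as_dates = pd.to_datetime(years, format="%Y")

print(as_dates.dt.year.tolist())
```

The original integer year remains available through `.dt.year` if it is needed later for arithmetic.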

In [11]:
text_overlay = (alt.Chart(my_year).mark_text(align='right',dy=-10,baseline='middle')
    .encode(
        x = alt.X('year'),
        y = alt.Y('Total:Q'),
        text = 'label'
    )
)

In [12]:
my_point = (alt.Chart(my_year).mark_circle(color = 'red')
    .encode(
        x = alt.X('year'),
        y = alt.Y('Total:Q')
    )
)

In [13]:
isabel_point = isabel_chart  + text_overlay + my_point
isabel_point.save("my_point.png")

In 2001, my birth year, the name Isabel was given a total of 3,486 times.
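
To put the birth-year count in historical context, one option is to compare it with the name's peak year. The sketch below is only an illustration: `compare_to_peak` is a hypothetical helper, it assumes an integer `year` column (the notebook's `year` is a datetime, so `my_name.year.dt.year` would be needed first), and the `toy` numbers are made up rather than taken from `names_year.csv`.

```python
import pandas as pd

def compare_to_peak(name_rows, birth_year):
    """Compare one year's count with the name's all-time peak (hypothetical helper)."""
    # Row with the largest Total for this name
    peak = name_rows.loc[name_rows["Total"].idxmax()]
    # Count in the birth year
    at_birth = name_rows.loc[name_rows["year"] == birth_year, "Total"].iloc[0]
    return {
        "peak_year": int(peak["year"]),
        "peak_total": float(peak["Total"]),
        "share_of_peak": float(at_birth / peak["Total"]),
    }

# Made-up counts for illustration only
toy = pd.DataFrame({
    "year": [1999, 2000, 2001, 2002],
    "Total": [100, 200, 150, 400],
})
result = compare_to_peak(toy, birth_year=2001)
print(result)
```

A `share_of_peak` close to 1 would mean the birth year sits near the name's historical high; a small value would mean the name was far more common at some other time.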

<!-- ![](my_point.png) -->

## 2. If you talked to someone named Brittany on the phone, what is your guess of their age? What ages would you not guess?
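
No analysis is recorded for this question. One possible approach, sketched here under assumptions, is to find the span of birth years that covers most people with the name and convert it to ages. `likely_age_range` is a hypothetical helper, the `year` column is assumed to hold integers, and the `toy` counts are made up for illustration rather than taken from the Brittany rows of `names_year.csv`.

```python
import pandas as pd

def likely_age_range(name_rows, as_of_year, coverage=0.8):
    """Return (youngest, oldest) ages covering the central share of births (hypothetical helper)."""
    rows = name_rows.sort_values("year")
    # Cumulative share of all births with this name, by birth year
    share = rows["Total"].cumsum() / rows["Total"].sum()
    tail = (1 - coverage) / 2
    # First year where the cumulative share passes the lower and upper cutoffs
    first_year = rows.loc[share >= tail, "year"].iloc[0]
    last_year = rows.loc[share >= 1 - tail, "year"].iloc[0]
    return int(as_of_year - last_year), int(as_of_year - first_year)

# Made-up counts for illustration only
toy = pd.DataFrame({
    "year": [1980, 1985, 1990, 1995, 2000],
    "Total": [50, 300, 400, 200, 50],
})
youngest, oldest = likely_age_range(toy, as_of_year=2020)
print(youngest, oldest)
```

Ages inside the returned range would be reasonable guesses, and ages well outside it are the ones not to guess.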

## 3. Mary, Martha, Peter, and Paul are all Christian names. From 1920 - 2000, compare the name usage of each of the four names.

In [27]:
# Subset the data
mmpp = df.query('name in ["Mary","Martha","Peter","Paul"] & year >= 1920 & year <= 2000')

In [29]:
# Chart the subset
mmppChart = (alt.Chart(mmpp, title='Q3. Mary, Martha, Peter & Paul by Year')
              .encode(
                  alt.X('year(year):T', title = "Year"),
                  alt.Y('Total',title="Number of Names"),
                  color = 'name')
              .mark_line()).properties(width=800,height=450)
mmppChart.save('mmppChart.png')


![](mmppChart.png)

## 4. Think of a unique name from a famous movie. Plot that name and see how increases line up with the movie release.

In [47]:
# Subset Bella from 2007 onward
characterName = df.query('name == "Bella" & year >= 2007 & year <= 2020')
characterNameChart = (alt.Chart(characterName)
                        .encode(
                             alt.X('year', title = "Year"),
                             alt.Y('Total', title="Number of Names"))
                        .mark_line()).properties(width=600,height=450, title="Q4. Bella Swan from Twilight")
characterNameChart.save('bella_swan_chart.png') # Save chart

![](bella_swan_chart.png)