I was looking at gender-neutral names across states and years and found two very strange patterns that are probably errors in the underlying data.

The first is Kentucky in 2004, where a lot of boys seem to have been reported as girls and vice versa.

The second is DC in 1989-1990, where a lot of girls seem to have been recorded as boys.

See plots below for details.

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import cm

In [None]:
ds = pd.read_csv('../input/StateNames.csv',index_col='Id')

## Kentucky 2004

Every name suddenly became gender-neutral in Kentucky in 2004. There are boys named Elizabeth, Emily and Emma, and girls named Joseph, Michael and Nicholas.

See plot below with boy and girl counts. The 2004 bar is highlighted.

In [None]:
# Pivot Gender into columns: one row per (Name, Year, State) with separate F and M counts
ds = ds.set_index(['Name','Year','State','Gender']).unstack().fillna(0).astype(int)
ds.columns = ['CountF','CountM']
ds['CountTotal'] = ds.CountF + ds.CountM

In [None]:
df = ds.loc[pd.IndexSlice[:,list(range(2001,2010)),'KY'],:]\
       .groupby(level=0)\
       .filter(lambda x: x.CountTotal.max()>230)\
       .reset_index()

In [None]:
fig, ax = plt.subplots(12,4,sharex=True,figsize=(12,24))
ax = ax.flatten()
for (name, group), a in zip(df.groupby('Name'),ax):
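    # Draw the total first, then the boys on top, so the visible red part is the girls' count.
    # align='edge' keeps bars spanning Year-0.4..Year+0.4 on matplotlib >= 2.0 (default is 'center').
    a.bar(group.Year-0.4, group.CountTotal, color='#ff8888',label='F',align='edge')
    a.bar(group.Year-0.4, group.CountM, color='#8888ff',label='M',align='edge')
    a.bar(2004-0.4, group.CountTotal.loc[group.Year==2004], color='#ff5555',label='F',align='edge')
    a.bar(2004-0.4, group.CountM.loc[group.Year==2004], color='#5555ff',label='M',align='edge')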
    a.set_title(name)
fig.suptitle('Counts of boys and girls born in KY');
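
The swap can also be checked numerically rather than by eye. The cell below is a minimal sketch, not part of the original analysis: the helper `minority_share_by_year` and the toy table are hypothetical, and the helper assumes a table shaped like `ds` above (indexed by `Name`, `Year`, `State`, with `CountF` and `CountM` columns). For each year it sums, over all names, the smaller of the two gender counts and divides by total births. If boys and girls were mixed up in one year, that share should jump in that year.

```python
import pandas as pd

def minority_share_by_year(wide, state):
    """Per year: share of births recorded under the less common gender of each name.

    `wide` is assumed to be indexed by (Name, Year, State) with CountF/CountM columns.
    """
    sub = wide.xs(state, level='State')
    # For each (Name, Year) row, the smaller of the two counts
    minority = sub[['CountF', 'CountM']].min(axis=1)
    total = sub['CountF'] + sub['CountM']
    by_year = (pd.DataFrame({'minority': minority, 'total': total})
                 .groupby(level='Year').sum())
    return by_year['minority'] / by_year['total']

# Toy data with made-up numbers, only to show the shape of the result
toy = pd.DataFrame(
    {'CountF': [100, 60, 0, 40], 'CountM': [0, 40, 100, 60]},
    index=pd.MultiIndex.from_tuples(
        [('Emma', 2003, 'KY'), ('Emma', 2004, 'KY'),
         ('Noah', 2003, 'KY'), ('Noah', 2004, 'KY')],
        names=['Name', 'Year', 'State']),
)
toy_share = minority_share_by_year(toy, 'KY')
print(toy_share)
```

On the real table this would be called as `minority_share_by_year(ds, 'KY')`. A year that stands far above its neighbours would be consistent with the mislabelling seen in the plots.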

## DC 1989

It looks like a lot of girls were reported as boys in DC in 1989 and, to a lesser extent, in 1990.

In [None]:
df = ds.loc[pd.IndexSlice[:,list(range(1985,1995)),'DC'],:]\
       .groupby(level=0)\
       .filter(lambda x: x.CountTotal.max()>70)\
       .reset_index()

In [None]:
fig, ax = plt.subplots(12,4,sharex=True,figsize=(12,24))
ax = ax.flatten()
for (name, group), a in zip(df.groupby('Name'),ax):
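    # Same layering as the KY plot: total first, boys on top, 1989 highlighted.
    # align='edge' keeps bars spanning Year-0.4..Year+0.4 on matplotlib >= 2.0 (default is 'center').
    a.bar(group.Year-0.4, group.CountTotal, color='#ff8888',label='F',align='edge')
    a.bar(group.Year-0.4, group.CountM, color='#8888ff',label='M',align='edge')
    a.bar(1989-0.4, group.CountTotal.loc[group.Year==1989], color='#ff5555',label='F',align='edge')
    a.bar(1989-0.4, group.CountM.loc[group.Year==1989], color='#5555ff',label='M',align='edge')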
    a.set_title(name)
fig.suptitle('Counts of boys and girls born in DC');

Total births in DC by gender. The plot below shows an unnatural spike in the number of boys and a matching dip in the number of girls in 1989.

In [None]:
ax = ds.loc[pd.IndexSlice[:,list(range(1970,2010)),'DC'],
       ['CountF','CountM','CountTotal']].groupby(level=1).sum().plot()
ax.set_title('Births in DC')
ax.vlines(1989,0,25000,linewidth=1,alpha=0.7);
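
The same spike can be expressed as a single number per year: the share of births recorded as boys. The cell below is a rough sketch under stated assumptions. The helpers `male_share_by_year` and `flag_outlier_years`, the tolerance of 0.05 and the toy numbers are all hypothetical, and the input is assumed to be shaped like `ds` above. It flags years whose male share sits far from the median across years.

```python
import pandas as pd

def male_share_by_year(wide, state):
    """Per year: fraction of births in `state` recorded as boys."""
    sub = wide.xs(state, level='State')
    totals = sub[['CountF', 'CountM']].groupby(level='Year').sum()
    return totals['CountM'] / (totals['CountF'] + totals['CountM'])

def flag_outlier_years(share, tol=0.05):
    """Years whose share differs from the median share by more than `tol` (arbitrary cutoff)."""
    return share[(share - share.median()).abs() > tol].index.tolist()

# Toy data with made-up numbers, only to show how the check behaves
toy = pd.DataFrame(
    {'CountF': [50, 48, 20, 51], 'CountM': [50, 52, 80, 49]},
    index=pd.MultiIndex.from_tuples(
        [('Alex', 1987, 'DC'), ('Alex', 1988, 'DC'),
         ('Alex', 1989, 'DC'), ('Alex', 1990, 'DC')],
        names=['Name', 'Year', 'State']),
)
toy_share = male_share_by_year(toy, 'DC')
toy_flagged = flag_outlier_years(toy_share)
print(toy_flagged)
```

On the real table, `flag_outlier_years(male_share_by_year(ds, 'DC'))` would list the candidate years. The cutoff is a judgment call and may need tuning for smaller states, where the share is naturally noisier.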