# Exploring with Matplotlib

In [None]:
import pandas as pd
%matplotlib inline

Use the `titanic_new.csv` from the `data` folder to explore data about the Titanic passengers and make a simple (?) chart.


In [None]:
df = pd.read_csv('data/titanic_new.csv')

# first look at the data
df.head()

In [None]:
# how many rows, columns?
df.shape

In [None]:
# how many people in each class?
df.Pclass.value_counts()

In [None]:
# how many people of each gender?
df.Sex.value_counts()

In [None]:
# how many people survived or did not?
df.Survived.value_counts()

In [None]:
# get all the column names for easy copy/pasting
df.columns

In [None]:
# NOTES: I want Survived, Pclass, Sex.
# I would like to stack up M & F for each Pclass.
# Y axis will be how many people.
# X axis will be Pclass. Column chart. Stacked bar. Stack will separate M & F.

# make a short list of just columns I want in chart
column_names = ['Survived', 'Pclass', 'Sex']

# make a new dataframe with only those columns
df2 = df[column_names]

# check a sample to see how df2 looks - sample gets random rows
df2.sample(8)

In [None]:
# figured this out from https://stackoverflow.com/questions/23415500/pandas-plotting-a-stacked-bar-chart 

df2.groupby(['Pclass','Sex','Survived'])['Survived'].count().plot(kind='bar')


In [None]:
# a different way - note what happened to the order
df2.groupby(['Survived','Pclass','Sex'])['Survived'].count().plot(kind='bar')


Now I have figured out how to show the data I want: how many males and females in each passenger class (1, 2 or 3) survived or not (1 or 0). But each combination gets its own separate column. I want to make them stack. How?

Note - can you recognize the tall column (3, male, 0)? It is males in 3rd class (3) who did not survive (0).

Next I'm going to make a simple chart showing ONLY survivors. Notice how I get **only the rows that have 1 in the Survived column** — and I store that as **a new dataframe** named `sur`.


In [None]:
# using df2, make a new dataframe that includes only survivors
sur = df2[df2.Survived == 1]

# using that new dataframe, make a new bar chart
sur.groupby(['Pclass','Sex'])['Survived'].count().plot.bar()

Note how the next chart shows the opposite of the chart above. Above: Survivors only. Below: Dead only. Note the difference in the vertical Y axis.

In [None]:
# using df2, make a new dataframe that includes only the dead
died = df2[df2.Survived == 0]

# using that new dataframe, make a new bar chart
died.groupby(['Pclass','Sex'])['Survived'].count().plot.bar()
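
Because each of the two charts above picks its own y-axis scale, the bars are hard to compare by eye. One possible way around that is to draw both charts side by side on a shared y axis. This is only a sketch: the `toy` dataframe is made-up data standing in for `df2` so the block runs on its own, and `axes` and `panels` are names chosen for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# made-up rows standing in for df2 (NOT the real Titanic data)
toy = pd.DataFrame({
    'Survived': [1, 1, 1, 0, 0, 0, 0, 1, 0, 0],
    'Pclass':   [1, 1, 2, 1, 2, 3, 3, 3, 3, 3],
    'Sex':      ['female', 'male', 'female', 'male', 'male',
                 'male', 'male', 'female', 'female', 'male'],
})

# one row, two charts, both using the same y-axis scale
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(10, 4))

# each panel: the value to filter Survived on, and a title for the chart
panels = [(1, 'Survived'), (0, 'Died')]
for ax, (flag, title) in zip(axes, panels):
    subset = toy[toy.Survived == flag]
    counts = subset.groupby(['Pclass', 'Sex'])['Survived'].count()
    counts.plot.bar(ax=ax, title=title)

axes[0].set_ylabel('Number of passengers')
```

With the real data, replacing `toy` with `df2` should put the survivors and the dead on the same scale.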

Great — but still not a stacked bar chart like I wanted.

In [None]:
# back to something I had earlier - but without the chart
df2.groupby(['Pclass','Sex','Survived'])['Survived'].count()

In [None]:
# what if I save that as all_grps
all_grps = df2.groupby(['Pclass','Sex','Survived'])['Survived'].count()

# can I use the counts from that Series? Yes!
# look up a count by its labels: (Pclass, Sex, Survived)
# this gives me the count for 3rd class, male, 0 (did not survive)
all_grps.loc[(3, 'male', 0)]

In [None]:
# this gives me the count for 1st class, female, 1 (survived)
all_grps.loc[(1, 'female', 1)]

That was interesting, but it did not seem very helpful. Oh, well.

On to a new attempt.

In [None]:
# my new attempt, after more Googling: 
# only people who died, using new dataframe "died" from earlier
died.groupby(['Pclass','Sex'])['Survived'].count().unstack().plot.bar(stacked=True)

In [None]:
# aha, stacks with male and female! Now try -
# only people who survived, using new dataframe "sur" from earlier
sur.groupby(['Pclass','Sex'])['Survived'].count().unstack().plot.bar(stacked=True)
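
Why does `.unstack()` make the stacking work? Here is a rough illustration, assuming a tiny made-up dataframe (`toy`, not the real Titanic data). The grouped counts start out as one column of numbers with a two-level index, and `.unstack()` moves the inner level (`Sex`) into columns. `plot.bar(stacked=True)` then draws one bar per row and one stacked segment per column.

```python
import pandas as pd

# made-up rows (NOT the real Titanic data), all with Survived == 0
toy = pd.DataFrame({
    'Survived': [0, 0, 0, 0, 0, 0],
    'Pclass':   [1, 2, 3, 3, 3, 3],
    'Sex':      ['male', 'male', 'male', 'male', 'female', 'male'],
})

# a Series indexed by (Pclass, Sex) pairs
counts = toy.groupby(['Pclass', 'Sex'])['Survived'].count()

# a DataFrame: one row per Pclass, one column per Sex
wide = counts.unstack()

# combinations missing from the data come out as NaN;
# fill_value=0 replaces them with zero counts instead
wide_filled = counts.unstack(fill_value=0)
print(wide_filled)
```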

I would *really* like to get the survivors and the dead into one chart. What I envision is 6 bars, showing stacked male and female, with separate bars for survived and died *in each class.* But I have not quite figured that out.
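
One possible way to get those six bars, offered as a sketch rather than the definitive answer: group by all three columns, then unstack only `Sex`, so that each (`Pclass`, `Survived`) pair becomes one bar with male and female stacked inside it. The `toy` dataframe is made-up data standing in for `df2`, and `six_bars` is a name chosen for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# made-up rows standing in for df2 (NOT the real Titanic data)
toy = pd.DataFrame({
    'Survived': [1, 1, 1, 0, 0, 0, 0, 1, 0, 0],
    'Pclass':   [1, 1, 2, 1, 2, 3, 3, 3, 3, 3],
    'Sex':      ['female', 'male', 'female', 'male', 'male',
                 'male', 'male', 'female', 'female', 'male'],
})

# count every (Pclass, Survived, Sex) combination, then move Sex into columns
# rows are now (Pclass, Survived) pairs; columns are female and male
six_bars = (
    toy.groupby(['Pclass', 'Survived', 'Sex'])
       .size()
       .unstack('Sex', fill_value=0)
)

# one bar per row, with the female and male counts stacked
ax = six_bars.plot.bar(stacked=True)
ax.set_xlabel('Pclass, Survived (0 = died, 1 = survived)')
ax.set_ylabel('Number of passengers')
```

Replacing `toy` with `df2` should produce the same layout for the real passengers, as long as every class has both survivors and dead in the data.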

I hope this gives you a decent idea of how to *explore* a dataset with pandas, and how to work your way toward the chart you want.
