CSC 505 Assignment 2

Author: Gregory Purvine

Github: gnpurvin

10/22/18

# Pandas Assignment

## Part 1

In this assignment we are going to use pandas to figure out - What's the best **date-night movie**?

This assignment is going to use
- Joining
- Groupby
- Sorting

Hint! Find the highly rated movies that appeal to both genders, 'M' and 'F'.


In [160]:
import os
import pandas as pd

##### Read in the movie data: `pd.read_table`

In [161]:
def get_movie_data():
    
    unames = ['user_id','gender','age','occupation','zip']
    users = pd.read_table(os.path.join('../data','users.dat'), 
                          sep='::', header=None, names=unames, engine='python')
    
    rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
    ratings = pd.read_table(os.path.join('../data', 'ratings.dat'), 
                            sep='::', header=None, names=rnames, engine='python')
    
    mnames = ['movie_id', 'title','genres']
    movies = pd.read_table(os.path.join('../data', 'movies.dat'), 
                           sep='::', header=None, names=mnames, engine='python')

    return users, ratings, movies

In [162]:
users, ratings, movies = get_movie_data()

In [163]:
#print users.head()
users.head()

Unnamed: 0,user_id,gender,age,occupation,zip
0,1,F,1,10,48067
1,2,M,56,16,70072
2,3,M,25,15,55117
3,4,M,45,7,2460
4,5,M,25,20,55455


In [164]:
#print ratings.head()
ratings.head()

Unnamed: 0,user_id,movie_id,rating,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [165]:
#print movies.head()
movies.head()

Unnamed: 0,movie_id,title,genres
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


##### Clean up the `movies`

- Get the `year`
- Shorten the `title`


In [166]:
# Raw string so that \( and \) are passed to the regex engine unchanged
tmp = movies.title.str.extract(r'(.*) \(([0-9]+)\)')
tmp.apply(lambda x:x[0] if len(x) > 0 else None)
tmp.apply(lambda x: x[0][:40] if len(x) > 0 else None)

0    Toy Story
1         1995
dtype: object

In [167]:
movies['year'] = tmp[1]
movies['short_title'] = tmp[0]

In [168]:
#print movies.head()
movies.head()

Unnamed: 0,movie_id,title,genres,year,short_title
0,1,Toy Story (1995),Animation|Children's|Comedy,1995,Toy Story
1,2,Jumanji (1995),Adventure|Children's|Fantasy,1995,Jumanji
2,3,Grumpier Old Men (1995),Comedy|Romance,1995,Grumpier Old Men
3,4,Waiting to Exhale (1995),Comedy|Drama,1995,Waiting to Exhale
4,5,Father of the Bride Part II (1995),Comedy,1995,Father of the Bride Part II
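
The title clean-up above can also be written with named capture groups, so the extracted columns get readable names instead of `0` and `1`. This is only a sketch on a few hypothetical sample titles in the same `Name (YYYY)` format; the names `titles` and `parts` are not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample titles in the same "Name (YYYY)" format as movies.dat
titles = pd.Series([
    "Toy Story (1995)",
    "Jumanji (1995)",
    "City of Lost Children, The (1995)",
])

# Named groups become the column names of the result
parts = titles.str.extract(r"^(?P<short_title>.*) \((?P<year>\d{4})\)$")

# Store the year as an integer and cap the short title at 40 characters
parts["year"] = parts["year"].astype(int)
parts["short_title"] = parts["short_title"].str[:40]
print(parts)
```

Converting `year` to an integer makes later numeric filtering (for example, movies released after 1990) straightforward.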


##### Join the tables with `pd.merge`

In [169]:
movies_rated = movies.merge(ratings, on='movie_id')
movies_rated.head()

Unnamed: 0,movie_id,title,genres,year,short_title,user_id,rating,timestamp
0,1,Toy Story (1995),Animation|Children's|Comedy,1995,Toy Story,1,5,978824268
1,1,Toy Story (1995),Animation|Children's|Comedy,1995,Toy Story,6,4,978237008
2,1,Toy Story (1995),Animation|Children's|Comedy,1995,Toy Story,8,4,978233496
3,1,Toy Story (1995),Animation|Children's|Comedy,1995,Toy Story,9,5,978225952
4,1,Toy Story (1995),Animation|Children's|Comedy,1995,Toy Story,10,5,978226474


##### What's the highest rated movie?

In [170]:
movies_rated.groupby(['title', 'movie_id'])['rating'].sum().sort_values(ascending=False).head()

title                                                  movie_id
American Beauty (1999)                                 2858        14800
Star Wars: Episode IV - A New Hope (1977)              260         13321
Star Wars: Episode V - The Empire Strikes Back (1980)  1196        12836
Star Wars: Episode VI - Return of the Jedi (1983)      1210        11598
Saving Private Ryan (1998)                             2028        11507
Name: rating, dtype: int64

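## By total rating points, the top movie is American Beauty (1999).

Note that a sum rewards movies that were rated often, so this ranking measures popularity as much as quality.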
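
As a possible cross-check on the sum-based ranking, the average rating can be computed alongside the sum, keeping only titles with a minimum number of ratings. The sketch below uses a small hypothetical table; `sample`, `stats`, `ranked` and the `min_ratings` threshold are assumptions for illustration, not values from the original analysis.

```python
import pandas as pd

# Hypothetical mini ratings table; the real one comes from movies.merge(ratings)
sample = pd.DataFrame({
    "title": ["A", "A", "A", "A", "B", "B", "C"],
    "rating": [3, 3, 3, 3, 5, 4, 5],
})

# Sum, mean and number of ratings per title in one pass
stats = sample.groupby("title")["rating"].agg(["sum", "mean", "count"])

# Keep only titles with enough ratings, then rank by average rating
min_ratings = 2  # assumed threshold; a larger value may suit the full data
ranked = stats[stats["count"] >= min_ratings].sort_values("mean", ascending=False)
print(ranked)
```

In this toy data, title `A` wins on the sum because it has the most ratings, while `B` wins on the mean; `C` is excluded because a single rating is too little evidence.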

###### What is a well-rated movie for date night?

- Hint - highly rated movie by 
    - both genders,
    - based on genre preferences,
    - age group can also be combined

In [171]:
movies_rated_users = movies_rated.merge(users, on='user_id')
movies_rated_users.head()

Unnamed: 0,movie_id,title,genres,year,short_title,user_id,rating,timestamp,gender,age,occupation,zip
0,1,Toy Story (1995),Animation|Children's|Comedy,1995,Toy Story,1,5,978824268,F,1,10,48067
1,48,Pocahontas (1995),Animation|Children's|Musical|Romance,1995,Pocahontas,1,5,978824351,F,1,10,48067
2,150,Apollo 13 (1995),Drama,1995,Apollo 13,1,5,978301777,F,1,10,48067
3,260,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi,1977,Star Wars: Episode IV - A New Hope,1,4,978300760,F,1,10,48067
4,527,Schindler's List (1993),Drama|War,1993,Schindler's List,1,5,978824195,F,1,10,48067


In [172]:
mrumen = movies_rated_users[movies_rated_users['gender'] == 'M']
mruwomen = movies_rated_users[movies_rated_users['gender'] == 'F']

Highest rated movies by men

In [173]:
mrumen.groupby(['short_title'])['rating'].sum().sort_values(ascending=False).head(10)

short_title
American Beauty                                   10790
Star Wars: Episode IV - A New Hope                10537
Star Wars: Episode V - The Empire Strikes Back    10175
Saving Private Ryan                                9141
Star Wars: Episode VI - Return of the Jedi         9074
Matrix, The                                        9056
Terminator 2: Judgment Day                         9025
Raiders of the Lost Ark                            8779
Silence of the Lambs, The                          8203
Braveheart                                         8153
Name: rating, dtype: int64

Highest rated movies by women

In [174]:
mruwomen.groupby(['short_title'])['rating'].sum().sort_values(ascending=False).head(10)

short_title
American Beauty                                   4010
Shakespeare in Love                               3337
Silence of the Lambs, The                         3016
Sixth Sense, The                                  2973
Shawshank Redemption, The                         2846
Schindler's List                                  2806
Star Wars: Episode IV - A New Hope                2784
Fargo                                             2771
Princess Bride, The                               2762
Star Wars: Episode V - The Empire Strikes Back    2661
Name: rating, dtype: int64

In [175]:
len(users[users['gender'] == 'M'])

4331

In [176]:
len(users[users['gender'] == 'F'])

1709

American Beauty had the highest rating total among both men and women when the data is split by gender.
Note that the men's totals are much higher than the women's. There are about 2.5 times as many men (4,331) as women (1,709) in this dataset, so rankings based on summed ratings are skewed toward what men prefer to watch.
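
One way to reduce the skew described above is to compare average ratings per gender instead of sums, and to score each movie by the lower of the two averages. The following is a sketch on hypothetical data; the `date_score` column is an illustrative heuristic, not something defined in the assignment.

```python
import pandas as pd

# Hypothetical ratings: X is rated mostly by men, Y evenly by both genders
sample = pd.DataFrame({
    "short_title": ["X", "X", "X", "X", "X", "Y", "Y", "Y", "Y"],
    "gender":      ["M", "M", "M", "M", "F", "M", "M", "F", "F"],
    "rating":      [5,   5,   5,   4,   2,   4,   4,   4,   5],
})

# Mean rating per title and gender, one column per gender
by_gender = sample.pivot_table(index="short_title", columns="gender",
                               values="rating", aggfunc="mean")

# Illustrative heuristic: a movie is only as good as its less-happy audience
by_gender["date_score"] = by_gender[["F", "M"]].min(axis=1)
best = by_gender.sort_values("date_score", ascending=False)
print(best)
```

Here `X` has the larger rating sum, but `Y` ranks first because both genders rate it at 4.0 or higher on average.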

In [177]:
movies_rated_users.genres.value_counts().head(10)

Comedy                     116883
Drama                      111423
Comedy|Romance              42712
Comedy|Drama                42245
Drama|Romance               29170
Action|Thriller             26759
Horror                      22563
Drama|Thriller              18248
Thriller                    17851
Action|Adventure|Sci-Fi     17783
Name: genres, dtype: int64

The genres with the most ratings in this data set include:
Comedy,
Drama,
Romance,
Action,
Horror,
Thriller
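
Because `genres` holds pipe-separated combinations, `value_counts` above counts each combination separately. A sketch of counting individual genres instead, using a few hypothetical rows (the names `sample`, `exploded` and `genre_counts` are illustrative):

```python
import pandas as pd

# Hypothetical rows in the same format as the movies table
sample = pd.DataFrame({
    "title": ["Toy Story (1995)", "Grumpier Old Men (1995)", "Waiting to Exhale (1995)"],
    "genres": ["Animation|Children's|Comedy", "Comedy|Romance", "Comedy|Drama"],
})

# Split each genre string into a list, then give every genre its own row
exploded = sample.assign(genre=sample["genres"].str.split("|")).explode("genre")

# Count how often each individual genre occurs
genre_counts = exploded["genre"].value_counts()
print(genre_counts)
```

Applied to the merged ratings table, the same steps would count ratings per individual genre rather than per genre combination.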

Romance movies highest rated

In [178]:
romance = movies_rated_users[movies_rated_users['genres'].str.contains("Romance")]
romance.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title                                 genres                             
Star Wars: Episode VI - Return of the Jedi  Action|Adventure|Romance|Sci-Fi|War    11598
Princess Bride, The                         Action|Adventure|Comedy|Romance         9976
Shakespeare in Love                         Comedy|Romance                          9778
Groundhog Day                               Comedy|Romance                          9005
Forrest Gump                                Comedy|Romance|War                      8969
Name: rating, dtype: int64

Romance movies highest rated by men

In [179]:
romancemen = mrumen[mrumen['genres'].str.contains("Romance")]
romancemen.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title                                 genres                             
Star Wars: Episode VI - Return of the Jedi  Action|Adventure|Romance|Sci-Fi|War    9074
Princess Bride, The                         Action|Adventure|Comedy|Romance        7214
Groundhog Day                               Comedy|Romance                         6547
Shakespeare in Love                         Comedy|Romance                         6441
Forrest Gump                                Comedy|Romance|War                     6364
Name: rating, dtype: int64

Romance movies highest rated by women

In [180]:
romancewomen = mruwomen[mruwomen['genres'].str.contains("Romance")]
romancewomen.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title                                 genres                             
Shakespeare in Love                         Comedy|Romance                         3337
Princess Bride, The                         Action|Adventure|Comedy|Romance        2762
Forrest Gump                                Comedy|Romance|War                     2605
Star Wars: Episode VI - Return of the Jedi  Action|Adventure|Romance|Sci-Fi|War    2524
Groundhog Day                               Comedy|Romance                         2458
Name: rating, dtype: int64

If you and your date are looking specifically for a Romance movie, Star Wars: Episode VI - Return of the Jedi has the highest overall total. It ranks first among men but only fourth among women, whose leading movie is Shakespeare in Love. The Princess Bride is in second place for both men and women, making it a better middle-ground choice for both genders.

Highest rated comedy movies

In [181]:
comedy = movies_rated_users[movies_rated_users['genres'].str.contains("Comedy")]
comedy.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title          genres                         
American Beauty      Comedy|Drama                       14800
Back to the Future   Comedy|Sci-Fi                      10307
Princess Bride, The  Action|Adventure|Comedy|Romance     9976
Shakespeare in Love  Comedy|Romance                      9778
Men in Black         Action|Adventure|Comedy|Sci-Fi      9492
Name: rating, dtype: int64

Highest rated drama movies

In [182]:
drama = movies_rated_users[movies_rated_users['genres'].str.contains("Drama")]
drama.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title                                     genres                           
American Beauty                                 Comedy|Drama                         14800
Star Wars: Episode V - The Empire Strikes Back  Action|Adventure|Drama|Sci-Fi|War    12836
Saving Private Ryan                             Action|Drama|War                     11507
Silence of the Lambs, The                       Drama|Thriller                       11219
Fargo                                           Crime|Drama|Thriller                 10692
Name: rating, dtype: int64

Highest rated action movies

In [183]:
action = movies_rated_users[movies_rated_users['genres'].str.contains("Action")]
action.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title                                     genres                             
Star Wars: Episode IV - A New Hope              Action|Adventure|Fantasy|Sci-Fi        13321
Star Wars: Episode V - The Empire Strikes Back  Action|Adventure|Drama|Sci-Fi|War      12836
Star Wars: Episode VI - Return of the Jedi      Action|Adventure|Romance|Sci-Fi|War    11598
Saving Private Ryan                             Action|Drama|War                       11507
Raiders of the Lost Ark                         Action|Adventure                       11257
Name: rating, dtype: int64

Highest rated thriller movies

In [184]:
thriller = movies_rated_users[movies_rated_users['genres'].str.contains("Thriller")]
thriller.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title                 genres                
Silence of the Lambs, The   Drama|Thriller            11219
Matrix, The                 Action|Sci-Fi|Thriller    11178
Sixth Sense, The            Thriller                  10835
Terminator 2: Judgment Day  Action|Sci-Fi|Thriller    10751
Fargo                       Crime|Drama|Thriller      10692
Name: rating, dtype: int64

Highest rated horror movies

In [185]:
horror = movies_rated_users[movies_rated_users['genres'].str.contains("Horror")]
horror.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title   genres                       
Ghostbusters  Comedy|Horror                    8518
Alien         Action|Horror|Sci-Fi|Thriller    8419
Jaws          Action|Horror                    6940
Psycho        Horror|Thriller                  5328
Fly, The      Horror|Sci-Fi                    5297
Name: rating, dtype: int64

Most popular movies for each of the chosen genres:

#### Comedy: American Beauty

#### Drama: American Beauty

#### Romance: Star Wars: Episode VI - Return of the Jedi

#### Action: Star Wars Episode IV - A New Hope

#### Horror: Ghostbusters

#### Thriller: The Silence of the Lambs

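Divide ratings into age groups. The `age` column holds bracket codes rather than exact ages. In the MovieLens 1M coding, which this data appears to follow, 1 = under 18, 18 = 18-24, 25 = 25-34, 35 = 35-44, 45 = 45-49, 50 = 50-55 and 56 = 56+.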

In [186]:
movies_rated_users['age'].value_counts()
eighteen = movies_rated_users[movies_rated_users['age'] == 18]
twentyfive = movies_rated_users[movies_rated_users['age'] == 25]
thirtyfive = movies_rated_users[movies_rated_users['age'] == 35]
fifty = movies_rated_users[movies_rated_users['age'] == 50]
fortyfive = movies_rated_users[movies_rated_users['age'] == 45]
fiftysix = movies_rated_users[movies_rated_users['age'] == 56]
one = movies_rated_users[movies_rated_users['age'] == 1]
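
Instead of one variable per age code, the codes can be mapped to labels and handled in a single `groupby`. This sketch assumes the data follows the MovieLens 1M age coding; `AGE_LABELS`, `sample` and `top` are hypothetical names used only for illustration.

```python
import pandas as pd

# Age codes as documented for MovieLens 1M (assumed to be this dataset)
AGE_LABELS = {1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
              45: "45-49", 50: "50-55", 56: "56+"}

# Hypothetical slice of the merged ratings table
sample = pd.DataFrame({
    "short_title": ["A", "B", "A", "B", "A"],
    "age":         [1,   1,   25,  25,  25],
    "rating":      [3,   5,   5,   4,   4],
})
sample["age_group"] = sample["age"].map(AGE_LABELS)

# Rating totals per age group and title
totals = sample.groupby(["age_group", "short_title"])["rating"].sum()

# Keep the top title in each age group
top = (totals.reset_index()
             .sort_values("rating", ascending=False)
             .drop_duplicates("age_group")
             .set_index("age_group")["short_title"])
print(top)
```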

Highest rated movies by the 18-24 age group (age code 18)

In [187]:
eighteen.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title                                     genres                             
American Beauty                                 Comedy|Drama                           3233
Star Wars: Episode V - The Empire Strikes Back  Action|Adventure|Drama|Sci-Fi|War      2572
Matrix, The                                     Action|Sci-Fi|Thriller                 2521
Star Wars: Episode IV - A New Hope              Action|Adventure|Fantasy|Sci-Fi        2488
Star Wars: Episode VI - Return of the Jedi      Action|Adventure|Romance|Sci-Fi|War    2451
Name: rating, dtype: int64

Highest rated movies by the 25-34 age group (age code 25)

In [188]:
twentyfive.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title                                     genres                           
American Beauty                                 Comedy|Drama                         5777
Star Wars: Episode V - The Empire Strikes Back  Action|Adventure|Drama|Sci-Fi|War    5163
Star Wars: Episode IV - A New Hope              Action|Adventure|Fantasy|Sci-Fi      5158
Silence of the Lambs, The                       Drama|Thriller                       4702
Matrix, The                                     Action|Sci-Fi|Thriller               4605
Name: rating, dtype: int64

Highest rated movies by the 35-44 age group (age code 35)

In [189]:
thirtyfive.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title                                     genres                           
Star Wars: Episode IV - A New Hope              Action|Adventure|Fantasy|Sci-Fi      2726
American Beauty                                 Comedy|Drama                         2526
Star Wars: Episode V - The Empire Strikes Back  Action|Adventure|Drama|Sci-Fi|War    2484
Raiders of the Lost Ark                         Action|Adventure                     2287
Saving Private Ryan                             Action|Drama|War                     2213
Name: rating, dtype: int64

Highest rated movies by the 45-49 age group (age code 45)

In [190]:
fortyfive.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title                                     genres                           
American Beauty                                 Comedy|Drama                         1071
Star Wars: Episode IV - A New Hope              Action|Adventure|Fantasy|Sci-Fi      1066
Schindler's List                                Drama|War                             978
Star Wars: Episode V - The Empire Strikes Back  Action|Adventure|Drama|Sci-Fi|War     935
Shakespeare in Love                             Comedy|Romance                        923
Name: rating, dtype: int64

Highest rated movies by the 50-55 age group (age code 50)

In [191]:
fifty.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title                                     genres                           
American Beauty                                 Comedy|Drama                         1029
Star Wars: Episode IV - A New Hope              Action|Adventure|Fantasy|Sci-Fi       959
Godfather, The                                  Action|Crime|Drama                    899
Fargo                                           Crime|Drama|Thriller                  875
Star Wars: Episode V - The Empire Strikes Back  Action|Adventure|Drama|Sci-Fi|War     831
Name: rating, dtype: int64

Highest rated movies by the 56+ age group (age code 56)

In [192]:
fiftysix.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title          genres            
American Beauty      Comedy|Drama          756
Schindler's List     Drama|War             633
Shakespeare in Love  Comedy|Romance        560
Godfather, The       Action|Crime|Drama    545
Saving Private Ryan  Action|Drama|War      537
Name: rating, dtype: int64

Highest rated movies by the under-18 age group (age code 1)
(In the MovieLens 1M coding, 1 is the bracket for users under 18, not a literal age of one year)

In [193]:
one.groupby(['short_title', 'genres'])['rating'].sum().sort_values(ascending=False).head(5)

short_title                         genres                         
Sixth Sense, The                    Thriller                           466
Matrix, The                         Action|Sci-Fi|Thriller             442
Toy Story                           Animation|Children's|Comedy        439
Star Wars: Episode IV - A New Hope  Action|Adventure|Fantasy|Sci-Fi    431
Toy Story 2                         Animation|Children's|Comedy        416
Name: rating, dtype: int64

American Beauty had the highest rating total in five of the seven age groups. The exceptions are the 35-44 group (age code 35), where Star Wars: Episode IV - A New Hope led, and the under-18 group (age code 1), where The Sixth Sense led. Assuming this is the MovieLens 1M data, code 1 is a valid bracket (under 18) rather than bad data, so it should not be discarded.

# Which is the best date movie?

The safest bet for a movie to watch on a date would be American Beauty. It had the highest rating total among both men and women, led five of the seven age groups, and topped both of its genres (Comedy and Drama). Unless you or your date are in the mood for a genre that American Beauty does not belong to, it is the safest guess for a movie that both of you will enjoy, regardless of gender or age.

Considering the movie is for a date, romance may be the preferred genre. In that case, The Princess Bride is the best choice to try to ensure that both you and your date enjoy the movie equally. Star Wars - Episode VI and Shakespeare in Love are also great choices, though men may enjoy Star Wars more than women, and women may enjoy Shakespeare in Love more than men. 

## Part 2

Load the dataset in `titanic.xls`. It contains data on all the passengers that travelled on the Titanic.

In [159]:
from IPython.core.display import HTML
HTML(filename='../data/titanic.html')

0,1,2,3,4,5
Name,Labels,Units,Levels,Storage,NAs
pclass,,,3,integer,0
survived,Survived,,,double,0
name,Name,,,character,0
sex,,,2,integer,0
age,Age,Year,,double,263
sibsp,Number of Siblings/Spouses Aboard,,,double,0
parch,Number of Parents/Children Aboard,,,double,0
ticket,Ticket Number,,,character,0
fare,Passenger Fare,British Pound (£),,double,1

0,1
Variable,Levels
pclass,1st
,2nd
,3rd
sex,female
,male
cabin,
,A10
,A11
,A14


In [195]:
# you would need xlrd - pip install xlrd
t_file = pd.ExcelFile('../data/titanic.xls')
t_df = t_file.parse("titanic", header=None)
t_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13
0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
1,1,1,"Allen, Miss. Elisabeth Walton",female,29,0,0,24160,211.338,B5,S,2,,"St Louis, MO"
2,1,1,"Allison, Master. Hudson Trevor",male,0.9167,1,2,113781,151.55,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON"
3,1,0,"Allison, Miss. Helen Loraine",female,2,1,2,113781,151.55,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
4,1,0,"Allison, Mr. Hudson Joshua Creighton",male,30,1,2,113781,151.55,C22 C26,S,,135,"Montreal, PQ / Chesterville, ON"
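
Because the sheet is parsed with `header=None`, the column names end up as row 0 of the data, which is why later cells index columns by number and why a `sex`/`survived` row shows up in some outputs. Leaving `header` at its default of `0` in `parse` would use that row as column names. The sketch below shows the equivalent repair on an already-loaded frame; the tiny `raw` frame is hypothetical and only mimics the layout above.

```python
import pandas as pd

# Hypothetical frame mimicking parse(..., header=None): row 0 holds the names
raw = pd.DataFrame([
    ["pclass", "survived", "sex", "age"],
    [1, 1, "female", 29],
    [1, 0, "male", 30],
])

# Drop the header row from the data and use it as the column names
named = raw.iloc[1:].reset_index(drop=True)
named.columns = raw.iloc[0].tolist()

# Mixed-type columns load as object, so restore numeric dtypes explicitly
named = named.astype({"pclass": int, "survived": int, "age": float})
print(named.dtypes)
```

With named columns, expressions such as `named.groupby("sex")["survived"]` read more clearly than numeric indices.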


### Women and children first?

***1. Use the `groupby` method to calculate the proportion of passengers that survived by sex.***

***2. Calculate the same proportion, but by class and sex.***

***3. Create age categories: children (under 14 years), adolescents (14-20), adult (21-64), and senior (65+), and calculate survival proportions by age category, class and sex.***

column indices

In [297]:
pclassindex = 0
survivedindex = 1
nameindex = 2
sexindex = 3
ageindex = 4
sibspindex = 5
parchindex = 6
ticketindex = 7
fareindex = 8
cabinindex = 9
embarkedindex = 10
boatindex = 11
bodyindex = 12
homedestindex = 13

Total number of men and women that survived

In [298]:
t_df.groupby(sexindex)[survivedindex].sum()

3
female         339
male           161
sex       survived
Name: 1, dtype: object

Total women on the Titanic

In [299]:
females = t_df[t_df[sexindex] == "female"]
len(females)

466

Total men on the Titanic

In [300]:
males = t_df[t_df[sexindex] == "male"]
len(males)

843

In [301]:
339/466, 161/843

(0.7274678111587983, 0.19098457888493475)

#### 72.75% of the women on the Titanic survived

#### 19.10% of the men survived
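
Since `survived` is coded 0/1, its mean within a group is the proportion that survived, so `groupby` can produce these percentages directly without retyping the counts. A sketch on hypothetical data with named columns (the real frame here uses numeric column indices):

```python
import pandas as pd

# Hypothetical passengers; survived is 1 for survivors and 0 otherwise
sample = pd.DataFrame({
    "sex":      ["female", "female", "female", "male", "male", "male", "male"],
    "survived": [1, 1, 0, 0, 0, 1, 0],
})

# The mean of a 0/1 column is the proportion of ones in each group
rate_by_sex = sample.groupby("sex")["survived"].mean()
print(rate_by_sex)
```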

Number of men that survived per class

In [302]:
males.groupby(pclassindex)[survivedindex].sum()

0
1    61
2    25
3    75
Name: 1, dtype: int64

Number of women that survived per class

In [303]:
females.groupby(pclassindex)[survivedindex].sum()

0
1    139
2     94
3    106
Name: 1, dtype: int64

In [304]:
males1 = males[males[pclassindex] == 1]
males2 = males[males[pclassindex] == 2]
males3 = males[males[pclassindex] == 3]

females1 = females[females[pclassindex] == 1]
females2 = females[females[pclassindex] == 2]
females3 = females[females[pclassindex] == 3]

In [305]:
61/len(males1), 25/len(males2), 75/len(males3)

(0.3407821229050279, 0.14619883040935672, 0.15212981744421908)

#### 34.08% of men in class 1 survived

#### 14.62% of men in class 2 survived

#### 15.21% of men in class 3 survived

In [306]:
139/len(females1), 94/len(females2), 106/len(females3)

(0.9652777777777778, 0.8867924528301887, 0.49074074074074076)

#### 96.53% of women in class 1 survived

#### 88.68% of women in class 2 survived

#### 49.07% of women in class 3 survived
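
The same idea extends to two grouping keys, which avoids building six separate frames. This is a sketch on hypothetical data; `unstack` is used only to lay the result out as a class-by-sex table.

```python
import pandas as pd

# Hypothetical passengers from classes 1 and 3
sample = pd.DataFrame({
    "pclass":   [1, 1, 1, 1, 3, 3, 3, 3],
    "sex":      ["female", "female", "male", "male",
                 "female", "female", "male", "male"],
    "survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Survival proportion for every (class, sex) pair, shown as a table
rates = sample.groupby(["pclass", "sex"])["survived"].mean().unstack("sex")
print(rates)
```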

Divide passengers into age groups

In [307]:
t_df[ageindex] = pd.to_numeric(t_df[ageindex], errors='coerce')
children = t_df[t_df[ageindex] < 14]
adolescents = t_df[(t_df[ageindex] >= 14) & (t_df[ageindex] <= 20)]
adults = t_df[(t_df[ageindex] >= 21) & (t_df[ageindex] <= 64)]
seniors = t_df[(t_df[ageindex] >= 65)]
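
An alternative to four separate filters is `pd.cut`, which assigns every passenger a single category label. The sketch below uses half-open bins so that any fractional age between 20 and 21 (or between 64 and 65) still lands in a group, which the `<= 20` / `>= 21` filters above would miss. The sample ages and the names `bins`, `labels` and `age_cat` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including a missing value
ages = pd.Series([0.9167, 13.5, 14, 20, 21, 64, 65, 80, np.nan])

# Half-open bins: [0, 14), [14, 21), [21, 65), [65, inf)
bins = [0, 14, 21, 65, np.inf]
labels = ["child", "adolescent", "adult", "senior"]
age_cat = pd.cut(ages, bins=bins, labels=labels, right=False)
print(age_cat)
```

Missing ages stay missing in the result, so passengers without a recorded age are left out of every category rather than silently misassigned.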

In [308]:
boys = children.merge(males, on=nameindex)
girls = children.merge(females, on=nameindex)

boys.groupby(['0_x'])['1_x'].sum(), girls.groupby(['0_x'])['1_x'].sum()

(0_x
 1     5
 2    11
 3    12
 Name: 1_x, dtype: int64, 0_x
 1     0
 2    14
 3    15
 Name: 1_x, dtype: int64)

In [309]:
5/len(boys.merge(t_df[t_df[pclassindex] == 1 ])), 11/len(boys.merge(t_df[t_df[pclassindex] == 2 ])), 12/len(boys.merge(t_df[t_df[pclassindex] == 3 ]))


(1.0, 1.0, 0.32432432432432434)

#### All male children from classes 1 and 2 survived, while only 32.43% of those from class 3 survived

In [310]:
14/len(girls.merge(t_df[t_df[pclassindex] == 2 ])), 15/len(girls.merge(t_df[t_df[pclassindex] == 3 ]))


(1.0, 0.4838709677419355)

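#### No female children in class 1 survived: the class 1 sum above is 0, and the data preview shows Helen Loraine Allison (age 2, class 1) did not survive. All female children in class 2 survived, while only 48.39% of those in class 3 survived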

In [311]:
teenm = adolescents.merge(males, on=nameindex)
teenf = adolescents.merge(females, on=nameindex)

teenm.groupby(['0_x'])['1_x'].sum(), teenf.groupby(['0_x'])['1_x'].sum()

(0_x
 1    1
 2    2
 3    8
 Name: 1_x, dtype: int64, 0_x
 1    15
 2    12
 3    19
 Name: 1_x, dtype: int64)

In [312]:
1/len(teenm.merge(t_df[t_df[pclassindex] == 1 ])), 2/len(teenm.merge(t_df[t_df[pclassindex] == 2 ])), 8/len(teenm.merge(t_df[t_df[pclassindex] == 3 ]))


(0.2, 0.11764705882352941, 0.125)

#### 20% of the teenaged boys in class 1 survived. 11.76% of those in class 2 survived, and 12.5% of those in class 3 survived

In [313]:
15/len(teenf.merge(t_df[t_df[pclassindex] == 1 ])), 12/len(teenf.merge(t_df[t_df[pclassindex] == 2 ])), 19/len(teenf.merge(t_df[t_df[pclassindex] == 3 ]))


(1.0, 0.9230769230769231, 0.5428571428571428)

#### All teenaged girls in class 1 survived. 92.31% of those in class 2 survived. 54.29% of those in class 3 survived.

In [314]:
adultm = adults.merge(males, on=nameindex)
adultf = adults.merge(females, on=nameindex)

adultm.groupby(['0_x'])['1_x'].sum(), adultf.groupby(['0_x'])['1_x'].sum()

(0_x
 1    46
 2    10
 3    39
 Name: 1_x, dtype: int64, 0_x
 1    112
 2     66
 3     39
 Name: 1_x, dtype: int64)

In [315]:
46/len(adultm.merge(t_df[t_df[pclassindex] == 1 ])), 10/len(adultm.merge(t_df[t_df[pclassindex] == 2 ])), 39/len(adultm.merge(t_df[t_df[pclassindex] == 3 ]))


(0.34328358208955223, 0.078125, 0.156)

#### 34.33% of the adult men in class 1 survived. 7.81% of those in class 2 survived. 15.6% of those in class 3 survived

In [316]:
112/len(adultf.merge(t_df[t_df[pclassindex] == 1 ])), 66/len(adultf.merge(t_df[t_df[pclassindex] == 2 ])), 39/len(adultf.merge(t_df[t_df[pclassindex] == 3 ]))


(0.9655172413793104, 0.868421052631579, 0.42391304347826086)

#### 96.55% of the adult women in class 1 survived. 86.84% of those in class 2 survived, and 42.39% of those in class 3 survived

In [317]:
seniorm = seniors.merge(males, on=nameindex)
seniorf = seniors.merge(females, on=nameindex)

seniorm.groupby(['0_x'])['1_x'].sum(), seniorf.groupby(['0_x'])['1_x'].sum()

(0_x
 1    1
 2    0
 3    0
 Name: 1_x, dtype: int64, 0_x
 1    1
 Name: 1_x, dtype: int64)

In [318]:
seniorf.head()

Unnamed: 0,0_x,1_x,2,3_x,4_x,5_x,6_x,7_x,8_x,9_x,...,4_y,5_y,6_y,7_y,8_y,9_y,10_y,11_y,12_y,13_y
0,1,1,"Cavendish, Mrs. Tyrell William (Julia Florence...",female,76.0,1,0,19877,78.85,C46,...,76.0,1,0,19877,78.85,C46,S,6,,"Little Onn Hall, Staffs"


#### The only elderly female aboard was in class 1 and she survived. Only one of the elderly men aboard survived, and he was in class 1. 

In [319]:
seniorm.groupby('0_x')['3_x'].value_counts()

0_x  3_x 
1    male    7
2    male    2
3    male    3
Name: 3_x, dtype: int64

In [320]:
1/len(seniorm.merge(t_df[t_df[pclassindex] == 1 ]))

0.14285714285714285

#### 14.29% of the elderly men in class 1 survived. None of those in classes 2 and 3 survived
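
The per-group cells above join frames on the passenger name, which could duplicate rows if two passengers share a name. As a hedged alternative, the whole breakdown by age category, class and sex can be computed in one `groupby`, together with the group size. The data and column names below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical passengers, one with a missing age
sample = pd.DataFrame({
    "pclass":   [1, 1, 2, 3, 3, 3],
    "sex":      ["female", "male", "male", "male", "male", "female"],
    "age":      [2.0, 0.9167, 30.0, 10.0, 12.0, np.nan],
    "survived": [0, 1, 0, 1, 0, 1],
})

# Same age categories as in the assignment text, using half-open bins
sample["age_cat"] = pd.cut(sample["age"], bins=[0, 14, 21, 65, np.inf],
                           labels=["child", "adolescent", "adult", "senior"],
                           right=False)

# Survival rate and group size per (age category, class, sex);
# observed=True skips category combinations with no passengers
summary = (sample.groupby(["age_cat", "pclass", "sex"], observed=True)["survived"]
                 .agg(rate="mean", n="count"))
print(summary)
```

Reporting `n` next to each rate helps flag groups that are too small to interpret, such as a category with a single passenger.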