## 2021: Week 38 - Trilogy

Recently, I've been playing with data about the best movie trilogies of all time, according to IMDb. So I thought I'd create a Preppin' Data challenge to allow you all to do the same!

![img](https://1.bp.blogspot.com/--LsfG5paGCA/YUnvc86nCYI/AAAAAAAAA9A/2yvgv7PjYvUlZoIbzNoamdhy___L18nFQCLcBGAsYHQ/w361-h400/IMDb%2BTrilogies%2B%25282%2529.png)

### Input
There are 2 inputs for this challenge:
1. Top 30 Trilogies
![img](https://lh3.googleusercontent.com/-9oMAAJIswNM/YUnw0zlu8pI/AAAAAAAAA9I/2UhWz5WL1SIXtFdeJwALe1Y80FmldZHUwCLcBGAsYHQ/image.png)

2. Films
![img](https://lh3.googleusercontent.com/-Ug52cT9O70Y/YUnxDEi_nTI/AAAAAAAAA9M/F2udZ_-nVMMEW31UBlPrNq40oLq0G5lnwCLcBGAsYHQ/w400-h95/image.png)

### Requirement
- Input the data
- Split out the Number in Series field into Film Order and Total Films in Series
- Work out the average rating for each trilogy
- Work out the highest ranking for each trilogy
- Rank the trilogies based on the average rating and use the highest ranking metric to break ties (make sure you haven't rounded the numeric fields yet!)
    - We have noticed a slight error in the way that Tableau Prep calculates this rank, so don't worry if your output is different to ours. We are investigating!
- Remove the word trilogy from the Trilogy field
- Bring the 2 datasets together by the ranking fields
- Output the data
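
The tie-breaking rule in the ranking step can be sketched as below. This is only a minimal illustration, not the official solution: the `summary` table, its values, and the `Highest Rating` column name are made up for the example, and it reads "highest ranking" as the highest film rating within a trilogy, which is an assumption.

```python
import pandas as pd

# Hypothetical trilogy-level summary; groupings and values are invented.
summary = pd.DataFrame({
    "Trilogy Grouping": ["a", "b", "c", "d"],
    "Trilogy Average": [8.8, 8.5, 8.5, 7.7],
    "Highest Rating": [8.9, 8.7, 9.0, 8.0],
})

# Sort by the (unrounded) average first, then by the highest rating to break
# ties, both descending, and number the rows 1..n in that order.
ranked = summary.sort_values(
    ["Trilogy Average", "Highest Rating"], ascending=[False, False]
).reset_index(drop=True)
ranked["Trilogy Ranking"] = range(1, len(ranked) + 1)

print(ranked)
```

Here "b" and "c" share an average of 8.5, so "c" is placed ahead because its best film is rated higher. With real data, averages that look equal on screen can differ in their last floating-point digits, which is one more reason to leave rounding until the end.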

### Output
![img](https://lh3.googleusercontent.com/-1h_wQ7twRnY/YUnz-XfcXNI/AAAAAAAAA9Y/3j7LZD5aEIICn3ZWWpceTicFuAI--RZUwCLcBGAsYHQ/w640-h128/image.png)

- 7 fields
    - Trilogy Ranking
    - Trilogy
    - Trilogy Average
    - Film Order
    - Title
    - Rating
    - Total Films in Series
- 90 rows (91 including headers)

In [17]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [18]:
### Input the data

In [19]:
data = pd.read_excel("./data/Trilogies Input.xlsx", sheet_name=[0, 1])
top_30 = data[0].copy()
films = data[1].copy()

In [20]:
top_30.head()

Unnamed: 0,Trilogy Ranking,Trilogy
0,1,Lord of the Rings trilogy
1,2,The Godfather trilogy
2,4,Star Wars trilogy
3,3,The Dark Knight trilogy
4,5,Dollars trilogy


In [21]:
films.head()

Unnamed: 0,Number in Series,Trilogy Grouping,Title,Rating
0,2/3,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Two Towers,8.7
1,1/3,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Fellowship of the Ring,8.8
2,3/3,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Return of the King,8.9
3,3/3,08985faab9f27113eef8adfc2200ac27,Babel,7.4
4,2/3,08985faab9f27113eef8adfc2200ac27,21 Grams,7.6


In [22]:
top_30 = top_30.sort_values(by="Trilogy Ranking", ascending=True)
top_30.head()

Unnamed: 0,Trilogy Ranking,Trilogy
0,1,Lord of the Rings trilogy
1,2,The Godfather trilogy
3,3,The Dark Knight trilogy
2,4,Star Wars trilogy
4,5,Dollars trilogy


In [23]:
### Split out the Number in Series field into Film Order and Total Films in Series

In [24]:
# expand=True returns the split parts as DataFrame columns 0 and 1
splits = films["Number in Series"].str.split("/", expand=True)
splits = splits.rename(columns={0: "Film Order", 1: "Total Films in Series"})
films = pd.concat([films, splits], axis=1).drop("Number in Series", axis=1)
films.head()

Unnamed: 0,Trilogy Grouping,Title,Rating,Film Order,Total Films in Series
0,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Two Towers,8.7,2,3
1,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Fellowship of the Ring,8.8,1,3
2,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Return of the King,8.9,3,3
3,08985faab9f27113eef8adfc2200ac27,Babel,7.4,3,3
4,08985faab9f27113eef8adfc2200ac27,21 Grams,7.6,2,3
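
One thing worth noting: the parts that come out of `str.split` are strings, even though they print like numbers. Here is a small sketch of converting them to integers in the same step. The `films_demo` frame is a stand-in built for the example, not the real input.

```python
import pandas as pd

# Stand-in for the films table, using "n/m" values like those in the input.
films_demo = pd.DataFrame({"Number in Series": ["2/3", "1/3", "6/9"]})

# Split on "/", spread the parts across columns, and cast them to integers.
parts = films_demo["Number in Series"].str.split("/", expand=True).astype(int)
parts.columns = ["Film Order", "Total Films in Series"]

# Swap the original field for the two new ones.
films_demo = films_demo.drop(columns="Number in Series").join(parts)

print(films_demo.dtypes)
```

Having real integers matters later if you want to sort by `Film Order`, since string sorting would put "10" before "2".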


In [None]:
### Work out the average rating for each trilogy

In [28]:
avg_rate = films.groupby(["Trilogy Grouping"])["Rating"].mean()
avg_rate.name = "Trilogy Average"
avg_rate

Trilogy Grouping
06d49632c9dc9bcb62aeaef99612ba6b    8.800000
08985faab9f27113eef8adfc2200ac27    7.700000
13a893bc3f0877d224af0d73de3f0359    6.833333
161b0061457968be63e99de104b79892    6.933333
209104330778fc9558cd63c51f6205d5    6.933333
309fc7d3bc53bb63ac42e359260ac740    8.533333
34f1bcfc647cfa2931f5b1e78d8011d2    8.033333
4a6c252c118bb7c69d8ea2a3022429e8    7.466667
5b78583b25bb5e0c17412e0c18177954    6.766667
5b9a02dc8c23ddbd9b7c6ff35ce013c6    6.833333
67ae81acd95388b4876c74c7725f1a3b    7.900000
6d5ababb65e9ff214b73e891b4afe6e8    8.600000
712a52bc5e7f8dc3cb5de157dbb08151    8.333333
71f6794efc68c56fbee873fa5fb5503a    7.166667
854b85cbff2752fcb88606bca76f83c6    8.166667
86ce15a7f39fc14323b76c9b95c66165    7.866667
928727f61cbc2f7507785834c5e11d48    7.000000
92e4d2da3d1528bc9f6668bbc26d633e    7.866667
a734c94d6e046c4667fea57758c5b6f6    8.133333
aabfb4cb5917aab2819ec029a588256c    7.433333
abbf733a570a84955fc8eee18d7fc40b    6.766667
b43e28d1b129619d9a1e8186df0d2e18    7.

In [30]:
films = films.merge(avg_rate, how="left", on="Trilogy Grouping")
films.head()

Unnamed: 0,Trilogy Grouping,Title,Rating,Film Order,Total Films in Series,Trilogy Average
0,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Two Towers,8.7,2,3,8.8
1,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Fellowship of the Ring,8.8,1,3,8.8
2,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Return of the King,8.9,3,3,8.8
3,08985faab9f27113eef8adfc2200ac27,Babel,7.4,3,3,7.7
4,08985faab9f27113eef8adfc2200ac27,21 Grams,7.6,2,3,7.7
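
As an aside, the group-then-merge pattern can also be written in one step with `groupby(...).transform("mean")`, which returns a value for every row instead of one per group. A minimal sketch on a made-up `films_demo` frame (the groupings and ratings are illustrative only):

```python
import pandas as pd

# Made-up film-level data: two trilogies of three films each.
films_demo = pd.DataFrame({
    "Trilogy Grouping": ["x", "x", "x", "y", "y", "y"],
    "Rating": [8.7, 8.8, 8.9, 7.4, 7.6, 8.1],
})

# transform("mean") broadcasts each group's mean back onto its rows,
# so no separate merge is needed.
films_demo["Trilogy Average"] = (
    films_demo.groupby("Trilogy Grouping")["Rating"].transform("mean")
)

print(films_demo)
```

Both approaches should give the same column; the merge version is handy when you also want the one-row-per-trilogy table for other steps.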


In [31]:
### Work out the highest ranking for each trilogy

(30,)

In [39]:
title_only = films["Title"].str.split(":", expand=True)[0]
title_only = title_only.str.strip()
title_only.head()

0    The Lord of the Rings
1    The Lord of the Rings
2    The Lord of the Rings
3                    Babel
4                 21 Grams
Name: 0, dtype: object

In [40]:
films["Name"] = title_only
films.head()

Unnamed: 0,Trilogy Grouping,Title,Rating,Film Order,Total Films in Series,Trilogy Average,Name
0,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Two Towers,8.7,2,3,8.8,The Lord of the Rings
1,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Fellowship of the Ring,8.8,1,3,8.8,The Lord of the Rings
2,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Return of the King,8.9,3,3,8.8,The Lord of the Rings
3,08985faab9f27113eef8adfc2200ac27,Babel,7.4,3,3,7.7,Babel
4,08985faab9f27113eef8adfc2200ac27,21 Grams,7.6,2,3,7.7,21 Grams


In [43]:
top_30["Trilogy"] = top_30["Trilogy"].str.replace("trilogy", "", regex=False)
top_30["Trilogy"] = top_30["Trilogy"].str.strip()
top_30["Trilogy"]

0           Lord of the Rings
1               The Godfather
3             The Dark Knight
2                   Star Wars
4                     Dollars
5            Once Upon a Time
6                   Toy Story
7                  The Before
9          Back to the Future
10              The Vengeance
8               Three Colours
11                      Death
12                 The Hobbit
14    Three Flavours Cornetto
16                     Matrix
13                  Evil Dead
15            Captain America
17                 Millennium
18                   Iron Man
20                      X-Men
19                  Wolverine
21                    Ocean's
23              The Naked Gun
24                 Spider-Man
22                    Mad Max
25                    Prequel
27                     Mexico
28                        MIB
26                 Madagascar
29               The Hangover
Name: Trilogy, dtype: object

In [45]:
top_30

Unnamed: 0,Trilogy Ranking,Trilogy
0,1,Lord of the Rings
1,2,The Godfather
3,3,The Dark Knight
2,4,Star Wars
4,5,Dollars
5,6,Once Upon a Time
6,7,Toy Story
7,8,The Before
9,9,Back to the Future
10,10,The Vengeance
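
With the names cleaned, `top_30` is ready for the "bring the 2 datasets together" step. A rough sketch of that join, assuming the films table already carries a computed `Trilogy Ranking` (that column, the `_demo` frames, and the placeholder titles and ratings are all hypothetical here):

```python
import pandas as pd

# Small stand-in for the cleaned Top 30 table.
top_30_demo = pd.DataFrame({
    "Trilogy Ranking": [1, 2],
    "Trilogy": ["Lord of the Rings", "The Godfather"],
})

# Hypothetical film-level table with a rank already worked out per trilogy.
films_demo = pd.DataFrame({
    "Trilogy Ranking": [1, 1, 1, 2, 2, 2],
    "Title": ["Film A1", "Film A2", "Film A3", "Film B1", "Film B2", "Film B3"],
    "Rating": [8.7, 8.8, 8.9, 9.0, 9.1, 7.6],
})

# One trilogy row matches many film rows; validate= raises an error if the
# ranking is unexpectedly duplicated on the Top 30 side.
output = top_30_demo.merge(
    films_demo, on="Trilogy Ranking", how="inner", validate="one_to_many"
)

print(output)
```

On the full data, a quick `len(output)` check against the expected 90 rows is an easy way to spot rankings that failed to match.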


In [50]:
films.drop_duplicates(subset="Trilogy Grouping").sort_values(by="Rating", ascending=False)

Unnamed: 0,Trilogy Grouping,Title,Rating,Film Order,Total Films in Series,Trilogy Average,Name
0,06d49632c9dc9bcb62aeaef99612ba6b,The Lord of the Rings: The Two Towers,8.7,2,3,8.8,The Lord of the Rings
15,309fc7d3bc53bb63ac42e359260ac740,Star Wars: Return of the Jedi,8.3,6,9,8.533333,Star Wars
72,cd1075d848a5e0142bd3b5d66726041c,Batman Begins,8.2,1,3,8.533333,Batman Begins
36,712a52bc5e7f8dc3cb5de157dbb08151,A Fistful of Dollars,8.0,1,3,8.333333,A Fistful of Dollars
54,a734c94d6e046c4667fea57758c5b6f6,Toy Story 2,7.9,2,4,8.133333,Toy Story 2
18,34f1bcfc647cfa2931f5b1e78d8011d2,Before Midnight,7.9,3,3,8.033333,Before Midnight
33,6d5ababb65e9ff214b73e891b4afe6e8,The Godfather Part III,7.6,3,3,8.6,The Godfather Part III
51,92e4d2da3d1528bc9f6668bbc26d633e,Three Colors: White,7.6,2,3,7.866667,Three Colors
42,854b85cbff2752fcb88606bca76f83c6,A Fistful of Dynamite,7.6,2,3,8.166667,A Fistful of Dynamite
45,86ce15a7f39fc14323b76c9b95c66165,Sympathy for Mr. Vengeance,7.6,1,3,7.866667,Sympathy for Mr. Vengeance
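
Note that `drop_duplicates(subset="Trilogy Grouping")` keeps whichever film appears first for each trilogy, so the `Rating` shown above is not necessarily the trilogy's best one. The Lord of the Rings row shows 8.7 for The Two Towers, while The Return of the King is rated 8.9. If "highest ranking" means the highest film rating per trilogy (my reading of the requirement, so treat it as an assumption), a `groupby(...).max()` is one way to get it. The `films_demo` frame and the `Highest Rating` name below are made up for the sketch.

```python
import pandas as pd

# Made-up film-level data; the first row of each group is not its best film.
films_demo = pd.DataFrame({
    "Trilogy Grouping": ["x", "x", "x", "y", "y", "y"],
    "Rating": [8.7, 8.8, 8.9, 7.4, 7.6, 8.1],
})

# First row per group: the result depends on row order, not on the rating.
first_rows = films_demo.drop_duplicates(subset="Trilogy Grouping")

# Highest rating per group, independent of row order.
highest = (
    films_demo.groupby("Trilogy Grouping")["Rating"]
    .max()
    .rename("Highest Rating")
)

print(first_rows)
print(highest)
```

The resulting series can be merged back onto `films` on `Trilogy Grouping`, the same way `avg_rate` was.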
