![star_wars_unsplash](star_wars_unsplash.jpg)

Lego is a household name across the world, supported by a diverse toy line, hit movies, and a series of successful video games. In this project, we are going to explore a key development in the history of Lego: the introduction of licensed sets such as Star Wars, Super Heroes, and Harry Potter.

The introduction of its first licensed series, Star Wars, was a hit that sparked a series of collaborations on further themed sets. The partnerships team has asked you to analyze this success. Before you start, they suggest reading the descriptions of the two datasets below.

## The Data

You have been provided with two datasets to use. A summary and preview are provided below.

## lego_sets.csv

| Column     | Description              |
|------------|--------------------------|
| `"set_num"` | A code that is unique to each set in the dataset. This column is critical, and a missing value indicates the set is a duplicate or invalid! |
| `"name"` | The name of the set. |
| `"year"` | The year the set was released. |
| `"num_parts"` | The number of parts contained in the set. This column is not central to our analyses, so missing values are acceptable. |
| `"theme_name"` | The name of the sub-theme of the set. |
| `"parent_theme"` | The name of the parent theme the set belongs to. Matches the `"name"` column of `parent_themes.csv`. |
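
The description of `"set_num"` implies that rows without a set number should be excluded before any counting, while missing `"num_parts"` values can stay. A minimal sketch of that cleaning step follows; the `toy_sets` rows and the `clean_sets` name are made up for illustration and are not taken from `lego_sets.csv`.

```python
import pandas as pd

# Hypothetical toy data, not the real dataset
toy_sets = pd.DataFrame({
    "set_num": ["10018-1", None, "10075-1", None],
    "name": ["Darth Maul", "Unknown A", "Spider-Man Action Pack", "Unknown B"],
    "num_parts": [1868.0, 12.0, None, 10.0],
})

# Drop only the rows whose set_num is missing;
# a missing num_parts value is acceptable and is kept
clean_sets = toy_sets.dropna(subset=["set_num"])

print(len(toy_sets), len(clean_sets))
```

Restricting `dropna` to `subset=["set_num"]` avoids discarding valid sets that merely lack a part count.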

## parent_themes.csv

| Column     | Description              |
|------------|--------------------------|
| `"id"` | A code that is unique to every theme. |
| `"name"` | The name of the parent theme. |
| `"is_licensed"` | A Boolean column specifying whether the theme is a licensed theme. |

In [3]:
# Import pandas, read and inspect the datasets
import pandas as pd
pd.set_option("display.width", 1000)

lego_sets = pd.read_csv('data/lego_sets.csv')
print(lego_sets.head())

  set_num                        name  year  num_parts    theme_name parent_theme
0    00-1             Weetabix Castle  1970      471.0        Castle     Legoland
1  0011-2           Town Mini-Figures  1978        NaN  Supplemental         Town
2  0011-3  Castle 2 for 1 Bonus Offer  1987        NaN  Lion Knights       Castle
3  0012-1          Space Mini-Figures  1979       12.0  Supplemental        Space
4  0013-1          Space Mini-Figures  1979       12.0  Supplemental        Space


In [2]:
parent_themes = pd.read_csv('data/parent_themes.csv')
print(parent_themes.head())

    id     name  is_licensed
0    1  Technic        False
1   22  Creator        False
2   50     Town        False
3  112   Racers        False
4  126    Space        False
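
Because `"parent_theme"` in the sets data is described as matching `"name"` in the themes data, one way to attach the `"is_licensed"` flag to each set is a left merge. The sketch below uses small made-up frames (`toy_sets`, `toy_themes`, and their ids are hypothetical) and also checks that every set found a matching theme.

```python
import pandas as pd

# Hypothetical toy data standing in for the two CSV files
toy_sets = pd.DataFrame({
    "set_num": ["10018-1", "10075-1", "0012-1"],
    "parent_theme": ["Star Wars", "Super Heroes", "Space"],
})
toy_themes = pd.DataFrame({
    "id": [1, 2, 3],  # made-up ids
    "name": ["Star Wars", "Super Heroes", "Space"],
    "is_licensed": [True, True, False],
})

# Left merge keeps every set; indicator=True adds a "_merge" column
# showing whether a matching theme was found
merged = toy_sets.merge(
    toy_themes,
    left_on="parent_theme",
    right_on="name",
    how="left",
    indicator=True,
)

# Sets whose parent_theme has no counterpart in the themes table
unmatched = merged[merged["_merge"] == "left_only"]

print(len(unmatched))
print(merged[["set_num", "parent_theme", "is_licensed"]])
```

If `unmatched` were not empty, those sets would have no license information and would need a closer look before filtering.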


The team responsible for the Star Wars partnership has asked for specific information in preparation for their meeting:

- What percentage of all **licensed** sets ever released were Star Wars themed? Save your answer as a variable `the_force`, as an integer (e.g. 25).

In [4]:
licensed_themes = parent_themes[parent_themes['is_licensed']]['name']
print(licensed_themes.head())

7                    Star Wars
12                Harry Potter
16    Pirates of the Caribbean
17               Indiana Jones
18                        Cars
Name: name, dtype: object


In [5]:
licensed = lego_sets['parent_theme'].isin(licensed_themes)
licensed_sets = lego_sets[licensed]
print(licensed_sets.head())

    set_num                           name  year  num_parts               theme_name  parent_theme
44  10018-1                     Darth Maul  2001     1868.0                Star Wars     Star Wars
45  10019-1    Rebel Blockade Runner - UCS  2001        NaN  Star Wars Episode 4/5/6     Star Wars
54  10026-1        Naboo Starfighter - UCS  2002        NaN      Star Wars Episode 1     Star Wars
57  10030-1  Imperial Star Destroyer - UCS  2002     3115.0  Star Wars Episode 4/5/6     Star Wars
95  10075-1         Spider-Man Action Pack  2002       25.0               Spider-Man  Super Heroes


In [8]:
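# Total number of licensed sets (len counts every row, including any with a missing set_num)
all_sets = len(licensed_sets)
# Number of licensed sets whose parent theme is Star Wars
star_wars_sets = len(licensed_sets[licensed_sets['parent_theme'] == 'Star Wars'])
# int() truncates the percentage toward zero rather than rounding it
the_force = int((star_wars_sets / all_sets) * 100)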
print(f'The percentage of licensed sets that are Star Wars themed is {the_force}%.')

The percentage of licensed sets that are Star Wars themed is 45%.
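
As a cross-check on this kind of calculation, the share of every licensed theme can be computed in one step with `value_counts(normalize=True)`. This is only a sketch on made-up rows; `toy_licensed` and `star_wars_share` are hypothetical names, and the numbers do not come from the real data.

```python
import pandas as pd

# Hypothetical toy data: four licensed sets, two of them Star Wars
toy_licensed = pd.DataFrame({
    "set_num": ["a-1", "b-1", "c-1", "d-1"],
    "parent_theme": ["Star Wars", "Star Wars", "Super Heroes", "Harry Potter"],
})

# Fraction of rows per theme, scaled to a percentage
shares = toy_licensed["parent_theme"].value_counts(normalize=True) * 100

# int() truncates toward zero, matching the integer format requested
star_wars_share = int(shares["Star Wars"])

print(star_wars_share)
```

Seeing all themes side by side makes it easier to spot a filtering mistake, since the shares should sum to 100.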


- In which year was the highest number of Star Wars sets released? Save your answer as a variable `new_era`, as an integer (e.g. 2012).

In [13]:
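# Count sets per year (rows) and licensed theme (columns); 'count' ignores missing set_num values
licensed_pivot = licensed_sets.pivot_table(index='year', columns='parent_theme', values='set_num', aggfunc='count')

# Keep only the Star Wars column, ordered from the busiest year down
licensed_pivot_sorted = licensed_pivot.sort_values(by="Star Wars", ascending=False)["Star Wars"]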
print(licensed_pivot_sorted)

year
2016    61.0
2015    58.0
2017    55.0
2014    45.0
2012    43.0
2009    39.0
2013    35.0
2003    32.0
2011    32.0
2010    30.0
2002    28.0
2005    28.0
2000    26.0
2008    23.0
2004    20.0
2007    16.0
2001    14.0
1999    13.0
2006    11.0
Name: Star Wars, dtype: float64


In [22]:
new_era = int(licensed_pivot_sorted.index[0])  # cast the NumPy integer to a plain Python int
print(f'The year when the most Star Wars sets were released was {new_era}.')

The year when the most Star Wars sets were released was 2016.
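
An alternative that skips the pivot table is to filter to Star Wars first, count sets per year with `groupby`, and take the label of the maximum with `idxmax`. The sketch below runs on made-up rows; `toy_licensed`, `sets_per_year`, and `peak_year` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data, not the real dataset
toy_licensed = pd.DataFrame({
    "set_num": ["a-1", "b-1", "c-1", "d-1", "e-1"],
    "year": [1999, 2016, 2016, 2015, 2016],
    "parent_theme": ["Star Wars", "Star Wars", "Star Wars", "Star Wars", "Super Heroes"],
})

# Keep only Star Wars sets
star_wars = toy_licensed[toy_licensed["parent_theme"] == "Star Wars"]

# Number of non-missing set_num values per release year
sets_per_year = star_wars.groupby("year")["set_num"].count()

# idxmax returns the index label (the year) with the highest count
peak_year = int(sets_per_year.idxmax())

print(peak_year)
```

Because only one theme is counted, the result stays an integer Series, whereas a pivot across themes introduces missing values and float counts.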
