# Lego Data Analysis: Unveiling the Star Wars Influence

## Introduction

Welcome to this Jupyter Notebook, where we embark on an exploratory journey through the fascinating world of Lego sets. In this analysis, we focus on unraveling the Star Wars influence on licensed Lego sets and investigating the popularity of this iconic theme over the years.

### The Lego Dataset

The dataset comes in two files. `lego_sets.csv` lists each set's number, name, release year, part count, theme, and parent theme; `parent_themes.csv` lists each parent theme with an ID and a flag for whether it is licensed. Our objective is to use both files to answer two questions about the prevalence and popularity of Star Wars-themed Lego sets.

### Questions to Explore

1. **What percentage of all licensed sets ever released were Star Wars themed?**
   - We'll dive into the dataset to calculate the proportion of licensed sets that belong to the Star Wars theme. This analysis will provide insights into the extent of Star Wars' influence within the Lego licensing landscape.

2. **In which year was Star Wars not the most popular licensed theme (in terms of the number of sets released that year)?**
   - We'll examine the yearly distribution of licensed themes, identifying the year when Star Wars faced competition in terms of the sheer number of sets released. This exploration will reveal intriguing patterns in Lego's licensing history.

### Methodology

Our analysis employs Python and popular data analysis libraries such as Pandas, Matplotlib, and Seaborn. We'll load and preprocess the Lego dataset, perform statistical calculations, and create visualizations to present our findings in a clear and informative manner.

So, let's dive into the Lego dataset and unravel the stories behind the bricks, uncovering the dominance and evolution of the Star Wars theme among licensed Lego sets.


## Question 1: What percentage of all licensed sets ever released were Star Wars themed?

In [50]:
# DataFrame Creation
import pandas as pd

legoSet=pd.read_csv('lego_sets.csv')
ptSet=pd.read_csv('parent_themes.csv')
print(legoSet.shape)
legoSet.head()

(11986, 6)


Unnamed: 0,set_num,name,year,num_parts,theme_name,parent_theme
0,00-1,Weetabix Castle,1970,471.0,Castle,Legoland
1,0011-2,Town Mini-Figures,1978,,Supplemental,Town
2,0011-3,Castle 2 for 1 Bonus Offer,1987,,Lion Knights,Castle
3,0012-1,Space Mini-Figures,1979,12.0,Supplemental,Space
4,0013-1,Space Mini-Figures,1979,12.0,Supplemental,Space


In [68]:
# Cleaning is important: find rows with a missing set number

missing_rows = legoSet[legoSet['set_num'].isnull()]
missing_rows.shape

(153, 6)
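
The check above looks at `set_num` only. As a minimal sketch of a broader check, the cell below counts missing values in every column at once. The `toy_sets` frame and its rows are invented stand-ins for `lego_sets.csv`, so the counts it prints say nothing about the real data.

```python
import pandas as pd

# Toy stand-in for lego_sets.csv; the rows are invented for illustration.
toy_sets = pd.DataFrame({
    "set_num": ["00-1", None, "0011-3", None],
    "name": ["Weetabix Castle", None, "Castle 2 for 1 Bonus Offer", None],
    "year": [1970, 1999, 1987, 2001],
    "num_parts": [471.0, None, None, 12.0],
})

# Number of missing values in every column, in one call.
missing_per_column = toy_sets.isna().sum()
print(missing_per_column)

# Rows that lack a set number, the same kind of filter the cell above applies.
rows_without_set_num = toy_sets[toy_sets["set_num"].isna()]
print(rows_without_set_num.shape)
```

Running the per-column count first makes it easier to decide which column should drive the row removal in the next step.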

In [69]:
# Remove rows with a missing set number

legoSet = legoSet.dropna(subset=['set_num'])
legoSet.shape

(11833, 6)

In [70]:
# Join DataFrame
df1=legoSet.merge(ptSet, how='left', left_on='parent_theme', right_on='name')
df1.drop(columns='name_y', inplace=True)
print(df1.shape)
df1.head()

(11833, 8)


Unnamed: 0,set_num,name_x,year,num_parts,theme_name,parent_theme,id,is_licensed
0,00-1,Weetabix Castle,1970,471.0,Castle,Legoland,411,False
1,0011-2,Town Mini-Figures,1978,,Supplemental,Town,50,False
2,0011-3,Castle 2 for 1 Bonus Offer,1987,,Lion Knights,Castle,186,False
3,0012-1,Space Mini-Figures,1979,12.0,Supplemental,Space,126,False
4,0013-1,Space Mini-Figures,1979,12.0,Supplemental,Space,126,False
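
A left join keeps every set even when its parent theme has no match in the theme table, so it can be worth checking the join itself. The sketch below uses the `validate` and `indicator` parameters of `DataFrame.merge` for that purpose. The two toy frames, including the unmatched `"Mystery"` theme, are assumptions made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the two CSV files; values are invented for illustration.
toy_sets = pd.DataFrame({
    "set_num": ["A-1", "B-1", "C-1"],
    "parent_theme": ["Star Wars", "Town", "Mystery"],
})
toy_themes = pd.DataFrame({
    "id": [158, 50],
    "name": ["Star Wars", "Town"],
    "is_licensed": [True, False],
})

merged = toy_sets.merge(
    toy_themes,
    how="left",
    left_on="parent_theme",
    right_on="name",
    validate="many_to_one",  # raises MergeError if a theme name repeats
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Sets whose parent theme has no match in the theme table.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["set_num"].tolist())
```

Unmatched sets would carry a missing `is_licensed` value after the join, so they would silently drop out of any later filter on that column.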


In [71]:
# Keep only licensed sets
df1 = df1[df1['is_licensed'] == True]
print(df1.shape)  # 1179 sets are licensed
df1.head()

(1179, 8)


Unnamed: 0,set_num,name_x,year,num_parts,theme_name,parent_theme,id,is_licensed
44,10018-1,Darth Maul,2001,1868.0,Star Wars,Star Wars,158,True
45,10019-1,Rebel Blockade Runner - UCS,2001,,Star Wars Episode 4/5/6,Star Wars,158,True
54,10026-1,Naboo Starfighter - UCS,2002,,Star Wars Episode 1,Star Wars,158,True
57,10030-1,Imperial Star Destroyer - UCS,2002,3115.0,Star Wars Episode 4/5/6,Star Wars,158,True
95,10075-1,Spider-Man Action Pack,2002,25.0,Spider-Man,Super Heroes,482,True


In [72]:
# Count Star Wars License
df2=df1[df1['parent_theme']=='Star Wars']
print(df2.shape) ## 609 licensed Sets are Star Wars Theme
df2.head()

(609, 8)


Unnamed: 0,set_num,name_x,year,num_parts,theme_name,parent_theme,id,is_licensed
44,10018-1,Darth Maul,2001,1868.0,Star Wars,Star Wars,158,True
45,10019-1,Rebel Blockade Runner - UCS,2001,,Star Wars Episode 4/5/6,Star Wars,158,True
54,10026-1,Naboo Starfighter - UCS,2002,,Star Wars Episode 1,Star Wars,158,True
57,10030-1,Imperial Star Destroyer - UCS,2002,3115.0,Star Wars Episode 4/5/6,Star Wars,158,True
116,10123-1,Cloud City,2003,707.0,Star Wars Episode 4/5/6,Star Wars,158,True


In [73]:
# percentage calculation
percentage=(df2.shape[0]/df1.shape[0])*100
print(percentage)

51.653944020356235
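
The figure above is 609 Star Wars sets divided by 1179 licensed sets. As a cross-check, the same kind of share can be read from `value_counts(normalize=True)`, which gives every theme's proportion in one step. This is a sketch on an invented `toy_licensed` frame, not the notebook's data.

```python
import pandas as pd

# Toy licensed-set table; rows are invented for illustration.
toy_licensed = pd.DataFrame({
    "set_num": ["A-1", "B-1", "C-1", "D-1"],
    "parent_theme": ["Star Wars", "Star Wars", "Super Heroes", "Harry Potter"],
})

# Share of each parent theme among licensed sets, as percentages.
theme_share = toy_licensed["parent_theme"].value_counts(normalize=True) * 100
star_wars_pct = theme_share["Star Wars"]
print(round(star_wars_pct, 2))
```

Seeing all themes side by side also shows how far ahead the leading theme is, which a single ratio does not.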


### Ans: About 51.65% of all licensed sets ever released were Star Wars themed

## Question 2: In which year was Star Wars not the most popular licensed theme (in terms of number of sets released that year)?

In [76]:
df2.groupby('year').sum()




year,num_parts,id,is_licensed
1999,1384.0,2054,13
2000,2580.0,4108,26
2001,2949.0,2212,14
2002,4735.0,4424,28
2003,6660.0,5056,32
2004,1659.0,3160,20
2005,4730.0,4424,28
2006,2769.0,1738,11
2007,11361.0,2528,16
2008,6865.0,3634,23


### Ans: 2006
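
The table above counts only Star Wars sets per year, so on its own it cannot show which licensed theme led in a given year. One way to compare themes directly is to count sets per year and parent theme, then pick the largest count in each year. The sketch below does this on an invented `toy_licensed` frame. Its years and counts are assumptions for illustration and do not reflect the real dataset.

```python
import pandas as pd

# Toy licensed-set table; years and themes are invented for illustration.
toy_licensed = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2016, 2016],
    "parent_theme": ["Star Wars", "Star Wars", "Super Heroes",
                     "Super Heroes", "Super Heroes", "Star Wars"],
})

# Sets per (year, theme), reshaped to one row per year and one column per theme.
counts = (
    toy_licensed.groupby(["year", "parent_theme"])
    .size()
    .unstack(fill_value=0)
)

# Theme with the most sets in each year.
top_theme = counts.idxmax(axis=1)

# Years in which the leading theme is not Star Wars.
years_not_star_wars = top_theme[top_theme != "Star Wars"].index.tolist()
print(years_not_star_wars)
```

Applied to the licensed-sets frame (`df1` in this notebook), the same steps would list every year in which another theme had more sets. Note that `idxmax` returns the first column on a tie, so tied years would need a separate check.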