# Lego set analysis
In this weekly brief we'll use only vanilla Python to load a dataset and answer a few questions:
- How many sets have been released during the year 2000?
- Which theme has more sets - City or Star Wars?
- What is the average (mean) number of pieces in a set (excluding sets without pieces)?
- What are the top 10 themes containing the most sets?

## 1. Load and parse the CSV
To get the data into a format we can work with, we'll open the CSV file and use `csv.DictReader` from the standard library to convert it into a list of dictionaries, one per row, keyed by the column names in the header.

```python
import csv
from pprint import pprint

# newline="" is how the csv module documentation recommends opening CSV files
with open("lego_sets.csv", encoding="utf8", newline="") as csvfile:
    # DictReader is an iterator, so it reads the file one row at a time.
    # We want all rows in memory so we can loop over them several times,
    # which is why we wrap it in `list()`.
    data = list(csv.DictReader(csvfile))

# pprint sorts dictionary keys, so the columns appear alphabetically
pprint(data[0])
```

```
{'Availability': '{Not specified}',
 'Category': 'Normal',
 'Current_Price': '',
 'Minifigures': '',
 'Name': 'PreSchool Set',
 'Num_Instructions': '0',
 'Owned': '10.0',
 'Packaging': '{Not specified}',
 'Pieces': '16.0',
 'Rating': '0.0',
 'Set_ID': '75-1',
 'Subtheme': '',
 'Theme': 'PreSchool',
 'Theme_Group': 'Pre-school',
 'Total_Quantity': '',
 'USD_MSRP': '',
 'Year': '1975'}
```
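
The sketch below shows why every value arrives as a string. It runs `csv.DictReader` on a small in-memory CSV. The two rows and the shortened column list are made up for illustration. `io.StringIO` stands in for the real file.

```python
import csv
import io

# Made-up sample with a subset of the real columns; io.StringIO stands in for the file
sample_csv = io.StringIO(
    "Set_ID,Name,Year,Theme,Pieces\n"
    "75-1,PreSchool Set,1975,PreSchool,16.0\n"
    "X-1,Hypothetical Set,2000,City,\n"
)

sample_rows = list(csv.DictReader(sample_csv))

# Each row is a dict keyed by the header row. Every value is a str,
# and an empty field comes back as "" rather than None
for row in sample_rows:
    print(row["Name"], repr(row["Year"]), repr(row["Pieces"]))
```

DictReader never guesses types. Converting values to numbers is left to us, which is what the next step does.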


## 2. Transforming
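Looking at the sample above, we can see that every value is a string. To make the data easier to work with, let's convert the fields we need for calculations into numbers. Here that is only `Pieces`: non-empty values become floats, and empty strings are left as they are. `Year` stays a string, which is why the comparisons below use `"2000"`.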

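```python
for lego_set in data:
    # Convert non-empty piece counts from strings like "16.0" to floats;
    # empty strings are left unchanged (they are falsy, which we rely on later)
    if lego_set["Pieces"]:
        lego_set["Pieces"] = float(lego_set["Pieces"])
```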
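
The loop above converts only `Pieces`. If you wanted to clean several numeric columns at once, one option is a small helper that turns empty strings into `None` and everything else into an integer. `to_int` and the choice of columns are hypothetical and not part of the original notebook. The helper assumes the values are whole numbers written like `"16.0"`. The sample rows are made up.

```python
def to_int(value):
    """Hypothetical helper: turn a CSV string like "16.0" into 16, or "" into None.

    Assumes whole-number values; a fractional value would be truncated.
    """
    # Counts are stored with a ".0" suffix, so parse as float first, then int
    return int(float(value)) if value else None


# Made-up rows in the shape csv.DictReader produces (all values are strings)
sample_sets = [
    {"Name": "PreSchool Set", "Year": "1975", "Pieces": "16.0"},
    {"Name": "Hypothetical Set", "Year": "2000", "Pieces": ""},
]

# Assumption: these are the columns worth converting for this analysis
for lego_set in sample_sets:
    for column in ("Year", "Pieces"):
        lego_set[column] = to_int(lego_set[column])

print(sample_sets)
```

Applying this to `data` would have two consequences. First, the year checks in section 3 would need to compare against the integer `2000` instead of the string `"2000"`. Second, missing piece counts would be `None` instead of `""`. Both are falsy, so an `if lego_set["Pieces"]` filter would still skip them.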
        

## 3. Analysis
Now we can start doing some analysis.

### 3.1 How many sets have been released during the year 2000?

```python
# Using a simple for-loop

count_2000 = 0

for lego_set in data:
    if lego_set["Year"] == "2000":
        count_2000 += 1

print(count_2000)
```

```
384
```


```python
# Using a filtered list

filtered_data = [lego_set for lego_set in data if lego_set["Year"] == "2000"]

print(len(filtered_data))
```

```
384
```


```python
# Using a defaultdict to group by year

from collections import defaultdict

group_by_year = defaultdict(int)

for lego_set in data:
    group_by_year[lego_set["Year"]] += 1

print(group_by_year["2000"])
```

```
384
```
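
`collections.Counter` is a dict subclass built for exactly this kind of tally, so the `defaultdict` loop can also be written as a single expression. The sketch below uses a few made-up rows in `sample_sets` instead of the real `data`.

```python
from collections import Counter

# Made-up rows standing in for `data`; only the Year column matters here
sample_sets = [
    {"Year": "1999"},
    {"Year": "2000"},
    {"Year": "2000"},
    {"Year": "2001"},
]

# Counter consumes the generator and counts each distinct year
year_counts = Counter(lego_set["Year"] for lego_set in sample_sets)

print(year_counts["2000"])
# A year that never appears counts as 0, and looking it up does not add a key
print(year_counts["1850"])
```

Like `defaultdict(int)`, a `Counter` returns 0 for missing keys. Unlike `defaultdict`, it does not insert the missing key on lookup.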


### 3.2 Which theme has more sets - City or Star Wars?

```python
# Using a simple for-loop

count_city = 0
count_starwars = 0

for lego_set in data:
    if lego_set["Theme"] == "City":
        count_city += 1
    elif lego_set["Theme"] == "Star Wars":
        count_starwars += 1

print(count_city)
print(count_starwars)
```

```
770
723
```


```python
# Using a defaultdict to group by theme

from collections import defaultdict

group_by_theme = defaultdict(int)

for lego_set in data:
    group_by_theme[lego_set["Theme"]] += 1

print(f"Lego City - {group_by_theme['City']} sets")
print(f"Lego Star Wars - {group_by_theme['Star Wars']} sets")
```

```
Lego City - 770 sets
Lego Star Wars - 723 sets
```
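
The cells above print both counts and leave the comparison to the reader. To have Python pick the larger theme, you could combine `Counter` with `max`. The sketch below runs on made-up rows:

```python
from collections import Counter

# Made-up rows standing in for `data`
sample_sets = [
    {"Theme": "City"},
    {"Theme": "Star Wars"},
    {"Theme": "City"},
    {"Theme": "Technic"},
]

theme_counts = Counter(lego_set["Theme"] for lego_set in sample_sets)

# max() compares the candidate themes by their count;
# on a tie it returns the first theme in the list
candidates = ["City", "Star Wars"]
winner = max(candidates, key=lambda theme: theme_counts[theme])

print(f"{winner} has more sets ({theme_counts[winner]})")
```

If the two counts could be equal, check for that case explicitly rather than relying on `max` returning the first candidate.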


### 3.3 What is the average (mean) number of pieces in a set (excluding sets without pieces)?

```python
# Using a for-loop

sum_pieces = 0
count_pieces = 0

for lego_set in data:
    if lego_set["Pieces"]:  # Skip sets with no piece count ("" or 0.0)
        sum_pieces += lego_set["Pieces"]
        count_pieces += 1

print(sum_pieces / count_pieces)
```

```
238.15334197088637
```


```python
# Using a filtered list

filtered_data = [lego_set["Pieces"] for lego_set in data if lego_set["Pieces"]]

sum_pieces = sum(filtered_data)
count_pieces = len(filtered_data)

print(sum_pieces / count_pieces)
```

```
238.15334197088637
```
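
The standard library's `statistics.mean` does the sum-and-divide in one call. The sketch below uses made-up piece counts and assumes `Pieces` has already been converted as in section 2, with `""` for missing values.

```python
import statistics

# Made-up rows: Pieces already converted to float, missing values left as ""
sample_sets = [
    {"Pieces": 16.0},
    {"Pieces": ""},
    {"Pieces": 100.0},
    {"Pieces": 0.0},
    {"Pieces": 34.0},
]

# Same filter as above: "" and 0.0 are both falsy, so both are skipped
pieces = [lego_set["Pieces"] for lego_set in sample_sets if lego_set["Pieces"]]

average = statistics.mean(pieces)
print(average)
```

The behavior on an empty list differs from the manual version. `statistics.mean` raises `statistics.StatisticsError` there, while `sum(...) / len(...)` raises `ZeroDivisionError`.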


### 3.4 What are the top 10 themes containing the most sets?

```python
from collections import defaultdict

group_by_theme = defaultdict(int)

for lego_set in data:
    group_by_theme[lego_set["Theme"]] += 1

# Sort (theme, count) pairs by count, highest first
sorted_themes = sorted(group_by_theme.items(), key=lambda x: x[1], reverse=True)

for theme, count in sorted_themes[:10]:
    print(f"{theme} - {count}")
```

```
Duplo - 1278
Gear - 1232
Collectable Minifigures - 804
City - 770
Star Wars - 723
Town - 648
Creator - 528
Technic - 481
Friends - 429
Basic - 402
```
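
`Counter.most_common(n)` combines the counting with the sort-and-slice step. It returns `(element, count)` pairs from the highest count down. The sketch below runs on made-up rows:

```python
from collections import Counter

# Made-up rows standing in for `data`
sample_sets = [
    {"Theme": "Duplo"}, {"Theme": "Gear"}, {"Theme": "Duplo"},
    {"Theme": "City"}, {"Theme": "Gear"}, {"Theme": "Duplo"},
]

theme_counts = Counter(lego_set["Theme"] for lego_set in sample_sets)

# most_common(2) keeps only the two themes with the highest counts
top_themes = theme_counts.most_common(2)

for theme, count in top_themes:
    print(f"{theme} - {count}")
```

On the real data, the equivalent call would be `theme_counts.most_common(10)`, with `theme_counts` built from `data`.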
