# Fun with Lego

In this folder there is a CSV file with all Lego sets from 1950 to 2017. Your homework is to use this file to answer the questions below. Just as for the in-class exercise, you should be able to solve everything without any non-standard libraries (except for the optional matplotlib exercise), but you are more than welcome to use any other libraries.

Answer the following questions and put your answers in a dictionary, with the first word of each line (e.g. 'all_pieces' or 'year_most') as the key and your answer as the value. Your homework must also include the code that you used to find the answers.

- 'all_pieces': If you had one of each of the sets, how many pieces of Lego would you have?
- 'year_most': In which year was the highest number of sets released?
- 'average_pieces': What is the average number of pieces in all sets, rounded to 1 decimal?
- 'most_used_word': Which word is used most often in the names of the sets?

You can find more information about the dataset here:
https://www.kaggle.com/rtatman/lego-database

Optional matplotlib exercise:
- Plot the years from 1950 to 2017 on the x-axis and the median number of pieces per set for each year on the y-axis.

Optional themes exercises (for these, you will also need the themes.csv file):
Each set is part of a theme, and a theme may itself be part of a parent theme, which may in turn have a parent of its own. For example, the set 60141-1 is part of theme 80 (Police), which is part of theme 67 (Classic Town), which in turn is part of theme 50 (Town). Theme 50 is a top-level parent theme, so there are no other themes 'above' it.
- Create a dictionary with all parent themes as keys, and a list of all their sub-themes as values. Here, you should only distinguish between a parent theme and any sub-theme. Thus, theme 50 would be a parent theme, and both theme 80 and 67 should be listed on the same level.
- Create a dictionary with all parent themes as keys and the number of sets that belong to each as values. Make sure that each set is counted only once!





In [1]:
import csv
filename = 'legosets.csv'

with open(filename, encoding="utf-8") as legosets_file:
    reader = csv.reader(legosets_file)
    header = next(reader)
    print(header)

    lego_data = list(reader)

['set_num', 'name', 'year', 'theme_id', 'num_parts']


In [2]:
## If you had one of each of the sets, how many pieces of Lego would you have?

# num_parts is column 4; convert each value to int before summing
all_pieces = sum(int(row[4]) for row in lego_data)
    
all_pieces

1894089

In [3]:
## In which year was the highest number of sets released?
year_total = [row[2] for row in lego_data]

# Function to find the most frequent item in a list
from collections import Counter

def most_frequent(list_of_items):
    # Counter tallies every item in a single pass; most_common(1) returns a
    # one-element list holding the (item, count) pair with the highest count.
    return Counter(list_of_items).most_common(1)[0][0]

year_most = most_frequent(year_total)
year_most

'2014'

In [4]:
## What is the average number of pieces in all sets, rounded to 1 decimal?

# Each row of lego_data is one set
num_of_sets = len(lego_data)
average_pieces = round(all_pieces / num_of_sets, 1)

average_pieces

162.3

In [5]:
## Which word is used most often in the names of the sets?

import re

# Create a function to split a string on multiple delimiters.
# The pattern is a raw string so that '\-' is not treated as an invalid escape sequence.
def multi_split(string):
    return re.split(r'[, \-!?:();/]+', string)

set_name_total = []

for row in lego_data:
    for word in multi_split(row[1]):
        set_name_total.append(word)
            
most_used_word = most_frequent(set_name_total)
most_used_word

'Set'
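
One detail of the splitting step above: `re.split` returns an empty string when a name starts or ends with a delimiter, and those empty strings are then counted like any other word. The sketch below uses the same delimiter pattern and drops them before counting. The helper name `split_words` and the sample names are made up for illustration and are not taken from legosets.csv.

```python
import re
from collections import Counter

def split_words(name):
    """Split a name on the same delimiters as multi_split, dropping empty strings."""
    return [word for word in re.split(r'[, \-!?:();/]+', name) if word]

# Made-up sample names (assumption: not real rows from the dataset)
sample_names = ['(Town) Police Station', 'Police Car', 'Fire Station!']

# Count every non-empty word across all sample names
word_counts = Counter(word for name in sample_names for word in split_words(name))
print(word_counts.most_common(3))
```

With the notebook's data, the same idea would be `Counter(word for row in lego_data for word in split_words(row[1]))`.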

In [6]:
solution = {}
solution['all_pieces'] = all_pieces
solution['year_most'] = year_most
solution['average_pieces'] = average_pieces
solution['most_used_word'] = most_used_word
print(solution)

{'all_pieces': 1894089, 'year_most': '2014', 'average_pieces': 162.3, 'most_used_word': 'Set'}
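
The optional matplotlib exercise is not solved in the cells above. A minimal sketch of one possible approach follows, assuming rows in the same column order as the header printed in cell 1. The function names `median_pieces_per_year` and `plot_medians` and the sample rows are hypothetical. matplotlib is imported only inside the plotting helper, so the median calculation runs without it.

```python
from collections import defaultdict
from statistics import median

def median_pieces_per_year(rows):
    """Map each year (int) to the median num_parts of the sets released that year."""
    parts_by_year = defaultdict(list)
    for row in rows:
        # Column 2 is 'year', column 4 is 'num_parts'
        parts_by_year[int(row[2])].append(int(row[4]))
    return {year: median(parts) for year, parts in sorted(parts_by_year.items())}

def plot_medians(medians):
    """Line plot of median pieces per year (requires matplotlib)."""
    import matplotlib.pyplot as plt
    years = list(medians)
    plt.plot(years, [medians[year] for year in years])
    plt.xlabel('Year')
    plt.ylabel('Median number of pieces')
    plt.show()

# Made-up sample rows (assumption: not real data from legosets.csv)
sample_rows = [
    ['a-1', 'Sample A', '1950', '1', '10'],
    ['b-1', 'Sample B', '1950', '1', '30'],
    ['c-1', 'Sample C', '1951', '1', '7'],
    ['d-1', 'Sample D', '1951', '1', '9'],
    ['e-1', 'Sample E', '1951', '1', '100'],
]
sample_medians = median_pieces_per_year(sample_rows)
print(sample_medians)
```

With the notebook's data, the plot would be produced by `plot_medians(median_pieces_per_year(lego_data))`.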


In [7]:
## Open and read themes.csv
filename_theme = 'themes.csv'
with open(filename_theme, encoding="utf-8") as legothemes_file:
    reader_theme = csv.reader(legothemes_file)
    header_theme = next(reader_theme)
    print(header_theme)
    
    theme_data = [line for line in reader_theme]
    print(theme_data)

['id', 'name', 'parent_id']
[['1', 'Technic', ''], ['2', 'Arctic Technic', '1'], ['3', 'Competition', '1'], ['4', 'Expert Builder', '1'], ['5', 'Model', '1'], ['6', 'Airport', '5'], ['7', 'Construction', '5'], ['8', 'Farm', '5'], ['9', 'Fire', '5'], ['10', 'Harbor', '5'], ['11', 'Off-Road', '5'], ['12', 'Race', '5'], ['13', 'Riding Cycle', '5'], ['14', 'Robot', '5'], ['15', 'Traffic', '5'], ['16', 'RoboRiders', '1'], ['17', 'Speed Slammers', '1'], ['18', 'Star Wars', '1'], ['19', 'Supplemental', '1'], ['20', 'Throwbot Slizer', '1'], ['21', 'Universal Building Set', '1'], ['22', 'Creator', ''], ['23', 'Basic Model', '22'], ['24', 'Airport', '23'], ['25', 'Castle', '23'], ['26', 'Construction', '23'], ['27', 'Race', '23'], ['28', 'Harbor', '23'], ['29', 'Train', '23'], ['30', 'Traffic', '23'], ['31', 'Creature', '23'], ['32', 'Robot', '23'], ['33', 'Food & Drink', '23'], ['34', 'Building', '23'], ['35', 'Cargo', '23'], ['36', 'Fire', '23'], ['37', 'Basic Set', '22'], ['38', 'Model', '22'

In [9]:
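## Create a dictionary with all parent themes as keys, and a list of all their sub-themes as values.
## Note: this groups each theme under its direct parent, so intermediate themes
## such as '5' and '23' also appear as keys.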
theme_dict = {}
for line in theme_data:
    if line[2] != '':
        theme_dict.setdefault(line[2],[]).append(line[0])
        
print(theme_dict)

{'1': ['2', '3', '4', '5', '16', '17', '18', '19', '20', '21'], '5': ['6', '7', '8', '9', '10', '11', '12', '13', '14', '15'], '22': ['23', '37', '38', '48', '49'], '23': ['24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36'], '38': ['39', '40', '41', '42', '43', '44', '45', '46', '47'], '50': ['51', '52', '67', '86', '87', '88', '89', '90', '91', '92', '93', '94', '104', '105'], '52': ['53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '614'], '67': ['68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85'], '94': ['95', '96', '97', '98', '99', '100', '101', '102', '103'], '105': ['106', '107', '108', '109', '110', '111'], '112': ['113', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '125'], '123': ['124'], '126': ['127', '128', '129', '130', '131', '132', '133', '134', '135', '136', '137', '138', '139', '140', '141', '142', '143', '144', '145', '146'], '14
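
The dictionary above is keyed by direct parent, whereas the exercise asks for top-level parent themes only, with every descendant listed on the same level. A possible sketch of that flattening follows, assuming rows of the form `[id, name, parent_id]` as in themes.csv and assuming the parent links contain no cycles. The helper names `find_root` and `group_by_root` are hypothetical. The sample rows reuse the Town / Classic Town / Police example from the assignment text.

```python
def find_root(theme_id, parent_of):
    """Follow parent links upward until reaching a theme with no parent."""
    while parent_of.get(theme_id):
        theme_id = parent_of[theme_id]
    return theme_id

def group_by_root(theme_rows):
    """Map each top-level theme id to a flat list of all its descendant theme ids."""
    parent_of = {row[0]: row[2] for row in theme_rows}
    groups = {}
    for theme_id, parent_id in parent_of.items():
        if parent_id == '':
            # Top-level theme: make sure it appears as a key even without sub-themes
            groups.setdefault(theme_id, [])
        else:
            groups.setdefault(find_root(theme_id, parent_of), []).append(theme_id)
    return groups

# Small sample in the same [id, name, parent_id] layout as themes.csv
sample_themes = [
    ['50', 'Town', ''],
    ['67', 'Classic Town', '50'],
    ['80', 'Police', '67'],
    ['1', 'Technic', ''],
]
root_groups = group_by_root(sample_themes)
print(root_groups)
```

With the notebook's data, `group_by_root(theme_data)` would give the flattened dictionary. Whether top-level themes without sub-themes should appear with an empty list is a design choice; this sketch includes them.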

In [10]:
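## Create a dictionary with all parent themes as keys and the number of the sets that are part of it
## (first step only: collect the theme_id of every set; the counting itself is not done in this cell)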

num_set_dict = {}
theme_id_list = [row[3] for row in lego_data]
print(theme_id_list)


['414', '84', '199', '143', '143', '143', '143', '186', '413', '413', '413', '366', '67', '413', '366', '366', '366', '366', '502', '366', '366', '366', '366', '366', '469', '186', '233', '233', '254', '254', '254', '254', '254', '254', '254', '243', '254', '254', '254', '238', '238', '238', '238', '238', '158', '174', '243', '236', '236', '404', '237', '324', '276', '237', '172', '239', '387', '174', '75', '75', '186', '147', '85', '206', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '254', '255', '254', '227', '227', '227', '229', '256', '256', '256', '488', '256', '256', '257', '227', '227', '500', '500', '500', '231', '227', '243', '254', '254', '254', '254', '254', '500', '500', '500', '459', '169', '276', '461', '239', '174', '243', '169', '246', '236', '174', '174', '169', '254', '254', '254', '254', '254', '254', '278', '276', '276', '276', '244', '276', '85', '239', '239', '53'
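
The cell above only collects the theme ids, and `num_set_dict` stays empty. One way the count could be completed is sketched below, under the same assumptions as before: set rows follow the legosets.csv column order, theme rows are `[id, name, parent_id]`, and the parent links contain no cycles. The function name `count_sets_per_root` and the sample rows are hypothetical. Each set is attributed to exactly one top-level theme, and repeated set numbers are skipped so that no set is counted twice.

```python
from collections import Counter

def count_sets_per_root(set_rows, theme_rows):
    """Count sets per top-level theme, counting each set_num only once."""
    parent_of = {row[0]: row[2] for row in theme_rows}

    def find_root(theme_id):
        # Walk up the parent links until a theme without a parent is reached
        while parent_of.get(theme_id):
            theme_id = parent_of[theme_id]
        return theme_id

    counts = Counter()
    seen_sets = set()
    for row in set_rows:
        set_num, theme_id = row[0], row[3]
        if set_num in seen_sets:
            continue  # guard against duplicate rows
        seen_sets.add(set_num)
        counts[find_root(theme_id)] += 1
    return counts

# Made-up sample rows (assumption: not real data from the CSV files)
sample_themes = [
    ['50', 'Town', ''],
    ['67', 'Classic Town', '50'],
    ['80', 'Police', '67'],
    ['1', 'Technic', ''],
]
sample_sets = [
    ['a-1', 'Sample A', '2000', '80', '0'],
    ['b-1', 'Sample B', '2000', '67', '0'],
    ['c-1', 'Sample C', '2000', '1', '0'],
    ['c-1', 'Sample C', '2000', '1', '0'],  # duplicate row, counted once
]
sample_counts = count_sets_per_root(sample_sets, sample_themes)
print(dict(sample_counts))
```

With the notebook's data, `num_set_dict = dict(count_sets_per_root(lego_data, theme_data))` would fill the dictionary.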