# 1. Introduction
Everyone loves Lego (unless you have ever stepped on one). Did you know, by the way, that "Lego" is derived from the Danish phrase *leg godt*, which means "play well"? Unless you speak Danish, probably not.

In this project, we will analyze a fascinating dataset on every single Lego block that has ever been built!

![lego_bricks1.jpeg](attachment:lego_bricks1.jpeg)

# 2. Understanding the Data
A comprehensive database of Lego blocks is provided by Rebrickable. The data is available as CSV files, and the schema is shown below.

![downloads_schema.png](attachment:downloads_schema.png)

In [2]:
# Importing the computational and visualization packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [11]:
# Importing the datasets: colors and sets
colors = pd.read_csv("datasets/colors.csv")
sets = pd.read_csv("datasets/sets.csv")

In [13]:
# Exploring the colors dataset
print("Printing head of Colors dataset\n")
print(colors.head())
print("\nPrinting tail of Colors dataset\n")
print(colors.tail())

Printing head of Colors dataset

   id            name     rgb is_trans
0  -1         Unknown  0033B2        f
1   0           Black  05131D        f
2   1            Blue  0055BF        f
3   2           Green  237841        f
4   3  Dark Turquoise  008F9B        f

Printing tail of Colors dataset

       id                          name     rgb is_trans
130  1004  Trans Flame Yellowish Orange  FCB76D        t
131  1005             Trans Fire Yellow  FBE890        t
132  1006        Trans Light Royal Blue  B4D4F7        t
133  1007                 Reddish Lilac  8E5597        f
134  9999                    [No Color]  05131D        f


In [15]:
# Distinct colors, counted by name
num_colors = colors['name'].nunique()
print("Number of Unique Colors = %s " % num_colors)

Number of Unique Colors = 135 


In [16]:
# Understanding the summary of transparent and non-transparent colors
colors_summary = colors.groupby('is_trans').count()
print(colors_summary)

           id  name  rgb
is_trans                
f         107   107  107
t          28    28   28
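
The same split can also be expressed as proportions rather than raw counts. The block below is a minimal sketch, assuming only the ten rows printed by `head()` and `tail()` above as a stand-in for the full file, so its numbers describe that sample and not the whole dataset. The names `sample`, `trans_counts`, and `trans_share` are illustrative.

```python
import pandas as pd

# Stand-in sample: the ten rows printed by head() and tail() above,
# not the full colors.csv file.
sample = pd.DataFrame({
    "id": [-1, 0, 1, 2, 3, 1004, 1005, 1006, 1007, 9999],
    "name": ["Unknown", "Black", "Blue", "Green", "Dark Turquoise",
             "Trans Flame Yellowish Orange", "Trans Fire Yellow",
             "Trans Light Royal Blue", "Reddish Lilac", "[No Color]"],
    "rgb": ["0033B2", "05131D", "0055BF", "237841", "008F9B",
            "FCB76D", "FBE890", "B4D4F7", "8E5597", "05131D"],
    "is_trans": ["f", "f", "f", "f", "f", "t", "t", "t", "f", "f"],
})

# Raw counts per flag value.
trans_counts = sample["is_trans"].value_counts()

# normalize=True turns the counts into fractions that sum to 1.
trans_share = sample["is_trans"].value_counts(normalize=True)

print(trans_counts)
print(trans_share)
```

Running the same two calls on the full `colors` dataframe should give the proportions that correspond to the 107 and 28 rows shown in the summary.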


In [26]:
# Splitting into transparent and non-transparent dataframes
colors_transparent = colors.query("is_trans == 't'")
colors_ntransparent = colors.query("is_trans == 'f'")
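
An alternative to two separate `query` calls is to convert the `'t'`/`'f'` flags to booleans once and split on that mask. This is only a sketch on a five-row sample taken from the output printed earlier; the column name `is_trans_bool` and the variable names are hypothetical and not part of the Rebrickable schema.

```python
import pandas as pd

# Stand-in sample: five rows taken from the head() and tail() output above.
sample = pd.DataFrame({
    "name": ["Black", "Blue", "Trans Fire Yellow",
             "Trans Light Royal Blue", "Reddish Lilac"],
    "is_trans": ["f", "f", "t", "t", "f"],
})

# Map the 't'/'f' flags to real booleans (is_trans_bool is a hypothetical name).
sample["is_trans_bool"] = sample["is_trans"].map({"t": True, "f": False}).astype(bool)

# One mask and its negation give two subsets that cannot overlap.
mask = sample["is_trans_bool"]
sample_transparent = sample[mask]
sample_opaque = sample[~mask]

print(sample_transparent["name"].tolist())
print(sample_opaque["name"].tolist())
```

Because both subsets come from a single mask, every row lands in exactly one of them.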

In [51]:
# Count of color names per RGB value in the transparent set
transparent_entries = colors_transparent.groupby('rgb').agg({'name': 'count'})
print(transparent_entries.head())


        name
rgb         
0020A0     1
635F52     2
68BCC5     1
84B68D     1
96709F     1


In [52]:
# Count of color names per RGB value in the non-transparent set
ntransparent_entries = colors_ntransparent.groupby('rgb').agg({'name': 'count'})
print(ntransparent_entries.head())

        name
rgb         
000000     3
0033B2     1
0055BF     1
008F9B     1
05131D     2
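
A count above 1 means that several color names map to one RGB value. In the rows printed earlier, for instance, both Black and [No Color] carry `05131D`, which accounts for the count of 2 shown for that code. The sketch below lists the names behind each repeated code, assuming only the non-transparent rows from the `head()` and `tail()` output as sample data; `sample`, `shared`, and `names_by_rgb` are illustrative names.

```python
import pandas as pd

# Stand-in sample: the non-transparent rows printed by head() and tail() above.
sample = pd.DataFrame({
    "id": [-1, 0, 1, 2, 3, 1007, 9999],
    "name": ["Unknown", "Black", "Blue", "Green",
             "Dark Turquoise", "Reddish Lilac", "[No Color]"],
    "rgb": ["0033B2", "05131D", "0055BF", "237841",
            "008F9B", "8E5597", "05131D"],
})

# keep=False flags every row whose rgb value occurs more than once.
shared = sample[sample.duplicated("rgb", keep=False)]

# Collect the names that share each repeated RGB value.
names_by_rgb = shared.groupby("rgb")["name"].apply(list)
print(names_by_rgb)
```

Applied to `colors_ntransparent` or `colors_transparent`, the same two steps would show which names sit behind codes such as `000000` or `635F52`.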
