# Reading in Electronics Metadata

## Reading JSON from meta_Electronics.json.gz

JSON Document Structure (one record per line of the file)

```
{
  'asin': string (product id),
  'imUrl': string (url to hosted image on Amazon),
  'description': string (product description),
  'categories': [[strings]],
  'title': string (product name)
}
```

In [2]:
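import ast
import gzip

META_ELECTRONICS_PATH = 'Datasets/meta_Electronics.json.gz'

def read_gzip_file(file_name):
    # Open in text mode; the context manager closes the file when done.
    with gzip.open(file_name, 'rt') as g:
        for line in g:
            # Parse the record as a Python literal instead of calling eval(),
            # which would execute any code found in the file.
            document = ast.literal_eval(line)
            print(document)
            break  # Print only the first record to keep the output short.

read_gzip_file(META_ELECTRONICS_PATH)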

{'asin': '0132793040', 'imUrl': 'http://ecx.images-amazon.com/images/I/31JIPhp%2BGIL.jpg', 'description': 'The Kelby Training DVD Mastering Blend Modes in Adobe Photoshop CS5 with Corey Barker is a useful tool for becoming familiar with the use of blend modes in Adobe Photoshop. For those who are serious about mastering all that Photoshop has to offer, mastering blend modes is just as important as mastering layers.In this DVD tutorial, seasoned expert Corey Barker explores the function of blend modes in a variety of scenarios such as image restoration, sharpening, adjustments, special effects and much more. Since every project scenario is different, Corey encourages you to experiment with these blend modes by giving you the skills and confidence you need.', 'categories': [['Electronics', 'Computers & Accessories', 'Cables & Accessories', 'Monitor Accessories']], 'title': 'Kelby Training DVD: Mastering Blend Modes in Adobe Photoshop CS5 By Corey Barker'}
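
The same parsing can be wrapped in a reusable generator so that later cells do not repeat the open-and-parse loop. This is a minimal sketch, not part of the original notebook: `iter_documents` and `first_documents` are hypothetical helpers, and the sketch assumes each line of the file is a Python dict literal (which the notebook's use of `eval` suggests) so that `ast.literal_eval` can parse it.

```python
import ast
import gzip
from itertools import islice

def iter_documents(file_name):
    """Yield one parsed record per line of a gzip-compressed file.

    Hypothetical helper; assumes every line is a Python dict literal.
    """
    with gzip.open(file_name, 'rt') as g:
        for line in g:
            yield ast.literal_eval(line)

def first_documents(file_name, count):
    """Return the first `count` records as a list."""
    return list(islice(iter_documents(file_name), count))
```

For example, `first_documents(META_ELECTRONICS_PATH, 1)` would return a one-element list holding the record printed above, without reading the rest of the file.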


## Read Categories
Each record's `categories` field lists the category paths the product belongs to. The code below reads the first ten records in the electronics metadata and prints their categories.

In [7]:
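import ast
from itertools import islice

def read_categories():
    with gzip.open(META_ELECTRONICS_PATH, 'rt') as g:
        # islice stops after the first ten lines without a manual counter.
        for line in islice(g, 10):
            document = ast.literal_eval(line)
            print(document['categories'])

read_categories()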

[['Electronics', 'Computers & Accessories', 'Cables & Accessories', 'Monitor Accessories']]
[['Electronics', 'Computers & Accessories', 'Cables & Accessories', 'Monitor Accessories']]
[['Electronics', 'Computers & Accessories', 'PDAs, Handhelds & Accessories', 'PDAs & Handhelds']]
[['Electronics', 'Accessories & Supplies', 'Audio & Video Accessories', 'Remote Controls', 'TV Remote Controls']]
[['Electronics', 'GPS & Navigation', 'Vehicle GPS', 'Trucking GPS']]
[['Electronics', 'Accessories & Supplies', 'Audio & Video Accessories', 'Headphones']]
[['Electronics', 'eBook Readers & Accessories', 'Power Adapters']]
[['Electronics', 'eBook Readers & Accessories', 'Skins']]
[['Electronics', 'eBook Readers & Accessories', 'Covers']]
[['Electronics', 'eBook Readers & Accessories', 'Covers']]


## Generate Electronic Subcategory Dictionary
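Read every record's categories and build a nested dictionary (a dictionary of dictionaries) that mirrors the full category hierarchy. Each key is a category name and its value is a dictionary of its subcategories; a category with no subcategories maps to an empty dictionary.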


In [3]:
import ast
import gzip

def create_sub_category_dictionary():
    # Initialize the subcategory dictionary
    sub_category_dictionary = {}

    with gzip.open(META_ELECTRONICS_PATH, 'rt') as g:
        for line in g:
            document = ast.literal_eval(line)

            # Grab the category information from the document
            categories = document['categories']

            # 'categories' is a list of category paths, each a list of strings.
            for category_path in categories:

                # Start each path at the root of the hierarchy.
                current_category_element = sub_category_dictionary

                # Walk down the path, creating any category not seen before.
                for category in category_path:
                    if category not in current_category_element:
                        current_category_element[category] = {}

                    # Descend into this category before handling the next one.
                    current_category_element = current_category_element[category]

    return sub_category_dictionary

print(create_sub_category_dictionary())

{'Electronics': {'Computers & Accessories': {'Cables & Accessories': {'Monitor Accessories': {'Screen Filters': {}, 'Screen Protectors': {}, 'Covers': {}}, 'Video Projector Accessories': {'Lamps': {}, 'Projector Bags & Cases': {'Projector Cases': {}, 'Projector Bags': {}}, 'Lenses': {}}, 'Mice': {}, 'Computer Cable Adapters': {'USB-to-USB Adapters': {}, 'DVI-HDMI Adapters': {}, 'Parallel Adapters': {}, 'Serial Adapters': {}, 'Firewire Adapters': {}, 'USB-to-VGA Adapters': {}, 'Gender Changers': {}, 'SCSI Adapters': {}}, 'Keyboards': {}, 'Memory Cards': {'Micro SD Cards': {}, 'SD & SDHC Cards': {}, 'SmartMedia Cards': {}, 'CompactFlash Cards': {}, 'Multimedia Cards': {}, 'Memory Sticks': {}, 'xD-Picture Cards': {}, 'MiniSD Cards': {}}, 'Computer Speakers': {}, 'Headsets & Microphones': {'PC Microphones': {}, 'PC Headsets': {}}, 'Cables & Interconnects': {'SATA Cables': {}, 'USB Cables': {}, 'VGA Cables': {}, 'Parallel Cables': {}, 'Modem Cables': {}, 'Serial Cables': {}, 'Ethernet Cable
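
To see the nesting on a small scale, the same insertion logic can be applied to a few hand-written category paths. The sketch below is illustrative only: `add_category_path` is a hypothetical helper name, and the sample paths are copied from the ten-record output shown earlier.

```python
def add_category_path(tree, category_path):
    """Insert one category path (a list of names) into a nested dict."""
    node = tree
    for category in category_path:
        # setdefault creates the child dictionary only if it is missing,
        # then returns it so the walk can continue one level down.
        node = node.setdefault(category, {})
    return tree

sample_paths = [
    ['Electronics', 'eBook Readers & Accessories', 'Covers'],
    ['Electronics', 'eBook Readers & Accessories', 'Skins'],
    ['Electronics', 'GPS & Navigation', 'Vehicle GPS', 'Trucking GPS'],
]

tree = {}
for path in sample_paths:
    add_category_path(tree, path)

print(tree)
```

The two eBook paths share their first two levels, so `Covers` and `Skins` end up as siblings under a single `eBook Readers & Accessories` key rather than in separate branches.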

## Displaying Categories

In [None]:
def prettyPrint(data, depth=0):
    # Indent each category by its depth so the hierarchy is visible.
    for item in data:
        print('{}{}'.format('  ' * (depth + 1), item))
        prettyPrint(data[item], depth + 1)

def print_categories_sizes():
    sub_category_dictionary = create_sub_category_dictionary()
    prettyPrint(sub_category_dictionary)
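
The name `print_categories_sizes` suggests reporting how large each branch of the hierarchy is, which the cell above does not do. One possible way to compute that, offered as a sketch rather than the notebook's intended method, is to count the descendants of each category recursively. `count_descendants` is a hypothetical helper, and `sample` is a trimmed excerpt in the same shape as the output of `create_sub_category_dictionary`.

```python
def count_descendants(tree):
    """Return the number of categories nested anywhere below `tree`."""
    # Each child counts as one category, plus everything beneath it.
    return sum(1 + count_descendants(children) for children in tree.values())

# Trimmed excerpt for illustration; the real dictionary is much larger.
sample = {
    'Cables & Accessories': {
        'Mice': {},
        'Keyboards': {},
        'Headsets & Microphones': {'PC Microphones': {}, 'PC Headsets': {}},
    }
}

for name, children in sample['Cables & Accessories'].items():
    print('{}: {}'.format(name, count_descendants(children)))
```

Leaf categories such as `Mice` report zero descendants, while `Headsets & Microphones` reports two.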