# Amazon Reviews - Graphing Predictions for Analysis

In the previous stages of our data science pipeline, we've developed insights into the structure of our Amazon Reviews data, how NLP techniques can be used to classify different attributes (e.g. Product Category), and how we can then take the most appropriate method and scale it up.

We're now going to look at how we can use Amazon Neptune to build a graph representation of the dataset we're exploring, and then run queries that combine the predictions with the original data to retrieve analytical insights.
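
To make the idea of a graph representation concrete, here is a minimal sketch. It assumes each review becomes an edge from a customer vertex to a product vertex, with the predicted category stored as an edge property. The field names (`customer_id`, `product_id`, `review_id`, `predicted_category`) and the helper `reviews_to_graph_rows` are illustrative assumptions, not the schema used by this notebook. The `~id`, `~label`, `~from` and `~to` headers follow the CSV convention of Neptune's Gremlin bulk loader.

```python
def reviews_to_graph_rows(reviews):
    """Turn review dicts into vertex and edge rows (hypothetical schema)."""
    vertices = {}  # keyed by vertex id so repeated customers/products are stored once
    edges = []
    for r in reviews:
        cust_id = 'customer_{}'.format(r['customer_id'])
        prod_id = 'product_{}'.format(r['product_id'])
        vertices[cust_id] = {'~id': cust_id, '~label': 'customer'}
        vertices[prod_id] = {
            '~id': prod_id,
            '~label': 'product',
            'product_category:String': r['product_category'],
        }
        # One edge per review, carrying the model's prediction as a property
        edges.append({
            '~id': 'review_{}'.format(r['review_id']),
            '~from': cust_id,
            '~to': prod_id,
            '~label': 'reviewed',
            'predicted_category:String': r['predicted_category'],
        })
    return list(vertices.values()), edges


# Invented sample rows, for illustration only
sample_reviews = [
    {'customer_id': 1, 'product_id': 'A', 'review_id': 'r1',
     'product_category': 'Books', 'predicted_category': 'Books'},
    {'customer_id': 1, 'product_id': 'B', 'review_id': 'r2',
     'product_category': 'Music', 'predicted_category': 'Video'},
]
vertices, edges = reviews_to_graph_rows(sample_reviews)
```

Keeping the true and predicted categories side by side in the graph is what would let a query compare them, for example to find products whose reviews are routinely misclassified.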

### Imports

The following imports are required in order to run different statistical tests and modelling techniques.

In [6]:
# NOTE: Uncomment the following lines on first run of the notebook.
# !pip install --upgrade pip
# The PyPI distribution is named gremlinpython; gremlin_python is only the import name.
# !pip install gremlinpython

ERROR: Could not find a version that satisfies the requirement gremlin_python (from versions: none)
ERROR: No matching distribution found for gremlin_python


In [6]:
import neptune
from neptune import Neptune
import boto3
import sagemaker
from s3_concat import S3Concat
import sys
import os
import re
import numpy as np
import pandas as pd
import subprocess
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri
import gzip
from io import BytesIO
import zipfile
import random
import json
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from sklearn.metrics import classification_report
import nltk
from fastparquet import write
from fastparquet import ParquetFile
import s3fs
import pyarrow.parquet as pq
import pickle
import glob
import ast 
import csv
import itertools
import dask.dataframe as dd
from dask.multiprocessing import get
import multiprocessing
import datetime

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from collections import OrderedDict

from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn import metrics

### Configs and Global Vars

Throughout the notebook we keep our settings in a single dictionary, `configs`, so that they can be passed to functions explicitly. (Any variable defined at the top level of a notebook is already global, but grouping the settings in one place makes them easier to track.)

In [224]:
configs = {
    'aws_region' :  'us-east-1',
    'bucket_name': 'demos-amazon-reviews',
    'prefix' : 'preprocessed_reviews_csvs', #only use this if you want to have your files in a folder 
    'index_key' : 'review_date_str',
    'file_extension' :'.csv',
    'wordvecdata': 'wordvec-full-data',
    'models_dir': 'models',
    'label_column':'product_category',
    'text_column': 'review_body_processed',
    'class_labels_pickle_filename':'class_labels.pkl',
    'bt_model_name':'bt_model_demo_amazon_reviews'
}



In [None]:
#initialize empty
global_vars = {}

### Environment Setup

Setting up the environment involves ensuring that the correct session and IAM role are configured. We also need to make sure the correct region and bucket are available.

In [203]:
def setup_env(configs, global_vars):
    sess = sagemaker.Session()
    role = get_execution_role()
    bucket_name = configs['bucket_name']
    s3 = boto3.resource('s3', region_name=configs['aws_region'])
    s3_bucket = s3.Bucket(bucket_name)

    if s3_bucket.creation_date is None:
        # create S3 bucket because it does not exist yet
        # (regions other than us-east-1 also need a CreateBucketConfiguration)
        print('Creating S3 bucket {}.'.format(bucket_name))
        s3.create_bucket(
            ACL='private',
            Bucket=bucket_name
        )
    else:
        print('Bucket already exists')
        
    global_vars['role'] = role
    global_vars['sess'] = sess
    global_vars['s3'] = s3
    global_vars['s3_bucket'] = s3_bucket
    
    return global_vars

global_vars = setup_env(configs, global_vars)

Bucket already exists


In [9]:
neptune = Neptune()
neptune.remoteConnection(
    neptune_endpoint='demo-aws-reviews-cluster.cluster-c22rcgat36nr.us-east-1.neptune.amazonaws.com', 
    neptune_port=8282, 
    show_endpoint=True)


gremlin: wss://demo-aws-reviews-cluster.cluster-c22rcgat36nr.us-east-1.neptune.amazonaws.com:8282/gremlin


TypeError: must be str, not HTTPRequest
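
The endpoint printed above follows the pattern `wss://<host>:<port>/gremlin`. As a small sketch, the URL can be assembled and inspected separately from the connection attempt, which helps when checking the host and port. `build_gremlin_url` is a hypothetical helper and is not part of the `neptune` module used above.

```python
def build_gremlin_url(host, port, secure=True):
    """Build the Gremlin websocket URL for a given endpoint (hypothetical helper)."""
    scheme = 'wss' if secure else 'ws'
    return '{}://{}:{}/gremlin'.format(scheme, host, port)


url = build_gremlin_url(
    'demo-aws-reviews-cluster.cluster-c22rcgat36nr.us-east-1.neptune.amazonaws.com',
    8282,
)
print(url)
```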

### Create Data Manifest

At this step, we create an index of all the files used for this experiment and for model building. We don't want to download all of the data at once, as that would cause a lot of I/O activity on the Notebook Instance.

Instead, we first build a path index of where the files live on S3. From there, we can sample a few files to see what the data looks like and compute some basic statistics, which gives us a better handle on how to build a model. Only then do we move on to using all the data to build a robust model.

In [4]:
def create_dataset_manifest(configs, global_vars):
    interval_printer_idx = 100
    idx = 0
    conn = global_vars['s3_bucket']
    file_format = configs['file_extension']
    index_key = configs['index_key']+'='
    s3_prefix = configs['prefix']+'/'
    manifest = []    
    for file in conn.objects.filter(Prefix=s3_prefix):
        path = file.key
#         print(file)
        if (file_format in path):
#             print(path)
            relative_path = path.replace(configs['prefix'],'')
            date = relative_path.split('/')[1].replace(index_key,'')

            man = {'idx':idx, 'path':relative_path, 'path_with_prefix':path, 'date':date}
            manifest.append(man)  
            idx += 1
            if (idx % interval_printer_idx) == 0:
                print('Processed {} files'.format(idx))
    print('Training Dataset Size {}'.format(len(manifest)))
    return manifest
            
manifest = create_dataset_manifest(configs, global_vars)   
    

Processed 100 files
Processed 200 files
Training Dataset Size 241
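
With the manifest in place, the sampling step described above can be sketched as follows. The helper `sample_manifest`, the seed and the toy manifest are assumptions for illustration. The entries mirror the shape produced by `create_dataset_manifest`.

```python
import random


def sample_manifest(manifest, k, seed=0):
    """Pick k manifest entries reproducibly (hypothetical helper)."""
    rng = random.Random(seed)      # local generator, so global random state is untouched
    k = min(k, len(manifest))      # never ask for more entries than exist
    picked = rng.sample(manifest, k)
    return sorted(picked, key=lambda entry: entry['idx'])


# Toy manifest with the same keys as the real one
toy_manifest = [
    {
        'idx': i,
        'path': '/review_date_str=2000-{:02d}/part-{}.csv'.format(i + 1, i),
        'path_with_prefix': 'preprocessed_reviews_csvs/review_date_str=2000-{:02d}/part-{}.csv'.format(i + 1, i),
        'date': '2000-{:02d}'.format(i + 1),
    }
    for i in range(12)
]
subset = sample_manifest(toy_manifest, k=3)
```

Fixing the seed means the same files are inspected on every run, so any statistics computed from the sample stay comparable between sessions.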


In [6]:
#sanity check that we have the right amount of data for a given file!
utils.count_s3_obj_lines(configs, global_vars, manifest[240])

5259983
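
The `utils.count_s3_obj_lines` helper is not shown in this notebook. As a rough stand-in, and assuming the object body can be read as a binary stream, lines can be counted in fixed-size chunks so the whole file is never held in memory. `count_lines` and its chunk size are hypothetical.

```python
from io import BytesIO


def count_lines(stream, chunk_size=1024 * 1024):
    """Count lines in a binary stream by reading it chunk by chunk."""
    total = 0
    last_byte = b'\n'  # treat an empty stream as having zero lines
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += chunk.count(b'\n')
        last_byte = chunk[-1:]
    # Count a final line that has no trailing newline
    if last_byte != b'\n':
        total += 1
    return total


n_lines = count_lines(BytesIO(b'header\nrow1\nrow2'))
```

With boto3, the `Body` returned by `get_object` exposes a `read(amt)` method and could be passed in as `stream`.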

## Transform and Upload Data to S3
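
This section first drops rare categories, then writes train, test and validation files for BlazingText. To illustrate the 1% threshold used by `prep_data` below, here is the same rule in plain Python on toy labels. `keep_frequent_labels` is a hypothetical helper and is not part of the notebook.

```python
from collections import Counter


def keep_frequent_labels(labels, pct_min=0.01):
    """Keep only labels whose count exceeds pct_min of all rows."""
    threshold = len(labels) * pct_min
    counts = Counter(labels)
    # Strictly greater than, matching the comparison in prep_data
    return [label for label in labels if counts[label] > threshold]


toy_labels = ['Books'] * 950 + ['Music'] * 49 + ['Toys'] * 1
kept = keep_frequent_labels(toy_labels)
print(Counter(kept))
```

Here the threshold is 10 rows, so the single `Toys` row is dropped while `Books` and `Music` survive.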

In [204]:
def prep_data(df):
    '''
    Drop every product category that accounts for 1% or less of the rows in the dataset.
    Such rare categories skew the model and cause problems during training.
    '''
    df_len = df.shape[0]
    pct_min = 0.01
    min_product_category_row_count = df_len * pct_min  # 1% of the rows in this dataframe
    df = df.groupby('product_category').filter(lambda x: len(x) > min_product_category_row_count)
    return df



def prep_data_for_supervised_blazing_text_augmented(df, configs, labs, train_file_output_name, test_file_output_name, val_file_output_name):
    '''
        Prepare the input dataframe for use with BlazingText in supervised mode.
        Join the tokenized review text of each row into a single string,
        convert each row into an augmented manifest record ('source' and 'label'),
        and save the train, test, and validation records to temporary local files.
        Return the updated label dictionary, which maps each label to its index.
    '''
    

    text_col = configs['text_column']
    label_col = configs['label_column']

    labels = df[label_col].tolist()
    #and tokenized words
    tmp = df[text_col]
    xs = []
    for entry in tmp:
        res = str(entry).strip('][').split(', ') 
        res = ' '.join(res)
        xs.append(res)
        
    #split the data into test and train for supervised mode
    X_train, X_test, y_train, y_test = train_test_split(
        xs, labels, test_size=0.2, random_state = 0)
    
    #then split our test into val and test
    X_test, X_val, y_test, y_val = train_test_split(
        X_test, y_test, test_size=0.2, random_state = 0)
    
    train_prepped = []
    #train
    for i in range(0, len(X_train)):
        src = str(X_train[i])
        if len(src)>10:
            
            label = str(y_train[i])
            if label in labs:
                lab_idx = labs[label]
            else:
                lab_idx = len(labs)
                labs[label] = lab_idx
                
            row = {'source':src,'label':lab_idx } 
            train_prepped.append(row)
    
    test_prepped = []
    #test
    for i in range(0, len(X_test)):
        src = str(X_test[i])
        if len(src)>10:
            
            label = str(y_test[i])
            if label in labs:
                lab_idx = labs[label]
            else:
                lab_idx = len(labs)
                labs[label] = lab_idx
            
            row = {'source':src,'label':lab_idx } 
            test_prepped.append(row)
            
    val_prepped = []
    #validate
    for i in range(0, len(X_val)):
        src = str(X_val[i])
        if len(src)>10:
            
            label = str(y_val[i])
            if label in labs:
                lab_idx = labs[label]
            else:
                lab_idx = len(labs)
                labs[label] = lab_idx
            
            row = {'source':src,'label':lab_idx } 
            val_prepped.append(row)
            
    
    with open(train_file_output_name, 'w') as outfile:
        for row in train_prepped:
            outfile.write(json.dumps(row)+'\n')

    with open(test_file_output_name, 'w') as outfile:
        for row in test_prepped:
            outfile.write(json.dumps(row)+'\n')
    
    with open(val_file_output_name, 'w') as outfile:
        for row in val_prepped:
            outfile.write(json.dumps(row)+'\n')
            
     
    return labs
        
        
def upload_corpus_to_s3(configs, global_vars, train_file , test_file, val_file):
    
    '''
    Upload Training, Test, and Validation datasets to S3 bucket
    '''
    
    train_prefix = 'train'
    test_prefix = 'test'
    val_prefix = 'validate'
    s3_bucket = global_vars['s3_bucket']
    
    sess = global_vars['sess']
    bucket = global_vars['s3_bucket']
   
    data_file_s3 = '{}/{}/{}'.format(configs['wordvecdata'], train_prefix, train_file)
    s3_bucket.upload_file(train_file, data_file_s3)   

    data_file_s3 = '{}/{}/{}'.format(configs['wordvecdata'], test_prefix, test_file)
    s3_bucket.upload_file(test_file, data_file_s3) 
    
    data_file_s3 = '{}/{}/{}'.format(configs['wordvecdata'], val_prefix, val_file)
    s3_bucket.upload_file(val_file, data_file_s3) 
    
    s3_train_data = 's3://{}/{}/{}'.format(configs['bucket_name'], configs['wordvecdata'], train_prefix)
    s3_test_data = 's3://{}/{}/{}'.format(configs['bucket_name'], configs['wordvecdata'], test_prefix)
    s3_val_data = 's3://{}/{}/{}'.format(configs['bucket_name'], configs['wordvecdata'], val_prefix)

    s3_output_location = 's3://{}/{}/output'.format(configs['bucket_name'], configs['wordvecdata'])
    
    configs['s3_w2v_train_data'] = s3_train_data
    configs['s3_w2v_test_data'] = s3_test_data
    configs['s3_w2v_validate_data'] = s3_val_data
    configs['s3_w2v_output_location'] = s3_output_location

    print('S3 Training Data Path {}'.format(s3_train_data))
    print('S3 Test Data Path {}'.format(s3_test_data))
    print('S3 Validate Data Path {}'.format(s3_val_data))

    print('S3 output Data Path {}'.format(s3_output_location))

    return configs

def remove_local_file(filename):
    
    os.remove(filename)
    
def download_transform_upload(configs, global_vars, manifest):
        
    #As we're dealing with a large dataset, we process one manifest file at a time instead of loading everything at once
    labels = {}
    partNum = 0
    for entry in manifest:
        full_path = 's3://'+configs['bucket_name']+'/'+entry['path_with_prefix']
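        # NOTE: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer versions use on_bad_lines='warn'
        df = pd.read_csv(full_path, header=0, error_bad_lines=False, escapechar="\\")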
        print('Dataset Rows {}, Columns {}'.format(df.shape[0], df.shape[1]))
        df = prep_data(df)
        
        train_file = 'amazonreviews_part_{}.train'.format(partNum)
        test_file = 'amazonreviews_part_{}.test'.format(partNum)
        val_file = 'amazonreviews_part_{}.validate'.format(partNum)

        try:
            labels = prep_data_for_supervised_blazing_text_augmented(df, configs,labels, train_file, test_file, val_file)
            #upload new train file
            configs = upload_corpus_to_s3(configs, global_vars, train_file , test_file, val_file)         
            #delete local file
            remove_local_file(train_file)
            remove_local_file(test_file)
            remove_local_file(val_file)
            #increment part_number for filename
            partNum += 1
            print(labels)
        except Exception as e:
            print(e)
            print('Could not process File {}'.format(full_path))
            
    global_vars['labels'] = labels
    return global_vars
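
Each line written by `prep_data_for_supervised_blazing_text_augmented` is a JSON object with a `source` string and an integer `label`. The sketch below reproduces the label indexing and serialisation on toy rows. `to_json_lines` is a hypothetical helper and the sample texts are invented.

```python
import json


def to_json_lines(texts, labels, label_index, min_chars=10):
    """Serialise (text, label) pairs as JSON lines, extending label_index in place."""
    lines = []
    for text, label in zip(texts, labels):
        if len(text) <= min_chars:
            continue  # skip very short reviews, as the notebook does
        # Assign the next free index the first time a label is seen
        idx = label_index.setdefault(label, len(label_index))
        lines.append(json.dumps({'source': text, 'label': idx}))
    return lines


toy_texts = [
    'great read would buy again',
    'ok',
    'lovely album with crisp sound',
    'another gripping page turner',
]
toy_classes = ['Books', 'Video', 'Music', 'Books']
label_index = {}
json_lines = to_json_lines(toy_texts, toy_classes, label_index)
```

Because the label dictionary is shared across partitions, a category keeps the same index in every file. This is why the printed mapping in the log only ever grows.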

In [None]:
global_vars = download_transform_upload(configs, global_vars, manifest)

Dataset Rows 2, Columns 18
With n_samples=1, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
Could not process File s3://demos-amazon-reviews/preprocessed_reviews_csvs/review_date_str=1995-06/part-00078-68e43ae8-21e1-4196-882b-c61f318a06db.c000.csv
Dataset Rows 23, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0}
Dataset Rows 19, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0}
Dataset Rows 27, Columns 18


Dataset Rows 3472, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2}
Dataset Rows 2650, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2}
Dataset Rows 4682, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordve

b'Skipping line 1434: expected 18 fields, saw 20\n'


Dataset Rows 5214, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2}
Dataset Rows 6766, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2}
Dataset Rows 5199, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordve



Dataset Rows 35460, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4}
Dataset Rows 44151, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5}
Dataset Rows 79807, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/word

b'Skipping line 3060: expected 18 fields, saw 75\n'


Dataset Rows 90017, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5}


b'Skipping line 79224: expected 18 fields, saw 23\n'


Dataset Rows 81109, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5}
Dataset Rows 65477, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5}




Dataset Rows 71941, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5}


b'Skipping line 10724: expected 18 fields, saw 19\n'
b'Skipping line 46605: expected 18 fields, saw 37\n'


Dataset Rows 64475, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5}
Dataset Rows 66688, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5}
Dataset Rows 70538, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-a

b'Skipping line 32153: expected 18 fields, saw 21\n'


Dataset Rows 72207, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5}


b'Skipping line 4994: expected 18 fields, saw 20\n'


Dataset Rows 77716, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5}
Dataset Rows 82284, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5}
Dataset Rows 81643, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-a

b'Skipping line 19642: expected 18 fields, saw 19\n'


Dataset Rows 70447, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9}
Dataset Rows 79025, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9}
Dataset Rows 75563, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-f

b'Skipping line 8011: expected 18 fields, saw 26\n'


Dataset Rows 71710, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9}
Dataset Rows 66602, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9}
Dataset Rows 78862, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-f

b'Skipping line 45262: expected 18 fields, saw 23\n'


Dataset Rows 75373, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12}


b'Skipping line 7114: expected 18 fields, saw 26\n'


Dataset Rows 81213, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12}
Dataset Rows 78506, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby'

b'Skipping line 79951: expected 18 fields, saw 22\n'


Dataset Rows 96123, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12}


b'Skipping line 4799: expected 18 fields, saw 30\n'


Dataset Rows 101887, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12}
Dataset Rows 91068, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby

b'Skipping line 165406: expected 18 fields, saw 20\n'


Dataset Rows 330471, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29}
Dataset Rows 296907, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/t

b'Skipping line 259826: expected 18 fields, saw 19\n'


Dataset Rows 264590, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29}


b'Skipping line 444: expected 18 fields, saw 19\n'
b'Skipping line 265664: expected 18 fields, saw 19\n'


Dataset Rows 268183, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29}
Dataset Rows 263428, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/t

b'Skipping line 53053: expected 18 fields, saw 25\n'


Dataset Rows 313183, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29}


b'Skipping line 258037: expected 18 fields, saw 49\n'


Dataset Rows 338142, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29}


b'Skipping line 62844: expected 18 fields, saw 20\n'


Dataset Rows 339632, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29}
Dataset Rows 355057, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/t

b'Skipping line 365118: expected 18 fields, saw 19\n'


Dataset Rows 435037, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30}


b'Skipping line 116476: expected 18 fields, saw 20\n'
b'Skipping line 183741: expected 18 fields, saw 21\n'
b'Skipping line 243671: expected 18 fields, saw 19\n'


Dataset Rows 515924, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30}


b'Skipping line 78688: expected 18 fields, saw 21\n'
b'Skipping line 104343: expected 18 fields, saw 22\n'
b'Skipping line 194984: expected 18 fields, saw 24\n'
b'Skipping line 255976: expected 18 fields, saw 22\n'
b'Skipping line 283054: expected 18 fields, saw 22\n'


Dataset Rows 407372, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30}


b'Skipping line 6946: expected 18 fields, saw 23\n'
b'Skipping line 345065: expected 18 fields, saw 26\n'
b'Skipping line 412831: expected 18 fields, saw 22\n'
b'Skipping line 431015: expected 18 fields, saw 20\n'


Dataset Rows 448239, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 267725: expected 18 fields, saw 19\n'
b'Skipping line 429699: expected 18 fields, saw 21\n'


Dataset Rows 430575, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}
Dataset Rows 429266, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-

b'Skipping line 146896: expected 18 fields, saw 19\n'


Dataset Rows 452754, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 43114: expected 18 fields, saw 26\n'
b'Skipping line 229635: expected 18 fields, saw 21\n'


Dataset Rows 478487, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 154403: expected 18 fields, saw 21\n'
b'Skipping line 452308: expected 18 fields, saw 20\n'


Dataset Rows 509161, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 53188: expected 18 fields, saw 19\n'
b'Skipping line 262789: expected 18 fields, saw 23\n'
b'Skipping line 388306: expected 18 fields, saw 19\n'
b'Skipping line 470556: expected 18 fields, saw 45\n'


Dataset Rows 533598, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 544948: expected 18 fields, saw 19\n'


Dataset Rows 545043, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 16264: expected 18 fields, saw 26\n'
b'Skipping line 172230: expected 18 fields, saw 22\n'
b'Skipping line 332015: expected 18 fields, saw 21\n'
b'Skipping line 371153: expected 18 fields, saw 19\n'
b'Skipping line 423263: expected 18 fields, saw 22\n'
b'Skipping line 485818: expected 18 fields, saw 44\n'


Dataset Rows 568545, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 179135: expected 18 fields, saw 19\n'
b'Skipping line 212258: expected 18 fields, saw 19\n'
b'Skipping line 646708: expected 18 fields, saw 19\n'
b'Skipping line 655386: expected 18 fields, saw 21\n'


Dataset Rows 743941, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 105653: expected 18 fields, saw 23\n'
b'Skipping line 849719: expected 18 fields, saw 20\n'


Dataset Rows 859212, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 666460: expected 18 fields, saw 20\n'


Dataset Rows 671965, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 12362: expected 18 fields, saw 19\n'
b'Skipping line 486586: expected 18 fields, saw 35\n'


Dataset Rows 725497, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 250802: expected 18 fields, saw 28\n'
b'Skipping line 372473: expected 18 fields, saw 21\n'
b'Skipping line 427570: expected 18 fields, saw 26\n'


Dataset Rows 670426, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}
Dataset Rows 685392, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-

b'Skipping line 166263: expected 18 fields, saw 20\n'
b'Skipping line 643436: expected 18 fields, saw 22\n'
b'Skipping line 684144: expected 18 fields, saw 19\n'


Dataset Rows 699200, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 207449: expected 18 fields, saw 20\n'
b'Skipping line 439715: expected 18 fields, saw 27\n'


Dataset Rows 757924, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 10196: expected 18 fields, saw 40\n'
b'Skipping line 179501: expected 18 fields, saw 20\n'


Dataset Rows 772142, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31}


b'Skipping line 169739: expected 18 fields, saw 19\n'
b'Skipping line 325163: expected 18 fields, saw 25\n'
b'Skipping line 426128: expected 18 fields, saw 24\n'
b'Skipping line 473867: expected 18 fields, saw 20\n'
b'Skipping line 830764: expected 18 fields, saw 20\n'
b'Skipping line 901447: expected 18 fields, saw 22\n'


Dataset Rows 915242, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}


b'Skipping line 276669: expected 18 fields, saw 20\n'
b'Skipping line 288877: expected 18 fields, saw 19\n'
b'Skipping line 295408: expected 18 fields, saw 23\n'
b'Skipping line 846905: expected 18 fields, saw 22\n'


Dataset Rows 1025940, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}


b'Skipping line 40451: expected 18 fields, saw 21\n'
b'Skipping line 519374: expected 18 fields, saw 21\n'
b'Skipping line 774344: expected 18 fields, saw 20\n'
b'Skipping line 799837: expected 18 fields, saw 44\n'
b'Skipping line 1195376: expected 18 fields, saw 30\n'


Dataset Rows 1304875, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}


b'Skipping line 1323002: expected 18 fields, saw 19\n'
b'Skipping line 1504371: expected 18 fields, saw 26\n'
b'Skipping line 2172111: expected 18 fields, saw 24\n'


Dataset Rows 2388762, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}


b'Skipping line 107662: expected 18 fields, saw 19\n'
b'Skipping line 182450: expected 18 fields, saw 19\n'
b'Skipping line 242822: expected 18 fields, saw 21\n'
b'Skipping line 553668: expected 18 fields, saw 21\n'
b'Skipping line 1107339: expected 18 fields, saw 19\n'
b'Skipping line 1394760: expected 18 fields, saw 19\n'
b'Skipping line 1564840: expected 18 fields, saw 19\n'
b'Skipping line 1800362: expected 18 fields, saw 25\n'
b'Skipping line 1942145: expected 18 fields, saw 20\n'


Dataset Rows 2826699, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}


b'Skipping line 547502: expected 18 fields, saw 20\n'
b'Skipping line 641957: expected 18 fields, saw 19\n'
b'Skipping line 1181218: expected 18 fields, saw 32\n'
b'Skipping line 1368433: expected 18 fields, saw 20\n'
b'Skipping line 1715896: expected 18 fields, saw 39\n'
b'Skipping line 1950271: expected 18 fields, saw 20\n'
b'Skipping line 2112904: expected 18 fields, saw 21\n'


Dataset Rows 2227541, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}


b'Skipping line 339715: expected 18 fields, saw 20\n'
b'Skipping line 374722: expected 18 fields, saw 21\n'
b'Skipping line 909974: expected 18 fields, saw 25\n'
b'Skipping line 1255391: expected 18 fields, saw 21\n'
b'Skipping line 1470915: expected 18 fields, saw 19\n'
b'Skipping line 1849468: expected 18 fields, saw 21\n'
b'Skipping line 2003151: expected 18 fields, saw 30\n'


Dataset Rows 2329497, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}


b'Skipping line 595037: expected 18 fields, saw 21\n'
b'Skipping line 1106871: expected 18 fields, saw 22\n'
b'Skipping line 1467716: expected 18 fields, saw 22\n'
b'Skipping line 1470303: expected 18 fields, saw 21\n'
b'Skipping line 1648052: expected 18 fields, saw 27\n'
b'Skipping line 1688088: expected 18 fields, saw 20\n'


Dataset Rows 2157630, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}


b'Skipping line 474146: expected 18 fields, saw 20\n'
b'Skipping line 644693: expected 18 fields, saw 21\n'
b'Skipping line 649363: expected 18 fields, saw 19\n'
b'Skipping line 1350362: expected 18 fields, saw 26\n'
b'Skipping line 1482286: expected 18 fields, saw 19\n'
b'Skipping line 1732499: expected 18 fields, saw 19\n'
b'Skipping line 2129025: expected 18 fields, saw 30\n'


Dataset Rows 2194397, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}


b'Skipping line 25815: expected 18 fields, saw 19\n'
b'Skipping line 339384: expected 18 fields, saw 22\n'
b'Skipping line 1039049: expected 18 fields, saw 28\n'
b'Skipping line 1106344: expected 18 fields, saw 19\n'
b'Skipping line 1783866: expected 18 fields, saw 20\n'
b'Skipping line 1886230: expected 18 fields, saw 36\n'
b'Skipping line 2110220: expected 18 fields, saw 19\n'


Dataset Rows 2137520, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}


b'Skipping line 110630: expected 18 fields, saw 19\n'
b'Skipping line 927906: expected 18 fields, saw 19\n'
b'Skipping line 1103799: expected 18 fields, saw 26\n'
b'Skipping line 1212568: expected 18 fields, saw 19\n'
b'Skipping line 1833049: expected 18 fields, saw 20\n'
b'Skipping line 2123686: expected 18 fields, saw 21\n'


Dataset Rows 2265414, Columns 18


b'Skipping line 45533: expected 18 fields, saw 19\n'
b'Skipping line 425180: expected 18 fields, saw 19\n'
b'Skipping line 704610: expected 18 fields, saw 19\n'
b'Skipping line 1471368: expected 18 fields, saw 20\n'
b'Skipping line 1534495: expected 18 fields, saw 22\n'
b'Skipping line 1886316: expected 18 fields, saw 20\nSkipping line 1891786: expected 18 fields, saw 21\n'


Dataset Rows 2363126, Columns 18


b'Skipping line 109611: expected 18 fields, saw 19\n'
b'Skipping line 305090: expected 18 fields, saw 48\n'
b'Skipping line 362402: expected 18 fields, saw 26\n'
b'Skipping line 890448: expected 18 fields, saw 20\n'
b'Skipping line 1287500: expected 18 fields, saw 20\n'
b'Skipping line 1663359: expected 18 fields, saw 21\n'
b'Skipping line 1800578: expected 18 fields, saw 19\n'
b'Skipping line 1803010: expected 18 fields, saw 19\n'


Dataset Rows 2093597, Columns 18


b'Skipping line 172951: expected 18 fields, saw 20\nSkipping line 176602: expected 18 fields, saw 22\n'
b'Skipping line 507022: expected 18 fields, saw 20\n'
b'Skipping line 603880: expected 18 fields, saw 19\n'
b'Skipping line 665048: expected 18 fields, saw 21\n'
b'Skipping line 997271: expected 18 fields, saw 19\n'
b'Skipping line 1191284: expected 18 fields, saw 19\n'
b'Skipping line 1544984: expected 18 fields, saw 24\n'
b'Skipping line 1578878: expected 18 fields, saw 20\n'


Dataset Rows 2303639, Columns 18


b'Skipping line 346786: expected 18 fields, saw 24\n'
b'Skipping line 376777: expected 18 fields, saw 20\n'
b'Skipping line 437764: expected 18 fields, saw 19\n'
b'Skipping line 601927: expected 18 fields, saw 23\n'
b'Skipping line 1017710: expected 18 fields, saw 23\n'
b'Skipping line 1350411: expected 18 fields, saw 20\n'
b'Skipping line 1723666: expected 18 fields, saw 20\n'
b'Skipping line 1943655: expected 18 fields, saw 19\n'


Dataset Rows 2147647, Columns 18


b'Skipping line 392761: expected 18 fields, saw 19\n'
b'Skipping line 597522: expected 18 fields, saw 19\n'
b'Skipping line 790320: expected 18 fields, saw 20\n'
b'Skipping line 1102921: expected 18 fields, saw 19\n'
b'Skipping line 1368570: expected 18 fields, saw 19\n'
b'Skipping line 1461203: expected 18 fields, saw 22\n'
b'Skipping line 1729183: expected 18 fields, saw 21\n'
b'Skipping line 1981788: expected 18 fields, saw 20\n'
b'Skipping line 2113686: expected 18 fields, saw 19\n'
b'Skipping line 2900715: expected 18 fields, saw 33\n'


Dataset Rows 2905588, Columns 18


b'Skipping line 11196: expected 18 fields, saw 22\n'
b'Skipping line 374306: expected 18 fields, saw 21\nSkipping line 385562: expected 18 fields, saw 24\n'
b'Skipping line 1237776: expected 18 fields, saw 19\n'
b'Skipping line 2111518: expected 18 fields, saw 21\n'
b'Skipping line 2385917: expected 18 fields, saw 19\n'
b'Skipping line 3008194: expected 18 fields, saw 20\n'
b'Skipping line 3095815: expected 18 fields, saw 19\n'
b'Skipping line 3195837: expected 18 fields, saw 21\n'


Dataset Rows 3587244, Columns 18


b'Skipping line 31529: expected 18 fields, saw 19\n'
b'Skipping line 37231: expected 18 fields, saw 20\n'
b'Skipping line 235882: expected 18 fields, saw 19\n'
b'Skipping line 314897: expected 18 fields, saw 20\n'
b'Skipping line 544557: expected 18 fields, saw 20\n'
b'Skipping line 1017556: expected 18 fields, saw 36\n'
b'Skipping line 1084260: expected 18 fields, saw 20\n'


Dataset Rows 2829855, Columns 18


b'Skipping line 243150: expected 18 fields, saw 32\n'
b'Skipping line 1468142: expected 18 fields, saw 22\n'
b'Skipping line 1867001: expected 18 fields, saw 19\n'
b'Skipping line 1953176: expected 18 fields, saw 23\n'


Dataset Rows 3022435, Columns 18


b'Skipping line 272799: expected 18 fields, saw 25\n'
b'Skipping line 344949: expected 18 fields, saw 20\n'
b'Skipping line 460948: expected 18 fields, saw 19\nSkipping line 474877: expected 18 fields, saw 28\n'
b'Skipping line 1078278: expected 18 fields, saw 19\n'
b'Skipping line 1493872: expected 18 fields, saw 31\n'
b'Skipping line 1599672: expected 18 fields, saw 20\n'
b'Skipping line 2284529: expected 18 fields, saw 22\n'
b'Skipping line 2592035: expected 18 fields, saw 22\n'


Dataset Rows 2682400, Columns 18


b'Skipping line 3530: expected 18 fields, saw 23\n'
b'Skipping line 406300: expected 18 fields, saw 19\n'
b'Skipping line 469217: expected 18 fields, saw 20\n'
b'Skipping line 1188257: expected 18 fields, saw 23\n'


Dataset Rows 2638420, Columns 18


b'Skipping line 839478: expected 18 fields, saw 19\n'
b'Skipping line 1072193: expected 18 fields, saw 21\n'
b'Skipping line 1470670: expected 18 fields, saw 19\n'
b'Skipping line 2020565: expected 18 fields, saw 21\n'
b'Skipping line 2113841: expected 18 fields, saw 19\nSkipping line 2115355: expected 18 fields, saw 23\n'


Dataset Rows 2708013, Columns 18


b'Skipping line 453892: expected 18 fields, saw 19\n'
b'Skipping line 973358: expected 18 fields, saw 27\n'
b'Skipping line 1318559: expected 18 fields, saw 22\n'
b'Skipping line 1393960: expected 18 fields, saw 21\n'
b'Skipping line 1856130: expected 18 fields, saw 19\n'
b'Skipping line 2109455: expected 18 fields, saw 28\n'
b'Skipping line 2248178: expected 18 fields, saw 22\n'
b'Skipping line 2390685: expected 18 fields, saw 20\nSkipping line 2390691: expected 18 fields, saw 24\n'
b'Skipping line 2799889: expected 18 fields, saw 19\n'
b'Skipping line 2980239: expected 18 fields, saw 21\n'
b'Skipping line 3301518: expected 18 fields, saw 20\n'
b'Skipping line 3845988: expected 18 fields, saw 21\n'


Dataset Rows 4064186, Columns 18


b'Skipping line 6397: expected 18 fields, saw 25\nSkipping line 10732: expected 18 fields, saw 19\n'
b'Skipping line 100070: expected 18 fields, saw 36\n'
b'Skipping line 490471: expected 18 fields, saw 26\n'
b'Skipping line 533342: expected 18 fields, saw 26\n'
b'Skipping line 716501: expected 18 fields, saw 21\n'
b'Skipping line 1090740: expected 18 fields, saw 19\n'
b'Skipping line 1281270: expected 18 fields, saw 19\n'
b'Skipping line 1789827: expected 18 fields, saw 21\n'
b'Skipping line 1822817: expected 18 fields, saw 20\n'
b'Skipping line 2099836: expected 18 fields, saw 19\n'
b'Skipping line 2421976: expected 18 fields, saw 23\n'
b'Skipping line 2861821: expected 18 fields, saw 26\n'
b'Skipping line 3687915: expected 18 fields, saw 27\n'
b'Skipping line 4023985: expected 18 fields, saw 23\n'


Dataset Rows 4146029, Columns 18


b'Skipping line 1309528: expected 18 fields, saw 21\n'
b'Skipping line 1501414: expected 18 fields, saw 20\n'
b'Skipping line 2159454: expected 18 fields, saw 24\n'
b'Skipping line 2625075: expected 18 fields, saw 19\nSkipping line 2628095: expected 18 fields, saw 24\n'
b'Skipping line 3017005: expected 18 fields, saw 23\n'


Dataset Rows 3956179, Columns 18


b'Skipping line 129716: expected 18 fields, saw 19\n'
b'Skipping line 335169: expected 18 fields, saw 21\n'
b'Skipping line 754967: expected 18 fields, saw 25\n'
b'Skipping line 1321485: expected 18 fields, saw 23\n'
b'Skipping line 1715179: expected 18 fields, saw 21\nSkipping line 1722715: expected 18 fields, saw 23\n'
b'Skipping line 3044482: expected 18 fields, saw 20\n'
b'Skipping line 3381768: expected 18 fields, saw 22\n'
b'Skipping line 3409741: expected 18 fields, saw 28\nSkipping line 3434558: expected 18 fields, saw 21\n'
b'Skipping line 3941093: expected 18 fields, saw 29\n'


Dataset Rows 4224033, Columns 18


b'Skipping line 213964: expected 18 fields, saw 20\n'
b'Skipping line 1057558: expected 18 fields, saw 20\n'
b'Skipping line 1435053: expected 18 fields, saw 30\n'
b'Skipping line 3199147: expected 18 fields, saw 27\n'


Dataset Rows 4139016, Columns 18


b'Skipping line 3019488: expected 18 fields, saw 21\n'
b'Skipping line 3798245: expected 18 fields, saw 21\n'
b'Skipping line 4444169: expected 18 fields, saw 22\n'
b'Skipping line 4581898: expected 18 fields, saw 24\n'
b'Skipping line 4728182: expected 18 fields, saw 19\n'
b'Skipping line 5203522: expected 18 fields, saw 19\n'
b'Skipping line 5270395: expected 18 fields, saw 20\n'
b'Skipping line 5326682: expected 18 fields, saw 23\n'


Dataset Rows 5371561, Columns 18


b'Skipping line 551412: expected 18 fields, saw 19\n'
b'Skipping line 790934: expected 18 fields, saw 20\n'
b'Skipping line 1195394: expected 18 fields, saw 22\n'
b'Skipping line 1443641: expected 18 fields, saw 29\n'
b'Skipping line 3011004: expected 18 fields, saw 19\n'
b'Skipping line 3825037: expected 18 fields, saw 19\n'
b'Skipping line 4594412: expected 18 fields, saw 24\n'
b'Skipping line 4940090: expected 18 fields, saw 22\n'
b'Skipping line 5249010: expected 18 fields, saw 25\n'


Dataset Rows 5498159, Columns 18


b'Skipping line 380807: expected 18 fields, saw 28\n'
b'Skipping line 560494: expected 18 fields, saw 19\n'
b'Skipping line 697659: expected 18 fields, saw 33\n'
b'Skipping line 1273621: expected 18 fields, saw 21\n'
b'Skipping line 1444690: expected 18 fields, saw 26\n'
b'Skipping line 1874542: expected 18 fields, saw 19\n'
b'Skipping line 2630979: expected 18 fields, saw 19\n'
b'Skipping line 3307929: expected 18 fields, saw 21\n'
b'Skipping line 3498867: expected 18 fields, saw 28\n'
b'Skipping line 3668380: expected 18 fields, saw 22\n'
b'Skipping line 3854370: expected 18 fields, saw 21\n'
b'Skipping line 3981179: expected 18 fields, saw 21\n'
b'Skipping line 4643530: expected 18 fields, saw 20\n'


Dataset Rows 5076247, Columns 18


b'Skipping line 163084: expected 18 fields, saw 23\n'
b'Skipping line 226951: expected 18 fields, saw 21\n'
b'Skipping line 1296093: expected 18 fields, saw 23\n'
b'Skipping line 2527761: expected 18 fields, saw 19\n'
b'Skipping line 3913350: expected 18 fields, saw 19\n'


Dataset Rows 5529430, Columns 18


b'Skipping line 4361: expected 18 fields, saw 19\n'
b'Skipping line 403825: expected 18 fields, saw 20\n'
b'Skipping line 1750832: expected 18 fields, saw 20\n'
b'Skipping line 2150291: expected 18 fields, saw 20\n'
b'Skipping line 2833078: expected 18 fields, saw 20\n'
b'Skipping line 3127176: expected 18 fields, saw 23\n'
b'Skipping line 4170957: expected 18 fields, saw 19\nSkipping line 4181051: expected 18 fields, saw 24\n'


Dataset Rows 4827214, Columns 18


b'Skipping line 10707: expected 18 fields, saw 19\n'
b'Skipping line 483511: expected 18 fields, saw 19\n'
b'Skipping line 568213: expected 18 fields, saw 19\n'
b'Skipping line 1199638: expected 18 fields, saw 19\n'
b'Skipping line 1606838: expected 18 fields, saw 19\n'
b'Skipping line 1909690: expected 18 fields, saw 19\n'
b'Skipping line 2009220: expected 18 fields, saw 24\n'
b'Skipping line 2045041: expected 18 fields, saw 20\n'
b'Skipping line 3004554: expected 18 fields, saw 19\n'
b'Skipping line 3759389: expected 18 fields, saw 23\n'
b'Skipping line 4318231: expected 18 fields, saw 32\n'
b'Skipping line 4479895: expected 18 fields, saw 19\n'


Dataset Rows 4739672, Columns 18


b'Skipping line 791124: expected 18 fields, saw 19\nSkipping line 815704: expected 18 fields, saw 21\n'
b'Skipping line 930360: expected 18 fields, saw 25\n'
b'Skipping line 1896268: expected 18 fields, saw 20\n'
b'Skipping line 2138116: expected 18 fields, saw 19\n'
b'Skipping line 2361543: expected 18 fields, saw 20\n'
b'Skipping line 3252623: expected 18 fields, saw 19\n'
b'Skipping line 3773000: expected 18 fields, saw 24\n'
b'Skipping line 4018364: expected 18 fields, saw 24\n'
b'Skipping line 4673395: expected 18 fields, saw 19\n'
b'Skipping line 4706059: expected 18 fields, saw 20\n'


Dataset Rows 4774088, Columns 18


b'Skipping line 782600: expected 18 fields, saw 20\n'
b'Skipping line 2904351: expected 18 fields, saw 19\n'
b'Skipping line 3172892: expected 18 fields, saw 42\n'
b'Skipping line 3224820: expected 18 fields, saw 51\n'
b'Skipping line 3501485: expected 18 fields, saw 21\n'
b'Skipping line 3528793: expected 18 fields, saw 19\n'
b'Skipping line 4126729: expected 18 fields, saw 19\n'
b'Skipping line 4201875: expected 18 fields, saw 20\n'
b'Skipping line 4788045: expected 18 fields, saw 20\n'


Dataset Rows 5144760, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}


b'Skipping line 2354843: expected 18 fields, saw 20\n'
b'Skipping line 4076974: expected 18 fields, saw 20\n'
b'Skipping line 4685772: expected 18 fields, saw 23\n'
b'Skipping line 4883245: expected 18 fields, saw 27\n'


Dataset Rows 5259979, Columns 18
S3 Training Data Path s3://demos-amazon-reviews/wordvec-full-data/train
S3 Test Data Path s3://demos-amazon-reviews/wordvec-full-data/test
S3 Validate Data Path s3://demos-amazon-reviews/wordvec-full-data/validate
S3 output Data Path s3://demos-amazon-reviews/wordvec-full-data/output
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Camera': 6, 'Office_Products': 7, 'PC': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Sports': 17, 'Beauty': 18, 'Home_Entertainment': 19, 'Shoes': 20, 'Tools': 21, 'Apparel': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Digital_Music_Purchase': 25, 'Outdoors': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}


In [238]:
# BlazingText in Pipe mode only supports one augmented file per channel,
# so concatenate the part files for each split into a single file.
def concat_augmented_files(configs, global_vars):

    # split prefix -> (output filename, configs key that records the result)
    splits = {
        'train': ('amazon_augmented_train.json', 's3_w2v_train_file'),
        'test': ('amazon_augmented_test.json', 's3_w2v_test_file'),
        'validate': ('amazon_augmented_validate.json', 's3_w2v_validate_file'),
    }

    # None tells S3Concat to write a single output file regardless of size
    min_file_size = None

    for prefix, (filename, config_key) in splits.items():
        # where the part files for this split sit
        s3_path = '{}/{}/'.format(configs['wordvecdata'], prefix)
        s3_concat_file_path = '{}/{}/{}'.format(configs['wordvecdata'], prefix, filename)
        print(s3_concat_file_path)

        job = S3Concat(configs['bucket_name'],
                       s3_concat_file_path,
                       min_file_size,
                       content_type='application/json',
                       session=boto3.session.Session())
        job.add_files(s3_path)
        job.concat(small_parts_threads=32)

        # record the concatenated file so later cells can find it
        configs[config_key] = s3_concat_file_path

    return configs

configs = concat_augmented_files(configs, global_vars)


wordvec-full-data/train/amazon_augmented_train.json
wordvec-full-data/test/amazon_augmented_test.json
wordvec-full-data/validate/amazon_augmented_validate.json


### Save the Label Mapping

As our model is trained on numerical labels that represent the product_category (e.g. Books), we need to store the mapping (label: idx) so that predictions can be mapped back to the correct category during inference.

In [153]:
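def save_labels_lookup(labels, filename='class_labels.pkl'):
    # Persist the label -> index mapping so it can be reloaded at inference time.
    # The context manager makes sure the file handle is closed after writing.
    with open(filename, 'wb') as f:
        pickle.dump(labels, f)

save_labels_lookup(global_vars['labels'])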
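
A quick way to check that the mapping survives being written to disk is to save it and load it straight back. The sketch below is illustrative only: it uses a hypothetical three-label mapping and a temporary directory rather than the notebook's real `global_vars['labels']`.

```python
import os
import pickle
import tempfile

# Hypothetical subset of the label mapping, for illustration only.
labels = {'Books': 0, 'Video': 1, 'Music': 2}

def save_labels(labels, filename):
    # Write the mapping; the context manager closes the file handle.
    with open(filename, 'wb') as f:
        pickle.dump(labels, f)

def load_labels(filename):
    # Read the mapping back from disk.
    with open(filename, 'rb') as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'class_labels.pkl')
    save_labels(labels, path)
    restored = load_labels(path)

print(restored == labels)
```

If the restored mapping differs from the one used during training, every decoded prediction will be silently wrong, so this check is worth running before inference.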

## Model/Analysis Experimentation (Local Mode)

The purpose of this section is to experiment with different modelling techniques.

We're first going to run some local experiments on the 1% sample of data to see which methods provide valuable insights for both customers (e.g. an Amazon customer) and operations (e.g. Amazon).

We want to look at different types of insight, from understanding how customer reviews have changed over time to whether the type of review, and the product category it relates to, can be predicted.

Let's start off by first getting our data into a shape which we can use for analysis and modelling purposes.

### Prep Data for Modelling Purposes

We're going to develop some dataframes which represent our Xs and Ys (features and labels).

Let's create some feature/label datasets which are shaped around the following labels:

- year_product-category
- product-category_star_rating

The features for this model will only use the text of the reviews.
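
As a rough sketch of what these two composite labels could look like, the snippet below builds them for a couple of made-up reviews. The column names (`review_date`, `product_category`, `star_rating`, `review_body`) and the ISO date format are assumptions for illustration, not taken from the cells in this notebook.

```python
# Hypothetical review rows; the column names are assumptions about the dataset.
reviews = [
    {'review_date': '2014-06-21', 'product_category': 'Books', 'star_rating': 5,
     'review_body': 'Could not put it down.'},
    {'review_date': '2009-01-03', 'product_category': 'Music', 'star_rating': 2,
     'review_body': 'Only one good track.'},
]

def build_labels(row):
    """Return the two composite labels for a single review."""
    year = row['review_date'][:4]  # assumes ISO dates (YYYY-MM-DD)
    category = row['product_category']
    return {
        'year_product-category': '{}_{}'.format(year, category),
        'product-category_star_rating': '{}_{}'.format(category, row['star_rating']),
    }

features = [row['review_body'] for row in reviews]  # X: review text only
labels = [build_labels(row) for row in reviews]     # Y: composite labels

print(labels[0])
```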





### Word Embeddings Using BlazingText (Supervised)

BlazingText expects a single preprocessed text file with space-separated tokens, where each line contains a single sentence and the corresponding label(s) prefixed by `__label__`.
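
For illustration, a single line in that plain-text format could be produced as follows. This is a minimal sketch: the helper name `to_blazingtext_line` is hypothetical and the whitespace tokenisation is deliberately naive.

```python
def to_blazingtext_line(text, label_idx):
    """Format one review as '__label__<idx> token token ...'."""
    # Naive whitespace tokenisation; a real pipeline would likely use a
    # proper tokeniser (e.g. nltk) before writing the file.
    tokens = text.lower().split()
    return '__label__{} {}'.format(label_idx, ' '.join(tokens))

line = to_blazingtext_line('Great  read, could not put it down', 0)
print(line)
```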

As we're now using the complete dataset, we'll need to use the augmented dataset structure together with `Pipe` mode, so that the data is streamed rather than loaded into memory in one go.

Augmented Data Structure

```json
{"source": "string", "label": "string"}
{"source": "string", "label": "string"}
```

Note that the file holds one JSON entry per line (JSON Lines), not a single JSON array.
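
The structure above can be sketched in a few lines of Python. The records are made up, and storing the class index as a string in `label` is an assumption based on the `"label": "string"` shape shown above.

```python
import io
import json

# Hypothetical records: review text plus an integer class index.
records = [
    ('great read , could not put it down', 0),
    ('the disc arrived scratched', 3),
]

def to_augmented_lines(records):
    """Serialise (text, label_idx) pairs as one JSON object per line."""
    lines = []
    for text, label_idx in records:
        # Assumption: the label is stored as the stringified class index.
        entry = {'source': text, 'label': str(label_idx)}
        lines.append(json.dumps(entry))
    return '\n'.join(lines) + '\n'

payload = to_augmented_lines(records)

# Each line must parse on its own as valid JSON.
parsed = [json.loads(line) for line in io.StringIO(payload)]
print(parsed[1])
```

Because every line is an independent JSON document, files in this format can be concatenated without any re-parsing, which is what makes the S3 concatenation step earlier in the notebook safe.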

**Picking Hyperparameters**

As we're now working with a much larger dataset, we need to be conscious of the hyperparameters we choose, as these can have a serious impact on how well our model performs.

As we're using BlazingText in supervised mode (i.e. with labelled data), we have the option of using several additional parameters on top of the default set listed in the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html).

Some of the key hyperparameters which need to be considered are:

- Vector Dimension: The larger the vector, the more information is encoded; however, this requires a significant amount of resources. A vector size above 300 tends to yield diminishing returns. Further reading can be found in the paper [Glove: Global Vectors for Word Representation](https://www.aclweb.org/anthology/D14-1162.pdf)
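
To get a feel for the resource cost of the vector dimension, the back-of-the-envelope sketch below estimates the size of the word embedding matrix alone. The vocabulary size of one million and the 4 bytes per value (32-bit floats) are assumptions for illustration; the real footprint also depends on n-gram features and on the implementation.

```python
def embedding_matrix_bytes(vocab_size, vector_dim, bytes_per_value=4):
    """Size of a dense vocab_size x vector_dim matrix in bytes."""
    return vocab_size * vector_dim * bytes_per_value

vocab_size = 1_000_000  # hypothetical vocabulary size

for dim in (100, 300, 600):
    size_gb = embedding_matrix_bytes(vocab_size, dim) / 1e9
    print('vector_dim={:>3}: {:.1f} GB'.format(dim, size_gb))
```

The cost grows linearly with the dimension, so doubling `vector_dim` from 300 to 600 doubles this part of the memory footprint.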

In [187]:
def configure_estimator(configs, global_vars):

    region_name = configs['aws_region']
    sess = global_vars['sess']
    container = sagemaker.amazon.amazon_estimator.get_image_uri(region_name, "blazingtext", "latest")
    print('Using SageMaker BlazingText container: {} ({})'.format(container, region_name))

    bt_model = sagemaker.estimator.Estimator(container,
                                             global_vars['role'],
                                             train_instance_count=1,
                                             train_instance_type='ml.c5.18xlarge',
                                             train_volume_size=150,   # GB
                                             train_max_run=360000,    # seconds (100 hours)
                                             input_mode='Pipe',       # stream the augmented file
                                             output_path=configs['s3_w2v_output_location'],
                                             sagemaker_session=sess)

    bt_model.set_hyperparameters(mode="supervised",
                                 epochs=20,
                                 min_count=2,
                                 learning_rate=0.05,
                                 vector_dim=300,
                                 early_stopping=True,
                                 patience=4,
                                 min_epochs=10,
                                 word_ngrams=4,
                                 # NOTE: subwords has no effect in supervised mode; the training
                                 # log reports it is only supported by cbow and skipgram.
                                 subwords=True)

    global_vars['bt_model'] = bt_model

    return global_vars

global_vars = configure_estimator(configs, global_vars)

Using SageMaker BlazingText container: 811284229777.dkr.ecr.us-east-1.amazonaws.com/blazingtext:latest (us-east-1)


In [188]:
def configure_data_channels(configs, global_vars):

    s3train_manifest = 's3://{}/{}'.format(configs['bucket_name'], configs['s3_w2v_train_file'])
    # NOTE: the concatenated test file feeds the validation channel during training;
    # the validate file is held back for the evaluation step later on.
    s3validation_manifest = 's3://{}/{}'.format(configs['bucket_name'], configs['s3_w2v_test_file'])

    # keys to read from each JSON line of the augmented file
    attribute_names = ["source", "label"]

    train_data = sagemaker.session.s3_input(s3train_manifest,
                                            distribution='FullyReplicated',
                                            content_type='application/jsonlines',
                                            s3_data_type='AugmentedManifestFile',
                                            attribute_names=attribute_names,
                                            record_wrapping='RecordIO')

    validation_data = sagemaker.session.s3_input(s3validation_manifest,
                                                 distribution='FullyReplicated',
                                                 content_type='application/jsonlines',
                                                 s3_data_type='AugmentedManifestFile',
                                                 attribute_names=attribute_names,
                                                 record_wrapping='RecordIO')

    data_channels = {'train': train_data, 'validation': validation_data}

    global_vars['data_channels'] = data_channels

    return global_vars

global_vars = configure_data_channels(configs, global_vars)
                                        

In [None]:
def fit_model(configs, global_vars):
    
    bt_model = global_vars['bt_model']
    data_channels = global_vars['data_channels']
    bt_model.fit(inputs=data_channels, logs=True)
    
fit_model(configs, global_vars)

2020-05-04 02:07:45 Starting - Starting the training job...
2020-05-04 02:07:46 Starting - Launching requested ML instances......
2020-05-04 02:08:56 Starting - Preparing the instances for training...
2020-05-04 02:09:38 Downloading - Downloading input data..................................................................
2020-05-04 02:20:53 Training - Training image download completed. Training in progress.
Arguments: train
[05/04/2020 02:20:54 INFO 140077378238272] nvidia-smi took: 0.0252711772919 secs to identify 0 gpus
[05/04/2020 02:20:54 INFO 140077378238272] Running single machine CPU BlazingText training using supervised mode.
[05/04/2020 02:20:54 INFO 140077378238272] Switching off subword embedding mode as it is only supported by cbow and skipgram.
Read 10M words
Read 20M words
Read 30M words
Read 40M words
Read 50M words
Read 60M words
Read 70M words
Read 80M words

Read 2840M words
Read 2850M words
Read 2860M words
Read 2870M words
Read 2880M words
Read 2890M words
Read 2900M words
Read 2910M words
Read 2920M words
Read 2930M words
Read 2940M words
Read 2950M words
Read 2960M words
Read 2970M words
Read 2980M words
Read 2990M words
Read 3000M words
Read 3010M words
Read 3020M words
Read 3030M words
Read 3040M words
Read 3050M words
Read 3060M words
Read 3070M words
Read 3080M words
Read 3090M words
Read 3100M words
Read 3110M words
Read 3120M words
Read 3130M words
Read 3140M words
Read 3150M words
Read 3160M words
Read 3170M words
Read 3180M words
Read 3190M words
Read 3200M words
Read 3210M words
Read 32

#### Reference (pulled from CloudWatch logs)

Alpha: 0.0000 Progress: 100.00% Million Words/sec: 1.56 #####

Training finished.

Average throughput in Million words/sec: 1.56

Total training time in seconds: 44935.42

train_accuracy: 0.6559

Number of train examples: 102013750

validation_accuracy: 0.5614

Number of validation examples: 34008858
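
The figures above are easier to interpret after a little arithmetic. The sketch below only reuses numbers copied from the log excerpt; treating throughput multiplied by total time as the number of words processed is an approximation, since the total time may include phases other than training.

```python
# Figures copied from the CloudWatch log excerpt above.
total_seconds = 44935.42
throughput_mwords_per_sec = 1.56
train_accuracy = 0.6559
validation_accuracy = 0.5614

hours = total_seconds / 3600
# Approximation: assumes the average throughput held for the whole run.
words_processed_millions = throughput_mwords_per_sec * total_seconds
accuracy_gap = train_accuracy - validation_accuracy

print('Training time: {:.1f} hours'.format(hours))
print('Approx. words processed: {:.0f}M'.format(words_processed_millions))
print('Train/validation accuracy gap: {:.4f}'.format(accuracy_gap))
```

A gap of roughly nine percentage points between training and validation accuracy suggests the model fits the training data noticeably better than unseen data, which is worth keeping in mind when reading the evaluation results.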

### Deploy the Endpoint

Now that we have a trained model, we need to deploy it as a SageMaker model endpoint before we can use it for inference.

In [None]:
def deploy_model(configs, global_vars):

    # If no estimator is in memory (e.g. after a kernel restart), rebuild the
    # model from the trained artefact stored in S3.
    if global_vars.get('bt_model') is None:

        # same configs key as used in configure_estimator
        container = sagemaker.amazon.amazon_estimator.get_image_uri(configs['aws_region'], "blazingtext", "latest")
        print('Type in S3 Model output path WITH the model.tar.gz filename, e.g. s3://../../model.tar.gz')
        model_path = input()

        bt_model = sagemaker.model.Model(model_data=model_path,
                                         image=container,
                                         role=global_vars['role'])
        global_vars['bt_model'] = bt_model
    else:
        bt_model = global_vars['bt_model']

    text_classifier = bt_model.deploy(initial_instance_count=3,
                                      instance_type='ml.c5.18xlarge',
#                                      use_compiled_model = True,
                                      endpoint_name=configs['bt_model_name'],
                                      accelerator_type='ml.eia1.medium',
                                      update_endpoint=True)

    global_vars['w2v_classifier'] = text_classifier

    return global_vars

global_vars = deploy_model(configs, global_vars)

### Load the Endpoint

If you have restarted the notebook, or want to load a different model endpoint, run the following code.

In [210]:
def load_endpoint(configs, global_vars):
    
    model =  sagemaker.RealTimePredictor(configs['bt_model_name'])
    global_vars['w2v_classifier'] = model
    return global_vars


#find the endpoint name from your AWS Console, configure in your configs.
#load the endpoint
global_vars = load_endpoint(configs, global_vars)


### Load the Labels

If the labels haven't already been loaded (e.g. this is the first time the notebook has been run after a kernel restart), load the pickle file containing the label mapping.

In [216]:
def load_class_label_mapping(configs, global_vars):

    filename = configs['class_labels_pickle_filename']
    # The context manager makes sure the file handle is closed after reading.
    with open(filename, 'rb') as f:
        global_vars['labels'] = pickle.load(f)
    print('Labels Loaded \n{}'.format(global_vars['labels']))

    return global_vars

global_vars = load_class_label_mapping(configs, global_vars)

Labels Loaded 
{'Books': 0, 'Video': 1, 'Music': 2, 'Video_DVD': 3, 'Toys': 4, 'Video_Games': 5, 'Office_Products': 6, 'PC': 7, 'Camera': 8, 'Kitchen': 9, 'Electronics': 10, 'Software': 11, 'Baby': 12, 'Wireless': 13, 'Home': 14, 'Health_&_Personal_Care': 15, 'Grocery': 16, 'Beauty': 17, 'Sports': 18, 'Home_Entertainment': 19, 'Apparel': 20, 'Shoes': 21, 'Tools': 22, 'Lawn_and_Garden': 23, 'Pet_Products': 24, 'Outdoors': 25, 'Digital_Music_Purchase': 26, 'Digital_Ebook_Purchase': 27, 'Home_Improvement': 28, 'Automotive': 29, 'Jewelry': 30, 'Mobile_Apps': 31, 'Digital_Video_Download': 32}
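
The loaded mapping goes from label to index, while the endpoint returns labels of the form `__label__<idx>`, so the mapping has to be inverted before predictions can be decoded. The sketch below shows one way to do that; the three-label mapping and the response contents are made up for illustration, and `decode_prediction` is a hypothetical helper.

```python
# Hypothetical label mapping and endpoint response, for illustration only.
labels = {'Books': 0, 'Video': 1, 'Music': 2}
response = [
    {'label': ['__label__2'], 'prob': [0.81]},
    {'label': ['__label__0'], 'prob': [0.64]},
    {'label': ['__label__99'], 'prob': [0.12]},  # index missing from the mapping
]

# Invert the mapping: index -> label name.
idx_to_label = {idx: name for name, idx in labels.items()}

def decode_prediction(pred, idx_to_label, unknown='UNKNOWN'):
    """Map the top predicted '__label__<idx>' back to its category name."""
    try:
        idx = int(str(pred['label'][0]).replace('__label__', ''))
        return idx_to_label[idx]
    except (KeyError, IndexError, ValueError):
        # Unparseable label, empty prediction, or unknown index.
        return unknown

decoded = [decode_prediction(p, idx_to_label) for p in response]
print(decoded)
```

Catching only the specific exceptions keeps genuine bugs (such as a misspelt variable) visible instead of turning them into `UNKNOWN` predictions.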


### Evaluate the Model

Let's evaluate our model to determine how well we're able to predict the different classes. For this we're going to use the validation split, a sample of data which was not used in the training/test dataset.
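
Because the evaluation set is large, the requests have to be sent in batches. The following is a minimal sketch of the batching and payload construction, run on made-up sentences and without calling an endpoint; `iter_batches` and `build_payload` are hypothetical helper names.

```python
import json

def iter_batches(items, batch_size):
    """Yield (start_index, batch) pairs; the final batch may be shorter."""
    for lower in range(0, len(items), batch_size):
        yield lower, items[lower:lower + batch_size]

def build_payload(instances, k=1):
    """Serialise a batch of sentences, asking for the top-k predictions."""
    return json.dumps({'instances': instances, 'configuration': {'k': k}})

# Hypothetical data: 2500 short sentences.
sentences = ['review {}'.format(i) for i in range(2500)]

batches = list(iter_batches(sentences, batch_size=1000))
sizes = [len(batch) for _, batch in batches]
print(sizes)

# Payload for the final (shorter) batch.
payload = build_payload(batches[-1][1])
```

Stepping through the data with `range(0, len(items), batch_size)` avoids sending an empty final batch when the number of items is an exact multiple of the batch size.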

In [239]:
def evaluate_test_data_against_model(global_vars, configs):

    text_col = 'source'
    label_col = 'label'

    # first we need to download the validation dataset...
    full_path = 's3://' + configs['bucket_name'] + '/' + configs['s3_w2v_validate_file']
    df = pd.read_json(full_path, lines=True)

    print('Dataset Rows {}, Columns {}'.format(df.shape[0], df.shape[1]))

    y_val = df[label_col].tolist()
    x_val = df[text_col].tolist()

    print('Total Eval Data {}'.format(len(x_val)))

    # we need to do some batch inferencing due to the size of the data:
    # each batch is 1000 sentences
    batch_size = 1000
    batches = len(x_val) // batch_size

    print('Batches {}'.format(batches))

    predictions_batches = []
    # invert the mapping (idx -> label) so predictions can be decoded
    labels_inv = {y: x for x, y in global_vars['labels'].items()}
    y_hat = []

    text_classifier = global_vars['w2v_classifier']

    # step through the data in batch_size chunks; the final batch may be shorter
    for lower in range(0, len(x_val), batch_size):
        upper = min(lower + batch_size, len(x_val))
        print('Batch {} : {}'.format(lower, upper))

        instances_batch = x_val[lower:upper]

        # k=1 asks the endpoint for only the top prediction per instance
        payload = {"instances": instances_batch,
                   "configuration": {"k": 1}}

        response = text_classifier.predict(json.dumps(payload))

        predictions = json.loads(response)
        predictions_batches.append(predictions)

        for pred in predictions:
            try:
                idx = int(str(pred['label'][0]).replace('__label__', ''))
                y_hat.append(labels_inv[idx])
            except (KeyError, IndexError, TypeError, ValueError):
                # unparseable label, or an index missing from the mapping
                y_hat.append('UNKNOWN')

    print('Total Predictions {}'.format(len(y_hat)))
    return y_hat, y_val

y_hat, y_val = evaluate_test_data_against_model(global_vars, configs)

Dataset Rows 5441212, Columns 2
Total Eval Data 5441212
Batches 5441
Batch 0 : 1000
Batch 1000 : 2000
Batch 2000 : 3000
Batch 3000 : 4000
Batch 4000 : 5000
Batch 5000 : 6000
Batch 6000 : 7000
Batch 7000 : 8000
Batch 8000 : 9000
Batch 9000 : 10000
Batch 10000 : 11000
Batch 11000 : 12000
Batch 12000 : 13000
Batch 13000 : 14000
Batch 14000 : 15000
Batch 15000 : 16000
Batch 16000 : 17000
Batch 17000 : 18000
Batch 18000 : 19000
Batch 19000 : 20000
Batch 20000 : 21000
Batch 21000 : 22000
Batch 22000 : 23000
Batch 23000 : 24000
Batch 24000 : 25000
Batch 25000 : 26000
Batch 26000 : 27000
Batch 27000 : 28000
Batch 28000 : 29000
Batch 29000 : 30000
Batch 30000 : 31000
Batch 31000 : 32000
Batch 32000 : 33000
Batch 33000 : 34000
Batch 34000 : 35000
Batch 35000 : 36000
Batch 36000 : 37000
Batch 37000 : 38000
Batch 38000 : 39000
Batch 39000 : 40000
Batch 40000 : 41000
Batch 41000 : 42000
Batch 42000 : 43000
Batch 43000 : 44000
Batch 44000 : 45000
Batch 45000 : 46000
Batch 46000 : 47000
Batch 47000 :

Batch 380000 : 381000
Batch 381000 : 382000
Batch 382000 : 383000
Batch 383000 : 384000
Batch 384000 : 385000
Batch 385000 : 386000
Batch 386000 : 387000
Batch 387000 : 388000
Batch 388000 : 389000
Batch 389000 : 390000
Batch 390000 : 391000
Batch 391000 : 392000
Batch 392000 : 393000
Batch 393000 : 394000
Batch 394000 : 395000
Batch 395000 : 396000
Batch 396000 : 397000
Batch 397000 : 398000
Batch 398000 : 399000
Batch 399000 : 400000
Batch 400000 : 401000
Batch 401000 : 402000
Batch 402000 : 403000
Batch 403000 : 404000
Batch 404000 : 405000
Batch 405000 : 406000
Batch 406000 : 407000
Batch 407000 : 408000
Batch 408000 : 409000
Batch 409000 : 410000
Batch 410000 : 411000
Batch 411000 : 412000
Batch 412000 : 413000
Batch 413000 : 414000
Batch 414000 : 415000
Batch 415000 : 416000
Batch 416000 : 417000
Batch 417000 : 418000
Batch 418000 : 419000
Batch 419000 : 420000
Batch 420000 : 421000
Batch 421000 : 422000
Batch 422000 : 423000
Batch 423000 : 424000
Batch 424000 : 425000
Batch 4250

Batch 753000 : 754000
Batch 754000 : 755000
Batch 755000 : 756000
Batch 756000 : 757000
Batch 757000 : 758000
Batch 758000 : 759000
Batch 759000 : 760000
Batch 760000 : 761000
Batch 761000 : 762000
Batch 762000 : 763000
Batch 763000 : 764000
Batch 764000 : 765000
Batch 765000 : 766000
Batch 766000 : 767000
Batch 767000 : 768000
Batch 768000 : 769000
Batch 769000 : 770000
Batch 770000 : 771000
Batch 771000 : 772000
Batch 772000 : 773000
Batch 773000 : 774000
Batch 774000 : 775000
Batch 775000 : 776000
Batch 776000 : 777000
Batch 777000 : 778000
Batch 778000 : 779000
Batch 779000 : 780000
Batch 780000 : 781000
Batch 781000 : 782000
Batch 782000 : 783000
Batch 783000 : 784000
Batch 784000 : 785000
Batch 785000 : 786000
Batch 786000 : 787000
Batch 787000 : 788000
Batch 788000 : 789000
Batch 789000 : 790000
Batch 790000 : 791000
Batch 791000 : 792000
Batch 792000 : 793000
Batch 793000 : 794000
Batch 794000 : 795000
Batch 795000 : 796000
Batch 796000 : 797000
Batch 797000 : 798000
Batch 7980

Batch 1115000 : 1116000
Batch 1116000 : 1117000
Batch 1117000 : 1118000
Batch 1118000 : 1119000
Batch 1119000 : 1120000
Batch 1120000 : 1121000
Batch 1121000 : 1122000
Batch 1122000 : 1123000
Batch 1123000 : 1124000
Batch 1124000 : 1125000
Batch 1125000 : 1126000
Batch 1126000 : 1127000
Batch 1127000 : 1128000
Batch 1128000 : 1129000
Batch 1129000 : 1130000
Batch 1130000 : 1131000
Batch 1131000 : 1132000
Batch 1132000 : 1133000
Batch 1133000 : 1134000
Batch 1134000 : 1135000
Batch 1135000 : 1136000
Batch 1136000 : 1137000
Batch 1137000 : 1138000
Batch 1138000 : 1139000
Batch 1139000 : 1140000
Batch 1140000 : 1141000
Batch 1141000 : 1142000
Batch 1142000 : 1143000
Batch 1143000 : 1144000
Batch 1144000 : 1145000
Batch 1145000 : 1146000
Batch 1146000 : 1147000
Batch 1147000 : 1148000
Batch 1148000 : 1149000
Batch 1149000 : 1150000
Batch 1150000 : 1151000
Batch 1151000 : 1152000
Batch 1152000 : 1153000
Batch 1153000 : 1154000
Batch 1154000 : 1155000
Batch 1155000 : 1156000
Batch 1156000 : 

Batch 1457000 : 1458000
Batch 1458000 : 1459000
Batch 1459000 : 1460000
Batch 1460000 : 1461000
Batch 1461000 : 1462000
Batch 1462000 : 1463000
Batch 1463000 : 1464000
Batch 1464000 : 1465000
Batch 1465000 : 1466000
Batch 1466000 : 1467000
Batch 1467000 : 1468000
Batch 1468000 : 1469000
Batch 1469000 : 1470000
Batch 1470000 : 1471000
Batch 1471000 : 1472000
Batch 1472000 : 1473000
Batch 1473000 : 1474000
Batch 1474000 : 1475000
Batch 1475000 : 1476000
Batch 1476000 : 1477000
Batch 1477000 : 1478000
Batch 1478000 : 1479000
Batch 1479000 : 1480000
Batch 1480000 : 1481000
Batch 1481000 : 1482000
Batch 1482000 : 1483000
Batch 1483000 : 1484000
Batch 1484000 : 1485000
Batch 1485000 : 1486000
Batch 1486000 : 1487000
Batch 1487000 : 1488000
Batch 1488000 : 1489000
Batch 1489000 : 1490000
Batch 1490000 : 1491000
Batch 1491000 : 1492000
Batch 1492000 : 1493000
Batch 1493000 : 1494000
Batch 1494000 : 1495000
Batch 1495000 : 1496000
Batch 1496000 : 1497000
Batch 1497000 : 1498000
Batch 1498000 : 

Batch 1799000 : 1800000
Batch 1800000 : 1801000
Batch 1801000 : 1802000
Batch 1802000 : 1803000
Batch 1803000 : 1804000
Batch 1804000 : 1805000
Batch 1805000 : 1806000
Batch 1806000 : 1807000
Batch 1807000 : 1808000
Batch 1808000 : 1809000
Batch 1809000 : 1810000
Batch 1810000 : 1811000
Batch 1811000 : 1812000
Batch 1812000 : 1813000
Batch 1813000 : 1814000
Batch 1814000 : 1815000
Batch 1815000 : 1816000
Batch 1816000 : 1817000
Batch 1817000 : 1818000
Batch 1818000 : 1819000
Batch 1819000 : 1820000
Batch 1820000 : 1821000
Batch 1821000 : 1822000
Batch 1822000 : 1823000
Batch 1823000 : 1824000
Batch 1824000 : 1825000
Batch 1825000 : 1826000
Batch 1826000 : 1827000
Batch 1827000 : 1828000
Batch 1828000 : 1829000
Batch 1829000 : 1830000
Batch 1830000 : 1831000
Batch 1831000 : 1832000
Batch 1832000 : 1833000
Batch 1833000 : 1834000
Batch 1834000 : 1835000
Batch 1835000 : 1836000
Batch 1836000 : 1837000
Batch 1837000 : 1838000
Batch 1838000 : 1839000
Batch 1839000 : 1840000
Batch 1840000 : 

Batch 2141000 : 2142000
Batch 2142000 : 2143000
Batch 2143000 : 2144000
Batch 2144000 : 2145000
Batch 2145000 : 2146000
Batch 2146000 : 2147000
Batch 2147000 : 2148000
Batch 2148000 : 2149000
Batch 2149000 : 2150000
Batch 2150000 : 2151000
Batch 2151000 : 2152000
Batch 2152000 : 2153000
Batch 2153000 : 2154000
Batch 2154000 : 2155000
Batch 2155000 : 2156000
Batch 2156000 : 2157000
Batch 2157000 : 2158000
Batch 2158000 : 2159000
Batch 2159000 : 2160000
Batch 2160000 : 2161000
Batch 2161000 : 2162000
Batch 2162000 : 2163000
Batch 2163000 : 2164000
Batch 2164000 : 2165000
Batch 2165000 : 2166000
Batch 2166000 : 2167000
Batch 2167000 : 2168000
Batch 2168000 : 2169000
Batch 2169000 : 2170000
Batch 2170000 : 2171000
Batch 2171000 : 2172000
Batch 2172000 : 2173000
Batch 2173000 : 2174000
Batch 2174000 : 2175000
Batch 2175000 : 2176000
Batch 2176000 : 2177000
Batch 2177000 : 2178000
Batch 2178000 : 2179000
Batch 2179000 : 2180000
Batch 2180000 : 2181000
Batch 2181000 : 2182000
Batch 2182000 : 

Batch 2485000 : 2486000
Batch 2486000 : 2487000
Batch 2487000 : 2488000
Batch 2488000 : 2489000
Batch 2489000 : 2490000
Batch 2490000 : 2491000
Batch 2491000 : 2492000
Batch 2492000 : 2493000
Batch 2493000 : 2494000
Batch 2494000 : 2495000
Batch 2495000 : 2496000
Batch 2496000 : 2497000
Batch 2497000 : 2498000
Batch 2498000 : 2499000
Batch 2499000 : 2500000
Batch 2500000 : 2501000
Batch 2501000 : 2502000
Batch 2502000 : 2503000
Batch 2503000 : 2504000
Batch 2504000 : 2505000
Batch 2505000 : 2506000
Batch 2506000 : 2507000
Batch 2507000 : 2508000
Batch 2508000 : 2509000
Batch 2509000 : 2510000
Batch 2510000 : 2511000
Batch 2511000 : 2512000
Batch 2512000 : 2513000
Batch 2513000 : 2514000
Batch 2514000 : 2515000
Batch 2515000 : 2516000
Batch 2516000 : 2517000
Batch 2517000 : 2518000
Batch 2518000 : 2519000
Batch 2519000 : 2520000
Batch 2520000 : 2521000
Batch 2521000 : 2522000
Batch 2522000 : 2523000
Batch 2523000 : 2524000
Batch 2524000 : 2525000
Batch 2525000 : 2526000
Batch 2526000 : 

Batch 2829000 : 2830000
Batch 2830000 : 2831000
Batch 2831000 : 2832000
Batch 2832000 : 2833000
Batch 2833000 : 2834000
Batch 2834000 : 2835000
Batch 2835000 : 2836000
Batch 2836000 : 2837000
Batch 2837000 : 2838000
Batch 2838000 : 2839000
Batch 2839000 : 2840000
Batch 2840000 : 2841000
Batch 2841000 : 2842000
Batch 2842000 : 2843000
Batch 2843000 : 2844000
Batch 2844000 : 2845000
Batch 2845000 : 2846000
Batch 2846000 : 2847000
Batch 2847000 : 2848000
Batch 2848000 : 2849000
Batch 2849000 : 2850000
Batch 2850000 : 2851000
Batch 2851000 : 2852000
Batch 2852000 : 2853000
Batch 2853000 : 2854000
Batch 2854000 : 2855000
Batch 2855000 : 2856000
Batch 2856000 : 2857000
Batch 2857000 : 2858000
Batch 2858000 : 2859000
Batch 2859000 : 2860000
Batch 2860000 : 2861000
Batch 2861000 : 2862000
Batch 2862000 : 2863000
Batch 2863000 : 2864000
Batch 2864000 : 2865000
Batch 2865000 : 2866000
Batch 2866000 : 2867000
Batch 2867000 : 2868000
Batch 2868000 : 2869000
Batch 2869000 : 2870000
Batch 2870000 : 

Batch 3171000 : 3172000
Batch 3172000 : 3173000
Batch 3173000 : 3174000
Batch 3174000 : 3175000
Batch 3175000 : 3176000
Batch 3176000 : 3177000
Batch 3177000 : 3178000
Batch 3178000 : 3179000
Batch 3179000 : 3180000
Batch 3180000 : 3181000
Batch 3181000 : 3182000
Batch 3182000 : 3183000
Batch 3183000 : 3184000
Batch 3184000 : 3185000
Batch 3185000 : 3186000
Batch 3186000 : 3187000
Batch 3187000 : 3188000
Batch 3188000 : 3189000
Batch 3189000 : 3190000
Batch 3190000 : 3191000
Batch 3191000 : 3192000
Batch 3192000 : 3193000
Batch 3193000 : 3194000
Batch 3194000 : 3195000
Batch 3195000 : 3196000
Batch 3196000 : 3197000
Batch 3197000 : 3198000
Batch 3198000 : 3199000
Batch 3199000 : 3200000
Batch 3200000 : 3201000
Batch 3201000 : 3202000
Batch 3202000 : 3203000
Batch 3203000 : 3204000
Batch 3204000 : 3205000
Batch 3205000 : 3206000
Batch 3206000 : 3207000
Batch 3207000 : 3208000
Batch 3208000 : 3209000
Batch 3209000 : 3210000
Batch 3210000 : 3211000
Batch 3211000 : 3212000
Batch 3212000 : 

Batch 3513000 : 3514000
Batch 3514000 : 3515000
Batch 3515000 : 3516000
Batch 3516000 : 3517000
Batch 3517000 : 3518000
Batch 3518000 : 3519000
Batch 3519000 : 3520000
Batch 3520000 : 3521000
Batch 3521000 : 3522000
Batch 3522000 : 3523000
Batch 3523000 : 3524000
Batch 3524000 : 3525000
Batch 3525000 : 3526000
Batch 3526000 : 3527000
Batch 3527000 : 3528000
Batch 3528000 : 3529000
Batch 3529000 : 3530000
Batch 3530000 : 3531000
Batch 3531000 : 3532000
Batch 3532000 : 3533000
Batch 3533000 : 3534000
Batch 3534000 : 3535000
Batch 3535000 : 3536000
Batch 3536000 : 3537000
Batch 3537000 : 3538000
Batch 3538000 : 3539000
Batch 3539000 : 3540000
Batch 3540000 : 3541000
Batch 3541000 : 3542000
Batch 3542000 : 3543000
Batch 3543000 : 3544000
Batch 3544000 : 3545000
Batch 3545000 : 3546000
Batch 3546000 : 3547000
Batch 3547000 : 3548000
Batch 3548000 : 3549000
Batch 3549000 : 3550000
Batch 3550000 : 3551000
Batch 3551000 : 3552000
Batch 3552000 : 3553000
Batch 3553000 : 3554000
Batch 3554000 : 

Batch 3856000 : 3857000
Batch 3857000 : 3858000
Batch 3858000 : 3859000
Batch 3859000 : 3860000
Batch 3860000 : 3861000
Batch 3861000 : 3862000
Batch 3862000 : 3863000
Batch 3863000 : 3864000
Batch 3864000 : 3865000
Batch 3865000 : 3866000
Batch 3866000 : 3867000
Batch 3867000 : 3868000
Batch 3868000 : 3869000
Batch 3869000 : 3870000
Batch 3870000 : 3871000
Batch 3871000 : 3872000
Batch 3872000 : 3873000
Batch 3873000 : 3874000
Batch 3874000 : 3875000
Batch 3875000 : 3876000
Batch 3876000 : 3877000
Batch 3877000 : 3878000
Batch 3878000 : 3879000
Batch 3879000 : 3880000
Batch 3880000 : 3881000
Batch 3881000 : 3882000
Batch 3882000 : 3883000
Batch 3883000 : 3884000
Batch 3884000 : 3885000
Batch 3885000 : 3886000
Batch 3886000 : 3887000
Batch 3887000 : 3888000
Batch 3888000 : 3889000
Batch 3889000 : 3890000
Batch 3890000 : 3891000
Batch 3891000 : 3892000
Batch 3892000 : 3893000
Batch 3893000 : 3894000
Batch 3894000 : 3895000
Batch 3895000 : 3896000
Batch 3896000 : 3897000
Batch 3897000 : 

Batch 4200000 : 4201000
Batch 4201000 : 4202000
Batch 4202000 : 4203000
Batch 4203000 : 4204000
Batch 4204000 : 4205000
Batch 4205000 : 4206000
Batch 4206000 : 4207000
Batch 4207000 : 4208000
Batch 4208000 : 4209000
Batch 4209000 : 4210000
Batch 4210000 : 4211000
Batch 4211000 : 4212000
Batch 4212000 : 4213000
Batch 4213000 : 4214000
Batch 4214000 : 4215000
Batch 4215000 : 4216000
Batch 4216000 : 4217000
Batch 4217000 : 4218000
Batch 4218000 : 4219000
Batch 4219000 : 4220000
Batch 4220000 : 4221000
Batch 4221000 : 4222000
Batch 4222000 : 4223000
Batch 4223000 : 4224000
Batch 4224000 : 4225000
Batch 4225000 : 4226000
Batch 4226000 : 4227000
Batch 4227000 : 4228000
Batch 4228000 : 4229000
Batch 4229000 : 4230000
Batch 4230000 : 4231000
Batch 4231000 : 4232000
Batch 4232000 : 4233000
Batch 4233000 : 4234000
Batch 4234000 : 4235000
Batch 4235000 : 4236000
Batch 4236000 : 4237000
Batch 4237000 : 4238000
Batch 4238000 : 4239000
Batch 4239000 : 4240000
Batch 4240000 : 4241000
Batch 4241000 : 

Batch 4542000 : 4543000
Batch 4543000 : 4544000
Batch 4544000 : 4545000
Batch 4545000 : 4546000
Batch 4546000 : 4547000
Batch 4547000 : 4548000
Batch 4548000 : 4549000
Batch 4549000 : 4550000
Batch 4550000 : 4551000
Batch 4551000 : 4552000
Batch 4552000 : 4553000
Batch 4553000 : 4554000
Batch 4554000 : 4555000
Batch 4555000 : 4556000
Batch 4556000 : 4557000
Batch 4557000 : 4558000
Batch 4558000 : 4559000
Batch 4559000 : 4560000
Batch 4560000 : 4561000
Batch 4561000 : 4562000
Batch 4562000 : 4563000
Batch 4563000 : 4564000
Batch 4564000 : 4565000
Batch 4565000 : 4566000
Batch 4566000 : 4567000
Batch 4567000 : 4568000
Batch 4568000 : 4569000
Batch 4569000 : 4570000
Batch 4570000 : 4571000
Batch 4571000 : 4572000
Batch 4572000 : 4573000
Batch 4573000 : 4574000
Batch 4574000 : 4575000
Batch 4575000 : 4576000
Batch 4576000 : 4577000
Batch 4577000 : 4578000
Batch 4578000 : 4579000
Batch 4579000 : 4580000
Batch 4580000 : 4581000
Batch 4581000 : 4582000
Batch 4582000 : 4583000
Batch 4583000 : 

Batch 4885000 : 4886000
Batch 4886000 : 4887000
Batch 4887000 : 4888000
Batch 4888000 : 4889000
Batch 4889000 : 4890000
Batch 4890000 : 4891000
Batch 4891000 : 4892000
Batch 4892000 : 4893000
Batch 4893000 : 4894000
Batch 4894000 : 4895000
Batch 4895000 : 4896000
Batch 4896000 : 4897000
Batch 4897000 : 4898000
Batch 4898000 : 4899000
Batch 4899000 : 4900000
Batch 4900000 : 4901000
Batch 4901000 : 4902000
Batch 4902000 : 4903000
Batch 4903000 : 4904000
Batch 4904000 : 4905000
Batch 4905000 : 4906000
Batch 4906000 : 4907000
Batch 4907000 : 4908000
Batch 4908000 : 4909000
Batch 4909000 : 4910000
Batch 4910000 : 4911000
Batch 4911000 : 4912000
Batch 4912000 : 4913000
Batch 4913000 : 4914000
Batch 4914000 : 4915000
Batch 4915000 : 4916000
Batch 4916000 : 4917000
Batch 4917000 : 4918000
Batch 4918000 : 4919000
Batch 4919000 : 4920000
Batch 4920000 : 4921000
Batch 4921000 : 4922000
Batch 4922000 : 4923000
Batch 4923000 : 4924000
Batch 4924000 : 4925000
Batch 4925000 : 4926000
Batch 4926000 : 

Batch 5228000 : 5229000
Batch 5229000 : 5230000
Batch 5230000 : 5231000
Batch 5231000 : 5232000
Batch 5232000 : 5233000
Batch 5233000 : 5234000
Batch 5234000 : 5235000
Batch 5235000 : 5236000
Batch 5236000 : 5237000
Batch 5237000 : 5238000
Batch 5238000 : 5239000
Batch 5239000 : 5240000
Batch 5240000 : 5241000
Batch 5241000 : 5242000
Batch 5242000 : 5243000
Batch 5243000 : 5244000
Batch 5244000 : 5245000
Batch 5245000 : 5246000
Batch 5246000 : 5247000
Batch 5247000 : 5248000
Batch 5248000 : 5249000
Batch 5249000 : 5250000
Batch 5250000 : 5251000
Batch 5251000 : 5252000
Batch 5252000 : 5253000
Batch 5253000 : 5254000
Batch 5254000 : 5255000
Batch 5255000 : 5256000
Batch 5256000 : 5257000
Batch 5257000 : 5258000
Batch 5258000 : 5259000
Batch 5259000 : 5260000
Batch 5260000 : 5261000
Batch 5261000 : 5262000
Batch 5262000 : 5263000
Batch 5263000 : 5264000
Batch 5264000 : 5265000
Batch 5265000 : 5266000
Batch 5266000 : 5267000
Batch 5267000 : 5268000
Batch 5268000 : 5269000
Batch 5269000 : 

In [244]:
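def evaluate_model_predictions(configs, global_vars, y_pred, y_true):
    """Print a per-class precision/recall/F1 report for the predictions."""

    # Invert the label mapping so the values in y_true can be turned back into
    # the category names that y_pred is assumed to contain.
    labels_inv = {y: x for x, y in global_vars['labels'].items()}

    y_true_labels = [labels_inv[y] for y in y_true]

    print(classification_report(y_true_labels, y_pred))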

evaluate_model_predictions(configs, global_vars, y_hat, y_val)

                        precision    recall  f1-score   support

               Apparel       0.04      0.06      0.05    163363
            Automotive       0.55      0.51      0.53    129237
                  Baby       0.46      0.52      0.49     64549
                Beauty       0.01      0.01      0.01    181681
                 Books       0.49      0.93      0.64    747384
                Camera       0.07      0.02      0.03    260491
Digital_Ebook_Purchase       0.80      0.04      0.07    682749
Digital_Music_Purchase       0.00      0.00      0.00     85609
Digital_Video_Download       0.66      0.28      0.39    142493
           Electronics       0.51      0.45      0.48    116795
               Grocery       0.64      0.69      0.66     89981
Health_&_Personal_Care       0.54      0.47      0.50    198754
                  Home       0.51      0.47      0.49    232394
    Home_Entertainment       0.06      0.18      0.09      2763
      Home_Improvement       0.41      

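**Notes**: We're seeing mixed results across classes. This is expected in part because the number of instances available differs between classes (i.e. the number of data points per product category varies). Support alone does not explain everything, though: in the report above, Digital_Ebook_Purchase has a support of 682,749 but a recall of only 0.04, while Books reaches a recall of 0.93 at a precision of 0.49, which may indicate that the two categories are being confused with each other.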
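One way to check how far class size accounts for the scores is to line up each class's support with its recall. The following is a minimal sketch using only the standard library; the helper name `per_class_support_and_recall` and the toy labels are assumptions made for illustration and are not taken from the dataset or the notebook.

```python
from collections import Counter


def per_class_support_and_recall(y_true, y_pred):
    """Return {label: (support, recall)} for every label present in y_true."""
    support = Counter(y_true)
    # Count the predictions that match the true label, grouped by true label.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: (n, correct[label] / n) for label, n in support.items()}


# Toy labels, made up for illustration only.
y_true_toy = ["Books"] * 8 + ["Camera"] * 2
y_pred_toy = ["Books"] * 8 + ["Books", "Camera"]

stats = per_class_support_and_recall(y_true_toy, y_pred_toy)

# List classes from largest to smallest support.
for label, (n, recall) in sorted(stats.items(), key=lambda kv: -kv[1][0]):
    print(f"{label:<10} support={n:<4} recall={recall:.2f}")
```

In the notebook, the same helper could be applied to the label names built inside `evaluate_model_predictions` and to `y_hat`, which would make it easier to spot large classes that still score poorly.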


## Optional: Using Gensim for Similarity Matching

To Add...

In [None]:
from gensim.models import KeyedVectors
import tarfile


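def download_model_binary_and_unpack(configs, global_vars):
    """Download the trained model archive from S3 and extract it locally."""

    model_tar_filename = 'model.tar.gz'
    model_binary_output = 'model.bin'

    s3_bucket = global_vars['s3_bucket']
    # NOTE: the format string has a single placeholder, so only
    # configs['bucket_name'] is used; configs['wordvecdata'] is passed but ignored.
    s3_model_key = '{}/output/model.tar.gz'.format(configs['bucket_name'], configs['wordvecdata'])
    s3_bucket.download_file(s3_model_key, model_tar_filename)

    # Extract the archive; it is expected to contain model.bin.
    with tarfile.open(model_tar_filename) as tar:
        tar.extractall()

    return model_binary_output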

def load_pretrained_model_into_gensim(configs, global_vars):
    """Load the downloaded word vectors into a gensim KeyedVectors object."""

    model_binary = download_model_binary_and_unpack(configs, global_vars)
    word_vectors = KeyedVectors.load_word2vec_format(model_binary, binary=True)
    # Example queries once the vectors are loaded:
    #     word_vectors.most_similar(positive=['woman', 'king'], negative=['man'])
    #     word_vectors.doesnt_match("breakfast cereal dinner lunch".split())

    return word_vectors
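The commented-out `most_similar` query ranks words by the cosine similarity of their vectors. The sketch below illustrates that idea in plain Python so it can run without the trained model. It is not gensim's implementation, and the helper names (`cosine_similarity`, `most_similar_toy`) and the three-dimensional toy vectors are assumptions made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_toy(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical vectors; real embeddings would come from the trained model.
toy_vectors = {
    "breakfast": [0.9, 0.1, 0.0],
    "lunch": [0.8, 0.2, 0.1],
    "dinner": [0.7, 0.3, 0.1],
    "cereal": [0.1, 0.9, 0.2],
}

ranked = most_similar_toy("breakfast", toy_vectors)
for other, score in ranked:
    print(other, round(score, 3))
```

With these toy vectors, "lunch" and "dinner" rank above "cereal" because their vectors point in nearly the same direction as "breakfast".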
    
