In [1]:
#imports
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup, NavigableString, Tag
import re
import time
import random
import sys
pd.set_option('display.max_colwidth', None)

from sklearn.metrics.pairwise import pairwise_distances, cosine_distances, cosine_similarity
from scipy import sparse
from matplotlib import pyplot as plt
import utils as ut

In [6]:
# Only these columns are needed to build the recommenders.
cols = ['customer_id', 'product_id', 'product_title', 'star_rating', 'review_date']

movie_df = pd.read_csv('./data/movie_dvd.csv', usecols=cols)
vg_df = pd.read_csv('./data/video_games.csv', usecols=cols)
books_df = pd.read_csv('./data/books.csv', usecols=cols)

df_list = [(movie_df,'movie_rec'),
            (vg_df, 'videog_rec'),
            (books_df, 'books_rec')]

for df, name in df_list:
    print(name)
    print(ut.size_in_gb(df))
    print(df.shape)
    print()


movie_rec
1.060563131 GB
(4405432, 5)

videog_rec
0.396461632 GB
(1648136, 5)

books_rec
0.370858643 GB
(1489354, 5)
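
The sizes above come from ut.size_in_gb, whose source is not shown in this notebook. A minimal sketch of what such a helper might look like, assuming it reports the deep memory footprint of the dataframe divided by 1e9 (the function body here is a guess, not the actual utils code):

```python
import pandas as pd


def size_in_gb(df: pd.DataFrame) -> str:
    """Hypothetical stand-in for ut.size_in_gb: in-memory size as a 'X GB' string."""
    # deep=True also counts the Python string objects held in object columns
    n_bytes = df.memory_usage(deep=True).sum()
    return f"{n_bytes / 1e9} GB"


demo = pd.DataFrame({"customer_id": [1, 2, 3], "product_title": ["a", "b", "c"]})
print(size_in_gb(demo))
```

Using deep=True matters for frames like these, since product_title is a string column and its shallow size would understate the real footprint.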



Now it's time to run each of these through ut.make_recommender_df, my function that builds the recommender dataframe and saves it to a pickle.

In [8]:
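# Commented out after the initial run so the pickles are not rebuilt each time;
# the output below is from that run.
#for df, name in df_list:
    #ut.make_recommender_df(df, name)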

Dropping 11384 duplicate values.
Unique customers: 1867543
Unique products: 76336
Size of matrix: (72385, 1867543)
Making df dictionary...
Dictionary made - sparse dataframe under construction...
Size of movie_rec Recommender df: 1.843095412 GB
Dropping 6140 duplicate values.
Unique customers: 979939
Unique products: 20952
Size of matrix: (15938, 979939)
Making df dictionary...
Dictionary made - sparse dataframe under construction...
Size of videog_rec Recommender df: 0.076510705 GB
Dropping 42988 duplicate values.
Unique customers: 795389
Unique products: 49547
Size of matrix: (46575, 795389)
Making df dictionary...
Dictionary made - sparse dataframe under construction...
Size of books_rec Recommender df: 0.422205132 GB
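
The source of ut.make_recommender_df is not shown here, so the following is only a sketch of the steps the log suggests: drop duplicate reviews, build a sparse products-by-customers rating matrix, and compute pairwise cosine distances between products. The function name make_recommender_sketch, the choice to key rows by product_title, and the dense return value are all assumptions. Keying by title is one possible reading of why the matrix has fewer rows than there are unique product IDs. The log also mentions a sparse dataframe, which this sketch does not reproduce, so it only suits small inputs.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import cosine_distances


def make_recommender_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical outline of a title-by-title cosine distance frame."""
    # Keep one rating per (customer, title) pair.
    deduped = df.drop_duplicates(subset=["customer_id", "product_title"])

    # Category codes give compact integer row/column positions.
    titles = deduped["product_title"].astype("category")
    customers = deduped["customer_id"].astype("category")

    # Rows are products, columns are customers, values are star ratings.
    matrix = sparse.csr_matrix(
        (
            deduped["star_rating"].to_numpy(dtype=float),
            (
                titles.cat.codes.to_numpy(dtype=np.int64),
                customers.cat.codes.to_numpy(dtype=np.int64),
            ),
        ),
        shape=(len(titles.cat.categories), len(customers.cat.categories)),
    )

    # Dense (n_products, n_products) array; 0 = identical, 1 = no overlap.
    dist = cosine_distances(matrix)
    return pd.DataFrame(dist, index=titles.cat.categories, columns=titles.cat.categories)


reviews = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3],
    "product_title": ["A", "B", "A", "B", "C", "C"],
    "star_rating": [5, 4, 5, 4, 3, 3],
})
rec = make_recommender_sketch(reviews)
print(rec.round(3))
```

In the toy data, A and B are rated by the same customers in the same proportion, so their distance is 0, while C shares no reviewers with either and sits at distance 1.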


That should do it! Let's load and preview all three pickles to confirm they look the way we expect:

In [14]:
books = pd.read_pickle('./pickles/books_rec.pkl')
movies = pd.read_pickle('./pickles/movie_rec.pkl')
videog = pd.read_pickle('./pickles/videog_rec.pkl')

In [15]:
books.head()

Unnamed: 0,"Java(TM) Programming Language, The (3rd Edition) (The Java Series)",You Have the Power: Choosing Courage in a Culture of Fear,Annapurna,Call to Arms (Star Trek: Deep Space Nine / The Dominion War Book 2) (v. 2),Growing a Business,Java 1.1: The Complete Reference,"Daja's Book (Circle of Magic, No.3)",RFK: A Candid Biography of Robert F. Kennedy,A Chosen Faith: An Introduction to Unitarian Universalism,The Conan Chronicles,...,The Wilderness Family: At Home with Africa's Wildlife,The Ice Master: The Doomed 1913 Voyage of the Karluk,"The New Evil (Cheerleaders, No. 7)",Why We Can't Wait (Signet Classics),The Celtic Dragon Tarot Kit,Tuck Everlasting (A Sunburst book),"The World's Best-Kept Beauty Secrets: What Really Works in Beauty, Diet & Fashion","Ammie, Come Home",What Nietzsche Really Said,Chemical Sensitivity: The Truth About Environmental Illness (Consumer Health Library)
"Java(TM) Programming Language, The (3rd Edition) (The Java Series)",0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
You Have the Power: Choosing Courage in a Culture of Fear,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Annapurna,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Call to Arms (Star Trek: Deep Space Nine / The Dominion War Book 2) (v. 2),1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Growing a Business,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [16]:
movies.head()

Unnamed: 0,Seventh Son [Blu-ray],Sailor Moon: The Ultimate Uncut Collection,Love Jones,Spartacus (50th Anniversary Edition) [Blu-ray],Mike Dooley: The Secret of the Law of Attraction 3: Manifesting Change,"Verdi - Requiem / Price, Pavarotti, Cossotto, Ghiaurov, von Karajan, Teatro alla Scala",The Royle Family: Season 2,How to Super Tune! Performance Engine Building and Carburetor Tuning,"I, Robot [Blu-ray]",The Awakening Land (Tv Mini-Series),...,"Rock, Rhythm and Doo Wop","Bozo: The World's Most Famous Clown, Vol. 1",How Hitler Lost the War,Mickey's Magical Christmas: Snowed in at the House of Mouse,Stonehearst Asylum [Blu-ray],The Little Vampire,The Revengers,Godzilla Boxed Set (Godzilla: Tokyo SOS / Godzilla vs. Mechagodzilla / Son of Godzilla),Sonic the Hedgehog: The Movie,TORCHWOOD MIRACLE DAY
Seventh Son [Blu-ray],0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,0.948745,1.0,0.976235,1.0,1.0,1.0
Sailor Moon: The Ultimate Uncut Collection,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Love Jones,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Spartacus (50th Anniversary Edition) [Blu-ray],1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.985996
Mike Dooley: The Secret of the Law of Attraction 3: Manifesting Change,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [17]:
videog.head()

Unnamed: 0,SEGA INITIAL D EXTREME STAGE PLAYSTATION3 the Best (BEST PRICE) for PS3 [Japan Import],3CLeader® Thumb Grip Analog grips Stick Caps for Sony PS4 PS3 Xbox360 controller cap cover - 2 Pairs Silicone Green,ASTRO Gaming MixAmp Pro [2014 model],Futurama,Splatterhouse,NFL Street 2,Xbox Controller (Original Design),Our House - Nintendo DS,PS3/PS2/PC Street Fighter 4 JoyStick Fight Pad Fight Stick Controller,Tachyon: The Fringe,...,Tom Clancy's Ghost Recon Island Thunder - Xbox,Nascar Dirt to Daytona,"Drop shot, Auto-aim, Jitter Xbox 360 Modded Controller COD Ghosts, MW3, Black Ops 2, MW2, Rapid fire mod (Chrome/Black)",Everquest II: Desert of Flames - PC,Barbie Diaries: High School Mystery - PC,Saints Row the Third,Sid Meier's Gettysburg (Jewel Case) - PC,20GB Hard Disk Drive HDD for Microsoft Xbox 360,The Trash Pack,InterAct SuperPad 64 Controller
SEGA INITIAL D EXTREME STAGE PLAYSTATION3 the Best (BEST PRICE) for PS3 [Japan Import],0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
3CLeader® Thumb Grip Analog grips Stick Caps for Sony PS4 PS3 Xbox360 controller cap cover - 2 Pairs Silicone Green,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
ASTRO Gaming MixAmp Pro [2014 model],1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Futurama,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.985371,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Splatterhouse,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,0.992164,1.0,1.0,0.989985,1.0


Looks good! Time to start building the recommender functions so these can be searched easily.
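
As a starting point, a lookup over one of these distance frames could be sketched as below. The function name recommend, the substring matching, and the rule that a distance of 1.0 means "no shared reviewers, so skip it" are all assumptions for illustration, and the sketch assumes titles are unique and sit in the frame's index.

```python
import pandas as pd


def recommend(dist_df: pd.DataFrame, query: str, n: int = 5) -> pd.Series:
    """Hypothetical lookup: closest titles to the first title matching `query`."""
    # Case-insensitive substring match against the index of titles.
    matches = [t for t in dist_df.index if query.lower() in str(t).lower()]
    if not matches:
        return pd.Series(dtype=float)

    title = matches[0]
    # Distances from the matched title to every other title.
    row = dist_df.loc[title].drop(labels=title)
    # A distance of 1.0 carries no signal, so keep only closer titles.
    return row[row < 1.0].sort_values().head(n)


names = ["Futurama", "Splatterhouse", "NFL Street 2"]
demo = pd.DataFrame(
    [[0.0, 0.2, 1.0],
     [0.2, 0.0, 0.7],
     [1.0, 0.7, 0.0]],
    index=names,
    columns=names,
)
print(recommend(demo, "futur"))
```

Sorting ascending fits here because the frames hold distances rather than similarities, so the smallest values are the best recommendations.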