# Using spaCy for Capstone Project
### This includes more cleaning and EDA with some modeling practice 

https://www.geeksforgeeks.org/python-sort-python-dictionaries-by-key-or-value/   
https://medium.com/@armandj.olivares/building-nlp-content-based-recommender-systems-b104a709c042      
https://towardsdatascience.com/so-whats-spacy-ad65aa1949e0   
The following spacy.io pages were the main references used to build this system (many pages linked from them were also consulted):  
https://spacy.io/usage   
https://spacy.io/usage/spacy-101   
https://spacy.io/usage/vectors-similarity   
https://spacy.io/api/span#vector    
https://spacy.io/api/vectors     
  

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import spacy

from nltk.tokenize import sent_tokenize, word_tokenize, RegexpTokenizer
from nltk.corpus import stopwords 
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

In [2]:
nlp_sm = spacy.load('en_core_web_sm')
nlp_lg = spacy.load('en_core_web_lg')

In [78]:
data = pd.read_csv('./data/chewy.csv')

In [80]:
data.shape

(2730, 15)

In [81]:
data.head()

Unnamed: 0,title,price,descriptions,key_benefits,rating,reviews,link,subcat,cat,combined_text,combined_sent_len,combined_word_leng,tokenized_combined,token_vectors,avg_vector
0,Nylabone Teething Pacifier Puppy Chew Toy,3.23,'Every puppy needs a pacifier to soothe teethi...,Designed to encourage positive play and teach ...,4.2,I read the reviews and thought we'd be safe. W...,https://www.chewy.com/nylabone-teething-pacifi...,moderate,chew toys,Every puppy needs a pacifier to soothe teethin...,49,1167,"['every', 'puppy', 'needs', 'pacifier', 'sooth...","[-0.036094774, 0.16643533, -0.17682464, -0.101...",0.003221
1,"KONG Puppy Dog Toy, Color Varies",6.99,"""The Puppy KONG dog toy is customized for a gr...",Unpredictable bounce is great for energetic pu...,4.3,"I have had dozens of dogs over the years, and ...",https://www.chewy.com/kong-puppy-dog-toy-color...,moderate,chew toys,"""The Puppy KONG dog toy is customized for a gr...",38,907,"['puppy', 'kong', 'dog', 'toy', 'customized', ...","[-0.054273944, 0.17632876, -0.15004495, -0.091...",0.002996
2,Petstages Dogwood Tough Dog Chew Toy,8.83,"""Chewing is a natural behavior in all dogs, as...",Chew toy that combines real wood with syntheti...,4.2,My dogs like chasing sticks and the two of the...,https://www.chewy.com/petstages-dogwood-tough-...,moderate,chew toys,"""Chewing is a natural behavior in all dogs, as...",50,1133,"['chewing', 'natural', 'behavior', 'dogs', 'he...","[-0.053820234, 0.16249822, -0.17790341, -0.094...",0.004223
3,Nylabone Teething Rings Puppy Chew Toy,6.49,'Great for teething and tugging! The Puppy Tee...,Designed to encourage positive play and teach ...,4.1,It doesn't save fingers and hands as much from...,https://www.chewy.com/nylabone-teething-rings-...,moderate,chew toys,Great for teething and tugging! The Puppy Teet...,31,790,"['great', 'teething', 'tugging', 'puppy', 'tee...","[-0.04568519, 0.16081789, -0.16212736, -0.0856...",0.003459
4,Nylabone Puppy Teething X Bone Beef Flavored P...,6.82,'Curious puppies have met their match with the...,Non-edible dog toy is made for teething puppie...,4.1,This is a great shoe for a tiny puppy in there...,https://www.chewy.com/nylabone-puppy-teething-...,moderate,chew toys,Curious puppies have met their match with the ...,27,786,"['curious', 'puppies', 'met', 'match', 'nylabo...","[-0.03634293, 0.15598676, -0.14689486, -0.0651...",0.003303


# This next section contains useful code for the actual recommendation system 

### User Input-- What will the user need to do?
The user input will be a short paragraph answering the following questions to the best of their ability:
 - What type of dog do you have?
 - Is your dog a puppy, or older?
 - Is your dog super energetic and playful or more calm?
 - Would you say your dog is smarter than most dogs?
 - Does your dog like to chase after toys, cuddle toys, or chew on them? 
 - Does your dog get pretty bored after playing with a toy for a bit? 
 - Does your dog already have an ideal toy that is their absolute favorite? (If yes, describe it.)   
 Try to describe what your dog IS rather than what your dog IS NOT so that the model works better. 
 
The paragraph describing the dog will be tokenized and its stopwords removed, so that it can be compared against the tokenized text of every dog toy to find the most similar toys. 
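
As a rough illustration of the cleaning step described above, the sketch below lowercases a description, splits it on runs of word characters, and drops stopwords. It uses only the standard library. The `STOPWORDS` set is a small hypothetical stand-in for a full list (for example NLTK's English stopwords plus domain words such as "dog" and "toy"), and `clean_text` is an illustrative name rather than a function from this notebook.

```python
import re

# Hypothetical, deliberately tiny stopword set for illustration only.
STOPWORDS = {"a", "an", "and", "is", "he", "she", "the", "to", "with",
             "dog", "dogs", "toy", "toys"}

def clean_text(text):
    """Lowercase, split on runs of word characters, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]

example = "Eli is a miniature dachshund and she loves to snuggle with soft toys."
print(clean_text(example))
```

The same function would be applied to both the user's paragraph and each toy's combined text, so that both sides of the comparison are cleaned identically.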

The text I will use to build and test this describes one of my own dogs, Colby:  
An okay example:   
"Colby is a 9-year-old retriever. For the most part, he is pretty mellow, but sometimes he gets bursts of energy, especially when we go swimming and play fetch in the pool with a ball. He also really likes treats and is very food oriented. I wouldn't say he's very smart and definitely prefers to chase toys more than cuddle or chew on them. He will get bored with a toy after a bit. His favorite toy is probably a ball and not squeaky."

A better example:  
"Colby is a 9-year-old retriever. For the most part, he is pretty mellow, but sometimes he gets bursts of energy, especially when we go swimming and play fetch in the pool with a ball. He also really likes treats and is very food oriented. He is pretty dumb and definitely prefers to chase toys. He will get bored with a toy after a bit. His favorite toy is probably a ball and not squeaky."

A third example: 
"Eli is a miniature dachshund that loves to run around and snuggle with soft toys. She is very active but also loves to cuddle and loves toys." 

In [5]:
# tokenizer = RegexpTokenizer(r'\w+')
# stopwords = stopwords.words('english') + ['dog', 'toy', 'dogs', 'toys', 'one']

In [82]:
# User input cleaning: 
user_input_1 = "Colby is 9-year-old golden retriever. For the most part, he is pretty mellow, but sometimes he gets bursts of energy, especially when we go swimming and play fetch in the pool with a ball. He also really like treats and is very foot oriented. I wouldn't say he's very smart and definielty prefers to chase toys more than cuddle or chew on them. He will get bored with a toy after a bit. His favorite toy is probably a ball and not sqeaky."
user_input_colby = "Colby is 9-year-old golden retriever. For the most part, he is pretty mellow, but sometimes gets bursts of energy, especially when we go swimming and playing fetch in the pool with a ball. He also really like treats and is very food oriented. He will get bored with a toy after a bit. His favorite toy is probably a ball."
user_input_eli = "Eli is a 7-year-old miniature dachshund. She loves to run and chase balls and toys. But at the same time, she also enjoys snuggling with soft toys. She can also be quite destructive and loves to chew on things and lick. The second she sees something she chases and barks after it like a little guard dog"
user_input_middy = "Middy is an 8-year-old Golden Retriever. He is both an extremely active and mellow dog. He absolutely loves to chase things and play with us as well as swim after balls. He is also extremely destructive and destroys almost every toy we have. He loves squeaky toys and will chew and destroys them for hours."

# user_doc = nlp_lg(user_input)

# user_tokens = tokenizer.tokenize(str(user_doc).lower())
# user_text = [token for token in user_tokens if token not in stopwords]

# user_no_stop_words
# user_text = []
# for token in user_doc:
#     tokenizer.tokenize(str(token).lower())
# user_text

In [61]:
# combined_tokens = tokenizer.tokenize(df['combined_text'][i].lower())
# no_stop_words = [token for token in combined_tokens if token not in stopwords.words('english')]

In [62]:
# https://medium.com/@armandj.olivares/building-nlp-content-based-recommender-systems-b104a709c042

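# Build one spaCy Doc per toy from its combined text.
# Note: the "u" prefix and trailing "'" are concatenated as literal characters
# (they are not a Python u'...' string literal), so each Doc starts with a
# token such as "uEvery", as the table printed below shows.
list_docs = []
for i in range(len(data)):
    if data['combined_text'][i] != '':
        doc = nlp_lg("u" + str(data['combined_text'][i]) + "'")
        list_docs.append(doc)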
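
One fragility in the loop above is that skipping a row would shift every later Doc to a list position that no longer matches its DataFrame row. A possible alternative, sketched here with plain Python stand-ins, is to key the results by row label. `build_docs` and `make_doc` are hypothetical names; in this notebook `make_doc` would be `nlp_lg`, and `texts` would be `data['combined_text']` (a pandas Series also provides `.items()`).

```python
def build_docs(texts, make_doc):
    """Map each row label to a processed document, skipping empty text."""
    docs = {}
    for label, text in texts.items():
        # Skip missing values (e.g. NaN) and blank strings.
        if isinstance(text, str) and text.strip():
            docs[label] = make_doc(text)
    return docs

# Stand-in data: row 1 is empty and is skipped without shifting row 2.
texts = {0: "Every puppy needs a pacifier", 1: "", 2: "Great for teething"}
docs = build_docs(texts, make_doc=str.split)  # str.split stands in for nlp_lg
print(sorted(docs))  # row labels that have a document
```

With a dictionary, a later lookup such as `docs[i]` either returns the document for exactly that row or raises `KeyError`, so a mismatch cannot go unnoticed.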

In [63]:

# list_docs = {}
# for i in range(len(data)):
#     if data['combined_text'][i] != '':
#         doc = nlp_lg("u" + str(data['combined_text'][i]) + "'")
#         list_docs[i] = doc

pd.DataFrame(list_docs)
# print(len(list_docs.keys()))
# len(list_docs.values())

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1457,1458,1459,1460,1461,1462,1463,1464,1465,1466
0,uEvery,puppy,needs,a,pacifier,to,soothe,teething,pain,",",...,,,,,,,,,,
1,"u""The",Puppy,KONG,dog,toy,is,customized,for,a,growing,...,,,,,,,,,,
2,"u""Chewing",is,a,natural,behavior,in,all,dogs,",",as,...,,,,,,,,,,
3,uGreat,for,teething,and,tugging,!,The,Puppy,Teething,Rings,...,,,,,,,,,,
4,uCurious,puppies,have,met,their,match,with,the,Nylabone,Teething,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2725,uGet,your,furry,friend,ready,for,tipoff,with,the,Pets,...,,,,,,,,,,
2726,uTrain,your,aspiring,or,experienced,athlete,in,the,world,of,...,,,,,,,,,,
2727,uGive,Spot,a,fun,new,knot,to,trot,",",chew,...,,,,,,,,,,
2728,uIf,your,dog,loves,tug,of,war,",",the,Squishy,...,,,,,,,,,,


In [64]:
# https://medium.com/@armandj.olivares/building-nlp-content-based-recommender-systems-b104a709c042
# # This will take in what the users says and then score it based off that 
# # I have added to this as well

# def calculate_similarity_with_spacy(nlp, df, user_text, n=6):
#     # Calculate similarity with Spacy 
#     list_sim = []
#     toy_score = []
#     doc1 = nlp(user_text + "'")
#     vectors = doc1.vector # not quite sure if this will be useful bc not sure how to use the .most_similar here
#     for i in df.index:
#         try: 
#             doc2 = list_docs[i]
#             score = doc1.similarity(doc2)
#             list_sim.append((doc1, doc2, score))
#             toy_score.append((score, df['title'][i], df['cat'][i], df['description']))
#         except:
#             continue
    
#     return toy_score, list_sim

Running it on the okay example

In [83]:
def calculate_similarity_with_spacy(nlp, df, user_text, n=6):
    """Score every toy in `df` against the user's description.

    Returns a list of (similarity, title, category) tuples in row order.
    Relies on the global `list_docs` built earlier; `n` is currently unused.
    """
    toy_score = []
    # The trailing "'" is appended as a literal character, matching how the
    # toy Docs in list_docs were built.
    doc1 = nlp(user_text + "'")
    for i in df.index:
        try:
            doc2 = list_docs[i]
        except IndexError:
            # No pre-computed Doc for this row, so skip it.
            continue
        score = doc1.similarity(doc2)
        toy_score.append((score, df['title'][i], df['cat'][i]))
    return toy_score
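
By default, spaCy's `Doc.similarity` is the cosine similarity between the two documents' vectors, and a Doc's vector is the average of its token vectors. The sketch below reproduces that arithmetic with short Python lists in place of the model's 300-dimensional vectors. `average_vector` and `cosine_similarity` are illustrative helpers written for this example, not spaCy functions.

```python
import math

def average_vector(token_vectors):
    """Element-wise mean of equally sized token vectors."""
    n = len(token_vectors)
    return [sum(dim) / n for dim in zip(*token_vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up 2-dimensional "token vectors" for a user text and a product text.
user_vec = average_vector([[1.0, 0.0], [1.0, 2.0]])  # -> [1.0, 1.0]
toy_vec = average_vector([[2.0, 2.0], [4.0, 4.0]])   # -> [3.0, 3.0]
print(cosine_similarity(user_vec, toy_vec))
```

Cosine similarity ignores vector length and looks only at direction. Averaging many tokens, most of them common words, tends to pull all document vectors toward a similar direction, which may be why the scores in this notebook sit in a narrow band close to 1.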

In [84]:
# ranking_list = calculate_similarity_with_spacy(nlp_lg, data, user_text=user_input_1)
# ranking_list

In [85]:
# sorted(ranked_dict, reverse=True)

In [107]:
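def dog_toy_recommender(nlp, data, user_text):
    # Use the model that was passed in rather than the global nlp_lg.
    ranking_list = calculate_similarity_with_spacy(nlp, data, user_text)
    # Key each toy by its score: {score: {position: [title, category]}}.
    # Toys with exactly equal scores overwrite one another here.
    ranked_dict = {}
    for i, (score, title, cat) in enumerate(ranking_list):
        ranked_dict[score] = {i: [title, cat]}
    # Rebuild the dict in descending score order (dicts keep insertion order).
    final_dict = {}
    for score in sorted(ranked_dict, reverse=True):
        final_dict[score] = ranked_dict[score]
    return final_dict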
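
Because the dictionary above is keyed by score, two toys with exactly the same score cannot both be kept. One possible alternative, shown as a sketch on made-up scores, is to rank the list of tuples directly. `top_matches` is a hypothetical helper name, and the sample rows are invented for illustration.

```python
import heapq

def top_matches(scored_toys, k=10):
    """Return the k highest-scoring (score, title, category) tuples."""
    return heapq.nlargest(k, scored_toys, key=lambda row: row[0])

# Made-up scores: two toys tie, and both survive.
scored = [
    (0.91, "Rope Tug", "rope & tug toys"),
    (0.97, "Squeaky Ball", "fetch toys"),
    (0.97, "Plush Squirrel", "plush toys"),
    (0.88, "Teething Ring", "chew toys"),
]
for score, title, cat in top_matches(scored, k=3):
    print(f"{score:.2f}  {title}  ({cat})")
```

`heapq.nlargest` avoids sorting the whole list when only the top few results are needed; `sorted(scored_toys, key=..., reverse=True)[:k]` would give the same ranking.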

In [108]:
# def run_recommender(nlp, data, user_text):
#     toys = dog_toy_recommender(nlp_lg, data, user_text=user_text)
# #     print(toys)
#     toy_list = []
#     for i in list(toys[:10]):
#         toy_list.append(toys[i])
#     return toy_list 
# #     return toys

def run_recommender(nlp, data, user_text):
    # Use the model that was passed in rather than the global nlp_lg.
    toys = dog_toy_recommender(nlp, data, user_text)
    # Keep the ten highest-scoring (score, toy) pairs.
    toy_list = list(toys.items())[:10]
    return toy_list
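
The nested `(score, {row: [title, category]})` pairs returned above are awkward to read or to load into a table. A small sketch of flattening them into one dictionary per toy follows. `flatten_results` is a hypothetical helper, and the sample input copies two entries from the output shown in this notebook.

```python
def flatten_results(toy_list):
    """Turn (score, {row: [title, category]}) pairs into flat dicts."""
    rows = []
    for score, info in toy_list:
        for row, (title, category) in info.items():
            rows.append({"row": row, "score": score,
                         "title": title, "category": category})
    return rows

sample = [
    (0.9863853345761364, {344: ["All Kind Squeaky Football Dog Toy", "fetch toys"]}),
    (0.9855679906675416, {324: ["KONG Extreme Ball Dog Toy", "fetch toys"]}),
]
for r in flatten_results(sample):
    print(r["row"], round(r["score"], 4), r["title"], r["category"])
```

A list of flat dictionaries like this can be passed directly to `pd.DataFrame(...)` to get one row per recommended toy.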

In [109]:
toy_list = run_recommender(nlp_lg, data, user_text=user_input_1)
# pd.DataFrame(toy_list[0])
toy_list

2488


[(0.9863853345761364,
  {344: ['All Kind Squeaky Football Dog Toy', 'fetch toys']}),
 (0.9862352719522034,
  {416: ["JW Pet iSqueak Bouncin' Baseball Dog Toy, Color Varies, Large",
    'fetch toys']}),
 (0.9860932339030086,
  {387: ['Ethical Pet Latex Soccer Ball Squeaky Dog Chew Toy, Color Varies, 2-in',
    'fetch toys']}),
 (0.9855679906675416, {324: ['KONG Extreme Ball Dog Toy', 'fetch toys']}),
 (0.9854660718549709,
  {330: ['Hartz Dura Play Ball Squeaky Latex Dog Toy, Color Varies',
    'fetch toys']}),
 (0.9854611233260383,
  {312: ['KONG Squeezz Ball Dog Toy, Color Varies', 'fetch toys']}),
 (0.9853825190068171,
  {1204: ['Ethical Pet Soccer Ball Squeaky Plush Dog Toy', 'plush toys']}),
 (0.9852119961707424,
  {1821: ['Outward Hound Floppyz Donkey Squeaky Plush Dog Toy',
    'plush toys']}),
 (0.9851390718941755,
  {1081: ['Jolly Pets Teaser Ball Dog Toy, Blue', 'interactive toys']}),
 (0.985061207825082,
  {945: ['PetSafe Sportsmen Football Treat Dispensing Tough Dog Chew Toy'

In [110]:
# ranking_list = calculate_similarity_with_spacy(nlp_lg, data, user_text=user_input_1)
# # , vectors

# # queries = numpy.asarray([numpy.random.uniform(-1, 1, (300,))])
# # most_similar = nlp_lg.vocab.vectors.most_similar(np.array(vectors), n=10)

# ranked_dict = {}
# for i, score_toy in enumerate(ranking_list[:10]):
# #     print(i, score_toy[0], ranking_list[i][2])
#     ranked_dict[score_toy[0]] = {i : [score_toy[1], ranking_list[i][2]]}

# # https://www.geeksforgeeks.org/python-sort-python-dictionaries-by-key-or-value/
# final_dict = {}
# for i in sorted(ranked_dict, reverse=True) : 
#     print ((i, ranked_dict[i]), end ='\n')
# #     final_dict[i] = ranked_dict[i]
# # final_dict

In [111]:
run_recommender(nlp_lg, data, user_text=user_input_colby)

2488


[(0.9814632827501275,
  {416: ["JW Pet iSqueak Bouncin' Baseball Dog Toy, Color Varies, Large",
    'fetch toys']}),
 (0.980571381642635,
  {1204: ['Ethical Pet Soccer Ball Squeaky Plush Dog Toy', 'plush toys']}),
 (0.9805123543004775,
  {945: ['PetSafe Sportsmen Football Treat Dispensing Tough Dog Chew Toy',
    'interactive toys']}),
 (0.9799213183240278,
  {2353: ['Mighty Squeaky Stuffing-Free Plush Ball Dog Toy, Orange, Large',
    'plush toys']}),
 (0.97976265475306,
  {1098: ['Jolly Pets Teaser Ball Dog Toy, Red', 'interactive toys']}),
 (0.979609159224325,
  {312: ['KONG Squeezz Ball Dog Toy, Color Varies', 'fetch toys']}),
 (0.9795406871852733,
  {387: ['Ethical Pet Latex Soccer Ball Squeaky Dog Chew Toy, Color Varies, 2-in',
    'fetch toys']}),
 (0.9793151679488704,
  {2337: ['Chuckit! Indoor Flying Squirrel Dog Toy', 'plush toys']}),
 (0.9792858821072271,
  {558: ['West Paw Echo Collection Rando Ball Dog Toy', 'fetch toys']}),
 (0.9792680026751011, {684: ['KONG AirDog Donut 

In [112]:
run_recommender(nlp_lg, data, user_text=user_input_middy)

2488


[(0.9755991992477805,
  {344: ['All Kind Squeaky Football Dog Toy', 'fetch toys']}),
 (0.974459031179541,
  {945: ['PetSafe Sportsmen Football Treat Dispensing Tough Dog Chew Toy',
    'interactive toys']}),
 (0.9740395971322395,
  {1648: ['Petlou Monkey Stick Plush Dog Toy, 26-in', 'plush toys']}),
 (0.9739317168012707,
  {393: ['JW Pet iSqueak Funble Football Dog Toy, Color Varies',
    'fetch toys']}),
 (0.9738388950789447,
  {1212: ['Frisco Muscle Plush Squeaking Unicorn Dog Toy', 'plush toys']}),
 (0.9738076465552218,
  {25: ['Outward Hound Triple Jack Squeaky Dog Chew Toy', 'chew toys']}),
 (0.9737597360016708,
  {2489: ['Ethical Pet Play Strong Ball & Rope Tough Dog Chew Toy',
    'rope & tug toys']}),
 (0.9737056888323712,
  {2512: ['Mammoth Denim 3 Knot Rope Tug with Tennis Ball Dog Toy, Medium',
    'rope & tug toys']}),
 (0.9734631994629767,
  {12: ['JW Pet Play Place Butterfly Puppy Teether, Color Varies',
    'chew toys']}),
 (0.9734487514470396,
  {1247: ['Frisco Plush wi

In [113]:
run_recommender(nlp_lg, data, user_text=user_input_eli)

2488


[(0.9758448370558819,
  {1182: ['Frisco Fur Really Real Squirrel Dog Toy', 'plush toys']}),
 (0.9746307677951922,
  {1102: ['ZippyPaws Zippy Burrow Pig Barn Dog Toy', 'interactive toys']}),
 (0.9742235417266696,
  {1821: ['Outward Hound Floppyz Donkey Squeaky Plush Dog Toy',
    'plush toys']}),
 (0.9735159213487884,
  {1588: ['Ethical Pet Fun Sloth Squeaky Plush Dog Toy, Color Varies',
    'plush toys']}),
 (0.9732204010009505,
  {1222: ['Frisco Fur Really Real Rabbit Dog Toy', 'plush toys']}),
 (0.9731067931941596,
  {1484: ['Multipet Swingin Slevins Squeaky Plush Dog Toy', 'plush toys']}),
 (0.9728508849063767,
  {2498: ['Dogzilla Monster Tug Dog Toy', 'rope & tug toys']}),
 (0.9727337837789289,
  {1282: ['Frisco Bobberz Plush Squeaking Unicorn Dog Toy', 'plush toys']}),
 (0.9726797319921872,
  {1327: ['Frisco Skinny Plush Squeaking Penguin Dog Toy', 'plush toys']}),
 (0.9726548274565907,
  {1963: ['Ethical Pet Chirpies Dog Toy, Character Varie', 'plush toys']})]

In [75]:
run_recommender(nlp_lg, data, user_text='pattycake is a cockapoo. she is older and very calm. she likes to stay indoors, and bark at squirrels from the window')

[(0.9863853345761364,
  {344: ['All Kind Squeaky Football Dog Toy', 'fetch toys']}),
 (0.9862352719522034,
  {416: ["JW Pet iSqueak Bouncin' Baseball Dog Toy, Color Varies, Large",
    'fetch toys']}),
 (0.9860932339030086,
  {387: ['Ethical Pet Latex Soccer Ball Squeaky Dog Chew Toy, Color Varies, 2-in',
    'fetch toys']}),
 (0.9855679906675416, {324: ['KONG Extreme Ball Dog Toy', 'fetch toys']}),
 (0.9854660718549709,
  {330: ['Hartz Dura Play Ball Squeaky Latex Dog Toy, Color Varies',
    'fetch toys']}),
 (0.9854611233260383,
  {312: ['KONG Squeezz Ball Dog Toy, Color Varies', 'fetch toys']}),
 (0.9853825190068171,
  {1204: ['Ethical Pet Soccer Ball Squeaky Plush Dog Toy', 'plush toys']}),
 (0.9852119961707424,
  {1821: ['Outward Hound Floppyz Donkey Squeaky Plush Dog Toy',
    'plush toys']}),
 (0.9851390718941755,
  {1081: ['Jolly Pets Teaser Ball Dog Toy, Blue', 'interactive toys']}),
 (0.985061207825082,
  {945: ['PetSafe Sportsmen Football Treat Dispensing Tough Dog Chew Toy'