# Using spaCy for Capstone Project
### This includes more cleaning and EDA with some modeling practice 

https://www.geeksforgeeks.org/python-sort-python-dictionaries-by-key-or-value/   
https://medium.com/@armandj.olivares/building-nlp-content-based-recommender-systems-b104a709c042      
https://towardsdatascience.com/so-whats-spacy-ad65aa1949e0   
The following links from spacy.io were used to help create this system (note there are many links based off these that are used, but these are the main ones)  
https://spacy.io/usage   
https://spacy.io/usage/spacy-101   
https://spacy.io/usage/vectors-similarity   
https://spacy.io/api/span#vector    
https://spacy.io/api/vectors     
  

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import spacy

from nltk.tokenize import sent_tokenize, word_tokenize, RegexpTokenizer
from nltk.corpus import stopwords 
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

In [2]:
nlp_sm = spacy.load('en_core_web_sm')
nlp_lg = spacy.load('en_core_web_lg')

In [3]:
data = pd.read_csv('./data/chewy.csv')

In [4]:
data.head()

Unnamed: 0,title,price,descriptions,key_benefits,rating,reviews,subcat,cat,combined_text,combined_sent_len,combined_word_leng,tokenized_combined,token_vectors,avg_vector
0,Nylabone Teething Pacifier Puppy Chew Toy,3.59,'Every puppy needs a pacifier to soothe teethi...,Designed to encourage positive play and teach ...,4.2,I do not know how to rate this. The puppy this...,moderate,chew toys,Every puppy needs a pacifier to soothe teethin...,50,1146,"['every', 'puppy', 'needs', 'pacifier', 'sooth...","[-0.04003238, 0.16376925, -0.17635255, -0.0941...",0.00292
1,"KONG Puppy Dog Toy, Color Varies",6.99,"""The Puppy KONG dog toy is customized for a gr...",Unpredictable bounce is great for energetic pu...,4.3,"I have had dozens of dogs over the years, and ...",moderate,chew toys,"""The Puppy KONG dog toy is customized for a gr...",43,964,"['puppy', 'kong', 'dog', 'toy', 'customized', ...","[-0.04645431, 0.17621087, -0.15408944, -0.0865...",0.002565
2,Petstages Dogwood Tough Dog Chew Toy,8.83,"""Chewing is a natural behavior in all dogs, as...",Chew toy that combines real wood with syntheti...,4.2,"Our 8 month old, shepherd/mastiff has had this...",moderate,chew toys,"""Chewing is a natural behavior in all dogs, as...",48,1144,"['chewing', 'natural', 'behavior', 'dogs', 'he...","[-0.043032583, 0.16056213, -0.17279097, -0.084...",0.005094
3,Nylabone Teething Rings Puppy Chew Toy,6.57,'Great for teething and tugging! The Puppy Tee...,Designed to encourage positive play and teach ...,4.1,I have a small shih tzu who absolutely loves t...,moderate,chew toys,Great for teething and tugging! The Puppy Teet...,32,772,"['great', 'teething', 'tugging', 'puppy', 'tee...","[-0.040631622, 0.15921111, -0.15474631, -0.073...",0.004021
4,Nylabone Puppy Teething X Bone Beef Flavored P...,6.89,'Curious puppies have met their match with the...,Non-edible dog toy is made for teething puppie...,4.1,This is a great shoe for a tiny puppy in there...,moderate,chew toys,Curious puppies have met their match with the ...,25,804,"['curious', 'puppies', 'met', 'match', 'nylabo...","[-0.03072327, 0.15317601, -0.14757921, -0.0565...",0.00323


# This next section will be usefull code for the actual recommendation system 

### User Input-- What will the user need to do?
The user input will be a short paragraph answering the following the questions to the best of their abilities:
 - What type of dog do you have?
 - Is your dog a puppy, or older?
 - Is your dog super energetic and playful or more calm?
 - Would you say your dog is smarter than most dogs?
 - Does your dog like to chase after toys, cuddle toys, or chew on them? 
 - Does your dog get pretty bored after playing with a toy for a bit? 
 - Does your dog already have an ideal toy that is their absolute favorite? (If yes, describe it.)   
 Try to describe what your dog IS rather what you dog IS NOT so that the model works better. 
 
The paragraph of text describing their dog will be tokenized and stopwords removed so that it can then be compared to tokenized text for every dog toy to find the most similar toys based off the text. 

The text that I will use to create and test this will be about one of my own dogs Colby:  
An okay example:   
"Colby is 9-year-old retriever. For the most part, he is pretty mellow, but sometimes he gets bursts of energy, especially when we go swimming and play fetch in the pool with a ball. He also really like treats and is very food oriented. I wouldn't say he's very smart and definielty prefers to chase toys more than cuddle or chew on them. He will get bored with a toy after a bit. His favorite toy is probably a ball and not sqeaky."

A better example:
"Colby is 9-year-old retriever. For the most part, he is pretty mellow, but sometimes he gets bursts of energy, especially when we go swimming and play fetch in the pool with a ball. He also really like treats and is very food oriented. He is pretty dumn and definielty prefers to chase toys. He will get bored with a toy after a bit. His favorite toy is probably a ball and not sqeaky."

A third example: 
"Eli is a miniature dachshund that loves to run around and snuggle with soft toys. She is very active but also loves to cuddle and loves toys." 

In [5]:
# tokenizer = RegexpTokenizer(r'\w+')
# stopwords = stopwords.words('english') + ['dog', 'toy', 'dogs', 'toys', 'one']

In [6]:
# User input cleaning: 
user_input_1 = "Colby is 9-year-old golden retriever. For the most part, he is pretty mellow, but sometimes he gets bursts of energy, especially when we go swimming and play fetch in the pool with a ball. He also really like treats and is very foot oriented. I wouldn't say he's very smart and definielty prefers to chase toys more than cuddle or chew on them. He will get bored with a toy after a bit. His favorite toy is probably a ball and not sqeaky."
user_input_2 = "Colby is 9-year-old retriever. For the most part, he is pretty mellow, but sometimes he gets bursts of energy, especially when we go swimming and play fetch in the pool with a ball. He also really like treats and is very food oriented. He is pretty dumn and definielty prefers to chase toys. He will get bored with a toy after a bit. His favorite toy is probably a ball and not sqeaky."
user_input_3 = "Eli is a miniature dachshund that loves to run around and snuggle with soft toys. She is very active but also loves to cuddle and loves toys." 
user_input_4 = "My dog is incredibly smart but lazy and loves plush toys to cuddle with."
# user_doc = nlp_lg(user_input)

# user_tokens = tokenizer.tokenize(str(user_doc).lower())
# user_text = [token for token in user_tokens if token not in stopwords]

# user_no_stop_words
# user_text = []
# for token in user_doc:
#     tokenizer.tokenize(str(token).lower())
# user_text

In [7]:
# combined_tokens = tokenizer.tokenize(df['combined_text'][i].lower())
# no_stop_words = [token for token in combined_tokens if token not in stopwords.words('english')]

In [36]:
# https://medium.com/@armandj.olivares/building-nlp-content-based-recommender-systems-b104a709c042

list_docs = []
for i in range(len(data)):
    if data['combined_text'][i] != '':
        doc = nlp_lg("u" + str(data['combined_text'][i]) + "'")
        list_docs.append(doc)

In [37]:
len(list_docs)

2399

In [40]:
# https://medium.com/@armandj.olivares/building-nlp-content-based-recommender-systems-b104a709c042
# # This will take in what the users says and then score it based off that 
# # I have added to this as well

# def calculate_similarity_with_spacy(nlp, df, user_text, n=6):
#     # Calculate similarity with Spacy 
#     list_sim = []
#     toy_score = []
#     doc1 = nlp(user_text + "'")
#     vectors = doc1.vector # not quite sure if this will be useful bc not sure how to use the .most_similar here
#     for i in df.index:
#         try: 
#             doc2 = list_docs[i]
#             score = doc1.similarity(doc2)
#             list_sim.append((doc1, doc2, score))
#             toy_score.append((score, df['title'][i], df['cat'][i], df['description']))
#         except:
#             continue
    
#     return toy_score, list_sim

Running it on the okay example

In [51]:
def calculate_similarity_with_spacy(nlp, df, user_text, n=6):
    # Calculate similarity with Spacy 
    list_sim = []
    toy_score = []
    doc1 = nlp(user_text + "'")
    vectors = doc1.vector # not quite sure if this will be useful bc not sure how to use the .most_similar here
    for i in df.index:
        try: 
            doc2 = list_docs[i]
            score = doc1.similarity(doc2)
            list_sim.append((doc1, doc2, score))
            toy_score.append((score, df['title'][i], df['cat'][i]))
        except:
            continue
    return toy_score

In [52]:
ranking_list = calculate_similarity_with_spacy(nlp_lg, data, user_text=user_input_1)
ranking_list

[(0.9781122065996709,
  'Nylabone Teething Pacifier Puppy Chew Toy',
  'chew toys'),
 (0.9763683389654572, 'KONG Puppy Dog Toy, Color Varies', 'chew toys'),
 (0.9776567409882416, 'Petstages Dogwood Tough Dog Chew Toy', 'chew toys'),
 (0.9785074494357022, 'Nylabone Teething Rings Puppy Chew Toy', 'chew toys'),
 (0.9765118088279411,
  'Nylabone Puppy Teething X Bone Beef Flavored Puppy Chew Toy, Small',
  'chew toys'),
 (0.9753482309690404,
  'Nylabone Puppy Petite Dental Puppy Chew Toy',
  'chew toys'),
 (0.9783614941184837,
  'Nylabone Puppy Chicken Flavored Teething Dinosaur Puppy Chew Toy, Petite',
  'chew toys'),
 (0.97674373669675,
  'Petstages Cool Teething Stick Tough Dog Chew Toy',
  'chew toys'),
 (0.9792848075864213,
  'KONG Puppy Goodie Bone Dog Toy, Color Varies',
  'chew toys'),
 (0.9784351065806404,
  "Hartz Chew 'n Clean Tuff Bone Tough Dog Chew Toy Toy, Color Varies",
  'chew toys'),
 (0.9758858714930118,
  'Nylabone Puppy Chew Starter Kit Triple Pack Puppy Chew Toy',
  

In [13]:
# sorted(ranked_dict, reverse=True)

In [14]:
def dog_toy_recommender(nlp, data, user_text):
    ranking_list = calculate_similarity_with_spacy(nlp_lg, data, user_text=user_text)
    ranked_dict = {}
    for i, score_toy in enumerate(ranking_list):
    #     print(i, score_toy[0], ranking_list[i][2])
        ranked_dict[score_toy[0]] = {i : [ranking_list[i][2], score_toy[1]]}
    final_dict = {}
    for i in sorted(ranked_dict, reverse=True) : 
        final_dict[i] =  ranked_dict[i]
    return final_dict

In [15]:
def run_recommender(nlp, data, user_text):
    toys = dog_toy_recommender(nlp_lg, data, user_text=user_text)
#     print(toys[:10])
    toy_list = []
    for i in list(toys)[:10]:
        toy_list.append(toys)
    return toy_list 

In [16]:
toy_list = run_recommender(nlp_lg, data, user_text=user_input_1)
toy_list

[]

In [17]:
# ranking_list = calculate_similarity_with_spacy(nlp_lg, data, user_text=user_input_1)
# # , vectors

# # queries = numpy.asarray([numpy.random.uniform(-1, 1, (300,))])
# # most_similar = nlp_lg.vocab.vectors.most_similar(np.array(vectors), n=10)

# ranked_dict = {}
# for i, score_toy in enumerate(ranking_list[:10]):
# #     print(i, score_toy[0], ranking_list[i][2])
#     ranked_dict[score_toy[0]] = {i : [score_toy[1], ranking_list[i][2]]}

# # https://www.geeksforgeeks.org/python-sort-python-dictionaries-by-key-or-value/
# final_dict = {}
# for i in sorted(ranked_dict, reverse=True) : 
#     print ((i, ranked_dict[i]), end ='\n')
# #     final_dict[i] = ranked_dict[i]
# # final_dict

In [18]:
run_recommender(nlp_lg, data, user_text=user_input_2)

[]

In [19]:
run_recommender(nlp_lg, data, user_text=user_input_3)

[]

In [20]:
run_recommender(nlp_lg, data, user_text=user_input_4)

[]

In [21]:
run_recommender(nlp_lg, data, user_text='pattycake is a cockapoo. she is older and very calm. she likes to stay indoors, and bark at squirrels from the window')

[]