# User Queries

Our goal in this notebook is to describe a method for handling free-form user queries using a pretrained transformer.

We will use the "sentence-transformers" package to transform strings of text into vectors. The package can be installed by running the command in the next cell (without the leading "#") in a terminal, for example the VS Code terminal.

In [None]:
# pip install -U sentence-transformers


Once the sentence-transformers package has been installed, we import the packages we will be using and load the pretrained model:

In [1]:
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer
embedder = SentenceTransformer("all-MiniLM-L6-v2")


We also import a dataframe that contains recipe names and recipe descriptions.

In [2]:
words = pd.read_pickle('../data/recwords.pk')

In [22]:
words.head()

RecipeId,Name,Description,RecipeCategory
38,Low-Fat Berry Blue Frozen Dessert,Make and share this Low-Fat Berry Blue Frozen ...,Frozen Desserts
40,Best Lemonade,This is from one of my first Good House Keepi...,Beverages
42,Cabbage Soup,Make and share this Cabbage Soup recipe from F...,Vegetable
44,Warm Chicken A La King,I copied this one out of a friend's book so ma...,Chicken
45,Buttermilk Pie With Gingersnap Crumb Crust,Make and share this Buttermilk Pie With Ginger...,Pie


We will not use the recipe categories, but both the name and description can be fed to the sentence transformer to obtain a vector that summarizes the "semantic content" of a recipe. 

In [4]:
recnamevecs = embedder.encode(words.Name.values)

In [6]:
recdescvecs = embedder.encode(words.Description.values)

There may be duplicate names, so we will use recipe ids to refer to recipes in the code. However, once the computations have been done, we will want to see the names of the recipes we've obtained. The following dictionary will speed that up.

In [11]:
rec_id_to_name = {i:words['Name'][i] for i in words.index}

In [13]:
rec_ids = list(words.index)
# Map each recipe id to its name vector and its description vector (rows are in index order).
name_vecs_dict = dict(zip(rec_ids, recnamevecs))
desc_vecs_dict = dict(zip(rec_ids, recdescvecs))

We now have two distinct ways of associating a vector to each recipe: one based on its name (name_vecs_dict) and one based on its description (desc_vecs_dict).

A user query will consist of the following:
* A string $s$, whose semantic content describes what the user wants.
* An integer $k$ that tells us how many recipes to return.

Given a user query:
* We use the transformer to obtain a vector $v_s$ representing the string $s$.
* We sort the recipes so that those whose vectors have the largest dot product with $v_s$ come first.
* We return the names of the top $k$ recipes after sorting. 
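
Ranking by dot product agrees with ranking by cosine similarity only when the vectors have unit length; otherwise longer vectors get an advantage. We have not checked here whether the vectors produced by the embedder are normalized, so the following is only a sketch using random stand-in vectors. The helper cosine_similarity and the dimension 384 are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (hypothetical helper)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Random stand-ins for two embedding vectors (assumed dimension: 384).
rng = np.random.default_rng(0)
u = rng.normal(size=384)
v = rng.normal(size=384)

# Rescale both vectors to unit length.
u_unit = u / np.linalg.norm(u)
v_unit = v / np.linalg.norm(v)

# For unit vectors, the plain dot product equals the cosine similarity.
dot_unit = float(np.dot(u_unit, v_unit))
cos_raw = cosine_similarity(u, v)
```

On the real data, one way to check would be to look at np.linalg.norm(recdescvecs, axis=1) and see whether the values are all close to 1.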

In [14]:
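def find_rec_dot(query: str, id_to_vecs: dict, num_results: int) -> list:
    """Return the names of the num_results recipes whose vectors best match the query."""
    query_vec = embedder.encode(query)
    # Sort recipe ids by dot product with the query vector, largest first.
    ranked_ids = sorted(id_to_vecs, key=lambda rec_id: np.dot(query_vec, id_to_vecs[rec_id]), reverse=True)
    return [rec_id_to_name[rec_id] for rec_id in ranked_ids[:num_results]]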
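
The function above computes one dot product per recipe inside the sort key. A possible alternative, sketched here under the assumption that the vectors are kept as a 2D array whose rows are in the same order as the list of ids, is to compute all scores with a single matrix-vector product. The name top_k_ids_by_dot and the toy data are hypothetical and only serve to illustrate the idea.

```python
import numpy as np

def top_k_ids_by_dot(query_vec, ids, vec_matrix, k):
    """Return the k ids whose rows in vec_matrix have the largest dot product with query_vec."""
    # One matrix-vector product gives the score of every row at once.
    scores = vec_matrix @ query_vec
    # Sort scores in descending order; a stable sort keeps ties in their original order.
    order = np.argsort(-scores, kind="stable")[:k]
    return [ids[i] for i in order]

# Toy example: three made-up 2D vectors with made-up ids.
toy_ids = [38, 40, 42]
toy_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
toy_query = np.array([0.0, 1.0])
toy_top = top_k_ids_by_dot(toy_query, toy_ids, toy_vecs, 2)
print(toy_top)  # [40, 42]
```

In this notebook, the corresponding call would pass embedder.encode(query) as query_vec, rec_ids as ids, and recnamevecs or recdescvecs as vec_matrix, then look up names through rec_id_to_name.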
    


OK, we're ready to start searching for recipes.

In [23]:
find_rec_dot('I want an elegant chocolate recipe',name_vecs_dict,10)

['Easy Chocolate Delight',
 'Chocolate Lush Layered Dessert',
 'Homemade Hot Chocolate',
 'Fabulous Hot Chocolate',
 'Creamy, Thick Hot Chocolate',
 'Truly Amazing Creamy Hot Chocolate',
 'Easy (But Elegant) Chocolate Candy',
 'Best Ever Chocolate Cake - Recipe',
 'Thick and Chocolatey Hot Chocolate',
 'Ultra-Rich Hot Chocolate']

In [24]:
find_rec_dot('I want an greasy comfort food',name_vecs_dict,10)

['Veal or Turkey Burgers W/Onion Gravy (Low Fat!)',
 'Hot Chicken, Bacon &amp; Garlic Mayo',
 'The Naughty Things I Do for Chicken Tortilla Soup',
 'Fried Fresh Corn With Bacon Grease',
 'Olive Mayonnaise for Hot Dogs, Burgers, Chip Dip',
 "Kittencal's Moist Turkey Burgers for the Grill (Low Fat)",
 'Corn and Shrimp Soup (Low-Fat)',
 'Wet Chicken or Turkey Burritos',
 'Homestyle Chicken Noodle Soup',
 'Low Fat Butter Bean and Ham Soup']

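This is not good: four of the ten results are explicitly low fat. The embeddings of "low fat" and "greasy" appear to be close, presumably because both phrases are about fat content.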

In [18]:
find_rec_dot('I want greasy comfort food',desc_vecs_dict,10)

['Southern Buttermilk Fried Chicken',
 'Vegetarian Tortilla Soup',
 'Yosemite Chicken Stew  With Cornmeal Dumplings  (Low Fat)',
 'Perfect Homemade Hash Browns',
 "Dee's Philly Cheese Steak  Burger",
 'Deli Macaroni Salad',
 'Chicken Chow Mein',
 'Dipping Oil -  Rosemary Garlic',
 'Savory Spiced Nuts',
 'Beef and Noodles - Crock Pot']

Using descriptions instead of recipe names seems to mostly fix the issue: only one of the ten results is labeled low fat.
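
As a rough way to quantify this comparison, the sketch below counts how many names in each of the two result lists above mention "low fat". The helper count_low_fat is hypothetical (it is not part of the notebook), and matching on the recipe name alone is only a crude proxy for whether a recipe is actually low fat.

```python
def count_low_fat(names):
    """Count names that mention 'low fat', ignoring case and hyphens."""
    return sum("low fat" in name.lower().replace("-", " ") for name in names)

# Results of the name-based search, copied from the output above.
name_results = [
    'Veal or Turkey Burgers W/Onion Gravy (Low Fat!)',
    'Hot Chicken, Bacon &amp; Garlic Mayo',
    'The Naughty Things I Do for Chicken Tortilla Soup',
    'Fried Fresh Corn With Bacon Grease',
    'Olive Mayonnaise for Hot Dogs, Burgers, Chip Dip',
    "Kittencal's Moist Turkey Burgers for the Grill (Low Fat)",
    'Corn and Shrimp Soup (Low-Fat)',
    'Wet Chicken or Turkey Burritos',
    'Homestyle Chicken Noodle Soup',
    'Low Fat Butter Bean and Ham Soup',
]

# Results of the description-based search, copied from the output above.
desc_results = [
    'Southern Buttermilk Fried Chicken',
    'Vegetarian Tortilla Soup',
    'Yosemite Chicken Stew  With Cornmeal Dumplings  (Low Fat)',
    'Perfect Homemade Hash Browns',
    "Dee's Philly Cheese Steak  Burger",
    'Deli Macaroni Salad',
    'Chicken Chow Mein',
    'Dipping Oil -  Rosemary Garlic',
    'Savory Spiced Nuts',
    'Beef and Noodles - Crock Pot',
]

low_fat_counts = (count_low_fat(name_results), count_low_fat(desc_results))
print(low_fat_counts)  # (4, 1)
```

Note that the two queries above differ slightly in wording, so this is a comparison of the two outputs as shown rather than a controlled experiment.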

In [19]:
find_rec_dot('Healthy vegetable dish',desc_vecs_dict,10)

['Oriental Stir Fry Vegetables With Oyster Sauce',
 'Tomato-Zucchini Gratin',
 'Pinakbet ( Philippine Vegetable Stew)',
 'Brown Rice and Carrot Pilaf',
 'Carrot and Squash Stir-Fry',
 'Healthy Bow Tie Chicken Supper',
 'Simple Hamburger Helper',
 "Weck's Fabulous Veggie Sandwich",
 'Curry Spiced Winter Squash',
 'Oven-Roasted Vegetables']

In [20]:
find_rec_dot('I have bell peppers and shrimp',desc_vecs_dict,10)

['Shrimp With Red and Yellow Peppers',
 'Spicy Shrimp With Green Beans &amp; Red Pepper',
 'Super Bowl Sunday Seafood Dip',
 'Northwest Creamy Smoked Salmon Fettuccine Alfredo',
 'Shrimp With Tomatoes, Olives and Basil',
 'Shrimp With Broccoli',
 'Shrimp and Egg Fried Rice With Napa Cabbage - Tyler Florence',
 'Spicy Salsa-Cilantro Shrimp',
 'Garlic Shrimp Spaghetti',
 'Quick Shrimp Scampi Bake']

In [21]:
find_rec_dot('Spicy noodles and vegetable dish',desc_vecs_dict,10)

['Easy Lazy Lasagna',
 'Thai Noodles With Chicken',
 'Spicy Sesame Noodles',
 'Spicy Asian Noodles With Chicken',
 'Spicy Szechuan Peanut Sauce',
 'Spicy Shrimp and Noodles',
 '3 Dragon’s Szechuan Steak',
 'Asian Creation',
 'Spicy pepper beef with noodles',
 'Thai Ginger Coconut Vegetable Toss']