# Queries

- "Apple pie"
- "Chicken" in the African category
- "Easy bread" taking less than 2 hours
- "Pasta bolognese"
- "Oatmeal"

Start with three of these queries and run test models on them, then add more queries if possible.

# Scoring

The top 100 results returned by the standard system (without learning to rank, LTR) are classified by hand. Each result is scored on a numeric scale from 0 to 5.

## Criteria

Assigning a score is somewhat subjective, but it should follow these guidelines:

0. A document that does not match the query.
1. A document that vaguely matches the query, is very incomplete (missing important fields, such as instructions) and has no reviews, or a document with very negative reviews.
2. A document that partially matches the query, is incomplete and has no reviews, or a document with negative reviews.
3. A document that matches the query semantically, is reasonably complete (may be missing no more than two fields) and has at least one positive review.
4. A document that perfectly or almost perfectly matches the query semantically, is complete or missing just one of the fields and has a good number of positive reviews (5 to 20).
5. A document that perfectly matches the query semantically, is complete (the recipe has a full ingredient list, steps and cook time/nutritional information) and has a lot of positive reviews (more than 20).
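
To help apply the guidelines consistently, the rubric can be sketched as a small helper that suggests a score from three human judgements. This is only a rough aid, not part of the original workflow: the function names, the category labels and, in particular, the rule of taking the lowest tier across the three dimensions are assumptions made for illustration. The final score remains the annotator's call.

```python
# Hypothetical helper: names, labels and the "lowest tier wins" rule are assumptions.
MATCH_CAP = {
    "none": 0, "vague": 1, "partial": 2,
    "semantic": 3, "almost_perfect": 4, "perfect": 5,
}
COMPLETENESS_CAP = {
    "very_incomplete": 1, "incomplete": 2, "reasonable": 3,
    "missing_one": 4, "complete": 5,
}


def review_cap(positive_reviews, sentiment="ok"):
    """Highest score that the reviews alone would allow."""
    if sentiment == "very_negative":
        return 1
    if sentiment == "negative":
        return 2
    if positive_reviews > 20:
        return 5
    if positive_reviews >= 5:
        return 4
    if positive_reviews >= 1:
        return 3
    return 2  # no positive reviews: the guidelines place these at 2 or below


def suggest_score(match, completeness, positive_reviews, sentiment="ok"):
    """Suggest a 0-5 score as the lowest tier reached on any dimension."""
    if match == "none":
        return 0
    return min(
        MATCH_CAP[match],
        COMPLETENESS_CAP[completeness],
        review_cap(positive_reviews, sentiment),
    )


print(suggest_score("perfect", "complete", 25))     # 5
print(suggest_score("perfect", "complete", 10))     # 4
print(suggest_score("semantic", "reasonable", 1))   # 3
```

Taking the minimum is a conservative choice: a perfect match with few reviews is held at the tier its reviews support, which mirrors how each guideline combines its conditions with "and".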

In [3]:
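import urllib.parse as urlp

SOLR_SELECT = "http://localhost:8983/solr/recipes/select"

# Parameters shared by every query: eDisMax, all terms required, top 100 rows.
BASE_PARAMS = {
    "rows": 100,
    "q.op": "AND",
    "defType": "edismax",
    "qf": "Name^5 Description Ingredients^2 Keywords^2 Instructions Reviews^0.5 AuthorName^0.2",
    "fl": "id,RecipeId,score,[features]",
    "wt": "json",
}

query = ["apple pie", "chicken", "easy bread", "pasta bolognese", "oatmeal"]
# Filter query for each search; an empty string means no category filter.
facet = ["", "Category_Facet:African", "", "", ""]


def build_url(q, fq):
    """Return a percent-encoded Solr select URL for one query."""
    params = dict(BASE_PARAMS, q=q)
    if fq:
        params["fq"] = fq
    return SOLR_SELECT + "?" + urlp.urlencode(params)


urls = [build_url(q, fq) for q, fq in zip(query, facet)]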

In [7]:
import os

import pandas as pd
import requests

os.makedirs("queries", exist_ok=True)  # to_csv does not create missing directories

for idx, url in enumerate(urls):
    response = requests.get(url)
    response.raise_for_status()  # fail early if Solr returns an error
    docs = response.json()["response"]["docs"]

    # Tag each hit with its recipe page and the query that produced it.
    for doc in docs:
        doc["URL"] = "http://localhost:3000/recipe/{0}".format(doc["RecipeId"])
        doc["query"] = query[idx]
        doc["facet"] = facet[idx]

    df = pd.DataFrame(docs)
    df.to_csv("queries/query{0}_results.csv".format(idx + 1), index=False)
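
Once the human scores have been entered into the exported files, a quick check can catch labels that fall outside the 0 to 5 scale. The sketch below assumes the scores are stored in a column named `relevance`; that column name and the `invalid_label_rows` helper are hypothetical, since the export above does not create a label column.

```python
import csv

VALID_LABELS = {"0", "1", "2", "3", "4", "5"}


def invalid_label_rows(path, column="relevance"):
    """Return (line number, raw value) for rows whose label is not an integer 0-5."""
    bad = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        for row in reader:
            raw = (row.get(column) or "").strip()
            if raw not in VALID_LABELS:
                # reader.line_num is the line of the file just consumed.
                bad.append((reader.line_num, raw))
    return bad


if __name__ == "__main__":
    import os

    # Assumes the files written by the export loop are present.
    for idx in range(1, 6):
        path = "queries/query{0}_results.csv".format(idx)
        if os.path.exists(path):
            print(path, invalid_label_rows(path))
```

An empty list for a file means every row carries a valid score; rows with a blank label are reported with an empty string so unlabeled results are easy to spot.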