# Wine Reviews Search Engine

Try a few sample queries to see how well the search engine performs:

- lots of tannins leading to a harsh, puckery feel in the mouth
- shiraz fruity plum
- fruity chardonnay with cherry flavors
- sweet citrus chardonnay
- dessert wine

In [2]:
# Setup search engine

import build.constants as C
import nmslib
import pandas as pd
import sqlite3 as sql
import time

from sentence_transformers import SentenceTransformer

# perf_counter() measures wall-clock time; process_time() would count CPU time only
start = time.perf_counter()
print(f"LOADING NMS index from {C.NMS_INDEX1}...")
index = nmslib.init(method="hnsw", space="cosinesimil")
index.loadIndex(C.NMS_INDEX1)

print(f"LOADING sentence transformer {C.SENTENCE_TRANSFORMER_MODEL_NAME}...")
model = SentenceTransformer(C.SENTENCE_TRANSFORMER_MODEL_NAME)

print(f"LOADING dataset from {C.SQLITE_DATASET} sqlite file...")
with sql.connect(C.SQLITE_DATASET) as c:
    df = pd.read_sql("select * from wine", c)
end = time.perf_counter()
print(f"INIT completed in {end-start:.2f} seconds")

def search(df, query: str, k: int = 20) -> None:
    """Print the k reviews whose embeddings are closest to the query."""
    start = time.perf_counter()  # wall-clock time, which is what query latency means
    # encode() returns a NumPy array by default, which is what knnQuery expects
    query_embedding = model.encode(query)
    ids, distances = index.knnQuery(query_embedding, k=k)
    end = time.perf_counter()
    print(f"SEARCHED {df.shape[0]} reviews of {df.title.nunique()} wines "
          f"from {df.winery.nunique()} wineries in {(end-start)*1000:.2f}ms\n")

    # TODO: better Jupyter output
    # ids are positional row numbers in df; distances are cosine distances
    for row, distance in zip(ids, distances):
        print(f"NAME: {df.winery.values[row]} {df.title.values[row]} "
              f"({df.country.values[row]})\n"
              f"REVIEW: {df.description.values[row]}\n"
              f"RANK: {df.points.values[row]} "
              f"DISTANCE: {distance:.2f}")

Your CPU supports instructions that this binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2
For maximum performance, you can install NMSLIB from sources 
pip install --no-binary :all: nmslib


LOADING NMS index from /data/index.bin...
LOADING sentence transformer msmarco-distilbert-base-v4...
LOADING dataset from ./data/wine.db sqlite file...
INIT completed in 2.60 seconds
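
The index is created with `space="cosinesimil"`, so the DISTANCE values that `search` prints should be cosine distances: 1 minus the cosine similarity between the query embedding and a review embedding. Smaller means closer. The pure-Python sketch below illustrates that quantity on toy vectors; `cosine_distance` is a hypothetical helper written for this example, not part of the notebook or of NMSLIB.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors standing in for sentence embeddings.
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])       # unrelated vectors
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])        # opposed vectors

print(f"{same_direction:.2f} {orthogonal:.2f} {opposite:.2f}")
```

On this scale, parallel vectors sit at a distance of about 0, orthogonal vectors at 1, and opposed vectors at 2.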


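One way to address the `TODO: better Jupyter output` note in `search` is to collect the hits into a DataFrame, which Jupyter renders as a table. The sketch below is a suggestion, not the notebook's own code: `matches_to_frame` is a hypothetical helper, and it assumes that the ids returned by the index are positional row numbers into `df`, as the printing loop in `search` already does. It runs on toy data so that it does not need the index or the model.

```python
import pandas as pd


def matches_to_frame(df: pd.DataFrame, ids, distances) -> pd.DataFrame:
    """Collect nearest-neighbour hits into a DataFrame for display.

    Assumes `ids` are positional row numbers into `df` and that `df`
    has the columns the notebook prints.
    """
    positions = [int(i) for i in ids]
    columns = ["winery", "title", "country", "points", "description"]
    hits = df.iloc[positions][columns].copy()
    hits["distance"] = [round(float(d), 2) for d in distances]
    return hits.reset_index(drop=True)


# Toy data standing in for the real `wine` table and a real query result.
toy = pd.DataFrame({
    "winery": ["A", "B", "C"],
    "title": ["A 2011 Shiraz", "B 2013 Shiraz", "C 2009 Chardonnay"],
    "country": ["Australia", "US", "France"],
    "points": [90, 88, 85],
    "description": ["plum, espresso", "black plums", "citrus"],
})
result = matches_to_frame(toy, ids=[1, 0], distances=[0.27, 0.30])
print(result)
```

Inside `search`, the same helper could be called with the `ids` and `distances` from `index.knnQuery`, returning the frame instead of printing each hit.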
In [4]:
search(df, "shiraz fruity plum")

SEARCHED 100261 reviews of 99388 wines from 14975 wineries in 11.18ms

NAME: D'Arenberg D'Arenberg 2011 Tyche's Mustard Single Vineyard Shiraz (McLaren Vale) (Australia)
REVIEW: Despite some plum fruit, this single-vineyard Shiraz tracks more to the savory side, blending notes of espresso, earth, roasted meat and black olive. It's full bodied and slightly creamy and dusty in texture, with a long finish.
RANK: 90 DISTANCE: 0.27
NAME: Suhru Suhru 2013 Shiraz (North Fork of Long Island) (US)
REVIEW: Fleshy black plums and berries burst from nose to palate of this juicy lip-smacking Long Island Shiraz. It's approachably plush and round with a pleasantly clingy mouthfeel. A backdrop of sweet spice and fine, feather-tipped tannins extend the finish.
RANK: 88 DISTANCE: 0.30
NAME: West Cape Howe West Cape Howe 2009 Two Steps Shiraz (Western Australia) (Australia)
REVIEW: Sourced from Mount Barker, this is a full-bodied, supple example of WA Shiraz. Concentrated plummy fruit is marked by hints 