# Wine Reviews Search Engine

Try a few sample queries to see how well the search performs:

- lots of tannins leading to a harsh, puckery feel in the mouth
- shiraz fruity plum
- fruity chardonnay with cherry flavors
- sweet citrus chardonnay
- dessert wine

In [1]:
# Setup search engine

import build.constants as C
import nmslib
import pandas as pd
import sqlite3 as sql
import time

from sentence_transformers import SentenceTransformer

# perf_counter() measures wall-clock time; process_time() counts only CPU time
# and would leave out time spent waiting on disk I/O while loading.
start = time.perf_counter()
print(f"LOADING NMS index from {C.NMS_INDEX1}...")
index = nmslib.init(method="hnsw", space="cosinesimil")
index.loadIndex(C.NMS_INDEX1)

print(f"LOADING sentence transformer {C.SENTENCE_TRANSFORMER_MODEL_NAME}...")
model = SentenceTransformer(C.SENTENCE_TRANSFORMER_MODEL_NAME)

print(f"LOADING dataset from {C.SQLITE_DATASET} sqlite file...")
with sql.connect(C.SQLITE_DATASET) as c:
    df = pd.read_sql("select * from wine", c)
end = time.perf_counter()
print(f"INIT completed in {end-start:.2f} seconds")

def search(df, query: str) -> None:
    """Print the 20 reviews closest to `query` in embedding space."""
    # perf_counter() measures wall-clock latency, which is what the message below reports
    start = time.perf_counter()
    # encode() returns a NumPy array by default, which knnQuery() accepts directly
    query_embedding = model.encode(query)
    ids, distances = index.knnQuery(query_embedding, k=20)
    end = time.perf_counter()
    print(f"SEARCHED {df.shape[0]} reviews of {df.title.nunique()} wines "
          f"from {df.winery.nunique()} wineries in {(end-start)*1000:.2f}ms\n")

    # TODO: better Jupyter output
    for i, distance in zip(ids, distances):
        print(f"NAME: {df.winery.values[i]} {df.title.values[i]} "
              f"({df.country.values[i]})\n"
              f"REVIEW: {df.description.values[i]}\n"
              f"RANK: {df.points.values[i]} "
              f"DISTANCE: {distance:.2f}")

Your CPU supports instructions that this binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2
For maximum performance, you can install NMSLIB from sources 
pip install --no-binary :all: nmslib


LOADING NMS index from /data/index.bin...
LOADING sentence transformer msmarco-distilbert-base-v4...
LOADING dataset from ./data/wine.db sqlite file...
INIT completed in 2.62 seconds
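
The `TODO: better Jupyter output` note in `search` could be addressed by collecting the matches into a DataFrame, which Jupyter renders as a table. The sketch below is one possible approach, not part of the original notebook: `matches_to_frame` is a hypothetical helper, the toy rows are made up for illustration, and the `similarity` column assumes that the `cosinesimil` space reports distance as 1 minus cosine similarity.

```python
import pandas as pd

# Columns printed by search(); the names are taken from the notebook's `wine` table.
COLUMNS = ["winery", "title", "country", "points", "description"]


def matches_to_frame(df: pd.DataFrame, ids, distances) -> pd.DataFrame:
    """Collect k-NN matches into a DataFrame (hypothetical helper)."""
    # Select the matched rows by position, in the order the index returned them.
    result = df.iloc[list(ids)][COLUMNS].copy()
    result["distance"] = [float(d) for d in distances]
    # Assumption: with space="cosinesimil", distance = 1 - cosine similarity.
    result["similarity"] = 1.0 - result["distance"]
    return result.reset_index(drop=True)


# Toy data, invented for illustration only (not from the wine dataset).
toy = pd.DataFrame({
    "winery": ["A", "B", "C"],
    "title": ["A 2012 Moscato", "B 2010 Grenache", "C 2009 Merlot"],
    "country": ["US", "US", "FR"],
    "points": [84, 86, 82],
    "description": ["sweet", "jammy", "spicy"],
})
frame = matches_to_frame(toy, ids=[2, 0], distances=[0.25, 0.5])
```

Inside `search`, the `ids` and `distances` returned by `index.knnQuery` could be passed to this helper in place of the `print` loop. Returning the frame as the last expression of a cell would display it as a table.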


In [3]:
search(df, "dessert wine")

SEARCHED 100261 reviews of 99388 wines from 14975 wineries in 11.83ms

NAME: Girl Go Lightly Girl Go Lightly 2012 Moscato (California) (US)
REVIEW: So sugary sweet, it has to be considered a dessert wine. The orange, pineapple, vanilla fudge and honeysuckle flavors have a cleansing edge of acidity.
RANK: 84 DISTANCE: 0.31
NAME: Clos Solène Clos Solène 2010 Sweet Clémentine Grenache (Paso Robles) (US)
REVIEW: Sweet with raspberry jam, cherry liqueur and vanilla cream flavors, this is a good dessert wine. But it could be a lot better if it were more intense. The finish trails off a little.
RANK: 86 DISTANCE: 0.34
NAME: Forest Glen Forest Glen 2009 White Merlot (California) (US)
REVIEW: As sweet as a dessert wine, with sugary raspberry, cherry and spice flavors.
RANK: 82 DISTANCE: 0.34
NAME: Fort Ross Fort Ross 2012 Late Harvest Chardonnay (Fort Ross-Seaview) (US)
REVIEW: Honey, tangerine, apricot and vanilla flavors characterize this dessert wine. With a residual sugar of 97 g/L, it's in