# Wine Reviews Search Engine

Try a few sample queries to see how well the search performs:

- lots of tannins leading to a harsh, puckery feel in the mouth
- shiraz fruity plum
- fruity chardonnay with cherry flavors
- sweet citrus chardonnay
- dessert wine

In [1]:
# Setup search engine

import build.constants as C
import nmslib
import pandas as pd
import sqlite3 as sql
import time

from sentence_transformers import SentenceTransformer

start = time.process_time()
print(f"LOADING NMS index from {C.NMS_INDEX1}...")
index = nmslib.init(method="hnsw", space="cosinesimil")
index.loadIndex(C.NMS_INDEX1)

print(f"LOADING sentence transformer {C.SENTENCE_TRANSFORMER_MODEL_NAME}...")
model = SentenceTransformer(C.SENTENCE_TRANSFORMER_MODEL_NAME)

print(f"LOADING dataset from {C.SQLITE_DATASET} sqlite file...")
with sql.connect(C.SQLITE_DATASET) as c:
    df = pd.read_sql("select * from wine", c)
end = time.process_time()
print(f"INIT completed in {end-start:.2f} seconds")

def search(df, query: str) -> None:
    """Print the 20 reviews nearest to `query` in the NMSLIB index."""
    # Note: process_time() counts CPU time, not wall-clock time
    start = time.process_time()
    # encode() returns a NumPy array by default, which knnQuery accepts directly
    query_embedding = model.encode(query)
    ids, distances = index.knnQuery(query_embedding, k=20)
    end = time.process_time()
    print(f"SEARCHED {df.shape[0]} reviews of {df.title.nunique()} wines "
          f"from {df.winery.nunique()} wineries in {(end-start)*1000:.2f}ms\n")

    # TODO: better Jupyter output
    # Index ids are used as DataFrame row positions; "RANK" is the review's points score
    for i, distance in zip(ids, distances):
        print(f"NAME: {df.winery.values[i]} {df.title.values[i]} "
              f"({df.country.values[i]})\n"
              f"REVIEW: {df.description.values[i]}\n"
              f"RANK: {df.points.values[i]} "
              f"DISTANCE: {distance:.2f}")

Your CPU supports instructions that this binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2
For maximum performance, you can install NMSLIB from sources 
pip install --no-binary :all: nmslib


LOADING NMS index from ./data/index.bin...
LOADING sentence transformer msmarco-distilbert-base-v4...
LOADING dataset from ./data/wine.db sqlite file...
INIT completed in 2.77 seconds
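
The index above is created with `space="cosinesimil"`, so the `DISTANCE` value printed by `search` is a cosine distance (1 minus the cosine similarity): 0 means the query and review embeddings point in the same direction, and larger values mean a weaker match. The sketch below illustrates that metric, together with an exact brute-force nearest-neighbour lookup, which is what the HNSW index approximates. It is a toy illustration, not the notebook's own code: `cosine_distance`, `brute_force_knn`, and the 2-D `toy_vectors` are hypothetical names and values standing in for the real sentence embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_knn(query, vectors, k):
    """Exact k nearest neighbours by cosine distance, as (ids, distances)."""
    # Score every vector, then keep the k smallest distances.
    ranked = sorted((cosine_distance(query, v), i) for i, v in enumerate(vectors))[:k]
    return [i for _, i in ranked], [d for d, _ in ranked]


# Hypothetical 2-D vectors standing in for review embeddings.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
ids, distances = brute_force_knn([1.0, 0.1], toy_vectors, k=2)

for i, d in zip(ids, distances):
    print(f"id={i} distance={d:.2f}")
```

The brute-force version scores every vector on each query, so its cost grows linearly with the number of reviews; an approximate index such as HNSW trades a small chance of missing a true neighbour for much faster lookups.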


In [3]:
search(df, "lots of tannins leading to a harsh, puckery feel in the mouth")

SEARCHED 100261 reviews of 99388 wines from 14975 wineries in 11.60ms

NAME: Clos La Chance Clos La Chance 2010 Estate Zinfandel (Central Coast) (US)
REVIEW: Harsh in the mouth, with sharp acidity and uneven tannins, this has a candied-berry flavor.
RANK: 82 DISTANCE: 0.41
NAME: San Simeon San Simeon 2009 Syrah (Paso Robles) (US)
REVIEW: Feels harsh and bitter in the mouth, with simple, sweet flavors of raisins, white sugar and alcohol.
RANK: 80 DISTANCE: 0.45
NAME: Michael Pozzan Michael Pozzan 2011 Pinot Noir (Russian River Valley) (US)
REVIEW: Feels harsh in the mouth, mostly because the fruit is too thin to balance out the acids and tannins. It has watery, dry flavors of cola and pomegranates.
RANK: 82 DISTANCE: 0.45
NAME: Casa Tiene Vista Casa Tiene Vista 2010 Blackbird Merlot (California) (US)
REVIEW: Feels harsh and angular in the mouth, with overripe raisin flavors. Not going anywhere.
RANK: 81 DISTANCE: 0.48
NAME: Riboli Riboli 2005 Cabernet Sauvignon (Rutherford) (US)
REVIEW: