### Faceted Product Search
This demo shows a product search scenario based on facets (topics) mined from Amazon reviews.<br/>

The demo is built by topic modeling and indexing a subset of the following dataset:<br/>
https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Electronics_v1_00.tsv.gz

It uses the metapy library to build, host, and query an inverted index:<br/>
[https://github.com/meta-toolkit/metapy](https://github.com/meta-toolkit/metapy)

In [137]:
import pandas as pd

import metapy
import pytoml
import re

### Build Index
- The index is built over the mined topics, because the goal is faceted search rather than title search.
- Ranking uses metapy's built-in BM25 ranker with its default hyperparameters.
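
To give a feel for what a BM25 ranker computes, here is a minimal, self-contained sketch of a textbook BM25 score. It is not metapy's implementation and may differ from the exact variant MeTA uses; the function name `bm25_scores`, the toy documents, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with a textbook BM25 formula."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency weight
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy "facet" documents (hypothetical data, for illustration only)
docs = [
    "noise cancelling headphones comfortable".split(),
    "charging cable durable".split(),
    "headphones bass".split(),
]
scores = bm25_scores("noise cancelling headphones".split(), docs)
# Document indices ordered from best to worst match
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Documents that share no term with the query score zero, which is why a facet index returns only products whose mined topics overlap the query.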

In [138]:
config_file = 'config.toml'
index = metapy.index.make_inverted_index(config_file)

# use the BM25 ranker; hyperparameters are optional and left at their defaults
ranker = metapy.index.OkapiBM25()

### Load Product metadata from sentiments feed

In [139]:
product_meta = pd.read_json('../../data/product_sentiments.product_aggregated.slim.json')

### Methods to perform search
Helper methods that search the index, collect the product IDs, and look up product metadata such as the title.

In [140]:
def QueryIndex(index_query, max_results = 50):
    '''
    Queries the index and returns the product IDs of the top-ranked documents.
    Uses basic BM25 ranker with default parameters.
    '''
    top_docs = ranker.score(index, index_query, num_results=max_results)

    # collect product_ids
    result_productids = []
    for d_id, _ in top_docs:
        content = index.metadata(d_id).get('content')
        if content is not None:
            product_id = re.split(r'\t+', content)[0]
            result_productids.append(product_id)
    return result_productids
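
The product ID extraction inside `QueryIndex` relies on each stored document line starting with the product ID, followed by one or more tabs. The sketch below isolates that step so it can be tried without an index. The helper name `extract_product_id` and the sample line are hypothetical; only the tab-splitting logic comes from the code above.

```python
import re

def extract_product_id(content):
    """Return the first tab-separated field of a stored document line, or None."""
    if not content:
        return None
    fields = re.split(r'\t+', content)
    # An empty first field (line starting with a tab) is treated as missing
    return fields[0] or None

# Hypothetical stored line: product ID, then mined topic text
sample = "B00EWJHRMY\tnoise cancelling comfortable battery"
product_id = extract_product_id(sample)
```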

In [141]:
def GetDocumentContent(result_productids):
    '''
    Returns document metadata, such as the product title, for the IDs returned by the index query.
    '''
    search_results = pd.DataFrame(columns=['product_id','product_title'])

    for product in result_productids:
        product_id = str(product)
        found_df = product_meta[product_meta['product_id'] == product_id][['product_id', 'product_title']]
        if found_df.shape[0] > 0:
            # pd.concat returns a new frame, so the result must be reassigned
            search_results = pd.concat([search_results, found_df])

    return search_results
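
`GetDocumentContent` filters the whole metadata frame once per product ID. One possible alternative, sketched here with a plain dictionary so it runs without the data file, is to build the ID-to-title lookup once and then walk the ranked IDs in order. The helper name `lookup_titles` is hypothetical, and the two sample entries are taken from the results shown in this demo.

```python
def lookup_titles(ranked_ids, title_by_id):
    """Return (product_id, title) pairs in ranked order, skipping unknown IDs."""
    return [
        (pid, title_by_id[pid])
        for pid in map(str, ranked_ids)
        if pid in title_by_id
    ]

# In the notebook, this mapping could be built once from the metadata frame, e.g.
# dict(zip(product_meta['product_id'], product_meta['product_title']))
title_by_id = {
    "B0094S36RI": "Logitech 982-000079 UE 6000 Headphones",
    "B00I0SCD72": "Sennheiser Pro Headphones",
}

# IDs as they might come back from the ranker; the middle one has no metadata
results = lookup_titles(["B00I0SCD72", "UNKNOWN_ID", "B0094S36RI"], title_by_id)
```

The ranking order from the index is preserved, and IDs without metadata are dropped, which matches the behavior of the loop above.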

In [142]:
def PerformSearch(search_query):
    '''
    Main driver method for searching the index.
    '''
    # reject empty or whitespace-only queries early
    if not search_query.strip():
        print("Please enter a valid query")
        return None

    index_query = metapy.index.Document()
    index_query.content(search_query)

    product_ids_from_index = QueryIndex(index_query)
    return GetDocumentContent(product_ids_from_index)

### Issue queries to the index and view the results

In [143]:
# show full product titles; None disables truncation (-1 is deprecated in recent pandas)
pd.set_option('display.max_colwidth', None)

sample_query = "best noise cancelling head phones"
#sample_query = "music player with good battery"
#sample_query = "durable charging cables"
#sample_query = "     "

search_results = PerformSearch(sample_query)
search_results

| | product_id | product_title |
|---|---|---|
| 304 | B00EWJHRMY | Bose QuietComfort 15 Acoustic Noise Cancelling Headphones - Limited Edition (Discontinued by Manufacturer) |
| 232 | B0094S36RI | Logitech 982-000079 UE 6000 Headphones |
| 40 | B000GFDC7C | Bose? QuietComfort? 3 Acoustic Noise Cancelling? Headphones |
| 397 | B00M1NEUA0 | Bose QuietComfort 25 Headphones (wired, 3.5mm) |
| 268 | B00BSXRBGE | NoiseHush Active Noise-Cancelling Over-Ear Headphones - Black / Silver |
| 444 | B00QPHW63G | Symphonized NRG Premium Genuine Wood In-ear Noise-isolating Headphones\|Earbuds\|Earphones with Microphone |
| 56 | B000VWOL3O | Zune Premium Headphones for Zune 4GB |
| 339 | B00I0SCD72 | Sennheiser Pro Headphones |
| 355 | B00IUICOR6 | Bose SoundTrue Headphones Around-Ear Style |
| 453 | B00SN858RG | Sentey LS-4420 Warp Black/Red Headphones with Stereo High Definition, Over-Ear, Detachable Audio Cable 3.5mm, Foldable Headphone, Powerful Bass and Carrying Bag Included. |
