# Search-assist

**Telperion** is a collection of independent services, some of which combine into **search-assist**.
These services are:
- Search service
- Reverse neural transliteration service (**laurelin**)
- Language model service (**lm**)

In [1]:
# Ignore this code
import json
def render(j): print(json.dumps(j, indent=2, ensure_ascii=False))

## Reverse neural transliteration service

- Transliterates from Indian languages to English
- Indian languages supported: Hindi, Marathi, Bengali, Assamese, Kannada, Tamil, Punjabi
- Not domain sensitive
- Tries to give phonetically correct output
- Is a bare backbone service; not usable on its own without autocorrect
- Requires special hardware to run and train; porting it to CPU / mobile is TBD

Use cases:
- Can be paired with autocorrect and language models to build a reverse-transliteration product for customers who just want their search queries in English, without any entity recognition.

Example:

In [2]:
import requests

URL = "http://54.161.88.198/transliterate"

ret = requests.post(URL, data=json.dumps(
{
    # You can change this input here
    "sentences":["सैमसंग फ्रंट लोड वाशिंग मशीन विथ लॉट्स ऑफ़ सुपर पावर्स "],
    # Get top 5 candidates
    "candidates":"5",
    # Supported: hi (hindi), bn (bengali), ta (tamil), kn (kannada)
    # mr (marathi), as (assamese), pa (punjabi)
    "language":"hi"
}
))
ret = json.loads(ret.text)
ign = render(ret)

[
  {
    "transliteration": "samsang frant load washing masheen with lots of super povers ",
    "candidates": [
      {
        "word": "samsang frant load washing masheen with lates of super povers ",
        "score": -24.608490943909
      },
      {
        "word": "saimsan frnt load washing machine with lots au suppe powers ",
        "score": -27.435626983643
      },
      {
        "word": "samsung frant lode washing masheen vith lates of super pavers ",
        "score": -29.822515487671
      },
      {
        "word": "saimsun frant load washing mashina with lats of supar parver ",
        "score": -31.818751335144
      },
      {
        "word": "seamsan frant load washing mashin vith lits of super powers ",
        "score": -33.710003852844
      }
    ],
    "input": "सैमसंग फ्रंट लोड वाशिंग मशीन विथ लॉट्स ऑफ़ सुपर पावर्स "
  }
]
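
The response above can be post-processed locally before it is handed to autocorrect or a language model. A minimal sketch, assuming the response keeps the shape shown (a list of objects, each with a `candidates` list of `word` / `score` pairs) and that a higher (less negative) score means a more likely candidate. `top_candidates` is a hypothetical helper, not part of the service, and the sample below is a shortened copy of the output:

```python
def top_candidates(response, k=3):
    """Return the k highest-scoring candidate strings for each input sentence."""
    results = []
    for item in response:
        # Sort explicitly by score instead of relying on the order of the response
        ranked = sorted(item["candidates"], key=lambda c: c["score"], reverse=True)
        # The service output carries a trailing space, so strip it
        results.append([c["word"].strip() for c in ranked[:k]])
    return results


# Shortened sample in the same shape as the output above (deliberately out of order)
sample = [
    {
        "candidates": [
            {"word": "saimsan frnt load ", "score": -27.435626983643},
            {"word": "samsang frant load ", "score": -24.608490943909},
        ]
    }
]

print(top_candidates(sample, k=1))
```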


In [7]:
# Transliteration with autocorrect and beam over domain-specific language model:

URL = "http://54.210.69.46:5397/search"

ret = requests.post(URL, data=json.dumps(
{
  "text": ["सैमसंग फ्रंट लोड वाशिंग मशीन विथ लॉट्स ऑफ़ सुपर पावर्स "],
  "domains": ["retail"],
  "search_fields": ["entity"],
  "ret_fields": ["entity", "entity_type", "category", "sub_category", "product_type"],
  "pivot_fields": ["category", "sub_category", "product_type"],
  "accuracy": [90],
  "nr_categories": [1],
  "best_of": [1]
}
))
ret = json.loads(ret.text)
print("Best transliteration candidate:")
for query in ret['candidate_queries']:
    print(query)

Best transliteration candidate:
 samsung front load washing machine with lots off super powers


## Language Model service

- Gives log probability of sentences (which sentence out of a list of sentences is more correct?)
- Beam-searches for the best sentence given a list of candidates for each word

Use cases:
- Determining the correctness of sentences after operations
- Choosing among alternatives (e.g. choosing the best autocorrect candidate out of several possible candidates)
- Can be used for grammar correction

Examples:

In [None]:
# Probability of sentences
URL = "http://54.210.69.46:8080/log_prob"

ret = requests.post(URL, data=json.dumps(
{
  # which sentence is "better"?
  "sentences": ["dog eats food", "food eats dog"],
  "domains": ["generic"]
}
))
ret = json.loads(ret.text)
render(ret)
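
Once log probabilities are available, choosing among alternatives is a matter of taking the maximum. The response format of `/log_prob` is not shown here, so the sketch below assumes the scores have already been extracted into a list aligned with the input sentences. `pick_best` is a hypothetical helper and the numbers are made up for illustration only:

```python
def pick_best(sentences, log_probs):
    """Return the sentence with the highest log probability."""
    if len(sentences) != len(log_probs):
        raise ValueError("sentences and log_probs must have the same length")
    if not sentences:
        raise ValueError("at least one sentence is required")
    best_index = max(range(len(sentences)), key=lambda i: log_probs[i])
    return sentences[best_index]


sentences = ["dog eats food", "food eats dog"]
# Illustrative values only, not real output of the service
log_probs = [-7.2, -11.9]

print(pick_best(sentences, log_probs))
```

Log probabilities are negative, so the "better" sentence is the one whose value is closest to zero.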

In [None]:
# Best possible sentences (create the best possible sentence given a bunch of candidates for each word)
# For example, over here "machine" and "gizmo" are synonyms, however,
# "samsung washing machine" is better than "samsung washing gizmo"

URL = "http://54.210.69.46:8080/beam_search"

ret = requests.post(URL, data=json.dumps(
{
  "length": ["1"],
  "tokens": ["1", "2", "3", "4"],
  "beamSize": ["10"],
  "1": ["i", "it", "he"],
  "2": ["must", "might", "was", "am", "is", "probably might", "probably", "probably must"],
  "3": ["be", "have been", "", "have", "will", "will be", "will have been"],
  "4": ["mistaken", "a mistake", "commited mistake", "commited a mistake"]
}
))
ret = json.loads(ret.text)
render(ret)
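
To make the idea behind beam search over per-word candidates concrete, here is a local toy version. It is a sketch under stated assumptions, not the service's implementation: the scorer is a hypothetical bigram table with made-up log probabilities, whereas the real service scores sentences with its language model. An empty string in a slot means "skip this position", mirroring the `""` candidate in the request above.

```python
def beam_search(slots, score_fn, beam_size=10):
    """Fill slots left to right, keeping only the beam_size best partial sentences."""
    beams = [([], 0.0)]
    for options in slots:
        expanded = []
        for words, _ in beams:
            for option in options:
                # An empty option leaves the slot unfilled
                candidate = words + ([option] if option else [])
                expanded.append((candidate, score_fn(candidate)))
        expanded.sort(key=lambda beam: beam[1], reverse=True)
        beams = expanded[:beam_size]
    return [(" ".join(words), score) for words, score in beams]


# Hypothetical bigram log probabilities; unseen pairs get a flat penalty
TOY_BIGRAMS = {
    ("samsung", "washing"): -0.5,
    ("washing", "machine"): -0.3,
    ("washing", "gizmo"): -4.0,
}


def toy_score(words):
    """Sum the toy bigram log probabilities over adjacent word pairs."""
    return sum(TOY_BIGRAMS.get(pair, -6.0) for pair in zip(words, words[1:]))


slots = [["samsung"], ["washing"], ["machine", "gizmo"]]
best = beam_search(slots, toy_score, beam_size=2)
print(best[0][0])
```

With these toy scores, "samsung washing machine" ranks above "samsung washing gizmo", which is the behaviour described in the comment at the top of the cell.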

## Search service

- Transliterates search queries with the help of lm and transliteration
- Identifies taxonomies (e.g. electronics -> mobile phones) that the search query might belong to (based on query coverage)
- Identifies terms in the search query that belong to each relevant taxonomy (entity recognition followed by disambiguation)
- Identifies the possible type of each term (e.g. `samsung -> brand`, `front load washing machine -> category`)
- Supports contextual searching (e.g. search in the context of `sports`)
- Supports vertical search engines (multiple verticals / taxonomies)

Use cases:
- Enables free-text searching
- Autocorrect user search queries in English
- Transliterate domain-specific search queries
- Support bot-based / chat-based search
- Address long tail of search queries

Can be built on top of it:
- Query recommendation / "Did you mean" functionality
- Auto-complete
- Customer data indexing (for customers who want a completely managed search)
- Document tagging

Examples:

In [None]:
URL = "http://54.210.69.46:5397/search"

ret = requests.post(URL, data=json.dumps(
{
  "text": ["सैमसंग फ्रंट लोड वाशिंग मशीन विथ लॉट्स ऑफ़ सुपर पावर्स "],
  # domain to search in
  "domains": ["retail"],
  # domain-specific values, specific to each index, configurable internally, hidden from users 
  "search_fields": ["entity"],
  "ret_fields": ["entity", "entity_type", "category", "sub_category", "product_type"],
  # this is the taxonomy: category -> sub_category -> product_type
  "pivot_fields": ["category", "sub_category", "product_type"],
  # other API parameters for fine tuning
  "accuracy": [95],
  "nr_categories": [3],
  "best_of": [1]
}
))
ret = json.loads(ret.text)
render(ret)

In [None]:
# Search for the same thing in the context of sports

URL = "http://54.210.69.46:5397/search"

ret = requests.post(URL, data=json.dumps(
{
  "text": ["सैमसंग फ्रंट लोड वाशिंग मशीन विथ लॉट्स ऑफ़ सुपर पावर्स "],
  "domains": ["retail"],
  "search_fields": ["entity"],
  "ret_fields": ["entity", "entity_type", "category", "sub_category", "product_type"],
  "pivot_fields": ["category", "sub_category", "product_type"],
  "accuracy": [90],
  "contexts": ["sports"],
  "nr_categories": [10],
  "best_of": [1]
}
))
ret = json.loads(ret.text)
render(ret)
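
The `/search` requests in this notebook differ only in `accuracy`, `nr_categories` and the optional `contexts` field. A small helper can build the payload so that those are the only values that vary. This is a sketch: `build_search_payload` and its defaults are hypothetical, and the field names are copied from the requests above.

```python
import json


def build_search_payload(text, domain="retail", accuracy=90,
                         nr_categories=1, contexts=None):
    """Build a /search request body using the field names from the examples above."""
    payload = {
        "text": [text],
        "domains": [domain],
        "search_fields": ["entity"],
        "ret_fields": ["entity", "entity_type", "category",
                       "sub_category", "product_type"],
        "pivot_fields": ["category", "sub_category", "product_type"],
        "accuracy": [accuracy],
        "nr_categories": [nr_categories],
        "best_of": [1],
    }
    # Only send "contexts" when a context is actually requested
    if contexts:
        payload["contexts"] = list(contexts)
    return payload


payload = build_search_payload(
    "samsung front load washing machine",
    accuracy=90,
    nr_categories=10,
    contexts=["sports"],
)
body = json.dumps(payload, ensure_ascii=False)
print(body)
```

The resulting `body` string can be passed as `data=` to `requests.post`, as in the cells above.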