# Translate and DSL Search: Introduction

September 18, 2019

1. You enter a search text (e.g. 乳がん).
2. You detect the language using Google Translate (in the example: Japanese). Jan can help if needed.
3. If non-English text was entered:
  * you translate the text into English (in the example: breast cancer)
  * you expand the search text (in the example: 乳がん OR "breast cancer"). To set the quotes correctly, you need to get the tokens from the translation service.
4. You run a search in all available Dimensions data sources and return the number of results without and with expansion.
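
The expansion in step 3 can be sketched as a small helper. This is only an illustration, not the implementation used later in this notebook: `quote_if_phrase` and `expand_query` are hypothetical names, and the sketch assumes that multi-word terms should be wrapped in double quotes so they are searched as a phrase.

```python
def quote_if_phrase(term):
    """Wrap a term in double quotes when it has more than one token (hypothetical helper)."""
    term = term.strip()
    return f'"{term}"' if len(term.split()) > 1 else term


def expand_query(original, translation=None):
    """Combine the original term and its English translation with OR.

    If there is no translation (the input was already English),
    the original term is returned on its own.
    """
    if not translation:
        return quote_if_phrase(original)
    return f"{quote_if_phrase(original)} OR {quote_if_phrase(translation)}"


print(expand_query("乳がん", "breast cancer"))  # 乳がん OR "breast cancer"
```

The translation is passed in as an argument here, so the sketch runs without calling any translation service.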

# 1. Install Libraries

### Important
If you see a **'Restart Runtime'** button after running this cell, please click it to ensure all libraries work as expected.

In [None]:
# NOTE at the end of the installation please click the 'Restart Runtime' button 
!pip install dimcli prompt-toolkit plotly_express tqdm googletrans ipython -U

# 2. Log into the Dimensions API

This cell also sets up a few useful libraries.

In [1]:
import pandas
import time
from tqdm.notebook import tqdm  # notebook-friendly progress bars
from googletrans import Translator
import dimcli
import plotly_express as px
# 
from getpass import getpass
user = getpass('Enter username here')
password = getpass('Enter password here')
print('=> username is', user)
print('=> password is', "*" * len(password))
dimcli.login(user, password)
#
dsl = dimcli.Dsl()

translator = Translator()

Enter username here ···
Enter password here ····


=> username is emd
=> password is ****
401 Client Error: Unauthorized for url: https://app.dimensions.ai/api/auth.json
Login failed: please ensure your credentials are correct.


# 3. Define Functions for Querying

In [None]:
def translate(word):
    """Translate `word` into English using https://pypi.org/project/googletrans/.

    Returns False if the detected source language is already English.
    """
    # e.g. translate('乳がん')
    r = translator.translate(word)
    if r.src == "en":
        return False
    else:
        return r.text

def build_keywords(word):
    """
    From a single keyword, e.g. 乳がん, return a list of quoted keywords
    that includes its translation, e.g. ['\\"乳がん\\"', '\\"Breast cancer\\"'].
    The combined form '\\"乳がん\\" OR \\"Breast cancer\\"' is currently disabled.
    """
    t = translate(word)
    if t:
        return [f"""\\\"{word}\\\"""", f"""\\\"{t}\\\""""]  # f"""\\\"{word}\\\" OR \\\"{t}\\\""""
    else:  # if English, no need to expand
        return [f"""\\\"{word}\\\""""]

# The next function runs a query of the form
#   search publications for "<keyword>" return publications limit 1
# on every DSL source.

def multi_search(keywords, sources):
    "Launch multiple keyword searches across multiple DSL sources"
    out = []
    for k in tqdm(keywords, desc='1st loop: keywords'):
        for s in tqdm(sources, desc='2nd loop: sources'):
            res = dsl.query(f"""search {s} for \"{k}\" return {s} limit 1""", show_results=False)
            # print(s, res.total_count)
            out.append({'source' : s, 'objects' : res.total_count, 'query' : k})
            time.sleep(1)
    return pandas.DataFrame.from_dict(out)
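
To make the quoting easier to follow, the sketch below builds the same kind of query string that `multi_search` sends, without contacting the API. `escape_phrase` and `build_dsl_query` are hypothetical helpers introduced only for illustration; the escaped-quote format mirrors what `build_keywords` produces.

```python
def escape_phrase(word):
    """Wrap a keyword in backslash-escaped double quotes, as build_keywords does."""
    return f'\\"{word}\\"'


def build_dsl_query(source, keyword):
    """Return the DSL query string for one source and one escaped keyword (hypothetical helper)."""
    return f'search {source} for "{keyword}" return {source} limit 1'


query = build_dsl_query("publications", escape_phrase("breast cancer"))
print(query)  # search publications for "\"breast cancer\"" return publications limit 1
```

The outer quotes delimit the search string, and the inner escaped quotes mark the keyword as an exact phrase.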

# 4. Run Your Query

In [None]:
#@title Enter a Search Term

search_term = '\u4E73\u304C\u3093'  #@param {type: "string"}

keywords = build_keywords(search_term)
print("Expanded into.. :", keywords)
sources = [x for x in dimcli.G.sources() if x != 'researchers']
print("Searching in.. :", sources)
df = multi_search(keywords, sources)  # reuse the keywords built above instead of translating again
df
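
One possible way to compare the queries is to sum the counts over all sources for each query string. The sketch below is an assumption about how one might do this, not part of the original notebook: `totals_per_query` is a hypothetical helper, and the rows use made-up placeholder counts, not real API results. With a real result table, rows of this shape could be obtained with `df.to_dict("records")`.

```python
from collections import defaultdict


def totals_per_query(rows):
    """Sum the 'objects' counts over all sources for each query string."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["query"]] += row["objects"]
    return dict(totals)


# Placeholder rows with made-up counts, shaped like the output of multi_search.
rows = [
    {"source": "publications", "objects": 10, "query": "term_a"},
    {"source": "grants", "objects": 2, "query": "term_a"},
    {"source": "publications", "objects": 30, "query": "term_b"},
    {"source": "grants", "objects": 5, "query": "term_b"},
]
print(totals_per_query(rows))  # {'term_a': 12, 'term_b': 35}
```

Note that adding the totals of two separate queries can count the same record twice, so the sum is not equivalent to the count of a single combined OR query.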

### Plot the results 

In [None]:
px.bar(df, x="query", y="objects", color="source")

In [None]:
px.bar(df, x="source", y="objects", facet_col="query")