# QA of DSL Release v1.19 - Authors Search Feature only
This notebook reviews and tests the new authors search feature in DSL v1.19.


In [30]:
import dimcli
from dimcli.shortcuts import dslquery, dslqueryall
import pandas as pd
dimcli.login(instance="test")
dsl = dimcli.Dsl() # by default it reuses object from %dsl_login

DimCli v0.5.3 - Succesfully connected to <https://integration.ds-metrics.com> (method: dsl.ini file)


---

## DSL-203 authors search

https://uberresearch.atlassian.net/browse/DSL-203


## using ``in authors`` search field

Use case
* Exact name search. There is a dedicated search index for name + surname combinations, which gives better results than filtering with `where`.

Features
* requires an exact name + surname
* the search string can contain boolean operators (AND, OR, MUST, SHOULD)
* limitation: cannot be combined with other full-text search operators
* limitation: asterisks (wildcards) cannot be used, because wildcard matching is slow
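
The two input constraints above (name + surname, no asterisks) can be checked before a query is sent. The following is a minimal sketch only: `check_exact_name` is a hypothetical helper, not part of dimcli or the DSL, and its two rules are a simplified reading of the feature list.

```python
def check_exact_name(name: str) -> list:
    """Return a list of problems that make `name` unsuitable for `in authors`."""
    # Hypothetical pre-flight check for illustration; not part of dimcli.
    problems = []
    if "*" in name:
        # Wildcards are not supported by the exact-name index.
        problems.append("contains an asterisk")
    if len(name.split()) < 2:
        # The index expects a name + surname combination.
        problems.append("fewer than two words")
    return problems


for candidate in ["Michele Pasin", "michele *asin", "Pasin"]:
    print(candidate, "->", check_exact_name(candidate) or "ok")
```

An abbreviated first name such as `M Pasin` passes this check, since it still consists of two words.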

In [2]:
# not the intended syntax: the inner quotes are missing (note that no error is returned)
dsl.query("""search publications in authors for "Michele Pasin" return publications""")

<dimcli.Result object #4357692816. Dict keys: '_stats', 'publications'>

In [3]:
# fails because of *
dsl.query("""search publications in authors for "michele *asin" return publications""")

<dimcli.Result object #4573228048. Dict keys: 'errors'>

In [4]:
# fails with non-escaped inner quotes
dsl.query("""search publications in authors for ""Michele Pasin"" return publications""")

<dimcli.Result object #4357806480. Dict keys: 'errors'>

In [5]:
# works
dsl.query("""search publications in authors for "\\"Michele Pasin\\"" return publications""")

<dimcli.Result object #4590790480. Dict keys: '_stats', 'publications'>
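
The double backslash in the working cell is Python-level escaping: inside a regular (non-raw) string, `\\"` produces the two characters `\"`, which is what the DSL receives as an escaped inner quote. A minimal sketch that builds the same query string programmatically, assuming a hypothetical helper named `exact_author_query` (not part of dimcli):

```python
def exact_author_query(name: str, source: str = "publications") -> str:
    """Build an `in authors` query with the name wrapped in escaped inner quotes."""
    # Hypothetical helper for illustration; not part of dimcli.
    inner = '\\"' + name + '\\"'  # a backslash followed by a quote on each side
    return f'search {source} in authors for "{inner}" return {source}'


# The query as written by hand in the cell above.
handwritten = """search publications in authors for "\\"Michele Pasin\\"" return publications"""
built = exact_author_query("Michele Pasin")

print(built)                 # the text that is actually sent to the DSL
print(built == handwritten)  # both spellings produce the same string
```

The resulting string can be passed to `dsl.query(...)` exactly like the handwritten one.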

In [6]:
# asterisks return no results - no error, because inside a nested string the asterisk no longer has a special meaning
# So this query correctly returns nothing, because there is no person called exactly "M* P*"
dsl.query("""search publications in authors for "\\"M* P*\\"" return publications""")

<dimcli.Result object #4560595792. Dict keys: '_stats', 'publications'>

In [7]:
# works with abbreviation
dsl.query("""search publications in authors for "\\"M Pasin\\"" return publications""")

<dimcli.Result object #4591066320. Dict keys: '_stats', 'publications'>

In [8]:
# works, but no data is returned because we are using a single word
dsl.query("""search publications in authors for "\\"Pasin\\"" return publications""")

<dimcli.Result object #4573255248. Dict keys: '_stats', 'publications'>
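
The printed result objects only list their dictionary keys, so a query that "works but returns no data" looks identical to one that returns records. The sketch below tells the three cases apart. It assumes the raw response is available as a plain dict containing either an `errors` key or a `_stats` dict with a `total_count` entry plus a list under the source name. `summarize_response` and the sample payloads are made up for illustration, and how the dict is obtained from the dimcli result object depends on the dimcli version, so that step is not shown.

```python
def summarize_response(payload: dict, source: str = "publications") -> str:
    """Classify a raw DSL response dict as 'error', 'empty' or '<n> results'."""
    # Hypothetical helper; the payload layout is an assumption stated above.
    if "errors" in payload:
        return "error"
    records = payload.get(source, [])
    # Prefer the server-side total if present, otherwise count what was returned.
    total = payload.get("_stats", {}).get("total_count", len(records))
    return "empty" if total == 0 else f"{total} results"


# Made-up payloads, shaped like the key listings printed by the cells above.
print(summarize_response({"errors": {"query": "illustrative error"}}))
print(summarize_response({"_stats": {"total_count": 0}, "publications": []}))
print(summarize_response({"_stats": {"total_count": 2}, "publications": [{}, {}]}))
```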

In [9]:
# works with AND 
dsl.query("""search publications in authors for "\\"Michele Pasin\\" AND \\"V Lopez\\" " return publications""")

<dimcli.Result object #4591113040. Dict keys: '_stats', 'publications'>

In [10]:
# works with AND and OR 
q = """( \\"Michele Pasin\\" AND \\"V Lopez\\" ) OR ( \\"Michele Pasin\\" AND \\"J Bradley\\" )"""
dsl.query(f"""search publications in authors for "{q}" return publications""")

<dimcli.Result object #4591122640. Dict keys: '_stats', 'publications'>

In [11]:
# works with AND and OR, plus a field filter
q = """( \\"Michele Pasin\\" AND \\"V Lopez\\" ) OR ( \\"Michele Pasin\\" AND \\"J Bradley\\" )"""
dsl.query(f"""search publications in authors for "{q}" where year in [2010:2019] return publications""")

<dimcli.Result object #4591115024. Dict keys: '_stats', 'publications'>
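
Writing nested boolean expressions by hand gets error-prone because of the escaped quotes. As a sketch, the expression used in the two cells above can be assembled from small helpers. The names `quoted`, `all_of` and `any_of` are hypothetical and not part of dimcli.

```python
def quoted(name: str) -> str:
    """Wrap a name in escaped quotes, i.e. \\"name\\" as seen by the DSL."""
    return '\\"' + name + '\\"'


def all_of(*names: str) -> str:
    """Join exact names with AND inside a parenthesised group."""
    return "( " + " AND ".join(quoted(n) for n in names) + " )"


def any_of(*groups: str) -> str:
    """Join already-built groups with OR."""
    return " OR ".join(groups)


q = any_of(
    all_of("Michele Pasin", "V Lopez"),
    all_of("Michele Pasin", "J Bradley"),
)
query = f'search publications in authors for "{q}" where year in [2010:2019] return publications'
print(query)
```

The value of `q` is character-for-character the same as the handwritten string in the cells above, so the resulting query can be passed to `dsl.query(...)` unchanged.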

In [4]:
# works with MUST/SHOULD, i.e. + and -
dsl.query("""search publications in authors for "+(\\"Michele Pasin\\") -(\\"John Bradley\\")" return publications[authors]""")

<dimcli.Result object #4645654096. Dict keys: '_stats', 'publications'>

## using ``where authors`` field 

Use case
* Partial name search. Any combination of name or surname words is allowed, so this returns more results than the exact-name search.

Features
* can be combined with other full-text search operators (e.g. `search in title_abstract`)
* does not allow boolean operators in the search string
* allows partial matches using the `~` operator (asterisks and other special characters are stripped out)

In [12]:
# works
dsl.query("""search publications where authors = "Michele Pasin" return publications""")

<dimcli.Result object #4591225616. Dict keys: '_stats', 'publications'>

In [13]:
# does not return any result - should it cause an error?
dsl.query("""search publications where authors = "Michele Pasin AND Vanessa Lopez" return publications""")

<dimcli.Result object #4591114256. Dict keys: '_stats', 'publications'>

In [14]:
# works, but the * is presumably stripped out
dsl.query("""search publications where authors = "*asin" return publications""")

<dimcli.Result object #4591448592. Dict keys: '_stats', 'publications'>

In [15]:
# works although seems slower than previous queries 
dsl.query("""search publications where authors ~ "michele pasin" return publications[authors]""")

<dimcli.Result object #4602317136. Dict keys: '_stats', 'publications'>

In [16]:
# works and returns a warning as expected
dsl.query("""search publications where authors ~ "asin" return publications[author_affiliations]""")



In [18]:
# works with authors + search index
dsl.query("""search publications in full_data for "prosopography" where authors ~ "asin" return publications""")

<dimcli.Result object #4591098320. Dict keys: '_stats', 'publications'>

In [19]:
# works with authors + search index + another filter
dsl.query("""search publications in full_data for "prosopography" where authors ~ "asin" and year > 2000 return publications""")

<dimcli.Result object #4606970192. Dict keys: '_stats', 'publications'>
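
The last two cells combine a full-text search, a partial `authors` filter and an extra field filter. A minimal sketch of how such queries can be composed from their parts, assuming a hypothetical helper `authors_filter_query` (not part of dimcli) that joins all conditions with `and`:

```python
def authors_filter_query(name_part, search_term=None, index="full_data",
                         extra_filters=(), source="publications"):
    """Compose a query that filters on a partial author name with `~`."""
    # Hypothetical helper for illustration; not part of dimcli.
    parts = [f"search {source}"]
    if search_term is not None:
        # Optional full-text part, e.g. in full_data for "prosopography"
        parts.append(f'in {index} for "{search_term}"')
    # The partial author match comes first, then any additional field filters.
    conditions = [f'authors ~ "{name_part}"', *extra_filters]
    parts.append("where " + " and ".join(conditions))
    parts.append(f"return {source}")
    return " ".join(parts)


print(authors_filter_query("asin", "prosopography"))
print(authors_filter_query("asin", "prosopography", extra_filters=["year > 2000"]))
```

Both printed strings match the queries in the two cells above and can be passed to `dsl.query(...)` as they are.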