# QA of DSL Release v1.22
The purpose of this notebook is to review and test the new features in DSL v1.22.

Docs: https://docs.dimensions.ai/dsl/1.22.0-preview/


## Prerequisites

Please install the latest versions of these libraries to run this notebook. 

In [14]:
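# load common libraries
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize was removed in pandas 2.0
import time
import json
from tqdm.notebook import tqdm  # tqdm_notebook is a deprecated alias
import plotly.express as px  # the standalone plotly_express package is now part of plotly
import dimcli
from dimcli.shortcuts import *

# log in to the test instance (credentials are read from the dsl.ini file)
dimcli.login(instance="test")
dsl = dimcli.Dsl()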


DimCli v0.6.2.4 - Succesfully connected to <https://integration.ds-metrics.com> (method: dsl.ini file)


---

## [DSL-307] Improve warning messages

https://uberresearch.atlassian.net/browse/DSL-307

In [3]:
%dsl search publications where research_orgs.types="Company" and year = 2014

Returned Publications: 20 (total = 27779)
Usage of entity filters (research_orgs.types) is discouraged as it often leads to incomplete results. Please see https://docs.dimensions.ai/dsl/language.html#literal-fields-vs-entity-fields for more details.
Query is too long or complex. Please see https://docs.dimensions.ai/dsl/faq.html for more information. [code: 4]


<dimcli.Dataset object #4666093648. Records: 20/27779>

In [4]:
%%dsl 
search publications in full_data for "\"ALK non small cell lung cancer\"~5 OR \"ALK NSCLC\"~5"
where year in [2015:2019] and funders.types in ["Company"]
return funders

Returned Funders: 20
Usage of entity filters (funders.types) is discouraged as it often leads to incomplete results. Please see https://docs.dimensions.ai/dsl/language.html#literal-fields-vs-entity-fields for more details.
Query is too long or complex. Please see https://docs.dimensions.ai/dsl/faq.html for more information. [code: 4]


<dimcli.Dataset object #4666115152. Records: 20/508>

In [5]:
%%dsl 
search publications in full_data for "CRISPR\/CAS9"
where researchers.first_publication_year>2014
return researchers[all] 

Returned Researchers: 11
Query is too long or complex. Please see https://docs.dimensions.ai/dsl/faq.html for more information. [code: 4]
Usage of entity filters (researchers.first_publication_year) is discouraged as it often leads to incomplete results. Please see https://docs.dimensions.ai/dsl/language.html#literal-fields-vs-entity-fields for more details.


<dimcli.Dataset object #4666140624. Records: 11/1>

In [6]:
%%dsl 
search publications return publications[researchers] return researchers[all]

Returned Publications: 20 (total = 107124256)
Returned Researchers: 20


<dimcli.Dataset object #4666247184. Records: 20/107124256>

**Comments** 

* Change the message as follows: “Please review your query, as it contains an entity filter (XX-XX-XX) that can lead to incomplete results. More details on https://docs.dimensions.ai/dsl/language.html#literal-fields-vs-entity-fields.”
* Be selective about when to show the message:
    * Do not show it for entities: states, open_access, countries, categories, org groups
    * Show it for entities: journals, orgs, cities (not 100% sure about cities, so please double-check)
* Show a single message, i.e. there is no need to also show the warning 'Query is too long or complex...'
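The rules in the comments above can be expressed as a small client-side check, which is handy for predicting which test queries should warn. This is a minimal sketch, not the DSL server's implementation: the regular expression and the `WARN_PREFIXES` / `SILENT_PREFIXES` tuples are assumptions derived from the entity lists above and from the fields queried in this notebook (treating `funders` as an "org" entity is also an assumption).

```python
import re

# Assumed classification, based on the requirements above; the real DSL rules may differ.
WARN_PREFIXES = ("journal", "research_orgs", "research_org_cities", "funders")
SILENT_PREFIXES = ("research_org_state_codes", "open_access", "research_org_countries", "category_")

DOCS_URL = "https://docs.dimensions.ai/dsl/language.html#literal-fields-vs-entity-fields"

# a dotted field path (e.g. research_orgs.types) followed by a comparison operator or `in`
FILTER_RE = re.compile(r"\b([a-z_]+)\.([a-z_]+)\s*(?:!=|=|>|<|in\b)")


def entity_filter_warning(query):
    """Return one warning for the first entity filter that should trigger it, else None."""
    for entity, attribute in FILTER_RE.findall(query):
        if entity.startswith(SILENT_PREFIXES):
            continue  # states, open access, countries, categories: no warning
        if entity in WARN_PREFIXES:
            return (f"Please review your query, as it contains an entity filter "
                    f"({entity}.{attribute}) that can lead to incomplete results. "
                    f"More details on {DOCS_URL}.")
    return None  # at most one message is produced per query


print(entity_filter_warning('search publications where research_orgs.types="Company"'))
print(entity_filter_warning('search publications where category_for.name="All"'))
```

Returning only the first match keeps to the "single message" requirement; listing every offending field would be an alternative design.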

## [DSL-325] Add clinical_trials.acronym and clinical_trials.brief_title

https://uberresearch.atlassian.net/browse/DSL-325

In [8]:
fields = ["acronym", "brief_title", "interventions"]

In [9]:
%dsldocs clinical_trials

|   | sources | field | type | description | is_filter | is_entity | is_facet |
|---|---|---|---|---|---|---|---|
| 0 | clinical_trials | abstract | string | Abstract or description of the clinical trial. | False | False | False |
| 1 | clinical_trials | acronym | string | Acronym of the clinical trial. | True | False | False |
| 2 | clinical_trials | active_years | integer | List of active years for a clinical trial. | True | False | True |
| 3 | clinical_trials | associated_grant_ids | string | Dimensions IDs of the grants associated to the... | True | False | False |
| 4 | clinical_trials | brief_title | string | Brief title of the clinical trial. | True | False | False |
| 5 | clinical_trials | category_bra | categories | `Broad Research Areas <https://app.dimensions.... | True | True | True |
| 6 | clinical_trials | category_for | categories | `ANZSRC Fields of Research classification <htt... | True | True | True |
| 7 | clinical_trials | category_hra | categories | `Health Research Areas <https://app.dimensions... | True | True | True |
| 8 | clinical_trials | category_hrcs_hc | categories | `HRCS - Health Categories <https://app.dimensi... | True | True | True |
| 9 | clinical_trials | category_hrcs_rac | categories | `HRCS – Research Activity Codes <https://app.d... | True | True | True |


In [10]:
q = """search clinical_trials where {} is not empty return clinical_trials"""

for x in fields:
    data = dsl.query(q.format(x))
    print(data.stats)

Returned Clinical_trials: 20 (total = 95345)
{'total_count': 95345}
Returned Clinical_trials: 20 (total = 486704)
{'total_count': 486704}
Returned Clinical_trials: 20 (total = 328633)
{'total_count': 328633}


In [11]:
q = """search clinical_trials for "hiv" where {} is not empty return clinical_trials[id+brief_title+acronym+interventions] limit 1"""

for x in fields:
    data = dsl.query(q.format(x))
    print(data.json)

Returned Clinical_trials: 1 (total = 2804)
{'_stats': {'total_count': 2804, 'limit': 1, 'offset': 0}, 'clinical_trials': [{'brief_title': 'The Prevalence of Neurocognitive Disorder in a Primary Care-based HIV Cohort Compared to a HIV-negative Control Cohort -', 'id': 'NCT01434563', 'acronym': 'CNS HAND'}]}
Returned Clinical_trials: 1 (total = 11132)
{'_stats': {'total_count': 11132, 'limit': 1, 'offset': 0}, 'clinical_trials': [{'brief_title': 'Exploratory open-label randomized clinical trial to assess the efficacy of first-line dual vs triple antiretroviral therapy art in hiv-1 reservoir and in peripheral compartments in hiv-infected patients', 'id': '2019-002733-10'}]}
Returned Clinical_trials: 1 (total = 9035)
{'_stats': {'total_count': 9035, 'limit': 1, 'offset': 0}, 'clinical_trials': [{'brief_title': 'Impact of CMV Status on HIV Viral Load Decay', 'id': 'NCT03349359', 'interventions': [{'type': 'Drug', 'name': 'HIV antiretroviral therapy', 'description': "HIV patients are receivi
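The loop above relies on eyeballing the JSON. A minimal sketch of an automated check, assuming responses have the `_stats` / `clinical_trials` shape printed above: every record returned by a `where <field> is not empty` query should actually carry that field. The helper name `missing_field_ids` is hypothetical; the two responses are copied from the output above.

```python
def missing_field_ids(response, source, field):
    """Return the IDs of records in a DSL JSON response that lack a non-empty `field`."""
    return [
        record.get("id")
        for record in response.get(source, [])
        if not record.get(field)  # absent key, None, "" or [] all count as missing
    ]


# records copied from the `acronym` and `brief_title` outputs above
acronym_response = {
    "_stats": {"total_count": 2804, "limit": 1, "offset": 0},
    "clinical_trials": [{
        "brief_title": "The Prevalence of Neurocognitive Disorder in a Primary Care-based "
                       "HIV Cohort Compared to a HIV-negative Control Cohort -",
        "id": "NCT01434563",
        "acronym": "CNS HAND",
    }],
}
brief_title_response = {
    "_stats": {"total_count": 11132, "limit": 1, "offset": 0},
    "clinical_trials": [{
        "brief_title": "Exploratory open-label randomized clinical trial to assess the efficacy "
                       "of first-line dual vs triple antiretroviral therapy art in hiv-1 reservoir "
                       "and in peripheral compartments in hiv-infected patients",
        "id": "2019-002733-10",
    }],
}

print(missing_field_ids(acronym_response, "clinical_trials", "acronym"))        # → []
print(missing_field_ids(brief_title_response, "clinical_trials", "brief_title"))  # → []
print(missing_field_ids(brief_title_response, "clinical_trials", "acronym"))      # → ['2019-002733-10']
```

The last call is expected to report a record: that trial was selected by `brief_title`, so not having an `acronym` is fine and is not a bug.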

## [DSL-329] Fix error with missing city IDs when only one is missing

https://uberresearch.atlassian.net/browse/DSL-329

In [13]:
%dsl search publications  where type in ["article", "chapter", "proceeding", "monograph"] and (id in ["pub.1027065973"]) return publications[all] sort by id  limit 25 skip 0

Returned Publications: 1 (total = 1)
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'open_access' is deprecated in favor of open_access_categories. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'terms' is deprecated in favor of concepts. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'references' is deprecated in favor of reference_ids. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_RAC' is deprecated in favor of category_hrcs_rac. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR_first' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_HC' is deprecated in favor of category_hrcs_hc. Please refer to https://docs.dimensions.ai/dsl/r

<dimcli.Dataset object #4677885712. Records: 1/1>
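The deprecation notices above follow a fixed pattern, so they can be turned into a rename map when updating older queries. This is a minimal sketch. The regular expression is an assumption based only on the message text printed in this notebook. Notices that name no replacement, such as the one for `funder_groups` further down, map to `None`.

```python
import re

# "Field 'X' is deprecated in favor of Y." or just "Field 'X' is deprecated."
NOTICE_RE = re.compile(r"Field '(?P<old>[^']+)' is deprecated(?: in favor of (?P<new>\w+))?")


def deprecation_map(log_text):
    """Map each deprecated field name to its suggested replacement (or None)."""
    return {m.group("old"): m.group("new") for m in NOTICE_RE.finditer(log_text)}


log = """Field 'open_access' is deprecated in favor of open_access_categories. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'terms' is deprecated in favor of concepts. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'funder_groups' is deprecated. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details"""

print(deprecation_map(log))
# → {'open_access': 'open_access_categories', 'terms': 'concepts', 'funder_groups': None}
```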

## [DSL-244] Publications.resulting_publication_doi (link from preprints)

https://uberresearch.atlassian.net/browse/DSL-244

In [6]:
%dsldf search publications where resulting_publication_doi is not empty return publications[id+doi+title+journal+type+year+resulting_publication_doi]

Returned Publications: 20 (total = 871144)


|   | id | journal.id | journal.title | resulting_publication_doi | title | type | year |
|---|---|---|---|---|---|---|---|
| 0 | pub.1123829662 | jour.1371339 | arXiv | 10.1145/3369199.3369225 | Femoral Neck Angle Impacts Hip Disorder and Su... | preprint | 2020 |
| 1 | pub.1123829659 | jour.1371339 | arXiv | 10.1103/physrevd.100.124059 | Revisiting scalar and tensor perturbations in ... | preprint | 2020 |
| 2 | pub.1123829724 | jour.1371339 | arXiv | 10.1080/15384101.2019.1706903 | The double dealing of cyclin D1 | preprint | 2020 |
| 3 | pub.1123829794 | jour.1371339 | arXiv | 10.1038/s41467-019-14031-2 | Colloidal interactions and unusual crystalliza... | preprint | 2020 |
| 4 | pub.1123829749 | jour.1371339 | arXiv | 10.1142/11186 | Astrophysical Constraints on Strong Modified G... | preprint | 2020 |
| 5 | pub.1123829786 | jour.1371339 | arXiv | 10.1007/978-3-319-27279-5_10 | On phase asymmetries in oscillatory pipe flow | preprint | 2020 |
| 6 | pub.1123829758 | jour.1371339 | arXiv | 10.1007/s11468-019-01105-6 | Plasmonic sensors based on funneling light thr... | preprint | 2020 |
| 7 | pub.1123829753 | jour.1371339 | arXiv | 10.1109/jsen.2019.2960556 | Ultra-narrow spectral response of a hybrid pla... | preprint | 2020 |
| 8 | pub.1123829716 | jour.1371339 | arXiv | 10.1039/c8cp07764a | Entangling non planar molecules via inversion ... | preprint | 2020 |
| 9 | pub.1123829656 | jour.1371339 | arXiv | 10.1109/mgrs.2019.2955120 | A Review on InSAR Phase Denoising | preprint | 2020 |


In [9]:
%dsldf search publications where doi="10.1038/s41586-019-1247-7"

Returned Publications: 1 (total = 1)


|   | author_affiliations | id | issue | journal.id | journal.title | pages | title | type | volume | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [[{'first_name': 'Ye', 'last_name': 'Yuan', 'c... | pub.1115995497 | 7760 | jour.1018957 | Nature | 214-218 | Elastic colloidal monopoles and reconfigurable... | article | 570 | 2019 |


In [12]:
%dsldf search publications where resulting_publication_doi is not empty return year limit 100

Returned Year: 32


|   | count | id |
|---|---|---|
| 0 | 61207 | 2018 |
| 1 | 59532 | 2017 |
| 2 | 58371 | 2016 |
| 3 | 54099 | 2015 |
| 4 | 53272 | 2014 |
| 5 | 50383 | 2013 |
| 6 | 47979 | 2012 |
| 7 | 44821 | 2011 |
| 8 | 41742 | 2010 |
| 9 | 38347 | 2009 |


In [13]:
%dsldf search publications where resulting_publication_doi is not empty return journal limit 50

Returned Journal: 4


|   | count | id | title |
|---|---|---|---|
| 0 | 860210 | jour.1371339 | arXiv |
| 1 | 8074 | jour.1293558 | bioRxiv |
| 2 | 1660 | jour.1345647 | JMIR Preprints |
| 3 | 1200 | jour.1053424 | PeerJ Preprints |


In [14]:
%dsldf search publications where resulting_publication_doi is not empty return type

Returned Type: 1


|   | count | id |
|---|---|---|
| 0 | 871144 | preprint |


In [16]:
%dsldf search publications where resulting_publication_doi = "10.1038/s41586-019-1247-7"

Returned Publications: 1 (total = 1)


|   | author_affiliations | id | journal.id | journal.title | title | type | year |
|---|---|---|---|---|---|---|---|
| 0 | [[{'first_name': 'Ye', 'last_name': 'Yuan', 'c... | pub.1123829620 | jour.1371339 | arXiv | Elastic colloidal monopoles and reconfigurable... | preprint | 2020 |
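The last two queries check the link in both directions. The arXiv preprint `pub.1123829620` has a `resulting_publication_doi` equal to the `doi` of the Nature article `pub.1115995497`. Below is a minimal sketch of the same join done client-side on records that have already been retrieved. The record dictionaries are trimmed copies of rows shown above, and `link_preprints` is a hypothetical helper. DOIs are compared case-insensitively.

```python
def link_preprints(preprints, publications):
    """Pair each preprint ID with the ID of the publication whose DOI it points to."""
    by_doi = {pub["doi"].lower(): pub["id"] for pub in publications if pub.get("doi")}
    links = {}
    for pre in preprints:
        target = (pre.get("resulting_publication_doi") or "").lower()
        links[pre["id"]] = by_doi.get(target)  # None when the published version was not retrieved
    return links


preprints = [
    {"id": "pub.1123829620", "resulting_publication_doi": "10.1038/s41586-019-1247-7"},
    {"id": "pub.1123829724", "resulting_publication_doi": "10.1080/15384101.2019.1706903"},
]
publications = [
    {"id": "pub.1115995497", "doi": "10.1038/s41586-019-1247-7"},
]

print(link_preprints(preprints, publications))
# → {'pub.1123829620': 'pub.1115995497', 'pub.1123829724': None}
```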


## [DSL-310] Extend policy documents: FOR v2, cancer types

https://uberresearch.atlassian.net/browse/DSL-310

> Tests based on the previous QA of v1.21: https://github.com/digital-science/dsl-QA/blob/master/v1.21/v1-1.21-qa.ipynb

**Testing FOR v2**

In [19]:
for x in ['policy_documents']:
    one = "04 Earth Sciences"
    two = "0201 Astronomical and Space Sciences"
    q = f"""search {x} where category_for.name="{one}" return {x} """
    print(q)
    data = dsl.query(q)
    q = f"""search {x} where category_for.name="{two}" return {x} """
    print(q)
    data = dsl.query(q)

search policy_documents where category_for.name="04 Earth Sciences" return policy_documents 
Returned Policy_documents: 20 (total = 3936)
search policy_documents where category_for.name="0201 Astronomical and Space Sciences" return policy_documents 
Returned Policy_documents: 20 (total = 120)


**Testing that the old `FOR` field triggers deprecation warnings**

In [20]:

for x in ['policy_documents']:
    one = "04 Earth Sciences"
    two = "0201 Astronomical and Space Sciences"
    q = f"""search {x} where FOR.name="{one}" return {x} """
    print("===\n", q)
    data = dsl.query(q)
    q = f"""search {x} where FOR.name="{two}" return {x} """
    print("===\n", q)
    data = dsl.query(q)

===
 search policy_documents where FOR.name="04 Earth Sciences" return policy_documents 
Returned Policy_documents: 0
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
===
 search policy_documents where FOR.name="0201 Astronomical and Space Sciences" return policy_documents 
Returned Policy_documents: 20 (total = 120)
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
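The two loops above differ only in the field name. Below is a minimal sketch of a helper that builds both variants for each category name, so the `category_for` and deprecated `FOR` totals can be compared side by side. That makes a discrepancy like the one above for "04 Earth Sciences" (3936 vs 0) easy to spot. `for_query_pairs` is a hypothetical name, and the query template is copied from the cells above.

```python
def for_query_pairs(source, names):
    """Build (new-field, deprecated-field) query pairs for each FOR category name."""
    pairs = []
    for name in names:
        new = f'search {source} where category_for.name="{name}" return {source}'
        old = f'search {source} where FOR.name="{name}" return {source}'
        pairs.append((new, old))
    return pairs


for new_q, old_q in for_query_pairs("policy_documents",
                                    ["04 Earth Sciences", "0201 Astronomical and Space Sciences"]):
    # in the notebook, each query could be run with dsl.query() and the
    # data.stats["total_count"] values of the pair compared
    print(new_q)
    print(old_q)
```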


**Testing cancer types classifications**

Looks good: there is now also a `category_icrp_ct` field.

In [22]:
%dsldocs policy_documents

|   | sources | field | type | description | is_filter | is_entity | is_facet |
|---|---|---|---|---|---|---|---|
| 0 | policy_documents | category_bra | categories | `Broad Research Areas <https://app.dimensions.... | True | True | True |
| 1 | policy_documents | category_for | categories | `ANZSRC Fields of Research classification <htt... | True | True | True |
| 2 | policy_documents | category_hra | categories | `Health Research Areas <https://app.dimensions... | True | True | True |
| 3 | policy_documents | category_hrcs_hc | categories | `HRCS - Health Categories <https://app.dimensi... | True | True | True |
| 4 | policy_documents | category_hrcs_rac | categories | `HRCS – Research Activity Codes <https://app.d... | True | True | True |
| 5 | policy_documents | category_icrp_cso | categories | `ICRP Common Scientific Outline <https://app.d... | True | True | True |
| 6 | policy_documents | category_icrp_ct | categories | `ICRP Cancer Types <https://app.dimensions.ai/... | True | True | True |
| 7 | policy_documents | category_rcdc | categories | `Research, Condition, and Disease Categorizati... | True | True | True |
| 8 | policy_documents | date_inserted | date | Date when the record was inserted into Dimensi... | True | False | False |
| 9 | policy_documents | id | string | Dimensions policy document ID | True | False | False |


In [25]:
fields = ["category_icrp_ct"]
sources = ['policy_documents']

In [29]:
for x in sources:
    print("===\n", x)
    for f in fields:
        q = f"""search {x} where {f} is not empty return {x} """
        print("----\n", q)
        data = dsl.query(q, verbose=True)
        q = f"""search {x} where {f}.name="Blood Cancer" return {x} """
        print("----\n", q)
        data = dsl.query(q, verbose=True)
        q = f"""search {x} where {f}.name="3.4 Vaccines" return {x} """
        print("----\n", q)
        data = dsl.query(q, verbose=True)
        q = f"""search {x} where {f} is not empty return  {f} limit 100 """
        print("----\n", q)
        data = dsl.query(q, verbose=True)
        print(data.as_dataframe())

===
 policy_documents
----
 search policy_documents where category_icrp_ct is not empty return policy_documents 
Returned Policy_documents: 20 (total = 15880)
----
 search policy_documents where category_icrp_ct.name="Blood Cancer" return policy_documents 
Returned Policy_documents: 1 (total = 1)
----
 search policy_documents where category_icrp_ct.name="3.4 Vaccines" return policy_documents 
Returned Policy_documents: 0
----
 search policy_documents where category_icrp_ct is not empty return  category_icrp_ct limit 100 
Returned Category_icrp_ct: 36
    count    id                                               name
0   10672  3816                           Not Site-Specific Cancer
1    2742  3809                                        Lung Cancer
2    1550  3793                            Colon and Rectal Cancer
3    1083  3796                    Esophageal / Oesophageal Cancer
4     963  3822                                  Pharyngeal Cancer
5     946  3806                          

## [DSL-296] Add public NIH metadata to API

https://uberresearch.atlassian.net/browse/DSL-296

In [45]:
x = "foa_number"
q = [
    # counting how many have no value
    f"search grants where {x} is empty return grants[id+{x}] limit 5" ,
    # counting how many have some value - including the string "Not available"
    f"search grants where {x} is not empty return grants[id+{x}+funding_org_name+active_year] limit 5" ,
    # counting years
    f"search grants where {x} is not empty return active_year limit 50" ,
    # counting how many have the wrong "Not available" value
    f"""search grants where {x} = "Not available" return grants[id+{x}+funding_org_name+active_year] limit 5""" ,
    # counting how many have proper good values
    f"""search grants where {x} is not empty and {x} != "Not available" return grants[id+{x}+funding_org_name+active_year] limit 20""",
    # search for valid ID
    f"""search grants where {x} = "PA-18-670" return grants[id+{x}+funding_org_name+active_year] limit 20""" ,
]

for el in q: 
    print("New query...\n> " + el + "\n")
    data = dsl.query(el)
    print(data.as_dataframe())
    print("\n=======\n")
    

New query...
> search grants where foa_number is empty return grants[id+foa_number] limit 5

Returned Grants: 5 (total = 4017008)
              id
0  grant.8587603
1  grant.8586109
2  grant.8558602
3  grant.8671733
4  grant.8672077


New query...
> search grants where foa_number is not empty return grants[id+foa_number+funding_org_name+active_year] limit 5

Returned Grants: 5 (total = 992720)
                            active_year     foa_number  \
0                          [2020, 2021]  Not available   
1  [2020, 2021, 2022, 2023, 2024, 2025]  Not available   
2  [2020, 2021, 2022, 2023, 2024, 2025]  Not available   
3  [2020, 2021, 2022, 2023, 2024, 2025]  Not available   
4                    [2020, 2021, 2022]  Not available   

                                    funding_org_name             id  
0  Directorate for Social, Behavioral & Economic ...  grant.8674931  
1                        Directorate for Engineering  grant.8674911  
2   Directorate for Mathematical & Physical S
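The queries above show that many non-empty `foa_number` values are the placeholder string "Not available". Below is a minimal sketch that sorts retrieved grant records into real FOA numbers, placeholders, malformed values and missing values. The regular expression is an assumption about NIH FOA numbering: an uppercase prefix such as `PA` or `RFA`, an optional institute code, a two-digit year and a three-digit serial, as in `PA-18-670`. It is not a documented DSL rule. The two `grant.hypothetical-*` records are invented for illustration.

```python
import re

# Assumed NIH-style FOA number: 2-3 letter prefix, optional 2-letter institute code,
# two-digit year, three-digit serial (e.g. PA-18-670, RFA-CA-19-001)
FOA_RE = re.compile(r"^[A-Z]{2,3}(?:-[A-Z]{2})?-\d{2}-\d{3}$")
PLACEHOLDER = "Not available"


def classify_foa(records, field="foa_number"):
    """Split grant records by the state of their FOA number."""
    result = {"valid": [], "placeholder": [], "malformed": [], "missing": []}
    for rec in records:
        value = rec.get(field)
        if not value:
            result["missing"].append(rec["id"])
        elif value == PLACEHOLDER:
            result["placeholder"].append(rec["id"])
        elif FOA_RE.match(value):
            result["valid"].append(rec["id"])
        else:
            result["malformed"].append(rec["id"])
    return result


records = [
    {"id": "grant.8674931", "foa_number": "Not available"},     # from the output above
    {"id": "grant.hypothetical-1", "foa_number": "PA-18-670"},  # hypothetical record
    {"id": "grant.hypothetical-2", "foa_number": "PA18670"},    # hypothetical malformed value
]
result = classify_foa(records)
print(result)
```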

## [DSL-318] Deprecate patents.funder_groups and clinical_trials.funder_groups

https://uberresearch.atlassian.net/browse/DSL-318

In [47]:
%dsldf search patents where funder_groups is not empty return patents limit 5

Returned Patents: 5 (total = 175144)
Field 'funder_groups' is deprecated. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


|   | assignee_names | assignees | filing_status | granted_year | id | inventor_names | publication_date | times_cited | title | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [Wisconsin Alumni Research Foundation, WISCONS... | [{'id': 'grid.453773.1', 'name': 'Wisconsin Al... | Grant | 2003.0 | US-PP14225-P2 | [Brent H. McCown, Eric L. Zeldin, Peter Normin... | 2003-10-14 | 1.0 | Cranberry variety named ‘HyRed’ | 2002 |
| 1 | [Princeton University, UNIV PRINCETON] | [{'id': 'grid.16750.35', 'name': 'Princeton Un... | Application |  | US-20020146797-A1 | [Kevan Shokat] | 2002-10-10 |  | Engineered protein kinases which can utilize m... | 2001 |
| 2 | [Beth Israel Deaconess Medical Center Inc, BET... | [{'id': 'grid.239395.7', 'name': 'Beth Israel ... | Grant | 2003.0 | US-6667388-B2 | [Kiflai Bein, Michael Simons] | 2003-12-23 | 5.0 | Peptide inhibitor of MMP activity and angiogen... | 2001 |
| 3 | [University of California, UNIV CALIFORNIA] | [{'id': 'grid.30389.31', 'name': 'University o... | Grant | 2003.0 | US-6656390-B2 | [Lin Song Li, Quanxi Jia] | 2003-12-02 |  | Preparation of energy storage materials | 2003 |
| 4 | [University of Alabama at Birmingham Research ... | [{'id': 'grid.265892.2', 'name': 'University o... | Grant | 2004.0 | US-6743893-B2 | [Jeffrey A. Engler, Jae Hwy Lee, James F. Coll... | 2004-06-01 | 27.0 | Receptor-mediated uptake of peptides that bind... | 2001 |


In [48]:
%dsldf search clinical_trials where funder_groups is not empty return clinical_trials limit 5

Returned Clinical_trials: 5 (total = 45962)
Field 'funder_groups' is deprecated. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


|   | active_years | id | investigator_details | title |
|---|---|---|---|---|
| 0 | [2005, 2006, 2007, 2008, 2009, 2010] | NCT00249756 | [[Stanley Sacks, PhD, Principal Investigator, ... | Re-Entry MTC for Offenders With MICA Disorders |
| 1 | [2005, 2006, 2007] | NCT00250185 |  | The Role of Nitrite in Preconditioning Mediate... |
| 2 | [2005, 2006] | NCT00250198 |  | A Randomized, Double-blind, Pilot Study of the... |
| 3 | [2004, 2005, 2006, 2007, 2008, 2009, 2010] | NCT00250484 | [[Steven D Freedman, PhD, MD, Principal Invest... | The Effect of 10-Day Treatment of Repetitive T... |
| 4 | [2003, 2004, 2005, 2006, 2007, 2008, 2009, 201... | NCT00250523 | [[Jason L Sperry, MD, Principal Investigator, ... | Mathematical Modeling of the Acute Inflammator... |


## [DSL-307] Improve warnings on sub-query errors

https://uberresearch.atlassian.net/browse/DSL-307

Show the message for entities: `journals`, `orgs`, `cities`

In [53]:
%dsl search publications where research_orgs.types="Company" 

Returned Publications: 20 (total = 736654)
Please review your query, as it contains an entity filter (research_orgs.types) that can lead to incomplete results. More details on https://docs.dimensions.ai/dsl/language.html#literal-fields-vs-entity-fields


<dimcli.Dataset object #4578230672. Records: 20/736654>

In [52]:
%dsl search publications where journal.title="Company"

Returned Publications: 20 (total = 1349)
Please review your query, as it contains an entity filter (journal.title) that can lead to incomplete results. More details on https://docs.dimensions.ai/dsl/language.html#literal-fields-vs-entity-fields


<dimcli.Dataset object #4577708368. Records: 20/1349>

In [55]:
%dsl search publications where research_org_cities.name="Company"

Returned Publications: 0
Please review your query, as it contains an entity filter (research_org_cities.name) that can lead to incomplete results. More details on https://docs.dimensions.ai/dsl/language.html#literal-fields-vs-entity-fields


<dimcli.Dataset object #4578234128. Records: 0/0>

Do not show the message for entities: `states`, `open_access`, `countries`, `categories`, `org groups`

In [57]:
%dsl search publications where research_org_state_codes.name="Company" 

Returned Publications: 0


<dimcli.Dataset object #4578351056. Records: 0/0>

In [59]:
%dsl search publications where open_access_categories.name="All" 

Returned Publications: 20 (total = 28312868)


<dimcli.Dataset object #4577655696. Records: 20/28312868>

In [61]:
%dsl search publications where research_org_countries.name="All" 

Returned Publications: 0


<dimcli.Dataset object #4577345168. Records: 0/0>

In [62]:
%dsl search publications where category_for.name="All" 

Returned Publications: 0


<dimcli.Dataset object #4577674000. Records: 0/0>

## [DSL-337] Remove _email field from describe

https://uberresearch.atlassian.net/browse/DSL-337

In [128]:
%dsldocs researchers

|   | sources | field | type | description | is_filter | is_entity | is_facet |
|---|---|---|---|---|---|---|---|
| 0 | researchers | current_research_org | organizations | The most recent research organization linked t... | True | True | True |
| 1 | researchers | first_grant_year | integer | First year the researcher was awarded a grant. | True | False | True |
| 2 | researchers | first_name | string | First Name. | True | False | False |
| 3 | researchers | first_publication_year | integer |  | True | False | True |
| 4 | researchers | id | string | Dimensions researcher ID. | True | False | False |
| 5 | researchers | last_grant_year | integer | Last year the researcher was awarded a grant. | True | False | True |
| 6 | researchers | last_name | string | Last Name. | True | False | False |
| 7 | researchers | last_publication_year | integer |  | True | False | True |
| 8 | researchers | nih_ppid | string | The PI Profile ID (i.e., ppid) is a Researcher... | True | False | False |
| 9 | researchers | obsolete | integer | Indicates researcher ID status. 0 means that t... | True | False | False |
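To check the fix without reading the whole table, the field list from `%dsldocs researchers` can be scanned for private or email-like names. Below is a minimal sketch that works on a plain list of (field, description) pairs, here a subset copied from the table above. It also flags fields with an empty description, such as `first_publication_year` and `last_publication_year`. The helper name `describe_issues` is hypothetical.

```python
def describe_issues(fields):
    """Flag private (underscore-prefixed) or email fields and fields without a description."""
    issues = []
    for name, description in fields:
        if name.startswith("_") or "email" in name.lower():
            issues.append((name, "private/email field exposed"))
        if not description:
            issues.append((name, "missing description"))
    return issues


# subset of the describe output above
researcher_fields = [
    ("current_research_org", "The most recent research organization linked t..."),
    ("first_grant_year", "First year the researcher was awarded a grant."),
    ("first_publication_year", ""),
    ("last_publication_year", ""),
    ("obsolete", "Indicates researcher ID status. 0 means that t..."),
]
for issue in describe_issues(researcher_fields):
    print(issue)
```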


## [DSL-302] Empty results should be returned as null: Phase 2 - nested entity fields

https://uberresearch.atlassian.net/browse/DSL-302

In [150]:
dsl.query("""set return_all_keys search policy_documents where id = "policy.432137" return policy_documents[publisher_org+id]""").json

Returned Policy_documents: 1 (total = 1)


{'_stats': {'total_count': 1},
 'policy_documents': [{'id': 'policy.432137',
   'publisher_org': {'id': 'grid.454763.7',
    'linkout': ['https://www.government.nl'],
    'country_name': 'Netherlands',
    'state_name': None,
    'city_name': 'The Hague',
    'name': 'Government of Netherlands',
    'acronym': None,
    'types': ['Government'],
    'longitude': 4.316667,
    'latitude': 52.083332}}]}

In [151]:
dsl.query("""search policy_documents where id = "policy.432137" return policy_documents[publisher_org+id]""").json

Returned Policy_documents: 1 (total = 1)


{'_stats': {'total_count': 1},
 'policy_documents': [{'id': 'policy.432137',
   'publisher_org': {'id': 'grid.454763.7',
    'linkout': ['https://www.government.nl'],
    'country_name': 'Netherlands',
    'city_name': 'The Hague',
    'name': 'Government of Netherlands',
    'types': ['Government'],
    'longitude': 4.316667,
    'latitude': 52.083332}}]}
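The two responses above differ only in whether empty keys are present. With `set return_all_keys`, the nested `publisher_org` includes `state_name: None` and `acronym: None`. Without it, those keys are left out. Below is a minimal client-side sketch that restores omitted keys as `None` from an expected key list. The key list is taken from the first response and is an assumption, not the full DSL schema. `fill_missing_keys` is a hypothetical helper.

```python
def fill_missing_keys(record, expected_keys):
    """Return a copy of `record` with every expected key present (missing ones set to None)."""
    filled = dict(record)  # keep any extra keys and leave the input untouched
    for key in expected_keys:
        filled.setdefault(key, None)
    return filled


ORG_KEYS = ["id", "linkout", "country_name", "state_name", "city_name",
            "name", "acronym", "types", "longitude", "latitude"]

compact = {  # publisher_org as returned without return_all_keys
    "id": "grid.454763.7",
    "linkout": ["https://www.government.nl"],
    "country_name": "Netherlands",
    "city_name": "The Hague",
    "name": "Government of Netherlands",
    "types": ["Government"],
    "longitude": 4.316667,
    "latitude": 52.083332,
}
full = fill_missing_keys(compact, ORG_KEYS)
print(full["state_name"], full["acronym"])  # → None None
```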

## [DSL-335] Improve error messages.

https://uberresearch.atlassian.net/browse/DSL-335

**1 - query taking too long to run**

In [148]:
dsl.query("""search publications where research_orgs="grid.9132.9" return publications [all] limit 1000 skip 0""")

Returned Errors: 1
1 EvaluationError found
The response generated by your query is too large, e.g. because it includes records with lots of data. Please review it by keeping in mind the guidelines on https://docs.dimensions.ai/dsl/1.22.0-preview/faq.html#queries-and-errors [code: 2]


<dimcli.Dataset object #4622459792. Errors: 1>

**2 - query returning a large response**

In [149]:
dsl.query("""search publications for "0000-0001-5770-4883" return publications [author_affiliations] limit 500 skip 0""")

Returned Errors: 1
1 EvaluationError found
The response generated by your query is too large, e.g. because it includes records with lots of data. Please review it by keeping in mind the guidelines on https://docs.dimensions.ai/dsl/1.22.0-preview/faq.html#queries-and-errors [code: 2]


<dimcli.Dataset object #4622474512. Errors: 1>
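Both error messages end with a numeric code: `[code: 2]` here, and `[code: 4]` in the DSL-307 warnings. That makes it possible to group failures when QA queries are re-run in bulk. Below is a minimal sketch. This notebook does not document what each code means, so the functions only extract and group the number.

```python
import re
from collections import defaultdict

CODE_RE = re.compile(r"\[code:\s*(\d+)\]\s*$")


def error_code(message):
    """Return the trailing `[code: N]` value of a DSL message as an int, or None."""
    match = CODE_RE.search(message.strip())
    return int(match.group(1)) if match else None


def group_by_code(messages):
    """Group messages by their trailing code (None for messages without one)."""
    groups = defaultdict(list)
    for msg in messages:
        groups[error_code(msg)].append(msg)
    return dict(groups)


too_large = ("The response generated by your query is too large, e.g. because it includes "
             "records with lots of data. Please review it by keeping in mind the guidelines on "
             "https://docs.dimensions.ai/dsl/1.22.0-preview/faq.html#queries-and-errors [code: 2]")
too_complex = ("Query is too long or complex. Please see "
               "https://docs.dimensions.ai/dsl/faq.html for more information. [code: 4]")
print(group_by_code([too_large, too_complex]).keys())
```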

**3 - query which is too long**

A query with lots of keywords, over 10k characters long, fails:
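One workaround for the length limit is to drop duplicate terms (the query below repeats many of them) and split the remaining terms into several shorter queries, then merge the results client-side. Below is a minimal sketch. The `max_chars` budget defaults to 10,000 characters, taken from the figure mentioned above rather than from a documented limit. The `return <source>` clause and the helper name `chunk_or_query` are hypothetical. Terms are kept unquoted, exactly as in the original OR list.

```python
import re


def chunk_or_query(keywords, source="publications", max_chars=10_000):
    """Deduplicate keywords (case-insensitively, keeping first-seen order) and pack
    them into `search <source> for "..."` queries of at most max_chars characters.
    A single keyword longer than the budget still ends up in its own (oversized) query."""
    seen, terms = set(), []
    for kw in keywords:
        term = kw.strip()
        if term and term.lower() not in seen:
            seen.add(term.lower())
            terms.append(term)

    prefix, suffix = f'search {source} for "', f'" return {source}'
    base = len(prefix) + len(suffix)
    queries, current, length = [], [], base
    for term in terms:
        extra = len(term) + (4 if current else 0)  # 4 == len(" OR ")
        if current and length + extra > max_chars:
            queries.append(prefix + " OR ".join(current) + suffix)
            current, length, extra = [], base, len(term)
        current.append(term)
        length += extra
    if current:
        queries.append(prefix + " OR ".join(current) + suffix)
    return queries


# usage: split the keyword string on OR (it spans several lines) and re-pack it
text = "taxonomic revision OR revision OR Acacia\nOR China OR revision OR acacia"
for q in chunk_or_query(re.split(r"\s+OR\s+", text), max_chars=80):
    print(q)
```

Splitting like this changes the relevance ranking compared with a single query. Counts and sorting would need to be recomputed after merging and de-duplicating the results by `id`.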

In [144]:
q = """search publications for "comprehensive taxonomic revision OR taxonomic revision OR revision OR Acacia OR China OR indigenous species OR species OR critical reassessment OR reassessment OR segregates OR profile OR line drawings OR distribution maps OR maps OR photographs OR plants OR genus OR introduction OR countries OR Britton OR Rose OR Wight OR Arn OR identification key OR key OR Sect OR Maslin OR consequences OR study OR number OR new species OR Senegalia OR new combination OR combination OR Seigler OR Ebinger OR new records OR records OR lectotype OR lectotypification OR Franch OR Kurz OR Hayata OR ex Benth OR Benth OR neotype OR comprehensive taxonomic revision OR taxonomic revision OR revision OR Acacia OR China OR indigenous species OR species OR critical reassessment OR reassessment OR segregates OR profile OR line drawings OR distribution maps OR maps OR photographs OR plants OR genus OR introduction OR countries OR Britton OR Rose OR Wight OR Arn OR identification key OR key OR Sect OR Maslin OR consequences OR study OR number OR new species OR Senegalia OR new combination OR combination OR Seigler OR Ebinger OR new records OR records OR lectotype OR lectotypification OR Franch OR Kurz OR Hayata OR ex Benth OR Benth OR neotype OR area OR species-richness OR richness OR Asia OR high degree OR degree OR endemism OR Yunnan OR most species OR diverse provinces OR Province OR species of Acacia OR Vachelli OR patient conversations OR patient values OR serious illness OR illness OR better outcomes OR outcomes OR cancer care OR care OR efficacy OR quality-improvement interventions OR improvement interventions OR intervention OR occurrence OR timing OR quality OR serious illness conversations OR oncology clinicians OR clinicians OR patients OR advanced cancer OR cancer OR setting OR participants OR cluster randomized clinical trial OR randomized clinical trial OR clinical trials OR trials OR outpatient oncology OR oncology OR Dana-Farber Cancer Institute 
OR Cancer Institute OR Institute OR physicians OR practice clinicians OR high risk OR risk OR death OR MAIN OUTCOME OR measures OR primary outcome OR Secondary outcomes OR documentation OR initial conversation OR electronic medical records OR medical records OR records OR study OR age OR years OR women OR practice OR females OR medical record review OR record review OR review OR patient death OR higher proportion OR proportion OR intervention patients OR control OR intervention conversations OR median OR months OR greater focus OR prognosis OR illness understanding OR life OR treatment preferences OR end OR life care planning OR care planning OR arm OR more intervention patients OR conclusion OR relevance OR knowledge OR first such study OR such studies OR improvement OR registration OR Improve Communication OR protonproton collisions OR proton collisions OR collisions OR ATLAS detector OR detector OR Large Hadron Collider OR Hadron Collider OR Collider OR center OR mass energy OR energy OR luminosity OR FB OR leptons OR electrons OR muons OR electroweak production OR bosons OR observed significance OR deviation OR fiducial cross-sections OR cross-section OR sections OR interference effects OR effects OR leptonic decay modes OR decay modes OR mode OR EW OR differential fiducial cross-sections OR sum OR electroweak OR strong production OR kinematic observables OR observables OR boson pair production OR pair production OR pp collisions OR TeV OR intrinsic porous structure OR porous structure OR structure OR unique chemical versatility OR chemical versatility OR versatility OR application opportunities OR MOF membranes OR membrane OR research enthusiasm OR ceramic substrate OR substrate OR flexibility OR synthesis OR ZIF-8 membranes OR polyethersulfone hollow fibers OR hollow fibers OR fibers OR facile seed OR method OR resultant membranes OR electron microscopy OR microscopy OR energy dispersive X OR dispersive X OR rays OR ray diffraction OR diffraction OR Fourier 
OR infrared spectroscopy OR spectroscopy OR prepared membranes OR compact structure OR performance OR H2 OR CO2 OR N2 OR CH4 OR H2 permeance OR permeance OR preferential adsorption OR adsorption OR composite membranes OR high permeance OR potential applications OR applications OR efficient CO2 capture OR CO2 capture OR capture OR industrial mixtures OR mixture OR gas permeation OR permeation OR amount OR ZIF-8 crystals OR crystals OR membrane thickness OR thickness OR reproducibility OR seeding method OR strong adhesion OR adhesion OR ZIF-8 OR PESF OR excellent stability OR stability OR good reproducibility OR simple seed OR onset OR puberty OR increase OR secretion OR hypothalamic gonadotrophin OR gonadotrophin OR hormone OR action OR development OR stimulatory inputs OR release OR gradual decrease OR decrease OR inhibitory inputs OR peptides OR pubertal onset OR dynorphin OR medial basal hypothalamus OR basal hypothalamus OR hypothalamus OR neurokinin B OR puberty approaches OR candidates OR role OR initial study OR study OR acute effects OR effects OR agonists OR senktide OR MBH tissue OR tissue OR vitro OR central injection OR injection OR animals OR days OR assessment OR vitro secretion OR GnRH OR insulin-like growth factor OR like growth factor OR growth factor OR factors OR important role OR additional animals OR administration OR receptor protein OR protein OR receptors OR GnRH secretion OR tissue incubates OR incubates OR chronic studies OR central administration OR identification algorithm OR algorithm OR collisions OR ATLAS experiment OR experiments OR set of techniques OR set OR technique OR jet shape observables OR tagger OR use OR physics analysis OR utility of combinations OR utility OR combination OR decision tree OR trees OR deep neural networks OR neural network OR network OR two-variable combination OR variable combinations OR quark tagging OR tagging OR constituent inputs OR input OR re-optimization OR optimization OR deconstruction techniques 
OR data OR event topology OR topology OR atlases OR cross-section OR sections OR top-quark pair OR quark pairs OR pairs OR photons OR proton-proton collision data OR collision data OR luminosity OR fb-1 OR ATLAS detector OR detector OR LHC OR center OR mass energy OR energy OR TeV. OR measurements OR dilepton final states OR final state OR state OR fiducial volume OR leptons OR jet OR signal OR background OR fiducial cross-sections OR FB OR dilepton channel OR channels OR function OR photon transverse momentum OR transverse momentum OR momentum OR absolute pseudorapidity OR pseudorapidity OR angular distance OR distance OR azimuthal opening angle OR opening angle OR angle OR pseudorapidity difference OR agreement OR theoretical predictions OR prediction OR differential fiducial cross-sections OR leptonic final states OR atlases OR new particles OR particles OR pairs OR top quark OR quarks OR proton-proton collision data OR collision data OR ATLAS detector OR detector OR Large Hadron Collider OR Hadron Collider OR Collider OR center OR mass energy OR energy OR TeV OR luminosity OR fb-1 OR top-quark pair production OR quark pair production OR pair production OR hadronic decay modes OR decay modes OR mode OR high transverse momentum jets OR momentum jets OR jet OR hadrons OR analysis techniques OR technique OR reconstruction OR different kinematic regimes OR kinematic regime OR regime OR search sensitivity OR sensitivity OR hypothetical particles OR wide mass range OR mass range OR range OR invariant mass distribution OR mass distribution OR distribution OR quark candidates OR candidates OR resonant production OR spin OR decay widths OR width OR significant deviations OR deviation OR standard model prediction OR model predictions OR prediction OR limit OR production cross-section times OR cross-section time OR section times OR time OR fraction OR boson OR bosons OR Kaluza-Klein gravitons OR gravitons OR Kaluza-Klein gluons OR gluons OR production cross sections OR 
cross sections OR sections OR technicolor models OR model OR mass OR simplified framework OR heavy particles OR top-quark pair OR quark pairs OR hadronic final states OR final state OR state OR pp collisions OR collisions OR taxonomic revision OR revision OR Acacia OR China OR indigenous species OR species OR critical reassessment OR reassessment OR segregates OR profile OR line drawings OR distribution maps OR maps OR photographs OR plants OR genus OR introduction OR countries OR Britton OR Rose OR Wight OR Arn OR identification key OR key OR Sect OR Maslin OR consequences OR study OR number OR new species OR Senegalia OR new combination OR combination OR Seigler OR Ebinger OR new records OR records OR lectotype OR lectotypification OR Franch OR Kurz OR Hayata OR ex Benth OR Benth OR neotype OR comprehensive taxonomic revision OR taxonomic revision OR revision OR Acacia OR China OR indigenous species OR species OR critical reassessment OR reassessment OR segregates OR profile OR line drawings OR distribution maps OR maps OR photographs OR plants OR genus OR introduction OR countries OR Britton OR Rose OR Wight OR Arn OR identification key OR key OR Sect OR Maslin OR consequences OR study OR number OR new species OR Senegalia OR new combination OR combination OR Seigler OR Ebinger OR new records OR records OR lectotype OR lectotypification OR Franch OR Kurz OR Hayata OR ex Benth OR Benth OR neotype OR area OR species-richness OR richness OR Asia OR high degree OR degree OR endemism OR Yunnan OR most species OR diverse provinces OR Province OR species of Acacia OR Vachelli OR patient conversations OR patient values OR serious illness OR illness OR better outcomes OR outcomes OR cancer care OR care OR efficacy OR quality-improvement interventions OR improvement interventions OR intervention OR occurrence OR timing OR quality OR serious illness conversations OR oncology clinicians OR clinicians OR patients OR advanced cancer OR cancer OR setting OR participants OR 
cluster randomized clinical trial OR randomized clinical trial OR clinical trials OR trials OR outpatient oncology OR oncology OR Dana-Farber Cancer Institute OR Cancer Institute OR Institute OR physicians OR practice clinicians OR high risk OR risk OR death OR MAIN OUTCOME OR measures OR primary outcome OR Secondary outcomes OR documentation OR initial conversation OR electronic medical records OR medical records OR records OR study OR age OR years OR women OR practice OR females OR medical record review OR record review OR review OR patient death OR higher proportion OR proportion OR intervention patients OR control OR intervention conversations OR median OR months OR greater focus OR prognosis OR illness understanding OR life OR treatment preferences OR end OR life care planning OR care planning OR arm OR more intervention patients OR conclusion OR relevance OR knowledge OR first such study OR such studies OR improvement OR registration OR Improve Communication OR protonproton collisions OR proton collisions OR collisions OR ATLAS detector OR detector OR Large Hadron Collider OR Hadron Collider OR Collider OR center OR mass energy OR energy OR luminosity OR FB OR leptons OR electrons OR muons OR electroweak production OR bosons OR observed significance OR deviation OR fiducial cross-sections OR cross-section OR sections OR interference effects OR effects OR leptonic decay modes OR decay modes OR mode OR EW OR differential fiducial cross-sections OR sum OR electroweak OR strong production OR kinematic observables OR observables OR boson pair production OR pair production OR pp collisions OR TeV OR intrinsic porous structure OR porous structure OR structure OR unique chemical versatility OR chemical versatility OR versatility OR application opportunities OR MOF membranes OR membrane OR research enthusiasm OR ceramic substrate OR substrate OR flexibility OR synthesis OR ZIF-8 membranes OR polyethersulfone hollow fibers OR hollow fibers OR fibers OR facile seed OR 
method OR resultant membranes OR electron microscopy OR microscopy OR energy dispersive X OR dispersive X OR rays OR ray diffraction OR diffraction OR Fourier OR infrared spectroscopy OR spectroscopy OR prepared membranes OR compact structure OR performance OR H2 OR CO2 OR N2 OR CH4 OR H2 permeance OR permeance OR preferential adsorption OR adsorption OR composite membranes OR high permeance OR potential applications OR applications OR efficient CO2 capture OR CO2 capture OR capture OR industrial mixtures OR mixture OR gas permeation OR permeation OR amount OR ZIF-8 crystals OR crystals OR membrane thickness OR thickness OR reproducibility OR seeding method OR strong adhesion OR adhesion OR ZIF-8 OR PESF OR excellent stability OR stability OR good reproducibility OR simple seed OR onset OR puberty OR increase OR secretion OR hypothalamic gonadotrophin OR gonadotrophin OR hormone OR action OR development OR stimulatory inputs OR release OR gradual decrease OR decrease OR inhibitory inputs OR peptides OR pubertal onset OR dynorphin OR medial basal hypothalamus OR basal hypothalamus OR hypothalamus OR neurokinin B OR puberty approaches OR candidates OR role OR initial study OR study OR acute effects OR effects OR agonists OR senktide OR MBH tissue OR tissue OR vitro OR central injection OR injection OR animals OR days OR assessment OR vitro secretion OR GnRH OR insulin-like growth factor OR like growth factor OR growth factor OR factors OR important role OR additional animals OR administration OR receptor protein OR protein OR receptors OR GnRH secretion OR tissue incubates OR incubates OR chronic studies OR central administration OR identification algorithm OR algorithm OR collisions OR ATLAS experiment OR experiments OR set of techniques OR set OR technique OR jet shape observables OR tagger OR use OR physics analysis OR utility of combinations OR utility OR combination OR decision tree OR trees OR deep neural networks OR neural network OR network OR two-variable 
combination OR variable combinations OR quark tagging OR tagging OR constituent inputs OR input OR re-optimization OR optimization OR deconstruction techniques OR data OR event topology OR topology OR atlases OR cross-section OR sections OR top-quark pair OR quark pairs OR pairs OR photons OR proton-proton collision data OR collision data OR luminosity OR fb-1 OR ATLAS detector OR detector OR LHC OR center OR mass energy OR energy OR TeV. OR measurements OR dilepton final states OR final state OR state OR fiducial volume OR leptons OR jet OR signal OR background OR fiducial cross-sections OR FB OR dilepton channel OR channels OR function OR photon transverse momentum OR transverse momentum OR momentum OR absolute pseudorapidity OR pseudorapidity OR angular distance OR distance OR azimuthal opening angle OR opening angle OR angle OR pseudorapidity difference OR agreement OR theoretical predictions OR prediction OR differential fiducial cross-sections OR leptonic final states OR atlases OR new particles OR particles OR pairs OR top quark OR quarks OR proton-proton collision data OR collision data OR ATLAS detector OR detector OR Large Hadron Collider OR Hadron Collider OR Collider OR center OR mass energy OR energy OR TeV OR luminosity OR fb-1 OR top-quark pair production OR quark pair production OR pair production OR hadronic decay modes OR decay modes OR mode OR high transverse momentum jets OR momentum jets OR jet OR hadrons OR analysis techniques OR technique OR reconstruction OR different kinematic regimes OR kinematic regime OR regime OR search sensitivity OR sensitivity OR hypothetical particles OR wide mass range OR mass range OR range OR invariant mass distribution OR mass distribution OR distribution OR quark candidates OR candidates OR resonant production OR spin OR decay widths OR width OR significant deviations OR deviation OR standard model prediction OR model predictions OR prediction OR limit OR production cross-section times OR cross-section time OR 
section times OR time OR fraction OR boson OR bosons OR Kaluza-Klein gravitons OR gravitons OR Kaluza-Klein gluons OR gluons OR production cross sections OR cross sections OR sections OR technicolor models OR model OR mass OR simplified framework OR heavy particles OR top-quark pair OR quark pairs OR hadronic final states OR final state OR state OR pp collisions OR collisions" return publications """
print(len(q))

16373


In [145]:
dsl.query(q)

Returned Errors: 1
1 EvaluationError found
Your query is too long e.g. because it includes too many filters. Please review it by keeping in mind the guidelines on https://docs.dimensions.ai/dsl/1.22.0-preview/faq.html#queries-and-errors [code: 3]


<dimcli.Dataset object #4622434704. Errors: 1>
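The error text ends with a numeric code in square brackets (`[code: 3]` for the EvaluationError above). When you compare several failing queries, it helps to pull that code out of the message string. The following is a minimal sketch. It assumes only the message format printed above and does not use any dimcli API; `error_code` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches the trailing "[code: N]" marker seen in the DSL error messages above
# (assumption: the marker always sits at the very end of the message).
CODE_PATTERN = re.compile(r"\[code:\s*(\d+)\]\s*$")


def error_code(message: str) -> Optional[int]:
    """Return the numeric code at the end of a DSL error message, or None if absent."""
    match = CODE_PATTERN.search(message.strip())
    return int(match.group(1)) if match else None


msg = ("Your query is too long e.g. because it includes too many filters. "
       "Please review it by keeping in mind the guidelines on "
       "https://docs.dimensions.ai/dsl/1.22.0-preview/faq.html#queries-and-errors [code: 3]")
print(error_code(msg))  # 3
```

If a message has no trailing code marker, the helper returns `None` instead of raising an exception.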

Next, a query with many `where` conditions (500 OR-ed `year` filters). It seems to fail once the query exceeds roughly 8k characters.

In [146]:
# OR together one `year = ...` condition per year from 1500 to 1999 (500 conditions)
years_clause = " or ".join(f" year = {y}" for y in range(1500, 2000))
q = f"""search publications where {years_clause} return publications"""
print(len(q))

8042


In [147]:
dsl.query(q)

Returned Errors: 1
1 QueryError found
Your query is too long e.g. because it includes too many filters. Please review it by keeping in mind the guidelines on https://docs.dimensions.ai/dsl/1.22.0-preview/faq.html#queries-and-errors [code: 6]


<dimcli.Dataset object #4622475024. Errors: 1>
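The "after 8k chars" estimate rests on a single 8,042-character query. To narrow down the threshold, you could bisect on the number of `year` conditions. This takes about log2(500) ≈ 9 probes instead of one per size. The sketch below is hypothetical and makes these assumptions:

- `build_years_query` and `max_accepted_conditions` are illustrative names.
- Acceptance is assumed to be monotonic in query length.
- `is_accepted` is a placeholder predicate. In the notebook it would wrap `dsl.query(q)` and report whether errors came back. How errors are exposed on the returned object is not shown here.
- The 8000-character check in the usage example is a stand-in, not the real DSL limit.

```python
from typing import Callable


def build_years_query(n_conditions: int, start_year: int = 1500) -> str:
    """Build a query with n OR-ed `year = ...` conditions, mirroring the cell above."""
    clause = " or ".join(f" year = {y}" for y in range(start_year, start_year + n_conditions))
    return f"search publications where {clause} return publications"


def max_accepted_conditions(is_accepted: Callable[[str], bool], low: int, high: int) -> int:
    """Largest n in [low, high] for which is_accepted(build_years_query(n)) is True.

    Assumes acceptance is monotonic (if n conditions pass, any smaller n passes too)
    and that build_years_query(low) is accepted.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if is_accepted(build_years_query(mid)):
            low = mid
        else:
            high = mid - 1
    return low


# Hypothetical stand-in for a live API call: accept queries up to 8000 characters.
fake_is_accepted = lambda q: len(q) <= 8000

print(len(build_years_query(500)))                           # 8042, as in the cell above
print(max_accepted_conditions(fake_is_accepted, 1, 500))     # 497
```

Each probe against the real API is a live request, so keep the `[low, high]` range tight.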

## [DSL-297] Support Researchers search in Patents

https://uberresearch.atlassian.net/browse/DSL-297