# QA of DSL Release v1.23
The purpose of this notebook is to review and test the new features in DSL 1.23.

Docs: https://docs.dimensions.ai/dsl/1.23.0-preview/


## Prerequisites

Please install the latest versions of these libraries to run this notebook. 

In [1]:
# load common libraries
import pandas as pd
from pandas import json_normalize  # top-level location since pandas 1.0

import time
import json
from tqdm.notebook import tqdm as progress

import plotly.express as px
from plotly.offline import plot 

import dimcli
from dimcli.shortcuts import *

dimcli.login(instance="test")
dsl = dimcli.Dsl() 


DimCli v0.6.3 - Succesfully connected to <https://integration.ds-metrics.com> (method: dsl.ini file)


---

## [DSL-345] Add secret .restricted_to field as PF-01

https://uberresearch.atlassian.net/browse/DSL-345

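* Works for publications, grants, patents, clinical_trials and policy_documents; researchers and organizations return a semantic error because they have no `pf01` field.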

In [25]:
q = """ search {} where pf01 is not empty return {}[id+pf01] limit 1"""

for source in dimcli.G.sources():
    data = dsl.query(q.format(source, source))
    if not data.errors:
        print(getattr(data, source)[0])
                    

Returned Publications: 1 (total = 12682329)
{'id': 'pub.1123466816', 'pf01': 'SN'}
Returned Grants: 1 (total = 586742)
{'id': 'grant.8104257', 'pf01': ['SN']}
Returned Patents: 1 (total = 359097)
{'id': 'EP-2131403-A1', 'pf01': ['SN']}
Returned Clinical_trials: 1 (total = 37145)
{'id': 'NCT00249951', 'pf01': ['SN']}
Returned Policy_documents: 1 (total = 20437)
{'id': 'policy.643831', 'pf01': 'SN'}
Returned Errors: 1
Semantic Error
Semantic errors found:
	Field 'pf01' is not present in source 'researchers'. Available fields are: current_research_org,first_grant_year,first_name,first_publication_year,id,last_grant_year,last_name,last_publication_year,nih_ppid,obsolete,orcid_id,redirect,research_orgs,total_grants,total_publications
Returned Errors: 1
Semantic Error
Semantic errors found:
	Field 'pf01' is not present in source 'organizations'. Available fields are: acronym,city_name,cnrs_ids,country_name,established,external_ids_fundref,hesa_ids,id,ificlaims_ids,isni_ids,latitude,linkout,l
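
In the output above, `pf01` comes back as a plain string for publications and policy documents but as a list for grants, patents and clinical trials. A minimal sketch of a normaliser for downstream checks, assuming those are the only two shapes the API uses (the helper name `pf01_as_list` is hypothetical):

```python
def pf01_as_list(record):
    """Return the record's pf01 value as a list, whatever shape the API used."""
    value = record.get("pf01")
    if value is None:
        return []            # field absent or empty
    if isinstance(value, str):
        return [value]       # shape seen for publications / policy_documents
    return list(value)       # shape seen for grants / patents / clinical_trials


# Records copied from the output above
samples = [
    {"id": "pub.1123466816", "pf01": "SN"},
    {"id": "grant.8104257", "pf01": ["SN"]},
]
for rec in samples:
    print(rec["id"], pf01_as_list(rec))
```

With this in place, both sample records yield the same list, so later comparisons do not need to special-case the source.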

#### Testing that the field is not returned by 'describe'

In [26]:
%dsldocs grants

Unnamed: 0,sources,field,type,description,is_filter,is_entity,is_facet
0,grants,abstract,string,Abstract or summary from a grant proposal.,False,False,False
1,grants,active_year,integer,List of active years for a grant.,True,False,True
2,grants,category_bra,categories,`Broad Research Areas <https://app.dimensions....,True,True,True
3,grants,category_for,categories,`ANZSRC Fields of Research classification <htt...,True,True,True
4,grants,category_hra,categories,`Health Research Areas <https://app.dimensions...,True,True,True
5,grants,category_hrcs_hc,categories,`HRCS - Health Categories <https://app.dimensi...,True,True,True
6,grants,category_hrcs_rac,categories,`HRCS – Research Activity Codes <https://app.d...,True,True,True
7,grants,category_icrp_cso,categories,`ICRP Common Scientific Outline <https://app.d...,True,True,True
8,grants,category_icrp_ct,categories,`ICRP Cancer Types <https://app.dimensions.ai/...,True,True,True
9,grants,category_rcdc,categories,"`Research, Condition, and Disease Categorizati...",True,True,True
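
The table above shows only its first ten rows, so on its own it cannot confirm that `pf01` is absent. A small sketch of an explicit check over the full `field` column; how the complete list of field names is pulled out of the `%dsldocs` result is left as an assumption, and `leaked_fields` is a hypothetical helper:

```python
def leaked_fields(field_names, secret_fields=("pf01",)):
    """Return the secret fields that appear in a list of documented field names."""
    documented = set(field_names)
    return sorted(f for f in secret_fields if f in documented)


# First rows of the `field` column shown above
grants_fields = ["abstract", "active_year", "category_bra", "category_for"]
print(leaked_fields(grants_fields))             # no secret field documented
print(leaked_fields(grants_fields + ["pf01"]))  # what a leak would look like
```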


#### Verifying that 'all' does not return the secret field

In [30]:
q = """ search {} where pf01 is not empty return {}[all] limit 1"""

for source in dimcli.G.sources() + ['datasets']:
    data = dsl.query(q.format(source, source))
    if not data.errors:
        record = getattr(data, source)[0]
        print("=== pf01 in fields?:", "pf01" in record)
                    

Returned Publications: 1 (total = 12682329)
Field 'terms' is deprecated in favor of concepts. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'author_affiliations' is deprecated in favor of authors. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'open_access' is deprecated in favor of open_access_categories. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'FOR_first' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'references' is deprecated in favor of reference_ids. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'RCDC' is deprecated in favor of category_rcdc. Please refer to https://docs.dimensions.ai/dsl/releasenot

IndexError: list index out of range
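
The `IndexError` above is raised by `getattr(data, source)[0]`, which suggests that at least one source returned no error but also no records. A guarded variant of the loop could look like the sketch below. It assumes only what the cell above already uses (a `query()` method, an `errors` attribute, and one attribute per source); the function name `field_in_all` is hypothetical:

```python
def field_in_all(dsl, sources, field="pf01"):
    """Map each source to True/False (field present in `all`) or None (no usable record)."""
    template = "search {0} where {1} is not empty return {0}[all] limit 1"
    found = {}
    for source in sources:
        data = dsl.query(template.format(source, field))
        records = [] if data.errors else (getattr(data, source, None) or [])
        # Guard against an empty result set before indexing into it
        found[source] = (field in records[0]) if records else None
    return found
```

It would be called as `field_in_all(dsl, dimcli.G.sources() + ['datasets'])`; a `None` entry marks a source that errored or returned nothing, rather than aborting the whole loop.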

## [DSL-343] Deprecate grants.title_language in favor of grants.language_title

https://uberresearch.atlassian.net/browse/DSL-343

In [22]:
%dsldf search grants where title_language is not empty return grants[id+title_language] limit 10

Returned Grants: 10 (total = 5037267)
Field 'title_language' is deprecated in favor of language_title. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


Unnamed: 0,id,title_language
0,grant.8690978,en
1,grant.8715161,en
2,grant.8688630,en
3,grant.8688908,en
4,grant.8689245,en
5,grant.8689399,en
6,grant.8690522,en
7,grant.8690709,en
8,grant.8690632,en
9,grant.8689727,en


In [23]:
%dsldf search grants where language_title is not empty return grants[id+language_title] limit 10

Returned Grants: 10 (total = 5037267)


Unnamed: 0,id,language_title
0,grant.8690978,en
1,grant.8715161,en
2,grant.8688630,en
3,grant.8688908,en
4,grant.8689245,en
5,grant.8689399,en
6,grant.8690522,en
7,grant.8690709,en
8,grant.8690632,en
9,grant.8689727,en
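
The two result tables above are identical apart from the column name. A minimal sketch of that comparison as code rather than by eye, assuming each result is available as a list of dicts (the helper `same_after_rename` is hypothetical):

```python
def same_after_rename(old_records, new_records, old_field, new_field):
    """True if both record lists match once old_field is renamed to new_field."""
    renamed = [
        {(new_field if key == old_field else key): value for key, value in rec.items()}
        for rec in old_records
    ]
    return renamed == new_records


# First two rows of each table above
old = [{"id": "grant.8690978", "title_language": "en"},
       {"id": "grant.8715161", "title_language": "en"}]
new = [{"id": "grant.8690978", "language_title": "en"},
       {"id": "grant.8715161", "language_title": "en"}]
print(same_after_rename(old, new, "title_language", "language_title"))
```

The comparison is order-sensitive on purpose: the deprecated field and its replacement are expected to return the same records in the same order.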


## [DSL-330] Deprecate grants.resulting_publication_ids

https://uberresearch.atlassian.net/browse/DSL-330

In [21]:
%dsldf search grants where resulting_publication_ids is not empty return grants

Returned Grants: 20 (total = 1381805)
Field 'resulting_publication_ids' is deprecated. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


Unnamed: 0,active_year,end_date,funders,funding_org_name,id,language,original_title,project_num,start_date,start_year,title,title_language
0,"[2020, 2021, 2022]",2022-08-31,"[{'id': 'grid.270680.b', 'name': 'European Com...",European Commission,grant.8389274,en,Mechanisms that maintain centromere DNA repeat...,838793,2020-09-01,2020,Mechanisms that maintain centromere DNA repeat...,en
1,"[2020, 2021, 2022]",2022-12-31,"[{'id': 'grid.452988.a', 'name': 'Office of Ba...",Office of Basic Energy Sciences,grant.4319742,en,ACTIVATION OF HYDROGEN UNDER AMBIENT CONDITION...,DE-FG02-07ER46475,2020-01-01,2020,ACTIVATION OF HYDROGEN UNDER AMBIENT CONDITION...,en
2,"[2020, 2021, 2022, 2023, 2024]",2024-12-31,"[{'id': 'grid.452896.4', 'name': 'European Res...",European Research Council,grant.8104257,en,ENgineering FrustratiOn in aRtificial Colloida...,811234,2020-01-01,2020,ENgineering FrustratiOn in aRtificial Colloida...,en
3,"[2020, 2021, 2022]",2022-12-31,"[{'id': 'grid.453025.5', 'name': 'Office of Nu...",Office of Nuclear Physics,grant.4321720,en,Nuclei in a relativistic framework: at and bey...,DE-SC0013037,2020-01-01,2020,Nuclei in a relativistic framework: at and bey...,en
4,[2020],,"[{'id': 'grid.431143.0', 'name': 'National Hea...",National Health and Medical Research Council,grant.7879083,en,"Iron and Alzheimer's disease: new mechanisms, ...",1159403,2020-01-01,2020,"Iron and Alzheimer's disease: new mechanisms, ...",en
5,[2020],,"[{'id': 'grid.431143.0', 'name': 'National Hea...",National Health and Medical Research Council,grant.7879100,en,Social and behavioural disturbances in dementi...,1158762,2020-01-01,2020,Social and behavioural disturbances in dementi...,en
6,"[2019, 2020, 2021, 2022]",2022-12-14,"[{'id': 'grid.452988.a', 'name': 'Office of Ba...",Office of Basic Energy Sciences,grant.6505296,en,Semiconductor nanoshell quantum dots for energ...,DE-SC0016872,2019-12-15,2019,Semiconductor nanoshell quantum dots for energ...,en
7,"[2019, 2020, 2021, 2022]",2022-12-14,"[{'id': 'grid.452988.a', 'name': 'Office of Ba...",Office of Basic Energy Sciences,grant.4321680,en,"Spectroscopy, Kinetics, and Dynamics of Combus...",DE-SC0002123,2019-12-15,2019,"Spectroscopy, Kinetics, and Dynamics of Combus...",en
8,"[2019, 2020, 2021, 2022]",2022-12-14,"[{'id': 'grid.452963.f', 'name': 'Office of Bi...",Office of Biological and Environmental Research,grant.4319539,en,"Sectoral Interactions, Compounding Influences ...",DE-FG02-94ER61937,2019-12-15,2019,"Sectoral Interactions, Compounding Influences ...",en
9,"[2019, 2020, 2021, 2022]",2022-11-30,"[{'id': 'grid.452988.a', 'name': 'Office of Ba...",Office of Basic Energy Sciences,grant.4321931,en,"SUPERCONDUCTIVITY, MAGNETISM AND CORRELATED EL...",DE-FG02-04ER46105,2019-12-01,2019,"SUPERCONDUCTIVITY, MAGNETISM AND CORRELATED EL...",en
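
The deprecation notices printed in this notebook come in two forms: with a replacement (`is deprecated in favor of ...`) and without one, as for `resulting_publication_ids` above. A small sketch that turns such notices into a field-to-replacement mapping, which makes it easier to spot deprecated fields that name no successor. The regular expression is inferred from the messages shown here, not from a documented format:

```python
import re

NOTICE = re.compile(r"Field '(\w+)' is deprecated(?: in favor of (\w+))?\.")


def parse_deprecations(text):
    """Map each deprecated field to its replacement, or None if none is named."""
    return {old: new or None for old, new in NOTICE.findall(text)}


log = (
    "Field 'title_language' is deprecated in favor of language_title. "
    "Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details\n"
    "Field 'resulting_publication_ids' is deprecated. "
    "Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details\n"
)
print(parse_deprecations(log))
```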


## [DSL-342] Deprecate *.category_ua in favor of *.category_uoa

https://uberresearch.atlassian.net/browse/DSL-342

#### publications

In [35]:
%dsldf search publications where category_ua is not empty return category_ua limit 5

Returned Category_ua: 5
Field 'category_ua' is deprecated in favor of category_uoa. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


Unnamed: 0,count,id,name
0,12718005,30012,B12 Engineering
1,7582614,30001,A01 Clinical Medicine
2,5386256,30003,"A03 Allied Health Professions, Dentistry, Nurs..."
3,3421517,30011,B11 Computer Science and Informatics
4,2653665,30008,B08 Chemistry


In [36]:
%dsldf search publications where category_uoa is not empty return category_uoa limit 5

Returned Category_uoa: 5


Unnamed: 0,count,id,name
0,12718005,30012,B12 Engineering
1,7582614,30001,A01 Clinical Medicine
2,5386256,30003,"A03 Allied Health Professions, Dentistry, Nurs..."
3,3421517,30011,B11 Computer Science and Informatics
4,2653665,30008,B08 Chemistry


#### datasets

In [37]:
%dsldf search datasets where category_uoa is not empty return category_uoa limit 5

Returned Category_uoa: 5


Unnamed: 0,count,id,name
0,224696,30012,B12 Engineering
1,175022,30001,A01 Clinical Medicine
2,144379,30007,B07 Earth Systems and Environmental Sciences
3,139504,30008,B08 Chemistry
4,132183,30005,A05 Biological Sciences


In [38]:
%dsldf search datasets where category_ua is not empty return category_ua limit 5

Returned Errors: 1
Semantic Error
Semantic errors found:
	Field 'category_ua' is not present in source 'datasets'. Available fields are: associated_grant_ids,authors,category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_uoa,date,date_created,date_embargo,date_inserted,date_modified,description,doi,figshare_url,funder_countries,funders,id,journal,keywords,language_desc,language_title,license,pf01,publication_id,reference_ids,repository_id,research_org_cities,research_org_countries,research_org_states,research_orgs,researchers,title,year
	Facet 'category_ua' is not present in source 'datasets'. Available facets are: category_bra,category_for,category_hra,category_hrcs_hc,category_hrcs_rac,category_icrp_cso,category_icrp_ct,category_rcdc,category_uoa,funder_countries,funders,journal,keywords,language_desc,language_title,pf01,repository_id,research_org_cities,research_org_countries,research_org_states,research_or
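
Note that the error message above lists `pf01` among the available fields and facets for `datasets`, although DSL-345 treats it as a secret field; whether that is intended is not stated here. A sketch for extracting the advertised names from such a message so the check can be automated. The message format is inferred from the output in this notebook, and `advertised_names` is a hypothetical helper:

```python
import re


def advertised_names(message):
    """Return the names listed after 'Available fields are:' or 'Available facets are:'."""
    names = set()
    for listing in re.findall(r"Available (?:fields|facets) are: ([\w,]+)", message):
        names.update(name for name in listing.split(",") if name)
    return names


# Shortened copy of the datasets error above (most field names omitted)
msg = ("Field 'category_ua' is not present in source 'datasets'. "
       "Available fields are: associated_grant_ids,authors,category_uoa,pf01,title,year")
print("pf01" in advertised_names(msg))
```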

## [DSL-353] Deprecate clinical_trials.organizations in favor of research_orgs

https://uberresearch.atlassian.net/browse/DSL-353

In [7]:
%dsldf search clinical_trials where organizations = "grid.48336.3a" return clinical_trials limit 1

Returned Clinical_trials: 1 (total = 13656)
Field 'organizations' is deprecated in favor of research_orgs. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details


Unnamed: 0,id,investigator_details,title
0,NCT00251329,"[[Sandra X Franco, MD, Principal Investigator,...",Phase II Neoadjuvant Trial of Docetaxel (Taxot...


In [8]:
%dsldf search clinical_trials where research_orgs = "grid.48336.3a" return clinical_trials limit 1

Returned Clinical_trials: 1 (total = 13656)


Unnamed: 0,id,investigator_details,title
0,NCT00251329,"[[Sandra X Franco, MD, Principal Investigator,...",Phase II Neoadjuvant Trial of Docetaxel (Taxot...


In [10]:
%dsldf search clinical_trials return organizations limit 1

Returned Organizations: 1


Unnamed: 0,acronym,count,country_name,id,name
0,NCI,13656,United States,grid.48336.3a,National Cancer Institute


In [9]:
%dsldf search clinical_trials return research_orgs limit 1

Returned Research_orgs: 1


Unnamed: 0,acronym,count,country_name,id,name
0,NCI,13656,United States,grid.48336.3a,National Cancer Institute


In [11]:
%dsldocs clinical_trials

Unnamed: 0,sources,field,type,description,is_filter,is_entity,is_facet
0,clinical_trials,abstract,string,Abstract or description of the clinical trial.,False,False,False
1,clinical_trials,acronym,string,Acronym of the clinical trial.,True,False,False
2,clinical_trials,active_years,integer,List of active years for a clinical trial.,True,False,True
3,clinical_trials,associated_grant_ids,string,Dimensions IDs of the grants associated to the...,True,False,False
4,clinical_trials,brief_title,string,Brief title of the clinical trial.,True,False,False
5,clinical_trials,category_bra,categories,`Broad Research Areas <https://app.dimensions....,True,True,True
6,clinical_trials,category_for,categories,`ANZSRC Fields of Research classification <htt...,True,True,True
7,clinical_trials,category_hra,categories,`Health Research Areas <https://app.dimensions...,True,True,True
8,clinical_trials,category_hrcs_hc,categories,`HRCS - Health Categories <https://app.dimensi...,True,True,True
9,clinical_trials,category_hrcs_rac,categories,`HRCS – Research Activity Codes <https://app.d...,True,True,True
