# Starting out with Dimensions #

Dimcli is a Python client for the Dimensions API. It handles authentication and takes care of sending queries and parsing the responses.

In [1]:
# necessary imports
import dimcli
import pandas as pd

# visualization libraries
import seaborn as sns
import matplotlib.pyplot as plt

In [24]:
# visualization settings
# set the style and figure size in one call; calling sns.set() after set_style() would reset the style
sns.set(style="white", rc={'figure.figsize': (12, 10)})

In [3]:
# log in with credentials from the config file; see https://digital-science.github.io/dimcli/getting-started.html
dimcli.login()

Searching config file credentials for default 'live' instance..


Dimcli - Dimensions API Client (v1.2)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.9
Method: dsl.ini file


In [4]:
dsl = dimcli.Dsl()
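The login output above reports that the credentials came from a dsl.ini file. As a rough sketch of what such a file might contain, the snippet below parses an example with Python's configparser. The section name and keys are an assumption based on the Dimcli getting-started guide, and every value is a placeholder, so check the guide before copying anything.

```python
import configparser

# Assumed layout of a dsl.ini file; every value here is a placeholder.
EXAMPLE_INI = """
[instance.live]
url=https://app.dimensions.ai
login=user@example.org
password=not-a-real-password
key=
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)

# 'live' is the default instance name mentioned in the login output.
live = config["instance.live"]
print(live["url"])
```

Keeping credentials in a file like this means they never have to appear in the notebook itself.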

# An Overview of Using Dimcli to Access Dimensions Data

One way to query data is to pass a DSL query string to the query method of the Dsl object created above.

In [None]:
dsl.query('search publications for "malaria" return publications limit 1').data  # the data attribute holds the raw JSON response
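The JSON in the data attribute is a plain dictionary, so ordinary Python is enough to pull values out of it. Here is a minimal sketch, assuming the records sit under a key named after the source and summary counts sit under `_stats`. The sample response and the `titles` helper are made up for illustration and stand in for a live response.

```python
# Hypothetical response, trimmed to the keys this sketch uses.
sample_response = {
    "_stats": {"total_count": 2},
    "publications": [
        {"id": "pub.0000000001", "title": "Example malaria study"},
        {"id": "pub.0000000002", "title": "Another malaria study"},
    ],
}


def titles(response, source="publications"):
    """Return the title of every record listed under the given source key."""
    return [record.get("title", "") for record in response.get(source, [])]


print(sample_response["_stats"]["total_count"])
print(titles(sample_response))
```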

Another way to query is to use Dimcli magic commands. They were created specifically for the Jupyter environment and are quicker to write, because the DSL query goes straight into the cell. The %%dsldf magic used here returns a pandas dataframe.

In [None]:
%%dsldf

search publications 
    for "malaria"
return publications [title + research_orgs + research_org_names + times_cited + funders]
limit 5

In [None]:
# the result of the most recent magic command is stored in dsl_last_results
dsl_last_results.head()

# Publications

Let's query publications to do some exploratory data analysis on research done at the University of California, San Diego (UCSD).

In [5]:
# set some variables here to make it easier to query
GRIDID = "grid.266100.3" # Grid ID for University of California, San Diego
FIELDS = "title+type+year+journal+authors+research_orgs+research_org_names+publisher+times_cited+funders"

In [6]:
query = f"""search publications in title_abstract_only
            for "artificial intelligence" 
            where research_orgs.id="{GRIDID}" 
            return publications[{FIELDS}] sort by year limit 645"""
result = dsl.query(query)

Returned Publications: 645 (total = 645)
Time: 2.29s
Field current_organization_id of the authors field is deprecated and will be removed in the next major release.
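The query above is just a string, so it can also be assembled from its parts. The sketch below shows one way to do that. `build_query` is a hypothetical helper, not part of Dimcli, and escaping double quotes with a backslash is an assumption about how the DSL treats quoted phrases.

```python
def build_query(source, phrase, grid_id, fields, limit, search_in=None):
    """Assemble a DSL search string like the one used above (hypothetical helper)."""
    scope = f" in {search_in}" if search_in else ""
    # Escape double quotes so the phrase cannot end the quoted string early.
    safe_phrase = phrase.replace('"', '\\"')
    return (
        f'search {source}{scope} for "{safe_phrase}" '
        f'where research_orgs.id="{grid_id}" '
        f'return {source}[{"+".join(fields)}] sort by year limit {limit}'
    )


query = build_query(
    source="publications",
    phrase="artificial intelligence",
    grid_id="grid.266100.3",
    fields=["title", "year", "times_cited"],
    limit=100,
    search_in="title_abstract_only",
)
print(query)
```

Building the string in one place makes it easier to reuse the same filter with a different search phrase or field list.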


In [7]:
# turn result from Dimcli query into pandas dataframe
pubs = result.as_dataframe()
pubs.head()

| | title | authors | research_org_names | research_orgs | times_cited | type | year | journal.id | journal.title | publisher |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Pediatric endoscopy: how can we improve patien... | [{'affiliations': [{'city': 'Boston', 'city_id... | [University of Toronto, Rady Children's Hospit... | [{'city_name': 'San Diego', 'country_code': 'U... | 0 | article | 2024 | jour.1037145 | Expert Review of Gastroenterology & Hepatology | |
| 1 | P146: BeginNGS, an artificial intelligence-ena... | [{'affiliations': [{'city': None, 'city_id': N... | [Fabric Genomics, Illumina (United States), Ge... | [{'acronym': 'UCSD', 'city_name': 'San Diego',... | 0 | article | 2024 | jour.1452539 | Genetics in Medicine Open | Elsevier |
| 2 | Impact of artificial intelligence arrhythmia m... | [{'affiliations': [{'city': 'San Diego', 'city... | [University of California, San Diego, VA San D... | [{'acronym': 'UCSD', 'city_name': 'San Diego',... | 0 | article | 2024 | jour.1094057 | Journal of Cardiovascular Electrophysiology | Wiley |
| 3 | A scoping review of artificial intelligence in... | [{'affiliations': [{'city': 'Preston', 'city_i... | [Maastricht University, University of Californ... | [{'acronym': 'UM', 'city_name': 'Ann Arbor', '... | 0 | article | 2024 | jour.1090373 | Medical Teacher | Taylor & Francis |
| 4 | Transforming Big Data into AI‐ready data for n... | [{'affiliations': [{'city': 'West Point', 'cit... | [University of Alabama, Stevens Institute of T... | [{'acronym': 'UNC', 'city_name': 'Chapel Hill'... | 0 | article | 2024 | jour.1036439 | Obesity | Wiley |


## Data Cleaning and Reformatting

After querying from Dimensions, let's organize our data so it is easier to work with. 

In [None]:
# let's do some data processing and reformat the data
# the funders information is given as a list of dictionaries; each dictionary holds the info for one funder
# publications without funders have NaN instead of a list, so return an empty list for those
pubs['funder_name'] = pubs['funders'].apply(lambda x: [funder['name'] for funder in x]
                                            if isinstance(x, list) else [])
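To see what this step does without a live query, here is the same extraction on a few hand-written cells. The funder records below are hypothetical, and `funder_names` is an illustrative helper rather than anything from Dimcli or pandas.

```python
import math

# Hypothetical cells: one funder, no funders (NaN), and two funders.
funders_column = [
    [{"id": "grid.0000.1", "name": "Funder A"}],
    math.nan,
    [{"id": "grid.0000.2", "name": "Funder B"}, {"id": "grid.0000.3", "name": "Funder C"}],
]


def funder_names(cell):
    """Return the funder names in one cell, or [] when the cell is not a list."""
    if not isinstance(cell, list):
        return []
    return [funder["name"] for funder in cell if "name" in funder]


names = [funder_names(cell) for cell in funders_column]
print(names)
```

Checking for a list, rather than for a float, also covers cells that hold None or any other unexpected value.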

## Publishers

In [None]:
# count publications per publisher; the counts also set the order of the bars below
# rename_axis/reset_index(name=...) keeps the column names the same across pandas versions
count_publisher = pubs['publisher'].value_counts().rename_axis('publisher').reset_index(name='count')
# count_publisher = count_publisher[count_publisher['count'] > 4] # keep only publishers with more than 4 UCSD publications
count_publisher.head()

Now that we have the UCSD publications about artificial intelligence, let's look at where that work is being published, starting with the publishers.

In [None]:
plt.pie(count_publisher['count'], labels=count_publisher['publisher'])
plt.show()

In [None]:
# the counts are already aggregated, so no estimator is needed
sns.barplot(data=count_publisher, x="count", y="publisher", order=count_publisher['publisher'])
plt.xlabel('Publication Count')
plt.ylabel('Publisher')
plt.show()
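The counting behind these charts can be checked on a small scale with the standard library. The sketch below uses a made-up list of publisher values and mirrors two things the pandas version does: missing values are skipped, and results come out most frequent first. The `min_count` threshold is an illustrative choice.

```python
from collections import Counter

# Hypothetical publisher values; None stands in for a missing publisher.
publishers = ["Wiley", "Elsevier", "Wiley", None, "Taylor & Francis", "Wiley", "Elsevier"]

# Count non-missing values, most frequent first.
counts = Counter(p for p in publishers if p is not None).most_common()

# Keep only publishers that appear at least min_count times.
min_count = 2
frequent = [(name, n) for name, n in counts if n >= min_count]
print(frequent)
```

Dropping the rarest publishers in this way keeps a pie or bar chart readable when the long tail is large.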

Which publishers does UCSD have contracts with? .......

# Datasets

At UCSD, where are people publishing datasets? 

In [None]:
%dsldf search datasets for "machine learning" return datasets

In [None]:
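# regular query; dsl.query returns a DslDataset object
res = dsl.query('search publications for "artificial-intelligence" return researchers')
# use .get() so a record with a missing name part does not raise a KeyError
[f"{x.get('first_name', '')} {x.get('last_name', '')}".strip() for x in res['researchers']]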