# Coronavirus clinical trials

This notebook shows how to build a **Coronavirus** query for clinical trials, similar to the one behind the publicly accessible page at https://covid-19.dimensions.ai.


The search filter is:

```
'"2019-nCoV" OR "COVID-19" OR "SARS-CoV-2" OR (("coronavirus"  OR "corona virus") AND (Wuhan OR China))'
```

With the double quotes escaped, the full query is:

```
search clinical_trials 
        in full_data for "\"2019-nCoV\" OR \"COVID-19\" OR \"SARS-CoV-2\" OR ((\"coronavirus\"  OR \"corona virus\") AND (Wuhan OR China))" 
        where active_years=2020 
return clinical_trials limit 1000
```
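
The escaping in the full query above can be reproduced by hand. The sketch below is a minimal illustration, not dimcli's implementation: `escape_quotes` and `build_query` are hypothetical helpers that backslash-escape the double quotes in the search filter and embed the result in the query text shown above.

```python
def escape_quotes(text: str) -> str:
    """Backslash-escape backslashes and double quotes so the text can sit inside a quoted string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(search_filter: str, year: int = 2020, limit: int = 1000) -> str:
    """Assemble the clinical trials query around an escaped search filter (hypothetical helper)."""
    return (
        "search clinical_trials\n"
        f'        in full_data for "{escape_quotes(search_filter)}"\n'
        f"        where active_years={year}\n"
        f"return clinical_trials limit {limit}"
    )


search_filter = (
    '"2019-nCoV" OR "COVID-19" OR "SARS-CoV-2" OR '
    '(("coronavirus"  OR "corona virus") AND (Wuhan OR China))'
)
query = build_query(search_filter)
print(query)
```

Keeping the filter in its plain form and escaping it only when the query is assembled means the same filter string can be read and edited without counting backslashes.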



> NOTE: the webapp has a *start year* facet, which is not available in the API, so we filter on **active years** (`active_years`) instead.

Once we have the query results, we also add an extra column with the full Dimensions URL for the object. 
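
As a rough illustration of that extra column, the sketch below adds a URL to each record using plain dictionaries. The helper `record_url`, the sample IDs, and the URL pattern are all assumptions made for illustration; the notebook itself relies on the `dimensions_url` helper from dimcli, which should be treated as the source of truth.

```python
# Assumed URL pattern for clinical trial detail pages (not confirmed by this notebook)
BASE_URL = "https://app.dimensions.ai/details/clinical_trial/"


def record_url(record_id: str) -> str:
    """Build a detail-page URL for one record (hypothetical helper)."""
    return BASE_URL + record_id


# Made-up placeholder IDs standing in for real query results
records = [{"id": "NCT00000001"}, {"id": "NCT00000002"}]

# Add the extra field to every record, mirroring the extra DataFrame column
for record in records:
    record["dimensions_url"] = record_url(record["id"])

print(records[0]["dimensions_url"])
```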


## Prerequisites

In [1]:
# @markdown # Get the API library and login 
# @markdown Click the 'play' button on the left (or shift+enter) after entering your API credentials

username = "" #@param {type: "string"}
password = "" #@param {type: "string"}
endpoint = "https://app.dimensions.ai" #@param {type: "string"}

!pip install dimcli -U --quiet

from datetime import date

import dimcli
from dimcli.shortcuts import *
dimcli.login(username, password, endpoint)
dsl = dimcli.Dsl()


## Query and download the data 


In [25]:
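# Search filter, written with plain (unescaped) double quotes
q = '"2019-nCoV" OR "COVID-19" OR "SARS-CoV-2" OR (("coronavirus"  OR "corona virus") AND (Wuhan OR China))'

# dsl_escape escapes the quotes so the filter can be embedded in the query string
data = dsl.query(f"""search clinical_trials 
        in full_data for "{dsl_escape(q)}" 
        where active_years=2020 return clinical_trials limit 1000""").as_dataframe()

# Add a column with the full Dimensions URL of each record
data['dimensions_url'] = data['id'].apply(lambda x: dimensions_url(x, "clinical_trials"))

# Save the results to a CSV file stamped with today's date (DD-MM-YYYY)
today = date.today().strftime("%d-%m-%Y")
title = "clinical_trials_about_coronavirus-" + today
data.to_csv(title + ".csv")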

Returned Clinical_trials: 61 (total = 61)


### Save the data to a new Google Sheet

This involves an authorization step from Google. The new spreadsheet URL will appear at the bottom. 

In [31]:
from google.colab import auth
auth.authenticate_user()

import gspread
from gspread_dataframe import get_as_dataframe, set_with_dataframe
import pandas as pd
from oauth2client.client import GoogleCredentials

gc = gspread.authorize(GoogleCredentials.get_application_default())

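# Create the spreadsheet and write to its first worksheet. Reusing `sh` avoids
# opening a different spreadsheet that happens to have the same title.
sh = gc.create(title)
worksheet = sh.sheet1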
set_with_dataframe(worksheet, data)
spreadsheet_url = "https://docs.google.com/spreadsheets/d/%s" % sh.id
print(spreadsheet_url)

https://docs.google.com/spreadsheets/d/1T0PpOyFZRNzAfbQSS3KrGZm_f4ghklHHzNrjDuz-w8w
