# Calling the openFDA API for AZ Drug Labels

## Objective
- Determine the average number of ingredients contained in AstraZeneca's medicines per year, using data from the [openFDA API](https://open.fda.gov/apis/drug/label/).
- Use the field `spl_product_data_elements` to identify a medicine's ingredients.

## Setup
1. Go to the [openFDA authentication page](https://open.fda.gov/apis/authentication/) and request your API key. It will be sent immediately to the email address you provide.
2. Go to your email and retrieve your API key for openFDA.
3. Create a `.env` file in the current directory (relative to this .ipynb file).
4. In your `.env` file, write the following line: `API_KEY=<<YOUR_KEY_ISSUED_BY_OPENFDA>>`
5. Create a new Python environment:
    - conda: `conda create -n openFDA python=3.8` -> `conda activate openFDA` (you can replace `openFDA` with any name you prefer)
    - virtualenv: `pip install virtualenv` -> `virtualenv openFDA` -> `openFDA\Scripts\activate` (Windows) | `source openFDA/bin/activate` (macOS/Linux)
6. Run `pip install -r requirements.txt` (in conda you can also run `conda install --file requirements.txt`)
7. [Install MongoDB](https://docs.mongodb.com/manual/installation/) for your operating system.  

This should provide you with a fresh environment for replicating the current script.

_The system and environment details used to create this program, plus the full list of packages used, are found in the last two cells of the notebook._

In [1]:
import os
import requests
import pymongo
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.getenv("API_KEY")

In [2]:
# Create the mongo client:
client = pymongo.MongoClient("mongodb://127.0.0.1:27017/")
# See existing databases in your local MongoDB.
# If you've just installed MongoDB, you'll probably only see "admin", "config", and "local":
print(client.list_database_names())

['admin', 'config', 'local', 'nlu_pipeline_dev', 'openFDA', 'webscrapping_db']


In [3]:
# Define name for database. 
# If the db doesn't exist in MongoDB, a new db will be created in MongoDB after the first insertion of a document into it:
db_name = "openFDA"

In [4]:
# Check if database of interest exists:
db_list = client.list_database_names()
dboi = db_name
if dboi in db_list:
    print(dboi + " exists.")
else:
    print(dboi + " does not exist.")

openFDA exists.


In [5]:
# Get a handle to the database (MongoDB only creates it on the first insert):
db = client[db_name]

In [6]:
# Get a handle to the collection (it is also only created on the first insert):
collection_name = "astra_zeneca"
coll = db[collection_name]

In [7]:
# Check if collection exists:
coll_list = db.list_collection_names()
if collection_name in coll_list:
    print("Collection " + collection_name + " exists.")
else:
    print("Collection " + collection_name + " does not exist yet.")

Collection astra_zeneca does not exist yet.


In [8]:
# Description:
# - Function for querying the openFDA API drug label dataset with one search key/value pair,
#   paging through the results with the `limit` and `skip` query parameters.
# Params:
# - api_key: your API key
# - search_key: the search query parameter
# - search_value: the search query value for search_key
# - batch_size: number of results requested per API call
# - max_api_calls: maximum number of API calls, in case the search query result set is too large
# Return:
# - List of JSON objects (one per label; empty if nothing was returned)

def get_openFDA_label_data(api_key, search_key, search_value, batch_size, max_api_calls):
    base_url = "https://api.fda.gov/drug/label.json"
    current_call = 0
    output_list = []
    # Use "<" (not "<=") so that at most max_api_calls requests are sent:
    while current_call < max_api_calls:
        current_skip = current_call * batch_size
        current_call += 1
        print("Current API call: " + str(current_call))
        # skip=0 on the first call is equivalent to omitting the parameter:
        url_query = (base_url + "?api_key=" + api_key
                     + "&search=" + search_key + ":'" + search_value + "'"
                     + "&limit=" + str(batch_size) + "&skip=" + str(current_skip))
        response_payload = requests.get(url_query)
        print("Status Code: " + str(response_payload.status_code))
        # Fall back to an empty list if the response body has no "results" key:
        temp_list = response_payload.json().get("results", [])
        output_list.extend(temp_list)
        temp_batch_size = len(temp_list)
        print("Response batch size: " + str(temp_batch_size))
        # A batch shorter than batch_size means the last page has been reached:
        if temp_batch_size < batch_size:
            break
    return output_list
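
The paging rule used by the function above (advance `skip` by `batch_size`, stop on a short batch or when the call budget is spent) can be tried without network access. The following is a minimal sketch, not the notebook's own code: `paginate`, `fake_fetch`, and `fake_records` are hypothetical names, and the fake fetcher simply slices a local list in place of the API.

```python
def paginate(fetch_page, batch_size, max_calls):
    """Collect results page by page until a short page or max_calls is reached.

    fetch_page(skip, limit) is a hypothetical callable returning a list of records.
    """
    results = []
    calls = 0
    while calls < max_calls:
        page = fetch_page(skip=calls * batch_size, limit=batch_size)
        calls += 1
        results.extend(page)
        if len(page) < batch_size:  # short page: nothing left to fetch
            break
    return results, calls


# Stand-in for the API: 41 fake records, so the last page is short.
fake_records = [{"n": i} for i in range(41)]


def fake_fetch(skip, limit):
    return fake_records[skip:skip + limit]


records, calls_made = paginate(fake_fetch, batch_size=10, max_calls=10)
print(len(records), calls_made)  # 41 5
```

When the total is an exact multiple of `batch_size`, this rule makes one extra request, which returns an empty page in this sketch.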

In [9]:
# Call the openFDA drug label API for medicines from AstraZeneca:
az_medications = get_openFDA_label_data(API_KEY, "openfda.manufacturer_name", "AstraZeneca", batch_size=10, max_api_calls=10)

Current API call: 1
Status Code: 200
Response batch size: 10
Current API call: 2
Status Code: 200
Response batch size: 10
Current API call: 3
Status Code: 200
Response batch size: 10
Current API call: 4
Status Code: 200
Response batch size: 10
Current API call: 5
Status Code: 200
Response batch size: 1


In [10]:
# Identify the number of results for AstraZeneca medications in the drug label dataset:
len(az_medications)

41

In [11]:
# Explore the results by looking at the first medication in the response object:
az_medications[0]

{'effective_time': '20200327',
 'drug_interactions': ['7 DRUG INTERACTIONS • Concomitant use of strong CYP3A4 inhibitors: Reduce quetiapine dose to one sixth when coadministered with strong CYP3A4 inhibitors (e.g., ketoconazole, ritonavir) ( 2.5 , 7.1 , 12.3 ) • Concomitant use of strong CYP3A4 inducers: Increase quetiapine dose up to 5 fold when used in combination with a chronic treatment (more than 7-14 days) of potent CYP3A4 inducers (e.g., phenytoin, rifampin, St. John’s wort) ( 2.6 , 7.1 , 12.3 ) • Discontinuation of strong CYP3A4 inducers: Reduce quetiapine dose by 5 fold within 7-14 days of discontinuation of CYP3A4 inducers ( 2.6 , 7.1 , 12.3 ) 7.1 Effect of Other Drugs on Quetiapine The risks of using SEROQUEL in combination with other drugs have not been extensively evaluated in systematic studies. Given the primary CNS effects of SEROQUEL, caution should be used when it is taken in combination with other centrally acting drugs. SEROQUEL potentiated the cognitive and motor e

In [12]:
# Description:
# - Function for inserting documents into MongoDB in bulk.
# Params:
# - collection: the pymongo collection object where the documents will be stored
# - medication_dict_list: the list of results returned from querying the openFDA drug label dataset
# Return:
# - None; prints the ObjectId created for every document inserted into MongoDB

def insert_medication_dicts_into_mongodb(collection, medication_dict_list):
    insert_result = collection.insert_many(medication_dict_list)
    print(insert_result.inserted_ids)

In [13]:
# Store results into MongoDB:
insert_medication_dicts_into_mongodb(coll, az_medications)

[ObjectId('5f5128004d5419b13561aa71'), ObjectId('5f5128004d5419b13561aa72'), ObjectId('5f5128004d5419b13561aa73'), ObjectId('5f5128004d5419b13561aa74'), ObjectId('5f5128004d5419b13561aa75'), ObjectId('5f5128004d5419b13561aa76'), ObjectId('5f5128004d5419b13561aa77'), ObjectId('5f5128004d5419b13561aa78'), ObjectId('5f5128004d5419b13561aa79'), ObjectId('5f5128004d5419b13561aa7a'), ObjectId('5f5128004d5419b13561aa7b'), ObjectId('5f5128004d5419b13561aa7c'), ObjectId('5f5128004d5419b13561aa7d'), ObjectId('5f5128004d5419b13561aa7e'), ObjectId('5f5128004d5419b13561aa7f'), ObjectId('5f5128004d5419b13561aa80'), ObjectId('5f5128004d5419b13561aa81'), ObjectId('5f5128004d5419b13561aa82'), ObjectId('5f5128004d5419b13561aa83'), ObjectId('5f5128004d5419b13561aa84'), ObjectId('5f5128004d5419b13561aa85'), ObjectId('5f5128004d5419b13561aa86'), ObjectId('5f5128004d5419b13561aa87'), ObjectId('5f5128004d5419b13561aa88'), ObjectId('5f5128004d5419b13561aa89'), ObjectId('5f5128004d5419b13561aa8a'), ObjectId('5

## Observations
- The openFDA drug label dataset returned 41 medicine labels from AstraZeneca, identified by manufacturer_name == "AstraZeneca Pharmaceuticals LP" in the openfda object of each response object.
- API queries for partial matches on the search terms "astra" or "zeneca" mostly returned objects with an empty openfda object. Because the manufacturer_name field was missing, these medicines could not be conclusively identified as belonging to AZ. One such object contained the following substring in the field "how_supplied": "All trademarks are the property of the AstraZeneca group©AstraZeneca 2002AstraZeneca LP, Wilmington, DE 19850721668-04 Rev. 05/04" (product == POLOCAINE).
- The fields "active_ingredient" and "inactive_ingredient" in AZ medicine labels are empty (i.e., they contain no data).
- However, a verbose account of each AZ medication's active and inactive ingredients can be found within the block of text inside the "description" field.
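
The field-coverage observations above can be checked programmatically. The following is a minimal sketch on hypothetical records: `is_blank`, `field_coverage`, and the `sample` data are illustrative names and values, not part of the notebook, and "blank" is assumed to mean a missing key, an empty string, or an empty list or dict.

```python
def is_blank(value):
    """True when a label field is absent or holds no data."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, dict):
        return len(value) == 0
    if isinstance(value, (list, tuple)):
        return all(is_blank(item) for item in value)
    return False


def field_coverage(records, fields):
    """Count, per field, how many records carry a non-blank value."""
    return {f: sum(not is_blank(r.get(f)) for r in records) for f in fields}


# Hypothetical records, loosely shaped like the label documents in this notebook.
sample = [
    {"description": ["11 DESCRIPTION ..."],
     "openfda": {"manufacturer_name": ["AstraZeneca Pharmaceuticals LP"]}},
    {"description": ["11 DESCRIPTION ..."], "active_ingredient": []},
    {"how_supplied": ["..."], "openfda": {}},
]

fields = ["active_ingredient", "inactive_ingredient", "description", "openfda"]
print(field_coverage(sample, fields))
# {'active_ingredient': 0, 'inactive_ingredient': 0, 'description': 2, 'openfda': 1}
```

On the real data, the same call would take `az_medications` in place of `sample`.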

## Next Steps
- Use the verbose account of active and inactive ingredients in the "description" field of AZ medication labels to train a model of multi-word expressions (collocations) tailored to the ingredients of AZ's medicines.
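
Once ingredients have been extracted, the per-year average named in the Objective could be computed along these lines. This is only a sketch under stated assumptions: `ingredient_count` is a hypothetical, pre-computed field that the notebook does not yet produce, `average_per_year` is an illustrative name, and the year is assumed to be the first four characters of `effective_time` (formatted as YYYYMMDD in the response shown earlier).

```python
from collections import defaultdict


def average_per_year(records, count_key="ingredient_count"):
    """Average records[count_key] per year, taking the year from effective_time (YYYYMMDD)."""
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of counts, number of labels]
    for record in records:
        year = record["effective_time"][:4]
        totals[year][0] += record[count_key]
        totals[year][1] += 1
    return {year: total / n for year, (total, n) in sorted(totals.items())}


# Hypothetical labels; ingredient_count is an assumed field, not openFDA data.
labels = [
    {"effective_time": "20200327", "ingredient_count": 8},
    {"effective_time": "20200110", "ingredient_count": 12},
    {"effective_time": "20190605", "ingredient_count": 5},
]
print(average_per_year(labels))  # {'2019': 5.0, '2020': 10.0}
```

Grouping by `effective_time` treats the label's effective date as the medicine's year, which is itself an assumption worth revisiting against the Objective.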

In [14]:
!conda info


     active environment : openFDA
    active env location : C:\anaconda3\envs\openFDA
            shell level : 2
       user config file : C:\Users\Francis Beeson\.condarc
 populated config files : C:\Users\Francis Beeson\.condarc
          conda version : 4.8.4
    conda-build version : 3.18.11
         python version : 3.8.5.final.0
       virtual packages : 
       base environment : C:\anaconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
          package cache : C:\anaconda3\pkgs
                          C:\Users\Francis Beeson\.conda\pkgs
                          C:\Users\Francis Beeson\AppData\Local\con

In [15]:
!conda list

# packages in environment at C:\anaconda3\envs\openFDA:
#
# Name                    Version                   Build  Channel
argon2-cffi               20.1.0                   pypi_0    pypi
attrs                     20.1.0                   pypi_0    pypi
backcall                  0.2.0                      py_0  
bleach                    3.1.5                    pypi_0    pypi
ca-certificates           2020.7.22                     0  
certifi                   2020.6.20                py38_0  
cffi                      1.14.2                   pypi_0    pypi
chardet                   3.0.4                    pypi_0    pypi
colorama                  0.4.3                      py_0  
decorator                 4.4.2                      py_0  
defusedxml                0.6.0                    pypi_0    pypi
entrypoints               0.3                      pypi_0    pypi
idna                      2.10                     pypi_0    pypi
ipykernel                 5.3.4            py38