In [6]:
import json
import requests

In [47]:
import os
import time
import numpy as np
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import string
import random
from IPython.display import display, HTML
import datetime

display(HTML("<style>.container { width:100% !important; }</style>"))

# (2020-05-05)

# An Exploratory Analysis of FDA Adverse Events Data

## Logistics

###### Prompt:

The U.S. Food and Drug Administration (FDA) regulates over-the-counter and prescription drugs in the United States, including biological therapeutics and generic drugs. This work covers more than just medicines. For example, fluoride toothpaste, antiperspirants, dandruff shampoos and sunscreens are all considered drugs. 
 
An adverse event is submitted to the FDA to report any undesirable experience associated with the use of a medical product in a patient. For drugs, this includes serious drug side effects, product use errors, product quality problems, and therapeutic failures for prescription or over-the-counter medicines and medicines administered to hospital patients or at outpatient infusion centers. 
 
The FDA's database of adverse event reports is made available through a web API at https://open.fda.gov/apis/try-the-api/. Each report contains general information about the report, patient information, a list of the drugs that the patient is taking, and a list of the patient reactions. It is possible to use these data in many ways: your brief is to explore these data and to see what might be learned from them. As a guide, you might consider a practical solution to one of the following questions:

- Are different adverse events reported in different countries? 
- What are the different adverse events associated with different disease areas? 
- What drugs tend to be taken together? 
 
You should publish your code to your personal github repository and send a link two days before interview.  At interview you should expect to discuss your code, any statistics or visualizations you may have used, limitations of the underlying data, and how your solution could be generalized, extended, and made into a robust product.

###### Things To Do:

- [ ] explore the data in FAERS
- [ ] code for connecting to the API, querying and fetching results
- [x] GitHub repo: "fda_adverse_events" or maybe "azn_cs_faers" -> is it too cryptic?
- [ ] code for reading JSONs
- [ ] slide deck
    - discuss code
    - explain stats & visualizations
    - discuss limitations of the data
    - explain findings
    - potential generalization, extension, and further development into a robust product
- [ ] email the GitHub repo (Saturday? Sunday?)
- [ ] rehearse presentation

## Some Key / Relevant Variables in FAERS for this Case Study (JSON Format)

- `meta`
- `results`
    - `safetyreportid` : The 8-digit Safety Report ID number, also known as the case report number or case ID. The first 7 digits (before the hyphen) identify an individual report and the last digit (after the hyphen) is a checksum. This field can be used to identify or find a specific adverse event report.
    - <mark>`receivedate` : Date that the report was first received by FDA. If this report has multiple versions, this will be the date the first version was received by FDA. **FDA USES THIS IN REPORTS**</mark>
    - `transmissiondate` : Date that the record was created. This may be earlier than the date the record was received by the FDA.
    - `receiptdate` : Date that the most recent information in the report was received by FDA.
    - `patient`
        - `patient.patientonsetage`
        - `patient.patientsex`
        - `patient.reaction`
            - `patient.reaction.reactionmeddrapt` : Patient reaction, as a MedDRA term. Note that these terms are encoded in British English. For instance, diarrhea is spelled diarrhoea. MedDRA is a standardized medical terminology.
            - `patient.reaction.reactionoutcome` : Outcome of the reaction in `reactionmeddrapt` at the time of last observation. Value is one of the following:
                1. Recovered/resolved
                2. Recovering/resolving
                3. Not recovered/not resolved
                4. Recovered/resolved with sequelae (consequent health issues)
                5. Fatal
                6. Unknown
        - `patient.drug`
            - `patient.drug.drugindication` :  Indication for the drug’s use.
            - `patient.drug.openfda.pharm_class_epc` : Drug class. Established pharmacologic class associated with an approved indication of an active moiety (generic drug) that the FDA has determined to be scientifically valid and clinically meaningful. Takes the form of the pharmacologic class, followed by [EPC] (such as Thiazide Diuretic [EPC] or Tumor Necrosis Factor Blocker [EPC]).
    - `primarysource.reportercountry` : Country from which the report was submitted.
    - `occurcountry` : The name of the country where the event occurred.
    - `primarysourcecountry` : Country of the reporter of the event.
    - `primarysource.qualification` : Category of the individual who submitted the report (for example physician, pharmacist, or consumer).
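
A minimal sketch of pulling the fields listed above out of a single report, assuming the report has already been decoded into a Python dict. The helper name `summarize_report` and the sample record (with made-up values) are illustrative only. The sample mirrors the API output further down, where `primarysource` can be `null`, so the lookups guard against missing blocks.

```python
from typing import Any, Dict


def summarize_report(report: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the case-study fields from one drug event record."""
    # Any of these nested blocks may be missing or null in a real record.
    patient = report.get("patient") or {}
    primary = report.get("primarysource") or {}
    reactions = patient.get("reaction") or []
    drugs = patient.get("drug") or []
    return {
        "safetyreportid": report.get("safetyreportid"),
        "receivedate": report.get("receivedate"),
        "reportercountry": primary.get("reportercountry"),
        "reactions": [r.get("reactionmeddrapt") for r in reactions],
        # Keep only drugs that actually carry an indication.
        "indications": [
            d["drugindication"] for d in drugs if d.get("drugindication")
        ],
    }


# Hypothetical record with made-up values, shaped like the fields above.
sample_report = {
    "safetyreportid": "0000000-0",
    "receivedate": "20040319",
    "primarysource": None,
    "patient": {
        "reaction": [{"reactionmeddrapt": "NAUSEA"}],
        "drug": [{"drugindication": "HYPERTENSION"}, {}],
    },
}

summary = summarize_report(sample_report)
print(summary)
```

Returning plain lists for reactions and indications keeps the one-to-many structure of a report visible, which matters later when counting co-occurrences.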


* Drug Indication 
* Drug Adverse Events


## Connecting to the openFDA API

###### OpenFDA API key: 
"WCsZXD7fwzXRDV02maz9kKLAaRzs5J8kzHSSNIgw"

"With an API key: 240 requests per minute, per key. 120000 requests per day, per key."
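
The quoted limit of 240 requests per minute works out to at most one request every 0.25 seconds. A rough client-side throttle along those lines might look like the sketch below. The `Throttle` class is a hypothetical helper, not part of `requests` or openFDA, and it only spaces out calls; it does not track the daily quota.

```python
import time
from typing import Optional


class Throttle:
    """Space out calls so they stay under a per-minute request limit."""

    def __init__(self, max_per_minute: int = 240) -> None:
        self.min_interval = 60.0 / max_per_minute
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept


throttle = Throttle(max_per_minute=240)
first = throttle.wait()   # nothing to wait for on the first call
second = throttle.wait()  # pauses for up to 0.25 s
print(throttle.min_interval, first, second)
```

Calling `throttle.wait()` immediately before each `requests.get(...)` would keep a fetch loop within the per-minute limit.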

![title](img/api_call.png)

In [13]:
api_key = "WCsZXD7fwzXRDV02maz9kKLAaRzs5J8kzHSSNIgw"

In [30]:
query_start = "https://api.fda.gov/drug/event.json?api_key="+ api_key

In [14]:
# Make a GET request for a single adverse event report from the openFDA drug event endpoint.
response = requests.get("https://api.fda.gov/drug/event.json?api_key="+ api_key+"&limit=1")
# Print the status code of the response.
print(response.status_code)

200


In [15]:
response.content

b'{\n  "meta": {\n    "disclaimer": "Do not rely on openFDA to make decisions regarding medical care. While we make every effort to ensure that data is accurate, you should assume all results are unvalidated. We may limit or otherwise restrict your access to the API in line with our Terms of Service.",\n    "terms": "https://open.fda.gov/terms/",\n    "license": "https://open.fda.gov/license/",\n    "last_updated": "2020-05-02",\n    "results": {\n      "skip": 0,\n      "limit": 1,\n      "total": 11901829\n    }\n  },\n  "results": [\n    {\n      "receiptdateformat": "102",\n      "receiver": null,\n      "companynumb": "HQWYE821915MAR04",\n      "receivedateformat": "102",\n      "primarysource": null,\n      "seriousnessother": "1",\n      "transmissiondateformat": "102",\n      "fulfillexpeditecriteria": "1",\n      "safetyreportid": "4322505-4",\n      "sender": {\n        "senderorganization": "FDA-Public Use"\n      },\n      "receivedate": "20040319",\n      "patient": {\n   

In [16]:
# Make a GET request for one report received by FDA between 2004-01-01 and 2008-12-31.
response = requests.get("https://api.fda.gov/drug/event.json?search=receivedate:[20040101+TO+20081231]&limit=1")
# Print the status code of the response.
print(response.status_code)

200


In [22]:
response.content

b'{\n  "meta": {\n    "disclaimer": "Do not rely on openFDA to make decisions regarding medical care. While we make every effort to ensure that data is accurate, you should assume all results are unvalidated. We may limit or otherwise restrict your access to the API in line with our Terms of Service.",\n    "terms": "https://open.fda.gov/terms/",\n    "license": "https://open.fda.gov/license/",\n    "last_updated": "2020-05-02",\n    "results": {\n      "skip": 0,\n      "limit": 1,\n      "total": 1313656\n    }\n  },\n  "results": [\n    {\n      "receiptdateformat": "102",\n      "receiver": null,\n      "companynumb": "HQWYE821915MAR04",\n      "receivedateformat": "102",\n      "primarysource": null,\n      "seriousnessother": "1",\n      "transmissiondateformat": "102",\n      "fulfillexpeditecriteria": "1",\n      "safetyreportid": "4322505-4",\n      "sender": {\n        "senderorganization": "FDA-Public Use"\n      },\n      "receivedate": "20040319",\n      "patient": {\n    

In [None]:
# Base endpoint for drug adverse event queries: https://api.fda.gov/drug/event.json
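
The requests above build their URLs by string concatenation. As an alternative, here is a small sketch that assembles the query string with the standard library, so that special characters in a `search` expression are percent-encoded. The function name `build_query` is hypothetical, and the assumption is that the API accepts the encoded form of the same parameters (`search`, `count`, `limit`, `api_key`) used elsewhere in this notebook.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.fda.gov/drug/event.json"


def build_query(
    search: Optional[str] = None,
    count: Optional[str] = None,
    limit: Optional[int] = None,
    api_key: Optional[str] = None,
) -> str:
    """Return a drug event URL containing only the parameters that were given."""
    params = {}
    if api_key:
        params["api_key"] = api_key
    if search:
        params["search"] = search
    if count:
        params["count"] = count
    if limit is not None:
        params["limit"] = limit
    # urlencode turns spaces into "+" and escapes ":", "[" and "]".
    return BASE_URL + "?" + urlencode(params)


url = build_query(search="receivedate:[20040101 TO 20081231]", limit=1)
print(url)
```

The resulting string can be passed straight to `requests.get(url)`.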

## Parsing & Formatting JSON
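
One possible starting point for this section is to flatten the nested `results` list into one row per reported reaction with `pd.json_normalize`. The payload below is made up and much smaller than a real response; it only imitates the structure seen in `response.content` above. The date parsing assumes `receivedate` is always in `YYYYMMDD` form, as in the sample output.

```python
import json

import pandas as pd

# Made-up payload, encoded as bytes like response.content.
raw = json.dumps({
    "meta": {"results": {"skip": 0, "limit": 2, "total": 2}},
    "results": [
        {
            "safetyreportid": "0000001-0",
            "receivedate": "20040319",
            "patient": {"reaction": [
                {"reactionmeddrapt": "NAUSEA"},
                {"reactionmeddrapt": "HEADACHE"},
            ]},
        },
        {
            "safetyreportid": "0000002-0",
            "receivedate": "20050101",
            "patient": {"reaction": [{"reactionmeddrapt": "RASH"}]},
        },
    ],
}).encode("utf-8")

payload = json.loads(raw)

# One row per reaction, carrying the report-level fields along as columns.
reactions = pd.json_normalize(
    payload["results"],
    record_path=["patient", "reaction"],
    meta=["safetyreportid", "receivedate"],
)
reactions["receivedate"] = pd.to_datetime(
    reactions["receivedate"], format="%Y%m%d"
)
print(reactions)
```

The same pattern with `record_path=["patient", "drug"]` would give one row per drug instead.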

## Exploration

## Answering Specific Questions from the Case Study

###### 1. Are different adverse events reported in different countries? 

relevant fields:
- `primarysource.reportercountry` : Country from which the report was submitted.
- `patient.reaction.reactionmeddrapt.exact` : Patient reaction, as a MedDRA term. Note that these terms are encoded in British English. For instance, diarrhea is spelled diarrhoea. MedDRA is a standardized medical terminology.


potential ways of answering the question:
- pick top X countries, and report top X adverse events for each
- construct a matrix of top X countries vs top Y adverse events, visualize how similar countries are in this feature space, run a clustering analysis, and scrutinize outliers
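
The matrix idea can be sketched as follows. The counts are made up. The alias table is an assumption, prompted by the country counts below listing both `US` and `UNITED STATES`. Cosine similarity on row-normalized shares is just one reasonable choice of similarity measure, not the only one.

```python
import numpy as np
import pandas as pd

# Made-up (country, reaction, count) triples standing in for API count results.
records = [
    ("US", "NAUSEA", 50), ("UNITED STATES", "NAUSEA", 30), ("US", "HEADACHE", 20),
    ("GB", "NAUSEA", 10), ("GB", "HEADACHE", 40),
    ("JP", "NAUSEA", 8), ("JP", "HEADACHE", 2),
]

# Assumption: these spellings refer to the same country and should be merged.
COUNTRY_ALIASES = {"UNITED STATES": "US"}

df = pd.DataFrame(records, columns=["country", "reaction", "count"])
df["country"] = df["country"].replace(COUNTRY_ALIASES)

# Countries as rows, reactions as columns, summed counts as values.
matrix = df.pivot_table(
    index="country", columns="reaction", values="count",
    aggfunc="sum", fill_value=0,
)

# Convert to shares so large reporters do not dominate, then scale each
# row to unit length so the dot product is the cosine similarity.
shares = matrix.div(matrix.sum(axis=1), axis=0)
unit = shares.div(np.linalg.norm(shares.values, axis=1), axis=0)
similarity = unit @ unit.T
print(similarity.round(3))
```

With real data, the `similarity` frame could feed a heatmap or a hierarchical clustering step.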

In [46]:
response = requests.get(query_start+"&count=primarysource.reportercountry.exact")

In [45]:
parsed = json.loads(response.content)
print(json.dumps(parsed, indent=2, sort_keys=True))

{
  "meta": {
    "disclaimer": "Do not rely on openFDA to make decisions regarding medical care. While we make every effort to ensure that data is accurate, you should assume all results are unvalidated. We may limit or otherwise restrict your access to the API in line with our Terms of Service.",
    "last_updated": "2020-05-02",
    "license": "https://open.fda.gov/license/",
    "terms": "https://open.fda.gov/terms/"
  },
  "results": [
    {
      "count": 6015418,
      "term": "US"
    },
    {
      "count": 1955420,
      "term": "UNITED STATES"
    },
    {
      "count": 515144,
      "term": "COUNTRY NOT SPECIFIED"
    },
    {
      "count": 331015,
      "term": "GB"
    },
    {
      "count": 255643,
      "term": "JP"
    },
    {
      "count": 247876,
      "term": "CA"
    },
    {
      "count": 241461,
      "term": "FR"
    },
    {
      "count": 185304,
      "term": "DE"
    },
    {
      "count": 136483,
      "term": "IT"
    },
    {
      "count": 115681,

###### ANSWER:

###### 2. What are the different adverse events associated with different disease areas?
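
One possible approach, sketched with made-up data: treat `patient.drug.drugindication` as a proxy for disease area and cross-tabulate it against `patient.reaction.reactionmeddrapt`. Reactions are recorded per report rather than per drug, so this pairing cannot say which drug or indication a reaction belongs to; it only counts how often the two appear in the same report.

```python
from itertools import product

import pandas as pd

# Made-up reports, each reduced to its indications and reactions.
reports = [
    {"indications": ["HYPERTENSION"], "reactions": ["DIZZINESS", "COUGH"]},
    {"indications": ["HYPERTENSION", "DIABETES MELLITUS"], "reactions": ["DIZZINESS"]},
]

# Pair every indication with every reaction in the same report,
# counting each pair at most once per report.
pairs = [
    (indication, reaction)
    for report in reports
    for indication, reaction in product(
        set(report["indications"]), set(report["reactions"])
    )
]

pairs_df = pd.DataFrame(pairs, columns=["indication", "reaction"])
table = pd.crosstab(pairs_df["indication"], pairs_df["reaction"])
print(table)
```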

###### ANSWER:

###### 3. What drugs tend to be taken together?  
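
A simple way to approach this is to count how often each pair of drugs appears in the same report. The sketch below uses made-up drug lists. It assumes each report has already been reduced to a list of drug names, for example from `patient.drug.medicinalproduct`, which is not among the fields listed earlier in this notebook.

```python
from collections import Counter
from itertools import combinations

# Made-up drug lists, one per report.
reports = [
    ["ASPIRIN", "METFORMIN"],
    ["ASPIRIN", "METFORMIN", "LISINOPRIL"],
    ["LISINOPRIL"],
]

pair_counts = Counter()
for drugs in reports:
    # Sort and de-duplicate so (A, B) and (B, A) count as the same pair.
    unique_drugs = sorted(set(drugs))
    pair_counts.update(combinations(unique_drugs, 2))

top_pair = pair_counts.most_common(1)
print(top_pair)
```

Raw pair counts favor commonly reported drugs, so a follow-up step would likely compare them against what would be expected if drugs were reported independently.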

###### ANSWER:

## Exploration