<h2>Analyzing Enrollment Count and Duration according to Trial Phases</h2>

Clinical trials are commonly classified into five phases. Each phase of the drug approval process is treated as a separate clinical trial. Depending on the phase, one expect typically different numbers of patients and a different duration:

| Phase | Description | Expected Enrollment Count | Expected Duration
| --- | --- | --- | --- |
| Phase 1 | Safety and dosage | 20-80 | several months |
| Phase 2 | Efficacy and side effects | 100-300 | several months to 2 years |
| Phase 3 | Efficacy and monitoring of adverse reactions | 1000-3000 | 1 to 4 years |
| Phase 4 | Safety and efficacy | --- | --- |

Source: https://www.fda.gov/patients/drug-development-process/step-3-clinical-research#phases

<h3>Do our trials meet the typical expectations?</h3>

In [32]:
expectations = {
    "phase1": {
        'count_min': 20,
        'count_max': 80,
        'duration_min': 3,
        'duration_max': 12
    },
    "phase2": {
        'count_min': 100,
        'count_max': 300,
        'duration_min': 3,
        'duration_max': 24
    },
    "phase3": {
        'count_min': 1000,
        'count_max': 3000,
        'duration_min': 12,
        'duration_max': 48
    },
    "phase4": {
        'count_min': 1000,
        'count_max': 10000,
        'duration_min': 12,
        'duration_max': 72
    }
}

In [33]:
from pymongo import MongoClient
from datetime import datetime
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook

mongoinstance = "mongodb+srv://sovanta:Si8T8TtsViHYenjx@clinicaltrials-exomh.mongodb.net/test?retryWrites=true&w=majority"
try:
    client = MongoClient(mongoinstance)
    trialsDB = client['clinical-trials']
    clinicaltrials = trialsDB.list_collection_names()
    if "trials" in clinicaltrials:
        print("Collection 'trials' found in 'clinical-trials' DB")
    trialsCollection = trialsDB['trials']
except Exception as err:
    print("Problems initiating MongoDB - {}".format(err))

Collection 'trials' found in 'clinical-trials' DB


In [52]:
import statistics 

def getResults(phaseNumber):
    phase = "Phase " + str(phaseNumber)
    criteria = {
        '$and': [
            { 'EnrollmentCount': { '$ne': 0} },
            { 'EnrollmentDuration': { '$ne': 0} },
            #{ 'Phase': {'$eq': phase}} # containing phase
            { 'Phase': {'$eq': [phase]}} # only this phase
        ]
    }

    results = list(trialsCollection.find(criteria, projection={"_id": 0, 'NCTId':1, 'Phase': 1, 'EnrollmentCount':1, 'EnrollmentDuration': 1, 'LocationFacility': 1, 'LocationCountry': 1, 'CollaboratorName': 1, 'HealthyVolunteers': 1}))

    print(f"Number of trials in {phase}: {len(results)}")
    
    key = "phase" + str(phaseNumber)
    trials_meeting_all = []
    trials_meeting_only_duration = []
    trials_meeting_only_count = []
    trials_meeting_nothing = []
    
    duration_values = []
    count_values = []
    
    trials_over_duration = []
    trials_over_count = []
    
    results_dict = {}
    for trial in results:
        results_dict[trial['NCTId']] = trial
        count = trial['EnrollmentCount']
        count_values.append(count)
        duration = trial['EnrollmentDuration']
        duration_values.append(duration)
        if count > expectations[key]['count_min'] and count > expectations[key]['count_max'] and duration > expectations[key]['duration_min'] and duration < expectations[key]['duration_max']:
            trials_meeting_all.append(trial['NCTId'])
        elif count > expectations[key]['count_min'] and count < expectations[key]['count_max']:
            trials_meeting_only_count.append(trial['NCTId'])
            if duration > expectations[key]['duration_max']:
                trials_over_duration.append(trial['NCTId'])
        elif duration > expectations[key]['duration_min'] and duration < expectations[key]['duration_max']:
            trials_meeting_only_duration.append(trial['NCTId'])
            if count > expectations[key]['count_max']:
                trials_over_count.append(trial['NCTId'])
        else:
            trials_meeting_nothing.append(trial['NCTId'])
            if duration > expectations[key]['duration_max']:
                trials_over_duration.append(trial['NCTId'])
            if count > expectations[key]['count_max']:
                trials_over_count.append(trial['NCTId'])
               
    print("\nMeeting all expectations: ", len(trials_meeting_all))
    print("Meeting duration: ", len(trials_meeting_only_duration))
    print("Meeting count: ", len(trials_meeting_only_count))
    print("Meeting nothing: ", len(trials_meeting_nothing))
    
    print("\n-- Duration --")
    print(f"Expectation: {expectations[key]['duration_min']} to {expectations[key]['duration_max']} months")
    print("Mean: ", statistics.mean(duration_values))
    print("Min: ", min(duration_values))
    print("Max: ", max(duration_values))
    print("Median: ", statistics.median(duration_values))
    print("Standard deviation: ", statistics.stdev(duration_values))
    
    print("\n-- Enrollment Count --")
    print(f"Expectation: {expectations[key]['count_min']} to {expectations[key]['count_max']} patients")
    print("Mean: ", statistics.mean(count_values))
    print("Min: ", min(count_values))
    print("Max: ", max(count_values))
    print("Median: ", statistics.median(count_values))
    print("Standard deviation: ", statistics.stdev(count_values))
    
    # What could be the reason for a deviation?
    # 1: Trials meeting expectations
    print("\n-- Trials meeting expectations --")
    nr_research_center_involved = []
    nr_countries_involved = []
    nr_org_involved = []
    for id in trials_meeting_all:
        nr_research_center_involved.append(len(results_dict[id]['LocationFacility']))
        nr_countries_involved.append(len(results_dict[id]['LocationCountry']))
        nr_org_involved.append(len(results_dict[id]['CollaboratorName']))
    print("Mean Number of Research Centers involved: ", statistics.mean(nr_research_center_involved))
    print("Mean Number of Countries involved: ", statistics.mean(nr_countries_involved))
    print("Mean Number of Organizations involved: ", statistics.mean(nr_org_involved))  
          
    # 2: Trials over expected duration
    print("\n-- Trials over duration --")
    nr_research_center_involved = []
    nr_countries_involved = []
    nr_org_involved = []
    for id in trials_over_duration:
        nr_research_center_involved.append(len(results_dict[id]['LocationFacility']))
        nr_countries_involved.append(len(results_dict[id]['LocationCountry']))
        nr_org_involved.append(len(results_dict[id]['CollaboratorName']))
    print("Mean Number of Research Centers involved: ", statistics.mean(nr_research_center_involved))
    print("Mean Number of Countries involved: ", statistics.mean(nr_countries_involved))
    print("Mean Number of Organizations involved: ", statistics.mean(nr_org_involved))
    
    # 3: Trials over expected count
    print("\n-- Trials over count --")
    nr_research_center_involved = []
    nr_countries_involved = []
    nr_org_involved = []
    for id in trials_over_count:
        nr_research_center_involved.append(len(results_dict[id]['LocationFacility']))
        nr_countries_involved.append(len(results_dict[id]['LocationCountry']))
        nr_org_involved.append(len(results_dict[id]['CollaboratorName']))
    print("Mean Number of Research Centers involved: ", statistics.mean(nr_research_center_involved))
    print("Mean Number of Countries involved: ", statistics.mean(nr_countries_involved))
    print("Mean Number of Organizations involved: ", statistics.mean(nr_org_involved))

In [55]:
for i in range(1,4):
    print(f"### Result for Phase {i} ###")
    getResults(i)
    print("\n")

### Result for Phase 1 ###
Number of trials in Phase 1: 1241

Meeting all expectations:  29
Meeting duration:  68
Meeting count:  713
Meeting nothing:  431

-- Duration --
Expectation: 3 to 12 months
Mean:  29.92747784045125
Min:  1
Max:  190
Median:  22
Standard deviation:  29.00316268913256

-- Enrollment Count --
Expectation: 20 to 80 patients
Mean:  42.88154713940371
Min:  1
Max:  538
Median:  30
Standard deviation:  43.07779522532922

-- Trials meeting expectations --
Mean Number of Research Centers involved:  1.4137931034482758
Mean Number of Countries involved:  1.4482758620689655
Mean Number of Organizations involved:  0.3448275862068966

-- Trials over duration --
Mean Number of Research Centers involved:  4.380636604774536
Mean Number of Countries involved:  4.9442970822281165
Mean Number of Organizations involved:  0.4482758620689655

-- Trials over count --
Mean Number of Research Centers involved:  6.406779661016949
Mean Number of Countries involved:  7.9491525423728815
Me