# **FDA Drug Shortage Data**

Run on Python 3.12 | No Errors | No Warnings

Data Source: https://www.accessdata.fda.gov/scripts/drugshortages/Drugshortages.cfm

Drug Shortage Home Page: https://www.fda.gov/drugs/drug-safety-and-availability/drug-shortages

Due to ongoing supply issues with GLP-1 drugs, I developed this program to simplify access to the latest availability information reported by Eli Lilly and Novo Nordisk to the FDA. It loads the FDA drug shortage export, keeps only the Tirzepatide, Semaglutide, Dulaglutide, and Liraglutide rows, and reorganizes the fields into a cleaned CSV laid out the way I prefer.

In [34]:
# Import packages

# For data manipulation
import pandas as pd

In [35]:
# Load dataset into dataframe and verify

# Load the dataset directly from the web link into dataframe df0, skipping the first line and selecting only the columns needed
df0 = pd.read_csv("https://www.accessdata.fda.gov/scripts/drugshortages/Drugshortages.cfm", 
                  skiprows=1,
                  usecols=['Generic Name',
                           'Company Name',
                           ' Presentation',
                           ' Type of Update',
                           'Date of Update',
                           ' Availability Information',
                           ' Related Information',
                           ' Reason for Shortage',
                           ' Status'])

# Display the first 5 rows of the dataframe
df0.head()

Unnamed: 0,Generic Name,Company Name,Presentation,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage,Status
0,Acyclovir Ointment,"Mylan Institutional, a Viatris Company","Acyclovir Ointment 5% 15gram Tube, Ointment, 5...",New,09/16/2024,,,,To Be Discontinued
1,Acyclovir Ointment,"Mylan Institutional, a Viatris Company","Acyclovir Ointment 5% 5g 2pk, Ointment, 5% 5g ...",New,09/16/2024,,,,To Be Discontinued
2,Acyclovir Tablet,Apotex Corp.,"Tablet, 400 mg/1 (NDC 60505-5306-1)",New,04/26/2024,,Business related decision to discontinue the p...,,To Be Discontinued
3,Acyclovir Tablet,Apotex Corp.,"Tablet, 400 mg/1 (NDC 60505-5306-8)",New,04/26/2024,,Business related decision to discontinue the p...,,To Be Discontinued
4,Acyclovir Tablet,Apotex Corp.,"Tablet, 800 mg/1 (NDC 60505-5307-1)",New,04/26/2024,,Business related decision to discontinue the p...,,To Be Discontinued


In [36]:
# Verify the data types of the columns
df0.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   Generic Name               1898 non-null   object
 1   Company Name               1898 non-null   object
 2    Presentation              1898 non-null   object
 3    Type of Update            1898 non-null   object
 4   Date of Update             1898 non-null   object
 5    Availability Information  1424 non-null   object
 6    Related Information       841 non-null    object
 7    Reason for Shortage       539 non-null    object
 8    Status                    1898 non-null   object
dtypes: object(9)
memory usage: 133.6+ KB


In [37]:
# Rename the columns that have leading spaces in their names to remove the spaces
df0.rename(columns={' Presentation': 'Presentation',
                    ' Type of Update': 'Type of Update',
                    ' Availability Information': 'Availability Information',
                    ' Related Information': 'Related Information',
                    ' Reason for Shortage': 'Reason for Shortage',
                    ' Status': 'Status'}, inplace=True)

In [38]:
# Verify the updated column names to ensure the leading spaces have been removed
df0.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Generic Name              1898 non-null   object
 1   Company Name              1898 non-null   object
 2   Presentation              1898 non-null   object
 3   Type of Update            1898 non-null   object
 4   Date of Update            1898 non-null   object
 5   Availability Information  1424 non-null   object
 6   Related Information       841 non-null    object
 7   Reason for Shortage       539 non-null    object
 8   Status                    1898 non-null   object
dtypes: object(9)
memory usage: 133.6+ KB
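
The explicit mapping above covers the six affected columns, but it has to be updated by hand if the export's headers change. A minimal sketch of a more general approach, assuming the only cleanup needed is trimming whitespace, strips every column name in one step. The small `sample` frame is made up for illustration and stands in for `df0`.

```python
import pandas as pd

# Hypothetical stand-in for df0, with the same kind of leading spaces in headers
sample = pd.DataFrame({"Generic Name": ["Acyclovir Tablet"],
                       " Presentation": ["Tablet, 400 mg/1"],
                       " Status": ["To Be Discontinued"]})

# Strip leading/trailing whitespace from every column name at once
sample.columns = sample.columns.str.strip()

print(list(sample.columns))
```

The trade-off is that this touches all columns, so it would also hide a header change that the explicit mapping would surface as an obvious mismatch.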


In [39]:
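# Keep only the rows where the generic name starts with 'Tirzepatide', 'Semaglutide', 'Dulaglutide', or 'Liraglutide'
# .copy() makes the filtered frame independent of the original, so later column assignments do not raise SettingWithCopyWarning
df0 = df0[df0['Generic Name'].str.startswith(('Tirzepatide', 'Semaglutide', 'Dulaglutide', 'Liraglutide'))].copy()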
df0.head()


Unnamed: 0,Generic Name,Company Name,Presentation,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage,Status
699,Dulaglutide Injection,Eli Lilly and Co.,"Trulicity, Injection, .75 mg/.5 mL (NDC 0002-1...",Reverified,09/17/2024,Available,,,Current
700,Dulaglutide Injection,Eli Lilly and Co.,"Trulicity, Injection, 1.5 mg/.5 mL (NDC 0002-1...",Reverified,09/17/2024,Available,,,Current
701,Dulaglutide Injection,Eli Lilly and Co.,"Trulicity, Injection, 3 mg/.5 mL (NDC 0002-223...",Reverified,09/17/2024,Available,,,Current
702,Dulaglutide Injection,Eli Lilly and Co.,"Trulicity, Injection, 4.5 mg/.5 mL (NDC 0002-3...",Reverified,09/17/2024,Available,,,Current
1109,Liraglutide Injection,"Novo Nordisk, Inc.","Victoza, Injection, 6 mg/1 mL (NDC 0169-4060-12)",Reverified,09/16/2024,Limited Availability,Estimated shortage duration TBD,Delay in shipping of the drug,Current
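
In this snapshot every row has a Generic Name, so the prefix filter works as written. A defensive variant, sketched under the assumption that a future export could contain missing names, passes `na=False` so the mask stays strictly boolean. The `sample` frame and the names `glp1_prefixes`, `mask`, and `glp1` are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df0; the None row mimics a missing generic name
sample = pd.DataFrame({"Generic Name": ["Acyclovir Tablet",
                                        "Dulaglutide Injection",
                                        None,
                                        "Semaglutide Injection"]})

glp1_prefixes = ("Tirzepatide", "Semaglutide", "Dulaglutide", "Liraglutide")

# na=False treats missing names as non-matches instead of propagating NaN
mask = sample["Generic Name"].str.startswith(glp1_prefixes, na=False)

# .copy() returns an independent frame that is safe to modify afterwards
glp1 = sample[mask].copy()

print(glp1["Generic Name"].tolist())
```

Passing a tuple of prefixes to `str.startswith` relies on a reasonably recent pandas release (1.5 or later, to my knowledge).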


In [40]:
# Verify the changes to ensure that only the desired rows remain
df0.info()

<class 'pandas.core.frame.DataFrame'>
Index: 29 entries, 699 to 1870
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Generic Name              29 non-null     object
 1   Company Name              29 non-null     object
 2   Presentation              29 non-null     object
 3   Type of Update            29 non-null     object
 4   Date of Update            29 non-null     object
 5   Availability Information  17 non-null     object
 6   Related Information       6 non-null      object
 7   Reason for Shortage       4 non-null      object
 8   Status                    29 non-null     object
dtypes: object(9)
memory usage: 2.3+ KB


In [41]:
# Clean and manipulate the data to make it more readable and useful

# Replace the company names with shorter names
df0['Company Name'] = df0['Company Name'].replace({'Eli Lilly and Co.': 'Eli Lilly', 'Novo Nordisk, Inc.': 'Novo Nordisk'})

# Remove the word 'Injection' from the Generic Name column
df0['Generic Name'] = df0['Generic Name'].str.replace(' Injection', '')

# Split the Presentation column into three new columns: Brand Name, Administration, and Dosage
def split_presentation(pres):
    pres = pres.strip()
    parts = pres.split(',', maxsplit=2)
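    if pres.startswith('Injection'):
        # Generic version: the string has no brand name, so the generic name is used instead
        # (this assumes the only such product in the filtered data is generic Liraglutide)
        brand_name = 'Liraglutide'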
        administration = parts[0].strip()
        dosage = parts[1].strip() if len(parts) > 1 else ''
    else:
        # Brand-name version
        brand_name = parts[0].strip()
        administration = parts[1].strip() if len(parts) > 1 else ''
        dosage = parts[2].strip() if len(parts) > 2 else ''
    return pd.Series([brand_name, administration, dosage], index=['Brand Name', 'Administration', 'Dosage'])

# Apply the function to the Presentation column and store the results in three new columns
df0[['Brand Name', 'Administration', 'Dosage']] = df0['Presentation'].apply(split_presentation)

# Trim leading and trailing spaces from the new columns
df0['Brand Name'] = df0['Brand Name'].str.strip()
df0['Administration'] = df0['Administration'].str.strip()
df0['Dosage'] = df0['Dosage'].str.strip()

# Verify the changes
df0.head()

Unnamed: 0,Generic Name,Company Name,Presentation,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage,Status,Brand Name,Administration,Dosage
699,Dulaglutide,Eli Lilly,"Trulicity, Injection, .75 mg/.5 mL (NDC 0002-1...",Reverified,09/17/2024,Available,,,Current,Trulicity,Injection,.75 mg/.5 mL (NDC 0002-1433-80)
700,Dulaglutide,Eli Lilly,"Trulicity, Injection, 1.5 mg/.5 mL (NDC 0002-1...",Reverified,09/17/2024,Available,,,Current,Trulicity,Injection,1.5 mg/.5 mL (NDC 0002-1434-80)
701,Dulaglutide,Eli Lilly,"Trulicity, Injection, 3 mg/.5 mL (NDC 0002-223...",Reverified,09/17/2024,Available,,,Current,Trulicity,Injection,3 mg/.5 mL (NDC 0002-2236-80)
702,Dulaglutide,Eli Lilly,"Trulicity, Injection, 4.5 mg/.5 mL (NDC 0002-3...",Reverified,09/17/2024,Available,,,Current,Trulicity,Injection,4.5 mg/.5 mL (NDC 0002-3182-80)
1109,Liraglutide,Novo Nordisk,"Victoza, Injection, 6 mg/1 mL (NDC 0169-4060-12)",Reverified,09/16/2024,Limited Availability,Estimated shortage duration TBD,Delay in shipping of the drug,Current,Victoza,Injection,6 mg/1 mL (NDC 0169-4060-12)
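
To make the splitting rule concrete, here is a small standalone sketch of the same logic applied to two strings without pandas. The helper name `split_parts` is hypothetical. The first string is the Victoza presentation shown above; the second is an assumed reconstruction of the generic Liraglutide format (no brand name in front), based on the generic rows that appear later in this notebook.

```python
def split_parts(pres):
    """Return (brand, administration, dosage) from a Presentation string."""
    pres = pres.strip()
    # maxsplit=2 keeps any commas inside the dosage text in one piece
    parts = [p.strip() for p in pres.split(",", maxsplit=2)]
    if pres.startswith("Injection"):
        # Generic listing: no brand name precedes the administration route
        return "Liraglutide", parts[0], parts[1] if len(parts) > 1 else ""
    # Brand-name listing: pad to three fields, then unpack in order
    parts += [""] * (3 - len(parts))
    return parts[0], parts[1], parts[2]

branded = split_parts("Victoza, Injection, 6 mg/1 mL (NDC 0169-4060-12)")
generic = split_parts("Injection, 6 mg/1 mL (NDC 0480-3667-20)")  # assumed format

print(branded)
print(generic)
```

Both calls yield the administration route as `Injection`; only the brand field differs, which is the case the `startswith('Injection')` branch exists to handle.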


In [42]:
# Delete the Presentation column as it's no longer needed
df0.drop('Presentation', axis=1, inplace=True)

# Verify the change
df0.head()

Unnamed: 0,Generic Name,Company Name,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage,Status,Brand Name,Administration,Dosage
699,Dulaglutide,Eli Lilly,Reverified,09/17/2024,Available,,,Current,Trulicity,Injection,.75 mg/.5 mL (NDC 0002-1433-80)
700,Dulaglutide,Eli Lilly,Reverified,09/17/2024,Available,,,Current,Trulicity,Injection,1.5 mg/.5 mL (NDC 0002-1434-80)
701,Dulaglutide,Eli Lilly,Reverified,09/17/2024,Available,,,Current,Trulicity,Injection,3 mg/.5 mL (NDC 0002-2236-80)
702,Dulaglutide,Eli Lilly,Reverified,09/17/2024,Available,,,Current,Trulicity,Injection,4.5 mg/.5 mL (NDC 0002-3182-80)
1109,Liraglutide,Novo Nordisk,Reverified,09/16/2024,Limited Availability,Estimated shortage duration TBD,Delay in shipping of the drug,Current,Victoza,Injection,6 mg/1 mL (NDC 0169-4060-12)


In [43]:
# Further clean the data for better usability

# Drop ending period from Availability Information column, if any values end with a period
df0['Availability Information'] = df0['Availability Information'].str.rstrip('.')

# Replace 'Currently available' with 'Available'
df0['Availability Information'] = df0['Availability Information'].replace('Currently available', 'Available')

# Drop ending period from Related Information column, if any values end with a period
df0['Related Information'] = df0['Related Information'].str.rstrip('.')

# If Status is Resolved, replace Availability Information with Resolved
df0.loc[df0['Status'] == 'Resolved', 'Availability Information'] = 'Resolved'

# If Availability Information is Available and Related Information is blank, set Related Information to Rationing may apply
df0.loc[(df0['Availability Information'] == 'Available') & (df0['Related Information'].isnull()), 'Related Information'] = 'Rationing may apply'

# If Availability Information is Resolved and Related Information is blank, set Related Information to Available
df0.loc[(df0['Availability Information'] == 'Resolved') & (df0['Related Information'].isnull()), 'Related Information'] = 'Available'

# Replace the exact entry 'Limited availability.  Estimated shortage duration TBD' with Shortage duration TBD
df0['Availability Information'] = df0['Availability Information'].replace('Limited availability.  Estimated shortage duration TBD', 'Shortage duration TBD')

# Create a new column called Dose, which is the text before ' mg' in the Dosage column (the numeric strength)
df0['Dose'] = df0['Dosage'].str.split(' mg').str[0]

# Convert the Dose column to numeric and replace any errors with NaN to make it easier to work with 
df0['Dose'] = pd.to_numeric(df0['Dose'], errors='coerce')

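# Sort on Brand Name, Date of Update, and Dose
# Date of Update holds MM/DD/YYYY strings, so it is parsed inside the sort key to order it chronologically
df0.sort_values(by=['Brand Name', 'Date of Update', 'Dose'],
                key=lambda col: pd.to_datetime(col, format='%m/%d/%Y') if col.name == 'Date of Update' else col,
                inplace=True)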

# Reindex the dataframe to reflect the new order
df0.reset_index(drop=True, inplace=True)

# Reorder the columns for better readability (Status is dropped here because it is not in the list)
df0 = df0[['Brand Name', 'Generic Name', 'Company Name', 'Administration', 'Dosage', 'Dose',
           'Type of Update', 'Date of Update', 'Availability Information',
           'Related Information', 'Reason for Shortage']]

# Verify the changes
df0.head()

Unnamed: 0,Brand Name,Generic Name,Company Name,Administration,Dosage,Dose,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage
0,Liraglutide,Liraglutide,Novo Nordisk,Injection,6 mg/1 mL (NDC 0480-3667-20),6.0,Reverified,09/16/2024,Available,Distributed by Teva,
1,Liraglutide,Liraglutide,Novo Nordisk,Injection,6 mg/1 mL (NDC 0480-3667-22),6.0,Reverified,09/16/2024,Available,Distributed by Teva,
2,Mounjaro,Tirzepatide,Eli Lilly,Injection,2.5 mg/.5 mL (NDC 0002-1506-80),2.5,Revised,10/02/2024,Resolved,Available,
3,Mounjaro,Tirzepatide,Eli Lilly,Injection,5 mg/.5 mL (NDC 0002-1495-80),5.0,Revised,10/02/2024,Resolved,Available,
4,Mounjaro,Tirzepatide,Eli Lilly,Injection,7.5 mg/.5 mL (NDC 0002-1484-80),7.5,Revised,10/02/2024,Resolved,Available,
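
The Dose column above is derived by splitting the Dosage text on ' mg'. An alternative sketch, not the approach used in this notebook, pulls the leading number out with a regular expression through `Series.str.extract`. The first two values come from the output shown earlier; the third is a made-up malformed entry included to show that a non-match becomes NaN.

```python
import pandas as pd

dosage = pd.Series([".75 mg/.5 mL (NDC 0002-1433-80)",
                    "6 mg/1 mL (NDC 0169-4060-12)",
                    "strength not listed"])  # hypothetical malformed value

# Capture the number that directly precedes 'mg' at the start of the string;
# rows without a match come back as NaN
leading_number = dosage.str.extract(r"^\s*(\d*\.?\d+)\s*mg", expand=False)

# Convert the captured text to floats
dose = pd.to_numeric(leading_number, errors="coerce")

print(dose.tolist())
```

Anchoring the pattern at the start of the string means the volume after the slash (for example `.5 mL`) is never picked up by mistake.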


In [44]:
# Write the cleaned data to a CSV file without the index
df0.to_csv('Drugshortages_cleaned.csv', index=False)