# **FDA Drug Shortage Data**

Run on Python 3.12 | No Errors | No Warnings

Data Source: https://www.accessdata.fda.gov/scripts/drugshortages/Drugshortages.cfm

Drug Shortage Home Page: https://www.fda.gov/drugs/drug-safety-and-availability/drug-shortages

Because of ongoing supply issues with GLP-1 drugs, I wrote this program to simplify access to the latest availability information that Eli Lilly and Novo Nordisk report to the FDA. It filters the FDA shortage list down to those drugs, cleans the fields, and writes the result to a CSV file in a layout I find easy to read.

In [1]:
# Import packages

# For data manipulation
import pandas as pd

In [2]:
# Load dataset into dataframe and verify the content

# Load the dataset directly from the web link into dataframe df0, skipping the first line and selecting only the columns needed
df0 = pd.read_csv("https://www.accessdata.fda.gov/scripts/drugshortages/Drugshortages.cfm", 
                  skiprows=1,
                  usecols=['Generic Name', 'Company Name', ' Presentation', ' Type of Update', 'Date of Update', ' Availability Information', ' Related Information', ' Reason for Shortage'])

# Display the first 5 rows of the dataframe
df0.head()

Unnamed: 0,Generic Name,Company Name,Presentation,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage
0,Acyclovir Tablet,Apotex Corp.,"Tablet, 400 mg/1 (NDC 60505-5306-1)",New,04/26/2024,,Business related decision to discontinue the p...,
1,Acyclovir Tablet,Apotex Corp.,"Tablet, 400 mg/1 (NDC 60505-5306-8)",New,04/26/2024,,Business related decision to discontinue the p...,
2,Acyclovir Tablet,Apotex Corp.,"Tablet, 800 mg/1 (NDC 60505-5307-1)",New,04/26/2024,,Business related decision to discontinue the p...,
3,Acyclovir Tablet,Apotex Corp.,"Tablet, 800 mg/1 (NDC 60505-5307-5)",New,04/26/2024,,Business related decision to discontinue the p...,
4,"Albuterol Sulfate Powder, Metered","Teva Pharmaceuticals USA, Inc.","ProAir Digihaler, Powder, Metered, 90 ug (NDC ...",New,04/30/2024,,,


In [3]:
# Verify the data types of the columns
df0.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1911 entries, 0 to 1910
Data columns (total 8 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   Generic Name               1911 non-null   object
 1   Company Name               1911 non-null   object
 2    Presentation              1911 non-null   object
 3    Type of Update            1911 non-null   object
 4   Date of Update             1911 non-null   object
 5    Availability Information  1436 non-null   object
 6    Related Information       894 non-null    object
 7    Reason for Shortage       565 non-null    object
dtypes: object(8)
memory usage: 119.6+ KB


In [4]:
# Rename the columns that have leading spaces in their names to remove the spaces
df0.rename(columns={' Presentation': 'Presentation', ' Type of Update': 'Type of Update', ' Availability Information': 'Availability Information', ' Related Information': 'Related Information',' Reason for Shortage': 'Reason for Shortage'}, inplace=True)
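
The rename above lists each padded header by hand. If the padding in the FDA file ever changes, a more general option is to strip whitespace from every column name at once. A minimal sketch, using a small hypothetical frame (`sample`) in place of the FDA data:

```python
import pandas as pd

# Hypothetical stand-in for the FDA frame: some headers carry a leading space
sample = pd.DataFrame({
    'Generic Name': ['Dulaglutide Injection'],
    ' Presentation': ['Trulicity, Injection, 3 mg/.5 mL'],
    ' Type of Update': ['Revised'],
})

# Strip leading and trailing whitespace from every column name in one step
sample.columns = sample.columns.str.strip()

print(list(sample.columns))
```

This avoids maintaining the mapping by hand, at the cost of also touching headers that were already clean.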

In [5]:
# Verify the updated column names to ensure the leading spaces have been removed
df0.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1911 entries, 0 to 1910
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Generic Name              1911 non-null   object
 1   Company Name              1911 non-null   object
 2   Presentation              1911 non-null   object
 3   Type of Update            1911 non-null   object
 4   Date of Update            1911 non-null   object
 5   Availability Information  1436 non-null   object
 6   Related Information       894 non-null    object
 7   Reason for Shortage       565 non-null    object
dtypes: object(8)
memory usage: 119.6+ KB


In [6]:
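# Keep only the rows where the generic name starts with 'Tirzepatide', 'Semaglutide', 'Dulaglutide', or 'Liraglutide'
# .copy() makes the result an independent dataframe, so the column assignments in later cells do not raise SettingWithCopyWarning
df0 = df0[df0['Generic Name'].str.startswith(('Tirzepatide', 'Semaglutide', 'Dulaglutide', 'Liraglutide'))].copy()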
df0.head()


Unnamed: 0,Generic Name,Company Name,Presentation,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage
705,Dulaglutide Injection,Eli Lilly and Co.,"Trulicity, Injection, .75 mg/.5 mL (NDC 0002-1...",Revised,07/02/2024,Available,,
706,Dulaglutide Injection,Eli Lilly and Co.,"Trulicity, Injection, 1.5 mg/.5 mL (NDC 0002-1...",Revised,07/02/2024,Available,,
707,Dulaglutide Injection,Eli Lilly and Co.,"Trulicity, Injection, 3 mg/.5 mL (NDC 0002-223...",Revised,07/02/2024,Limited Availability,Limited availability through Q3 2024.,Demand increase for the drug
708,Dulaglutide Injection,Eli Lilly and Co.,"Trulicity, Injection, 4.5 mg/.5 mL (NDC 0002-3...",Revised,07/02/2024,Limited Availability,Limited availability through July 2024.,Demand increase for the drug
1110,Liraglutide Injection,"Novo Nordisk, Inc.","Saxenda, Injection, 6 mg/1 mL (NDC 0169-2800-15)",Revised,07/01/2024,Limited Availability,Estimated shortage duration TBD,Demand increase for the drug


In [7]:
# Verify the changes to ensure that only the desired rows remain
df0.info()

<class 'pandas.core.frame.DataFrame'>
Index: 27 entries, 705 to 1887
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Generic Name              27 non-null     object
 1   Company Name              27 non-null     object
 2   Presentation              27 non-null     object
 3   Type of Update            27 non-null     object
 4   Date of Update            27 non-null     object
 5   Availability Information  27 non-null     object
 6   Related Information       12 non-null     object
 7   Reason for Shortage       12 non-null     object
dtypes: object(8)
memory usage: 1.9+ KB
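
The prefix filter in cell 6 relies on every `Generic Name` being present, which holds for the data shown (1911 non-null values). As a hedged sketch of a more defensive variant, the example below passes `na=False` so that a missing name counts as "no match". The frame `sample` and its rows are hypothetical and only illustrate the behavior:

```python
import pandas as pd

# Hypothetical rows standing in for the FDA shortage list
sample = pd.DataFrame({
    'Generic Name': ['Acyclovir Tablet', 'Semaglutide Injection', None,
                     'Tirzepatide Injection'],
    'Company Name': ['Apotex Corp.', 'Novo Nordisk, Inc.', 'Unknown',
                     'Eli Lilly and Co.'],
})

glp1_prefixes = ('Tirzepatide', 'Semaglutide', 'Dulaglutide', 'Liraglutide')

# na=False treats a missing name as "no match" instead of propagating NaN
mask = sample['Generic Name'].str.startswith(glp1_prefixes, na=False)

# .copy() yields an independent frame for later column assignments
glp1 = sample[mask].copy()

print(glp1['Generic Name'].tolist())
```

Without `na=False`, a missing name would produce a missing value in the mask, and pandas refuses to filter rows with such a mask.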


In [8]:
# Clean and manipulate the data to make it more readable and useful

# Replace the company names with shorter names
df0['Company Name'] = df0['Company Name'].replace({'Eli Lilly and Co.': 'Eli Lilly', 'Novo Nordisk, Inc.': 'Novo Nordisk'})

# Remove the word 'Injection' from the Generic Name column
df0['Generic Name'] = df0['Generic Name'].str.replace(' Injection', '')

# Split the Presentation column into three new columns: Brand Name, Administration, and Dosage
df0[['Brand Name', 'Administration', 'Dosage']] = df0['Presentation'].str.split(',', expand=True)

# Trim leading and trailing spaces from the new columns
df0['Brand Name'] = df0['Brand Name'].str.strip()
df0['Administration'] = df0['Administration'].str.strip()
df0['Dosage'] = df0['Dosage'].str.strip()

# Verify the changes
df0.head()

Unnamed: 0,Generic Name,Company Name,Presentation,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage,Brand Name,Administration,Dosage
705,Dulaglutide,Eli Lilly,"Trulicity, Injection, .75 mg/.5 mL (NDC 0002-1...",Revised,07/02/2024,Available,,,Trulicity,Injection,.75 mg/.5 mL (NDC 0002-1433-80)
706,Dulaglutide,Eli Lilly,"Trulicity, Injection, 1.5 mg/.5 mL (NDC 0002-1...",Revised,07/02/2024,Available,,,Trulicity,Injection,1.5 mg/.5 mL (NDC 0002-1434-80)
707,Dulaglutide,Eli Lilly,"Trulicity, Injection, 3 mg/.5 mL (NDC 0002-223...",Revised,07/02/2024,Limited Availability,Limited availability through Q3 2024.,Demand increase for the drug,Trulicity,Injection,3 mg/.5 mL (NDC 0002-2236-80)
708,Dulaglutide,Eli Lilly,"Trulicity, Injection, 4.5 mg/.5 mL (NDC 0002-3...",Revised,07/02/2024,Limited Availability,Limited availability through July 2024.,Demand increase for the drug,Trulicity,Injection,4.5 mg/.5 mL (NDC 0002-3182-80)
1110,Liraglutide,Novo Nordisk,"Saxenda, Injection, 6 mg/1 mL (NDC 0169-2800-15)",Revised,07/01/2024,Limited Availability,Estimated shortage duration TBD,Demand increase for the drug,Saxenda,Injection,6 mg/1 mL (NDC 0169-2800-15)
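
The split in cell 8 assumes every `Presentation` value contains exactly two commas. If a value had a third comma, `expand=True` would return four columns and the assignment to three names would fail. The sketch below shows one possible safeguard: capping the split with `n=2`, and reading the dose with a regular expression. The sample rows copy the layout of the FDA values shown above, and the pattern is my own assumption about that layout, not something the FDA documents:

```python
import pandas as pd

# Sample Presentation values in the same shape as the FDA data
sample = pd.DataFrame({'Presentation': [
    'Trulicity, Injection, .75 mg/.5 mL (NDC 0002-1433-80)',
    'Saxenda, Injection, 6 mg/1 mL (NDC 0169-2800-15)',
]})

# n=2 caps the split at three parts, so a comma inside the dosage text
# stays in the last column instead of creating a fourth one
parts = sample['Presentation'].str.split(',', n=2, expand=True)
parts.columns = ['Brand Name', 'Administration', 'Dosage']
parts = parts.apply(lambda col: col.str.strip())

# Pull the leading number before 'mg' straight into a numeric Dose column
dose = parts['Dosage'].str.extract(r'^(\d*\.?\d+)\s*mg', expand=False)
parts['Dose'] = pd.to_numeric(dose, errors='coerce')

print(parts)
```

Rows whose dosage text does not start with a number followed by `mg` end up with `NaN` in `Dose`, the same outcome as `errors='coerce'` in the notebook.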


In [9]:
# Delete the Presentation column as it's no longer needed
df0.drop('Presentation', axis=1, inplace=True)

# Verify the change
df0.head()

Unnamed: 0,Generic Name,Company Name,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage,Brand Name,Administration,Dosage
705,Dulaglutide,Eli Lilly,Revised,07/02/2024,Available,,,Trulicity,Injection,.75 mg/.5 mL (NDC 0002-1433-80)
706,Dulaglutide,Eli Lilly,Revised,07/02/2024,Available,,,Trulicity,Injection,1.5 mg/.5 mL (NDC 0002-1434-80)
707,Dulaglutide,Eli Lilly,Revised,07/02/2024,Limited Availability,Limited availability through Q3 2024.,Demand increase for the drug,Trulicity,Injection,3 mg/.5 mL (NDC 0002-2236-80)
708,Dulaglutide,Eli Lilly,Revised,07/02/2024,Limited Availability,Limited availability through July 2024.,Demand increase for the drug,Trulicity,Injection,4.5 mg/.5 mL (NDC 0002-3182-80)
1110,Liraglutide,Novo Nordisk,Revised,07/01/2024,Limited Availability,Estimated shortage duration TBD,Demand increase for the drug,Saxenda,Injection,6 mg/1 mL (NDC 0169-2800-15)


In [10]:
# Further clean the data for better usability

# Drop ending period from Availability Information column, if any values end with a period
df0['Availability Information'] = df0['Availability Information'].str.rstrip('.')

# Replace Currently available with Available
df0['Availability Information'] = df0['Availability Information'].replace('Currently available', 'Available')

# Drop ending period from Related Information column, if any values end with a period
df0['Related Information'] = df0['Related Information'].str.rstrip('.')

# If Availability Information is Available and Related Information is blank, replace Related Information with Rationing may apply
df0.loc[(df0['Availability Information'] == 'Available') & (df0['Related Information'].isnull()), 'Related Information'] = 'Rationing may apply'

# Replace the exact value 'Limited availability.  Estimated shortage duration TBD' with the shorter 'Shortage duration TBD'
df0['Availability Information'] = df0['Availability Information'].replace('Limited availability.  Estimated shortage duration TBD', 'Shortage duration TBD')

# Create a new column called Dose, which is the Dosage column with ' mg' and everything after it removed
df0['Dose'] = df0['Dosage'].str.split(' mg').str[0]

# Convert the Dose column to numeric and replace any errors with NaN to make it easier to work with 
df0['Dose'] = pd.to_numeric(df0['Dose'], errors='coerce')

# Sort on Brand Name, Date of Update, and Dose (Date of Update is still text here, so it is ordered as a string rather than chronologically)
df0.sort_values(by=['Brand Name', 'Date of Update', 'Dose'], inplace=True)

# Reindex the dataframe to reflect the new order
df0.reset_index(drop=True, inplace=True)

# Reorder the columns for better readability
df0 = df0[['Brand Name', 'Generic Name', 'Company Name', 'Administration', 'Dosage', 'Dose', 'Type of Update', 'Date of Update', 'Availability Information', 'Related Information','Reason for Shortage']]

# Verify the changes
df0.head()

Unnamed: 0,Brand Name,Generic Name,Company Name,Administration,Dosage,Dose,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage
0,Mounjaro,Tirzepatide,Eli Lilly,Injection,2.5 mg/.5 mL (NDC 0002-1506-80),2.5,Revised,07/02/2024,Available,Rationing may apply,
1,Mounjaro,Tirzepatide,Eli Lilly,Injection,5 mg/.5 mL (NDC 0002-1495-80),5.0,Revised,07/02/2024,Available,Rationing may apply,
2,Mounjaro,Tirzepatide,Eli Lilly,Injection,7.5 mg/.5 mL (NDC 0002-1484-80),7.5,Revised,07/02/2024,Available,Rationing may apply,
3,Mounjaro,Tirzepatide,Eli Lilly,Injection,10 mg/.5 mL (NDC 0002-1471-80),10.0,Revised,07/02/2024,Limited Availability,Limited availability through July 2024,Demand increase for the drug
4,Mounjaro,Tirzepatide,Eli Lilly,Injection,12.5 mg/.5 mL (NDC 0002-1460-80),12.5,Revised,07/02/2024,Available,Rationing may apply,
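
Because `Date of Update` is stored as text in MM/DD/YYYY form, sorting on it compares strings, so a December 2023 entry would sort after a July 2024 one. A minimal sketch of chronological sorting follows. The rows and the helper column name `Update Date` are hypothetical and are not part of the notebook:

```python
import pandas as pd

# Hypothetical rows; the dates use the MM/DD/YYYY layout shown above
sample = pd.DataFrame({
    'Brand Name': ['Mounjaro', 'Mounjaro', 'Mounjaro'],
    'Date of Update': ['07/02/2024', '12/15/2023', '01/10/2024'],
    'Dose': [7.5, 2.5, 5.0],
})

# Parse into datetimes on a helper column so the original text is kept
sample['Update Date'] = pd.to_datetime(sample['Date of Update'],
                                       format='%m/%d/%Y')

by_text = sample.sort_values('Date of Update')['Date of Update'].tolist()
by_date = sample.sort_values('Update Date')['Date of Update'].tolist()

print(by_text)  # string order
print(by_date)  # calendar order
```

Keeping the parsed dates in a separate column leaves the text that is written to the CSV file unchanged.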


In [11]:
# Write the cleaned data to a CSV file without the index
df0.to_csv('Drugshortages_cleaned.csv', index=False)