# **FDA Drug Shortage Data**

Run on Python 3.13 | No Errors | No Warnings

Data Source: https://dps.fda.gov/drugshortages

Drug Shortage Home Page: https://www.fda.gov/drugs/drug-safety-and-availability/drug-shortages

This program processes FDA-reported availability data for GLP-1 drugs into a structured format, making it easier to analyze and visualize in Tableau.

In [1]:
# Import packages

# For data manipulation
import pandas as pd

In [2]:
# Load dataset into dataframe and verify

# Load the dataset to df0
df0 = pd.read_csv("DrugShortages.csv", usecols=['Generic Name', 'Company Name', 'Presentation', 'Type of Update', 'Date of Update', 'Availability Information', 'Related Information', 'Reason for Shortage', 'Status'])

# Display the first 5 rows of the dataframe
df0.head()

,Generic Name,Company Name,Presentation,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage,Status
0,Acyclovir Ointment,"Mylan Institutional, a Viatris Company","Acyclovir Ointment 5% 15gram Tube, Ointment, 5...",New,2024-09-16,,,,To Be Discontinued
1,Acyclovir Ointment,"Mylan Institutional, a Viatris Company","Acyclovir Ointment 5% 5g 2pk, Ointment, 5% 5g ...",New,2024-09-16,,,,To Be Discontinued
2,Acyclovir Tablet,Apotex Corp.,"Tablet, 400 mg/1 (NDC 60505-5306-1)",New,2024-04-26,,Business related decision to discontinue the p...,,To Be Discontinued
3,Acyclovir Tablet,Apotex Corp.,"Tablet, 400 mg/1 (NDC 60505-5306-8)",New,2024-04-26,,Business related decision to discontinue the p...,,To Be Discontinued
4,Acyclovir Tablet,Apotex Corp.,"Tablet, 800 mg/1 (NDC 60505-5307-1)",New,2024-04-26,,Business related decision to discontinue the p...,,To Be Discontinued


In [3]:
# Verify the data types of the columns
df0.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1932 entries, 0 to 1931
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Generic Name              1932 non-null   object
 1   Company Name              1932 non-null   object
 2   Presentation              1932 non-null   object
 3   Type of Update            1932 non-null   object
 4   Date of Update            1932 non-null   object
 5   Availability Information  1425 non-null   object
 6   Related Information       1035 non-null   object
 7   Reason for Shortage       571 non-null    object
 8   Status                    1932 non-null   object
dtypes: object(9)
memory usage: 136.0+ KB
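
The `info()` output shows that `Date of Update` is stored as `object`, that is, as plain strings. The notebook keeps it that way, and ISO-formatted strings do sort correctly, but date arithmetic needs a real datetime column. A minimal sketch of one possible conversion, assuming the `YYYY-MM-DD` format seen in the rows above (`sample` is a hypothetical stand-in frame, not part of the notebook):

```python
import pandas as pd

# Hypothetical mini-frame; the dates mirror values visible in the output above
sample = pd.DataFrame({
    "Generic Name": ["Acyclovir Ointment", "Acyclovir Tablet"],
    "Date of Update": ["2024-09-16", "2024-04-26"],
})

# Parse the ISO-formatted strings into datetime64 values;
# errors="coerce" turns anything unparseable into NaT instead of raising
sample["Date of Update"] = pd.to_datetime(
    sample["Date of Update"], format="%Y-%m-%d", errors="coerce"
)

# Date arithmetic now works, e.g. the span between oldest and newest update
span = sample["Date of Update"].max() - sample["Date of Update"].min()
print(sample.dtypes)
print(span.days)
```

Whether to convert depends on the downstream tool; Tableau can usually parse ISO date strings on its own, so this step is optional.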


In [4]:
# Strip leading and trailing spaces from the column names
df0.columns = df0.columns.str.strip()

# Verify that the column names carry no leading spaces
df0.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1932 entries, 0 to 1931
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Generic Name              1932 non-null   object
 1   Company Name              1932 non-null   object
 2   Presentation              1932 non-null   object
 3   Type of Update            1932 non-null   object
 4   Date of Update            1932 non-null   object
 5   Availability Information  1425 non-null   object
 6   Related Information       1035 non-null   object
 7   Reason for Shortage       571 non-null    object
 8   Status                    1932 non-null   object
dtypes: object(9)
memory usage: 136.0+ KB


In [5]:
# Keep only the rows where the generic name starts with 'Tirzepatide', 'Semaglutide', 'Dulaglutide', 'Liraglutide', or 'Exenatide'
df0 = df0[df0['Generic Name'].str.startswith(('Tirzepatide', 'Semaglutide', 'Dulaglutide', 'Liraglutide', 'Exenatide'))].copy()
df0.head()


,Generic Name,Company Name,Presentation,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage,Status
694,Dulaglutide Injection,Eli Lilly and Co.,"Trulicity, Injection, .75 mg/.5 mL (NDC 0002-1...",Reverified,2025-02-07,Available,,,Current
695,Dulaglutide Injection,Eli Lilly and Co.,"Trulicity, Injection, 1.5 mg/.5 mL (NDC 0002-1...",Reverified,2025-02-07,Available,,,Current
696,Dulaglutide Injection,Eli Lilly and Co.,"Trulicity, Injection, 3 mg/.5 mL (NDC 0002-223...",Reverified,2025-02-07,Available,,,Current
697,Dulaglutide Injection,Eli Lilly and Co.,"Trulicity, Injection, 4.5 mg/.5 mL (NDC 0002-3...",Reverified,2025-02-07,Available,,,Current
757,"Exenatide Synthetic Injectable Suspension, Ext...",AstraZeneca AB,"Bydureon Bcise, Injectable Suspension, Extende...",New,2024-10-28,,,,To Be Discontinued
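
The row filter relies on `Series.str.startswith` accepting a tuple of prefixes. A small sketch of how that behaves, including a missing value, may make the step easier to reason about. The series `names` and the variable `glp1_prefixes` are hypothetical; in the real dataset `Generic Name` has no nulls, so `na=False` is only a safeguard:

```python
import pandas as pd

# Hypothetical sample; one missing name to show how na=False behaves
names = pd.Series(
    ["Dulaglutide Injection", "Acyclovir Tablet", None, "Semaglutide Injection"],
    name="Generic Name",
)

glp1_prefixes = ("Tirzepatide", "Semaglutide", "Dulaglutide", "Liraglutide", "Exenatide")

# na=False treats missing names as non-matches, so the mask is purely boolean
mask = names.str.startswith(glp1_prefixes, na=False)

# .copy() yields an independent object, so later assignments
# cannot be mistaken for writes to a slice of the original
filtered = names[mask].copy()
print(filtered.tolist())  # ['Dulaglutide Injection', 'Semaglutide Injection']
```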


In [6]:
# Verify the changes to ensure that only the desired rows remain
df0.info()

<class 'pandas.core.frame.DataFrame'>
Index: 32 entries, 694 to 1901
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Generic Name              32 non-null     object
 1   Company Name              32 non-null     object
 2   Presentation              32 non-null     object
 3   Type of Update            32 non-null     object
 4   Date of Update            32 non-null     object
 5   Availability Information  9 non-null      object
 6   Related Information       5 non-null      object
 7   Reason for Shortage       3 non-null      object
 8   Status                    32 non-null     object
dtypes: object(9)
memory usage: 2.5+ KB


In [7]:
# Clean and manipulate the data to make it more readable and useful

# Replace the company names with shorter names
df0['Company Name'] = df0['Company Name'].replace({'Eli Lilly and Co.': 'Eli Lilly', 'Novo Nordisk, Inc.': 'Novo Nordisk', 'AstraZeneca AB': 'AstraZeneca'})

# Remove the word 'Injection' from the Generic Name column
df0['Generic Name'] = df0['Generic Name'].str.replace(' Injection', '')

# Split the Presentation column into three new columns: Brand Name, Administration, and Dosage
def split_presentation(pres):
    pres = pres.strip()
    parts = pres.split(',', maxsplit=2)
    if pres.startswith('Injection'):
        # Generic version
        brand_name = 'Liraglutide'
        administration = parts[0].strip()
        dosage = parts[1].strip() if len(parts) > 1 else ''
    else:
        # Brand-name version
        brand_name = parts[0].strip()
        administration = parts[1].strip() if len(parts) > 1 else ''
        dosage = parts[2].strip() if len(parts) > 2 else ''
    return pd.Series([brand_name, administration, dosage], index=['Brand Name', 'Administration', 'Dosage'])

# Apply the function to the Presentation column and store the results in three new columns
df0[['Brand Name', 'Administration', 'Dosage']] = df0['Presentation'].apply(split_presentation)

# Trim leading and trailing spaces from the new columns
df0['Brand Name'] = df0['Brand Name'].str.strip()
df0['Administration'] = df0['Administration'].str.strip()
df0['Dosage'] = df0['Dosage'].str.strip()

# Verify the changes
df0.head()

,Generic Name,Company Name,Presentation,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage,Status,Brand Name,Administration,Dosage
694,Dulaglutide,Eli Lilly,"Trulicity, Injection, .75 mg/.5 mL (NDC 0002-1...",Reverified,2025-02-07,Available,,,Current,Trulicity,Injection,.75 mg/.5 mL (NDC 0002-1433-80)
695,Dulaglutide,Eli Lilly,"Trulicity, Injection, 1.5 mg/.5 mL (NDC 0002-1...",Reverified,2025-02-07,Available,,,Current,Trulicity,Injection,1.5 mg/.5 mL (NDC 0002-1434-80)
696,Dulaglutide,Eli Lilly,"Trulicity, Injection, 3 mg/.5 mL (NDC 0002-223...",Reverified,2025-02-07,Available,,,Current,Trulicity,Injection,3 mg/.5 mL (NDC 0002-2236-80)
697,Dulaglutide,Eli Lilly,"Trulicity, Injection, 4.5 mg/.5 mL (NDC 0002-3...",Reverified,2025-02-07,Available,,,Current,Trulicity,Injection,4.5 mg/.5 mL (NDC 0002-3182-80)
757,"Exenatide Synthetic Injectable Suspension, Ext...",AstraZeneca,"Bydureon Bcise, Injectable Suspension, Extende...",New,2024-10-28,,,,To Be Discontinued,Bydureon Bcise,Injectable Suspension,"Extended Release, 2 mg/.85 mL (NDC 0310-6540-04)"
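
To see how `split_presentation` treats the two kinds of rows, it can be run on individual strings. The sketch below repeats the function's logic so it runs on its own. The two input strings are reconstructed from the outputs in this notebook rather than copied from the CSV, so treat them as assumed examples:

```python
import pandas as pd

def split_presentation(pres: str) -> pd.Series:
    """Same logic as the notebook function, repeated so the example runs alone."""
    pres = pres.strip()
    parts = [p.strip() for p in pres.split(",", maxsplit=2)]
    if pres.startswith("Injection"):
        # No brand prefix: the notebook labels these rows 'Liraglutide'
        brand, admin = "Liraglutide", parts[0]
        dosage = parts[1] if len(parts) > 1 else ""
    else:
        # Brand-name row: brand, administration route, then dosage
        brand = parts[0]
        admin = parts[1] if len(parts) > 1 else ""
        dosage = parts[2] if len(parts) > 2 else ""
    return pd.Series(
        [brand, admin, dosage], index=["Brand Name", "Administration", "Dosage"]
    )

# Assumed inputs, rebuilt from the Brand Name / Administration / Dosage outputs
branded = split_presentation("Trulicity, Injection, .75 mg/.5 mL (NDC 0002-1433-80)")
generic = split_presentation("Injection, 6 mg/1 mL (NDC 0480-3667-20)")

print(branded.tolist())
print(generic.tolist())
```

One thing to watch: in the generic branch only `parts[1]` is kept, so a generic presentation with a second comma would lose whatever follows it.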


In [8]:
# Delete the Presentation column as it's no longer needed
df0.drop('Presentation', axis=1, inplace=True)

# Verify the change
df0.head()

,Generic Name,Company Name,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage,Status,Brand Name,Administration,Dosage
694,Dulaglutide,Eli Lilly,Reverified,2025-02-07,Available,,,Current,Trulicity,Injection,.75 mg/.5 mL (NDC 0002-1433-80)
695,Dulaglutide,Eli Lilly,Reverified,2025-02-07,Available,,,Current,Trulicity,Injection,1.5 mg/.5 mL (NDC 0002-1434-80)
696,Dulaglutide,Eli Lilly,Reverified,2025-02-07,Available,,,Current,Trulicity,Injection,3 mg/.5 mL (NDC 0002-2236-80)
697,Dulaglutide,Eli Lilly,Reverified,2025-02-07,Available,,,Current,Trulicity,Injection,4.5 mg/.5 mL (NDC 0002-3182-80)
757,"Exenatide Synthetic Injectable Suspension, Ext...",AstraZeneca,New,2024-10-28,,,,To Be Discontinued,Bydureon Bcise,Injectable Suspension,"Extended Release, 2 mg/.85 mL (NDC 0310-6540-04)"


In [9]:
# Further clean the data for better usability

# Drop the trailing period from the Availability Information column, if any values end with a period
df0['Availability Information'] = df0['Availability Information'].str.rstrip('.')

# Replace 'Currently available' with 'Available'
df0['Availability Information'] = df0['Availability Information'].replace('Currently available', 'Available')

# If Status contains Discontinued, replace Availability Information with Discontinued
df0.loc[df0['Status'].str.contains('Discontinued'), 'Availability Information'] = 'Discontinued'

# If Administration contains Injectable, replace Administration with Injection
df0.loc[df0['Administration'].str.contains('Injectable'), 'Administration'] = 'Injection'

# If Brand Name is Bydureon Bcise or Byetta, replace Generic Name with Exenatide
df0.loc[df0['Brand Name'].str.contains('Bydureon Bcise|Byetta'), 'Generic Name'] = 'Exenatide'

# Drop ending period from Related Information column, if any values end with a period
df0['Related Information'] = df0['Related Information'].str.rstrip('.')

# If Status is Resolved, replace Availability Information with Resolved
df0.loc[df0['Status'] == 'Resolved', 'Availability Information'] = 'Resolved'

# If Availability Information is Available and Related Information is blank, replace Related Information with 'Rationing may apply'
df0.loc[(df0['Availability Information'] == 'Available') & (df0['Related Information'].isnull()), 'Related Information'] = 'Rationing may apply'

# If Availability Information is Resolved and Related Information is blank, replace Related Information with 'Availability may vary'
df0.loc[(df0['Availability Information'] == 'Resolved') & (df0['Related Information'].isnull()), 'Related Information'] = 'Availability may vary'

# Replace the exact entry 'Limited availability.  Estimated shortage duration TBD' with 'Shortage duration TBD'
df0['Availability Information'] = df0['Availability Information'].replace('Limited availability.  Estimated shortage duration TBD', 'Shortage duration TBD')

# Create a new column called Dose, which holds the part of the Dosage column that precedes ' mg'
df0['Dose'] = df0['Dosage'].str.split(' mg').str[0]

# If Brand Name is Bydureon Bcise, replace Dose with 2
df0.loc[df0['Brand Name'] == 'Bydureon Bcise', 'Dose'] = '2'

# If Brand Name is Byetta, replace Dose with first three characters of Dosage
df0.loc[df0['Brand Name'] == 'Byetta', 'Dose'] = df0['Dosage'].str[:3]

# If Brand Name is Bydureon Bcise, delete 'Extended Release, ' from the value in the Dosage column
df0.loc[df0['Brand Name'] == 'Bydureon Bcise', 'Dosage'] = df0['Dosage'].str.replace('Extended Release, ', '', regex=False)

# If Brand Name is Byetta, delete '(250MCG/ML) ' from the value in the Dosage column
# regex=False keeps the parentheses literal regardless of the pandas version
df0.loc[df0['Brand Name'] == 'Byetta', 'Dosage'] = df0['Dosage'].str.replace('(250MCG/ML) ', '', regex=False)

# Convert the Dose column to numeric and replace any errors with NaN to make it easier to work with 
df0['Dose'] = pd.to_numeric(df0['Dose'], errors='coerce')

# If Company Name is AstraZeneca and Related Information is blank, replace Related Information with 'Please select an alternative GLP-1 product'
df0.loc[(df0['Company Name'] == 'AstraZeneca') & (df0['Related Information'].isnull()), 'Related Information'] = 'Please select an alternative GLP-1 product'

# Sort on Brand Name, Date of Update, and Dose
df0.sort_values(by=['Brand Name', 'Date of Update', 'Dose'], inplace=True)

# Reindex the dataframe to reflect the new order
df0.reset_index(drop=True, inplace=True)

# Reorder the columns for better readability
df0 = df0[['Brand Name', 'Generic Name', 'Company Name', 'Administration', 'Dosage', 'Dose', 'Type of Update', 'Date of Update', 'Availability Information', 'Related Information','Reason for Shortage']]

# Verify the changes
df0.head()

,Brand Name,Generic Name,Company Name,Administration,Dosage,Dose,Type of Update,Date of Update,Availability Information,Related Information,Reason for Shortage
0,Bydureon Bcise,Exenatide,AstraZeneca,Injection,2 mg/.85 mL (NDC 0310-6540-04),2.0,New,2024-10-28,Discontinued,Please select an alternative GLP-1 product,
1,Byetta,Exenatide,AstraZeneca,Injection,300MCG/1.2ML (NDC 0310-6512-01),300.0,New,2024-10-25,Discontinued,Please select an alternative GLP-1 product,
2,Byetta,Exenatide,AstraZeneca,Injection,600MCG/2.4ML (NDC 0310-6524-01),600.0,New,2024-10-25,Discontinued,Please select an alternative GLP-1 product,
3,Liraglutide,Liraglutide,Novo Nordisk,Injection,6 mg/1 mL (NDC 0480-3667-20),6.0,Reverified,2025-02-18,Available,Distributed by Teva,
4,Liraglutide,Liraglutide,Novo Nordisk,Injection,6 mg/1 mL (NDC 0480-3667-22),6.0,Reverified,2025-02-18,Available,Distributed by Teva,
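
The `Dose` column above is built from a split on `' mg'` plus per-brand overrides for Bydureon Bcise and Byetta. As an alternative sketch (not the notebook's method), a single regular expression can capture the first number in each `Dosage` string. The sample strings come from the outputs in this notebook, and the pattern is an assumption that fits only the formats seen here:

```python
import pandas as pd

# Dosage strings taken from the outputs shown in this notebook
dosage = pd.Series([
    ".75 mg/.5 mL (NDC 0002-1433-80)",
    "1.5 mg/.5 mL (NDC 0002-1434-80)",
    "300MCG/1.2ML (NDC 0310-6512-01)",
    "Extended Release, 2 mg/.85 mL (NDC 0310-6540-04)",
])

# Capture the first number in each string: optional integer part,
# optional decimal point, then at least one digit
first_number = dosage.str.extract(r"(\d*\.?\d+)", expand=False)

# Convert to float; anything without a number becomes NaN
dose = pd.to_numeric(first_number, errors="coerce")
print(dose.tolist())  # [0.75, 1.5, 300.0, 2.0]
```

Either way, the resulting numbers mix units: the Byetta values are in micrograms while the others are in milligrams, so comparisons across brands need care.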


In [10]:
# Write the cleaned data to a CSV file without the index
df0.to_csv('Drugshortages_cleaned.csv', index=False)
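
Before loading the file into Tableau, it can be worth confirming that the export reads back with the expected columns. A minimal round-trip sketch, using an in-memory buffer and a hypothetical two-row frame (`cleaned`) in place of the real data:

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the cleaned frame
cleaned = pd.DataFrame({
    "Brand Name": ["Byetta", "Trulicity"],
    "Dose": [300.0, 0.75],
    "Reason for Shortage": [None, None],
})

# Write to an in-memory buffer instead of disk, then read it back
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# index=False means no extra 'Unnamed: 0' column appears on reload
print(list(reloaded.columns))
print(reloaded.shape)
```

The same check works against the real file by passing `'Drugshortages_cleaned.csv'` to `pd.read_csv` instead of the buffer.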