# Using the OpenFDA API to get recall data

As a food safety and compliance professional, I have always been interested in pulling every product subject to recall straight from FDA's openFDA API. When I first started playing with the API, I had a few questions:
- How can I pull recall data into a table?
- How can I get around the limit on each request to get everything I need for a given date range?

In this notebook, we'll explore how to answer these questions and more. If you have any questions, please reach out!

# Set up
First thing is to import the packages we need (only 2!).

In [1]:
# Import packages
import requests
import pandas as pd

# Create function for making API call
In the function "fetch_data", we will call FDA's API url: https://api.fda.gov/food/enforcement.json,
add some filters ("params"), and page through the responses until every matching record is collected in a list. That list is normalized into a dataframe in the next step.

In [2]:
def fetch_data(base_url, params):
    """
    Fetch every page of results from the API using the base URL and parameters.
    Params:
        base_url: str, base URL of the API
        params: dict, parameters to be used for the API request;
            must include 'limit' and 'skip'
    Returns:
        list of dict, one entry per record
    """
    # Work on a copy so the caller's dict is not modified while paging
    params = dict(params)
    all_data = []
    while True:
        response = requests.get(base_url, params=params, timeout=30)
        if response.status_code == 200:
            data = response.json()
            if 'results' in data:
                all_data.extend(data['results'])
                # Move the offset forward to the next page
                params['skip'] += params['limit']
            else:
                print("No more results found.")
                break
        elif response.status_code == 404:
            # A 404 after earlier pages succeeded means we paged past the last record
            if len(all_data) > 0:
                print(f"{len(all_data)} results found - No more results available.")
                break
            else:
                print("No matches found. Check your query parameters.")
                break
        else:
            print("Failed to retrieve data:", response.status_code, response.text)
            break
    return all_data
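
To see how the "skip" values advance from page to page, here is a minimal offline sketch. It assumes the response reports the total number of matches under `meta.results.total`, which is how I understand openFDA responses to be laid out; the helper name `page_offsets` and the trimmed-down sample response are made up for illustration.

```python
def page_offsets(total, limit):
    """Return the `skip` value for each request needed to cover `total` records."""
    return list(range(0, total, limit))


# Hypothetical, trimmed-down response used for illustration only
sample_response = {
    'meta': {'results': {'skip': 0, 'limit': 100, 'total': 437}},
    'results': [],
}

total = sample_response['meta']['results']['total']
offsets = page_offsets(total, limit=100)
print(offsets)  # [0, 100, 200, 300, 400]
```

With 437 matches and a limit of 100, five requests cover everything. `fetch_data` does not read the total; it sends one more request at skip=500 and stops when that request comes back as a 404.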

# Fetch the data with an API call
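Below are the base url and the filters ("params"), including the date range. The params "limit" and "skip" handle pagination: "limit" sets how many records each request returns, and "skip" sets how many records to skip before the page starts.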

In [3]:
# Instantiate search variables and parameters
base_url = 'https://api.fda.gov/food/enforcement.json'
start_date = '20240101'
end_date = '20240531'
query_params = {
    'search': f'recall_initiation_date:["{start_date}" TO "{end_date}"]',
    'limit': 100,
    'skip': 0 
}

# Fetch all data
data = fetch_data(base_url, query_params)

# Create DataFrame from the collected data
df = pd.json_normalize(data)

# Convert date columns
date_columns = ['recall_initiation_date', 'center_classification_date', 'termination_date', 'report_date']
for column in date_columns:
    df[column] = pd.to_datetime(df[column], format='%Y%m%d', errors='coerce').dt.date

437 results found - No more results available.
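
The `errors='coerce'` argument matters because some date fields can be empty, for example "termination_date" on a recall that is still ongoing. A small sketch of the same conversion on made-up values (the sample frame is hypothetical, not API output):

```python
import datetime as dt

import pandas as pd

# Hypothetical values: one valid date, one empty string, one missing value
sample = pd.DataFrame({'termination_date': ['20240424', '', None]})

# Same conversion as above: unparseable or missing entries become NaT
sample['termination_date'] = pd.to_datetime(
    sample['termination_date'], format='%Y%m%d', errors='coerce'
).dt.date

print(sample)
```

Without `errors='coerce'`, the empty string would raise an error instead of turning into `NaT`.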


# Check results
Let's take a look at the data to confirm the range and columns of the new dataframe.

In [7]:
# Inspect results
print('Min date:', min_date := df.recall_initiation_date.min())
print('Max date:', max_date := df.recall_initiation_date.max())
print(df.columns)
df.head(2)

Min date: 2024-01-02
Max date: 2024-05-17
Index(['status', 'city', 'state', 'country', 'classification', 'product_type',
       'event_id', 'recalling_firm', 'address_1', 'address_2', 'postal_code',
       'voluntary_mandated', 'initial_firm_notification',
       'distribution_pattern', 'recall_number', 'product_description',
       'product_quantity', 'reason_for_recall', 'recall_initiation_date',
       'center_classification_date', 'report_date', 'code_info',
       'more_code_info', 'termination_date'],
      dtype='object')


| | status | city | state | country | classification | product_type | event_id | recalling_firm | address_1 | address_2 | ... | recall_number | product_description | product_quantity | reason_for_recall | recall_initiation_date | center_classification_date | report_date | code_info | more_code_info | termination_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ongoing | San Luis | AZ | United States | Class I | Food | 93920 | HandNatural | 874 S Main St | | ... | F-1170-2024 | H&NATURAL 2 PACK! BRAZIL SEED 60 PIECES, PURE ... | | Product recalled due to the presence of yellow... | 2024-02-09 | 2024-04-16 | 2024-04-24 | No Lot code on label. Expiration date listed a... | | NaT |
| 1 | Ongoing | New Century | KS | United States | Class I | Food | 94565 | Danisco USA Inc. | 4 New Century Pkwy | | ... | F-1333-2024 | Grindsted Sweetlife CINN. Roll BAS Stabilizer ... | 7257.60 kg | Potential Salmonella Contamination | 2024-05-03 | 2024-06-06 | 2024-06-12 | Lot number: 1204607256; 1204607257; 1204607258... | | NaT |
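
Eyeballing the min and max works, but the range check can also be written as a small helper. This is only a sketch: `dates_within` is a hypothetical name, and it assumes the column holds `datetime.date` objects with `NaT` for missing values, as produced by the conversion earlier.

```python
import datetime as dt

import pandas as pd


def dates_within(dates, start, end):
    """Return True if every non-missing date lies between start and end (YYYYMMDD strings)."""
    start_d = dt.datetime.strptime(start, '%Y%m%d').date()
    end_d = dt.datetime.strptime(end, '%Y%m%d').date()
    valid = dates.dropna()
    return bool(valid.map(lambda d: start_d <= d <= end_d).all())


# Made-up sample using the min and max dates printed above
sample_dates = pd.Series([dt.date(2024, 1, 2), dt.date(2024, 5, 17)])
print(dates_within(sample_dates, '20240101', '20240531'))
```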


In [6]:
# Check the names of products subject to recall, which are truncated in the above formatting
df.product_description.to_list()[:10]

['H&NATURAL 2 PACK! BRAZIL SEED 60 PIECES, PURE NATURAL SEMILLA DE BRASIL FOR 60 DAYS, 5 GRAMS PER BOX, 2 BLACK BOXES.',
 'Grindsted Sweetlife CINN. Roll BAS Stabilizer & Emulsifier System, 22.68 kg/ 50 lb bag',
 'GRANDE     1.5OZ FRESH BOCCONCINI MOZZ 1/3#         SUPC/ITEM # 19132P',
 'Sysco Classic Riced Cauliflower packed in a 4lb plastic bag (6 units of 4 lb. per carton box).',
 'Vanilla Ice Cream Cookie Sandwich. Sandwiches packaged in a gold or clear bags. 50 sandwiches per case and weighting 4oz each sandwich.',
 'California Wine Wafer - Mocha Chocolate and Original Wine Wafer - Mocha Chocolate [Two sizes: retail package of 8 wafers (7 oz) and retail gift package of 2 wafers (2 oz)]',
 'Charles Boggini Pink Lemonde; 1 or 5 U.S. Gallons',
 'Fresh Local , Country Corner Dairy,  YELLOW COLBY CHEESE, INGREDIENTS: Pasteurized Milk, Salt , Rennet, Calcium Chloride,  Cheese Culture, Contains Milk. Product comes in various sizes, 6 oz. up to 42 lbs. and is in vacuum sealed plastic pack
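
Another way to see the full text without converting to a list is to lift pandas' column-width limit for a single display. A minimal sketch using `pd.option_context` on a one-row sample frame (the frame is built here just for illustration):

```python
import pandas as pd

long_text = ('Sysco Classic Riced Cauliflower packed in a 4lb plastic bag '
             '(6 units of 4 lb. per carton box).')
sample = pd.DataFrame({'product_description': [long_text]})

# Default display truncates long cell values with "..."
default_view = repr(sample)

# Inside this block the column-width limit is lifted; it is restored afterwards
with pd.option_context('display.max_colwidth', None):
    full_view = repr(sample)

print(full_view)
```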

# Save results to CSV file

In [9]:
# Save csv to file
df.to_csv(f'../data/fda-food-recall_{min_date}-to-{max_date}.csv', index=False)
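
One thing to watch: `to_csv` does not create missing folders, so the `../data` directory has to exist first. Below is a hedged sketch of creating the folder and reading the file back with the date column parsed. It writes to a temporary directory, and the file name and sample row are placeholders rather than the real output.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Placeholder row standing in for the recall dataframe
sample = pd.DataFrame({
    'recall_number': ['F-1170-2024'],
    'recall_initiation_date': ['2024-02-09'],
})

# Create the output folder if it is missing (a temp dir here, '../data' in the notebook)
out_dir = Path(tempfile.mkdtemp()) / 'data'
out_dir.mkdir(parents=True, exist_ok=True)

out_path = out_dir / 'fda-food-recall_sample.csv'
sample.to_csv(out_path, index=False)

# Dates are stored as text in the CSV, so parse them again on load
restored = pd.read_csv(out_path, parse_dates=['recall_initiation_date'])
print(restored.dtypes)
```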