# Accessing USDA Recalls

As a food safety and compliance professional, I have always been interested in pulling every product subject to recall straight from USDA's API. When I first started working with the API, I had a few questions:
- How can I pull recall data into a table?
- How can I work around the per-request limit to get everything I need for a given date range?

In this notebook, we'll explore how to answer these questions and more. If you have any questions, please reach out!

For more on the API, see USDA's [Recall API documentation](https://www.fsis.usda.gov/sites/default/files/media_file/documents/Recall-API-documentation.pdf).

In [7]:
# Import packages
import requests
import pandas as pd

In [8]:
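def fetch_data(base_url, params, limit=100):
    # Page through results using 'skip'/'limit' offsets.
    # Assumes the endpoint accepts these parameters and wraps records in a 'results' key.
    params = dict(params)            # copy so the caller's dict is not mutated
    params.setdefault('skip', 0)     # avoid a KeyError when the caller omits paging params
    params.setdefault('limit', limit)
    all_data = []
    while True:
        response = requests.get(base_url, params=params,
                                headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
        if response.status_code == 200:
            data = response.json()
            results = data.get('results') if isinstance(data, dict) else None
            if not results:
                # Missing or empty page: stop instead of requesting forever
                print(f"{len(all_data)} results found - No more results available.")
                break
            all_data.extend(results)
            params['skip'] += params['limit']   # advance to the next page
        elif response.status_code == 404:
            if all_data:
                print(f"{len(all_data)} results found - No more results available.")
            else:
                print("No matches found. Check your query parameters.")
            break
        else:
            print("Failed to retrieve data:", response.status_code, response.text)
            break
    return all_data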
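The `skip`/`limit` loop in `fetch_data` can be exercised without touching the network. The sketch below pulls the paging logic into a hypothetical `paginate` helper that accepts any `fetch_page(skip, limit)` callable; `fake_page` is an in-memory stand-in for the API and is purely illustrative. Like `fetch_data`, it assumes the endpoint honors offset-style paging, which is worth confirming against the linked documentation.

```python
from typing import Callable, List

def paginate(fetch_page: Callable[[int, int], List[dict]], limit: int = 100) -> List[dict]:
    """Collect every record by requesting successive pages of size `limit`.

    `fetch_page(skip, limit)` is a hypothetical callable returning one page;
    a short or empty page signals the end of the data.
    """
    all_data: List[dict] = []
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        all_data.extend(page)
        if len(page) < limit:   # short or empty page: nothing left to fetch
            break
        skip += limit           # advance the offset by one full page
    return all_data

# Illustrative stand-in for the API: 250 fake records served in slices.
records = [{"id": i} for i in range(250)]

def fake_page(skip: int, limit: int) -> List[dict]:
    return records[skip:skip + limit]

result = paginate(fake_page, limit=100)
print(len(result))  # 250
```

Stopping on a short page saves one request compared with waiting for an empty page. If the server silently caps page size below the requested `limit`, though, that check would stop early, so in that case fall back to breaking only on an empty page.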

In [28]:
# Set base url
base_url = "https://www.fsis.usda.gov/fsis/api/recall/v/1"

# Set query parameters
query_params = {
    'field_states_id': 'All', 
    'field_year_id': 'All',  
    'field_archive_recall': 'All',
    'field_closed_year_id': 'All',
    'field_processing_id': 'All',
    'field_product_items_value': 'All',
    'field_recall_classification_id': 'All',
    'field_recall_reason_id': 'All',
    'field_recall_type_id': 'All',
    'field_related_to_outbreak': 'All',
    'field_translation_language': 'en'
}

with requests.Session() as session:
    # Send a browser-like User-Agent; the request may be rejected without one
    session.headers.update({'User-Agent': 'Mozilla/5.0'})

    # Fetch data
    response = session.get(base_url, params=query_params, timeout=30)

# Check if the request was successful and parse data
if response.status_code == 200:
    data = response.json()
    
    # Create DataFrame from the collected data
    df = pd.json_normalize(data)
    
    # Parse date columns and keep only the calendar date (unparseable values become NaT)
    date_columns = ['field_recall_date', 'field_last_modified_date', 'field_closed_date']
    for column in date_columns:
        df[column] = pd.to_datetime(df[column], format='%Y-%m-%d', errors='coerce').dt.date
else:
    print(f"Failed to retrieve data with status code: {response.status_code}")
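To see what the cell above actually sends, the query string can be built with the standard library. `requests` encodes dict parameters itself, but for plain string values like these the result should match. The sketch reuses only a subset of `query_params` for brevity.

```python
from urllib.parse import urlencode

base_url = "https://www.fsis.usda.gov/fsis/api/recall/v/1"

# A subset of the notebook's filters (the notebook sets most of them to 'All').
query_params = {
    'field_states_id': 'All',
    'field_year_id': 'All',
    'field_translation_language': 'en',
}

# Dicts preserve insertion order, so parameters appear in the order defined.
url = f"{base_url}?{urlencode(query_params)}"
print(url)
```

Pasting the printed URL into a browser is a quick way to check a filter combination before running the full request.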

In [35]:
# Inspect results
print('Shape:', df.shape)
print('Min date:', min_date := df.field_recall_date.min())
print('Max date:', max_date := df.field_recall_date.max())
print(df.columns)
df.head(2)

Shape: (1468, 28)
Min date: 2010-01-09
Max date: 2024-06-12
Index(['field_title', 'field_active_notice', 'field_states',
       'field_archive_recall', 'field_closed_date', 'field_closed_year',
       'field_company_media_contact', 'field_distro_list',
       'field_en_press_release', 'field_establishment', 'field_labels',
       'field_media_contact', 'field_risk_level', 'field_last_modified_date',
       'field_press_release', 'field_processing', 'field_product_items',
       'field_qty_recovered', 'field_recall_classification',
       'field_recall_date', 'field_recall_number', 'field_recall_reason',
       'field_recall_type', 'field_related_to_outbreak', 'field_summary',
       'field_year', 'langcode', 'field_has_spanish'],
      dtype='object')


,field_title,field_active_notice,field_states,field_archive_recall,field_closed_date,field_closed_year,field_company_media_contact,field_distro_list,field_en_press_release,field_establishment,...,field_recall_classification,field_recall_date,field_recall_number,field_recall_reason,field_recall_type,field_related_to_outbreak,field_summary,field_year,langcode,field_has_spanish
0,FSIS Issues Public Health Alert For Not-Ready...,True,Tennessee,False,NaT,,\n Company Contact\n \n Farmstead Butcher...,,,Farmstead Butcher Block LLC,...,Public Health Alert,2024-06-12,PHA-06122024-01,Unfit for Human Consumption,Public Health Alert,False,"<p><strong>WASHINGTON, June 12, 2024 </strong>...",2024,English,False
1,"Bonneval Foods, LLC Recalls Barbecue Pork Skin...",True,Louisiana,False,NaT,,"\n Company Contact\n \n Bonneval Foods, L...",,,Bonneval Foods LLC,...,Class I,2024-06-12,018-2024,"Misbranding, Unreported Allergens",Active Recall,False,"<p><strong>WASHINGTON, June 12, 2024</strong> ...",2024,English,True
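Since a single request with every filter set to `'All'` returned the full set here (1,468 rows), one way to get everything for a given date range is to filter client-side. Below is a minimal stdlib sketch over the raw JSON records, assuming `field_recall_date` arrives as an ISO `YYYY-MM-DD` string (the format the parsing cell expects). `filter_by_recall_date` is a hypothetical helper and the sample records are made up.

```python
from datetime import date
from typing import List

def filter_by_recall_date(records: List[dict], start: date, end: date) -> List[dict]:
    """Keep records whose 'field_recall_date' falls within [start, end], inclusive.

    Records with a missing or unparseable date are skipped.
    """
    kept = []
    for rec in records:
        raw = str(rec.get('field_recall_date') or '')
        try:
            # Assumes ISO 'YYYY-MM-DD'; [:10] tolerates a trailing time part.
            d = date.fromisoformat(raw[:10])
        except ValueError:
            continue
        if start <= d <= end:
            kept.append(rec)
    return kept

# Made-up sample records for illustration only.
sample = [
    {'field_recall_number': 'A', 'field_recall_date': '2023-12-31'},
    {'field_recall_number': 'B', 'field_recall_date': '2024-01-15'},
    {'field_recall_number': 'C', 'field_recall_date': '2024-06-12'},
    {'field_recall_number': 'D', 'field_recall_date': ''},
]
in_2024 = filter_by_recall_date(sample, date(2024, 1, 1), date(2024, 12, 31))
print([r['field_recall_number'] for r in in_2024])  # ['B', 'C']
```

Filtering the raw list before calling `pd.json_normalize` keeps the DataFrame small when only a narrow window is needed.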


In [33]:
# Inspect product items
df.field_product_items.to_list()[:10]

['•\tPackages purchased by weight at the retail counter containing “FARMSTEAD LOCAL HOUSE SMOKED BACON,” bearing sell by dates MAY.2.24 through JUN.27.24. ',
 '•\t1.7-oz. plastic bags containing “BONNEVAL’S BBQ Pork Skins”. All product available in retail or in consumer’s pantries is included in this recall. ',
 '11.75-oz. bowls containing “Bistro GRANDE SOUTHWESTERN STYLE WITH CHICKEN WITH SALSA RANCH Dressing” with use by date “JUN 12 2024,” time stamp “08:59,” lot code “217638176,” and establishment number “P-27497” printed on the label.',
 '10-oz. boxes containing four “WOW BAO BAO THAI-STYLE CURRY CHICKEN” with “best if used by” date “4/12/25” printed on the side of the box.',
 'Various weight cardboard cases labeled as “Frigorífico Casa Blanca S.A. AGUJA CHUCK ROLL” with case code JP0001 containing individually vacuum sealed products displaying “FRICASA”., Various weight cardboard cases labeled as “Frigorífico Casa Blanca S.A. ASADO SIN HUESO SHORT RIB” with case code JP0002 cont
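The product descriptions above carry list residue: a leading bullet and tab (`•\t`), trailing whitespace, and sometimes several bullets in one field. Here is a small cleanup sketch, assuming each `•` marks the start of a separate item; a field without a bullet comes back as a single item. `split_product_items` is a hypothetical helper, not part of the API.

```python
from typing import List

def split_product_items(text) -> List[str]:
    """Split a product-items field on '•' bullets and trim whitespace.

    Assumes each bullet starts a new item; text with no bullet yields one item.
    Non-string values (e.g. NaN from pandas) yield an empty list.
    """
    if not isinstance(text, str):
        return []
    # Splitting on the bullet leaves an empty chunk before a leading bullet;
    # strip() also removes the tab that follows each bullet.
    parts = (chunk.strip() for chunk in text.split('•'))
    return [p for p in parts if p]

print(split_product_items('•\t1.7-oz. plastic bags. '))
print(split_product_items('•\tItem one. •\tItem two.'))
```

Applied to the DataFrame, `df.field_product_items.apply(split_product_items)` would give one list of items per recall.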

In [37]:
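# Save CSV to file (create the output folder if it does not exist)
from pathlib import Path
Path('../data').mkdir(parents=True, exist_ok=True)
df.to_csv(f'../data/usda-food-recall-{min_date}-to-{max_date}.csv', index=False)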