# Test USDA Bootstrap Flow Independently

This notebook tests the `bootstrap_usda_commodities` flow in isolation, before it is registered with Prefect.

**What this tests:**
- Can we import the flow?
- Does the flow run without errors?
- Are commodities created in the database?
- Can we verify the results?

## Step 1: Import Required Libraries

In [28]:
import sys
import os

# Add src to path so imports work
sys.path.insert(0, 'src/ca_biositing/pipeline')
sys.path.insert(0, 'src/ca_biositing/datamodels')

from prefect import flow, task, get_run_logger
from sqlmodel import Session, select
from typing import List, Optional

print("✓ Imports successful")

✓ Imports successful
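The relative `sys.path` entries in the cell above only resolve when the notebook's working directory is the repository root. The following is a minimal sketch of a more robust setup, assuming the repository root can be recognised by its `src` directory. The helper names `find_repo_root` and `add_package_paths` are hypothetical and not part of the project.

```python
import sys
from pathlib import Path


def find_repo_root(start: Path, marker: str = "src") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains a '{marker}' directory")


def add_package_paths(root: Path, relative_dirs: list[str]) -> list[str]:
    """Prepend absolute package directories to sys.path, skipping duplicates."""
    added = []
    for rel in relative_dirs:
        path = str((root / rel).resolve())
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added


# Example use from any subdirectory of the repository (assumed layout):
# root = find_repo_root(Path.cwd())
# add_package_paths(root, ["src/ca_biositing/pipeline", "src/ca_biositing/datamodels"])
```

Because the paths are made absolute, the imports should keep working if the kernel is started from a subdirectory such as a notebooks folder.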


## Step 2: Import Database Models

In [29]:
# Step 2: Import Database Models and Utilities
try:
    from ca_biositing.datamodels import UsdaCommodity
    from ca_biositing.datamodels.database import engine
    print("✓ Database models imported")
except ImportError as e:
    print(f"✗ Import error: {e}")
    print("Make sure database is running: pixi run start-services")

✗ Import error: cannot import name 'UsdaCommodity' from 'ca_biositing.datamodels' (c:\Users\meili\forked\ca-biositing\src/ca_biositing/datamodels\ca_biositing\datamodels\__init__.py)
Make sure database is running: pixi run start-services
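The error in the output above says the name `UsdaCommodity` is missing from the module, which starting the database services would not change. One way to narrow this down is to check which candidate modules actually expose the name. The sketch below uses only the standard library; `find_exporting_modules` is a hypothetical helper, and the module names in the usage comment are simply the two import paths this notebook tries.

```python
import importlib


def find_exporting_modules(name: str, module_names: list[str]) -> dict[str, bool]:
    """Report, for each candidate module, whether it imports and exposes `name`."""
    report = {}
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The module itself could not be imported.
            report[module_name] = False
            continue
        report[module_name] = hasattr(module, name)
    return report


# Example use in this notebook:
# find_exporting_modules(
#     "UsdaCommodity",
#     ["ca_biositing.datamodels", "ca_biositing.datamodels.database"],
# )
```

If every entry comes back `False`, the model is probably defined under a different name or in a module that is not re-exported, and the import lines in Steps 2, 3, and 6 would need to be updated to match.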


## Step 3: Define a Task to Add Commodities

In [30]:
# Step 3: Define a Task to Add Commodities to Database
@task
def populate_usda_commodities(commodities: List[str]) -> int:
    """
    Adds USDA commodity records to the database.
    
    Args:
        commodities: List of commodity names like ["ALMONDS", "CORN", "WHEAT"]
    
    Returns:
        Number of commodities added
    """
    # Import inside task so it has access in task context
    from ca_biositing.datamodels.database import engine, UsdaCommodity
    
    logger = get_run_logger()
    count = 0
    
    try:
        with Session(engine) as session:
            for name in commodities:
                # Check if already exists
                existing = session.exec(
                    select(UsdaCommodity).where(UsdaCommodity.name == name)
                ).first()
                
                if not existing:
                    commodity = UsdaCommodity(
                        name=name,
                        usda_source="NASS",
                        description=f"NASS commodity: {name}"
                    )
                    session.add(commodity)
                    count += 1
                    logger.info(f"Added: {name}")
                else:
                    logger.warning(f"Already exists: {name}")
            
            session.commit()
            logger.info(f"✓ Committed {count} new commodities")
    
    except Exception as e:
        logger.error(f"Error adding commodities: {e}")
        raise
    
    return count

print("✓ Task defined")

✓ Task defined
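The task above follows a check-then-insert pattern, so re-running it should not create duplicates. The sketch below shows the same pattern in isolation, using the standard-library `sqlite3` module and an in-memory database as a stand-in for the project's engine. The two-column table schema is an assumption for illustration only and is not the real `usda_commodity` model.

```python
import sqlite3


def add_missing_names(conn: sqlite3.Connection, names: list[str]) -> int:
    """Insert each name that is not already present; return how many were added."""
    added = 0
    for name in names:
        exists = conn.execute(
            "SELECT 1 FROM usda_commodity WHERE name = ?", (name,)
        ).fetchone()
        if exists is None:
            conn.execute("INSERT INTO usda_commodity (name) VALUES (?)", (name,))
            added += 1
    conn.commit()
    return added


# Illustrative schema only; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usda_commodity (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

first = add_missing_names(conn, ["ALMONDS", "CORN"])
second = add_missing_names(conn, ["ALMONDS", "CORN", "WHEAT"])
print(first, second)  # prints "2 1": the second call only adds WHEAT
```

The second call adds only the new name, which is the behaviour the task's return value is meant to report.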


## Step 4: Create a Flow

In [31]:
# Step 4: Create a Flow That Uses the Task
@flow(name="bootstrap-usda-commodities-test")
def bootstrap_flow():
    """
    Test flow: Populate USDA commodities.
    
    This flow adds commodity names to the usda_commodity table.
    Use this for testing before registering with Prefect.
    """
    commodities = ["ALMONDS", "CORN", "WHEAT", "SOYBEANS"]
    count = populate_usda_commodities(commodities)
    return count

print("✓ Flow defined: bootstrap_flow")

✓ Flow defined: bootstrap_flow


## Step 5: Run the Flow Locally

**Prerequisites:**
- Docker services must be running: `pixi run start-services`

In [32]:
# Step 5: Run the Flow Locally
print("=" * 50)
print("Running bootstrap flow locally...")
print("=" * 50)

try:
    result = bootstrap_flow()
    print("\n✓ Flow completed successfully!")
    print(f"✓ {result} commodities added")
except Exception as e:
    print(f"✗ Flow failed: {e}")
    import traceback
    traceback.print_exc()

Running bootstrap flow locally...


✗ Flow failed: cannot import name 'UsdaCommodity' from 'ca_biositing.datamodels.database' (c:\Users\meili\forked\ca-biositing\src/ca_biositing/datamodels\ca_biositing\datamodels\database.py)


Traceback (most recent call last):
  File "C:\Users\meili\AppData\Local\Temp\ipykernel_43944\4054871461.py", line 7, in <module>
    result = bootstrap_flow()
  File "c:\Users\meili\forked\ca-biositing\.pixi\envs\default\Lib\site-packages\prefect\flows.py", line 1713, in __call__
    return run_flow(
        flow=self,
    ...<2 lines>...
        return_type=return_type,
    )
  File "c:\Users\meili\forked\ca-biositing\.pixi\envs\default\Lib\site-packages\prefect\flow_engine.py", line 1582, in run_flow
    ret_val = run_flow_sync(**kwargs)
  File "c:\Users\meili\forked\ca-biositing\.pixi\envs\default\Lib\site-packages\prefect\flow_engine.py", line 1427, in run_flow_sync
    return engine.state if return_type == "state" else engine.result()
                                                       ~~~~~~~~~~~~~^^
  File "c:\Users\meili\forked\ca-biositing\.pixi\envs\default\Lib\site-packages\prefect\flow_engine.py", line 363, in result
    raise self._raised
  File "c:\Users\meili\forked\c

## Step 6: Validate Results

Verify that commodities were actually created in the database.

In [33]:
# Step 6: Verify Commodities in Database
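# Import inside the cell so it can be re-run on its own
try:
    from ca_biositing.datamodels.database import engine, UsdaCommodity
except ImportError:
    # Fallback for running from the repository root without the sys.path
    # entries from Step 1; the dotted path mirrors the on-disk package layout.
    from src.ca_biositing.datamodels.ca_biositing.datamodels.database import engine, UsdaCommodity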

print("\n" + "=" * 50)
print("Verifying commodities in database...")
print("=" * 50)

try:
    with Session(engine) as session:
        # Get all commodities
        commodities = session.exec(select(UsdaCommodity)).all()
        
        print(f"\n✓ Found {len(commodities)} commodities in database:\n")
        
        for commodity in commodities:
            print(f"  ID: {commodity.id} | Name: {commodity.name} | Source: {commodity.usda_source}")
        
        # Verify we can get them by name
        print("\n" + "=" * 50)
        print("Testing commodity lookup by name...")
        print("=" * 50)
        
        test_names = ["ALMONDS", "CORN", "WHEAT"]
        for name in test_names:
            commodity = session.exec(
                select(UsdaCommodity).where(UsdaCommodity.name == name)
            ).first()
            
            if commodity:
                print(f"✓ Found: {name} (ID: {commodity.id})")
            else:
                print(f"✗ Not found: {name}")

except Exception as e:
    print(f"✗ Error verifying commodities: {e}")
    import traceback
    traceback.print_exc()

ModuleNotFoundError: No module named 'src.ca_biositing.datamodels.database'
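The verification cell prints one line per lookup. As a complement, the expected names can be compared against the names actually found in a single step. This is a minimal sketch in plain Python; `compare_names` is a hypothetical helper, and in this notebook the `found` list would come from something like `[c.name for c in commodities]`.

```python
def compare_names(expected: list[str], found: list[str]) -> dict[str, list[str]]:
    """Return names that are expected but absent, and names present but not expected."""
    expected_set, found_set = set(expected), set(found)
    return {
        "missing": sorted(expected_set - found_set),
        "unexpected": sorted(found_set - expected_set),
    }


# Example with the four names used by the flow and a partial result set:
diff = compare_names(
    ["ALMONDS", "CORN", "WHEAT", "SOYBEANS"],
    ["ALMONDS", "CORN", "RICE"],
)
print(diff)  # {'missing': ['SOYBEANS', 'WHEAT'], 'unexpected': ['RICE']}
```

An empty `missing` list would indicate that every commodity the flow was asked to add is present.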

### ✅ Phase 7: Testing (Step-by-Step)

**Important**: Tests 1-3 are independent local tests, separate from the "test
locally" steps in Phases 2-5.

- Test 1: Can the code read the database?
- Test 2: Can the code call the USDA API?
- Test 3: Does the extract function work?
- Test 4: Does the full Docker pipeline work?

#### Test 1: Database Utility (Do Mappings Exist?)

In [34]:
# Run locally (NOT in Docker)
from src.ca_biositing.pipeline.ca_biositing.pipeline.utils.commodity_mapper import get_mapped_commodity_ids

ids = get_mapped_commodity_ids()
print(f"IDs: {ids}")

Error querying mapped commodities: (psycopg2.OperationalError) could not translate host name "db" to address: Name or service not known

(Background on this error at: https://sqlalche.me/e/20/e3q8)
IDs: None


**Expected**: `IDs: [2, 5]` (or whatever IDs you mapped in Phase 6)
**If 0 or None**: Go back to Phase 6, re-run verification query
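The `could not translate host name "db"` error in the output above is a connection problem rather than a missing mapping: a hostname like `db` typically resolves only inside the Docker network, not from a local kernel. The sketch below checks which of several candidate hostnames resolves from the current machine. `first_resolvable_host` is a hypothetical helper, and whether the project's database is reachable on `localhost` is an assumption to verify against its configuration.

```python
import socket
from typing import Callable, Iterable, Optional


def first_resolvable_host(
    candidates: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Return the first hostname that resolves, or None if none of them do."""
    for host in candidates:
        try:
            resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        return host
    return None


# Example use when running outside Docker:
# host = first_resolvable_host(["db", "localhost"])
# print(f"Database host to use from this environment: {host}")
```

The resolver is passed in as a parameter so the function can be exercised without network access.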

#### Test 2: USDA API (Can We Fetch Data?)

In [None]:
# Run locally
from src.ca_biositing.pipeline.ca_biositing.pipeline.utils.usda_nass_to_pandas import usda_nass_to_df

df = usda_nass_to_df(
    api_key="your_api_key",
    state="CA",
    commodity_ids=[2, 5],
    year=2023
)
print(f"Rows: {len(df)}")
print(df.head())

NameError: name 'CORN' is not defined

**Expected**: DataFrame with 50+ rows, columns like 'Commodity', 'Value', 'Year'
**If error**: Check API key is correct, internet connection works
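Since a wrong or placeholder API key is one of the failure causes listed above, it can help to read the key from the environment and fail early when it is missing. This is a minimal sketch; the variable name `USDA_NASS_API_KEY` and the helper `get_api_key` are assumptions for illustration, not names the project is known to use.

```python
import os


def get_api_key(env_var: str = "USDA_NASS_API_KEY") -> str:
    """Read the API key from the environment, rejecting empty or placeholder values."""
    key = os.environ.get(env_var, "").strip()
    if not key or key == "your_api_key":
        raise RuntimeError(
            f"Set the {env_var} environment variable to a real API key before running Test 2"
        )
    return key


# Example use in the Test 2 cell:
# df = usda_nass_to_df(api_key=get_api_key(), state="CA", commodity_ids=[2, 5], year=2023)
```

This also keeps the key out of the notebook, which matters if the notebook is committed to version control.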

#### Test 3: Extract Function (Full Local Test)

In [None]:
# Run locally
from src.ca_biositing.pipeline.ca_biositing.pipeline.etl.extract.usda_census_survey import extract

df = extract()
print(f"Rows: {len(df)}")
print(df.head())

**Expected**: DataFrame with data (row count depends on commodities mapped)
**If error**: Tests 1-2 must pass first
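Because each test depends on the ones before it, it can be convenient to run them in sequence and stop at the first failure. The sketch below shows one way to do that in plain Python. `run_in_order` is a hypothetical helper, and the callables in the usage comment are the functions imported in Tests 1 and 3.

```python
from typing import Callable


def run_in_order(checks: list[tuple[str, Callable[[], object]]]) -> list[tuple[str, str]]:
    """Run checks in sequence and stop at the first one that raises."""
    results = []
    for label, check in checks:
        try:
            check()
        except Exception as exc:
            results.append((label, f"failed: {exc}"))
            break  # later checks depend on this one, so skip them
        results.append((label, "passed"))
    return results


# Example use once the functions above are importable:
# run_in_order([
#     ("Test 1: database", get_mapped_commodity_ids),
#     ("Test 3: extract", extract),
# ])
```

Note that a check only counts as failed if it raises. A function that logs an error and returns `None`, as in the Test 1 output above, would need an explicit assertion on its return value.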

#### Test 4: Full Docker Pipeline

In [None]:
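# Shell commands: the leading "!" runs them from a notebook cell.
# Alternatively, run them without the "!" in a terminal.
!pixi run deploy
!pixi run run-etl
!pixi run service-logs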

**Expected**: See "Task finished successfully" in logs, check
http://0.0.0.0:4200 for green checkmarks
**If error**: See Phase 8 troubleshooting

## ✅ Next Steps

**If all tests passed:**
- The flow works correctly
- You can now register it with Prefect
- Add it to deployment configuration

**Next in Phase 5 of USDA_IMPLEMENTATION_CHECKLIST.md:**
- Register the flow with Prefect
- Add to deployment configuration
- Deploy: `pixi run deploy`