# Update gridVeg Point Intercepts in BigQuery

This notebook appends new point intercept data from a CSV file in Google Cloud Storage (GCS) to two BigQuery tables.

**Operation**: APPEND new rows (not replace entire table)

**Target Tables**:
- `gridVeg_point_intercept_vegetation` - vegetation intercept data (4 height layers)
- `gridVeg_point_intercept_ground` - ground cover intercept data

## Requirements
- Google Cloud credentials configured
- Configuration file: copy `config.example.yml` to `config.yml` and fill in your values
- Required packages: google-cloud-bigquery, google-cloud-storage, pandas, pyyaml, gcsfs (used by pandas to read `gs://` paths), pyarrow
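
The settings the notebook reads from `config.yml` can be sketched as the Python dictionary that `yaml.safe_load` would return. The key names are taken from the configuration cell below; every value is a placeholder, and `validate_config` is a hypothetical helper that mirrors the notebook's checks rather than code from the notebook itself.

```python
# Hypothetical illustration of the parsed config.yml structure.
# All values are placeholders and must be replaced with real ones.
EXAMPLE_CONFIG = {
    "gridveg_point_intercepts": {
        "gcs": {
            "csv_url": "gs://your-bucket/path/to/point_intercepts.csv",
            "backup_bucket": "your-bucket",  # optional
            "backup_prefix": "backups/gridveg_point_intercepts",  # optional
        },
        "bigquery": {
            "table_vegetation": "your-project.your_dataset.gridVeg_point_intercept_vegetation",
            "table_ground": "your-project.your_dataset.gridVeg_point_intercept_ground",
            "project": "your-project",  # optional
        },
    }
}


def validate_config(config: dict) -> list:
    """Return the dotted names of required settings that are missing or still placeholders."""
    section = config.get("gridveg_point_intercepts", {})
    gcs = section.get("gcs", {})
    bq = section.get("bigquery", {})
    problems = []

    csv_url = gcs.get("csv_url")
    if not csv_url or csv_url.startswith("gs://your-"):
        problems.append("gridveg_point_intercepts.gcs.csv_url")

    for key in ("table_vegetation", "table_ground"):
        value = bq.get(key)
        if not value or "your-project" in value:
            problems.append(f"gridveg_point_intercepts.bigquery.{key}")

    return problems


# The placeholder config is rejected on all three required settings.
print(validate_config(EXAMPLE_CONFIG))
```

Returning a list of problems instead of raising on the first one lets all unfinished settings be reported in a single pass.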


In [1]:
# Import required libraries
import yaml
import pandas as pd
from pathlib import Path
from google.cloud import bigquery
from google.cloud import storage
from datetime import datetime

print("Libraries imported successfully")


Libraries imported successfully


In [2]:
# Load configuration from YAML file
config_path = Path("../config.yml")

if not config_path.exists():
    raise FileNotFoundError(
        f"Configuration file not found: {config_path}\n"
        "Please copy config.example.yml to config.yml and fill in your values."
    )

with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

# Extract configuration values for gridVeg point intercepts
GCS_CSV_URL = config['gridveg_point_intercepts']['gcs']['csv_url']
BACKUP_BUCKET = config['gridveg_point_intercepts']['gcs'].get('backup_bucket')
BACKUP_PREFIX = config['gridveg_point_intercepts']['gcs'].get('backup_prefix', 'backups/gridveg_point_intercepts')

BQ_TABLE_VEGETATION = config['gridveg_point_intercepts']['bigquery']['table_vegetation']
BQ_TABLE_GROUND = config['gridveg_point_intercepts']['bigquery']['table_ground']
BQ_PROJECT = config['gridveg_point_intercepts']['bigquery'].get('project')

# Verify required config values
if not GCS_CSV_URL or GCS_CSV_URL.startswith('gs://your-'):
    raise ValueError("Please configure gridveg_point_intercepts.gcs.csv_url in config.yml")
if not BQ_TABLE_VEGETATION or 'your-project' in BQ_TABLE_VEGETATION:
    raise ValueError("Please configure gridveg_point_intercepts.bigquery.table_vegetation in config.yml")
if not BQ_TABLE_GROUND or 'your-project' in BQ_TABLE_GROUND:
    raise ValueError("Please configure gridveg_point_intercepts.bigquery.table_ground in config.yml")

print("✓ Configuration loaded successfully")
print(f"  CSV URL: {GCS_CSV_URL[:60]}..." if len(GCS_CSV_URL) > 60 else f"  CSV URL: {GCS_CSV_URL}")
print(f"  Vegetation Table: {BQ_TABLE_VEGETATION}")
print(f"  Ground Table: {BQ_TABLE_GROUND}")
print(f"  Backup: gs://{BACKUP_BUCKET}/{BACKUP_PREFIX}" if BACKUP_BUCKET else "  Backup: Not configured")


✓ Configuration loaded successfully
  CSV URL: gs://mpg-data-warehouse/gridVeg/src/2025/2025-09-18_gridVeg_...
  Vegetation Table: mpg-data-warehouse.vegetation_point_intercept_gridVeg.gridVeg_point_intercept_vegetation
  Ground Table: mpg-data-warehouse.vegetation_point_intercept_gridVeg.gridVeg_point_intercept_ground
  Backup: gs://mpg-data-warehouse/gridVeg/bak


In [3]:
# Initialize clients
bq_client = bigquery.Client(project=BQ_PROJECT) if BQ_PROJECT else bigquery.Client()
storage_client = storage.Client(project=BQ_PROJECT) if BQ_PROJECT else storage.Client()

print(f"✓ Clients initialized")
print(f"  Project: {bq_client.project}")


✓ Clients initialized
  Project: mpg-data-warehouse


In [4]:
# Read CSV from GCS (new data)
print("Reading CSV from GCS...")
df_new = pd.read_csv(GCS_CSV_URL)

print(f"✓ CSV loaded successfully:")
print(f"  Rows: {len(df_new)}")
print(f"  Columns: {list(df_new.columns)}")
print(f"\nFirst few rows:")
df_new.head()


Reading CSV from GCS...
✓ CSV loaded successfully:
  Rows: 7799
  Columns: ['Survey Data::__kp_Survey', 'Survey Data::_kf_Site', 'Survey Data::SurveyDate', 'Survey Data::SurveyYear', 'PointTrans', '_kf_Hit1_serial', 'Height', '_kf_Hit2_serial', '_kf_Hit3_serial', '_kf_Hit4_serial', 'GroundCover']

First few rows:


Unnamed: 0,Survey Data::__kp_Survey,Survey Data::_kf_Site,Survey Data::SurveyDate,Survey Data::SurveyYear,PointTrans,_kf_Hit1_serial,Height,_kf_Hit2_serial,_kf_Hit3_serial,_kf_Hit4_serial,GroundCover
0,B45700C5-D391-4679-8579-217DCB1385A2,227,5/21/25,2025,N1,529,,,,,L
1,B45700C5-D391-4679-8579-217DCB1385A2,227,5/21/25,2025,N2,360,,,,,L
2,B45700C5-D391-4679-8579-217DCB1385A2,227,5/21/25,2025,N3,529,,405.0,,,L
3,B45700C5-D391-4679-8579-217DCB1385A2,227,5/21/25,2025,N4,529,,,,,WDL
4,B45700C5-D391-4679-8579-217DCB1385A2,227,5/21/25,2025,N5,334,,,,,WDL


## Transform CSV Data

The source CSV will be transformed into two separate datasets:

### Vegetation Table Transformations
- Rename columns to match BigQuery schema
- Convert dates from m/d/yy strings (e.g., 5/21/25) to date values (YYYY-MM-DD)
- Filter out 2010 records (per requirements)
- Select columns: survey_ID, grid_point, date, year, transect_point, height_intercept_1, intercept_1-4

### Ground Table Transformations
- Rename columns to match BigQuery schema
- Convert dates from m/d/yy strings (e.g., 5/21/25) to date values (YYYY-MM-DD)
- Filter out 2010 records (per requirements)
- Select columns: survey_ID, grid_point, date, year, transect_point, intercept_1, intercept_ground_code


In [5]:
# Define column mapping for VEGETATION table
column_mapping_vegetation = {
    'Survey Data::__kp_Survey': 'survey_ID',
    'Survey Data::_kf_Site': 'grid_point',
    'Survey Data::SurveyDate': 'date',
    'Survey Data::SurveyYear': 'year',
    'PointTrans': 'transect_point',
    'Height': 'height_intercept_1',
    '_kf_Hit1_serial': 'intercept_1',
    '_kf_Hit2_serial': 'intercept_2',
    '_kf_Hit3_serial': 'intercept_3',
    '_kf_Hit4_serial': 'intercept_4'
}

print("Vegetation table column mapping:")
for csv_col, bq_col in column_mapping_vegetation.items():
    print(f"  {csv_col:30s} → {bq_col}")


Vegetation table column mapping:
  Survey Data::__kp_Survey       → survey_ID
  Survey Data::_kf_Site          → grid_point
  Survey Data::SurveyDate        → date
  Survey Data::SurveyYear        → year
  PointTrans                     → transect_point
  Height                         → height_intercept_1
  _kf_Hit1_serial                → intercept_1
  _kf_Hit2_serial                → intercept_2
  _kf_Hit3_serial                → intercept_3
  _kf_Hit4_serial                → intercept_4


In [6]:
# Define column mapping for GROUND table
column_mapping_ground = {
    'Survey Data::__kp_Survey': 'survey_ID',
    'Survey Data::_kf_Site': 'grid_point',
    'Survey Data::SurveyDate': 'date',
    'Survey Data::SurveyYear': 'year',
    'PointTrans': 'transect_point',
    '_kf_Hit1_serial': 'intercept_1',
    'GroundCover': 'intercept_ground_code'
}

print("Ground table column mapping:")
for csv_col, bq_col in column_mapping_ground.items():
    print(f"  {csv_col:30s} → {bq_col}")


Ground table column mapping:
  Survey Data::__kp_Survey       → survey_ID
  Survey Data::_kf_Site          → grid_point
  Survey Data::SurveyDate        → date
  Survey Data::SurveyYear        → year
  PointTrans                     → transect_point
  _kf_Hit1_serial                → intercept_1
  GroundCover                    → intercept_ground_code


In [7]:
# Transform data for VEGETATION table
print("=" * 60)
print("TRANSFORMING DATA FOR VEGETATION TABLE")
print("=" * 60)

# Select and rename columns
df_vegetation = df_new[list(column_mapping_vegetation.keys())].copy()
df_vegetation = df_vegetation.rename(columns=column_mapping_vegetation)

print(f"\n✓ Columns renamed")
print(f"  Transformed columns: {list(df_vegetation.columns)}")

# Convert date from m/d/yy to proper datetime/date format
df_vegetation['date'] = pd.to_datetime(df_vegetation['date'], format='%m/%d/%y').dt.date

print(f"✓ Date format converted to date type")
print(f"  Sample dates: {df_vegetation['date'].head(3).tolist()}")

# Filter out 2010 records
rows_before = len(df_vegetation)
df_vegetation = df_vegetation[df_vegetation['year'] != 2010].copy()
rows_after = len(df_vegetation)

print(f"\n✓ Filtered out 2010 records")
print(f"  Rows before: {rows_before}")
print(f"  Rows after:  {rows_after}")
print(f"  Removed:     {rows_before - rows_after}")

print(f"\nVegetation data preview:")
df_vegetation.head()


TRANSFORMING DATA FOR VEGETATION TABLE

✓ Columns renamed
  Transformed columns: ['survey_ID', 'grid_point', 'date', 'year', 'transect_point', 'height_intercept_1', 'intercept_1', 'intercept_2', 'intercept_3', 'intercept_4']
✓ Date format converted to date type
  Sample dates: [datetime.date(2025, 5, 21), datetime.date(2025, 5, 21), datetime.date(2025, 5, 21)]

✓ Filtered out 2010 records
  Rows before: 7799
  Rows after:  7799
  Removed:     0

Vegetation data preview:


Unnamed: 0,survey_ID,grid_point,date,year,transect_point,height_intercept_1,intercept_1,intercept_2,intercept_3,intercept_4
0,B45700C5-D391-4679-8579-217DCB1385A2,227,2025-05-21,2025,N1,,529,,,
1,B45700C5-D391-4679-8579-217DCB1385A2,227,2025-05-21,2025,N2,,360,,,
2,B45700C5-D391-4679-8579-217DCB1385A2,227,2025-05-21,2025,N3,,529,405.0,,
3,B45700C5-D391-4679-8579-217DCB1385A2,227,2025-05-21,2025,N4,,529,,,
4,B45700C5-D391-4679-8579-217DCB1385A2,227,2025-05-21,2025,N5,,334,,,


In [8]:
# Transform data for GROUND table
print("=" * 60)
print("TRANSFORMING DATA FOR GROUND TABLE")
print("=" * 60)

# Select and rename columns
df_ground = df_new[list(column_mapping_ground.keys())].copy()
df_ground = df_ground.rename(columns=column_mapping_ground)

print(f"\n✓ Columns renamed")
print(f"  Transformed columns: {list(df_ground.columns)}")

# Convert date from m/d/yy to proper datetime/date format
df_ground['date'] = pd.to_datetime(df_ground['date'], format='%m/%d/%y').dt.date

print(f"✓ Date format converted to date type")
print(f"  Sample dates: {df_ground['date'].head(3).tolist()}")

# Filter out 2010 records
rows_before = len(df_ground)
df_ground = df_ground[df_ground['year'] != 2010].copy()
rows_after = len(df_ground)

print(f"\n✓ Filtered out 2010 records")
print(f"  Rows before: {rows_before}")
print(f"  Rows after:  {rows_after}")
print(f"  Removed:     {rows_before - rows_after}")

print(f"\nGround data preview:")
df_ground.head()


TRANSFORMING DATA FOR GROUND TABLE

✓ Columns renamed
  Transformed columns: ['survey_ID', 'grid_point', 'date', 'year', 'transect_point', 'intercept_1', 'intercept_ground_code']
✓ Date format converted to date type
  Sample dates: [datetime.date(2025, 5, 21), datetime.date(2025, 5, 21), datetime.date(2025, 5, 21)]

✓ Filtered out 2010 records
  Rows before: 7799
  Rows after:  7799
  Removed:     0

Ground data preview:


Unnamed: 0,survey_ID,grid_point,date,year,transect_point,intercept_1,intercept_ground_code
0,B45700C5-D391-4679-8579-217DCB1385A2,227,2025-05-21,2025,N1,529,L
1,B45700C5-D391-4679-8579-217DCB1385A2,227,2025-05-21,2025,N2,360,L
2,B45700C5-D391-4679-8579-217DCB1385A2,227,2025-05-21,2025,N3,529,L
3,B45700C5-D391-4679-8579-217DCB1385A2,227,2025-05-21,2025,N4,529,WDL
4,B45700C5-D391-4679-8579-217DCB1385A2,227,2025-05-21,2025,N5,334,WDL
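
The vegetation and ground transformations differ only in their column mapping, so the shared steps could be factored into one function. The following is a minimal sketch under that assumption; `transform_intercepts` and the two-row sample frame are hypothetical and not part of the notebook.

```python
import pandas as pd


def transform_intercepts(df: pd.DataFrame, column_mapping: dict) -> pd.DataFrame:
    """Select and rename columns, parse m/d/yy dates, and drop 2010 records."""
    out = df[list(column_mapping.keys())].rename(columns=column_mapping)
    out["date"] = pd.to_datetime(out["date"], format="%m/%d/%y").dt.date
    return out[out["year"] != 2010].reset_index(drop=True)


# Made-up sample rows: one 2025 record to keep and one 2010 record to drop.
sample = pd.DataFrame({
    "Survey Data::__kp_Survey": ["A", "B"],
    "Survey Data::SurveyDate": ["5/21/25", "6/1/10"],
    "Survey Data::SurveyYear": [2025, 2010],
    "PointTrans": ["N1", "N2"],
})
mapping = {
    "Survey Data::__kp_Survey": "survey_ID",
    "Survey Data::SurveyDate": "date",
    "Survey Data::SurveyYear": "year",
    "PointTrans": "transect_point",
}

result = transform_intercepts(sample, mapping)
print(result)
```

With such a helper, each table would need a single call, for example `transform_intercepts(df_new, column_mapping_ground)`.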


## Read Existing BigQuery Tables

Load the current data from both BigQuery tables to compare with the new data.


In [9]:
# Read existing VEGETATION table from BigQuery
print(f"Reading existing VEGETATION data from {BQ_TABLE_VEGETATION}...")
query = f"SELECT * FROM `{BQ_TABLE_VEGETATION}`"

# Only a missing table means "no existing data"; any other error
# (credentials, network) is allowed to surface so it is not mistaken for one.
from google.api_core.exceptions import NotFound

try:
    df_existing_vegetation = bq_client.query(query).to_dataframe()
    print(f"✓ Existing vegetation table loaded:")
    print(f"  Rows: {len(df_existing_vegetation)}")
    print(f"  Columns: {list(df_existing_vegetation.columns)}")
    print(f"\nExisting vegetation data preview:")
    display(df_existing_vegetation.head())
except NotFound as e:
    print(f"⚠ Table not found: {e}")
    print("  This is expected if the table doesn't exist yet.")
    df_existing_vegetation = None


Reading existing VEGETATION data from mpg-data-warehouse.vegetation_point_intercept_gridVeg.gridVeg_point_intercept_vegetation...
✓ Existing vegetation table loaded:
  Rows: 291045
  Columns: ['survey_ID', 'grid_point', 'date', 'year', 'transect_point', 'height_intercept_1', 'intercept_1', 'intercept_2', 'intercept_3', 'intercept_4']

Existing vegetation data preview:


Unnamed: 0,survey_ID,grid_point,date,year,transect_point,height_intercept_1,intercept_1,intercept_2,intercept_3,intercept_4
0,01E14610-A291-4929-98A9-687132272D70,306,2023-05-24,2023,N3,,525,12,,
1,01E14610-A291-4929-98A9-687132272D70,306,2023-05-24,2023,N23,,525,5,529.0,
2,01E14610-A291-4929-98A9-687132272D70,306,2023-05-24,2023,N30,,525,5,,
3,01E14610-A291-4929-98A9-687132272D70,306,2023-05-24,2023,E5,,525,67,,
4,01E14610-A291-4929-98A9-687132272D70,306,2023-05-24,2023,E17,,525,12,,


In [10]:
# Read existing GROUND table from BigQuery
print(f"Reading existing GROUND data from {BQ_TABLE_GROUND}...")
query = f"SELECT * FROM `{BQ_TABLE_GROUND}`"

# Only a missing table means "no existing data"; any other error
# (credentials, network) is allowed to surface so it is not mistaken for one.
from google.api_core.exceptions import NotFound

try:
    df_existing_ground = bq_client.query(query).to_dataframe()
    print(f"✓ Existing ground table loaded:")
    print(f"  Rows: {len(df_existing_ground)}")
    print(f"  Columns: {list(df_existing_ground.columns)}")
    print(f"\nExisting ground data preview:")
    display(df_existing_ground.head())
except NotFound as e:
    print(f"⚠ Table not found: {e}")
    print("  This is expected if the table doesn't exist yet.")
    df_existing_ground = None


Reading existing GROUND data from mpg-data-warehouse.vegetation_point_intercept_gridVeg.gridVeg_point_intercept_ground...
✓ Existing ground table loaded:
  Rows: 291045
  Columns: ['survey_ID', 'grid_point', 'date', 'year', 'transect_point', 'intercept_1', 'intercept_ground_code']

Existing ground data preview:


Unnamed: 0,survey_ID,grid_point,date,year,transect_point,intercept_1,intercept_ground_code
0,0B3C78C3-5612-4BC3-A14F-B4D7A7041C35,586,2022-05-16,2022,E29,360,
1,0B3C78C3-5612-4BC3-A14F-B4D7A7041C35,586,2022-05-16,2022,E41,360,
2,0B3C78C3-5612-4BC3-A14F-B4D7A7041C35,586,2022-05-16,2022,E43,360,
3,0B3C78C3-5612-4BC3-A14F-B4D7A7041C35,586,2022-05-16,2022,E46,360,
4,0B3C78C3-5612-4BC3-A14F-B4D7A7041C35,586,2022-05-16,2022,E44,360,


## Compare New vs Existing Data

Identify which rows in the new data are not already in the existing tables.

**Unique identifier**: `survey_ID + transect_point` (each survey has multiple transect points)
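
A comparison on this identifier can also be written as an anti-join on the two key columns, which avoids adding a temporary column to either DataFrame. This is a minimal sketch with made-up rows; `find_new_rows` is a hypothetical helper, not the notebook's implementation.

```python
import pandas as pd

KEY_COLUMNS = ["survey_ID", "transect_point"]


def find_new_rows(df_new: pd.DataFrame, df_existing: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of df_new whose key does not appear in df_existing."""
    # Deduplicate the existing keys so the left join cannot multiply rows.
    existing_keys = df_existing[KEY_COLUMNS].drop_duplicates()
    merged = df_new.merge(existing_keys, on=KEY_COLUMNS, how="left", indicator=True)
    new_rows = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    return new_rows.reset_index(drop=True)


# Made-up data: survey S1 is already loaded, survey S2 is new.
existing = pd.DataFrame({
    "survey_ID": ["S1", "S1"],
    "transect_point": ["N1", "N2"],
    "intercept_1": [529, 360],
})
new = pd.DataFrame({
    "survey_ID": ["S1", "S2", "S2"],
    "transect_point": ["N2", "N1", "N2"],
    "intercept_1": [360, 334, 529],
})

to_append = find_new_rows(new, existing)
print(to_append)

# Keys repeated within the new data itself are worth checking separately.
print("Duplicate keys in new data:", int(new.duplicated(subset=KEY_COLUMNS).sum()))
```

Because the inputs are left unmodified, no helper column can leak into later cells or into the upload.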


In [11]:
# Compare VEGETATION datasets
print("=" * 60)
print("COMPARING VEGETATION DATA")
print("=" * 60)

if df_existing_vegetation is not None:
    print(f"\nRow count:")
    print(f"  Existing: {len(df_existing_vegetation)}")
    print(f"  New CSV:  {len(df_vegetation)}")
    
    # Column comparison
    existing_cols = set(df_existing_vegetation.columns)
    new_cols = set(df_vegetation.columns)
    
    if existing_cols == new_cols:
        print(f"\n✓ Columns match ({len(new_cols)} columns)")
    else:
        print("\n⚠ Column differences detected:")
        if new_cols - existing_cols:
            print(f"  New columns: {new_cols - existing_cols}")
        if existing_cols - new_cols:
            print(f"  Missing columns: {existing_cols - new_cols}")
    
    # Create composite key for comparison
    df_existing_vegetation['_composite_key'] = (
        df_existing_vegetation['survey_ID'].astype(str) + '|' + 
        df_existing_vegetation['transect_point'].astype(str)
    )
    df_vegetation['_composite_key'] = (
        df_vegetation['survey_ID'].astype(str) + '|' + 
        df_vegetation['transect_point'].astype(str)
    )
    
    existing_keys = set(df_existing_vegetation['_composite_key'])
    new_keys = set(df_vegetation['_composite_key'])
    
    # Find records to append
    keys_to_append = new_keys - existing_keys
    
    if keys_to_append:
        df_vegetation_to_append = df_vegetation[df_vegetation['_composite_key'].isin(keys_to_append)].copy()
        # Drop the temporary composite key
        df_vegetation_to_append = df_vegetation_to_append.drop(columns=['_composite_key'])
        
        print(f"\n✓ Found {len(df_vegetation_to_append)} new vegetation records to append")
        
        # Show year breakdown
        print(f"\nNew records by year:")
        year_counts = df_vegetation_to_append['year'].value_counts().sort_index()
        for year, count in year_counts.items():
            print(f"  {year}: {count} records")
    else:
        df_vegetation_to_append = None
        print("\n⚠ No new records found - all keys already exist in table")
    
    # Check for duplicates
    duplicate_keys = existing_keys & new_keys
    if duplicate_keys:
        print(f"\n⚠ Warning: {len(duplicate_keys)} records already exist in table")
        print(f"  These will be skipped during append.")
else:
    # No existing table, all records are new
    df_vegetation_to_append = df_vegetation.copy()
    print(f"✓ No existing table - will create new table with {len(df_vegetation_to_append)} records")


COMPARING VEGETATION DATA

Row count:
  Existing: 291045
  New CSV:  7799

✓ Columns match (10 columns)

✓ Found 7799 new vegetation records to append

New records by year:
  2025: 7799 records


In [12]:
# Compare GROUND datasets
print("=" * 60)
print("COMPARING GROUND DATA")
print("=" * 60)

if df_existing_ground is not None:
    print(f"\nRow count:")
    print(f"  Existing: {len(df_existing_ground)}")
    print(f"  New CSV:  {len(df_ground)}")
    
    # Column comparison
    existing_cols = set(df_existing_ground.columns)
    new_cols = set(df_ground.columns)
    
    if existing_cols == new_cols:
        print(f"\n✓ Columns match ({len(new_cols)} columns)")
    else:
        print("\n⚠ Column differences detected:")
        if new_cols - existing_cols:
            print(f"  New columns: {new_cols - existing_cols}")
        if existing_cols - new_cols:
            print(f"  Missing columns: {existing_cols - new_cols}")
    
    # Create composite key for comparison
    df_existing_ground['_composite_key'] = (
        df_existing_ground['survey_ID'].astype(str) + '|' + 
        df_existing_ground['transect_point'].astype(str)
    )
    df_ground['_composite_key'] = (
        df_ground['survey_ID'].astype(str) + '|' + 
        df_ground['transect_point'].astype(str)
    )
    
    existing_keys = set(df_existing_ground['_composite_key'])
    new_keys = set(df_ground['_composite_key'])
    
    # Find records to append
    keys_to_append = new_keys - existing_keys
    
    if keys_to_append:
        df_ground_to_append = df_ground[df_ground['_composite_key'].isin(keys_to_append)].copy()
        # Drop the temporary composite key
        df_ground_to_append = df_ground_to_append.drop(columns=['_composite_key'])
        
        print(f"\n✓ Found {len(df_ground_to_append)} new ground records to append")
        
        # Show year breakdown
        print(f"\nNew records by year:")
        year_counts = df_ground_to_append['year'].value_counts().sort_index()
        for year, count in year_counts.items():
            print(f"  {year}: {count} records")
    else:
        df_ground_to_append = None
        print("\n⚠ No new records found - all keys already exist in table")
    
    # Check for duplicates
    duplicate_keys = existing_keys & new_keys
    if duplicate_keys:
        print(f"\n⚠ Warning: {len(duplicate_keys)} records already exist in table")
        print(f"  These will be skipped during append.")
else:
    # No existing table, all records are new
    df_ground_to_append = df_ground.copy()
    print(f"✓ No existing table - will create new table with {len(df_ground_to_append)} records")


COMPARING GROUND DATA

Row count:
  Existing: 291045
  New CSV:  7799

✓ Columns match (7 columns)

✓ Found 7799 new ground records to append

New records by year:
  2025: 7799 records


## Diagnose Data Type Issues

Before uploading, compare the DataFrame dtypes with the BigQuery table schemas to spot type mismatches.


In [13]:
# Check current data types in transformed dataframes
print("=" * 60)
print("VEGETATION DATA TYPES")
print("=" * 60)
print(df_vegetation.dtypes)
print(f"\nSample date value: {df_vegetation['date'].iloc[0]}")
print(f"Date value type: {type(df_vegetation['date'].iloc[0])}")

print("\n" + "=" * 60)
print("GROUND DATA TYPES")
print("=" * 60)
print(df_ground.dtypes)

# Check BigQuery schema expectations
print("\n" + "=" * 60)
print("BIGQUERY SCHEMA EXPECTATIONS")
print("=" * 60)

table_veg = bq_client.get_table(BQ_TABLE_VEGETATION)
print("\nVEGETATION table schema:")
for field in table_veg.schema:
    print(f"  {field.name:25s}: {field.field_type:10s} (mode: {field.mode})")

table_ground = bq_client.get_table(BQ_TABLE_GROUND)
print("\nGROUND table schema:")
for field in table_ground.schema:
    print(f"  {field.name:25s}: {field.field_type:10s} (mode: {field.mode})")


VEGETATION DATA TYPES
survey_ID              object
grid_point              int64
date                   object
year                    int64
transect_point         object
height_intercept_1    float64
intercept_1             int64
intercept_2           float64
intercept_3           float64
intercept_4           float64
_composite_key         object
dtype: object

Sample date value: 2025-05-21
Date value type: <class 'datetime.date'>

GROUND DATA TYPES
survey_ID                object
grid_point                int64
date                     object
year                      int64
transect_point           object
intercept_1               int64
intercept_ground_code    object
_composite_key           object
dtype: object

BIGQUERY SCHEMA EXPECTATIONS

VEGETATION table schema:
  survey_ID                : STRING     (mode: NULLABLE)
  grid_point               : INTEGER    (mode: NULLABLE)
  date                     : DATE       (mode: NULLABLE)
  year                     : INTEGER    (mode:

### Understanding the PyArrow Error

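The `ArrowInvalid: Got bytestring of length 8 (expected 16)` error means PyArrow was building a 16-byte fixed-width value and received 8 bytes instead:

1. **8 bytes is the width of an `int64` or `float64` value**, so the offending column is probably numeric rather than a Python `date` column
2. **PyArrow expects different sizes** depending on the target type
3. **The "expected 16" in the message** points to a 128-bit (16-byte) target, such as the decimal type used for BigQuery `NUMERIC` columns or a UUID/GUID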

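**Common Causes:**
- **Numeric columns**: pandas `float64` or `int64` values loaded into a BigQuery `NUMERIC` column, which is 16 bytes wide; this matches the sizes in the message, so check the table schema for `NUMERIC` fields first
- **Date columns**: Python `date` objects → BigQuery `DATE` type (the conversion can fail when the column mixes dates with other values)
- **Nullable integers**: columns with missing values are stored as `float64` in pandas (as `intercept_2` through `intercept_4` are here) or as nullable `Int64`, and may not convert cleanly to BigQuery `INTEGER`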
- **Mixed string/UUID columns**: If `survey_ID` contains both integers and UUIDs, type inference fails
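
One way to narrow down the offending column is to compare each pandas dtype with the BigQuery field type it will be loaded into. The sketch below works on a plain `{column: field_type}` dictionary so it runs without a BigQuery connection. The `COMPATIBLE_KINDS` mapping, `find_type_mismatches`, and the sample schema are assumptions made for illustration; the mapping is deliberately conservative and is not a statement of what PyArrow accepts.

```python
import pandas as pd

# Assumed mapping from BigQuery field types to acceptable pandas dtype kinds
# (numpy kind codes: i=int, u=unsigned int, f=float, O=object, M=datetime64, b=bool).
# Field types missing from this mapping (e.g. NUMERIC) are always flagged for review.
COMPATIBLE_KINDS = {
    "STRING": {"O"},
    "INTEGER": {"i", "u"},
    "FLOAT": {"f", "i", "u"},
    "DATE": {"O", "M"},  # object columns holding datetime.date values
    "BOOLEAN": {"b"},
}


def find_type_mismatches(df: pd.DataFrame, schema: dict) -> list:
    """Return (column, field_type, pandas_dtype) for every suspicious column."""
    mismatches = []
    for column, field_type in schema.items():
        if column not in df.columns:
            mismatches.append((column, field_type, "missing"))
            continue
        kind = df[column].dtype.kind
        if kind not in COMPATIBLE_KINDS.get(field_type, set()):
            mismatches.append((column, field_type, str(df[column].dtype)))
    return mismatches


# Made-up frame: intercept_2 has a missing value, so pandas stores it as float64.
df = pd.DataFrame({
    "survey_ID": ["A", "B"],
    "intercept_1": [529, 360],
    "intercept_2": [405.0, None],
})
# Hypothetical schema; the real field types come from the table itself.
schema = {"survey_ID": "STRING", "intercept_1": "INTEGER", "intercept_2": "INTEGER"}

mismatches = find_type_mismatches(df, schema)
print(mismatches)
```

In the notebook, the schema dictionary could be built from the table object already fetched, for example `{field.name: field.field_type for field in table_veg.schema}`.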

**Solutions:**
1. **CSV Upload Method** (recommended): Upload as CSV, let BigQuery parse types
2. **Change BigQuery Schema**: Use `TIMESTAMP` instead of `DATE`, `FLOAT64` instead of `INTEGER` for nullable columns
3. **Fix Data Types**: Convert all columns to PyArrow-compatible types (strings, float64, regular int64)
