# Update gridVeg Point Intercepts in BigQuery

This notebook appends new point intercept data to BigQuery tables from a CSV file stored in GCS.

**Operation**: APPEND new rows; existing rows are kept and the table is not replaced

**Target Tables**:
- `gridVeg_point_intercept_vegetation` - vegetation intercept data (4 height layers)
- `gridVeg_point_intercept_ground` - ground cover intercept data

## Requirements
- Google Cloud credentials configured
- Configuration file: copy `config.example.yml` to `config.yml` and fill in your values
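- Required packages: google-cloud-bigquery, google-cloud-storage, pandas, pyyaml, gcsfs (pandas needs it to read `gs://` URLs), db-dtypes (needed by `to_dataframe()` in google-cloud-bigquery 3.x)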
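
The configuration keys this notebook reads can be sketched as the dictionary that `yaml.safe_load` would return. Only the key names come from the notebook; every value below is a placeholder, and `missing_keys` is a hypothetical helper added for illustration.

```python
# Expected shape of config.yml, written as the dict yaml.safe_load would return.
# All values are placeholders; only the key names are taken from the notebook.
EXAMPLE_CONFIG = {
    "gridveg_point_intercepts": {
        "gcs": {
            "csv_url": "gs://example-bucket/path/to/point_intercepts.csv",
            "backup_bucket": "example-bucket",  # optional
            "backup_prefix": "backups/gridveg_point_intercepts",  # optional
        },
        "bigquery": {
            "table_vegetation": "example-project.example_dataset.gridVeg_point_intercept_vegetation",
            "table_ground": "example-project.example_dataset.gridVeg_point_intercept_ground",
            "project": "example-project",  # optional
        },
    }
}

# Keys the notebook treats as required (group, key)
REQUIRED_KEYS = [
    ("gcs", "csv_url"),
    ("bigquery", "table_vegetation"),
    ("bigquery", "table_ground"),
]


def missing_keys(config):
    """Return dotted names of required keys that are absent or empty (hypothetical helper)."""
    section = config.get("gridveg_point_intercepts") or {}
    missing = []
    for group, key in REQUIRED_KEYS:
        if not (section.get(group) or {}).get(key):
            missing.append(f"gridveg_point_intercepts.{group}.{key}")
    return missing


print(missing_keys(EXAMPLE_CONFIG))
```

An empty list means all required keys are present; the optional keys fall back to defaults or `None` in the configuration cell.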


In [1]:
# Import required libraries
import yaml
import pandas as pd
from pathlib import Path
from google.cloud import bigquery
from google.cloud import storage
from datetime import datetime

print("Libraries imported successfully")


Libraries imported successfully


In [2]:
# Load configuration from YAML file
config_path = Path("../config.yml")

if not config_path.exists():
    raise FileNotFoundError(
        f"Configuration file not found: {config_path}\n"
        "Please copy config.example.yml to config.yml and fill in your values."
    )

with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

# Extract configuration values for gridVeg point intercepts
GCS_CSV_URL = config['gridveg_point_intercepts']['gcs']['csv_url']
BACKUP_BUCKET = config['gridveg_point_intercepts']['gcs'].get('backup_bucket')
BACKUP_PREFIX = config['gridveg_point_intercepts']['gcs'].get('backup_prefix', 'backups/gridveg_point_intercepts')

BQ_TABLE_VEGETATION = config['gridveg_point_intercepts']['bigquery']['table_vegetation']
BQ_TABLE_GROUND = config['gridveg_point_intercepts']['bigquery']['table_ground']
BQ_PROJECT = config['gridveg_point_intercepts']['bigquery'].get('project')

# Verify required config values
if not GCS_CSV_URL or GCS_CSV_URL.startswith('gs://your-'):
    raise ValueError("Please configure gridveg_point_intercepts.gcs.csv_url in config.yml")
if not BQ_TABLE_VEGETATION or 'your-project' in BQ_TABLE_VEGETATION:
    raise ValueError("Please configure gridveg_point_intercepts.bigquery.table_vegetation in config.yml")
if not BQ_TABLE_GROUND or 'your-project' in BQ_TABLE_GROUND:
    raise ValueError("Please configure gridveg_point_intercepts.bigquery.table_ground in config.yml")

print("✓ Configuration loaded successfully")
print(f"  CSV URL: {GCS_CSV_URL[:60]}..." if len(GCS_CSV_URL) > 60 else f"  CSV URL: {GCS_CSV_URL}")
print(f"  Vegetation Table: {BQ_TABLE_VEGETATION}")
print(f"  Ground Table: {BQ_TABLE_GROUND}")
print(f"  Backup: gs://{BACKUP_BUCKET}/{BACKUP_PREFIX}" if BACKUP_BUCKET else "  Backup: Not configured")


✓ Configuration loaded successfully
  CSV URL: gs://mpg-data-warehouse/gridVeg/src/2025/2025-09-18_gridVeg_...
  Vegetation Table: mpg-data-warehouse.vegetation_point_intercept_gridVeg.gridVeg_point_intercept_vegetation
  Ground Table: mpg-data-warehouse.vegetation_point_intercept_gridVeg.gridVeg_point_intercept_ground
  Backup: gs://mpg-data-warehouse/gridVeg/bak


In [3]:
# Initialize clients
bq_client = bigquery.Client(project=BQ_PROJECT) if BQ_PROJECT else bigquery.Client()
storage_client = storage.Client(project=BQ_PROJECT) if BQ_PROJECT else storage.Client()

print(f"✓ Clients initialized")
print(f"  Project: {bq_client.project}")


✓ Clients initialized
  Project: mpg-data-warehouse


In [4]:
# Read CSV from GCS (new data)
print("Reading CSV from GCS...")
df_new = pd.read_csv(GCS_CSV_URL)

print(f"✓ CSV loaded successfully:")
print(f"  Rows: {len(df_new)}")
print(f"  Columns: {list(df_new.columns)}")
print(f"\nFirst few rows:")
df_new.head()


Reading CSV from GCS...
✓ CSV loaded successfully:
  Rows: 7799
  Columns: ['Survey Data::__kp_Survey', 'Survey Data::_kf_Site', 'Survey Data::SurveyDate', 'Survey Data::SurveyYear', 'PointTrans', '_kf_Hit1_serial', 'Height', '_kf_Hit2_serial', '_kf_Hit3_serial', '_kf_Hit4_serial', 'GroundCover']

First few rows:


| | `Survey Data::__kp_Survey` | `Survey Data::_kf_Site` | `Survey Data::SurveyDate` | `Survey Data::SurveyYear` | `PointTrans` | `_kf_Hit1_serial` | `Height` | `_kf_Hit2_serial` | `_kf_Hit3_serial` | `_kf_Hit4_serial` | `GroundCover` |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 5/21/25 | 2025 | N1 | 529 | | | | | L |
| 1 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 5/21/25 | 2025 | N2 | 360 | | | | | L |
| 2 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 5/21/25 | 2025 | N3 | 529 | | 405.0 | | | L |
| 3 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 5/21/25 | 2025 | N4 | 529 | | | | | WDL |
| 4 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 5/21/25 | 2025 | N5 | 334 | | | | | WDL |


## Transform CSV Data

The source CSV will be transformed into two separate datasets:

### Vegetation Table Transformations
- Rename columns to match the BigQuery schema
- Convert dates from m/d/yy (e.g. `5/21/25`) to ISO format (YYYY-MM-DD)
- Filter out 2010 records (per requirements)
- Select columns: survey_ID, grid_point, date, year, transect_point, height_intercept_1, intercept_1 through intercept_4

### Ground Table Transformations
- Rename columns to match the BigQuery schema
- Convert dates from m/d/yy (e.g. `5/21/25`) to ISO format (YYYY-MM-DD)
- Filter out 2010 records (per requirements)
- Select columns: survey_ID, grid_point, date, year, transect_point, intercept_1, intercept_ground_code
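
As a minimal sketch of the date conversion and the 2010 filter, the same logic can be written with the standard library alone. The sample rows are hypothetical; the notebook cells below apply the equivalent steps with pandas.

```python
from datetime import datetime


def to_iso_date(value):
    """Convert an m/d/yy string such as '5/21/25' to 'YYYY-MM-DD'."""
    return datetime.strptime(value, "%m/%d/%y").strftime("%Y-%m-%d")


# Hypothetical sample rows that already use the renamed column names
rows = [
    {"survey_ID": "A", "date": "5/21/25", "year": 2025},
    {"survey_ID": "B", "date": "6/3/10", "year": 2010},
]

# Reformat the date and drop 2010 records in one pass
transformed = [
    {**row, "date": to_iso_date(row["date"])}
    for row in rows
    if row["year"] != 2010
]

print(transformed)
# prints [{'survey_ID': 'A', 'date': '2025-05-21', 'year': 2025}]
```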


In [5]:
# Define column mapping for VEGETATION table
column_mapping_vegetation = {
    'Survey Data::__kp_Survey': 'survey_ID',
    'Survey Data::_kf_Site': 'grid_point',
    'Survey Data::SurveyDate': 'date',
    'Survey Data::SurveyYear': 'year',
    'PointTrans': 'transect_point',
    'Height': 'height_intercept_1',
    '_kf_Hit1_serial': 'intercept_1',
    '_kf_Hit2_serial': 'intercept_2',
    '_kf_Hit3_serial': 'intercept_3',
    '_kf_Hit4_serial': 'intercept_4'
}

print("Vegetation table column mapping:")
for csv_col, bq_col in column_mapping_vegetation.items():
    print(f"  {csv_col:30s} → {bq_col}")


Vegetation table column mapping:
  Survey Data::__kp_Survey       → survey_ID
  Survey Data::_kf_Site          → grid_point
  Survey Data::SurveyDate        → date
  Survey Data::SurveyYear        → year
  PointTrans                     → transect_point
  Height                         → height_intercept_1
  _kf_Hit1_serial                → intercept_1
  _kf_Hit2_serial                → intercept_2
  _kf_Hit3_serial                → intercept_3
  _kf_Hit4_serial                → intercept_4


In [6]:
# Define column mapping for GROUND table
column_mapping_ground = {
    'Survey Data::__kp_Survey': 'survey_ID',
    'Survey Data::_kf_Site': 'grid_point',
    'Survey Data::SurveyDate': 'date',
    'Survey Data::SurveyYear': 'year',
    'PointTrans': 'transect_point',
    '_kf_Hit1_serial': 'intercept_1',
    'GroundCover': 'intercept_ground_code'
}

print("Ground table column mapping:")
for csv_col, bq_col in column_mapping_ground.items():
    print(f"  {csv_col:30s} → {bq_col}")


Ground table column mapping:
  Survey Data::__kp_Survey       → survey_ID
  Survey Data::_kf_Site          → grid_point
  Survey Data::SurveyDate        → date
  Survey Data::SurveyYear        → year
  PointTrans                     → transect_point
  _kf_Hit1_serial                → intercept_1
  GroundCover                    → intercept_ground_code


In [7]:
# Transform data for VEGETATION table
print("=" * 60)
print("TRANSFORMING DATA FOR VEGETATION TABLE")
print("=" * 60)

# Select and rename columns
df_vegetation = df_new[list(column_mapping_vegetation.keys())].copy()
df_vegetation = df_vegetation.rename(columns=column_mapping_vegetation)

print(f"\n✓ Columns renamed")
print(f"  Transformed columns: {list(df_vegetation.columns)}")

# Convert date from m/d/yy to YYYY-MM-DD string format for CSV upload
df_vegetation['date'] = pd.to_datetime(df_vegetation['date'], format='%m/%d/%y').dt.strftime('%Y-%m-%d')

print(f"✓ Date format converted to YYYY-MM-DD string")
print(f"  Sample dates: {df_vegetation['date'].head(3).tolist()}")

# Filter out 2010 records
rows_before = len(df_vegetation)
df_vegetation = df_vegetation[df_vegetation['year'] != 2010].copy()
rows_after = len(df_vegetation)

print(f"\n✓ Filtered out 2010 records")
print(f"  Rows before: {rows_before}")
print(f"  Rows after:  {rows_after}")
print(f"  Removed:     {rows_before - rows_after}")

print(f"\nVegetation data preview:")
df_vegetation.head()


TRANSFORMING DATA FOR VEGETATION TABLE

✓ Columns renamed
  Transformed columns: ['survey_ID', 'grid_point', 'date', 'year', 'transect_point', 'height_intercept_1', 'intercept_1', 'intercept_2', 'intercept_3', 'intercept_4']
✓ Date format converted to YYYY-MM-DD string
  Sample dates: ['2025-05-21', '2025-05-21', '2025-05-21']

✓ Filtered out 2010 records
  Rows before: 7799
  Rows after:  7799
  Removed:     0

Vegetation data preview:


| | survey_ID | grid_point | date | year | transect_point | height_intercept_1 | intercept_1 | intercept_2 | intercept_3 | intercept_4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 2025-05-21 | 2025 | N1 | | 529 | | | |
| 1 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 2025-05-21 | 2025 | N2 | | 360 | | | |
| 2 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 2025-05-21 | 2025 | N3 | | 529 | 405.0 | | |
| 3 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 2025-05-21 | 2025 | N4 | | 529 | | | |
| 4 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 2025-05-21 | 2025 | N5 | | 334 | | | |


In [8]:
# Transform data for GROUND table
print("=" * 60)
print("TRANSFORMING DATA FOR GROUND TABLE")
print("=" * 60)

# Select and rename columns
df_ground = df_new[list(column_mapping_ground.keys())].copy()
df_ground = df_ground.rename(columns=column_mapping_ground)

print(f"\n✓ Columns renamed")
print(f"  Transformed columns: {list(df_ground.columns)}")

# Convert date from m/d/yy to YYYY-MM-DD string format for CSV upload
df_ground['date'] = pd.to_datetime(df_ground['date'], format='%m/%d/%y').dt.strftime('%Y-%m-%d')

print(f"✓ Date format converted to YYYY-MM-DD string")
print(f"  Sample dates: {df_ground['date'].head(3).tolist()}")

# Filter out 2010 records
rows_before = len(df_ground)
df_ground = df_ground[df_ground['year'] != 2010].copy()
rows_after = len(df_ground)

print(f"\n✓ Filtered out 2010 records")
print(f"  Rows before: {rows_before}")
print(f"  Rows after:  {rows_after}")
print(f"  Removed:     {rows_before - rows_after}")

print(f"\nGround data preview:")
df_ground.head()


TRANSFORMING DATA FOR GROUND TABLE

✓ Columns renamed
  Transformed columns: ['survey_ID', 'grid_point', 'date', 'year', 'transect_point', 'intercept_1', 'intercept_ground_code']
✓ Date format converted to YYYY-MM-DD string
  Sample dates: ['2025-05-21', '2025-05-21', '2025-05-21']

✓ Filtered out 2010 records
  Rows before: 7799
  Rows after:  7799
  Removed:     0

Ground data preview:


| | survey_ID | grid_point | date | year | transect_point | intercept_1 | intercept_ground_code |
|---|---|---|---|---|---|---|---|
| 0 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 2025-05-21 | 2025 | N1 | 529 | L |
| 1 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 2025-05-21 | 2025 | N2 | 360 | L |
| 2 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 2025-05-21 | 2025 | N3 | 529 | L |
| 3 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 2025-05-21 | 2025 | N4 | 529 | WDL |
| 4 | B45700C5-D391-4679-8579-217DCB1385A2 | 227 | 2025-05-21 | 2025 | N5 | 334 | WDL |


## Read Existing BigQuery Tables

Load the current data from both BigQuery tables to compare with the new data.


In [9]:
# Read existing VEGETATION table from BigQuery
print(f"Reading existing VEGETATION data from {BQ_TABLE_VEGETATION}...")
query = f"SELECT * FROM `{BQ_TABLE_VEGETATION}`"

try:
    df_existing_vegetation = bq_client.query(query).to_dataframe()
    print(f"✓ Existing vegetation table loaded:")
    print(f"  Rows: {len(df_existing_vegetation)}")
    print(f"  Columns: {list(df_existing_vegetation.columns)}")
    print(f"\nExisting vegetation data preview:")
    display(df_existing_vegetation.head())
except Exception as e:
    print(f"⚠ Error reading table: {e}")
    print("  This may be expected if the table doesn't exist yet.")
    df_existing_vegetation = None


Reading existing VEGETATION data from mpg-data-warehouse.vegetation_point_intercept_gridVeg.gridVeg_point_intercept_vegetation...
✓ Existing vegetation table loaded:
  Rows: 291045
  Columns: ['survey_ID', 'grid_point', 'date', 'year', 'transect_point', 'height_intercept_1', 'intercept_1', 'intercept_2', 'intercept_3', 'intercept_4']

Existing vegetation data preview:


| | survey_ID | grid_point | date | year | transect_point | height_intercept_1 | intercept_1 | intercept_2 | intercept_3 | intercept_4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01E14610-A291-4929-98A9-687132272D70 | 306 | 2023-05-24 | 2023 | N3 | | 525 | 12 | | |
| 1 | 01E14610-A291-4929-98A9-687132272D70 | 306 | 2023-05-24 | 2023 | N23 | | 525 | 5 | 529.0 | |
| 2 | 01E14610-A291-4929-98A9-687132272D70 | 306 | 2023-05-24 | 2023 | N30 | | 525 | 5 | | |
| 3 | 01E14610-A291-4929-98A9-687132272D70 | 306 | 2023-05-24 | 2023 | E5 | | 525 | 67 | | |
| 4 | 01E14610-A291-4929-98A9-687132272D70 | 306 | 2023-05-24 | 2023 | E17 | | 525 | 12 | | |


In [10]:
# Read existing GROUND table from BigQuery
print(f"Reading existing GROUND data from {BQ_TABLE_GROUND}...")
query = f"SELECT * FROM `{BQ_TABLE_GROUND}`"

try:
    df_existing_ground = bq_client.query(query).to_dataframe()
    print(f"✓ Existing ground table loaded:")
    print(f"  Rows: {len(df_existing_ground)}")
    print(f"  Columns: {list(df_existing_ground.columns)}")
    print(f"\nExisting ground data preview:")
    display(df_existing_ground.head())
except Exception as e:
    print(f"⚠ Error reading table: {e}")
    print("  This may be expected if the table doesn't exist yet.")
    df_existing_ground = None


Reading existing GROUND data from mpg-data-warehouse.vegetation_point_intercept_gridVeg.gridVeg_point_intercept_ground...
✓ Existing ground table loaded:
  Rows: 291045
  Columns: ['survey_ID', 'grid_point', 'date', 'year', 'transect_point', 'intercept_1', 'intercept_ground_code']

Existing ground data preview:


| | survey_ID | grid_point | date | year | transect_point | intercept_1 | intercept_ground_code |
|---|---|---|---|---|---|---|---|
| 0 | 0B3C78C3-5612-4BC3-A14F-B4D7A7041C35 | 586 | 2022-05-16 | 2022 | E29 | 360 | |
| 1 | 0B3C78C3-5612-4BC3-A14F-B4D7A7041C35 | 586 | 2022-05-16 | 2022 | E41 | 360 | |
| 2 | 0B3C78C3-5612-4BC3-A14F-B4D7A7041C35 | 586 | 2022-05-16 | 2022 | E43 | 360 | |
| 3 | 0B3C78C3-5612-4BC3-A14F-B4D7A7041C35 | 586 | 2022-05-16 | 2022 | E46 | 360 | |
| 4 | 0B3C78C3-5612-4BC3-A14F-B4D7A7041C35 | 586 | 2022-05-16 | 2022 | E44 | 360 | |


## Compare New vs Existing Data

Identify which rows in the new data are not already in the existing tables.

**Unique identifier**: `survey_ID + transect_point` (each survey has multiple transect points)
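
The composite-key comparison can be sketched with plain Python sets. The rows below are hypothetical; the comparison cells that follow build the same `survey_ID|transect_point` key on the DataFrames.

```python
def composite_key(row):
    """Join survey_ID and transect_point into a single comparable key."""
    return f"{row['survey_ID']}|{row['transect_point']}"


# Hypothetical rows standing in for the existing table and the new CSV
existing_rows = [
    {"survey_ID": "S1", "transect_point": "N1"},
    {"survey_ID": "S1", "transect_point": "N2"},
]
new_rows = [
    {"survey_ID": "S1", "transect_point": "N2"},  # already in the table
    {"survey_ID": "S2", "transect_point": "N1"},  # not yet in the table
]

existing_keys = {composite_key(row) for row in existing_rows}

# Keep only rows whose key is absent from the existing table
rows_to_append = [row for row in new_rows if composite_key(row) not in existing_keys]

print(rows_to_append)
# prints [{'survey_ID': 'S2', 'transect_point': 'N1'}]
```

This comparison only checks new keys against existing keys. Rows that share a key within the new file itself would all pass through, so a separate duplicate check on the incoming data may be worthwhile.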


In [11]:
# Compare VEGETATION datasets
print("=" * 60)
print("COMPARING VEGETATION DATA")
print("=" * 60)

if df_existing_vegetation is not None:
    print(f"\nRow count:")
    print(f"  Existing: {len(df_existing_vegetation)}")
    print(f"  New CSV:  {len(df_vegetation)}")
    
    # Column comparison
    existing_cols = set(df_existing_vegetation.columns)
    new_cols = set(df_vegetation.columns)
    
    if existing_cols == new_cols:
        print(f"\n✓ Columns match ({len(new_cols)} columns)")
    else:
        print("\n⚠ Column differences detected:")
        if new_cols - existing_cols:
            print(f"  New columns: {new_cols - existing_cols}")
        if existing_cols - new_cols:
            print(f"  Missing columns: {existing_cols - new_cols}")
    
    # Create composite key for comparison
    df_existing_vegetation['_composite_key'] = (
        df_existing_vegetation['survey_ID'].astype(str) + '|' + 
        df_existing_vegetation['transect_point'].astype(str)
    )
    df_vegetation['_composite_key'] = (
        df_vegetation['survey_ID'].astype(str) + '|' + 
        df_vegetation['transect_point'].astype(str)
    )
    
    existing_keys = set(df_existing_vegetation['_composite_key'])
    new_keys = set(df_vegetation['_composite_key'])
    
    # Find records to append
    keys_to_append = new_keys - existing_keys
    
    if keys_to_append:
        df_vegetation_to_append = df_vegetation[df_vegetation['_composite_key'].isin(keys_to_append)].copy()
        # Drop the temporary composite key
        df_vegetation_to_append = df_vegetation_to_append.drop(columns=['_composite_key'])
        
        print(f"\n✓ Found {len(df_vegetation_to_append)} new vegetation records to append")
        
        # Show year breakdown
        print(f"\nNew records by year:")
        year_counts = df_vegetation_to_append['year'].value_counts().sort_index()
        for year, count in year_counts.items():
            print(f"  {year}: {count} records")
    else:
        df_vegetation_to_append = None
        print("\n⚠ No new records found - all keys already exist in table")
    
    # Check for duplicates
    duplicate_keys = existing_keys & new_keys
    if duplicate_keys:
        print(f"\n⚠ Warning: {len(duplicate_keys)} records already exist in table")
        print(f"  These will be skipped during append.")
else:
    # No existing table, all records are new
    df_vegetation_to_append = df_vegetation.copy()
    print(f"✓ No existing table - will create new table with {len(df_vegetation_to_append)} records")


COMPARING VEGETATION DATA

Row count:
  Existing: 291045
  New CSV:  7799

✓ Columns match (10 columns)

✓ Found 7799 new vegetation records to append

New records by year:
  2025: 7799 records


In [12]:
# Compare GROUND datasets
print("=" * 60)
print("COMPARING GROUND DATA")
print("=" * 60)

if df_existing_ground is not None:
    print(f"\nRow count:")
    print(f"  Existing: {len(df_existing_ground)}")
    print(f"  New CSV:  {len(df_ground)}")
    
    # Column comparison
    existing_cols = set(df_existing_ground.columns)
    new_cols = set(df_ground.columns)
    
    if existing_cols == new_cols:
        print(f"\n✓ Columns match ({len(new_cols)} columns)")
    else:
        print("\n⚠ Column differences detected:")
        if new_cols - existing_cols:
            print(f"  New columns: {new_cols - existing_cols}")
        if existing_cols - new_cols:
            print(f"  Missing columns: {existing_cols - new_cols}")
    
    # Create composite key for comparison
    df_existing_ground['_composite_key'] = (
        df_existing_ground['survey_ID'].astype(str) + '|' + 
        df_existing_ground['transect_point'].astype(str)
    )
    df_ground['_composite_key'] = (
        df_ground['survey_ID'].astype(str) + '|' + 
        df_ground['transect_point'].astype(str)
    )
    
    existing_keys = set(df_existing_ground['_composite_key'])
    new_keys = set(df_ground['_composite_key'])
    
    # Find records to append
    keys_to_append = new_keys - existing_keys
    
    if keys_to_append:
        df_ground_to_append = df_ground[df_ground['_composite_key'].isin(keys_to_append)].copy()
        # Drop the temporary composite key
        df_ground_to_append = df_ground_to_append.drop(columns=['_composite_key'])
        
        print(f"\n✓ Found {len(df_ground_to_append)} new ground records to append")
        
        # Show year breakdown
        print(f"\nNew records by year:")
        year_counts = df_ground_to_append['year'].value_counts().sort_index()
        for year, count in year_counts.items():
            print(f"  {year}: {count} records")
    else:
        df_ground_to_append = None
        print("\n⚠ No new records found - all keys already exist in table")
    
    # Check for duplicates
    duplicate_keys = existing_keys & new_keys
    if duplicate_keys:
        print(f"\n⚠ Warning: {len(duplicate_keys)} records already exist in table")
        print(f"  These will be skipped during append.")
else:
    # No existing table, all records are new
    df_ground_to_append = df_ground.copy()
    print(f"✓ No existing table - will create new table with {len(df_ground_to_append)} records")


COMPARING GROUND DATA

Row count:
  Existing: 291045
  New CSV:  7799

✓ Columns match (7 columns)

✓ Found 7799 new ground records to append

New records by year:
  2025: 7799 records


## Upload Data to BigQuery

Upload new records to BigQuery using CSV format:

1. **Prepare Data**: Convert all columns to strings. Integer columns that contain missing values are held by pandas as float64, so without this step a value such as `405` would be written as `405.0`
2. **Write to CSV**: Create temporary CSV files from the string data
3. **Upload to BigQuery**: Load the CSV files into BigQuery with `autodetect=False`, so values are parsed against the existing table schema
4. **Cleanup**: Remove temporary files

This approach avoids PyArrow serialization issues when using `load_table_from_dataframe`.
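
The integer formatting in step 1 can be illustrated without pandas. This is a minimal sketch; `format_int_cell` is a hypothetical name, and the preparation cell applies the same rule through a lambda with `pd.notna`.

```python
import math


def format_int_cell(value):
    """Render a numeric cell as an integer string, or '' when the value is missing."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return str(int(value))


print([format_int_cell(v) for v in [405.0, float("nan"), 529, None]])
# prints ['405', '', '529', '']
```

Writing `'405'` rather than `'405.0'` matters because the CSV is parsed against INT64 columns in the existing table schema.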


In [None]:
# Prepare data for CSV upload by converting all columns to strings
# For float columns that should be integers, convert to int first (removing .0)
# BigQuery will handle the type conversions based on the existing table schema

print("Preparing data for CSV export...")

if df_vegetation_to_append is not None and len(df_vegetation_to_append) > 0:
    df_vegetation_csv = df_vegetation_to_append.copy()
    
    # Integer columns that may be float64 due to NaN
    int_cols = ['grid_point', 'year', 'intercept_1', 'intercept_2', 'intercept_3', 'intercept_4']
    
    # Render each value as an integer string (e.g. 405.0 -> '405'); missing values become ''
    for col in int_cols:
        if col in df_vegetation_csv.columns:
            df_vegetation_csv[col] = df_vegetation_csv[col].apply(
                lambda x: str(int(x)) if pd.notna(x) else ''
            )
    
    # Convert remaining columns to strings
    for col in df_vegetation_csv.columns:
        if col not in int_cols:
            df_vegetation_csv[col] = df_vegetation_csv[col].astype(str).replace('nan', '').replace('<NA>', '')
    
    print(f"✓ Vegetation data prepared: {len(df_vegetation_csv)} rows, all columns as strings")
else:
    df_vegetation_csv = None
    print("✓ No vegetation data to prepare")

if df_ground_to_append is not None and len(df_ground_to_append) > 0:
    df_ground_csv = df_ground_to_append.copy()
    
    # Integer columns that may be float64 due to NaN
    int_cols = ['grid_point', 'year', 'intercept_1']
    
    # Render each value as an integer string (e.g. 405.0 -> '405'); missing values become ''
    for col in int_cols:
        if col in df_ground_csv.columns:
            df_ground_csv[col] = df_ground_csv[col].apply(
                lambda x: str(int(x)) if pd.notna(x) else ''
            )
    
    # Convert remaining columns to strings
    for col in df_ground_csv.columns:
        if col not in int_cols:
            df_ground_csv[col] = df_ground_csv[col].astype(str).replace('nan', '').replace('<NA>', '')
    
    print(f"✓ Ground data prepared: {len(df_ground_csv)} rows, all columns as strings")
else:
    df_ground_csv = None
    print("✓ No ground data to prepare")


Preparing data for CSV export...
✓ Vegetation data prepared: 7799 rows, all columns as strings
✓ Ground data prepared: 7799 rows, all columns as strings
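
Before uploading, it may help to confirm that the integer columns contain only integer-formatted strings, since the BigQuery CSV loader rejects a value such as `405.0` for an INT64 column. The following is a sketch using only the standard library; `find_non_integer_cells` is a hypothetical helper, not part of the notebook.

```python
import csv
import io
import re

INT_PATTERN = re.compile(r"-?\d+")


def find_non_integer_cells(csv_text, int_columns, limit=10):
    """Return (line_number, column, value) for non-empty cells that are not plain integers."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for column in int_columns:
            value = row.get(column) or ""
            if value and not INT_PATTERN.fullmatch(value):
                problems.append((line_number, column, value))
                if len(problems) >= limit:
                    return problems
    return problems


# Hypothetical CSV text with one badly formatted integer
sample = "transect_point,intercept_1,intercept_2\nN1,529,\nN3,529,405.0\n"
print(find_non_integer_cells(sample, ["intercept_1", "intercept_2"]))
# prints [(3, 'intercept_2', '405.0')]
```

In this notebook the check could be run on `df_vegetation_csv.to_csv(index=False)` with the integer column list from the preparation cell; an empty result suggests the integer columns are safe to load.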


In [14]:
# Upload VEGETATION data
import os
import tempfile

print("=" * 60)
print("UPLOADING VEGETATION DATA")
print("=" * 60)

if df_vegetation_csv is not None and len(df_vegetation_csv) > 0:
    
    print(f"\nRecords to upload: {len(df_vegetation_csv)}")
    print(f"Target table: {BQ_TABLE_VEGETATION}")
    
    # Write to temporary CSV file (all columns as strings)
    with tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False, newline='') as f:
        temp_csv_path = f.name
        df_vegetation_csv.to_csv(f, index=False)
    
    print(f"✓ Wrote {len(df_vegetation_csv)} rows to temporary CSV")
    
    # Configure load job
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # Skip header row
        write_disposition="WRITE_APPEND",  # Append to existing table
        autodetect=False,  # Use existing table schema
    )
    
    # Upload CSV file to BigQuery
    print(f"Uploading to BigQuery...")
    try:
        with open(temp_csv_path, 'rb') as source_file:
            load_job = bq_client.load_table_from_file(
                source_file,
                BQ_TABLE_VEGETATION,
                job_config=job_config
            )
        
        # Wait for job to complete
        load_job.result()
        
        print(f"\n✓ Vegetation upload completed at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        print(f"  Rows loaded: {load_job.output_rows}")
        print(f"  Job ID: {load_job.job_id}")
        
    finally:
        # Clean up temporary file
        if os.path.exists(temp_csv_path):
            os.unlink(temp_csv_path)
            print(f"✓ Cleaned up temporary file")
else:
    print("\nNo new vegetation records to upload.")


UPLOADING VEGETATION DATA

Records to upload: 7799
Target table: mpg-data-warehouse.vegetation_point_intercept_gridVeg.gridVeg_point_intercept_vegetation
✓ Wrote 7799 rows to temporary CSV
Uploading to BigQuery...
✓ Cleaned up temporary file


BadRequest: 400 Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 762; errors: 100. Please look into the errors[] collection for more details.; reason: invalid, message: Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 762; errors: 100. Please look into the errors[] collection for more details.; reason: invalid, message: Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 762; errors: 100; max bad: 0; error percent: 0; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 4 byte_offset_to_start_of_line: 249 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 9 byte_offset_to_start_of_line: 594 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 18 byte_offset_to_start_of_line: 1217 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "334.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 24 byte_offset_to_start_of_line: 1636 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 25 byte_offset_to_start_of_line: 1710 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 33 byte_offset_to_start_of_line: 2266 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "361.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 43 byte_offset_to_start_of_line: 2961 column_index: 7 
column_name: "intercept_2" column_type: INT64 value: "334.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 45 byte_offset_to_start_of_line: 3104 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "361.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 54 byte_offset_to_start_of_line: 3727 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 73 byte_offset_to_start_of_line: 5036 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "361.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 107 byte_offset_to_start_of_line: 7381 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "361.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 129 byte_offset_to_start_of_line: 8900 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "334.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 138 byte_offset_to_start_of_line: 9526 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 144 byte_offset_to_start_of_line: 9945 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "82.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 150 byte_offset_to_start_of_line: 10363 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 171 byte_offset_to_start_of_line: 11811 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "197.0"; reason: 
invalid, message: Error while reading data, error message: Unable to parse; line_number: 181 byte_offset_to_start_of_line: 12505 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 182 byte_offset_to_start_of_line: 12579 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 194 byte_offset_to_start_of_line: 13412 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "334.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 238 byte_offset_to_start_of_line: 16443 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 240 byte_offset_to_start_of_line: 16586 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 249 byte_offset_to_start_of_line: 17211 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 271 byte_offset_to_start_of_line: 18723 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 301 byte_offset_to_start_of_line: 20797 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "265.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 306 byte_offset_to_start_of_line: 21143 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: 
Unable to parse; line_number: 355 byte_offset_to_start_of_line: 24517 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 357 byte_offset_to_start_of_line: 24658 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 363 byte_offset_to_start_of_line: 25073 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "82.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 367 byte_offset_to_start_of_line: 25353 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 407 byte_offset_to_start_of_line: 28113 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 409 byte_offset_to_start_of_line: 28254 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 411 byte_offset_to_start_of_line: 28395 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 427 byte_offset_to_start_of_line: 29504 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 428 byte_offset_to_start_of_line: 29578 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 431 byte_offset_to_start_of_line: 
29790 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 447 byte_offset_to_start_of_line: 30899 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 449 byte_offset_to_start_of_line: 31042 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 454 byte_offset_to_start_of_line: 31390 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 458 byte_offset_to_start_of_line: 31667 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "265.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 462 byte_offset_to_start_of_line: 31945 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "411.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 465 byte_offset_to_start_of_line: 32156 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 469 byte_offset_to_start_of_line: 32436 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "5.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 470 byte_offset_to_start_of_line: 32507 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 473 byte_offset_to_start_of_line: 32719 column_index: 7 column_name: "intercept_2" column_type: 
INT64 value: "265.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 474 byte_offset_to_start_of_line: 32793 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "265.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 477 byte_offset_to_start_of_line: 33005 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "411.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 497 byte_offset_to_start_of_line: 34388 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "265.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 498 byte_offset_to_start_of_line: 34462 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 502 byte_offset_to_start_of_line: 34743 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 517 byte_offset_to_start_of_line: 35774 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "265.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 524 byte_offset_to_start_of_line: 36262 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "265.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 532 byte_offset_to_start_of_line: 36819 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 550 byte_offset_to_start_of_line: 38065 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while 
reading data, error message: Unable to parse; line_number: 552 byte_offset_to_start_of_line: 38208 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 558 byte_offset_to_start_of_line: 38621 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 562 byte_offset_to_start_of_line: 38899 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 564 byte_offset_to_start_of_line: 39042 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 595 byte_offset_to_start_of_line: 41186 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 596 byte_offset_to_start_of_line: 41260 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 606 byte_offset_to_start_of_line: 41947 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 614 byte_offset_to_start_of_line: 42491 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 629 byte_offset_to_start_of_line: 43513 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 634 
byte_offset_to_start_of_line: 43856 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 635 byte_offset_to_start_of_line: 43929 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 643 byte_offset_to_start_of_line: 44478 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 651 byte_offset_to_start_of_line: 45026 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 661 byte_offset_to_start_of_line: 45698 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "82.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 662 byte_offset_to_start_of_line: 45770 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "82.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 672 byte_offset_to_start_of_line: 46454 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 674 byte_offset_to_start_of_line: 46595 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 678 byte_offset_to_start_of_line: 46872 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 679 byte_offset_to_start_of_line: 46945 column_index: 7 column_name: 
"intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 681 byte_offset_to_start_of_line: 47086 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 682 byte_offset_to_start_of_line: 47159 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 683 byte_offset_to_start_of_line: 47232 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 684 byte_offset_to_start_of_line: 47305 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 689 byte_offset_to_start_of_line: 47650 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 695 byte_offset_to_start_of_line: 48063 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 698 byte_offset_to_start_of_line: 48272 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 699 byte_offset_to_start_of_line: 48345 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 701 byte_offset_to_start_of_line: 48486 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "525.0"; reason: 
invalid, message: Error while reading data, error message: Unable to parse; line_number: 711 byte_offset_to_start_of_line: 49162 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "411.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 717 byte_offset_to_start_of_line: 49574 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 720 byte_offset_to_start_of_line: 49783 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "525.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 721 byte_offset_to_start_of_line: 49856 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 728 byte_offset_to_start_of_line: 50336 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "82.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 731 byte_offset_to_start_of_line: 50542 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 732 byte_offset_to_start_of_line: 50614 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 734 byte_offset_to_start_of_line: 50755 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 735 byte_offset_to_start_of_line: 50828 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable 
to parse; line_number: 737 byte_offset_to_start_of_line: 50969 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "82.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 738 byte_offset_to_start_of_line: 51046 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 739 byte_offset_to_start_of_line: 51118 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 740 byte_offset_to_start_of_line: 51191 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 741 byte_offset_to_start_of_line: 51263 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 743 byte_offset_to_start_of_line: 51403 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "405.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 751 byte_offset_to_start_of_line: 51952 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "525.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 761 byte_offset_to_start_of_line: 52628 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "529.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 762 byte_offset_to_start_of_line: 52701 column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"; reason: invalid, message: Error while reading data, error message: Unable to parse; line_number: 763 byte_offset_to_start_of_line: 52773 
column_index: 7 column_name: "intercept_2" column_type: INT64 value: "225.0"
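
The parse errors above all have the same shape: values such as "405.0" are rejected for the INT64 column `intercept_2`. One plausible cause (an assumption, not confirmed by the notebook) is that pandas stored the column as float64 because it contains missing values, so the CSV received a trailing `.0`. A minimal sketch of a cell formatter that writes whole numbers without the decimal part; `format_int_cell` is a hypothetical helper, not part of this notebook:

```python
import math


def format_int_cell(value):
    """Render one cell for an INT64 column.

    405.0 or "405.0" becomes "405"; None, NaN and empty strings become ""
    (an empty CSV field). Non-whole numbers raise ValueError.
    """
    if value is None:
        return ""
    if isinstance(value, str):
        text = value.strip()
        if text == "" or text.lower() == "nan":
            return ""
        number = float(text)
    else:
        number = float(value)
    if math.isnan(number):
        return ""
    if not number.is_integer():
        raise ValueError(f"not a whole number: {value!r}")
    return str(int(number))


print(format_int_cell("405.0"))
print(format_int_cell(82.0))
print(repr(format_int_cell(float("nan"))))
```

It could be applied to each INT64 column before `to_csv`, for example `df[col] = df[col].map(format_int_cell)`. Casting with pandas' nullable integer dtype (`astype("Int64")`) is an alternative that avoids the helper.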

In [None]:
# Upload GROUND data
import os
import tempfile

print("=" * 60)
print("UPLOADING GROUND DATA")
print("=" * 60)

if df_ground_csv is not None and len(df_ground_csv) > 0:
    print(f"\nRecords to upload: {len(df_ground_csv)}")
    print(f"Target table: {BQ_TABLE_GROUND}")
    
    # Write to temporary CSV file (all columns as strings)
    with tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False, newline='') as f:
        temp_csv_path = f.name
        df_ground_csv.to_csv(f, index=False)
    
    print(f"✓ Wrote {len(df_ground_csv)} rows to temporary CSV")
    
    # Configure load job
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # Skip header row
        write_disposition="WRITE_APPEND",  # Append to existing table
        autodetect=False,  # Use existing table schema
    )
    
    # Upload CSV file to BigQuery
    print("Uploading to BigQuery...")
    try:
        with open(temp_csv_path, 'rb') as source_file:
            load_job = bq_client.load_table_from_file(
                source_file,
                BQ_TABLE_GROUND,
                job_config=job_config
            )
        
        # Wait for job to complete
        load_job.result()
        
        print(f"\n✓ Ground upload completed at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        print(f"  Rows loaded: {load_job.output_rows}")
        print(f"  Job ID: {load_job.job_id}")
        
    finally:
        # Clean up temporary file
        if os.path.exists(temp_csv_path):
            os.unlink(temp_csv_path)
            print("✓ Cleaned up temporary file")
else:
    print("\nNo new ground records to upload.")
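
When a load job fails, the raw error list can run to hundreds of near-identical entries. The sketch below condenses such a list into one line per column. It assumes the error dictionaries have a `message` key, as in the `errors` property of a google-cloud-bigquery load job, and that messages follow the "Unable to parse" format seen in this notebook's output; `summarize_load_errors` is a hypothetical helper.

```python
import re
from collections import Counter

# Pattern based on the parse-error text seen in this notebook's output.
_PARSE_ERROR = re.compile(
    r'column_name: "(?P<column>[^"]+)" column_type: (?P<type>\w+) '
    r'value: "(?P<value>[^"]*)"'
)


def summarize_load_errors(errors, top=3):
    """Count parse failures per column and list the most common bad values.

    `errors` is a list of dicts with a "message" key, or None.
    Messages that do not match the pattern are ignored.
    """
    per_column = Counter()
    bad_values = {}
    for err in errors or []:
        match = _PARSE_ERROR.search(err.get("message", ""))
        if not match:
            continue
        key = (match["column"], match["type"])
        per_column[key] += 1
        bad_values.setdefault(key, Counter())[match["value"]] += 1
    return [
        f"{column} ({col_type}): {count} unparseable values, e.g. "
        + ", ".join(v for v, _ in bad_values[(column, col_type)].most_common(top))
        for (column, col_type), count in per_column.most_common()
    ]
```

One way to use it is to wrap `load_job.result()` in `try`/`except`, print each line of `summarize_load_errors(load_job.errors)`, and then re-raise so the notebook still stops on failure.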


## Verify Uploads

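Query both tables to confirm that the appended rows are present. For each table, the cell reports the total row count, the number of distinct years, and the 2025 row count.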


In [None]:
# Verify uploads by querying the tables
print("=" * 60)
print("VERIFICATION")
print("=" * 60)

# Verify VEGETATION table
print("\nVEGETATION Table:")
query_veg = f"SELECT COUNT(*) as total_rows, COUNT(DISTINCT year) as distinct_years FROM `{BQ_TABLE_VEGETATION}`"
result_veg = bq_client.query(query_veg).to_dataframe()
print(f"  Total rows: {result_veg['total_rows'].iloc[0]}")
print(f"  Distinct years: {result_veg['distinct_years'].iloc[0]}")

# Check 2025 data specifically
query_2025_veg = f"SELECT COUNT(*) as count_2025 FROM `{BQ_TABLE_VEGETATION}` WHERE year = 2025"
result_2025_veg = bq_client.query(query_2025_veg).to_dataframe()
print(f"  2025 rows: {result_2025_veg['count_2025'].iloc[0]}")

# Verify GROUND table
print("\nGROUND Table:")
query_ground = f"SELECT COUNT(*) as total_rows, COUNT(DISTINCT year) as distinct_years FROM `{BQ_TABLE_GROUND}`"
result_ground = bq_client.query(query_ground).to_dataframe()
print(f"  Total rows: {result_ground['total_rows'].iloc[0]}")
print(f"  Distinct years: {result_ground['distinct_years'].iloc[0]}")

# Check 2025 data specifically
query_2025_ground = f"SELECT COUNT(*) as count_2025 FROM `{BQ_TABLE_GROUND}` WHERE year = 2025"
result_2025_ground = bq_client.query(query_2025_ground).to_dataframe()
print(f"  2025 rows: {result_2025_ground['count_2025'].iloc[0]}")

# Data integrity check: compare uploaded row counts with the 2025 counts.
# The two numbers match only if every uploaded row is from 2025 and the
# table held no 2025 rows before this run.
if df_vegetation_to_append is not None and len(df_vegetation_to_append) > 0:
    expected_2025_veg = len(df_vegetation_to_append)
    actual_2025_veg = result_2025_veg['count_2025'].iloc[0]
    status_veg = "✓" if actual_2025_veg == expected_2025_veg else "⚠"
    print(f"\n{status_veg} Vegetation verification:")
    print(f"  Expected new 2025 rows: {expected_2025_veg}")
    print(f"  Actual 2025 rows in table: {actual_2025_veg}")

if df_ground_to_append is not None and len(df_ground_to_append) > 0:
    expected_2025_ground = len(df_ground_to_append)
    actual_2025_ground = result_2025_ground['count_2025'].iloc[0]
    status_ground = "✓" if actual_2025_ground == expected_2025_ground else "⚠"
    print(f"\n{status_ground} Ground verification:")
    print(f"  Expected new 2025 rows: {expected_2025_ground}")
    print(f"  Actual 2025 rows in table: {actual_2025_ground}")
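
The verification cell runs two queries per table and relies on 2025 counts alone. As a sketch of an alternative, the helpers below build a single query per table using BigQuery's `COUNTIF` and compare row totals taken before and after the append. Both function names and the table ID in the example are hypothetical.

```python
def build_verification_query(table_id, year):
    """Return one query giving total rows, distinct years and rows for `year`."""
    year = int(year)  # guard against non-numeric input reaching the SQL text
    return (
        "SELECT COUNT(*) AS total_rows, "
        "COUNT(DISTINCT year) AS distinct_years, "
        f"COUNTIF(year = {year}) AS count_year "
        f"FROM `{table_id}`"
    )


def check_append(rows_before, rows_after, rows_appended):
    """Compare the observed growth of a table with the number of rows uploaded."""
    delta = rows_after - rows_before
    if delta == rows_appended:
        return f"✓ table grew by {delta} rows as expected"
    return f"⚠ table grew by {delta} rows, expected {rows_appended}"


# Hypothetical table ID, for illustration only.
print(build_verification_query("my-project.my_dataset.ground_table", 2025))
print(check_append(1000, 1250, 250))
```

Comparing totals before and after the load does not depend on which years the uploaded rows belong to, at the cost of one extra count query before the upload.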


## Summary

Print a summary of the upload operation: source file, target tables, rows uploaded per table and year, and the transformations applied.


In [None]:
# Generate summary report
print("=" * 60)
print("GRIDVEG POINT INTERCEPTS UPLOAD SUMMARY")
print("=" * 60)

print(f"\n📅 Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

print(f"\n📂 Source:")
print(f"  CSV: {GCS_CSV_URL.split('/')[-1]}")
print(f"  Location: {'/'.join(GCS_CSV_URL.split('/')[:-1])}")

print(f"\n🎯 Target Tables:")
print(f"  Vegetation: {BQ_TABLE_VEGETATION}")
print(f"  Ground:     {BQ_TABLE_GROUND}")
print(f"  Project:    {bq_client.project}")

print(f"\n📊 Data Changes:")

# Vegetation summary
if df_vegetation_to_append is not None and len(df_vegetation_to_append) > 0:
    print(f"\n  VEGETATION TABLE:")
    print(f"    Rows uploaded: {len(df_vegetation_to_append)}")
    if df_existing_vegetation is not None:
        print(f"    Previous total: {len(df_existing_vegetation)}")
        print(f"    Expected new total: {len(df_existing_vegetation) + len(df_vegetation_to_append)}")
    
    # Year breakdown
    year_counts = df_vegetation_to_append['year'].value_counts().sort_index()
    print(f"\n    Uploaded records by year:")
    for year, count in year_counts.items():
        print(f"      {year}: {count} records")
else:
    print(f"\n  VEGETATION TABLE: No changes")

# Ground summary
if df_ground_to_append is not None and len(df_ground_to_append) > 0:
    print(f"\n  GROUND TABLE:")
    print(f"    Rows uploaded: {len(df_ground_to_append)}")
    if df_existing_ground is not None:
        print(f"    Previous total: {len(df_existing_ground)}")
        print(f"    Expected new total: {len(df_existing_ground) + len(df_ground_to_append)}")
    
    # Year breakdown
    year_counts = df_ground_to_append['year'].value_counts().sort_index()
    print(f"\n    Uploaded records by year:")
    for year, count in year_counts.items():
        print(f"      {year}: {count} records")
else:
    print(f"\n  GROUND TABLE: No changes")

print(f"\n🔄 Transformations Applied:")
print(f"  ✓ Renamed columns to match BigQuery schema")
print(f"  ✓ Converted date format to YYYY-MM-DD")
print(f"  ✓ Filtered out 2010 records")
print(f"  ✓ Split data into vegetation and ground tables")

print(f"\n📤 Upload Method:")
print(f"  ✓ CSV upload (avoids PyArrow serialization issues)")
print(f"  ✓ BigQuery parses CSV values against the existing table schema")

has_changes = (
    (df_vegetation_to_append is not None and len(df_vegetation_to_append) > 0)
    or (df_ground_to_append is not None and len(df_ground_to_append) > 0)
)

if has_changes:
    print(f"\n✅ Upload completed successfully!")
else:
    print(f"\n✅ No changes needed - tables are up to date!")
print("=" * 60)
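
The final message above depends only on whether there were rows to append, not on whether each load job succeeded. A minimal sketch of deriving it from recorded per-table outcomes instead; the outcome labels and the `overall_status` helper are assumptions for illustration, not part of this notebook:

```python
def overall_status(outcomes):
    """Derive one summary line from per-table outcomes.

    `outcomes` maps a table label to "uploaded", "skipped" (nothing new to
    append) or "failed" (the load job raised an error).
    """
    allowed = {"uploaded", "skipped", "failed"}
    unknown = set(outcomes.values()) - allowed
    if unknown:
        raise ValueError(f"unknown outcome(s): {sorted(unknown)}")
    failed = sorted(table for table, o in outcomes.items() if o == "failed")
    if failed:
        return "❌ Upload failed for: " + ", ".join(failed)
    if any(o == "uploaded" for o in outcomes.values()):
        return "✅ Upload completed successfully!"
    return "✅ No changes needed - tables are up to date!"


print(overall_status({"vegetation": "failed", "ground": "uploaded"}))
```

Each upload cell would record its own outcome (for example in a dictionary shared across cells), so the summary reflects what actually happened even when cells are run individually.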
