# BigQuery Connection Setup for Belgian Brewery Project

This notebook establishes and tests the connection to Google BigQuery for the Belgian Brewery "Glide Template" Strategy Project.

## Prerequisites
- Google Cloud Project with BigQuery API enabled
- Google Cloud CLI (`gcloud`) installed, or a service account key JSON file
- Required Python packages installed (see the install command in the first code cell)

## Authentication Methods
1. **Google Cloud CLI** (Recommended for development)
2. **Service Account Key File** (Alternative method)

In [None]:
# Install required packages
# pyarrow is needed by load_table_from_dataframe, which the upload helper below relies on
# !pip install google-cloud-bigquery pandas pyarrow db-dtypes python-dotenv geopy

# Verify gcloud installation
import subprocess
import sys

def check_gcloud_installation():
    try:
        result = subprocess.run(['gcloud', '--version'], capture_output=True, text=True)
        if result.returncode == 0:
            print("✅ Google Cloud CLI is installed:")
            print(result.stdout)
            return True
        else:
            print("❌ Google Cloud CLI not found")
            return False
    except FileNotFoundError:
        print("❌ Google Cloud CLI not installed")
        return False

gcloud_available = check_gcloud_installation()

## Import Required Libraries

In [None]:
# Import required libraries for BigQuery connection
import pandas as pd
from google.cloud import bigquery
import os
from google.oauth2 import service_account
import json
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

## Authentication Setup

### Method 1: Google Cloud CLI (Recommended)

If you don't have `gcloud` installed, follow these steps:

**macOS (using Homebrew):**
```bash
# Install Homebrew if you don't have it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Google Cloud CLI
brew install google-cloud-sdk
```

**macOS/Linux (using curl):**
```bash
# Download and install
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
```

**After installation, authenticate:**
```bash
# Initialize gcloud and select your project
gcloud init

# Authenticate for application default credentials
gcloud auth application-default login
```
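
The `gcloud --version` check in this notebook only confirms that the CLI is installed, not that `gcloud auth application-default login` has been run. As a minimal sketch, the helper below looks for the credentials file that command writes. The names `adc_file_path` and `adc_is_configured` and their parameters are hypothetical. The default locations (`~/.config/gcloud/` on macOS/Linux, `%APPDATA%\gcloud\` on Windows) follow Google's documented convention, and a custom `CLOUDSDK_CONFIG` directory is not handled here.

```python
import os
from pathlib import Path

ADC_FILENAME = "application_default_credentials.json"


def adc_file_path(home=None, appdata=None, is_windows=None):
    """Return the expected path of the application default credentials file.

    All parameters are hypothetical hooks for testing; when omitted they are
    taken from the current environment.
    """
    if is_windows is None:
        is_windows = os.name == "nt"
    if is_windows:
        base = Path(appdata if appdata is not None else os.environ.get("APPDATA", ""))
        return base / "gcloud" / ADC_FILENAME
    base = Path(home) if home is not None else Path.home()
    return base / ".config" / "gcloud" / ADC_FILENAME


def adc_is_configured(**kwargs):
    """Return True if the credentials file exists at the expected location."""
    return adc_file_path(**kwargs).is_file()


if __name__ == "__main__":
    if adc_is_configured():
        print("Application default credentials found")
    else:
        print("No credentials file found. Run: gcloud auth application-default login")
```

A file at that path does not guarantee the credentials are still valid, so the connection test later in the notebook remains the authoritative check.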

### Method 2: Service Account Key File

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Navigate to IAM & Admin > Service Accounts
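3. Create a new service account and grant it BigQuery roles (for example, BigQuery Data Editor and BigQuery Job User)
4. Create and download a JSON key file for the account
5. Set `GOOGLE_APPLICATION_CREDENTIALS` in your `.env` file to the path of the key file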
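
Before handing the key file to the BigQuery client, it can help to confirm that the path points at a plausible service account key. The sketch below uses only the standard library. The function name `check_service_account_key` and the `REQUIRED_FIELDS` list are assumptions for illustration, based on the fields that exported key files typically contain.

```python
import json
from pathlib import Path

# Fields this sketch expects in a service account key (an assumption based on
# the layout of key files exported from the Cloud Console)
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty if it looks usable)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {key_path}"]
    try:
        info = json.loads(key_path.read_text())
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(info, dict):
        return ["top-level JSON value is not an object"]

    # Report every missing or empty required field
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not info.get(name)]
    # A user credentials file has a different "type" and will not work here
    if info.get("type") and info["type"] != "service_account":
        problems.append(f"unexpected type: {info['type']!r}")
    return problems
```

Calling `check_service_account_key(os.getenv('GOOGLE_APPLICATION_CREDENTIALS', ''))` before the authentication cell gives a more specific message than a generic load failure. Keep the key file out of version control.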

In [None]:
# Authentication logic
PROJECT_ID = None
credentials = None
client = None

# Method 1: Try Google Cloud CLI first
if gcloud_available:
    try:
        # Get current project from gcloud
        result = subprocess.run(['gcloud', 'config', 'get-value', 'project'],
                                capture_output=True, text=True)
        gcloud_project = result.stdout.strip()
        if result.returncode == 0 and gcloud_project:
            # Use application default credentials. PROJECT_ID is set only after
            # the client is created, so a failure here still triggers Method 2.
            client = bigquery.Client(project=gcloud_project)
            PROJECT_ID = gcloud_project
            print("✅ Using gcloud authentication")
            print(f"✅ Project ID from gcloud: {PROJECT_ID}")
        else:
            print("❌ No project set in gcloud. Run: gcloud config set project YOUR_PROJECT_ID")
    except Exception as e:
        print(f"❌ Error with gcloud authentication: {e}")

# Method 2: Fall back to service account key file
if PROJECT_ID is None:
    print("🔄 Trying service account key file...")
    service_account_key_path = os.getenv('GOOGLE_APPLICATION_CREDENTIALS')
    
    if service_account_key_path and os.path.exists(service_account_key_path):
        try:
            with open(service_account_key_path) as f:
                service_account_info = json.load(f)
            
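            credentials = service_account.Credentials.from_service_account_info(service_account_info)
            # Set PROJECT_ID only after the client is created, so a failure
            # here leaves it unset and the manual setup hint below is shown
            client = bigquery.Client(project=service_account_info['project_id'], credentials=credentials)
            PROJECT_ID = service_account_info['project_id']
            print("✅ Using service account key file")
            print(f"✅ Project ID: {PROJECT_ID}")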
        except Exception as e:
            print(f"❌ Error with service account: {e}")
    else:
        print("❌ Service account key file not found")

# Method 3: Manual setup (if both above fail)
if PROJECT_ID is None:
    print("\n📝 Manual setup required:")
    print("1. Install gcloud CLI, or")
    print("2. Set GOOGLE_APPLICATION_CREDENTIALS in .env file, or") 
    print("3. Manually set PROJECT_ID below")
    
    # Uncomment and set your project ID manually
    # PROJECT_ID = "your-project-id-here"
    # client = bigquery.Client(project=PROJECT_ID)

## BigQuery Client Setup

In [None]:
# Set dataset configuration
DATASET_ID = "belgian_brewery"  # Dataset IDs may contain only letters, numbers, and underscores

if PROJECT_ID:
    print(f"🎯 BigQuery Project: {PROJECT_ID}")
    print(f"🎯 Dataset: {DATASET_ID}")
    
    try:
        # Test the client connection
        datasets = list(client.list_datasets())
        print(f"✅ Successfully connected to BigQuery!")
        print(f"📊 Available datasets: {len(datasets)}")
        
        if datasets:
            for dataset in datasets[:5]:  # Show first 5 datasets
                print(f"  - {dataset.dataset_id}")
        
    except Exception as e:
        print(f"❌ Error connecting to BigQuery: {e}")
else:
    print("❌ Cannot proceed without PROJECT_ID. Please set up authentication first.")
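
The `DATASET_ID` above avoids hyphens and spaces because BigQuery rejects them in dataset IDs. As a rough sketch, the helpers below check an ID against the commonly documented rule (letters, numbers, and underscores, up to 1,024 characters) and derive a valid ID from a free-form name. Both function names are hypothetical, and the ASCII-only pattern is a simplifying assumption.

```python
import re

# ASCII approximation of the dataset naming rule (an assumption of this sketch)
_DATASET_ID_PATTERN = re.compile(r"[A-Za-z0-9_]{1,1024}")


def is_valid_dataset_id(dataset_id):
    """Return True if the whole string matches the naming rule."""
    return bool(_DATASET_ID_PATTERN.fullmatch(dataset_id))


def to_dataset_id(name):
    """Replace each run of disallowed characters with a single underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")
    return cleaned[:1024]


print(is_valid_dataset_id("belgian_brewery"))  # True
print(is_valid_dataset_id("belgian-brewery"))  # False
print(to_dataset_id("belgian-brewery"))        # belgian_brewery
```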

## Test Connection with Simple Query

In [None]:
if PROJECT_ID:
    # Test the connection with a simple query
    test_query = """
    SELECT 
        'BigQuery connection successful!' as status,
        CURRENT_DATETIME() as timestamp,
        @@project_id as project_id
    """

    try:
        # Execute the test query
        query_job = client.query(test_query)
        results = query_job.result()
        
        # Convert to DataFrame for better display
        df_test = results.to_dataframe()
        print("🎉 Connection Test Results:")
        print(df_test)
        
    except Exception as e:
        print(f"❌ Error executing test query: {e}")
else:
    print("⏭️ Skipping connection test - authentication not configured")

## Create Dataset for Belgian Brewery Project

In [None]:
from google.api_core.exceptions import NotFound

if PROJECT_ID:
    # Create dataset for the Belgian brewery project
    dataset_id = f"{PROJECT_ID}.{DATASET_ID}"

    try:
        # Check if dataset already exists
        dataset = client.get_dataset(dataset_id)
        print(f"✅ Dataset {dataset_id} already exists.")
    except NotFound:
        # Create the dataset only when the lookup reports it as missing
        try:
            dataset = bigquery.Dataset(dataset_id)
            dataset.location = "US"  # or "EU" depending on your preference
            dataset.description = "Dataset for Belgian Brewery Glide Template Strategy Project"

            dataset = client.create_dataset(dataset, timeout=30)
            print(f"🎉 Created dataset {dataset_id}")
        except Exception as e:
            print(f"❌ Error creating dataset: {e}")
    except Exception as e:
        # Permission or network errors should not be mistaken for "missing"
        print(f"❌ Error checking dataset: {e}")
else:
    print("⏭️ Skipping dataset creation - authentication not configured")

## Prepare for Data Upload

Next steps will involve:
1. Loading the Belgian beers and breweries data from Google Sheets
2. Uploading raw data to BigQuery tables
3. Setting up dbt for data transformation
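
Spreadsheet headers may contain spaces, punctuation, or accented characters that BigQuery's classic column naming rules do not accept (letters, numbers, and underscores, starting with a letter or underscore, up to 300 characters). The sketch below normalizes header names before upload. Both function names are hypothetical, and the lowercasing and numeric de-duplication suffixes are design choices of this example, not requirements of BigQuery.

```python
import re


def to_bigquery_column(name, max_length=300):
    """Convert one header into a lowercase name using only [a-z0-9_]."""
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip()).strip("_")
    # Names must not be empty or start with a digit
    if not cleaned or cleaned[0].isdigit():
        cleaned = f"_{cleaned}"
    return cleaned[:max_length].lower()


def sanitize_columns(names):
    """Sanitize all headers, adding _2, _3, ... when two headers collide."""
    used = set()
    result = []
    for name in names:
        base = to_bigquery_column(name)
        candidate, n = base, 1
        while candidate in used:
            n += 1
            candidate = f"{base}_{n}"
        used.add(candidate)
        result.append(candidate)
    return result


print(sanitize_columns(["Beer Name", "ABV (%)", "2023 Sales", "beer name"]))
# ['beer_name', 'abv', '_2023_sales', 'beer_name_2']
```

With a DataFrame, this could be applied as `df.columns = sanitize_columns(df.columns)` before calling the upload function. Accented letters are replaced by underscores in this sketch, so `Bière` becomes `bi_re`; transliterating them first may give more readable names.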

In [None]:
# Function to upload DataFrame to BigQuery
def upload_dataframe_to_bigquery(df, table_name, dataset_id=DATASET_ID, if_exists='replace'):
    """
    Upload a pandas DataFrame to BigQuery
    
    Args:
        df: pandas DataFrame to upload
        table_name: name of the BigQuery table
        dataset_id: BigQuery dataset ID
        if_exists: what to do if table exists ('replace', 'append', 'fail')

    Returns:
        True if the upload succeeded, False otherwise
    """
    # Map each if_exists option to its BigQuery write disposition;
    # WRITE_EMPTY makes the job fail if the table already contains data
    write_dispositions = {
        'replace': "WRITE_TRUNCATE",
        'append': "WRITE_APPEND",
        'fail': "WRITE_EMPTY",
    }
    if if_exists not in write_dispositions:
        raise ValueError(f"if_exists must be one of {sorted(write_dispositions)}, got {if_exists!r}")

    table_id = f"{PROJECT_ID}.{dataset_id}.{table_name}"
    
    # Configure the job
    job_config = bigquery.LoadJobConfig(
        write_disposition=write_dispositions[if_exists],
        autodetect=True  # Automatically detect schema
    )
    
    try:
        # Upload the DataFrame
        job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
        job.result()  # Wait for the job to complete
        
        print(f"Successfully uploaded {len(df)} rows to {table_id}")
        return True
        
    except Exception as e:
        print(f"Error uploading data to {table_id}: {e}")
        return False

# Test function with sample data (only when authentication succeeded)
if PROJECT_ID:
    sample_data = pd.DataFrame({
        'test_column': ['value1', 'value2', 'value3'],
        'timestamp': pd.Timestamp.now()
    })

    print("Testing upload function with sample data:")
    upload_dataframe_to_bigquery(sample_data, 'connection_test', if_exists='replace')
else:
    print("⏭️ Skipping upload test - authentication not configured")

## Verify Sample Upload

In [None]:
# Query the test table to verify upload worked
if PROJECT_ID:
    verify_query = f"""
    SELECT * FROM `{PROJECT_ID}.{DATASET_ID}.connection_test`
    LIMIT 10
    """

    try:
        query_job = client.query(verify_query)
        results = query_job.result()
        df_verify = results.to_dataframe()

        print("Sample data successfully uploaded and retrieved:")
        print(df_verify)

    except Exception as e:
        print(f"Error querying test table: {e}")
else:
    print("⏭️ Skipping upload verification - authentication not configured")

## Next Steps

✅ BigQuery connection established  
✅ Dataset created  
✅ Upload function tested  

**Ready for the main project workflow:**

1. **Data Ingestion**: Load Belgian brewery data from Google Sheets
2. **Data Enrichment**: Use Python + geocoding API to get brewery locations  
3. **dbt Setup**: Create transformation pipeline
4. **Hex Dashboard**: Build analytics dashboard

If neither authentication method succeeded, set `PROJECT_ID` manually in the authentication cell above (Method 3) with your actual Google Cloud Project ID, then rerun the notebook.