# OpenAI to Z Challenge - Project Setup

This notebook sets up the initial project structure for the Kaggle environment and demonstrates basic functionality.

**Note**: This notebook targets Kaggle's environment and relies on its pre-installed packages.

In [None]:
# Kaggle Environment Setup
import os
import sys

print(f"Python version: {sys.version}")
print(f"Running on: {'Kaggle' if '/kaggle' in os.getcwd() else 'Local'}")

# Check Kaggle-specific paths and resources
if '/kaggle' in os.getcwd():
    print(f"Working directory: {os.getcwd()}")
    print(f"Input data: {os.listdir('/kaggle/input') if os.path.exists('/kaggle/input') else 'No input data'}")
    print(f"GPU available: {os.environ.get('KAGGLE_GPU_TYPE', 'None')}")
    
    # Set up Kaggle paths
    sys.path.append('/kaggle/working')
    WORK_DIR = '/kaggle/working'
    INPUT_DIR = '/kaggle/input'
else:
    # Local development paths
    sys.path.append('../')
    WORK_DIR = '.'
    INPUT_DIR = '../data'

print("✓ Environment setup complete!")

In [None]:
# Import standard packages (pre-installed on Kaggle)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import requests
import json
from pathlib import Path

# Try to import geospatial packages
try:
    import geopandas as gpd
    print("✓ GeoPandas available")
    geo_available = True
except ImportError:
    print("⚠ GeoPandas not available")
    geo_available = False

# Check for OpenAI (might need to install in Kaggle)
try:
    import openai
    print("✓ OpenAI package available")
except ImportError:
    print("⚠ OpenAI package not available - install with: !pip install openai")

# Import local modules with error handling
try:
    if '/kaggle' not in os.getcwd():
        from src.config import Config
        from src.ai_analysis.openai_client import OpenAIAnalyzer
        print("✓ Local modules imported successfully")
    else:
        raise ImportError("Using Kaggle environment - creating inline config")
except ImportError:
    # Reached on Kaggle, or locally when the src package cannot be imported
    print("Creating basic inline config...")
    
    class Config:
        STUDY_BOUNDS = {
            'north': -10.0,   # Northern Australia
            'south': -45.0,   # Southern Australia  
            'east': 155.0,    # Eastern Australia
            'west': 110.0     # Western Australia
        }
        
        # Kaggle-specific settings
        KAGGLE_WORK_DIR = WORK_DIR
        KAGGLE_INPUT_DIR = INPUT_DIR

print("Project setup complete!")
print(f"Study area bounds: {Config.STUDY_BOUNDS}")
print(f"Geospatial support: {geo_available}")

## Data Collection Strategy (Kaggle Environment)

1. **Satellite Imagery**: 
   - **Kaggle Datasets**: Search for Australian satellite imagery datasets
   - **API Access**: Use requests to fetch data from public APIs
   - **Pre-downloaded Data**: Upload datasets to Kaggle for processing
2. **LIDAR Data**: NASA GEDI datasets available on Kaggle
3. **Historical Sources**: Text datasets, digitized documents
4. **Indigenous Knowledge**: Curated datasets with traditional knowledge

### Kaggle-Specific Data Sources:
- **Kaggle Datasets**: Public datasets with Australian geographical data
- **API Integration**: OpenStreetMap, NASA APIs (within rate limits)
- **Uploaded Files**: Custom datasets uploaded to your Kaggle account
- **External URLs**: Direct downloads of open data
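
As a rough illustration of the API integration route, the sketch below builds an Overpass QL query for OpenStreetMap data inside the study bounds. The helper names `build_overpass_query` and `fetch_overpass`, the `place=city` filter, and the public endpoint URL are assumptions chosen for the example, not part of the project code.

```python
# Bounds copied from Config.STUDY_BOUNDS in the setup cell
STUDY_BOUNDS = {"north": -10.0, "south": -45.0, "east": 155.0, "west": 110.0}

def build_overpass_query(bounds, tag="place", value="city", timeout=60):
    """Return an Overpass QL query selecting nodes with tag=value inside bounds."""
    # Overpass expects the bounding box in (south, west, north, east) order
    bbox = f"{bounds['south']},{bounds['west']},{bounds['north']},{bounds['east']}"
    return (
        f"[out:json][timeout:{timeout}];"
        f'node["{tag}"="{value}"]({bbox});'
        "out body;"
    )

def fetch_overpass(query, url="https://overpass-api.de/api/interpreter"):
    """Send the query to a public Overpass endpoint (assumed URL) and return parsed JSON."""
    import requests  # imported here so the query builder works without network access

    response = requests.post(url, data={"data": query}, timeout=90)
    response.raise_for_status()
    return response.json()

query = build_overpass_query(STUDY_BOUNDS)
print(query)
```

Calling `fetch_overpass(query)` requires internet access to be enabled for the notebook, and public endpoints apply rate limits, so results are worth caching in the working directory.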

### Available Packages on Kaggle:
- pandas, numpy, matplotlib, seaborn (pre-installed)
- folium, plotly (pre-installed)
- scikit-learn, tensorflow, pytorch (pre-installed)
- requests, beautifulsoup4 (pre-installed)
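
Because the pre-installed set can change between Kaggle image versions, it may be worth checking availability at runtime rather than relying on this list. A minimal sketch, assuming a hypothetical helper `check_packages`; note that it takes import names, which differ from some pip names (`sklearn` for scikit-learn, `torch` for pytorch, `bs4` for beautifulsoup4).

```python
from importlib.util import find_spec

def check_packages(names):
    """Map each module name to True/False without actually importing it."""
    return {name: find_spec(name) is not None for name in names}

# Import names, not pip names
status = check_packages(
    ["pandas", "numpy", "folium", "sklearn", "torch", "bs4", "geopandas", "openai"]
)
for name, available in status.items():
    print(f"{'✓' if available else '⚠'} {name}")
```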

In [None]:
# Create study area map (optimized for Kaggle)
center_lat = (Config.STUDY_BOUNDS['north'] + Config.STUDY_BOUNDS['south']) / 2
center_lon = (Config.STUDY_BOUNDS['east'] + Config.STUDY_BOUNDS['west']) / 2

# Create map with Kaggle-friendly settings
m = folium.Map(
    location=[center_lat, center_lon], 
    zoom_start=5,
    tiles='OpenStreetMap'  # Reliable tile source for Kaggle
)

# Add study area rectangle
folium.Rectangle(
    bounds=[
        [Config.STUDY_BOUNDS['south'], Config.STUDY_BOUNDS['west']],
        [Config.STUDY_BOUNDS['north'], Config.STUDY_BOUNDS['east']]
    ],
    color='red',
    fill=True,
    fill_opacity=0.2,  # folium's keyword is snake_case
    popup="Australian Study Area"
).add_to(m)

# Add markers for major cities as reference points
cities = [
    {"name": "Sydney", "lat": -33.8688, "lon": 151.2093},
    {"name": "Melbourne", "lat": -37.8136, "lon": 144.9631},
    {"name": "Perth", "lat": -31.9505, "lon": 115.8605},
    {"name": "Darwin", "lat": -12.4634, "lon": 130.8456}
]

for city in cities:
    folium.Marker(
        [city["lat"], city["lon"]], 
        popup=city["name"],
        icon=folium.Icon(color='blue', icon='info-sign')
    ).add_to(m)

# Save map for Kaggle (since display might be limited)
map_path = f"{WORK_DIR}/study_area_map.html"
m.save(map_path)
print(f"Map saved to: {map_path}")

m

## Next Steps (Kaggle Workflow)

1. **Upload datasets** to Kaggle or find existing Australian datasets
2. **Set up OpenAI API access**, storing the API key in Kaggle Secrets
3. **Create data processing pipeline** using Kaggle's compute resources
4. **Use Kaggle's GPU** for any ML/AI processing

### Kaggle-Specific Setup:
```python
# Install additional packages if needed
!pip install openai python-dotenv

# Access Kaggle secrets for API keys
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
openai_key = user_secrets.get_secret("openai_api_key")
```
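
Since this notebook is also meant to run locally, one option is to wrap the secret lookup in a fallback. The sketch below assumes a hypothetical helper `get_openai_key` and assumes the local key lives in the `OPENAI_API_KEY` environment variable; adjust both names to match your setup.

```python
import os

def get_openai_key(secret_name="openai_api_key", env_var="OPENAI_API_KEY"):
    """Return the API key from Kaggle Secrets, else from an environment variable."""
    try:
        # kaggle_secrets is only importable inside a Kaggle notebook
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(secret_name)
    except Exception:
        # Not on Kaggle, or the secret is not attached to this notebook
        return os.environ.get(env_var)

key = get_openai_key()
print("API key found" if key else "No API key configured")
```

The helper returns `None` instead of raising when no key is configured, so later cells can skip OpenAI calls gracefully.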

### File Management in Kaggle:
- **Input**: `/kaggle/input/` (read-only datasets)
- **Working**: `/kaggle/working/` (writable; files written here are kept as notebook output)
- **Output**: Save results to working directory for download
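
The layout above can be handled with `pathlib`. This is a small sketch rather than project code; `list_input_files` and `output_path` are hypothetical helper names.

```python
from pathlib import Path

def list_input_files(input_dir):
    """Return every file under input_dir (recursively), sorted by path."""
    root = Path(input_dir)
    if not root.exists():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())

def output_path(work_dir, filename):
    """Build a path inside the working directory, creating the directory if needed."""
    out_dir = Path(work_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / filename

# On Kaggle this walks the attached datasets; elsewhere it prints nothing
for path in list_input_files("/kaggle/input"):
    print(path)
```

For example, `output_path(WORK_DIR, "results.csv")` gives a writable location that works both on Kaggle and locally.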

In [None]:
# Kaggle environment validation
import shutil

print("=== Kaggle Environment Check ===")
print(f"Working directory: {os.getcwd()}")
free_gb = shutil.disk_usage('.').free // (1024**3)  # works on any OS, unlike os.statvfs
print(f"Available disk space: {free_gb} GB")
print(f"Python executable: {sys.executable}")

# Check for internet connectivity (important for API calls)
try:
    response = requests.get("https://httpbin.org/get", timeout=5)
    response.raise_for_status()  # treat HTTP error statuses as failures too
    print("✓ Internet connectivity: Available")
except requests.RequestException:
    print("✗ Internet connectivity: Limited")

# List available input datasets
if os.path.exists('/kaggle/input'):
    input_datasets = os.listdir('/kaggle/input')
    print(f"Input datasets: {input_datasets}")
else:
    print("No input datasets found")

print("=== Setup Complete ===")