# OpenAI-to-Z Challenge: Data Acquisition (Google Colab)

This notebook is designed to run in Google Colab. It handles the data acquisition for Checkpoint 1 of the OpenAI-to-Z Challenge.

**Objectives:**
1. Install necessary dependencies.
2. Authenticate with Google Earth Engine.
3. Fetch one Sentinel-2 scene for the Acre, Brazil region.
4. Fetch one LiDAR tile from a specified URL.
5. Package the downloaded data into a single `amazon_data.zip` file for easy download.

## 1. Install Dependencies

In [None]:
!pip install -q earthengine-api geemap rasterio laspy
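After installing, it helps to confirm that each package actually resolved before moving on. The sketch below uses only the standard library (`importlib.metadata`) and the distribution names from the `pip install` line; the helper name `installed_versions` is illustrative, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip in the cell above.
REQUIRED = ["earthengine-api", "geemap", "rasterio", "laspy"]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name:16s} {ver or 'NOT INSTALLED'}")
```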

## 2. Authenticate and Initialize

In [None]:
import os
import ee
import geemap
import requests
from tqdm import tqdm
import zipfile

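# Google Cloud project registered for Earth Engine (placeholder: set your own ID).
# Recent earthengine-api releases may refuse to initialize without a project.
EE_PROJECT = None

try:
    ee.Initialize(project=EE_PROJECT)
except Exception:
    # No usable credentials yet: run the interactive sign-in flow, then retry.
    ee.Authenticate()
    ee.Initialize(project=EE_PROJECT)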

print("Earth Engine is initialized.")

# Create directories for raw data
os.makedirs('data/raw/sentinel2', exist_ok=True)
os.makedirs('data/raw/lidar', exist_ok=True)

## 3. Define Area of Interest (AOI)

We'll focus on the state of Acre in western Brazil, a region known for its geoglyphs. The AOI is a 3° × 3° bounding box centered on **9.5°S, 70.5°W**.

In [None]:
# Define the bounding box for the Acre region
lon_min, lon_max = -72, -69
lat_min, lat_max = -11, -8

aoi = ee.Geometry.Rectangle([lon_min, lat_min, lon_max, lat_max])

# Create a map to visualize the AOI (optional)
map_aoi = geemap.Map(center=[-9.5, -70.5], zoom=7)
map_aoi.addLayer(aoi, {'color': 'FF0000'}, 'Area of Interest')
map_aoi
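For a sense of scale, the AOI's ground extent can be approximated from its corner coordinates. This sketch treats the Earth as a sphere (about 111.32 km per degree of latitude, with longitude degrees shrinking by the cosine of the latitude); that constant is an approximation, good enough for sizing downloads but not for precise geodesy.

```python
import math

# Approximate length of one degree of latitude on a spherical Earth (assumption).
KM_PER_DEG_LAT = 111.32

def bbox_size_km(lon_min, lat_min, lon_max, lat_max):
    """Approximate width and height in km of a lon/lat box, using its mid-latitude."""
    mid_lat = math.radians((lat_min + lat_max) / 2)
    width_km = (lon_max - lon_min) * KM_PER_DEG_LAT * math.cos(mid_lat)
    height_km = (lat_max - lat_min) * KM_PER_DEG_LAT
    return width_km, height_km

# Same corners as the AOI defined above.
width_km, height_km = bbox_size_km(-72, -11, -69, -8)
pixels_per_band = (width_km * 1000 / 10) * (height_km * 1000 / 10)  # at 10 m
print(f"AOI is about {width_km:.0f} km x {height_km:.0f} km, "
      f"or {pixels_per_band / 1e9:.1f} billion pixels per band at 10 m.")
```

At roughly a billion pixels per band, the full AOI is far beyond what a single in-notebook download can handle.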

## 4. Search and Download a Sentinel-2 Scene

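We search 2023 Sentinel-2 surface-reflectance scenes that intersect the AOI with less than 10% cloudy pixels, take the least cloudy one, and download its four 10 m bands (B2, B3, B4, B8) over a roughly 10 km × 10 km window inside the AOI. A single Sentinel-2 tile covers about 100 km × 100 km, and the direct-download endpoint that `geemap.ee_export_image` relies on only accepts small requests, so exporting the whole 3° × 3° AOI at 10 m this way would fail.

In [None]:
# Sentinel-2 Level-2A surface reflectance. 'COPERNICUS/S2_SR' is deprecated in the
# Earth Engine catalog; the harmonized collection is its replacement.
sentinel_collection = (
    ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
    .filterBounds(aoi)
    .filterDate('2023-01-01', '2024-01-01')  # end date is exclusive
    .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 10))
    .sort('CLOUDY_PIXEL_PERCENTAGE')
)

count = sentinel_collection.size().getInfo()

if count > 0:
    # The collection is sorted by cloud cover, so the first image is the least cloudy.
    selected_image = sentinel_collection.first()
    scene_id = selected_image.id().getInfo()
    output_path = f"data/raw/sentinel2/{scene_id.replace('/', '_')}.tif"

    # Keep only the 10 m bands (blue, green, red, NIR); they share the uint16 type.
    bands = selected_image.select(['B2', 'B3', 'B4', 'B8'])

    # A ~10 km x 10 km window centred on the part of the scene inside the AOI.
    overlap = selected_image.geometry().intersection(aoi, maxError=1)
    region = overlap.centroid(maxError=1).buffer(5000).bounds(maxError=1)

    print(f"Found {count} scenes. Downloading the least cloudy one: {scene_id}")

    geemap.ee_export_image(
        bands,
        filename=output_path,
        scale=10,  # native resolution of B2, B3, B4 and B8
        region=region,
        file_per_band=False,
    )
    # geemap reports export errors by printing, so confirm the file exists.
    if os.path.exists(output_path):
        print(f"Download complete. File saved to: {output_path}")
    else:
        print("Export did not produce a file; check the messages above.")
else:
    print("No scenes found matching the criteria.")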
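Why the export region must stay small can be made concrete with a rough size estimate. This sketch assumes uncompressed uint16 samples (2 bytes each), square regions, and a 32 MiB per-request download limit; that limit is an assumption to check against the current Earth Engine documentation, and `estimate_bytes` is an illustrative helper.

```python
import math

def estimate_bytes(width_km, height_km, scale_m, n_bands, bytes_per_sample=2):
    """Rough uncompressed size of a raster export (uint16 -> 2 bytes per sample)."""
    cols = math.ceil(width_km * 1000 / scale_m)
    rows = math.ceil(height_km * 1000 / scale_m)
    return cols * rows * n_bands * bytes_per_sample

# Assumed direct-download request limit (32 MiB); verify in the Earth Engine docs.
DOWNLOAD_LIMIT_BYTES = 32 * 1024 * 1024

for label, side_km in [("10 km window", 10), ("full 3-degree AOI", 330)]:
    size = estimate_bytes(side_km, side_km, scale_m=10, n_bands=4)
    verdict = "fits" if size <= DOWNLOAD_LIMIT_BYTES else "too large"
    print(f"{label:18s} {size / 2**20:10.1f} MiB  ({verdict})")
```

With four bands at 10 m, the 10 km window comes to about 7.6 MiB, while the full AOI is roughly 8 GiB; larger areas need `ee.batch.Export.image.toDrive` or tiling instead.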

## 5. Download LiDAR Data

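We will download a sample LiDAR file from **ORNL DAAC Dataset 1644**, which covers forest sites in the Brazilian Amazon. ORNL DAAC serves its data behind NASA Earthdata Login, so you need a (free) Earthdata account; an unauthenticated request can return an HTML login page instead of the file.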

In [None]:
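def download_file(url, directory, timeout=60):
    """Download `url` into `directory` with a progress bar; return the path, or None on failure."""
    filename = os.path.join(directory, url.split('/')[-1])
    tmp_path = filename + '.part'  # renamed only after the download completes
    try:
        with requests.get(url, stream=True, timeout=timeout) as response:
            response.raise_for_status()
            # Without Earthdata credentials the request may end on an HTML login
            # page; saving that under a .laz name would look like a success.
            if 'text/html' in response.headers.get('content-type', ''):
                print(f"Received an HTML page from {response.url}; check your Earthdata Login.")
                return None
            total_size = int(response.headers.get('content-length', 0))

            with open(tmp_path, 'wb') as f, tqdm(
                desc=os.path.basename(filename),
                total=total_size,
                unit='iB',
                unit_scale=True,
                unit_divisor=1024,
            ) as bar:
                for chunk in response.iter_content(chunk_size=8192):
                    bar.update(f.write(chunk))
        os.replace(tmp_path, filename)
        print(f"Successfully downloaded {filename}")
        return filename
    except requests.exceptions.RequestException as e:
        print(f"Error downloading file: {e}")
        # Drop the partial download so it is not mistaken for valid data.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return None

# Example LiDAR tile from ORNL DAAC in the Brazilian Amazon (confirm the path on the
# dataset landing page; the download requires Earthdata Login).
lidar_url = "https://daac.ornl.gov/daacdata/cms/LiDAR_Forest_Inventory_Brazil/data/FLB_7006_20140728_131235.laz"
lidar_filename_path = os.path.join('data/raw/lidar', lidar_url.split('/')[-1])

print(f"Preparing to download LiDAR data from: {lidar_url}")
if download_file(lidar_url, 'data/raw/lidar') is None:
    print(f"No LiDAR file was saved to {lidar_filename_path}.")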
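A quick way to catch a login page saved under a `.laz` name is to check the file signature. LAS files begin with the four bytes `LASF`, and LAZ files keep the same uncompressed public header, so both should start that way. This is a minimal stdlib sketch; `looks_like_las` is an illustrative name, and it checks only the signature, not whether the point data is readable.

```python
from pathlib import Path

def looks_like_las(path):
    """Return True if `path` is a file whose first four bytes are the LAS signature b'LASF'."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open('rb') as f:
        return f.read(4) == b'LASF'

# Check every .las/.laz file fetched into the LiDAR folder.
lidar_dir = Path('data/raw/lidar')
if lidar_dir.is_dir():
    for p in sorted(lidar_dir.glob('*.la[sz]')):
        status = 'OK' if looks_like_las(p) else 'NOT a LAS/LAZ file (login page?)'
        print(f"{p.name}: {status}")
```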

## 6. Package Data for Download

This final step will zip the contents of the `data/raw` directory into a single file named `amazon_data.zip`. You can then download this file from the Colab file explorer (the folder icon on the left).

In [None]:
def zip_directory(folder_path, zip_path):
    """Zips the contents of an entire folder."""
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(folder_path):
            for file in files:
                zipf.write(os.path.join(root, file), 
                           os.path.relpath(os.path.join(root, file), 
                                           os.path.join(folder_path, '..')))
    print(f"Successfully created zip file at {zip_path}")

zip_directory('data/raw', 'amazon_data.zip')
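Before downloading, it can be worth confirming that the archive is intact and holds what you expect. In this stdlib-only sketch, `ZipFile.testzip()` re-reads every member and checks its CRC, returning the first bad name or `None`; `summarize_zip` is an illustrative helper name.

```python
import os
import zipfile

def summarize_zip(zip_path):
    """Verify every member's CRC and return (name, uncompressed size) pairs."""
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()  # first corrupt member name, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"Corrupt member in {zip_path}: {bad}")
        return [(info.filename, info.file_size) for info in zf.infolist()]

if os.path.exists('amazon_data.zip'):
    for name, size in summarize_zip('amazon_data.zip'):
        print(f"{size / 2**20:8.1f} MiB  {name}")
```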

### Next Steps

1. Find `amazon_data.zip` in the file explorer panel on the left.
2. Right-click on it and select "Download".
3. Unzip the file on your local machine.
4. Proceed with the `01_checkpoint_1_analysis.ipynb` notebook locally.
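For step 3, any unzip tool works. If you prefer to stay in Python, the sketch below extracts the archive into a local `data/` folder; because `zip_directory` stores paths relative to `data/`, extracting there recreates `data/raw/sentinel2` and `data/raw/lidar`. The `extract_archive` helper is hypothetical, not something the analysis notebook is known to define.

```python
import zipfile
from pathlib import Path

def extract_archive(zip_path, dest='data'):
    """Extract `zip_path` into `dest`; return the sorted top-level entries."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)  # zipfile strips absolute paths and '..' components
        top_level = sorted({Path(name).parts[0] for name in zf.namelist()})
    return top_level

if Path('amazon_data.zip').exists():
    print(extract_archive('amazon_data.zip'))
```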