# OpenAI-to-Z Challenge: Data Acquisition

This notebook covers **Checkpoint 1: Data Acquisition** for the OpenAI-to-Z Challenge. Our goal is to fetch one Sentinel-2 scene and one OpenTopography LiDAR tile that overlap in the western Amazon region of Acre, Brazil.

## 1. Setup and Imports

First, we'll import the necessary Python libraries and authenticate with Google Earth Engine (GEE).

In [None]:
import os
import ee
import geemap
import requests
from tqdm import tqdm

# Initialize Earth Engine; fall back to interactive authentication if that fails
try:
    ee.Initialize()
except Exception as e:
    print(f"Earth Engine initialization failed ({e}). Please authenticate.")
    ee.Authenticate()
    ee.Initialize()

# Create directories for raw data
os.makedirs('data/raw/sentinel2', exist_ok=True)
os.makedirs('data/raw/lidar', exist_ok=True)

## 2. Define Area of Interest (AOI)

We'll focus on the Acre state in western Brazil, centered around coordinates **9.5°S, 70.5°W**, a region known for its geoglyphs.

In [None]:
# Define the bounding box for the Acre region
lon_min, lon_max = -72, -69
lat_min, lat_max = -11, -8

aoi = ee.Geometry.Rectangle([lon_min, lat_min, lon_max, lat_max])

# Create a map to visualize the AOI
map_aoi = geemap.Map(center=[-9.5, -70.5], zoom=7)
map_aoi.addLayer(aoi, {'color': 'FF0000'}, 'Area of Interest')
map_aoi
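
As a quick sanity check on the bounding box, the following sketch computes its center and approximate ground extent in plain Python. It assumes a spherical approximation of about 111.32 km per degree of latitude. The helper names `bbox_center` and `bbox_extent_km` are illustrative and are not part of `ee` or `geemap`.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude (assumption)

def bbox_center(lon_min, lat_min, lon_max, lat_max):
    """Return the (lat, lon) midpoint of a bounding box."""
    return ((lat_min + lat_max) / 2, (lon_min + lon_max) / 2)

def bbox_extent_km(lon_min, lat_min, lon_max, lat_max):
    """Return the approximate (width_km, height_km) of a bounding box."""
    center_lat, _ = bbox_center(lon_min, lat_min, lon_max, lat_max)
    height_km = (lat_max - lat_min) * KM_PER_DEGREE
    # Degrees of longitude shrink with the cosine of the latitude.
    width_km = (lon_max - lon_min) * KM_PER_DEGREE * math.cos(math.radians(center_lat))
    return width_km, height_km

center = bbox_center(-72, -11, -69, -8)
width_km, height_km = bbox_extent_km(-72, -11, -69, -8)
print(f"Center (lat, lon): {center}")
print(f"Approximate extent: {width_km:.0f} km x {height_km:.0f} km")
```

The center matches the `center=[-9.5, -70.5]` passed to `geemap.Map`. The box is roughly 330 km on each side, so it spans several Sentinel-2 tiles (each about 110 km wide), and any single scene covers only part of the AOI.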

## 3. Search for Sentinel-2 Scenes

Using the Earth Engine Python API (`ee`), we will search for low-cloud Sentinel-2 surface reflectance scenes from 2023 within our AOI.

In [None]:
sentinel_collection = (
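    # The harmonized collection supersedes the deprecated 'COPERNICUS/S2_SR'
    ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
    .filterBounds(aoi)
    .filterDate('2023-01-01', '2024-01-01')  # end date is exclusive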
    .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 10))
    .sort('CLOUDY_PIXEL_PERCENTAGE')
)

# Get the list of available images
image_list = sentinel_collection.toList(sentinel_collection.size())

# Count the scenes once; each getInfo() call is a round trip to the server
scene_count = image_list.size().getInfo()
print(f"Found {scene_count} Sentinel-2 scenes.")

# Print details of the first 5 scenes
for i in range(min(5, scene_count)):
    image = ee.Image(image_list.get(i))
    scene_id = image.id().getInfo()
    cloud_cover = image.get('CLOUDY_PIXEL_PERCENTAGE').getInfo()
    date = ee.Date(image.get('system:time_start')).format('YYYY-MM-dd').getInfo()
    print(f"{i+1}: ID={scene_id}, Date={date}, Cloud Cover={cloud_cover:.2f}%")
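
Each `.getInfo()` call in the loop above is a separate round trip to the Earth Engine servers. One possible alternative, sketched here, is to fetch the metadata for a few scenes in a single call (for example `sentinel_collection.limit(5).getInfo()['features']`) and format it locally. The helper `summarize_scenes` is hypothetical, and the sample record is made up for illustration rather than taken from a real search; it only mimics the dictionary layout that `getInfo()` is assumed to return.

```python
from datetime import datetime, timezone

def summarize_scenes(features, limit=5):
    """Turn image metadata dicts into (id, date, cloud cover) tuples."""
    rows = []
    for feature in features[:limit]:
        props = feature["properties"]
        # Earth Engine stores timestamps as milliseconds since the Unix epoch.
        acquired = datetime.fromtimestamp(
            props["system:time_start"] / 1000, tz=timezone.utc
        )
        rows.append(
            (feature["id"], acquired.strftime("%Y-%m-%d"), props["CLOUDY_PIXEL_PERCENTAGE"])
        )
    return rows

# Made-up record for illustration only (not a real scene).
sample_features = [
    {
        "id": "EXAMPLE/scene_0001",
        "properties": {
            "system:time_start": 1690848000000,  # 2023-08-01T00:00:00Z
            "CLOUDY_PIXEL_PERCENTAGE": 1.25,
        },
    }
]

for i, (scene_id, date, cloud_cover) in enumerate(summarize_scenes(sample_features), start=1):
    print(f"{i}: ID={scene_id}, Date={date}, Cloud Cover={cloud_cover:.2f}%")
```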

## 4. Download a Selected Sentinel-2 Scene

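Let's select the first (least cloudy) scene from the list and download it. Note that `geemap.ee_export_image` relies on Earth Engine's direct-download endpoint, which caps the size of a single request, so exporting the full 3° by 3° AOI at 10 m will probably exceed that cap. If the export fails, shrink `region` or select fewer bands.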

In [None]:
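if image_list.size().getInfo() > 0:
    selected_image = ee.Image(image_list.get(0))
    scene_id = selected_image.id().getInfo()
    output_path = f"data/raw/sentinel2/{scene_id}.tif"

    print(f"Downloading scene: {scene_id}...")
    # ee_export_image may report a failure by printing a message instead of
    # raising, so we confirm that the file exists before claiming success.
    geemap.ee_export_image(
        selected_image,
        filename=output_path,
        scale=10,  # 10 m resolution (Sentinel-2 visible and NIR bands)
        region=aoi,
        file_per_band=False
    )
    if os.path.exists(output_path):
        print(f"Download complete. File saved to: {output_path}")
    else:
        print("Download failed. Try a smaller region or fewer bands.")
else:
    print("No scenes found matching the criteria.")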
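
To get a feel for whether an export is likely to fit in a single direct download, the sketch below estimates the uncompressed size of a raster from its ground extent, pixel size, band count, and bytes per sample. The function name `estimate_raster_megabytes` and the example numbers (a 330 km square, 12 bands, 2 bytes per sample) are assumptions for illustration. The actual request limit is set by Earth Engine and should be checked in its documentation.

```python
def estimate_raster_megabytes(width_km, height_km, scale_m, n_bands, bytes_per_sample=2):
    """Estimate the uncompressed size of a raster in megabytes (1 MB = 10**6 bytes)."""
    cols = round(width_km * 1000 / scale_m)
    rows = round(height_km * 1000 / scale_m)
    return cols * rows * n_bands * bytes_per_sample / 1e6

# Assumed example: the whole AOI (about a 330 km square) at 10 m with 12 bands.
full_aoi_mb = estimate_raster_megabytes(330, 330, scale_m=10, n_bands=12)

# Assumed example: a 5 km square subset with 4 bands at the same resolution.
subset_mb = estimate_raster_megabytes(5, 5, scale_m=10, n_bands=4)

print(f"Full AOI: about {full_aoi_mb:,.0f} MB")
print(f"5 km subset: about {subset_mb:,.0f} MB")
```

Under these assumptions the full AOI comes to tens of gigabytes uncompressed, while a small subset is a few megabytes. Narrowing `region` or the band list is therefore a reasonable first thing to try when a direct download fails.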

## 5. Search for LiDAR Data on OpenTopography

OpenTopography is a key source for LiDAR data. Because its API is more complex to search directly, we browse the catalog by hand to find datasets that cover the Acre region. Datasets from NASA's Carbon Monitoring System (CMS) program are good candidates; USGS 3DEP data, by contrast, covers only the United States.

**Action Required:**
1. Go to the [OpenTopography Data Catalog](https://portal.opentopography.org/datasets).
2. Pan and zoom to the Acre, Brazil region (around latitude -9.5, longitude -70.5).
3. Look for datasets that cover this area. A good example is the [LiDAR Surveys over Selected Forest Research Sites, Brazilian Amazon, 2008-2018](https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1644) hosted by ORNL DAAC and accessible via OpenTopography.
4. Once you find a suitable `.laz` file, copy its download URL and paste it below.

In [None]:
# Placeholder for the LiDAR data URL
# Example from a known dataset in the region.
# You will need to replace this with the URL you find.
lidar_url = "https://opentopography.s3.sdsc.edu/pc-bulk/CMS_LiDAR_Brazil_2018/CMS_LiDAR_Brazil_2018_laz/FLB_7006_20140728_131235.laz" # Example URL
lidar_filename = "data/raw/lidar/" + lidar_url.split('/')[-1]

print(f"Preparing to download LiDAR data from: {lidar_url}")
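
Before downloading, it can help to check that the pasted value looks like a point cloud URL. The following is a minimal sketch using only the standard library. `looks_like_lidar_url` is a hypothetical helper, and accepting only `https` URLs whose path ends in `.laz` or `.las` is an assumption about what this notebook expects.

```python
import posixpath
from urllib.parse import urlparse

def looks_like_lidar_url(url):
    """Return True if url is an https URL whose path ends in .laz or .las."""
    parsed = urlparse(url)
    extension = posixpath.splitext(parsed.path)[1].lower()
    return parsed.scheme == "https" and bool(parsed.netloc) and extension in {".laz", ".las"}

# The example URL from the cell above, split across two lines for readability.
example_url = (
    "https://opentopography.s3.sdsc.edu/pc-bulk/CMS_LiDAR_Brazil_2018/"
    "CMS_LiDAR_Brazil_2018_laz/FLB_7006_20140728_131235.laz"
)
print(looks_like_lidar_url(example_url))
print(looks_like_lidar_url("paste-url-here"))
```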

## 6. Download Selected LiDAR Data

This cell will download the LiDAR file from the URL specified above.

In [None]:
def download_file(url, filename):
    """Downloads a file from a URL with a progress bar."""
    try:
        # A timeout keeps the request from hanging forever on a stalled connection
        response = requests.get(url, stream=True, timeout=60)
        response.raise_for_status()  # Raise an exception for bad status codes
        total_size = int(response.headers.get('content-length', 0))

        with open(filename, 'wb') as f, tqdm(
            desc=os.path.basename(filename),
            total=total_size,
            unit='iB',
            unit_scale=True,
            unit_divisor=1024,
        ) as bar:
            for chunk in response.iter_content(chunk_size=8192):
                size = f.write(chunk)
                bar.update(size)
        print(f"Successfully downloaded {filename}")
    except requests.exceptions.RequestException as e:
        print(f"Error downloading file: {e}")

if lidar_url and lidar_url.startswith('https://'):
    download_file(lidar_url, lidar_filename)
else:
    print("Please provide a valid LiDAR data URL in the cell above.")
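
A successful HTTP response does not guarantee that the saved file is a point cloud (a server could return an HTML error page, for instance). As a hedged sanity check, the sketch below tests whether a file starts with the four-byte `LASF` signature that the LAS format places at the start of its header and that LAZ files share. The helper `has_las_signature` is hypothetical, and the demonstration writes small temporary stand-in files rather than reading real LiDAR data.

```python
import os
import tempfile

def has_las_signature(path):
    """Return True if the file begins with the LAS/LAZ signature bytes b'LASF'."""
    with open(path, "rb") as f:
        return f.read(4) == b"LASF"

# Demonstration with temporary stand-in files (not real point clouds).
with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "fake_tile.laz")
    bad_path = os.path.join(tmp_dir, "error_page.laz")
    with open(good_path, "wb") as f:
        f.write(b"LASF" + bytes(16))
    with open(bad_path, "wb") as f:
        f.write(b"<html>Not Found</html>")

    good_result = has_las_signature(good_path)
    bad_result = has_las_signature(bad_path)

print(good_result, bad_result)
```

In this notebook the check could be run as `has_las_signature(lidar_filename)` once the download cell has finished.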