# Creating the Datasets

This notebook will show how we created the datasets used in the CNN. 

First we import the libraries, which includes some different satelites, standard pythin libraries and google drive api clients.

In [42]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import ee
import urllib.request
import datetime
import os
import base64
import requests
import warnings
warnings.filterwarnings("ignore")
import tempfile
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaFileUpload
from sentinelhub import SentinelHubRequest, MimeType, CRS, BBox, DataCollection
from sentinelhub import SHConfig
from sentinelhub import WmsRequest, CRS, MimeType, CustomUrlParam
from PIL import Image
import random
from math import radians, sin, cos, sqrt, atan2
from sklearn.neighbors import DistanceMetric
from scipy.spatial.distance import cdist

## Getting the Satellite Images

To find the images of places where a fire will occur, so we have a chance to see if we can predict it, we have found a dataset of fires occuring in the state in Oregon in the US from 2000-2022. This includes the time of detection, exact location, size, and what caused the fire, plus many other details.

In [2]:
full_data = pd.read_csv('ODF_Fire_Occurrence_Data_2000-2022.csv')

In [3]:
full_data.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,23480,23481,23482,23483,23484,23485,23486,23487,23488,23489
Serial,102649,131239,58256,59312,61657,98529,63735,68019,68067,68224,...,129090,128550,131662,128862,129111,124548,132141,124065,131292,131287
FireCategory,STAT,STAT,STAT,STAT,STAT,STAT,STAT,STAT,STAT,STAT,...,STAT,STAT,STAT,STAT,STAT,STAT,STAT,STAT,STAT,STAT
FireYear,2015,2022,2000,2000,2001,2014,2002,2003,2003,2003,...,2022,2022,2022,2022,2022,2021,2022,2021,2022,2022
Area,EOA,EOA,EOA,EOA,SOA,SOA,NOA,NOA,EOA,EOA,...,SOA,SOA,EOA,NOA,SOA,EOA,SOA,EOA,EOA,EOA
DistrictName,Klamath-Lake,Walker Range - WRFPA,Central Oregon,Northeast Oregon,Southwest Oregon,Douglas - DFPA,West Oregon,West Oregon,Northeast Oregon,Walker Range - WRFPA,...,Southwest Oregon,Southwest Oregon,Klamath-Lake,North Cascade,Coos - CFPA,Walker Range - WRFPA,Western Lane,Northeast Oregon,Northeast Oregon,Central Oregon
UnitName,Klamath,Crescent,John Day,La Grande,Grants Pass,DFPA Central,Philomath,Dallas,Wallowa,Crescent,...,Grants Pass,Medford,Klamath,Molalla,Gold Beach,Crescent,Veneta,Pendleton,Pendleton,Sisters
FullFireNumber,15-981082-16,22-991220-23,00-952011-01,00-971024-01,01-712133-02,14-733192-15,02-551001-03,03-552013-04,03-974016-04,03-991228-04,...,22-712039-23,22-711354-22,22-981071-23,22-581069-22,22-723009-23,21-991258-21,22-781066-23,21-973052-21,22-973014-23,22-955070-23
FireName,Bass 497,Hay Fire,Slick Ear #2,Woodley,QUEENS BRANCH,Chilcoot,WREN,Ritner Creek,Big Tamarack,COIDC 918,...,Riverbanks Rd 5075,Grouse Ridge,Lobert 336,Marmot Rd Pile,Bagley Creek,Road 2430,Spruce Path,Bone Canyon,Milepost 231,That Way 774
Size_class,B,A,B,C,A,A,A,A,A,A,...,A,A,B,B,A,B,A,C,A,A
EstTotalAcres,3.2,,0.75,80.0,0.1,0.01,0.01,0.01,0.01,0.0,...,0.01,0.1,,9.0,0.01,0.75,0.01,67.43,0.1,0.01


Here we collect some of the columns that may be relevant for our analysis

In [3]:
# Collecting relevant columns
data = full_data[['FireYear','Size_class','FullFireNumber','EstTotalAcres','CauseBy','Lat_DD','Long_DD','FireName','ReportDateTime','Discover_DateTime','Control_DateTime','DistFireNumber']]

Cleaning the data by dropping NA rows, converting to datetime for the satelite scraping and creating the coordinates form the latitude and longitude columns

In [4]:
#Cleaning the data
data = data.dropna() # Drop rows with no discover time
data['Discover_DateTime'] = pd.to_datetime(data['Discover_DateTime']) # Convert to datetime 
data['Control_DateTime'] = pd.to_datetime(data['Control_DateTime']) # Convert to datetime 
data['ReportDateTime'] = pd.to_datetime(data['ReportDateTime']) # Convert to datetime 
data['Coordinates'] = data.apply(lambda row: [row['Lat_DD'], row['Long_DD']], axis=1) # Getting the coordinates

Creating columns 1, 3, and 6 months before the fire was detected, to use for different intervals for the scraping

In [5]:
# Getting the dates for the images
data['6months'] = data['Discover_DateTime'] - pd.DateOffset(months=6)
data['3months'] = data['Discover_DateTime'] - pd.DateOffset(months=3)
data['1months'] = data['Discover_DateTime'] - pd.DateOffset(months=1)

## Getting the images

Here we are using the Earth Engine to scrape the images

Use this to get access to the Earth Engine:
https://developers.google.com/earth-engine/guides/python_install

Username: forestfiresgroupglobaldtu

In [9]:
# Authenticate Earth Engine
ee.Authenticate()

Enter verification code: 4/1AVHEtk5CrvfHqz949qbW8XO205OoPj4i4750n0rUacxjkCpO6iGskFDtA4g

Successfully saved authorization token.


## Earth Engine

In [10]:
# initialize Earth Engine
ee.Initialize()

### Google Drive Save

All the images was directly saved to google drive with the google api client, to save space on the local computer.

The first satelite we tried was the Landsat 8, which has a lot of data available, but after scarping the images we decided to not use it as the quality was to low when trying to get images that was under 2km x 2km.

#### Landsat 8
Landsat 8, launched in February 2013, is an Earth observation satellite operated by the United States Geological Survey (USGS) and NASA. Landsat 8 operates in a sun-synchronous orbit, capturing images of the Earth's surface in a continuous and systematic manner.

Landsat 8 has a 16-day revisit time for any specific location on Earth. This means that Landsat 8 captures images of the same location approximately once every 16 days. The satellite acquires images in multiple spectral bands, including visible, near-infrared, and thermal infrared bands.

It's worth noting that Landsat 8 works in tandem with Landsat 9, which was launched in September 2021. Both satellites share the same orbital plane and follow the same ground track, with Landsat 9 offset by 8 days from Landsat 8. This effectively increases the revisit time for the Landsat program to approximately once every 8 days for any specific location on Earth.

In [20]:
# Function to get the image URL
def get_image_url(coordinates, timestamp):
    latitude, longitude = coordinates
    # convert the timestamp to a string in the format 'YYYY-MM-DD'
    date_str = timestamp.strftime('%Y-%m-%d')
    
    # convert the string to a datetime object
    date = datetime.datetime.strptime(date_str, '%Y-%m-%d')

    # Define the image collection and filter
    image_collection = ee.ImageCollection("LANDSAT/LC08/C01/T1_SR") \
        .filterDate(date - datetime.timedelta(days=30), date + datetime.timedelta(days=30)) \
        .filterBounds(ee.Geometry.Point(longitude, latitude))
    
    # Check if there are any images within the 30-day window
    image_count = image_collection.size().getInfo()
    if image_count == 0:
        #print(f'No images found within 30 days of {timestamp} for coordinates {coordinates}. Skipping row.')
        return
    
    # Get the least cloudy image
    least_cloudy = image_collection.sort('CLOUD_COVER').first()

    # Define the visualization parameters
    vis_params = {
        'bands': ['B4', 'B3', 'B2'],
        'min': 0,
        'max': 3000,
        'gamma': 1.4
    }

    # Define the region to get the image
    region = ee.Geometry.Point(longitude, latitude).buffer(1000).bounds().getInfo()['coordinates']

    # Get the image URL
    image_url = least_cloudy.getThumbURL({
        'region': region,
        'scale': 12,  # Set the scale to match the native resolution
        'format': 'png',
        'resampling_method': 'bicubic',
        **vis_params
    })

    return image_url

# Function to save the image to Google Drive
def save_image_to_drive(image_url, file_name, folder_id=None):
    # Set up Google Drive API
    creds = service_account.Credentials.from_service_account_file('forestfiredtu-ef1cef2f2e43.json')
    service = build('drive', 'v3', credentials=creds)

    # Download the image
    response = requests.get(image_url)
    if response.status_code != 200:
        print(f"Failed to download image: {file_name}.png")
        return
    image_data = response.content

    # Save the image to a temporary file
    with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as temp_image:
        temp_image.write(image_data)
        temp_image.flush()

        # Save the image to Google Drive
        file_metadata = {
            'name': f'{file_name}.png',
            'mimeType': 'image/png'
        }

        if folder_id:
            file_metadata['parents'] = [folder_id]

        media = MediaFileUpload(temp_image.name, mimetype='image/png', resumable=True)

        try:
            file = service.files().create(body=file_metadata, media_body=media,
                                          fields='id').execute()
            print(F'File ID: "{file.get("id")}".')
        except HttpError as error:
            print(F'An error occurred: {error}')
            file = None

        # Remove the temporary file
        os.unlink(temp_image.name)

    return file
# Specify your Google Drive folder ID 
folder_id = '1As3zsmcQIIGIAwTdwgtPIcLQuLCclXsZ'

# Iterate over the DataFrame and call the functions
for index, row in df.iterrows():
    image_url = get_image_url(row['Coordinates'], row['6months'])
    if image_url:
        save_image_to_drive(image_url, row['FireName'], folder_id)


File ID: "1D_0XphtFY0jqAX5DEIyRWjP6pTxAUnA3".
File ID: "1wT2M-mDI7s3zRrTyHACLZQvrGdH2hSUG".
File ID: "1iWIZz_02_Jir7hoIpE4KAItqFC_YtOc_".
File ID: "1v1gvPIu8f8QGLhg_9Y_G490qo_HNsJvv".
File ID: "1VUX9paJhbWGYsIPm4lhinefqJ-GLWOEE".
File ID: "18S8sPyNPIxoBtKdDKxAe0KYOO6QPpZNX".
File ID: "1tTVldD5M6a1l0WFQJ99RrhkwT5OXbJpn".
File ID: "1hD86j5kD93QOj3yfd1D-0xdD7Oz-GtXu".
File ID: "1QBOdWyh-PMP5HiqGIQ58s_T2DDymVxG1".
File ID: "1oPhWrwfX-IQG6JDYGQ9UNCN42-BfPyiq".
File ID: "1d6Ng8cyRPAW8yKD7qUPPSyA8-14wvD6M".
File ID: "1rS2RZbFbaaUfBLiveIeEFBFolWHrBqJz".
File ID: "1E2vkNR8m55zSlzEQKp9JuNX65FW2Wwgc".
File ID: "1zhmPlei-8pACmcfEzrWnuTFCyxSwC9Nt".
File ID: "1RIM5nGkslI6WWoy3Zj7bhE9pjLGPgLsh".
File ID: "1bN7tEzwBzXCNA4oDDvyLScXYIgtcBgMl".
File ID: "14oqGf140uCtsryinf0vC17Ck7aLKV8Sc".
File ID: "1C9DWz38v2fZk3S5UL7-6ZOu-wyV-WVHG".
File ID: "15Wvgy3fR9iDWbDuhUB7vxDFk0lgq2SPR".
File ID: "1Uo3GNJcyimiZosuOL7ajtzfRW6fJ3TGr".
File ID: "1Z0WgUz0ZaAYHdq0-sOx-8k7EI2rEeWkv".
File ID: "1S29-bRlbYmWaP-ztnOGPPrM

The next satelite we tried was the Sentinel 2, which provided better pictures, but it could still not produce images of a good enough quality with images smaller than 1km x 1km. 

#### Sentinel 2 - only data from 2015

In [30]:
# Function to get the image URL
def get_image_url(coordinates, timestamp):
    latitude, longitude = coordinates
    # convert the timestamp to a string in the format 'YYYY-MM-DD'
    date_str = timestamp.strftime('%Y-%m-%d')
    
    # convert the string to a datetime object
    date = datetime.datetime.strptime(date_str, '%Y-%m-%d')

    # Define the image collection and filter
    image_collection = ee.ImageCollection("COPERNICUS/S2") \
        .filterDate(date - datetime.timedelta(days=30), date + datetime.timedelta(days=30)) \
        .filterBounds(ee.Geometry.Point(longitude, latitude)) \
        .filterMetadata('CLOUDY_PIXEL_PERCENTAGE', 'less_than', 5)
    # Check if there are any images within the 30-day window
    image_count = image_collection.size().getInfo()
    if image_count == 0:
        #print(f'No images found within 30 days of {timestamp} for coordinates {coordinates}. Skipping row.')
        return

    # Get the least cloudy image
    least_cloudy = image_collection.sort('CLOUDY_PIXEL_PERCENTAGE').first()

    # Define the visualization parameters
    vis_params = {
        'bands': ['B4', 'B3', 'B2'],
        'min': 0,
        'max': 3000,
        'gamma': 1.4
    }

    # Define the region to get the image
    region = ee.Geometry.Point(longitude, latitude).buffer(750).bounds().getInfo()['coordinates']

    # Get the image URL
    image_url = least_cloudy.getThumbURL({
        'region': region,
        'scale': 10,  # Set the scale to match the native resolution
        'format': 'png',
        'resampling_method': 'bicubic',
        **vis_params
    })

    return image_url

# Function to save the image to Google Drive
def save_image_to_drive(image_url, file_name, folder_id=None):
    # Set up Google Drive API
    creds = service_account.Credentials.from_service_account_file('forestfiredtu-ef1cef2f2e43.json')
    service = build('drive', 'v3', credentials=creds)

    # Download the image
    response = requests.get(image_url)
    if response.status_code != 200:
        print(f"Failed to download image: {file_name}.png")
        return
    image_data = response.content

    # Save the image to a temporary file
    with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as temp_image:
        temp_image.write(image_data)
        temp_image.flush()

        # Save the image to Google Drive
        file_metadata = {
            'name': f'{file_name}.png',
            'mimeType': 'image/png'
        }

        if folder_id:
            file_metadata['parents'] = [folder_id]

        media = MediaFileUpload(temp_image.name, mimetype='image/png', resumable=True)

        try:
            file = service.files().create(body=file_metadata, media_body=media,
                                          fields='id').execute()
            print(F'File ID: "{file.get("id")}".')
        except HttpError as error:
            print(F'An error occurred: {error}')
            file = None

        # Remove the temporary file
        os.unlink(temp_image.name)

    return file
# Specify your Google Drive folder ID 
folder_id = '13Yklp_2PlRh-HV1dmOMYA77L4fARfuIE'

# Iterate over the DataFrame and call the functions
for index, row in test.iterrows():
    image_url = get_image_url(row['Coordinates'], row['3months'])
    if image_url:
        save_image_to_drive(image_url, row['FireName'], folder_id)


File ID: "1a2fKCzTze1qxQXD_fpuECrF7FxZgCSLO".


The satelite that we ended up using was the NAIP which can provide high quality images, and we decided on images with 250m x 250m. The only disadvantage with NAIP is that it is only updated ever 2-3 years so we could not get as many pictures as we had hoped. The scraped images are between 2-7 months before the fire occured

#### NAIP
The National Agriculture Imagery Program (NAIP) imagery is typically updated on a two to three-year cycle. This means that for a specific area within the United States, new NAIP images are usually acquired every two or three years. However, this frequency may vary depending on factors like budget, weather conditions, and other factors that can impact aerial image acquisition.

NAIP primarily focuses on capturing imagery during the agricultural growing season (late spring through early fall) to support various agricultural programs and applications. The images have a resolution of 1 meter, and they are available in natural color (RGB) and, in some cases, near-infrared (NIR) bands.

In [24]:
#Split the data, as the scrape usually stops after 4000-5000 rows
data1 = data[:4000]
data2 = data[4000:8000]
data3 = data[8000:12000]
data4 = data[12000:16000]
data5 = data[16000:20000]
data6 = data[20000:]

In [19]:
# Function to get the image URL
def get_image_url(coordinates, timestamp):
    latitude, longitude = coordinates
    # convert the timestamp to a string in the format 'YYYY-MM-DD'
    date_str = timestamp.strftime('%Y-%m-%d')
    
    # convert the string to a datetime object
    date = datetime.datetime.strptime(date_str, '%Y-%m-%d')

    # Define the image collection and filter
    image_collection = ee.ImageCollection("USDA/NAIP/DOQQ") \
        .filterDate(date - datetime.timedelta(days=120), date + datetime.timedelta(days=30)) \
        .filterBounds(ee.Geometry.Point(longitude, latitude))
    
    # Check if there are any images within the 1-year window
    image_count = image_collection.size().getInfo()
    if image_count == 0:
        #print(f'No images found within 1 year of {timestamp} for coordinates {coordinates}. Skipping row.')
        return

    # Get the most recent image
    most_recent = image_collection.sort('system:time_start', False).first()

    # Define the visualization parameters
    vis_params = {
        'bands': ['R', 'G', 'B'],
        'min': 0,
        'max': 255,
    }

    # Define the region to get the image
    region = ee.Geometry.Point(longitude, latitude).buffer(250).bounds().getInfo()['coordinates']

    # Get the image URL
    image_url = most_recent.getThumbURL({
        'region': region,
        'scale': 1,  # Set the scale to match the native resolution
        'format': 'png',
        'resampling_method': 'bicubic',
        **vis_params
    })

    return image_url

# Function to save the image to Google Drive
def save_image_to_drive(image_url, file_name, folder_id=None):
    # Set up Google Drive API
    creds = service_account.Credentials.from_service_account_file('forestfiredtu-ef1cef2f2e43.json')
    service = build('drive', 'v3', credentials=creds)

    # Download the image
    response = requests.get(image_url)
    if response.status_code != 200:
        print(f"Failed to download image: {file_name}.png")
        return
    image_data = response.content

    # Save the image to a temporary file
    with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as temp_image:
        temp_image.write(image_data)
        temp_image.flush()

        # Save the image to Google Drive
        file_metadata = {
            'name': f'{file_name}.png',
            'mimeType': 'image/png'
        }

        if folder_id:
            file_metadata['parents'] = [folder_id]

        media = MediaFileUpload(temp_image.name, mimetype='image/png', resumable=True)

        try:
            file = service.files().create(body=file_metadata, media_body=media,
                                          fields='id').execute()
            print(F'File ID: "{file.get("id")}".')
        except HttpError as error:
            print(F'An error occurred: {error}')
            file = None

        # Remove the temporary file
        os.unlink(temp_image.name)

    return file
# Specify your Google Drive folder ID 
folder_id = '1ats_A6B5WJx2Gdkd426zOpVUSL6N_LEz'
j = 0
# Iterate over the DataFrame and call the functions
for index, row in data6.iterrows():
    image_url = get_image_url(row['Coordinates'], row['3months'])
    j += 1
    if j % 100 == 0:
        print(j)
    if image_url:
        save_image_to_drive(image_url, row['FullFireNumber'], folder_id)


File ID: "1qF3pMZr84DVCR_mm4obsEXwHuYiovdJP".
File ID: "14PE3z2Hi04sm5lre7v9i7oCv3Ni9Mvzg".
File ID: "1XyKdwpG0bYsTEK3mFKlpfepCwB_cC5x5".
File ID: "1F9Ya65s6ADBbEORz6s3kDjVpdt0qWfrm".
File ID: "1SQjla7YHxCZUublJ3qB9OCcfVeIK2MmT".
File ID: "1a1qiTOx2f52sJ-3dC7bE91WnvMiT1QFq".
File ID: "1y7rgsbU1HpNiyymm6MsMkg6vGffYr7aR".
File ID: "1m7k_AEnTrYcuxvDsvVWctyqo594lfzn4".
File ID: "1rAfZudhUNs45P5_X9sNKfN7cWWY5YEdV".
File ID: "10t0Dtfi64nBhdVAnzteU3Omi7EwDf18I".
File ID: "1f-teEgL5do1bsu7yGolF9_VcU6a0-7M5".
File ID: "1WQG9rp-NCC0OXK0vJC9Om2wfpP9x7FfH".
100
File ID: "15rMUwc7OSHgscrRpe_ILmUHAGTYBScJI".
File ID: "10230sm_Q1q6-W16TR-FFLxWJnmULgr78".
File ID: "1KSp1gOzt2LXg6rsRj1nTFQ4LeLfTLvFI".
File ID: "12JTgbSqBrHxquIY8a9l81fdIOB48ugTO".
File ID: "1I5-6J6Ivb1MVTmz6mYP8U43C_f0cItsn".
File ID: "1Hduz4JEjxzWxKK-rHtiLMe0kGM6pbXVE".
File ID: "1b8F9ldx60-A1cfSkjHJkRm3WYqQIZkmZ".
File ID: "1Ui2AS-M6WfTMdIH7F-eKPqWxLD_T5kTa".
File ID: "1ZLXmzJpHYHKOa_SFJwKruisBSSmppYKf".
File ID: "1BPkvMCTTPY5ruoTLJTo

File ID: "1YOJxvdg5Z2RZtOc6X_5beYD_Zv8zlwvd".
File ID: "1E3c3kLHzQ9RbhZigy3ikwBUm8vBHkXu6".
File ID: "1b_qKz9UQpmhr5kZXMTHx_YyRKnoTzWnn".
File ID: "12IMp1MjCkfaPAbELsLHNzjSEFGDKCs3p".
File ID: "1ci0ezqsjKp7oosSAasvneX9VfeCZoT-X".
File ID: "1VjH_WECBYJX4ZrrrTSZPtX53dyfvw9R1".
File ID: "1l-9voKQ2BCY5NLF17NOz3n4vtb2nVNrQ".
File ID: "1zi1R07Ybky4lMnewW1-KfmVctcZM5O7s".
1700
File ID: "12laswlmBdOyM5xdsqsRQ5X9ZeiPwqXlF".
File ID: "1X9a7QprA6TQwn8cWM6gej-PH5JrgmpXW".
File ID: "1rvSwqqCiwTLHzOUNC5w9vbHH--lBBmK0".
File ID: "1C4WBof5rL37beWn3bH8ssDBOSLIpUXM1".
File ID: "1trUFpkcIER9uEP_rrPAzmSN4E60doq-3".
File ID: "1Splv1LwHIHdNCqM0Y1ZpHNuJIbCpkV0F".
File ID: "1IiZBRB14h0qYqvo3R6FRpjNJcT7EN-iU".
File ID: "18RNRFFhJT8RbJFA0n7Ppor53FoB4WIoR".
File ID: "1aBBwc_18HusPh72UPOZGvsW29jXhzGlI".
1800
File ID: "1v1-MZI2dmo9YOt8hCICfW4BiTUdwSpMf".
File ID: "1kD_nuobJMZh0WOzVDSkUY5LWIux0a0JF".
File ID: "1x88WvmJfiN2nzCp8f4UmisZcGEGc2NJx".
1900
2000


KeyboardInterrupt: 

## Creating non fire dataset

To create the dataset for non fires we have taken a random timestamp between 2017-2022, random coordinates in the state of Oregon, and a ID number

In [19]:
# Function to generate a random timestamp between May and November of 2017-2021
def random_timestamp():
    year = random.choice(range(2020, 2021))
    month = random.choice(range(7, 12))
    day = random.choice(range(1, 30))
    return datetime.datetime(year, month, day)

# Function to generate random coordinates within Oregon
def random_coordinates():
    lat_min, lat_max = 42.0, 46.3
    lon_min, lon_max = -124.6, -116.5
    lat = random.uniform(lat_min, lat_max)
    lon = random.uniform(lon_min, lon_max)
    return [lat, lon]

# Function to generate a unique 9-digit ID number
def unique_id(existing_ids):
    id_num = random.randint(100000000, 999999999)
    while id_num in existing_ids:
        id_num = random.randint(100000000, 999999999)
    return id_num

# Generate dataset
timestamps = [random_timestamp() for _ in range(1000)]
coordinates = [random_coordinates() for _ in range(1000)]

ids = set()
unique_ids = []
for _ in range(1000):
    unique_id_num = unique_id(ids)
    ids.add(unique_id_num)
    unique_ids.append(unique_id_num)

data = {'Timestamp': timestamps, 'Coordinates': coordinates, 'ID Number': unique_ids}
df = pd.DataFrame(data)



In [20]:
df['3months'] = df['Timestamp'] - pd.DateOffset(months=3)

We then scraped the non fire images

In [26]:
# Function to get the image URL
def get_image_url(coordinates, timestamp):
    latitude, longitude = coordinates
    # convert the timestamp to a string in the format 'YYYY-MM-DD'
    date_str = timestamp.strftime('%Y-%m-%d')
    
    # convert the string to a datetime object
    date = datetime.datetime.strptime(date_str, '%Y-%m-%d')

    # Define the image collection and filter
    image_collection = ee.ImageCollection("USDA/NAIP/DOQQ") \
        .filterDate(date - datetime.timedelta(days=120), date + datetime.timedelta(days=30)) \
        .filterBounds(ee.Geometry.Point(longitude, latitude))
    
    # Check if there are any images within the 1-year window
    image_count = image_collection.size().getInfo()
    if image_count == 0:
        #print(f'No images found within 1 year of {timestamp} for coordinates {coordinates}. Skipping row.')
        return

    # Get the most recent image
    most_recent = image_collection.sort('system:time_start', False).first()

    # Define the visualization parameters
    vis_params = {
        'bands': ['R', 'G', 'B'],
        'min': 0,
        'max': 255,
    }

    # Define the region to get the image
    region = ee.Geometry.Point(longitude, latitude).buffer(250).bounds().getInfo()['coordinates']

    # Get the image URL
    image_url = most_recent.getThumbURL({
        'region': region,
        'scale': 1,  # Set the scale to match the native resolution
        'format': 'png',
        'resampling_method': 'bicubic',
        **vis_params
    })

    return image_url

# Function to save the image to Google Drive
def save_image_to_drive(image_url, file_name, folder_id=None):
    # Set up Google Drive API
    creds = service_account.Credentials.from_service_account_file('forestfiredtu-ef1cef2f2e43.json')
    service = build('drive', 'v3', credentials=creds)

    # Download the image
    response = requests.get(image_url)
    if response.status_code != 200:
        print(f"Failed to download image: {file_name}.png")
        return
    image_data = response.content

    # Save the image to a temporary file
    with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as temp_image:
        temp_image.write(image_data)
        temp_image.flush()

        # Save the image to Google Drive
        file_metadata = {
            'name': f'{file_name}.png',
            'mimeType': 'image/png'
        }

        if folder_id:
            file_metadata['parents'] = [folder_id]

        media = MediaFileUpload(temp_image.name, mimetype='image/png', resumable=True)

        try:
            file = service.files().create(body=file_metadata, media_body=media,
                                          fields='id').execute()
            print(F'File ID: "{file.get("id")}".')
        except HttpError as error:
            print(F'An error occurred: {error}')
            file = None

        # Remove the temporary file
        os.unlink(temp_image.name)

    return file
# Specify your Google Drive folder ID 
folder_id = '10tExCQHC4pEz7M8qrTBQ6EcS-lqfDDft'
j = 0
# Iterate over the DataFrame and call the functions
for index, row in df2.iterrows():
    image_url = get_image_url(row['Coordinates'], row['3months'])
    j += 1
    if j % 100 == 0:
        print(j)
    if image_url:
        save_image_to_drive(image_url, row['ID Number'], folder_id)


File ID: "1PTaAiyPQo-kUwHIYx97W-oCBKpy_Z_r7".
File ID: "1StEkr4I5eG1jGcfnb0iZgnN_ca5MkTPf".
File ID: "1cEAEu0dP9CUGBjW-4q0Z0-wf8v071vM9".
File ID: "14T9oVBwtXo-KKGmNqup_FtZ_Pckcik_w".
File ID: "1FRXkb32Eo-oStUZSZSj1KBxlNJo-LCIi".
File ID: "1MUvpn-45MFBeefMrwre-XpdRrJ2rm2q9".
File ID: "1dIh4nUnNH0s3drZRm0JPi2zfLqUihhnd".
File ID: "1FkLE1zRWYOc5_nRaGVpoo5xOc6YcXA5C".
File ID: "1VUzQPcT4dta_9I2VH9MLdBJTG5M6TxMm".
File ID: "1Y-fzxFhTGPvz88u5MWe_u0sm2Mmqf9HX".
File ID: "1I-5Qhc9xTzzGCdI0HmjHl69YULakP3Yq".
File ID: "1zGw95b9wtH6Dlj_uvXOvefzwLpPaRKdj".
File ID: "10-scKOcxP88RATVFH8ghRfZq5Be_1WJY".
File ID: "1VCAPjfm_kHLSWA7bHnY4lyrox-ZtJpIe".
File ID: "1cTCfiak2T8kvUqW-ODgoMI_HQ8GtMp2G".
File ID: "1YiXieLj8J-LXluDdWL2uhgzvrVxCXDX3".
File ID: "1rWBYi3wLrxruQOFahTlDOlErY8hZCJ9R".
File ID: "1mI84scZCiFo4FxTOL8VWS2LEMMDg61Kv".
File ID: "1U9rc3qOwon67-YR4on0Ex_SzCa2FeHJS".
File ID: "1dRLxUc3qJezps029ntCm-AsRBbgursDc".
File ID: "13J3UjssmWU_egDO8GzdUUtOALMVeLxza".
File ID: "1ggxvjDota9qhL5BJW3MjFUe

This notebook showed how we used the Earth Engine API to create our dataset, that we will use in the rest of the project to predict fire hazard in Oregon.