# Meteorological Data Collection
To collect meteorological data for the starting points of wildfires, we used the [NASA POWER API](https://power.larc.nasa.gov/docs/services/api/).
POWER stands for Prediction Of Worldwide Energy Resources. The project provides solar and meteorological data from NASA research to support renewable energy, building energy efficiency, and agricultural needs.

Terms of Use:
> These data were obtained from the NASA Langley Research Center POWER Project funded through the NASA Earth Science Directorate Applied Science Program.

The meteorological data/parameters in POWER are **space-based** and come from two models:
* [GMAO MERRA-2](https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/): the Modern-Era Retrospective analysis for Research and Applications, Version 2, from Goddard's Global Modeling and Assimilation Office (GMAO)
* [GEOS 5.12.4](https://gmao.gsfc.nasa.gov/news/geos_system_news/2016/FP-IT_NRT_G5.12.4.php): GMAO Forward Processing – Instrument Teams (FP-IT) Near Real Time (NRT) products.

The difference between the two models is that MERRA-2 is more thoroughly post-processed but is not available for the most recent 1-2 months. The wildfire data covers July 2020 to January 2023 and the meteorological data was queried on 01-02-2023, so a couple of the most recent observations probably came from the GEOS FP-IT model.
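
A small heuristic sketch of the reasoning above: given the date the data was queried, it flags observations recent enough that MERRA-2 may not have covered them yet. The helper name `flag_possible_fpit` and the fixed 2-month lag are assumptions taken from the "last 1-2 months" statement; POWER does not report the source model per row in this sketch.

```python
import pandas as pd


def flag_possible_fpit(dates: pd.Series, query_date: pd.Timestamp, lag_months: int = 2) -> pd.Series:
    """Return True for dates later than `lag_months` before the query date.

    Heuristic only (hypothetical helper): assumes MERRA-2 lags the query date
    by at most `lag_months` months, so later dates may come from GEOS FP-IT.
    `dates` and `query_date` must both be tz-naive or both tz-aware.
    """
    cutoff = query_date - pd.DateOffset(months=lag_months)
    return dates > cutoff
```

For example, with a hypothetical query date of 2023-02-01 the cutoff is 2022-12-01, so only dates after that day would be flagged.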

POWER data has global coverage and is organized as a grid:
* Meteorological datasets have a spatial resolution (grid cell size) of ½° latitude by ⅝° longitude (~50 km)
* Precipitation has a resolution of 0.1° × 0.1° (~10 km)
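
Because many fires are close together, several of them can fall into the same ½° × ⅝° cell and receive identical meteorological values. A sketch that snaps a point to its nearest cell centre, assuming the grid is aligned like the native MERRA-2 grid (latitude centres at multiples of 0.5° from -90°, longitude centres at multiples of 0.625° from -180°). It does not model the 0.1° precipitation grid, and `grid_cell` is a hypothetical helper.

```python
import math
from typing import Tuple

# Assumed MERRA-2-style grid spacing in degrees
LAT_STEP = 0.5
LON_STEP = 0.625


def grid_cell(lat: float, lon: float) -> Tuple[float, float]:
    """Return the (lat, lon) centre of the nearest assumed grid cell."""
    # Round half up to the nearest multiple of the step, counted from -90 / -180
    cell_lat = -90 + math.floor((lat + 90) / LAT_STEP + 0.5) * LAT_STEP
    cell_lon = -180 + math.floor((lon + 180) / LON_STEP + 0.5) * LON_STEP
    return cell_lat, cell_lon


print(grid_cell(40.602563, -115.719777))  # (40.5, -115.625)
```

Under this assumption, the sample fire shown further below lands in the same cell whether its `InitialLatitude`/`InitialLongitude` or its `Y`/`X` coordinates are used.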

More details on the data are available on the NASA POWER documentation [page](https://power.larc.nasa.gov/docs/methodology/meteorology/).

In [1]:
# Imports
import io
import sys
from datetime import timedelta

import pandas as pd
import requests

# Make the project's config module importable
sys.path.insert(0, '../config/')

from config import Config

In [2]:
# Load the processed wildfire dataset
df = pd.read_csv('../../data/processed/wildfire.csv')

# Convert date columns to pandas datetime; unparseable values become NaT
# (infer_datetime_format is deprecated since pandas 2.0, where format inference is the default)
df['FireDiscoveryDateTime'] = pd.to_datetime(df['FireDiscoveryDateTime'], errors='coerce')
df['ControlDateTime'] = pd.to_datetime(df['ControlDateTime'], errors='coerce')
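
Because `errors='coerce'` silently turns unparseable values into `NaT`, it can be worth counting how many dates were lost: such rows cannot be formatted with `strftime` when the API requests are built. A minimal sketch; `count_unparsed` is a hypothetical helper, and it needs the raw string column, so it would run on a fresh `pd.read_csv` rather than on the already-converted columns.

```python
import pandas as pd


def count_unparsed(raw: pd.Series, parsed: pd.Series) -> int:
    """Count values that were present in `raw` but became NaT in `parsed`."""
    return int((raw.notna() & parsed.isna()).sum())


# Example: one malformed string and one genuinely missing value
raw = pd.Series(["2020-07-19 23:00:00+00:00", "not a date", None])
parsed = pd.to_datetime(raw, errors="coerce")
print(count_unparsed(raw, parsed))  # prints 1: only the malformed string counts
```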

In [3]:
# Checking dimensions
df.shape

(21541, 20)

During the meteorological data collection we gathered the following datasets:
* Basic daily metrics over each fire's date range: temperature, specific humidity, precipitation, wind speed, and soil wetness
* Historical monthly precipitation (total and snow) for the 6 months preceding each fire's start date
* Extended daily metrics such as dew point, wet bulb temperature, relative humidity, photosynthetically active radiation (PAR), and UV index
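
For reference, the POWER parameter codes requested in the cells below can be kept in one place together with short descriptions. The descriptions are paraphrased (the POWER parameter dictionary has the authoritative names and units), and `POWER_PARAMETERS` and `parameters_query` are hypothetical names; the helper rebuilds the same `parameters=...` strings that the notebook writes by hand.

```python
# Parameter codes used in this notebook, grouped by dataset.
# Descriptions are paraphrased; check the POWER parameter dictionary for units.
POWER_PARAMETERS = {
    "basic": {
        "T2M": "Temperature at 2 m",
        "T2M_MAX": "Maximum temperature at 2 m",
        "QV2M": "Specific humidity at 2 m",
        "PRECTOTCORR": "Precipitation (corrected)",
        "WS2M": "Wind speed at 2 m",
        "WS2M_MAX": "Maximum wind speed at 2 m",
        "WS10M": "Wind speed at 10 m",
        "WS10M_MAX": "Maximum wind speed at 10 m",
        "GWETTOP": "Surface soil wetness",
        "GWETPROF": "Profile soil moisture",
    },
    "extra": {
        "T2MDEW": "Dew/frost point at 2 m",
        "T2MWET": "Wet bulb temperature at 2 m",
        "RH2M": "Relative humidity at 2 m",
        "CLRSKY_SFC_PAR_TOT": "Clear-sky surface PAR total",
        "ALLSKY_SFC_PAR_TOT": "All-sky surface PAR total",
        "ALLSKY_SFC_UV_INDEX": "All-sky surface UV index",
    },
    "precipitation_history": {
        "PRECTOTCORR_SUM": "Precipitation (corrected) sum",
        "PRECSNO": "Snow precipitation",
    },
}


def parameters_query(group: str) -> str:
    """Build the `parameters=...` fragment of a request URL for one group."""
    # dicts keep insertion order, so the codes come out in the order listed above
    return "parameters=" + ",".join(POWER_PARAMETERS[group])
```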

### Basic Meteorology data

In [33]:
def build_request_param(id, lat, long, start_date, end_date, result_df, base_uri, date_filter='%Y%m%d'):
    """Call the NASA POWER API for one fire and append the results to a dataframe.

    Args:
        id (int): Wildfire id
        lat (float): Latitude of the fire
        long (float): Longitude of the fire
        start_date (Timestamp): Start of the requested date range
        end_date (Timestamp): End of the requested date range
        result_df (DataFrame): Dataframe accumulating the results
        base_uri (str): Base API URI, including the requested parameters
        date_filter (str, optional): strftime format for the start/end dates;
            '%Y%m%d' for the daily endpoint, '%Y' for the monthly endpoint.
            Defaults to '%Y%m%d'.

    Returns:
        DataFrame: result_df with this fire's rows appended (unchanged if the call failed)
    """
    # Dates that failed to parse (NaT) cannot be formatted, so skip the fire
    if pd.isna(start_date) or pd.isna(end_date):
        return result_df

    # Query-string parameters
    params = f"latitude={lat}&longitude={long}&start={start_date.strftime(date_filter)}&end={end_date.strftime(date_filter)}"

    try:
        # Call the NASA endpoint
        res = requests.get(base_uri + params, timeout=60)
        res.raise_for_status()
    except requests.exceptions.RequestException as err:
        # requests raises its own exception types (not the built-in ConnectionError);
        # return the results collected so far instead of None
        print(f'Error: API call failed for fire {id}: {err}')
        return result_df

    # The CSV table follows the "-END HEADER-" marker
    split_text = res.text.split("-END HEADER-")

    # Skip fires for which the endpoint returned no table
    if len(split_text) < 2:
        return result_df

    # Convert the CSV text to a dataframe
    response_df = pd.read_csv(io.StringIO(split_text[1]))

    # Tag the rows with the fire's location and id
    response_df["LAT"] = lat
    response_df["LONG"] = long
    response_df["PID"] = id

    # Append to the result dataframe
    return pd.concat([result_df, response_df])
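
The function above concatenates query-string parameters by hand, which works because all values here are plain numbers and date strings. A sketch of an alternative, assuming the same daily point endpoint, that lets the standard library do the encoding; `power_point_url` is a hypothetical helper, not part of the POWER API or of this project. With `requests`, the same dict could also be passed as `requests.get(base_url, params=query, timeout=...)`.

```python
from typing import Optional
from urllib.parse import urlencode


def power_point_url(base_url: str, parameters: list, lat: float, lon: float,
                    start: str, end: str, extra: Optional[dict] = None) -> str:
    """Build a POWER point-request URL from a dict of query parameters (hypothetical helper)."""
    query = {
        "parameters": ",".join(parameters),
        "latitude": lat,
        "longitude": lon,
        "start": start,
        "end": end,
    }
    # Extra keys such as "time-standard" or "community" are passed through as-is
    query.update(extra or {})
    # safe="," keeps the comma-separated parameter list readable instead of %2C
    return f"{base_url}?{urlencode(query, safe=',')}"


url = power_point_url(
    "https://power.larc.nasa.gov/api/temporal/daily/point",
    ["T2M", "RH2M"],
    40.602563,
    -115.719777,
    "20200719",
    "20200902",
    extra={"community": "sb", "format": "csv"},
)
print(url)
```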

In [28]:
# Dataframe holding the results of all endpoint calls
nasa_df = pd.DataFrame()

# Weather parameters to pull
parameters = "parameters=T2M,T2M_MAX,QV2M,PRECTOTCORR,WS2M,WS2M_MAX,WS10M,WS10M_MAX,GWETTOP,GWETPROF"

# Daily point endpoint (local solar time, CSV output) including the weather parameters
base_uri = f"https://power.larc.nasa.gov/api/temporal/daily/point?time-standard=lst&header=true&format=csv&community=sb&{parameters}&"

# Call the endpoint once per wildfire
for i in df.index:
    nasa_df = build_request_param(
        i,
        df.loc[i, "InitialLatitude"],
        df.loc[i, "InitialLongitude"],
        df.loc[i, "FireDiscoveryDateTime"],
        df.loc[i, "ControlDateTime"],
        nasa_df,
        base_uri,
    )
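
The loop above grows `nasa_df` with `pd.concat` on every call, which copies all previously collected rows each time, so the total work grows quadratically with the 21,541 fires. A common alternative, sketched here with a hypothetical `fetch` callable standing in for one API call, is to collect per-fire frames in a list and concatenate once.

```python
import pandas as pd


def collect_frames(rows, fetch) -> pd.DataFrame:
    """Call `fetch(row)` for every row and concatenate the non-empty results once.

    `fetch` is a hypothetical callable returning a DataFrame (possibly empty or
    None) for a single fire, e.g. a wrapper around one POWER request.
    """
    frames = []
    for row in rows:
        frame = fetch(row)
        if frame is not None and not frame.empty:
            frames.append(frame)
    # A single concat at the end avoids re-copying the growing result on every iteration
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


def fake_fetch(pid):
    # Stand-in for an API call: fire 2 "returns no data"
    if pid == 2:
        return pd.DataFrame()
    return pd.DataFrame({"PID": [pid, pid], "T2M": [10.0, 12.5]})


print(collect_frames([1, 2, 3], fake_fetch)["PID"].tolist())  # [1, 1, 3, 3]
```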

In [30]:
# Store weather data to folder
nasa_df.to_csv(Config().get_raw_meteorology_path("nasa_weather"), index=False)
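
POWER marks values it cannot compute (or that fall outside a source's availability range) with the fill value -999, and the raw files written here keep those values as-is. When the files are loaded for analysis, a sketch like the following could turn them into NaN before computing statistics; `mask_fill_values` is a hypothetical helper and the column names are only examples.

```python
import numpy as np
import pandas as pd

POWER_FILL_VALUE = -999  # POWER's marker for missing or uncomputable values


def mask_fill_values(frame: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy of `frame` with POWER fill values replaced by NaN in `columns`."""
    cleaned = frame.copy()
    cleaned[columns] = cleaned[columns].replace(POWER_FILL_VALUE, np.nan)
    return cleaned


raw = pd.DataFrame({"T2M": [12.3, -999.0], "RH2M": [-999.0, 40.0], "PID": [1, 1]})
print(mask_fill_values(raw, ["T2M", "RH2M"]).isna().sum().to_dict())
```

Identifier columns such as `PID` are left out of `columns` so that they are never touched.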

### Extra meteorology measurements
For these extra measurements, as well as for the historical precipitation, we only used the locations of fires that lasted longer than one day and had a `DailyAcres` value greater than one acre.

In [24]:
# Keep only big fires: more than 1 acre and lasting longer than 1 day
filtered_df = df[df['DailyAcres'] > 1]
filtered_df = filtered_df[(filtered_df['ControlDateTime'] - filtered_df['FireDiscoveryDateTime']) > timedelta(days=1)]
filtered_df.shape

(2863, 20)

In [54]:
filtered_df.head(1)

| | X | Y | ContainmentDateTime | ControlDateTime | DailyAcres | DiscoveryAcres | FireCause | FireDiscoveryDateTime | IncidentTypeCategory | IncidentTypeKind | InitialLatitude | InitialLongitude | IrwinID | LocalIncidentIdentifier | POOCounty | POODispatchCenterID | POOFips | POOState | UniqueFireIdentifier | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | -115.748812 | 40.617506 | 2020-08-03 23:00:00+00:00 | 2020-09-02 15:00:00+00:00 | 5985.9 | 5.0 | Natural | 2020-07-19 23:00:00+00:00 | WF | FI | 40.602563 | -115.719777 | {91E0CBAB-A24E-4590-B6C6-2B4A46907E8A} | 10145 | Elko | NVEIC | 32007 | US-NV | 2020-NVECFX-010145 | 1 |


In [34]:
# Define result df for big fires
nasa_big_fire_df = pd.DataFrame()

# Define weather parameters for big fires
parameters = "parameters=T2MDEW,T2MWET,RH2M,CLRSKY_SFC_PAR_TOT,ALLSKY_SFC_PAR_TOT,ALLSKY_SFC_UV_INDEX"

# Attach parameters to base uri
base_uri = f"https://power.larc.nasa.gov/api/temporal/daily/point?time-standard=lst&header=true&format=csv&community=sb&{parameters}&"

# Call endpoint for each fire
for i in filtered_df.index:
    nasa_big_fire_df = build_request_param(
        filtered_df.loc[i, 'id'],
        filtered_df.loc[i, "InitialLatitude"],
        filtered_df.loc[i, "InitialLongitude"],
        filtered_df.loc[i, "FireDiscoveryDateTime"],
        filtered_df.loc[i, "ControlDateTime"],
        nasa_big_fire_df,
        base_uri,
    )

In [35]:
nasa_big_fire_df.head(1)

| | YEAR | MO | DY | T2MDEW | T2MWET | RH2M | CLRSKY_SFC_PAR_TOT | ALLSKY_SFC_PAR_TOT | ALLSKY_SFC_UV_INDEX | LAT | LONG | ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | 7 | 19 | 3.02 | 13.91 | 27.31 | 155.74 | 135.5 | 2.55 | 40.602563 | -115.719777 | 1 |


In [36]:
# Store weather for big fire
nasa_big_fire_df.to_csv(Config().get_raw_meteorology_path("nasa_weather_extra"), index=False)

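### Historical precipitation for the 6 months preceding the fire start date
The monthly endpoint takes whole years as its `start` and `end` values (hence `date_filter='%Y'` below), so each response covers every month of the calendar years that the 6-month window touches, not just the window itself.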

In [None]:
# Dataframe storing rain and snow for the 6 months before each fire
nasa_last_180_prec_df = pd.DataFrame()

# Parameters: corrected precipitation sum and snow
parameters = "parameters=PRECTOTCORR_SUM,PRECSNO"

# Attach the parameters to the monthly endpoint's base URI
base_uri = f"https://power.larc.nasa.gov/api/temporal/monthly/point?header=true&format=csv&community=sb&{parameters}&"

# Call the API for each big fire
for i in filtered_df.index:
    nasa_last_180_prec_df = build_request_param(
        filtered_df.loc[i, 'id'],
        filtered_df.loc[i, "InitialLatitude"],
        filtered_df.loc[i, "InitialLongitude"],
        filtered_df.loc[i, "FireDiscoveryDateTime"] - pd.DateOffset(months=6),  # Subtract 6 months from the fire discovery date
        filtered_df.loc[i, "FireDiscoveryDateTime"],
        nasa_last_180_prec_df,
        base_uri,
        date_filter='%Y'  # the monthly endpoint takes years (YYYY)
    )
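
Because the monthly request covers whole calendar years, the stored file still contains months outside the 6-month window. A sketch of trimming it afterwards, assuming the monthly CSV comes back in a wide layout (one row per `PARAMETER` and `YEAR`, with `JAN` ... `DEC` and `ANN` columns) plus the `PID` column added by `build_request_param`; check the header of a real response before relying on that layout. The window here is defined as the six full calendar months before the discovery month, which is one possible reading of "6 months preceding", and `trim_to_preceding_months` is a hypothetical helper.

```python
import pandas as pd

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def trim_to_preceding_months(monthly_df: pd.DataFrame, discovery: pd.Series,
                             months_back: int = 6) -> pd.DataFrame:
    """Keep only the `months_back` calendar months before each fire's discovery month.

    Assumes `monthly_df` has PID, PARAMETER, YEAR and JAN..DEC columns (wide layout)
    and `discovery` maps PID -> discovery timestamp.
    """
    # Wide -> long: one row per (fire, parameter, year, month); ANN is dropped
    long_df = monthly_df.melt(id_vars=["PID", "PARAMETER", "YEAR"], value_vars=MONTHS,
                              var_name="MONTH", value_name="VALUE")
    month_num = long_df["MONTH"].map({name: i for i, name in enumerate(MONTHS)})
    # Absolute month index (year * 12 + 0-based month) so windows can cross year boundaries
    long_df = long_df.assign(MONTH_IDX=long_df["YEAR"] * 12 + month_num)
    disc_idx = long_df["PID"].map(discovery.dt.year * 12 + discovery.dt.month - 1)
    in_window = (long_df["MONTH_IDX"] >= disc_idx - months_back) & (long_df["MONTH_IDX"] < disc_idx)
    return (long_df[in_window]
            .sort_values(["PID", "PARAMETER", "MONTH_IDX"])
            .drop(columns="MONTH_IDX")
            .reset_index(drop=True))
```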

In [82]:
# Store result into data folder
nasa_last_180_prec_df.to_csv(
    Config().get_raw_meteorology_path("nasa_weather_last_180days"), index=False
)