# Processing VIIRS Data

The VIIRS data comes from the Fire Information for Resource Management System (FIRMS), a NASA platform that provides near-real-time active fire locations worldwide. The data is downloaded as a CSV in which each row gives the spatiotemporal location of a single fire detection. In other words, it is point-process data.

To use this information, we have to convert it into pixelized raster data. To do so, I modified an existing script and ran the data through the pipeline. The script relies on DBSCAN, a density-based clustering algorithm that groups points by their proximity in either geographic or feature space. The advantage of DBSCAN is that it does not need a pre-specified number of clusters (unlike K-means, for example).
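
As a rough illustration of why DBSCAN needs no cluster count, here is a minimal brute-force sketch of the textbook algorithm in plain Python. It is not the implementation behind `cluster_fires`; the function name `dbscan`, the parameters `eps` and `min_points`, and the toy coordinates are all assumptions made for this example.

```python
import math


def dbscan(points, eps, min_points):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force radius query; the result includes point i itself
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Point i is a core point: start a new cluster and grow it
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)  # j is also a core point
        cluster_id += 1
    return labels


# Made-up coordinates: two tight groups of four points and one isolated point
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_points=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters (two here) falls out of the density parameters, and the isolated point is labeled as noise rather than forced into a cluster. For real detections the distance would have to be computed in a projected CRS or with a great-circle metric, since Euclidean distance on raw longitude/latitude is distorted.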

The general outline of this pipeline is as follows:

- Connect to FIRMS API using a unique identifier given by NASA
- Download all data from 2020
- Run the rasterization algorithm on each date, first removing all daytime observations per Matt Jolly's suggestion
- Save both the boolean labeled mask and continuous Fire Radiative Power metric to disk
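
The last two steps of the outline can be sketched in plain Python as follows. This is only a conceptual stand-in: the real pipeline rasterizes with `make_geocube`, whereas the hypothetical `rasterize_detections` below simply bins points into a grid. The detections are made up, their coordinates are assumed to be in projected metres already, and summing FRP for detections that share a pixel is an assumption of this sketch, not necessarily what the pipeline does.

```python
def rasterize_detections(detections, left, top, pixel_size, n_rows, n_cols):
    """Bin point detections into a boolean mask and a summed-FRP grid.

    (left, top) is the upper-left corner of the grid, so row 0 is the
    northernmost row, as in a north-up raster.
    """
    mask = [[0] * n_cols for _ in range(n_rows)]
    frp = [[0.0] * n_cols for _ in range(n_rows)]

    for det in detections:
        if det["daynight"] != "N":
            continue  # drop daytime observations
        col = int((det["x"] - left) // pixel_size)
        row = int((top - det["y"]) // pixel_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            mask[row][col] = 1
            frp[row][col] += det["frp"]

    return mask, frp


# Made-up detections on a 4 x 4 grid of 500 m pixels
detections = [
    {"x": 250.0, "y": 1750.0, "frp": 4.0, "daynight": "N"},
    {"x": 300.0, "y": 1800.0, "frp": 1.5, "daynight": "N"},   # same pixel as above
    {"x": 1250.0, "y": 750.0, "frp": 2.0, "daynight": "N"},
    {"x": 1750.0, "y": 250.0, "frp": 9.0, "daynight": "D"},   # daytime, dropped
]
mask, frp = rasterize_detections(detections, left=0.0, top=2000.0,
                                 pixel_size=500.0, n_rows=4, n_cols=4)

for mask_row in mask:
    print(mask_row)
```

The 500 m pixel size mirrors the `resolution=(-500, 500)` argument used further down in the notebook; the grid dimensions are chosen only to keep the example small.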

In [1]:
import pandas as pd
import urllib.request as request
from datetime import datetime
import geopandas as gpd
import glob
from rioxarray.merge import merge_arrays
import numpy as np



from src.data_sources import cluster_fires, create_chip_bounds, fires_from_topleft



In [2]:
# Write loop to pull all data from 2021

In [2]:
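import os

bbox = "-124.848974,24.396308,-66.885444,49.384358"  # CONUS: west,south,east,north
# Read the key from the environment rather than hard-coding a credential in the notebook
FIRMS_API_KEY = os.environ['FIRMS_API_KEY']
dates_2019 = pd.date_range(start='2019-01-01', end='2019-12-31').strftime('%Y-%m-%d').tolist()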

In [2]:
def get_viirs_data(date_list, api_key, bbox, path):
    '''
    Connect with FIRMS API to access VIIRS detection data from specific dates and areas
    
    :param date_list: list, each date encoded in '%Y-%m-%d'
    :param api_key: str, from NASA email
    :param bbox: str, bbox of the region of interest
    :param path: str, path of export location
    '''
    
    base_url = 'https://firms.modaps.eosdis.nasa.gov/api/area/csv/'

    for date in date_list:
        url = f'{api_key}/VIIRS_SNPP_SP/{bbox}/1/{date}'
        
        fname = path + 'VIIRS_' + date + '.csv'
        
        request.urlretrieve(base_url + url, filename = fname)
    
# request.urlretrieve(f'https://firms.modaps.eosdis.nasa.gov/api/area/csv/{FIRMS_API_KEY}/VIIRS_SNPP_SP/{bbox}/1/2020-07-10', filename = 'temp/__VIIRS_2020-07-10.csv')

In [6]:
years = ['2010','2011','2012']
for y in years:
    export_path = y + '/'
    start_date, end_date = y + '-01-01', y + '-12-31'
    
    date_list = pd.date_range(start = start_date, end = end_date).strftime('%Y-%m-%d').tolist()
    
    get_viirs_data(date_list, FIRMS_API_KEY, bbox, export_path)
    
    print(y + ' Done.')

KeyboardInterrupt: 

In [6]:
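def create_date_time(df):
    # acq_time is HHMM; pandas may read it as an integer and drop leading zeros
    # (e.g. 34 for 00:34), so pad back to four digits before parsing
    acq_time = df['acq_time'].astype(str).str.zfill(4)
    hour = pd.to_datetime(acq_time, format = '%H%M').dt.hour.astype(str)

    # Combined "date_hour" key, e.g. '2017-08-05_8'
    df['acq_date_hour'] = df['acq_date'] + '_' + hour
    return df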

def process_viirs(csv_list, export_path):
    string = '='

    # Print progress roughly every 1% of files (every file for short lists);
    # max(1, ...) avoids a modulo by zero when there are fewer than 100 files
    mod = max(1, len(csv_list) // 100)

    # enumerate supplies the counter that the progress check below relies on
    for i, each_csv in enumerate(csv_list):

        data = pd.read_csv(each_csv)

        if data.empty:
            continue

        # Filter by night time (currently disabled)
#         data = data[data.daynight == 'N']

        # Create "clusterable" date, hour column
        data = create_date_time(data)

        gdf = gpd.GeoDataFrame(
                data, geometry=gpd.points_from_xy(data.longitude, data.latitude), crs="EPSG:4326")

        clustered_fires = cluster_fires(gdf, min_cluster_points=25, timescale = 'hour')

        # cluster_fires returns something other than a DataFrame when nothing clusters
        if not isinstance(clustered_fires, pd.DataFrame):
            continue

        chip_bounds = create_chip_bounds(clustered_fires)

        fires_from_topleft(chip_bounds,  gdf,
                           fname_suffix = '_Num', path = str(export_path + 'bool/'),
                           export = True, variable = 'bool', return_merged = False)

        # Also export frp
        fires_from_topleft(chip_bounds, gdf,
                           fname_suffix = '_Num', path = str(export_path + 'frp/'),
                           export = True, variable = 'frp', return_merged = False)

        if i % mod == 0:
            string += "="
            print(string)

            if len(string) >= 115:
                string = ''


In [3]:
import os

os.chdir('../Data/Training/VIIRS')

!ls

Rasterized  Raw


The new algorithm does not produce variable-size rasters:
- Multiple fires still occur within a single raster image
- It is unclear what Chip Bounds is doing
- THINK

In [8]:
??fires_from_topleft

In [10]:
years = ['2021']
    
    #'2017','2018','2019', '2020', 
for y in years:
    csv_list = glob.glob('Raw/' + y + '/*.csv')
    
    export_path = 'Rasterized/Individual_Fires/variable_size/' + y + '/'
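    # makedirs creates missing parent directories and tolerates existing ones;
    # process_viirs writes into the bool/ and frp/ subdirectories
    for sub in ('bool', 'frp'):
        os.makedirs(export_path + sub, exist_ok=True)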
        
    process_viirs(csv_list, export_path = export_path)
    
    print(y + ' Done.')

==
===
====
=====
=
==
===
====
=====


=
==
===
====
=====


=
==
===
====
=====
2021 Done.


Examining the export directory suggests that the pipeline is not exporting the correct data. My hunch is that the output list covers more than one date/hour observation event. For example, the print output shows that 2017-08-05 has 5 observations at 8 AM (which is another bug) and 5 observations at 10 AM.

Need to:
- Examine a sample VIIRS CSV to see when the observations were made; shouldn't these be nighttime observations only?
- Perhaps modify the original Satellite VU fires_from_topleft function to export each individual fire as a separate TIF
- Run through my modified fires_from_topleft algorithm, step by step, to see what I'm actually doing
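
For the first item in the list above, a quick check could look like the sketch below, which counts detections per date, hour and day/night flag. The rows are made up for illustration and `count_by_date_hour` is a hypothetical helper; only the column names (`acq_date`, `acq_time`, `daynight`) follow the FIRMS CSVs used in this notebook. One thing worth keeping in mind while reading the result: FIRMS documents `acq_time` in UTC, so an hour of 8 or 10 may well correspond to a nighttime overpass over CONUS rather than a morning one.

```python
import csv
import io
from collections import Counter

# Made-up rows in the shape of a FIRMS VIIRS CSV (not real detections)
sample_csv = """latitude,longitude,acq_date,acq_time,frp,daynight
40.10,-120.50,2017-08-05,0834,3.2,N
40.11,-120.51,2017-08-05,0836,4.1,N
38.70,-119.90,2017-08-05,1012,2.7,N
38.71,-119.91,2017-08-05,2048,8.9,D
"""


def count_by_date_hour(rows):
    """Count detections per (acq_date, hour, daynight) group."""
    counts = Counter()
    for row in rows:
        # acq_time is HHMM; pad in case leading zeros were dropped
        hour = int(row["acq_time"].zfill(4)[:2])
        counts[(row["acq_date"], hour, row["daynight"])] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(sample_csv)))
counts = count_by_date_hour(rows)

for (date, hour, daynight), n in sorted(counts.items()):
    print(f"{date} {hour:02d}h UTC [{daynight}]: {n} detections")
```

If groups flagged `D` show up in a file that was supposed to be night-only, the night filter in `process_viirs` is the first place to look, since it is currently commented out.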


Checking the number of resulting training images:

In [28]:
! ls ../Rasterized/COG

2012 2013 2014 2015 2016 2017 2018 2019 2020 2021


In [38]:
x = 0
for year in range(2012,2022):
    
    length =  len(glob.glob('../Rasterized/COG/' + str(year) + '/bool/*.tif'))
    print(str(year) + ': ' + str(length))
    x += length
    
    
x

2012: 304
2013: 184
2014: 194
2015: 161
2016: 167
2017: 307
2018: 202
2019: 68
2020: 291
2021: 251


2129

In [36]:
sum(lengths)

1878

In [None]:
len(glob.glob('../Data/Training/VIIRS/Rasterized/2021/bool/*.nc'))

A total of about 500 images, which might not be enough to train the U-Net.

# Scratch

Modifying `fires_from_topleft` to export a separate TIF for each fire, rather than all fires in CONUS within a given day/hour time frame.

In [37]:
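# Modified - it will export each individual fire
# Imports this cell needs beyond the first cell. bounds_to_geojson and buffer_point
# are project helpers that are not defined in this notebook and must be imported
# from wherever they live. ASSUMPTION: shapely_tf is shapely.ops.transform.
import json
import pyproj
import rasterio.coords
from geocube.api.core import make_geocube
from shapely.geometry import mapping, shape
from shapely.ops import transform
from shapely.ops import transform as shapely_tf
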
def fires_from_topleft(chip_bounds, fires, fname_suffix = '', 
                       export = False, path = '', variable = 'bool',
                       return_merged = True):
    """
    Given input chip DataFrame, link with raw VIIRS input to receive FRP,
            rasterizing the points
    
    :param chip_bounds: pandas DF containing each chip bound
    :param fires: gpd.GeoDataFrame or filename of original VIIRS data
    :param fname_suffix: str, indicates the suffix for the filename
    :param export: Bool, indicates whether to export or to return the list
    :param path: str, indicates path of export location
    :param variable: Str, either "bool" or "frp", indicates whether to export
            the boolean mask or a continuous measurement of fire radiative power
    :param return_merged: Bool, indicates whether to export all fires in CONUS within a given
            day/hour time frame, or to export each individual fire as a separate TIF
    :return: list of xarray.Datasets which each contain a raster that indicates multiple 
            fires for each date
    """
    
    # Extract each date where there was an observation
    list_of_dates = chip_bounds.date.unique()
    
    # Define output list
    output_list = []
    
    for each_date in list_of_dates:
        # Reset output every day to get a new raster each date
        output_raster_img = None
        
        samples_in_that_day = chip_bounds[chip_bounds['date'] == each_date]
        
        # Grab bounding box for each sample
        for _, sample in samples_in_that_day.iterrows():
            
            top_left = [sample['top'],sample['left']]
            epsg_code = str(sample['epsg'])
            date_to_query = sample['date']
                                        
                                       
    
            aoi = bounds_to_geojson(
                rasterio.coords.BoundingBox(
                    left=top_left[1],
                    right=top_left[1] + 32000,
                    bottom=top_left[0] - 32000,
                    top=top_left[0],
                )
            )
        
            # reproj the bbox from utm to 4326
            utm_to_wgs84_transformer = pyproj.Transformer.from_crs(
                epsg_code, 4326, always_xy=True
            ).transform
            aoi_wgs84 = shapely_tf(utm_to_wgs84_transformer, shape(aoi))

            # load fire data intersecting chip bbox
            # --> Loads a gpd file for each sample, super inefficient, could be improved

            if isinstance(fires, str):
                fires_in_chip = gpd.read_file(fires, layer="merge", bbox=aoi_wgs84)
            else:
                chip_poly = gpd.GeoDataFrame(geometry=[aoi_wgs84], crs="EPSG:4326")
                fires_in_chip = fires[fires["acq_date"] == date_to_query].clip(chip_poly)

            fires_in_chip = fires_in_chip[fires_in_chip["acq_date"] == date_to_query]

            if fires_in_chip.empty:
                # possible if fire dies "next day"
                fires_in_chip = gpd.GeoDataFrame(geometry=[aoi_wgs84.centroid], crs="EPSG:4326")
                fires_in_chip["bool"] = 0
                fires_in_chip["frp"] = 0
            else:
                fires_in_chip["bool"] = 1
                fires_in_chip["frp"] = pd.to_numeric(fires_in_chip["frp"])

            bbox_4326, utm_crs = buffer_point(
                aoi_wgs84.centroid, buffer_m=15750, output_4326=True
            )
            bbox_4326_geojson = json.dumps(mapping(transform(lambda x, y: (y, x), bbox_4326)))

            # Make sure fires_in_chip is in WGS84 (EPSG:4326)
            fires_in_chip = fires_in_chip.to_crs('EPSG:4326')
            

            # If output_raster_img is empty (i.e., new date)
            if output_raster_img is None:
                
                # rasterize
                output_raster_img = make_geocube(
                    vector_data=fires_in_chip,
                    measurements=["bool", "frp"],
                    resolution=(-500, 500),
                    output_crs=str(sample['epsg']),
                    fill=0,
                    geom=bbox_4326_geojson,
                )
                
                # Reproject to 4326
                output_raster_img = output_raster_img.rio.reproject("EPSG:4326")
                
                
            # Embed raster with date info
            output_raster_img.attrs = {'date': each_date}

        



        if export:

            # Select the requested band once
            if variable == "bool":
                output = output_raster_img.bool
            else:
                output = output_raster_img.frp

            if return_merged:
                output_list.append(output)

                merged_arrays = merge_arrays(output_list, crs = "EPSG:4326", nodata = np.nan)
                merged_arrays.attrs = {}
                merged_arrays.to_netcdf(path + "VIIRS_Rasterized" + str(each_date) + fname_suffix + '.nc')

            else:
                # NOTE: only one raster is built per date (see the `is None` check above),
                # so num_fires is always 1 here
                num_fires = 1

                # Build the name in a local variable; reassigning fname_suffix would
                # make the 'number1' prefix accumulate from one date to the next
                out_suffix = 'number' + str(num_fires) + fname_suffix
                output.attrs = {}
                output.to_netcdf(path + "VIIRS_Rasterized" + str(each_date) + out_suffix + '.nc')
                
        

            
    return output_list

In [6]:
# Original, it merges every fire from a day/hour pair into one big raster
def fires_from_topleft(chip_bounds, fires, fname_suffix = '', export = False, path = '', variable = 'bool'):
    """
    Given input chip DataFrame, link with raw VIIRS input to receive FRP,
            rasterizing the points
    
    :param chip_bounds: pandas DF containing each chip bound
    :param fires: gpd.GeoDataFrame or filename of original VIIRS data
    :param fname_suffix: str, indicates the suffix for the filename
    :param export: Bool, indicates whether to export or to return the list
    :param path: str, indicates path of export location
    :param variable: Str, either "bool" or "frp", indicates whether to export
            the boolean mask or a continuous measurement of fire radiative power
    :return: list of xarray.Datasets which each contain a raster that indicates multiple 
            fires for each date
    """
    
    # Extract each date where there was an observation
    list_of_dates = chip_bounds.date.unique()
    
    # Define output list
    output_list = []
    
    for each_date in list_of_dates:
        # Reset output every day to get a new raster each date
        output_raster_img = None
        
        samples_in_that_day = chip_bounds[chip_bounds['date'] == each_date]
        
        # Grab bounding box for each sample
        for _, sample in samples_in_that_day.iterrows():
            
            top_left = [sample['top'],sample['left']]
            epsg_code = str(sample['epsg'])
            date_to_query = sample['date']
                                        
                                       
    
            aoi = bounds_to_geojson(
                rasterio.coords.BoundingBox(
                    left=top_left[1],
                    right=top_left[1] + 32000,
                    bottom=top_left[0] - 32000,
                    top=top_left[0],
                )
            )
        
            # reproj the bbox from utm to 4326
            utm_to_wgs84_transformer = pyproj.Transformer.from_crs(
                epsg_code, 4326, always_xy=True
            ).transform
            aoi_wgs84 = shapely_tf(utm_to_wgs84_transformer, shape(aoi))

            # load fire data intersecting chip bbox
            # --> Loads a gpd file for each sample, super inefficient, could be improved

            if isinstance(fires, str):
                fires_in_chip = gpd.read_file(fires, layer="merge", bbox=aoi_wgs84)
            else:
                chip_poly = gpd.GeoDataFrame(geometry=[aoi_wgs84], crs="EPSG:4326")
                fires_in_chip = fires[fires["acq_date"] == date_to_query].clip(chip_poly)

            fires_in_chip = fires_in_chip[fires_in_chip["acq_date"] == date_to_query]

            if fires_in_chip.empty:
                # possible if fire dies "next day"
                fires_in_chip = gpd.GeoDataFrame(geometry=[aoi_wgs84.centroid], crs="EPSG:4326")
                fires_in_chip["bool"] = 0
                fires_in_chip["frp"] = 0
            else:
                fires_in_chip["bool"] = 1
                fires_in_chip["frp"] = pd.to_numeric(fires_in_chip["frp"])

            bbox_4326, utm_crs = buffer_point(
                aoi_wgs84.centroid, buffer_m=15750, output_4326=True
            )
            bbox_4326_geojson = json.dumps(mapping(transform(lambda x, y: (y, x), bbox_4326)))

            # Make sure fires_in_chip is in WGS84 (EPSG:4326)
            fires_in_chip = fires_in_chip.to_crs('EPSG:4326')
            

            # If output_raster_img is empty (i.e., new date)
            if output_raster_img is None:
                
                # rasterize
                output_raster_img = make_geocube(
                    vector_data=fires_in_chip,
                    measurements=["bool", "frp"],
                    resolution=(-500, 500),
                    output_crs=str(sample['epsg']),
                    fill=0,
                    geom=bbox_4326_geojson,
                )
                
                # Reproject to 4326
                output_raster_img = output_raster_img.rio.reproject("EPSG:4326")
                
                
            # Embed raster with date info
            output_raster_img.attrs = {'date': each_date}

        
        if variable == "bool":
            output_list.append(output_raster_img.bool)
        
        else: 
            output_list.append(output_raster_img.frp)
      
    
        if export == True:
            
            merged_arrays = merge_arrays(output_list, crs = "EPSG:4326", nodata = np.nan)
            merged_arrays.attrs = {}
            merged_arrays.to_netcdf(path + "VIIRS_Rasterized" + str(each_date) + fname_suffix + '.nc')
            
            
    return output_list

In [None]:
# Checks for the effect of cluster size on the output training label
## Might not work after modifying fires_from_topleft, but would be easy to change.

# for num_points in [5,10,15,20,25]:
#     clustered_fires = cluster_fires(gdf, min_cluster_points=num_points)
#     chip_bounds = create_chip_bounds(clustered_fires)
#     clustered_fires.to_file('temp/clustered_fires_' + str(num_points) + '_points.shp')
    
#     chip_bounds = chip_bounds.reset_index()
    
#     for _, sample in chip_bounds.iterrows():
#         fires = fires_from_topleft([sample['top'], 
#                             sample['left']], 
#                            str(sample['epsg']), 
#                            sample['date'],
#                            gdf)
        
#         fires.to_netcdf("temp/fires_saved_" + str(num_points) + '_points.nc')
        
#     print("_______NEXT_______")