# Land Surface Temperature Calculation
This script aggregates and averages land surface temperature for each county in the state of Montana and tracks it year by year.

The data used for this script can be obtained through NASA EarthData using the [Giovanni](https://disc-beta.gsfc.nasa.gov/giovanni/) tool.

Create a NASA EarthData account if you don't already have one, then log into Giovanni with those credentials. Dismiss any prompt that pops up.

Ensure "Select Plot" is set to `Time Averaged Map` and set the date range from `2000-01-01` to `2020-12-31`. 

In "Select Region (Bounding Box or Shape)", click on the map icon, then the "Select a Shape" dropdown menu. Type `Montana` in the text box and select the result. Click the white "X" at the top-right of the window to close.

In the "Keyword" field, type `Surface temperature` and click the "Search" button. The top result should be "Temperature (average surface skin) (NLDAS_NOAH0125_M v2.0)". Check the columns to its right to confirm that the result has units "K", temporal resolution "Monthly", and spatial resolution "0.125". Once you have found the right result, tick the checkbox in the leftmost column, just before the "Temperature (average surface skin)" name.

At the bottom-right corner of the web browser, click the "Plot Data" button. Giovanni will load. This might take some time if other users are generating data at the same time as you. When it is your turn, the progress bar will slowly fill and text will flash showing the current task Giovanni is processing.

In [None]:
# Import necessary libraries
import os
import rasterio
import rasterio.features  # explicit import so rasterio.features.geometry_mask is available
import numpy as np
import geopandas as gpd
import pandas as pd

In [None]:
# Folder containing the GeoTIFF files -- it MUST be inside the "UMD-DataScience" folder
input_data = 'landsurfacetemp'

# Friendly name for the value we are processing - used for naming table
key_metric = 'AvgSurfT' 

# The boundaries of Montana and each county
county_shp = 'montana_shp/cb_2014_us_county_20m_Montana.shp'

# Make generic name to allow script to be used more generally
folder_path = input_data
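
The processing loop further down takes the year from a fixed position in each Giovanni file name. As a rough alternative, the sketch below searches for the first eight-digit date stamp instead, so it does not depend on how many dot-separated fields come before it. The helper name `year_suffix_from_name` and the sample file name are hypothetical, made up for illustration; real Giovanni downloads may be named differently.

```python
import re

# Hypothetical helper (not part of the original script): find the first
# YYYYMMDD stamp in a file name and return the two-digit year as a string.
_DATE_STAMP = re.compile(r'(?<!\d)(\d{4})(\d{2})(\d{2})(?!\d)')

def year_suffix_from_name(file_name):
    match = _DATE_STAMP.search(file_name)
    if match is None:
        raise ValueError(f'no YYYYMMDD date stamp found in {file_name!r}')
    return match.group(1)[2:]  # e.g. '2000' -> '00'

# Made-up file name, used only to exercise the helper
sample = 'g4.timeAvgMap.EXAMPLE_LAYER.20000101-20001231.example_bbox.tif'
print(year_suffix_from_name(sample))
```

Raising an error on a file name without a date stamp makes a misnamed file visible right away, rather than letting it produce a wrongly labelled column.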

In [None]:
# Initialize an empty dictionary to store the results for each year
results = {}

# Load the shapefile
counties = gpd.read_file(county_shp)

# Loop through the files in the folder and calculate the average land surface temperature for each county for each year
for file_name in os.listdir(folder_path):
    if file_name.endswith('.tif'):
        file_path = os.path.join(folder_path, file_name)
        # Complicated Giovanni naming scheme - getting year from filename
        year = file_name.split('.')[3][-8:] # This leaves, for example, '20000101'
        year = year[2:4] # Cuts rest out so '00' remains

        # Open the GeoTIFF file
        with rasterio.open(file_path) as src:
            # Read the data as a numpy array
            lst_array = src.read(1)
            # Replace NaN values with an absurd value to distinguish
            lst_array = np.nan_to_num(lst_array, nan=-9999)
            # Get the affine transform and the CRS
            transform = src.transform
            crs = src.crs

            # Reproject the shapefile to match the GeoTIFF CRS
            counties_proj = counties.to_crs(crs)

            # Extract the mean land surface temperature by county
            lst_by_county = []
            for i, county in counties_proj.iterrows():
                # Create a mask for the county
                county_mask = rasterio.features.geometry_mask([county.geometry],
                                                               out_shape=lst_array.shape,
                                                               transform=transform,
                                                               invert=True)
                # Extract the land surface temperature values for the county
                lst = lst_array[county_mask]
                # Remove the invalid values (-9999 in this case)
                lst = lst[lst != -9999]
                # Calculate the mean land surface temperature, rounded to 2 decimal places
                # (NaN if no valid raster cell falls inside the county)
                mean_lst = round(float(np.mean(lst)), 2) if lst.size else np.nan
                # Append the results to a list
                lst_by_county.append({'NAME': county['NAME'],
                                      'GEOID': county['GEOID'],
                                      'GEOMETRY': county['geometry'],
                                      key_metric + f'_{year}': mean_lst})

        # Convert the list to a pandas dataframe
        df = pd.DataFrame(lst_by_county)

        # Append the dataframe to the results dictionary
        if year in results:
            # Merge the dataframe with the existing dataframe for this year
            df = df.rename(columns={col: f"{col}_{year}" for col in df.columns if col not in ["NAME", "GEOID", "GEOMETRY"]})
            results[year] = pd.merge(results[year], df.drop(["NAME", "GEOID", "GEOMETRY"], axis=1), on="GEOID")
        else:
            # Add the dataframe as a new entry in the dictionary
            results[year] = df

# Concatenate the dataframes for all years into a single dataframe
df_all_years = pd.concat([results[year] for year in results], axis=1)

# Drop duplicate columns
df_all_years = df_all_years.loc[:,~df_all_years.columns.duplicated()]

# Re-order years to be in increasing order
df_all_years = df_all_years.reindex(sorted(df_all_years.columns), axis=1)
mean_lst_cols = sorted([col for col in df_all_years.columns if col.startswith(key_metric)])
df_all_years = df_all_years[["NAME", "GEOID", "GEOMETRY"] + mean_lst_cols]
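
To make the masking step in the loop above easier to follow, here is a toy version that uses only NumPy. The 3x3 grid, the hand-written boolean mask, and the helper name `masked_mean` are invented for illustration. In the script itself the mask comes from `rasterio.features.geometry_mask` with `invert=True`, so `True` marks cells inside the county.

```python
import numpy as np

NODATA = -9999  # same sentinel the script substitutes for NaN

def masked_mean(values, inside_mask, nodata=NODATA):
    """Mean of the cells where inside_mask is True, ignoring nodata cells.

    Returns NaN when no valid cell falls inside the mask.
    """
    selected = values[inside_mask]
    selected = selected[selected != nodata]
    if selected.size == 0:
        return float('nan')
    return round(float(selected.mean()), 2)

# Toy raster in kelvin; one cell is missing
grid = np.array([[280.0, 281.0, 282.0],
                 [283.0, np.nan, 285.0],
                 [286.0, 287.0, 288.0]])
grid = np.nan_to_num(grid, nan=NODATA)

# Pretend the "county" covers the left two columns
inside = np.array([[True, True, False],
                   [True, True, False],
                   [True, True, False]])

print(masked_mean(grid, inside))
```

The missing cell sits inside the pretend county, so it is selected by the mask and then dropped by the sentinel filter before the mean is taken.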

In [None]:
# Display output of processing as a sanity check
df_all_years
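
The selected Giovanni layer reports temperature in kelvin, so converting a few values to degrees Celsius can make the sanity check easier to read. The sketch below is one possible way to do that. The helper names and the plausibility bounds are hypothetical placeholders chosen for illustration, not values taken from the dataset.

```python
def kelvin_to_celsius(kelvin):
    """Convert a temperature from kelvin to degrees Celsius."""
    return kelvin - 273.15

def looks_plausible(kelvin, low_c=-60.0, high_c=60.0):
    """Rough check that a kelvin value maps to a believable surface temperature.

    The default bounds are arbitrary placeholders chosen for illustration.
    """
    return low_c <= kelvin_to_celsius(kelvin) <= high_c

# Made-up county means in kelvin; -9999 stands for a leaked nodata value
for value in (279.35, 283.4, -9999):
    print(value, round(kelvin_to_celsius(value), 2), looks_plausible(value))
```

A value that fails the check usually means the nodata sentinel slipped into an average, which is worth tracing back before saving the table.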

In [None]:
# Once happy with the above results, save the data to a file for compilation
df_all_years.to_csv(input_data + '.csv', index=True)