# Working with GOSIF data
[GOSIF](https://doi.org/10.3390/rs11050517) is a science data product generated by Dr. Jingfeng Xiao's group that estimates solar-induced chlorophyll fluorescence (SIF) with global coverage at 0.05° (~6 km/pixel) resolution on an 8-day cadence. The SIF estimates come from a data-driven approach that combines data from the Moderate Resolution Imaging Spectroradiometer (MODIS) instruments onboard the Terra and Aqua spacecraft with OCO-2 SIF measurements and MERRA-2 meteorological model data. MODIS data from Terra and Aqua are an invaluable resource for climate analysis because they provide a 25-year record of daily global imaging across 36 spectral bands. The researchers used these three data sources to train a Cubist regression tree model that predicts SIF on the MODIS 0.05° grid. Importantly, GOSIF provides these predictions from 2000 onward (data are currently available through 2023), so the record includes a 14-year period before OCO-2 even launched.

In this exercise, we will download and view GOSIF data, then check its accuracy against direct SIF soundings from OCO-2 or OCO-3. Afterwards, we will see how GOSIF can be used in analysis.
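
To put the 0.05° resolution mentioned above in concrete terms, here is a minimal sketch of the grid arithmetic. It assumes the grid covers the full globe (360° of longitude by 180° of latitude) and uses an approximate value of 111.32 km per degree at the equator. The function names are illustrative and not part of the GOSIF tooling.

```python
import math

RES_DEG = 0.05       # GOSIF grid spacing in degrees (from the text above)
KM_PER_DEG = 111.32  # approximate length of one degree of longitude at the equator

def grid_shape(res_deg=RES_DEG):
    """Return (rows, cols) of a full-globe lat/lon grid at the given spacing."""
    return round(180 / res_deg), round(360 / res_deg)

def pixel_width_km(lat_deg, res_deg=RES_DEG):
    """Approximate east-west pixel width in km at a given latitude."""
    return res_deg * KM_PER_DEG * math.cos(math.radians(lat_deg))

rows, cols = grid_shape()
print(f"Grid size: {rows} rows x {cols} columns")
print(f"Pixel width at the equator: {pixel_width_km(0.0):.2f} km")
print(f"Pixel width at 60 degrees N: {pixel_width_km(60.0):.2f} km")
```

The east-west pixel width shrinks with the cosine of latitude, so the "~6 km" figure applies near the equator only.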

In [None]:
import gzip
from http.server import HTTPServer, SimpleHTTPRequestHandler
from IPython.display import IFrame
import json
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import os
import numpy as np
import rasterio
from rasterio.plot import show
import shutil
import socket
import sys
import threading

# Add src directory containing helper code to sys.path
sys.path.append(os.path.abspath("../src"))

from geosif import download_gosif_granule

## I. Downloading GOSIF granules from UNH
First, we will download a GOSIF granule from the University of New Hampshire (UNH) data store maintained by Dr. Xiao's research group. GOSIF products are created at annual, monthly, and 8-day time steps. If you give `download_gosif_granule` (called in the cell below) only a year (i.e., no month or day), it will download the annual product for that year, if available. Similarly, a year and a month will download the monthly product, and a year, month, and day together will download the closest 8-day product.
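
The selection rule described above can be sketched as follows. This is only an illustration of the logic, not the helper's actual implementation: both function names are hypothetical, and the 8-day snapping assumes that periods start on day of year 1, 9, 17, and so on, restarting each year as MODIS 8-day composites do.

```python
from datetime import date

def gosif_product_type(year, month=None, day=None):
    """Classify a request as 'annual', 'monthly', or '8-day' (illustrative only)."""
    if month is None:
        return "annual"
    if day is None:
        return "monthly"
    return "8-day"

def nearest_8day_start(year, month, day):
    """Snap a date to the nearest assumed 8-day period start (day of year 1, 9, 17, ...)."""
    doy = date(year, month, day).timetuple().tm_yday
    starts = range(1, 367, 8)  # 1, 9, ..., 361
    # On a tie, min() keeps the earlier period
    return min(starts, key=lambda s: abs(s - doy))

print(gosif_product_type(2020))         # year only
print(gosif_product_type(2020, 6))      # year and month
print(gosif_product_type(2020, 6, 15))  # year, month, and day
print(nearest_8day_start(2020, 6, 15))  # day of year of the nearest period start
```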

In [None]:
# Download June 2020 Monthly Average GOSIF data
year = 2020
month = 6

output_dir = "data/"
# The file is a .gz (gzip) archive, so it will need to be extracted before we can use it
gosif_gz = download_gosif_granule(year, month, output_dir=output_dir)

# Strip .gz file extension from the downloaded file to get the output (extracted) filename
gosif_geotiff = os.path.splitext(gosif_gz)[0]
with gzip.open(gosif_gz, 'rb') as f_in:
    with open(gosif_geotiff, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
print(f"Unpacked geotiff file: {gosif_geotiff}")

## II. Transforming a GOSIF granule into a format suitable for viewing
UNH provides GOSIF in GeoTIFF format, a common file format for geospatial data. Although GIS software such as QGIS can display colormapped GeoTIFF files, the GOSIF GeoTIFF is encoded as greyscale by default, with no transparency layer for regions with no data, such as oceans and waterways. We will therefore convert the granule you downloaded in the previous step into a PNG with a colormap "baked in", meaning the SIF grid points will be quantized to 8-bit. This PNG will be much easier to display in the map viewer in the next step.

In [None]:
def convert_geotiff_to_png(geotiff_path, output_png_path, output_metadata_path, threshold=32760):
    with rasterio.open(geotiff_path) as src:
        data = src.read(1)  # Read the first band
        
        # Get metadata for georeferencing
        metadata = {
            "bounds": src.bounds._asdict(),
            "width": src.width,
            "height": src.height,
            "crs": src.crs.to_string()
        }
        
        # Values above the threshold are treated as no-data and masked out
        mask = data > threshold
        valid_data = np.ma.masked_array(data, mask)
        # Masked-array min/max ignore the masked pixels
        vmin = valid_data.min()
        vmax = valid_data.max()
        norm_data = colors.Normalize(vmin=vmin, vmax=vmax)

        dpi = 360
        width_inches = src.width / dpi
        height_inches = src.height / dpi

        # Copy the colormap so that masked values (> threshold) can be drawn as transparent
        cmap = plt.cm.viridis.copy()
        cmap.set_bad(alpha=0)  # Set masked values to be transparent
        
        fig = plt.figure(figsize=(width_inches, height_inches), dpi=dpi)
        ax = plt.Axes(fig, [0, 0, 1, 1])  # No margins
        ax.set_axis_off()
        fig.add_axes(ax)
        ax.imshow(valid_data, cmap=cmap, norm=norm_data, interpolation="nearest", aspect="auto")
        plt.savefig(
            output_png_path, 
            dpi=dpi,
            bbox_inches="tight", 
            pad_inches=0, 
            transparent=True
        )
        plt.close(fig)
        
        # Save the metadata as JSON
        with open(output_metadata_path, "w") as f:
            json.dump(metadata, f)
        
        print(f"Converted {geotiff_path} to {output_png_path} with metadata at {output_metadata_path}")

# Example usage
convert_geotiff_to_png("data/GOSIF_2020.M06.tif", "data/GOSIF_2020.M06.png", "data/GOSIF_2020.M06_metadata.json")
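
The metadata JSON written above is what lets a viewer place the PNG on a map. As a minimal sketch of how it can be used, the hypothetical helper below converts a pixel position into the longitude and latitude of the pixel centre. It assumes a north-up image in a geographic CRS, and the example values stand in for a full-globe 0.05° grid. The real values come from the JSON file.

```python
def pixel_to_lonlat(row, col, metadata):
    """Return the (lon, lat) of a pixel centre for a north-up image."""
    b = metadata["bounds"]
    x_res = (b["right"] - b["left"]) / metadata["width"]
    y_res = (b["top"] - b["bottom"]) / metadata["height"]
    lon = b["left"] + (col + 0.5) * x_res
    lat = b["top"] - (row + 0.5) * y_res  # row 0 is the northernmost row
    return lon, lat

# Hypothetical metadata for a global 0.05 degree grid (assumed values)
example_metadata = {
    "bounds": {"left": -180.0, "bottom": -90.0, "right": 180.0, "top": 90.0},
    "width": 7200,
    "height": 3600,
    "crs": "EPSG:4326",
}

print(pixel_to_lonlat(0, 0, example_metadata))        # upper-left pixel centre
print(pixel_to_lonlat(3599, 7199, example_metadata))  # lower-right pixel centre
```

The keys `left`, `bottom`, `right`, and `top` match what `src.bounds._asdict()` stores in the metadata dictionary.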

Now we will open the converted image in the image-viewing web app. If you would like to view this visualization in a separate tab, open the link that is printed when you run this cell.

In [None]:
def is_port_in_use(port):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(('', port))
            return False
        except socket.error:
            return True


def run_server(port):
    if is_port_in_use(port):
        return None
    server_address = ('', port)
    httpd = HTTPServer(server_address, SimpleHTTPRequestHandler)
    thread = threading.Thread(target=httpd.serve_forever)
    thread.daemon = True
    thread.start()
    return httpd

# Start the server
port = 5500
run_server(port)

# gosif.html is served by the local HTTP server and displayed in an iframe. You can
# also load the page separately in your browser to view it at full window size.
url = f"http://localhost:{port}/gosif.html"
print(f"You can also view this map by copying this address into a new tab: {url}")
IFrame(src=url, width=1200, height=800)
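
Note that `run_server` returns `None` without starting a server when the port is already taken, so the iframe would then show whatever else is listening on that port. One possible workaround, sketched here with a hypothetical helper that is not part of the notebook's code, is to probe a small range of ports with the same bind test used in `is_port_in_use`:

```python
import socket

def find_free_port(start=5500, attempts=20):
    """Return the first port in [start, start + attempts) that can be bound, or None."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("", port))
                return port  # Bind succeeded, so the port is currently free
            except OSError:
                continue     # Port is taken (or not permitted); try the next one
    return None

port = find_free_port()
print(f"Free port found: {port}")
```

The result could then be passed to `run_server(port)` and used in the URL. Another process may still claim the port between the check and the server start, so this reduces the chance of a clash but does not remove it.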