# Preprocess Data

Before running any analysis, let's preprocess the data and fit it to the contiguous United States (CONUS) using a [shapefile from the US Census](https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html) (2024 > 1 : 500,000 > cb_2024_us_state_500k > cb_2024_us_state_500k.shp). We use the state-level file because the code filters on its `STUSPS` column. We can then process this as follows:

In [None]:
# import necessary packages
import geopandas as geop
import xarray as xarr
import rioxarray as riox

# define file names
dataset = "rough_CONUS_precipitation_temperature_1988.nc"
shapefile = "./data/cb_2024_us_state_500k/cb_2024_us_state_500k.shp"

# load datasets
roughData = xarr.open_dataset(dataset)
conus_shapefile = geop.read_file(shapefile)

# filter to the contiguous US using STUSPS, the two-letter postal abbreviation for states and territories
conusSeparate = conus_shapefile[~conus_shapefile['STUSPS'].isin(['AK', 'HI', 'GU', 'MP', 'PR', 'VI', 'AS'])]

# dissolve to make continuous boundary
conus_boundary = conusSeparate.dissolve()

# now, assign the boundary's coordinate reference system to the dataset (this tags the data; it does not reproject it)
roughData = roughData.rio.set_spatial_dims(x_dim="longitude", y_dim='latitude')
cleanData = roughData.rio.write_crs(conus_boundary.crs)

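To make the filter-and-dissolve step easier to follow, here is a minimal sketch on a toy `GeoDataFrame`. The three boxes and the codes `AA` and `BB` are made up for illustration; only the `STUSPS` column name and the `~isin(...)` / `dissolve()` pattern come from the cell above.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for the Census shapefile: two adjacent "states"
# and one far-away polygon that we want to exclude.
toy = gpd.GeoDataFrame(
    {"STUSPS": ["AA", "BB", "HI"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(10, 10, 11, 11)],
    crs="EPSG:4269",
)

# Drop rows whose code is in the exclusion list (same pattern as above).
kept = toy[~toy["STUSPS"].isin(["HI"])]

# dissolve() with no arguments merges all remaining geometries into one row.
merged = kept.dissolve()

print(len(kept), "rows before dissolve,", len(merged), "row after")
print(merged.geometry.iloc[0].bounds)
```

The dissolved frame keeps the CRS of the input, which is why `conus_boundary.crs` can be handed straight to `write_crs`.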

## GeoPandas to GeoJSON

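Now we clip the dataset to the border of the contiguous United States, keeping grid cells that lie only partially inside the border. `rio.clip` accepts GeoJSON-like geometry dictionaries, so we convert each GeoPandas geometry with `shapely.geometry.mapping`.

In [None]:
from shapely.geometry import mapping

# all_touched=True keeps every cell the boundary touches; the default (False) keeps only cells whose center falls inside
cleanData_clipped = cleanData.rio.clip(conus_boundary.geometry.apply(mapping), all_touched=True)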
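
As a small illustration of the GeoPandas-to-GeoJSON step, the sketch below applies `mapping` to a single hand-made square. The square is a made-up stand-in for the CONUS boundary, used only to show the shape of the dictionary that gets passed to `rio.clip`.

```python
from shapely.geometry import Polygon, mapping

# Hypothetical 2x2 square standing in for a real boundary polygon.
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])

# mapping() returns a GeoJSON-like dict with "type" and "coordinates" keys.
geojson_like = mapping(square)

print(geojson_like["type"])
# The exterior ring is closed, so the first vertex is repeated at the end.
print(len(geojson_like["coordinates"][0]), "vertices in the exterior ring")
```

Calling `conus_boundary.geometry.apply(mapping)` does the same conversion for every geometry in the `GeoSeries`.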