This notebook adds two columns to each granule's .csv file in the ```data/``` folder: the land coverage type, e.g. ```"Deciduous Forest"```, and the coverage type group, e.g. ```"Forest"```. Land coverage types follow the classification used by the Multi-Resolution Land Characteristics Consortium (MRLC).

In [3]:
import os
import geopandas as gp
import pandas as pd
import rasterio
from shapely.geometry import Point

Begin by downloading a land cover map of New York State at ~60 m resolution from MRLC. The download takes a few minutes, so it is skipped if the file already exists locally.

In [27]:
if not os.path.exists("test_image2.geotiff"):
    from owslib.wms import WebMapService
    bbox = (-79.7633786294863,40.502009391283906,-71.85616396303963,45.01550900568005) # bounding box of new york.
    factor = 111139/30 # change-in-latitude to meters, divided by 30 m data resolution
    factor /= 2  # otherwise we exceed memory limits for the server.
    size = (int(factor*(bbox[2]-bbox[0])), int(factor*(bbox[3]-bbox[1])))
    wms = WebMapService('https://www.mrlc.gov/geoserver/mrlc_display/wms')
    img = wms.getmap(   layers=['CONUS_Land_Cover'],
                        srs='EPSG:4326',
                        bbox=bbox, 
                        size=size,
                        format='image/geotiff',
                        timeout = 300) # 5 minute timeout; took 4 minutes to run for me.
    # the context manager closes the file even if the write fails
    with open('test_image2.geotiff', 'wb') as out:
        out.write(img.read())
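The image size requested in the cell above follows from simple arithmetic, sketched here on its own. The constant names are illustrative and not part of the notebook. The source applies the latitude conversion factor to longitude as well, so the east-west ground distance per pixel is only approximate.

```python
# Bounding box of New York State as (min_lon, min_lat, max_lon, max_lat), copied from the cell above.
bbox = (-79.7633786294863, 40.502009391283906, -71.85616396303963, 45.01550900568005)

# Illustrative constant names (assumptions, not from the notebook).
METERS_PER_DEGREE = 111139   # approximate length of one degree of latitude
NATIVE_RESOLUTION_M = 30     # resolution of the source land cover data
DOWNSAMPLE = 2               # halve the pixel density to stay within server limits

pixels_per_degree = METERS_PER_DEGREE / NATIVE_RESOLUTION_M / DOWNSAMPLE
width = int(pixels_per_degree * (bbox[2] - bbox[0]))
height = int(pixels_per_degree * (bbox[3] - bbox[1]))

# North-south ground distance covered by one pixel after downsampling.
meters_per_pixel = METERS_PER_DEGREE / pixels_per_degree

print(width, height, meters_per_pixel)
```

Halving the pixel density is what turns the 30 m source data into the ~60 m map used in the rest of the notebook.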

The land cover map stores an RGB color at each pixel rather than a class label. We want to convert these colors to string names of coverage types, like ```"Open Water"```.

In [43]:
# Luke's legend with locations of each land cover type
legendItems = {'Open Water':(40.651213,-74.157168), # didn't see any "perennial ice/snow" (pale blue)
               'Developed Open Space':(40.807089,-74.275723), 'Developed Low Intensity':(40.6979894,-73.7076795),
               'Developed Medium Intensity':(40.746664,-73.698380), 'Developed High Intensity':(40.70252,-73.90834), 
               'Barren Land (Rock/Sand/Clay)':(40.744854,-72.826518), # didn't see any "unconsolidated shore" (white)
               'Deciduous Forest':(41.21937,-74.43329), 'Evergreen Forest':(44.12911,-74.01368), 'Mixed Forest':(41.1450495,-74.5089763), 
               'Grassland/Herbaceous':(42.777063,-76.876228), 'Shrub/Scrub':(42.7724337,-76.8857693),
               'Pasture/Hay':(42.761178,-76.981478), 'Cultivated Crops':(42.71250,-77.02432),
               'Woody Wetlands':(40.90833,-74.30446), 'Emergent Herbaceous Wetlands':(40.611221,-73.687330)}

# swap (lat, lon) to (lon, lat), the (x, y) order that rasterio's sample() expects
for covername in legendItems.keys():
    lat, lon = legendItems[covername]
    legendItems[covername] = (lon, lat)

legendItems

{'Open Water': (-74.157168, 40.651213),
 'Developed Open Space': (-74.275723, 40.807089),
 'Developed Low Intensity': (-73.7076795, 40.6979894),
 'Developed Medium Intensity': (-73.69838, 40.746664),
 'Developed High Intensity': (-73.90834, 40.70252),
 'Barren Land (Rock/Sand/Clay)': (-72.826518, 40.744854),
 'Deciduous Forest': (-74.43329, 41.21937),
 'Evergreen Forest': (-74.01368, 44.12911),
 'Mixed Forest': (-74.5089763, 41.1450495),
 'Grassland/Herbaceous': (-76.876228, 42.777063),
 'Shrub/Scrub': (-76.8857693, 42.7724337),
 'Pasture/Hay': (-76.981478, 42.761178),
 'Cultivated Crops': (-77.02432, 42.7125),
 'Woody Wetlands': (-74.30446, 40.90833),
 'Emergent Herbaceous Wetlands': (-73.68733, 40.611221)}

In [44]:
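# open the downloaded land cover map so that pixel colors can be sampled
src = rasterio.open("test_image2.geotiff")

names = list(legendItems.keys())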
locations = [legendItems[name] for name in names]
rgbs = list(src.sample(locations))
rgbs = [tuple(a) for a in rgbs]   # since the arrays are unhashable
rgb2name = {rgb: name for rgb, name in zip(rgbs, names)}
rgb2name

{(71, 107, 160): 'Open Water',
 (221, 201, 201): 'Developed Open Space',
 (216, 147, 130): 'Developed Low Intensity',
 (237, 0, 0): 'Developed Medium Intensity',
 (170, 0, 0): 'Developed High Intensity',
 (178, 173, 163): 'Barren Land (Rock/Sand/Clay)',
 (104, 170, 99): 'Deciduous Forest',
 (28, 99, 48): 'Evergreen Forest',
 (181, 201, 142): 'Mixed Forest',
 (226, 226, 193): 'Grassland/Herbaceous',
 (204, 186, 124): 'Shrub/Scrub',
 (219, 216, 61): 'Pasture/Hay',
 (170, 112, 40): 'Cultivated Crops',
 (186, 216, 234): 'Woody Wetlands',
 (112, 163, 186): 'Emergent Herbaceous Wetlands'}
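Building the lookup with a plain dict comprehension silently overwrites an entry if two legend locations happen to return the same color. A minimal sketch of a stricter builder follows; ```build_color_lookup``` is a hypothetical helper, not part of the notebook, and the sample colors are copied from the output above.

```python
def build_color_lookup(names, colors):
    """Map each sampled color to its class name, raising if two classes share a color."""
    lookup = {}
    for name, color in zip(names, colors):
        key = tuple(int(c) for c in color)  # plain ints make a hashable key
        if key in lookup and lookup[key] != name:
            raise ValueError(f"{key} sampled for both {lookup[key]!r} and {name!r}")
        lookup[key] = name
    return lookup


# Three entries copied from the rgb2name output above.
names = ["Open Water", "Deciduous Forest", "Evergreen Forest"]
colors = [(71, 107, 160), (104, 170, 99), (28, 99, 48)]
lookup = build_color_lookup(names, colors)
print(lookup[(71, 107, 160)])
```

A collision would suggest that one of the legend coordinates landed on the wrong kind of pixel and should be re-picked.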

We can also associate each land coverage type with its larger group of coverage types.

In [70]:
name2group = {
    "Open Water": "Water",
    "Developed Open Space": "Developed",
    "Developed Low Intensity": "Developed",
    "Developed Medium Intensity": "Developed",
    "Developed High Intensity": "Developed",
    "Barren Land (Rock/Sand/Clay)": "Barren",
    "Deciduous Forest": "Forest",
    "Evergreen Forest": "Forest",
    "Mixed Forest": "Forest",
    "Grassland/Herbaceous": "Scrubland",
    "Shrub/Scrub": "Scrubland",
    "Pasture/Hay": "Planted/Cultivated",
    "Cultivated Crops": "Planted/Cultivated",
    "Woody Wetlands": "Wetlands",
    "Emergent Herbaceous Wetlands": "Wetlands",
    "Not Classified": "Not Classified"
}
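Any later lookup of the form ```name2group[name]``` raises a ```KeyError``` if a class name has no group. A small consistency check between the two dictionaries can catch that early. The helper ```missing_groups``` is hypothetical, and the dictionaries below are short excerpts of the ones defined above.

```python
def missing_groups(rgb2name, name2group):
    """Return class names that appear in rgb2name but have no entry in name2group."""
    return sorted(set(rgb2name.values()) - set(name2group))


# Short excerpts of the two dictionaries from the cells above.
rgb2name = {(71, 107, 160): "Open Water", (104, 170, 99): "Deciduous Forest"}
name2group = {
    "Open Water": "Water",
    "Deciduous Forest": "Forest",
    "Not Classified": "Not Classified",
}

print(missing_groups(rgb2name, name2group))
```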

The ```rgb2name``` dictionary lets us convert pixel colors to the corresponding land coverage types. Let's do that for each shot in our data, and add the land coverage ```"Class"``` and ```"Group"``` as columns of the .csv files. This cell takes a few minutes to execute, since it reads and writes hundreds of files.

In [72]:
# paths to all granules
paths = [os.path.join("data", g) for g in os.listdir("data") if g.endswith(".csv")]

for path in paths:
    df = pd.read_csv(path)

    # look up the class at each shot location; colors missing from the legend become "Not Classified"
    coverage = [
        rgb2name.get(tuple(rgb), "Not Classified")
        for rgb in src.sample(zip(df["lon_lowestmode"], df["lat_lowestmode"]))
    ]
    group = [name2group[name] for name in coverage]

    if "Group" not in df.columns:
        df.insert(0, "Group", group)
    if "Class" not in df.columns:
        df.insert(0, "Class", coverage)
    # drop the stale index column left by an earlier to_csv, if present
    df = df.drop(columns=["Unnamed: 0"], errors="ignore")

    # note: to_csv writes the index as an unnamed first column, which read_csv later loads as "Unnamed: 0"
    df.to_csv(path)
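Shots whose pixel color is not in the legend, such as the classes the legend comments mention as not found, end up as ```"Not Classified"```. Tallying the labels is one way to see how often that happens. This is a sketch only: ```class_shares``` is a hypothetical helper and the sample labels are made up for illustration.

```python
from collections import Counter


def class_shares(coverage):
    """Return the fraction of shots assigned to each land cover class."""
    counts = Counter(coverage)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


# Made-up labels standing in for the `coverage` list of one granule.
coverage = ["Mixed Forest", "Deciduous Forest", "Deciduous Forest", "Not Classified"]
shares = class_shares(coverage)
print(shares.get("Not Classified", 0.0))
```

A large unclassified share would point to missing legend entries or to shots that fall outside the downloaded map.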

In [73]:
# check our work at a random granule
pd.read_csv(paths[444])

Unnamed: 0.1,Unnamed: 0,Class,Group,beam,channel,lat_lowestmode,lon_lowestmode,elev_lowestmode,delta_time,rh,land_cover_data/landsat_water_persistence,land_cover_data/landsat_treecover,land_cover_data/region_class,land_cover_data/urban_proportion,land_cover_data/urban_focal_window_size,shot_number
0,0,Mixed Forest,Forest,2,1,42.000053,-79.026602,509.01697,1.118910e+08,[-6.39 -6.05 -5.75 -5.45 -5.19 -4.93 -4.67 -4....,0,100.0,7,0,3,147210200200276388
1,1,Deciduous Forest,Forest,2,1,42.002994,-79.020969,523.16766,1.118910e+08,[-1.86 -1.27 -0.82 -0.44 -0.11 0.22 0.56 0....,0,100.0,7,0,3,147210200200276398
2,2,Deciduous Forest,Forest,2,1,42.005919,-79.015386,605.79926,1.118910e+08,[-3.1 -2.28 -1.57 -0.89 -0.33 0.14 0.67 1....,0,81.0,7,0,3,147210200200276408
3,3,Mixed Forest,Forest,2,1,42.008879,-79.009678,522.69940,1.118910e+08,[-3.21 -2.65 -2.09 -1.45 -0.82 -0.29 0.18 0....,0,100.0,7,0,3,147210200200276418
4,4,Deciduous Forest,Forest,2,1,42.011822,-79.004029,514.59830,1.118910e+08,[-5.45 -4.82 -4.15 -3.36 -1.83 -0.63 -0.07 0....,0,93.0,7,0,3,147210200200276428
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2965,2965,Woody Wetlands,Wetlands,11,5,43.955173,-74.935331,495.28690,1.118911e+08,[-4. -3.4 -2.99 -2.69 -2.46 -2.24 -2.09 -1....,0,0.0,7,0,3,147211100200283264
2966,2966,Woody Wetlands,Wetlands,11,5,43.957912,-74.929344,504.46866,1.118911e+08,[-6.95 -5.23 -3.59 -1.6 -0.07 1.27 2.46 3....,0,92.0,7,0,3,147211100200283274
2967,2967,Evergreen Forest,Forest,11,5,43.960653,-74.923350,502.94888,1.118911e+08,[-5.08 -3.55 -2.13 -0.11 1.6 2.61 3.4 4....,0,83.0,7,0,3,147211100200283284
2968,2968,Evergreen Forest,Forest,11,5,43.963393,-74.917356,505.20910,1.118911e+08,[-2.69 -1.49 -0.44 0.52 1.57 2.69 3.7 4....,0,100.0,7,0,3,147211100200283294
