# 1.0 Data Download and Preparation
### This notebook will walk you through the following:


*   Downloading optical data of your choosing (for this notebook we'll be downloading Landsat-8 data)
*   Downloading the National Land Cover Database Product
*   Creating a label file from the Land Cover Product

To learn more about the various commands available for Google Earth Engine beyond the ones used in this notebook, check out their reference documentation: https://developers.google.com/earth-engine/apidocs


## 1.1 Notebook Setup

For this notebook, we will need to download data and run commands from Google Earth Engine. To begin, we will import the Google Earth Engine package (ee), authenticate to get access to Earth Engine, and then initialize an Earth Engine session in the notebook.

**Note: if you do not have a Google Earth Engine project already set up to enter into the 'ee.Initialize(project=)' command, go to the link below, scroll down to the Getting Access section and click on the link for the 'project registration page' in the tip box.**

https://developers.google.com/earth-engine/guides/access

In [2]:
import ee

In [3]:
ee.Authenticate()

In [4]:
ee.Initialize(project='ee-data-download-prep') # change ee-data-download-prep to the name of your GEE project

## 1.2 Searching Images to Download from GEE

We begin by searching for the images we wish to download. The search uses the ImageCollection command and filters the images by WRS path, WRS row, and cloud cover percentage. These variables are set up below and can be changed to the path, row, and cloud cover threshold you need.

There are other ways to filter images using the ImageCollection command. A few resources to learn more are:

*   https://tutorials.geemap.org/ImageCollection/image_collection_overview/
*   https://developers.google.com/earth-engine/apidocs/ee-imagecollection


In [4]:
PATH_VAR  = 46
ROW_VAR   = 28
CLOUD_VAR = 5

In [5]:
# Filter Landsat 8 Collection 2 Level-2 scenes by WRS path, WRS row, and cloud cover
L8_collection = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                 .filter(ee.Filter.eq('WRS_PATH', PATH_VAR))
                 .filter(ee.Filter.eq('WRS_ROW', ROW_VAR))
                 .filter(ee.Filter.lt('CLOUD_COVER', CLOUD_VAR)))

The following cell will print how many images were found for download based on the specifications in the ImageCollection command.

In [6]:
L8_collectionList = L8_collection.toList(L8_collection.size())
L8_collectionSize = L8_collectionList.size().getInfo()

print(L8_collectionSize)

45


## 1.25 Define a Region of Interest (Optional)

The following two cells define a region of interest (ROI) within the selected scenes. Restricting the download to an ROI speeds up the download and conserves memory, since the entire scene is often not needed. If, however, you wish to download the whole scene, skip these cells and go directly to section 1.3. Doing so requires one additional step, which is documented in section 1.3.

In [7]:
W_COORD = -122.62
E_COORD = -121.9
N_COORD = 46.45
S_COORD = 46.07

In [8]:
ROI = ee.Geometry.Rectangle([W_COORD, S_COORD, E_COORD, N_COORD])
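The rectangle above lists its corners in the order west, south, east, north, so swapped values are easy to miss. A minimal sketch of a sanity check follows. It assumes a hypothetical helper named `make_bbox` (not part of the Earth Engine API) and an ROI that does not cross the antimeridian:

```python
def make_bbox(west, south, east, north):
    """Return [west, south, east, north] after basic sanity checks.

    Hypothetical helper for illustration; it is not part of Earth Engine.
    Assumes the region does not cross the antimeridian.
    """
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError('Longitudes must be between -180 and 180 degrees.')
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError('Latitudes must be between -90 and 90 degrees.')
    if west >= east:
        raise ValueError('The west coordinate must be less than the east coordinate.')
    if south >= north:
        raise ValueError('The south coordinate must be less than the north coordinate.')
    return [west, south, east, north]


# Same coordinates as the cells above
bbox = make_bbox(-122.62, 46.07, -121.9, 46.45)
print(bbox)
```

The returned list could then be passed to `ee.Geometry.Rectangle` in place of the hand-written list.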

## 1.3 Download Images to Google Drive

We can now begin the process of downloading images from Google Earth Engine to Google Drive. **Note: you will need to create a folder in your Google Drive account where the images will be downloaded.** When you have done so, enter the folder name into the text string in the cell below.

In [9]:
FOLDER_NAME = 'NOTEBOOK_TEST_FOLDER'

Next we will define which band of the selected images we want to download. For example, for images from Landsat 8, there are seven Surface Reflectance bands to choose from (SR_B1, SR_B2, SR_B3, SR_B4, SR_B5, SR_B6, and SR_B7) and one Surface Temperature Band (ST_B10).

In [22]:
BAND = 'SR_B2'
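A misspelled band name only surfaces as an error once the export is requested, so it can help to check the name up front. The sketch below is one possible way to do that. The dictionary `L8_BANDS` and the function `check_band` are hypothetical names introduced here, and the descriptions are short summaries of the bands listed above:

```python
# Bands mentioned above for the Landsat 8 Level-2 collection, with short descriptions
L8_BANDS = {
    'SR_B1': 'coastal aerosol',
    'SR_B2': 'blue',
    'SR_B3': 'green',
    'SR_B4': 'red',
    'SR_B5': 'near infrared',
    'SR_B6': 'shortwave infrared 1',
    'SR_B7': 'shortwave infrared 2',
    'ST_B10': 'surface temperature',
}


def check_band(band):
    """Return the description of a known band, or raise ValueError (hypothetical helper)."""
    if band not in L8_BANDS:
        raise ValueError(f"Unknown band '{band}'. Choose one of: {', '.join(L8_BANDS)}")
    return L8_BANDS[band]


print(check_band('SR_B2'))
```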

Each downloaded image is named using a prefix of your choosing, the band chosen, and the date the image was acquired by the satellite. Choose the prefix and enter it into the text string below.

In [11]:
NAME = 'MSH'

The remaining variables to be set are for the download command: ee.batch.Export.image.toDrive.

*   DESC is a text string descriptor for the job being run
*   SCALE is the resolution in meters per pixel of the downloaded images
*   CRS is the coordinate reference system for the images (note that it is currently set to EPSG:4326, the EPSG code for WGS84)
*   FF is the file format (currently only GeoTIFF and TFRecord are supported in GEE)

Note there are additional options for this command and not all the options used here are required. **If you do not want to download a region of interest within your scenes and instead want the whole scene, delete the 'region=ROI,' portion of the command.** If you wish to learn more about the command, see here:
https://developers.google.com/earth-engine/apidocs/export-image-todrive


In [10]:
DESC = 'L8_Download'
SCALE = 30
CRS = 'EPSG:4326'
FF = 'GeoTIFF'

In [23]:
for i in range(L8_collectionSize):
  # Pull the i-th image from the list and keep only the chosen band
  l8_image = ee.Image(L8_collectionList.get(i)).select([BAND])
  print(l8_image.getInfo())
  # Build the output file name: <NAME>_<BAND>_<acquisition date>
  image_date = str(l8_image.get('DATE_ACQUIRED').getInfo())
  full_title = NAME + '_' + BAND + '_' + image_date
  print(full_title)
  # Queue an export of this image to the Google Drive folder
  task = ee.batch.Export.image.toDrive(image=l8_image,
                                       description=DESC,
                                       scale=SCALE,
                                       region=ROI,
                                       fileNamePrefix=full_title,
                                       crs=CRS,
                                       fileFormat=FF,
                                       folder=FOLDER_NAME)
  task.start()
  print(task.status())

{'type': 'Image', 'bands': [{'id': 'SR_B2', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 65535}, 'dimensions': [7781, 7901], 'crs': 'EPSG:32610', 'crs_transform': [30, 0, 432585, 0, -30, 5215515]}], 'id': 'LANDSAT/LC08/C02/T1_L2/LC08_046028_20130703', 'version': 1629897570468917, 'properties': {'DATA_SOURCE_ELEVATION': 'GLS2000', 'WRS_TYPE': 2, 'REFLECTANCE_ADD_BAND_1': -0.2, 'REFLECTANCE_ADD_BAND_2': -0.2, 'DATUM': 'WGS84', 'REFLECTANCE_ADD_BAND_3': -0.2, 'REFLECTANCE_ADD_BAND_4': -0.2, 'REFLECTANCE_ADD_BAND_5': -0.2, 'REFLECTANCE_ADD_BAND_6': -0.2, 'REFLECTANCE_ADD_BAND_7': -0.2, 'system:footprint': {'type': 'LinearRing', 'coordinates': [[-123.24959768764617, 47.0891224425124], [-123.25022757717805, 47.08868949064404], [-123.27208016387733, 47.02809610589206], [-123.8450652590295, 45.38692932566067], [-123.8475117825429, 45.37939790481123], [-123.84274874038525, 45.37852019232812], [-123.15386732220479, 45.25865566139378], [-121.577021480554, 44.96660847240

**Note: The neural network notebook that follows this one uses six bands' worth of Landsat 8 data. If you also wish to use six bands of data for each acquisition date, wait until one band has finished downloading, then change the BAND variable to a new band name and rerun the download cell. Repeat the process for each band.**
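As an alternative to editing BAND by hand, the file names for several bands could be generated in one pass. The sketch below only builds the names, following the same prefix, band, and date pattern used in the download cell. The list `BANDS`, the function `export_titles`, and the example date are assumptions for illustration; the notebook does not say which six bands the next notebook expects:

```python
# Assumption: an illustrative choice of six bands, not confirmed by this notebook
BANDS = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']


def export_titles(name, bands, image_date):
    """Build one file name per band as <name>_<band>_<date> (hypothetical helper)."""
    return [name + '_' + band + '_' + image_date for band in bands]


# Example acquisition date string, used here only for illustration
titles = export_titles('MSH', BANDS, '2013-07-03')
for title in titles:
    print(title)
```

Inside the download loop, each of these names would take the place of `full_title`, with the matching band passed to `.select()`.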

## 1.4 Search for Land Cover File from GEE

We will re-use the ImageCollection command to search for the National Land Cover Database file. We will choose the year 2016 and then select out the 'landcover' band from the NLCD file.

*   To learn more about the NLCD, check here: https://www.usgs.gov/centers/eros/science/national-land-cover-database
*   To download the NLCD directly and see what other years are available for download, check here: https://www.mrlc.gov/data



In [5]:
LC_dataset = ee.ImageCollection("USGS/NLCD_RELEASES/2016_REL")
NLCD_2016 = LC_dataset.filter(ee.Filter.eq('system:index', '2016')).first()
landcover = NLCD_2016.select('landcover')

## 1.5 Download Land Cover File

We will re-use the ee.batch.Export.image.toDrive command to download the Land Cover file to Google Drive. Nearly all of the variables will be kept the same so that the land cover image and optical images are the same resolution and projection. Only the file name and batch job description will be changed.

In [11]:
nlcd_task = ee.batch.Export.image.toDrive(image=landcover,
                                      description='LandCover_Download',
                                      scale=SCALE,
                                      region=ROI,
                                      fileNamePrefix='LandCover_2016',
                                      crs=CRS,
                                      fileFormat=FF,
                                      folder=FOLDER_NAME)
nlcd_task.start()
print(nlcd_task.status())

{'state': 'READY', 'description': 'LandCover_Download', 'creation_timestamp_ms': 1704464157153, 'update_timestamp_ms': 1704464157153, 'start_timestamp_ms': 0, 'task_type': 'EXPORT_IMAGE', 'id': 'RN6HCWRREZT4VVIW3H2D6EXG', 'name': 'projects/ee-data-download-prep/operations/RN6HCWRREZT4VVIW3H2D6EXG'}
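The status above shows the task in the `READY` state, meaning the file is not yet in Google Drive. One way to wait for the export is to poll `status()` until the task reaches a finished state. The sketch below assumes the finished states are `COMPLETED`, `FAILED`, and `CANCELLED`. The function `wait_for_task` and the `FakeTask` stand-in are hypothetical and exist only so the example runs without an Earth Engine session:

```python
import time

# Assumption: states after which the task will not change again
FINISHED_STATES = {'COMPLETED', 'FAILED', 'CANCELLED'}


def wait_for_task(task, poll_seconds=30, max_polls=120):
    """Poll task.status() until the task finishes and return its final state."""
    for _ in range(max_polls):
        state = task.status()['state']
        if state in FINISHED_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError('The task did not finish within the allotted number of polls.')


class FakeTask:
    """Stand-in for an Earth Engine task so the sketch runs offline."""

    def __init__(self, states):
        self._states = iter(states)
        self._last = None

    def status(self):
        # Step through the scripted states, repeating the last one when exhausted
        self._last = next(self._states, self._last)
        return {'state': self._last}


final_state = wait_for_task(FakeTask(['READY', 'RUNNING', 'COMPLETED']), poll_seconds=0)
print(final_state)
```

In the notebook, `nlcd_task` would be passed in place of the `FakeTask` object.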


## 1.6 Create Label File from Land Cover File

To create the label file that we will use for the next notebook that crafts the neural network, we will read in the land cover file.

**Note: Make sure that the land cover file has been downloaded from Google Earth Engine to the specified folder.**

You will also need to mount your Google Drive to this notebook. To do so:

*   Click on the folder icon on the far left vertical toolbar (it sits beneath the key icon on the toolbar)
*   Click on the icon third from left that shows a folder with the Google Drive symbol (if you hover over this icon with your mouse it should say 'Mount Drive')

We will start by installing/importing pyrsgis for raster manipulation as well as importing gdal and numpy.



In [12]:
!pip install pyrsgis

Collecting pyrsgis
  Downloading pyrsgis-0.4.1-py3-none-any.whl (25 kB)
Installing collected packages: pyrsgis
Successfully installed pyrsgis-0.4.1


In [13]:
from osgeo import gdal
import numpy as np
from pyrsgis import raster

Then we will read the downloaded land cover file into this notebook. To do so, change the example path below to one that includes your chosen folder name and the name you gave your land cover file.

In [14]:
LC_file = gdal.Open('/content/drive/MyDrive/NOTEBOOK_TEST_FOLDER/LandCover_2016.tif')

We will then convert the imported raster image to a 2D array so that we can interact with the data for label file creation purposes (i.e. storing the length and the width of the array).

In [15]:
LC_raster = LC_file.GetRasterBand(1)
LC_array = LC_raster.ReadAsArray()


In [16]:
print(LC_array.shape)
LC_length = LC_array.shape[0]
LC_width = LC_array.shape[1]

(1413, 2672)


We then create an array of zeros with the same dimensions as the land cover array to hold the labels.

In [17]:
label_array = np.zeros(shape=(LC_length, LC_width), dtype=np.uint8)

Iterating through the land cover array, for each pixel with a value of 12 (the NLCD class for perennial ice/snow) we set the value of the corresponding pixel in the label file to 1, and for all other land cover values we set the pixel in the label file to 0.

**Note: this example uses the snow class to build the label file, but you can label other land cover classes as 1 if you wish. The list of land cover class values can be found here:**

https://www.mrlc.gov/data/legends/national-land-cover-database-class-legend-and-description

In [18]:
for i in range(LC_length):
  for j in range(LC_width):
    if LC_array[i,j] == 12:
      label_array[i,j] = 1
    else:
      label_array[i,j] = 0
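The nested loop above visits every pixel in Python, which can be slow for large rasters. A vectorized sketch that should produce the same 0/1 labels is shown below. It uses a small toy array in place of `LC_array`, and the function name `make_label_array` is a hypothetical helper introduced for illustration:

```python
import numpy as np


def make_label_array(lc_array, target_classes=(12,)):
    """Return 1 where a pixel matches any target class and 0 elsewhere."""
    return np.isin(lc_array, target_classes).astype(np.uint8)


# Toy land cover values standing in for LC_array
toy = np.array([[11, 12, 41],
                [12, 52, 12]])
labels = make_label_array(toy)
print(labels)
```

Passing several class values in `target_classes` would label all of them as 1, which matches the note above about choosing classes other than snow.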

Last, we will write the label file as a raster to the same folder as our downloaded images and our land cover tif file.

In [19]:
outFile = '/content/drive/MyDrive/NOTEBOOK_TEST_FOLDER/Snow_Label_File.tif'
raster.export(label_array, LC_file, filename=outFile, dtype='float')

At the end of this notebook, you should have a folder populated with:
1.   However many bands of optical data for your chosen time period
2.   A land cover file
3.   A label file that will be used in neural network construction

If you have all of these files, you're ready to move to the next notebook!

