# Transferring Data from Google Cloud Storage to Google Earth Engine

Once the raw data files are in Google Cloud Storage (GCS), we move them to Google Earth Engine (GEE) so that we can process the data there (e.g. generate the train, validation and test sets for modelling).

This notebook shows how to transfer data from GCS to GEE.

In [None]:
#Q to self: why did we have to transform the CRS of the labels data but not the .tif files?

# 1. Config

In [13]:
import subprocess
import geopandas as gpd
import os
from google.cloud import storage
import glob
import shutil
import pathlib

In [14]:
def transform_crs(gs_path, gs_processed_path, crs = "epsg:3857"):
    """
    This function takes a shapefile from Google Cloud Storage, transforms it to a new
    coordinate reference system (crs), and re-writes it.
    It performs an intermediate step where it writes the files locally, but removes these afterwards.
    
    gs_path : str
        Path to Google Cloud Storage where the shapefile (.shp) is.
        INCLUDE EXTENSION .shp
    gs_processed_path : str
        Path to Google Cloud Storage where the processed files will be written.
        DO NOT INCLUDE EXTENSION.
    crs : str
        The coordinate reference system (crs) to transform the shapefile to.
    """
    
    # Set environment variable
    os.environ["PROJ_IGNORE_CELESTIAL_BODY"] = "YES"
    
    # Read data from GS
    data = gpd.read_file(gs_path)
    
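    # Set the new CRS. Note: set_crs(..., allow_override=True) replaces the CRS
    # metadata without reprojecting the coordinates; to_crs() would reproject them.
    data_new_crs = data.set_crs(crs, allow_override=True)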
    
    # Add to local folder "out/"
    local_folder = pathlib.Path("out")
    local_folder.mkdir(exist_ok=True) # exist_ok=False raises FileExistsError if "out/" already exists
    data_new_crs.to_file("out/data.shp") # Note: this will output all the other files too! e.g. .prj

    
    # Move all files from local to Cloud Storage
    bucket_name = gs_processed_path.split("//")[1].split("/")[0]
    path_in_bucket = gs_processed_path.split(f"{bucket_name}/")[1]
    # Create the client and bucket handle once, outside the loop
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    for file in glob.glob("out/*"):
        extension = file.split(".")[-1]
        blob = bucket.blob(f"{path_in_bucket}.{extension}")
        blob.upload_from_filename(file)
    # Remove the local intermediate files
    shutil.rmtree("out")
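
The function above pulls the bucket name and the object path out of `gs_processed_path` with string splits. A minimal sketch of the same parsing as a standalone helper is shown below. The name `split_gs_uri` is hypothetical and not part of the original notebook; it only illustrates the idea and adds basic input checks.

```python
def split_gs_uri(gs_uri):
    """Split 'gs://bucket/path/to/object' into (bucket_name, object_path).

    Hypothetical helper for illustration; not part of the original notebook.
    """
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"Expected a gs:// URI, got: {gs_uri!r}")
    # partition() splits on the first "/" only, so nested folders stay in the path
    bucket_name, _, object_path = gs_uri[len(prefix):].partition("/")
    if not bucket_name or not object_path:
        raise ValueError(f"URI needs both a bucket and an object path: {gs_uri!r}")
    return bucket_name, object_path


bucket_name, object_path = split_gs_uri(
    "gs://esg-satelite-data-warehouse/mars/labels/faults/processed/faults"
)
print(bucket_name)
print(object_path)
```

Splitting on the first `/` after the scheme avoids surprises when the bucket name also appears further down the object path.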

In [15]:
def transfer_from_gcs_to_gee(gs_path, ee_path, data_type):
    """
    Moves data from Google Cloud Storage to Google Earth Engine.
    
    gs_path : str
        Path to file in Google Cloud Storage.
        INCLUDE EXTENSION.
    ee_path : str
        Path for file to be written in Google Earth Engine.
        DO NOT INCLUDE EXTENSION.
    data_type : str
        "table" for .shp, "image" for .tif
    """
    
    result = subprocess.run(
        [
            "earthengine",
            "upload",
            data_type,
            f"--asset_id={ee_path}",
            f"{gs_path}"
        ],
        capture_output=True,
        check=True,
    )
    print(result.stdout.decode())
    print(result.stderr.decode())
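
Before launching an upload it can help to inspect the exact command that will be run. The sketch below builds the same argument list that `transfer_from_gcs_to_gee` passes to `subprocess.run`, and rejects any `data_type` other than the two documented in the docstring. The names `build_upload_command` and `VALID_DATA_TYPES` are assumptions made for this illustration.

```python
import shlex

# The two data types documented in the docstring above
VALID_DATA_TYPES = ("table", "image")


def build_upload_command(gs_path, ee_path, data_type):
    """Return the earthengine CLI argument list without running it.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(f"data_type must be one of {VALID_DATA_TYPES}, got {data_type!r}")
    return ["earthengine", "upload", data_type, f"--asset_id={ee_path}", gs_path]


command = build_upload_command(
    gs_path="gs://esg-satelite-data-warehouse/mars/labels/faults/processed/faults.shp",
    ee_path="projects/esg-satelite/assets/mars/labels/faults/pre/faults",
    data_type="table",
)
# shlex.join renders the list as a single shell-style string for inspection
print(shlex.join(command))
```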

# 2. Fault labels data

Our labels data is a shapefile (.shp) containing many vector features, where each feature is one fault.

## 2.1. Transform the CRS of the data

In [4]:
transform_crs(
    gs_path = "gs://esg-satelite-data-warehouse/mars/labels/faults/raw/faults.shp", 
    gs_processed_path = "gs://esg-satelite-data-warehouse/mars/labels/faults/processed/faults"
)

In [5]:
labels = gpd.read_file("gs://esg-satelite-data-warehouse/mars/labels/faults/processed/faults.shp")

In [6]:
labels.crs

<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

## 2.2. Transfer from GCS to GEE

In [9]:
transfer_from_gcs_to_gee(
    gs_path = "gs://esg-satelite-data-warehouse/mars/labels/faults/processed/faults.shp", 
    ee_path = "projects/esg-satelite/assets/mars/labels/faults/pre/faults",
    data_type = "table"
)

Started upload task with ID: AP44L5FNX23DPD56VR452FRN
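
The upload runs as a task on Earth Engine, and the CLI only reports the ID of the task it started. A small sketch for collecting those IDs from the captured output, so that they can be checked later (for example with the CLI's `earthengine task info` command), is shown below. The helper name and the regular expression are assumptions based on the output format seen in this notebook.

```python
import re

# Matches lines such as "Started upload task with ID: AP44L5FNX23DPD56VR452FRN"
TASK_ID_PATTERN = re.compile(r"Started upload task with ID:\s*(\S+)")


def extract_task_ids(cli_output):
    """Return every task ID found in the earthengine CLI output text.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    return TASK_ID_PATTERN.findall(cli_output)


sample_output = "Started upload task with ID: AP44L5FNX23DPD56VR452FRN\n"
print(extract_task_ids(sample_output))
```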




## 2.3. Uploading the original faults.shp from GCS to EE (no CRS transformation)

- Done mainly because the Folium plots are slightly off; we want to check that this isn't an issue with the CRS.

In [None]:
transfer_from_gcs_to_gee(
    gs_path = "gs://esg-satelite-data-warehouse/mars/labels/faults/raw/faults.shp", 
    ee_path = "projects/esg-satelite/assets/mars/labels/faults_orig_crs/pre/faults",
    data_type = "table"
)

Started upload task with ID: ARRQ2635Y3Q4H4D4PWHW5CAA




# 3. Elevation data

- https://astrogeology.usgs.gov/search/map/Mars/Topography/HRSC_MOLA_Blend/Mars_HRSC_MOLA_BlendDEM_Global_200mp
- Uploaded the data onto GCP originally using a .tsv file.

## 3.1. Transfer from GCS to GEE

In [None]:
transfer_from_gcs_to_gee(
    gs_path = "gs://esg-satelite-data-warehouse/mars/features/elevation/raw/Mars_HRSC_MOLA_BlendDEM_Global_200mp_v2.tif", 
    ee_path = "projects/esg-satelite/assets/mars/features/elevation/pre/elevation",
    data_type = "image"
)

# 4. HRSC Images (sample of 3 only)

- https://ode.rsl.wustl.edu/mars/mapsearch (Mars Express > HRSC > Derived > Version 3 Map Projected Reduced Data Record).
- Uploaded the sample of 3 images onto GCP manually as Claire sent some samples via email, and retrieving the data online is not straightforward.

**VERY IMPORTANT NOTE**: All of the images below were moved to an EE *image collection*. This had to be created manually on EE first. So, when we do processing on the image collection, we simply use the path `projects/esg-satelite/assets/mars/mars_express_hrsc_example` (i.e. with no "image_x" appended).

In [None]:
# As these were in separate .tif files, they have to be moved individually
gs_paths = [
    "gs://esg-satelite-data-warehouse/mars/features/hrsc_sample_3/raw/h1462_nd3_Merc_clip.tif",
    "gs://esg-satelite-data-warehouse/mars/features/hrsc_sample_3/raw/h1495_nd3_Merc_clip.tif",
    "gs://esg-satelite-data-warehouse/mars/features/hrsc_sample_3/raw/h8304_nd3_Merc.tif"
]
ee_paths = [f"projects/esg-satelite/assets/mars/features/hrsc_sample_3/pre/hrsc_sample/image_{x}" for x in range(len(gs_paths))]

for gs_path, ee_path in zip(gs_paths,ee_paths):
    transfer_from_gcs_to_gee(
        gs_path = gs_path,
        ee_path = ee_path,
        data_type = "image"
    )
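
The cell above names the assets `image_0`, `image_1`, and so on, which loses the link back to the source file. One alternative, sketched below, derives each asset name from the file name instead. The helper `asset_id_from_gs_path` is hypothetical, and replacing every character other than letters, digits, hyphens and underscores is a conservative assumption about what is safe in an asset name.

```python
import re
from pathlib import PurePosixPath


def asset_id_from_gs_path(gs_path, collection_path):
    """Build an asset ID inside collection_path from the file name in gs_path.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    # stem drops the folder part and the final extension, e.g. "h1462_nd3_Merc_clip"
    stem = PurePosixPath(gs_path).stem
    # Conservative assumption: keep only letters, digits, hyphens and underscores
    safe_name = re.sub(r"[^A-Za-z0-9_-]", "_", stem)
    return f"{collection_path}/{safe_name}"


asset_id = asset_id_from_gs_path(
    "gs://esg-satelite-data-warehouse/mars/features/hrsc_sample_3/raw/h1462_nd3_Merc_clip.tif",
    "projects/esg-satelite/assets/mars/features/hrsc_sample_3/pre/hrsc_sample",
)
print(asset_id)
```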

# 5. CTX Images (sample of 1 only)

- http://murray-lab.caltech.edu/CTX/V01/tiles/

In [None]:
# Get paths of all tif files
gs_paths = []
ee_paths = []
ee_path_base = "projects/esg-satelite/assets/mars/features/ctx_sample_2/pre/ctx_sample"
client = storage.Client()
for blob in client.list_blobs('esg-satelite-data-warehouse', prefix='mars/features/ctx_sample_2/raw_unzipped'):
    # blob.name is the object path within the bucket
    gs_path = f"gs://esg-satelite-data-warehouse/{blob.name}"
    if gs_path.endswith(".tif"):
        gs_paths.append(gs_path)
        file_name_no_ext = gs_path.split("raw_unzipped/")[1].split(".tif")[0]
        ee_paths.append(f"{ee_path_base}/{file_name_no_ext}")
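
The path-building logic in the cell above can be checked without touching the bucket by running it on a plain list of object names. The sketch below does this with a hypothetical function `plan_tif_uploads`; the object names in the example are made up for illustration and are not real files in the bucket.

```python
def plan_tif_uploads(blob_names, bucket_name, strip_prefix, ee_path_base):
    """Pair each .tif object name with its gs:// path and target asset path.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    plan = []
    for name in blob_names:
        # Skip anything that is not a GeoTIFF, e.g. folder placeholders or sidecar files
        if not name.endswith(".tif"):
            continue
        # Drop the folder prefix and the extension to get the asset name
        relative = name[len(strip_prefix):] if name.startswith(strip_prefix) else name
        file_name_no_ext = relative[: -len(".tif")]
        plan.append((f"gs://{bucket_name}/{name}", f"{ee_path_base}/{file_name_no_ext}"))
    return plan


# Made-up object names, used only to exercise the function
example_names = [
    "mars/features/ctx_sample_2/raw_unzipped/",
    "mars/features/ctx_sample_2/raw_unzipped/tile_a.tif",
    "mars/features/ctx_sample_2/raw_unzipped/tile_a.prj",
]
for gs_path, ee_path in plan_tif_uploads(
    example_names,
    bucket_name="esg-satelite-data-warehouse",
    strip_prefix="mars/features/ctx_sample_2/raw_unzipped/",
    ee_path_base="projects/esg-satelite/assets/mars/features/ctx_sample_2/pre/ctx_sample",
):
    print(gs_path, "->", ee_path)
```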

In [None]:
for gs_path, ee_path in zip(gs_paths,ee_paths):
    transfer_from_gcs_to_gee(
        gs_path = gs_path,
        ee_path = ee_path,
        data_type = "image"
    )

Started upload task with ID: N3M2TRFL2VM5ICOEUZ45MNWY


Started upload task with ID: IVPHKJUYIA37DDYAYVZ5BPH5




# 6. CTX Images (all of Tempe Terra)

- http://murray-lab.caltech.edu/CTX/V01/tiles/

In [None]:
# Get paths of all tif files
gs_paths = []
ee_paths = []
ee_path_base = "projects/esg-satelite/assets/mars/features/ctx_tempe_terra/pre/ctx_tempe_terra"
client = storage.Client()
for blob in client.list_blobs('esg-satelite-data-warehouse', prefix='mars/features/ctx_tempe_terra/raw_unzipped_tif'):
    # blob.name is the object path within the bucket
    gs_path = f"gs://esg-satelite-data-warehouse/{blob.name}"
    if gs_path.endswith(".tif"):
        gs_paths.append(gs_path)
        file_name_no_ext = gs_path.split("raw_unzipped_tif/")[1].split(".tif")[0]
        ee_paths.append(f"{ee_path_base}/{file_name_no_ext}")

In [None]:
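# enumerate replaces the manual counter; the total comes from the list itself
for i, (gs_path, ee_path) in enumerate(zip(gs_paths, ee_paths), start=1):
    transfer_from_gcs_to_gee(
        gs_path = gs_path,
        ee_path = ee_path,
        data_type = "image"
    )
    print(f"Uploaded image {i} of {len(gs_paths)} to Earth Engine.")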

Started upload task with ID: 4B72DMNPDUDTM5RO5XIGNLBP


Uploaded image 1 of 66 to Earth Engine.
Started upload task with ID: HYYWHWHQEKVWPYJ25P6QAZOL


Uploaded image 2 of 66 to Earth Engine.
Started upload task with ID: WLKGWP2QX76W3HAVLBX4JHMO


Uploaded image 3 of 66 to Earth Engine.
Started upload task with ID: U5DOBTMQ2BGDIYD4GCUY7DRK


Uploaded image 4 of 66 to Earth Engine.
Started upload task with ID: YISCPG6MB7GZWNG7SHNZNCG5


Uploaded image 5 of 66 to Earth Engine.
Started upload task with ID: AHIEBM45QZXLRK2VCNLJZCX6


Uploaded image 6 of 66 to Earth Engine.
Started upload task with ID: KLDJQ2FAX2SEZ6POIERFEDUB


Uploaded image 7 of 66 to Earth Engine.
Started upload task with ID: P46UI4W4R6PEKR4RIPYASFXO


Uploaded image 8 of 66 to Earth Engine.
Started upload task with ID: JDHS2KR54JHMJ2E7O4RRN6TC


Uploaded image 9 of 66 to Earth Engine.
Started upload task with ID: JQW3VD5USH2DTZCUUXQ5MFAA


Uploaded image 10 of 66 to Earth Engine.
Started upload task with ID: ACN2QHTOLX
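
Because `transfer_from_gcs_to_gee` calls `subprocess.run` with `check=True`, a single failed command raises an exception and stops the whole loop. For a large batch it may be preferable to record failures and carry on. The sketch below shows one way to do that; `upload_all` and the fake uploader are hypothetical and stand in for the real function so that the example runs without Earth Engine access.

```python
def upload_all(pairs, upload_fn):
    """Call upload_fn for every (gs_path, ee_path) pair and collect failures.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    failed = []
    total = len(pairs)
    for i, (gs_path, ee_path) in enumerate(pairs, start=1):
        try:
            upload_fn(gs_path, ee_path)
            print(f"Started upload {i} of {total}.")
        except Exception as error:
            # Keep going, but remember what failed so it can be retried later
            failed.append((gs_path, ee_path, str(error)))
            print(f"Upload {i} of {total} failed: {error}")
    return failed


def fake_upload(gs_path, ee_path):
    """Stand-in uploader that fails for one made-up path."""
    if "bad" in gs_path:
        raise RuntimeError("simulated failure")


failures = upload_all(
    [("gs://bucket/good.tif", "assets/good"), ("gs://bucket/bad.tif", "assets/bad")],
    fake_upload,
)
print(failures)
```

In the notebook itself, `upload_fn` could be a small wrapper that calls `transfer_from_gcs_to_gee` with `data_type = "image"`.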

# 7. THEMIS Mosaic

- https://astrogeology.usgs.gov/search/map/Mars/Odyssey/THEMIS-IR-Mosaic-ASU/Mars_MO_THEMIS-IR-Day_mosaic_global_100m_v12

In [None]:
transfer_from_gcs_to_gee(
    gs_path = "gs://esg-satelite-data-warehouse/mars/features/themis_mosaic/raw/planetarymaps.usgs.gov/mosaic/Mars_MO_THEMIS-IR-Day_mosaic_global_100m_v12.tif",
    ee_path = "projects/esg-satelite/assets/mars/features/themis_mosaic/pre/themis_mosaic",
    data_type = "image"
)

Started upload task with ID: XUO5IHGXMMXNHGOULAEN6OMM




# 8. THEMIS Samples 4

- Four samples of the THEMIS IR Day data.
- Taken from the samples that Claire sent.

In [None]:
# As these were in separate .tif files, they have to be moved individually
gs_paths = [
    "gs://esg-satelite-data-warehouse/mars/features/themis_sample_4/raw/THEMIS_DayIR_ControlledMosaic_Arcadia_30N240E_100mpp.tif",
    "gs://esg-satelite-data-warehouse/mars/features/themis_sample_4/raw/THEMIS_DayIR_ControlledMosaic_LunaePalus_00N270E_100mpp.tif",
    "gs://esg-satelite-data-warehouse/mars/features/themis_sample_4/raw/THEMIS_DayIR_ControlledMosaic_MareAcidalium_30N300E_100mpp.tif",
    "gs://esg-satelite-data-warehouse/mars/features/themis_sample_4/raw/THEMIS_DayIR_ControlledMosaic_Tharsis_000N225E_100mpp.tif"
]

ee_paths = [f"projects/esg-satelite/assets/mars/features/themis_sample_4/pre/themis_sample/image_{x}" for x in range(len(gs_paths))]

for gs_path, ee_path in zip(gs_paths,ee_paths):
    transfer_from_gcs_to_gee(
        gs_path = gs_path,
        ee_path = ee_path,
        data_type = "image"
    )

Started upload task with ID: AZJ6ZZSAUKRZHOOPEO2AKSBX


Started upload task with ID: 4RPAC4EEXVB3Q6O6AD2YGC5F


Started upload task with ID: FM6K3OTMJHPJURUZHT44LOCE


Started upload task with ID: TVQ3QN264DFRGG3N2MURSCQ3




# 9. THEMIS EPSG:3857 Sample of 1

In [17]:
transfer_from_gcs_to_gee(
    gs_path = "gs://esg-satelite-data-warehouse/mars/features/themis_sample_epsg3857/raw/themis_sample_epsg3857.tif",
    ee_path = "projects/esg-satelite/assets/mars/features/themis_epsg3857_sample/pre/themis_epsg3857_sample",
    data_type = "image"
)


Started upload task with ID: XGOXFR32WLEVVBDIZRJOPXSK


