# Transferring Data from Google Cloud Storage to Google Earth Engine

Once the raw data files have been uploaded to Google Cloud Storage (GCS), we move them onto Google Earth Engine (GEE) so that we can process them there (e.g. generating train, validation and test sets for modelling).

This notebook shows how to transfer data from Google Cloud Storage to GEE.

In [1]:
#Q to self: why did we have to transform the CRS of the labels data but not the .tif files?

# 1. Config

In [2]:
import subprocess
import geopandas as gpd
import os
from google.cloud import storage
import glob
import shutil
import pathlib

In [3]:
def transform_crs(gs_path, gs_processed_path, crs="epsg:4326"):
    """
    Reads a shapefile from Google Cloud Storage, transforms it to a new
    coordinate reference system (CRS), and writes the result back to Cloud Storage.
    As an intermediate step it writes the files to a local folder, which is removed afterwards.

    gs_path : str
        Path to Google Cloud Storage where the shapefile (.shp) is.
        INCLUDE EXTENSION .shp
    gs_processed_path : str
        Path to Google Cloud Storage where the processed files will be written.
        DO NOT INCLUDE EXTENSION.
    crs : str
        The coordinate reference system (CRS) to transform the shapefile to.
    """

    # Set environment variable
    os.environ["PROJ_IGNORE_CELESTIAL_BODY"] = "YES"

    # Read data from GS
    data = gpd.read_file(gs_path)

    # Transform CRS
    data_new_crs = data.to_crs(crs)

    # Split "gs://<bucket>/<path>" into the bucket name and the path inside the bucket
    bucket_name, path_in_bucket = gs_processed_path.split("//", 1)[1].split("/", 1)
    bucket = storage.Client().get_bucket(bucket_name)

    # Write to local folder "out/" (mkdir raises if it already exists, so stale files are never uploaded)
    local_folder = pathlib.Path("out")
    local_folder.mkdir(exist_ok=False)
    try:
        data_new_crs.to_file("out/data.shp")  # Note: this will output all the other files too! e.g. .prj

        # Copy all files from local to Cloud Storage, keeping each file's extension
        for file in glob.glob("out/*"):
            extension = file.split(".")[-1]
            blob = bucket.blob(f"{path_in_bucket}.{extension}")
            blob.upload_from_filename(file)
    finally:
        # Always remove the local folder, even if the write or an upload fails
        shutil.rmtree("out")
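
The bucket and object path are recovered from the `gs://` URL by string splitting. A minimal sketch of that step as a standalone helper is shown below; `split_gs_path` is a hypothetical name that is not part of the original notebook, and the validation it performs is an added assumption.

```python
def split_gs_path(gs_path):
    """Split 'gs://<bucket>/<object path>' into (bucket_name, object_path).

    Hypothetical helper for illustration; not part of the original notebook.
    """
    prefix = "gs://"
    if not gs_path.startswith(prefix):
        raise ValueError(f"Not a gs:// path: {gs_path!r}")
    # partition splits on the first "/" only, so the object path keeps its own slashes
    bucket_name, _, object_path = gs_path[len(prefix):].partition("/")
    if not bucket_name or not object_path:
        raise ValueError(f"Expected gs://<bucket>/<object>, got: {gs_path!r}")
    return bucket_name, object_path


bucket_name, object_path = split_gs_path(
    "gs://esg-satelite-data-warehouse/mars/labels/faults/processed/faults"
)
print(bucket_name)
print(object_path)
```

Splitting only on the first `/` after the scheme avoids surprises when the bucket name also appears later in the object path.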

In [4]:
def transfer_from_gcs_to_gee(gs_path, ee_path, data_type):
    """
    Moves data from Google Cloud Storage to Google Earth Engine.
    
    gs_path : str
        Path to file in Google Cloud Storage.
        INCLUDE EXTENSION.
    ee_path : str
        Path for file to be written in Google Earth Engine.
        DO NOT INCLUDE EXTENSION.
    data_type : str
        "table" for .shp, "image" for .tif
    """
    
    result = subprocess.run(
        [
            "earthengine",
            "upload",
            data_type,
            f"--asset_id={ee_path}",
            f"{gs_path}"
        ],
        capture_output=True,
        check=True,
    )
    print(result.stdout.decode())
    print(result.stderr.decode())
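
The upload command only starts an ingestion task and prints its ID (the line `Started upload task with ID: ...` in the cell outputs of this notebook), so the asset is not available until that task finishes. The sketch below pulls task IDs out of the printed text and builds an `earthengine task info` command for each one. The helper names are hypothetical, and the regular expression assumes the output format stays exactly as printed in this notebook.

```python
import re

# Assumes the CLI keeps printing "Started upload task with ID: <id>"
TASK_ID_PATTERN = re.compile(r"Started upload task with ID: (\S+)")


def extract_task_ids(cli_output):
    """Return every task ID found in the text printed by the upload command."""
    return TASK_ID_PATTERN.findall(cli_output)


def task_info_command(task_id):
    """Build the argument list for checking one task with the earthengine CLI."""
    return ["earthengine", "task", "info", task_id]


example_output = "Started upload task with ID: 4ALGEC3ZB5F7YHJ2QMFLFUDF\n"
for task_id in extract_task_ids(example_output):
    # Pass this list to subprocess.run(...) to query the task's state
    print(task_info_command(task_id))
```

To use this with `transfer_from_gcs_to_gee`, the function would need to return `result.stdout.decode()` in addition to printing it.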

# 2. Fault labels data

Our labels data is a shapefile (.shp) containing many vector features, where each feature is one fault.
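
A shapefile is really a set of files that share a base name: the `.shp` geometry file, plus sidecar files such as `.shx`, `.dbf` and `.prj` (the last one carries the CRS). The sketch below checks that the sidecars sit next to the `.shp` before an upload is attempted. The helper name and the choice to treat all four extensions as required are assumptions for illustration, and the object names in the example are made up.

```python
# Assumption: these four files are treated as required for the table upload
EXPECTED_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj")


def missing_sidecars(object_names, base_path):
    """Return the expected extensions that have no matching object.

    object_names : iterable of object names inside one bucket
    base_path    : path of the shapefile inside the bucket, WITHOUT extension
    """
    present = set(object_names)
    return [ext for ext in EXPECTED_EXTENSIONS if f"{base_path}{ext}" not in present]


# Hypothetical listing: the .prj file is absent
example_objects = [
    "mars/labels/faults/processed/faults.shp",
    "mars/labels/faults/processed/faults.shx",
    "mars/labels/faults/processed/faults.dbf",
]
print(missing_sidecars(example_objects, "mars/labels/faults/processed/faults"))
```

In practice the object names could come from `client.list_blobs(...)` via each blob's `name` attribute.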

## 2.1. Transform the CRS of the data

In [6]:
transform_crs(
    gs_path = "gs://esg-satelite-data-warehouse/mars/labels/faults/raw/faults.shp", 
    gs_processed_path = "gs://esg-satelite-data-warehouse/mars/labels/faults/processed/faults"
)

## 2.2. Transfer from GCS to GEE

In [12]:
transfer_from_gcs_to_gee(
    gs_path = "gs://esg-satelite-data-warehouse/mars/labels/faults/processed/faults.shp", 
    ee_path = "projects/esg-satelite/assets/mars/labels/faults/pre/faults",
    data_type = "table"
)

Started upload task with ID: 4ALGEC3ZB5F7YHJ2QMFLFUDF




# 3. Elevation data

- https://astrogeology.usgs.gov/search/map/Mars/Topography/HRSC_MOLA_Blend/Mars_HRSC_MOLA_BlendDEM_Global_200mp
- Uploaded the data onto GCP originally using a .tsv file.

## 3.1. Transfer from GCS to GEE

In [None]:
transfer_from_gcs_to_gee(
    gs_path = "gs://esg-satelite-data-warehouse/mars/features/elevation/raw/Mars_HRSC_MOLA_BlendDEM_Global_200mp_v2.tif", 
    ee_path = "projects/esg-satelite/assets/mars/features/elevation/pre/elevation",
    data_type = "image"
)

# 4. HRSC Images (sample of 3 only)

- https://ode.rsl.wustl.edu/mars/mapsearch (Mars Express > HRSC > Derived > Version 3 Map Projected Reduced Data Record).
- The sample of 3 images was uploaded onto GCP manually: Claire sent the samples via email, and retrieving the data online is not straightforward.

**VERY IMPORTANT NOTE**: All of the images below were moved to an EE *image collection*, which had to be created manually on EE first. So, when we do processing on the image collection, we simply use the path `projects/esg-satelite/assets/mars/mars_express_hrsc_example` (i.e. with no "image_x" appended).
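
The collection was created by hand, but the `earthengine` CLI also has a `create collection` command that could do the same from the notebook. The sketch below derives the collection ID as the parent of an image asset ID and builds that command without running it. Both helper names are hypothetical, and the example asset ID is only illustrative.

```python
def parent_collection(image_asset_id):
    """Return the asset ID of the collection that contains the given image."""
    # Drop the last path component, e.g. ".../hrsc_sample/image_0" -> ".../hrsc_sample"
    return image_asset_id.rsplit("/", 1)[0]


def create_collection_command(collection_id):
    """Build the argument list for creating an image collection with the CLI."""
    return ["earthengine", "create", "collection", collection_id]


example_image_id = (
    "projects/esg-satelite/assets/mars/features/hrsc_sample_3/pre/hrsc_sample/image_0"
)
command = create_collection_command(parent_collection(example_image_id))
# Pass `command` to subprocess.run(command, check=True) to actually create the collection
print(command)
```

Running the command once before the upload loop should be enough, since every image in the loop shares the same parent collection.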

In [None]:
# As these were in separate .tif files, they have to be moved individually
gs_paths = [
    "gs://esg-satelite-data-warehouse/mars/features/hrsc_sample_3/raw/h1462_nd3_Merc_clip.tif",
    "gs://esg-satelite-data-warehouse/mars/features/hrsc_sample_3/raw/h1495_nd3_Merc_clip.tif",
    "gs://esg-satelite-data-warehouse/mars/features/hrsc_sample_3/raw/h8304_nd3_Merc.tif"
]
ee_paths = [f"projects/esg-satelite/assets/mars/features/hrsc_sample_3/pre/hrsc_sample/image_{x}" for x in range(len(gs_paths))]

for gs_path, ee_path in zip(gs_paths,ee_paths):
    transfer_from_gcs_to_gee(
        gs_path = gs_path,
        ee_path = ee_path,
        data_type = "image"
    )

# 5. CTX Images (sample of 1 only)

- http://murray-lab.caltech.edu/CTX/V01/tiles/

In [18]:
# Get paths of all tif files
gs_paths = []
ee_paths = []
ee_path_base = "projects/esg-satelite/assets/mars/features/ctx_sample_2/pre/ctx_sample"
client = storage.Client()
for blob in client.list_blobs('esg-satelite-data-warehouse', prefix='mars/features/ctx_sample_2/raw_unzipped'):
    # blob.name is the object's path inside the bucket
    gs_path = f"gs://esg-satelite-data-warehouse/{blob.name}"
    if gs_path.endswith(".tif"):
        gs_paths.append(gs_path)
        file_name_no_ext = gs_path.split("raw_unzipped/")[1].split(".tif")[0]
        ee_paths.append(f"{ee_path_base}/{file_name_no_ext}")
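
The path-building logic in this cell can be separated from the Cloud Storage client so that it can be checked without network access. The sketch below is one way to do that, under stated assumptions: `build_transfer_paths` is a hypothetical helper, the object names in the example are made up, and it uses only the file's base name for the asset ID, whereas the cell above keeps everything after `raw_unzipped/`.

```python
import posixpath


def build_transfer_paths(object_names, bucket_name, ee_path_base, extension=".tif"):
    """Pair each matching object with the Earth Engine asset ID it should be uploaded to.

    Returns a list of (gs_path, ee_path) tuples, skipping objects with other extensions.
    """
    pairs = []
    for name in object_names:
        if not name.endswith(extension):
            continue
        # File name without folders and without the extension
        stem = posixpath.basename(name)[: -len(extension)]
        pairs.append((f"gs://{bucket_name}/{name}", f"{ee_path_base}/{stem}"))
    return pairs


# Hypothetical object names, for illustration only
example_names = [
    "mars/features/ctx_sample_2/raw_unzipped/tile_a.tif",
    "mars/features/ctx_sample_2/raw_unzipped/tile_a.txt",
]
for gs_path_example, ee_path_example in build_transfer_paths(
    example_names,
    "esg-satelite-data-warehouse",
    "projects/esg-satelite/assets/mars/features/ctx_sample_2/pre/ctx_sample",
):
    print(gs_path_example, "->", ee_path_example)
```

With real data, `object_names` would be `[blob.name for blob in client.list_blobs(...)]`.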

In [21]:
for gs_path, ee_path in zip(gs_paths,ee_paths):
    transfer_from_gcs_to_gee(
        gs_path = gs_path,
        ee_path = ee_path,
        data_type = "image"
    )

Started upload task with ID: N3M2TRFL2VM5ICOEUZ45MNWY


Started upload task with ID: IVPHKJUYIA37DDYAYVZ5BPH5


