# Set up Environment

- Install packages required
- Configure project, region, bucket names
- Mount GCS bucket
- Save configuration file

In [1]:
import os
import sys

## Create Local Directory Structure

In [1]:
! mkdir -p ./src/scripts
! mkdir -p ./data
! mkdir -p ./jars

## Install required packages

- [Google Earth Engine (GEE) API](https://developers.google.com/earth-engine/guides/python_install)
- [geemap](https://geemap.org/installation/) for interactive mapping with GEE
- [geobeam](https://github.com/GoogleCloudPlatform/dataflow-geobeam#1-install-the-module) to ingest and analyze massive amounts of geospatial data in parallel using Dataflow

In [14]:
install_packages = False

In [10]:
%%writefile requirements.txt

numpy==1.19.5
pandas==1.3.5
earthengine-api==0.1.300
geemap==0.11.5
geobeam==0.2.0
# pyrasterframes
# geopandas

Overwriting requirements.txt


In [17]:
if install_packages == True:
    ! pip install -r requirements.txt
else:
    print("No packages installed. Enable flag install_packages to install packages.")

No packages installed. Enable flag install_packages to install packages.


**To enable interactive maps with `geemap`, enable leaflet extension**

**NOTE:** Refresh browser for map to load

In [18]:
if install_packages == True:
    ! jupyter labextension install @jupyter-widgets/jupyterlab-manager jupyter-leaflet

### Restart the Kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [19]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Set up your Google Cloud Resources

### Enable Google Cloud APIs

- Cloud Storage
- Earth Engine
- Dataflow
- BigQuery
- Vertex AI

### Set your Project ID

If you don't know your project ID, you may be able to get your project ID using `gcloud` or `google.auth`.

In [7]:
PROJECT_ID = "[your-project-id]"  # <---CHANGE THIS TO YOUR PROJECT

import os

# Get your Google Cloud project ID using google.auth
if not os.getenv("IS_TESTING"):
    import google.auth

    _, PROJECT_ID = google.auth.default()
    print("Project ID: ", PROJECT_ID)

# validate PROJECT_ID
if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    print(
        f"Please set your project id before proceeding to next step. Currently it's set as {PROJECT_ID}"
    )

Project ID:  rthallam-demo-project


### Timestamp

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append it onto the name of resources you create in this tutorial.

In [9]:
from datetime import datetime

def get_timestamp():
    return datetime.now().strftime("%Y%m%d%H%M%S")

TIMESTAMP = get_timestamp()
print(f"TIMESTAMP = {TIMESTAMP}")

TIMESTAMP = 20220303013224


### Create a Cloud Storage bucket

In [14]:
BUCKET_NAME = "[your-bucket-name]"  # <---CHANGE THIS TO YOUR BUCKET
REGION = "us-central1"  # @param {type:"string"}

BUCKET_NAME = "cloud-ai-platform-2f444b6a-a742-444b-b91a-c7519f51bd77"  # <---CHANGE THIS TO YOUR BUCKET

In [None]:
if BUCKET_NAME == "" or BUCKET_NAME is None or BUCKET_NAME == "[your-bucket-name]":
    BUCKET_NAME = f"{PROJECT_ID}aip-{get_timestamp()}"

In [15]:
print(f"PROJECT_ID = {PROJECT_ID}")
print(f"BUCKET_NAME = {BUCKET_NAME}")
print(f"REGION = {REGION}")

PROJECT_ID = rthallam-demo-project
BUCKET_NAME = cloud-ai-platform-2f444b6a-a742-444b-b91a-c7519f51bd77
REGION = us-central1


In [16]:
os.environ["PROJECT_ID"] = PROJECT_ID
os.environ["BUCKET_NAME"] = BUCKET_NAME
os.environ["REGION"] = REGION

---

**Only if your bucket doesn't already exist:** Run the following cell to create your Cloud Storage bucket.

---

In [None]:
! gsutil mb -l $REGION gs://$BUCKET_NAME

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al gs://$BUCKET_NAME

### Mount GCS buckets

Mount GCS buckets to a local folder for reading/writing files from/to GCS easily.

In [2]:
%%writefile ./src/scripts/mount_gcs.sh

#!/bin/bash -eu

source /opt/c2d/c2d-utils || exit 1

function should_mount_gcs(){
  # Mount GCS buckets locally.
  local mount_gcs=true
  JUPYTER_USER=$(get_jupyter_user)
  mkdir -p "/home/${JUPYTER_USER}"/gcs
  gcsfuse --implicit-dirs "/home/${JUPYTER_USER}"/gcs
}

shopt -s nocasematch
should_mount_gcs
echo $?
shopt -u nocasematch

Writing ./src/scripts/mount_gcs.sh


- Change permissions on the script to run

In [4]:
! chmod 775 ./src/scripts/mount_gcs.sh 

- Run script to mount GCS buckets

In [5]:
! ./src/scripts/mount_gcs.sh  && echo $?

2022/03/03 01:29:58.237628 Start gcsfuse/0.40.0 (Go version go1.17.6) for app "" using mount point: /home/jupyter/gcs
2022/03/03 01:29:58.253377 Opening GCS connection...
2022/03/03 01:29:58.257285 Mounting file system "gcsfuse"...
daemonize.Run: readFromProcess: sub-process: mountWithArgs: mountWithConn: Mount: mount: running /usr/bin/fusermount: exit status 1
1
0


- Validate with the bucket created

In [24]:
os.environ["LOCAL_GCS_ROOT"] = "/home/jupyter/gcs"

In [None]:
! ls -ls $LOCAL_GCS_ROOT/$BUCKET_NAME