<a href="https://colab.research.google.com/github/nyeinnst/bigdata/blob/master/tfhub_bigearthnet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup

The goal here is to take models provided by TensorFlow Hub and host them on AI Platform for use in Earth Engine.  This notebook covers [the BigEarthNet/ResNet50 model](https://tfhub.dev/google/remote_sensing/bigearthnet-resnet50/1).  See [the paper](https://arxiv.org/pdf/1911.06721.pdf) for background.

In [None]:
from google.colab import auth
auth.authenticate_user()

In [None]:
# Pin TensorFlow 2.2, the latest version supported by the AI Platform
# runtime when this notebook was written.
!pip install tensorflow==2.2.0

In [None]:
import tensorflow as tf
print(tf.__version__)

In [None]:
import ee
ee.Authenticate()
ee.Initialize()

## Get the model from TensorFlow Hub

Set the cache directory to a Cloud Storage bucket.  This will download the saved model from TensorFlow Hub to the bucket instead of the local file system.

In [None]:
import os

# YOUR BUCKET HERE
cloud_path = 'gs://YOUR-BUCKET'
os.environ["TFHUB_CACHE_DIR"] = cloud_path
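Every later cell depends on this path, so a forgotten placeholder tends to surface only as a confusing failure further down. A minimal sketch of an early guard follows; `check_bucket_path` is a hypothetical helper written for this notebook, not part of TensorFlow Hub or the Cloud SDK, and it only checks the form of the string, not whether the bucket exists.

```python
def check_bucket_path(path):
    """Return path unchanged if it looks like a filled-in gs:// location.

    Hypothetical helper: it checks only the string, not bucket access.
    """
    if not path.startswith('gs://'):
        raise ValueError(f'Expected a gs:// path, got {path!r}')
    if 'YOUR-BUCKET' in path:
        raise ValueError('Replace the YOUR-BUCKET placeholder with a real bucket name')
    return path
```

Calling `check_bucket_path(cloud_path)` before setting `TFHUB_CACHE_DIR` raises a `ValueError` while the placeholder is still in place.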

Load TensorFlow Hub and resolve a model.  This will cache the saved model to the Cloud Storage directory specified above.

In [None]:
import tensorflow_hub as hub

model_url = 'https://tfhub.dev/google/remote_sensing/bigearthnet-resnet50/1'
model_path = hub.resolve(model_url)

## Reload the model saved by TensorFlow Hub

Load the model that was cached by TensorFlow Hub.  Note that `hub.load()` is roughly equivalent to `hub.resolve()` followed by `tf.saved_model.load()` ([reference](https://www.tensorflow.org/hub/api_docs/python/hub/load)).

In [None]:
imported = tf.saved_model.load(model_path, tags=[])

### Inspect the signatures of the saved model

In [None]:
print(imported.signatures)
print(imported.signatures['default'])
inputs = imported.signatures['default'].inputs
outputs = imported.signatures['default'].outputs

from pprint import pprint
pprint(inputs)
pprint(outputs)

### Pass some data through the reloaded model

Note that 'default' is the only signature in the saved model, and that `imported.signatures['default']` is a `ConcreteFunction`.  Send some data through to test the function.  See [the model page](https://tfhub.dev/google/remote_sensing/bigearthnet-resnet50/1) for reference.  Specifically, "the size of the input image is flexible, but it would be best to match the model training input, which was height x width = 224 x 224 pixels."

In [None]:
images = tf.ones([8, 224, 224, 3]) # A batch of images with shape [batch_size, height, width, 3].
features = imported.signatures['default'](images)  # Features with shape [batch_size, num_features].
print(features.keys())
print(features['logits'][0])
print(features['pre_logits'][0])
print(features['default'][0])
# OK

From [the TensorFlow Hub documentation on common SavedModel APIs for images](https://www.tensorflow.org/hub/common_saved_model_apis/images): "The output `logits` is a single tensor of dtype `float32` and shape `[batch_size, num_classes]`".  Assume that `logits` in this context is as defined [here](https://developers.google.com/machine-learning/glossary/#logits).  To get a class probability vector, feed the logits to a softmax.  The class list is [here](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/image_classification/bigearthnet.py#L71).
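The softmax step can be sketched in plain Python. This is a minimal illustration, not the model's own post-processing: the logits below are made-up values for a 4-class problem, and `softmax` and `top_k` are hypothetical helper names.

```python
import math

def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow and does not change the result.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probabilities, k=3):
    """Return (class_index, probability) pairs for the k most likely classes."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical logits (not real model output).
example_logits = [2.0, 1.0, 0.1, -1.0]
example_probs = softmax(example_logits)
print(top_k(example_probs, k=2))
```

The class indices returned by `top_k` can then be looked up in the class list linked above.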

## Re-save the loaded model

The model needs to be saved with a 'serving_default' signature key, i.e. `tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY`.  The signature comes from the 'default' function loaded from the TensorFlow Hub model.  See [this reference](https://www.tensorflow.org/guide/saved_model#specifying_signatures_during_export) for more info on specifying export signatures.

In [None]:
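export_dir = 'gs://YOUR-BUCKET/EXPORT-FOLDER'
# A single ConcreteFunction passed as `signatures` is exported under the
# 'serving_default' key.
tf.saved_model.save(imported, export_dir, signatures=imported.signatures['default'])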

Use the `saved_model_cli` command line tool to inspect the inputs and outputs.  This model has multiple outputs, so one of them has to be chosen for each deployment.

In [None]:
!saved_model_cli show --dir {export_dir} --all

Pull the tensor names from this output manually and hard-code them here.  Use the length-45 `logits` tensor as the output.  There is only one input, `images`.

In [None]:
input_name = 'serving_default_images:0'
output_name = 'StatefulPartitionedCall:6'

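# Map the model's tensor names to the names Earth Engine will use: Earth Engine
# sends its output to the model input and reads the model output as its input.
# The outer single quotes keep the JSON intact when passed on the command line.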
import json
input_dict = "'" + json.dumps({input_name: "array"}) + "'"
output_dict = "'" + json.dumps({output_name: "logits"}) + "'"

print(input_dict)
print(output_dict)
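The manual quoting in the cell above can also be delegated to the standard library. A minimal sketch, assuming the same tensor names as above; `quoted_json` is a hypothetical helper name. For these inputs `shlex.quote` produces the same single-quoted strings, and it would also escape any single quote that appeared inside the JSON.

```python
import json
import shlex

def quoted_json(mapping):
    """Serialize a dict to JSON and quote it as a single shell argument."""
    return shlex.quote(json.dumps(mapping))

input_flag = quoted_json({'serving_default_images:0': 'array'})
output_flag = quoted_json({'StatefulPartitionedCall:6': 'logits'})

print(input_flag)
print(output_flag)
```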

Prepare the model for use in Earth Engine.  This process wraps the model in nodes that convert `base64` <-> `float32`.

In [None]:
eeified_dir = 'gs://YOUR-BUCKET/EEIFIED-FOLDER'

!earthengine set_project 'YOUR-PROJECT'
!earthengine model prepare --source_dir {export_dir} --dest_dir {eeified_dir} --input {input_dict} --output {output_dict}

In [None]:
MODEL_NAME = 'bigearthnet_logits'
VERSION_NAME = 'v1'

!gcloud ai-platform models create {MODEL_NAME} --project 'YOUR-PROJECT'

!gcloud ai-platform versions create {VERSION_NAME} \
  --project 'YOUR-PROJECT' \
  --model {MODEL_NAME} \
  --runtime-version 2.2 \
  --python-version 3.7 \
  --framework "TENSORFLOW" \
  --origin {eeified_dir}

## Embedding

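Now repeat the preparation and deployment steps for the length-2048 `pre_logits` embedding, which is exposed to Earth Engine under the name `prelogits`.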

In [None]:
input_name = 'serving_default_images:0'
output_name = 'StatefulPartitionedCall:5'

import json
input_dict = "'" + json.dumps({input_name: "array"}) + "'"
output_dict = "'" + json.dumps({output_name: "prelogits"}) + "'"

print(input_dict)
print(output_dict)

Prepare the model for use in Earth Engine.  This process wraps the model in nodes that convert `base64` <-> `float32`.

In [None]:
eeified_dir = 'gs://YOUR-BUCKET/EEIFIED-EMBED-FOLDER'

!earthengine set_project 'YOUR-PROJECT'
!earthengine model prepare --source_dir {export_dir} --dest_dir {eeified_dir} --input {input_dict} --output {output_dict}

In [None]:
MODEL_NAME = 'bigearthnet_embed'
VERSION_NAME = 'v1'

!gcloud ai-platform models create {MODEL_NAME} --project 'YOUR-PROJECT'

!gcloud ai-platform versions create {VERSION_NAME} \
  --project 'YOUR-PROJECT' \
  --model {MODEL_NAME} \
  --runtime-version 2.2 \
  --python-version 3.7 \
  --framework "TENSORFLOW" \
  --origin {eeified_dir}

## Fine Tuning

To learn more about fine tuning, see [this reference](https://www.tensorflow.org/hub/tf2_saved_model#fine-tuning).