# Demo Notebook for MLCommons Integration
This notebook walks users through invoking the MLCommons APIs to upload a model to an OpenSearch cluster and use it.

Step 0: Import packages and set up client

Step 1: Upload NLP model from local file to OpenSearch cluster

Step 2: Load Model

Step 3: Get Task

Step 4: Get Model

Step 5: Generate Sentence Embedding

Step 6: Unload Model

Step 7: Delete Model



## Step 0: Import packages and set up client
Install the packages required by `opensearch_py_ml.sentence_transformer_model`: install `opensearch-py` and `opensearch-py-ml` from PyPI.

Please refer to https://pytorch.org/ to install torch properly for your environment.

In [1]:
# pip install opensearch-py opensearch-py-ml

In [1]:
import warnings
warnings.filterwarnings('ignore')
from opensearch_py_ml.ml_commons import MLCommonClient
from opensearchpy import OpenSearch

In [3]:
# Import MLCommonClient, used later to upload the model to the OpenSearch cluster
from opensearch_py_ml.ml_commons import MLCommonClient

In [4]:
CLUSTER_URL = 'https://localhost:9200'

In [5]:
def get_os_client(cluster_url = CLUSTER_URL,
                  username='admin',
                  password='admin'):
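    '''
    Get OpenSearch client
    :param cluster_url: cluster URL like https://ml-te-netwo-1s12ba42br23v-ff1736fa7db98ff2.elb.us-west-2.amazonaws.com:443
    :param username: user name for HTTP basic authentication
    :param password: password for HTTP basic authentication
    :return: OpenSearch client
    '''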
    client = OpenSearch(
        hosts=[cluster_url],
        http_auth=(username, password),
        verify_certs=False
    )
    return client 

In [6]:
client = get_os_client()
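
The helper above hardcodes the cluster URL and credentials. As a minimal sketch of an alternative, the settings could be read from environment variables and passed to `OpenSearch` as keyword arguments. The variable names (`OPENSEARCH_URL`, `OPENSEARCH_USER`, `OPENSEARCH_PASSWORD`, `OPENSEARCH_VERIFY_CERTS`) and the function `client_kwargs_from_env` are hypothetical and are not part of `opensearch-py`.

```python
import os


def client_kwargs_from_env(env=os.environ):
    """Build keyword arguments for opensearchpy.OpenSearch from a mapping.

    The variable names are hypothetical, chosen for this sketch only.
    Defaults mirror the values used in get_os_client above, except that
    certificate verification stays on unless explicitly disabled.
    """
    verify = env.get("OPENSEARCH_VERIFY_CERTS", "true").lower() != "false"
    return {
        "hosts": [env.get("OPENSEARCH_URL", "https://localhost:9200")],
        "http_auth": (
            env.get("OPENSEARCH_USER", "admin"),
            env.get("OPENSEARCH_PASSWORD", "admin"),
        ),
        "verify_certs": verify,
    }


# Demonstration with an explicit mapping instead of the real environment.
kwargs = client_kwargs_from_env({"OPENSEARCH_URL": "https://example.internal:9200"})
print(kwargs["hosts"], kwargs["verify_certs"])
```

With the packages from Step 0 installed, the result could be used as `client = OpenSearch(**client_kwargs_from_env())`.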

## Step 1: Upload NLP model from local file to OpenSearch cluster
With a synthetic queries zip file, users can fine-tune a sentence transformer model.

The `SentenceTransformerModel` class initiates an object for training, exporting, and configuring the model. Please see the [SentenceTransformerModel](https://opensearch-project.github.io/opensearch-py-ml/reference/api/sentence_transformer.html#opensearch_py_ml.sentence_transformer_model.SentenceTransformerModel) API reference.

The `train` function imports the synthetic queries, loads sentence transformer examples, and trains the model starting from a Hugging Face sentence transformer model. Please see the [SentenceTransformerModel.train](https://opensearch-project.github.io/opensearch-py-ml/reference/api/sentence_transformer.html#opensearch_py_ml.sentence_transformer_model.SentenceTransformerModel.train) API reference.

In [7]:
# Clean up the cache before training to free up memory
import gc, torch

gc.collect()

torch.cuda.empty_cache()

In [8]:
# Initiate a SentenceTransformerModel object and run the train function.
# num_processes should not exceed the number of GPUs on the user's machine; if None,
# all available GPUs on the machine are used automatically.
# Import path as named in Step 0 and in the API reference links above.
from opensearch_py_ml.sentence_transformer_model import SentenceTransformerModel

custom_model = SentenceTransformerModel(folder_path="/Volumes/workplace/upload_content/model_files/", overwrite=True)
training = custom_model.train(read_path='/Volumes/workplace/upload_content/synthetic_queries.zip',
                              output_model_name='test2_model.pt',
                              zip_file_name='test2_model.zip',
                              overwrite=True,
                              use_accelerate=True,
                              num_machines=1,
                              num_processes=1,
                              num_epochs=1,
                              verbose=False)

reading synthetic query file: /Volumes/workplace/upload_content/model_files/synthetic_queries/synthetic_queries_batch_3.p

reading synthetic query file: /Volumes/workplace/upload_content/model_files/synthetic_queries/synthetic_queries_batch_0.p

reading synthetic query file: /Volumes/workplace/upload_content/model_files/synthetic_queries/synthetic_queries_batch_1.p

reading synthetic query file: /Volumes/workplace/upload_content/model_files/synthetic_queries/synthetic_queries_batch_2.p

Loading training examples... 



100%|██████████| 243/243 [00:00<00:00, 197064.17it/s]

[{'compute_environment': 'LOCAL_MACHINE', 'deepspeed_config': {'gradient_accumulation_steps': 1, 'offload_optimizer_device': 'none', 'offload_param_device': 'none', 'zero3_init_flag': False, 'zero_stage': 2}, 'distributed_type': 'DEEPSPEED', 'downcast_bf16': 'no', 'fsdp_config': {}, 'machine_rank': 0, 'main_process_ip': None, 'main_process_port': None, 'main_training_function': 'main', 'mixed_precision': 'no', 'num_machines': 1, 'num_processes': 1, 'use_cpu': False}]
Launching training on MPS.





Start training with accelerator...

The total number of steps training epoch are 8

Training epoch 0...



100%|██████████| 8/8 [00:15<00:00,  1.96s/it]


Total training time: 16.737305164337158

Model saved to path: /Volumes/workplace/upload_content/model_files/test2_model.pt

tokenizer_json_path:  /Volumes/workplace/upload_content/model_files/tokenizer.json
zip file is saved to /Volumes/workplace/upload_content/model_files/test2_model.zip



## Step 2: (Optional) Save model
If you followed Step 1, the model zip file is generated automatically, and the printed message indicates the zip file path, as shown above.

If you are instead using another pre-trained sentence transformer model from Hugging Face, use the `save_as_pt` function to save it for inference or for benchmarking against other models.

The `save_as_pt` function prepares the model in the proper format (TorchScript), along with the tokenizer configuration file, for upload to OpenSearch. Please see the [SentenceTransformerModel.save_as_pt](https://opensearch-project.github.io/opensearch-py-ml/reference/api/sentence_transformer.html#opensearch_py_ml.sentence_transformer_model.SentenceTransformerModel.save_as_pt) API reference.

In [9]:
# By default, this downloads the model ID "sentence-transformers/msmarco-distilbert-base-tas-b" from Hugging Face
# and outputs a zip file containing the model .pt file and the tokenizer JSON file.
pre_trained_model = SentenceTransformerModel(folder_path = '/Volumes/workplace/upload_content/export_huggingface/', overwrite = True)
pre_trained_model.save_as_pt(sentences = ['today is sunny'])

model file is saved to  /Volumes/workplace/upload_content/export_huggingface/msmarco-distilbert-base-tas-b.pt
zip file is saved to  /Volumes/workplace/upload_content/export_huggingface/msmarco-distilbert-base-tas-b.zip 



SentenceTransformer(
  original_name=SentenceTransformer
  (0): Transformer(
    original_name=Transformer
    (auto_model): DistilBertModel(
      original_name=DistilBertModel
      (embeddings): Embeddings(
        original_name=Embeddings
        (word_embeddings): Embedding(original_name=Embedding)
        (position_embeddings): Embedding(original_name=Embedding)
        (LayerNorm): LayerNorm(original_name=LayerNorm)
        (dropout): Dropout(original_name=Dropout)
      )
      (transformer): Transformer(
        original_name=Transformer
        (layer): ModuleList(
          original_name=ModuleList
          (0): TransformerBlock(
            original_name=TransformerBlock
            (attention): MultiHeadSelfAttention(
              original_name=MultiHeadSelfAttention
              (dropout): Dropout(original_name=Dropout)
              (q_lin): Linear(original_name=Linear)
              (k_lin): Linear(original_name=Linear)
              (v_lin): Linear(original_name=Lin

## Step 3: Upload the model to OpenSearch cluster
After generating a model zip file, users need to describe the model configuration in an ml-commons_model_config.json file. The `make_model_config_json` function in the `SentenceTransformerModel` class builds this file by parsing the Hugging Face config.json file. To use a configuration that differs from the pre-trained sentence transformer's, pass arguments to `make_model_config_json` to change the configuration content before the ml-commons_model_config.json file is generated. Please see the [SentenceTransformerModel.make_model_config_json](https://opensearch-project.github.io/opensearch-py-ml/reference/api/sentence_transformer.html#opensearch_py_ml.sentence_transformer_model.SentenceTransformerModel.make_model_config_json) API reference.

In general, the ML Commons client supports uploading sentence transformer models. Given a zip file that contains the model in TorchScript format and a tokenizer configuration file in JSON format, the `upload_model` function connects to OpenSearch through the ML client and uploads the model. Please see the [MLCommonClient.upload_model](https://opensearch-project.github.io/opensearch-py-ml/reference/api/ml_commons_upload_api.html#opensearch_py_ml.ml_commons_integration.MLCommonClient.upload_model) API reference.
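
Before uploading, it can help to confirm that the zip file holds what the paragraph above describes: a TorchScript model file and a tokenizer JSON file. The following is a minimal sketch using only the standard library. The helper name `check_model_zip` is hypothetical, and the expected file names (`*.pt` and `tokenizer.json`) are assumptions based on the paths printed in Step 1. The demo builds a throwaway archive with placeholder contents, not a real model.

```python
import os
import tempfile
import zipfile


def check_model_zip(zip_path):
    """Return (model_files, has_tokenizer) for a model zip archive.

    model_files lists archive members ending in .pt; has_tokenizer is True
    when a member named tokenizer.json is present (assumed file name).
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
    model_files = [name for name in names if name.endswith(".pt")]
    has_tokenizer = any(name.endswith("tokenizer.json") for name in names)
    return model_files, has_tokenizer


# Demo on a temporary archive with placeholder contents.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_zip = os.path.join(tmp_dir, "demo_model.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("demo_model.pt", b"placeholder bytes")
        archive.writestr("tokenizer.json", "{}")
    model_files, has_tokenizer = check_model_zip(demo_zip)

print(model_files, has_tokenizer)
```

In the notebook, the same check could be run on the zip path printed by `train` or `save_as_pt` before calling `upload_model`.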

In [10]:
# Users need to prepare an ml-commons_model_config.json file to configure the model, including the model name.
# make_model_config_json is a helper in opensearch_py_ml.sentence_transformer_model that generates this file.
custom_model.make_model_config_json()

ml-commons_model_config.json file is saved at :  /Volumes/workplace/upload_content/model_files/ml-commons_model_config.json


In [11]:
# Create the ML Commons client from the OpenSearch client
import opensearch_py_ml as oml
from opensearch_py_ml.ml_commons import MLCommonClient
ml_client = MLCommonClient(client)

In [12]:
# Upload the model to the OpenSearch cluster, using the model zip file path and the
# ml-commons_model_config.json file generated above

model_path = '/Volumes/workplace/upload_content/model_files/test2_model.zip'
model_config_path = '/Volumes/workplace/upload_content/model_files/ml-commons_model_config.json'
ml_client.upload_model(model_path, model_config_path, isVerbose=True)

Total number of chunks 27
Sha1 value of the model file:  1a198957ec8a759e83f1e862ad46bb120c6c1b5a031e75c415c1a893c87a3da7
Model meta data was created successfully. Model Id:  cz2RloUB6UQeRtfO8Jph
uploading chunk 1 of 27
{'status': 'Uploaded'}
uploading chunk 2 of 27
{'status': 'Uploaded'}
uploading chunk 3 of 27
{'status': 'Uploaded'}
uploading chunk 4 of 27
{'status': 'Uploaded'}
uploading chunk 5 of 27
{'status': 'Uploaded'}
uploading chunk 6 of 27
{'status': 'Uploaded'}
uploading chunk 7 of 27
{'status': 'Uploaded'}
uploading chunk 8 of 27
{'status': 'Uploaded'}
uploading chunk 9 of 27
{'status': 'Uploaded'}
uploading chunk 10 of 27
{'status': 'Uploaded'}
uploading chunk 11 of 27
{'status': 'Uploaded'}
uploading chunk 12 of 27
{'status': 'Uploaded'}
uploading chunk 13 of 27
{'status': 'Uploaded'}
uploading chunk 14 of 27
{'status': 'Uploaded'}
uploading chunk 15 of 27
{'status': 'Uploaded'}
uploading chunk 16 of 27
{'status': 'Uploaded'}
uploading chunk 17 of 27
{'status': 'Uploaded

'cz2RloUB6UQeRtfO8Jph'
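
The upload log above reports a chunk count and a hash of the model file. Although the log labels the hash "Sha1", the printed value is 64 hexadecimal characters long, which matches the length of a SHA-256 digest rather than a SHA-1 digest (40 characters). The sketch below shows how such values could be computed locally for comparison. It is not the library's implementation: the function name `describe_model_file` is hypothetical, the chunk size of 10,000,000 bytes is an assumption, and the use of SHA-256 is inferred only from the digest length.

```python
import hashlib
import math
import os
import tempfile

ASSUMED_CHUNK_SIZE = 10_000_000  # bytes; an assumption, not taken from the notebook


def describe_model_file(path, chunk_size=ASSUMED_CHUNK_SIZE):
    """Return (number_of_chunks, sha256_hex_digest) for the file at path."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB blocks so large model files are not loaded at once.
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
            size += len(block)
    return math.ceil(size / chunk_size), digest.hexdigest()


# Demo on a 25-byte temporary file with a 10-byte chunk size.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"x" * 25)
    num_chunks, hex_digest = describe_model_file(demo_path, chunk_size=10)

print(num_chunks, len(hex_digest))
```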

In [13]:
# Now we can load the uploaded model into memory by using the model id.


load_model_output = ml_client.load_model("cz2RloUB6UQeRtfO8Jph")

print(load_model_output)


{'task_id': 'dD2WloUB6UQeRtfOVJqe', 'status': 'CREATED'}


In [14]:
# Calling load_model initiates a task.
# We can check the task status by invoking the `get_task_info` method with the task id.

task_info = ml_client.get_task_info("dD2WloUB6UQeRtfOVJqe")

print(task_info)

{'model_id': 'cz2RloUB6UQeRtfO8Jph', 'task_type': 'LOAD_MODEL', 'function_name': 'TEXT_EMBEDDING', 'state': 'COMPLETED', 'worker_node': 'j6DmtPSRSHuHiqwxr6BnGw', 'create_time': 1673268712503, 'last_update_time': 1673268781057, 'is_async': True}
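
Because `load_model` returns right after the task is created, a single `get_task_info` call may still report an unfinished task. A small polling loop is one way to wait for completion. The sketch below takes the lookup function as a parameter so it can be exercised without a cluster. The helper `wait_for_task` is hypothetical, and the `FAILED` state name is an assumption; only `COMPLETED` appears in the output above.

```python
import time


def wait_for_task(get_task_info, task_id, timeout=120.0, interval=2.0):
    """Poll get_task_info(task_id) until the task reports COMPLETED.

    Raises RuntimeError on a FAILED state (assumed state name) and
    TimeoutError when the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        info = get_task_info(task_id)
        state = info.get("state")
        if state == "COMPLETED":
            return info
        if state == "FAILED":
            raise RuntimeError(f"Task {task_id} failed: {info}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} still in state {state!r}")
        time.sleep(interval)


# Demo with a stand-in lookup that walks through three states.
_demo_states = iter(["CREATED", "RUNNING", "COMPLETED"])


def fake_get_task_info(task_id):
    return {"task_id": task_id, "state": next(_demo_states)}


result = wait_for_task(fake_get_task_info, "demo-task", interval=0.0)
print(result["state"])
```

In the notebook this could be called as `wait_for_task(ml_client.get_task_info, "dD2WloUB6UQeRtfOVJqe")`.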


In [15]:
# We can also see the model information by invoking the `get_model_info` method with the model id:

model_info = ml_client.get_model_info("cz2RloUB6UQeRtfO8Jph")

print(model_info)

{'name': 'sentence-transformers/msmarco-distilbert-base-tas-b', 'algorithm': 'TEXT_EMBEDDING', 'model_version': '1', 'model_format': 'TORCH_SCRIPT', 'model_state': 'LOADED', 'model_content_hash_value': '1a198957ec8a759e83f1e862ad46bb120c6c1b5a031e75c415c1a893c87a3da7', 'model_config': {'model_type': 'distilbert', 'embedding_dimension': 768, 'framework_type': 'SENTENCE_TRANSFORMERS', 'all_config': '{"_name_or_path": "/Users/dhrubo/.cache/torch/sentence_transformers/sentence-transformers_msmarco-distilbert-base-tas-b/", "activation": "gelu", "architectures": ["DistilBertModel"], "attention_dropout": 0.1, "dim": 768, "dropout": 0.1, "hidden_dim": 3072, "initializer_range": 0.02, "max_position_embeddings": 512, "model_type": "distilbert", "n_heads": 12, "n_layers": 6, "pad_token_id": 0, "qa_dropout": 0.1, "seq_classif_dropout": 0.2, "sinusoidal_pos_embds": false, "tie_weights_": true, "torch_dtype": "float32", "transformers_version": "4.16.2", "vocab_size": 30522}'}, 'created_time': 167326

In [16]:
# Now we can use this model to generate sentence embeddings.

input_sentences = ["Test sentence1", "Test sentence2"]

embedding_output = ml_client.generate_embedding("cz2RloUB6UQeRtfO8Jph", input_sentences)

print(embedding_output)


{'inference_results': [{'output': [{'name': 'sentence_embedding', 'data_type': 'FLOAT32', 'shape': [768], 'data': [0.21162823, -0.31966814, -0.09601192, 0.14223084, 0.2264302, 0.19668405, 0.5970008, -0.19368923, -0.19195588, 0.3238238, -0.11029507, 0.20113745, -0.31363994, 0.218243, 0.42289385, 0.27342117, 0.409293, -0.36741465, -0.16032813, -0.13022901, 0.30196202, 0.46416682, -0.25054777, 0.14115417, 0.17647037, 0.04499317, 0.020941753, 0.21221676, -0.67876154, 0.17754197, -0.13002695, 0.15142015, 0.19277212, -0.08254553, 0.07795113, 0.49458084, -0.35918534, 0.10529865, 0.18471923, -0.15236725, -0.4106201, -0.4043108, -0.2681529, 0.09409851, -0.40124586, 0.15329781, -0.46914512, -0.16094545, -0.61767846, 0.43674526, -0.121316366, 0.93069077, 0.07570939, -0.087039694, -0.55987626, -0.048423957, -0.09163006, -0.4443902, 0.22732261, -0.1771088, 0.2349906, 0.36873037, -0.010825862, -0.29243731, 0.25782382, 0.13242388, 0.62139493, 0.76742977, -0.80710083, -0.40227658, 0.4170124, 0.5734022
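
The response nests each embedding under `inference_results`, then `output`, then `data`. As a minimal sketch, the vectors can be pulled out and compared with cosine similarity using plain Python. The helper names are hypothetical, and the 3-dimensional demo response is made up for illustration (real embeddings here have 768 dimensions). The sketch assumes there is one `sentence_embedding` output per input sentence, which the truncated output above does not fully show.

```python
import math


def extract_embeddings(response):
    """Collect the data of every output named sentence_embedding."""
    vectors = []
    for result in response["inference_results"]:
        for output in result["output"]:
            if output["name"] == "sentence_embedding":
                vectors.append(output["data"])
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up response with the same nesting as the output above.
demo_response = {
    "inference_results": [
        {"output": [{"name": "sentence_embedding", "data_type": "FLOAT32",
                     "shape": [3], "data": [1.0, 0.0, 0.0]}]},
        {"output": [{"name": "sentence_embedding", "data_type": "FLOAT32",
                     "shape": [3], "data": [1.0, 1.0, 0.0]}]},
    ]
}

vectors = extract_embeddings(demo_response)
similarity = cosine_similarity(vectors[0], vectors[1])
print(len(vectors), round(similarity, 4))
```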

In [18]:
# Now we can unload the model from the OpenSearch nodes by invoking `unload_model`:

unload_model_response = ml_client.unload_model("cz2RloUB6UQeRtfO8Jph")

print(unload_model_response)

unload api output: {'j6DmtPSRSHuHiqwxr6BnGw': {'stats': {'cz2RloUB6UQeRtfO8Jph': 'unloaded'}}}
{'j6DmtPSRSHuHiqwxr6BnGw': {'stats': {'cz2RloUB6UQeRtfO8Jph': 'unloaded'}}}


In [19]:
# We can also delete the model:

delete_model_response = ml_client.delete_model("cz2RloUB6UQeRtfO8Jph")

print(delete_model_response)

{'_index': '.plugins-ml-model', '_id': 'cz2RloUB6UQeRtfO8Jph', '_version': 6, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 32, '_primary_term': 1}
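
If an error occurs between loading and unloading, the model can be left loaded on the cluster nodes. One possible pattern, sketched here, wraps the load and unload calls in a context manager so that `unload_model` always runs. The name `loaded_model` and the `RecordingClient` stand-in are hypothetical. The sketch only relies on the `load_model` and `unload_model` method names used in this notebook. It does not wait for the asynchronous load task to finish, which real usage would need to do before running inference.

```python
from contextlib import contextmanager


@contextmanager
def loaded_model(ml_client, model_id):
    """Load a model on entry and always unload it on exit."""
    ml_client.load_model(model_id)
    try:
        yield model_id
    finally:
        ml_client.unload_model(model_id)


class RecordingClient:
    """Stand-in client that records calls instead of contacting a cluster."""

    def __init__(self):
        self.calls = []

    def load_model(self, model_id):
        self.calls.append(("load", model_id))

    def unload_model(self, model_id):
        self.calls.append(("unload", model_id))


# Demo: the model is unloaded even though the body raises.
stub = RecordingClient()
try:
    with loaded_model(stub, "demo-model"):
        raise ValueError("simulated failure during inference")
except ValueError:
    pass

print(stub.calls)
```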
