# DLC Generation
- Model used: MobileBERT fine-tuned on SST-2 (binary sentiment classification). [MobileBERT paper](https://arxiv.org/pdf/2004.02984.pdf), [Hugging Face model](https://huggingface.co/Alireza1044/mobilebert_sst2)

### Loading the Model from Hugging Face

In [1]:
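```python
import tensorflow as tf
from transformers import TensorType
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

bs = 1            # batch size baked into the exported graph
SEQ_LEN = 128     # fixed sequence length; every input is padded to this
MODEL_NAME = "Alireza1044/mobilebert_sst2"

# Load the tokenizer and the TF model.
# from_pt=True converts the PyTorch checkpoint to TF weights (PyTorch must be installed).
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_NAME, from_pt=True)
```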


```
2024-03-12 11:34:04.908118: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-12 11:34:04.935372: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-12 11:34:09.271374: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFMobileBertForSequenceClassification: ['mobilebert.embeddings.position_ids']
- This IS expected if you are initializing TFMobi
```

### Wrapping the Model in a `tf.function` for Export

In [4]:
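```python
def model_fn(input_ids, attention_mask):
    # Return class probabilities (softmax over the last axis) rather than raw logits,
    # so the exported graph's output can be read directly.
    output = tf.nn.softmax(
        model(input_ids=input_ids, attention_mask=attention_mask).logits, axis=-1
    )
    return output

# Trace with a fixed [bs, SEQ_LEN] int32 signature so the graph has static input shapes
model_fn = tf.function(
    model_fn,
    input_signature=[
        tf.TensorSpec(shape=[bs, SEQ_LEN], dtype=tf.int32),
        tf.TensorSpec(shape=[bs, SEQ_LEN], dtype=tf.int32),
    ],
)
```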
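Because `model_fn` applies a softmax over the last axis, the exported graph returns per-class probabilities that sum to 1 for each row instead of unbounded logits. The pure-Python sketch below shows that computation for a single row. The logit values are made up for illustration, and the `[negative, positive]` order simply follows how the prediction cell indexes the output (index 0 negative, index 1 positive).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a 1-D list of logits (the last axis)."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp().
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw logits for one sentence, ordered [negative, positive].
probs = softmax([4.2, -4.3])
print(f"negative={probs[0]:.4f}, positive={probs[1]:.4f}, sum={sum(probs):.4f}")
```

Folding the softmax into the graph means the on-device runtime only has to read two probabilities, with no post-processing step.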


#### Checking the TensorFlow Model Prediction

In [6]:

```python
# Sample input
context = "It is easy to say but hard to do ..."

# Pad (and, if needed, truncate) to exactly SEQ_LEN tokens to match model_fn's input signature
input_encodings = tokenizer(
    context,
    return_tensors=TensorType.TENSORFLOW,
    padding="max_length",
    truncation=True,
    max_length=SEQ_LEN,
)

print(f"\nContext = \n{context}")
# model_fn returns softmax probabilities of shape (bs, 2): index 0 = negative, index 1 = positive
probs = model_fn(input_encodings.input_ids, input_encodings.attention_mask)

positivity = float(probs[0][1]) * 100
negativity = float(probs[0][0]) * 100

print(f"\nPrediction: {positivity:.2f}% positive & {negativity:.2f}% negative\n")
```

```
Context = 
It is easy to say but hard to do ...

Prediction: 0.02% positive & 99.98% negative
```
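`model_fn` was traced with a fixed `[bs, SEQ_LEN]` input signature, so every encoding has to be exactly `SEQ_LEN` tokens long: `padding='max_length'` pads short inputs up to `max_length`, and inputs longer than that also need truncating (the tokenizer's `truncation=True` option). The pure-Python sketch below mimics this on a plain list of token ids and builds the matching attention mask. It is a simplification under stated assumptions: the pad id is assumed to be 0 (as in BERT-style vocabularies), the token ids are hypothetical, and unlike the real tokenizer, the naive slice does not re-append the final special token when truncating.

```python
SEQ_LEN = 128
PAD_ID = 0  # assumption: BERT-style vocabularies map [PAD] to id 0

def pad_to_max_length(token_ids, max_length=SEQ_LEN, pad_id=PAD_ID):
    """Truncate or right-pad token ids to max_length and build the attention mask."""
    ids = list(token_ids[:max_length])  # naive truncation of long inputs
    mask = [1] * len(ids)               # 1 = real token, attended to
    n_pad = max_length - len(ids)
    ids += [pad_id] * n_pad             # right-pad short inputs
    mask += [0] * n_pad                 # 0 = padding, ignored by attention
    return ids, mask

# Hypothetical token ids for a short sentence (not real tokenizer output).
ids, mask = pad_to_max_length([101, 2009, 2003, 3733, 102])
print(len(ids), sum(mask))
```

The attention mask is what lets the model ignore the padding, which is why both tensors are graph inputs.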

#### Freezing the Graph and Saving It in `.pb` Format

In [7]:
```python
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

# Freeze: inline all variables (weights) as constants in the traced graph
frozen_func = convert_variables_to_constants_v2(model_fn.get_concrete_function())

# Note: these are graph operations, not Keras layers
layers = [op.name for op in frozen_func.graph.get_operations()]
print("-" * 50)
print("NO. of Frozen model layers: {}".format(len(layers)))

print("-" * 50)
print("Frozen model inputs: ")
print(frozen_func.inputs)
print("Frozen model outputs: ")
print(frozen_func.outputs)

# Serialize the graph and prune nodes that are only needed for training
graph_def = frozen_func.graph.as_graph_def()
graph_def = tf.compat.v1.graph_util.remove_training_nodes(graph_def)

# Write a binary GraphDef to ./frozen_models/mobilebert_sst2.pb
tf.io.write_graph(graph_or_graph_def=graph_def,
                  logdir="./frozen_models",
                  name="mobilebert_sst2.pb",
                  as_text=False)
```

```
2024-03-12 11:39:55.460568: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-03-12 11:39:55.460670: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session

--------------------------------------------------
NO. of Frozen model layers: 5747
--------------------------------------------------
Frozen model inputs: 
[<tf.Tensor 'input_ids:0' shape=(1, 128) dtype=int32>, <tf.Tensor 'attention_mask:0' shape=(1, 128) dtype=int32>]
Frozen model outputs: 
[<tf.Tensor 'Identity:0' shape=(1, 2) dtype=float32>]

'./frozen_models/mobilebert_sst2.pb'
```
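The inputs and outputs printed above are what the DLC converter needs, except that TensorFlow tensor names carry an output-index suffix (`:0`), which the converter's `-d` and `--out_node` arguments, as used in the conversion command of this notebook, omit. The helper below is hypothetical (not part of SNPE) and only reuses the flags that appear in that command (`-i`, `-d`, `--out_node`, `-o`); it derives the argument list from the printed names and shapes, which may help keep the command in sync if `bs` or `SEQ_LEN` changes.

```python
def to_node_name(tensor_name):
    """Strip the ':<output index>' suffix, e.g. 'input_ids:0' -> 'input_ids'."""
    return tensor_name.split(":")[0]

def converter_args(pb_path, inputs, output_tensors, dlc_path):
    """Build an argument list for snpe-tensorflow-to-dlc (hypothetical helper).

    inputs: (tensor_name, shape) pairs, as printed by frozen_func.inputs.
    output_tensors: tensor names, as printed by frozen_func.outputs.
    """
    args = ["snpe-tensorflow-to-dlc", "-i", pb_path]
    for name, shape in inputs:
        # -d takes the input node name followed by a comma-separated shape, e.g. "1,128".
        args += ["-d", to_node_name(name), ",".join(str(d) for d in shape)]
    for name in output_tensors:
        args += ["--out_node", to_node_name(name)]
    args += ["-o", dlc_path]
    return args

args = converter_args(
    "frozen_models/mobilebert_sst2.pb",
    [("input_ids:0", (1, 128)), ("attention_mask:0", (1, 128))],
    ["Identity:0"],
    "frozen_models/mobilebert_sst2.dlc",
)
print(" ".join(args))
```

With the values printed above, this reproduces the FP32 conversion command; the list form could also be passed to `subprocess.run` after sourcing the SNPE environment.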

### Converting the Model to DLC Format

In [8]:
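```python
# Point SNPE_ROOT at your local SNPE SDK installation (adjust the path for your machine).
# The %%bash cells below inherit this environment variable from the kernel.
import os
os.environ['SNPE_ROOT'] = "/local/mnt/workspace/snpe/snpe-2.20/2.20.0.240223/"
```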
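Every `%%bash` cell in this notebook starts with `source $SNPE_ROOT/bin/envsetup.sh`, so a wrong `SNPE_ROOT` only surfaces later as a shell error. A minimal sanity check, sketched below as a hypothetical helper that assumes only the `bin/envsetup.sh` layout used by those cells, can catch this up front.

```python
import os

def check_snpe_root(snpe_root):
    """Return the path to bin/envsetup.sh under snpe_root, raising if it is missing."""
    envsetup = os.path.join(snpe_root, "bin", "envsetup.sh")
    if not os.path.isfile(envsetup):
        raise FileNotFoundError(f"SNPE_ROOT looks wrong: {envsetup} not found")
    return envsetup

# Usage, after setting SNPE_ROOT as above:
# check_snpe_root(os.environ["SNPE_ROOT"])
```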

#### Converting the Model to DLC (FP32)

In [9]:
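```bash
%%bash
source $SNPE_ROOT/bin/envsetup.sh
# -d: input node name and shape; --out_node: output node (tensor name without ':0')
snpe-tensorflow-to-dlc -i frozen_models/mobilebert_sst2.pb \
    -d input_ids 1,128 \
    -d attention_mask 1,128 \
    --out_node Identity \
    -o frozen_models/mobilebert_sst2.dlc
```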


```
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[INFO] AISW SDK environment set
[INFO] SNPE_ROOT: /local/mnt/workspace/snpe/snpe-2.20/2.20.0.240223

2024-03-12 11:41:19.603667: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-12 11:41:19.629782: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-12 11:41:20.834829: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2024-03-12 11:41:23.478095: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
2024-03-12 11:41:26,990 - 235 - INFO - INFO_ALL_BUILDING_NETWORK: 
    Buildi
```

#### Preparing the DLC for the HTP Backend (FP16)
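This step compiles the DLC offline for the HTP backend (`--htp_archs v75`), and its log reports `FP16 precision enabled`. Half precision (IEEE 754 binary16) keeps only an 11-bit significand, roughly three significant decimal digits, so small deviations from the FP32 results may appear. The sketch below is not part of SNPE; it uses only Python's `struct` module, whose `'e'` format is binary16, to round a few values through FP16 and make that precision loss concrete.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # Pack as binary16 ('e' format), then unpack back to a Python float.
    return struct.unpack("<e", struct.pack("<e", x))[0]

for value in (0.1, 0.5, 0.9998, 1000.3):
    print(value, "->", to_fp16(value))
```

For example, 0.9998 is not representable in FP16 and rounds to 1.0, while 0.5 (a power of two) is exact.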

In [11]:
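```bash
%%bash
source $SNPE_ROOT/bin/envsetup.sh
# Offline-prepare the DLC for HTP v75 with float32 inputs/outputs (--use_float_io).
# Note: the frozen graph above reports a single output (Identity:0); Identity_1:0 may not exist in this model.
snpe-dlc-graph-prepare --input_dlc frozen_models/mobilebert_sst2.dlc --use_float_io --htp_archs v75 --set_output_tensors Identity:0,Identity_1:0
```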

```
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[INFO] AISW SDK environment set
[INFO] SNPE_ROOT: /local/mnt/workspace/snpe/snpe-2.20/2.20.0.240223

[INFO] InitializeStderr: DebugLog initialized.
[INFO] SNPE HTP Offline Prepare: Attempting to create cache for SM8650
[USER_INFO] Target device backend record identifier: HTP_V75_8MB
[USER_INFO] No cache record in the DLC matches the target device (HTP_V75_8MB). Creating a new record
[INFO] Attempting to open dynamically linked lib: libHtpPrepare.so
[INFO] dlopen libHtpPrepare.so SUCCESS handle 0x24298d0
[INFO] Found Interface Provider (v2.14)
[USER_INFO] Platform option not set
[USER_INFO] Created ctx=0x1 for Snpe Unique Graph ID=0 backend=3 instancePtr=0x24294d8
[USER_INFO] FP16 precision enabled for graph with id=0
[USER_INFO] Offline Prepare VTCM size(MB) selected = 8
[USER_INFO] Offline Prepare Optimization Level passed = 2
[USER_INFO] Backend Mgr ~Dtor called for backend HTP
[USER_INFO] Cleaning up Context=0x1 for Snpe Unique Graph ID=0 backend=3 instancePtr=0x24294d8
[USER_INFO] DONE Cleaning up Context=0x1 for Snpe Unique Graph ID=0 backend=3 instancePtr=0x24294d8
[USER_INFO] B
```
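The frozen graph printed a single output, `Identity:0`, yet the prepare command passes both `Identity:0` and `Identity_1:0` to `--set_output_tensors`. The truncated log does not show whether the extra name was ignored or rejected, so a quick name check before running the tool may help. The helper below is hypothetical and only compares against the TensorFlow-side names printed by `frozen_func.outputs`; the DLC itself could name tensors differently, which `snpe-dlc-info` can be used to confirm.

```python
def missing_outputs(requested, available):
    """Return names in a comma-separated --set_output_tensors value that are not graph outputs."""
    available = set(available)
    names = [n.strip() for n in requested.split(",") if n.strip()]
    return [name for name in names if name not in available]

# --set_output_tensors value from the command above vs. the frozen graph's outputs.
requested = "Identity:0,Identity_1:0"
frozen_outputs = ["Identity:0"]
print(missing_outputs(requested, frozen_outputs))
```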