# Preparing the dataset
- [Dataset link](https://rajpurkar.github.io/SQuAD-explorer/)
- Download the dataset from the above link.

In [5]:
%%bash

wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json

--2023-09-30 21:58:42--  https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json
Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.111.153, 185.199.108.153, 185.199.110.153, ...
Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.111.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4370528 (4.2M) [application/json]
Saving to: ‘dev-v2.0.json’

     0K .......... .......... .......... .......... ..........  1%  719K 6s
    50K .......... .......... .......... .......... ..........  2%  900K 5s
   100K .......... .......... .......... .......... ..........  3% 3.85M 4s
   150K .......... .......... .......... .......... ..........  4% 1.35M 4s
   200K .......... .......... .......... .......... ..........  5% 4.65M 3s
   250K .......... .......... .......... .......... ..........  7% 5.76M 3s
   300K .......... .......... .......... .......... ..........  8% 7.24M 2s
   350K .......... .......... .......... .......... ........

In [8]:
import json
import pandas as pd

data_path="dev-v2.0.json"

with open(data_path,"r") as f:
    squad_data=json.load(f)

context_qa_triples=[]

for article in squad_data['data']:
    for paragraph in article['paragraphs']:
        context=paragraph['context']
        for qa in paragraph['qas']:
            question=qa['question']
            if qa['answers']:
                answer=qa['answers'][0]['text']
            elif qa.get('plausible_answers'):
                # Unanswerable questions have no answers, only plausible ones
                plausible_answers=qa['plausible_answers']
                answer=plausible_answers[0]['text']
            else:
                answer=''

            context_qa_triples.append({'context':context,'question':question,'answers':answer})

df=pd.DataFrame(context_qa_triples[:30])
df.head(3)

| | context | question | answers |
|---|---|---|---|
| 0 | The Normans (Norman: Nourmands; French: Norman... | In what country is Normandy located? | France |
| 1 | The Normans (Norman: Nourmands; French: Norman... | When were the Normans in Normandy? | 10th and 11th centuries |
| 2 | The Normans (Norman: Nourmands; French: Norman... | From which countries did the Norse originate? | Denmark, Iceland and Norway |
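The nested layout that the loading loop walks (articles, then paragraphs, then question/answer entries) can be illustrated on a tiny hand-made sample. This is only a sketch: the sample content is invented, and `flatten_squad` is a hypothetical helper name that mirrors the loop above rather than anything shipped with the dataset.

```python
import json

# Hand-made miniature of the SQuAD v2.0 layout (not real dataset content).
sample = {
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "Paris is the capital of France.",
            "qas": [
                {"question": "What is the capital of France?",
                 "is_impossible": False,
                 "answers": [{"text": "Paris", "answer_start": 0}]},
                {"question": "What is the capital of Spain?",
                 "is_impossible": True,
                 "answers": [],
                 "plausible_answers": [{"text": "Paris", "answer_start": 0}]},
            ],
        }],
    }]
}


def flatten_squad(squad):
    """Return one {'context', 'question', 'answers'} dict per question."""
    rows = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                # Prefer a real answer, then a plausible one, else empty.
                candidates = qa.get("answers") or qa.get("plausible_answers") or []
                answer = candidates[0]["text"] if candidates else ""
                rows.append({"context": paragraph["context"],
                             "question": qa["question"],
                             "answers": answer})
    return rows


rows = flatten_squad(sample)
print(json.dumps(rows, indent=2))
```

Note that for unanswerable questions the stored "answer" is only a plausible span, so it should not be treated as ground truth when checking model output.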


# Generating the ALBERT Model
- [ALBERT model](https://huggingface.co/docs/transformers/model_doc/albert): you can learn more about this model from this link.
- You can also check the different versions of ALBERT for different use cases from here.

### Converting the Model to ONNX format using Optimum
- [Link for Optimum](https://github.com/huggingface/optimum)
- Using Optimum, we can directly convert supported PyTorch or TensorFlow models to ONNX format.
- From this ONNX file we can then convert to DLC format using SNPE.
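The two-step flow described above (export to ONNX, then convert to DLC) can be sketched as a pair of small helpers that only assemble the command lines. The helper names `export_onnx_cmd` and `onnx_to_dlc_cmd` are hypothetical; the flags are limited to the ones this notebook itself uses, and nothing is executed here.

```python
import shlex


def export_onnx_cmd(model_id, out_dir):
    """Argument list for the optimum-cli ONNX export used in this notebook."""
    return ["optimum-cli", "export", "onnx", "--model", model_id, out_dir]


def onnx_to_dlc_cmd(onnx_path, dlc_path, input_shapes):
    """Argument list for snpe-onnx-to-dlc with one -d flag per input."""
    cmd = ["snpe-onnx-to-dlc", "-i", onnx_path]
    for name, shape in input_shapes.items():
        cmd += ["-d", name, ",".join(str(dim) for dim in shape)]
    return cmd + ["-o", dlc_path]


# All three model inputs share the same fixed shape: batch 1, 384 tokens.
shapes = {name: (1, 384)
          for name in ("input_ids", "attention_mask", "token_type_ids")}
steps = [
    export_onnx_cmd("twmkn9/albert-base-v2-squad2", "alberta-onnx/"),
    onnx_to_dlc_cmd("alberta-onnx/model.onnx", "alberta.dlc", shapes),
]
for step in steps:
    # To actually run a step: subprocess.run(step, check=True)
    print(shlex.join(step))
```

Keeping the commands as argument lists makes it easy to reuse the same two steps for another model by changing only the model id and file names.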

In [9]:
%%bash
optimum-cli export onnx --model twmkn9/albert-base-v2-squad2 alberta-onnx/

2023-09-30 21:59:10.386023: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-30 21:59:10.461126: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-09-30 21:59:10.479101: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-09-30 21:59:10.790954: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: li

verbose: False, log level: Level.ERROR



2023-09-30 21:59:16.200782: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-30 21:59:16.275655: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-09-30 21:59:16.293026: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-09-30 21:59:16.602860: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: li

### DLC Conversion with fixed size
- Now that we have the ONNX model, we convert it to DLC format with fixed input shapes (batch size 1, sequence length 384).

In [10]:
%%bash
snpe-onnx-to-dlc -i alberta-onnx/model.onnx -d input_ids 1,384 -d attention_mask 1,384 -d token_type_ids 1,384 -o alberta.dlc

`overwrite_input_shapes` and/or `test_input_shapes` instead. An error will be
raised in the future.


2023-09-30 21:59:25,991 - 235 - INFO - Successfully simplified the onnx model in child process
2023-09-30 21:59:26,137 - 235 - INFO - Successfully receive the simplified onnx model in main process
2023-09-30 21:59:26,209 - 235 - INFO - Successfully run shape inference in child process
2023-09-30 21:59:26,365 - 235 - INFO - Successfully receive the inferred model in main process
2023-09-30 21:59:28,072 - 235 - INFO - INFO_INITIALIZATION_SUCCESS: 
2023-09-30 21:59:28,234 - 235 - INFO - INFO_CONVERSION_SUCCESS: Conversion completed successfully
2023-09-30 21:59:28,289 - 235 - INFO - INFO_WRITE_SUCCESS: 
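Because the DLC is converted with fixed `1,384` input shapes, every example fed to it has to be exactly 384 tokens long. The sketch below shows the idea of truncating or right-padding a token list and building the matching attention mask. It is a simplified illustration: `pad_to_fixed_length` is a hypothetical helper, the pad id of 0 is an assumption, and a real tokenizer also takes care of special tokens when truncating.

```python
def pad_to_fixed_length(token_ids, max_length=384, pad_id=0):
    """Truncate or right-pad token ids; return (ids, attention_mask)."""
    ids = list(token_ids[:max_length])
    mask = [1] * len(ids)            # 1 marks a real token
    padding = max_length - len(ids)
    # 0 in the mask marks padding that the model should ignore.
    return ids + [pad_id] * padding, mask + [0] * padding


# Hypothetical token ids, not produced by a real tokenizer.
ids, mask = pad_to_fixed_length([2, 101, 57, 3])
print(len(ids), len(mask), sum(mask))
```

In the notebook itself this job is done by the tokenizer through `padding='max_length'` and `max_length=384`.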


### Creating FP16 Model
1. First, we create the RAW input files.
2. Then we convert the FP32 DLC to an FP16 DLC.

In [11]:
%%bash
# -p keeps the cell from failing when the directories already exist
mkdir -p input_ids attention_mask token_type_ids

#### Creating the RAW Files

In [12]:
import numpy as np
from transformers import AutoTokenizer, AlbertForQuestionAnswering
import torch

# Load the tokenizer that converts each question/context pair into the inputs the model expects
tokenizer = AutoTokenizer.from_pretrained("twmkn9/albert-base-v2-squad2")

question_token={}

for i in range(df.shape[0]):
    question,text,answer=df.iloc[i].question,df.iloc[i].context,df.iloc[i].answers
    inputs = tokenizer(question, text, return_tensors="np",
            padding='max_length',
            truncation="longest_first",
            max_length=384)
    question_token[i]=[question,inputs,answer,text]
    inp_ids = inputs.input_ids
    inp_ids=inp_ids.astype(np.float32)
    with open("input_ids/inp_ids_"+str(i)+".raw", 'wb') as f:
        inp_ids.tofile(f)
    
    mask = inputs.attention_mask
    mask=mask.astype(np.float32)
    with open("attention_mask/attn_mask_"+str(i)+".raw", 'wb') as f:
        mask.tofile(f)

    token_type= inputs.token_type_ids
    token_type=token_type.astype(np.float32)
    with open("token_type_ids/token_type_id_"+str(i)+".raw", 'wb') as f:
        token_type.tofile(f)

2023-09-30 21:59:29.221362: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-30 21:59:29.295276: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-09-30 21:59:29.313236: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-09-30 21:59:29.610631: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: li
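A `.raw` file written with `tofile` contains only the array bytes, with no header, shape, or dtype information. The sketch below writes a made-up `(1, 384)` input the same way the cell above does and reads it back, which is a quick way to sanity-check file size and contents. The token values and the temporary path are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

# Stand-in for one tokenized input of shape (1, 384); the values are made up.
input_ids = np.zeros((1, 384), dtype=np.int64)
input_ids[0, :4] = [2, 101, 57, 3]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "inp_ids_0.raw")
    input_ids.astype(np.float32).tofile(path)

    # The file is just the raw values: 384 floats * 4 bytes each.
    size = os.path.getsize(path)
    # Shape and dtype must be supplied again when reading.
    restored = np.fromfile(path, dtype=np.float32).reshape(1, 384)

print(size)
print(np.array_equal(restored, input_ids))
```

Since the file carries no metadata, the reader has to know the dtype and shape in advance, which is why the same fixed `1,384` shape is used throughout.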

#### Creating the input list

In [13]:

total_iter = 30
print("Generating input_list \"tf_raw_list.txt\" with {} iterations".format(total_iter))

with open("tf_raw_list.txt",'w') as f:
    for i in range(total_iter):
        f.write("input_ids:=input_ids/inp_ids_{}.raw attention_mask:=attention_mask/attn_mask_{}.raw token_type_ids:=token_type_ids/token_type_id_{}.raw\n".format(i,i,i))



Generating input_list "tf_raw_list.txt" with 30 iterations
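Each line of the input list pairs an input tensor name with a raw file using `name:=path`, with entries separated by spaces. A small parser like the one below can be used to check a generated line before handing the list to the SNPE tools. `parse_input_list_line` is a hypothetical helper written for this illustration, based only on the line format written by the cell above.

```python
def parse_input_list_line(line):
    """Split 'name:=path name:=path ...' into a {name: path} dict."""
    entries = {}
    for item in line.split():
        name, sep, path = item.partition(":=")
        if not sep or not name or not path:
            raise ValueError(f"malformed entry: {item!r}")
        entries[name] = path
    return entries


line = ("input_ids:=input_ids/inp_ids_0.raw "
        "attention_mask:=attention_mask/attn_mask_0.raw "
        "token_type_ids:=token_type_ids/token_type_id_0.raw\n")
parsed = parse_input_list_line(line)
print(sorted(parsed))
```

Checking that the parsed names match the three model inputs, and that each path exists, catches typos in the list early.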


### Creating the FP16 Model
- This cached model is prepared for the SM8550 SoC (`--htp_socs sm8550`).
- If you have a different processor, change the `--htp_socs` value accordingly.
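To give a feel for what FP16 precision means numerically, the sketch below rounds a few values to IEEE 754 half precision with the standard library. It is independent of SNPE and says nothing about how the tool converts weights internally; `to_fp16` is a hypothetical helper and the sample values are arbitrary.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Half precision keeps 11 significant bits, so small fractions are rounded
# and integers above 2048 are no longer all representable.
for x in (0.1, 7.25, 2048.0, 2049.0):
    y = to_fp16(x)
    print(f"{x!r:>8} -> {y!r:<20} abs error {abs(x - y):.3e}")
```

The rounding error is small for values of ordinary magnitude, which is why an FP16 model is usually compared against the FP32 one on a few samples rather than assumed to be identical.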

In [14]:
%%bash

snpe-dlc-graph-prepare --input_dlc alberta.dlc --input_list tf_raw_list.txt  --output_dlc alberta_float.dlc --set_output_tensors end_logits,start_logits --use_float_io --htp_socs sm8550

[INFO] InitializeStderr: DebugLog initialized.
[INFO] SNPE HTP Offline Prepare: Attempting to create cache for SM8550
[INFO] Attempting to open dynamically linked lib: libHtpPrepare.so
[INFO] dlopen libHtpPrepare.so SUCCESS handle 0x18363b0
[INFO] Found Interface Provider (v2.8)
[USER_INFO] FP16 precision enabled for graph with id=0
[USER_INFO] Offline Prepare VTCM size(MB) selected = 8
[USER_INFO] Offline Prepare DLBC enablement passed = 0


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[USER_INFO] Cleaning up backend manager resources
[USER_INFO] Cleaning up Contexts
[USER_INFO] BackendTerminate triggered
[INFO] SNPE HTP Offline Prepare: Successfully created cache for SM8550
[INFO] SNPE HTP Offline Prepare: Saved cached DLC to alberta_float.dlc
[USER_INFO] BackendTerminate triggered
[INFO] DebugLog shutting down.


# Generating the MobileBERT Model
- [MobileBERT](https://huggingface.co/csarron/mobilebert-uncased-squad-v2/tree/main): you can learn more about this model from this link.
- To learn more about the different use cases of MobileBERT, see this [link](https://huggingface.co/docs/transformers/model_doc/mobilebert).

### Generating the ONNX Model

In [15]:
%%bash
optimum-cli export onnx --model csarron/mobilebert-uncased-squad-v2 mobilebert-onnx/

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


2023-09-30 21:59:36.311479: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-30 21:59:36.385244: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-09-30 21:59:36.402226: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-09-30 21:59:36.706875: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: li

verbose: False, log level: Level.ERROR



2023-09-30 21:59:51.377048: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-30 21:59:51.452021: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-09-30 21:59:51.469676: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-09-30 21:59:51.779129: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: li

### Converting to DLC

In [16]:
%%bash
snpe-onnx-to-dlc -i mobilebert-onnx/model.onnx -d input_ids 1,384 -d attention_mask 1,384 -d token_type_ids 1,384 -o mobile_bert.dlc

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
`overwrite_input_shapes` and/or `test_input_shapes` instead. An error will be
raised in the future.


2023-09-30 22:00:38,259 - 235 - INFO - Successfully simplified the onnx model in child process
2023-09-30 22:00:38,608 - 235 - INFO - Successfully receive the simplified onnx model in main process
2023-09-30 22:00:38,756 - 235 - INFO - Successfully run shape inference in child process
2023-09-30 22:00:39,093 - 235 - INFO - Successfully receive the inferred model in main process
2023-09-30 22:00:45,555 - 235 - INFO - INFO_INITIALIZATION_SUCCESS: 
2023-09-30 22:00:46,066 - 235 - INFO - INFO_CONVERSION_SUCCESS: Conversion completed successfully
2023-09-30 22:00:46,188 - 235 - INFO - INFO_WRITE_SUCCESS: 


### Creating the RAW file

In [17]:
import numpy as np
from transformers import AutoTokenizer, MobileBertForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("csarron/mobilebert-uncased-squad-v2")


question_token={}

for i in range(df.shape[0]):
    question,text,answer=df.iloc[i].question,df.iloc[i].context,df.iloc[i].answers
    inputs = tokenizer(question, text, return_tensors="np",
            padding='max_length',
            truncation="longest_first",
            max_length=384)
    question_token[i]=[question,inputs,answer,text]
    inp_ids = inputs.input_ids
    inp_ids=inp_ids.astype(np.float32)
    with open("input_ids/inp_ids_"+str(i)+".raw", 'wb') as f:
        inp_ids.tofile(f)
    
    mask = inputs.attention_mask
    mask=mask.astype(np.float32)
    with open("attention_mask/attn_mask_"+str(i)+".raw", 'wb') as f:
        mask.tofile(f)

    token_type= inputs.token_type_ids
    token_type=token_type.astype(np.float32)
    with open("token_type_ids/token_type_id_"+str(i)+".raw", 'wb') as f:
        token_type.tofile(f)

In [18]:

total_iter = 30
print("Generating input_list \"tf_raw_list.txt\" with {} iterations".format(total_iter))

with open("tf_raw_list.txt",'w') as f:
    for i in range(total_iter):
        f.write("input_ids:=input_ids/inp_ids_{}.raw attention_mask:=attention_mask/attn_mask_{}.raw token_type_ids:=token_type_ids/token_type_id_{}.raw\n".format(i,i,i))



Generating input_list "tf_raw_list.txt" with 30 iterations


#### Creating the FP16 Model

In [20]:
%%bash
snpe-dlc-graph-prepare --input_dlc mobile_bert.dlc --input_list tf_raw_list.txt  --output_dlc mobile_bert_float.dlc --use_float_io --set_output_tensors end_logits,start_logits --htp_socs sm8550

[INFO] InitializeStderr: DebugLog initialized.
[INFO] SNPE HTP Offline Prepare: Attempting to create cache for SM8550
[INFO] Attempting to open dynamically linked lib: libHtpPrepare.so
[INFO] dlopen libHtpPrepare.so SUCCESS handle 0x17b9a90
[INFO] Found Interface Provider (v2.8)
[USER_INFO] FP16 precision enabled for graph with id=0
[USER_INFO] Offline Prepare VTCM size(MB) selected = 8
[USER_INFO] Offline Prepare DLBC enablement passed = 0


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[USER_INFO] Cleaning up backend manager resources
[USER_INFO] Cleaning up Contexts
[USER_INFO] BackendTerminate triggered
[INFO] SNPE HTP Offline Prepare: Successfully created cache for SM8550
[INFO] SNPE HTP Offline Prepare: Saved cached DLC to mobile_bert_float.dlc
[USER_INFO] BackendTerminate triggered
[INFO] DebugLog shutting down.
