# Debugging Work (11/28/2023)

Debugging the model and working out the task at hand. Also checking whether I am doing it correctly.

Do the model quantization and the data conversion in one notebook, along with the model inference.

<span style="color:red;">
  
1. Standardize the P_T value before quantizing.
2. Three P_T plots:
    1. Raw P_T (Number of particles vs P_T (GeV))
    2. P_T after standardization (X-axis should be arbitrary units)
    3. P_T after int8 quantization (should be centered at 0, with the range between -127 and 127)
      
3. After standardization, a few values could fall outside the range (-127, 127). How should these outliers be handled? One method is to put all of them in the last bin. Are there other methods available?
  
4. A fixed bin size limits the resolution. We can lose information if we discard those outliers.
  
5. Print the true and predicted P_T values. They should be in the INT8 range.
  
6. Plot the distribution to check whether it makes sense.
</span>
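
The standardize-then-quantize steps in the list above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's final method: the helper names and the `clip_sigma` parameter are hypothetical, and the "last bin" treatment of outliers is implemented as saturation at ±127.

```python
import numpy as np

def standardize(x, mean=None, std=None):
    """Shift to zero mean and scale to unit variance; returns (z, mean, std)."""
    x = np.asarray(x, dtype=np.float64)
    mean = x.mean() if mean is None else mean
    std = x.std() if std is None else std
    return (x - mean) / std, mean, std

def quantize_int8(z, clip_sigma=4.0):
    """Symmetric int8 quantization: map [-clip_sigma, clip_sigma] onto [-127, 127].

    Values beyond +/-clip_sigma saturate at +/-127, i.e. they land in the last bin.
    clip_sigma is a hypothetical tuning knob, not something taken from the notebook.
    """
    scale = clip_sigma / 127.0  # width of one int8 step in standardized units
    q = np.clip(np.round(z / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to standardized units."""
    return q.astype(np.float64) * scale

# Synthetic stand-in for a pT spectrum (GeV); this is not real detector data.
rng = np.random.default_rng(0)
pt = rng.exponential(scale=5.0, size=10_000)

z, mean, std = standardize(pt)
q, scale = quantize_int8(z)
saturated = np.mean(np.abs(q) == 127)  # fraction of values pushed into the last bin
print(q.min(), q.max(), f"saturated fraction: {saturated:.4f}")
```

A smaller `clip_sigma` gives finer steps for the bulk of the distribution but saturates more of the tail; a larger one keeps the tail at the cost of coarser steps. This is the resolution trade-off raised in item 4.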

<span style="color:green;">
    
1. Read about the dataset
    
2. Understand the outputs that you are plotting
</span>


**Date(11/28/2023)**
1. We need to apply standardization to the whole data sample, which has size 100 in our case.
2. Plot the data sample before and after standardization?
3. Or do we only need to standardize the true P_T?
4. What type of output do you have right now?
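
If the statistics are computed over the whole sample, one thing to watch is zero padding: `padded_batch` fills shorter entries with zeros by default, and those zeros would bias the mean and standard deviation. Below is a rough sketch of computing the statistics over non-padded rows only. The function name and the toy feature layout are hypothetical, and treating "column 0 equals 0" as padding is an assumption that mirrors the `[..., 0] != 0` mask used for true particles in this notebook.

```python
import numpy as np

def masked_mean_std(batches, feature_index, id_index=0):
    """Mean and std of one feature over all non-padded rows of all batches.

    batches: iterable of arrays shaped (batch, n_elements, n_features).
    Rows whose value at id_index is 0 are treated as padding (assumption).
    """
    values = [b[b[..., id_index] != 0, feature_index] for b in batches]
    values = np.concatenate(values)
    return values.mean(), values.std()

# Toy padded batch: 2 events, 3 slots each, features = (id, charge, pt).
batch = np.array([
    [[1, 0, 2.0], [2, 1, 4.0], [0, 0, 0.0]],  # last slot is padding
    [[1, 0, 6.0], [0, 0, 0.0], [0, 0, 0.0]],  # two padded slots
])
mean, std = masked_mean_std([batch], feature_index=2)
print(mean, std)
```

The same mean and standard deviation would then be reused for every batch, so that all values share one quantization scale.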

Links:
1. https://www.tensorflow.org/lite/performance/post_training_integer_quant
2. https://www.tensorflow.org/lite/performance/post_training_quantization
3. https://www.tensorflow.org/lite/models/convert/
4. https://www.tensorflow.org/lite/performance/post_training_quant
5. https://www.tensorflow.org/lite/performance/post_training_float16_quant 
6. https://www.tensorflow.org/lite/performance/post_training_integer_quant_16x8 
7. https://www.tensorflow.org/lite/performance/quantization_spec 
8. https://arxiv.org/pdf/1712.05877.pdf

**Date(11/29/2023)**

`The dataset contains input features and target features that consist of different quantities such as pT, eta, phi, etc., so all of them need to be standardized and quantized separately. But you can start with just pT.`

Task:
1. Since we are working only on `pT`, quantize it after standardization. Then check the plots before and after quantization.
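
For the before/after comparison, the binning matters as much as the values. The sketch below, on synthetic data, bins the raw and standardized pT with equal-width bins, and bins the int8 codes with one bin per integer so that each code gets its own bin. The full-range scale used here (largest |z| mapped to 127) is an assumption chosen for illustration; it avoids saturation but spends many codes on the sparse tail.

```python
import numpy as np

# Synthetic stand-in for pT values (GeV); replace with the real array.
rng = np.random.default_rng(1)
pt = rng.exponential(scale=5.0, size=5_000)

z = (pt - pt.mean()) / pt.std()   # standardized, arbitrary units
scale = np.abs(z).max() / 127.0   # assumption: full-range symmetric scale
q = np.clip(np.round(z / scale), -127, 127).astype(np.int8)

# Float stages: equal-width bins over the observed range.
raw_counts, raw_edges = np.histogram(pt, bins=100)
std_counts, std_edges = np.histogram(z, bins=100)

# int8 stage: edges at half-integers, so every code from -127 to 127
# falls in the middle of its own bin.
int_edges = np.arange(-127.5, 128.5, 1.0)  # 256 edges, 255 bins
int_counts, _ = np.histogram(q, bins=int_edges)

print(len(int_counts), "int8 bins,", np.count_nonzero(int_counts), "of them occupied")
```

Each `(counts, edges)` pair can be drawn with `plt.stairs(counts, edges)`, giving the three panels: raw pT in GeV, standardized pT in arbitrary units, and the int8 codes.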

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import ROOT 



In [None]:
import sys 
sys.path += ["../../../mlpf/particleflow/mlpf/"]
from tfmodel.model_setup import make_model
from tfmodel.utils import parse_config

In [None]:
config, _ = parse_config("../../../mlpf/particleflow/parameters/clic.yaml")  # path on lxplus

In [None]:
model = make_model(config, tf.float32)
model.build((1, None, config["dataset"]["num_input_features"]))


In [None]:
model.summary()

In [None]:
model.load_weights("weights-96-5.346523.hdf5", skip_mismatch=False, by_name=True)
# The weight files are hosted at https://huggingface.co/jpata/particleflow/tree/clic_clusters_v1.6

In [None]:
## Reading the dataset
ds_builder = tfds.builder("clic_edm_qq_pf", data_dir='../../../mlpf/tensorflow_datasets/')  # TensorFlow datasets path on lxplus
dss = ds_builder.as_data_source("test")



In [None]:
def yield_from_ds():
    for elem in dss:
        yield {"X": elem["X"], "ygen": elem["ygen"], "ycand": elem["ycand"]}

In [None]:
output_signature = {k: tf.TensorSpec(shape=(None, v.shape[1])) for (k, v) in dss.dataset_info.features.items()}
tf_dataset = tf.data.Dataset.from_generator(yield_from_ds, output_signature=output_signature).take(100).padded_batch(batch_size=10)


In [None]:
tf_dataset

In [None]:
data = list(tfds.as_numpy(tf_dataset))


In [None]:
data_df = pd.DataFrame(data)
data_df.head()

In [None]:
Xs = [d["X"] for d in data]
ys = [d["ygen"] for d in data]

In [None]:
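true_pts = []
pred_pts = []

# Run inference batch by batch and collect the true and predicted pT.
for ibatch in range(len(Xs)):
    ret = model(Xs[ibatch])

    # Entries with a nonzero value in target column 0 are treated as true particles.
    mask_true_particles = ys[ibatch][..., 0] != 0

    # The true pT is read from column 2 of the target features.
    true_pt = ys[ibatch][mask_true_particles, 2]
    pred_pt = ret["pt"][mask_true_particles][..., 0].numpy()

    true_pts.append(true_pt)
    pred_pts.append(pred_pt)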

In [None]:
true_pt = np.concatenate(true_pts)
pred_pt = np.concatenate(pred_pts)

In [None]:
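# Ratio of predicted to true pT; a peak near 1 means good agreement.
plt.hist(pred_pt / true_pt, bins=np.linspace(0, 3, 100))
plt.xlabel("predicted pT / true pT")
plt.yscale("log")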

### pT (GeV) plot before standardization
