# Post-Training Quantization

Task as defined by Joosep:
``I would perhaps start with post-training quantization``.
either in tensorflow: https://www.tensorflow.org/model_optimization/guide/quantization/post_training

or on top of the exported ONNX model:
https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html

``you need to set up a basic code that performs post-training quantization on a trained model, evaluated the speed and loss of physics performance
``

## Post Training quantization in TensorFlow 

Post-training quantization is a conversion technique that can reduce model size while also improving CPU and hardware accelerator latency, with little degradation in model accuracy.

### Optimization methods
There are several post-training quantization options to choose from. 

| Model Technique         | Benefits                            | Hardware                                    |
|-------------------------|------------------------------------|---------------------------------------------|
| Dynamic range quantization | 4x smaller, 2x-3x speedup       | CPU                                         |
| Full integer quantization  | 4x smaller, 3x+ speedup         | CPU, Edge TPU, Microcontrollers             |
| Float16 quantization      | 2x smaller, GPU acceleration   | CPU, GPU                                    |


The following decision tree can help determine which post-training quantization method is best for your use case:
![images_1](https://www.tensorflow.org/static/lite/performance/images/optimization.jpg)

We will start with `Dynamic Range Quantization` because it provides reduced memory usage and faster computation without you having to provide a representative dataset for calibration. This type of quantization, statically quantizes only the weights from floating point to integer at conversion time, which provides 8-bits of precision:
```python
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
```
To further reduce latency during inference, "dynamic-range" operators dynamically quantize activations based on their range to 8-bits and perform computations with 8-bit weights and activations. This optimization provides latencies close to fully fixed-point inferences. However, the outputs are still stored using floating point so the increased speed of dynamic-range ops is less than a full fixed-point computation.


**Quantization involves reducing the precision of the weights and activations in a model, typically from 32-bit floating point values to 8-bit integers.**

In [63]:
import logging
logging.getLogger("tensorflow").setLevel(logging.DEBUG)

import tensorflow as tf
from tensorflow import keras
import numpy as np
import pathlib
import joblib


import h5py
import pickle
import pandas as pd

import sys

In [11]:
!wget https://huggingface.co/jpata/particleflow/blob/clic_clusters_v1.6/opt-96-5.346523.pkl

--2023-10-19 12:16:29--  https://huggingface.co/jpata/particleflow/blob/clic_clusters_v1.6/opt-96-5.346523.pkl
Resolving huggingface.co (huggingface.co)... 2600:9000:248c:1a00:17:b174:6d00:93a1, 2600:9000:248c:b200:17:b174:6d00:93a1, 2600:9000:248c:1000:17:b174:6d00:93a1, ...
Connecting to huggingface.co (huggingface.co)|2600:9000:248c:1a00:17:b174:6d00:93a1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42096 (41K) [text/html]
Saving to: ‘opt-96-5.346523.pkl.1’


2023-10-19 12:16:29 (416 KB/s) - ‘opt-96-5.346523.pkl.1’ saved [42096/42096]



In [10]:
!wget https://huggingface.co/jpata/particleflow/blob/clic_clusters_v1.6/weights-96-5.346523.hdf5

--2023-10-19 12:15:39--  https://huggingface.co/jpata/particleflow/blob/clic_clusters_v1.6/weights-96-5.346523.hdf5
Resolving huggingface.co (huggingface.co)... 2600:9000:248c:9000:17:b174:6d00:93a1, 2600:9000:248c:7e00:17:b174:6d00:93a1, 2600:9000:248c:ea00:17:b174:6d00:93a1, ...
Connecting to huggingface.co (huggingface.co)|2600:9000:248c:9000:17:b174:6d00:93a1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 37791 (37K) [text/html]
Saving to: ‘weights-96-5.346523.hdf5’


2023-10-19 12:15:40 (387 KB/s) - ‘weights-96-5.346523.hdf5’ saved [37791/37791]



### Reading the HDF5 Files

Method I

In [21]:
with h5py.File('weights-96-5.346523.hdf5', 'r') as f:
    # Print all root level object names (aka keys) 
    # these can be group or dataset names 
    print("Keys: %s" % f.keys())
    # get first object name/key; may or may NOT be a group
    a_group_key = list(f.keys())[0]
    
    # get the object type for a_group_key: usually group or dataset
    print(type(f[a_group_key])) 


Keys: <KeysViewHDF5 ['cg_id_0', 'cg_id_1', 'cg_id_2', 'cg_id_3', 'cg_id_4', 'cg_id_5', 'cg_reg_0', 'cg_reg_1', 'cg_reg_2', 'cg_reg_3', 'cg_reg_4', 'cg_reg_5', 'input_encoding_clic', 'node_encoding', 'normalization', 'output_decoding', 'top_level_model_weights']>
<class 'h5py._hl.group.Group'>


Method II

In [30]:
f = h5py.File('weights-96-5.346523.hdf5')

# Studying the structure of the file by printing what HDF5 groups are present

for key in f.keys():
    print(key) #Names of the root level object names in HDF5 file - can be groups or datasets.
    print(type(f[key])) # get the object type: usually group or dataset

cg_id_0
<class 'h5py._hl.group.Group'>
cg_id_1
<class 'h5py._hl.group.Group'>
cg_id_2
<class 'h5py._hl.group.Group'>
cg_id_3
<class 'h5py._hl.group.Group'>
cg_id_4
<class 'h5py._hl.group.Group'>
cg_id_5
<class 'h5py._hl.group.Group'>
cg_reg_0
<class 'h5py._hl.group.Group'>
cg_reg_1
<class 'h5py._hl.group.Group'>
cg_reg_2
<class 'h5py._hl.group.Group'>
cg_reg_3
<class 'h5py._hl.group.Group'>
cg_reg_4
<class 'h5py._hl.group.Group'>
cg_reg_5
<class 'h5py._hl.group.Group'>
input_encoding_clic
<class 'h5py._hl.group.Group'>
node_encoding
<class 'h5py._hl.group.Group'>
normalization
<class 'h5py._hl.group.Group'>
output_decoding
<class 'h5py._hl.group.Group'>
top_level_model_weights
<class 'h5py._hl.group.Group'>


In [47]:

# Open the HDF5 file
with h5py.File('weights-96-5.346523.hdf5', 'r') as file:
    # Loop through the keys and read the data
    for key in file.keys():
#         print(f"Key: {key}")

        # Check if the key refers to a group
        if isinstance(file[key], h5py.Group):
            # Access the group and print its keys
            group = file[key]
            print(f"Keys within {key}: {list(group.keys())}")

#             for subkey in group.keys():
#                 data = group[subkey][()]
#                 print(f"Data in {subkey}:")
#                 print(data)
            
        
            

Keys within cg_id_0: ['cg_id_0', 'cg_id_0_ffn_dist_dense_0', 'cg_id_0_ffn_dist_dense_1', 'cg_id_0_ffn_dist_dense_2', 'cg_id_0_ffn_dist_dense_3']
Keys within cg_id_1: ['cg_id_1', 'cg_id_1_ffn_dist_dense_0', 'cg_id_1_ffn_dist_dense_1', 'cg_id_1_ffn_dist_dense_2', 'cg_id_1_ffn_dist_dense_3']
Keys within cg_id_2: ['cg_id_2', 'cg_id_2_ffn_dist_dense_0', 'cg_id_2_ffn_dist_dense_1', 'cg_id_2_ffn_dist_dense_2', 'cg_id_2_ffn_dist_dense_3']
Keys within cg_id_3: ['cg_id_3', 'cg_id_3_ffn_dist_dense_0', 'cg_id_3_ffn_dist_dense_1', 'cg_id_3_ffn_dist_dense_2', 'cg_id_3_ffn_dist_dense_3']
Keys within cg_id_4: ['cg_id_4', 'cg_id_4_ffn_dist_dense_0', 'cg_id_4_ffn_dist_dense_1', 'cg_id_4_ffn_dist_dense_2', 'cg_id_4_ffn_dist_dense_3']
Keys within cg_id_5: ['cg_id_5', 'cg_id_5_ffn_dist_dense_0', 'cg_id_5_ffn_dist_dense_1', 'cg_id_5_ffn_dist_dense_2', 'cg_id_5_ffn_dist_dense_3']
Keys within cg_reg_0: ['cg_reg_0', 'cg_reg_0_ffn_dist_dense_0', 'cg_reg_0_ffn_dist_dense_1', 'cg_reg_0_ffn_dist_dense_2', 'cg_reg_

In [90]:
# Open the HDF5 file
with h5py.File('weights-96-5.346523.hdf5', 'r') as file:
    # Access the group 'cg_id_0'
    group_cg_id_0 = file['cg_id_0']
    
    # List the keys within the group
    keys_in_cg_id_0 = list(group_cg_id_0.keys())
    print(f"Keys within cg_id_0: {keys_in_cg_id_0}")
    
    #Choose a subkey
    subkey='cg_id_0'
    
    subgroup = group_cg_id_0[subkey]
    
#     print(subgroup)
    
    keys_in_subgroup = list(subgroup.keys())
    print(f"Keys within {subkey}: {keys_in_subgroup}")

    

Keys within cg_id_0: ['cg_id_0', 'cg_id_0_ffn_dist_dense_0', 'cg_id_0_ffn_dist_dense_1', 'cg_id_0_ffn_dist_dense_2', 'cg_id_0_ffn_dist_dense_3']
Keys within cg_id_0: ['cg_id_0_layernorm1', 'cg_id_0_msg_0', 'cg_id_0_msg_1', 'message_building_layer_lsh']


## Reading files within the subgroup

In [59]:
with h5py.File('weights-96-5.346523.hdf5', 'r') as file:
    # Access the subgroup 'cg_id_0_ffn_dist_dense_0'
    subgroup_cg_id_0_ffn_dist_dense_0 = file['cg_id_0']['cg_id_0_ffn_dist_dense_0']
    
    # List the keys within the subgroup
    keys_in_subgroup = list(subgroup_cg_id_0_ffn_dist_dense_0.keys())
    print(f"Keys within cg_id_0_ffn_dist_dense_0: {keys_in_subgroup}")
    
    # Access and print data within the subgroup
    for subkey in keys_in_subgroup:
        data = subgroup_cg_id_0_ffn_dist_dense_0[subkey][()]
        print(f"Data in {subkey}:")
        print(data)

Keys within cg_id_0_ffn_dist_dense_0: ['bias:0', 'kernel:0']
Data in bias:0:
[-0.42805317  0.9052358  -0.30682716  0.457701   -0.77937907  0.21298797
 -0.64650166 -0.54155844  0.62915844  0.71274954 -0.852341   -0.42336592
 -0.5791534   0.4973752   0.6579737  -0.2940757  -0.43727416 -0.3596791
 -0.53477967  0.2825041   0.52568066  0.38996714  0.349633   -0.9086682
  0.38351122  1.346358   -0.7207639   0.28883758  0.70375586  0.8860439
 -0.41023213 -0.4066803   0.05410811 -0.06734214 -0.56115437 -0.58051896
 -0.3027428   0.74306834  0.45497474 -0.03278346 -0.24062441 -0.5881686
  0.41746095  0.42180288  0.10265758 -0.2534956  -0.15646602 -0.9777575
  0.8568099  -0.53954774  0.44354475 -0.4319153   0.63953334  0.18238714
 -0.7146191  -0.97242945 -0.22551654  0.22472627 -0.4724067  -0.99339795
 -0.9917985   0.13679531  0.31225142 -0.6768131 ]
Data in kernel:0:
[[ 0.31398082 -0.5096557  -0.03505608 ...  0.08097732  0.19537257
  -0.35680908]
 [-0.04176841  0.10870415  0.01572619 ...  0.0541

##### applying the post training quantization on the weights(bais and Kernel) stored in a subsubgroup of the HDF5 file.

In [61]:
# Open the HDF5 file
with h5py.File('weights-96-5.346523.hdf5', 'r') as file:
    # Access the subgroup 'cg_id_0_ffn_dist_dense_0'
    subgroup_cg_id_0_ffn_dist_dense_0 = file['cg_id_0']['cg_id_0_ffn_dist_dense_0']
    
    # Load the weights (bias and kernel)
    bias = subgroup_cg_id_0_ffn_dist_dense_0['bias:0'][()]  # Load bias
    kernel = subgroup_cg_id_0_ffn_dist_dense_0['kernel:0'][()]  # Load kernel
    
    # Apply post-training quantization
    # Apply post-training quantization to the weights
    quantized_bias = tf.quantization.fake_quant_with_min_max_args(
        bias, min=-10, max=10, num_bits=8)
    quantized_kernel = tf.quantization.fake_quant_with_min_max_args(
        kernel, min=-10, max=10, num_bits=8)

In [64]:
# Calculate the sizes
original_bias_size = sys.getsizeof(bias)
original_kernel_size = sys.getsizeof(kernel)
quantized_bias_size = sys.getsizeof(quantized_bias)
quantized_kernel_size = sys.getsizeof(quantized_kernel)


In [66]:
# Print the sizes
print(f"Output with Fake Quantization")
print(f"=====================================")
print(f"Original Bias Size: {original_bias_size} bytes")
print(f"Original Kernel Size: {original_kernel_size} bytes")
print(f"Quantized Bias Size: {quantized_bias_size} bytes")
print(f"Quantized Kernel Size: {quantized_kernel_size} bytes")

Output with Fake Quantization
Original Bias Size: 368 bytes
Original Kernel Size: 65664 bytes
Quantized Bias Size: 168 bytes
Quantized Kernel Size: 168 bytes


#### Full integer Quantization

In [80]:
# Define a function to perform full integer quantization
def full_integer_quantization(weights, num_bits=8):
    # Determine the range of values
    min_val = np.min(weights)
    max_val = np.max(weights)
    
    # Define the quantization range
    q_min = 0
    q_max = 2**num_bits - 1
    
    # Scale and quantize the weights
    scale = (max_val - min_val) / (q_max - q_min)
    zero_point = q_min - min_val / scale
    
    quantized_weights = np.round(weights / scale + zero_point)
    return quantized_weights.astype(np.uint8)


# Open the HDF5 file
with h5py.File('weights-96-5.346523.hdf5', 'r') as file:
    # Access the subgroup 'cg_id_0_ffn_dist_dense_0'
    subgroup_cg_id_0_ffn_dist_dense_0 = file['cg_id_0']['cg_id_0_ffn_dist_dense_0']
    
    # Load the weights (bias and kernel)
    bias = subgroup_cg_id_0_ffn_dist_dense_0['bias:0'][()]  # Load bias
    kernel = subgroup_cg_id_0_ffn_dist_dense_0['kernel:0'][()]  # Load kernel
    
    # Apply full integer quantization to the weights
    quantized_bias = full_integer_quantization(bias)
    quantized_kernel = full_integer_quantization(kernel)
    
    
# Calculate the sizes
original_bias_size = sys.getsizeof(bias)
original_kernel_size = sys.getsizeof(kernel)
quantized_bias_size = sys.getsizeof(quantized_bias)
quantized_kernel_size = sys.getsizeof(quantized_kernel)


# Print the sizes
print(f"Output with Full integer Quantization")
print(f"=====================================")
print(f"Original Bias Size: {original_bias_size} bytes")
print(f"Original Kernel Size: {original_kernel_size} bytes")
print(f"Quantized Bias Size: {quantized_bias_size} bytes")
print(f"Quantized Kernel Size: {quantized_kernel_size} bytes")

# Calculate the reduction factor
reduction_factor_bias = original_bias_size / quantized_bias_size
reduction_factor_kernel = original_kernel_size / quantized_kernel_size
print(f"Reduction Factor for Bias: {reduction_factor_bias}")
print(f"Reduction Factor for Kernel: {reduction_factor_kernel}")


Output with Full integer Quantization
Original Bias Size: 368 bytes
Original Kernel Size: 65664 bytes
Quantized Bias Size: 176 bytes
Quantized Kernel Size: 16512 bytes
Reduction Factor for Bias: 2.090909090909091
Reduction Factor for Kernel: 3.9767441860465116


In [79]:
import h5py
import tensorflow as tf

# Define a function to apply dynamic range quantization
def dynamic_range_quantization(weights):
    # Define the quantization range dynamically based on the weights
    min_val = tf.reduce_min(weights)
    max_val = tf.reduce_max(weights)
    num_bits = 8  # Define the number of bits for quantization

    # Compute the scale and zero point
    scale = (max_val - min_val) / ((2 ** num_bits) - 1)
    zero_point = tf.round(-min_val / scale)

    # Apply quantization and dequantization
    quantized_weights = tf.round(weights / scale) - zero_point
    dequantized_weights = (quantized_weights + zero_point) * scale

    return dequantized_weights

# Open the HDF5 file
with h5py.File('weights-96-5.346523.hdf5', 'r') as file:
    # Access the subgroup 'cg_id_0_ffn_dist_dense_0'
    subgroup_cg_id_0_ffn_dist_dense_0 = file['cg_id_0']['cg_id_0_ffn_dist_dense_0']
    
    # Load the weights (bias and kernel)
    bias = subgroup_cg_id_0_ffn_dist_dense_0['bias:0'][()]  # Load bias
    kernel = subgroup_cg_id_0_ffn_dist_dense_0['kernel:0'][()]  # Load kernel

    # Apply dynamic range quantization to the weights
    quantized_bias = dynamic_range_quantization(bias)
    quantized_kernel = dynamic_range_quantization(kernel)

    
# Calculate the sizes
original_bias_size = sys.getsizeof(bias)
original_kernel_size = sys.getsizeof(kernel)
quantized_bias_size = sys.getsizeof(quantized_bias)
quantized_kernel_size = sys.getsizeof(quantized_kernel)


# Print the sizes
print(f"Output with Dynamic Range Quantization")
print(f"=====================================")
print(f"Original Bias Size: {original_bias_size} bytes")
print(f"Original Kernel Size: {original_kernel_size} bytes")
print(f"Quantized Bias Size: {quantized_bias_size} bytes")
print(f"Quantized Kernel Size: {quantized_kernel_size} bytes")

# Calculate the reduction factor
reduction_factor_bias = original_bias_size / quantized_bias_size
reduction_factor_kernel = original_kernel_size / quantized_kernel_size
print(f"Reduction Factor for Bias: {reduction_factor_bias}")
print(f"Reduction Factor for Kernel: {reduction_factor_kernel}")


Output with Dynamic Range Quantization
Original Bias Size: 368 bytes
Original Kernel Size: 65664 bytes
Quantized Bias Size: 168 bytes
Quantized Kernel Size: 168 bytes
Reduction Factor for Bias: 2.1904761904761907
Reduction Factor for Kernel: 390.85714285714283


In [78]:
import h5py
import tensorflow as tf

# Define a function to apply Float16 quantization
def float16_quantization(weights):
    return tf.dtypes.cast(weights, tf.float16)

# Open the HDF5 file
with h5py.File('weights-96-5.346523.hdf5', 'r') as file:
    # Access the subgroup 'cg_id_0_ffn_dist_dense_0'
    subgroup_cg_id_0_ffn_dist_dense_0 = file['cg_id_0']['cg_id_0_ffn_dist_dense_0']
    
    # Load the weights (bias and kernel)
    bias = subgroup_cg_id_0_ffn_dist_dense_0['bias:0'][()]  # Load bias
    kernel = subgroup_cg_id_0_ffn_dist_dense_0['kernel:0'][()]  # Load kernel

    # Apply Float16 quantization to the weights
    float16_bias = float16_quantization(bias)
    float16_kernel = float16_quantization(kernel)

    
# Calculate the sizes
original_bias_size = sys.getsizeof(bias)
original_kernel_size = sys.getsizeof(kernel)
quantized_bias_size = sys.getsizeof(quantized_bias)
quantized_kernel_size = sys.getsizeof(quantized_kernel)


# Print the sizes
print(f"Output with Float 16 Quantization")
print(f"=====================================")
print(f"Original Bias Size: {original_bias_size} bytes")
print(f"Original Kernel Size: {original_kernel_size} bytes")
print(f"Quantized Bias Size: {quantized_bias_size} bytes")
print(f"Quantized Kernel Size: {quantized_kernel_size} bytes")

# Calculate the reduction factor
reduction_factor_bias = original_bias_size / quantized_bias_size
reduction_factor_kernel = original_kernel_size / quantized_kernel_size
print(f"Reduction Factor for Bias: {reduction_factor_bias}")
print(f"Reduction Factor for Kernel: {reduction_factor_kernel}")


Output with Float 16 Quantization
Original Bias Size: 368 bytes
Original Kernel Size: 65664 bytes
Quantized Bias Size: 168 bytes
Quantized Kernel Size: 168 bytes
Reduction Factor for Bias: 2.1904761904761907
Reduction Factor for Kernel: 390.85714285714283


Issuses:
1. how the kernel quantization can give 390x in FP16 and Dynamic Range?
2. there are a few files inside the ['cg_id_0']['cg_id_0'], do we need to work on those as well?
3. The Files are not organised on this notebook, organise it. 

## Applying on the whole Subgroup

### `cg_id_0`

#### Dynamic Range Quantization

In [91]:
# Define a function to apply dynamic range quantization
def dynamic_range_quantization(weights):
    # Define the quantization range dynamically based on the weights
    min_val = tf.reduce_min(weights)
    max_val = tf.reduce_max(weights)
    num_bits = 8  # Define the number of bits for quantization

    # Compute the scale and zero point
    scale = (max_val - min_val) / ((2 ** num_bits) - 1)
    zero_point = tf.round(-min_val / scale)

    # Apply quantization and dequantization
    quantized_weights = tf.round(weights / scale) - zero_point
    dequantized_weights = (quantized_weights + zero_point) * scale

    return dequantized_weights



# Define a function to get the size of an object in bytes
def get_size(obj):
    return sys.getsizeof(obj)

# Open the HDF5 file
with h5py.File('weights-96-5.346523.hdf5', 'r') as file:
    # Access the subgroup 'cg_id_0_ffn_dist_dense_0'
    subgroup_cg_id_0_ffn_dist_dense_0 = file['cg_id_0']['cg_id_0_ffn_dist_dense_0']
    
    # Load the weights (bias and kernel)
    bias = subgroup_cg_id_0_ffn_dist_dense_0['bias:0'][()]  # Load bias
    kernel = subgroup_cg_id_0_ffn_dist_dense_0['kernel:0'][()]  # Load kernel

    # Apply dynamic range quantization to the weights
    quantized_bias = dynamic_range_quantization(bias)
    quantized_kernel = dynamic_range_quantization(kernel)

    # Calculate original and quantized sizes
    original_bias_size = get_size(bias)
    original_kernel_size = get_size(kernel)
    quantized_bias_size = get_size(quantized_bias)
    quantized_kernel_size = get_size(quantized_kernel)
    
     print(f"Output with Dynamic Range Quantization")
        print(f"=====================================")
        print(f"Subgroup Name: {subgroup_name}")
        print(f"Original Bias Size: {original_bias_size} bytes")
        print(f"Original Kernel Size: {original_kernel_size} bytes")
        print(f"Quantized Bias Size: {quantized_bias_size} bytes")
        print(f"Quantized Kernel Size: {quantized_kernel_size} bytes")
        print(f"Reduction Factor for Bias: {original_bias_size / quantized_bias_size}x")
        print(f"Reduction Factor for Kernel: {original_kernel_size / quantized_kernel_size}x")
        print('======================')
        print()
        

# Calculate total sizes
total_original_size = original_bias_size + original_kernel_size
total_quantized_size = quantized_bias_size + quantized_kernel_size

print(f"Total Original Size: {total_original_size} bytes")
print(f"Total Quantized Size: {total_quantized_size} bytes")
print(f"Reduction Factor: {total_original_size / total_quantized_size}x")

Total Original Size: 66032 bytes
Total Quantized Size: 336 bytes
Reduction Factor: 196.52380952380952x


#### Float 16 Quantization

In [89]:
# Define a function to apply Float16 quantization
def float16_quantization(weights):
    return tf.dtypes.cast(weights, tf.float16)

# Define a function to get the size of an object in bytes
def get_size(obj):
    return sys.getsizeof(obj)

# Open the HDF5 file
with h5py.File('weights-96-5.346523.hdf5', 'r') as file:
    # Access the group 'cg_id_0'
    group_cg_id_0 = file['cg_id_0']
    
    # Define a list of subgroup names
    subgroup_names = ['cg_id_0_ffn_dist_dense_0', 
                      'cg_id_0_ffn_dist_dense_1', 
                      'cg_id_0_ffn_dist_dense_2', 
                      'cg_id_0_ffn_dist_dense_3']
    
    for subgroup_name in subgroup_names:
        # Access the subgroup
        subgroup = group_cg_id_0[subgroup_name]
        
        # Load the weights (bias and kernel)
        bias = subgroup['bias:0'][()]  # Load bias
        kernel = subgroup['kernel:0'][()]  # Load kernel

        # Apply Float16 quantization to the weights
        float16_bias = float16_quantization(bias)
        float16_kernel = float16_quantization(kernel)

        # Print the sizes and reduction factors
        original_bias_size = get_size(bias)
        original_kernel_size = get_size(kernel)
        quantized_bias_size = get_size(float16_bias)
        quantized_kernel_size = get_size(float16_kernel)
        
        print(f"Output with Float 16 Quantization")
        print(f"=====================================")
        print(f"Subgroup Name: {subgroup_name}")
        print(f"Original Bias Size: {original_bias_size} bytes")
        print(f"Original Kernel Size: {original_kernel_size} bytes")
        print(f"Quantized Bias Size: {quantized_bias_size} bytes")
        print(f"Quantized Kernel Size: {quantized_kernel_size} bytes")
        print(f"Reduction Factor for Bias: {original_bias_size / quantized_bias_size}x")
        print(f"Reduction Factor for Kernel: {original_kernel_size / quantized_kernel_size}x")
        print('======================')
        print()
        
# Calculate total sizes
total_original_size = original_bias_size + original_kernel_size
total_quantized_size = quantized_bias_size + quantized_kernel_size
    
print(f"Total Original Size: {total_original_size} bytes")
print(f"Total Quantized Size: {total_quantized_size} bytes")
print(f"Reduction Factor: {total_original_size / total_quantized_size}x")


Output with Float 16 Quantization
Subgroup Name: cg_id_0_ffn_dist_dense_0
Original Bias Size: 368 bytes
Original Kernel Size: 65664 bytes
Quantized Bias Size: 168 bytes
Quantized Kernel Size: 168 bytes
Reduction Factor for Bias: 2.1904761904761907x
Reduction Factor for Kernel: 390.85714285714283x

Output with Float 16 Quantization
Subgroup Name: cg_id_0_ffn_dist_dense_1
Original Bias Size: 368 bytes
Original Kernel Size: 16512 bytes
Quantized Bias Size: 168 bytes
Quantized Kernel Size: 168 bytes
Reduction Factor for Bias: 2.1904761904761907x
Reduction Factor for Kernel: 98.28571428571429x

Output with Float 16 Quantization
Subgroup Name: cg_id_0_ffn_dist_dense_2
Original Bias Size: 368 bytes
Original Kernel Size: 16512 bytes
Quantized Bias Size: 168 bytes
Quantized Kernel Size: 168 bytes
Reduction Factor for Bias: 2.1904761904761907x
Reduction Factor for Kernel: 98.28571428571429x

Output with Float 16 Quantization
Subgroup Name: cg_id_0_ffn_dist_dense_3
Original Bias Size: 624 bytes
O