# 0.5 Generate Noisy Data and Fitting

This notebook generates and fits all of the data required for the paper. Noisy datasets are generated at the following noise levels, expressed as multiples of the standard deviation (STD) of the original data:

1, 2, 3, 4, 5, 6, 7, 8 STD

## Imports

In [3]:
import sys

sys.path.append("../../")
# sys.path.append("/home/ferroelectric/Documents/m3_learning/m3_learning/src")
sys.path.append('../../src')


In [4]:
%load_ext autoreload
%autoreload 2

import numpy as np
from m3_learning.be.dataset import BE_Dataset
from m3_learning.viz.printing import printer
from m3_learning.be.nn import SHO_fit_func_nn, SHO_Model
from m3_learning.util.file_IO import download_and_unzip



## Loading data for SHO fitting


In [5]:
# Download the data file from Zenodo
url = 'https://zenodo.org/record/7774788/files/PZT_2080_raw_data.h5?download=1'

# Specify the filename and the path to save the file
filename = '/data_raw.h5'
save_path = './Data'

# download the file
download_and_unzip(filename, url, save_path)

downloading data
...100%, 1663 MB, 22248 KB/s, 76 seconds passed
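Because `filename` starts with a slash, building the path as `save_path + '/' + filename` yields a doubled separator (`./Data//data_raw.h5`). POSIX systems accept this, but a normalized path is easier to read in logs. The following is a minimal sketch of one way to build it; the helper name `join_data_path` is hypothetical and not part of `m3_learning`.

```python
import posixpath


def join_data_path(save_path, filename):
    """Join a directory and a file name without doubling the separator."""
    # strip any leading slash so the file name is treated as relative
    relative_name = filename.lstrip("/")
    # normpath collapses redundant separators and a leading "./"
    return posixpath.normpath(posixpath.join(save_path, relative_name))


print(join_data_path("./Data", "/data_raw.h5"))
```

The sketch uses `posixpath` so that the result is the same on every platform; `os.path` would follow the conventions of the host operating system instead.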

In [4]:
data_path = save_path + '/' + filename

# instantiate the dataset object
dataset = BE_Dataset(data_path)

# print the contents of the file
dataset.print_be_tree()

No spectroscopic datasets found as attributes of /Measurement_000/Channel_000/Position_Indices
No position datasets found as attributes of /Raw_Data-SHO_Fit_000/Spectroscopic_Values



/
├ Measurement_000
  ---------------
  ├ Channel_000
    -----------
    ├ Bin_FFT
    ├ Bin_Frequencies
    ├ Bin_Indices
    ├ Bin_Step
    ├ Bin_Wfm_Type
    ├ Excitation_Waveform
    ├ Noise_Floor
    ├ Noisy_Data_1
    ├ Noisy_Data_2
    ├ Noisy_Data_3
    ├ Noisy_Data_4
    ├ Noisy_Data_5
    ├ Noisy_Data_6
    ├ Noisy_Data_7
    ├ Noisy_Data_8
    ├ Position_Indices
    ├ Position_Values
    ├ Raw_Data
    ├ Spatially_Averaged_Plot_Group_000
      ---------------------------------
      ├ Bin_Frequencies
      ├ Max_Response
      ├ Mean_Spectrogram
      ├ Min_Response
      ├ Spectroscopic_Parameter
      ├ Step_Averaged_Response
    ├ Spatially_Averaged_Plot_Group_001
      ---------------------------------
      ├ Bin_Frequencies
      ├ Max_Response
      ├ Mean_Spectrogram
      ├ Min_Response
      ├ Spectroscopic_Parameter
      ├ Step_Averaged_Response
    ├ Spectroscopic_Indices
    ├ Spectroscopic_Values
    ├ UDVS
    ├ UDVS_Indices
├ Noisy_Data_1_SHO_Fit
  --------

## Generate Noisy Data

This function generates noisy records and saves them as an h5_main dataset in the USID format, so that the data can be fitted with the Pycroscopy SHO Fitter.

In [5]:
# calculates the standard deviation and uses that for the noise
noise_STD = np.std(dataset.get_original_data)

# prints the standard deviation
print(noise_STD)

0.0038833667
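Band-excitation responses are typically stored as complex numbers. Assuming the original data is complex (the notebook does not state the dtype), `np.std` still returns a single real value, equal to the square root of the summed variances of the real and imaginary parts. The sketch below checks this on synthetic data; the array `z` is a stand-in, not the measured dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic complex data standing in for the real measurement (assumption)
z = rng.normal(size=1000) + 1j * rng.normal(scale=2.0, size=1000)

# np.std on complex input uses the absolute deviation from the complex mean
std_complex = np.std(z)
# the same quantity computed from the real and imaginary parts separately
std_from_parts = np.sqrt(np.var(z.real) + np.var(z.imag))

print(std_complex, std_from_parts)
```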


In [6]:
dataset.generate_noisy_data_records(noise_levels = np.arange(1,9), 
                                    verbose=True, 
                                    noise_STD=noise_STD)

The STD of the data is: 0.0038833667058497667
Adding noise level 1
Adding noise level 2
Adding noise level 3
Adding noise level 4
Adding noise level 5
Adding noise level 6
Adding noise level 7
Adding noise level 8
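One plausible reading of the noise levels, sketched below, is that level `n` adds zero-mean Gaussian noise whose standard deviation is `n * noise_STD`. This is an assumption about what `generate_noisy_data_records` does, not a copy of its implementation; the helper `add_noise` and the synthetic `clean` array are hypothetical.

```python
import numpy as np


def add_noise(data, level, noise_std, rng):
    """Return complex `data` plus Gaussian noise of std level * noise_std."""
    scale = level * noise_std
    # split the variance evenly between the real and imaginary parts so
    # that the complex noise has a total standard deviation of `scale`
    part_scale = scale / np.sqrt(2)
    noise = rng.normal(scale=part_scale, size=data.shape) + 1j * rng.normal(
        scale=part_scale, size=data.shape
    )
    return data + noise


rng = np.random.default_rng(0)
clean = np.zeros(200_000, dtype=complex)  # synthetic stand-in for a spectrum
noise_std = 0.0038833667  # STD value printed by the notebook

for level in range(1, 9):
    noisy = add_noise(clean, level, noise_std, rng)
    # the measured noise STD should scale linearly with the level
    print(level, np.std(noisy - clean))
```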


## SHO fits on all the datasets

This will take some time; each fit takes about 10 minutes to complete.
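For context, an SHO fit adjusts four parameters (amplitude, resonance frequency, quality factor, and phase) so that a model response matches each measured spectrum. The sketch below evaluates one commonly used form of the complex SHO response. The exact expression used by the Pycroscopy fitter is assumed here rather than taken from this notebook, and the function name `sho_response` and all parameter values are illustrative.

```python
import numpy as np


def sho_response(freq, amplitude, resonance, q_factor, phase):
    """Complex response of a driven, damped simple harmonic oscillator."""
    numerator = amplitude * np.exp(1j * phase) * resonance**2
    denominator = freq**2 - 1j * freq * resonance / q_factor - resonance**2
    return numerator / denominator


# illustrative parameter values, not taken from the dataset
freq = np.linspace(1.30e6, 1.34e6, 81)
response = sho_response(
    freq, amplitude=1e-3, resonance=1.32e6, q_factor=200.0, phase=0.5
)

# at resonance the magnitude equals amplitude * q_factor
peak = np.abs(sho_response(1.32e6, 1e-3, 1.32e6, 200.0, 0.5))
print(peak)
```

A higher quality factor gives a taller and narrower peak, which is why noise that is large relative to the peak height makes the fitted parameters less reliable.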

In [7]:
# fit every noisy dataset as well as the original raw data
out = [f"Noisy_Data_{i}" for i in np.arange(1, 9)]
out.append("Raw_Data")

for data in out:
    print(f"Fitting {data}")
    dataset.SHO_Fitter(
        dataset=data,
        h5_sho_targ_grp=f"{data}_SHO_Fit",
        max_mem=1024 * 64,
        max_cores=20,
    )

Fitting Noisy_Data_1
Working on:
./Data//data_raw_unmod.h5
['Y', 'X'] [60, 60]


SHO Fits will be written to:
./Data/data_raw_unmod.h5


Consider calling test() to check results before calling compute() which computes on the entire dataset and writes results to the HDF5 file
	This class (likely) supports interruption and resuming of computations!
	If you are operating in a python console, press Ctrl+C or Cmd+C to abort
	If you are in a Jupyter notebook, click on "Kernel">>"Interrupt"
	If you are operating on a cluster and your job gets killed, re-run the job to resume

Rank 0 finished parallel computation
Rank 0 - 7% complete. Time remaining: 2.02 mins
Rank 0 finished parallel computation
Rank 0 - 15% complete. Time remaining: 2.47 mins
Rank 0 finished parallel computation
Rank 0 - 23% complete. Time remaining: 2.06 mins
Rank 0 finished parallel computation
Rank 0 - 31% complete. Time remaining: 1.77 mins
Rank 0 finished parallel computation
Rank 0 - 39% complete. Time remaining: 1.52 

Rank 0 finished parallel computation
Rank 0 - 31% complete. Time remaining: 1.59 mins
Rank 0 finished parallel computation
Rank 0 - 39% complete. Time remaining: 1.41 mins
Rank 0 finished parallel computation
Rank 0 - 47% complete. Time remaining: 1.23 mins
Rank 0 finished parallel computation
Rank 0 - 54% complete. Time remaining: 1.06 mins


Rank 0 finished parallel computation
Rank 0 - 62% complete. Time remaining: 52.19 sec
Rank 0 finished parallel computation
Rank 0 - 70% complete. Time remaining: 41.51 sec
Rank 0 finished parallel computation
Rank 0 - 78% complete. Time remaining: 30.48 sec
Rank 0 finished parallel computation
Rank 0 - 86% complete. Time remaining: 19.46 sec
Rank 0 finished parallel computation
Rank 0 - 94% complete. Time remaining: 8.48 sec
Rank 0 finished parallel computation
Rank 0 - 100% complete. Time remaining: 0.0 msec
Finished processing the entire dataset!

Note: SHO_Fit has already been performed with the same parameters before. These results will be returned by compute() by default. Set override to True to force fresh computation

[<HDF5 group "/Noisy_Data_5_SHO_Fit/Noisy_Data_5-SHO_Fit_000" (4 members)>]



### Check that the results were saved correctly

In [8]:
# print the contents of the file
dataset.print_be_tree()

/
├ Measurement_000
  ---------------
  ├ Channel_000
    -----------
    ├ Bin_FFT
    ├ Bin_Frequencies
    ├ Bin_Indices
    ├ Bin_Step
    ├ Bin_Wfm_Type
    ├ Excitation_Waveform
    ├ Noise_Floor
    ├ Noisy_Data_1
    ├ Noisy_Data_2
    ├ Noisy_Data_3
    ├ Noisy_Data_4
    ├ Noisy_Data_5
    ├ Noisy_Data_6
    ├ Noisy_Data_7
    ├ Noisy_Data_8
    ├ Position_Indices
    ├ Position_Values
    ├ Raw_Data
    ├ Spatially_Averaged_Plot_Group_000
      ---------------------------------
      ├ Bin_Frequencies
      ├ Max_Response
      ├ Mean_Spectrogram
      ├ Min_Response
      ├ Spectroscopic_Parameter
      ├ Step_Averaged_Response
    ├ Spatially_Averaged_Plot_Group_001
      ---------------------------------
      ├ Bin_Frequencies
      ├ Max_Response
      ├ Mean_Spectrogram
      ├ Min_Response
      ├ Spectroscopic_Parameter
      ├ Step_Averaged_Response
    ├ Spectroscopic_Indices
    ├ Spectroscopic_Values
    ├ UDVS
    ├ UDVS_Indices
├ Noisy_Data_1_SHO_Fit
  --------
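The tree can also be checked programmatically. The sketch below uses a hypothetical helper, `missing_fit_groups`, to compare a list of top-level group names against the names the fitting loop is expected to create (`Noisy_Data_1_SHO_Fit` through `Noisy_Data_8_SHO_Fit`, plus `Raw_Data_SHO_Fit`). The names are typed in by hand here; with `h5py` they could come from `list(h5_file.keys())` on the open file.

```python
def missing_fit_groups(top_level_names, noise_levels):
    """List expected SHO-fit groups that are absent from `top_level_names`."""
    expected = [f"Noisy_Data_{i}_SHO_Fit" for i in noise_levels]
    expected.append("Raw_Data_SHO_Fit")
    present = set(top_level_names)
    return [name for name in expected if name not in present]


# names typed by hand for illustration; only the first fit group is listed
names = ["Measurement_000", "Noisy_Data_1_SHO_Fit"]
print(missing_fit_groups(names, range(1, 9)))
```

An empty list means every expected fit group is present at the top level of the file.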