# Analyse test SCIO data

## Rationale

To calculate the absorbance spectrum, use the following formula:

$A = -\log_{10}\left(\frac{S - D - G}{SW - SWD - SWG}\right)$

Where:

- A is the absorbance spectrum
- S is the sample spectrum
- D is the sample dark spectrum
- G is the sample gradient spectrum
- SW is the sample white spectrum
- SWD is the sample white dark spectrum
- SWG is the sample white gradient spectrum

The subtractions and divisions above are part of a common data preprocessing step in spectrophotometry called baseline correction. The idea is to remove any contributions to the measured signal that are not due to the sample itself, but rather due to the instrument or the surrounding environment. This helps to enhance the accuracy and reliability of the measurements, particularly in the presence of noise or other sources of variability.

The specific steps involved in baseline correction can vary depending on the particular instrument and experimental setup, but some common approaches include:

- Subtracting a background or reference spectrum, such as a dark spectrum or a blank sample, to remove any contributions from the instrument or the surrounding environment.
- Dividing the sample spectrum by a reference spectrum, such as a white or standard spectrum, to correct for any variations in the intensity or wavelength response of the instrument.
- Applying mathematical transformations, such as smoothing or differentiation, to remove any high-frequency noise or artifacts in the data.

In the specific example given above, the baseline correction involves subtracting the dark and gradient spectra from both the sample measurement and the white reference measurement to remove contributions from the instrument or the environment, and then dividing the corrected sample spectrum by the corrected white spectrum to correct for variations in the instrument response. The resulting absorbance spectrum should represent the contribution of the sample alone, without interference from other sources.
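
The third approach in the list above (smoothing or differentiation) is not part of the absorbance formula, so here is a minimal sketch of it on synthetic data. The Gaussian test signal, the noise level, and the 11-point window are assumptions chosen for illustration; only the 740 to 1070 nm axis with 331 points mirrors the axis used later in this notebook.

```python
import numpy as np

# Wavelength axis: 331 points from 740 nm to 1070 nm (1 nm step)
wavelengths = np.linspace(740, 1070, 331)

# Synthetic spectrum: a smooth Gaussian band plus random noise (made-up data)
rng = np.random.default_rng(0)
clean = np.exp(-((wavelengths - 900) / 40) ** 2)
noisy = clean + rng.normal(0, 0.02, wavelengths.size)

# Moving-average smoothing with an 11-point window
window = 11
kernel = np.ones(window) / window
smoothed = np.convolve(noisy, kernel, mode="same")

# First derivative of the smoothed spectrum with respect to wavelength
derivative = np.gradient(smoothed, wavelengths)

print(smoothed.shape, derivative.shape)
```

A moving average is the simplest possible filter; it distorts the first and last few points, so edge values should be treated with care.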

The gradient spectrum is a measurement of how the intensity of the light changes over the wavelength range. It is typically measured by taking a set of measurements with a varying concentration of a sample, which allows one to calculate the slope of the absorbance vs. concentration curve at each wavelength. This is useful for determining the sensitivity of the spectrometer at different wavelengths, as well as for correcting for changes in the intensity of the light source over time. The gradient spectrum can be subtracted from the sample spectrum to correct for any changes in the intensity of the light source during the measurement.

In [1]:
import numpy as np

def calculate_absorbance_spectrum(sample, sample_dark, sample_gradient, sample_white, sample_white_dark, sample_white_gradient):
    # Subtract dark and gradient from sample
    S = np.subtract(sample, sample_dark)
    S = np.subtract(S, sample_gradient)

    # Subtract dark and gradient from sample white
    SW = np.subtract(sample_white, sample_white_dark)
    SW = np.subtract(SW, sample_white_gradient)

    # Calculate absorbance spectrum
    A = -np.log10(np.divide(S, SW))

    return A

# This function first subtracts the dark and gradient spectra from the sample spectrum and the sample white spectrum.
# It then calculates the absorbance spectrum using the formula above and returns it as output. Note that this
# implementation assumes that the input spectra are NumPy arrays (or array-likes) of equal length.
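
As a sanity check of the formula, the sketch below evaluates it on two made-up channels where the expected ratios are easy to compute by hand. The function name `absorbance` and all numbers are illustrative assumptions, not SCIO data.

```python
import numpy as np

def absorbance(sample, dark, gradient, white, white_dark, white_gradient):
    """A = -log10((S - D - G) / (SW - SWD - SWG)), element-wise."""
    numerator = np.asarray(sample, dtype=float) - dark - gradient
    denominator = np.asarray(white, dtype=float) - white_dark - white_gradient
    return -np.log10(numerator / denominator)

# Two hypothetical wavelength channels:
#   channel 0: (60 - 10 - 0) / (110 - 10 - 0) = 0.5
#   channel 1: (30 - 10 - 0) / (210 - 10 - 0) = 0.1
A = absorbance(
    sample=[60.0, 30.0], dark=[10.0, 10.0], gradient=[0.0, 0.0],
    white=[110.0, 210.0], white_dark=[10.0, 10.0], white_gradient=[0.0, 0.0],
)
print(A)
```

A ratio of 0.5 corresponds to an absorbance of about 0.301, and a ratio of 0.1 to an absorbance of 1.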

## Data extraction from raw

- Each spectrum should have 331 values. If each value is a 4-byte integer, that is $331*4 = 1324$ bytes.
- The `sample` and `sample_dark` fields each contain 1800 bytes; `sample_gradient` contains 1656 bytes.
- Gradient:
  - If there is a padding byte per value, we get $331*4 + 331*1 = 1655$ bytes, which leaves 1 extra byte, possibly a length byte (the first one).
  - Alternatively, the first or last 331 bytes could be a separate message of some kind.
- Main data:
  - There are 4 empty padding bytes. That leaves 1796 bytes, which is not divisible by 16, so it could not be block-encrypted data as is. Unless the next few bytes are encrypted too?

In [2]:
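# Bytes left over in the main data after 4 padding bytes and 331 five-byte values
print((1800-4) - 331*5)

# Assuming block encryption, both payload types have 8 superfluous leading bytes:
# 1800 - 8 = 112 * 16 and 1656 - 8 = 103 * 16
print(112*16)
print(103*16)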

141
1792
1648
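
The byte-budget reasoning above can be made repeatable with a small helper. This is only a sketch: the function name `bytes_per_value` and the header lengths passed in are hypotheses taken from the notes, not confirmed facts about the format.

```python
NUM_VALUES = 331

def bytes_per_value(total_len, header_len, num_values=NUM_VALUES):
    """Return (bytes per value, leftover bytes) for a layout hypothesis."""
    return divmod(total_len - header_len, num_values)

# Gradient payload: 1656 bytes, assuming 1 leading length byte
gradient_fit = bytes_per_value(1656, 1)

# Main payload: 1800 bytes, assuming 4 leading padding bytes
main_fit = bytes_per_value(1800, 4)

print(gradient_fit)  # (5, 0): exactly 5 bytes per value
print(main_fit)      # (5, 141): 141 bytes unaccounted for
```

A leftover of zero means the hypothesis accounts for every byte; the 141 leftover bytes in the main payload match the first number printed by the cell above.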


In [4]:
import json

# Could the device ID be the key?
# Load JSON data from file
with open('test_data.json') as f:
    data = json.load(f)
    
print(data['device_id'])
key = bytes(data['device_id'], 'latin-1')
print(key)

byte_key = bytes.fromhex(data['device_id'])
print(len(byte_key), 'bytes key length, not enough as a key')


8032AB45611198F1
b'8032AB45611198F1'
8 bytes key length, not enough as a key
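
AES only accepts keys of 16, 24, or 32 bytes, which is why the hex-decoded device ID (8 bytes) cannot be used directly while its ASCII form (16 bytes) can. The sketch below checks both encodings; whether either one is actually the key is an open question.

```python
AES_KEY_SIZES = (16, 24, 32)  # valid AES key lengths in bytes

device_id = "8032AB45611198F1"

# Two candidate encodings of the device ID (both are guesses)
candidates = {
    "ascii": device_id.encode("latin-1"),  # one byte per character
    "hex": bytes.fromhex(device_id),       # two hex digits per byte
}

for name, candidate in candidates.items():
    usable = len(candidate) in AES_KEY_SIZES
    print(f"{name}: {len(candidate)} bytes, valid AES key length: {usable}")
```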


In [10]:
import base64
from Crypto.Cipher import AES

# Get the base64 string from the 'sample' dataset
b64_string = data['sample']

# Decode the base64 string into bytes
bytes_data = base64.b64decode(b64_string)

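# Create an AES cipher object in CTR mode.
# NOTE: no nonce is passed, so PyCryptodome generates a random one and the
# "decrypted" bytes differ on every run; this only probes lengths.
cipher = AES.new(key, AES.MODE_CTR)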

# Decrypt the data
decrypted_data = cipher.decrypt(bytes_data[8:])

# Print the result
#print(decrypted_data)
print(len(decrypted_data))
print(decrypted_data.hex())

import struct
# The raw (still undecrypted) bytes are in `bytes_data`
start = 0  # offset of the first byte to unpack
num_values = 331
value_size = 4  # each value is 4 bytes
separator_size = 1  # hypothesis: 1 byte separating each value
# First attempt (overridden below): skip 1 pad byte, then read 331 contiguous uint32 values
fmt = f">{separator_size}x{num_values}I"

# Probe: read only the first seven little-endian uint32 values
fmt = '<IIIIIII'

int_data = list(struct.unpack_from(fmt, bytes_data, start))
#int_data = [x * 10**(-9) for x in int_data]

# Print the unpacked integers
print(int_data)

1792
39d1ef001712d0db4667c27fe2f82e7a045972ed2240a2f8e010027abeca8e8e832f9f3d3f717263efc9dc7bd192c8d93e88662dd675296497142341b18ffa292f084282e215bd0daaf0e60cd8f68bd75d91c7c81c65ddc44d8ec1079c62b450aab289ee07326525af8ab2dd64a6d5d9f60b9e39ad9a70b8110841418b46baec8b8d55ab29f56ecfc42ddf171144d11e2bfad1154c5ea4beb4d06ba1c27e850eb4e7cbb6acf79b1776b7b5ddbd24fff57bbd01c2cda85219260466b0b7f51b3da76152a626b72ec3f09858b535bd1c31f76870b42c7338d670ee40a0b3325e2d6a72f52b7e247ac53a5d2f6f1bcf82eb28f2f4cfdd442e7ddf7a5a1b3dcf402387934bd18173c698eb4cccbfa68d64fa555f00e0d0a434588f8a42633397cec01784dc3d5185d182999dc03b875eb71fc34edda76c069d0e3c479055a24ebf53190e8620ddba7b056758de8868302ed1537ea9b16903d97a681673f5e7e6af2c958d285a3c35120499a7815aa4776afc8370b20ecbfe317683fde1565ecd2ab8b2bb4ce08bec55d3e7703fbf2e18359b0332b27cbee343dead30b411a0310842ed150cd66de327c0fa8f62c20f6517b78092bddb3b35e88126b9d8d48dc067c5986773f5dc9658a9ca55d3561a491e71bcee66957479be886cec456d9197d174f90708098020cd96301a8ef23d3efc28953
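
One quick heuristic for judging whether a payload is encrypted or compressed is its byte entropy: random-looking data approaches 8 bits per byte, while plain packed integers usually score noticeably lower. The helper below is a generic sketch (the name `shannon_entropy` is my own) and says nothing definitive about the SCIO format.

```python
import math
from collections import Counter

def shannon_entropy(buf: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not buf:
        return 0.0
    total = len(buf)
    counts = Counter(buf)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

uniform = bytes(range(256))  # every byte value exactly once
constant = bytes(100)        # 100 zero bytes

print(shannon_entropy(uniform))
print(shannon_entropy(constant))
```

For a buffer of only 1800 bytes the measured value stays somewhat below 8 even for perfectly random data, so the result should be compared against a random buffer of the same length rather than against 8.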

In [7]:
import json
import base64

with open('test_result.json') as f:
    res = json.load(f)
print(res['spectrum'][:10])

# Load JSON data from file
with open('test_data.json') as f:
    data = json.load(f)
    
print(data['device_id'])

# Get the base64 string from the 'sample' dataset
b64_string = data['sample']

# Decode the base64 string into bytes
bytes_data = base64.b64decode(b64_string)

print(len(bytes_data))
#print(bytes_data)


import struct
# The raw bytes are in `bytes_data`
start = 0  # offset of the first byte to unpack
num_values = 331
value_size = 4  # each value is 4 bytes
separator_size = 1  # hypothesis: 1 byte separating each value
# First attempt (overridden below): skip 1 pad byte, then read 331 contiguous uint32 values
fmt = f">{separator_size}x{num_values}I"

# Probe: three uint32 values, then alternating (uint8, uint32) pairs, big-endian
fmt = ">IIIBIBIBIBIBI"

int_data = list(struct.unpack_from(fmt, bytes_data, start))
print(len(int_data))
#int_data = [x * 10**(-9) for x in int_data]

# Print all unpacked integers
print(int_data)

[0.4615061791382704, 0.4627825684869823, 0.46383626154697105, 0.4645163970814471, 0.46488336964132015, 0.4650472112244314, 0.465207393430775, 0.46549194913784925, 0.465916165833253, 0.4664145753122686]
8032AB45611198F1
1800
13
[0, 3839637035, 3886511359, 3, 2644332554, 51, 2061554015, 169, 4228834443, 60, 4060411553, 113, 968957412]
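
In a `struct` format string a repeat count applies to a single format code, so `">1x331I"` means one pad byte followed by 331 contiguous integers, not alternating separators and values. To test a truly interleaved layout, the pair has to be repeated as a unit. The sketch below does that on synthetic bytes; the (separator, value) ordering and the helper name `unpack_interleaved` are assumptions.

```python
import struct

def unpack_interleaved(buf, num_values, offset=0):
    """Unpack `num_values` (uint8 separator, big-endian uint32 value) pairs."""
    fmt = ">" + "BI" * num_values
    fields = struct.unpack_from(fmt, buf, offset)
    separators = fields[0::2]  # every even-indexed field is a separator byte
    values = fields[1::2]      # every odd-indexed field is a 4-byte value
    return separators, values

# Synthetic buffer with three (separator, value) pairs
demo = struct.pack(">BIBIBI", 1, 10, 2, 20, 3, 30)
seps, vals = unpack_interleaved(demo, 3)

print(len(demo))  # 15 bytes: 3 * (1 + 4)
print(seps)
print(vals)
```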


In [68]:
from Crypto.Cipher import AES
import base64

# Load JSON data from file
with open('test_data.json') as f:
    encrypted_data = json.load(f)

# The decryption key
key = "8032AB45611198F1".encode('latin-1')

# Create an AES cipher object with the given key and ECB mode
cipher = AES.new(key, AES.MODE_ECB)

# Decode the base64 string to bytes
encrypted_bytes = base64.b64decode(encrypted_data['sample'])

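# Decrypt the data. Skipping 4 bytes leaves 1796 bytes, which is not a
# multiple of the 16-byte AES block size, hence the ValueError below.
decrypted_bytes = cipher.decrypt(encrypted_bytes[4:]) # Slice to remove padding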

# Convert the bytes to a list of integers
int_list = [int.from_bytes(decrypted_bytes[i:i+4], 'big', signed=False) for i in range(0, len(decrypted_bytes), 4)]

ValueError: Data must be aligned to block boundary in ECB mode
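
The error above comes from the slice length: ECB mode needs whole 16-byte blocks, and 1796 is not a multiple of 16. The sketch below lists which header lengths would leave a block-aligned payload for both observed sizes. The helper name `aligned_offsets` is hypothetical, and finding an aligned offset does not prove the data is AES-encrypted.

```python
AES_BLOCK_SIZE = 16

def aligned_offsets(total_len, max_offset=16, block=AES_BLOCK_SIZE):
    """Header lengths below `max_offset` that leave a whole number of blocks."""
    return [off for off in range(max_offset) if (total_len - off) % block == 0]

# Remainder for the slice used above: 1796 bytes
remainder = (1800 - 4) % AES_BLOCK_SIZE
print(remainder)

# Candidate header lengths for the 1800-byte and 1656-byte payloads
main_offsets = aligned_offsets(1800)
gradient_offsets = aligned_offsets(1656)
print(main_offsets)
print(gradient_offsets)
```

Both payload sizes become block-aligned only when 8 leading bytes are skipped, which agrees with the earlier `112*16` and `103*16` check.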

In [18]:
# Load JSON data from file
with open('test_data.json') as f:
    data = json.load(f)

# Get the base64 string from the 'sample_white_gradient' dataset
b64_string = data['sample_white_gradient']

# Decode the base64 string into bytes
bytes_data = base64.b64decode(b64_string)

print(len(bytes_data))

import struct
# The raw bytes are in `bytes_data`
start = 6  # skip the first 6 bytes
num_values = 330
value_size = 4  # each value is 4 bytes
separator_size = 1  # 1 trailing pad byte
# 330 contiguous big-endian uint32 values, then 'x' skips 1 trailing pad byte
fmt = f">{num_values}I{separator_size}x"

int_data_w = struct.unpack_from(fmt, bytes_data, start)

# Scale the raw integers to floats (the factor 1e-10 is a guess)
int_data_w = [x * 10**(-10) for x in int_data_w]

# Print the first 10 scaled values
print(int_data_w[:10])

1656
[0.3541440255, 0.2231017404, 0.1392947054, 0.4022710711, 0.032072177800000004, 0.19670508120000002, 0.177511011, 0.3887616434, 0.362936115, 0.022452329]
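
The same decoding can be written with NumPy, which avoids building a long `struct` format string. This is a sketch on a synthetic buffer: the three integers are reconstructed from the first three printed values above, and the 6-byte header, the trailing pad byte, and the `1e-10` scale are the same guesses as in the cell, not confirmed properties of the format.

```python
import struct
import numpy as np

def decode_uint32_be(buf, count, offset=0, scale=1e-10):
    """Read `count` big-endian uint32 values at `offset` and scale to float."""
    end = offset + 4 * count
    raw = np.frombuffer(buf[offset:end], dtype=">u4")
    return raw.astype(np.float64) * scale

# Synthetic payload: 6 header bytes, three values, 1 trailing pad byte
demo = bytes(6) + struct.pack(">3I", 3541440255, 2231017404, 1392947054) + bytes(1)
values = decode_uint32_be(demo, count=3, offset=6)

print(values)
```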


In [33]:
331*5

1655

In [20]:
result = [int_data_w[i] - int_data[i] for i in range(len(int_data))]

print(result)

[0.1031693756, 0.023116495999999986, -0.1381343485, 0.20564053379999997, -0.1536861458, -0.13685428839999997, -0.21021458580000002, 0.1398137382, 0.1797975043, -0.305211193, 0.1534573241, 0.1850059249, -0.1246824057, 0.010760518699999999, -0.2518457876, -0.1362046764, 0.1756478357, 0.177801524, 0.031507238800000004, -0.11126062669999999, 0.06716087929999998, 0.3191672961, 0.16115700589999998, -0.0499767582, -0.25945946729999997, -0.16555875490000002, -0.2033292049, 0.2147782078, -0.3638491023, -0.3277492739, -0.14221325629999998, -0.19868819669999999, -0.2017295674, -0.2641449517, -0.044995441199999986, 0.1312641586, -0.15080418, 0.1396801713, -0.1289353499, 0.27373594370000004, -0.1733706306, -0.3725474301, -0.05501499500000004, -0.05434773010000002, -0.14581962880000005, -0.09414841399999999, 0.006466869200000003, 0.0605959422, 0.267508748, -0.058284580700000005, -0.005765073199999998, -0.3559685656, -0.031144865699999996, -0.3064538747, 0.3363200234, 0.0095490986, 0.134424751, 0.036

In [31]:
import numpy as np

# Read raw data from file or device
raw_data = [0, 0, 0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 0, 0]

# Define wavelength range
wavelengths = np.linspace(740, 1070, 331)
print(wavelengths)

# Subtract dark spectrum (optional)
dark_spectrum = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
processed_data = [raw - dark for raw, dark in zip(raw_data, dark_spectrum)]

# Normalize to white spectrum (optional)
# NOTE: white_spectrum contains zeros, so the division below raises ZeroDivisionError
white_spectrum = [0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 8, 8]
processed_data = [raw / white for raw, white in zip(processed_data, white_spectrum)]

# Convert to absorbance
reference_spectrum = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
absorbance_spectrum = [-np.log10(raw / reference) for raw, reference in zip(processed_data, reference_spectrum)]


[ 740.  741.  742.  743.  744.  745.  746.  747.  748.  749.  750.  751.
  752.  753.  754.  755.  756.  757.  758.  759.  760.  761.  762.  763.
  764.  765.  766.  767.  768.  769.  770.  771.  772.  773.  774.  775.
  776.  777.  778.  779.  780.  781.  782.  783.  784.  785.  786.  787.
  788.  789.  790.  791.  792.  793.  794.  795.  796.  797.  798.  799.
  800.  801.  802.  803.  804.  805.  806.  807.  808.  809.  810.  811.
  812.  813.  814.  815.  816.  817.  818.  819.  820.  821.  822.  823.
  824.  825.  826.  827.  828.  829.  830.  831.  832.  833.  834.  835.
  836.  837.  838.  839.  840.  841.  842.  843.  844.  845.  846.  847.
  848.  849.  850.  851.  852.  853.  854.  855.  856.  857.  858.  859.
  860.  861.  862.  863.  864.  865.  866.  867.  868.  869.  870.  871.
  872.  873.  874.  875.  876.  877.  878.  879.  880.  881.  882.  883.
  884.  885.  886.  887.  888.  889.  890.  891.  892.  893.  894.  895.
  896.  897.  898.  899.  900.  901.  902.  903.  9

ZeroDivisionError: division by zero
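
The `ZeroDivisionError` comes from dividing plain Python numbers by the zeros in `white_spectrum`. One way around it, sketched below with the same toy numbers, is to use NumPy arrays and mask out channels where the white reference is zero or the ratio is not positive, leaving `NaN` there. Marking those channels as `NaN` is a design choice made for this example, not something the original notebook prescribes.

```python
import numpy as np

# Same toy spectra as in the cell above
raw = np.array([0, 0, 0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 0, 0], dtype=float)
dark = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
white = np.array([0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 8, 8], dtype=float)

signal = raw - dark

# Divide only where the white reference is non-zero; other channels stay NaN
ratio = np.full_like(signal, np.nan)
np.divide(signal, white, out=ratio, where=white > 0)

# The logarithm is only defined for strictly positive ratios
with np.errstate(invalid="ignore"):
    valid = ratio > 0
absorbance = np.full_like(ratio, np.nan)
np.log10(ratio, out=absorbance, where=valid)
absorbance = -absorbance

print(absorbance)
```

Only the channels with a non-zero white reference and a positive dark-corrected signal yield finite absorbance values.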

In [27]:
103*16

1648