# Import of OMNIC files

Thermo Scientific [OMNIC](https://www.thermofisher.com/search/results?query=OMNIC)
software has two proprietary binary file formats:

- .spa files that handle single spectra
- .spg files which contain a group of spectra

Both have been reverse engineered, which makes it possible to extract their key data.
The OMNIC reader of SpectroChemPy (`read_omnic()`) was developed from
posts in open forums on the .spa file format and then extended to the .spg file format.

## Import of .spg files

Let's import an .spg file from the `datadir` (see :ref:`import.ipynb` for details)
and display its main attributes:

In [1]:
import spectrochempy as scp

SpectroChemPy's API - v.0.8.0 ©Copyright 2014-2025 - A.Travert & C.Fernandez @ LCS


In [2]:
X = scp.read_omnic("irdata/CO@Mo_Al2O3.SPG")
X



The displayed attributes are detailed in the following:

- `name` is the name of the group of spectra as it appears in the .spg file. OMNIC
  sets this name to the .spg filename used at the creation of the group.
  In this example, the name ("Group sust Mo_Al2O3_base line.SPG") differs
  from the filename (`"CO@Mo_Al2O3.SPG"`) because the latter has been changed from
  outside OMNIC (directly in the OS).

- `author` is the creator of the NDDataset (not of the .spg file, which, to our
  knowledge, has no such attribute). The string is composed of the username and the
  machine name as given by the OS, e.g., `"username@machinename"`.
  It can be accessed and changed using `X.author`.

- `created` is the creation date of the NDDataset (again not that of the .spg file).
  It can be accessed (or even changed) using `X.created` .

- `description` indicates the complete pathname of the .spg file. As the pathname is
  also given in the history (below), it can be a good practice to give a
  self-explaining description of the group, for instance:

In [3]:
X.description = "CO adsorption on CoMo/Al2O3, difference spectra"
X.description

'CO adsorption on CoMo/Al2O3, difference spectra'

or directly at the import:

In [4]:
X = scp.read_omnic("irdata/CO@Mo_Al2O3.SPG", description="CO@CoMo/Al2O3, diff spectra")
X.description

'CO@CoMo/Al2O3, diff spectra'

- `history` records changes made to the dataset. Here, right after its creation, it
  has been sorted by date (see below).

Then come the attributes related to the data themselves:

- `title` (not to be confused with the `name` of the dataset) describes the nature
  of data (here **absorbance** ).

- `values` shows the data as a quantity (with its units when they exist, here a.u.
  for absorbance units).

- The numerical values are accessed through the `data` attribute and the units
  through the `units` attribute.

In [5]:
X.values

Magnitude: [[0.000803191214799881 3.787875175476074e-05 ... 0.000302683562040329 0.0003744959831237793] [-3.607943654060364e-05 -0.0001980997622013092 ... 0.0003089122474193573 0.0011698119342327118] ... [0.0008356980979442596 -0.0001386702060699463 ... -0.0005221068859100342 -0.001121222972869873] [0.0005654506385326385 -0.00011600926518440247 ... -0.0005699768662452698 -0.000630699098110199]]
Units: a.u.


In [6]:
X.data

array([[0.0008032, 3.788e-05, ..., 0.0003027, 0.0003745],
       [-3.608e-05, -0.0001981, ..., 0.0003089,  0.00117],
       ...,
       [0.0008357, -0.0001387, ..., -0.0005221, -0.001121],
       [0.0005655, -0.000116, ..., -0.00057, -0.0006307]])

In [7]:
X.units

- `shape` is the same as the ndarray `shape` attribute and gives the shape of the
  data array, here 19 x 3112.

Then come the attributes related to the dimensions of the dataset.

- `x` : this dimension has one coordinate (a `Coord` object) made of the 3112
  wavenumbers.

In [8]:
print(X.x)
X.x

Coord: [float64] cm⁻¹ (size: 3112)


- `y` : this dimension contains:

  - one coordinate made of the 19 acquisition timestamps
  - two labels:

    - the acquisition date (UTC) of each spectrum
    - the name of each spectrum.

In [9]:
X.y

- `dims` : Note that the `x` and `y` dimensions are the second and first
  dimensions, respectively. Hence, `X[i,j]` returns the absorbance of the ith
  spectrum at the jth wavenumber.
  However, this order can change, for instance if you perform an operation on your
  data such as
  [Transposition](../processing/transformations.ipynb#Transposition). At any time,
  the attribute `dims` gives the correct names (which can be modified) and order of
  the dimensions.

In [10]:
X.dims

['y', 'x']
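
The index order described above can be illustrated without SpectroChemPy. The following is only a sketch using nested Python lists with made-up absorbance values (rows stand for spectra, columns for wavenumbers); it is not how `NDDataset` stores its data.

```python
# Hypothetical absorbance values: 2 spectra (rows) x 3 wavenumbers (columns)
absorbance = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
]

i, j = 1, 2                   # spectrum index, wavenumber index
value = absorbance[i][j]      # dimensions in ('y', 'x') order

# After transposition the dimension order becomes ('x', 'y'),
# so the same element is reached with the indices swapped.
transposed = [list(column) for column in zip(*absorbance)]
print(value, transposed[j][i])
```

This is why checking `dims` after an operation such as a transposition is safer than assuming a fixed order.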

### Acquisition dates and `y` axis

The acquisition timestamps are the *Unix times* of the acquisition, i.e. the time
elapsed in seconds since the
reference date of Jan 1st 1970, 00:00:00 UTC.

In [11]:
X.y.values

Magnitude: [1476798575.0 1476798846.0 ... 1476806493.0 1476806797.0]
Units: s
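
To see what such a Unix time represents, it can be converted to a calendar date with the Python standard library. This is a minimal sketch that only uses the first timestamp displayed above; it does not involve SpectroChemPy.

```python
from datetime import datetime, timezone

# First acquisition timestamp shown above (seconds since 1970-01-01 00:00:00 UTC)
t0 = 1476798575.0

# Interpret the timestamp explicitly as UTC rather than as local time
acquired = datetime.fromtimestamp(t0, tz=timezone.utc)
print(acquired.isoformat())

# The conversion is reversible: the datetime maps back to the same Unix time
roundtrip = acquired.timestamp()
```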


In OMNIC, the acquisition time is that of the start of the acquisition.
These raw values are not convenient to use directly (they are currently on the order
of 1.5 billion seconds).
It can therefore be convenient to shift the origin of the time coordinate to
that of the first spectrum,
which has the index `0` :

In [12]:
X.y = X.y - X.y[0]
X.y.values

Magnitude: [0.0 271.0 ... 7918.0 8222.0]
Units: s


Note that you can also use the inplace subtract operator to perform the same
operation.

In [13]:
X.y -= X.y[0]

It is also possible to use the ability of SpectroChemPy to handle unit changes. For
this one can use the `to` or `ito` (inplace) methods.

```ipython
val = val.to(some_units)
val.ito(some_units)   # the same inplace
```

In [14]:
X.y.ito("minute")
X.y.values

Magnitude: [0.0 4.517 ... 131.967 137.033]
Units: min
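
The arithmetic behind the origin shift and the unit conversion can be checked by hand. The sketch below reproduces it in plain Python for the four timestamps displayed earlier; SpectroChemPy performs the conversion through its unit handling, which is not reproduced here.

```python
# Timestamps (s) of the first two and last two spectra, as displayed earlier
timestamps = [1476798575.0, 1476798846.0, 1476806493.0, 1476806797.0]

# Shift the time origin to the first spectrum
elapsed_s = [t - timestamps[0] for t in timestamps]

# Convert seconds to minutes, rounded to 3 decimals as in the display
elapsed_min = [round(t / 60, 3) for t in elapsed_s]

print(elapsed_s)
print(elapsed_min)
```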


As shown above, the values of the `Coord` object are accessed through the `values`
attribute. To get the last
value, which corresponds to the last row of the `X` dataset, you can use:

In [15]:
tf = X.y.values[-1]
tf

A negative index in Python indicates a position counted from the end of a sequence, so -1
indicates the last element.
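
As a small illustration of negative indexing on an ordinary Python list (the values are the four times in minutes displayed above):

```python
elapsed_min = [0.0, 4.517, 131.967, 137.033]

last = elapsed_min[-1]          # last element
second_last = elapsed_min[-2]   # element before the last one

# -1 is equivalent to len(sequence) - 1
same_as_last = elapsed_min[len(elapsed_min) - 1]
print(last, second_last)
```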

Finally, if for instance you want the `y` time axis to be shifted by 2 minutes, it
is also very easy to do so:

In [16]:
X.y = X.y + 2
X.y.values

Magnitude: [2.0 6.517 ... 133.967 139.033]
Units: min


or using the inplace add/subtract operator:

In [17]:
X.y -= 2  # this restores the previous coordinates
X.y.values

Magnitude: [0.0 4.517 ... 131.967 137.033]
Units: min


### The order of spectra

The order of spectra in OMNIC .spg files depends on the order in which the spectra
were included in the OMNIC
window before the group was saved. By default, SpectroChemPy reorders the spectra
by acquisition date, but the
original OMNIC order can be kept by passing `sortbydate=False` in the function call.
For instance:

In [18]:
X2 = scp.read_omnic("irdata/CO@Mo_Al2O3.SPG", sortbydate=False)

In the present case, this will change nothing because the spectra in the OMNIC file
were already ordered by increasing date.
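
What sorting by acquisition date amounts to can be sketched with plain Python. The spectrum names below are hypothetical, and the code only illustrates the idea; it does not show how `read_omnic` is implemented.

```python
# Hypothetical (name, Unix timestamp) pairs, in the order stored in a group file
stored = [
    ("spec_c", 1476806493.0),
    ("spec_a", 1476798575.0),
    ("spec_b", 1476798846.0),
]

# Reorder by acquisition timestamp; the stored order is left untouched
by_date = sorted(stored, key=lambda item: item[1])

print([name for name, _ in stored])
print([name for name, _ in by_date])
```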

Finally, it is worth mentioning that an `NDDataset` can generally be manipulated like a
NumPy ndarray. Hence, for
instance, the following reverses the order of the first dimension:

In [19]:
X = X[::-1]  # reorders the NDDataset along the first dimension going backward
X.y.values  # displays the `y` dimension

Magnitude: [137.033 131.967 ... 4.517 0.0]
Units: min


<div class='alert alert-info'>
<b>Note</b>

<strong>Case of groups with different wavenumbers</strong> <br/>
An OMNIC .spg file can contain spectra with different wavenumber axes (e.g.
different spacings or wavenumber
ranges). In its current implementation, the spg reader will purposely fail with an error
because such spectra
<i>cannot</i> be included in a single NDDataset which, by definition, contains items that
share common axes or dimensions.
Future releases might include an option to deal with such a case and return a list of
NDDatasets. Let us know if you
are interested in such a feature, see <a href="https://www.spectrochempy.fr/devguide/issues.html">Bug reports and enhancement requests.</a>
</div>
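
To make the notion of a shared wavenumber axis concrete, here is a minimal sketch in plain Python. The helpers `same_axis` and `group_by_axis` are hypothetical and are not part of SpectroChemPy; they only show one way to decide which spectra could be stacked together.

```python
import math


def same_axis(a, b, rel_tol=1e-9):
    """Hypothetical helper: True if both axes have the same length and values."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def group_by_axis(spectra):
    """Group (name, axis) pairs so that each group shares one wavenumber axis."""
    groups = []  # list of (reference axis, [names])
    for name, axis in spectra:
        for ref_axis, names in groups:
            if same_axis(axis, ref_axis):
                names.append(name)
                break
        else:
            # No existing group has a compatible axis: start a new one
            groups.append((axis, [name]))
    return groups


# Made-up axes: s1 and s2 share an axis, s3 has a different spacing
axis_a = [4000.0, 3999.0, 3998.0]
axis_b = [4000.0, 3998.0]
spectra = [("s1", axis_a), ("s2", list(axis_a)), ("s3", axis_b)]
grouped_names = [names for _, names in group_by_axis(spectra)]
print(grouped_names)
```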


## Import of .spa files

The import of a single spectrum follows exactly the same rules as that of the import
of a group:

In [20]:
scp.read_omnic("irdata/subdir/7_CZ0-100_Pd_101.SPA")

The OMNIC reader can also import several .spa files together, provided that they share
a common axis for the wavenumbers.

This is the case of the following files in the irdata/subdir directory:
"7_CZ0-100_Pd_101.SPA", ..., "7_CZ0-100_Pd_104.SPA".

It is possible to import them into a single NDDataset by passing
the list of filenames
in the function call:

In [21]:
list_files = (
    "7_CZ0-100_Pd_101.SPA",
    "7_CZ0-100_Pd_102.SPA",
    "7_CZ0-100_Pd_103.SPA",
    "7_CZ0-100_Pd_104.SPA",
)
scp.read_omnic(list_files, directory="irdata/subdir", name="Merged 7_CZ0-100 Pd")

When compatible .spa files are alone in a directory, a very convenient approach is to call
`read_omnic`
with only the directory path as argument; it gathers the .spa files together:

In [22]:
scp.read_omnic("irdata/subdir/1-20")

If not all files are compatible, they are returned in different NDDatasets (with independent merging).

For example:

In [23]:
Y = scp.read_omnic("irdata/subdir/")
Y

Here we get a list of two NDDatasets because there are two types of files in the directory (`.spa` and `.srs`).

The desired dataset can be obtained by indexing the list:

In [24]:
Y[1]

Other ways to select only the files with the required extension (`.spa`) are:

- writing a list, as previously, that explicitly lists the required files.
- using a more specific reader:

In [25]:
scp.read_spa("irdata/subdir/")

- using a pattern filter

In [26]:
scp.read_omnic("irdata/subdir/", pattern="*.spa")

One advantage of the latter solution is its greater flexibility. For instance, the following will select only the `*101.spa` and `*102.spa` files:

In [27]:
scp.read_omnic("irdata/subdir/", pattern="*10[12].spa", merge=False)
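
The wildcard syntax used in such patterns (`*` for any run of characters, `[12]` for one character among those listed) can be tried out with the standard `fnmatch` module. This is only a sketch of the glob semantics. Lowercasing the names before matching is an assumption made here for the illustration, and how `read_omnic` treats letter case is not asserted.

```python
from fnmatch import fnmatchcase

# File names from the example above
names = [
    "7_CZ0-100_Pd_101.SPA",
    "7_CZ0-100_Pd_102.SPA",
    "7_CZ0-100_Pd_103.SPA",
    "7_CZ0-100_Pd_104.SPA",
]

pattern = "*10[12].spa"

# Assumption: compare case-insensitively by lowercasing the file names
selected = [n for n in names if fnmatchcase(n.lower(), pattern)]
print(selected)
```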

## Handling Metadata

Here is an example of accessing metadata:

In [28]:
X = scp.read_omnic("irdata/CO@Mo_Al2O3.SPG")
print(f"Title: {X.title}")
print(f"Origin: {X.origin}")
print(f"Description: {X.description}")

Title: absorbance
Origin: 
Description: Omnic title: Group sust Mo_Al2O3_base line.SPG
Omnic filename: /home/runner/.spectrochempy/testdata/irdata/CO@Mo_Al2O3.SPG


and now do some modifications:

In [29]:
X.title = "Modified title"
X.origin = "OMNIC measurement"
X.description = "Modified description"
print("Modified metadata:")
print(f"Title: {X.title}")
print(f"Origin: {X.origin}")
print(f"Description: {X.description}")

Modified metadata:
Title: Modified title
Origin: OMNIC measurement
Description: Modified description


Reading the metadata now reflects the change:

In [30]:
X.title

'Modified title'

## Error Handling

When trying to read a file, it is good practice to handle errors explicitly. For example:

In [31]:
try:
    X = scp.read_omnic("nonexistent_file.spa")
except FileNotFoundError:
    scp.error_(FileNotFoundError, "File not found")
except Exception as e:
    scp.error_(f"Error reading file: {e}")

 File/directory not found locally: Attempt to download it from the GitHub repository `spectrochempy_data`...


 ERROR | FileNotFoundError: File not found
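
When several files are read in a loop, the same pattern can be wrapped in a small function. The sketch below is generic Python: `read_or_none` and `fake_reader` are hypothetical names, and any reader callable (for instance `scp.read_omnic`) could be passed in place of the stand-in used here.

```python
def read_or_none(reader, path):
    """Call reader(path); return None instead of raising if the file is missing.

    Any other exception is deliberately left to propagate.
    """
    try:
        return reader(path)
    except FileNotFoundError as exc:
        print(f"File not found: {path} ({exc})")
        return None


def fake_reader(path):
    """Stand-in for a real reader, used only for this illustration."""
    raise FileNotFoundError(path)


result = read_or_none(fake_reader, "nonexistent_file.spa")
```

Catching only `FileNotFoundError` keeps unexpected problems, such as a corrupted file, visible instead of silently hiding them.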


## Advanced Data Operations

Example of data manipulation:

In [32]:
X = scp.read_omnic("irdata/CO@Mo_Al2O3.SPG")

- Baseline correction

In [33]:
X_corrected = X - X[0]  # Subtract first spectrum as baseline

- Normalization

In [34]:
X_normalized = X / X.max()

In [35]:
print("Original data shape:", X.shape)
print("Max value before normalization:", X.max())
print("Max value after normalization:", X_normalized.max())

Original data shape: (19, 3112)
Max value before normalization: 0.24812382459640503 a.u.
Max value after normalization: 1.0
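
The two operations above are element-wise arithmetic. The sketch below spells them out on a small nested list with made-up absorbance values; it does not use SpectroChemPy, which additionally keeps track of units and coordinates.

```python
# Hypothetical absorbance values: 3 spectra (rows) x 3 wavenumbers (columns)
spectra = [
    [0.5, 1.0, 0.5],
    [1.0, 2.0, 1.0],
    [1.5, 4.0, 1.5],
]

# Subtract the first spectrum from every spectrum (as in X - X[0])
baseline = spectra[0]
corrected = [[v - b for v, b in zip(row, baseline)] for row in spectra]

# Divide by the global maximum (as in X / X.max())
peak = max(max(row) for row in spectra)
normalized = [[v / peak for v in row] for row in spectra]

print(corrected)
print(max(max(row) for row in normalized))
```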
