
What Are All Those File Outputs and How Can I Export My Data

michaelmarty edited this page Mar 14, 2022 · 1 revision

Many people have asked, "How do I export ... from my data in UniDec?" The answer is often that UniDec is already exporting that data. You just have to know where to find it.

The overall philosophy with UniDec is that you should have access to any and all of your data. Thus, I've tried to make the file exports as easy and obvious as possible while keeping file sizes low and speed fast. Hopefully the info below will help you write scripts to plot and analyze your data.

If there is data that you would like to get at but can't find below, let me know, and I can usually add in a quick export to dump the data out to someplace you can get it.

To help you make sense of all these files, I've put together this wiki explaining what UniDec exports and what it means. This is not an exhaustive list, but it should hopefully answer most questions. I'll add to it as people have more questions.

Where does everything go?

When you open a file with UniDec, it creates a unidecfiles folder to store all the inputs and outputs. If you open a single file, like a Thermo Raw, mzML, or text file, it will create the folder as <filename>_unidecfiles in the same directory as the file you opened. If you open a directory, like a Waters Raw or Agilent .d file, it will create a <filename>_rawdata.txt file inside the directory and create the _unidecfiles directory next to it. Note: when you see _something.ext below, assume the filename stem is added to the front, so it means <filename>_something.ext.
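
The naming rule above can be sketched in a few lines of Python. This is only an illustration for the single-file case (it does not cover directory-style inputs like Waters Raw or Agilent .d), and it assumes the filename stem is the file name without its final extension. The function names unidec_paths and output_file are hypothetical helpers, not part of UniDec.

```python
from pathlib import Path


def unidec_paths(data_path):
    """Guess the UniDec output folder and filename stem for a single data file.

    Assumption: the stem is the file name without its final extension.
    """
    data_path = Path(data_path)
    stem = data_path.stem  # "sample.mzML" -> "sample"
    folder = data_path.parent / f"{stem}_unidecfiles"
    return folder, stem


def output_file(data_path, suffix):
    """Build the path to <filename><suffix>, for example suffix="_mass.txt"."""
    folder, stem = unidec_paths(data_path)
    return folder / f"{stem}{suffix}"
```

For example, output_file("data/sample.mzML", "_mass.txt") would point at data/sample_unidecfiles/sample_mass.txt.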

The Config File

Arguably, the most important file in the folder is the _conf.dat file, which is the config file. This simple text file has all of the config parameters for the deconvolution and plotting in the Python GUI. Each line is written as parameter_name value. When UniDec runs, it calls unidec.exe <filename>_conf.dat, and all the info it needs is in that config file. The config file is overwritten pretty much any time you click a button in UniDec, which keeps it in sync with the parameters in the GUI. If you want to save an old config, copy this text file or rename it.
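
Because the config file is plain parameter_name value text, it is easy to read from a script. Here is a minimal sketch, assuming one parameter per line with the name and value separated by whitespace. The function name read_config is hypothetical, and values are kept as strings so you can convert them yourself.

```python
def read_config(path):
    """Read a _conf.dat file into a dict of parameter_name -> string value."""
    params = {}
    with open(path) as handle:
        for line in handle:
            # Split at the first run of whitespace: name, then the rest of the line
            parts = line.split(None, 1)
            if not parts:
                continue  # skip blank lines
            name = parts[0]
            value = parts[1].strip() if len(parts) > 1 else ""
            params[name] = value
    return params
```

With this, int(read_config(conf_path)["startz"]) would give the starting charge state as an integer, where conf_path is the path to your own _conf.dat file.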

Loading a Config File

If you want to load an old config file or one from another data set, you can click Tools > Load External Config File. You can also drag and drop a _conf.dat file into the main window to automatically load the settings.

Default Config Files

You can also set the current parameters as the default set using the Load or Save Default Config options under the File menu. Saving exports a conf.dat file into the top directory of UniDec so that it can be retrieved easily in the future.

Custom Presets

Finally, you can create custom presets by pasting these config files into the Presets folder in the UniDec_Windows folder (if you are using the binary) or the unidec_bin/Presets folder (if you are using the source code).

Other Setup Files

In addition to the config file, there are three other files that are sometimes used for saving setup data.

  • _mfile.dat: This file saves the mass list used by the "Mass List Window" option, which allows the mass range to be limited to a fixed window around listed masses.
  • _manualfile.dat: This file stores the manual assignment information, which allows certain m/z ranges to be assigned to specific charge states.
  • _ofile.dat: This file stores the information used by the Tools > Oligomer and Mass Tools window for matching potential species to peaks.

All three of these files should look similar to the tables in the GUI, but in text format.

The Actual Data Files

Ok, enough setup. Let's talk about the actual data files.

The Input Data Files

These two files are produced prior to deconvolution:

  • _rawdata.txt: This is the raw data saved again as a text file for easy retrieval.
  • _input.dat: The input file is the processed data that UniDec will read in for deconvolution.

Deconvolution Results

These next files are produced by UniDec as the results of the deconvolution:

  • _mass.txt: This is probably the most useful output. It lists the deconvolved mass spectrum, with mass in the first column and intensity in the second.
  • _error.txt: This file lists some simple parameters from the deconvolution, like the time, number of iterations, scores, and error of the fit.
  • _fitdat.bin: This file has the fit data showing how well the deconvolution fits the original m/z data. To save space, this file is exported as a binary file rather than a text file. It is an array of floats with a length equal to the length of the input data. Because the m/z values are identical to those already in the input file, this file contains only the fit intensities. If you need the m/z values, import them from the _input.dat file.
  • _grid.bin: This file has the 2D m/z vs. charge deconvolution results. It is stored as a 1D list of intensity float values in a binary file. After you import it, you will need to reshape it into a 2D array of intensity values with the first dimension equal to the length of the input data. The second dimension is the charge state, and its length is set by the startz and endz values in the config file.
  • _massgrid.bin: This file has the 2D mass vs. charge deconvolution results. Similar to the _grid.bin file, it is a 1D list of floats in a binary file. The only difference is that the first dimension is the length of the mass data in the _mass.txt file.
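
The binary files above can be read with NumPy along these lines. Treat this as a sketch under stated assumptions rather than the definitive reader: it assumes the floats are 32-bit (if the sizes do not match, try dtype=np.float64), that _input.dat is whitespace-delimited text with m/z in the first column, and that the number of charge states is endz - startz + 1. The function names are hypothetical.

```python
import numpy as np


def load_two_column(path):
    """Load a whitespace-delimited text file (e.g. _mass.txt) as a 2D array."""
    return np.loadtxt(path, ndmin=2)


def load_fit(fit_path, input_path, dtype=np.float32):
    """Pair the fit intensities in _fitdat.bin with the m/z axis from _input.dat."""
    mz = load_two_column(input_path)[:, 0]
    fit = np.fromfile(fit_path, dtype=dtype)  # dtype is an assumption
    if fit.size != mz.size:
        raise ValueError("fit length does not match input data; check dtype")
    return np.column_stack([mz, fit])


def load_grid(grid_path, n_rows, startz, endz, dtype=np.float32):
    """Reshape a flat _grid.bin or _massgrid.bin into (n_rows, n_charges).

    n_rows is the length of the input data for _grid.bin, or the length of
    the mass data for _massgrid.bin.
    """
    n_charges = endz - startz + 1  # assumes the charge range is inclusive
    flat = np.fromfile(grid_path, dtype=dtype)
    if flat.size != n_rows * n_charges:
        raise ValueError("grid size does not match n_rows x n_charges")
    return flat.reshape(n_rows, n_charges)
```

The size checks are there on purpose: a wrong dtype or charge range would otherwise reshape silently into nonsense. Summing the reshaped grid along axis 1 collapses the charge dimension and leaves one intensity per row.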

Peak Detection

After you pick peaks, an additional set of data will be exported:

  • _peaks.dat: This is a simple list of peaks with mass in the first column and intensity in the second column.
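
As a small example of scripting against the peak list, the sketch below loads _peaks.dat and sorts the peaks from most to least intense. It assumes the file is whitespace-delimited with no header line and uses only the first two columns. The function name load_peaks and the min_rel_intensity filter are hypothetical additions for illustration.

```python
import numpy as np


def load_peaks(path, min_rel_intensity=0.0):
    """Load _peaks.dat and return (mass, intensity) rows, most intense first.

    min_rel_intensity is a hypothetical filter: peaks below this fraction
    of the tallest peak are dropped.
    """
    peaks = np.loadtxt(path, ndmin=2)  # assumes whitespace-delimited, no header
    mass, intensity = peaks[:, 0], peaks[:, 1]
    keep = intensity >= min_rel_intensity * intensity.max()
    order = np.argsort(intensity[keep])[::-1]  # descending intensity
    return np.column_stack([mass[keep], intensity[keep]])[order]
```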

If you run batch or click Analysis > Export Peaks Parameters and Data, you can get a bunch of other useful data on the selected peaks:

  • _peakparam.dat: This file has lots of useful information, including the peak height, peak area, centroid mass, apex mass, FWHM, average charge, and standard deviation of the charge state distribution. See headers in the file for what each column means.
  • _mzpeakdata.dat and _chargedata.dat: These two files contain the m/z and intensity values (respectively) for each charge state of each peak. The rows represent different peaks, and the columns represent different charge states. There is also a version called _chargedata_areas.dat that has peak areas rather than peak heights.

Other Files

Other windows such as the 2D Grid Extraction and the Mass Defect windows will export text files giving the outputs from those modules. Contact me if you need help interpreting them, and I'll add more here as needed.

Figures

Figures are also output in the same _unidecfiles folder. There are several presets for figure exports (PNG, PDF, EPS) in the File > Save Figures Presets menu. You can also generate a PDF report there (you will need to install MiKTeX; see the main page for info). You can also use the File > Save Figures As menu to customize your figure save outputs, including the size and file format.

Shortcut to Open Folder

If you are struggling to find where UniDec is saving the files, you can use the Advanced > Open Saved File Directory option to open the directory where everything is saved.

Shortcut to Export Plot Data

There are some awesome shortcuts to copy data or figures from plots, which basically let you export any data you see plotted. See the Hidden Features Wiki Page for more info.

UniDecCD NPZ File

UniDecCD (UCD) uses a special compressed numpy file type to store data after retrieving it from the raw file. This file will be called _rawdata.npz. You can load this in Python with: np.load(path)['data'].
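
Spelled out slightly more fully, the np.load call above might be wrapped like this. The 'data' key comes from the text above; the function names are hypothetical, and list_arrays is only there in case you want to check which arrays a given file actually holds.

```python
import numpy as np


def load_ucd_rawdata(path):
    """Return the 'data' array stored in a UniDecCD _rawdata.npz file."""
    with np.load(path) as archive:
        return archive["data"]


def list_arrays(path):
    """List the names of all arrays stored in the .npz archive."""
    with np.load(path) as archive:
        return list(archive.files)
```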

MetaUniDec is Different

MetaUniDec was invented in part to avoid the hassle of all these text files. As such, it collects all of this data within an HDF5 file. The HDF5 file is easy to access with different programming languages, and you can browse it with the HDFView software: https://www.hdfgroup.org/downloads/hdfview/. HDFView allows you to easily copy/paste data arrays from the HDF5 file into something like Excel. Many of the data structures and much of the nomenclature are the same and are described in the original paper: https://link.springer.com/article/10.1007%2Fs13361-018-1951-9. The 2D arrays are often stored flattened into 1D and will need to be reshaped into 2D arrays.

For example, if you want to extract the deconvolved mass data for a specific spectrum in the HDF5 (for example, spectrum #2), you would look under /ms_dataset/2/mass_data. This should have the mass values in the first column and the intensity values in the second. If you wanted to get the summed deconvolved mass data, you would look under /ms_dataset/mass_axis for the mass values and /ms_dataset/mass_sum for the intensity values. The /ms_dataset/mass_grid is a 1D array of length n x m, where n is the number of datasets and m is the length of the summed axis.
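
In Python, the same lookups could be sketched with the third-party h5py package. The dataset paths are the ones given above; everything else is an assumption for illustration, including the function names and the row-major layout of the flattened grid (one row per spectrum). If the reshaped grid looks scrambled, try the transposed shape.

```python
import numpy as np


def reshape_mass_grid(flat_grid, n_spectra):
    """Reshape the flattened mass_grid into (n_spectra, len(mass_axis)).

    Assumes row-major order with one row per spectrum.
    """
    flat_grid = np.asarray(flat_grid)
    if flat_grid.size % n_spectra != 0:
        raise ValueError("grid length is not a multiple of n_spectra")
    return flat_grid.reshape(n_spectra, -1)


def read_meta_results(hdf5_path, spectrum=2):
    """Read one spectrum's mass data plus the summed data and the mass grid."""
    import h5py  # third-party; imported here so the helper above works without it

    with h5py.File(hdf5_path, "r") as f:
        mass_data = f[f"/ms_dataset/{spectrum}/mass_data"][()]
        mass_axis = f["/ms_dataset/mass_axis"][()]
        mass_sum = f["/ms_dataset/mass_sum"][()]
        flat_grid = f["/ms_dataset/mass_grid"][()]
    # n x m values in total, where m is the length of the summed mass axis
    n_spectra = flat_grid.size // mass_axis.size
    grid = reshape_mass_grid(flat_grid, n_spectra)
    return mass_data, mass_axis, mass_sum, grid
```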

Some technical history

UniDec was originally written only as C code, and there was a Mathematica script that I used to call the executable and plot the results. Mathematica defaults to exporting text files as .dat. I quickly switched to Python as the primary API and GUI, but I kept the convention of naming any exports from the Python code as .dat. Conversely, the outputs from the C code are .txt. It's a minor technical artifact, but I'm listing it in case anyone finds it useful or interesting.