Output data formats

Three kinds of data

Vlasiator produces three kinds of output files during a simulation run, the contents of which vary based on simulation parameters:

  1. logfile.txt, the simulation run log. This is a timestamped ASCII file providing basic diagnostic output of the run, including memory usage, time steps, etc.
  2. diagnostic.txt. The contents of this file are configured via the `diagnostic =` options in the run config file (see the sketch after this list). In general, this ASCII file contains one line per diagnostic interval (every 1, 10, or so simulation timesteps), with the columns determined by the selected data reducers. These include, for example, simple scalar values such as the overall plasma mass, the number of velocity space blocks in the simulation, the charge balance, the divergence of the magnetic field, etc.
  3. VLSV files are the main output data products. These files come in multiple varieties:
  • Restart files. These contain the whole simulation state, including the full phase space density, all relevant electromagnetic fields, and metadata. Simulations can be restarted from them (hence the name), but they tend to be very large, easily multiple terabytes for production runs. They do not contain the output of data reducer operators (detailed below).
  • Bulk files. In these, reduced spatial simulation data is written out for further scientific analysis. Usually this includes moments of the distribution functions and the electromagnetic fields, but it can also contain the output of much more complex data reducer operators, as listed below. It is also possible (and common) to configure a subset of the velocity distribution functions (e.g. every 25th cell) to be written out as well.
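
The option names accepted in the config file are listed in the data reducer table at the end of this page. As a purely illustrative sketch (the `[variables]` section name and the `populations_blocks` diagnostic reducer below are assumptions and may differ between Vlasiator versions), enabling a few bulk output variables and one diagnostic column could look like:

```
[variables]
# magnetic field on the field solver grid
output = fg_b
# total mass density of all species
output = vg_rhom
# per-population number density
output = populations_rho
# column written to diagnostic.txt (reducer name is a guess)
diagnostic = populations_blocks
```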

The VLSV file format

The VLSV library is used to write this versatile container format. Analysator can be used to load and analyse these files in Python.

The file format is optimized for parallel write performance: data is dumped to disk as binary blobs, in the same memory layout it has in the Vlasiator simulation. Once all data has been written, an XML footer describing the data is appended to the end of the file.

An example XML footer might look like this:

<VLSV>
   <MESH arraysize="208101" datasize="8" datatype="uint" max_refinement_level="1" name="SpatialGrid" type="amr_ucd" vectorsize="1" xperiodic="no" yperiodic="no" zperiodic="no">989580</MESH>
   <MESH arraysize="652800" datasize="8" datatype="uint" name="fsgrid" type="multi_ucd" vectorsize="1" xperiodic="no" yperiodic="no" zperiodic="no">4011008</MESH>
   <PARAMETER arraysize="1" datasize="8" datatype="float" name="time" vectorsize="1">989488</PARAMETER>
   <PARAMETER arraysize="1" datasize="8" datatype="float" name="dt" vectorsize="1">989496</PARAMETER>
   <VARIABLE arraysize="123544" datasize="8" datatype="uint" mesh="SpatialGrid" name="CellID" vectorsize="1">1136</VARIABLE>
   <VARIABLE arraysize="652800" datasize="8" datatype="float" mesh="fsgrid" name="fg_b" unit="T" unitConversion="1.0" unitLaTeX="$\mathrm{T}$" variableLaTeX="$B$" vectorsize="3">9558184</VARIABLE>
</VLSV>

Each XML tag describes one dataset in the file, with arraysize, datatype, datasize and vectorsize describing the array layout. The tag's content is the byte offset in the file at which the dataset's raw binary data starts.
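
To make this concrete, the following sketch reads the raw bytes behind one footer entry by hand, here the CellID dataset of the example footer above. It assumes `footer_xml` holds the footer text and `path` the .vlsv file name (both placeholders), maps only the datatypes shown above to numpy dtypes, and assumes native byte order; in practice the VLSV library or Analysator handle this for you.

```python
import numpy as np
import xml.etree.ElementTree as ET

# Illustrative mapping from (datatype, datasize) attributes to numpy dtypes.
dtypes = {("float", 8): np.float64, ("float", 4): np.float32,
          ("uint", 8): np.uint64, ("uint", 4): np.uint32}

tag = ET.fromstring(footer_xml).find("./VARIABLE[@name='CellID']")
offset = int(tag.text)                                         # byte offset of the raw data
count = int(tag.get("arraysize")) * int(tag.get("vectorsize"))
dtype = dtypes[(tag.get("datatype"), int(tag.get("datasize")))]

with open(path, "rb") as fp:
    fp.seek(offset)
    cellids = np.fromfile(fp, dtype=dtype, count=count)        # flat array of arraysize*vectorsize values
```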

The two most important tag types are PARAMETER, for single values describing the file as a whole (such as resolution, time step length, or simulation time), and VARIABLE, for spatially varying data produced by data reducers.

Additional metadata is often added to the datasets, such as their physical units, LaTeX formatted plotting hints, etc.
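
In practice, both are read through Analysator's VlsvReader. A minimal sketch (the file name is illustrative, and Analysator is assumed to be importable as pytools):

```python
import pytools as pt

f = pt.vlsvfile.VlsvReader("bulk.0000123.vlsv")  # illustrative file name
t = f.read_parameter("time")   # PARAMETER: a single number describing the file
b = f.read_variable("fg_b")    # VARIABLE: one value (here a 3-vector) per cell
print(t, b.shape)
```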

Spatial ordering: Vlasov grid vs. FSGrid vs. velocity space variables

Note that the XML tags alone do not give sufficient information to reconstruct the spatial structure of the variable arrays. The ordering differs depending on the grid a variable is linked to (denoted by the mesh= attribute):

  • Vlasov grid variables, typically marked with a vg_ prefix in their name, are stored as cell parameters of the DCCRG grid underlying the Vlasov solver. As the simulation is dynamically load balanced, their memory order changes unpredictably, so the data must be treated as completely unordered in the file.

    Fortunately, the CellID variable is written to the file first; it contains the flattened spatial index of each simulation cell, in the same order as all further Vlasov grid variables. In the simplest, non-mesh-refined case, the CellID is defined as

CellID = x_index + x_size * y_index + x_size * y_size * z_index + 1

By reading both the intended target variable and the CellID, the data can thus be brought into flattened spatial order by sorting both arrays with the same permutation (a small helper that inverts this indexing is sketched after this list). In Analysator, this is typically achieved by running

import numpy

c = f.read_variable("CellID")   # f: an Analysator VlsvReader, opened as in the example above
b = f.read_variable("rho")
b = b[numpy.argsort(c)]                   # reorder into ascending-CellID (flattened spatial) order
b = b.reshape(f.get_spatial_mesh_size())  # reshape to the 3D mesh dimensions
  • FSGrid variables are stored on the simulation's field solver grid, which is partitioned quite differently for performance reasons. The spatial domain is subdivided into equally sized rectangular domains, one per compute rank, which are written in parallel. If written from a simulation with a single MPI rank, the resulting array is directly in spatial order, following the CellID definition above. For simulations on multiple ranks, every rank writes its own domain in this structure, end-to-end. The num_writing_ranks attribute in the XML tag allows the spatial partitioning to be reconstructed at load time. Code that does this reconstruction is available here (C++ version) and here (Python version).

  • Velocity space variables (at the moment, only the phase space density f of each species) follow yet another structure, due to the sparse velocity grid on which they are stored...
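
As referenced above, the unrefined CellID definition can be inverted with a small helper such as the following (purely illustrative, not part of Vlasiator or Analysator):

```python
def cellid_to_indices(cellid, x_size, y_size):
    """Invert CellID = x_index + x_size*y_index + x_size*y_size*z_index + 1 (unrefined mesh only)."""
    flat = int(cellid) - 1            # CellIDs are 1-based
    x = flat % x_size
    y = (flat // x_size) % y_size
    z = flat // (x_size * y_size)
    return x, y, z
```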

Simulation data reducers

This is a (hopefully) up-to-date list of simulation output options that can be enabled in the config file. Note that older simulations may use slightly different variable names, as the code is in constant development.

| Variable name | Config option | Unit | Meaning | Literature reference |
| --- | --- | --- | --- | --- |
| `CellID` | always written | cells | Spatial ordering of Vlasov grid cells | |
| `fg_b` | `fg_b` | T | Overall magnetic field (vector) | Palmroth et al. 2018 |
| `fg_b_background` | `fg_backgroundb` | T | Static background magnetic field (e.g. the dipole field in a magnetosphere simulation; vector) | Palmroth et al. 2018 |
| `fg_b_perturbed` | `fg_perturbedb` | T | Fluctuating component of the magnetic field (vector) | Palmroth et al. 2018 |
| `fg_e` | `fg_e` | V/m | Electric field, as computed by the field solver (vector) | |
| `vg_rhom` | `vg_rhom` | kg/m³ | Combined mass density of all simulation species | |
| `fg_rhom` | `fg_rhom` | kg/m³ | " | |
| `vg_rhoq` | `vg_rhoq` | C/m³ | Combined charge density of all simulation species | |
| `fg_rhoq` | `fg_rhoq` | C/m³ | " | |
| `proton_vg_rho` | `populations_rho` | 1/m³ | Number density for each simulated particle population | |
| `vg_v` | `vg_v` | m/s | Bulk plasma velocity (velocity of the centre-of-mass frame; vector) | |
| `fg_v` | `fg_v` | m/s | " | |
| `proton_vg_v` | `populations_v` | m/s | Per-population bulk velocity | |
| `proton_vg_rho_thermal` | `populations_moments_thermal` | 1/m³ | Number density of the thermal component of every population | |
| `proton_vg_v_thermal` | " | m/s | Velocity (vector) of the thermal component of every population | |
| `proton_vg_ptensor_diagonal_thermal` | " | Pa | Diagonal components of the pressure tensor for the thermal component of every population | |
| `proton_vg_ptensor_offdiagonal_thermal` | " | Pa | Off-diagonal components of the pressure tensor for the thermal component of every population | |
| `proton_vg_rho_nonthermal` | `populations_moments_nonthermal` | 1/m³ | Number density of the nonthermal component of every population | |
| `proton_vg_v_nonthermal` | " | m/s | Velocity (vector) of the nonthermal component of every population | |
| `proton_vg_ptensor_diagonal_nonthermal` | " | Pa | Diagonal components of the pressure tensor for the nonthermal component of every population | |
| `proton_vg_ptensor_offdiagonal_nonthermal` | " | Pa | Off-diagonal components of the pressure tensor for the nonthermal component of every population | |
| `proton_minvalue` | `populations_vg_effectivesparsitythreshold` | m⁶/s³ | Effective sparsity threshold for every cell | Yann's PhD thesis, page 91 |
| `proton_rholossadjust` | `populations_vg_rho_loss_adjust` | 1/m³ | Tracks how much mass was lost in sparse velocity space block removal | Yann's PhD thesis, page 90 |
| `vg_lbweight` | `vg_lbweight` | arb. unit | Load balance metric, used for dynamic rebalancing of computational load between MPI tasks | |
| `vg_maxdt_acceleration` | `vg_maxdt_acceleration` | s | Maximum timestep limit of the acceleration solver | |
| `proton_vg_maxdt_acceleration` | `populations_vg_maxdt_acceleration` | s | ", per population | |
| `vg_maxdt_translation` | `vg_maxdt_translation` | s | Maximum timestep limit of the translation solver | |
| `proton_vg_maxdt_translation` | `populations_vg_maxdt_translation` | s | ", per population | |