## Handling simulation metadata using pandas

Suppose your simulations are run from a parameter.cfg file that contains the parameters and metadata of each run. Suppose also that the simulation results are too large to keep on your laptop long term, so you move them to your Google Drive and keep only the .cfg files locally. In this situation, pandas makes it easy to search the metadata and find the simulation you are looking for. The example data files for this exercise are stored in "./data/".

We need the following libraries:

In [2]:
import pandas
import glob
from configparser import ConfigParser

glob.glob returns a list of all paths matching a pattern, in arbitrary order:

In [33]:
files = glob.glob("./data/*.cfg")
files

['./data/simulation1.cfg',
 './data/simulation3.cfg',
 './data/simulation0.cfg',
 './data/simulation2.cfg']
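
The listing above is not sorted, because glob makes no promise about ordering. If a reproducible order matters, wrapping the call in `sorted` is one option. The sketch below uses a temporary directory and made-up file names so that it runs anywhere; it is an illustration, not part of the original exercise data.

```python
import glob
import os
import tempfile

# Build a throwaway directory with hypothetical file names, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["simulation1.cfg", "simulation3.cfg", "simulation0.cfg", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()

    # Sort the matches so the order no longer depends on the file system.
    sorted_files = sorted(glob.glob(os.path.join(tmp, "*.cfg")))
    sorted_names = [os.path.basename(p) for p in sorted_files]

print(sorted_names)
```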

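We initialise a ConfigParser to read the .cfg files and list the sections we are interested in. Setting optionxform to str keeps option names case-sensitive (by default they are converted to lower case):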

In [39]:
config = ConfigParser()
config.optionxform = str
sections = ['Simulation', 'Parameter']
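
To see what `optionxform = str` changes, the following sketch parses the same text with and without it. The config text is a made-up fragment in the style of the example files, not a copy of one.

```python
from configparser import ConfigParser

# Hypothetical file content, for illustration only.
text = """
[Parameter]
TOL = 1e-7
N = 2
"""

default_parser = ConfigParser()
default_parser.read_string(text)

case_parser = ConfigParser()
case_parser.optionxform = str  # keep option names exactly as written
case_parser.read_string(text)

default_keys = [key for key, _ in default_parser.items("Parameter")]
case_keys = [key for key, _ in case_parser.items("Parameter")]
print(default_keys)  # option names lowercased by the default transform
print(case_keys)     # option names as written in the file
```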

Next, we create a dictionary of dictionaries holding all simulation parameters from the files. The file name without its directory serves as the key for each parameter dictionary _d_ in _data_.

In [48]:
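import os

data = {}
for file in files:
    # Use a fresh parser per file so that options from a previously read
    # file cannot leak into this one.
    config = ConfigParser()
    config.optionxform = str
    config.read(file)
    d = {}
    for section in sections:
        # config.items() yields (option, value) pairs; all values are strings.
        d.update(config.items(section))
    # Key each parameter dictionary by the file name without its directory.
    data[os.path.basename(file)] = d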
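
A note on reading several files in a loop: a ConfigParser keeps everything it has read so far, so a parser shared across files can carry an option over from an earlier file when a later file omits it. The sketch below illustrates this with two made-up config strings; creating a new parser per file avoids the carry-over.

```python
from configparser import ConfigParser

# Hypothetical contents of two files; the second one omits "qi".
first = "[Parameter]\nbeta = 1\nqi = 0.1\n"
second = "[Parameter]\nbeta = 0.1\n"

# One parser reused for both inputs: "qi" from the first input survives.
shared = ConfigParser()
shared.optionxform = str
shared.read_string(first)
shared.read_string(second)
leaked = dict(shared.items("Parameter"))

# A fresh parser sees only what the second input defines.
fresh = ConfigParser()
fresh.optionxform = str
fresh.read_string(second)
clean = dict(fresh.items("Parameter"))

print(leaked)
print(clean)
```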

In [49]:
data

{'simulation1.cfg': {'id': 'simulation1',
  'solver': 'direct',
  'debug': '0',
  'N': '2',
  'TOL': '1e-7',
  'rho': '1000 * kg/m**3',
  'K': '1e-7 * m**2/Pa/s',
  'phi': '0.2',
  'beta': '1',
  'qi': '0',
  'qo': '0',
  'tf': '0.5 * s',
  'dt': '0.1 * s',
  'theta': '0.5'},
 'simulation3.cfg': {'id': 'simulation3',
  'solver': 'direct',
  'debug': '0',
  'N': '2',
  'TOL': '1e-7',
  'rho': '1000 * kg/m**3',
  'K': '1e-7 * m**2/Pa/s',
  'phi': '0.1',
  'beta': '1',
  'qi': '0.1',
  'qo': '0',
  'tf': '0.5 * s',
  'dt': '0.1 * s',
  'theta': '0.5'},
 'simulation0.cfg': {'id': 'simulation0',
  'solver': 'direct',
  'debug': '0',
  'N': '2',
  'TOL': '1e-7',
  'rho': '1000 * kg/m**3',
  'K': '1e-7 * m**2/Pa/s',
  'phi': '0.1',
  'beta': '1',
  'qi': '0',
  'qo': '0',
  'tf': '0.5 * s',
  'dt': '0.1 * s',
  'theta': '0.5'},
 'simulation2.cfg': {'id': 'simulation2',
  'solver': 'direct',
  'debug': '0',
  'N': '2',
  'TOL': '1e-7',
  'rho': '1000 * kg/m**3',
  'K': '1e-7 * m**2/Pa/s',
  'p

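Create a pandas table (a DataFrame) from the dictionary _data_. With orient="index", each outer key (the file name) becomes a row and each parameter becomes a column: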

In [24]:
tab = pandas.DataFrame.from_dict(data, orient="index")
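
As a small illustration of what `orient="index"` does, the sketch below builds a table from a two-entry dictionary. The names `small` and `small_tab` are introduced here only for the example.

```python
import pandas

# A cut-down dictionary of dictionaries, in the same shape as `data`.
small = {
    "simulation0.cfg": {"phi": "0.1", "beta": "1"},
    "simulation1.cfg": {"phi": "0.2", "beta": "1"},
}

# Outer keys become the row index, inner keys become the columns.
small_tab = pandas.DataFrame.from_dict(small, orient="index")
print(small_tab)
```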

Now we can look at the values of a parameter for each file in our table:

In [26]:
tab.TOL

simulation0.cfg    1e-7
simulation1.cfg    1e-7
simulation2.cfg    1e-7
simulation3.cfg    1e-7
Name: TOL, dtype: object

In [27]:
tab.beta

simulation0.cfg      1
simulation1.cfg      1
simulation2.cfg    0.1
simulation3.cfg      1
Name: beta, dtype: object

In [29]:
tab.qi

simulation0.cfg      0
simulation1.cfg      0
simulation2.cfg      0
simulation3.cfg    0.1
Name: qi, dtype: object
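
The `dtype: object` in these outputs reflects that ConfigParser returns every value as a string. For columns that hold plain numbers, one possible way to enable numeric comparisons is `pandas.to_numeric`. The sketch below rebuilds two columns by hand under the name `tab_demo` (a stand-in, not the real `tab`); columns such as rho that carry unit strings would need extra parsing first.

```python
import pandas

# Stand-in for two columns of the table, with values stored as strings.
tab_demo = pandas.DataFrame(
    {"beta": ["1", "1", "0.1", "1"], "qi": ["0", "0", "0", "0.1"]},
    index=["simulation0.cfg", "simulation1.cfg", "simulation2.cfg", "simulation3.cfg"],
)

# Convert every column from strings to numbers.
numeric = tab_demo.apply(pandas.to_numeric)

# Numeric columns allow range comparisons, not just string equality.
small_beta = numeric[numeric.beta < 0.5]
print(numeric.dtypes)
print(small_beta)
```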

We can also filter by parameter values. Because all values were read as strings, we compare against strings:

In [31]:
tab[tab.beta == '0.1']

                 N    TOL             rho                 K  phi  beta  qi  qo       tf       dt  theta
simulation2.cfg  2  1e-07  1000 * kg/m**3  1e-7 * m**2/Pa/s  0.1   0.1   0   0  0.5 * s  0.1 * s    0.5


In [32]:
tab[tab.qi == '0']

                 N    TOL             rho                 K  phi  beta  qi  qo       tf       dt  theta
simulation0.cfg  2  1e-07  1000 * kg/m**3  1e-7 * m**2/Pa/s  0.1   1.0   0   0  0.5 * s  0.1 * s    0.5
simulation1.cfg  2  1e-07  1000 * kg/m**3  1e-7 * m**2/Pa/s  0.2   1.0   0   0  0.5 * s  0.1 * s    0.5
simulation2.cfg  2  1e-07  1000 * kg/m**3  1e-7 * m**2/Pa/s  0.1   0.1   0   0  0.5 * s  0.1 * s    0.5
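
Several conditions can be combined into a single filter with `&` (and) or `|` (or), with each condition wrapped in parentheses. The sketch below uses a hand-built stand-in table named `demo` with string values, so that it runs on its own.

```python
import pandas

# Stand-in for three columns of the table, with values stored as strings.
demo = pandas.DataFrame(
    {
        "phi": ["0.1", "0.2", "0.1", "0.1"],
        "beta": ["1", "1", "0.1", "1"],
        "qi": ["0", "0", "0", "0.1"],
    },
    index=["simulation0.cfg", "simulation1.cfg", "simulation2.cfg", "simulation3.cfg"],
)

# Keep only rows where both conditions hold.
mask = (demo.qi == "0") & (demo.phi == "0.1")
matches = demo[mask]
print(matches)
```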
