# rewrite_H5.py
## 1. Introduction

The COMPAS simulations might be very large in data size while the actual data you need to reproduce your results could be small, so it might make sense to reduce the number of files and columns based on some criteria.

Here we show how you can reduce your data. The main things you need are:

1 - The seeds you want to have in your data.

2 - The files you want in your data.

3 - The columns (parameters) you want for each file.

The plain Python script to do this is `$COMPAS_ROOT_DIR/postProcessing/Folders/H5/rewrite_H5.py`. Here we just show an example of how to call the script in order to reduce the data.

### 1.1 Paths needed

In [1]:
# Set the appropriate paths to the input and output data files

pathToDataInput  = '../COMPAS_Output.h5'         
pathToDataOutput = '../COMPAS_Output_reduced.h5' 

### 1.2 Imports

In [2]:
import h5py  as h5  # For handling data format
import sys

# Import the rewrite_H5.py script
sys.path.append('PythonScripts/')
import rewrite_H5

## 2. Load the Data

In [3]:
Data  = h5.File(pathToDataInput)
print("The main files I have at my disposal are:\n",list(Data.keys()))

The main files I have at my disposal are:
 ['BSE_Common_Envelopes', 'BSE_Double_Compact_Objects', 'BSE_RLOF', 'BSE_Supernovae', 'BSE_System_Parameters', 'Run_Details']


In [4]:
# To see the parameter choices in each file, use, e.g:
#print(list(Data['BSE_System_Parameters']))

## 3. Specify which files and columns you want

We use dictionaries to specifically link all the entries:

- The `filesOfInterest` dictionary should contain all files which hold any relevant data.

- The `columnsOfInterest` dictionary specifies the parameters in each file that you want to be included in the new output HDF5 file.

- Any filters or masks should be used to determine the `seedsOfInterest` (on a per file basis), and so do not need to be included in the `columnsOfInterest` dictionary.

### 3.1 Hypothetical example

Suppose you are studying double neutron star (DNS) systems, and you want to know the initial parameters of both components. Suppose you are separately curious about the eccentricity of systems following a supernova (SN) that leaves the binary intact, and you want to use the same COMPAS run to save on CPU hours.

- To be safe, you should probably keep the entire `BSE_System_Parameters` file, which contains all of the initial system settings.

- To get information about only DNSs, you will need to create a mask for them from the `BSE_Double_Compact_Objects` file.

- Information on post-SN eccentricity and whether or not the system disrupted is found in the `BSE_Supernovae` file.

You will not need any other files. You will also want to grab the 'SEED' column from any file, since the seed is the unique identifier of each binary.

In [5]:
# Which files do you want?
filesOfInterest   = {1:'BSE_System_Parameters',\
                     2:'BSE_Double_Compact_Objects',\
                     3:'BSE_Supernovae'}

# Give a list of columns you want, if you want all, say ['All']
columnsOfInterest = {1:['All'],\
                     2:['All'],\
                     3:['SEED', 'Eccentricity']}

# The seedsOfInterest are a little more involved

## 4. Which seeds do I want per file?

In [6]:
### Do not filter out any systems/seeds from SystemParameters

seedsSystems = Data['BSE_System_Parameters']['SEED'][()]



### Of all the double compact objects, keep only the DNSs

DCOs = Data['BSE_Double_Compact_Objects']
seedsDCOs       =  DCOs['SEED'][()]

typePrimary     =  DCOs['Stellar_Type(1)'][()]
typeSecondary   =  DCOs['Stellar_Type(2)'][()]
DNSs            =  (typePrimary == 13) & (typeSecondary == 13)

seedsDNS        =  seedsDCOs[DNSs]



### Filter out disrupted systems

SNe  = Data['BSE_Supernovae']
seedsSNe     = SNe['SEED']


isUnbound    = SNe['Unbound'][()] 
intact       = (isUnbound == False)

seedsIntact  = seedsSNe[intact]



### Create seedsOfInterest dictionary -- DOUBLE CHECK ORDER :) 

seedsOfInterest   = {1:seedsSystems,\
                     2:seedsDCOs,\
                     3:seedsIntact}


# Don't forget to close the original h5 data file
Data.close()

## 5. Call the function that creates the HDF5 file

In [7]:
rewrite_H5.reduceH5(pathToOld = pathToDataInput, pathToNew = pathToDataOutput,\
                     dictFiles=filesOfInterest, dictColumns=columnsOfInterest, dictSeeds=seedsOfInterest)

In [8]:
rewrite_H5.printAllColumnsInH5(pathToDataOutput)


Filename = BSE_Double_Compact_Objects
----------------------
	   column name                             unit                length
	   --------------------------------------------------------------------
	   Coalescence_Time                        b'Myr'                 143
	   Eccentricity@DCO                        b'-'                   143
	   Mass(1)                                 b'Msol'                143
	   Mass(2)                                 b'Msol'                143
	   Merges_Hubble_Time                      b'State'               143
	   --------------------------------------------------------------------
	   Recycled_NS(1)                          b'Event'               143
	   Recycled_NS(2)                          b'Event'               143
	   SEED                                    b'-'                   143


	   SemiMajorAxis@DCO                       b'AU'                  143
	   Stellar_Type(1)                         b'-'                   143
	   --------------------------------------------------------------------
	   Stellar_Type(2)                         b'-'                   143
	   Time                                    b'Myr'                 143

Filename = BSE_Supernovae
----------------------
	   column name                             unit                length
	   --------------------------------------------------------------------
	   Eccentricity                            b'-'                   2592
	   SEED                                    b'-'                   2592

Filename = BSE_System_Parameters
----------------------
	   column name                             unit                length
	   --------------------------------------------------------------------
	   CE_Alpha                                b'-'                   50000
	   Eccentricity@ZAMS          