# Reading DL1 Files and Performing Analysis

You now have the data reduced to the \_dl1.h5 level, using the data reduction method you wanted. The following sections will show you how to open these files and perform analysis with them.

## Introduction

The .h5 extension is used by HDF5 files: https://support.hdfgroup.org/HDF5/whatishdf5.html.

Inside the HDF5 files are HDFStores, the format in which pandas DataFrames are stored inside HDF5 files. You can read about HDFStores here: https://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables.

Pandas DataFrames are a tabular data structure widely used by data scientists for Python analysis: https://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe. They allow easy querying, sorting, grouping, and processing of data.

Inside the DL1 table, each column corresponds to a different parameter that characterises the waveform, and each row corresponds to a single pixel in a single event.
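
To make this layout concrete, here is a minimal sketch that builds a small synthetic table with the same shape of structure, rather than reading a real DL1 file. The column names `iev`, `pixel` and `charge_cc` are the ones used later in this tutorial; the sizes and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a DL1 table: 3 events x 4 pixels,
# with one row per (event, pixel) combination. Values are made up.
n_events, n_pixels = 3, 4
df = pd.DataFrame({
    "iev": np.repeat(np.arange(n_events), n_pixels),
    "pixel": np.tile(np.arange(n_pixels), n_events),
    "charge_cc": np.arange(n_events * n_pixels, dtype=float),
})

# Querying: all rows belonging to event 1
event1 = df.loc[df["iev"] == 1]

# Grouping: mean charge of each pixel, averaged over all events
mean_per_pixel = df.groupby("pixel")["charge_cc"].mean()

# Sorting: the row with the largest charge
brightest = df.sort_values("charge_cc", ascending=False).iloc[0]

print(event1)
print(mean_per_pixel)
print(brightest)
```

The same querying, grouping and sorting operations apply to the DataFrames returned by the reader in the sections below.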

Once you have DL1 files, you are in a position to perform investigations on the properties of the waveforms from the camera. To perform this analysis, all you need is CHECLabPy - you do not need to have any of the TARGET libraries installed.

## Files

To run this tutorial you must have a DL1 file. Here I will use the DL1 file produced from the "2_Reducing_R1_to_DL1.ipynb" tutorial:

In [2]:
dl1_path = "refdata/Run17473_dl1.h5"

## Reading DL1 Files

The DL1 file has four main contents: the metadata, the config used to create the file, the pixel position mapping, and the DL1 DataFrame.

In [3]:
from CHECLabPy.core.io import DL1Reader
reader = DL1Reader(dl1_path)

Opening HDF5 file: refdata/Run17473_dl1.h5


In [None]:
reader.metadata

In [None]:
reader.config

In [None]:
reader.mapping

In [None]:
reader.load_entire_table()

In addition to the metadata, the file stores the config used to create it and the correct mapping of the pixels on the camera.

As you can see, the structure of the DL1 DataFrame is very intuitive, with each row representing a single event and pixel combination. Some extra useful information about the DataFrame can be obtained:

In [None]:
print("n_bytes = ", reader.n_bytes * 1E-9, "GB")
print("n_rows = ", reader.n_rows)
print("columns = ", reader.columns)
print("n_events = ", reader.n_events)
print("n_pixels = ", reader.n_pixels)

As shown above, the `reader.load_entire_table()` method loads the entire table into memory. This may be a problem for very large runs, so there are several methods for loading only a portion of the table:

In [None]:
# Obtain the nth event (here n = 4)
df = reader[4]
df

In [None]:
# Load a single column from table
charge = reader.select_column('charge_cc')
charge = charge.values # Convert from Pandas Series to numpy array
charge

In [None]:
# Load a single column for rows 0 to 100
charge = reader.select_column('charge_cc', start=0, stop=100)
charge = charge.values # Convert from Pandas Series to numpy array
charge

In [None]:
# Load multiple columns with the select_columns method
pixel, charge = reader.select_columns(['pixel', 'charge_cc'])
charge = charge.values # Convert from Pandas Series to numpy array
pixel = pixel.values # Convert from Pandas Series to numpy array
print('charge = ', charge)
print('pixel = ', pixel)

In [None]:
# Loop through the rows
for row in reader.iterate_over_rows():
    break
    
row

In [None]:
# Loop through the events
for df in reader.iterate_over_events():
    break
    
df

In [None]:
# Loop through chunks
for df in reader.iterate_over_chunks(chunksize=4000):
    break
    
df

By using the "iterate" methods, one can process a portion of the table at a time, and consolidate the results at the end, avoiding the need to load the entire table in memory.
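
As one possible illustration of this "process in portions, consolidate at the end" pattern, the sketch below accumulates a histogram chunk by chunk. It uses a synthetic numpy array in place of a real DL1 column so that it runs on its own; `histogram_in_chunks` is a hypothetical helper written for this example and is not part of CHECLabPy.

```python
import numpy as np


def histogram_in_chunks(chunks, edges):
    """Accumulate a histogram over an iterable of 1D arrays, using fixed bin edges."""
    total = np.zeros(len(edges) - 1, dtype=np.int64)
    for chunk in chunks:
        counts, _ = np.histogram(chunk, bins=edges)
        total += counts  # consolidate the per-chunk result
    return total


# Synthetic charges standing in for the 'charge_cc' column (values are made up)
rng = np.random.default_rng(0)
charge = rng.normal(loc=50.0, scale=10.0, size=10_000)

# The bin edges must be fixed in advance, as the full data range is never seen at once
edges = np.linspace(0.0, 100.0, 101)

# Feed the data in portions of 4000 values
chunks = (charge[i:i + 4000] for i in range(0, charge.size, 4000))
hist_chunked = histogram_in_chunks(chunks, edges)

# Histogramming everything at once gives the same counts
hist_full, _ = np.histogram(charge, bins=edges)
print(np.array_equal(hist_chunked, hist_full))
```

With a real file, the chunks could instead be taken from the reader, for example `(df['charge_cc'].values for df in reader.iterate_over_chunks(chunksize=4000))`, assuming each chunk is a DataFrame containing that column as in the cell above.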

## Analysis Example

Here I will show two simple examples of some analysis:

In [None]:
%matplotlib inline

In [None]:
# Create a histogram of all the cross-correlated charge in the DL1 file,
# and print the charge corresponding to the centre of the maximum bin
from matplotlib import pyplot as plt
from CHECLabPy.core.io import DL1Reader
reader = DL1Reader(dl1_path)
charge = reader.select_column('charge_cc').values
hist, edges, _ = plt.hist(charge, bins=100)
between = (edges[1:] + edges[:-1]) / 2
max_ = between[hist.argmax()]
print(max_)

In [None]:
# Plotting a camera image of charge extracted per pixel for the nth event
from matplotlib import pyplot as plt
from CHECLabPy.core.io import DL1Reader
from CHECLabPy.plotting.camera import CameraImage

reader = DL1Reader(dl1_path)

charge = reader.select_column('charge_cc').values
iev = reader.select_column('iev').values
charge = charge[iev == 10]

# Or alternatively:
charge = reader[10]['charge_cc'].values

camera = CameraImage.from_mapping(reader.mapping)
camera.image = charge
camera.add_colorbar(label="Charge")
camera.annotate_on_telescope_up()

plt.show()