# The ROOT file

* With ROOT, objects can be written to files

* ROOT provides its own file class, [TFile](https://root.cern/doc/master/classTFile.html), to interact with these files

* ROOT files are _binary_ and can be transparently _compressed_ to reduce disk usage

* ROOT files have a logical “file-system-like” structure

  * E.g. a directory hierarchy

Let's start by importing ROOT, as usual:

In [None]:
import ROOT


This is how you create a `TFile`:

In [None]:
f = ROOT.TFile("my_file.root", "RECREATE")  # "RECREATE": create the file, overwriting any existing one

<center><img src="images/tfile1.png"></center>

And this is how you close it (note that when `f` is destroyed, the file is also closed automatically):

In [None]:
f.Close()

You can also open a `TFile` with the Python context manager (`with`) syntax, which closes the file automatically when the block is exited:

In [None]:
with ROOT.TFile("my_file.root", "RECREATE") as myfile:
    # do something with the file inside the scope
    pass

The following example shows how to write an object to a file:

In [None]:
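with ROOT.TFile("my_file.root", "RECREATE") as f:
    # A 1D histogram (double precision) with 100 bins between -4 and 4
    h = ROOT.TH1D("my_histo", "Example histogram", 100, -4, 4)
    # Store the histogram in the file under its own name
    f.WriteObject(h, h.GetName())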


The `"my_histo"` argument of the `TH1D` constructor is the name of the histogram. Because we pass `h.GetName()` as the second argument of `WriteObject`, the same name is also used to identify the histogram inside the file, as we'll see in a minute.

We should now have a file called `my_file.root` in the current directory. Let's check this with the `%%bash` magic, which lets us run shell commands from a cell:

In [None]:
%%bash
ls -l my_file.root

We can also use the `rootls` command-line tool to inspect the contents of a ROOT file. The output shows that the file contains an object called `my_histo` of type `TH1D`.

In [None]:
%%bash
rootls -l my_file.root

Finally, let's see how we can programmatically retrieve the histogram we just wrote to the file.

We can access the histogram by its name using `TFile::Get()`.

In [None]:
with ROOT.TFile("my_file.root") as f: # READ is the default mode
    h = f.Get("my_histo")
    print(h)

# The HEP dataset

High Energy Physics (HEP) data consists of many statistically independent collision events.

Laying out the data in an "event class", then serialising and writing `N` instances of that class to a file, would be very inefficient: reading even a single quantity would require reading and deserialising every event in full.

In ROOT, a dataset is organised in columns, which can store elements of any C++ type:
* fundamental types: `int`, `float`
* C++ standard collections: `std::vector`, `std::map`
* user-defined C++ classes

The ROOT dataset is represented by the `TTree` class and is often simply called a tree. Columns in the dataset are instances of the `TBranch` class (often referred to as "branches").

<center><img src="images/dataset.png"></center>

- A `TTree` dataset can be written to a `TFile` (just like any other C++ object). 

- The ROOT format is logically and physically (on disk) a columnar format. 

- Different columns can be read from disk independently. 

- This can translate into faster I/O than with other dataset formats (e.g. HDF5, SQL databases), especially when an analysis only needs a subset of the columns.
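The advantage of the columnar layout can be illustrated without ROOT. The following pure-Python sketch uses hypothetical toy events with three `float64` quantities each. It assumes that in the row-wise case a whole event record must be deserialised to access any of its fields, as with the "event class" approach, and counts the bytes deserialised to extract a single quantity, `px`, in both layouts. It is only an in-memory analogy and does not model ROOT's actual on-disk format or compression.

```python
import struct

N_EVENTS = 1000
FIELDS = ("x", "px", "py")  # hypothetical per-event quantities, stored as float64

# Toy data (hypothetical): one (x, px, py) tuple per event
events = [(0.1 * i, 2.0 * i, -1.0 * i) for i in range(N_EVENTS)]

# Row-wise layout: each event is serialised as one record holding all its fields
row_records = [struct.pack("<3d", *ev) for ev in events]

# Columnar layout: one contiguous buffer per quantity (one "branch" per column)
col_buffers = {
    name: struct.pack(f"<{N_EVENTS}d", *(ev[i] for ev in events))
    for i, name in enumerate(FIELDS)
}


def read_px_rowwise(records):
    """Deserialise every full event record just to extract px."""
    values, nbytes = [], 0
    for rec in records:
        nbytes += len(rec)
        values.append(struct.unpack("<3d", rec)[1])
    return values, nbytes


def read_px_columnar(buffers):
    """Deserialise only the px buffer; the x and py buffers are never touched."""
    buf = buffers["px"]
    return list(struct.unpack(f"<{N_EVENTS}d", buf)), len(buf)


px_row, bytes_row = read_px_rowwise(row_records)
px_col, bytes_col = read_px_columnar(col_buffers)
print(bytes_row, bytes_col)  # prints "24000 8000"
```

With three quantities per event, the columnar read touches a third of the bytes; the gap grows with the number of columns an analysis does not need.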

In [None]:
%%bash
rootls -l data/example_file.root