# The ROOT file

* In ROOT, objects are written to files
* ROOT provides its own file class: the `TFile`
* TFiles are _binary_ and can be _compressed_ (transparently for the user)
* TFiles have a logical “file-system-like” structure
  * E.g. a directory hierarchy

Let's start by importing ROOT as usual:

In [None]:
import ROOT


This is how you create a `TFile`:

In [None]:
f = ROOT.TFile("my_file.root", "RECREATE")

<center><img src="images/tfile1.png"></center>

and how you close it (note that when `f` is destroyed, the file is closed automatically).

In [None]:
f.Close()

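The file we've just created is empty. Let's actually write something to it this time.

We will write a histogram object to it. Note the order of operations: we create the file first, then the histogram, then we write the histogram and finally we close the file. The order matters because `h.Write()` writes the object to the current directory, which here is the file we have just opened.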

In [None]:
f = ROOT.TFile("my_file.root", "RECREATE")

h = ROOT.TH1F("my_histo", "Example histogram", 100, -4, 4)
h.Write()

f.Close()


The `"my_histo"` argument of the `TH1F` constructor is the name of the histogram. It is also how the histogram will be identified inside the file, as we'll see in a minute.

We should now have a file called `my_file.root` in the current directory. We will check that by using the `%%bash` magic, which allows us to run bash commands from a cell:

In [None]:
%%bash
ls -l my_file.root

We can also use the `rootls` command to inspect the contents of the ROOT file. See how the file contains an object called `my_histo` of type `TH1F`.

In [None]:
%%bash
rootls -l my_file.root

Another way of inspecting the contents of a ROOT file is the `TBrowser` interactive tool, which lets you browse ROOT files graphically.

It can be launched with the `rootbrowse` command line tool.

<center><img src="images/tfile2.png"></center>

Finally, let's see how we can programmatically retrieve the histogram we just wrote to the file. In Python, we can access the histogram by its name as if it were an attribute of the file.

In [None]:
f = ROOT.TFile("my_file.root") # READ is the default mode

h = f.my_histo
print(h)

# The HEP dataset

High Energy Physics data is made of many statistically independent collision events. Laying out the data as an "event class" and then serialising and writing out `N` instances of that class to a file would be very inefficient. In ROOT, a dataset is instead organised in columns that can store elements of any C++ type:
* fundamental types: `int`, `float`
* C++ standard collections: `std::vector`, `std::map`
* User created C++ classes

The ROOT dataset is represented by the `TTree` class and can simply be called a tree. Columns in the dataset are instances of the `TBranch` class and can also be called branches.

<center><img src="images/dataset.png"></center>

A `TTree` dataset can be written to a `TFile` (just like any other C++ object). The ROOT format is logically and physically (on disk) a columnar format: different columns can be read independently from disk. When only some of the columns are needed, this translates into faster I/O performance with respect to other dataset formats (HDF5, SQL).

In [None]:
%%bash
rootls -l data/example_dataset.root