# Basic file I/O with Uproot

![uproot](img/uproot_logo.png)

# What is Uproot?

Uproot is a Python package that reads and writes ROOT files and is *only* concerned with reading and writing (no analysis, no plotting, etc.). It interacts with NumPy, Awkward Array, and Pandas for computations, boost-histogram/hist for histogram manipulation and plotting, Vector for Lorentz vector functions and transformations, Coffea for scale-up, etc.

Uproot is implemented using only Python and Python libraries. It doesn't have a compiled part or require a specific version of ROOT. (This means that if you *do* use ROOT for something other than I/O, your choice of ROOT version is not constrained by I/O.)

![abstraction-layers](img/abstraction-layers.png)

As a consequence of being an independent implementation of ROOT I/O, Uproot might not be able to read/write certain data types. Which data types are not implemented is a moving target, as new ones are always being added. A good approach for reading data is to just try it and see if Uproot complains. For writing, see the lists of supported types in the [Uproot documentation](https://uproot.readthedocs.io/en/latest/basic.html#writing-objects-to-a-file) (blue boxes in the text).

# Reading data from a file

## Opening the file

To open a file for reading, pass the name of the file to [uproot.open](https://uproot.readthedocs.io/en/latest/uproot.reading.open.html). In scripts, it is good practice to use [Python's with statement](https://realpython.com/python-with-statement/) to close the file when you're done, but if you're working interactively, you can use a direct assignment.

In [None]:
import skhep_testdata

filename = skhep_testdata.data_path(
    "uproot-Event.root"
)  # downloads this test file and gets a local path to it

import uproot

file = uproot.open(filename)

To access a remote file via HTTP or XRootD, use a `"http://..."`, `"https://..."`, or `"root://..."` URL. If the Python interface to XRootD is not installed, the error message will explain how to install it.

## Listing contents

This "`file`" object actually represents a directory, and the named objects in that directory are accessible with a dict-like interface. Thus, `keys`, `values`, and `items` return the key names and/or read the data. If you want to just list the objects without reading, use `keys`. (This is like ROOT's `ls()`, except that you get a Python list.)

In [None]:
file.keys()

Often, you want to know the type of each object as well, so [uproot.ReadOnlyDirectory](https://uproot.readthedocs.io/en/latest/uproot.reading.ReadOnlyDirectory.html) objects also have a `classnames` method, which returns a dict of object names to class names (without reading them).

In [None]:
file.classnames()

## Reading a histogram

If you're familiar with ROOT, you'll recognize `TH1F` as a histogram class and `TTree` as a dataset. To read one of the histograms, put its name in square brackets:

In [None]:
h = file["hstat"]
h

Uproot doesn't do any plotting or histogram manipulation, so the most useful methods of `h` begin with "to": `to_boost` (boost-histogram), `to_hist` (hist), `to_numpy` (NumPy's 2-tuple of contents and edges), `to_pyroot` (PyROOT), etc.

In [None]:
h.to_hist().plot();

Uproot histograms also satisfy the [UHI plotting protocol](https://uhi.readthedocs.io/en/latest/plotting.html), so they have methods like `values` (bin contents), `variances` (errors squared), and `axes`.

In [None]:
h.values()

In [None]:
h.variances()

In [None]:
list(h.axes[0])  # "x", "y", "z" or 0, 1, 2

## Reading a TTree

A TTree represents a potentially large dataset. Getting it from the [uproot.ReadOnlyDirectory](https://uproot.readthedocs.io/en/latest/uproot.reading.ReadOnlyDirectory.html) reads only its metadata, such as TBranch names and types, not the data in the branches. The `show` method is a convenient way to list its contents:

In [None]:
t = file["T"]
t.show()

Be aware that you can get the same information from `keys` (an [uproot.TTree](https://uproot.readthedocs.io/en/latest/uproot.behaviors.TTree.TTree.html) is dict-like), `typename`, and `interpretation`.

In [None]:
t.keys()

In [None]:
t["event/fNtrack"], t["event/fNtrack"].typename, t["event/fNtrack"].interpretation

(If an [uproot.TBranch](https://uproot.readthedocs.io/en/latest/uproot.behaviors.TBranch.TBranch.html) has no `interpretation`, it can't be read by Uproot.)

The most direct way to read data from an [uproot.TBranch](https://uproot.readthedocs.io/en/latest/uproot.behaviors.TBranch.TBranch.html) is by calling its `array` method.

In [None]:
t["event/fNtrack"].array()

We'll consider other methods in the next lesson.

## Reading a... what is that?

This file also contains an instance of type [TProcessID](https://root.cern.ch/doc/master/classTProcessID.html). These aren't typically useful in data analysis, but Uproot manages to read it anyway because it follows certain conventions (it has "class streamers"). It's presented as a generic object with an `all_members` property for its data members (through all superclasses).

In [None]:
file["ProcessID0"]

In [None]:
file["ProcessID0"].all_members

Here's a more useful example of that: a supernova search with the IceCube experiment has custom classes for its data, which Uproot reads and represents as objects with `all_members`.

In [None]:
icecube = uproot.open(skhep_testdata.data_path("uproot-issue283.root"))
icecube.classnames()

In [None]:
icecube["config/detector"].all_members

In [None]:
icecube["config/detector"].all_members["ChannelIDMap"]

# Writing data to a file

Uproot's ability to *write* data is more limited than its ability to *read* data, but some useful cases are possible.

## Opening files for writing

First of all, a file must be opened for writing, either by creating a completely new file or updating an existing one.

In [None]:
output1 = uproot.recreate("completely-new-file.root")

```python
output2 = uproot.update("existing-file.root")
```

(Uproot cannot write over a network; output files must be local.)

## Writing strings and histograms

These [uproot.WritableDirectory](https://uproot.readthedocs.io/en/latest/uproot.writing.writable.WritableDirectory.html) objects have a dict-like interface: you can put data in them by assigning to square brackets.

In [None]:
output1["some_string"] = "This will be a TObjString."

output1["some_histogram"] = file["hstat"]

import numpy as np

output1["nested_directory/another_histogram"] = np.histogram(
    np.random.normal(0, 1, 1000000)
)

In ROOT, the name of an object is a property of the object, but in Uproot, it's a key in the TDirectory that holds the object, so that's why the name is on the left-hand side of the assignment, in square brackets. Only the data types listed in the blue box [in the documentation](https://uproot.readthedocs.io/en/latest/basic.html#writing-objects-to-a-file) are supported: mostly just histograms.

## Writing TTrees

TTrees are potentially large and might not fit in memory. Generally, you'll need to write them in batches.

One way to do this is to assign the first batch and `extend` it with subsequent batches:

In [None]:
import numpy as np

output1["tree1"] = {
    "x": np.random.randint(0, 10, 1000000),
    "y": np.random.normal(0, 1, 1000000),
}
output1["tree1"].extend(
    {"x": np.random.randint(0, 10, 1000000), "y": np.random.normal(0, 1, 1000000)}
)
output1["tree1"].extend(
    {"x": np.random.randint(0, 10, 1000000), "y": np.random.normal(0, 1, 1000000)}
)

Another way is to create an empty TTree with [uproot.WritableDirectory.mktree](https://uproot.readthedocs.io/en/latest/uproot.writing.writable.WritableDirectory.html#mktree), so that every write is an extension.

In [None]:
output1.mktree("tree2", {"x": np.int32, "y": np.float64})
output1["tree2"].extend(
    {"x": np.random.randint(0, 10, 1000000), "y": np.random.normal(0, 1, 1000000)}
)
output1["tree2"].extend(
    {"x": np.random.randint(0, 10, 1000000), "y": np.random.normal(0, 1, 1000000)}
)
output1["tree2"].extend(
    {"x": np.random.randint(0, 10, 1000000), "y": np.random.normal(0, 1, 1000000)}
)

Performance tips are given in the next lesson, but in general, it pays to write a few large batches rather than many small ones.

The only data types that can be assigned or passed to `extend` are listed in the blue box [in this documentation](https://uproot.readthedocs.io/en/latest/basic.html#writing-ttrees-to-a-file). This includes jagged arrays (described in the lesson after next), but not more complex types.

# Reading and writing RNTuples

TTree has been the default format to store large datasets in ROOT files for decades. However, it has slowly become outdated and is not optimized for modern systems. This is where the RNTuple format comes in. It is a modern serialization format that is designed with modern systems in mind and is planned to replace TTree in the coming years. [Version 1.0.0.0](https://cds.cern.ch/record/2923186) is out and will be supported "forever".

RNTuples are much simpler than TTrees by design, and this time there is an official specification, which makes the format much easier for third-party I/O packages like Uproot to support. Uproot already supports reading the full RNTuple specification, meaning that you can read any RNTuple you find in the wild. It also supports writing a large part of the specification, and intends to support as much as makes sense for data analysis.

To ease the transition into RNTuples, we are designing the interface to match the one for TTrees as closely as possible. Let's look at a simple example for reading and writing RNTuples.

Again, we'll use a sample file from the `skhep_testdata` package.

In [None]:
filename = skhep_testdata.data_path("test_stl_containers_rntuple_v1-0-0-0.root")

file = uproot.open(filename)

This time, if we print the class names, we see that there is an RNTuple instead of a TTree.

In [None]:
file.classnames()

Let's look at the available keys with `.keys`, but restrict to only keys at the top level by using `recursive=False`.

In [None]:
rntuple = file["ntuple"]
rntuple.keys(recursive=False)

We can show the structure of the RNTuple more clearly by using `.show`, which works in a similar way to TTrees, but it is more complete and shows the structure better since RNTuples are fully readable by Uproot.

In [None]:
rntuple.show()

Reading data into arrays works in exactly the same way as for TTrees, so you don't have to worry about whether you are reading a TTree or an RNTuple.

In [None]:
data = rntuple.arrays()
data

Writing again works in a very similar way to TTrees. However, since TTrees are still the default format used in more places, writing something like `file[key] = data` will default to writing the data as a TTree (although it will give us a warning that this will change in the near future). When we want to write an RNTuple, we need to specifically tell Uproot that we want to do so. We can input the data in the same way as for TTrees.

In [None]:
data = {"my_int_data": [1, 2, 3], "my_float_data": [1.0, 2.0, 3.0]}
more_data = {"my_int_data": [4, 5, 6], "my_float_data": [4.0, 5.0, 6.0]}

output3 = uproot.recreate("new-file-with-rntuple.root")

rntuple = output3.mkrntuple("my_rntuple", data)
rntuple.extend(more_data)

For the rest of the tutorial we will mostly stick to using TTrees since this is still the main data format that you'll encounter in the near future.

`````{tip}
# Exercise 1 (10-15 minutes)

There is a file in `skhep_testdata` named `ntpl001_staff_rntuple_v1-0-0-0.root` that contains CERN staff data from 1988. As the name suggests, it is an RNTuple and not a TTree.

1.  Open it with Uproot, look around at what's in there, and then find the number of French employees who were at least 35 years old, and had one or two children. For bonus points, use `with uproot.open(...) as f:` instead of `f = uproot.open(...)` to follow best practices.
2.  With the selection from the previous part, make a histogram of the employee grade with `np.histogram(np.array(data))`, and save it to a new ROOT file. (You might need to wrap the data with `np.array` due to a bug I found while writing this, but it might be fixed by the time you read this.)
3.  Read back the file you just made. Open the histogram, use `to_hist()` to convert it to a `hist` histogram, and then plot it with `.plot()`.
4.  If you are in a training event, right-click on the image, then click on "Create New View for Cell Output", then right-click on the image in the new view, then "Copy Image", and paste it as a reply in the Slack channel.

````{note}
:class: dropdown
## Solution (no peeking!)


```python
import skhep_testdata
import uproot
import numpy as np

with uproot.open(skhep_testdata.data_path("ntpl001_staff_rntuple_v1-0-0-0.root")) as file:
    staff = file["Staff"]
    staff_age = staff["Age"].array()
    staff_nation = staff["Nation"].array()
    staff_children = staff["Children"].array()
    staff_grade = staff["Grade"].array()

cut = ((staff_nation == "FR") 
    & (staff_age >= 35) 
    & (1 <= staff_children) 
    & (2 >= staff_children))

n = len(staff_age[cut])

n = np.sum(cut) # A simpler alternative is to count the number of True values in cut

print(f"The number of employees with the selected criteria is {n}")

with uproot.recreate("my_file.root") as file:
    file["my_hist"] = np.histogram(np.array(staff_grade[cut]))

with uproot.open("my_file.root") as file:
    h = file["my_hist"].to_hist()

h.plot();
```
````
`````