# Working with collections and object selections

RDataFrame reads collections as the special type [ROOT::RVec](https://root.cern/doc/master/classROOT_1_1VecOps_1_1RVec.html); for example, a branch containing an array of floating point numbers can be read as a `ROOT::RVec<float>`. C-style arrays (with variable or static size), `std::vector`s and most other collection types can be read this way. When reading ROOT data, reading a column as `ROOT::RVec<T>` does not copy the underlying array.

`RVec` is a container similar to `std::vector` (and can be used just like a `std::vector`) but it also offers a rich interface to operate on the array elements in a vectorised fashion, similarly to Python's NumPy arrays.
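
To make the NumPy analogy concrete, here is a minimal sketch of element-wise arithmetic. It uses NumPy itself rather than `RVec`, and the momentum values are made up for illustration:

```python
import numpy as np

# Hypothetical momentum components for three particles in a single event
px = np.array([3.0, 0.0, 6.0])
py = np.array([4.0, 2.0, 8.0])

# Arithmetic and math functions act element by element, with no explicit loop
pt = np.sqrt(px * px + py * py)

print(pt)
```

The result has one entry per particle, just like the inputs.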

In [None]:
import ROOT

treename = "myDataset"
filename = "https://github.com/root-project/root/raw/master/tutorials/dataframe/df017_vecOpsHEP.root"
df = ROOT.RDataFrame(treename, filename)

print(f"Columns in the dataset: {df.GetColumnNames()}")

To quickly inspect the data we can export it as a dictionary of `numpy` arrays thanks to the `AsNumpy` RDataFrame method. Note that for each row, `E` is an array of values:

In [None]:
npy_dict = df.AsNumpy(["E"])

for row, vec in enumerate(npy_dict["E"]):
    print(f"\nRow {row} contains:\n{vec}")
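
The "one array per row" layout can be mimicked without ROOT. The sketch below builds a stand-in for `npy_dict["E"]` by hand, on the assumption that it behaves like a one-dimensional array whose elements are themselves arrays of different lengths; the energies are invented:

```python
import numpy as np

# Hypothetical per-row energy arrays; each row may hold a different number of values
rows = [np.array([10.0, 120.0]), np.array([150.0]), np.array([30.0, 40.0, 200.0])]

# Stand-in for npy_dict["E"]: an object array with one inner array per row
jagged = np.empty(len(rows), dtype=object)
for i, values in enumerate(rows):
    jagged[i] = values

# Number of elements stored in each row
sizes = [len(vec) for vec in jagged]

print(sizes)
```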

### Define a new column with operations on RVecs

In [None]:
df1 = df.Define("good_pt", "sqrt(px*px + py*py)[E>100]")

The expression `sqrt(px*px + py*py)[E>100]` works as follows:
* `px`, `py` and `E` are columns whose values are `RVec`s, one per row
* Operations on `RVec`s such as addition, multiplication and `sqrt` act element by element, so the result has the same size as the inputs
* Boolean expressions on `RVec`s such as `E > 100` return a mask, that is, an array recording which elements pass the selection (e.g. `[0, 1, 0, 0]` if only the second element satisfies the condition)
* `[E>100]` applies that mask and keeps only the elements for which it is true
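
The same mask-and-select pattern can be tried out in NumPy, which follows similar indexing rules. This is only an analogy with invented values for a single row, not the code RDataFrame runs internally:

```python
import numpy as np

# Hypothetical values for one event containing four particles
px = np.array([1.0, 3.0, 0.5, 2.0])
py = np.array([1.0, 4.0, 0.5, 0.0])
E = np.array([50.0, 120.0, 80.0, 99.0])

pt = np.sqrt(px * px + py * py)  # element-wise, same length as px and py
mask = E > 100                   # one boolean per element
good_pt = pt[mask]               # keeps only the elements where mask is True

print(mask.astype(int))
print(good_pt)
```

Only the second particle has `E > 100`, so the mask is `[0, 1, 0, 0]` and `good_pt` holds a single value.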

### Now we can plot the newly defined column values in a histogram

In [None]:
c = ROOT.TCanvas()
# Histogram model: (name, title, number of bins, x-axis minimum, x-axis maximum)
h = df1.Histo1D(("pt", "pt", 16, 0, 4), "good_pt")
h.Draw()
c.Draw()
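
Because `good_pt` holds a collection in every row, the histogram receives one entry per element rather than one per row. A rough NumPy sketch of that filling, with made-up per-row values and the same binning (16 bins from 0 to 4), might look like this:

```python
import numpy as np

# Hypothetical good_pt values per row; a row may be empty if nothing passed the cut
rows = [np.array([0.3, 1.1]), np.array([]), np.array([2.6, 3.9, 1.2])]

# Flatten all rows into one array, then fill 16 equal-width bins between 0 and 4
all_values = np.concatenate(rows)
counts, edges = np.histogram(all_values, bins=16, range=(0, 4))

print(counts)
```

Each bin is 0.25 wide, so the values 1.1 and 1.2 land in the same bin.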