# Quickstart

In a working Python environment, nested-pandas and its dependencies can be installed with the `pip` package manager using the following command:

In [None]:
# % pip install nested-pandas

nested-pandas is tailored towards efficient analysis of nested datasets. Let's load a toy dataset to show how it works.

In [None]:
from nested_pandas.datasets import generate_data

# generate_data creates some toy data
nf = generate_data(10, 100)  # 10 rows, 100 nested rows per row
nf

The above dataframe is a `NestedFrame`, which extends the capabilities of the Pandas `DataFrame` to support columns with nested information. In this example, we have the top level dataframe with 10 rows and 2 typical columns, "a" and "b". The "nested" column contains a dataframe in each row. We can inspect the contents of the "nested" column using pandas API tooling like `loc`.

In [None]:
nf.loc[0]["nested"]

Here we see that within the "nested" column there are `NestedFrame` objects with their own data. In this case we have 3 columns ("t", "flux", and "band"). Alternatively, we could inspect the available columns using some custom properties of the `NestedFrame`.

In [None]:
# Shows which columns have nested data
nf.nested_columns

In [None]:
# Provides a dictionary of "base" (top-level) and nested column labels
nf.all_columns
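
To make the layout concrete, here is a minimal plain-Python sketch of the idea: each top-level row holds ordinary values plus a small table under "nested". It is a conceptual model only and does not reflect how nested-pandas stores data internally. The `rows` data and the `describe_columns` helper are hypothetical, and the shape of the returned dictionary (a "base" key plus one key per nested column) is an assumption based on the comment in the cell above.

```python
# Conceptual model only: plain Python objects, not nested-pandas internals.
# Each top-level row holds ordinary values plus one small table under "nested".
rows = [
    {"a": 0.4, "b": 1.2,
     "nested": {"t": [3.0, 18.5], "flux": [10.0, 55.0], "band": ["g", "r"]}},
    {"a": 0.1, "b": 0.7,
     "nested": {"t": [9.0, 17.5], "flux": [20.0, 80.0], "band": ["r", "g"]}},
]


def describe_columns(rows):
    """Group column labels into top-level ("base") and per-nested-column labels."""
    columns = {"base": []}
    for name, value in rows[0].items():
        if isinstance(value, dict):
            # A dict value stands in for a nested table; list its column labels.
            columns[name] = list(value)
        else:
            columns["base"].append(name)
    return columns


columns = describe_columns(rows)
print(columns)
```

In this toy model, the labels "a" and "b" are grouped under "base", while "t", "flux", and "band" are listed under "nested".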

nested-pandas extends the Pandas API, meaning any operation you could do in Pandas is available within nested-pandas. However, nested-pandas has additional functionality and tooling to better support working with nested datasets. For example, let's look at `query`:

In [None]:
# Normal queries work as expected, rejecting rows from the dataframe that don't meet the criteria
nf.query("a > 0.2")

The above query is native Pandas. With nested-pandas, however, we can use hierarchical column names to extend `query` to nested layers.

In [None]:
# Applies the query to "nested", filtering based on "t > 17.0"
nf_g = nf.query("nested.t > 17.0")
nf_g

This query does not affect the rows of the top-level dataframe, but rather applies the query to the "nested" dataframes. If we look at one of them, we can see the effect of the query.

In [None]:
# All t <= 17.0 have been removed
nf_g.loc[0]["nested"]
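
Conceptually, a nested query of this kind keeps every top-level row and filters the rows inside each nested table. The following is a minimal plain-Python sketch of that behaviour under this assumption, not the nested-pandas implementation. The `rows` data and the `query_nested` helper are hypothetical names introduced only for illustration.

```python
# Conceptual model only: plain Python objects, not nested-pandas internals.
rows = [
    {"a": 0.4, "b": 1.2,
     "nested": {"t": [3.0, 18.5], "flux": [10.0, 55.0], "band": ["g", "r"]}},
    {"a": 0.1, "b": 0.7,
     "nested": {"t": [9.0, 17.5], "flux": [20.0, 80.0], "band": ["r", "g"]}},
]


def query_nested(rows, nested_name, column, keep):
    """Filter the rows of each nested table; the top-level rows are all kept."""
    result = []
    for row in rows:
        table = row[nested_name]
        # Decide, per nested row, whether it satisfies the condition.
        mask = [keep(value) for value in table[column]]
        filtered = {
            name: [v for v, selected in zip(values, mask) if selected]
            for name, values in table.items()
        }
        # Copy the top-level row, swapping in the filtered nested table.
        result.append({**row, nested_name: filtered})
    return result


filtered_rows = query_nested(rows, "nested", "t", lambda t: t > 17.0)
print(len(filtered_rows))          # same number of top-level rows as before
print(filtered_rows[0]["nested"])  # only nested rows with t > 17.0 remain
```

The filter is applied to every column of the nested table at once, so "flux" and "band" stay aligned with the surviving "t" values.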

So far, only a limited set of functions has been extended in this way; the aim is to fully support this hierarchical access wherever it applies in the Pandas API.

Finally, we'll end with the flexible `reduce` function. `reduce` works similarly to Pandas' `apply`, but flattens (reduces) the inputs from nested layers into arrays that are passed to the given function. For example, let's find the mean flux for each dataframe in "nested":

In [None]:
import numpy as np

# Use a hierarchical column name to access the flux column;
# each row's flux values are passed to np.mean as an array
nf.reduce(np.mean, "nested.flux")

This can be used to apply any custom function you need for your analysis. To illustrate the point further, let's define a custom function that simply returns its inputs.

In [None]:
def show_inputs(*args):
    return args

Passing a few columns through `reduce` shows how it sends inputs to a given function.

In [None]:
nf_inputs = nf.reduce(show_inputs, "a", "nested.band")
nf_inputs

In [None]:
nf_inputs.loc[0]
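
The same input handling can be sketched in plain Python. This is a conceptual model under stated assumptions, not the nested-pandas implementation: it assumes that a top-level column such as "a" reaches the function as a single value per row, while a hierarchical name such as "nested.band" reaches it as that row's full sequence of values. The `rows` data and the `reduce_rows` helper are hypothetical names used only for illustration.

```python
from statistics import mean

# Conceptual model only: plain Python objects, not nested-pandas internals.
rows = [
    {"a": 0.4, "b": 1.2,
     "nested": {"t": [3.0, 18.5], "flux": [10.0, 55.0], "band": ["g", "r"]}},
    {"a": 0.1, "b": 0.7,
     "nested": {"t": [9.0, 17.5], "flux": [20.0, 80.0], "band": ["r", "g"]}},
]


def show_inputs(*args):
    return args


def reduce_rows(rows, func, *columns):
    """Call func once per top-level row with the requested column inputs."""
    results = []
    for row in rows:
        args = []
        for label in columns:
            if "." in label:
                # Hierarchical name: pass the row's whole nested column.
                nested_name, column = label.split(".", 1)
                args.append(list(row[nested_name][column]))
            else:
                # Top-level name: pass the row's single value.
                args.append(row[label])
        results.append(func(*args))
    return results


mean_flux = reduce_rows(rows, mean, "nested.flux")
inputs = reduce_rows(rows, show_inputs, "a", "nested.band")
print(mean_flux)
print(inputs[0])
```

One result is produced per top-level row, which mirrors how the mean-flux example above yields one value for each dataframe in "nested".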