# Fine Data Manipulation with Nested-Pandas

This tutorial briefly shows how to perform common `pandas` data manipulation operations, such as adding columns and replacing values, with `nested-pandas`.

In [None]:
import nested_pandas as npd
from nested_pandas.datasets import generate_data

# Begin by generating an example dataset
ndf = generate_data(5, 20, seed=1)
ndf

In [None]:
# Show one of the nested dataframes
ndf.iloc[0].nested

## Nested Column Selection

First, we can directly fetch a column from our nested column (aptly called "nested"). For example, below we fetch the time column, "t", by specifying `"nested.t"` as the column to retrieve. This returns a "flat" view of the nested `t` column, in which the rows from every nested dataframe are combined into a single `pandas.Series`.

In [None]:
# Directly Nested Column Selection
ndf["nested.t"]

The advantage of the flat view is that it can be manipulated just like any other `pandas.Series` object.

In [None]:
ndf["nested.t"] + 100
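To make the flat view concrete, the following plain-`pandas` sketch mimics its layout. It assumes the flat view repeats each top-level row's index label once per nested row; the series `flat_t` and its values are made up for illustration and are not produced by `nested-pandas`.

```python
import pandas as pd

# Hypothetical stand-in for a flat view: the index repeats the label of the
# top-level row that each nested value belongs to (assumption for illustration).
flat_t = pd.Series([1.0, 2.0, 3.0, 10.0, 20.0], index=[0, 0, 0, 1, 1], name="t")

# Element-wise arithmetic works exactly as on any other Series.
shifted = flat_t + 100

# Because the index still identifies the parent row, per-row summaries
# can be recovered by grouping on the index level.
per_row_mean = shifted.groupby(level=0).mean()
```

Grouping on the index is one way to get from flat values back to a per-row quantity without leaving ordinary `pandas` operations.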

## Adding or Replacing Nested Columns

> *A Note on Performance: These operations fully reconstruct the nested columns, so expect them to be slow at scale. If performance is key, it may be more appropriate to do these operations (e.g. subtracting a value from a column) directly within reduce functions.*

We can also use the "base_column.nested_sub_column" syntax to add new columns to a nested column or replace existing ones. For example, we can directly replace the "band" column with a new column that prepends a string to each value.

In [None]:
# prepend lsst_ to the band column

ndf["nested.band"] = "lsst_" + ndf["nested.band"]

ndf["nested.band"]

Next, we can create a new column in the "nested" column. For example, we can subtract a value from each time value and return it as a new column.

In [None]:
# create a new "corrected_t" column in "nested"

ndf["nested.corrected_t"] = ndf["nested.t"] - 5

ndf["nested.corrected_t"]

In [None]:
# Show the first dataframe again
ndf.iloc[0].nested

## Adding New Nested Structures

Finally, we can also add entirely new nested structures using the above syntax.

In [None]:
ndf["bands.band_label"] = ndf["nested.band"]
ndf

This is functionally equivalent to using `add_nested`:

In [None]:
ndf.add_nested(ndf["nested.band"].to_frame(), "bands_from_add_nested")

In addition to assigning individual nested columns, we can use the above syntax to nest an entire flat dataframe.

As an example, we can flatten our existing "nested" frame and use the `[]` syntax to assign it as an additional nested frame.

In [None]:
# Create a flat dataframe from our existing nested dataframe
flat_df = ndf["nested"].nest.to_flat()

# Nest our flat dataframe back into our original dataframe
ndf["example"] = flat_df
ndf

The above is again shorthand for the following `add_nested` call:

In [None]:
ndf.add_nested(flat_df, "example_from_add_nested")
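The sketch below illustrates, in plain `pandas`, why the index of a flat dataframe matters when it is nested. It assumes that rows are matched to top-level rows by index label; the dataframe `flat` and its values are hypothetical and only mimic the layout of a flattened nested column.

```python
import pandas as pd

# Hypothetical flat dataframe: the repeated index labels say which
# top-level row each flat row belongs to (assumption for illustration).
flat = pd.DataFrame(
    {"t": [1.0, 2.0, 3.0, 4.0, 5.0], "band": ["g", "r", "g", "r", "g"]},
    index=[0, 0, 1, 1, 1],
)

# Number of flat rows that would end up under each top-level row.
rows_per_parent = flat.groupby(level=0).size()

# The rows that would form the inner dataframe for top-level row 1.
parent_1 = flat.loc[1]
```

Checking `rows_per_parent` before nesting is a cheap way to confirm that the flat dataframe's index lines up with the top-level rows you expect.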

## Embedding a "Base" Column into a Nested Column

We can also assign a "base" (non-nested) column to a nested column. Its values are broadcast to all nested dataframes, with each base value repeated for every row of the corresponding nested dataframe.

In [None]:
ndf["nested.a"] = ndf["a"]
ndf["nested.a"]

Or we can do some operations over the base columns first:

In [None]:
ndf["nested.ab"] = ndf["a"] + ndf["b"] * 2
ndf["nested.ab"]
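The broadcasting described in this section can be pictured with a small plain-`pandas` sketch. It is an analogy rather than the `nested-pandas` implementation: `base_a` and `flat_index` are made-up stand-ins for a base column and for the repeated index of a flat view.

```python
import pandas as pd

# Hypothetical base column: one value per top-level row.
base_a = pd.Series([0.5, 1.5], index=[0, 1], name="a")

# Hypothetical flat index: row 0 has three nested rows, row 1 has two.
flat_index = pd.Index([0, 0, 0, 1, 1])

# Label-based selection repeats each base value once per nested row.
broadcast_a = base_a.loc[flat_index]
```

Each base value appears as many times as its top-level row has nested rows, which is the repetition visible in `ndf["nested.a"]`.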

## Combining Nested Structures

There may be cases where you want to combine two nested structures into a single nested structure. There are multiple ways to do this, but by far the most direct is assignment. First, let's set up a toy example:

In [None]:
# Set up a toy dataframe with two nested columns
list_nf = npd.NestedFrame(
    {
        "a": ["cat", "dog", "bird"],
        "b": [1, 2, 3],
        "c": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        "d": [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
    }
)

list_nf = list_nf.nest_lists(["c"], "c")
list_nf = list_nf.nest_lists(["d"], "d")
list_nf


The nested structures "c" and "d" can be combined directly, as shown below. Note that this requires "c" and "d" to be compatible: the shapes of the inner dataframes must be aligned for every row of the top-level `NestedFrame`.

In [None]:
# Combine "c" and "d"
list_nf["nested"] = list_nf[["c", "d"]]
list_nf = list_nf.drop(columns=["c", "d"])  # drop the original columns
list_nf
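The compatibility requirement can be checked before nesting. The sketch below is a minimal example on the original list-valued columns using plain `pandas`; `lengths_aligned` is a hypothetical helper written for this tutorial, not part of `nested-pandas`, and it only compares per-row list lengths.

```python
import pandas as pd


def lengths_aligned(df, columns):
    """Return True if, in every row, the list-valued columns have equal lengths.

    Hypothetical helper for illustration; not a nested-pandas function.
    """
    lengths = pd.DataFrame({col: df[col].map(len) for col in columns})
    # A row is aligned when all of its columns report the same length.
    return bool(lengths.nunique(axis=1).eq(1).all())


# Same list columns as the toy example: every row has 3 values in "c" and "d".
aligned = pd.DataFrame(
    {
        "c": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        "d": [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
    }
)

# Made-up counterexample: the first row of "d" is one element short.
ragged = pd.DataFrame(
    {
        "c": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        "d": [[10, 20], [40, 50, 60], [70, 80, 90]],
    }
)

aligned_ok = lengths_aligned(aligned, ["c", "d"])
ragged_ok = lengths_aligned(ragged, ["c", "d"])
```

Running a check like this on the raw columns makes a mismatch easy to locate before attempting to combine the nested structures.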