# Torcharrow mutability

Torcharrow columns and dataframes support **append-only** mutability for rows and columns. An append-only model provides declarative semantics for views of data while still allowing efficient imperative updates.

We first show the consequences of this model for columns, then explore the mutability of dataframes in more detail, and finally analyze the technical conditions for mutability.
 
## Mutability of Columns: Owners and Views

Here is a simple column `x`:

In [None]:
import torcharrow as ta

x = ta.Column([0,1,2,3])
x

We say `x` is the *owner* of the data since it created/updated the data last.

Any selection of this column (here `y`) produces a *view*. Most selections can be done in constant time:

In [None]:
y = x[:4]
y

When we update the owner `x` ...


In [None]:
x.append(4)
x

... the view `y` is unaffected:

In [None]:
y

When we try to update the view, we get ...

In [None]:
try:
    y.append(99)
except AttributeError as err:
    print(err)

... since the view did not copy the data and therefore does not own it. However, if we copy `y` first, `y` becomes the owner of the copied data.


In [None]:
y = y.copy()
y.append(44)
y

The original owner `x` is unaffected by the change to the copied view, so it can still be extended without constraints.

In [None]:
x.append(55)
x
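
The owner/view behaviour shown above can be mimicked in a few lines of plain Python. The sketch below is a toy model, not Torcharrow's implementation: `ToyColumn`, `view`, and `can_append` are hypothetical names, and the rule that a column may append only while it sees the end of a shared buffer is an assumption chosen to reproduce the outputs above.

```python
class ToyColumn:
    """Toy append-only column: a window [0, length) onto a list buffer."""

    def __init__(self, values=(), _shared=None, _length=0):
        if _shared is None:
            self._data = list(values)      # fresh buffer: this column owns it
            self._length = len(self._data)
        else:
            self._data = _shared           # buffer shared with another column
            self._length = _length

    def to_list(self):
        return self._data[: self._length]

    def view(self, stop):
        # Constant-time prefix selection: share the buffer, store only a length.
        return ToyColumn(_shared=self._data, _length=min(stop, self._length))

    def can_append(self):
        # Assumed owner test: the column can see the last row of the buffer.
        return self._length == len(self._data)

    def append(self, value):
        if not self.can_append():
            raise AttributeError("cannot append to a view; copy it first")
        self._data.append(value)
        self._length += 1

    def copy(self):
        # Copying the visible rows creates a new buffer, hence a new owner.
        return ToyColumn(self.to_list())


x = ToyColumn([0, 1, 2, 3])
y = x.view(4)
x.append(4)            # x can see the end of the buffer, so this succeeds
print(y.to_list())     # [0, 1, 2, 3]
try:
    y.append(99)       # y no longer sees the last row
except AttributeError as err:
    print(err)
y = y.copy()
y.append(44)
x.append(55)
print(x.to_list())     # [0, 1, 2, 3, 4, 55]
print(y.to_list())     # [0, 1, 2, 3, 44]
```

Because a view stores only a length over the shared buffer, creating it takes constant time, and later appends by the owner never change what the view shows.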

## Mutability for Dataframes: Sharing of Columns across Dataframes

A dataframe can be composed of owned columns and views. Columns can always be added to a dataframe as long as they have the same size ...

In [None]:
df = ta.DataFrame()
df['str'] = ['a','b','c']
df['int'] = [0,1,2]
df

... and as long as you don't overwrite an existing column; otherwise you will get:


In [None]:
try:
    df['int'] = [100,101,102]
except AttributeError as err:
    print(err)

\[Remark: You can use the dataframe's `assign` method to update a column, but this is not discussed here.\]

Any selection creates a new dataframe that shares the underlying data; here `ef` has a view on `df`'s `'str'` column:

In [None]:
ef = df[['str']]
ef

We can append a new column to `df` or `ef`, as long as it has the same length as the existing columns:

In [None]:
df['float'] = [1.0, 2.0, 3.0]
ef['bool'] = [True, False, True]
ef

But as soon as one of the dataframes performs an append (here `df`), it becomes the owner of its columns; other references such as `ef` become views of the shared columns.

In [None]:
df.append(('d',33,55.0))
df

Views are never impacted by appending to owners, so `ef` is still the same:

In [None]:
ef

But appending a row to `ef` fails, since `ef` has only a view on the `'str'` column.

In [None]:
try:
    ef.append(('e',True))
except AttributeError as err:
    print(err)

We can fix this by first copying the data, which makes the receiver the owner of the columns. Once you own all columns you can append. 

In [None]:
ef = ef.copy()
ef.append(('e',True))
ef
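
The dataframe rules can be sketched the same way. This is again a toy model under stated assumptions, not Torcharrow's code: `ToyFrame`, `add_column`, `select`, and `owns_all_columns` are hypothetical names, each column is modelled as a plain list that frames may share, and raising `ValueError` on a length mismatch is a choice made for this sketch only.

```python
class ToyFrame:
    """Toy dataframe: column name -> list buffer, plus a visible row count."""

    def __init__(self):
        self._buffers = {}   # buffers may be shared with other frames
        self._length = 0     # number of rows visible through this frame

    def add_column(self, name, values):
        values = list(values)
        if name in self._buffers:
            raise AttributeError(f"column {name!r} already exists")
        if self._buffers and len(values) != self._length:
            raise ValueError("new column must match the existing length")
        self._buffers[name] = values
        self._length = len(values)

    def select(self, names):
        # Selection shares the chosen buffers instead of copying them.
        out = ToyFrame()
        out._buffers = {name: self._buffers[name] for name in names}
        out._length = self._length
        return out

    def owns_all_columns(self):
        # Assumed owner test: every buffer ends exactly at our last row.
        return all(len(buf) == self._length for buf in self._buffers.values())

    def append(self, row):
        if len(row) != len(self._buffers):
            raise ValueError("row must have one value per column")
        if not self.owns_all_columns():
            raise AttributeError("cannot append: a column is a view; copy first")
        for buf, value in zip(self._buffers.values(), row):
            buf.append(value)
        self._length += 1

    def copy(self):
        # Copy only the visible rows; the result owns all of its columns.
        out = ToyFrame()
        out._buffers = {n: b[: self._length] for n, b in self._buffers.items()}
        out._length = self._length
        return out

    def rows(self):
        return list(zip(*(b[: self._length] for b in self._buffers.values())))


df = ToyFrame()
df.add_column("str", ["a", "b", "c"])
df.add_column("int", [0, 1, 2])
ef = df.select(["str"])                  # shares the 'str' buffer with df
df.add_column("float", [1.0, 2.0, 3.0])
ef.add_column("bool", [True, False, True])
df.append(("d", 33, 55.0))               # df still sees the end of every buffer
print(ef.rows())                         # [('a', True), ('b', False), ('c', True)]
try:
    ef.append(("e", True))               # 'str' has grown past ef's last row
except AttributeError as err:
    print(err)
ef = ef.copy()
ef.append(("e", True))
print(ef.rows()[-1])                     # ('e', True)
```

Adding a column never touches the shared buffers, which is why both `df` and `ef` can do so freely in this sketch; only a row append needs ownership of every column.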

## Summary 

In Torcharrow
 - you can always append a column to a dataframe, provided it has the same length as the existing columns and a new name.
 - you can append a row to a column or dataframe, provided you own the column or the dataframe's set of columns.

We say
 - you own a column or the set of columns of a dataframe if you can read the last row of the column or dataframe.
 - you can read the last row if you created or wrote it, or if you have a view that includes it.
 - you can always become the owner of a column or dataframe by copying it first.

While this sounds complicated, it is not: in fact, few programs will ever require a copy!

So Torcharrow's append-only model gives us the best of both worlds: declarative programming with imperative efficiency!