# Torcharrow mutability

Torcharrow columns and dataframes support row and column **append-only** mutability. Append-only models offer the major benefit of declarative semantics for views of data, while still allowing efficient imperative updates.

We first show the consequences of this model for columns, then explore the mutability of dataframes.

## Mutability of Columns: Owners and Views

Here is a simple column `x`:

```python
import torcharrow as ta

x = ta.Column([0, 1, 2, 3])
x
```

```
0  0
1  1
2  2
3  3
dtype: int64, length: 4, null_count: 0
```

We say `x` is the *owner* of the data, since it was the last to create or update it.

Any selection of this column (here `y`) produces a *view*. Most selections can be done in constant time:

```python
y = x[:4]
y
```

```
0  0
1  1
2  2
3  3
dtype: int64, length: 4, null_count: 0
```
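To make the constant-time claim concrete, here is a minimal pure-Python sketch of one way such views can work. It is an illustration under stated assumptions, not torcharrow's implementation: the class `ToyColumn` and its `buf`, `offset` and `length` fields are hypothetical. A column is modeled as a window over a shared buffer, so taking a slice records a new window instead of copying elements.

```python
class ToyColumn:
    """Hypothetical toy column: a window (offset, length) over a shared list.

    Illustrates the owner/view idea only; this is not torcharrow's implementation.
    """

    def __init__(self, buf, offset=0, length=None):
        self.buf = buf  # shared storage
        self.offset = offset
        self.length = len(buf) - offset if length is None else length

    def __getitem__(self, key):
        # Only contiguous slices are supported in this sketch.
        start, stop, step = key.indices(self.length)
        if step != 1:
            raise NotImplementedError("only step-1 slices in this sketch")
        # O(1): no elements are copied, only a new window is recorded.
        return ToyColumn(self.buf, self.offset + start, max(0, stop - start))

    def to_list(self):
        return self.buf[self.offset:self.offset + self.length]


x = ToyColumn([0, 1, 2, 3])  # x owns a fresh buffer
y = x[:4]                    # y is a view on the same buffer
z = x[1:3]
print(y.buf is x.buf, z.to_list())  # True [1, 2]
```

Because slicing only computes two integers, its cost in this model does not depend on how many rows the window covers.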

When we append to the owner `x` ...

```python
x.append(4)
x
```

```
0  0
1  1
2  2
3  3
4  4
dtype: int64, length: 5, null_count: 0
```

... the view `y` is unaffected:

```python
y
```

```
0  0
1  1
2  2
3  3
dtype: int64, length: 4, null_count: 0
```
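Why is `y` unaffected? In a hypothetical toy model (`ToyColumn` is an illustration, not a torcharrow class), a view fixes its `length` when it is created. An append by the owner writes past the end of that window, so the view never sees the new row even though both still share a single buffer.

```python
class ToyColumn:
    """Hypothetical toy column: a window (offset, length) over a shared list."""

    def __init__(self, buf, offset=0, length=None):
        self.buf = buf
        self.offset = offset
        self.length = len(buf) - offset if length is None else length

    def view(self, start, stop):
        # A view records its own window; nothing is copied.
        return ToyColumn(self.buf, self.offset + start, stop - start)

    def append(self, value):
        # Append-only update: assumes this window ends at the end of the buffer
        # (true for the owner); this sketch performs no ownership check.
        self.buf.append(value)
        self.length += 1

    def to_list(self):
        return self.buf[self.offset:self.offset + self.length]


x = ToyColumn([0, 1, 2, 3])
y = x.view(0, 4)
x.append(4)
print(x.to_list())  # [0, 1, 2, 3, 4]
print(y.to_list())  # [0, 1, 2, 3]: y's window still has length 4
```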

When we try to append to the view, we get ...

```python
try:
    y.append(99)
except AttributeError as err:
    print(err)
```

```
column is not appendable
```

... since the view did not copy the data. However, if we first copy `y`, `y` becomes the owner of the copied data.

```python
y = y.copy()
y.append(44)
y
```

```
0   0
1   1
2   2
3   3
4  44
dtype: int64, length: 5, null_count: 4
```
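The append rule and the effect of `copy` can be sketched in the same toy style. As an assumption, the hypothetical `ToyColumn` below permits an append only when its window ends at the last row of the shared buffer, which is a literal reading of this notebook's Summary; the error message merely mirrors the one printed above. `copy` allocates a fresh buffer, so the copy owns its data.

```python
class ToyColumn:
    """Hypothetical toy column: a window (offset, length) over a shared list.

    The append rule follows this notebook's Summary: a column may append only
    if it can read the buffer's last row. Not torcharrow's implementation.
    """

    def __init__(self, buf, offset=0, length=None):
        self.buf = buf
        self.offset = offset
        self.length = len(buf) - offset if length is None else length

    def view(self, start, stop):
        return ToyColumn(self.buf, self.offset + start, stop - start)

    def owns_last_row(self):
        # True if this window ends exactly at the end of the shared buffer.
        return self.offset + self.length == len(self.buf)

    def append(self, value):
        if not self.owns_last_row():
            raise AttributeError("column is not appendable")
        self.buf.append(value)
        self.length += 1

    def copy(self):
        # A fresh buffer holding only this window's elements: the copy owns it.
        return ToyColumn(self.buf[self.offset:self.offset + self.length])

    def to_list(self):
        return self.buf[self.offset:self.offset + self.length]


x = ToyColumn([0, 1, 2, 3])
y = x.view(0, 4)
x.append(4)          # x can read the last row, so this succeeds
try:
    y.append(99)     # y's window now ends before the buffer's last row
except AttributeError as err:
    print(err)       # column is not appendable
y = y.copy()
y.append(44)
print(y.to_list(), x.to_list())  # [0, 1, 2, 3, 44] [0, 1, 2, 3, 4]
```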

The original owner `x` is unaffected by changes to the copy, so it can still be extended without constraints.

```python
x.append(55)
x
```

```
0   0
1   1
2   2
3   3
4   4
5  55
dtype: int64, length: 6, null_count: 0
```

Note that `copy` duplicates only the minimal amount of data needed to establish the receiver of the copy as the new owner of the copied data.
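One plausible reading of "minimal", sketched with a hypothetical `ToyColumn` (not a torcharrow class): copying a view allocates only the rows the view can see, not the owner's entire buffer. Torcharrow's actual copy strategy may differ.

```python
class ToyColumn:
    """Hypothetical toy column: a window (offset, length) over a shared list."""

    def __init__(self, buf, offset=0, length=None):
        self.buf = buf
        self.offset = offset
        self.length = len(buf) - offset if length is None else length

    def view(self, start, stop):
        return ToyColumn(self.buf, self.offset + start, stop - start)

    def copy(self):
        # Copy only the rows visible through this window, not the whole buffer.
        return ToyColumn(self.buf[self.offset:self.offset + self.length])


big = ToyColumn(list(range(1_000_000)))
small = big.view(10, 12)
c = small.copy()
print(len(c.buf), c.buf)  # 2 [10, 11]
```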

## Mutability for Dataframes: Sharing of Columns across Dataframes

A dataframe can be composed of owned columns and views. Columns can always be added to a dataframe as long as they have the same length ...

```python
df = ta.DataFrame()
df['str'] = ['a', 'b', 'c']
df['int'] = [0, 1, 2]
df
```

```
  index  str      int
-------  -----  -----
      0  a          0
      1  b          1
      2  c          2
dtype: Struct([Field('str', string), Field('int', int64)]), count: 3, null_count: 0
```

... and as long as you don't overwrite an existing column; otherwise you get:

```python
try:
    df['int'] = [100, 101, 102]
except AttributeError as err:
    print(err)
```

```
cannot override existing column int
```
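The two rules for adding columns can be captured in a few lines. This is a hypothetical `ToyFrame` that stores plain Python lists; it is not torcharrow's `DataFrame`, and raising `ValueError` for a length mismatch is an assumption, since the notebook does not show that case.

```python
class ToyFrame:
    """Hypothetical toy dataframe: an ordered mapping from names to column lists.

    Encodes the two rules stated above (same length, no overwrite); this is
    not torcharrow's DataFrame.
    """

    def __init__(self):
        self.columns = {}

    def __len__(self):
        # Length of the first column (all columns share one length); 0 if empty.
        return len(next(iter(self.columns.values()), []))

    def __setitem__(self, name, values):
        if name in self.columns:
            raise AttributeError(f"cannot override existing column {name}")
        if self.columns and len(values) != len(self):
            raise ValueError(
                f"column {name} has length {len(values)}, expected {len(self)}")
        self.columns[name] = list(values)


df = ToyFrame()
df['str'] = ['a', 'b', 'c']
df['int'] = [0, 1, 2]
for name, values in [('int', [100, 101, 102]), ('float', [1.0, 2.0])]:
    try:
        df[name] = values
    except (AttributeError, ValueError) as err:
        print(err)
# cannot override existing column int
# column float has length 2, expected 3
```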


\[Remark: You can use the dataframe's `assign` method to update a column, but this is not discussed here.\]

Any selection creates a new dataframe that shares the underlying data. Here `ef` has a view on `df`'s `'str'` column:

```python
ef = df[['str']]
ef
```

```
  index  str
-------  -----
      0  a
      1  b
      2  c
dtype: Struct([Field('str', string)]), count: 3, null_count: 0
```

We can add a new column to `df` or `ef`, as long as it has the same length as the existing columns:

```python
df['float'] = [1.0, 2.0, 3.0]
ef['bool'] = [True, False, True]
ef
```

```
  index  str    bool
-------  -----  ------
      0  a      True
      1  b      False
      2  c      True
dtype: Struct([Field('str', string), Field('bool', boolean)]), count: 3, null_count: 0
```
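A toy model of column selection, again with hypothetical names (`ToyCol` and `ToyFrame` are for illustration only and are not torcharrow classes): selecting columns builds a new frame whose entries are new windows over the same buffers, so no data is copied, while each frame keeps its own mapping from names to columns.

```python
class ToyCol:
    """Hypothetical toy column: a window over the first `length` rows of a shared list."""

    def __init__(self, buf, length=None):
        self.buf = buf
        self.length = len(buf) if length is None else length

    def view(self):
        # New window over the same buffer: a view, not a copy.
        return ToyCol(self.buf, self.length)


class ToyFrame:
    """Hypothetical toy dataframe: a dict from column names to ToyCol windows."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def select(self, names):
        # The new frame gets its own dict, filled with views on the same buffers.
        return ToyFrame({n: self.columns[n].view() for n in names})


df = ToyFrame({'str': ToyCol(['a', 'b', 'c']), 'int': ToyCol([0, 1, 2])})
ef = df.select(['str'])
print(ef.columns['str'].buf is df.columns['str'].buf)  # True: data is shared

# Each frame has its own name-to-column mapping, so adding a column to one
# frame does not add it to the other.
df.columns['float'] = ToyCol([1.0, 2.0, 3.0])
ef.columns['bool'] = ToyCol([True, False, True])
print(list(df.columns), list(ef.columns))  # ['str', 'int', 'float'] ['str', 'bool']
```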

But as soon as one of the dataframes appends a row (here `df`), it becomes the owner of its columns. Other references, like `ef`, become viewers of the shared columns.

```python
df.append(('d', 33, 55.0))
df
```

```
  index  str      int    float
-------  -----  -----  -------
      0  a          0        1
      1  b          1        2
      2  c          2        3
      3  d         33       55
dtype: Struct([Field('str', string), Field('int', int64), Field('float', float64)]), count: 4, null_count: 0
```

Views are never impacted by appending to owners, so `ef` is still the same:

```python
ef
```

```
  index  str    bool
-------  -----  ------
      0  a      True
      1  b      False
      2  c      True
dtype: Struct([Field('str', string), Field('bool', boolean)]), count: 3, null_count: 0
```

But appending a row to `ef` fails, since `ef` has only a view on the `'str'` column.

```python
try:
    ef.append(('e', True))
except AttributeError as err:
    print(err)
```

```
column is not appendable
```

We can fix this by first copying the data, which makes the receiver the owner of the columns. Once you own all columns, you can append.

```python
ef = ef.copy()
ef.append(('e', True))
ef
```

```
  index  str      bool
-------  -----  ------
      0  a           1
      1  b           0
      2  c           1
      3  e           1
dtype: Struct([Field('str', string), Field('bool', boolean)]), count: 4, null_count: 3
```
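The row-append behavior of this section can be sketched in the same toy style, still assuming that ownership means "can read the last row". In the hypothetical `ToyFrame` below (not a torcharrow class), `append` checks every column before writing anything, so a failed append leaves no partial row behind; whether torcharrow performs the same all-or-nothing check is an assumption. For brevity the `float` column is omitted.

```python
class ToyCol:
    """Hypothetical toy column: a window over the first `length` rows of a shared list."""

    def __init__(self, buf, length=None):
        self.buf = buf
        self.length = len(buf) if length is None else length

    def owns_last_row(self):
        # Ownership rule taken from this notebook's Summary section.
        return self.length == len(self.buf)

    def copy(self):
        return ToyCol(self.buf[:self.length])


class ToyFrame:
    """Hypothetical toy dataframe: a dict from column names to ToyCol windows."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def select(self, names):
        return ToyFrame({n: ToyCol(self.columns[n].buf, self.columns[n].length)
                         for n in names})

    def append(self, row):
        cols = list(self.columns.values())
        if len(row) != len(cols):
            raise ValueError("row has the wrong number of fields")
        # Check every column before writing, so a failed append adds no partial row.
        if not all(c.owns_last_row() for c in cols):
            raise AttributeError("column is not appendable")
        for col, value in zip(cols, row):
            col.buf.append(value)
            col.length += 1

    def copy(self):
        return ToyFrame({n: c.copy() for n, c in self.columns.items()})

    def rows(self):
        cols = list(self.columns.values())
        return [tuple(c.buf[i] for c in cols) for i in range(cols[0].length)]


df = ToyFrame({'str': ToyCol(['a', 'b', 'c']), 'int': ToyCol([0, 1, 2])})
ef = df.select(['str'])
ef.columns['bool'] = ToyCol([True, False, True])
df.append(('d', 33))          # df can read the last row of every column it holds
try:
    ef.append(('e', True))    # ef's 'str' window now stops before row 'd'
except AttributeError as err:
    print(err)                # column is not appendable
ef = ef.copy()                # fresh buffers: ef now owns all of its columns
ef.append(('e', True))
print(ef.rows())  # [('a', True), ('b', False), ('c', True), ('e', True)]
```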

## Summary

Torcharrow lets you:
 - always add columns to a dataframe, provided they have the same length and new names;
 - append a row to a column or dataframe, provided you own the column, or all columns of the dataframe.

We say that:
 - you own a column, or a set of columns of a dataframe, if you can read the column's or dataframe's last row;
 - you can read the last row if you created or wrote it, or if you have a view that includes it;
 - you can always become the owner of a column or dataframe by copying it first.
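Taken literally, these definitions make ownership a property that can move: any column or view whose window reaches the last row counts as an owner, and the first one to append takes ownership away from the others. The hypothetical `ToyColumn` below encodes exactly that reading; it is not torcharrow code.

```python
class ToyColumn:
    """Hypothetical toy column: a window (offset, length) over a shared list."""

    def __init__(self, buf, offset=0, length=None):
        self.buf = buf
        self.offset = offset
        self.length = len(buf) - offset if length is None else length

    def view(self, start, stop):
        return ToyColumn(self.buf, self.offset + start, stop - start)

    def owns(self):
        # Summary definition: you own a column if you can read its last row.
        return self.offset + self.length == len(self.buf)

    def append(self, value):
        if not self.owns():
            raise AttributeError("column is not appendable")
        self.buf.append(value)
        self.length += 1


x = ToyColumn([0, 1, 2, 3])
a = x.view(0, 4)   # covers the last row, so it also counts as an owner
b = x.view(2, 4)   # so does this narrower view
print(x.owns(), a.owns(), b.owns())  # True True True
b.append(4)        # the first writer takes ownership
print(x.owns(), a.owns(), b.owns())  # False False True
```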

While this may sound complicated, it is not: in fact, few programs ever require a copy!

So Torcharrow's append-only model gives us the best of both worlds: declarative programming with imperative efficiency!