# StaticFrame 0.8.14 to 0.8.30

* 0.8.14: 2021-06-14
* 0.8.30: 2021-11-30


## Keeping up with StaticFrame

* All major features and fixes are on "micro" releases (e.g., 0.8.30 to 0.8.31)
* Backward-incompatible releases (0.8 to 0.9) tend to focus just on API changes
* Release Notes: https://github.com/InvestmentSystems/static-frame/releases
* Code: https://github.com/InvestmentSystems/static-frame/commits/master

## What is New

* New features
* Extensions to existing features
* Performance enhancements
* Better errors
* Linux (or WSL) features
* Advanced features 


## New Features

* `*.rank_*` methods.
* Support for `falsy` in many places where we have `na`.
* New functionality on ``via_dt`` interfaces.
* String slicing with ``*.via_str[]``
* Added ``Yarn``
* Unpersisting on `Bus`, `Yarn`, `Quilt`.
* NPZ, NPY formats for `Frame`, `Bus`, `Yarn`, `Quilt`, and `Batch`. 
* `Quilt.sample()`.
* `IndexHierarchy.relabel_at_depth()`.
 

## Extensions to Existing Features

* `sort_*` methods now take `ascending` as an iterable of Booleans.
* `*.count()` methods now have `skipfalsy` and `unique` parameters.
* `*.equals()` methods distinguish between ``datetime64`` units.
* Added `dtypes` parameter to `Frame.from_pandas()`
* `Frame.iter_group()` and `Frame.iter_group_items()` take a `drop` parameter.
* `Frame.unset_index` works with `IndexHierarchy`
* Improved `__repr__` for `Quilt`
* Added `index_constructors` and `columns_constructors` to many interfaces.

## Performance Enhancements

* Improvements to `Bus` iteration when ``max_persist`` is greater than one.
* Enhancements to `Bus` internal architecture.
* `Bus` now uses weakrefs to avoid re-loading `Frame` already in-memory.
* `Frame.iter_group()` and `Frame.iter_group_items()`
* `Frame.pivot()`
* `Frame.from_concat()`
* `Frame.to_pandas()` creates ``pd.RangeIndex`` for ``IndexAutoFactory``-created indices.

## Better Errors

* Incorrectly formed ``Batch`` iterables raise ``BatchIterableInvalid``

## Linux (or WSL) Only

* Support for VisiData via `Frame.to_visidata()`, `Bus.to_visidata()`

## Advanced Features

* `IndexAutoFactory` takes a `size` parameter.
* `FrameGO.via_fill_value()` works with `__setitem__()`
* `IndexDefaultFactory` can specify ``name`` of index built with a default constructor.

## The Five Ways to Rank

Full implementation of all ranking methods, modeled after `scipy.stats.rankdata`.

Each features `skipna`, `ascending`, `start`, and `fill_value` parameters.

All of Pandas `na_option` functionality (and more) can be handled by `skipna` and `fill_value`.

`start` defaults to 0.

On `Frame`, `Series`:

* `*.rank_ordinal`
* `*.rank_dense`
* `*.rank_mean`
* `*.rank_min`
* `*.rank_max`

In [1]:
from itertools import chain
from IPython.display import display, Markdown, Latex
import numpy as np
import static_frame as sf

s = sf.Series((0, 0, 1), index=('a', 'b', 'c'), name='src')
methods = ('rank_ordinal', 'rank_dense', 'rank_mean', 'rank_min', 'rank_max')
sf.Frame.from_concat(chain((s,), (getattr(s, m)().rename(m) for m in methods)), axis=1)
              

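|   | src | rank_ordinal | rank_dense | rank_mean | rank_min | rank_max |
|---|-----|--------------|------------|-----------|----------|----------|
| a | 0 | 0 | 0 | 0.5 | 0 | 1 |
| b | 0 | 1 | 0 | 0.5 | 0 | 1 |
| c | 1 | 2 | 1 | 2.0 | 2 | 2 |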
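
To make the difference between the five methods concrete, here is a minimal pure-Python sketch of what each one computes for tied values. The function name `rank_all` is hypothetical and this is not StaticFrame's implementation; it only mirrors the semantics of the table above, with `start` defaulting to 0.

```python
from collections import defaultdict


def rank_all(values, start=0):
    """Return {method name: list of ranks} for the five ranking methods."""
    # Ordinal: position in a stable sort, so ties are broken by original order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ordinal = [0] * len(values)
    for position, i in enumerate(order):
        ordinal[i] = position + start

    # Collect the ordinal ranks shared by each distinct value.
    tied = defaultdict(list)
    for i, value in enumerate(values):
        tied[value].append(ordinal[i])

    # Dense: consecutive integers over the distinct values, no gaps.
    dense_of = {value: n + start for n, value in enumerate(sorted(tied))}

    return {
        'ordinal': ordinal,
        'dense': [dense_of[v] for v in values],
        'mean': [sum(tied[v]) / len(tied[v]) for v in values],
        'min': [min(tied[v]) for v in values],
        'max': [max(tied[v]) for v in values],
    }


ranks = rank_all((0, 0, 1))
```

With the same input as the cell above, `ranks['mean']` is `[0.5, 0.5, 2.0]` and `ranks['max']` is `[1, 1, 2]`, matching the `rank_mean` and `rank_max` columns.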


In [2]:
s = sf.Series((20, 3, np.nan, 5, np.nan), index=tuple('abcde'))
params = (dict(skipna=False), dict(skipna=True), dict(skipna=True, fill_value=-1))
sf.Frame.from_concat((s.rank_mean(**p).rename(str(p)) for p in params), axis=1)


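|   | {'skipna': False} | {'skipna': True} | {'skipna': True, 'fill_value': -1} |
|---|-------------------|------------------|------------------------------------|
| a | 2.0 | 2.0 | 2.0 |
| b | 0.0 | 0.0 | 0.0 |
| c | 3.0 |  | -1.0 |
| d | 1.0 | 1.0 | 1.0 |
| e | 4.0 |  | -1.0 |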


## Na Meet Falsy

The family of `*na*` functions, which process `None`, `np.nan`, and `NaT`, now have corresponding `*falsy*` functions, which process `None`, `np.nan`, `NaT`, `0`, `False`, and `""`.

On `Frame`, `Series`:

* `*.isfalsy()` 
* `*.notfalsy()` 
* `*.dropfalsy()` 
* `*.fillfalsy()`
* `*.fillfalsy_forward()`
* `*.fillfalsy_backward()`
* `*.fillfalsy_leading()`
* `*.fillfalsy_trailing()`
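
The distinction between the two families can be sketched with two plain-Python predicates. The names `is_na` and `is_falsy` are hypothetical, `NaT` is left out because it needs NumPy, and the sketch leans on Python truthiness, which also flags empty containers; whether StaticFrame does the same is an assumption not covered here.

```python
import math


def is_na(value):
    """True for None and float NaN (NaT is omitted in this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_falsy(value):
    """True for NA values and for anything Python treats as false."""
    # NaN is truthy in Python, so the NA check must come first.
    return is_na(value) or not value


samples = (None, float('nan'), 0, False, '', 'foo', 1)
na_flags = [is_na(v) for v in samples]
falsy_flags = [is_falsy(v) for v in samples]
```

Here `0`, `False`, and `''` are falsy but not NA, which is exactly the gap the `*falsy*` methods fill.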



In [3]:
s = sf.Series(('foo', '', 'bar')).rename('src')
methods = ('isfalsy', 'notfalsy', 'dropfalsy')
sf.Frame.from_concat(chain((s,), (getattr(s, m)().rename(m) for m in methods)), axis=1)

|   | src | isfalsy | notfalsy | dropfalsy |
|---|-----|---------|----------|-----------|
| 0 | foo | False | True | foo |
| 1 |  | True | False |  |
| 2 | bar | False | True | bar |


In [4]:
f = sf.Frame.from_records((['', '', ''], ['', 'x', ''], ['', '', '']))
display(f)
methods = ('fillfalsy_forward', 'fillfalsy_backward')
for m in methods:
    display(getattr(f, m)(axis=0))


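|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 |  |  |  |
| 1 |  | x |  |
| 2 |  |  |  |


|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 |  |  |  |
| 1 |  | x |  |
| 2 |  | x |  |


|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 |  | x |  |
| 1 |  | x |  |
| 2 |  |  |  |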


In [5]:
f = sf.Frame.from_records((['', '', ''], ['', 'x', ''], ['', '', '']))
display(f)
methods = ('fillfalsy_forward', 'fillfalsy_backward')
for m in methods:
    display(getattr(f, m)(axis=1))


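|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 |  |  |  |
| 1 |  | x |  |
| 2 |  |  |  |


|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 |  |  |  |
| 1 |  | x | x |
| 2 |  |  |  |


|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 |  |  |  |
| 1 | x | x |  |
| 2 |  |  |  |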


In [6]:
f = sf.Frame.from_records((['', '', ''], ['', 'x', ''], ['', '', '']))
display(f)
methods = ('fillfalsy_leading', 'fillfalsy_trailing')
for m in methods:
    display(getattr(f, m)('o', axis=0))

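|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 |  |  |  |
| 1 |  | x |  |
| 2 |  |  |  |


|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | o | o | o |
| 1 | o | x | o |
| 2 | o |  | o |


|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | o |  | o |
| 1 | o | x | o |
| 2 | o | o | o |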


In [7]:
f = sf.Frame.from_records((['', '', ''], ['', 'x', ''], ['', '', '']))
display(f)
methods = ('fillfalsy_leading', 'fillfalsy_trailing')
for m in methods:
    display(getattr(f, m)('o', axis=1))

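|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 |  |  |  |
| 1 |  | x |  |
| 2 |  |  |  |


|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | o | o | o |
| 1 | o | x |  |
| 2 | o | o | o |


|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | o | o | o |
| 1 |  | x | o |
| 2 | o | o | o |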
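
The four directional fills differ only in which falsy values they touch. The following one-dimensional sketch in plain Python (all function names hypothetical, not StaticFrame's implementation) reproduces the behaviour seen in the outputs above for a single row or column. It relies on Python truthiness, so unlike the real methods it does not treat NaN as falsy.

```python
def fill_forward(values):
    """Replace each falsy value with the most recent truthy value, if any."""
    out = []
    last = None  # no truthy value seen yet
    for value in values:
        if value:
            last = value
            out.append(value)
        else:
            out.append(value if last is None else last)
    return out


def fill_backward(values):
    """Forward fill applied to the reversed sequence."""
    return fill_forward(values[::-1])[::-1]


def fill_leading(values, fill):
    """Replace only the run of falsy values before the first truthy value."""
    out = list(values)
    for i, value in enumerate(out):
        if value:
            break
        out[i] = fill
    return out


def fill_trailing(values, fill):
    """Leading fill applied to the reversed sequence."""
    return fill_leading(values[::-1], fill)[::-1]


column = ['', 'x', '']
forward = fill_forward(column)
backward = fill_backward(column)
leading = fill_leading(column, 'o')
trailing = fill_trailing(column, 'o')
all_falsy = fill_leading(['', '', ''], 'o')
```

A sequence with no truthy value is entirely "leading" (and entirely "trailing"), which is why the all-empty rows and columns above are filled completely with `'o'`.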


## Lazily Stringing `Bus`s together with `Yarn`

A `Yarn` is a container of `Bus` instances that permits relabelling with an arbitrary `Index` over the virtual concatenation of all contained `Bus`.

Each `Bus` retains its lazy-loading and (optionally) `max_persist` characteristics.




In [15]:
import frame_fixtures as ff

f = ff.parse('s(4,4)')
b1 = sf.Bus.from_frames((f.rename('a'), f.rename('b')), name='x')
b2 = sf.Bus.from_frames((f.rename('c'), f.rename('d')), name='y')
y1 = sf.Yarn.from_buses((b1, b2), retain_labels=True) 
y2 = y1.relabel(sf.IndexDate.from_date_range('2021-01-01', '2021-01-04')) 

display(y1)
display(y2)

<Yarn>
<IndexHierarchy>
x                a     Frame
x                b     Frame
y                c     Frame
y                d     Frame
<<U1>            <<U1> <object>

<Yarn>
<IndexDate>
2021-01-01      Frame
2021-01-02      Frame
2021-01-03      Frame
2021-01-04      Frame
<datetime64[D]> <object>

## More Features on `via_dt` Interfaces

All faster than using `datetime` objects.

* ``hour``
* ``minute``
* ``second``
* ``is_month_start()``
* ``is_month_end()``
* ``is_year_start()``
* ``is_year_end()``
* ``is_quarter_start()``
* ``is_quarter_end()`` 
* ``quarter()``

In [8]:
index = sf.IndexDate.from_date_range('2020-12-30', '2021-01-01')
s = sf.Series(index, index=index).rename('src')
methods = ('is_month_start', 
           'is_month_end', 
           'is_year_start', 
           'is_year_end', 
           'is_quarter_start', 
           'is_quarter_end', 
           'quarter')
sf.Frame.from_concat(
        chain((s,), (getattr(s.via_dt, m)().rename(m) for m in methods)), 
        axis=1,
        )

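|   | src | is_month_start | is_month_end | is_year_start | is_year_end | is_quarter_start | is_quarter_end | quarter |
|---|-----|----------------|--------------|---------------|-------------|------------------|----------------|---------|
| 2020-12-30 | 2020-12-30 | False | False | False | False | False | False | 4 |
| 2020-12-31 | 2020-12-31 | False | True | False | True | False | True | 4 |
| 2021-01-01 | 2021-01-01 | True | False | True | False | True | False | 1 |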
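
As a reference for what these predicates mean, here is a small sketch using only the standard library. It works on individual `datetime.date` objects, so it illustrates the definitions rather than the faster array-based approach of `via_dt`; the function names simply mirror the method names and are otherwise an assumption.

```python
import calendar
from datetime import date


def quarter(d):
    """Quarter number, 1 through 4."""
    return (d.month - 1) // 3 + 1


def is_month_end(d):
    """True when the day is the last day of its month."""
    return d.day == calendar.monthrange(d.year, d.month)[1]


def is_quarter_start(d):
    """True on the first day of January, April, July, or October."""
    return d.day == 1 and d.month in (1, 4, 7, 10)


def is_quarter_end(d):
    """True on the last day of March, June, September, or December."""
    return is_month_end(d) and d.month in (3, 6, 9, 12)


dates = (date(2020, 12, 30), date(2020, 12, 31), date(2021, 1, 1))
rows = [
    (d.isoformat(), is_month_end(d), is_quarter_start(d), is_quarter_end(d), quarter(d))
    for d in dates
]
```

The resulting `rows` agree with the corresponding columns of the output above.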


## More Features on `via_str` Interfaces

Access characters via `__getitem__` selection and slices.

Get the first character of all elements: `s.via_str[0]`

Get the last two characters of all elements: `s.via_str[-2:]`


In [9]:
s = sf.Series(('USD', 'AUD', 'JPY')).rename('src')
args = (0, -1, slice(-2, None))
sf.Frame.from_concat(chain((s,), (s.via_str[a].rename(str(a)) for a in args)), axis=1)

|   | src | 0 | -1 | slice(-2, None, None) |
|---|-----|---|----|-----------------------|
| 0 | USD | U | D | SD |
| 1 | AUD | A | D | UD |
| 2 | JPY | J | Y | PY |
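
Conceptually, `via_str[key]` applies the same integer or slice to every element. A plain-Python equivalent, with the hypothetical helper `str_getitem`, looks like this:

```python
def str_getitem(values, key):
    """Apply an integer index or slice to every string in `values`."""
    return [value[key] for value in values]


values = ('USD', 'AUD', 'JPY')
first = str_getitem(values, 0)
last_two = str_getitem(values, slice(-2, None))
```

Writing `s.via_str[-2:]` passes the same `slice(-2, None)` object that is spelled out explicitly here.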


## NPY and NPZ

A new serialization format that fully captures all `Frame` characteristics, including `dtype`s, and is faster than Parquet.

NPZ is a zip archive; NPY is the same files in a directory for memory mapping.

The `to_npz()` and `to_npy()` interfaces offer a `consolidate_blocks` parameter.

`Bus`, `Yarn`, `Quilt`, and `Batch` all support NPZ, just as they support other formats.

* `Frame.to_npz()`
* `Frame.from_npz()`
* `Frame.to_npy()`
* `Frame.from_npy()`
* `Frame.from_npy_mmap()` (0.8.31)
* ``*.to_zip_npz()``
* ``*.from_zip_npz()``
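
The relationship between the two formats can be seen with NumPy's own `np.savez` and `np.save`. This sketch does not use StaticFrame's writers, and the layout of a StaticFrame archive is not shown here; it only illustrates that an NPZ is a zip of `.npy` members, while a standalone `.npy` file on disk can be memory mapped.

```python
import tempfile
import zipfile
from pathlib import Path

import numpy as np

values = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)

    # NPZ: a single zip archive whose members are .npy files.
    npz_path = directory / 'arrays.npz'
    np.savez(npz_path, values=values)
    with zipfile.ZipFile(npz_path) as archive:
        members = archive.namelist()

    # NPY: a plain file in a directory, which can be memory mapped on load.
    npy_path = directory / 'values.npy'
    np.save(npy_path, values)
    mapped = np.load(npy_path, mmap_mode='r')
    is_memmap = isinstance(mapped, np.memmap)
    round_trip_ok = bool((mapped == values).all())
    del mapped  # release the mapping before the directory is removed
```

Memory mapping avoids reading the whole array into memory up front, which is the motivation for offering a directory-based NPY layout alongside the zipped NPZ.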



## Unpersisting

Force "forgetting" all loaded `Frame`s, regardless of `max_persist` configuration.


### Interfaces
* `Bus.unpersist()`
* `Yarn.unpersist()`
* `Quilt.unpersist()`
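
The interaction between lazy loading, `max_persist`, and unpersisting can be illustrated with a toy container. `LazyStore` and everything in it are hypothetical and far simpler than a `Bus`; in particular, the least-recently-used eviction policy is this sketch's choice, not a statement about StaticFrame internals.

```python
from collections import OrderedDict


class LazyStore:
    """Toy lazy container: loads on demand, keeps at most `max_persist` values."""

    def __init__(self, loader, labels, max_persist=None):
        self._loader = loader
        self._labels = tuple(labels)
        self._max_persist = max_persist
        self._loaded = OrderedDict()
        self.load_count = 0  # how many times the loader was called

    def __getitem__(self, label):
        if label not in self._labels:
            raise KeyError(label)
        if label in self._loaded:
            self._loaded.move_to_end(label)  # mark as most recently used
            return self._loaded[label]
        value = self._loader(label)
        self.load_count += 1
        self._loaded[label] = value
        if self._max_persist is not None and len(self._loaded) > self._max_persist:
            self._loaded.popitem(last=False)  # evict the least recently used
        return value

    def unpersist(self):
        """Forget every loaded value, regardless of `max_persist`."""
        self._loaded.clear()

    @property
    def persisted(self):
        return tuple(self._loaded)


store = LazyStore(loader=str.upper, labels='abc', max_persist=2)
for label in 'abc':
    store[label]
persisted_before = store.persisted
store.unpersist()
persisted_after = store.persisted
reloaded = store['a']
```

After `unpersist()`, nothing is held in memory, so the next access to `'a'` calls the loader again.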



## Exclude what You Group-By in Your Groups

The values used for grouping can now be removed from each group `Frame`.

Simply pass `drop=True`.


In [11]:
f = ff.parse('s(4,3)|v(bool,str,int)').relabel(columns=tuple('abc')) 
for frame in chain((f,), f.iter_group('a', drop=True)):
    display(frame)

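|   | a | b | c |
|---|---|---|---|
| 0 | False | zaji | -3648 |
| 1 | False | zJnC | 91301 |
| 2 | False | zDdR | 30205 |
| 3 | True | zuVU | 54020 |


|   | b | c |
|---|---|---|
| 0 | zaji | -3648 |
| 1 | zJnC | 91301 |
| 2 | zDdR | 30205 |


|   | b | c |
|---|---|---|
| 3 | zuVU | 54020 |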
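
The effect of `drop=True` can be sketched on plain records with `itertools.groupby`. The function below is a hypothetical stand-in that borrows the name `iter_group_items` for readability; it is not how `Frame` implements grouping.

```python
from itertools import groupby
from operator import itemgetter

records = [
    {'a': False, 'b': 'zaji', 'c': -3648},
    {'a': False, 'b': 'zJnC', 'c': 91301},
    {'a': False, 'b': 'zDdR', 'c': 30205},
    {'a': True, 'b': 'zuVU', 'c': 54020},
]


def iter_group_items(records, key, drop=False):
    """Yield (label, rows) per group; with drop=True, omit the grouping field."""
    ordered = sorted(records, key=itemgetter(key))  # groupby needs sorted input
    for label, group in groupby(ordered, key=itemgetter(key)):
        rows = [
            {k: v for k, v in record.items() if not (drop and k == key)}
            for record in group
        ]
        yield label, rows


groups = dict(iter_group_items(records, 'a', drop=True))
```

Since every row in a group shares the same value for the grouping field, dropping it loses no information: the value is still available as the group label.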


## Using `IndexAutoFactory` to set initial `FrameGO` size

Previously, `IndexAutoFactory` could only be applied on an already-sized container.

Now, you can size a `FrameGO.index` with minimal overhead.

Generally only useful for `FrameGO`.

In [12]:
f1 = sf.FrameGO(index=sf.IndexAutoFactory(size=4))
f1['a'] = None 
display(f1)

f2 = sf.FrameGO(index=sf.IndexAutoFactory(size=6))
f2['b'] = reversed(range(6)) 
display(f2)

|   | a |
|---|---|
| 0 |  |
| 1 |  |
| 2 |  |
| 3 |  |


|   | b |
|---|---|
| 0 | 5 |
| 1 | 4 |
| 2 | 3 |
| 3 | 2 |
| 4 | 1 |
| 5 | 0 |


## Using `via_fill_value` with `__setitem__()`

`via_*` interfaces present containers in a different context.

`via_T` permits operator application of a `Series` by column instead of by row.

`via_fill_value()` permits specifying a fill value in the context of binary operators.

With a `FrameGO`, `via_fill_value()[]` can be used to provide a fill value in column assignment.

In [14]:
f = ff.parse('s(4,2)|v(bool,str)').to_frame_go() 

f['default'] = sf.Series.from_element('foo', index=range(3)) 
display(f)

f.via_fill_value('')['via_fill_value'] = sf.Series.from_element('foo', index=range(3))
display(f)

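|   | 0 | 1 | default |
|---|---|---|---------|
| 0 | False | zaji | foo |
| 1 | False | zJnC | foo |
| 2 | False | zDdR | foo |
| 3 | True | zuVU |  |


|   | 0 | 1 | default | via_fill_value |
|---|---|---|---------|----------------|
| 0 | False | zaji | foo | foo |
| 1 | False | zJnC | foo | foo |
| 2 | False | zDdR | foo | foo |
| 3 | True | zuVU |  |  |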


## Performance

* Significant performance gains in implementations of grouping, pivoting, concatenating, and core `TypeBlocks` routines.
* Improvements to `Bus` performance in time and space.
* Continued implementation of optimized C routines through `arraykit`.

## StaticFrame 0.9 Coming Soon

Backward-incompatible changes

* Disallow creating `Index` with `datetime64` arrays
    * Requires using `datetime64` `Index` subclasses.
    * Motivates new `index_constructors`, `columns_constructors` arguments.
* Change `Bus.__init__()` to no longer accept a `Series` but take internally managed components.
* `Batch` no longer automatically promotes container types, must explicitly use `via_container`.
* `*.from_overlay()` will be renamed `*.from_overlay_na()` to permit adding a `*.from_overlay_falsy()`.