# StaticFrame 0.8.14 to 0.8.30


## Keeping up with StaticFrame

* All major features and fixes are published as "micro" releases (e.g., 0.8.30 to 0.8.31)
* Backward-incompatible releases (0.8 to 0.9) tend to focus just on API changes
* Release Notes: https://github.com/InvestmentSystems/static-frame/releases
* Code: https://github.com/InvestmentSystems/static-frame/commits/master

### Tech Forum Presentation 12/16: "A Reintroduction to StaticFrame"

## StaticFrame 0.9 Coming Soon

Backward-incompatible changes

* Disallow creating `Index` with `datetime64` arrays; require using `datetime64` `Index` types.
* Change `Bus.__init__()` to no longer accept a `Series`, but instead take internally managed components.
* `Batch` no longer automatically promotes container types; `via_container` must be used explicitly.
* `*.from_overlay()` will be renamed `*.from_overlay_na()` to permit adding a `*.from_overlay_falsy()`.

## Overview of New Features

### New features
* `*.rank_*` methods.
* Support for `falsy` in many places where we have `na`.
* New functionality on ``via_dt`` interfaces.
* String slicing with ``*.via_str[]``
* Added ``Yarn``
* Unpersisting on `Bus`, `Yarn`, `Quilt`.
* NPZ, NPY formats for `Frame`, `Bus`, `Yarn`, `Quilt`, and `Batch`. 
* `Quilt.sample()`.
* `IndexHierarchy.relabel_at_depth()`.
 
### Extension to existing features
* `sort_*` methods now take `ascending` as an iterable of Booleans.
* `*.count()` methods now have `skipfalsy` and `unique` parameters.
* `*.equals()` methods distinguish between ``datetime64`` units.
* Added `dtypes` parameter to `Frame.from_pandas()`
* `Frame.iter_group()` and `Frame.iter_group_items()` take a `drop` parameter.
* `Frame.unset_index` works with `IndexHierarchy`
* Improved `__repr__` for `Quilt`
* Added `index_constructors` and `columns_constructors` to many interfaces.

### Performance Enhancements
* Improvements to `Bus` iteration when ``max_persist`` is greater than one.
* Enhancements to `Bus` internal architecture.
* `Bus` now uses weakrefs to avoid re-loading `Frame` already in-memory.
* `Frame.iter_group()` and `Frame.iter_group_items()`
* `Frame.pivot()`
* `Frame.from_concat()`
* `Frame.to_pandas()` creates ``pd.RangeIndex`` for ``IndexAutoFactory``-created indices.

### Better Errors
* Incorrectly formed ``Batch`` iterables raise ``BatchIterableInvalid``

### Linux (or WSL) only
* Support for `VisiData` via `Frame.to_visidata()`, `Bus.to_visidata()`

### Advanced Features
* `FrameGO.via_fill_value()` works with `__setitem__()`
* `IndexAutoFactory` takes a `size` parameter.
* `IndexDefaultFactory` permits specifying ``name`` of default index constructor.


## The Five Ways to Rank

Full implementation of all ranking methods, modeled after `scipy.stats.rankdata`.

Each method features `skipna`, `ascending`, `start`, and `fill_value` parameters.

All of the functionality of the Pandas `na_option` parameter (and more) can be handled with `skipna` and `fill_value`.

`start` defaults to 0.

On `Frame`, `Series`:

* `*.rank_ordinal`
* `*.rank_dense`
* `*.rank_mean`
* `*.rank_min`
* `*.rank_max`

In [1]:
```python
from itertools import chain
from IPython.display import display, Markdown, Latex
import numpy as np
import static_frame as sf

s = sf.Series((0, 0, 1), index=('a', 'b', 'c'), name='src')
methods = ('rank_ordinal', 'rank_dense', 'rank_mean', 'rank_min', 'rank_max')
sf.Frame.from_concat(chain((s,), (getattr(s, m)().rename(m) for m in methods)), axis=1)
```

| | src | rank_ordinal | rank_dense | rank_mean | rank_min | rank_max |
|---|---|---|---|---|---|---|
| a | 0 | 0 | 0 | 0.5 | 0 | 1 |
| b | 0 | 1 | 0 | 0.5 | 0 | 1 |
| c | 1 | 2 | 1 | 2.0 | 2 | 2 |


In [2]:
```python
s = sf.Series((20, 3, np.nan, 5, np.nan), index=tuple('abcde'))
params = (dict(skipna=False), dict(skipna=True), dict(skipna=True, fill_value=-1))
sf.Frame.from_concat((s.rank_mean(**p).rename(str(p)) for p in params), axis=1)
```


| | {'skipna': False} | {'skipna': True} | {'skipna': True, 'fill_value': -1} |
|---|---|---|---|
| a | 2.0 | 2.0 | 2.0 |
| b | 0.0 | 0.0 | 0.0 |
| c | 3.0 | | -1.0 |
| d | 1.0 | 1.0 | 1.0 |
| e | 4.0 | | -1.0 |
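To make the five tie-handling rules concrete, here is a minimal NumPy sketch modeled on `scipy.stats.rankdata`, extended with `skipna`, `ascending`, `start`, and `fill_value`. It is not StaticFrame's implementation: the function name `rank`, its signature, and the use of negation for `ascending=False` are assumptions for illustration. On the inputs above it reproduces both tables.

```python
import numpy as np


def rank(values, method="ordinal", ascending=True, start=0, skipna=True, fill_value=np.nan):
    """Rank a 1D sequence; a NumPy sketch modeled on scipy.stats.rankdata."""
    a = np.asarray(values, dtype=float)
    isna = np.isnan(a)
    # With skipna=False, NaN values are ranked too; NumPy sorts NaN last.
    target = a[~isna] if skipna else a
    if not ascending:
        target = -target  # assumption: descending order via negation
    # A stable sort breaks ties by position, which yields ordinal ranks.
    sorter = np.argsort(target, kind="mergesort")
    inv = np.empty(sorter.size, dtype=np.intp)
    inv[sorter] = np.arange(sorter.size)

    if method == "ordinal":
        ranks = inv.astype(float)
    else:
        s = target[sorter]
        # True where a new group of tied values starts (NaN != NaN, so each NaN is its own group).
        new_group = np.r_[True, s[1:] != s[:-1]]
        dense = new_group.cumsum()[inv]  # 1-based dense rank per element
        # Sorted-order offset at which each tie group starts, plus a final sentinel.
        bounds = np.r_[np.nonzero(new_group)[0], new_group.size]
        if method == "dense":
            ranks = dense - 1.0
        elif method == "min":
            ranks = bounds[dense - 1].astype(float)
        elif method == "max":
            ranks = bounds[dense] - 1.0
        elif method == "mean":
            ranks = 0.5 * (bounds[dense - 1] + bounds[dense] - 1)
        else:
            raise ValueError(f"unknown method: {method}")

    ranks = ranks + start
    if not skipna:
        return ranks
    # Put ranks back in place; NaN positions receive fill_value.
    out = np.full(a.size, fill_value, dtype=float)
    out[~isna] = ranks
    return out
```

For example, `rank([0, 0, 1], "mean")` gives `[0.5, 0.5, 2.0]`, and `rank((20, 3, np.nan, 5, np.nan), "mean", fill_value=-1)` gives `[2.0, 0.0, -1.0, 1.0, -1.0]`.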


## Na Meet Falsy

The family of `*na*` functions, which process `None`, `np.nan`, and `NaT`, now has corresponding `*falsy*` functions, which process `None`, `np.nan`, `NaT`, `0`, `False`, and `""`.

> I know `np.nan` is not falsy, but practicality beats purity.

On `Frame`, `Series`:

* `*.isfalsy()`
* `*.notfalsy()`
* `*.dropfalsy()`
* `*.fillfalsy()`
* `*.fillfalsy_forward()`
* `*.fillfalsy_backward()`
* `*.fillfalsy_leading()`
* `*.fillfalsy_trailing()`
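A minimal element-wise sketch of the "falsy" definition above. `is_falsy` is a hypothetical helper, not a StaticFrame function; it treats NaN and NaT as falsy even though `bool(float('nan'))` is `True` in Python.

```python
import math

import numpy as np


def is_falsy(value) -> bool:
    """Hypothetical predicate: True for None, NaN, NaT, 0, False, and ""."""
    if value is None:
        return True
    # NaN is truthy in Python, so it is checked explicitly.
    if isinstance(value, float) and math.isnan(value):
        return True
    # For datetime64 values, only NaT is treated as falsy.
    if isinstance(value, np.datetime64):
        return bool(np.isnat(value))
    # Everything else (0, False, "", "foo", 1, ...) falls back to Python truthiness.
    return not value


values = [None, np.nan, np.datetime64("NaT"), 0, False, "", "foo", 1, np.datetime64("2021-01-01")]
print([is_falsy(v) for v in values])
# [True, True, True, True, True, True, False, False, False]
```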



In [3]:
```python
s = sf.Series(('foo', '', 'bar')).rename('src')
methods = ('isfalsy', 'notfalsy', 'dropfalsy')
sf.Frame.from_concat(chain((s,), (getattr(s, m)().rename(m) for m in methods)), axis=1)
```

| | src | isfalsy | notfalsy | dropfalsy |
|---|---|---|---|---|
| 0 | foo | False | True | foo |
| 1 | | True | False | |
| 2 | bar | False | True | bar |


In [4]:
```python
from itertools import product
f = sf.Frame.from_records((['', '', ''], ['', 'x', ''], ['', '', '']))
display(Markdown('src'))
display(f)
methods = ('fillfalsy_forward', 'fillfalsy_backward')
for m, axis in product(methods, (0, 1)):
    display(Markdown(f'`{m}` axis {axis}'))
    display(getattr(f, m)(axis=axis))
```


src

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | | | |
| 1 | | x | |
| 2 | | | |


`fillfalsy_forward` axis 0

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | | | |
| 1 | | x | |
| 2 | | x | |


`fillfalsy_forward` axis 1

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | | | |
| 1 | | x | x |
| 2 | | | |


`fillfalsy_backward` axis 0

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | | x | |
| 1 | | x | |
| 2 | | | |


`fillfalsy_backward` axis 1

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | | | |
| 1 | x | x | |
| 2 | | | |
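The forward and backward fills shown in these tables can be sketched in plain Python. The helpers `fill_forward`, `fill_backward`, and `fill_2d` are hypothetical; for brevity the default falsy test is Python truthiness (`not v`), which, unlike StaticFrame's definition, does not treat NaN as falsy. Applying the 1D fill down each column corresponds to `axis=0` in the tables, and along each row to `axis=1`.

```python
def fill_forward(values, is_falsy=lambda v: not v):
    """Replace each falsy element with the nearest preceding non-falsy element."""
    out = list(values)
    last, seen = None, False
    for i, v in enumerate(out):
        if not is_falsy(v):
            last, seen = v, True
        elif seen:
            out[i] = last  # leading falsy values have nothing to copy and stay as-is
    return out


def fill_backward(values, is_falsy=lambda v: not v):
    """Like fill_forward, but copies the nearest following non-falsy element."""
    return fill_forward(list(values)[::-1], is_falsy)[::-1]


def fill_2d(rows, fill, axis):
    """Apply a 1D fill down each column (axis=0) or along each row (axis=1)."""
    if axis == 1:
        return [fill(row) for row in rows]
    columns = [fill(col) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


rows = [["", "", ""], ["", "x", ""], ["", "", ""]]
print(fill_2d(rows, fill_forward, axis=0))
# [['', '', ''], ['', 'x', ''], ['', 'x', '']]
```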


In [5]:
```python
from itertools import product
f = sf.Frame.from_records((['', '', ''], ['', 'x', ''], ['', '', '']))
display(Markdown('src'))
display(f)
methods = ('fillfalsy_leading', 'fillfalsy_trailing')
for m, axis in product(methods, (0, 1)):
    display(Markdown(f'`{m}` axis {axis}'))
    display(getattr(f, m)('o', axis=axis))
```

src

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | | | |
| 1 | | x | |
| 2 | | | |


`fillfalsy_leading` axis 0

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | o | o | o |
| 1 | o | x | o |
| 2 | o | | o |


`fillfalsy_leading` axis 1

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | o | o | o |
| 1 | o | x | |
| 2 | o | o | o |


`fillfalsy_trailing` axis 0

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | o | | o |
| 1 | o | x | o |
| 2 | o | o | o |


`fillfalsy_trailing` axis 1

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | o | o | o |
| 1 | | x | o |
| 2 | o | o | o |
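Leading and trailing fills differ from forward and backward fills: they replace only the runs of falsy values at the edges and leave interior gaps alone. A row or column that is entirely falsy is filled completely, as in the all-`o` rows and columns above. A plain-Python sketch with hypothetical helper names; falsiness is simplified to `not v`, which does not treat NaN as falsy.

```python
def fill_leading(values, fill_value, is_falsy=lambda v: not v):
    """Replace only the run of falsy elements before the first non-falsy element."""
    out = list(values)
    for i, v in enumerate(out):
        if not is_falsy(v):
            break  # interior and trailing falsy values are left untouched
        out[i] = fill_value
    return out


def fill_trailing(values, fill_value, is_falsy=lambda v: not v):
    """Replace only the run of falsy elements after the last non-falsy element."""
    return fill_leading(list(values)[::-1], fill_value, is_falsy)[::-1]


print(fill_leading(["", "x", "", "y", ""], "o"))   # ['o', 'x', '', 'y', '']
print(fill_trailing(["", "x", "", "y", ""], "o"))  # ['', 'x', '', 'y', 'o']
print(fill_leading(["", "", ""], "o"))              # ['o', 'o', 'o']
```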


## More Features on `via_dt` Interfaces

All are faster than the equivalent operations on Python `datetime` objects.

### New attributes:
* ``hour``
* ``minute``
* ``second``

### New methods:
* ``is_month_start()``
* ``is_month_end()``
* ``is_year_start()``
* ``is_year_end()``
* ``is_quarter_start()``
* ``is_quarter_end()`` 
* ``quarter()``
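These predicates can be computed directly on NumPy `datetime64` arrays by truncating to month (`datetime64[M]`) or year (`datetime64[Y]`) units, with no Python `datetime` objects created. This is a sketch of the general technique under that assumption, not StaticFrame's implementation; `calendar_flags` is a hypothetical name. For 2020-12-30 through 2021-01-01 it yields the same values as the table below.

```python
import numpy as np


def calendar_flags(dates):
    """Vectorized calendar predicates on datetime64[D] values."""
    d = np.asarray(dates, dtype="datetime64[D]")
    month = d.astype("datetime64[M]")
    year = d.astype("datetime64[Y]")
    next_day = d + 1  # adding an integer to datetime64[D] adds days
    # Months since 1970-01, reduced to a month number in 1..12.
    month_num = month.astype(np.int64) % 12 + 1
    is_month_start = d == month.astype("datetime64[D]")
    is_month_end = next_day.astype("datetime64[M]") != month
    return {
        "is_month_start": is_month_start,
        "is_month_end": is_month_end,
        "is_year_start": d == year.astype("datetime64[D]"),
        "is_year_end": next_day.astype("datetime64[Y]") != year,
        # Quarters start in months 1, 4, 7, 10 and end in months 3, 6, 9, 12.
        "is_quarter_start": is_month_start & (month_num % 3 == 1),
        "is_quarter_end": is_month_end & (month_num % 3 == 0),
        "quarter": (month_num - 1) // 3 + 1,
    }


dates = np.arange("2020-12-30", "2021-01-02", dtype="datetime64[D]")
flags = calendar_flags(dates)
print(flags["quarter"].tolist())  # [4, 4, 1]
```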

In [6]:
```python
index = sf.IndexDate.from_date_range('2020-12-30', '2021-01-01')
s = sf.Series(index, index=index).rename('src')
methods = ('is_month_start',
           'is_month_end',
           'is_year_start',
           'is_year_end',
           'is_quarter_start',
           'is_quarter_end',
           'quarter')
sf.Frame.from_concat(
        chain((s,), (getattr(s.via_dt, m)().rename(m) for m in methods)),
        axis=1,
        )
```

| | src | is_month_start | is_month_end | is_year_start | is_year_end | is_quarter_start | is_quarter_end | quarter |
|---|---|---|---|---|---|---|---|---|
| 2020-12-30 | 2020-12-30 | False | False | False | False | False | False | 4 |
| 2020-12-31 | 2020-12-31 | False | True | False | True | False | True | 4 |
| 2021-01-01 | 2021-01-01 | True | False | True | False | True | False | 1 |


## More Features on `via_str` Interfaces

Access characters via `__getitem__` selection and slicing.

Get the first character of all elements: `s.via_str[0]`

Get the last two characters of all elements: `s.via_str[-2:]`
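Semantically, `s.via_str[key]` applies the same Python index or slice to every element. A plain-Python sketch of that behavior; `str_getitem` is a hypothetical helper, not part of StaticFrame.

```python
import numpy as np


def str_getitem(values, key):
    """Hypothetical helper: apply one index or slice to every string element."""
    # Note: an integer key raises IndexError on an empty string.
    return np.array([v[key] for v in values])


currencies = ["USD", "AUD", "JPY"]
print(str_getitem(currencies, 0).tolist())                # ['U', 'A', 'J']
print(str_getitem(currencies, slice(-2, None)).tolist())  # ['SD', 'UD', 'PY']
# str(slice(-2, None)) is 'slice(-2, None, None)', hence that column label in the output below.
```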


In [7]:
```python
s = sf.Series(('USD', 'AUD', 'JPY')).rename('src')
args = (0, -1, slice(-2, None))
sf.Frame.from_concat(chain((s,), (s.via_str[a].rename(str(a)) for a in args)), axis=1)
```

| | src | 0 | -1 | slice(-2, None, None) |
|---|---|---|---|---|
| 0 | USD | U | D | SD |
| 1 | AUD | A | D | UD |
| 2 | JPY | J | Y | PY |


## NPY and NPZ

A new serialization format that fully captures all `Frame` characteristics, including `dtype`s, and is faster than Parquet.

NPZ stores the files in a single zip archive; NPY writes the same files to a directory, permitting memory mapping.

The `to_npz()` and `to_npy()` interfaces offer a `consolidate_blocks` parameter.

### Frame interfaces

* `Frame.to_npz()`
* `Frame.from_npz()`
* `Frame.to_npy()`
* `Frame.from_npy()`
* `Frame.from_npy_mmap()` (0.8.31)

### Bus, Quilt, Batch, and Yarn interfaces
* ``Bus.to_zip_npz()``
* ``Bus.from_zip_npz()``
* ``Quilt.to_zip_npz()``
* ``Quilt.from_zip_npz()``
* ``Batch.to_zip_npz()``
* ``Batch.from_zip_npz()``
* ``Yarn.to_zip_npz()``
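Judging by their names and the description above, these formats presumably build on NumPy's `.npy` files. The sketch below uses only NumPy (`np.savez`, `np.save`, `np.load`) to contrast the two layouts: one zip archive versus a directory of `.npy` files that can be memory mapped. It does not reproduce StaticFrame's block, index, or metadata layout, and `roundtrip` is a hypothetical helper.

```python
import os
import tempfile

import numpy as np


def roundtrip(arrays):
    """Write arrays as one NPZ archive and as a directory of NPY files, then read both back."""
    first = next(iter(arrays))
    with tempfile.TemporaryDirectory() as tmp:
        # NPZ: every array is stored as a member of a single zip archive.
        npz_path = os.path.join(tmp, "frame.npz")
        np.savez(npz_path, **arrays)
        with np.load(npz_path) as archive:
            from_npz = {name: archive[name] for name in archive.files}

        # NPY: one .npy file per array, written into a directory.
        npy_dir = os.path.join(tmp, "frame")
        os.mkdir(npy_dir)
        for name, array in arrays.items():
            np.save(os.path.join(npy_dir, f"{name}.npy"), array)
        # mmap_mode="r" maps the file instead of reading it into memory.
        mapped = np.load(os.path.join(npy_dir, f"{first}.npy"), mmap_mode="r")
        is_memmap = isinstance(mapped, np.memmap)
        mmap_equal = bool(np.array_equal(mapped, arrays[first]))
        del mapped  # release the mapping before the directory is removed
    npz_equal = all(np.array_equal(from_npz[k], v) for k, v in arrays.items())
    return npz_equal, is_memmap, mmap_equal


print(roundtrip({"values": np.arange(6.0), "labels": np.array(list("abcdef"))}))
# (True, True, True)
```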



## Unpersisting

Force "forgetting" all loaded `Frame`s, regardless of the `max_persist` configuration.


### Interfaces
* `Bus.unpersist()`
* `Yarn.unpersist()`
* `Quilt.unpersist()`
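A hypothetical sketch of what `unpersist()` means for a lazily loading container: items are loaded on first access, at most `max_persist` are retained, and `unpersist()` drops all of them at once. `LazyStore`, its least-recently-used eviction, and the `loader` callable are assumptions for illustration, not `Bus` internals.

```python
from collections import OrderedDict


class LazyStore:
    """Hypothetical lazily loading container with max_persist and unpersist()."""

    def __init__(self, labels, loader, max_persist=None):
        self._labels = set(labels)
        self._loader = loader  # callable: label -> loaded object (e.g., a Frame read from disk)
        self._max_persist = max_persist
        self._loaded = OrderedDict()  # label -> object, least recently used first

    def __getitem__(self, label):
        if label not in self._labels:
            raise KeyError(label)
        if label in self._loaded:
            self._loaded.move_to_end(label)  # mark as most recently used
            return self._loaded[label]
        value = self._loader(label)
        self._loaded[label] = value
        if self._max_persist is not None and len(self._loaded) > self._max_persist:
            self._loaded.popitem(last=False)  # evict the least recently used object
        return value

    @property
    def loaded_count(self):
        return len(self._loaded)

    def unpersist(self):
        """Forget every loaded object, regardless of max_persist."""
        self._loaded.clear()


loads = []
store = LazyStore("abc", loader=lambda label: loads.append(label) or label.upper(), max_persist=2)
store["a"], store["b"], store["a"], store["c"]
print(store.loaded_count)  # 2
store.unpersist()
print(store.loaded_count)  # 0
store["a"]  # loaded again after unpersist
print(loads)  # ['a', 'b', 'c', 'a']
```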



## Index Constructors Everywhere

Index constructor arguments are now available in (hopefully) all places where needed. This is necessary for the 0.9 change disallowing `datetime64` in normal `Index`.

### Interfaces now with `index_constructor`, and possibly `columns_constructor`:

* `*.from_concat_items()`
* `apply`
* `apply_pool`
* `map_any`
* `map_fill`
* `map_all`

### Interfaces now with `index_constructors`, and possibly `columns_constructors`

Note that interfaces that have `*_depth` arguments use `*_constructors` arguments, not `*_constructor` arguments, to permit specifying per-depth `Index` types. A single constructor argument is also permitted.

* ``StoreConfig``
* ``Frame.from_sql()``
* ``Frame.from_structured_array()``
* ``Frame.from_delimited()``
* ``Frame.from_csv()``
* ``Frame.from_clipboard()``
* ``Frame.from_tsv()``
* ``Frame.from_xlsx()``
* ``Frame.from_sqlite()``
* ``Frame.from_hdf5()``
* ``Frame.from_arrow()``
* ``Frame.from_parquet()``
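A hypothetical sketch of how a `*_constructors` argument can accept either a single constructor (reused for every depth) or one constructor per depth. `normalize_constructors` is illustrative only and not a StaticFrame function; plain callables (`str`, `int`) stand in for `Index` classes.

```python
from typing import Callable, List, Optional, Sequence, Union

Constructor = Callable[..., object]


def normalize_constructors(
    constructors: Union[None, Constructor, Sequence[Constructor]],
    depth: int,
) -> List[Optional[Constructor]]:
    """Expand a constructors argument to exactly one entry per depth."""
    if constructors is None:
        return [None] * depth  # None: the caller falls back to a default type
    if callable(constructors):
        return [constructors] * depth  # a single constructor is reused for every depth
    constructors = list(constructors)
    if len(constructors) != depth:
        raise ValueError(f"expected {depth} constructors, got {len(constructors)}")
    return constructors


labels_by_depth = (["2021", "2022"], ["1", "2"])
ctors = normalize_constructors([str, int], depth=2)
print([[c(v) for v in labels] for c, labels in zip(ctors, labels_by_depth)])
# [['2021', '2022'], [1, 2]]
```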

## Stringing `Bus` together with `Yarn`

A container of `Bus` that permits assigning an arbitrary `Index` over the virtual concatenation of all contained `Bus`.

Each `Bus` retains its lazy-loading and (optionally) `max_persist` characteristics.
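Assigning one `Index` over the virtual concatenation implies mapping each position back to a contained `Bus` and a position within it. A minimal sketch using cumulative offsets and `bisect`; `locate` and `build_lookup` are hypothetical helpers, and `sizes` stands in for the lengths of the contained `Bus`.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(position, sizes):
    """Map a position in the virtual concatenation to (container index, local position)."""
    offsets = [0, *accumulate(sizes)]  # start offset of each container, plus the total length
    if not 0 <= position < offsets[-1]:
        raise IndexError(position)
    i = bisect_right(offsets, position) - 1  # last container starting at or before position
    return i, position - offsets[i]


def build_lookup(labels, sizes):
    """Assign an arbitrary label to each position of the virtual concatenation."""
    if len(labels) != sum(sizes):
        raise ValueError("one label is required per contained item")
    return {label: locate(pos, sizes) for pos, label in enumerate(labels)}


# Two containers holding 2 and 3 items, relabeled with a single new index.
lookup = build_lookup(["v", "w", "x", "y", "z"], [2, 3])
print(lookup["x"])  # (1, 0)
```

Because each lookup resolves to a container and a local position, only the `Bus` actually addressed needs to load anything, which is consistent with each `Bus` keeping its own lazy-loading behavior.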




In [8]:
```python
import frame_fixtures as ff

b1 = sf.Bus.from_frames((ff.parse('s(4,4)').rename('a'), ff.parse('s(4,4)').rename('b')))
b1
```

```
<Bus>
<Index>
a       Frame
b       Frame
<<U1>   <object>
```