# API Changes in StaticFrame 0.8


## StaticFrame follows ZeroVer (for now)

* Pre 1.0 Semantic Versioning
* https://0ver.org/
* Backward incompatibilities introduced between "major" releases
    * from 0.6 to 0.7
    * from 0.7 to 0.8
* Within a "major" release we overwhelmingly maintain backwards compatibility
* SF 1.0 is coming
    * API is nearly complete
    * ArrayKit: C extensions for SF performance

## Keeping up with StaticFrame

* All major features and fixes are delivered in "micro" releases
* Backward-incompatible releases (0.7 to 0.8) tend to focus just on API changes
* Release Notes: https://github.com/InvestmentSystems/static-frame/releases
* Code: https://github.com/InvestmentSystems/static-frame/commits/master

## Overview of API Changes
* `Frame.sort_values()`
* `Index.iter_label().apply()`
* `Frame.iter_tuple()` and `Frame.iter_tuple_items()`
* `iter_array`, `iter_series`, and `iter_tuple` now accept `axis` only as a keyword argument
* `Frame.from_element_loc_items()` renamed to `Frame.from_element_items()`

## API Change

* `Frame.sort_values()`
* Changed name of first positional argument from `key` to `label`
    * If using positional arguments, no change necessary
    * `key` is now used for pre-sort function application
* Changed selection of multiple rows/columns
    * If selecting multiple labels, they must be in a list
    * Previously, either a list or a tuple would select multiple columns
    * Previously, no way to select a single column that had a tuple as a label
* Tuples always refer to a single label
    * Lists are not hashable, and cannot be labels
    * Lists are always used for selecting multiple labels
    * Consistent with all other selection interfaces: `__getitem__`, `iloc`, `loc`
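
The list-versus-tuple rule above follows from ordinary Python hashing. A minimal sketch, using a plain dictionary as a stand-in for a label-to-position mapping; the `labels` mapping and the `select` helper are hypothetical illustrations, not StaticFrame code:

```python
# Tuples are hashable, so a tuple can be a single label (here, a dict key).
labels = {'x': 0, 'y': 1, ('x', 'y'): 2}

def select(key):
    """Hypothetical selector: a list means many labels, anything else is one label."""
    if isinstance(key, list):
        return [labels[k] for k in key]
    return labels[key]

print(select(['x', 'y']))  # two labels: positions of 'x' and 'y'
print(select(('x', 'y')))  # one label that happens to be a tuple

# Lists are not hashable, so a list can never be mistaken for a label.
try:
    hash(['x', 'y'])
except TypeError as e:
    print('lists cannot be labels:', e)
```

Because a list can never be a label, treating every list as a multi-label selection is unambiguous.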

In [14]:
# Get a potentially problematic Frame
import frame_fixtures as ff
f = ff.parse('s(4,3)|v(float,float,int)').to_frame_go()
f = f.relabel(columns=('x', 'y', ('x', 'y')))
f

Unnamed: 0,x,y,"('x', 'y')"
0,1930.4,-610.8,-3648
1,-1760.34,3243.94,91301
2,1857.34,-823.14,30205
3,1699.34,114.58,54020


In [15]:
# What does this sort?
f.sort_values(['x', 'y'])

Unnamed: 0,x,y,"('x', 'y')"
1,-1760.34,3243.94,91301
3,1699.34,114.58,54020
2,1857.34,-823.14,30205
0,1930.4,-610.8,-3648


In [16]:
# This is not the same selection as ['x', 'y']
f.sort_values(('x', 'y'))

Unnamed: 0,x,y,"('x', 'y')"
0,1930.4,-610.8,-3648
2,1857.34,-823.14,30205
3,1699.34,114.58,54020
1,-1760.34,3243.94,91301


## What to Do
* Find cases of `f.sort_values(('x', 'y'))` and replace them with `f.sort_values(['x', 'y'])`
* Without this change, code will very likely fail fast



## API Change
* `Index.iter_label().apply()`
* Formerly returned a `Series` with an auto-incremented integer index.
    * Index on returned Series was useless
    * Prior to 0.7 other operations on `Index` returned `Series`
* Now returns a `np.ndarray`.
* Consistent with general move since 0.7 to have operations on `Index` always return `np.ndarray`. 
* Useful for `key` based sorting
    * `key` functions must return either the same container called from (`Index`, in this case) or `np.ndarray`

In [17]:
# A sample Frame
f = ff.parse('s(4,3)|v(float,float,int)|c(I,str)').to_frame_go()
f

Unnamed: 0,zZbu,ztsv,zUvW
0,1930.4,-610.8,-3648
1,-1760.34,3243.94,91301
2,1857.34,-823.14,30205
3,1699.34,114.58,54020


In [18]:
# Function application returns an array
f.columns.iter_label().apply(lambda label: label.upper())

array(['ZZBU', 'ZTSV', 'ZUVW'], dtype='<U4')

In [19]:
# Returning an array is consistent with other methods
f.columns.via_str.upper()

array(['ZZBU', 'ZTSV', 'ZUVW'], dtype='<U4')

In [20]:
# Returning an array is consistent with other operators
f.columns * 2

array(['zZbuzZbu', 'ztsvztsv', 'zUvWzUvW'], dtype='<U8')

In [21]:
# Returning an array is required when providing a key function
# Here, we sort columns by the lowered second character
f.sort_columns(key=lambda c: c.iter_label().apply(lambda label: label[1].lower()))

Unnamed: 0,ztsv,zUvW,zZbu
0,-610.8,-3648,1930.4
1,3243.94,91301,-1760.34
2,-823.14,30205,1857.34
3,114.58,54020,1699.34
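
The column order above can be reproduced in plain Python, which may help clarify what a `key` function contributes: it maps each label to a sort value, and the labels are then reordered by those values. This is a sketch of the idea only, not how StaticFrame implements sorting.

```python
# Column labels from the sample Frame above.
labels = ['zZbu', 'ztsv', 'zUvW']

# The key function maps each label to the value it should be sorted by:
# here, the lowered second character.
keys = [label[1].lower() for label in labels]

# Sort positions by their key values, then reorder the labels.
order = sorted(range(len(labels)), key=keys.__getitem__)
sorted_labels = [labels[i] for i in order]

print(keys)
print(sorted_labels)
```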


## What to Do
* Find cases of `iter_label().apply().values` and replace them with `iter_label().apply()`
* Without this change, code will likely fail fast (an `np.ndarray` has no `.values` attribute)


## API Change
* `Frame.iter_tuple()` and `Frame.iter_tuple_items()`
* Previously tried to give you a `NamedTuple`
    * `NamedTuple` fields have to be valid identifiers
    * If a `NamedTuple` was not possible, it would silently fall back on `tuple`
* That silent fallback was too lenient (as Pandas is)
* Introduced `constructor` argument back in 0.7.8
* Now `constructor=tuple` is required if a `NamedTuple` is not possible
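
Whether a set of labels can become `NamedTuple` fields can be checked with the standard library alone. The sketch below mirrors the rules that `collections.namedtuple` enforces; the helper name `valid_namedtuple_fields` is hypothetical and is not part of StaticFrame:

```python
from collections import namedtuple
from keyword import iskeyword

def valid_namedtuple_fields(labels):
    """Return True if every label could be used as a namedtuple field name."""
    seen = set()
    for label in labels:
        if not isinstance(label, str):
            return False
        # Must be an identifier, not a keyword, and not start with an underscore.
        if not label.isidentifier() or iskeyword(label) or label.startswith('_'):
            return False
        # Field names must also be unique.
        if label in seen:
            return False
        seen.add(label)
    return True

print(valid_namedtuple_fields(('a a', '*', '3')))
print(valid_namedtuple_fields(('w', 'x', 'y', 'z')))

# namedtuple itself rejects the problematic labels with a ValueError.
try:
    namedtuple('Row', ('a a', '*', '3'))
except ValueError as e:
    print('namedtuple rejected labels:', e)
```

A check like this can help decide up front whether to pass `constructor=tuple`.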

In [22]:
# A sample Frame with problematic labels
f = ff.parse('s(4,3)|v(float,int,bool)').relabel(columns=("a a", "*", "3"))
f

Unnamed: 0,a a,*,3
0,1930.4,162197,True
1,-1760.34,-41157,False
2,1857.34,5729,False
3,1699.34,-168387,True


In [23]:
# Iterating tuples now fails without a constructor
for t in f.iter_tuple(): print(t)

ValueError: invalid fields for namedtuple; pass `tuple` as constructor

In [24]:
# Providing the constructor restores previous behavior
for t in f.iter_tuple(constructor=tuple): print(t)

(1930.4, -1760.34, 1857.34, 1699.34)
(162197, -41157, 5729, -168387)
(True, False, False, True)


In [25]:
# The constructor argument is good for many things
for t in f.iter_tuple(constructor=set): print(t)

{-1760.34, 1857.34, 1930.4, 1699.34}
{5729, -168387, -41157, 162197}
{False, True}


In [26]:
# We can even supply a custom NamedTuple
from collections import namedtuple
for t in f.iter_tuple(constructor=namedtuple('A', tuple('wxyz'))): print(t)

A(w=1930.4, x=-1760.34, y=1857.34, z=1699.34)
A(w=162197, x=-41157, y=5729, z=-168387)
A(w=True, x=False, y=False, z=True)


## What to Do
* Find cases of `iter_tuple()` and `iter_tuple_items()` that now fail
* Supply `constructor=tuple` to restore previous behavior
* Without this change, affected calls will always fail


## Additional API Changes
* A few smaller changes
* `iter_array`, `iter_series`, and `iter_tuple` now accept `axis` only as a keyword argument
* `Frame.from_element_loc_items()` renamed to `Frame.from_element_items()`
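
To illustrate what a keyword-only `axis` looks like, here is a small pure-Python sketch. The function `iter_tuple_sketch` is a hypothetical stand-in that works on a list of rows; it is not StaticFrame's actual signature or implementation, and it only assumes that axis 0 iterates columns, as in the outputs shown earlier.

```python
def iter_tuple_sketch(rows, *, axis=0, constructor=tuple):
    """Hypothetical stand-in: `axis` and `constructor` are keyword-only."""
    if axis == 0:
        # Axis 0: yield one container per column.
        for column in zip(*rows):
            yield constructor(column)
    elif axis == 1:
        # Axis 1: yield one container per row.
        for row in rows:
            yield constructor(row)
    else:
        raise ValueError('axis must be 0 or 1')

rows = [[1, 2], [3, 4]]

print(list(iter_tuple_sketch(rows, axis=0)))
print(list(iter_tuple_sketch(rows, axis=1)))

# Passing axis positionally is rejected when the function is called.
try:
    iter_tuple_sketch(rows, 1)
except TypeError as e:
    print('positional axis rejected:', e)
```

Code that passed `axis` positionally would need to switch to `axis=...` at each call site.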