In [2]:
%run nb_helpers.py

from datar.all import *

debug_kwargs = {'prefix': '\n', 'sep': f'\n{"-" * 20}\n'}
nb_header(
    cut, diff, identity, expandgrid, outer, 
    make_names, make_unique, rank,
)

### <div style="background-color: #EEE; padding: 5px 0 8px 0">★ cut</div>

##### Divides the range of x into intervals and codes the values in x
according to the interval into which they fall. The leftmost interval corresponds  
to level one, the next leftmost to level two, and so on.  

##### Args:
&emsp;&emsp;`x`: a numeric vector which is to be converted to a factor by cutting.  
&emsp;&emsp;`breaks`: either a numeric vector of two or more unique cut points or  
&emsp;&emsp;&emsp;&emsp;a single number (greater than or equal to 2) giving the number of  
&emsp;&emsp;&emsp;&emsp;intervals into which x is to be cut.  

&emsp;&emsp;`labels`: labels for the levels of the resulting category. By default,  
&emsp;&emsp;&emsp;&emsp;labels are constructed using "(a,b]" interval notation.  
&emsp;&emsp;&emsp;&emsp;If labels = False, simple integer codes are returned instead  
&emsp;&emsp;&emsp;&emsp;of a factor.  

&emsp;&emsp;`include_lowest`: bool, indicating whether an `x[i]` equal to the lowest  
&emsp;&emsp;&emsp;&emsp;(or highest, for `right=False`) `breaks` value should be included.  

&emsp;&emsp;`right`: bool, indicating if the intervals should be closed on the right  
&emsp;&emsp;&emsp;&emsp;(and open on the left) or vice versa.  

&emsp;&emsp;`precision`: integer used when labels are not given. It determines  
&emsp;&emsp;&emsp;&emsp;the precision used in formatting the break numbers. Note that this  
&emsp;&emsp;&emsp;&emsp;argument is named differently from R's API, where it is `dig.lab`.  

&emsp;&emsp;`ordered_result`: bool, should the result be an ordered categorical?  

##### Returns:
&emsp;&emsp;A categorical object with the cuts  
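
The integer-`breaks` case can be illustrated with a small pure-Python sketch. It is not datar's implementation: the helper name `equal_width_codes` is hypothetical, the codes are zero-based by choice, and nudging the lowest edge down by 0.1% of the range is an assumption made so that the minimum value lands in the first right-closed interval.

```python
from bisect import bisect_left


def equal_width_codes(values, n_bins):
    """Return (codes, edges) for n_bins equal-width, right-closed intervals.

    Hypothetical helper for illustration; assumes max(values) > min(values).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    edges[-1] = hi  # guard against floating-point drift on the last edge
    # Assumption: lower the first edge slightly so that `lo` itself
    # falls inside the half-open interval (edges[0], edges[1]].
    edges[0] = lo - 0.001 * (hi - lo)
    # bisect_left finds the first edge >= v, so v belongs to the
    # interval that ends at that edge.
    codes = [bisect_left(edges, v) - 1 for v in values]
    return codes, edges


codes, edges = equal_width_codes(list(range(1, 11)), 3)
print(codes)
print(edges)
```

With the values 1 through 10 and three bins, the first four values share a code, then three, then three.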


### <div style="background-color: #EEE; padding: 5px 0 8px 0">★ diff</div>

##### Calculates suitably lagged and iterated differences.

If the data is a vector of length n and differences = 1, then the computed  
result is equal to the successive differences  
`x[lag:] - x[:-lag]`.  

##### Args:
&emsp;&emsp;`x`: The data  
&emsp;&emsp;`lag`: The lag to use. Could be negative.  
&emsp;&emsp;&emsp;&emsp;It always calculates `x[lag:] - x[:-lag]` even when `lag` is negative  

&emsp;&emsp;`differences`: The order of the difference  

##### Returns:
&emsp;&emsp;An array of `x[lag:] - x[:-lag]`.  
&emsp;&emsp;If `differences > 1`, the rule applies `differences` times on `x`  
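
A minimal pure-Python sketch of the rule `x[lag:] - x[:-lag]`, applied `differences` times. The name `lagged_diff` is hypothetical, and the sketch covers only positive lags; it is not how datar implements `diff`.

```python
def lagged_diff(x, lag=1, differences=1):
    """Apply x[lag:] - x[:-lag] repeatedly (illustrative, positive lag only)."""
    if lag < 1:
        raise ValueError("this sketch only handles lag >= 1")
    out = list(x)
    for _ in range(differences):
        # zip stops at the shorter sequence, pairing out[i] with out[i + lag]
        out = [later - earlier for earlier, later in zip(out, out[lag:])]
    return out


print(lagged_diff([1, 2, 3]))
print(lagged_diff([1, 4, 9, 16], differences=2))
print(lagged_diff([1, 4, 9, 16], lag=2))
```

Each pass shortens the result by `lag` elements, so a second-order difference of four values leaves two.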


### <div style="background-color: #EEE; padding: 5px 0 8px 0">★ identity</div>

##### Return whatever is passed in

Expression objects are evaluated using the parent context  


### <div style="background-color: #EEE; padding: 5px 0 8px 0">★ expandgrid</div>

##### Expand all combinations into a dataframe. Equivalent to R's `expand.grid()`
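
The row order of "all combinations" can be sketched with the standard library. This is an illustration only, using a hypothetical helper `expand_rows` that returns tuples rather than a data frame; it assumes the first vector varies slowest, as in the printed output of this notebook.

```python
from itertools import product


def expand_rows(*vectors):
    """Return every combination of the inputs as a list of tuples."""
    # product() iterates the last argument fastest, the first slowest
    return list(product(*vectors))


for row in expand_rows([1, 2], [3, 4]):
    print(row)
```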


### <div style="background-color: #EEE; padding: 5px 0 8px 0">★ outer</div>

##### Compute the outer product of two vectors.

##### Args:
&emsp;&emsp;`x`: The first vector  
&emsp;&emsp;`y`: The second vector  
&emsp;&emsp;`fun`: The function to handle how the result of the elements from  
&emsp;&emsp;&emsp;&emsp;the first and second vectors should be computed.  
&emsp;&emsp;&emsp;&emsp;The function has to be vectorized at the second argument, and  
&emsp;&emsp;&emsp;&emsp;return the same shape as y.  

##### Returns:
&emsp;&emsp;The data frame of the outer product of x and y  
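
The contract on `fun` (vectorized over the second argument, returning something shaped like `y`) can be sketched in plain Python. Both `outer_sketch` and `scale_by` are hypothetical names, the call signature `fun(xi, y)` is an assumption drawn from the description above, and the result here is a nested list rather than a data frame.

```python
def scale_by(xi, y):
    """Hypothetical `fun`: multiply one element of x by every element of y."""
    return [xi * yj for yj in y]


def outer_sketch(x, y, fun=scale_by):
    """Build one row per element of x by calling fun(xi, y)."""
    return [fun(xi, y) for xi in x]


print(outer_sketch([1, 2], [3, 4]))
# A different combining rule: pairwise sums instead of products
print(outer_sketch([1, 2], [3, 4], fun=lambda xi, y: [xi + yj for yj in y]))
```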


### <div style="background-color: #EEE; padding: 5px 0 8px 0">★ make_names</div>

##### Make names valid as column names so that they can be accessed as `df.<name>`

The names are transformed using `python-slugify` with  
`lowercase=False` and `separator="_"`. When the first character is  
a digit, the name is prefixed with "_".  

If `unique` is True, the results will be fed into  
`datar.core.names.repair_names(names, "unique")`  

##### Args:
&emsp;&emsp;`names`: The names  
&emsp;&emsp;&emsp;&emsp;If it is a scalar, it is wrapped in a list.  
&emsp;&emsp;&emsp;&emsp;All elements are then converted into strings  

&emsp;&emsp;`unique`: Whether to make the names unique  

##### Returns:
&emsp;&emsp;Converted names  
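
A rough approximation of the transformation, for intuition only. It replaces `python-slugify` with a plain regular expression, so it does not reproduce slugify's handling of non-ASCII text or other edge cases, and the name `make_names_sketch` is hypothetical. The `unique` step is left out.

```python
import re


def make_names_sketch(names):
    """Approximate name cleaning with a regex (not the slugify-based original)."""
    if isinstance(names, (str, int, float)):
        names = [names]  # wrap a scalar in a list
    cleaned = []
    for name in names:
        # Collapse every run of non-alphanumeric characters into "_"
        text = re.sub(r"[^0-9A-Za-z]+", "_", str(name)).strip("_")
        # Attribute-style access needs a non-digit first character
        if text[:1].isdigit():
            text = "_" + text
        cleaned.append(text)
    return cleaned


print(make_names_sketch([1, 2, 3]))
print(make_names_sketch(["my col", "2nd"]))
```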


### <div style="background-color: #EEE; padding: 5px 0 8px 0">★ make_unique</div>

##### Make the names unique.

It's a shortcut for `make_names(names, unique=True)`  

##### Args:
&emsp;&emsp;`names`: The names  
&emsp;&emsp;&emsp;&emsp;If it is a scalar, it is wrapped in a list.  
&emsp;&emsp;&emsp;&emsp;All elements are then converted into strings  

##### Returns:
&emsp;&emsp;Converted names  


### <div style="background-color: #EEE; padding: 5px 0 8px 0">★ rank</div>

##### Returns the sample ranks of the values in a vector.

##### Args:
&emsp;&emsp;`x`: A numeric vector  
&emsp;&emsp;`na_last`: controls the treatment of `NA`s. If `True`, missing  
&emsp;&emsp;&emsp;&emsp;values in the data are put last; if `False`, they are put  
&emsp;&emsp;&emsp;&emsp;first; if `"keep"`, they are kept with rank `NA`.  

&emsp;&emsp;`ties_method`: a character string specifying how ties are treated.  
&emsp;&emsp;&emsp;&emsp;One of `average`, `first`, `dense`, `max`, and `min`.  
&emsp;&emsp;&emsp;&emsp;Note that the `ties_method` candidates are different from the ones  
&emsp;&emsp;&emsp;&emsp;in R, because we are using `pandas.Series.rank()`. See  
&emsp;&emsp;&emsp;&emsp;https://pandas.pydata.org/docs/reference/api/pandas.Series.rank.html  

##### Returns:
&emsp;&emsp;A numeric rank vector of the same length as `x`  
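
To make the five `ties_method` options concrete, here is a pure-Python sketch of how each one resolves a group of tied values. It is written from the documented meaning of the methods in `pandas.Series.rank()`, not taken from datar or pandas; `rank_sketch` is a hypothetical name, and missing values (`na_last`) are not handled.

```python
def rank_sketch(x, ties_method="average"):
    """Rank values ascending, resolving ties per ties_method (no NA handling)."""
    # Stable sort of positions, so equal values keep their original order
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    start, group = 0, 0
    while start < len(order):
        # Extend `end` to cover the whole run of values tied with order[start]
        end = start
        while end + 1 < len(order) and x[order[end + 1]] == x[order[start]]:
            end += 1
        group += 1
        for k in range(start, end + 1):
            if ties_method == "average":
                r = (start + end + 2) / 2  # mean of the 1-based positions
            elif ties_method == "min":
                r = start + 1
            elif ties_method == "max":
                r = end + 1
            elif ties_method == "first":
                r = k + 1  # order of appearance breaks the tie
            elif ties_method == "dense":
                r = group  # ranks increase by 1 between groups
            else:
                raise ValueError(f"unknown ties_method: {ties_method!r}")
            ranks[order[k]] = float(r)
        start = end + 1
    return ranks


print(rank_sketch([3, 4, 1, -1]))
for method in ("average", "min", "max", "first", "dense"):
    print(method, rank_sketch([10, 20, 10, 30], method))
```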


In [3]:
debug(
    cut(seq(1,10), 3), 
    diff([1, 2, 3]),
    identity(1.23),
    expandgrid([1,2], [3,4]),
    outer([1,2], [3,4]),
    make_names([1, 2, 3]),
    make_unique([1, 1, 1]),
    rank([3, 4, 1, -1]),
    **debug_kwargs
)




cut(seq(1,10), 3)
--------------------
[(0.99, 4.0], (0.99, 4.0], (0.99, 4.0], (0.99, 4.0], (4.0, 7.0], (4.0, 7.0], (4.0, 7.0], (7.0, 10.0], (7.0, 10.0], (7.0, 10.0]]
Categories (3, interval[float64, right]): [(0.99, 4.0] < (4.0, 7.0] < (7.0, 10.0]]

diff([1, 2, 3])
--------------------
array([1, 1])

identity(1.23)
--------------------
1.23

expandgrid([1,2], [3,4])
--------------------
   [1, 2]  [3, 4]
  <int64> <int64>
0       1       3
1       1       4
2       2       3
3       2       4

outer([1,2], [3,4])
--------------------
        0       1
  <int64> <int64>
0       3       4
1       6       8

make_names([1, 2, 3])
--------------------
['_1', '_2', '_3']

make_unique([1, 1, 1])
--------------------
['__0', '__1', '__2']

rank([3, 4, 1, -1])
--------------------
array([3., 4., 2., 1.])
