# Vector - RedAmber

This notebook walks through [Vector.md of RedAmber](https://github.com/heronshoes/red_amber/blob/main/doc/Vector.md).

In [1]:
require 'red_amber' # require 'red-amber' is also OK.
include RedAmber
{RedAmber: VERSION, Arrow: Arrow::VERSION}

{:RedAmber=>"0.3.0", :Arrow=>"10.0.0"}

# Vector

The `RedAmber::Vector` class represents a series of data in a DataFrame.

## Constructor

### Create from a column in a DataFrame

In [2]:
df = DataFrame.new(x: [1, 2, 3])
df[:x]

#<RedAmber::Vector(:uint8, size=3):0x000000000000f640>
[1, 2, 3]


### New from an Array

In [3]:
vector = Vector.new([1, 2, 3])
# or
vector = Vector.new(1, 2, 3)
# or
vector = Vector.new(1..3)
# or
vector = Vector.new(Arrow::Array.new([1, 2, 3]))

#<RedAmber::Vector(:uint8, size=3):0x000000000000f654>
[1, 2, 3]


In [4]:
# or
require 'arrow-numo-narray'
vector = Vector.new(Numo::Int8[1, 2, 3])

#<RedAmber::Vector(:int8, size=3):0x000000000000f780>
[1, 2, 3]


## Properties

### `to_s`

### `values`, `to_a`, `entries`

### `indices`, `indexes`, `indeces`

  Returns the indices as an Array.

### `to_ary`

  It implicitly converts a Vector to an Array when required.

In [5]:
[1, 2] + Vector.new([3, 4])

[1, 2, 3, 4]

### `size`, `length`, `n_rows`, `nrow`

### `empty?`

### `type`

### `boolean?`, `numeric?`, `string?`, `temporal?`

### `type_class`

### `each`, `map`, `collect`

  If no block is given, returns an Enumerator.

### `n_nils`, `n_nans`

  - `n_nulls` is an alias of `n_nils`

### `has_nil?`

  Returns `true` if self has any `nil`. Otherwise returns `false`.

### `inspect(limit: 80)`

  - `limit` sets the size limit for displaying a long array; elements beyond it are abbreviated as `...`.

In [6]:
vector = Vector.new((1..50).to_a)

#<RedAmber::Vector(:uint8, size=50):0x000000000000f794>
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, ... ]


## Selecting Values

### `take(indices)`, `[](indices)`

- Acceptable classes for indices:
  - Integer, Float
  - Vector of integer or float
  - Arrow::Array of integer or float
- Negative indices are also accepted, as with Ruby's built-in Array.

In [11]:
array = Vector.new(%w[A B C D E])
indices = Vector.new([0.1, 1, -1])
array.take(indices)
# or
array[indices]

["A", "B", "E"]

### `filter(booleans)`, `select(booleans)`, `[](booleans)`

- Acceptable classes for booleans:
  - An array of true, false, or nil
  - Boolean Vector
  - Arrow::BooleanArray

In [12]:
array = Vector.new(%w[A B C D E])
booleans = [true, false, nil, false, true]
array.filter(booleans)

#<RedAmber::Vector(:string, size=2):0x000000000000f7a8>
["A", "E"]


In [13]:
# or
array[booleans]

["A", "E"]

`filter` and `select` also accept a block.
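
As a rough mental model, boolean filtering keeps the elements whose flag is `true` and drops those whose flag is `false` or `nil`. The plain-Ruby sketch below mimics that rule on ordinary Arrays. `filter_by_booleans` is a hypothetical helper written for illustration; it is not part of RedAmber and says nothing about how RedAmber implements `filter`.

```ruby
# Plain-Ruby sketch of boolean filtering (hypothetical helper, not RedAmber's code).
# Elements paired with true are kept; false and nil both drop the element.
def filter_by_booleans(values, booleans)
  raise ArgumentError, 'size mismatch' unless values.size == booleans.size

  values.zip(booleans).select { |_, flag| flag == true }.map(&:first)
end

letters = %w[A B C D E]
flags = [true, false, nil, false, true]
p filter_by_booleans(letters, flags)
```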

## Functions

### Unary aggregations: `vector.func => scalar`

  ![unary aggregation](https://github.com/heronshoes/red_amber/raw/main/doc/image/vector/unary_aggregation_w_option.png)

| Method    |Boolean|Numeric|String|Options|Remarks|
| ----------- | --- | --- | --- | --- | --- |
| ✓ `all?`     |  ✓  |     |     | ✓ ScalarAggregate| alias `all` |
| ✓ `any?`     |  ✓  |     |     | ✓ ScalarAggregate| alias `any` |
| ✓ `approximate_median`|  |✓|  | ✓ ScalarAggregate| alias `median`|
| ✓ `count`   |  ✓  |  ✓  |  ✓  | ✓  Count  |     |
| ✓ `count_distinct`| ✓ | ✓ | ✓ | ✓  Count  |alias `count_uniq`|
|[ ]`index`   | [ ] | [ ] | [ ] |[ ] Index  |     |
| ✓ `max`     |  ✓  |  ✓  |  ✓  | ✓ ScalarAggregate|     |
| ✓ `mean`    |  ✓  |  ✓  |     | ✓ ScalarAggregate|     |
| ✓ `min`     |  ✓  |  ✓  |  ✓  | ✓ ScalarAggregate|     |
| ✓ `min_max` |  ✓  |  ✓  |  ✓  | ✓ ScalarAggregate|     |
|[ ]`mode`    |     | [ ] |     |[ ] Mode    |     |
| ✓ `product` |  ✓  |  ✓  |     | ✓ ScalarAggregate|     |
| ✓ `quantile`|     |  ✓  |     | ✓ Quantile|Specify probability in (0..1) by a parameter (default=0.5)|
| ✓ `sd`      |     |  ✓  |     |          |`stddev` with `ddof: 1`|
| ✓ `stddev`  |     |  ✓  |     | ✓ Variance|ddof: 0 by default|
| ✓ `sum`     |  ✓  |  ✓  |     | ✓ ScalarAggregate|     |
|[ ]`tdigest` |     | [ ] |     |[ ] TDigest |     |
| ✓ `var`     |     |  ✓  |     |   |`variance` with `ddof: 1`<br>alias `unbiased_variance`|
| ✓ `variance`|     |  ✓  |     | ✓ Variance|ddof: 0 by default|

Options can be used as follows.
See the [Arrow C++ compute function documentation](https://arrow.apache.org/docs/cpp/compute.html) for details.

In [10]:
double = Vector.new([1, 0/0.0, -1/0.0, 1/0.0, nil, ""])

#<RedAmber::Vector(:double, size=6):0x000000000000f26c>
[1.0, NaN, -Infinity, Infinity, nil, 0.0]


In [11]:
double.count

5

In [12]:
double.count(mode: :only_valid) # default

5

In [13]:
double.count(mode: :only_null)

1

In [14]:
double.count(mode: :all)

6

In [15]:
boolean = Vector.new([true, true, nil])

#<RedAmber::Vector(:boolean, size=3):0x000000000000f280>
[true, true, nil]


In [16]:
boolean.all

true

In [17]:
boolean.all(skip_nulls: true)

true

In [18]:
boolean.all(skip_nulls: false)

false
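
The three `count` modes differ only in how they treat `nil`: NaN and the infinities are ordinary valid values. The sketch below restates that in plain Ruby on an Array. `count_by_mode` is a hypothetical helper for illustration, not RedAmber's or Arrow's implementation.

```ruby
# Plain-Ruby sketch of the three count modes (hypothetical helper, not RedAmber's code).
# NaN and Infinity are valid Float values; only nil counts as null.
def count_by_mode(values, mode: :only_valid)
  case mode
  when :only_valid then values.count { |v| !v.nil? }
  when :only_null  then values.count(&:nil?)
  when :all        then values.size
  else raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

data = [1.0, Float::NAN, -Float::INFINITY, Float::INFINITY, nil, 0.0]
%i[only_valid only_null all].each do |mode|
  puts "#{mode}: #{count_by_mode(data, mode: mode)}"
end
```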

### Unary element-wise: `vector.func => vector`

  ![unary element-wise](https://github.com/heronshoes/red_amber/raw/main/doc/image/vector/unary_element_wise.png)

| Method    |Boolean|Numeric|String|Options|Remarks|
| ------------ | --- | --- | --- | --- | ----- |
| ✓ `-@`       |     |  ✓  |     |     |as `-vector`|
| ✓ `negate`   |     |  ✓  |     |     |`-@`   |
| ✓ `abs`      |     |  ✓  |     |     |       |
| ✓ `acos`     |     |  ✓  |     |     |       |
| ✓ `asin`     |     |  ✓  |     |     |       |
| ✓ `atan`     |     |  ✓  |     |     |       |
| ✓ `bit_wise_not`|  | (✓) |     |     |integer only|
| ✓ `ceil`     |     |  ✓  |     |     |       |
| ✓ `cos`      |     |  ✓  |     |     |       |
| ✓`fill_nil_backward`| ✓ | ✓ | ✓ |    |       |
| ✓`fill_nil_forward` | ✓ | ✓ | ✓ |    |       |
| ✓ `floor`    |     |  ✓  |     |     |       |
| ✓ `invert`   |  ✓  |     |     |     |`!`, alias `not`|
| ✓ `ln`       |     |  ✓  |     |     |       |
| ✓ `log10`    |     |  ✓  |     |     |       |
| ✓ `log1p`    |     |  ✓  |     |     |Compute natural log of (1+x)|
| ✓ `log2`     |     |  ✓  |     |     |       |
| ✓ `round`    |     |  ✓  |     | ✓ Round (:mode, :n_digits)|    |
| ✓ `round_to_multiple`| | ✓ |   | ✓ RoundToMultiple :mode, :multiple| multiple must be an Arrow::Scalar|
| ✓ `sign`     |     |  ✓  |     |     |       |
| ✓ `sin`      |     |  ✓  |     |     |       |
| ✓`sort_indexes`| ✓  | ✓  | ✓  |:order|alias `sort_indices`|
| ✓ `tan`      |     |  ✓  |     |     |       |
| ✓ `trunc`    |     |  ✓  |     |     |       |

Examples of options for `#round`:

- `:n_digits` The number of digits to round to.
- `:mode` The rounding mode.

In [19]:
double = RedAmber::Vector.new([15.15, 2.5, 3.5, -4.5, -5.5])

#<RedAmber::Vector(:double, size=5):0x000000000000f294>
[15.15, 2.5, 3.5, -4.5, -5.5]


In [20]:
double.round

#<RedAmber::Vector(:double, size=5):0x000000000000f2a8>
[15.0, 2.0, 4.0, -4.0, -6.0]


In [21]:
double.round(mode: :half_to_even)

#<RedAmber::Vector(:double, size=5):0x000000000000f2bc>
[15.0, 2.0, 4.0, -4.0, -6.0]


In [22]:
double.round(mode: :towards_infinity)

#<RedAmber::Vector(:double, size=5):0x000000000000f2d0>
[16.0, 3.0, 4.0, -5.0, -6.0]


In [23]:
double.round(mode: :half_up)

#<RedAmber::Vector(:double, size=5):0x000000000000f2e4>
[15.0, 3.0, 4.0, -4.0, -5.0]


In [24]:
double.round(mode: :half_towards_zero)

#<RedAmber::Vector(:double, size=5):0x000000000000f2f8>
[15.0, 2.0, 3.0, -4.0, -5.0]


In [25]:
double.round(mode: :half_towards_infinity)

#<RedAmber::Vector(:double, size=5):0x000000000000f30c>
[15.0, 3.0, 4.0, -5.0, -6.0]


In [26]:
double.round(mode: :half_to_odd)

#<RedAmber::Vector(:double, size=5):0x000000000000f320>
[15.0, 3.0, 3.0, -5.0, -5.0]


In [27]:
double.round(n_digits: 0)

#<RedAmber::Vector(:double, size=5):0x000000000000f334>
[15.0, 2.0, 4.0, -4.0, -6.0]


In [28]:
double.round(n_digits: 1)

#<RedAmber::Vector(:double, size=5):0x000000000000f348>
[15.2, 2.5, 3.5, -4.5, -5.5]


In [29]:
double.round(n_digits: -1)

#<RedAmber::Vector(:double, size=5):0x000000000000f35c>
[20.0, 0.0, 0.0, -0.0, -10.0]
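
The `half_*` modes differ only for values exactly halfway between two candidates. For comparison, Ruby's own `Float#round` takes a `half:` keyword that offers three tie-breaking rules under different names. The correspondence in the comments is an assumption read off the outputs in this notebook, not something taken from RedAmber's documentation.

```ruby
# Ruby's built-in tie-breaking, for comparison with the Arrow modes above.
# Assumed correspondence (based on the outputs shown in this notebook):
#   half: :even  ~ mode: :half_to_even
#   half: :up    ~ mode: :half_towards_infinity (Ruby's :up means away from zero)
#   half: :down  ~ mode: :half_towards_zero
ties = [2.5, 3.5, -4.5, -5.5]

%i[even up down].each do |half|
  rounded = ties.map { |x| x.round(half: half) }
  puts "half: #{half.inspect} => #{rounded.inspect}"
end
```

Note the naming trap: in the cells above, `mode: :half_up` rounds ties toward positive infinity (`-4.5` becomes `-4.0`), whereas Ruby's `half: :up` rounds ties away from zero.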


### Binary element-wise: `vector.func(vector) => vector`

  ![binary element-wise](https://github.com/heronshoes/red_amber/raw/main/doc/image/vector/binary_element_wise.png)

| Method       |Boolean|Numeric|String|Options|Remarks|
| ----------------- | --- | --- | --- | --- | ----- |
| ✓ `add`           |     |  ✓  |     |     | `+`   |
| ✓ `atan2`         |     |  ✓  |     |     |       |
| ✓ `and_kleene`    |  ✓  |     |     |     | `&`   |
| ✓ `and_org`       |  ✓  |     |     |     |`and` in Red Arrow|
| ✓ `and_not`       |  ✓  |     |     |     |       |
| ✓ `and_not_kleene`|  ✓  |     |     |     |       |
| ✓ `bit_wise_and`  |     | (✓) |     |     |integer only|
| ✓ `bit_wise_or`   |     | (✓) |     |     |integer only|
| ✓ `bit_wise_xor`  |     | (✓) |     |     |integer only|
| ✓ `divide`        |     |  ✓  |     |     | `/`   |
| ✓ `equal`         |  ✓  |  ✓  |  ✓  |     |`==`, alias `eq`|
| ✓ `greater`       |  ✓  |  ✓  |  ✓  |     |`>`, alias `gt`|
| ✓ `greater_equal` |  ✓  |  ✓  |  ✓  |     |`>=`, alias `ge`|
| ✓ `is_finite`     |     |  ✓  |     |     |       |
| ✓ `is_inf`        |     |  ✓  |     |     |       |
| ✓ `is_na`         |  ✓  |  ✓  |  ✓  |     |       |
| ✓ `is_nan`        |     |  ✓  |     |     |       |
|[ ]`is_nil`        |  ✓  |  ✓  |  ✓  |[ ] Null|alias `is_null`|
| ✓ `is_valid`      |  ✓  |  ✓  |  ✓  |     |       |
| ✓ `less`          |  ✓  |  ✓  |  ✓  |     |`<`, alias `lt`|
| ✓ `less_equal`    |  ✓  |  ✓  |  ✓  |     |`<=`, alias `le`|
| ✓ `logb`          |     |  ✓  |     |     |logb(b) Compute base `b` logarithm|
|[ ]`mod`           |     | [ ] |     |     | `%`   |
| ✓ `multiply`      |     |  ✓  |     |     | `*`   |
| ✓ `not_equal`     |  ✓  |  ✓  |  ✓  |     |`!=`, alias `ne`|
| ✓ `or_kleene`     |  ✓  |     |     |     | `\|`  |
| ✓ `or_org`        |  ✓  |     |     |     |`or` in Red Arrow|
| ✓ `power`         |     |  ✓  |     |     | `**`  |
| ✓ `subtract`      |     |  ✓  |     |     | `-`   |
| ✓ `shift_left`    |     | (✓) |     |     |`<<`, integer only|
| ✓ `shift_right`   |     | (✓) |     |     |`>>`, integer only|
| ✓ `xor`           |  ✓  |     |     |     | `^`   |

### `uniq`

  Returns a new array with distinct elements.

### `tally` and `value_counts`

  Compute counts of unique elements and return a Hash.

  They return almost the same result as Ruby's `tally`, except that all NaNs are treated as the same value.

In [30]:
array = [0.0/0, Float::NAN]
array.tally

{NaN=>1, NaN=>1}

In [31]:
vector = RedAmber::Vector.new(array)
vector.tally

{NaN=>2}

In [32]:
vector.value_counts

{NaN=>2}
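
Ruby's `Array#tally` keeps separate keys for distinct NaN objects because NaN never compares equal to another NaN. If the merged behaviour is wanted on a plain Array, one option is to map every NaN to a single canonical object before counting. `tally_merging_nan` below is a hypothetical helper sketched for illustration, not part of RedAmber.

```ruby
# Plain-Ruby sketch: count values while treating all NaNs as one key.
# `tally_merging_nan` is a hypothetical helper, not part of RedAmber.
def tally_merging_nan(values)
  values.each_with_object(Hash.new(0)) do |value, counts|
    # Map every NaN to the single Float::NAN object so the Hash sees one key.
    key = value.is_a?(Float) && value.nan? ? Float::NAN : value
    counts[key] += 1
  end
end

counts = tally_merging_nan([0.0 / 0, Float::NAN, 1.0])
p counts.size
p counts.values
```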

### `index(element)`

  Returns the index of the specified element.

### `quantiles(probs = [1.0, 0.75, 0.5, 0.25, 0.0], interpolation: :linear, skip_nils: true, min_count: 0)`

  Returns quantiles for the specified probabilities as a DataFrame.

### `sort_indexes`, `sort_indices`, `array_sort_indices`

## Coerce

In [33]:
vector = Vector.new(1,2,3)

#<RedAmber::Vector(:uint8, size=3):0x000000000000f370>
[1, 2, 3]


In [34]:
# Vector's `#*` method
vector * -1

#<RedAmber::Vector(:int16, size=3):0x000000000000f384>
[-1, -2, -3]


In [35]:
# coerced calculation
-1 * vector

#<RedAmber::Vector(:int16, size=3):0x000000000000f398>
[-1, -2, -3]


In [36]:
# `-@` operator (unary minus)
-vector

#<RedAmber::Vector(:uint8, size=3):0x000000000000f3ac>
[255, 254, 253]
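
The last result follows from the type staying `uint8`: negating an unsigned 8-bit value wraps around modulo 256. The plain-Ruby sketch below emulates that arithmetic with a bit mask. It is an illustration of the wraparound only, not a description of how Arrow computes it, and `negate_as_uint8` is a hypothetical helper.

```ruby
# Emulate unsigned 8-bit negation in plain Ruby (illustration only).
# Ruby Integers are arbitrary precision, so mask to the low 8 bits by hand.
def negate_as_uint8(values)
  values.map { |v| -v & 0xFF }
end

p negate_as_uint8([1, 2, 3])
```

When signed results are wanted, multiplying by `-1` as in the earlier cells upcasts the result to `int16` instead.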


## Update vector's value
### `replace(specifier, replacer)` => vector

- Accepts a Scalar, Range of Integer, Vector, Array, or Arrow::Array as a specifier.
- Accepts a Scalar, Vector, Array, or Arrow::Array as a replacer.
- A boolean specifier marks the positions to replace with `true`.
  - If `booleans.any` is false, no replacement happens and self is returned.
- An index specifier gives the positions to replace as indices.
- replacer supplies the new values.
  - The number of `true` values in booleans must equal the length of replacer.

In [37]:
vector = Vector.new([1, 2, 3])
booleans = [true, false, true]
replacer = [4, 5]
vector.replace(booleans, replacer)

#<RedAmber::Vector(:uint8, size=3):0x000000000000f3c0>
[4, 2, 5]


- Scalar value in replacer can be broadcasted.

In [38]:
replacer = 0
vector.replace(booleans, replacer)

#<RedAmber::Vector(:uint8, size=3):0x000000000000f3d4>
[0, 2, 0]


- The returned data type is automatically upcast to accommodate the replacer.

In [39]:
replacer = 1.0
vector.replace(booleans, replacer)

#<RedAmber::Vector(:double, size=3):0x000000000000f3e8>
[1.0, 2.0, 1.0]


- Positions where booleans is nil are replaced with nil.

In [40]:
booleans = [true, false, nil]
replacer = -1
vector.replace(booleans, replacer)

#<RedAmber::Vector(:int8, size=3):0x000000000000f3fc>
[-1, 2, nil]


- replacer can have nil in it.

In [41]:
booleans = [true, false, true]
replacer = nil
vector.replace(booleans, replacer)

#<RedAmber::Vector(:uint8, size=3):0x000000000000f410>
[nil, 2, nil]


- An example that replaces 'NA' with nil.

In [42]:
vector = RedAmber::Vector.new(['A', 'B', 'NA'])
vector.replace(vector == 'NA', nil)

#<RedAmber::Vector(:string, size=3):0x000000000000f424>
["A", "B", nil]


- An index specifier.

The specified indices are used in sorted order, so the positions in indices and replacer may not correspond one-to-one.

In [43]:
vector = RedAmber::Vector.new([1, 2, 3])
indices = [2, 1]
replacer = [4, 5]
vector.replace(indices, replacer)

#<RedAmber::Vector(:uint8, size=3):0x000000000000f438>
[1, 4, 5]
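
The "sorted order" rule is easy to miss, so here is a plain-Ruby sketch that reproduces the result above on an ordinary Array. `replace_at_sorted_indices` is a hypothetical helper for illustration; it only models the observable behaviour and is not RedAmber's implementation.

```ruby
# Plain-Ruby sketch of index-based replacement (hypothetical helper, not RedAmber's code).
# Indices are sorted first, then paired with the replacer values in order.
def replace_at_sorted_indices(values, indices, replacer)
  raise ArgumentError, 'size mismatch' unless indices.size == replacer.size

  result = values.dup
  indices.sort.zip(replacer).each { |index, value| result[index] = value }
  result
end

p replace_at_sorted_indices([1, 2, 3], [2, 1], [4, 5])
```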


### `fill_nil_forward`, `fill_nil_backward` => vector

Propagates the last valid observation forward (or the next valid one backward).
A nil is preserved when no valid value precedes it (forward) or follows it (backward).

In [44]:
integer = RedAmber::Vector.new([0, 1, nil, 3, nil])
integer.fill_nil_forward

#<RedAmber::Vector(:uint8, size=5):0x000000000000f44c>
[0, 1, 1, 3, 3]


In [45]:
integer.fill_nil_backward

#<RedAmber::Vector(:uint8, size=5):0x000000000000f460>
[0, 1, 3, 3, nil]
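
The filling rule can be sketched in a few lines of plain Ruby. The helpers below are hypothetical and operate on ordinary Arrays; they are meant to make the rule concrete, not to mirror RedAmber's implementation.

```ruby
# Plain-Ruby sketch of forward/backward filling (hypothetical helpers, not RedAmber's code).
def fill_forward(values)
  last_valid = nil
  values.map do |value|
    last_valid = value unless value.nil?
    last_valid
  end
end

# Backward filling is forward filling on the reversed sequence.
def fill_backward(values)
  fill_forward(values.reverse).reverse
end

data = [0, 1, nil, 3, nil]
p fill_forward(data)
p fill_backward(data)
```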


### `boolean_vector.if_else(true_choice, false_choice)` => vector

Choose values based on self. Self must be a boolean Vector.

`true_choice` and `false_choice` must be of the same type, and each may be a scalar, an Array, or a Vector.
`nil` values in self are propagated to the output.

This example will normalize negative indices to positive ones.

In [46]:
indices = Vector.new([1, -1, 3, -4])
array_size = 10
normalized_indices = (indices < 0).if_else(indices + array_size, indices)

#<RedAmber::Vector(:int16, size=4):0x000000000000f474>
[1, 9, 3, 6]
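
Element by element, `if_else` behaves like a ternary with one extra case for `nil`. The plain-Ruby sketch below reproduces the index normalization above on ordinary Arrays; the top-level `if_else` method here is a hypothetical stand-in, not RedAmber's method.

```ruby
# Plain-Ruby sketch of element-wise if_else (hypothetical helper, not RedAmber's code).
# A nil condition yields nil; otherwise pick from the matching branch.
def if_else(conditions, true_choice, false_choice)
  conditions.zip(true_choice, false_choice).map do |cond, if_true, if_false|
    next nil if cond.nil?

    cond ? if_true : if_false
  end
end

indices = [1, -1, 3, -4]
array_size = 10
negative = indices.map(&:negative?)
shifted = indices.map { |i| i + array_size }
p if_else(negative, shifted, indices)
```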


### `is_in(values)` => boolean vector

For each element in self, return true if it is found in given `values`, false otherwise.
By default, nulls are matched against the value set. (This will be changed in SetLookupOptions: not implemented.)

In [47]:
vector = Vector.new %W[A B C D]
values = ['A', 'C', 'X']
vector.is_in(values)

#<RedAmber::Vector(:boolean, size=4):0x000000000000f488>
[true, false, true, false]


`values` are cast to the same type as the Vector. In the uint8 example below, `-1` is cast to 255 and therefore matches.

In [48]:
vector = Vector.new([1, 2, 255])
vector.is_in(1, -1)

#<RedAmber::Vector(:boolean, size=3):0x000000000000f49c>
[true, false, true]


### `shift(amount = 1, fill: nil)`

Shifts the vector's values by the specified `amount`. Vacated positions are filled with `fill`.

In [49]:
vector = Vector.new([1, 2, 3, 4, 5])
vector.shift

#<RedAmber::Vector(:uint8, size=5):0x000000000000f4b0>
[nil, 1, 2, 3, 4]


In [50]:
vector.shift(-2)

#<RedAmber::Vector(:uint8, size=5):0x000000000000f4c4>
[3, 4, 5, nil, nil]


In [51]:
vector.shift(fill: Float::NAN)

#<RedAmber::Vector(:double, size=5):0x000000000000f4d8>
[NaN, 1.0, 2.0, 3.0, 4.0]
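
Read literally, the description says that a shift keeps the length, moves the values by `amount` positions, and fills the vacated slots. The plain-Ruby sketch below implements that reading on an ordinary Array. `shift_values` is a hypothetical helper, and the direction convention (positive moves values toward the end) is an assumption taken from the `nil` appearing at the front for the default shift.

```ruby
# Plain-Ruby sketch of shift (hypothetical helper, not RedAmber's code).
# Assumption: positive amounts move values toward the end, negative toward the start.
def shift_values(values, amount = 1, fill: nil)
  raise ArgumentError, 'shift amount is too large' if amount.abs > values.size

  if amount.positive?
    Array.new(amount, fill) + values[0...-amount]
  elsif amount.negative?
    values[-amount..] + Array.new(-amount, fill)
  else
    values.dup
  end
end

data = [1, 2, 3, 4, 5]
p shift_values(data)
p shift_values(data, -2)
```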


### `split_to_columns(sep = ' ', limit = 0)`

Splits a string Vector, using any ASCII whitespace as the separator by default.
Returns an Array of Vectors.

In [2]:
vector = Vector.new(['a b', 'c d', 'e f'])
vector.split_to_columns

[#<RedAmber::Vector(:string, size=3):0x000000000000f640>
["a", "c", "e"]
, #<RedAmber::Vector(:string, size=3):0x000000000000f654>
["b", "d", "f"]
]

It can be used to split a column in a DataFrame.

In [3]:
df = DataFrame.new(year_month: %w[2022-01 2022-02 2022-03])
  .assign(:year, :month) { year_month.split_to_columns('-') }
  .drop(:year_month)

year,month
2022,1
2022,2
2022,3


### `split_to_rows(sep = ' ', limit = 0)`

Splits a string Vector, using any ASCII whitespace as the separator by default.
Returns a single Vector with the split results flattened into rows.

In [4]:
vector = Vector.new(['a b', 'c d', 'e f'])
vector.split_to_rows

#<RedAmber::Vector(:string, size=6):0x000000000000f67c>
["a", "b", "c", "d", "e", "f"]
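
The two methods split the same way and differ only in the shape of the result. A plain-Ruby sketch on ordinary Arrays makes the difference visible: one transposes the parts into columns, the other flattens them. The top-level methods below are hypothetical stand-ins that reuse the RedAmber names for readability; they ignore the `limit` parameter.

```ruby
# Plain-Ruby sketch of the two split shapes (hypothetical helpers, not RedAmber's code).
# Assumes every string splits into the same number of parts for the column form.
def split_to_columns(strings, sep = ' ')
  strings.map { |s| s.split(sep) }.transpose
end

def split_to_rows(strings, sep = ' ')
  strings.flat_map { |s| s.split(sep) }
end

data = ['a b', 'c d', 'e f']
p split_to_columns(data)
p split_to_rows(data)
```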


### `merge(other, sep: ' ')`

Merges a String or another string Vector into self, joined by a separator.
Self must be a string Vector.
Returns merged string Vector.

In [5]:
# with vector
vector = Vector.new(%w[a c e])
other = Vector.new(%w[b d f])
vector.merge(other)

#<RedAmber::Vector(:string, size=3):0x000000000000f690>
["a b", "c d", "e f"]


If other is a String, it is broadcast to every element.

In [6]:
# with a String
vector = Vector.new(%w[a c e])
vector.merge('x')

#<RedAmber::Vector(:string, size=3):0x000000000000f6a4>
["a x", "c x", "e x"]


You can specify the separator string with `:sep`.

In [7]:
# with vector
vector = Vector.new(%w[a c e])
other = Vector.new(%w[b d f])
vector.merge(other, sep: '')

#<RedAmber::Vector(:string, size=3):0x000000000000f6b8>
["ab", "cd", "ef"]
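
Putting the three cases together, merging is an element-wise join in which a lone String is repeated for every element. The plain-Ruby sketch below models this on ordinary Arrays. `merge_strings` is a hypothetical helper for illustration, not RedAmber's implementation.

```ruby
# Plain-Ruby sketch of merge with broadcasting (hypothetical helper, not RedAmber's code).
def merge_strings(values, other, sep: ' ')
  # A single String is repeated to match the length of values.
  others = other.is_a?(String) ? Array.new(values.size, other) : other
  raise ArgumentError, 'size mismatch' unless values.size == others.size

  values.zip(others).map { |left, right| "#{left}#{sep}#{right}" }
end

p merge_strings(%w[a c e], %w[b d f])
p merge_strings(%w[a c e], 'x')
p merge_strings(%w[a c e], %w[b d f], sep: '')
```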
