# README - RedAmber

This notebook walks through the [README of RedAmber](https://github.com/heronshoes/red_amber/blob/main/README.md).

![screenshot from jupyterlab](https://raw.githubusercontent.com/heronshoes/red_amber/main/doc/image/screenshot.png)

In [1]:
require 'red_amber' # require 'red-amber' is also OK.
include RedAmber
{RedAmber: VERSION, Arrow: Arrow::VERSION}

{:RedAmber=>"0.4.0", :Arrow=>"11.0.0"}

## Data frame in `RedAmber`

A data frame represents a set of data in a 2D shape. It is backed by a Red Arrow `Table` object.

![dataframe model of RedAmber](https://github.com/heronshoes/red_amber/raw/main/doc/image/dataframe_model.png)

### Example: diamonds dataset

If you have not installed `red-datasets-arrow` yet, first run:

```
    gem install red-datasets-arrow
```

Then load the dataset:

In [2]:
require 'datasets-arrow' # to load sample data

dataset = Datasets::Diamonds.new
diamonds = DataFrame.new(dataset) # works from v0.2.2; with older versions, pass `dataset.to_arrow` instead.

carat,cut,color,clarity,depth,table,price,x,y,z
0.23,Ideal,E,SI2,61.5,55,326,3.95,3.98,2.43
0.21,Premium,E,SI1,59.8,61,326,3.89,3.84,2.31
0.23,Good,E,VS1,56.9,65,327,4.05,4.07,2.31
0.29,Premium,I,VS2,62.4,58,334,4.2,4.23,2.63
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
0.7,Very Good,D,SI1,62.8,60,2757,5.66,5.68,3.56
0.86,Premium,H,SI2,61,58,2757,6.15,6.12,3.74
0.75,Ideal,D,SI2,62.2,55,2757,5.83,5.87,3.64


For example, we can compute the mean price per `cut` for diamonds larger than 1 carat.

In [3]:
df = diamonds
  .slice { carat > 1 }
  .group(:cut)
  .mean(:price) # `pick` prior to `group` is not required if `:price` is specified here.
  .sort('-mean(price)')

cut,mean(price)
Ideal,8674.23
Premium,8487.25
Very Good,8340.55
Good,7753.6
Fair,7177.86
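
To make explicit what the pipeline above computes, here is a minimal plain-Ruby sketch of the same filter, group, mean, and sort steps. It does not use RedAmber; the `rows` array and its values are hypothetical sample data, not taken from the diamonds dataset.

```ruby
# Hypothetical sample rows (illustrative values only).
rows = [
  { carat: 1.2, cut: 'Ideal',   price: 9000 },
  { carat: 0.8, cut: 'Ideal',   price: 3000 },
  { carat: 1.5, cut: 'Premium', price: 8000 },
  { carat: 1.1, cut: 'Premium', price: 7000 },
  { carat: 2.0, cut: 'Fair',    price: 6000 }
]

mean_price_by_cut = rows
  .select { |row| row[:carat] > 1 }   # corresponds to slice { carat > 1 }
  .group_by { |row| row[:cut] }       # corresponds to group(:cut)
  .map { |cut, members| [cut, members.sum { |m| m[:price] }.fdiv(members.size)] } # mean(:price)
  .sort_by { |_cut, mean| -mean }     # corresponds to sort('-mean(price)')

mean_price_by_cut.each { |cut, mean| puts "#{cut},#{mean}" }
```

RedAmber expresses the same steps against Arrow columns rather than an array of hashes, so the sketch only mirrors the logic, not the performance characteristics.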


Arrow data is immutable, so these methods always return new objects.
The next example renames a column and creates a new column by a simple calculation.

In [4]:
usdjpy = 110.0

df.rename('mean(price)': :mean_price_USD)
  .assign(:mean_price_JPY) { mean_price_USD * usdjpy }

cut,mean_price_USD,mean_price_JPY
Ideal,8674.23,954165
Premium,8487.25,933597
Very Good,8340.55,917460
Good,7753.6,852896
Fair,7177.86,789564
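
The idea that a transformation returns a new object and leaves its input untouched can be sketched in plain Ruby. This is an illustration only, not RedAmber's implementation: the mean prices are copied from the output above, and the frozen hashes are a stand-in for immutable Arrow data.

```ruby
usdjpy = 110.0

# Mean prices from the grouped result; frozen to mimic immutable data.
original = [
  { cut: 'Ideal',     'mean(price)': 8674.23 },
  { cut: 'Premium',   'mean(price)': 8487.25 },
  { cut: 'Very Good', 'mean(price)': 8340.55 },
  { cut: 'Good',      'mean(price)': 7753.6 },
  { cut: 'Fair',      'mean(price)': 7177.86 }
].map(&:freeze).freeze

# Build new rows with a renamed key and a derived column.
converted = original.map do |row|
  usd = row[:'mean(price)']
  { cut: row[:cut], mean_price_USD: usd, mean_price_JPY: usd * usdjpy }
end

puts original.first.keys.inspect  # still has the old column name
puts converted.first.keys.inspect # new column names
```

Because nothing is modified in place, `original` can still be used after the conversion, just as `df` remains usable after `rename` and `assign`.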


### Example: starwars dataset

The next example loads the `starwars` dataset from a CSV file and applies minimal data cleansing.

In [5]:
uri = URI('https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv')

starwars = DataFrame.load(uri)

starwars
  .drop(0) # delete unnecessary index column
  .remove { species == "NA" } # delete unnecessary rows
  .group(:species) { [count(:species), mean(:height, :mass)] }
  .slice { count > 1 }

species,count,mean(height),mean(mass)
Human,35,176.645,82.7818
Droid,6,131.2,69.75
Wookiee,2,231.0,124.0
Gungan,3,208.667,74.0
Zabrak,2,173.0,80.0
Twi'lek,2,179.0,55.0
Mirialan,2,168.0,53.1
Kaminoan,2,221.0,88.0
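
The cleansing and grouping steps above can be sketched in plain Ruby as follows. The `characters` rows are hypothetical sample data, `mean_of` is a helper introduced only for this illustration, and the sketch assumes that missing values (here `nil`) are skipped when averaging.

```ruby
# Hypothetical sample rows; nil stands in for a missing measurement.
characters = [
  { species: 'Human',   height: 172, mass: 77.0 },
  { species: 'Human',   height: 188, mass: nil },
  { species: 'Droid',   height: 96,  mass: 32.0 },
  { species: 'Droid',   height: 167, mass: 75.0 },
  { species: 'Wookiee', height: 228, mass: 112.0 },
  { species: 'NA',      height: 180, mass: 80.0 }
]

# Hypothetical helper: mean of the non-nil values, or nil if there are none.
def mean_of(values)
  present = values.compact
  present.empty? ? nil : present.sum.fdiv(present.size)
end

summary = characters
  .reject { |c| c[:species] == 'NA' }  # corresponds to remove { species == "NA" }
  .group_by { |c| c[:species] }        # corresponds to group(:species)
  .map do |species, members|
    { species: species,
      count: members.size,
      mean_height: mean_of(members.map { |m| m[:height] }),
      mean_mass: mean_of(members.map { |m| m[:mass] }) }
  end
  .select { |row| row[:count] > 1 }    # corresponds to slice { count > 1 }

summary.each { |row| puts row.values.join(',') }
```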


See [DataFrame](DataFrame.ipynb) for other examples and details.

- [raw Notebook file](https://raw.githubusercontent.com/heronshoes/docker-stacks/RedAmber-binder/binder/DataFrame.ipynb)
- [DataFrame.ipynb on binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=DataFrame.ipynb)

### `Vector`: 1D data object in a column

Class `RedAmber::Vector` represents a series of data in the DataFrame.

See [Vector](Vector.ipynb) for details.

- [raw Notebook file](https://raw.githubusercontent.com/heronshoes/docker-stacks/RedAmber-binder/binder/Vector.ipynb)
- [Vector.ipynb on binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=Vector.ipynb)

## Another Jupyter notebook

[Examples of Red Amber](examples_of_red_amber.ipynb) shows more examples in a Jupyter notebook.

- [raw Notebook file](https://raw.githubusercontent.com/heronshoes/docker-stacks/RedAmber-binder/binder/examples_of_red_amber.ipynb)
- [Examples of Red Amber on binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb)