# How to open this notebook

```shell
git clone https://github.com/sciruby-jp/ruby-datascience-examples.git
cd ruby-datascience-examples
docker run -p 8888:8888 sciruby/ruby-datascience-examples start-notebook.sh --NotebookApp.token=''
```

Go to `http://localhost:8888`

# Which tastes better, red wine or white wine?

https://archive.ics.uci.edu/ml/datasets/wine
Using the dataset linked above, we check whether the quality ratings of red and white wines differ.

In [None]:
require 'daru'
require 'rbplotly'
require 'daru_plotly'
require 'statsample'

## Load the CSV

In [None]:
wine = Daru::DataFrame.from_csv('./winequality-both.csv')
wine.head 10 # show the first 10 rows

In [None]:
wine['type'].uniq.to_a

In [None]:
wine['quality'].uniq.to_a.sort

In [None]:
include Daru::Plotly::Initializer # provides the plot and generate_data helpers

In [None]:
plot(wine['quality'], type: :histogram, x: :quality).show

## Split the data into red and white wines

In [None]:
reds = wine.where(wine['type'].eq('red'))
reds.head 10

In [None]:
whites = wine.where(wine['type'].eq('white'))
whites.head 10
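
The `where` calls above keep only the rows whose `type` column matches a given value. As a rough illustration of that filtering step in plain Ruby, without Daru, the sketch below works on an array of hashes. The sample rows and the `rows_of_type` helper are hypothetical and are not part of the notebook or of Daru.

```ruby
# Hypothetical sample rows; the notebook reads the real ones from winequality-both.csv.
rows = [
  { 'type' => 'red',   'quality' => 5 },
  { 'type' => 'white', 'quality' => 6 },
  { 'type' => 'red',   'quality' => 6 },
  { 'type' => 'white', 'quality' => 7 }
]

# Keep only the rows whose 'type' equals the requested value,
# similar in spirit to filtering a data frame with a boolean mask.
def rows_of_type(rows, type)
  rows.select { |row| row['type'] == type }
end

sample_reds = rows_of_type(rows, 'red')
sample_whites = rows_of_type(rows, 'white')
puts "reds: #{sample_reds.size}, whites: #{sample_whites.size}"
```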

## Plot histograms of quality

In [None]:
red_qualities = generate_data(reds['quality'], type: :histogram)
white_qualities = generate_data(whites['quality'], type: :histogram)
Plotly::Plot.new(data: red_qualities + white_qualities).show

In [None]:
plot(whites['quality'], type: :histogram).show
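
Because `quality` takes only a handful of integer values, the histograms above amount to a count per score. A minimal plain-Ruby sketch of that counting step follows. The scores and the `frequency_table` helper are hypothetical stand-ins, not part of the notebook or of daru_plotly.

```ruby
# Hypothetical quality scores, standing in for something like reds['quality'].to_a.
scores = [5, 6, 5, 7, 6, 5, 8]

# Count how many times each score occurs and return the counts ordered by score.
def frequency_table(values)
  counts = Hash.new(0)
  values.each { |v| counts[v] += 1 }
  counts.sort.to_h
end

# Print a crude text histogram, one row per score.
frequency_table(scores).each do |score, count|
  puts "#{score}: #{'*' * count}"
end
```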

## t-test

We run a hypothesis test to check whether the mean `quality` differs between red and white wines.

In [None]:
reds['quality'].mean

In [None]:
whites['quality'].mean

In [None]:
Statsample::Analysis.store(Statsample::Test::T) do
  t = Statsample::Test.t_two_samples_independent(reds['quality'], whites['quality'])
  summary t
end
Statsample::Analysis.run_batch
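
To make the quantity behind a two-sample t-test concrete, here is a plain-Ruby sketch of Welch's (unequal-variance) t statistic and its approximate degrees of freedom. It is an illustration only: the helper names and the sample arrays are hypothetical, and the variant shown may not match every line of the Statsample summary.

```ruby
def mean(xs)
  xs.sum.to_f / xs.size
end

# Unbiased sample variance (divides by n - 1).
def sample_variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1)
end

# Welch's t statistic and the Welch-Satterthwaite degrees of freedom.
def welch_t(a, b)
  va = sample_variance(a) / a.size # squared standard error of mean(a)
  vb = sample_variance(b) / b.size # squared standard error of mean(b)
  t = (mean(a) - mean(b)) / Math.sqrt(va + vb)
  df = (va + vb)**2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
  { t: t, df: df }
end

# Hypothetical samples, standing in for the two quality vectors.
sample_a = [5, 6, 5, 7, 6]
sample_b = [6, 7, 6, 8, 7]
result = welch_t(sample_a, sample_b)
puts format('t = %.4f, df = %.2f', result[:t], result[:df])
```

A large absolute value of `t` relative to the degrees of freedom indicates that the observed difference in means is unlikely under the null hypothesis of equal means.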

## Is there a difference in alcohol content?

In [None]:
plot(reds['alcohol'], type: :histogram).show
plot(whites['alcohol'], type: :histogram).show

In [None]:
reds['alcohol'].mean

In [None]:
whites['alcohol'].mean

In [None]:
Statsample::Analysis.store(Statsample::Test::T) do
  t = Statsample::Test.t_two_samples_independent(reds['alcohol'], whites['alcohol'])
  summary t
end
Statsample::Analysis.run_batch

Here too the p-value is below `0.01`, so the difference in means is statistically significant.
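
For samples as large as the ones here, the t distribution is close to the standard normal, so a two-sided p-value can be approximated from the t statistic with `Math.erfc`. The sketch below assumes that normal approximation. The `two_sided_p_normal` helper and the example t value are hypothetical, not values taken from the notebook output.

```ruby
# Two-sided p-value under the standard normal distribution:
# P(|Z| > |t|) = erfc(|t| / sqrt(2))
def two_sided_p_normal(t)
  Math.erfc(t.abs / Math.sqrt(2))
end

# Hypothetical t statistic, used only to show the decision rule.
example_t = -5.0
p_value = two_sided_p_normal(example_t)
significant = p_value < 0.01
puts "p = #{p_value}, significant at the 1% level: #{significant}"
```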

In [None]:
reds['pH'].mean

In [None]:
whites['pH'].mean

In [None]:
Statsample::Analysis.store(Statsample::Test::T) do
  t = Statsample::Test.t_two_samples_independent(reds['pH'], whites['pH'])
  summary t
end
Statsample::Analysis.run_batch

In [None]:
plot(wine.corr, type: :heatmap, layout: { width: 500, height: 500 }).show

In [None]:
plot(reds.corr, type: :heatmap, layout: { width: 500, height: 500 }).show

In [None]:
plot(whites.corr, type: :heatmap, layout: { width: 500, height: 500 }).show
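
Each cell of the heatmaps above is a correlation coefficient between two columns. A plain-Ruby sketch of the Pearson coefficient for one pair of columns follows. The `pearson` helper and the sample columns are hypothetical and only illustrate the formula. They are not Daru's implementation.

```ruby
# Pearson correlation coefficient between two equally sized numeric arrays.
def pearson(xs, ys)
  raise ArgumentError, 'length mismatch' unless xs.size == ys.size

  mx = xs.sum.to_f / xs.size
  my = ys.sum.to_f / ys.size
  # Sum of products of deviations (proportional to the covariance).
  cov = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  # Square roots of the sums of squared deviations.
  sx = Math.sqrt(xs.sum { |x| (x - mx)**2 })
  sy = Math.sqrt(ys.sum { |y| (y - my)**2 })
  cov / (sx * sy)
end

# Hypothetical columns, standing in for two numeric columns of the wine data.
column_a = [9.4, 9.8, 10.5, 11.2, 12.0]
column_b = [5, 5, 6, 6, 7]
puts format('r = %.3f', pearson(column_a, column_b))
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little linear relationship.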