# Online vs. Offline

This notebook computes each statistic twice: offline, with the Base or StatsBase function applied to the full dataset, and online, with the corresponding OnlineStats type.

- [Count](#Count)
- [Count of Unique Values](#Count-of-Unique-Values)
- [Correlation Matrix](#Correlation-Matrix)
- [Covariance Matrix](#Covariance-Matrix)
- [Extrema](#Extrema)
- [Histogram](#Histogram)
- [Mean](#Mean)
- [Quantiles](#Quantiles)
- [Standard Deviation](#Standard-Deviation)
- [Sum](#Sum)
- [Variance](#Variance)

# Setup

In [26]:
using OnlineStats, Plots
import StatsBase  # countmap and Histogram, used for the offline comparisons

y = randn(1000)
x = randn(1000, 2)
z = rand(1:4, 1000);

---

# Count

In [24]:
@show length(y)

Series(y, Count())

length(y) = 1000


▦ Series{0}  |  EqualWeight  |  nobs = 1000
└── Count(1000)

# Count of Unique Values

In [23]:
@show StatsBase.countmap(z)

Series(z, CountMap(Int))

StatsBase.countmap(z) = Dict(4=>247,2=>240,3=>257,1=>256)


▦ Series{0}  |  EqualWeight  |  nobs = 1000
└── CountMap{Int64}(Dict(4=>247,2=>240,3=>257,1=>256))

# Correlation Matrix

In [18]:
@show cor(x)

o = CovMatrix(2)
Series(x, o)
cor(o)

cor(x) = [1.0 -0.0119822; -0.0119822 1.0]


2×2 Array{Float64,2}:
  1.0        -0.0119822
 -0.0119822   1.0      

# Covariance Matrix

In [17]:
@show cov(x)

Series(x, CovMatrix(2))

cov(x) = [0.95111 -0.0119085; -0.0119085 1.03851]


▦ Series{1}  |  EqualWeight  |  nobs = 1000
└── CovMatrix([0.95111 -0.0119085; -0.0119085 1.03851])

# Extrema

In [25]:
@show extrema(y)

Series(y, Extrema())

extrema(y) = (-2.987079696391375, 3.2347949479529734)


▦ Series{0}  |  EqualWeight  |  nobs = 1000
└── Extrema((-2.98708, 3.23479))

# Histogram

In [32]:
h = fit(StatsBase.Histogram, y; closed=:left)

o = Hist(25)
o2 = Hist(-5:5)
Series(y, o, o2)

plot(plot(h), plot(o), plot(o2), label=[:StatsBase :Hist1 :Hist2])

# Mean

Note: the output in this section comes from a run in which `y` held 100000 observations (see `nobs`), not the 1000 generated in Setup.

In [8]:
@show mean(y)

Series(y, Mean())

mean(y) = -0.0061267272900730665


▦ Series{0}  |  EqualWeight  |  nobs = 100000
└── Mean(-0.00612673)
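
The comparison above can be reproduced by hand. The following is a minimal sketch of an equal-weight running mean that sees one observation at a time. `running_mean` is a hypothetical helper written for illustration; it is not part of OnlineStats and is not claimed to match how `Mean` is implemented internally.

```julia
# Hypothetical helper: equal-weight running mean, updated one observation at a time.
function running_mean(xs)
    m = 0.0   # current estimate of the mean
    n = 0     # number of observations seen so far
    for x in xs
        n += 1
        m += (x - m) / n   # move the estimate toward x with weight 1/n
    end
    return m
end

running_mean([1.0, 2.0, 3.0, 4.0])  # 2.5
```

Only `m` and `n` are kept between updates, so the data never need to be held in memory all at once.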

# Quantiles

In [37]:
q = [.25, .5, .75]

@show quantile(y, q)

Series(y, PQuantile.(q)...)

quantile(y, q) = [-0.65369, 0.0423661, 0.70692]


▦ Series{0}  |  LearningRate(r = 0.6)  |  nobs = 1000
├── PQuantile(0.25, -0.6642007526443003)
├── PQuantile(0.5, 0.036925035319063874)
└── PQuantile(0.75, 0.7134127994026149)

In [38]:
Series(y, Quantile(q, OMAS()), Quantile(q, SGD()), Quantile(q, MSPI()))

▦ Series{0}  |  LearningRate(r = 0.6)  |  nobs = 1000
├── Quantile{OMAS}([-0.645057, 0.0362048, 0.646843])
├── Quantile{SGD}([-0.64843, -0.00949251, 0.662958])
└── Quantile{MSPI}([-0.644607, -0.00525413, 0.647619])

# Standard Deviation

In [20]:
@show std(y)

o = Variance()
Series(y, o)
std(o)

std(y) = 0.9966697007886478


0.9966697007886479

# Sum

Note: the output in this section comes from a run in which `y` held 100000 observations (see `nobs`), not the 1000 generated in Setup.

In [9]:
@show sum(y)

Series(y, Sum())

sum(y) = -612.6727290073067


▦ Series{0}  |  EqualWeight  |  nobs = 100000
└── Sum{Float64}(-612.673)

# Variance

In [19]:
@show var(y)

Series(y, Variance())

var(y) = 0.9933504924701329


▦ Series{0}  |  EqualWeight  |  nobs = 1000
└── Variance(0.99335)
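
A single-pass variance can also be written without the library. The sketch below uses Welford's algorithm, a standard technique for this, and returns the unbiased sample variance so that it is comparable with `var(y)`. `running_var` is a hypothetical helper for illustration; whether the `Variance` type uses this exact update is an assumption not confirmed here.

```julia
# Hypothetical helper: Welford's single-pass algorithm for the unbiased sample variance.
function running_var(xs)
    n = 0
    m = 0.0   # running mean
    s = 0.0   # running sum of squared deviations from the mean
    for x in xs
        n += 1
        d = x - m          # deviation from the old mean
        m += d / n         # update the mean
        s += d * (x - m)   # old deviation times new deviation
    end
    return s / (n - 1)     # NaN for a single observation
end

running_var([1.0, 2.0, 3.0, 4.0])  # ≈ 1.6667
```

Taking the square root of this result gives the running standard deviation.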