In [15]:
# CSV must be installed first if it is missing: import Pkg; Pkg.add("CSV")
using Distributions, DataFrames, Random, CSV


## Normal Distribution

In [2]:
Random.seed!(1234)

MersenneTwister(UInt32[0x000004d2], …)

In [3]:
n = Distributions.Normal()
params(n)

(0.0, 1.0)

Using the params() function, we see a mean of $0$ and a standard deviation of $1$. A normal distribution with these parameters is called the standard normal distribution.

The fieldnames() function returns the names of the fields of a distribution type, which are the names of its parameters. For the normal distribution these are the mean and the standard deviation, namely $\mu$ and $\sigma$.

In [4]:
# Returning the parameter names of the normal distribution
fieldnames(Normal)

(:μ, :σ)

In [5]:
var1 = rand(n, 10)

10-element Array{Float64,1}:
  0.8673472019512456
 -0.9017438158568171
 -0.4944787535042339
 -0.9029142938652416
  0.8644013132535154
  2.2118774995743475
  0.5328132821695382
 -0.27173539603462066
  0.5023344963886675
 -0.5169836206932686

In [6]:
mean(var1), std(var1)

(0.18909179133831322, 0.9879593623730926)

In [7]:
# Probability density function value at x = 0.3
pdf(Normal(), 0.3)

0.38138781546052414
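
As a check on this value, the standard normal density can be evaluated by hand. The expression below is the textbook formula for $\mu = 0$ and $\sigma = 1$, written out as an illustration rather than taken from the package internals.

$$
\varphi(0.3) = \frac{1}{\sqrt{2\pi}}\, e^{-0.3^2/2} = \frac{e^{-0.045}}{\sqrt{2\pi}} \approx 0.39894 \times 0.95600 \approx 0.3814
$$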

In [8]:
# Cumulative distribution function value at x = 0.25
cdf(Normal(), 0.25)

0.5987063256829237

In [9]:
var2 = rand(Normal(100, 10), 100);

In [10]:
# Using fit() to calculate the parameters of a distribution
fit(Normal, var2)

Normal{Float64}(μ=98.50583989904842, σ=9.591211638396837)
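
The fitted values are close to, but not exactly, the parameters $100$ and $10$ used to generate the sample. Assuming fit() uses maximum likelihood estimation for the normal distribution (its default behaviour in Distributions, as far as we know), the estimates for a sample $x_1, \dots, x_n$ are:

$$
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \hat{\mu}\right)^2}
$$

Under that assumption, $\hat{\sigma}$ divides by $n$, whereas std() divides by $n - 1$, so the fitted $\sigma$ is smaller than std(var2) by a factor of $\sqrt{(n-1)/n}$.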

In [11]:
# Quantiles
quantile(Normal(), 0.025)

-1.9599639845400592

In [12]:
quantile(Normal(), 0.975)

1.9599639845400576
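
These two quantiles are the familiar bounds of the central 95% of the standard normal distribution. Writing $\Phi$ for the standard normal cumulative distribution function and $Z$ for a standard normal variable (notation introduced here for illustration):

$$
\Phi(-1.96) \approx 0.025, \qquad \Phi(1.96) \approx 0.975, \qquad
P(-1.96 \le Z \le 1.96) = \Phi(1.96) - \Phi(-1.96) \approx 0.95
$$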

### Other Distribution types

In [13]:
# Beta distribution
b = Beta(1, 1)
params(b)
var3 = rand(b, 100);
fit(Beta, var3)

Beta{Float64}(α=1.0960317409697764, β=1.0578819792921308)

In [14]:
# χ² distribution
c = Chisq(1)
var4 = rand(c, 100)
fieldnames(Chisq) # Degrees of freedom

(:ν,)

# DataFrames

In [None]:
# Create an empty DataFrame
df = DataFrame();

In [None]:
# Add a column of data values (one row per value)
# df[:Var2] = ... is no longer supported in current DataFrames; use df[!, :Var2]
df[!, :Var2] = var2;

In [None]:
# View the first five rows
first(df, 5)

In [None]:
# Add another column
df[!, :Var3] = var3;

In [None]:
# View last three rows
last(df, 3)

In [None]:
# Dimensions of a DataFrame
size(df)

In [None]:
# Summarize the content
describe(df)

In [None]:
# Element type of each column (eltypes() is deprecated in current DataFrames)
eltype.(eachcol(df))

In [None]:
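# Build a second DataFrame column by column
df2 = DataFrame()
df2[!, :A] = 1:10
df2[!, :B] = ["I", "II", "II", "I", "II", "I", "II", "II", "I", "II"]
Random.seed!(1234)  # Reseed so the random columns are reproducible
df2[!, :C] = rand(Normal(), 10)
df2[!, :D] = rand(Chisq(1), 10);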

In [None]:
# First three rows with all the columns
df2[1:3, :]

In [None]:
# All rows, columns 1 and 3
df2[:, [1, 3]]

In [None]:
# The same selection, using column names instead of positions
df2[:, [:A, :C]]