# MultivariateDiscretization.jl v0.1.0

### Installing necessary packages

In [None]:
# Installation
using Pkg
Pkg.add("MultivariateDiscretization");

In [11]:
# Loading packages
using MultivariateDiscretization
using Pkg
using DataFrames
using Distributions
using StatsBase
using Statistics
using LinearAlgebra
using MultivariateStats

### Creating sample data

In [9]:
# number of dimensions
dims=10
# number of points
points=100

100

In [28]:
# generate test dataset
A = rand(Float64, (dims,dims))
Σ = A*A'
μ = rand(dims)
d1 = MvNormal(μ, 1/4*Σ)
Testdata = rand(d1,points);
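
For readers who want to see what the cell above does without `Distributions`, the sketch below draws the same kind of correlated Gaussian sample by hand using a Cholesky factor. It is only an illustration under stated assumptions: the fixed seed and the small diagonal jitter are additions of this sketch, not part of the original cell. It also shows the layout the later sections rely on: each row is a dimension and each column is one data point.

```julia
using LinearAlgebra
using Random

Random.seed!(1)                       # assumption: fixed seed, only for repeatability

dims, points = 10, 100
A = rand(dims, dims)
Σ = A * A'                            # symmetric positive semi-definite by construction
μ = rand(dims)

# Assumption: a tiny diagonal jitter guards the factorization against round-off.
C = Symmetric(Σ / 4 + 1e-8 * I)
L = cholesky(C).L                     # lower-triangular factor with L * L' ≈ C

# Affine transform of standard normal draws: columns are samples, rows are dimensions.
X = μ .+ L * randn(dims, points)
size(X)
```

The resulting `X` has size `(dims, points)`, the same shape as `Testdata`.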

## Discretization methods

The input for all methods has to be an ```Array{Float64,N}```. Discretization is performed __*row-wise*__.<br/>
The following algorithms are currently implemented (v0.1.0):
* Greedy Interaction Preserving Discretization
* Clustered Greedy Interaction Preserving Discretization
* Correlation Preserving Discretization
* Clustered Correlation Preserving Discretization
* Independent Bayesian Blocks
* Independent Uniform Width with adapted number of bins

The results are returned as a ```DataFrame```.

### (Clustered) Greedy Interaction Preserving Discretization

```julia
greedy_IPD(M::Array{Float64},ndim::Int64,T::Int64,disc=:km,skip=[],limit=10)
```
* ```M```: input data
* ```ndim```: number of dimensions
* ```T```: number of microbins
* ```disc```: type of discretizer for the microbins; select ```:km``` for k-means or ```:uw``` for uniform width

```julia
greedy_IPD_clustered(M::Array{Float64},ndim::Int64,T::Int64,c::Int64,disc=:km)
```
* ```M```: input data
* ```ndim```: number of dimensions
* ```T```: number of microbins
* ```c```: number of clusters
* ```disc```: type of discretizer for the microbins; select ```:km``` for k-means or ```:uw``` for uniform width
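
To give a feel for what a k-means microbin discretizer (```:km```) might do to a single row, here is a minimal one-dimensional Lloyd's-algorithm sketch. The function name `kmeans_1d`, its initialization, and its stopping rule are assumptions made for illustration; the package's own k-means discretizer may differ in all of these.

```julia
# Hypothetical helper: 1-D k-means (Lloyd's algorithm) assigning each value to one of k microbins.
function kmeans_1d(x::AbstractVector{<:Real}, k::Int; iters::Int=50)
    xs = sort(x)
    n = length(xs)
    # Assumption: start the centers at evenly spaced positions of the sorted values.
    centers = [float(xs[clamp(round(Int, (i - 0.5) * n / k), 1, n)]) for i in 1:k]
    labels = ones(Int, length(x))
    for _ in 1:iters
        # Assignment step: each value goes to its nearest center.
        newlabels = [argmin(abs.(v .- centers)) for v in x]
        # Update step: each center moves to the mean of its members (empty bins stay put).
        for j in 1:k
            members = x[newlabels .== j]
            isempty(members) || (centers[j] = sum(members) / length(members))
        end
        newlabels == labels && break   # stop once assignments no longer change
        labels = newlabels
    end
    return labels, centers
end

labels, centers = kmeans_1d([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], 2)
```

On this toy row the two well-separated groups end up in separate bins. Unlike uniform-width binning, the bin boundaries follow the data density rather than the value range.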

In [27]:
ipd = greedy_IPD(Testdata,dims,35)
ipd_clustered = greedy_IPD_clustered(Testdata,dims,35,3);

### (Clustered) Correlation Preserving Discretization

```julia
CPD(M::Array{Float64}, ndim::Int64, npoints::Int64, k::Int64, d=:km, bintype=:manual, bins=Int(round(ndim)),processing=:none)
```
* ```M```: input data
* ```ndim```: number of dimensions
* ```npoints```: number of data points
* ```k```: number of principal components to be retained
* ```d```: type of discretizer for the microbins; select ```:km``` for k-means or ```:uw``` for uniform width
* ```bintype```: type of binning, chosen from ```Discretizers.jl```
* ```bins```: number of bins per principal component
* ```processing```: select ```:none``` for standard processing or ```:cutpoints``` to discretize on cut points
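
The parameter ```k``` refers to retaining the leading principal components. As a rough illustration of that single step (not of CPD as a whole), the sketch below projects row-wise data onto its top `k` principal components using only the standard library. The name `pca_project` is hypothetical, and the package itself may rely on `MultivariateStats` or a different procedure.

```julia
using LinearAlgebra
using Statistics

# Hypothetical helper: X is ndim × npoints (rows are dimensions), as in this notebook.
function pca_project(X::AbstractMatrix{<:Real}, k::Int)
    Xc = X .- mean(X, dims=2)                 # center every dimension
    C = Symmetric(cov(Xc, dims=2))            # ndim × ndim covariance matrix
    E = eigen(C)                              # eigenvalues come back in ascending order
    order = sortperm(E.values, rev=true)[1:k] # indices of the k largest eigenvalues
    W = E.vectors[:, order]                   # ndim × k loading matrix
    return W' * Xc, E.values[order]           # k × npoints scores and their variances
end

scores, variances = pca_project(randn(5, 200), 2)
size(scores)
```

Each row of `scores` is one retained component; its sample variance equals the corresponding eigenvalue in `variances`.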

```julia
CPD_clustered(M::Array{Float64}, ndim::Int64, npoints::Int64, c::Int64, p::Float64, d=:km)
```
* ```M```: input data
* ```ndim```: number of dimensions
* ```npoints```: number of data points
* ```c```: number of clusters
* ```p```: percentage of principal components to be retained
* ```d```: type of discretizer for the microbins; select ```:km``` for k-means or ```:uw``` for uniform width

In [31]:
cpd = CPD(Testdata,dims,points,3,:km,:manual,10)
cpd_clustered = CPD_clustered(Testdata,dims,points,3,float(3));

### Independent Bayesian Blocks

```julia
BayesianBlocks(M::Array{Float64},ndim::Int64)
```
* ```M```: Input data
* ```ndim```: number of dimensions

In [29]:
bb = BayesianBlocks(Testdata,dims);

### Independent Uniform Width

```julia
UW(M::Array{Float64},ndim::Int64,nbins::Int64)
```
* ```M```: Input data
* ```ndim```: number of dimensions
* ```nbins```: number of bins
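
Uniform-width binning applied independently to every row can be sketched in a few lines. The helper names below are hypothetical and the handling of edge cases (constant rows, the maximum value) is an assumption of this sketch; the package's ```UW``` may treat them differently and returns a ```DataFrame``` rather than a matrix.

```julia
# Hypothetical helper: map each value of one row to a bin index in 1:nbins.
function uniform_width_bins(x::AbstractVector{<:Real}, nbins::Int)
    lo, hi = extrema(x)
    hi == lo && return ones(Int, length(x))      # constant row: everything in bin 1
    width = (hi - lo) / nbins
    # The maximum would land in bin nbins + 1, so clamp it into the last bin.
    return [min(floor(Int, (v - lo) / width) + 1, nbins) for v in x]
end

# Apply the binning independently to every row (dimension) of M.
discretize_rows(M::AbstractMatrix{<:Real}, nbins::Int) =
    mapslices(r -> uniform_width_bins(r, nbins), M; dims=2)

B = discretize_rows([0.0 1.0 2.0 3.0; 10.0 10.0 10.0 10.0], 3)
```

Because every row gets its own range, dimensions with very different scales still use all `nbins` bins.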

In [34]:
uw = MultivariateDiscretization.UW(Testdata,dims,3);

## Calculating the Cumulative Jensen-Shannon distance

```julia
CJS_empirical(F::Matrix,ndim::Int64,nsamples::Int64,nsteps::Int64)
```
* ```F```: input data
* ```ndim```: number of dimensions
* ```nsamples```: number of datapoints
* ```nsteps```: number of steps for numeric integration
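
To make the roles of the empirical distribution and of ```nsteps``` more concrete, here is a univariate sketch that compares two samples through their empirical CDFs $P$ and $Q$ using the form $\int P \log_2 \frac{P}{(P+Q)/2}\,dx + \frac{1}{2\ln 2}\int (Q - P)\,dx$, integrated on a grid of `nsteps` points. Both this formula and the two-sample interface are assumptions of the sketch; ```CJS_empirical``` takes a single matrix and may compute the quantity differently.

```julia
# Empirical CDF of `sample` evaluated at t.
ecdf_at(sample, t) = count(<=(t), sample) / length(sample)

# Hypothetical helper: univariate cumulative Jensen-Shannon sketch between samples x and y.
function cjs_sketch(x::AbstractVector{<:Real}, y::AbstractVector{<:Real}, nsteps::Int)
    lo = float(min(minimum(x), minimum(y)))
    hi = float(max(maximum(x), maximum(y)))
    hi == lo && return 0.0
    grid = range(lo, hi; length=nsteps)
    h = step(grid)
    total = 0.0
    for t in grid
        P = ecdf_at(x, t)
        Q = ecdf_at(y, t)
        M = (P + Q) / 2
        term = P > 0 ? P * log2(P / M) : 0.0     # convention: 0 * log(0) = 0
        total += (term + (Q - P) / (2 * log(2))) * h
    end
    return total
end

x = collect(0.0:0.01:1.0)
same = cjs_sketch(x, x, 200)
shifted = cjs_sketch(x, x .+ 0.5, 200)
```

Identical samples give zero, and the value grows as the two empirical CDFs move apart. A larger `nsteps` refines the numeric integration at the cost of more CDF evaluations.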

In [37]:
cjs_ipd = MultivariateDiscretization.CJS_empirical(Matrix(float.(ipd)),dims,points,200)
cjs_cpd = MultivariateDiscretization.CJS_empirical(Matrix(float.(cpd)),dims,points,200)
cjs_bb = MultivariateDiscretization.CJS_empirical(Matrix(float.(bb)),dims,points,200)
cjs_uw = MultivariateDiscretization.CJS_empirical(Matrix(float.(uw)),dims,points,200)
cjs_or = MultivariateDiscretization.CJS_empirical(Testdata,dims,points,200);