# Anamorphosis

<!-- SUMMARY: An example for Gaussian Anamorphosis -->

<!-- CATEGORY: Basic_Objects -->

## Import packages

In [None]:
import numpy as np
import pandas as pd
import sys
import os
import matplotlib.pyplot as plt
import gstlearn as gl
import gstlearn.plot as gp
import gstlearn.document as gdoc

gdoc.setNoScroll()

## Reading data

The data are stored in CSV format in the file Pollution.dat. We concentrate on the variable named **Pb**.

In [None]:
filepath = gdoc.loadData("Pollution", "Pollution.dat")
mydb = gl.Db.createFromCSV(filepath,gl.CSVformat())
mydb.setLocators(["X","Y"],gl.ELoc.X)
mydb.setLocator("Pb",gl.ELoc.Z)

dbfmt = gl.DbStringFormat.createFromFlags(flag_vars=True, flag_extend=True, flag_stats=True,
                                         names=["*Pb"]) 
mydb.display(dbfmt)

We note that one sample has no value defined: therefore only 101 values are available. Moreover, the following histogram shows the presence of two outliers (values above 24).

In [None]:
ax = gp.histogram(mydb, name="Pb", bins=50)
ax.decoration(title="Pb (initial)")

We decide to mask these two outliers off. This is an opportunity to create a selection applied to the Data Base.

In [None]:
tab = mydb.getColumn("Pb")
iuid = mydb.addSelection(tab<24)
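
The selection is simply a boolean test applied to the column. As a minimal NumPy sketch on made-up values (not the Pollution data), and assuming the undefined sample is returned as NaN, the comparison excludes both the outliers and the undefined sample, since any comparison with NaN is false:

```python
import numpy as np

# Hypothetical values standing in for the Pb column (not the real data)
values = np.array([3.0, 5.5, np.nan, 7.2, 25.0, 12.7, 33.2])

# Comparisons with NaN are False, so the undefined sample is masked off too
with np.errstate(invalid="ignore"):
    keep = values < 24

active = values[keep]
print(keep.astype(int))             # selection vector: 1 = active, 0 = masked
print(active.min(), active.max())   # range of the active values
```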

In [None]:
ax = gp.histogram(mydb, name="Pb", bins=50)
ax.decoration(title="Pb after filtering the two outliers")

The updated statistics show that the active values of the variable Pb now vary between 3 and 12.7. Note that the variance of the Pb variable is now equal to 2.881 (instead of 12.9 prior to masking the outliers off).

In [None]:
mydb.display(dbfmt)

In [None]:
ax = mydb.plot(nameColor="Pb",size=50)
ax.decoration(title="Data Set (Outliers have been masked off)")
plt.show()

## Variograms

We first define the geometry of the variogram calculations: 10 lags with a lag distance of 1.

In [None]:
myVarioParamOmni = gl.VarioParam()
mydir = gl.DirParam.create(npas=10, dpas=1.)
myVarioParamOmni.addDir(mydir)

We calculate the experimental omni-directional variogram.

In [None]:
myvario = gl.Vario(myVarioParamOmni)
err = myvario.compute(mydb,gl.ECalcVario.VARIOGRAM)
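
To make the calculation concrete, here is a minimal NumPy sketch of a classical omni-directional experimental variogram on synthetic data. It is not the gstlearn implementation: the function `omni_variogram` and its parameters `nlag` and `dlag` are hypothetical names, and the sketch assumes that each lag collects the pairs whose distance lies within half a lag of its nominal value.

```python
import numpy as np

def omni_variogram(coords, values, nlag=10, dlag=1.0):
    """Classical omni-directional experimental variogram (illustrative)."""
    # Distances and squared increments for all distinct pairs of samples
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2

    # Lag k collects the pairs whose distance is within dlag/2 of k * dlag
    lag = np.rint(dist / dlag).astype(int)
    gamma = np.full(nlag, np.nan)
    npairs = np.zeros(nlag, dtype=int)
    for k in range(nlag):
        sel = lag == k
        npairs[k] = sel.sum()
        if npairs[k] > 0:
            gamma[k] = 0.5 * sqdiff[sel].mean()
    return gamma, npairs

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(100, 2))   # synthetic locations
values = rng.normal(size=100)                # synthetic, spatially uncorrelated values
gamma, npairs = omni_variogram(coords, values)
print(np.column_stack([npairs, gamma]))
```

Each row of the printed table gives the number of pairs and the average half squared increment for one lag, which are the quantities represented in the variogram plot.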

The variogram is represented graphically. 

In [None]:
ax = myvario.plot()
ax.decoration(title="Omni-directional Variogram for Pb")

## Model

We fit a Model by calling the Automatic Fitting procedure, providing the list of covariance functions to be tested.

In [None]:
mymodel = gl.Model.createFromDb(mydb)
err = mymodel.fit(myvario,[gl.ECov.EXPONENTIAL,gl.ECov.SPHERICAL])

We visualize the resulting model, overlaid on the experimental variogram.

In [None]:
ax = gp.varmod(myvario,mymodel)
ax.decoration(title="Model for Pb")

In [None]:
mymodel.setDriftIRF()
mymodel.display()

## Empirical Anamorphosis

We first perform the simplest Anamorphosis transform, which turns the histogram of the raw variable into the histogram of a Gaussian variable. This is done by **comparing** the cumulative distribution functions.

In [None]:
myanamE = gl.AnamEmpirical()
myanamE.fitFromLocator(mydb)
myanamE.display()
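
The idea of comparing cumulative distribution functions can be sketched as a normal score transform: each raw value is mapped to the Gaussian quantile that has the same cumulative frequency. The code below is an illustration on made-up values, not the algorithm used by `AnamEmpirical`; the function `normal_scores` and the mid-point frequency convention are assumptions.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(z):
    """Map each value to the Gaussian quantile of its empirical frequency."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Rank of each sample (0-based); ties are broken by order of appearance
    ranks = np.argsort(np.argsort(z, kind="stable"), kind="stable")
    # Mid-point cumulative frequencies stay strictly inside (0, 1)
    freq = (ranks + 0.5) / n
    inv = NormalDist().inv_cdf
    return np.array([inv(f) for f in freq])

z = np.array([3.0, 12.7, 4.1, 6.5, 5.2])   # hypothetical raw values
y = normal_scores(z)
print(np.column_stack([z, y]))
```

The transform preserves the ordering of the data: the median raw value is sent to 0 and the extremes to symmetric Gaussian quantiles.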

## Gaussian Anamorphosis

We transform the Data into Gaussian. This requires the definition of a transform function called **Gaussian Anamorphosis**. This function is expanded on a basis of Hermite polynomials: here 30 polynomials are used.

In [None]:
myanam = gl.AnamHermite(30)
myanam.fitFromLocator(mydb)
myanam.display()
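
To illustrate what the coefficients of such an expansion represent, the following sketch computes them for a known transform (a lognormal one with mean 1) rather than for the Pb data. It is not the fitting procedure used by `AnamHermite`: the function `hermite_coefficients` is hypothetical, the coefficients are obtained by Gauss-Hermite quadrature, and the polynomials are NumPy's probabilists' Hermite polynomials normalized by the square root of n! (sign conventions for odd orders differ between texts).

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He

def hermite_coefficients(phi, npoly, nquad=80):
    """Coefficients psi_n = E[phi(Y) H_n(Y)] on normalized Hermite polynomials."""
    x, w = He.hermegauss(nquad)        # nodes and weights for the weight exp(-x^2/2)
    w = w / np.sqrt(2.0 * np.pi)       # rescale the weights so that they sum to 1
    # Floats avoid integer overflow for large factorials
    norm = np.sqrt([float(factorial(n)) for n in range(npoly)])
    basis = He.hermevander(x, npoly - 1) / norm   # H_n = He_n / sqrt(n!)
    return (w * phi(x)) @ basis

sigma = 0.5
phi = lambda y: np.exp(sigma * y - 0.5 * sigma**2)   # lognormal transform, mean 1
psi = hermite_coefficients(phi, npoly=30)

print("mean     =", psi[0])                 # first coefficient is the mean
print("variance =", np.sum(psi[1:] ** 2))   # squared remaining coefficients sum to the variance
```

With this normalization, the first coefficient is the mean of the raw variable and the sum of the squared remaining coefficients is its variance, which is a convenient check on any fitted anamorphosis.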

We can display the Gaussian Anamorphosis graphically within its definition domain.

In [None]:
ax = gp.anam(myanam)
ax.decoration(title="Anamorphosis")

The next step consists in translating the target variable ('Pb') into its Gaussian transform. We can check that the newly created variable is centered with a mean close to 0 and a variance close to 1.

In [None]:
err = myanam.rawToGaussianByLocator(mydb)
mydb.display(dbfmt)

The histogram of the transformed values shows the expected bell shape.

In [None]:
ax = gp.histogram(mydb, name="Y.Pb", bins=50)

## Variogram in the Gaussian scale

We calculate the experimental (omni-directional) variogram on the Gaussian transformed variable.

In [None]:
myvarioG = gl.Vario(myVarioParamOmni)
err = myvarioG.compute(mydb,gl.ECalcVario.VARIOGRAM)

We fit the model automatically. In some cases, the resulting model is required to have a sill equal to 1: this constraint is added to the fitting step.

In [None]:
mymodelG = gl.Model.createFromDb(mydb)
# Constrain the total sill of the fitted model to 1
constr = gl.Constraints(1)
err = mymodelG.fit(myvarioG,[gl.ECov.EXPONENTIAL], constr)
ax = gp.varmod(myvarioG,mymodelG)
ax.decoration(title="Model for Gaussian Pb")
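
Fixing the sill removes one degree of freedom from the fit: for a single exponential structure, only the range remains to be estimated. The sketch below shows this on a made-up experimental variogram with a plain grid search. It is not the gstlearn fitting algorithm; `expo_variogram` and its practical-range parameterization are assumptions for illustration.

```python
import numpy as np

def expo_variogram(h, sill, prange):
    """Exponential variogram; 'prange' is the practical range (about 95% of the sill)."""
    return sill * (1.0 - np.exp(-3.0 * h / prange))

# Hypothetical experimental variogram of a Gaussian variable (lags 1 to 10)
h = np.arange(1.0, 11.0)
gamma_exp = expo_variogram(h, 1.0, 4.0)

# With the sill fixed at 1, only the range remains to be fitted
candidates = np.linspace(0.5, 10.0, 96)
sse = [np.sum((expo_variogram(h, 1.0, r) - gamma_exp) ** 2) for r in candidates]
best_range = candidates[int(np.argmin(sse))]
print("fitted range:", best_range)
```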

## Back transform from Gaussian to Raw scale

We turn the Gaussian values back to the Raw scale. This exercise is not very demonstrative when based on the initial data themselves: in an operational framework, we use this transform to turn newly created values in the Gaussian scale (results of simulations, for example) back to the Raw scale.

In [None]:
myanam.gaussianToRaw(mydb,"Y.Pb")
mydb.display(dbfmt)

The back transformation, from Gaussian to Raw scale, is performed using the Hermite polynomial expansion (with a limited number of polynomials). This is the reason why we may expect each datum not to coincide exactly with its initial value. This is demonstrated in the next correlation plot.

In [None]:
ax = gp.correlation(mydb, namex="Pb", namey="Z.Y.Pb", asPoint=True)
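
The effect of truncating the expansion can be quantified on a transform whose Hermite coefficients are known in closed form. The sketch below uses a lognormal transform, not the Pb anamorphosis, and relies on the identity exp(s*y - s^2/2) = sum over n of s^n / n! * He_n(y) for NumPy's probabilists' Hermite polynomials; the choice of 5, 10 and 30 terms is only for illustration.

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He

sigma = 1.0
y = np.linspace(-3.0, 3.0, 61)
exact = np.exp(sigma * y - 0.5 * sigma**2)

max_error = {}
for npoly in (5, 10, 30):
    # Coefficients of exp(sigma*y - sigma^2/2) on the polynomials He_n: sigma^n / n!
    coeffs = [sigma**n / factorial(n) for n in range(npoly)]
    truncated = He.hermeval(y, coeffs)
    max_error[npoly] = float(np.max(np.abs(truncated - exact)))

for npoly, err in max_error.items():
    print(f"{npoly:2d} polynomials: max error = {err:.3e}")
```

The discrepancy shrinks as more polynomials are kept, which is consistent with the small scatter around the first bisector in the correlation plot.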