# Naming Convention for Db in gstlearn

## Preamble

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    // Remove Scrollbar in outputs
    return false;
}

This tutorial answers frequently asked questions about the naming convention used for variables in a Data Base (Db) of gstlearn.

In [None]:
import numpy as np
import gstlearn as gl
import os
import sys

## Prepare the Environment

This section defines the Space Dimension for the whole notebook. It also sets the name of the Container (and a Prefix) used if Objects are saved as Neutral Files.

In [None]:
ndim = 2
gl.ASpaceObject.defineDefaultSpace(gl.SPACE_RN,ndim)

gl.ASerializable.setContainerName(True)
gl.ASerializable.setPrefixName("DbTest-");

The following object provides a complete view of the column / attribute manipulation. It will be used later in the notebook.

In [None]:
dbfmt = gl.DbStringFormat()
dbfmt.setParams(gl.FLAG_LOCATOR)

## Creating a data file

A Data Base is created for experimentation. It is constructed as a regular Grid (named **grid**). The variable *nech* will contain the number of samples within *grid*. The number of meshes is deliberately limited. Each mesh is a square of side 1. The origin (lower left corner) is set to (10,20) so that coordinates along the first and second axes can be told apart.

In [None]:
grid = gl.DbGrid.create([5,5], [1,1], [10,20])
nech = grid.getSampleNumber()
print("Number of samples =",nech)
grid

The data base contains 3 fields, created automatically and respectively called *rank*, *x1* and *x2*. Note that the last two fields are considered as coordinates (locator *x*).

## Naming convention

We now add one new field (named *first*) whose values are generated randomly (drawn uniformly between 0 and 1). Note that, when adding this new field, a value is returned which corresponds to the number of the newly created *attribute*.

**Important remark: all numerical variables used to identify a field within a Db are considered as indices, i.e. they are numbered starting from 0**

In [None]:
tab = gl.ut_vector_simulate_uniform(nech)
iatt1 = grid.addColumns(tab,"first")
print("Attribute corresponding to 'first' =",iatt1)

We can double-check the attribute information by visiting the current contents of the *grid* Db. We check that the field *first* is the fourth (i.e. attribute #3).

In [None]:
grid

Let us add a series of 3 fields created simultaneously. They are filled with a constant value equal to 5. We also define a locator assigned to all the newly created variables: they will be considered as data variables (locator = *z*).
Note the returned value: it corresponds to the attribute number assigned to the first new variable.

In [None]:
iatt2 = grid.addColumnsByConstant(3,5.,"second",gl.ELoc.Z)
print("Attribute corresponding to the first variable named 'second-x' =",iatt2)
grid

Note that the newly created fields are automatically named using the provided string (*second*) as a radix: the variables are named *second-1*, *second-2* and *second-3*.
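The radix-based naming can be pictured with a few lines of plain Python. This is only an illustrative sketch that reproduces the names observed in the printout; `names_from_radix` is a hypothetical helper, not a gstlearn function.

```python
def names_from_radix(radix, count):
    """Build 'radix-1', ..., 'radix-count' (hypothetical helper, 1-based suffixes)."""
    return [f"{radix}-{i}" for i in range(1, count + 1)]

new_names = names_from_radix("second", 3)
print(new_names)
```

Note that the suffixes are ranks (starting from 1), whereas the values used to designate a field are indices (starting from 0).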

Let us now try renaming the variable *second-2* to *first*.

In [None]:
grid.setName("second-2","first")
grid

As the name *first* already exists, the field has been renamed to *first.1* instead.

We now wish to rename the field *second-3* to *first* as well.

In [None]:
grid.setName("second-3","first")
grid

The automatic renaming procedure has been applied (adding ".1") iteratively until names are all different: the field is now called *first.1.1*.
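The renaming behaviour observed above can be mimicked with a small toy model in plain Python. This is a sketch of the observed behaviour only, not gstlearn's actual implementation; the helpers `make_unique` and `rename` are hypothetical names introduced for illustration.

```python
def make_unique(name, existing):
    """Append '.1' to `name` until it no longer clashes with `existing`."""
    candidate = name
    while candidate in existing:
        candidate += ".1"
    return candidate

def rename(names, old, new):
    """Rename `old` in place, using a unique variant of `new` (toy model)."""
    idx = names.index(old)
    others = names[:idx] + names[idx + 1:]   # every name except the one renamed
    names[idx] = make_unique(new, others)

# Field names as they stand before the two renaming operations
names = ["rank", "x1", "x2", "first", "second-1", "second-2", "second-3"]
rename(names, "second-2", "first")
rename(names, "second-3", "first")
print(names)
```

In this sketch, the second renaming first tries *first*, then *first.1*, and finally settles on *first.1.1*, which matches the names displayed by the Db.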

Now that we have demonstrated the uniqueness of the names, what are the ways to designate a field?
For the next demonstrations, we first recall the current status of the Db.

In order to make the next paragraph more demonstrative, we change the contents of several fields.

In [None]:
grid.setColumn(gl.ut_vector_simulate_uniform(nech),"second-1")
grid.setColumn(gl.ut_vector_simulate_uniform(nech),"first.1")
grid.setColumn(gl.ut_vector_simulate_uniform(nech),"first.1.1")

In [None]:
grid

### By Name

As an example, we access the field named *first.1*. For brevity, only the first four values are printed.

In [None]:
grid.getColumn("first.1")[0:4]

### By Column Index

In [None]:
grid.getColumnByColIdx(5)[0:4]

### By Attribute Index

In [None]:
grid.getColumnByUID(5)[0:4]

### By Locator

We note that the target variable corresponds to the locator *z2*, which is the second one (index 1) of the Z-locator type.

In [None]:
grid.getColumnByLocator(gl.ELoc.Z,1)[0:4]

## Difference between Column and Attribute

We need to recall the *attribute* values returned when adding the fields:
- *iatt1* (3) when adding the field named *first*
- *iatt2* (4) when adding the series of 3 fields (originally named after the radix *second*)

To better understand, we display the data base with a specific option which describes the current status of the attributes, either unsorted or in an order driven by the locator.

In [None]:
grid.display(dbfmt)

We can see that the 7 existing fields currently correspond to the first 7 columns of the Data Base *grid*. The second display gives the indices of the locators in use (*x* and *z*) and the indices of the attributes corresponding to the ranks of the items for each locator type.

Things become more interesting if a field is deleted. To avoid any ambiguity, the field is designated by its name (say *x1*).

In [None]:
grid

In [None]:
grid.deleteColumn("x1")
grid

The previous printout shows the current contents of the data base where the field *x1* has been deleted.
Note an important feature of the *locator* notion. For a given locator type (say *x* for coordinates), the locator ranks are unique and numbered consecutively starting from 1.
Therefore, when we deleted the variable *x1* (which corresponded to the locator type *x* and locator rank *1*), the variable *x2* was modified: its name and locator type are unchanged, but its locator rank is updated from *2* to *1*.
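The renumbering of locator ranks can be illustrated with a toy model in plain Python. This sketch assumes that, within each locator type, ranks simply follow the order of the fields; this is a simplification made for illustration, and `locator_labels` is a hypothetical helper, not part of gstlearn.

```python
# Each entry: (field name, locator type or None), listed in column order
fields = [("rank", None), ("x1", "x"), ("x2", "x"), ("first", None),
          ("second-1", "z"), ("first.1", "z"), ("first.1.1", "z")]

def locator_labels(fields):
    """Return {name: label}, numbering each locator type consecutively from 1."""
    counters = {}
    labels = {}
    for name, loc in fields:
        if loc is None:
            labels[name] = "NA"            # no locator attached
            continue
        counters[loc] = counters.get(loc, 0) + 1
        labels[name] = f"{loc}{counters[loc]}"
    return labels

before = locator_labels(fields)
after = locator_labels([f for f in fields if f[0] != "x1"])  # delete x1
print("x2:", before["x2"], "->", after["x2"])
print("first.1:", before["first.1"], "->", after["first.1"])
```

Only the ranks of the *x* locator type are affected: the fields carrying the *z* locator keep their ranks.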

We now look at the internal management of the attributes.

In [None]:
grid.display(dbfmt)

We can see that the list of attributes has not been reduced: the maximum number of positions is still equal to 7. Instead, the rank of the attribute which corresponded to *x1* is now set to -1, to signify that the column is actually missing. The display sorted by locator does not need any additional explanation.
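The relationship between attributes and columns can be sketched with a toy class in plain Python. It is a deliberately simplified model of the behaviour described above, not gstlearn's internal code: `ToyDb`, `uid_to_col` and `name_by_uid` are hypothetical names chosen for illustration.

```python
class ToyDb:
    """Toy model: a list of column names plus an attribute -> column table."""

    def __init__(self, names):
        self.columns = list(names)                 # current columns, in order
        self.uid_to_col = list(range(len(names)))  # one slot per attribute ever created

    def delete(self, name):
        col = self.columns.index(name)
        del self.columns[col]
        for uid, c in enumerate(self.uid_to_col):
            if c == col:
                self.uid_to_col[uid] = -1          # slot is kept but marked as missing
            elif c > col:
                self.uid_to_col[uid] = c - 1       # later columns shift by one

    def name_by_uid(self, uid):
        col = self.uid_to_col[uid]
        return None if col < 0 else self.columns[col]

db = ToyDb(["rank", "x1", "x2", "first", "second-1", "first.1", "first.1.1"])
db.delete("x1")
print(db.uid_to_col)                       # attribute -> column table after deletion
print(db.name_by_uid(5), db.columns.index("first.1"))
```

In this model, the attribute 5 still designates *first.1* after the deletion, while the column index of that field has changed from 5 to 4.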

Let us now retrieve the values of the variable *first.1* as we did before. We start by addressing the variable by name.

In [None]:
grid.getColumn("first.1")[0:4]

We can similarly address it by its column index (the column has moved from index 5 to index 4, i.e. it is now the fifth column).

In [None]:
grid.getColumnByColIdx(4)[0:4]

The magic of the *attribute* notion is that it can still be used **unchanged**.

In [None]:
grid.getColumnByUID(5)[0:4]

As expected, trying to read the attribute which corresponded to the field *x1* (that has just been deleted) returns an empty vector.

In [None]:
grid.getColumnByUID(1)

## Remark on Space Dimension

It might seem surprising that *grid* is considered as a 2-D Grid while there is only **one** coordinate field (locator *x*). In order to avoid any misunderstanding, let us recall this important fact.

The data base *grid* is organized as a grid and, for that purpose, it contains a description of the grid organization. This organization is used to compute the coordinates (for example when calling the *getCoordinate()* method). The coordinate vectors must only be considered as decoration: they will not be used in any internal operation.
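To illustrate why the coordinate fields are redundant for a grid, the following plain-Python sketch rebuilds the coordinates from the grid definition alone (number of nodes, mesh size and origin). It assumes that the origin is the coordinate of the first node and that the first axis varies fastest when samples are enumerated; `grid_coordinates` is a hypothetical helper, not a gstlearn function.

```python
def grid_coordinates(nx, dx, x0):
    """Return one (x1, x2) tuple per node of a 2-D regular grid.

    Assumption: the first axis varies fastest in the sample ordering.
    """
    coords = []
    for iy in range(nx[1]):
        for ix in range(nx[0]):
            coords.append((x0[0] + ix * dx[0], x0[1] + iy * dx[1]))
    return coords

# Same definition as the grid created at the beginning of the notebook
coords = grid_coordinates([5, 5], [1, 1], [10, 20])
print(len(coords), coords[:4])
```

Deleting a coordinate field therefore loses no information: in this sketch, both coordinates of every node can be recomputed at any time from the three lists defining the grid.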

This matters here: the variable *x2*, despite its locator rank *1* (i.e. index 0), actually contains the **second** coordinate of the samples, as demonstrated in the next line.

In [None]:
grid.getColumnByLocator(gl.ELoc.X, 0)

Note that the coordinate vectors can be regenerated at any time. To avoid confusion, the newly generated coordinate fields are named using the radix "X" (uppercase). This feature is obviously only available in the case of a grid.

In [None]:
grid.generateCoordinates("X")

In [None]:
grid.getColumnByLocator(gl.ELoc.X, 0)

Similarly, we can generate a field containing the sample rank (similar to the information contained in the first field, *rank*). Here again, to avoid confusion, the new variable is called *RANK* (uppercase). Note that this field does not have any locator attached.

In [None]:
grid.generateRank("RANK")
grid

## Conclusion

As a conclusion:

- the variables can be used **safely** when designating them by their **name**
- the variables can be used easily when addressing them using the locator notion (type and index)
- the use of the (column) index is always valid, but this index must be determined precisely at the time the variable is used (it must be updated whenever other variables are added or deleted)
- the use of the attribute is clever, but it should be reserved for experts who understand the process. It allows using fixed values, independently of how the other fields are managed

We also recall that all numbering refers to indices (0-based numbering). This is the case for the *(column) index* as well as for the *locator index* per locator type.