# Creating variables

Please note: this method currently works only in the development version of nctoolkit. The feature will be available in the public release on PyPI and conda-forge in February 2021.

Variable creation in nctoolkit is done using the `assign` method, which works in a similar way to the `assign` method in pandas. As with other tutorials on this site, we will use a global sea surface temperature dataset available from NOAA, which is described [here](https://psl.noaa.gov/data/gridded/data.cobe2.html).

In [30]:
import nctoolkit as nc

Let's start by reading the data into a dataset and selecting the first time step.

In [31]:
data = nc.open_thredds("https://psl.noaa.gov/thredds/dodsC/Datasets/COBE2/sst.mon.mean.nc")
data.select(time = 0)

In [32]:
data.plot()

The `assign` method works using lambda functions. Let's say we want to convert temperature from degrees Celsius to kelvin. We can do it in the following way:

In [33]:
data_k = data.copy()
data_k.assign(sst_k = lambda x: x.sst + 273.15)

We can now see that a new variable has been created:

In [34]:
data_k.plot("sst_k")

However, we now have two variables in our dataset:

In [35]:
data_k.variables

['sst', 'sst_k']

We may want only the new variable. In that case you can use the `drop` argument:

In [36]:
data_k = data.copy()
data_k.assign(sst_k = lambda x: x.sst + 273.15, drop = True)

This results in only one variable:

In [37]:
data_k.variables

['sst_k']

Note that the `assign` method takes the lambda functions as keyword arguments, so `drop` can be positioned anywhere in the call. The following two calls therefore do the same thing:

In [38]:
data_k = data.copy()
data_k.assign(sst_k = lambda x: x.sst + 273.15, drop = True)
data_k = data.copy()
data_k.assign(drop = True, sst_k = lambda x: x.sst + 273.15)
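
Python matches keyword arguments by name, which is why the position of `drop` does not matter. The sketch below illustrates this with a hypothetical stand-in, `toy_assign_signature`. It is not nctoolkit code; it only assumes a signature with a named `drop` parameter plus arbitrary keyword lambdas.

```python
def toy_assign_signature(drop=False, **kwargs):
    """Hypothetical stand-in for an assign-style signature (not nctoolkit's code)."""
    # `drop` is captured by name; every other keyword ends up in `kwargs`.
    return drop, list(kwargs)


first = toy_assign_signature(sst_k=lambda x: x.sst + 273.15, drop=True)
second = toy_assign_signature(drop=True, sst_k=lambda x: x.sst + 273.15)

print(first)   # (True, ['sst_k'])
print(second)  # (True, ['sst_k'])
```

Both calls hand the function the same `drop` value and the same set of named lambdas.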

Read each lambda function sent to `assign` as a description of what you want to do to each grid cell at each time step. Every term in the lambda function must therefore evaluate to a number. So the following will work:

In [39]:
k = 273.15
data_k = data.copy()
data_k.assign(drop = True, sst_k = lambda x: x.sst + k)

However, if you set `k` to a string, or to anything other than a number, `assign` will throw an error.
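
The "each grid cell" reading can be sketched in plain Python. This is an analogy only: `apply_per_cell` is a hypothetical helper, the values are made up, and nothing here reflects how nctoolkit evaluates lambdas internally or what error it raises.

```python
def apply_per_cell(grid, func):
    """Apply func to every value in a 2D grid stored as a list of rows."""
    return [[func(value) for value in row] for row in grid]


sst = [[-1.5, 0.0], [10.0, 25.0]]  # made-up temperatures in degrees Celsius

# A numeric constant works: each cell is shifted by k.
k = 273.15
sst_k = apply_per_cell(sst, lambda value: value + k)

# A non-numeric constant fails as soon as the first cell is evaluated.
k = "273.15"
try:
    apply_per_cell(sst, lambda value: value + k)
    failed = False
except TypeError:
    failed = True

print(failed)  # True
```

In this sketch the failure is an ordinary Python `TypeError` from adding a float to a string.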

## Applying mathematical functions to dataset variables

As part of your lambda function you can use a number of standard mathematical functions. These all have the same names as those in numpy: `abs`, `floor`, `ceil`, `sqrt`, `exp`, `log10`, `sin`, `cos`, `tan`, `arcsin`, `arccos` and `arctan`.

For example, if you wanted to calculate the ceiling of temperature, you could do the following:

In [40]:
data_k = data.copy()
data_k.assign(sst_ceil = lambda x: ceil(x.sst))
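
Ceiling rounds towards positive infinity, which is worth remembering for sub-zero sea temperatures. The short sketch below uses Python's standard `math` module on made-up values rather than nctoolkit itself, and assumes `ceil` inside a lambda has the usual mathematical meaning.

```python
import math

sst = [-1.8, -0.2, 0.0, 14.3, 28.9]  # made-up temperatures in degrees Celsius

sst_ceil = [math.ceil(value) for value in sst]    # round up
sst_floor = [math.floor(value) for value in sst]  # round down, for comparison

print(sst_ceil)   # [-1, 0, 0, 15, 29]
print(sst_floor)  # [-2, -1, 0, 14, 28]
```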

## Using spatial statistics

The `assign` method carries out its calculations in each time step, and you can access spatial statistics for each time step when generating new variables. A series of functions are available that have the same names as nctoolkit methods for spatial statistics:  `spatial_mean`, `spatial_max`, `spatial_min`, `spatial_sum`, `vertical_mean`, `vertical_max`, `vertical_min`, `vertical_sum`, `zonal_mean`, `zonal_max`, `zonal_min` and `zonal_sum`.

If we are working with spatial data, we might want to identify regions which are warmer than average. We can do this for the temperature dataset as follows:

In [41]:
data_warm = data.copy()
data_warm.assign(sst_comp = lambda x: x.sst - spatial_mean(x.sst), drop = True)

We can then see which regions are warmer than the global average:

In [42]:
data_warm.plot()
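
To make the per-time-step behaviour concrete, here is a minimal pure-Python sketch of the same anomaly calculation on a made-up 2 x 2 grid with two time steps. It assumes a plain unweighted mean for simplicity; a spatial mean on a real latitude-longitude grid would normally account for grid-cell area, and this is not nctoolkit's implementation.

```python
# Two time steps of a made-up 2 x 2 temperature field
sst = [
    [[10.0, 20.0], [30.0, 40.0]],  # time step 0
    [[0.0, 2.0], [4.0, 6.0]],      # time step 1
]


def unweighted_mean(field):
    """Plain average of all cells in one time step (no area weighting)."""
    values = [value for row in field for value in row]
    return sum(values) / len(values)


sst_comp = []
for field in sst:
    mean = unweighted_mean(field)  # one mean per time step
    sst_comp.append([[value - mean for value in row] for row in field])

print(sst_comp[0])  # [[-15.0, -5.0], [5.0, 15.0]]
print(sst_comp[1])  # [[-3.0, -1.0], [1.0, 3.0]]
```

Each time step is compared with its own mean, so positive values mark cells that are warmer than average at that time step.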

You can process multiple variables at once using `assign`. Variables are created in the order given, so a variable created by one lambda function can be used by the lambda functions that follow it. The simple example below shows how this works. First we create var1, which is temperature plus 1. Then we create var2, which is var1 plus 1. Finally, we calculate the difference between var2 and var1, which should be 1 everywhere:

In [43]:
data2 = data.copy()
data2.assign(var1 = lambda x: x.sst + 1, var2 = lambda x: x.var1 + 1, diff = lambda x: x.var2 - x.var1)
data2.plot("diff")
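
The ordering rule can be mimicked for a single grid cell in plain Python. The helper `toy_assign_chain` below is hypothetical and is not how nctoolkit works internally; it relies only on the fact that Python keyword arguments keep the order in which they were written.

```python
from types import SimpleNamespace


def toy_assign_chain(values, **kwargs):
    """Hypothetical stand-in: evaluate lambdas in keyword order for one cell."""
    result = dict(values)
    for name, func in kwargs.items():
        # Each lambda sees every variable created so far as an attribute.
        result[name] = func(SimpleNamespace(**result))
    return result


cell = toy_assign_chain(
    {"sst": 15.0},
    var1=lambda x: x.sst + 1,
    var2=lambda x: x.var1 + 1,
    diff=lambda x: x.var2 - x.var1,
)

print(cell)  # {'sst': 15.0, 'var1': 16.0, 'var2': 17.0, 'diff': 1.0}
```

Swapping the order so that `var2` comes before `var1` would fail in this sketch, because `x.var1` would not exist yet.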

## Functions that work with nctoolkit variables

The following functions can be used on nctoolkit variables as part of lambda functions.

| Function      | Description |  Example |
| ----------- | ----------- | ----------- |
| `abs`      | Absolute value       | `abs(x.sst)` |
| `floor`      | Floor of variable       | `floor(x.sst + 8.2)` |
| `ceil`      | Ceiling of variable       | `ceil(x.sst - 1)` |
| `sqrt`      | Square root of variable       | `sqrt(x.sst + 273.15)` |
| `exp`      | Exponential of variable       | `exp(x.sst)` | 
| `log10`      | Base-10 log of variable       | `log10(x.sst + 1)`  |
| `log`      | Natural log of variable       | `log(x.sst + 1)`  |
| `sin`      | Trigonometric sine of variable       | `sin(x.var)`  |
| `cos`      | Trigonometric cosine of variable       | `cos(x.var)`  |
| `tan`      | Trigonometric tangent of variable       | `tan(x.var)`  |
| `spatial_mean`      | Spatial mean of variable at time-step       | `spatial_mean(x.var)`  |
| `spatial_max`      | Spatial max of variable at time-step       | `spatial_max(x.var)`  |
| `spatial_min`      | Spatial min of variable at time-step       | `spatial_min(x.var)`  |
| `spatial_sum`      | Spatial sum of variable at time-step       | `spatial_sum(x.var)`  |
| `zonal_mean`      | Zonal mean of variable at time-step       | `zonal_mean(x.var)`  |
| `zonal_max`      | Zonal max of variable at time-step       | `zonal_max(x.var)`  |
| `zonal_min`      | Zonal min of variable at time-step       | `zonal_min(x.var)`  |
| `zonal_sum`      | Zonal sum of variable at time-step       | `zonal_sum(x.var)`  |
| `isnan`      | Is variable a missing value/NA?      | `isnan(x.var)`  |
| `cell_area`      | Area of grid-cell (m^2)      | `cell_area(x.var)`  |
| `level`      | Vertical level of variable      | `level(x.var)`  |
| `timestep`      | Time step of variable, using Python (zero-based) indexing | `timestep(x.var)`  |
| `longitude`      | Longitude of the grid cell | `longitude(x.var)`  |
| `latitude`      | Latitude of the grid cell | `latitude(x.var)`  |
| `year`      | Year of the variable | `year(x.var)`  |
| `month`      | Month of the variable | `month(x.var)`  |
| `day`      | Day of the month of the variable | `day(x.var)`  |
| `hour`      | Hour of the day of the variable | `hour(x.var)`  |