# Introduction to Features
-----

## What are features?

According to Wikipedia, a feature is an individual measurable property or characteristic of a phenomenon. In practice, features are the data points we use as input to machine learning models.
Before we do any modeling, we need to understand the features we will use as input and construct them in a way a neural net model can use.

This is what the `f3atur3s` and `eng1n3` packages do. The `f3atur3s` package defines the features; the `eng1n3` package reads them, for instance from files.

We will start off with some simple examples of Rank-0 and Rank-1 features, which are the most common in traditional machine learning. Each individual feature is a scalar or an array. When you lay the features out as columns and have multiple rows of data, you get something that looks like an Excel spreadsheet, i.e. a Rank-2 data structure, where the columns are the various features and the rows are the training/test samples.
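
To make the notion of rank concrete, here is a minimal sketch using plain `numpy` (not the `f3atur3s` API). The variable names are illustrative only.

```python
import numpy as np

# Rank-0: a single scalar feature value
amount = np.array(3.0)
# Rank-1: one feature represented as an array of values
one_hot_country = np.array([1, 0, 0])
# Rank-2: rows are samples, columns are features (the "spreadsheet" view)
samples = np.array([[1.0, 1, 0, 0],
                    [2.0, 0, 0, 1]])

print(amount.ndim, one_hot_country.ndim, samples.ndim)
```

The `ndim` attribute reports the rank of each array.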

## Requirements
Before running the experiment, make sure the `numpy`, `pandas` and `numba` packages are installed in your virtual environment:
```
> pip install numpy
> pip install pandas
> pip install numba
```
Also make sure that the notebook can find the `f3atur3s` and `eng1n3` packages.


## Preparation

Before creating features, we will have to import a couple of packages

In [1]:
import numpy as np
import pandas as pd
import f3atur3s as ft
import eng1n3.pandas as en

And we define the **file** we will read from.

In [3]:
file = './data/intro_card.csv'

Let's have a look at the raw content of the file. It's just a very simple comma-delimited file with a header row containing the column names.

In [4]:
# Use a context manager so the file is closed even if reading fails
with open(file) as raw_file:
    raw_content = raw_file.read()
print(raw_content)

Date,Amount,Card,Merchant,MCC,Country,Fraud
20200101,1.0,CARD-1,MRC-1,0001,DE,0
20200102,2.0,CARD-2,MRC-2,0002,GB,0
20200103,3.0,CARD-1,MRC-3,0003,DE,1
20200104,4.0,CARD-1,MRC-3,0003,FR,0
20200104,5.0,CARD-2,MRC-2,0002,GB,0
20200106,6.0,CARD-2,MRC-4,,DE,0


## Rank 1

### FeatureSource
First let's build a couple of simple string features

In [5]:
card = ft.FeatureSource(
    'Card',                 # The name of the source feature; for source features this must match the file header
    ft.FEATURE_TYPE_STRING  # The data type of the feature, in this case we interpret it as a string
)

merchant = ft.FeatureSource('Merchant', ft.FEATURE_TYPE_STRING)

td = ft.TensorDefinition(
    'Features',             # Name for the TensorDefinition
    [card, merchant]        # A list of features to build
)

# Now ask the EnginePandas to make a Pandas DataFrame from the TensorDefinition
with en.EnginePandas(num_threads=1) as e:
    df = e.df_from_csv(
        td,                 # Our TensorDefinition with the 2 features in it.
        file,               # The file we want to read
        inference=False     # Inference is False, we assume we are building a training dataset.
    )

# Display the Pandas DataFrame
df

2023-03-07 11:03:03.900 eng1n3.common.engine           INFO     Start Engine...
2023-03-07 11:03:03.901 eng1n3.pandas.pandasengine     INFO     Pandas Version : 1.5.3
2023-03-07 11:03:03.901 eng1n3.pandas.pandasengine     INFO     Numpy Version : 1.23.5
2023-03-07 11:03:03.902 eng1n3.pandas.pandasengine     INFO     Building Panda for : SourceFeatures from file ./data/intro_card.csv
2023-03-07 11:03:03.912 eng1n3.pandas.pandasengine     INFO     Reshaping DataFrame to: SourceFeatures


Unnamed: 0,Card,Merchant
0,CARD-1,MRC-1
1,CARD-2,MRC-2
2,CARD-1,MRC-3
3,CARD-1,MRC-3
4,CARD-2,MRC-2
5,CARD-2,MRC-4


We can also read numerical features into floats

In [6]:
amount = ft.FeatureSource('Amount', ft.FEATURE_TYPE_FLOAT)

td = ft.TensorDefinition('Features', [card, merchant, amount])

with en.EnginePandas(num_threads=1) as e:
    df = e.df_from_csv(td, file, inference=False)
    
df

2023-03-07 11:03:06.511 eng1n3.common.engine           INFO     Start Engine...
2023-03-07 11:03:06.511 eng1n3.pandas.pandasengine     INFO     Pandas Version : 1.5.3
2023-03-07 11:03:06.512 eng1n3.pandas.pandasengine     INFO     Numpy Version : 1.23.5
2023-03-07 11:03:06.513 eng1n3.pandas.pandasengine     INFO     Building Panda for : SourceFeatures from file ./data/intro_card.csv
2023-03-07 11:03:06.515 eng1n3.pandas.pandasengine     INFO     Reshaping DataFrame to: SourceFeatures


Unnamed: 0,Card,Merchant,Amount
0,CARD-1,MRC-1,1.0
1,CARD-2,MRC-2,2.0
2,CARD-1,MRC-3,3.0
3,CARD-1,MRC-3,4.0
4,CARD-2,MRC-2,5.0
5,CARD-2,MRC-4,6.0


And validate that the type is indeed different

In [7]:
df.dtypes

Card         object
Merchant     object
Amount      float64
dtype: object

In order to read the dates, we need to provide a format_code that explains how to interpret the raw string and form a date from it.

These are the standard Python format codes; more information is available in the standard documentation: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
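
As a quick illustration of what such a format code does, the sketch below parses one of the raw date strings from the file with the standard library. It does not use `f3atur3s`; it only shows how `'%Y%m%d'` is interpreted.

```python
from datetime import datetime

# '%Y' = 4-digit year, '%m' = 2-digit month, '%d' = 2-digit day
parsed = datetime.strptime('20200101', '%Y%m%d')
print(parsed)
```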

In [8]:
date = ft.FeatureSource('Date', ft.FEATURE_TYPE_DATE, format_code='%Y%m%d')

td = ft.TensorDefinition('Features', [date, card, merchant, amount])

with en.EnginePandas(num_threads=1) as e:
    df = e.df_from_csv(td, file, inference=False)
    
df

2023-03-07 11:03:20.820 eng1n3.common.engine           INFO     Start Engine...
2023-03-07 11:03:20.822 eng1n3.pandas.pandasengine     INFO     Pandas Version : 1.5.3
2023-03-07 11:03:20.822 eng1n3.pandas.pandasengine     INFO     Numpy Version : 1.23.5
2023-03-07 11:03:20.823 eng1n3.pandas.pandasengine     INFO     Building Panda for : SourceFeatures from file ./data/intro_card.csv
2023-03-07 11:03:20.829 eng1n3.pandas.pandasengine     INFO     Reshaping DataFrame to: SourceFeatures


Unnamed: 0,Date,Card,Merchant,Amount
0,2020-01-01,CARD-1,MRC-1,1.0
1,2020-01-02,CARD-2,MRC-2,2.0
2,2020-01-03,CARD-1,MRC-3,3.0
3,2020-01-04,CARD-1,MRC-3,4.0
4,2020-01-04,CARD-2,MRC-2,5.0
5,2020-01-06,CARD-2,MRC-4,6.0


And validate we did indeed get a datetime field

In [9]:
df.dtypes

Date        datetime64[ns]
Card                object
Merchant            object
Amount             float64
dtype: object

And finally, let's interpret the merchant category code (MCC) and country as categorical features. Pandas has an efficient way of storing them. If you have a string feature with relatively low cardinality, it's best to read it as categorical.
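
The storage benefit can be checked with plain pandas. The following is a small sketch on made-up data (30,000 rows with only 3 distinct country codes), not on the file used in this notebook.

```python
import pandas as pd

# Hypothetical low-cardinality column: many rows, few distinct values
countries = ['DE', 'GB', 'FR'] * 10_000
as_object = pd.Series(countries, dtype=object)
as_category = as_object.astype('category')

# deep=True also counts the memory used by the Python string objects
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(object_bytes, category_bytes)
```

The categorical series stores each distinct string once plus a small integer code per row, which is why it takes less memory.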

In [10]:
# Note this feature has a default: the MCC value in the last row (index 5) is empty, and we don't want empty values.
mcc = ft.FeatureSource('MCC', ft.FEATURE_TYPE_CATEGORICAL, default='0000')
country = ft.FeatureSource('Country', ft.FEATURE_TYPE_CATEGORICAL)

td = ft.TensorDefinition('Features', [date, card, merchant, amount, mcc, country])

with en.EnginePandas(num_threads=1) as e:
    df = e.df_from_csv(td, file, inference=False)
    
df

2023-03-07 11:03:28.009 eng1n3.common.engine           INFO     Start Engine...
2023-03-07 11:03:28.010 eng1n3.pandas.pandasengine     INFO     Pandas Version : 1.5.3
2023-03-07 11:03:28.010 eng1n3.pandas.pandasengine     INFO     Numpy Version : 1.23.5
2023-03-07 11:03:28.011 eng1n3.pandas.pandasengine     INFO     Building Panda for : SourceFeatures from file ./data/intro_card.csv
2023-03-07 11:03:28.049 eng1n3.pandas.pandasengine     INFO     Reshaping DataFrame to: SourceFeatures


Unnamed: 0,Date,Card,Merchant,Amount,MCC,Country
0,2020-01-01,CARD-1,MRC-1,1.0,1,DE
1,2020-01-02,CARD-2,MRC-2,2.0,2,GB
2,2020-01-03,CARD-1,MRC-3,3.0,3,DE
3,2020-01-04,CARD-1,MRC-3,4.0,3,FR
4,2020-01-04,CARD-2,MRC-2,5.0,2,GB
5,2020-01-06,CARD-2,MRC-4,6.0,0,DE


In [11]:
df.dtypes

Date        datetime64[ns]
Card                object
Merchant            object
Amount             float64
MCC               category
Country           category
dtype: object

### FeatureOneHot
Neural nets want numerical input; they cannot directly use string or categorical values. So we'll need to transform features such as the country and MCC into numbers before we can use them. One way of doing this is through one-hot encoding.

A One Hot transformation will create a new column for each unique value of the original *base* feature. Those columns only contain 0 or 1. The value is 1 if the row of the base feature contained the respective value.

For instance, country has 3 unique values: 'DE', 'FR' and 'GB'. A One Hot encoding will create 3 columns, `Country__DE`, `Country__FR` and `Country__GB`. For the first row the original value was 'DE', so the column `Country__DE` will be 1 and the other columns 0.
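
The same idea can be sketched with plain pandas. This uses `pd.get_dummies` purely as an illustration; it is not necessarily how `FeatureOneHot` is implemented, and the `__` separator is chosen here only to mimic the column names described above.

```python
import pandas as pd

# The Country column of the example file
country = pd.Series(['DE', 'GB', 'DE', 'FR', 'GB', 'DE'], name='Country')

# One column per unique value, holding 1 where the row had that value
one_hot = pd.get_dummies(country, prefix='Country', prefix_sep='__', dtype='uint8')
print(one_hot)
```

Each row has exactly one column set to 1, which is where the name "one-hot" comes from.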

In [11]:
card = ft.FeatureSource('Card', ft.FEATURE_TYPE_STRING)
merchant = ft.FeatureSource('Merchant', ft.FEATURE_TYPE_STRING)
amount = ft.FeatureSource('Amount', ft.FEATURE_TYPE_FLOAT)
date = ft.FeatureSource('Date', ft.FEATURE_TYPE_DATE, format_code='%Y%m%d')
mcc = ft.FeatureSource('MCC', ft.FEATURE_TYPE_CATEGORICAL, default='0000')
country = ft.FeatureSource('Country', ft.FEATURE_TYPE_CATEGORICAL)

# Define 2 OneHot Features
mcc_oh = ft.FeatureOneHot(
    'MCC_OH',               # Name of the feature
    ft.FEATURE_TYPE_INT_8,  # Data type of the feature, in this case a small Int. Must be INT for OneHot Features
    mcc                     # The base feature, i.e. the feature to be converted into a one-hot encoding.
)
country_oh = ft.FeatureOneHot('Country_OH', ft.FEATURE_TYPE_INT_8, country)

td = ft.TensorDefinition('Features', [date, card, merchant, amount, mcc_oh, country_oh])

with en.EnginePandas(num_threads=1) as e:
    df = e.df_from_csv(td, file, inference=False)

df

2023-03-06 12:48:26.776 eng1n3.common.engine           INFO     Start Engine...
2023-03-06 12:48:26.777 eng1n3.pandas.pandasengine     INFO     Pandas Version : 1.5.3
2023-03-06 12:48:26.777 eng1n3.pandas.pandasengine     INFO     Numpy Version : 1.23.5
2023-03-06 12:48:26.778 eng1n3.pandas.pandasengine     INFO     Building Panda for : SourceFeatures from file ./data/intro_card.csv
2023-03-06 12:48:26.785 eng1n3.pandas.pandasengine     INFO     Reshaping DataFrame to: SourceFeatures


Unnamed: 0,Date,Card,Merchant,Amount,MCC__0001,MCC__0002,MCC__0003,MCC__0000,Country__DE,Country__FR,Country__GB
0,2020-01-01,CARD-1,MRC-1,1.0,1,0,0,0,1,0,0
1,2020-01-02,CARD-2,MRC-2,2.0,0,1,0,0,0,0,1
2,2020-01-03,CARD-1,MRC-3,3.0,0,0,1,0,1,0,0
3,2020-01-04,CARD-1,MRC-3,4.0,0,0,1,0,0,1,0
4,2020-01-04,CARD-2,MRC-2,5.0,0,1,0,0,0,0,1
5,2020-01-06,CARD-2,MRC-4,6.0,0,0,0,1,1,0,0


We now have more numbers for a model to work with

In [12]:
df.dtypes

Date           datetime64[ns]
Card                   object
Merchant               object
Amount                float64
MCC__0001               uint8
MCC__0002               uint8
MCC__0003               uint8
MCC__0000               uint8
Country__DE             uint8
Country__FR             uint8
Country__GB             uint8
dtype: object

### FeatureIndex
Another way to convert a string or categorical feature into a number is applying an indexing transformation. Indexing converts each value of the original *base* feature into an integer.

A FeatureIndex will keep a dictionary of unique values. For instance, for Country we have 3 unique values: 'DE', 'FR' and 'GB'. In this example the dictionary ends up mapping 'DE'->1, 'GB'->2 and 'FR'->3, as the output further down shows.
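
The indexing idea can be sketched in a few lines of plain Python. This is an assumption about the behaviour, not the library's code: it assigns indices in order of first appearance, starting at 1, which is consistent with the dictionary printed later in this section. The helper name `build_index` is hypothetical.

```python
def build_index(values, start=1):
    """Map each distinct value to an integer, in order of first appearance."""
    dictionary = {}
    for value in values:
        if value not in dictionary:
            dictionary[value] = start + len(dictionary)
    return dictionary

# The Country column of the example file
countries = ['DE', 'GB', 'DE', 'FR', 'GB', 'DE']
dictionary = build_index(countries)
indexed = [dictionary[c] for c in countries]
print(dictionary)
print(indexed)
```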

In [13]:
card = ft.FeatureSource('Card', ft.FEATURE_TYPE_STRING)
merchant = ft.FeatureSource('Merchant', ft.FEATURE_TYPE_STRING)
amount = ft.FeatureSource('Amount', ft.FEATURE_TYPE_FLOAT)
date = ft.FeatureSource('Date', ft.FEATURE_TYPE_DATE, format_code='%Y%m%d')
mcc = ft.FeatureSource('MCC', ft.FEATURE_TYPE_CATEGORICAL, default='0000')
country = ft.FeatureSource('Country', ft.FEATURE_TYPE_CATEGORICAL)

# Define 2 Index Features
mcc_i = ft.FeatureIndex(
    'MCC_Index',             # Name of the feature
    ft.FEATURE_TYPE_INT_16,  # Data type of the feature, in this case a small Int. Must be INT for Index Features
    mcc                      # The base feature, i.e. the feature to be converted into an index
)
country_i = ft.FeatureIndex('Country_Index', ft.FEATURE_TYPE_INT_16, country)

td = ft.TensorDefinition('Features', [date, card, merchant, amount, mcc_i, country_i])

with en.EnginePandas(num_threads=1) as e:
    df = e.df_from_csv(td, file, inference=False)

df

2023-03-06 12:48:43.898 eng1n3.common.engine           INFO     Start Engine...
2023-03-06 12:48:43.899 eng1n3.pandas.pandasengine     INFO     Pandas Version : 1.5.3
2023-03-06 12:48:43.899 eng1n3.pandas.pandasengine     INFO     Numpy Version : 1.23.5
2023-03-06 12:48:43.900 eng1n3.pandas.pandasengine     INFO     Building Panda for : SourceFeatures from file ./data/intro_card.csv
2023-03-06 12:48:43.907 eng1n3.pandas.pandasengine     INFO     Reshaping DataFrame to: SourceFeatures


Unnamed: 0,Date,Card,Merchant,Amount,MCC_Index,Country_Index
0,2020-01-01,CARD-1,MRC-1,1.0,1,1
1,2020-01-02,CARD-2,MRC-2,2.0,2,2
2,2020-01-03,CARD-1,MRC-3,3.0,3,1
3,2020-01-04,CARD-1,MRC-3,4.0,3,3
4,2020-01-04,CARD-2,MRC-2,5.0,2,2
5,2020-01-06,CARD-2,MRC-4,6.0,4,1


In [14]:
df.dtypes

Date             datetime64[ns]
Card                     object
Merchant                 object
Amount                  float64
MCC_Index                 int16
Country_Index             int16
dtype: object

We can have a look at the dictionary; the feature now knows about the values it encountered.

In [15]:
country_i.dictionary

{'DE': 1, 'GB': 2, 'FR': 3}

### FeatureBin
In some cases we may want to convert a float number into an integer, something that can be used as a 'categorical' learning type feature (for instance, in order to use it in an embedding). We can use a binning feature to achieve this. Binning divides the total range of values in the base feature (the feature to bin) into equal parts, for instance 0->10 is bin-0, 10->20 is bin-1, etc.

In [21]:
card = ft.FeatureSource('Card', ft.FEATURE_TYPE_STRING)
merchant = ft.FeatureSource('Merchant', ft.FEATURE_TYPE_STRING)
amount = ft.FeatureSource('Amount', ft.FEATURE_TYPE_FLOAT)
date = ft.FeatureSource('Date', ft.FEATURE_TYPE_DATE, format_code='%Y%m%d')
mcc = ft.FeatureSource('MCC', ft.FEATURE_TYPE_CATEGORICAL, default='0000')
country = ft.FeatureSource('Country', ft.FEATURE_TYPE_CATEGORICAL)

# Define FeatureBin
amount_b = ft.FeatureBin(
    'Binned_Amount',         # Name of the feature
    ft.FEATURE_TYPE_INT_16,  # Data type of the feature, in this case a small Int. Must be INT for Bin Features
    amount,                  # The base feature, i.e. the feature to be binned
    3                        # Specify the number of bins we want
)

td = ft.TensorDefinition('Features', [date, card, merchant, amount, amount_b, mcc, country])

with en.EnginePandas(num_threads=1) as e:
    df = e.df_from_csv(td, file, inference=False)

df

2023-03-07 11:39:36.927 eng1n3.common.engine           INFO     Start Engine...
2023-03-07 11:39:36.928 eng1n3.pandas.pandasengine     INFO     Pandas Version : 1.5.3
2023-03-07 11:39:36.928 eng1n3.pandas.pandasengine     INFO     Numpy Version : 1.23.5
2023-03-07 11:39:36.929 eng1n3.pandas.pandasengine     INFO     Building Panda for : SourceFeatures from file ./data/intro_card.csv
2023-03-07 11:39:36.936 eng1n3.pandas.pandasengine     INFO     Reshaping DataFrame to: SourceFeatures


Unnamed: 0,Date,Card,Merchant,Amount,Binned_Amount,MCC,Country
0,2020-01-01,CARD-1,MRC-1,1.0,0,1,DE
1,2020-01-02,CARD-2,MRC-2,2.0,1,2,GB
2,2020-01-03,CARD-1,MRC-3,3.0,1,3,DE
3,2020-01-04,CARD-1,MRC-3,4.0,2,3,FR
4,2020-01-04,CARD-2,MRC-2,5.0,2,2,GB
5,2020-01-06,CARD-2,MRC-4,6.0,2,0,DE


The `Binned_Amount` feature is of type 'category', for optimal storage in the dataframe

In [16]:
df.dtypes

Date             datetime64[ns]
Card                     object
Merchant                 object
Amount                  float64
Binned_Amount          category
MCC                    category
Country                category
dtype: object

And we can ask the feature which bins it was using. It reports the maximum value of each bin. In this case:
- Anything <= 1 is bin `0`
- Anything > 1 and <= 3.5 is bin `1`
- Anything > 3.5 and <= the positive max float value is bin `2`

In [17]:
amount_b.bins

[1.0, 3.5, 1.7976931348623157e+308]
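
The reported bins and the `Binned_Amount` column can be reproduced with plain `numpy`. The sketch below is a reconstruction that matches the output shown here; it is an assumption about the logic, not the library's actual implementation.

```python
import numpy as np

amounts = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
number_of_bins = 3

# Evenly spaced upper bounds between the min and max of the base feature
edges = np.linspace(amounts.min(), amounts.max(), number_of_bins)
# Open the last bin upwards so larger unseen values still land in it
edges[-1] = np.finfo(np.float64).max

# side='left' puts a value in the first bin whose upper bound is >= the value
binned = np.searchsorted(edges, amounts, side='left')
print(edges.tolist())
print(binned.tolist())
```

Replacing the last bound with the maximum float means a value larger than anything seen during training is still assigned to the last bin.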

### FeatureConcat
It is also possible to concatenate two string features into a new feature, for instance the card and merchant features. We will see later on that this can be useful for tracking the behaviour of a specific card/merchant combination.
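
In plain pandas terms, the concatenation amounts to joining the two string columns element by element. The sketch below uses a made-up two-row frame and is only an illustration of the operation, not the `FeatureConcat` implementation.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Card': ['CARD-1', 'CARD-2'],
    'Merchant': ['MRC-1', 'MRC-2'],
})

# Element-wise string concatenation of the two columns
df_demo['Card_Merchant'] = df_demo['Card'] + df_demo['Merchant']
print(df_demo)
```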

In [23]:
card = ft.FeatureSource('Card', ft.FEATURE_TYPE_STRING)
merchant = ft.FeatureSource('Merchant', ft.FEATURE_TYPE_STRING)
amount = ft.FeatureSource('Amount', ft.FEATURE_TYPE_FLOAT)
date = ft.FeatureSource('Date', ft.FEATURE_TYPE_DATE, format_code='%Y%m%d')
mcc = ft.FeatureSource('MCC', ft.FEATURE_TYPE_CATEGORICAL, default='0000')
country = ft.FeatureSource('Country', ft.FEATURE_TYPE_CATEGORICAL)

# Define FeatureConcat
card_merchant = ft.FeatureConcat(
    'Card_Merchant',         # Name of the feature
    ft.FEATURE_TYPE_STRING,  # Data type of the feature, in this case a STRING. We can only concat strings.
    card,                    # The base feature, i.e. the first feature in the concat operation.
    merchant                 # The concat feature, i.e. the second feature in the concat operation
)

td = ft.TensorDefinition('Features', [date, card, merchant, card_merchant, amount, mcc, country])

with en.EnginePandas(num_threads=1) as e:
    df = e.df_from_csv(td, file, inference=False)

df

2023-03-07 11:39:59.994 eng1n3.common.engine           INFO     Start Engine...
2023-03-07 11:39:59.995 eng1n3.pandas.pandasengine     INFO     Pandas Version : 1.5.3
2023-03-07 11:39:59.995 eng1n3.pandas.pandasengine     INFO     Numpy Version : 1.23.5
2023-03-07 11:39:59.996 eng1n3.pandas.pandasengine     INFO     Building Panda for : SourceFeatures from file ./data/intro_card.csv
2023-03-07 11:40:00.005 eng1n3.pandas.pandasengine     INFO     Reshaping DataFrame to: SourceFeatures


Unnamed: 0,Date,Card,Merchant,Card_Merchant,Amount,MCC,Country
0,2020-01-01,CARD-1,MRC-1,CARD-1MRC-1,1.0,1,DE
1,2020-01-02,CARD-2,MRC-2,CARD-2MRC-2,2.0,2,GB
2,2020-01-03,CARD-1,MRC-3,CARD-1MRC-3,3.0,3,DE
3,2020-01-04,CARD-1,MRC-3,CARD-1MRC-3,4.0,3,FR
4,2020-01-04,CARD-2,MRC-2,CARD-2MRC-2,5.0,2,GB
5,2020-01-06,CARD-2,MRC-4,CARD-2MRC-4,6.0,0,DE


In [24]:
df.dtypes

Date             datetime64[ns]
Card                     object
Merchant                 object
Card_Merchant            object
Amount                  float64
MCC                    category
Country                category
dtype: object

### FeatureExpression
Even though there are multiple types of features, in some instances we may want to create a feature as a custom function of some other features. That is where the FeatureExpression comes in handy. It allows us to define a feature that takes a function and a list of parameter features as input, and executes the function on each row.

Please note that executing a function row by row may **not be as efficient** as doing true vectorized operations on the columns.
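
The difference between the two styles can be sketched with plain pandas. Both variants below give the same result; the first calls a Python function once per row, the second performs a single vectorized operation. This is an illustration only and says nothing about how `eng1n3` evaluates expressions internally.

```python
import pandas as pd

def double_fn(x: float) -> float:
    return x * 2

amounts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name='Amount')

# Row by row: one Python function call per value
row_by_row = amounts.apply(double_fn)
# Vectorized: one operation over the whole column
vectorized = amounts * 2

print(row_by_row.equals(vectorized))
```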

In [35]:
# Create a small Python function that doubles an input amount. 
# Note: The function must be available in the ROOT context of the engine!
def double_fn(x: float) -> float:
    return x * 2

card = ft.FeatureSource('Card', ft.FEATURE_TYPE_STRING)
merchant = ft.FeatureSource('Merchant', ft.FEATURE_TYPE_STRING)
amount = ft.FeatureSource('Amount', ft.FEATURE_TYPE_FLOAT)
date = ft.FeatureSource('Date', ft.FEATURE_TYPE_DATE, format_code='%Y%m%d')
mcc = ft.FeatureSource('MCC', ft.FEATURE_TYPE_CATEGORICAL, default='0000')
country = ft.FeatureSource('Country', ft.FEATURE_TYPE_CATEGORICAL)

# Define FeatureExpression
double_amount = ft.FeatureExpression(
    'Double_Amount',         # Name of the feature
    ft.FEATURE_TYPE_FLOAT,   # Data type of the feature, in this case a Float, the output of 'double_fn' is a float 
    double_fn,               # The expression we want to evaluate. Our 'double' function
    [amount]                 # The parameter features as list, in this case we only have one input, the 'x' value
)

td = ft.TensorDefinition('Features', [date, card, merchant, amount, double_amount, mcc, country])

with en.EnginePandas(num_threads=1) as e:
    df = e.df_from_csv(td, file, inference=False)

df


2023-03-07 11:51:21.793 eng1n3.common.engine           INFO     Start Engine...
2023-03-07 11:51:21.794 eng1n3.pandas.pandasengine     INFO     Pandas Version : 1.5.3
2023-03-07 11:51:21.794 eng1n3.pandas.pandasengine     INFO     Numpy Version : 1.23.5
2023-03-07 11:51:21.795 eng1n3.pandas.pandasengine     INFO     Building Panda for : SourceFeatures from file ./data/intro_card.csv
2023-03-07 11:51:21.798 eng1n3.pandas.pandasengine     INFO     Reshaping DataFrame to: SourceFeatures


Unnamed: 0,Date,Card,Merchant,Amount,Double_Amount,MCC,Country
0,2020-01-01,CARD-1,MRC-1,1.0,2.0,1,DE
1,2020-01-02,CARD-2,MRC-2,2.0,4.0,2,GB
2,2020-01-03,CARD-1,MRC-3,3.0,6.0,3,DE
3,2020-01-04,CARD-1,MRC-3,4.0,8.0,3,FR
4,2020-01-04,CARD-2,MRC-2,5.0,10.0,2,GB
5,2020-01-06,CARD-2,MRC-4,6.0,12.0,0,DE


### FeatureRatio
As the name suggests, the FeatureRatio creates a new feature by dividing a base feature (the numerator) by a denominator feature. This is generally faster than using a FeatureExpression as shown earlier, and it has some protection against division by zero.
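
A guarded division of this kind can be sketched with `numpy`. Note the assumption: the sketch returns 0 wherever the denominator is 0, which is one common convention; the text does not state which value FeatureRatio actually uses in that case.

```python
import numpy as np

numerator = np.array([2.0, 4.0, 6.0, 5.0])
denominator = np.array([1.0, 2.0, 3.0, 0.0])

# Divide only where the denominator is non-zero; elsewhere keep the 0 from 'out'
# (returning 0 for a zero denominator is an assumption made for this sketch)
ratio = np.divide(
    numerator,
    denominator,
    out=np.zeros_like(numerator),
    where=denominator != 0,
)
print(ratio.tolist())
```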

In [43]:
def double(x: float) -> float:
    return x * 2

card = ft.FeatureSource('Card', ft.FEATURE_TYPE_STRING)
merchant = ft.FeatureSource('Merchant', ft.FEATURE_TYPE_STRING)
amount = ft.FeatureSource('Amount', ft.FEATURE_TYPE_FLOAT)
date = ft.FeatureSource('Date', ft.FEATURE_TYPE_DATE, format_code='%Y%m%d')
mcc = ft.FeatureSource('MCC', ft.FEATURE_TYPE_CATEGORICAL, default='0000')
country = ft.FeatureSource('Country', ft.FEATURE_TYPE_CATEGORICAL)
double_amount = ft.FeatureExpression('Double_Amount', ft.FEATURE_TYPE_FLOAT, double, [amount])

# Define FeatureRatio
two = ft.FeatureRatio(
    'Two',                   # Name of the feature
    ft.FEATURE_TYPE_FLOAT_32,  # Data type of the feature, in this case a 32-bit float (for illustration only)
    double_amount,           # The numerator of our division, the output of the double function
    amount                   # The denominator of our division, the original amount
)

td = ft.TensorDefinition('Features', [date, card, merchant, amount, double_amount, two, mcc, country])

with en.EnginePandas(num_threads=1) as e:
    df = e.df_from_csv(td, file, inference=False)

df

2023-03-07 12:07:32.982 eng1n3.common.engine           INFO     Start Engine...
2023-03-07 12:07:32.983 eng1n3.pandas.pandasengine     INFO     Pandas Version : 1.5.3
2023-03-07 12:07:32.983 eng1n3.pandas.pandasengine     INFO     Numpy Version : 1.23.5
2023-03-07 12:07:32.984 eng1n3.pandas.pandasengine     INFO     Building Panda for : Features from file ./data/intro_card.csv
2023-03-07 12:07:32.991 eng1n3.pandas.pandasengine     INFO     Reshaping DataFrame to: Features


Unnamed: 0,Date,Card,Merchant,Amount,Double_Amount,Two,MCC,Country
0,2020-01-01,CARD-1,MRC-1,1.0,2.0,2.0,1,DE
1,2020-01-02,CARD-2,MRC-2,2.0,4.0,2.0,2,GB
2,2020-01-03,CARD-1,MRC-3,3.0,6.0,2.0,3,DE
3,2020-01-04,CARD-1,MRC-3,4.0,8.0,2.0,3,FR
4,2020-01-04,CARD-2,MRC-2,5.0,10.0,2.0,2,GB
5,2020-01-06,CARD-2,MRC-4,6.0,12.0,2.0,0,DE


For illustration purposes, the data type of the ratio feature was set to `float32`, whereas the original amounts are `float64`. Don't do this in a project; it can lead to overflow problems!

In [44]:
df.dtypes

Date             datetime64[ns]
Card                     object
Merchant                 object
Amount                  float64
Double_Amount           float64
Two                     float32
MCC                    category
Country                category
dtype: object
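
The warning about mixing `float64` and `float32` can be illustrated with plain `numpy`. The values below are made up for the demonstration: one is too large for `float32`, the other cannot be represented exactly in it.

```python
import numpy as np

# A float64 value beyond the float32 range (about 3.4e38) overflows to infinity
big = np.array([1e39], dtype=np.float64)
with np.errstate(over='ignore'):
    big32 = big.astype(np.float32)

# float32 has a 24-bit significand, so 2**24 + 1 cannot be stored exactly
precise = np.array([16777217.0], dtype=np.float64)
precise32 = precise.astype(np.float32)

print(big32[0], precise32[0])
```

Besides overflow, the second case shows that a narrower float type can also silently lose precision.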