
Base package for the definition of various types of features


t0kk35/f3atur3s


f3atur3s


Description

This is the base package for the eng1n3 and m0d3l packages. Features are the data points used in detection models. A feature needs to be defined before it can be built by an engine and used in a m0d3l.

Features have properties that define how they will be built. Each feature has a class, and the class of a feature determines its building logic. For instance, a feature of type FeatureSource is a feature that can be found directly in a source, whereas a feature of type FeatureConcat concatenates 2 other features.
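To illustrate how a feature's class decides its building logic, here is a minimal sketch in plain pandas. The classes `SourceSketch` and `ConcatSketch` and their `build` method are hypothetical stand-ins, not the f3atur3s API; they only mirror the FeatureSource and FeatureConcat behaviour described above.

```python
from dataclasses import dataclass

import pandas as pd


# Hypothetical stand-in for FeatureSource; NOT the f3atur3s API.
@dataclass(frozen=True)
class SourceSketch:
    name: str

    def build(self, df: pd.DataFrame) -> pd.Series:
        # A source feature is read directly from the input data.
        return df[self.name]


# Hypothetical stand-in for FeatureConcat; NOT the f3atur3s API.
@dataclass(frozen=True)
class ConcatSketch:
    name: str
    first: SourceSketch
    second: SourceSketch

    def build(self, df: pd.DataFrame) -> pd.Series:
        # Build both input features first, then concatenate them as strings.
        joined = self.first.build(df).astype(str) + self.second.build(df).astype(str)
        return joined.rename(self.name)


df = pd.DataFrame({"Country": ["ES", "GB"], "Channel": ["WEB", "POS"]})
combo = ConcatSketch("Country_Channel", SourceSketch("Country"), SourceSketch("Channel"))
print(combo.build(df).tolist())  # ['ESWEB', 'GBPOS']
```

The definition objects hold no data themselves; values only appear when something calls `build` on them. In the real packages that building step is done by an engine (eng1n3), as described above.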

Each feature has a rank, set when it is created. Rank 2 features create a 2-dimensional Batch x Feature tensor. Rank 3 features create a 3-dimensional structure, for instance a Batch x Series x Feature tensor.
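A rank 3 structure can be produced from a rank 2 one by stacking each sample with its predecessors, which is the idea behind the FeatureSeriesStacked class in the table of feature classes. The sketch below uses plain NumPy and assumes two things the README does not specify: samples with too few predecessors are zero-padded, and each window is ordered oldest first. The `stack_series` helper is hypothetical.

```python
import numpy as np


def stack_series(x: np.ndarray, previous: int) -> np.ndarray:
    """Turn a (Batch x Feature) array into a (Batch x Series x Feature) array.

    Hypothetical helper: each window holds the `previous` earlier rows followed
    by the row itself, so the Series dimension has length previous + 1.
    Rows without enough history are zero-padded at the start (an assumption).
    """
    batch, n_features = x.shape
    padding = np.zeros((previous, n_features), dtype=x.dtype)
    padded = np.vstack([padding, x])
    # Window i covers padded rows i .. i + previous, ending at original row i.
    return np.stack([padded[i:i + previous + 1] for i in range(batch)])


rank2 = np.arange(10, dtype=np.float32).reshape(5, 2)  # 5 samples x 2 features
rank3 = stack_series(rank2, previous=3)
print(rank2.ndim, rank2.shape)  # 2 (5, 2)
print(rank3.ndim, rank3.shape)  # 3 (5, 4, 2)
print(rank3[4].tolist())  # [[2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
```

With 5 samples, 2 features and 3 previous samples per window, this yields the 5x4x2 shape given in the FeatureSeriesStacked example.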

The following feature classes currently exist:

| Name | Dimension | Description |
|---|---|---|
| FeatureSource | (BxF) | A feature found directly in a source of information, for instance a file. |
| FeatureOneHot | (BxF) | Creates a one-hot encoding of a feature. It turns a categorical feature with a relatively small cardinality into something a model can use. For instance, say we have a file with 3 rows and one column named 'Country', with row values 'ES', 'GB' and 'DE'. A FeatureOneHot turns this column into 3 separate columns: 'Country_ES', 'Country_GB' and 'Country_DE'. The first row will have column values 1,0,0, the second row 0,1,0 and the third 0,0,1. |
| FeatureIndex | (BxF) | Also used on categorical features, but unlike FeatureOneHot it can also be applied to relatively high-cardinality categorical features. It maps each unique value in the input to an integer index. For instance, say we have a file with 4 rows and one column named 'Country', with row values 'ES', 'GB', 'DE' and 'ES'. A FeatureIndex maps ES->1, GB->2, DE->3, so the rows in our file become 1,2,3,1. |
| FeatureBin | (BxF) | Turns a continuous feature (for instance an amount) into a categorical feature (an integer index). It divides the total range of the base feature into slices and assigns an integer to each slice. |
| FeatureRatio | (BxF) | Calculates the ratio of 2 other numerical features. It takes 2 numerical features as input (a base and a denominator feature) and divides the base by the denominator. |
| FeatureLabelBinary | (BxF) | Wrapper feature. It wraps a FeatureSource of numerical type. It does not transform the feature, but tells the model which feature(s) contain the label(s) to target. |
| FeatureConcat | (BxF) | Concatenates 2 string features into a new string. |
| FeatureExpression | (BxF) | A feature built from an expression, i.e. a piece of code. The code is standard Python code and can take other features as input parameters. |
| FeatureNormalizeScale | (BxF) | Normalizes a float base feature by scaling all its values to the range 0 to 1, using the formula $$x_{scaled}={x-x_{min} \over x_{max}-x_{min}}$$ |
| FeatureNormalizeStandard | (BxF) | Normalizes a float base feature by centering its values around 0 with a standard deviation of 1, using the formula $$x_{standard}={x-\mu \over \sigma}$$ where $\mu$ is the mean and $\sigma$ the standard deviation of the base feature. |
| FeatureDateTimeFormat | (BxF) | Re-formats DateTime features using Python 'strftime' format codes. It can, for instance, be used to extract the day of the week or the day of the month from a date. |
| FeatureSeriesStacked | (BxSxF) | Stacks a set of features into a series. This creates a sort of sliding window over the samples: each sample in the Batch dimension will contain the X previous samples. Say we have a file with 5 samples and 2 features (5x2). If we create a stacked series of size 3, we get a 5x4x2 tensor: each of the 5 samples contains the current sample's 2 features plus the 2 features of each of the 3 previous samples. |
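Most of the BxF transformations in the table map onto standard pandas operations. The sketch below reproduces the OneHot, Index, Bin, Ratio, NormalizeScale, NormalizeStandard and DateTimeFormat ideas on a small made-up frame. It is not how f3atur3s or eng1n3 implement them; choices such as 3 equal-width bins, the population standard deviation (`ddof=0`) and the `%w` weekday code are assumptions for illustration.

```python
import pandas as pd

# Made-up example data.
df = pd.DataFrame({
    "Country": ["ES", "GB", "DE", "ES"],
    "Amount": [10.0, 250.0, 40.0, 100.0],
    "Fee": [1.0, 5.0, 2.0, 4.0],
    "Date": pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-05", "2021-03-07"]),
})

# One-hot: one 0/1 column per distinct value (pandas orders the columns alphabetically).
one_hot = pd.get_dummies(df["Country"], prefix="Country", dtype=int)

# Index: each unique value gets an integer in order of first appearance, starting at 1.
codes, uniques = pd.factorize(df["Country"])
country_index = codes + 1

# Bin: split the Amount range into 3 equal-width slices and return the slice number.
amount_bin = pd.cut(df["Amount"], bins=3, labels=False)

# Ratio: divide a base feature by a denominator feature.
amount_fee_ratio = df["Amount"] / df["Fee"]

# Scale to [0, 1]: (x - min) / (max - min).
amount = df["Amount"]
amount_scaled = (amount - amount.min()) / (amount.max() - amount.min())

# Standardize: (x - mean) / std, using the population standard deviation here.
amount_standard = (amount - amount.mean()) / amount.std(ddof=0)

# DateTime format: '%w' gives the weekday as a digit, Sunday = 0.
day_of_week = df["Date"].dt.strftime("%w")

print(one_hot.columns.tolist())   # ['Country_DE', 'Country_ES', 'Country_GB']
print(country_index.tolist())     # [1, 2, 3, 1]
print(amount_bin.tolist())        # [0, 2, 0, 1]
print(amount_fee_ratio.tolist())  # [10.0, 50.0, 20.0, 25.0]
print(amount_scaled.tolist())     # [0.0, 1.0, 0.125, 0.375]
print(day_of_week.tolist())       # ['1', '2', '5', '0']
```

Note that `get_dummies` orders the new columns alphabetically, so the column order differs from the 'Country_ES', 'Country_GB', 'Country_DE' order used in the table, while `factorize` keeps first-appearance order and so reproduces the ES->1, GB->2, DE->3 mapping.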

Requirements

  • Numpy
  • Pandas
  • Numba
