README updated according to new entity - Standardizer
gyrdym committed Oct 10, 2019
1 parent ea0e408 commit 1435ce5
Showing 1 changed file with 23 additions and 2 deletions.
25 changes: 23 additions & 2 deletions README.md
@@ -149,6 +149,25 @@ final transformed = normalizer.process(dataFrame);
Please notice: if your data has raw categorical values, normalization will fail, as it requires numerical values only.
In this case you should encode the data (e.g. using one-hot encoding) before normalizing it.

### Data standardization

Many machine learning algorithms expect standardized input, i.e. data in which every feature column has zero mean and
unit variance. You can meet this requirement with the `Standardizer` class. When a `Standardizer` is created, it
extracts the mean and the standard deviation of every column of the passed data and stores them as fields, so that it
can later apply them to standardize other data (or the same data that was used to create it):

````dart
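// Three feature columns on very different scales; the data has no header row.
final dataFrame = DataFrame([
  [  1,   2,   3],
  [ 10,  20,  30],
  [100, 200, 300],
], headerExists: false);

// Creating the standardizer extracts every column's mean and standard deviation.
final standardizer = Standardizer(dataFrame);

// The stored statistics are applied to the passed data (here, the same data).
final transformed = standardizer.process(dataFrame);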
````
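
To make the fit-then-apply behaviour concrete, here is a minimal, self-contained Dart sketch of column-wise z-score
standardization, z = (x - mean) / std, on plain lists of rows instead of a `DataFrame`. `ColumnStandardizer` is a
hypothetical name and this is not the package's `Standardizer` implementation; the sketch assumes the population
standard deviation (dividing by n) and maps constant columns to 0, choices the real class may make differently.

````dart
import 'dart:math' as math;

/// Hypothetical, minimal column-wise z-score standardizer over row-major
/// numeric data (assumes a non-empty, rectangular list of rows).
/// Not the package's `Standardizer`; it only illustrates fit-then-apply.
class ColumnStandardizer {
  /// Per-column means extracted at fit time.
  final List<double> means;

  /// Per-column population standard deviations extracted at fit time.
  final List<double> deviations;

  ColumnStandardizer._(this.means, this.deviations);

  /// "Fitting": computes and stores the mean and the population standard
  /// deviation (dividing by n, an assumption of this sketch) of every column.
  factory ColumnStandardizer.fit(List<List<num>> rows) {
    final columnCount = rows.first.length;
    final means = List<double>.filled(columnCount, 0.0);
    final deviations = List<double>.filled(columnCount, 0.0);
    for (var col = 0; col < columnCount; col++) {
      final values = [for (final row in rows) row[col].toDouble()];
      final mean = values.reduce((a, b) => a + b) / values.length;
      final variance =
          values.map((v) => (v - mean) * (v - mean)).reduce((a, b) => a + b) /
              values.length;
      means[col] = mean;
      deviations[col] = math.sqrt(variance);
    }
    return ColumnStandardizer._(means, deviations);
  }

  /// Applies the stored statistics: z = (x - mean) / deviation.
  /// Constant columns (deviation 0) map to 0 to avoid dividing by zero.
  List<List<double>> process(List<List<num>> rows) => [
        for (final row in rows)
          [
            for (var col = 0; col < row.length; col++)
              deviations[col] == 0
                  ? 0.0
                  : (row[col] - means[col]) / deviations[col]
          ]
      ];
}
````

For the example data above, `ColumnStandardizer.fit(rows).process(rows)` yields columns with zero mean and unit
variance; since the second and third columns are multiples of the first, the sketch standardizes all three to the same
values.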

### Pipeline

There is a convenient way to organize a sequence of data preprocessing operations, the `Pipeline` class:
@@ -158,6 +177,7 @@ final pipeline = Pipeline(dataFrame, [
encodeAsOneHotLabels(featureNames: ['Gender', 'Age', 'City_Category']),
encodeAsIntegerLabels(featureNames: ['Stay_In_Current_City_Years', 'Marital_Status']),
normalize(),
standardize(),
]);
````

@@ -167,5 +187,6 @@ Once you create (or rather fit) a pipeline, you may use it farther in your appli
final processed = pipeline.process(dataFrame);
````

`encodeAsOneHotLabels`, `encodeAsIntegerLabels` and `normalize` are pipeable operator functions. Pipeable operator
function is a factory, that takes fitting data and creates a fitted pipeable entity (e.g., `Normalizer` instance)
`encodeAsOneHotLabels`, `encodeAsIntegerLabels`, `normalize` and `standardize` are pipeable operator functions.
A pipeable operator function is a factory that takes fitting data and creates a fitted pipeable entity (e.g., a
`Normalizer` instance).
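
The factory idea can be illustrated with a small, self-contained Dart sketch on plain numeric rows rather than a
`DataFrame`. Every name in it (`Rows`, `Pipeable`, `PipeableOperator`, `MeanCenterer`, `centerMeans`,
`SimplePipeline`) is hypothetical and not this package's API, and the sketch assumes that each operator is fitted on
the output of the steps fitted before it; the real `Pipeline` may behave differently.

````dart
/// Data in this sketch: plain numeric rows (an assumption; the package
/// works with `DataFrame`).
typedef Rows = List<List<num>>;

/// A fitted, reusable preprocessing step (hypothetical interface).
abstract class Pipeable {
  Rows process(Rows data);
}

/// A pipeable operator function: a factory that takes fitting data and
/// returns a fitted [Pipeable].
typedef PipeableOperator = Pipeable Function(Rows fittingData);

/// Hypothetical fitted step that subtracts the column means it learned.
class MeanCenterer implements Pipeable {
  final List<num> means;

  MeanCenterer(this.means);

  @override
  Rows process(Rows data) => [
        for (final row in data)
          [for (var col = 0; col < row.length; col++) row[col] - means[col]]
      ];
}

/// Hypothetical operator function in the style of `normalize()` or
/// `standardize()`: calling it returns the factory, not a fitted step.
PipeableOperator centerMeans() => (Rows fittingData) {
      final columnCount = fittingData.first.length;
      return MeanCenterer([
        for (var col = 0; col < columnCount; col++)
          fittingData.map((row) => row[col]).reduce((a, b) => a + b) /
              fittingData.length
      ]);
    };

/// Minimal pipeline sketch. Assumption: each operator is fitted on the
/// output of the steps fitted before it.
class SimplePipeline {
  final List<Pipeable> steps = [];

  SimplePipeline(Rows fittingData, List<PipeableOperator> operators) {
    var current = fittingData;
    for (final createStep in operators) {
      final step = createStep(current);
      steps.add(step);
      current = step.process(current);
    }
  }

  /// Replays every fitted step, in order, on new data.
  Rows process(Rows data) =>
      steps.fold<Rows>(data, (current, step) => step.process(current));
}
````

In this sketch, `SimplePipeline(fittingRows, [centerMeans(), centerMeans()])` fits both steps once, and later
`process` calls only replay the stored statistics on new data.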
