Merge 7378928 into cef79bc
gyrdym committed Jun 21, 2020
2 parents cef79bc + 7378928 commit 980ef28
Showing 8 changed files with 143 additions and 111 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
# Changelog

## 14.0.0
- Breaking change:
- `CrossValidator`: `evaluate` method's API changed: it now returns a Future resolving with a Vector of scores instead
of a double value

## 13.10.0
- `LinearRegressor`:
- `Default constructor`: `collectLearningData` parameter added
43 changes: 23 additions & 20 deletions README.md
@@ -86,7 +86,7 @@ final targetColumnName = 'class variable (0 or 1)';
````

Then we should create an instance of `CrossValidator` class to fit [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning))
our model. We should pass training data (our `samples` variable), a list of target column names (in our case it's
of our model. We should pass training data (our `samples` variable), a list of target column names (in our case it's
just a name stored in `targetColumnName` variable) and a number of folds into CrossValidator constructor.

````dart
@@ -98,20 +98,24 @@ All are set, so we can do our classification.
Evaluate our model via accuracy metric:

````dart
final accuracy = validator.evaluate((samples, targetNames) =>
final scores = await validator.evaluate((samples, targetNames) =>
LogisticRegressor(
samples,
targetNames[0], // remember, we provided a list of just a single name
optimizerType: LinearOptimizerType.gradient,
initialLearningRate: .8,
iterationsLimit: 500,
batchSize: samples.rows.length,
fitIntercept: true,
interceptScale: .1,
learningRateType: LearningRateType.constant
optimizerType: LinearOptimizerType.gradient,
learningRateType: LearningRateType.decreasingAdaptive,
probabilityThreshold: 0.7,
randomSeed: 3,
), MetricType.accuracy);
````

Since the CrossValidator's instance returns a Vector of scores as a result of our predictor evaluation, we may choose
any way to reduce all the collected scores to a single number, for instance we may use Vector's `mean` method:

```dart
final accuracy = scores.mean();
```
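
The reduction step described above is not limited to `mean`. As a minimal sketch, assuming the per-fold scores have been copied into a plain `List<double>` (a stand-in for the library's `Vector`, whose own API is not used here), a few other summaries might look like this. The helper names `meanOf` and `stdDevOf` and the sample scores are hypothetical:

```dart
import 'dart:math' as math;

/// Arithmetic mean of the per-fold scores.
double meanOf(List<double> scores) =>
    scores.reduce((a, b) => a + b) / scores.length;

/// Population standard deviation of the per-fold scores.
double stdDevOf(List<double> scores) {
  final mean = meanOf(scores);
  final squaredDeviations =
      scores.map((score) => (score - mean) * (score - mean));
  return math.sqrt(squaredDeviations.reduce((a, b) => a + b) / scores.length);
}

void main() {
  // Made-up fold scores, used only to demonstrate the reductions.
  final foldScores = <double>[0.5, 0.75, 1.0, 0.75];

  print('mean: ${meanOf(foldScores)}');
  print('worst fold: ${foldScores.reduce(math.min)}');
  print('best fold: ${foldScores.reduce(math.max)}');
  print('std dev: ${stdDevOf(foldScores)}');
}
```

Reporting the spread next to the mean can help show how much the score depends on the particular split.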

Let's print the score:
````dart
print('accuracy on classification: ${accuracy.toStringAsFixed(2)}');
@@ -120,7 +124,7 @@ print('accuracy on classification: ${accuracy.toStringAsFixed(2)}');
We will see something like this:

````
accuracy on classification: 0.77
accuracy on classification: 0.65
````

All the code above all together:
@@ -134,18 +138,16 @@ Future main() async {
final samples = await fromCsv('datasets/pima_indians_diabetes_database.csv', headerExists: true);
final targetColumnName = 'class variable (0 or 1)';
final validator = CrossValidator.kFold(samples, [targetColumnName], numberOfFolds: 5);
final accuracy = validator.evaluate((samples, targetNames) =>
final scores = await validator.evaluate((samples, targetNames) =>
LogisticRegressor(
samples,
targetNames[0], // remember, we provide a list of just a single name
optimizerType: LinearOptimizerType.gradient,
initialLearningRate: .8,
iterationsLimit: 500,
batchSize: 768,
fitIntercept: true,
interceptScale: .1,
learningRateType: LearningRateType.constant
optimizerType: LinearOptimizerType.gradient,
learningRateType: LearningRateType.decreasingAdaptive,
probabilityThreshold: 0.7,
randomSeed: 3,
), MetricType.accuracy);
final accuracy = scores.mean();
print('accuracy on classification: ${accuracy.toStringAsFixed(2)}');
}
@@ -202,14 +204,15 @@ Let the `k` parameter be equal to `4`.
Assess a knn regressor with the chosen `k` value using MAPE metric

````dart
final error = validator.evaluate((samples, targetNames) =>
final scores = await validator.evaluate((samples, targetNames) =>
KnnRegressor(samples, targetNames[0], 4), MetricType.mape);
final averageError = scores.mean();
````

Let's print our error

````dart
print('MAPE error on k-fold validation: ${error.toStringAsFixed(2)}%'); // it yields approx. 6.18
print('MAPE error on k-fold validation: ${averageError.toStringAsFixed(2)}%'); // it yields approx. 6.18
````
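
For reference, MAPE (mean absolute percentage error) averages the absolute relative errors of the predictions. The following is a minimal sketch of that textbook definition on plain lists, not the library's implementation. The function name `mape`, the sample values, and the choice to scale the result to percent are assumptions:

```dart
/// Mean absolute percentage error, expressed in percent.
/// Assumes that no actual value is zero.
double mape(List<double> actual, List<double> predicted) {
  if (actual.length != predicted.length || actual.isEmpty) {
    throw ArgumentError('Both lists must be non-empty and of equal length');
  }
  var sum = 0.0;
  for (var i = 0; i < actual.length; i++) {
    // Relative error of a single prediction.
    sum += ((actual[i] - predicted[i]) / actual[i]).abs();
  }
  return 100 * sum / actual.length;
}

void main() {
  // Made-up values, used only to demonstrate the formula.
  final actual = <double>[100, 200, 400];
  final predicted = <double>[110, 180, 400];
  print('MAPE: ${mape(actual, predicted).toStringAsFixed(2)}%');
}
```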

### Contacts
@@ -0,0 +1,10 @@
class InvalidTestDataColumnsNumberException implements Exception {
InvalidTestDataColumnsNumberException(int expected, int received) :
message = 'Unexpected columns number in test data, '
'expected $expected, received ${received}';

final String message;

@override
String toString() => message;
}
@@ -0,0 +1,10 @@
class InvalidTrainDataColumnsNumberException implements Exception {
InvalidTrainDataColumnsNumberException(int expected, int received) :
message = 'Unexpected columns number in train data, '
'expected $expected, received ${received}';

final String message;

@override
String toString() => message;
}
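
Dedicated exception types let a caller react to a column-count mismatch specifically instead of catching a bare `Exception`. A minimal sketch of how that might look follows. The class is copied from the hunk above so that the example is self-contained, while `checkTrainColumns` is a hypothetical helper that mimics the check performed during evaluation:

```dart
class InvalidTrainDataColumnsNumberException implements Exception {
  InvalidTrainDataColumnsNumberException(int expected, int received)
      : message = 'Unexpected columns number in train data, '
            'expected $expected, received $received';

  final String message;

  @override
  String toString() => message;
}

/// Hypothetical helper: throws if a preprocessing step changed the number
/// of columns in the train data.
void checkTrainColumns(int expected, int received) {
  if (received != expected) {
    throw InvalidTrainDataColumnsNumberException(expected, received);
  }
}

void main() {
  try {
    // Pretend that preprocessing dropped one of nine columns.
    checkTrainColumns(9, 8);
  } on InvalidTrainDataColumnsNumberException catch (error) {
    print('Preprocessing changed the column count: $error');
  }
}
```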
36 changes: 20 additions & 16 deletions lib/src/model_selection/cross_validator/cross_validator.dart
@@ -23,8 +23,8 @@ abstract class CrossValidator {
///
/// Parameters:
///
/// [samples] The whole training dataset to be split into parts to iteratively
/// evaluate given predictor on the each particular part
/// [samples] A dataset to be split into parts to iteratively evaluate given
/// predictor's performance
///
/// [targetColumnNames] Names of columns from [samples] that contain outcomes
///
@@ -57,8 +57,8 @@ abstract class CrossValidator {
///
/// Parameters:
///
/// [samples] The whole training dataset to be split into parts to iteratively
/// evaluate given model on the each particular part.
/// [samples] A dataset to be split into parts to iteratively
/// evaluate given predictor's performance
///
/// [targetColumnNames] Names of columns from [samples] that contain outcomes.
///
@@ -83,21 +83,21 @@ abstract class CrossValidator {
);
}

/// Returns a score of quality of passed predictor depending on given
/// [metricType]
/// Returns a future resolving with a vector of scores of quality of passed
/// predictor depending on given [metricType]
///
/// Parameters:
///
/// [predictorFactory] A factory function that returns a testing predictor
/// [predictorFactory] A factory function that returns a predictor to be evaluated
///
/// [metricType] Metric to assess a predictor, that is being created by
/// [metricType] Metric used to assess a predictor created by
/// [predictorFactory]
///
/// [onDataSplit] A callback that is called when a new train-test split is
/// ready to be passed into evaluating predictor. One may place some
/// additional data-dependent logic here, e.g., data preprocessing. The
/// callback accepts train and test data from a new split and returns
/// transformed split as list, where the first element is training data and
/// transformed split as list, where the first element is train data and
/// the second one - test data, both of [DataFrame] type. This new transformed
/// split will be passed into the predictor.
///
@@ -115,26 +115,30 @@ abstract class CrossValidator {
/// header: header,
/// headerExists: false,
/// );
///
/// final predictorFactory = (trainData, _) =>
/// KnnRegressor(trainData, 'col_3', k: 4);
///
/// final onDataSplit = (trainData, testData) {
/// final standardizer = Standardizer(trainData);
/// return [
/// standardizer.process(trainData),
/// standardizer.process(testData),
/// ];
/// };
///
/// final validator = CrossValidator.kFold(data, ['col_3']);
/// final score = validator.evaluate(
/// final scores = await validator.evaluate(
/// predictorFactory,
/// MetricType.mape,
/// onDataSplit: onDataSplit,
/// );
/// final averageScore = scores.mean();
///
/// print(averageScore);
/// ````
double evaluate(PredictorFactory predictorFactory, MetricType metricType, {
DataPreprocessFn onDataSplit,
});
Future<Vector> evaluate(
PredictorFactory predictorFactory,
MetricType metricType,
{
DataPreprocessFn onDataSplit,
}
);
}
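
The new contract (one score per fold, delivered through a future) can be sketched in plain Dart without the library types. In the sketch below, `FoldScorer`, `splitIntoFolds` and the top-level `evaluate` are hypothetical stand-ins, `List<double>` replaces `Vector`, and the contiguous split is an assumption that need not match the library's data splitter:

```dart
/// Hypothetical scorer: receives train and test row indices, returns a score.
typedef FoldScorer = double Function(
    List<int> trainIndices, List<int> testIndices);

/// Splits row indices 0..rowCount-1 into contiguous test groups.
List<List<int>> splitIntoFolds(int rowCount, int numberOfFolds) {
  final folds = <List<int>>[];
  for (var fold = 0; fold < numberOfFolds; fold++) {
    final start = fold * rowCount ~/ numberOfFolds;
    final end = (fold + 1) * rowCount ~/ numberOfFolds;
    folds.add([for (var i = start; i < end; i++) i]);
  }
  return folds;
}

/// Maps every fold to a score and wraps the collected scores in a future.
Future<List<double>> evaluate(
    int rowCount, int numberOfFolds, FoldScorer scorer) {
  final scores = splitIntoFolds(rowCount, numberOfFolds).map((testIndices) {
    final testSet = testIndices.toSet();
    // Every row that is not in the test group belongs to the train part.
    final trainIndices = [
      for (var i = 0; i < rowCount; i++)
        if (!testSet.contains(i)) i
    ];
    return scorer(trainIndices, testIndices);
  }).toList();
  return Future.value(scores);
}

Future<void> main() async {
  // Toy scorer: the fraction of rows that ended up in the train part.
  final scores = await evaluate(
      10, 5, (train, test) => train.length / (train.length + test.length));
  print(scores);
}
```

Returning the individual scores leaves the choice of aggregation (mean, worst fold, spread) to the caller, which is the motivation given for the breaking change.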
80 changes: 38 additions & 42 deletions lib/src/model_selection/cross_validator/cross_validator_impl.dart
@@ -1,3 +1,5 @@
import 'package:ml_algo/src/common/exception/invalid_test_data_columns_number_exception.dart';
import 'package:ml_algo/src/common/exception/invalid_train_data_columns_number_exception.dart';
import 'package:ml_algo/src/metric/metric_type.dart';
import 'package:ml_algo/src/model_selection/cross_validator/cross_validator.dart';
import 'package:ml_algo/src/model_selection/data_splitter/data_splitter.dart';
@@ -21,54 +23,48 @@ class CrossValidatorImpl implements CrossValidator {
final DataSplitter _splitter;

@override
double evaluate(PredictorFactory predictorFactory, MetricType metricType, {
DataPreprocessFn onDataSplit,
}) {
Future<Vector> evaluate(
PredictorFactory predictorFactory,
MetricType metricType,
{
DataPreprocessFn onDataSplit,
}
) {
final samplesAsMatrix = samples.toMatrix(dtype);
final sourceColumnsNum = samplesAsMatrix.columnsNum;

final discreteColumns = enumerate(samples.series)
.where((indexedSeries) => indexedSeries.value.isDiscrete)
.map((indexedSeries) => indexedSeries.index);

final allIndicesGroups = _splitter.split(samplesAsMatrix.rowsNum);
var score = 0.0;
var folds = 0;

for (final testRowsIndices in allIndicesGroups) {
final split = _makeSplit(testRowsIndices, discreteColumns);
final trainDataFrame = split[0];
final testDataFrame = split[1];

final splits = onDataSplit != null
? onDataSplit(trainDataFrame, testDataFrame)
: [trainDataFrame, testDataFrame];

final transformedTrainData = splits[0];
final transformedTestData = splits[1];

final transformedTrainDataColumnsNum = transformedTrainData.header.length;
final transformedTestDataColumnsNum = transformedTestData.header.length;

if (transformedTrainDataColumnsNum != sourceColumnsNum) {
throw Exception('Unexpected columns number in training data: '
'expected $sourceColumnsNum, received '
'${transformedTrainDataColumnsNum}');
}

if (transformedTestDataColumnsNum != sourceColumnsNum) {
throw Exception('Unexpected columns number in testing data: '
'expected $sourceColumnsNum, received '
'${transformedTestDataColumnsNum}');
}

score += predictorFactory(transformedTrainData, targetNames)
.assess(transformedTestData, targetNames, metricType);

folds++;
}

return score / folds;
final scores = allIndicesGroups
.map((testRowsIndices) {
final split = _makeSplit(testRowsIndices, discreteColumns);
final trainDataFrame = split[0];
final testDataFrame = split[1];
final splits = onDataSplit != null
? onDataSplit(trainDataFrame, testDataFrame)
: [trainDataFrame, testDataFrame];
final transformedTrainData = splits[0];
final transformedTestData = splits[1];
final transformedTrainDataColumnsNum = transformedTrainData.header.length;
final transformedTestDataColumnsNum = transformedTestData.header.length;

if (transformedTrainDataColumnsNum != sourceColumnsNum) {
throw InvalidTrainDataColumnsNumberException(sourceColumnsNum,
transformedTrainDataColumnsNum);
}

if (transformedTestDataColumnsNum != sourceColumnsNum) {
throw InvalidTestDataColumnsNumberException(sourceColumnsNum,
transformedTestDataColumnsNum);
}

return predictorFactory(transformedTrainData, targetNames)
.assess(transformedTestData, targetNames, metricType);
})
.toList();

return Future.value(Vector.fromList(scores, dtype: dtype));
}

List<DataFrame> _makeSplit(Iterable<int> testRowsIndices,
2 changes: 1 addition & 1 deletion pubspec.yaml
@@ -1,6 +1,6 @@
name: ml_algo
description: Machine learning algorithms written in native dart
version: 13.10.0
version: 14.0.0
homepage: https://github.com/gyrdym/ml_algo

environment:
