Dataframe driven API for predictors #111

Merged
merged 66 commits into from Oct 6, 2019
Changes from 64 commits
ae83b7a
logistic regressor api changed, xrange 0.0.8 supported
gyrdym Sep 23, 2019
360073e
documentation for the new parameters for Logistic Regressor added
gyrdym Sep 23, 2019
dd74964
softmax regressor api changed
gyrdym Sep 23, 2019
57e1aa0
linear regressor constructor api changed to dataframe based
gyrdym Sep 24, 2019
e034d6f
default parameters added for predictor constructors
gyrdym Sep 24, 2019
5f8df23
knn regressor constructor api changed
gyrdym Sep 24, 2019
f8b3b0c
featuresTargetSplit return type changed, tests for it added
gyrdym Sep 25, 2019
554fc04
KNN regressor tests fixed
gyrdym Sep 25, 2019
1043a90
common assessable mixin added and supported
gyrdym Sep 28, 2019
bae3a26
cross validation unit tests fixed
gyrdym Sep 28, 2019
fc6f427
logistic regressor and knn regressor unit tests fixed
gyrdym Sep 28, 2019
eea40dc
WIP: classifiers constructor api changed
gyrdym Sep 28, 2019
451cda2
WIP: logistic regression integration tests fixed, factories for cost …
gyrdym Sep 29, 2019
baba419
logistic regressor unit tests fixed
gyrdym Sep 29, 2019
c8dba8a
linear classifier interface extended
gyrdym Sep 29, 2019
da5461f
link function passed from logistic regressor factory to logistic regr…
gyrdym Sep 29, 2019
4d94744
linear optimizer moved to top level of the project directory
gyrdym Sep 29, 2019
61f363d
decision tree solver moved to top level of the project directory
gyrdym Sep 29, 2019
c0d1a07
softmax regressor unit tests fixed
gyrdym Sep 29, 2019
551890e
logistic regressor unit tests reorganized
gyrdym Sep 29, 2019
a8441cc
logistic regressor unit tests: fitIntercept=true case added
gyrdym Sep 29, 2019
6d2f919
softmax regressor unit tests reorganized
gyrdym Sep 29, 2019
0349a3b
linear regressor factory reorganized
gyrdym Sep 29, 2019
f62bedc
linear regressor unit tests added
gyrdym Sep 30, 2019
d75e2e7
unit tests for gradient descent optimizer fixed
gyrdym Oct 1, 2019
62984da
travis dart sdk version changed to 2.3.0
gyrdym Oct 1, 2019
a13434c
travis dart sdk version changed to 2.5.0
gyrdym Oct 1, 2019
745e624
pubspec.yaml: sdk version updated, unit tests: tearDownAll added to s…
gyrdym Oct 1, 2019
847fcf7
tearDown behavior changed for gradient descent optimizer unit tests
gyrdym Oct 1, 2019
33ca82b
linear optimizer directory reorganized
gyrdym Oct 1, 2019
a7dc0de
tearDown behavior changed for unit tests
gyrdym Oct 1, 2019
4d61218
unit tests for softmax regressor predict method added
gyrdym Oct 1, 2019
580e1bf
initial_weights_* prefix renamed to initial_coefficients* in the nami…
gyrdym Oct 1, 2019
494cd81
predictor api changed, method 'predict': returning type changed to Da…
gyrdym Oct 1, 2019
f71dcb8
getting rid of Parameterless regressor
gyrdym Oct 1, 2019
9e6b8d2
predictor api changed, method 'predict': input parameter type changed…
gyrdym Oct 1, 2019
6951ad4
classifier and regressor folders reorganized
gyrdym Oct 1, 2019
ca8af90
classLabels field removed from Classifier interface
gyrdym Oct 1, 2019
880b9d8
predictProbabilities method return type changed to DataFrame
gyrdym Oct 2, 2019
2936a5c
CostFunction: subDerevative method renamed to subGradient; unit tests…
gyrdym Oct 2, 2019
bf93f77
classifier test folder reorganized, unit test for LogisticRegressor.p…
gyrdym Oct 2, 2019
2f020be
LinearClassifier.predictProbabilities: input parameter type changed t…
gyrdym Oct 2, 2019
2dcb392
positive and negative labels parameters added to logistic and softmax…
gyrdym Oct 2, 2019
10c58a8
documentation for LogisticRegressor corrected and extended
gyrdym Oct 2, 2019
c6fe68c
softmax regressor documentation added and extended
gyrdym Oct 3, 2019
11f8aaf
decision tree classifier documentation added
gyrdym Oct 3, 2019
a50ac39
documentation for knn regressor added
gyrdym Oct 4, 2019
d6448b2
documentation for linear regression corrected
gyrdym Oct 4, 2019
6739fed
dtype removed from constructors of all predictors
gyrdym Oct 4, 2019
b834bc7
regularization type added
gyrdym Oct 4, 2019
c78e1fc
tests for logistic regressor and softmax regressor refactored
gyrdym Oct 5, 2019
07727ca
ml_dataframe 0.0.10 supported, unit tests for LogisticRegressor and S…
gyrdym Oct 5, 2019
df3dc0a
linear regressor unit tests refactored and extended
gyrdym Oct 5, 2019
67439d7
dtype returned to constructor parameters for all predictors
gyrdym Oct 5, 2019
25bf0d8
README fixed
gyrdym Oct 5, 2019
b54e071
regularization type and linear optimizer type added to exports
gyrdym Oct 5, 2019
c1269be
coordinate descent optimizer returning matrix shape fixed
gyrdym Oct 5, 2019
2f9cc83
decision tree classifier added to export file
gyrdym Oct 5, 2019
cd76215
decision tree: greedy splitter logic fixed
gyrdym Oct 6, 2019
f9c8823
redundant information removed from README
gyrdym Oct 6, 2019
4f7b481
README info corrected
gyrdym Oct 6, 2019
46a6132
LinearOptimizerType enum changed, LearningRateType enum changed
gyrdym Oct 6, 2019
e2476bb
Merge branch 'master' of https://github.com/gyrdym/dart_ml into dataf…
gyrdym Oct 6, 2019
22c5476
changelog record added, version updated
gyrdym Oct 6, 2019
f8cb411
Documentation fixed; dtype parameter passed to decision tree classifier;
gyrdym Oct 6, 2019
28ce099
pubspec lock deleted from version control
gyrdym Oct 6, 2019
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
 # Changelog

+## 13.0.0
+- Predictor's API: `DataFrame` used instead of `Matrix`
+- `DecisionTreeSolver`: data splitting logic fixed
+
 ## 12.1.2
 - `xrange` package version locked

305 changes: 80 additions & 225 deletions README.md

Large diffs are not rendered by default.

23 changes: 11 additions & 12 deletions benchmark/cross_validator.dart
@@ -1,19 +1,16 @@
 // 8.5 sec
-import 'dart:async';
-
 import 'package:benchmark_harness/benchmark_harness.dart';
 import 'package:ml_algo/ml_algo.dart';
+import 'package:ml_dataframe/ml_dataframe.dart';
 import 'package:ml_linalg/matrix.dart';
 import 'package:ml_linalg/vector.dart';

 const observationsNum = 1000;
-const featuresNum = 20;
+const columnsNum = 21;

 class CrossValidatorBenchmark extends BenchmarkBase {
   CrossValidatorBenchmark() : super('Cross validator benchmark');

-  Matrix features;
-  Matrix labels;
   CrossValidator crossValidator;

   static void main() {
@@ -22,18 +19,20 @@ class CrossValidatorBenchmark extends BenchmarkBase {

   @override
   void run() {
-    crossValidator.evaluate((trainFeatures, trainLabels) =>
-        ParameterlessRegressor.knn(trainFeatures, trainLabels, k: 7),
-        features, labels, MetricType.mape);
+    crossValidator.evaluate((trainSamples, targetFeatureNames) =>
+        KnnRegressor(trainSamples, targetFeatureNames.first, k: 7),
+        MetricType.mape);
   }

   @override
   void setup() {
-    features = Matrix.fromRows(List.generate(observationsNum,
-        (i) => Vector.randomFilled(featuresNum)));
-    labels = Matrix.fromColumns([Vector.randomFilled(observationsNum)]);
+    final samples = Matrix.fromRows(List.generate(observationsNum,
+        (i) => Vector.randomFilled(columnsNum)));
+
+    final dataFrame = DataFrame.fromMatrix(samples);

-    crossValidator = CrossValidator.kFold(numberOfFolds: 5);
+    crossValidator = CrossValidator.kFold(dataFrame, ['col_20'],
+        numberOfFolds: 5);
   }

   void tearDown() {}
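
For readers unfamiliar with k-fold cross validation, the following is a generic sketch of how the 1000 observations in this benchmark could be partitioned into 5 folds, each fold serving once as the test set. It is plain Dart and makes no claim about how `CrossValidator.kFold` is implemented in ml_algo; `kFoldIndices` is a hypothetical helper introduced only for illustration.

```dart
// Hypothetical helper for illustration only; not part of ml_algo.

/// Splits the indices 0..observationsNum-1 into [numberOfFolds] contiguous
/// folds whose sizes differ by at most one.
List<List<int>> kFoldIndices(int observationsNum, int numberOfFolds) {
  if (numberOfFolds < 2 || numberOfFolds > observationsNum) {
    throw ArgumentError.value(
        numberOfFolds, 'numberOfFolds', 'Must be in [2, observationsNum]');
  }
  final base = observationsNum ~/ numberOfFolds;
  final remainder = observationsNum % numberOfFolds;
  final folds = <List<int>>[];
  var start = 0;
  for (var fold = 0; fold < numberOfFolds; fold++) {
    // The first `remainder` folds take one extra observation.
    final size = base + (fold < remainder ? 1 : 0);
    final offset = start;
    folds.add(List.generate(size, (i) => offset + i));
    start += size;
  }
  return folds;
}

void main() {
  const observationsNum = 1000;
  final folds = kFoldIndices(observationsNum, 5);
  for (final testFold in folds) {
    // Every observation outside the test fold is used for fitting.
    final trainSize = observationsNum - testFold.length;
    print('test ${testFold.first}..${testFold.last}, train size $trainSize');
  }
}
```

With these numbers each model is fitted on 800 rows and scored on the remaining 200, so one `evaluate` call in the benchmark corresponds to five fit-and-score rounds under this scheme.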
34 changes: 0 additions & 34 deletions benchmark/gradient_descent_regression.dart

This file was deleted.

24 changes: 15 additions & 9 deletions benchmark/knn_regression.dart → benchmark/knn_regressor.dart
@@ -1,22 +1,22 @@
-// 5.7 sec
-import 'dart:async';
-
+// 10.0 sec (MacBook Air mid 2017)
 import 'package:benchmark_harness/benchmark_harness.dart';
 import 'package:ml_algo/ml_algo.dart';
+import 'package:ml_algo/src/regressor/knn_regressor_impl.dart';
+import 'package:ml_dataframe/ml_dataframe.dart';
 import 'package:ml_linalg/matrix.dart';
 import 'package:ml_linalg/vector.dart';

 const observationsNum = 500;
 const featuresNum = 20;

 class KnnRegressorBenchmark extends BenchmarkBase {
-  KnnRegressorBenchmark() : super('KNN regression benchmark');
+  KnnRegressorBenchmark() : super('Knn regression benchmark');

   Matrix features;
-  Matrix testFeatures;
+  DataFrame testFeatures;
   Matrix labels;
   Matrix testLabels;
-  ParameterlessRegressor regressor;
+  KnnRegressor regressor;


   static void main() {
@@ -34,11 +34,17 @@ class KnnRegressorBenchmark extends BenchmarkBase {
         (i) => Vector.randomFilled(featuresNum)));
     labels = Matrix.fromColumns([Vector.randomFilled(observationsNum * 2)]);

-    testFeatures = Matrix.fromRows(List.generate(observationsNum,
-        (i) => Vector.randomFilled(featuresNum)));
+    testFeatures = DataFrame.fromMatrix(
+      Matrix.fromRows(
+        List.generate(
+          observationsNum,
+          (i) => Vector.randomFilled(featuresNum),
+        ),
+      ),
+    );
     testLabels = Matrix.fromColumns([Vector.randomFilled(observationsNum)]);

-    regressor = ParameterlessRegressor.knn(features, labels, k: 7);
+    regressor = KnnRegressorImpl(features, labels, 'target', k: 7);
   }

   void tearDown() {}
44 changes: 44 additions & 0 deletions benchmark/linear_regressor.dart
@@ -0,0 +1,44 @@
+import 'package:benchmark_harness/benchmark_harness.dart';
+import 'package:ml_algo/ml_algo.dart';
+import 'package:ml_dataframe/ml_dataframe.dart';
+import 'package:ml_linalg/matrix.dart';
+import 'package:ml_linalg/vector.dart';
+
+const observationsNum = 200;
+const featuresNum = 20;
+
+class LinearRegressorBenchmark extends BenchmarkBase {
+  LinearRegressorBenchmark() : super('Linear regressor');
+
+  DataFrame fittingData;
+
+  static void main() {
+    LinearRegressorBenchmark().report();
+  }
+
+  @override
+  void run() {
+    LinearRegressor(fittingData, 'col_20');
+  }
+
+  @override
+  void setup() {
+    final features = Matrix.fromRows(List.generate(observationsNum,
+        (i) => Vector.randomFilled(featuresNum)));
+
+    final labels = Matrix.fromColumns([Vector.randomFilled(observationsNum)]);
+
+    fittingData = DataFrame.fromMatrix(
+      Matrix.fromColumns([
+        ...features.columns,
+        ...labels.columns,
+      ]),
+    );
+  }
+
+  void tearDown() {}
+}
+
+Future main() async {
+  LinearRegressorBenchmark.main();
+}
37 changes: 0 additions & 37 deletions benchmark/logistic_regression.dart

This file was deleted.

42 changes: 42 additions & 0 deletions benchmark/logistic_regressor.dart
@@ -0,0 +1,42 @@
+import 'package:benchmark_harness/benchmark_harness.dart';
+import 'package:ml_algo/ml_algo.dart';
+import 'package:ml_dataframe/ml_dataframe.dart';
+import 'package:ml_linalg/matrix.dart';
+import 'package:ml_linalg/vector.dart';
+
+const observationsNum = 200;
+const columnsNum = 21;
+
+class LogisticRegressorBenchmark extends BenchmarkBase {
+  LogisticRegressorBenchmark() : super('Logistic regressor');
+
+  DataFrame _data;
+
+  static void main() {
+    LogisticRegressorBenchmark().report();
+  }
+
+  @override
+  void run() {
+    LogisticRegressor(
+      _data,
+      'col_20',
+      minCoefficientsUpdate: null,
+      iterationsLimit: 200,
+    );
+  }
+
+  @override
+  void setup() {
+    final Matrix observations = Matrix.fromRows(List.generate(observationsNum,
+        (i) => Vector.randomFilled(columnsNum)));
+
+    _data = DataFrame.fromMatrix(observations);
+  }
+
+  void tearDown() {}
+}
+
+Future main() async {
+  LogisticRegressorBenchmark.main();
+}
14 changes: 6 additions & 8 deletions benchmark/main.dart
@@ -1,12 +1,10 @@
-import 'dart:async';
-
-import 'gradient_descent_regression.dart' as gradientDescentRegressionBenchmark;
-import 'logistic_regression.dart' as logisticRegressionBenchmark;
-import 'algorithms/knn.dart' as knnBenchmark;
+import 'linear_regressor.dart' as gradient_descent_regression_benchmark;
+import 'logistic_regressor.dart' as logistic_regression_benchmark;
+import 'algorithms/knn.dart' as knn_regressor_benchmark;

 Future main() async {
   // (MacBook Air mid 2017)
-  await gradientDescentRegressionBenchmark.main(); // 0.07 sec
-  await logisticRegressionBenchmark.main(); // 0.12 sec
-  await knnBenchmark.main(); // 5 sec
+  await gradient_descent_regression_benchmark.main(); // 0.07 sec
+  await logistic_regression_benchmark.main(); // 0.12 sec
+  await knn_regressor_benchmark.main(); // 5 sec
 }
29 changes: 10 additions & 19 deletions example/main.dart
@@ -1,29 +1,20 @@
-import 'dart:async';
-
 import 'package:ml_algo/ml_algo.dart';
-import 'package:ml_linalg/matrix.dart';
+import 'package:ml_dataframe/ml_dataframe.dart';

 /// A simple usage example using synthetic data. To see more complex examples,
 /// please, visit other directories in this folder
 Future main() async {
-  // Let's create a feature matrix (a set of independent variables)
-  final features = Matrix.fromList([
-    [2.0, 3.0, 4.0, 5.0],
-    [12.0, 32.0, 1.0, 3.0],
-    [27.0, 3.0, 0.0, 59.0],
-  ]);
-
-  // Let's create dependent variables vector. It will be used as `true` values
-  // to adjust regression coefficients
-  final labels = Matrix.fromList([
-    [4.3],
-    [3.5],
-    [2.1],
-  ]);
+  // Let's create a dataframe with fitting data, let's assume, that the target
+  // column is the fifth column (column with index 4)
+  final dataFrame = DataFrame(<Iterable<num>>[
+    [ 2, 3, 4, 5, 4.3],
+    [12, 32, 1, 3, 3.5],
+    [27, 3, 0, 59, 2.1],
+  ], headerExists: false);

   // Let's create a regressor itself and train it
-  final regressor = LinearRegressor.gradient(
-      features, labels,
+  final regressor = LinearRegressor(
+      dataFrame, 'col_4',
       iterationsLimit: 100,
       initialLearningRate: 0.0005,
       learningRateType: LearningRateType.constant);
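
The new call shape hands the predictor one table plus the name of the target column, instead of separate feature and label matrices. The sketch below shows, in plain Dart, what addressing a target by a generated `col_N` name amounts to for the three rows used in this example. The `col_N` naming is taken from the calls in this diff (`'col_4'` for a five-column headerless frame, `'col_20'` for a 21-column matrix); `defaultHeader`, `SplitResult` and `splitByTarget` are hypothetical names and are not part of ml_algo or ml_dataframe.

```dart
// Hypothetical helpers for illustration only; they are not part of
// ml_algo or ml_dataframe.

/// Generates column names of the form col_0, col_1, ...
List<String> defaultHeader(int columnsNum) =>
    List.generate(columnsNum, (i) => 'col_$i');

/// Holds feature rows and target values after a split.
class SplitResult {
  SplitResult(this.features, this.target);

  final List<List<num>> features;
  final List<num> target;
}

/// Separates the column named [targetName] from the remaining columns.
SplitResult splitByTarget(
    List<List<num>> rows, List<String> header, String targetName) {
  final targetIndex = header.indexOf(targetName);
  if (targetIndex == -1) {
    throw ArgumentError.value(targetName, 'targetName', 'No such column');
  }
  final features = [
    for (final row in rows)
      [
        for (var i = 0; i < row.length; i++)
          if (i != targetIndex) row[i],
      ],
  ];
  final target = [for (final row in rows) row[targetIndex]];
  return SplitResult(features, target);
}

void main() {
  final rows = <List<num>>[
    [2, 3, 4, 5, 4.3],
    [12, 32, 1, 3, 3.5],
    [27, 3, 0, 59, 2.1],
  ];
  final split = splitByTarget(rows, defaultHeader(5), 'col_4');
  print(split.features); // [[2, 3, 4, 5], [12, 32, 1, 3], [27, 3, 0, 59]]
  print(split.target); // [4.3, 3.5, 2.1]
}
```

Because the target is addressed by name, swapping the target column only changes the string argument, not the shape of the data passed in.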
11 changes: 7 additions & 4 deletions lib/ml_algo.dart
@@ -1,10 +1,13 @@
 export 'package:ml_algo/src/algorithms/knn/kernel_type.dart';
-export 'package:ml_algo/src/classifier/linear/logistic_regressor/logistic_regressor.dart';
-export 'package:ml_algo/src/classifier/linear/softmax_regressor/softmax_regressor.dart';
+export 'package:ml_algo/src/classifier/decision_tree_classifier.dart';
+export 'package:ml_algo/src/classifier/logistic_regressor.dart';
+export 'package:ml_algo/src/classifier/softmax_regressor.dart';
+export 'package:ml_algo/src/linear_optimizer/gradient_optimizer/learning_rate_generator/learning_rate_type.dart';
+export 'package:ml_algo/src/linear_optimizer/linear_optimizer_type.dart';
+export 'package:ml_algo/src/linear_optimizer/regularization_type.dart';
 export 'package:ml_algo/src/metric/classification/type.dart';
 export 'package:ml_algo/src/metric/metric_type.dart';
 export 'package:ml_algo/src/metric/regression/type.dart';
 export 'package:ml_algo/src/model_selection/cross_validator/cross_validator.dart';
+export 'package:ml_algo/src/regressor/knn_regressor.dart';
 export 'package:ml_algo/src/regressor/linear_regressor.dart';
-export 'package:ml_algo/src/regressor/parameterless_regressor.dart';
-export 'package:ml_algo/src/solver/linear/gradient/learning_rate_generator/learning_rate_type.dart';