Dev to master for Marcel (#161) · starlibs/AILibs@279fb2b

Dev to master for Marcel (#161)

* Update README.md

* Updated Javadoc references

* Mlplan/bugfix/twophasehasco/wrongassert optimal solution (#128)

* Modified logging behavior of MCCV, commented out incorrect assert

* Removed .travis_push_javadoc.sh

* Updated dependency specification examples in readme

* Updated ML-Plan readme

* removed unnecessary files from ML-Plan top level

* Update README.md

* Update readme.md

* Update readme.md

* Update readme.md

* Update readme.md

* Update readme.md

* Update readme.md

* Update readme.md

* Update readme.md

* Update readme.md

* Configured the projects correctly for mvn repository. (#129)

* Configured the projects correctly for mvn repository.

Moved resources to resource folder.
Relaxed python version check.
Included searchgraph stylesheet via resources.

* Added empty default values for ossrhUsername and ossrhPassword

* Removed signs from travis builds

* Added interruptible meka and weka via maven central.

* Fixed typos in build.gradle files.

* Renaming search packages.

Fix for some resource loading issues.

Fix for some logging issues.

Fix for some search space configurations.

* Update readme.md

* Update README.md

* modified travis.yml and removed settings.gradle from ML-Plan

* added a method to WekaUtil that sorts the class attribute to the end

* Added some logging functionality for node evaluation

* returned gradle.properties

* Made KVStoreCollectionPartitionings public.

* Pr/ail 99 (#133)

* Adds first implementation of online normalization with early abandon for shapelet search, some unit tests and minor improvements

* Adds online min distance calculation, minor fixes and improvements, tests, ...

* Implements a parallelizable matrix operation version of the full shapelet search for ShapeletTransform, unit tests

* Splits up distance interface in ScalarDistance and TimeSeriesDistance.

* Adds distance interface that takes timestamps into account.

* Adds skeletons for TODO distance measures.

* Adds scalar distance parameter to DynamicTimeWarping constructor.

* Adds Weighted Dynamic Time Warping.

* Adds utilities for scalar distances.

* Implements IDstanceWithTimestamps now.

* Adds utility function to create equidistant timestamps for a timeseries.

* add all necessary error handling

* Adds fixes and fixmes for LearnShapelets

* First running and equally (to better) performing LearnShapelets classifier version

* Overcome vanishing gradient problem

* Implements and tests MoveSplitMerge.

* Adds javadoc.

* Adds euclidean and L1 distance.

* Introduces time series complexity.

* Adds stretching complexity.

* Removes (wrong) comments.

* Implements Complexity-Invariant-Distance.

* Adds univariate dataset working with java arrays.

* make JUnit test for DFT

* add test for ztransform

* add more tests for DFT

* Splits utility classes.

* Switches classifier implementation to new dataset representation

* New dataset v3.

* ...

* Reverts, because jlenen has overwritten.

* reimplement ztransform

* Adds utility function for working with java arrays.

* Adjusts Shapelet Transform to raw Java arrays

* Adds simplified time series loader for new Java array representation time series dataset

* Adds simplified abstract time series classifier using the new Java array dataset representation

* Adds a class mapper and a new test environment for the simplified ts classifier

* Adds training routine for abstract simplified time series classifier

* Updates Shapelet Transform to the new simplified dataset representation

* Minor fixes and improvements

* Updates the LearnShapelets classifier to the new dataset representation

* Unit tests for LearnShapelets, refactoring and minor improvements

* Adds documentation for LearnShapelets implementation and refactoring

* Adds documentation and refactoring

* Adds shuffling and minor improvements

* Small optimizations

* Changes from INDArray to Java arrays.

* Moves utilities for scalar distances to tsc.util package.

* Adds/updates java package info.

* LearnShapelets performance optimization

* make new implementation for new dataset and JUnit test

* start SFA test

* First draft of Time Series Forest Classifier

* Deal with different feature type scales

* Changes number of instances to int.

* Adds first version of nearest neighbor.

* test and comment all relevant classes for the BOSS classifier

* Adds first runnable version of time series tree and unit tests

* Adds unit tests for Time Series Tree, fixes, optimization and refactoring

* Adds runnable version of TimeSeriesForest, tests and optimizations

* Fixes and improvements for Time Series Forest and Tree

* Adds a sparse transformed feature cache to Time Series Tree

* Refactoring and minor improvements

* Refactoring and documentation

* Adds Nearest Neighbor Algorithm.

* Adds/Uses org.apache.commons.math3 to make gradle build work.

* Changes euclidean to squared distance.

* Adds toString method for time series.

* Fixes Dynamic Time Warping.

* Avoids NullPointerException.

* Simplifications.

* Adds Nearest Neighbor reference test.

* start testing z-transform, implement NumerosityReduction

* finish Numerosity Reduction, implement more functionality needed for the Boss Classifier

* add sliding window functionality

* finish transform

* Adds parallelization support, uses timeouts, adds documentation

* Refactoring, integration of the timeout for cancelling the training algorithm, more documentation

* Refactoring

* Adds jaicore experimenter to perform reference tests, integrates timeout, refactoring and minor changes

* Integrates the choice between the CAWPE and the HIVE COTE ensembles

* Refactoring, documentation and minor improvements

* implement the Histogram builder

* Adds gitignore for data folder.

* Changes dataset filepath to be project relative.

* Removes unused imports.

* Fixes Time Warp Edit Distance.

* Adds documentation to ShapeletTransform classifier

* Disables usage of optimized minimum search by default due to slightly worse prediction quality

* Minor adjustments

* Fixes WDTW (to match reference implementation).

* Fixes TWED.

* Allows WDTW to use different lengths.

* Improved experiment scheme

* Fixes missing initialization part.

* Fixes test to use same number of instances for reference and test.

* Adds reference test for MoveSplitMerge.

* Fixes due to changed signature.

* Adds performance test.

* Adds derivate helper functions.

* Makes derivate functions static.

* Adds function to enable / disable LearnShapeletsClassifier

* Adds DerivateTransformDistance.

* Removes DerivateDTW. In future use DerivateFilter and DTW.

* Removes TimeSeriesLengthException from distance function.

* Removes TimeSeriesLengthException from distance function.

* Moves NearestNeighbors to neighbors package.

* Use new dataset.

* Adds derivate Filter template.

* Implements NN weighted vote techniques. Improves NN testing.

* Adds memoization to weight calculation.

* Documents TWED test.

* Corrects folder path.

* Adds performance test.

* Adds sum, mean, variance and zTransform for time series.

* Adds first version of Shotgun Distance.

* Draft of TSBF classifier

* change filter interface so it can handle calls on a single instance, not always the whole dataset, and fix all other classes that implement that interface

* Tests and fixes statistical methods.

* Tests and fixes ShotgunDistance.

* implement BossClassifier and change and improve used methods like sliding window

* Adds feature generation and prepares random forest training

* Adds probability discretization, histogram and feature representation functions for TSBF

* Adds some utility functions to assess the given classes in the simplified TimeSeriesDataset

* Adds out of bag probability estimation for random forest classifiers (TSBF) and uses class number evaluation functions in classifiers

* Adds getter and setter for the matrices of the simplified TimeSeriesDataset

* Adds shuffling of TimeSeriesDataset objects

* Adds sampling reference test template, some minor improvements

* Fixes and improvements

* Adds unit tests, bug fixes, improvements, some refactoring

* Fixes TSBF bugs and minor improvements

* Fixes a bug when using number of folds not dividing the number of generated probabilities without a remainder

* Adds a robustness check for equal time series and min subsequence length case

* write predict method for BossClassifier for the univariate case and rework histogram builder

* Improves documentation and unit tests. Some minor fixes.

* Removes unused p parameter for the p-norm.

* Adds javadoc.

* Removes unused p parameter.

* Adds predict function with timestamps.

* Changes DerivateTransformDistance name to DerivateDistance.

* change Filter Interface

* change name of methods in filter interface

* update Filter Interface with new method for matrix and change names

* remove unneeded Filter Exception

* update filter change name and update SAX

* rename fitTransformInstance to fitTransform

* Adds derivate filters.

* Renames fitTransformInstance to fitTransform.

* Adds conf as resource directory

* Adds Derivate as parameter.

* update all filters for the new Interface

* Adds derivate filters.

* Adds transform filters.

* Parametrizes derivate.

* Adds Derivate Transform Distance.

* Adds Transform Distance.

* write the remaining filter changes for sliding window and SFA, implement sketch for Boss Ensemble Classifier, and add mean normalization by dropping the first DFT coefficient to the DFT class

* comment everything, error correction, and reading into recursive DFT coefficient calculation

* comment classes overall

* Starts implementation of pipeline.

* Creates own package for pipeline.

* implement recursive DFT

* Fixes input to conform to specified test case.

* Adds test case.

* Changes second correctness test case.

* Adds setter, getter and more constructors.

* Adds test suite for TransformDistance.

* Comment classes overall and test Ztransform and sliding window builder

* Fixes javadoc.

* Adds constructors and javadoc.

* Minor cosmetic fixes.

* Adds javadoc, setters, getters and constructors.

* Implements test suite for DTD.

* ST, LS, TSF, TSBF classifier implementation (#1)

* First draft of the Learn Shapelets classifier, functional implementation of its training algorithm

* Minor LS classifier improvements

* Adjusts the classifier to the new class structure

* Refactoring

* Implements prediction, some improvements

* Implements the kMeans initialization of the S parameter matrix, some minor improvements, extends TimeSeriesUtil function by convenience functions dealing with INDArrays and Weka instances and INDArray normalization

* Adds first unoptimized implementation of the Shapelet Transform classifier

* Adds ensemble and some minor improvements

* Adds conversion utilities for Weka instances and TimeSeriesDataset and INDArrays

* Implements estimateMinMax function

* Adds Shapelet clustering

* Adds a conversion of a time series instance to a Weka instance

* Small ts dataset API fix

* Integrates new ts dataset implementation, some minor changes

* Integrates weka ensemble training

* Logging, refactoring, training and adds missing rotation forest classifier

* TimeSeriesDataset related fixes and improvements, unit tests, ...

* Refactoring and optimizations of the test environment

* Adds unit tests, bugfixes and refactoring for Shapelet Transform (first stable version)

* Shapelet transform optimization

* Adds first implementation of online normalization with early abandon for shapelet search, some unit tests and minor improvements

* Adds online min distance calculation, minor fixes and improvements, tests, ...

* Implements a parallelizable matrix operation version of the full shapelet search for ShapeletTransform, unit tests

* Adds fixes and fixmes for LearnShapelets

* First running and equally (to better) performing LearnShapelets classifier version

* Overcome vanishing gradient problem

* Switches classifier implementation to new dataset representation

* Adjusts Shapelet Transform to raw Java arrays

* Updates Shapelet Transform to the new simplified dataset representation

* Minor fixes and improvements

* Updates the LearnShapelets classifier to the new dataset representation

* Unit tests for LearnShapelets, refactoring and minor improvements

* Adds documentation for LearnShapelets implementation and refactoring

* Adds documentation and refactoring

* Adds shuffling and minor improvements

* Small optimizations

* LearnShapelets performance optimization

* First draft of Time Series Forest Classifier

* Deal with different feature type scales

* Adds first runnable version of time series tree and unit tests

* Adds unit tests for Time Series Tree, fixes, optimization and refactoring

* Adds runnable version of TimeSeriesForest, tests and optimizations

* Fixes and improvements for Time Series Forest and Tree

* Adds a sparse transformed feature cache to Time Series Tree

* Refactoring and minor improvements

* Refactoring and documentation

* Adds parallelization support, uses timeouts, adds documentation

* Refactoring, integration of the timeout for cancelling the training algorithm, more documentation

* Refactoring

* Adds jaicore experimenter to perform reference tests, integrates timeout, refactoring and minor changes

* Integrates the choice between the CAWPE and the HIVE COTE ensembles

* Refactoring, documentation and minor improvements

* Adds documentation to ShapeletTransform classifier

* Disables usage of optimized minimum search by default due to slightly worse prediction quality

* Improved experiment scheme

* Adds function to enable / disable LearnShapeletsClassifier

* Draft of TSBF classifier

* Adds feature generation and prepares random forest training

* Adds probability discretization, histogram and feature representation functions for TSBF

* Adds some utility functions to assess the given classes in the simplified TimeSeriesDataset

* Adds out of bag probability estimation for random forest classifiers (TSBF) and uses class number evaluation functions in classifiers

* Adds getter and setter for the matrices of the simplified TimeSeriesDataset

* Adds shuffling of TimeSeriesDataset objects

* Adds sampling reference test template, some minor improvements

* Fixes and improvements

* Adds unit tests, bug fixes, improvements, some refactoring

* Fixes TSBF bugs and minor improvements

* Fixes a bug when using number of folds not dividing the number of generated probabilities without a remainder

* Adds a robustness check for equal time series and min subsequence length case

* Refactoring

* Removes sysouts

* Minor changes and improvements for robustness and code quality

* Adds reference test cases compared to parameter search of reference classifier

* Fixes bugs in TSBF classifier and training algorithm implementation

* Fixes an issue concerning the comparability of ref and own classifier evaluation results

* Refactoring

* Refines the package structure

* Adds TimeSeriesDataset creation utils

* Refactoring and documentation

* Adds documentation, minor refactoring and improvements

* Adds more TSBF tests and documentation

* Adds robustness checks

* Adds reference tests

* Fixes small result saving bug

* Adds missing documentation and robustness checks in SimplifiedTimeSeriesLoader

* Fixes a reference test bug for TSBF

* Adds a trained flag to the classifiers

* Refactors the search strategies of Shapelet Transform to separate classes according to the Strategy design pattern

* Learn Shapelets refactoring

* Adds refactoring for consistency and robustness

* Adds command line argument reading for cluster tests

* Refactoring, robustness checks, more tests and documentation

* Removes static fit and transform.

* LaTeX result evaluator (#2)

* Adds first version of creating a LaTeX result table out of the performed tests

* Adds functionality to result collector to mark best implementations, some output optimization and refactoring, documentation

* Small fixes with regard to test execution

* Adds a small fix to Shapelet Transform for the case that the parameterized maximum shapelet length is higher than the total time series length

* everything unit tested and repaired

* test BOSS Algo and Classifier

* fix constructor

* Learn Pattern Similarity Classifier (#3)

* Adds first LPS implementation draft

* Adds documentation and refactoring, minor improvements

* Adds unit tests and documentation

* Adds more documentation

* Adds LPS experiments environment

* Fixes LPS experiment runner issues

* Removes some LPS ToDos

* Adds LPS to result collector

* add test for BossClassifier

* Removes duplicated classes

* Optimization package for TSF, TSBF, LPS, LS and ST (#4)

* Refactored tests, optimization and some minor fixes

* More refactoring

* Adds a case in the Majority Confidence Voter to avoid zero weights

* Removes duplicated, old files

* Add Test that does not work

* fix gradle and fix BOSS classifier bugs and make performance tests

* Removes wrong javadoc.

* Adds complexity as described in CID paper.

* Removes unused import.

* Test cases for squared backward difference complexity.

* New reference tests.

* Adds data to gitignore (for symlinks).

* Optimization for classifiers and other classes (#5)

* Refactored tests, optimization and some minor fixes

* More refactoring

* Adds a case in the Majority Confidence Voter to avoid zero weights

* Removes duplicated, old files

* Adds SGD with momentum training, some LS minor optimizations

* Fixes small issues and enhances interfaces, refactoring

* Adds missing rotation forest dependency to lib

* Adds information comment on reference tests

* relative path

* write tests

* link to paper and remove unused classes

* uncomment compile file

* Adds Shotgun Ensemble Classifier.

* Adds javadoc.

* Fixes bug that calculates eps even though std is 0.

* Adds tests for ShotgunEnsemble.

* Adds evaluation util.

* Improves ref tests.

* Improves javadoc.

* Adds Abandonable interface.

* Removed old LoggerUtil classes

* removed improper code files and reset outdated config files

* reset TwoPhaseHASCO

* Updated the whole Dataset model

* Cleaned up jaicore-ml project

* Resolved compile errors popped up in the CI

* removed tsctestenv folder

* Removed tsctestenv from settings.gradle

* Pr/ail 100 (#134)

* Range query prediction in the augmented space (#107)

* Adds the stolen regression datasets

* removes the data sets

* Adds the stolen regression datasets

* removes the data sets

* Adds the stolen regression datasets

* Adds the stolen regression datasets

* removes the data sets

* removes the data sets

* Adds range query prediction project

* - Deletes beginnings of range trees, as they will not be needed anymore

* - Adds first augmented space samplers for range query prediction in the augmented space.

* - Adds exact interval and exact kNN samplers.

* - Adds approximate kNN sampling

* Adds the stolen regression datasets

* removes the data sets

* Adds the stolen regression datasets

* removes the data sets

* Adds the stolen regression datasets

* removes the data sets

* Adds the stolen regression datasets

* removes the data sets

* - Augmented space RQP experiments added

* - Experiment config files modified for new datasets

* - Changes to experiments to accommodate new data

* - Deleted approximate kNN sampler
- Renamed kNN exact to kNN sampler

* - Adds all pairs sampling function

* - All pairs experiment added

* - bug fixes

* - Changed wrong_order to be fraction instead of absolute number
- Removed GP from experiment configs

* - Followup experiments for RQP in the augmented space

* Moved range query prediction by Michael to correct folder

* Feature/latextables4datasets (#136)

* Added a generator for LaTeX tables that survey dataset properties

* Removed ECML/DatasetTableGenerator since this is project specific

* Dyad Ranking Code (#108) (#109)

* - Option to use mini batches added to plNet.
- Removed shuffling of the training data in the plNet training method as it constituted a likely unwanted (and undocumented) side effect on the training data outside of the class.

* NDCG score added

* fixes 2000 patterns

* Adds the DyadDatasetGenerator that was used for first evaluations of the PLNet

* Limits the number of threads for ND4J backend to 4

* Adds even more calls to limit the use of threads, apparently it does not work though

* Minor cleanup

* Removed TSP instances and other data that was not needed

* Work on equals methods in DyadRankingInstance and SparseDyadRankingInstance, serialization and deserialization of DyadRankingDatasets incl. unit test

* Adjustments to NDCG

* Cleaned up commented-out lines in plnet

* Removed DyadDatasetGenerator from repo

* Removed unneeded shuffleData parameter.

* Work on standard scaler, metamining unit test

* Add javadoc to the DyadStandardScaler, clean up unit tests, add config to unit tests to make them run faster

* Adds doc to the DyadRankingDatasetSerializationTest

* new weka util; new weka labels; readds root node

* Fix ontology connector & add test for kernel functions

* Clean up tests, DyadRankerGATSPTest now uses the DyadStandardScaler instead of standardizing the dataset manually

* Fix WEKAPipelineCharacterizer Treeminer imports

* Minor doc for DyadRankingDataset serialization and deserialization

* Adds doc to DyadRankerMetaminingTest

* updates the patterns

* Trim dyad rankings before standardizing them in DyadRankerGATSPTest to avoid data snooping

* Adds methods for printing STD and Mean in DyadStandardScaler

* Minor typos fixed.
Implemented some unimplemented DyadRankingDataset utility methods (or made them throw UnsupportedOperationExceptions)

* Adds Stanford wrapper, precomputes bilin ft; unifies the implementation with the reference implementation

* Deletes DyadRankingDatasetGenerator, as this is part of the experiment project

* fixes test

* Deserialization of DyadRankingDatasets now uses the Java String split method instead of bouncycastle util

* dependency fixes

* roll back wekautil

* Critical bugfix in ComponentInstanceDeserializer: The component field was not parsed correctly, instead we now use the component loader to parse this field

* Remove compile errors after rebase

* Removes ClassifierInputOptTest (moved to experiments project)
Adds unscaling method to zeroshotutil

* Fix for unscaling util method

* Adds option to use linear learning rate decay for PLNetInputOptimizer

* Bugfix/plnet (#12)

* Fixes class cast bug in PLNet

* DyadRankingDataset now uses LineIterator for deserialization

* new scalers + bugfixes in plnet

* PLNet bugfix

* Feature/scalers (#13)

* new scalers + bugfixes in plnet

* PLNet bugfix

* Start work on active learning interfaces

* Work on DyadRankingDatasetPoolProvider and active learning interfaces

* Work on active ranking

* Work on active dyad ranking with metamining data

* Work on active ranking

* Various changes and bug fixes

* Bug fix in DyadDatasetPoolProvider

* Adds minibatch active learning

* Work on minibatch active learning

* Adds functionality for ignoring specified attributes when transformations are performed to the dyad scalers

* Intermediate commit

* Feature/scalers (#15)

* new scalers + bugfixes in plnet

* PLNet bugfix

* Adds functionality for ignoring specified attributes when transformations are performed to the dyad scalers

* renames dyadnormalscaler

* Adds Michael's scaler and adapts it to the interfaces;
Lacks unit tests.

* basic junit test

* fixes unit test

* Fix class cast bug

* Move files

* Move tests

* Rework PLNetDyadRanker

* Small change to the predict method

* Work on Active Dyad Ranking Tests

* hotfixes the scalers, sorry messed up the merge

* * Adds streaming API support for Vectors
* Adds javadoc to all algorithms with paper references
* Adds a gradient function that can estimate the partial derivatives of a function by using the partial difference quotient
* Adds gradient descent as an optimizer algorithm
* Fixes bugs in the implementation of the Bilin-Approach (mostly related to the indices of the sums)
* Changes the notation of the algorithms to be compliant with the paper they were taken from
* Changes the AdvancedDyadTest to a model that the bilinear model can actually learn
* Moves some packages

* * Log cleanup
* Adds log-likelihood method for debugging purposes
* Adds the black-box gradient method as a backup-plan for the gradient in the bilinPL (if it yields NaN)

* Gives gradient descent an own config;
Moves some packages (sorry in advance)
removes debug logs

* Fixes Helena's remarks

* Start on implementation of input optimizer.

* Adds first functional implementation of input optimizer

* - Parameters added to input optimizer
- Mask parameter to allow optimizations of only a selection of input parameters
- Test moved from main method to actual JUnit test

* Input optimizer made static

* Small adjustment

* Small fix

* Adds standardization to PLNetDyadRanker

* Implemented Kendall's tau for general dyad rankings.

* Standardize instance features in GATSPTest

* Adds the REAL orderings of the dyads and adjusts the DyadRankerGATSPTest to work with these

* Add loss utility and DyadDatasetGenerator

* Cleaned up various log calls and commented out lines left over from debugging.

* Some cleanup and the DyadRankerGATSPTest now works with both DyadRankers implemented so far

* Adds a very simple serialization for DyadRankingDatasets

* Implemented option for plnet to retrain after early stopping.
Made shuffling of the data set before training of plnet use the fixed seed from the configuration.

* Adds even more calls to limit the use of threads, apparently it does not work though

* Add javadoc to the DyadStandardScaler, clean up unit tests, add config to unit tests to make them run faster

* Adds Stanford wrapper, precomputes bilin ft; unifies the implementation with the reference implementation

* Post-merge fixes

* Start on random forest performance sample generation.

* Finished random forest performance sampler.

* Moves RF performance sampler to experiments project.
Adds pretty print method to PLNet.

* Adds normal scaler for dyad datasets
Adds input optimizer prototype
Modifications to PLNet: adam updater and sigmoid_uniform weight init

* Added KKT conditions to input optimizer.

* Modified dyad ranking data set deserialization method to read file line by line for vastly improved efficiency on large files.

* Created zeroshot package
Added single dyad untransform to normal scaler
Added util methods for zeroshot for mapping to weka classifier options

* Adds getters for normal scaler statistics.
Adds InputOptListener to allow tracking of input optimization process.
Adds rounding to J48 input mapper.

* Fixes to hyper parameter mapping.
Renames J48InputOptTest to ClassifierInputOptTest

* Removes ClassifierInputOptTest (moved to experiments project)
Adds unscaling method to zeroshotutil

* Fix for unscaling util method

* Adds option to use linear learning rate decay for PLNetInputOptimizer

* Post rebase fixes

* Post rebase fixes

* Adds LandmarkerCharacterizer

* new losses for alexander

* implementation according to paper

* Work on active dyad ranker

* adds test for TopKKendallsTauLoss

* intermediate commit

* Work on UCB active ranker

* Adds method to get number of epochs used.

* Work on tracking queried pairs

* Fixes k-ktau for first 3 cases

* Work on UCB sampling

* fixes!

* removes averaging

* Adds the last case to the KendallsTauOfTopKTest

* Small change to comments

* Removes crucial bug in RandomPoolBasedActiveDyadRanker

* Intermediate commit

* Modifies PLNet to use INDArrays

* Work on new experiments

* Adapt DyadRankedNodeQueue to new repository structure.

* Add additional feature functions

* Rename

- getLength Method for IPipelinesCharacterizer (typo)
- ManualPatternMiner to ComponentInstanceVectorFeatureGenerator

* Feature/scalers (#18)

* Fixes division by zero bugs in DyadStandardScaler and DyadMinMaxScaler and adds the corresponding test case to the DyadScalerTest

* Refactoring of scalers. Now single DyadRankingInstances and SingleDyads can be transformed

* Intermediate commit

* Adapt DyadRankedOPENList to the given search space

* Work on metamining test

* Modifies PLNet to use INDArrays

* Minor fix in PLNetDyadRanker

* Minor fix to PLNetDyadRanker

* prototype implementation

* dependency fixes

* some bug fixes

* some bug fixes;
some test data

* Add missing landmarkers

* adds landmarking

* fixes rebase

* Adds nodeevaluator config

* Adds the Patternminer (approach 1)

* moves the dyad ranking based evaluator
adjustments to the new interfaces
adds solution reporting
extends landmarking (still a lot to do, though)

* adds caching of the squared sums

* Fixes bugs in the dyadranker

* big fixes in dyadrankingbasednodeEval

* start working on no evaluations mlplan

* Adds scaler to node evaluator ; Adds fvalue event for evals; Fixes bugs in the scaler;

* Work on DyadRankedOPENList - should do something now

* Adds methods for untransforming with a given precision to the DyadMinMaxScaler and adds class for interval overlap based selective sampling

* Work on new sampling strategies

* Adds check for validity of potential new incumbent to input optimizer

* Readds test for input optimizer

* Add current status of approach

* Fix minor mistakes for experimenter

* Add compilation task to this project instead of experiments project

* Remove partialpipelinesexperiment

* Adapt ZeroShotUtil and PLNetInputOptimizer

* Changes in UCB Sampling and so on

* Fix bug for non-normalized ranking in evaluations

* minor code smell fixes

* Small adjustment in comment: SGD -> Adam

* Turns unit test for PLNetInputOptimizer into actual unit test.

* Work on some doc

* Post-merge fix

* Adds jaicore-math to jaicore-ml build.gradle.
Deletes some deprecated classes.

* Intermediate commit

* Work on new active learning metamining test

* Adds unit test for DyadDatasetPoolProvider

* Adds utility functions to Dyad, DyadRankingInstance and SparseDyadRankingInstance for converting them to INDArrays. Additionally reworks the update(Set<IInstance>) method for the PLNetDyadRanker

* new properties

* Intermediate commit

* some fixes, but still not running...

* Fixes remaining minor compile errors after merge

* Adapt configs to new resource structure; move resources

* Remove Metaminer stuff that doesn't belong & update package descriptors

* Delete lds from this repo since it belongs to the meta mining stuff

* Update standard PLNet for DyadRankedOpenList, add models

* Remove conf files that are now in experiments repository

* Refactoring, documentation

* Removing code smell

* Adds some documentation and removes some code smell

* Removes wrong searchspace files

* Code improvements

* Dyad Ranking Code (#108)

* - Major PLNet bugs fixed
- slight tweaks to dyad ranker tests

* Fixes the ComponentLoader;
The Componentloader was designed to only parse components that used the pre & post condition format from weka,
however Componentinstances that already used the premise & conclusions format could not be parsed.
This commit extends the component loader with such methods, while also fixing (json)-bugs in the ParameterDomain classes.

* updates ontology

* Updates the weka_labels to the new ontology;
Adds the precomputed patterns as file
Adds methods to load the wekaPipeLineCharacterizer from the precomputed patterns

* Some documentation, refactoring of the PLNetDyadRanker. Prints are now done by logging. Also adds a default plnet.properties file. Small change to the SimpleDyadDatasetDyadRankerTester to make it work with the new constructors.

* Start working on dyad ranker test based on Dirk Schaefer's GA TSP dataset

* Work on DyadRankerGATSPTest, add constructors to DyadRankingDataset, change equals method of Dyad such that it compares the instance and the alternative instead of object references

* Small bug fix in the IPLNetDyadRankerConfiguration

* Remove unused imports

* Minor changes

* Small reduction in code redundancy and readability improvements.

* Fixed option to not use early stopping when K_EARLY_STOPPING_TRAIN_RATIO is set to 1.0

* Bug concerning persistent model saving fixed.

* Some documentation / paper reference

* Small adjustment

* Small fix

* new unit tests that test the json deserialization of the component loader and the deserialization/serialization of the patterns in the tree miner

* * Adds streaming API support for Vectors
* Adds javadoc to all algorithms with paper references
* Adds a gradient function that can estimate the partial derivatives of a function by using the partial difference quotient
* Adds gradient descent as an optimizer algorithm
* Fixes bugs in the implementation of the Bilin-Approach (mostly related to the indices of the sums)
* Changes the notation of the algorithms to be compliant with the paper they were taken from
* Changes the AdvancedDyadTest to a model that the bilinear model can actually learn
* Moves some packages

* * Log cleanup
* Adds log-likelihood method for debugging purposes
* Adds the black-box gradient method as a backup-plan for the gradient in the bilinPL (if it yields NaN)

* Gives gradient descent an own config;
Moves some packages (sorry in advance)
removes debug logs

* Fixes Helena's remarks

* Implemented Kendall's tau for general dyad rankings.

* Adds standardization to PLNetDyadRanker

* small fix regarding standardization

* Standardize instance features in GATSPTest

* Fix test

* Adds the REAL orderings of the dyads and adjusts the DyadRankerGATSPTest to work with these

* Add loss utility and DyadDatasetGenerator

* Cleaned up various log calls and commented out lines left over from debugging.

* Train and test data in DyadDatasetGenerator are now distinct

* added 2000 support patterns

* Some cleanup and the DyadRankerGATSPTest now works with both DyadRankers implemented so far

* Adds a very simple serialization for DyadRankingDatasets

* Start work on DyadRankerMetaminingTest

* Implemented option for plnet to retrain after early stopping.
Made shuffling of the data set before training of plnet use the fixed seed from the configuration.

* - Option to use mini batches added to plNet.
- Removed shuffling of the training data in the plNet training method as it constituted a likely unwanted (and undocumented) side effect on the training data outside of the class.

* NDCG score added

* fixes 2000 patterns

* Adds the DyadDatasetGenerator that was used for first evaluations of the PLNet

* Limits the number of threads for ND4J backend to 4

* Adds even more calls to limit the use of threads, apparently it does not work though

* Minor cleanup

* Removed TSP instances and other data that was not needed

* Work on equals methods in DyadRankingInstance and SparseDyadRankingInstance, serialization and deserialization of DyadRankingDatasets incl. unit test

* Adjustments to NDCG

* Cleaned up commented-out lines in plnet

* Removed DyadDatasetGenerator from repo

* Removed unneeded shuffleData parameter.

* Work on standard scaler, metamining unit test

* Add javadoc to the DyadStandardScaler, clean up unit tests, add config to unit tests to make them run faster

* Adds doc to the DyadRankingDatasetSerializationTest

* new weka util; new weka labels; readds root node

* Fix ontology connector & add test for kernel functions

* Clean up tests, DyadRankerGATSPTest now uses the DyadStandardScaler instead standardizing the dataset manually

* Fix WEKAPipelineCharacterizer Treeminer imports

* Minor doc for DyadRankingDataset serialization and deserialization

* Adds doc to DyadRankerMetaminingTest

* updates the patterns

* Trim dyad rankings before standardizing them in DyadRankerGATSPTest to avoid data snooping

* Adds methods for printing STD and Mean in DyadStandardScaler

* Minor typos fixed.
Implemented some unimplemented DyadRankingDataset utility methods (or made them throw UnsupportedOperationExceptions)

* Adds Stanford wrapper, precomputes bilin ft; unifies the implementation with the reference implementation

* Deletes DyadRankingDatasetGenerator, as this is part of the experiment project

* fixes test

* Deserialization of DyadRankingDatasets now uses the Java String split method instead of bouncycastle util

* dependency fixes

* roll back wekautil

* Critical bugfix in ComponentInstanceDeserializer: the component field was not parsed correctly; we now use the component loader to parse this field

* Remove compile errors after rebase

* Removes ClassifierInputOptTest (moved to experiments project)
Adds unscaling method to zeroshotutil

* Fix for unscaling util method

* Adds option to use linear learning rate decay for PLNetInputOptimizer

* Bugfix/plnet (#12)

* Fixes class cast bug in PLNet

* DyadRankingDataset now uses LineIterator for deserialization

* new scalers + bugfixes in plnet

* PLNet bugfix

* Feature/scalers (#13)

* new scalers + bugfixes in plnet

* PLNet bugfix

* Start work on active learning interfaces

* Work on DyadRankingDatasetPoolProvider and active learning interfaces

* Work on active ranking

* Work on active dyad ranking with metamining data

* Work on active ranking

* Various changes and bug fixes

* Bug fix in DyadDatasetPoolProvider

* Adds minibatch active learning

* Work on minibatch active learning

* Adds functionality for ignoring specified attributes when transformations are performed to the dyad scalers

* Intermediate commit

* Feature/scalers (#15)

* new scalers + bugfixes in plnet

* PLNet bugfix

* Adds functionality for ignoring specified attributes when transformations are performed to the dyad scalers

* renames dyadnormalscaler

* Adds Michael's scaler and adapts it to the interfaces;
Lacks unit tests.

* basic junit test

* fixes unit test

* Fix class cast bug

* Move files

* Move tests

* Rework PLNetDyadRanker

* Small change to the predict method

* Work on Active Dyad Ranking Tests

* hotfixes the scalers, sorry messed up the merge

* * Adds streaming API support for Vectors
* Adds javadoc to all algorithms with paper references
* Adds a gradient function that can estimate the partial derivatives of a function by using the partial difference quotient
* Adds gradient descent as an optimizer algorithm
* Fixes bugs in the implementation of the Bilin-Approach (mostly related to the indices of the sums)
* Changes the notation of the algorithms to be compliant with the paper they were taken from
* Changes the AdvancedDyadTest to a model that the bilinear model can actually learn
* Moves some packages

* * Log cleanup
* Adds log-likelihood method for debugging purposes
* Adds the black-box gradient method as a backup-plan for the gradient in the bilinPL (if it yields NaN)

* Gives gradient descent an own config;
Moves some packages (sorry in advance)
removes debug logs

* Fixes Helena's remarks

* Start on implementation of input optimizer.

* Adds first functional implementation of input optimizer

* - Parameters added to input optimizer
- Mask parameter to allow optimizations of only a selection of input parameters
- Test moved from main method to actual JUnit test

* Input optimizer made static

* Small adjustment

* Small fix

* Adds standardization to PLNetDyadRanker

* Implemented Kendall's tau for general dyad rankings.

* Standardize instance features in GATSPTest

* Adds the REAL orderings of the dyads and adjusts the DyadRankerGATSPTest to work with these

* Add loss utility and DyadDatasetGenerator

* Cleaned up various log calls and commented out lines left over from debugging.

* Some cleanup and the DyadRankerGATSPTest now works with both DyadRankers implemented so far

* Adds a very simple serialization for DyadRankingDatasets

* Implemented option for plnet to retrain after early stopping.
Made shuffling of the data set before training of plnet use the fixed seed from the configuration.

* Adds even more calls to limit the use of threads, apparently it does not work though

* Add javadoc to the DyadStandardScaler, clean up unit tests, add config to unit tests to make them run faster

* Adds Stanford wrapper, precomputes bilin ft; unifies the implementation with the reference implementation

* Post-merge fixes

* Start on random forest performance sample generation.

* Finished random forest performance sampler.

* Moves RF performance sampler to experiments project.
Adds pretty print method to PLNet.

* Adds normal scaler for dyad datasets
Adds input optimizer prototype
Modifications to PLNet: adam updater and sigmoid_uniform weight init

* Added KKT conditions to input optimizer.

* Modified dyad ranking data set deserialization method to read file line by line for vastly improved efficiency on large files.

* Created zeroshot package
Added single dyad untransform to normal scaler
Added util methods for zeroshot for mapping to weka classifier options

* Adds getters for normal scaler statistics.
Adds InputOptListener to allow tracking of input optimization process.
Adds rounding to J48 input mapper.

* Fixes to hyper parameter mapping.
Renames J48InputOptTest to ClassifierInputOptTest

* Removes ClassifierInputOptTest (moved to experiments project)
Adds unscaling method to zeroshotutil

* Fix for unscaling util method

* Adds option to use linear learning rate decay for PLNetInputOptimizer

* Post rebase fixes

* Post rebase fixes

* Adds LandmarkerCharacterizer

* new losses for alexander

* implementation according to paper

* Work on active dyad ranker

* adds test for TopKKendallsTauLoss

* intermediate commit

* Work on UCB active ranker

* Adds method to get number of epochs used.

* Work on tracking queried pairs

* Fixes k-ktau for first 3 cases

* Work on UCB sampling

* fixes!

* removes averaging

* Adds the last case to the KendallsTauOfTopKTest

* Small change to comments

* Removes crucial bug in RandomPoolBasedActiveDyadRanker

* Intermediate commit

* Modifies PLNet to use INDArrays

* Work on new experiments

* Adapt DyadRankedNodeQueue to new repository structure.

* Add additional feature functions

* Rename

- getLength Method for IPipelinesCharacterizer (typo)
- ManualPatternMiner to ComponentInstanceVectorFeatureGenerator

* Feature/scalers (#18)

* Fixes division by zero bugs in DyadStandardScaler and DyadMinMaxScaler and adds the corresponding test case to the DyadScalerTest

* Refactoring of scalers. Now single DyadRankingInstances and SingleDyads can be transformed

* Intermediate commit

* Adapt DyadRankedOPENList to the given search space

* Work on metamining test

* Modifies PLNet to use INDArrays

* Minor fix in PLNetDyadRanker

* Minor fix to PLNetDyadRanker

* prototype implementation

* dependency fixes

* some bug fixes

* some bug fixes;
some test data

* Add missing landmarkers

* adds landmarking

* fixes rebase

* Adds nodeevaluator config

* Adds the Patternminer (approach 1)

* moves the dyad ranking based evaluator
adjustments to the new interfaces
adds solution reporting
extends landmarking (still a lot to do, though)

* adds caching of the squared sums

* Fixes bugs in the dyadranker

* big fixes in dyadrankingbasednodeEval

* start working on no evaluations mlplan

* Adds scaler to node evaluator; Adds fvalue event for evals; Fixes bugs in the scaler

* Work on DyadRankedOPENList - should do something now

* Adds methods for untransforming with a given precision to the DyadMinMaxScaler and adds class for interval overlap based selective sampling

* Work on new sampling strategies

* Adds check for validity of potential new incumbent to input optimizer

* Readds test for input optimizer

* Add current status of approach

* Fix minor mistakes for experimenter

* Add compilation task to this project instead of experiments project

* Remove partialpipelinesexperiment

* Adapt ZeroShotUtil and PLNetInputOptimizer

* Changes in UCB Sampling and so on

* Fix bug for non-normalized ranking in evaluations

* minor code smell fixes

* Small adjustment in comment: SGD -> Adam

* Turns unit test for PLNetInputOptimizer into actual unit test.

* Work on some doc

* Post-merge fix

* Fixed small bug in RCNE that caused an exception in ML-Plan (#93)

* Fixed small bug in RCNE that caused an exception in ML-Plan

* dummy push to trigger the CI

* Adds jaicore-math to jaicore-ml build.gradle.
Deletes some deprecated classes.

* Intermediate commit

* Work on new active learning metamining test

* Adds unit test for DyadDatasetPoolProvider

* Adds utility functions to Dyad, DyadRankingInstance and SparseDyadRankingInstance for converting them to INDArrays. Additionally reworks the update(Set<IInstance>) method for the PLNetDyadRanker

* new properties

* Intermediate commit

* some fixes, but still not running...

* Fixes remaining minor compile errors after merge

* Adapt configs to new resource structure; move resources

* Remove Metaminer stuff that doesn't belong & update package descriptors

* Delete lds from this repo since it belongs to the meta mining stuff

* Update standard PLNet for DyadRankedOpenList, add models

* Remove conf files that are now in experiments repository

* Refactoring, documentation

* Removing code smell

* Adds some documentation and removes some code smell

* Removes wrong searchspace files

* Code improvements

* fixes the builder
adapts the DyadRankingBasedNodeEvaluator mostly to these new interrupt patterns...

* javadoc

* Add missing setters, Add missing Pipeline Features

* Javadoc and code smell fixes.

* Deletes empty NDCGLossTest

* Code quality improvements

* Code smell fixes

* Reworking ActiveDyadRankerGATSPTest

* Work on code smell reduction

* Work on code smells

* remove code that is in the wrong location

* Should add the correct versions of all classes from dyraco/dev

* Some more minor code smell fixing

* Fix some more code smells

* Improve code quality so that hopefully we could merge

- in LBFGS
- remove unused imports

* made DyadRanking code compatible with current jaicore-ml

* Resolved compile issues of CI

* resolved another compile error occurring in the CI

* Fixed code smells in active learning library

* Removed several code smells

* Resolved remaining code smells and added test-relevant txt files

* Resolved further code smells and introduced a DatasetCreationException

* Resolved code smell

* Resolved a couple of additional code smells

* Updated RCNE and its test (#142)

* Updated RCNE and its test

* Resolved compile issue in RCNE tester

* Pr/ail 93 (#135)

* Fixes a bug in the interval calculation, adds a junit test that generates range-queries on one-dimensional functions (as long as their gradient is given)

* Adds some more test data, adds various tests for the extendedrandomforest, adds extendedm5tree & extendedm5forest as well as various tests. The M5 implementation needs to be reviewed by me, since the results are not as good as I'd like them

* M5 Tree fixes

* A ton of bugfixes

* a lot of bug fixes

* deletes old training files

* Major refactoring

* Some more cleanup

* test data sets & bug fixes

* updated build.gradle

* Adapts the RQPs to the weka interface changes (#139)

* Integrate weka-interruptible 0.1.1 to resolve interval prediction issues

* Adds troubleshooting in README for Maven dependency resolution problems (#143)

* Pr/ail 96 (#140)

* Fix bug in NumericFeatureDomain setMax() method

* Add utility methods for extracting information out of compositions

* Comment unimplemented part

* Work on single and pairwise marginal prediction

* Fixes a bug in the interval calculation, adds a junit test that generates range-queries on one-dimensional functions (as long as their gradient is given)

* Set guava to version 23 in gradle file

* Work on random forest test

* Work on ExtendedRandomTree and ExtendedRandomForest, importance for subsets of arbitrary size can now be computed

* Add development branch with all experimental stuff

* Add utility methods for extracting a list of components from a composition and a String identifier for a composition

* Work on knowledgebase, creating instances out of performance samples

* Adds some more test data, adds various tests for the extendedrandomforest, adds extendedm5tree & extendedm5forest as well as various tests. The M5 implementation needs to be reviewed by me, since the results are not as good as I'd like them

* Rework utility methods and add test

* Work on debugging the fANOVA implementation

* Debug fANOVA

* Tidy up

* Work on ExtendedRandomForest

* Work on integrating fANOVA into HASCO

* Add inner class to deal with parameter configurations

* Work on creation of instances from performance samples

* M5 Tree fixes

* Work on further integration of parameter importance estimation

* Integrate importance estimation, add name attribute to feature domains

* Work in database connectivity, ComponentInstances apparently cannot be deserialized from JSON

* Work on caching importance values and unit tests for tree/forest

* Work on ParameterImportanceEstimator

* Remove wrong code

* Clean up and doc

* Work on debugging fANOVA

* Work on evaluation of parameter importance estimation

* Refactor ParameterImportanceEstimator so that there is an interface for general parameter importance estimators

* Add Util for computing weighted variance and integrate it into fANOVA computation

* Work on tests and debugging

* Refactoring of predicates and HASCO classes

* Work on further integration

* Add hotfix for empty leaves in Weka RandomTree

* Work on isRefinementCompletePredicate for parameter importance pruning

* Work on refinement complete predicate and bug with empty ranges in ExtendedRandomTree

* Work on Bug with empty feature domains

* Work on refactoring

* Work on refactoring

* Try to get everything working

* Intermediate emergency commit because my laptop seems to be dying

* Work on experiments

* Work on experiments

* Add methods for computing unnormalized variance and standard deviation of marginals

* Work computing variance values

* Work on restriction on the minimum number of samples before using parameter discarding

* Some changes in tests

* Bug fixes

* Intermediate commit

* removed datasets

* A ton of bugfixes

* Work on storing performance samples for individual components

* Work on performance knowledge base for individual components

* Work on warmstart feature

* Work on warmstart feature

* Work on warmstart

* Work on comparison of parameters according to their importance

* Work on storage of intermediate results

* work on intermediate result storage

* work on models

* Work on serialization

* Work on Warmstart

* Minor Changes

* Warmstart work

* a lot of bug fixes

* deletes old training files

* Major refactoring

* Some more cleanup

* Work on refactoring

* Refactoring, documentation and changes towards robustness

* Further documentation

* Further work towards robustness

* Work on event bus listening

* test data sets & bug fixes

* Integration of parameter pruning into HASCO

* Several adjustments

* Refactoring of readiness check

* Work on integration

* Further work on integration

* Further work on integration

* Empty parameters are now set to default values in PerformanceKnowledgeBase

* Intermediate commit

* Fixes a bug in the performance knowledge base which occurred for parameters that have not been set explicitly; also several adjustments

* Integration

* Fixes a bug with pipelines with less than 2 parameters

* Minor refactoring

* Add workaround for parameter values that are not in the parameters domain

* Refactoring

* Minor Refactoring

* Some changes regarding NaN predictions

* Several adjustments

* Removed old classes

* Work on warmstart

* Extensive documentation added

* Adds classes for events

* Adds javadoc

* Adds the listener; Extends the classifier interface

* Adds interface method

* undoes earlier changes

* Remove events again

* Fixes a livelock in the splitting

* Added ReproducibleInstances and classes to track history.

* Starts work on PerformanceDBAdapter

* Starts work on PerformanceDBAdapter

* intermediate commit

* DecoratedLoss + PerformanceDB connection

* Remove PerformanceDBAdapter from old package

* Intermediate commit

* Intermediate commit

* Work on PerformanceDBAdapter

* Implementation of getStratifiedSplit with ReproducibleInstances and seed

* Added StratifiedSplit with RandomObject as input, so it can be used like the old one.

* Adds component instance deserializer as a jackson module

* Work on PerformanceDBAdapter

* Fixes tests; removes debugs

* Added unitTest and fixed a bug in the reproduction of Instances

* Removes failed added classes

* Removes failed added classes

* First working version

* PerformanceDBAdapter now only considers hash values

* First working version

* Fix major bug with hash values

* Added cache to selectionBenchmark

* changes seed handling

* seed fix

* Testing @JsonPropertyOrder(alphabetic=true)

* fixed == used on Double Objects in ZeroOneLoss. Uses equals now.

* Follow up for the seed fixes

* Delete PerformanceKnowledgeBaseDBTest.java

* cache package cleanup

* changes the seed generator

* Cache storage and cache lookup can now be toggled separately

* Several adjustments

* removes some property orders

* extends the db adapter with the test trajectory and the error class name

* Minor fixes in classes that did not yet use the bridge

* minor bug fix

* Performance DBAdapter now supports separate ReproducibleInstances for train and test data and the db has an additional column for the loss function used

* Small adjustment of the CacheEvaluatorMeasureBridge

* Move PerformanceDBAdapterTest into the correct package and correct some typos

* Small change to database structure

* Several changes regarding jackson in the instructions. Also adding a test for reproducibility

* Evaluation time is now recorded and inserted into the DB for intermediate results

* Minor refactoring and documentation

* Small adjustment in PerformanceDBAdapterTest

* Small adjustment in PerformanceDBAdapterTest

* Resolve merge conflict

* Some refactoring and clean up

* minor refactoring

* Some adjustments

* Several adjustments

* Various changes regarding cleaning up printlns etc.

* fixed gradle

* updated build.gradle

* Pr/ail 95 (#144)

* Adds basic AutoFE structure and first framework skeleton

* Adds a default component loader for testing and filter mockups.

* Changes project directories to get a valid build path configuration.

* Prepares benchmarking evaluator, adds a test filter (local binary pattern from catalano) used in HASCO search

* Adds configuration files (for logging and search visualization) and a test configuration

* Adds a working automated feature engineering search using HASCO. Adds some generic test filters.

* Changes the test model and the pipeline factory to support pipelines with unbounded length.

* Integrates functionality (basic filters and factory implementation) to generate complex tree-structured feature engineering pipelines

* Adds basic AutoFE structure and first framework skeleton

* Adds a default component loader for testing and filter mockups.

* Changes project directories to get a valid build path configuration.

* Prepares benchmarking evaluator, adds a test filter (local binary pattern from catalano) used in HASCO search

* Adds configuration files (for logging and search visualization) and a test configuration

* Adds a working automated feature engineering search using HASCO. Adds some generic test filters.

* Changes the test model and the pipeline factory to support pipelines with unbounded length.

* Integrates functionality (basic filters and factory implementation) to generate complex tree-structured feature engineering pipelines

* Adds construction and application of complex filter pipelines, generic collection type parameter, filter changes and minor improvements

* Switches data wrapper class to DataSet which allows to keep former attributes (incl. class attribute) and intermediate domain-specific representation

* Adds Catalano libs to project

* Implements first draft of cluster evaluation using kernel, data set changes (now using ND4J) and a lot of minor improvements and fixes

* Adds missing library (used for kernel-based clustering)

* Adds Catalano preprocessing filters, new catalano config, unit tests, working filter pipeline execution and many further minor improvements and bug fixes

* Integrates more feature extraction operators, some minor test changes, ...

* Adds pretrained nn skeleton

* Adds (rudimentary) pretrained NN support

* Updates clustering, class hierarchies, introduces / changes new filters,  adds tests and minor improvements

* Adds new benchmark functions (ensemble, LDA and COCO), fixes and improvements and unit tests

* Adds missing lib

* Adds ranking experiment implementation, pretrained nn filters now check for appropriate input shape, minor evaluation improvements, additional tests and more

* Updates ranking experiments, other logger, minor improvements.

* Adds model classes for databases and serialization / deserialization for database models

* Adds some hashCode/equals/toString implementations

* Changes Set to List as data structure in the model classes

* Adds more util methods + unit tests for operations

* Delete temporary serialization file after unit test

* Adds classes for graph search

* Adds implementation of DatabaseConnector for backward aggregations

* Makes databases cloneable

* Adds first implementation for the search classes

* Implements getTargetTable() util method

* Fixes a bug in serialization/deserialization

* Implements forward operation, adds some logging

* Makes operations name-based instead of object-based

* Improves logging

* Adds a simple test for the search

* Adds toString for the operations

* Refactoring

* Adds an aggregated attribute to the serialization unit test

* Fixes a bug with serialization of aggregated attributes

* Relationships are now based on strings only, this fixes OTFML-194

* OTFML-192: Implements applyForwardOperation() and getInstances()

* OTFML-185: Adds toy database

* Adds first draft of the database connector

* OTFML-207: Removes obsolete code

* OTFML-206: Adds methods that compute all features reachable via forward/backward edges

* OTFML-206: Adjusts graph search problem

* OTFML-209: Introduces exit edges

* OTFML-211: Adds graph visualizer to the test search

* OTFML-208: Adds lexicographic order on forward attributes

* OTFML-212: Separates features from attributes

* OTFML-210: Corrects the representation of the path in the BackwardFeature

* OTFML-210: Adds a method to check whether the path of a backward feature is complete

* OTFML-210: Adds methods to compute the relations TO a table

* OTFML-210: Fixes a bug in the isIntermediate() method

* OTFML-210: Completed intermediate node handling

* Adds additional search test

* Adds default constructor to backward relationship

* The path is now part of the name of a backward feature

* Fixes a bug that complete paths were classified as intermediate

* Feature lists are now properly cloned in the successor computation

* OTFML-219: Only aggregable attributes are considered as backward features

* OTFML-214: Adds unit tests for the duplicate check

* OTFML-214: Moved the path of a backward feature into a dedicated class

* OTFML-222: Adds duplicate check for intermediate nodes

* OTFML-223: Adds unit test

* OTFML-223: Corrects unit test

* OTFML-223: Adds duplicate check for standard nodes

* OTFML-192: Removes obsolete sql statements + test

* OTFML-192: Adds a method to compute the name of a feature table

* OTFML-192: Adds a field to the Attribute class to indicate whether an attribute represents the primary key

* Cherry-picks commit from upstream/master to include changes in SQLAdapter

* OTFML-192: Adjusted SQLAdapter to make it run with SQLite databases

* OTFML-192: Adds SQL to create a forward feature

* OTFML-192: Adds implementation for the getJoinTables() method

* Adds a few more data to the toy database

* OTFML-192: Adds SQL generation for backward features

* OTFML-192: Adds type to features

* OTFML-192: Adds first draft of the DatabaseConnector

* Adds jdbc access data to the database model file

* OTFML-192: Adds test for the DatabaseConnector

* OTFML-192: Finalized DatabaseConnector

* OTFML-188: Adds implementation of the node evaluator (including connection to benchmarking system + random completion)

* OTFML-224: Feature from the target table can now be selected

* OTFML-228: The target attribute is now selected, too

* OTFML-229: Aggregated features are now propagated in random completion

* Adds some small fixes and adjustments

* OTFML-225: Adds configuration class for hyperparameters

* Introduces usage of hyperparameters and implements the DatabaseProcessor

* Adds close() method to the DatabaseConnector to make sure that database connections are closed

* Adjustments to be compatible to the upstream version

* Adds database name to the JDBC data

* OTFML-230: Adds workaround

* OTFML-231: Fixes bug that there are tables without alias

* Attributes now have a full name (<tableName>.<columnName>)

* Adjusts the name of attributes

* Finished nodes now do not have successors

* Adjusts toString() for features

* OTFML-233: Attribute and table names are now escaped using backticks

* Fixes unit test and incorrect setup of the SQL adapter

* Adds full attribute name to the toy database model file

* OTFML-233: The attribute table name is now considered in the feature table name

* OTFML-241: Removes additional (unnecessary) column in feature tables

* Fixes search unit tests => The target must not be considered as feature

* OTFML-250: Includes timeout for the feature extraction

* Fixes a bug where backward features have a corrupt path

* OTFML-251: Adds (temporary) evaluation for empty and non-finished nodes

* OTFML-252: String attributes are now converted to nominal attributes

* OTFML-253: Makes evaluation function configurable

* Fixes a bug that just the last attribute is converted from stri…

julilien authored and mwever committed Jun 13, 2019

1 parent 97f1dd5 commit 279fb2b

.gitignore

@@ -1,6 +1,9 @@
     # Created by https://www.gitignore.io/api/java,linux,macos,gradle,windows,eclipse,intellij
+    ### VS Code ###
+    .vscode/
     ### Eclipse ###
     .metadata

.travis.yml

@@ -1,10 +1,3 @@
-    stages:
-    - name: compile
-    - analysis
-    - test
-    #- sonarqube
-    - name: deploy
-      if: branch = master
     language: java
     jdk:
     - oraclejdk8
@@ -17,32 +10,22 @@ cache:
       - "$HOME/.gradle/caches/"
       - "$HOME/.gradle/wrapper/"
       - "$HOME/.sonar/cache"
     # overriding the install step is important to avoid that ./gradlew assemble is run (which is the default in travis)
     install:
       - true
-    jobs:
-      include:
-      - stage: compile
-        script:
-        - "./gradlew build -x test -x signArchives"
-      - stage: analysis
-        addons:
-          sonarcloud:
-            organization: "starlibs"
-        script:
-        - ./gradlew compileJava
-        - ./gradlew testClasses
-        - ./gradlew sonarqube -x test
-      #- stage: deploy
-        #script:
-        #- "./gradlew publish -PnexusBaseUrl=$nexusBaseUrl -PnexusUpRepo=$nexusUpRepo -PnexusUser=$nexusUser -PnexusPassword=$nexusPassword"
+    addons:
+      sonarcloud:
+        organization: "starlibs"
+    script:
+      - ./gradlew compileJava
+      - ./gradlew testClasses
+      - git fetch --no-tags --unshallow https://github.com/fmohr/AILibs.git +master:refs/heads/master
+      - git fetch --no-tags https://github.com/fmohr/AILibs.git +dev:refs/heads/dev
+      - ./gradlew sonarqube -x test
     env:
       global:
-      - secure: FkCxTnMbWP3Whd61puUXI6VJhhhXTeFby0k2Rmq01dYZN8asZ1FCoQFw1dEOlcx7dWLYFvb1feSmi7YOW+u6WznrbORu41NWFpA5VvGJyPkrsrhNN6xaNy7PNfFCMjmhZqcOAUoCCDwGzyUe17qoip1NfjNiIIFa+G6HNATUaODeGBtDBYpKVw3tvzM/rz14cIx2U4wF1WphxaIME3X5pzBNm0JK0NsYOZOHpb96O5t0lVkQXFgsm2A7Q0yWV+fGENoqfHZlnKmuiXjd7VJDNl9VMPjCLbcgVeN7O7OSzOQ9Omb9YwesV2nMUPEX6g2/v9yzFOoDpghtKEU9jZaLpIfmHsRafCNx0jBdy482cjznfEYaowT4CG/lGi4tT8gMYxzJGsT8TAdHoyzX7ZZUJsY+LWqb0K45YTup51Ynbuz2xeGzWo6Ydm9kYeuBcmcMwEjXFcSubalobT3JGlKUIgW9abEUfTWZwuhAzsRJ6uzB2N7pVqC/MyGXvnDJcQNDObHlmeKA0pWqZ9yIWYxHqByuwWfs+Ac63XJOk1qboWBfRXQhTWI9qQPV1W1j0kHU4S+MA2vPS1P7evzcU+Ci/YdCDwONiZ8Df1mevV3U8uZPmMcb5rgJ2YQT7xlSahTE1n7St8SroP6vFdiCgKuj8RRAIn9IZM6uTBAN1dKyDxQ=
-      - secure: n7jixa2lZTul+vGQWKrgTPDZTBXyEGyNzznVxy2ljGNFRdbX7js11tLJiGV0F5yUtD+2tHRsFyMxiauIfajW3A2y6EdCJvCbWXU1gtzN1khufk4bTujnomEiqERI16wbiIwTaVWILnX+m0mmw14qj8+9FJpsudzQuAdyy2a6qKM/3cn9VNJN1KAOye0liRV+cFlwVstwynZXFg6xRJ6PyJMQBorZyP6XVTQYpwqe7iKtR2jI6iVlQdLhjZAspDaG7QC4RA5Sxv0+a10cs6T3AZyddHrJi7AqrtSVfIO1Spq/hr/jtG//rmFZKmtoXIg1J2qIrDmhNN6G7d3StO8QecNYs+5GR4gHGMEkY86Op22g4nL+ktACviWpY7nRrgSKMef8fJ3BMfJriwMOBgVNBIoj7POFwRUWJ78pGGusIyHWsmd07OfqyQbvImkcFfGZBCUZugCVWoKpxJW5TeWu073AuBT9gMbDTnvcE3S2l1YZGxryJl0o5w9u3HM94ERqY77HVGa9LKzcZSPXyLfeqWQt1O1YaGmxE+vYtFokwQr9TI3pEbQy4Kq6U8PoLGpilhUAXkxM+Dmaia2b2seUfNbeAnvl4R0IFVY/KhpucJDz2z8lKR143ppudSsTYLOVk1+cAGwqBm72EdNu1NIM+3MM4vViZj0N0bQ6gr2d1gw=
-      - secure: uruOwgB6s3zHxvBGO5rAKr8w2vihnJr56ZtuG1e/uUznQAuWPD/Tfa+7dNXMiLEAERY+IzOH1ZNohCPv/BN492POV+elz9O6J28BpQbuR1sTz2vh4PRHjJxWvXp5JYQpSoH2GRn+r21PfE6viFkD3FoyQYYQY7TDSqbPtMWTxp/EqPbnXbGC9i/Ej2hCjTa0c8TkadYxYR/qPNr2FghrSudtm8xF48UNGGHGGKcVisRpUMjAawyIY6luc6SfKeD3VSi7+ruYL30P+1bzcp/2W+cR3uNp/Kl8WMrPSjbRLGcy7/rYC5pXvxmpcNjkrj7CRDNnlnprZOeqsWUGq0g/oNLv+w98A+VDFRvs0DLAk5qX1Fj8/aSAjlCCTRBrjY34b6CJOikKNQOpM1kwi1tx2ER/yfx97PA+p9Hd15IWvzUTGef6zGW4vfeFnpgiwEMk3Z3XEJ7XYIChgk/1QrjhjBXihbbDQ5Ifl8rUwNQahbxHDiXlqBFd7v2RxfoqXoWFkC9bbL0yRgrKvcb/kC8A5/+ih7wFoz0f7Djw09eGdf67EKuc/aaY4zY8s1UFN33M9NuPIhqiNbF8rSf40xb077LYgeOXRwdZq/f/3Ay3NBonSDpyYFfMobcXA1YCexm68jeiBLjWehkFrxlmDdhQpiCL/QsbPl4+7cNFsjdwJFU=
-      - secure: beYMt3rl3NMfQNvYfYwObyQxMmaPqzgQ6UkNDjc3/aM6dEVo+m6C1dM4qy298iRt5052IZmCRl6lbUtFsdWVt0gUYorMUWchgnLHocOykdNylQtIZRKzXcvL+IxXoX6hjTu+9NsdTOmbTu51vFJ7l3w8BbqoZxuyR45NwUcC4LLb69AlFsEps/n0Rz43YJgG878mLvaLhIo2KjDSCF5cEf7U8gfm8YdcAbQxTJ82SYq4bV/+Nn1ObNXV2efT7dgqUDfFufBb2KGq8+WHZkR2Rkg17Xc0OaxYmMsy2ffOOqSE/XefGaynBSXUMjhkurODfQJam+eZenSNO6TBwXFBe/rjictMQDeGs0vd1gOQAyUluttV4n8K4RIIuDlYGCH/095A6U2Ss2qTiKoEM4oJ8OGr5z7PQe1j+jaP6G7Pm4Zsq4L0FFDXGNOYCObMotXF0ks2hhkz9pSHHShJBqdVHHcqEeKNzhw34NEnByDnjwA3DjLNXqtXciIcK6PyQMS8m0vubdpVs6qpNAuUNjy3OiO4hix2IiiAVnPcgzdH5kfUr0T1JdbPEWqT40IOZ3/NcfYcss4ftVgwFjGizRBkiDu/TXifVRK3DkTfElOYuqCp/LRwsgwHwYZuvGK3H1KAEVWLsnzEcnWbOu4gUBG9QqDTFlH3fi8gqHBnSRCgqUU=
-      - secure: Mq2QjNnwnYGToCbeSnFi065p1CsQUo7M/ueLNBsI1B6z4oZVHD38Yg3wPBhIdYbJ+dW08liFAvLNnrHDA6xqX/+H6vewCkgCNw5C1C62dvKB/S9fkQ5JPYYm3VeYxfG4rYkEy121ZlMpZtQSP8wq89xZYcseY5lJDJmOZiQh5HZxs1Hc6vXNSpRj5JoeHgM5Ko8Db+sFpugbj+CNHhqYy171VhjoxtykXm/E4KWwfwjMyYE7VOiOKt7vudljA6PoRIWgKrNFUFEFHApAkrIkj5JKgFccuumoBvluXEycr6LxMJxWsU9HPi5OnNr2SIwEnZwDOywbnnTKONUf+ppXp0bG0wRleBPSAvOgJhoCDy3LD0TIE505QanHpEqUz0DzBQ8gqDpZlF7NdO5sL7ywKuSenICsPRr5Pahd4puwEocPqT2BWMD4AglUANBklXxwHJ66/SJBdXt3YAoepHSwgy86NGAbqv5MZBZweuvfeeWveHSbx3SnPOl2PyomMSgQlsaK33oByBsJxEPmyCIVZqVR5FzFU7N3BN4u3XLCWPmiUp5DKfA+NgVd1DnvkPRzvnB9atzSYy/lwYB9nBDmXsrI7c1vd8O0pDtXliHOu4Dy+RGLUR3PkmIK5TpptD+yuDrh7AykjY53pLe28PyYslImj09CBPpggkHvU4fG4Ug=
-      #- secure: DfzFY0ZPQljDuzACprKbH0i3WwyzE0Mu17wqVZHNrhldOnjnmS+aM6NxRy5pDYMFKLV2I/sWn2+VH3VxswvEq4VAH/m61F2HSuA5UolFCaL6QYszHn9sOkJt/kVe+mI08iy/b6RJF3l8p73oiOeTu+Nx8sVL2DftfmG0NsG0ie2vaMtGKUFEG1ooE2iNUj1FjcjDBzxQ06p2XGbsVeFwoJo/Y+WbHXPOdzE2m7dXmbERrM4yRMQ9RR56qdsCgaJQuPZqcbd9v71GEPZ8FGN6t7evojK0P9TM0pdiJeptD3QErP3YE6YAFy+mDU4W9nycazFowAyg45o9G+60Aob4WI3fDsSb5duZ6tO1EzYrvjuH9gvucYmu8+DXAzXgF2MvebtrtZTlyPviuyRpcp2EtZmG4evyFeRRAnXV07nYSN7chb0bsF02d23UW2NQYPtQvl+YUIbTYgfwB6rjpWQtNxLMu8s6QMdnuVwpJWevTACW1ryJZyJExLhnPJp2/bnoB94sebr4kSezUX/CMi+jVxTQ+BdlqWKafjYDfBQ+T7m7Mfs5beBC0+7m7gpqLgDhDx2kngWTrwDAGwCnqTGo1Zf1M7ZqbwYxO6UVcNMB4JgBfAGfK1HMOt5EU6UPFCqUnn5bOqFCOwJagpnm7FnWYOJR3kTlzXapJs9U/DJ84Ro=
-      #- secure: LkTxOQQ5d4OKT6Vq+4LAJpBhnJ2/ymJOVHDYDayuKmmExq+fOot11xpDrUZ/Ju3RJqPRrVGVA1C5xheWaWVjeAa1LLIJugw4jW9HesXx4sKeW3jpG13Tn/YEP18pr7YxrSp0Q7xqPcDjth1mxM0ovTpZ1iLGJawGUre0amMZOT3dwKfFRZNfXiQL7Uog6xQ6Z+QhkSaBS3IMT+wCM2Yrrn9oXNDZY7KhZ533AhPgaQI9A3O5Zpp4ag7O+zXg3cru7ANb5ViRxz3AK3pwL//ZPq88Lby6AOb5vzsQnSU/Lsjq7i7CYhsl2EXSUT3dFr94yMERiAibMvU5Y1z4cipTf8WYVo7RL278ju8i20WWCaARuGWcieve14k1b5iOQS3jl+Z1YPL0htbByPeyZNH9yDta1QAGdOl+p+Tz0FNXOviQwA7k9NIjc1Qr/F18sntLEcfUW9oHE61bxAFjuZSRCNvBe6wPB/vSQ747ARi8cNzsBuyE8rEgHm1Bn4VJ15iA2Le5o0awtglJDaTnZNA33hepyx7+YGGagPe/+2dgWyIh9bsFxly5gCFsJIG9HwaJmh1dShCk8jXEy+nCB+Es33sMyxP9CaphB/HWAfzsohpfrBoqwB2Pn4Gpxc85k70TDJ7vQkacneZY3nfTH45oSvgeQKmyhE8pijhazdCfBmc=
       - secure: yUfjJ5ZSdOWWYLye5Oc6LnkN8fRFHOvRM/NlH15NXN5ZloRqfsfcQ80TFUBOCkEBwGPwUqhqZn2x7yc6XzVlpZ0lNc8voDvDNvGk2V0R42QOHUPFce0HicAtslM6we05kWA3wmcKvB3nBhRC1rPcrZC3a0G2Vby8yfvO2BEEog7Cb8zhaX5RcOFedSw3pgDrnu/aglZ2mtOjhyt36aYXx7UGY9ueEO24EcF0R8XKNn8h7qSnr7GJtPLioTME4gCOhV57YBBgDbM72I33Rmmr1fvb264/QTozYh5E0e+0uv5BuQO8q4XZU5VjdG+eTISbtkXc3bFSAOeVYUrXJBv7t7C/IhuiaHFRSC8O9dME8MXfaa/0E199CgmI7M23LIjKF7r3gPz6VvBx9BBs3m7zgCnwBhomw8ggk1U21GDYR5AwoaGhYpBCFfdMNPnPiS8vpoXYk9L1dl8vyGRyVLBNpAq7wXthWrUz3X3HEEMVWDyRqnugDzS+lE/hB+Mh7TA+lqVJegRcNOyjHw/JdynEbbvatRXOZQftcgQ4FZ81dUjI+Svin79WCdCSGM8fcTTIB5+/avoggAyLwQpsX5BXewEhADdwl5OcsFiYBPZLbX0gUBtbIKCsEOxUMX55VbzdiLoWOFMjjqUeyXJteGdqVKcFIyEazJ/Eh07lc78SxqE=

JAICore/JAICore/.settings/org.eclipse.jdt.core.prefs

This file was deleted.

JAICore/JAICore/build/libs/JAICore-0.0.1-SNAPSHOT.jar

Binary file not shown.

JAICore/JAICore/build/manifest/JAICore.properties

This file was deleted.

JAICore/JAICore/build/tmp/jar/MANIFEST.MF

This file was deleted.

JAICore/jaicore-basic/.gitignore

@@ -0,0 +1 @@
+    /logs/

JAICore/jaicore-basic/build.gradle

@@ -1,7 +1,8 @@
     dependencies {
-      compile group: 'org.apache.commons', name: 'commons-math3', version: '3.6.1'
-      compile group: 'it.unimi.dsi', name: 'fastutil', version: '8.2.1'
-      compile group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.9.7'
+      compile ('org.apache.commons:commons-math3:3.6.1')
+      compile ('org.apache.commons:commons-lang3:3.9')
+      compile ('it.unimi.dsi:fastutil:8.2.1')
+      compile ('com.fasterxml.jackson.core:jackson-databind:2.9.7')
       testCompile 'org.awaitility:awaitility:3.1.6'
     }
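
Note on the hunk above: the map-style and string-style dependency declarations resolve to the same Maven coordinates, so apart from the added `commons-lang3` entry the change is notational. A minimal sketch, not part of the commit, assuming the `java` plugin and a Gradle version that still supports the `compile` configuration (it was deprecated in later releases and removed in Gradle 7, where `implementation` or `api` take its place):

```groovy
// Illustrative build script only; not taken from the repository.
apply plugin: 'java'

repositories {
    mavenCentral()
}

dependencies {
    // Map notation: group, name, and version passed as named arguments.
    compile group: 'org.apache.commons', name: 'commons-math3', version: '3.6.1'

    // String notation: the same coordinates written as 'group:name:version'.
    // Declaring both is redundant; they are listed together only for comparison.
    compile 'org.apache.commons:commons-math3:3.6.1'
}
```

Both lines add the same artifact to the `compile` configuration, so either form can be used consistently across the `build.gradle` files.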
