This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Turn off shuffling for FactorizationMachineBinaryClassifier. #316

Merged
pieths merged 1 commit into microsoft:master from factorization_classifier_fix on Oct 9, 2019

@pieths commented Oct 9, 2019

Attempt to fix the intermittent test_estimator_checks failures with FactorizationMachineBinaryClassifier.
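
For readers unfamiliar with why shuffling can make a test flaky, here is a minimal sketch in plain Python. It does not use NimbusML or ML.NET; `train_sgd`, `DATA`, and all parameter names are hypothetical and exist only for this illustration. It assumes the general behaviour of stochastic gradient descent: the final weights depend on the order in which examples are visited, so an unseeded shuffle can produce slightly different models from run to run, while a fixed order (or a fixed seed) is repeatable.

```python
import random


def train_sgd(data, epochs=5, lr=0.1, shuffle=True, rng=None):
    """Fit y ~ w * x with plain SGD on (x, y) pairs.

    Hypothetical helper for illustration only; it is not the
    FactorizationMachineBinaryClassifier training loop.
    """
    rng = rng or random.Random()  # unseeded unless the caller passes one
    rows = list(data)             # copy so the caller's list is not reordered
    w = 0.0
    for _ in range(epochs):
        if shuffle:
            rng.shuffle(rows)     # visiting order changes from epoch to epoch
        for x, y in rows:
            w -= lr * (w * x - y) * x  # gradient step on squared error
    return w


# Small, slightly noisy data set (made up for this sketch).
DATA = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]

if __name__ == "__main__":
    # Fixed order: every run yields the same weight.
    print(train_sgd(DATA, shuffle=False), train_sgd(DATA, shuffle=False))
    # Unseeded shuffle: the weight may differ between runs.
    print(train_sgd(DATA, shuffle=True), train_sgd(DATA, shuffle=True))
```

A test that compares two fits, or compares a fit against a stored baseline, can only be reliable in the fixed-order (or seeded) case, which is presumably the motivation for turning shuffling off here.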

@pieths merged commit 15eddb4 into microsoft:master Oct 9, 2019
@pieths deleted the factorization_classifier_fix branch October 9, 2019 19:45

najeeb-kazmi added a commit that referenced this pull request Oct 10, 2019
* Update readme with latest feedback (#39)

Updating readme with latest feedback.

* Add THIRD-PARTY-NOTICES.txt and move CONTRIBUTING.md to root. (#40)

* Initial checkin

* Move to Hosted Mac pool

* Update README.md

* Manually copied naming changes over from master.

* Revert "Merge remote-tracking branch 'upstream/temp/docs'"

This reverts commit 93c7347, reversing
changes made to 2350069.

* Improve documentation regarding contributors.

* Fix email address.

* Create CODE_OF_CONDUCT.md

* Update issue templates

* Create PULL_REQUEST_TEMPLATE.md

* Update issue templates

* Update issue templates

* Update issue templates

* Fixing link in CONTRIBUTING.md (#44)

* Update contributing.md link. (#43)

* Initial checkin for ML.NET 0.7 upgrade

* fix tests

* put back columndropper

* fix tests

* Update scikit-learn links to use https instead of http

* restart dotnetcore2 package work

* fix build

* fix mac & linux

* fix build

* fix build

* dbg build

* fix build

* fix build

* handle py 2.7

* handle py27

* fix py27

* fix build

* fix build

* fix build

* ensure dependencies

* ignore exceptions from ensure dependencies

* up version

* Update cv.py

add case for X is data frame

* Update cv.py

add a space

* add a test for cv with data frame

* set DOTNET_SYSTEM_GLOBALIZATION_INVARIANT to true to fix app domain error

* fix build

* up version

* Add instructions for editing docstrings. (#51)

* Add instructions for editing docstrings.

* Add footnote giving more information.

* Fix build failures caused by dotnetcore2 module. (#67)

* Fix importing of the dotnetcore2 module because it has inconsistent folder naming.

* Fix file check for unix platforms.

* Fix indentation levels.

* Reduce number of build legs for PR validations and add nightly build definition with more robust build matrix. (#69)

* Increase version to 0.6.5. (#71)

* Update clr helper function to search multiple folders for clr binaries. (#72)

* Update clr helper function to search multiple folders for clr binaries.

* Moved responsibility for Python version checking to utility functions.

* Add clarifying comments.

* Fix call to get_nimbusml_libs()

* fix drop column param name

* Remove restricted permissions on build.sh script.

* Fix lightgbm test failures by updating runtime dependencies.

* fix TensorFlowScorer model_location parameter name

* Fix build.sh defaults so that it detects when running on a mac.

* Since OneHotHashVectorizer is broken for output kind Key in ML.NET 0.7, use ToKey() for unit tests

* fix tests

* fix pyproj test

* fix win 3.6 build

* fix comments

* expose "parallel" to the fit/fit_transform function by including **param to the argument

* add a test for the parallel

* update parallel thread

* fix tests comparison

* Update thread, retry build

* modify tests

* specify pytest-cov version

* update pytest-cov version in build command for linux

* for windows use the latest pytest-cov

* Enabled strong naming for DotNetBridge.dll (to be used for InternalsVisibleTo in ML.NET)

* Changed the keys to be the same as other internal repos

* Changed the key filename

* Update to ML.NET 0.10.preview (#77)

* Updating ML.NET nugets to latest 0.9 preview.

* --generate_entrypoints phase 1

* Fixed Models.CrossValidator

* Updated all entrypoints

* New manifest.json, picked from Monte's branch

* Updated API codegen

* Replace ISchema and SchemaImpl with Schema and SchemaBuilder.

* Revert "Replace ISchema and SchemaImpl with Schema and SchemaBuilder."

This reverts commit dcd749d.

* Refactor IRowCursor to RowCursor.

* Update ML.NET version in build.csproj.

* Update manifest.json to ml.net commit 92e762686989215ddf45d9db3f0a1c989ee54d11

* Updated RunGraph.cs to ml.net 0.10

* Refactor Vbuffer

* Added override to RowCursor methods

* Update to NimbusML-privileged nugets from ML.NET.

* Update to Microsoft.ML namespace without Runtime.

* Schema and VBuffer fixes in NativeDataInterop.

* API fixes for IRandom and IsText in RmlEnvironment and NativeDataView.

* Work on getting VBuffer pointers from Spans.

* Some VBuffer fixes

* fix some class names

* Fix Register Assembly names.

* Remove ML.PipelineInference

* fixed more classes

* Add back columndropper for backward compatibility.

* Register Entrypoints assembly in environment.

* Fix homebrew update problem on VS Hosted Mac images.

* Updated all the nuget versions to be the same.

* Attempt to fix the dataframe unit tests

* Fixed test_pyproj

* Optimized VBuffer changes

* Changed bridge version value to 0.10

* Addressed PR comments

* Simplify by using six.string_types (#89)

* Simplify by using six.string_types

* Force a retest

* Removed ISchema from DotNetBridge (#90)

* Removed ISchema

* Fixed the tests

* Addressed PR comments

* Addressed Wei-Sheng's comments about documenting the purpose of Column.DetachedColumn.

* add configuration for python 3.7 (#101)

* add configuration for python 3.7

* fix broken unit test

* Update build.sh

* fix build for Windows

* Linux py3.7 build

* fix pytest version

* upgrade pytest

* fix pytest-cov version

* fix isinstance(., int) for python 2.7

* build urls for Mac

* final fixes

* fix libomp

* Removing 3.7 for now as it's not in PyPI

* Upgrade to ML.NET version 1.0.0 (#100)

* ref v0.10 ML.NET

* fix build

* hook up to v0.11.0 ML.NET

* fix build errors

* fix build

* include Microsoft.Data.DataView.dll in build

* typo

* remove protobuf dll

* Regenerate code due to manifest changes

* fix missing ep

* Update to ML.NET 1.0.0-preview

* fix .net build

* update nuget for ML.NET

* remove Data namespace dll

* rollback nuget changes

* move to final RC ML.NET

* Regenerate classes as per updated manifest

* fix maximum_number_of_iterations param name

* fix parameter names

* fix names

* reference official v1.0 of ML.NET

* fix tests

* fix label column

* Fix tests

* fix lightgbm tests

* fix OLS

* fix tests

* fix more tests

* fix more tests

* fix weight column name

* more tests

* fix normalized metrics

* more errors

* Fix CV

* rename feature_column to feature_column_name

* fix cv ranker

* Fix lightgbm tests

* fix changes due to upgrade of NGramFeaturizer

* fix ngram featurizer

* fix FactorizationMachine assert error

* disable test which is not working now due to change in LightGbm version

* fix model name

* typo

* handle nan in arrays

* fix tests

* fix tests

* fix more tests

* fix data type

* fix AUC exception

* kick the build

* fix tests due to data change

* fix ngram test

* fix mutual info tests

* copy libiomp lib

* fix mac build

* disable SymSgdNative for now

* disable SymSgdBinary classifier tests for Linux

* fix linux tests

* fix linux tests

* try linux

* fix linux

* skip SymSgdBinaryClassifier checks

* fix entrypoint compiler

* fix entry point generation

* fix example tests run

* fix typo

* fix documentation regression

* fix parameter name

* fix examples

* fix examples

* fix tests

* fix tests

* fix linux

* kick build

* Fix code_fixer

* fix skip take filters

* fix estimator checks

* Fix latest Windows build issues. (#105)

* Fix build issue on Windows when VS2019 is installed.

Note: The -version option could not be added directly
to the FOR command due to a command script parsing issue.

* Add missing arguments to fix build issue with latest version of autoflake.

* Fixes #50 - summary() fails if called a second time. (#107)

* Fixes #50 - summary() fails if called a second time.

* Fixes #99. Do not use hardcoded file separator. (#108)

Fixes #99. Do not use hard coded file separator.

* Delete the cached summaries when refitting a pipeline or a predictor. (#109)

* Fix build issue on Windows when VS2019 is installed.

Note: The -version option could not be added directly
to the FOR command due to a command script parsing issue.

* Add missing arguments to fix build issue with latest version of autoflake.

* Delete the cached summaries when refitting a pipeline or a predictor.
Fixes #106

* Simplify the code that deletes cached summaries when calling fit.

* Fix signature import error when using latest version of scikit-learn. (#116)

* Fix signature import error when using latest version of scikit-learn.
Fixes #111

* Move the conditional import of the signature method in to the utils package.

* Package System.Drawing.Common.dll as it's missing in dotnetcore2 (#120)

* package System.Drawing.Common.dll as it's missing in dotnetcore2

* typo

* Add png for Image examples

* try linux fix

* rollback scikit learn version

* test

* debug

* rollback test

* rollback

* fix fontconfig err

* fix tests

* print platform

* get os names

* test

* test

* fix linux

* Upgrade the pytest-remotedata package to fix missing attribute error. (#121)

* Upgrade the pytest-remotedata package to fix missing attribute error.
Fixes #117

* Remove the RlsMacPy3.6 configuration from .vsts-ci.yml.

* Upgrade version (#122)

* package System.Drawing.Common.dll as it's missing in dotnetcore2

* typo

* Add png for Image examples

* try linux fix

* rollback scikit learn version

* test

* debug

* rollback test

* rollback

* fix fontconfig err

* fix tests

* print platform

* get os names

* test

* test

* fix linux

* Upgrade version

* Support quoted strings by default (#124)

* upgrade to ML.NET 1.1 (#126)

* upgrade to ML.NET 1.1

* by default quote is +

* assert changes due to quote

* fix tensor flow example

* Put long running tests in to their own folder to shorten build times. (#136)

* Temporarily remove the dataframe examples from the test run
to see how much that affects the test length.

* Remove all examples from the tests to see how it impacts the CI run.

* Put long running tests in to their own folder to shorten build times.

* Update nimbusml.pyproj to reflect the newly moved test files.
Forgot to save the nimbusml.pyproj in visual studio.

* Expose ML.NET SSA & IID spike & changepoint detectors. (#135)

* Initial creation of the IidSpikeDetector files to see what works and
what doesn't.

* Import the Microsoft.ML.TimeSeries assembly in to the project.

* Use 'PassAs' in manifest.json to fix the source parameter name.

* Use float32 for data dtype in IidSpikeDetector example.

* Convert IidSpikeDetector to a standard transform. Add examples and tests.

* Add pre-transform to IidSpikeDetector to fix incompatible data types.

* Fix issues with the test_estimator_checks IidSpikeDetector tests.

* Remove unnecessary TypeConverter import in IidSpikeDetector example.

* Initial implementation of IidChangePointDetector.

* Initial implementation of SsaSpikeDetector.

* Initial implementation of SsaChangePointDetector.

* Fix incorrect SsaSpikeDetector instance in test_estimator_checks.

* Fix a few minor issues with time series unit tests and examples. (#139)

* Skip Image.py and Image_df.py tests for Ubuntu 14 (#149)

* Fixed the script for generating the documentation (#144)

* Moved _static to ci_script to solve an error while using sphinx
* Removed make_md.bat and merged its commands into make_yaml.bat
* Moved metrics.rst to concepts

* Rename time_series package to timeseries. (#150)

* Fixed the issue of Ubuntu14 not skipping Image.py and Image_df.py (#161)

* Updated CharTokenizer.py example (#153)

* Skip CharTokenizer.py for extended tests (#163)

* Add support for returning custom values when overriding Pipeline.predict. (#155)

* Initial creation of the release-next.md file. (#165)

* Initial creation of the release-next.md file.

* Point the time series example links to the head of the master branch.

* Initial implementation of the SsaForecaster entry point. (#164)

* Final updates for release 1.2.0 (#167)

* Update the LightGbm entry point with the latest version from the manifest.

* Add SsaForecasting examples to the release notes.

* Add documentation modification to the release notes.

* Create the official 1.2.0 release notes. They have been put in the
docs/release-notes folder to closely match the ml.net directory
structure.

* Add correct version to the release notes title.

* Re-enable the SsaForecaster tests.

* Update to the latest version of ml.net. Update the NimbusML version.

* Fix issues with the summary unit tests.

* Comment out the SymSgdBinaryClassifier summary test. It does not
appear to be working on linux.

* Revert change b5eb937 to see if it (#168)

fixes the signed build issue.

* Bring back build.cmd commit. It did not fix the signed build issue. (#169)

* Revert change b5eb937 to see if it
fixes the signed build issue.

* Bring back commit b5eb937. It did
not fix the signed build issue.

* Bring back the build.cmd change from b5eb937. (#170)

It did not fix the signed build issue.

* Use restored dotnet CLI for signing (#171)

* Update README.md

* Enable LinearSvmBinaryClassifier (#180)

* Enable LinearSvmBinaryClassifier, add examples, add test, and update docs

* Add test for predict_proba() and decision_function()

* Setup destructors for data passed to python (#184)

* pass destructor to python

* indent

* Add azureml-dataprep support for dataflow objects (#181)

* draft code

* draft

* delete

* add dprep dependency

* rollback

* rollback

* rollback

* test & example on using DprepDataStream

* add dprep path

* add dprep path

* fix mlnetpath

* optional dependency on dprep

* run dprep tests optionally

* fix typo

* Up sdk version

* fix linux dprep tests

* up version (#188)

* Save the model file when pickling a NimbusML Pipeline. (#189)

* Save the model file when pickling a NimbusML Pipeline.

* Add version to the pickled Pipeline.

* Add the steps attribute to a pickled Pipeline instance.

* Add extra unit test for pickled nimbusml pipelines.

* Add export_version to pickled base_pipeline_items.
Remove unnecessary export_version attribute from an unpickled Pipeline.

* Remove stored references to X and y in BasePredictor. (#195)

* Remove stored references to X and y in BasePredictor.

* Remove unnecessary scikit-learn import.

* Add observation level feature contributions to Pipeline and BasePredictor (#196)

* Add get_feature_contributions to Pipeline and BasePredictor, add example

* Add tests

* Update release-next.md

* Add classes_ to Pipeline and/or predictor when calling predict_proba. (#200)

* Add classes_ to Pipeline and/or predictor when calling predict_proba.

* Update test_estimator_checks.py to skip the check_dict_unchanged
test for any estimator which supports predict_proba or decision_function.

* Update Handler, Filter, and Indicator to automatically convert the input columns to float before performing the transform. (#204)

Fixes #203.

* Combine models from transforms, predictors and pipelines in to one model. (#208)

* Initial test implementation of combining 2 or more models in to one.

* Added support to Pipeline.combine_models for combining other types of items
and transform only inputs.

* Combine Pipeline._evaluation_infer and _evaluation in to one method.
This fixes an issue where a classifier graph would not contain the
correct nodes after calling Pipeline._predict().

* Missing part of previous check-in.

* Fix the Pipeline.combine_models signature to work with Python 2.7.

* Fix build (#209)

* T

* Fix cert

* Update release-next.md. (#211)

* Update release-next.md

* Update release-next.md

* Update release-next.md

* Add classifier and FileDataStream unit tests to test_pipeline_combining. (#212)

Add classifier and FileDataStream unit tests to test_pipeline_combining.

* Update release-next.md

* up version (#210)

* up version

* Up the version

* renamed factorization lib

* remove matrix factorization lib ref

* dbg libs

* fix libtensorflow framework

* package more libs

* add mkl proxy

* Enable EnsembleClassifier and EnsembleRegressor (#207)

* Enable EnsembleClassifier

* nit

* Enable EnsembleRegressor

* Add output combiners

* Add sub model selectors

* Update examples

* Add documentation for components

* Add diversity measure

* Improve examples

* Add tests

* Fix test_estimator_checks

* Create release notes for version 1.3.0. (#214)

* Update release-1.3.0.md

* Add --installPythonPackages flag to build scripts (#215)

* Add --installPythonPackages flag to build scripts

* close if statement in build.sh

* fix --runTestsOnly

* Fix a bug with the classes_ attribute when no y input is specified during fitting. (#218)

Fixes #216

* Add NumSharp.Core.dll (#220)

* Add timeseries documentation to the master branch. (#221)

* Docs update (#224)

* Fix documentation

* Few more

* More doc fixes (#228)

* More doc fixes

* A few nits

* Pass python path to Dprep (#232)

* remove Dprep* dll from wheel (#235)

* remove Dprep* dll from wheel

* Move Dprep calls into separate class

* test

* remove DprepLoader

* clean unused code (#236)

* clean unused code

* fix tests changes due to seed changes

* remove max_slots from graph

* delete Dprep dlls from python2.7

* fix linux extended tests for TensorFlow

* fix tests

* fix tests

* rollback

* fix tests

* disable estimator check

* fix tests

* fix tests again

* fix tabbing
removing -r from rm command

* remove experimental

* Enable scoring of ML.NET models saved with new TransformerChain format (#230)

* Handle new ML.NET model format for predictions

* fix

* use with{} statement with ZipFile

* Add initial implementation of DatasetTransformer. (#240)

* Update release-next for the 1.4 release. (#252)

* Update release-next.md

* Upgrade to ML.NET 1.4 (#251)

* Upgrade to ML.NET 1.4

* preview bits

* update refs

* Fix casing for the installPythonPackages build.sh argument. (#256)

* Rename lambda_ to l2_regularization in LinearSvmBinaryClassifier (#259)

* Initial implementation of csr_matrix output support. (#250)

* Initial implementation of csr_matrix output support.

* Whitespace change to kick off another build. The CentOs test run crashed.

* Rename as per comment

* Initial implementation of LpNormalizer. (#253)

* Initial implementation of LpNormalizer.

* Rename to LpScaler

* fix build

* fix casing

* up version (#262)

* Remove scikit-learn testing module from normal flow (#265)

* remove scikit learn testing module from normal flow

* fix build

* fix build

* Fix issue when using predict_proba or decision_function with combined models. (#272)

* Output predictor model file optionally (#270)

* Output predictor model file optionally

* fix comment

* fix unit tests

* Draft of ColumnConcat transform that takes in a prefix instead

* fix test

* fix test

* PrefixColumnConcat transform

* fix entrypoint namespace

* fix exception

* Handle no match scenario

* add example & test

* add test

* fix comments

* fix comments

* fix example

* Providing error message to python in exception (#273)

* spit out error message to python
upgrade patch version

* fix the test

* another test

* rollback

* Add I8 support to CSR matrix output. (#276)

* Get column names for transform model (#278)

* draft for schema

* resolve conflict

* debug pieces

* Few perf tricks

* rollback prints

* few perf tricks

* perf tricks

* fix csr

* set 0 byte

* Update schema example.

* Convert return value to list.

* Update schema example to use new list return value.

* Fix naming in Pipeline.get_schema.

* Add initial unit tests for Pipeline.get_schema().

* Check length in Pipeline.get_schema unit tests.

* few perf tricks

* fix linux tests

* rollback

* Temporarily use 'inclusive' test instead of positional test for columns since order is not valid in Python 2.6 and 3.5.

* fix comments

* Add variable length vector support (#267)

* Update Schema.py to remove the non-ASCII character (#291)

* Fix Pipeline._extract_classes_from_headers was not checking for valid steps. (#292)

* Save predictor_model when pickling a pipeline. (#295)

* Initial implementation of the WordTokenizer transform. (#296)

* Remove summary validation in Pipeline and enable the summary tests for the tree based predictors. (#298)

* Turn on dprep unit tests for all platforms and python versions except 2.7 (#303)

* Fix bug in Pipeline.transform() (#294)

* Remove unnecessary code from Pipeline.transform that was causing a bug

* Update release-next.md

* Remove y argument from transform() method

* Update release-next.md

* Fix test

* Fixed building of NimbusML with Python 3.5 on Windows (and other versions of Python) (#297)

* Update Schema.py to remove the non-ASCII character

* Update build.cmd

* Update build.cmd

* Update build.cmd

* Revert "Update build.cmd"

This reverts commit cb79b9d.

* Upgrade pip for all Python versions

* Update release notes. (#306)

* Added libtensorflow_framework.so.1 (#310)

* Add Permutation Feature Importance (PFI) (#279)

* Add PFI entrypoint

* Add PFI to Pipeline and BasePipelineItem, and examples

* Improved docs and sample

* Load model as PredictorModel, and remove label column and group ID column from EP inputs

* schema example reference

* Add test

* nit

* Update release-next.md

* Add tests to check PFI from loaded model

* Make SgdBinaryClassifier deterministic in test_estimator_checks.py

* Update ML.NET nugets to 1.4.0-preview2 and 0.16.0-preview2

* Fix test baseline values

* Fix Ranking PFI column names to work with Py2.7 and Py3.5

* Initial implementation of DateTime input and output column support. (#290)

* Add support for DateTime output.

* Add support for DateTime input columns.

* Add unit test for DateTime column input and output.

* Fix DateTime.Kind == Unspecified output from dprep.

* Update the csproj files to point to the latest nuget packages.

* Update the Tensorflow.NET library version.

* Fix azureml dprep not available for Python 2.7

* Fix missing sys import.

* Fix broken assertEqual on Python 3.5.

* Fix BinaryDataStream not valid as input for transformer. (#307)

* Add test for fitting a BinaryDataStream.

* Use BinaryDataStream schema for retrieving feature columns in _init_graph_nodes.

* Add idv schema to BinaryDataStream.

* Fix DprepDataStream was passing in incorrect value to base class constructor.

* Remove column position check from unit test since it is unreliable on Python 3.5 and 2.7.

* Issue 300 (#311)

* Temporarily change running Mac pipeline to Python 3.6

* Temporary addition to view state of "result" in MacOS with Python 3.6

* Added additional temporary Python builds on Mac

* Added libtensorflow_framework.so.1 (#310)

* Revert "Temporary addition to view state of "result" in MacOS with Python 3.6"

This reverts commit d116dc8.

* Updated test_data_with_missing.test_input_conversion_to_float()

* Update test_data_with_missing.py

* Revert "Added additional temporary Python builds on Mac"

This reverts commit 1aa1526.

* Revert "Temporarily change running Mac pipeline to Python 3.6"

This reverts commit 4ec36fb.

* allow csr_matrix as input to predict_proba() (#305)

* draft

* draft

* rollback

* new entrypoint

* add assert

* rollback

* no print in test

* up version

* only Single type is allowed for Feature vector

* fix comments, rename entrypoint

* convert to single

* fix type

* add feature contribution test

* rename pipeline.get_schema() to pipeline.get_output_columns()

* fix build

* Update release notes. (#312)

* Turn off shuffling for FactorizationMachineBinaryClassifier. (#316)

* Fix imports

* Fix few more conflicts and build

* Fix one more import

* Fix nimbusml.pyproj

pieths added a commit that referenced this pull request Jan 24, 2020
* Draft, adding CategoryImputer, ToKeyImputer, ToString transformers

* add tests

* prelim commit

* update manifest, fix unit tests/examples

* upgrade version

* fix tests

* temp hack fix for native libs

* copy libFeaturizers.so

* fix version

* fix cp

* fix version

* Update ML.Net version number.

* Update the examples and unit tests.

* Update to latest version of the Featurizers library.

* Fix test_tostring unit test.

* Temporarily skip the estimator checks unit tests.

* Upgrade pip to the latest version when installing the Python
packages on Windows. This fixes an issue I had where scikit-learn
would not install when building NimbusML with the RlsWinPy3.6
configuration because it could not find one of the test data sets.

* Update test_estimator_checks for the three new transformers.

* Remove extra comma from test_estimator_checks.

* Update the ML.Net version.

* Add TimeSeriesImputer

* Add country param to DateTimeSplitter

* Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn.

* Update ML.Net version and import new AutoMLFeaturizers package.

* Add back in the accidentally removed tests from test_data_with_missing.py.

* Update the DateTimeSplitter examples.

* Update the ToKeyImputer examples.

* Update the ToString examples.

* Update build to support latest nuget packages and updates.

* Remove copy of libFeaturizers from linux build script.

* Add TimeSeriesImputer to the NimbusML project.

* Add initial DataFrame based example for TimeSeriesImputer.

* Update to the latest version of manifest.json.

* Add missing project include for the TimeSeriesImputer example.

* Update the DateTimeSplitter examples.

* Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform.

* Add a unit test for testing the holiday name return value for DateTimeSplitter.

* Add unit test for ToKeyImputer.

* Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer.

* Update TimeSeriesImputer_df example.

* Remove TimeSeriesImputer from test_estimator_checks.

* Update nuget.config to point to relative directory for ml.net packages.

* Add unit test for TimeSeriesImputer.

* Use environmental variable to specify the local ml.net nuget package directory.

* Update to the latest version of ml.net.

* Add latest version of nuget packages for building.

* Update to the latest windows ml.net binaries.

* Add linux ml.net binaries.

* adding correct nuget packages/location

* adding correct ML.NET signed packages

* adding correct ML.NET signed packages

* Update the referenced ML.Net versions.

* Update to the latest version of the manifest.

* Add RobustScaler to the public API.

* Fix spacing bug in RobustScaler in manifest.json.

* Update to the latest version of manifest.json which contains naming fix for RobustScaler.

* Update to latest unsigned nuget packages for testing RobustScaler and latest master features.

* Add RobustScaler unit tests and examples.

* Update to the latest signed ML.Net nugets.

* Fix RobustScaler checks in test_estimator_checks.

* up version

* Turn off shuffling for FactorizationMachineBinaryClassifier. (#316)

* Initial implementation of NGramExtractor. (#320)

* Disable check which prevents artifacts from being generated by pull requests. (#330)

* Update ManifestGenerator. (#329)

* Added "# -*- coding: utf-8 -*-" to preserve the character `␂` while guaranteeing successful builds with Python 2.7 (#328)

* Replaced the non-ASCII characters

* Revert "Replaced the non-ASCII characters"

This reverts commit 4adb28c.

* Update NGramExtractor_df.py

* Updating coding of Schema.py to preserve the character "␂"

* To re-run build tests

* To re-run build tests

* Edited encoding

* Rerun build tests

* Rerun build tests

* Added utf-8 encoding to NGramExtractor.py (#339)

* Image.py and Image_df.py extended testing examples are now supported on Ubuntu and CentOS (#338)

* Remove skipping of Image.py and Image_df.py

* Add libraries required for running Image.py and Image_df.py in Linux machines

* Update build.sh

* Add third party notices to package description on PyPI (#341)

* Add third party notices to package description on PyPI

* update

* update

* Add 1.5 (#344)

* Add info to README.md (#342)

* Add info to README.md

* update

* Fix DbgWinPy2.7 build which was failing when building NativeBridge. (#340)

* Fix DbgWinPy2.7 build which was failing when building NativeBridge.

Here is one of the error messages:
libboost_numpy-vc140-mt-gd-1_64.lib(ndarray.obj) : error LNK2038:
mismatch detected for 'RuntimeLibrary': value 'MDd_DynamicDebug'
doesn't match value 'MTd_StaticDebug' in DataViewInterop.obj

* Add whitespace change to start new CI run. UbuntuPy36 crashed

* Fix error level when exiting build.cmd. (#345)

* Added HTTP URLs to HTTPS URLs finder & converter Python scripts, and processed HTTP-->HTTPS URL changes (#346)

* Added utf-8 encoding to NGramExtractor.py

* Added HTTP to HTTPS finder and converter

* Changes made by ChangeHttpURLsToHttps.py

* Added copyright statements

* Updated FindHttpURLs.py and ChangeHttpURLsToHttps.py

* Add reports of alterable, nonalterable and invalid URLs

* Revert "Changes made by ChangeHttpURLsToHttps.py"

This reverts commit afa5f35.

* Add URL changes made by ChangeHttpURLsToHttps.py

* Revert "Add URL changes made by ChangeHttpURLsToHttps.py"

This reverts commit b6a2f7f.

* Revert "Add reports of alterable, nonalterable and invalid URLs"

This reverts commit 9121123.

* Update FindHttpURLs.py and ChangeHttpURLsToHttps.py

* Add HTTP to HTTPS URL reports

* Changes made by ChangeHttpToHttpsURLs.py

* Revert "Changes made by ChangeHttpToHttpsURLs.py"

This reverts commit 72c85d9.

* Revert "Add HTTP to HTTPS URL reports"

This reverts commit 81c5a96.

* Revert "Update FindHttpURLs.py and ChangeHttpURLsToHttps.py"

This reverts commit 038262f.

* Update FindHttpURLs.py and ChangeHttpURLsToHttps.py

* Add URL reports

* Add Http-->Https URL changes through ChangeHttpURLsToHttpsURLs.py

* Removed if __name__ and main() statements

* Revert "Removed if __name__ and main() statements"

This reverts commit ba2742f.

* Update nimbusml.pyproj

* Manually converted two alterable HTTP links to HTTPS.

* Rename ChangeHttpURLsToHttps.py to changeHttpURLsToHttps.py

* Rename FindHttpURLs.py to findHttpURLs.py

* URL in SigmoidKernel.txt is fixed for findHttpURLs.py to recognize it as an alterable URL

* Changed outdated URL as original URL redirected to current URL

* Update Report_InvalidUrls_FindHttpURLs.csv

* Fixing reachable HTTP URLs

* Update findHttpURLs.py

* Updated URL reports, cleared invalid URLs

* Update of report for alterable HTTP URLs after running findHttpURLs.py after running changeHttpURLsToHttps.py

* Removing URL reports for merge

* Renamed URL scripts and reflected this change inside these files (#348)

* Renamed URL scripts and reflected this change inside these files

* Fix small typo in change_http_urls_to_https.py

* Updated file names and naming conventions inside files

* Update nimbusml.pyproj

* Updated usage infos of find_http_urls.py and change_to_https.py

* Updated find_http_urls.py and change_to_https.py

* Execute unit tests in parallel (#331)

* Wrap test estimator checks in a python unit test.

* Combine the non-extended test runs together to make them more parallelizable.

* Reverse the tests path args order to try and have test_estimator_checks run earlier in the test run.

* Dynamically generate the test_estimator_checks unit tests.

* Create the test_docs_example unit tests dynamically so they can be parallelized.

* Fix KMeansPlusPlus does not work with a cluster size of 1 when using a debug version of ml.net

* Fix OLS divide by 0 when given a particular set of inputs to fit. This is hidden in release versions of ml.net

* Fix issue when ranking where the output of TextToKeyConverter was
trying to overwrite the $scoredVectorData variable set by
DatasetScorerEx. See test_metrics_evaluate_ranking_group_id_from_existing_column_in_X
for a test which demonstrates the issue. It throws an exception
from EntryPointNode.cs:837 when trying to get the outputs. The
exception was hidden when using release builds of ML.Net.

* Remove a test_estimator_check for OrdinaryLeastSquaresRegressor
since it is causing invalid float values and throwing an exception
which was hidden in release versions of ML.Net but visible in debug.

* Update test_permutation_feature_importance tests to support parallel execution.

* Rerun unit tests one extra time if any failed to check for intermittent failures.

* Decrease the size of the images in the Image and Image_df examples. (#350)

* Update package references to work with the latest versions from nuget.org. (#353)

* Update ML.Net package references to work with RC1

* Update to ML.Net 1.4.0

* Update Microsoft.DataPrep to version 0.0.2.19-preview.

* Downgrade Microsoft.DataPrep to version 0.0.2.3-preview due to issue with missing SqlJdbc package.

* Update nimbusml version to 1.6.0.

* Update release notes. (#354)

* Added Google.Protobuf.dll to Mac and Linux builds (#358)

* Modifications to support scripted temp/docs merging. (#361)

* Set size variable to -1 in GetUnicodeTX to fix Python 2.7 encoding/decoding issue (#359)

* Modified size variable in GetUnicodeTX to -1

* Update DataViewInterop.h

* Fixed spacing in DataViewInterop.h

* Re-enabled skipped test due to Py2.7 encoding/decoding issue

* Removed unnecessary invoking of .sum()

* Revert "Removed unnecessary invoking of .sum()"

This reverts commit e51a64b.

* Initial implementation of the temp_docs_updater script. (#363)

* Update README.md

* Generate PrefixColumnConcatenator with entry point compiler instead of manually. (#364)

* Fix broken docs (#369)

* Fix whitespaces and typos

* tabs and whitespaces

* Removed all references to DSSM in NimbusML (except for in test_wordembedding.py) (#374)

* Added catch for predictors that do not support summary() (#375)

* Added catch for summary() with FactorizationMachineBinaryClassifier

* Updated test for model summary

* Revert "Updated test for model summary"

This reverts commit 59656fe.

* Update pipeline.py

* Update test_model_summary.py

* Update test_model_summary.py

* Update test_model_summary.py

* Update test_model_summary.py

* Update test_model_summary.py

* Changed wording of error message

* Update Microsoft.DataPrep to the latest version. (#379)

* Create release notes for the 1.6.0 release. (#382)

* Create release notes for version 1.6.0.

* Update 1.6.0 release notes.

* Bump version to 1.6.1 to fix dprep issue. (#385)

* Update to latest version of DataPrep.

* Bump version to 1.6.1 to fix dprep issue.

* Removed "TODO: Replace with CV" comments (#389)

* Disabled tests that only fail on Mac Py2.7 due to string encoding/dec… (#391)

* Disabled tests that only fail on Mac Py2.7 due to string encoding/decoding bug

* Update test_ngramfeaturizer.py

* Add as_csr documentation to the inline docstrings for transform() and fit_transform(). (#392)

* Update to the latest version of ML.Net.

* Whitespace change to start a new CI run to see if the mac build is working again.

* Update to the latest version of ML.Net. (#401)

* Update to the latest version of ML.Net.

* Whitespace change to start a new CI run to see if the mac build is working again.

* Typo fixed on paragraph 15 (#399)

* Typo fixed on paragraph 10 (#398)

* Initial implementation of DateTimeSplitter. Ported from the aml branch.

* Update the transform output formats documentation. (#395)

* Update the transform output formats documentation.

* Add whitespace change to restart CI run. The mac build did not start correctly.

* Add whitespace change to restart CI run. The mac build did not start correctly.

Co-authored-by: Gani Nazirov <ganinz@hotmail.com>

* Fixed broken brew command (#402)

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Checking for extended tests

* Update phase-template.yml

* Final touches

* Re-activated NGramFeaturizer2.py (#381)

* Update test_docs_example.py

* Temporary change so that extended tests can be run by PRs

* Revert "Temporary change so that extended tests can be run by PRs"

This reverts commit 3f2b8a3.

* Temporary change to be able to view extended tests' status with manual PRs

* Update .vsts-ci.yml

* Update .vsts-ci.yml

* Update .vsts-ci.yml

Co-authored-by: Gani Nazirov <ganinz@hotmail.com>

* Fix missing import in test_datetimesplitter.

* Fix issue with ColumnSelector when dropping columns after DateTimeSplitter.

* Contributing: Fix a typo (#406)

* Re-run failed unit tests on Ubuntu/Mac to fix intermittent crashes. (#407)

Note, this modification only handles intermittent crashes on Ubuntu/Mac unit test runs. It does not handle situations where the build hangs and never returns control to the build script.

* Fix issue when specifying split_start='after_transforms' with CV.fit() (#410)

* Use latest ML.Net dev packages from MachineLearning feed.

* Re-enable the default nuget.org feed. It does not appear to cause
any conflicts with getting the latest packages so long as the * is
used in the PackageReference Version attributes. Keeping this enabled
will allow other packages which are not part of the MachineLearning
feed to be retrieved (i.e. Microsoft.MLFeaturizers).

* Add whitespace change to restart CI build. Linux timed out.

* Fix build issue when using pip version >= 20.0.0

* Fix build issue caused by latest version of pip (>=20.0.0) (#412)

* Remove local-nuget-packages, fix build and test_estimator_checks failures.

* Remove DateTimeSplitter duplicates in nimbusml.pyproj

* Remove duplicate ML.Featurizers import.

Co-authored-by: Gani Nazirov <ganinz@hotmail.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
Co-authored-by: Mustafa Bal <balmustafa117@gmail.com>
Co-authored-by: Najeeb Kazmi <najeeb.kazmi@gmail.com>
Co-authored-by: Darío Hereñú <magallania@gmail.com>
Co-authored-by: Maher Jendoubi <maher.jendoubi@gmail.com>
ganik added a commit that referenced this pull request Mar 12, 2020
* Native featurizers for AutoML (#317)

* Draft, adding CategoryImputer, ToKeyImputer, ToString transformers

* add tests

* prelim commit

* update manifest, fix unit tests/examples

* upgrade version

* fix tests

* temp hack fix for native libs

* copy libFeaturizers.so

* fix version

* fix cp

* fix version

* Update ML.Net version number.

* Update the examples and unit tests.

* Update to latest version of the Featurizers library.

* Fix test_tostring unit test.

* Temporarily skip the estimator checks unit tests.

* Upgrade pip to the latest version when installing the Python
packages on Windows. This fixes an issue I had where scikit-learn
would not install when building NimbusML with the RlsWinPy3.6
configuration because it could not find one of the test data sets.

* Update test_estimator_checks for the three new transformers.

* Remove extra comma from test_estimator_checks.

* Update the ML.Net version.

* Add TimeSeriesImputer

* Add country param to DateTimeSplitter

* Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn.

* Update ML.Net version and import new AutoMLFeaturizers package.

* Add back in the accidentally removed tests from test_data_with_missing.py.

* Update the DateTimeSplitter examples.

* Update the ToKeyImputer examples.

* Update the ToString examples.

* Update build to support latest nuget packages and updates.

* Remove copy of libFeaturizers from linux build script.

* Add TimeSeriesImputer to the NimbusML project.

* Add initial DataFrame based example for TimeSeriesImputer.

* Update to the latest version of manifest.json.

* Add missing project include for the TimeSeriesImputer example.

* Update the DateTimeSplitter examples.

* Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform.

* Add a unit test for testing the holiday name return value for DateTimeSplitter.

* Add unit test for ToKeyImputer.

* Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer.

* Update TimeSeriesImputer_df example.

* Remove TimeSeriesImputer from test_estimator_checks.

* Update nuget.config to point to relative directory for ml.net packages.

* Add unit test for TimeSeriesImputer.

* Use environment variable to specify the local ml.net nuget package directory.

* Update to the latest version of ml.net.

* Add latest version of nuget packages for building.

* Update to the latest windows ml.net binaries.

* Add linux ml.net binaries.

* adding correct nuget packages/location

* adding correct ML.NET signed packages

* adding correct ML.NET signed packages

* Update the referenced ML.Net versions.

* Update to the latest version of the manifest.

* Add RobustScaler to the public API.

* Fix spacing bug in RobustScaler in manifest.json.

* Update to the latest version of manifest.json which contains naming fix for RobustScaler.

* Update to latest unsigned nuget packages for testing RobustScaler and latest master features.

* Add RobustScaler unit tests and examples.

* Update to the latest signed ML.Net nugets.

* Fix RobustScaler checks in test_estimator_checks.

* up version

* Update aml branch. (#415)

* Draft, adding CategoryImputer, ToKeyImputer, ToString transformers

* add tests

* prelim commit

* update manifest, fix unit tests/examples

* upgrade version

* fix tests

* temp hack fix for native libs

* copy libFeaturizers.so

* fix version

* fix cp

* fix version

* Update ML.Net version number.

* Update the examples and unit tests.

* Update to latest version of the Featurizers library.

* Fix test_tostring unit test.

* Temporarily skip the estimator checks unit tests.

* Upgrade pip to the latest version when installing the Python
packages on Windows. This fixes an issue I had where scikit-learn
would not install when building NimbusML with the RlsWinPy3.6
configuration because it could not find one of the test data sets.

* Update test_estimator_checks for the three new transformers.

* Remove extra comma from test_estimator_checks.

* Update the ML.Net version.

* Add TimeSeriesImputer

* Add country param to DateTimeSplitter

* Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn.

* Update ML.Net version and import new AutoMLFeaturizers package.

* Add back in the accidentally removed tests from test_data_with_missing.py.

* Update the DateTimeSplitter examples.

* Update the ToKeyImputer examples.

* Update the ToString examples.

* Update build to support latest nuget packages and updates.

* Remove copy of libFeaturizers from linux build script.

* Add TimeSeriesImputer to the NimbusML project.

* Add initial DataFrame based example for TimeSeriesImputer.

* Update to the latest version of manifest.json.

* Add missing project include for the TimeSeriesImputer example.

* Update the DateTimeSplitter examples.

* Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform.

* Add a unit test for testing the holiday name return value for DateTimeSplitter.

* Add unit test for ToKeyImputer.

* Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer.

* Update TimeSeriesImputer_df example.

* Remove TimeSeriesImputer from test_estimator_checks.

* Update nuget.config to point to relative directory for ml.net packages.

* Add unit test for TimeSeriesImputer.

* Use environment variable to specify the local ml.net nuget package directory.

* Update to the latest version of ml.net.

* Add latest version of nuget packages for building.

* Update to the latest windows ml.net binaries.

* Add linux ml.net binaries.

* adding correct nuget packages/location

* adding correct ML.NET signed packages

* adding correct ML.NET signed packages

* Update the referenced ML.Net versions.

* Update to the latest version of the manifest.

* Add RobustScaler to the public API.

* Fix spacing bug in RobustScaler in manifest.json.

* Update to the latest version of manifest.json which contains naming fix for RobustScaler.

* Update to latest unsigned nuget packages for testing RobustScaler and latest master features.

* Add RobustScaler unit tests and examples.

* Update to the latest signed ML.Net nugets.

* Fix RobustScaler checks in test_estimator_checks.

* up version

* Turn off shuffling for FactorizationMachineBinaryClassifier. (#316)

* Initial implementation of NGramExtractor. (#320)

* Disable check which prevents artifacts from being generated by pull requests. (#330)

* Update ManifestGenerator. (#329)

* Added "# -*- coding: utf-8 -*-" to preserve the character `␂` while guaranteeing successful builds with Python 2.7 (#328)

* Replaced the non-ASCII characters

* Revert "Replaced the non-ASCII characters"

This reverts commit 4adb28c.

* Update NGramExtractor_df.py

* Updating coding of Schema.py to preserve the character "␂"

* To re-run build tests

* To re-run build tests

* Edited encoding

* Rerun build tests

* Rerun build tests

* Added utf-8 encoding to NGramExtractor.py (#339)

* Image.py and Image_df.py extended testing examples are now supported on Ubuntu and CentOS (#338)

* Remove skipping of Image.py and Image_df.py

* Add libraries required for running Image.py and Image_df.py in Linux machines

* Update build.sh

* Add third party notices to package description on PyPI (#341)

* Add third party notices to package description on PyPI

* update

* update

* Add 1.5 (#344)

* Add info to README.md (#342)

* Add info to README.md

* update

* Fix DbgWinPy2.7 build which was failing when building NativeBridge. (#340)

* Fix DbgWinPy2.7 build which was failing when building NativeBridge.

Here is one of the error messages:
libboost_numpy-vc140-mt-gd-1_64.lib(ndarray.obj) : error LNK2038:
mismatch detected for 'RuntimeLibrary': value 'MDd_DynamicDebug'
doesn't match value 'MTd_StaticDebug' in DataViewInterop.obj

* Add whitespace change to start new CI run. UbuntuPy36 crashed

* Fix error level when exiting build.cmd. (#345)

* Added HTTP URLs to HTTPS URLs finder & converter Python scripts, and processed HTTP-->HTTPS URL changes (#346)

* Added utf-8 encoding to NGramExtractor.py

* Added HTTP to HTTPS finder and converter

* Changes made by ChangeHttpURLsToHttps.py

* Added copyright statements

* Updated FindHttpURLs.py and ChangeHttpURLsToHttps.py

* Add reports of alterable, nonalterable and invalid URLs

* Revert "Changes made by ChangeHttpURLsToHttps.py"

This reverts commit afa5f35.

* Add URL changes made by ChangeHttpURLsToHttps.py

* Revert "Add URL changes made by ChangeHttpURLsToHttps.py"

This reverts commit b6a2f7f.

* Revert "Add reports of alterable, nonalterable and invalid URLs"

This reverts commit 9121123.

* Update FindHttpURLs.py and ChangHttpURLsToHttps.py

* Add HTTP to HTTPS URL reports

* Changes made by ChangeHttpToHttpsURLs.py

* Revert "Changes made by ChangeHttpToHttpsURLs.py"

This reverts commit 72c85d9.

* Revert "Add HTTP to HTTPS URL reports"

This reverts commit 81c5a96.

* Revert "Update FindHttpURLs.py and ChangHttpURLsToHttps.py"

This reverts commit 038262f.

* Update FindHttpURLs.py and ChangeHttpURLsToHttps.py

* Add URL reports

* Add Http-->Https URL changes through ChangeHttpURLsToHttpsURLs.py

* Removed if __name__ and main() statements

* Revert "Removed if __name__ and main() statements"

This reverts commit ba2742f.

* Update nimbusml.pyproj

* Manually converted two alterable HTTP links to HTTPS.

* Rename ChangeHttpURLsToHttps.py to changeHttpURLsToHttps.py

* Rename FindHttpURLs.py to findHttpURLs.py

* URL in SigmoidKernel.txt is fixed for findHttpURLs.py to recognize it as an alterable URL

* Changed outdated URL as original URL redirected to current URL

* Update Report_InvalidUrls_FindHttpURLs.csv

* Fixing reachable HTTP URLs

* Update findHttpURLs.py

* Updated URL reports, cleared invalid URLs

* Update report of alterable HTTP URLs after running changeHttpURLsToHttps.py and then findHttpURLs.py

* Removing URL reports for merge

* Renamed URL scripts and reflected this change inside these files (#348)

* Renamed URL scripts and reflected this change inside these files

* Fix small typo in change_http_urls_to_https.py

* Updated file names and naming conventions inside files

* Update nimbusml.pyproj

* Updated usage infos of find_http_urls.py and change_to_https.py

* Updated find_http_urls.py and change_to_https.py

* Execute unit tests in parallel (#331)

* Wrap test estimator checks in a python unit test.

* Combine the non-extended test runs together to make them more parallelizable.

* Reverse the tests path args order to try and have test_estimator_checks run earlier in the test run.

* Dynamically generate the test_estimator_checks unit tests.

* Create the test_docs_example unit tests dynamically so they can be parallelized.

* Fix KMeansPlusPlus not working with a cluster size of 1 when using a debug version of ml.net

* Fix OLS divide by 0 when given a particular set of inputs to fit. This is hidden in release versions of ml.net

* Fix issue when ranking where the output of TextToKeyConverter was
trying to overwrite the $scoredVectorData variable set by
DatasetScorerEx. See test_metrics_evaluate_ranking_group_id_from_existing_column_in_X
for a test which demonstrates the issue. It throws an exception
from EntryPointNode.cs:837 when trying to get the outputs. The
exception was hidden when using release builds of ML.Net.

* Remove a test_estimator_check for OrdinaryLeastSquaresRegressor
since it is causing invalid float values and throwing an exception
which was hidden in release versions of ML.Net but visible in debug.

* Update test_permutation_feature_importance tests to support parallel execution.

* Rerun unit tests one extra time if any failed to check for intermittent failures.

* Decrease the size of the images in the Image and Image_df examples. (#350)

* Update package references to work with the latest versions from nuget.org. (#353)

* Update ML.Net package references to work with RC1

* Update to ML.Net 1.4.0

* Update Microsoft.DataPrep to version 0.0.2.19-preview.

* Downgrade Microsoft.DataPrep to version 0.0.2.3-preview due to issue with missing SqlJdbc package.

* Update nimbusml version to 1.6.0.

* Update release notes. (#354)

* Added Google.Protobuf.dll to Mac and Linux builds (#358)

* Modifications to support scripted temp/docs merging. (#361)

* Set size variable to -1 in GetUnicodeTX to fix Python 2.7 encoding/decoding issue (#359)

* Modified size variable in GetUnicodeTX to -1

* Update DataViewInterop.h

* Fixed spacing in DataViewInterop.h

* Re-enabled skipped test due to Py2.7 encoding/decoding issue

* Removed unnecessary invoking of .sum()

* Revert "Removed unnecessary invoking of .sum()"

This reverts commit e51a64b.

* Initial implementation of the temp_docs_updater script. (#363)

* Update README.md

* Generate PrefixColumnConcatenator with entry point compiler instead of manually. (#364)

* Fix broken docs (#369)

* Fix whitespaces and typos

* tabs and whitespaces

* Removed all references to DSSM in NimbusML (except for in test_wordembedding.py) (#374)

* Added catch for predictors that do not support summary() (#375)

* Added catch for summary() with FactorizationMachineBinaryClassifier

* Updated test for model summary

* Revert "Updated test for model summary"

This reverts commit 59656fe.

* Update pipeline.py

* Update test_model_summary.py

* Update test_model_summary.py

* Update test_model_summary.py

* Update test_model_summary.py

* Update test_model_summary.py

* Changed wording of error message

* Update Microsoft.DataPrep to the latest version. (#379)

* Create release notes for the 1.6.0 release. (#382)

* Create release notes for version 1.6.0.

* Update 1.6.0 release notes.

* Bump version to 1.6.1 to fix dprep issue. (#385)

* Update to latest version of DataPrep.

* Bump version to 1.6.1 to fix dprep issue.

* Removed "TODO: Replace with CV" comments (#389)

* Disabled tests that only fail on Mac Py2.7 due to string encoding/dec… (#391)

* Disabled tests that only fail on Mac Py2.7 due to string encoding/decoding bug

* Update test_ngramfeaturizer.py

* Add as_csr documentation to the inline docstrings for transform() and fit_transform(). (#392)

* Update to the latest version of ML.Net.

* Whitespace change to start a new CI run to see if the mac build is working again.

* Update to the latest version of ML.Net. (#401)

* Update to the latest version of ML.Net.

* Whitespace change to start a new CI run to see if the mac build is working again.

* Typo fixed on paragraph 15 (#399)

* Typo fixed on paragraph 10 (#398)

* Initial implementation of DateTimeSplitter. Ported from the aml branch.

* Update the transform output formats documentation. (#395)

* Update the transform output formats documentation.

* Add whitespace change to restart CI run. The mac build did not start correctly.

* Add whitespace change to restart CI run. The mac build did not start correctly.

Co-authored-by: Gani Nazirov <ganinz@hotmail.com>

* Fixed broken brew command (#402)

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Update phase-template.yml

* Checking for extended tests

* Update phase-template.yml

* Final touches

* Re-activated NGramFeaturizer2.py (#381)

* Update test_docs_example.py

* Temporary change so that extended tests can be run by PRs

* Revert "Temporary change so that extended tests can be run by PRs"

This reverts commit 3f2b8a3.

* Temporary change to be able to view extended tests' status with manual PRs

* Update .vsts-ci.yml

* Update .vsts-ci.yml

* Update .vsts-ci.yml

Co-authored-by: Gani Nazirov <ganinz@hotmail.com>

* Fix missing import in test_datetimesplitter.

* Fix issue with ColumnSelector when dropping columns after DateTimeSplitter.

* Contributing: Fix a typo (#406)

* Re-run failed unit tests on Ubuntu/Mac to fix intermittent crashes. (#407)

Note, this modification only handles intermittent crashes on Ubuntu/Mac unit test runs. It does not handle situations where the build hangs and never returns control to the build script.

* Fix issue when specifying split_start='after_transforms' with CV.fit() (#410)

* Use latest ML.Net dev packages from MachineLearning feed.

* Re-enable the default nuget.org feed. It does not appear to cause
any conflicts with getting the latest packages so long as the * is
used in the PackageReference Version attributes. Keeping this enabled
will allow other packages which are not part of the MachineLearning
feed to be retrieved (i.e. Microsoft.MLFeaturizers).

* Add whitespace change to restart CI build. Linux timed out.

* Fix build issue when using pip version >= 20.0.0

* Fix build issue caused by latest version of pip (>=20.0.0) (#412)

* Remove local-nuget-packages, fix build and test_estimator_checks failures.

* Remove DateTimeSplitter duplicates in nimbusml.pyproj

* Remove duplicate ML.Featurizers import.

Co-authored-by: Gani Nazirov <ganinz@hotmail.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
Co-authored-by: Mustafa Bal <balmustafa117@gmail.com>
Co-authored-by: Najeeb Kazmi <najeeb.kazmi@gmail.com>
Co-authored-by: Darío Hereñú <magallania@gmail.com>
Co-authored-by: Maher Jendoubi <maher.jendoubi@gmail.com>

* Fix build and test failures in the aml branch. (#418)

* Draft, adding CategoryImputer, ToKeyImputer, ToString transformers

* add tests

* prelim commit

* update manifest, fix unit tests/examples

* upgrade version

* fix tests

* temp hack fix for native libs

* copy libFeaturizers.so

* fix version

* fix cp

* fix version

* Update ML.Net version number.

* Update the examples and unit tests.

* Update to latest version of the Featurizers library.

* Fix test_tostring unit test.

* Temporarily skip the estimator checks unit tests.

* Upgrade pip to the latest version when installing the Python
packages on Windows. This fixes an issue I had where scikit-learn
would not install when building NimbusML with the RlsWinPy3.6
configuration because it could not find one of the test data sets.

* Update test_estimator_checks for the three new transformers.

* Remove extra comma from test_estimator_checks.

* Update the ML.Net version.

* Add TimeSeriesImputer

* Add country param to DateTimeSplitter

* Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn.

* Update ML.Net version and import new AutoMLFeaturizers package.

* Add back in the accidentally removed tests from test_data_with_missing.py.

* Update the DateTimeSplitter examples.

* Update the ToKeyImputer examples.

* Update the ToString examples.

* Update build to support latest nuget packages and updates.

* Remove copy of libFeaturizers from linux build script.

* Add TimeSeriesImputer to the NimbusML project.

* Add initial DataFrame based example for TimeSeriesImputer.

* Update to the latest version of manifest.json.

* Add missing project include for the TimeSeriesImputer example.

* Update the DateTimeSplitter examples.

* Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform.

* Add a unit test for testing the holiday name return value for DateTimeSplitter.

* Add unit test for ToKeyImputer.

* Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer.

* Update TimeSeriesImputer_df example.

* Remove TimeSeriesImputer from test_estimator_checks.

* Update nuget.config to point to relative directory for ml.net packages.

* Add unit test for TimeSeriesImputer.

* Use environment variable to specify the local ml.net nuget package directory.

* Update to the latest version of ml.net.

* Add latest version of nuget packages for building.

* Update to the latest windows ml.net binaries.

* Add linux ml.net binaries.

* adding correct nuget packages/location

* adding correct ML.NET signed packages

* adding correct ML.NET signed packages

* Update the referenced ML.Net versions.

* Update to the latest version of the manifest.

* Add RobustScaler to the public API.

* Fix spacing bug in RobustScaler in manifest.json.

* Update to the latest version of manifest.json which contains naming fix for RobustScaler.

* Update to latest unsigned nuget packages for testing RobustScaler and latest master features.

* Add RobustScaler unit tests and examples.

* Update to the latest signed ML.Net nugets.

* Fix RobustScaler checks in test_estimator_checks.

* up version

* Update to the latest version of ML.Net.

* Whitespace change to start a new CI run to see if the mac build is working again.

* Initial implementation of DateTimeSplitter. Ported from the aml branch.

* Fix missing import in test_datetimesplitter.

* Fix issue with ColumnSelector when dropping columns after DateTimeSplitter.

* Use latest ML.Net dev packages from MachineLearning feed.

* Re-enable the default nuget.org feed. It does not appear to cause
any conflicts with getting the latest packages so long as the * is
used in the PackageReference Version attributes. Keeping this enabled
will allow other packages which are not part of the MachineLearning
feed to be retrieved (i.e. Microsoft.MLFeaturizers).

* Add whitespace change to restart CI build. Linux timed out.

* Fix build issue when using pip version >= 20.0.0

* Remove local-nuget-packages, fix build and test_estimator_checks failures.

* Remove DateTimeSplitter duplicates in nimbusml.pyproj

* Remove duplicate ML.Featurizers import.

Co-authored-by: Gani Nazirov <ganinz@hotmail.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>

* Fix build issues with aml branch (#419)

* Draft, adding CategoryImputer, ToKeyImputer, ToString transformers

* add tests

* prelim commit

* update manifest, fix unit tests/examples

* upgrade version

* fix tests

* temp hack fix for native libs

* copy libFeaturizers.so

* fix version

* fix cp

* fix version

* Update ML.Net version number.

* Update the examples and unit tests.

* Update to latest version of the Featurizers library.

* Fix test_tostring unit test.

* Temporarily skip the estimator checks unit tests.

* Upgrade pip to the latest version when installing the Python
packages on Windows. This fixes an issue I had where scikit-learn
would not install when building NimbusML with the RlsWinPy3.6
configuration because it could not find one of the test data sets.

* Update test_estimator_checks for the three new transformers.

* Remove extra comma from test_estimator_checks.

* Update the ML.Net version.

* Add TimeSeriesImputer

* Add country param to DateTimeSplitter

* Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn.

* Update ML.Net version and import new AutoMLFeaturizers package.

* Add back in the accidentally removed tests from test_data_with_missing.py.

* Update the DateTimeSplitter examples.

* Update the ToKeyImputer examples.

* Update the ToString examples.

* Update build to support latest nuget packages and updates.

* Remove copy of libFeaturizers from linux build script.

* Add TimeSeriesImputer to the NimbusML project.

* Add initial DataFrame based example for TimeSeriesImputer.

* Update to the latest version of manifest.json.

* Add missing project include for the TimeSeriesImputer example.

* Update the DateTimeSplitter examples.

* Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform.

* Add a unit test for testing the holiday name return value for DateTimeSplitter.

* Add unit test for ToKeyImputer.

* Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer.

* Update TimeSeriesImputer_df example.

* Remove TimeSeriesImputer from test_estimator_checks.

* Update nuget.config to point to relative directory for ml.net packages.

* Add unit test for TimeSeriesImputer.

* Use environment variable to specify the local ml.net nuget package directory.

* Update to the latest version of ml.net.

* Add latest version of nuget packages for building.

* Update to the latest windows ml.net binaries.

* Add linux ml.net binaries.

* adding correct nuget packages/location

* adding correct ML.NET signed packages

* adding correct ML.NET signed packages

* Update the referenced ML.Net versions.

* Update to the latest version of the manifest.

* Add RobustScaler to the public API.

* Fix spacing bug in RobustScaler in manifest.json.

* Update to the latest version of manifest.json which contains naming fix for RobustScaler.

* Update to latest unsigned nuget packages for testing RobustScaler and latest master features.

* Add RobustScaler unit tests and examples.

* Update to the latest signed ML.Net nugets.

* Fix RobustScaler checks in test_estimator_checks.

* up version

* Update to the latest version of ML.Net.

* Whitespace change to start a new CI run to see if the mac build is working again.

* Initial implementation of DateTimeSplitter. Ported from the aml branch.

* Fix missing import in test_datetimesplitter.

* Fix issue with ColumnSelector when dropping columns after DateTimeSplitter.

* Use latest ML.Net dev packages from MachineLearning feed.

* Re-enable the default nuget.org feed. It does not appear to cause
any conflicts with getting the latest packages so long as the * is
used in the PackageReference Version attributes. Keeping this enabled
will allow other packages which are not part of the MachineLearning
feed to be retrieved (i.e. Microsoft.MLFeaturizers).

* Add whitespace change to restart CI build. Linux timed out.

* Fix build issue when using pip version >= 20.0.0

* Remove local-nuget-packages, fix build and test_estimator_checks failures.

* Remove DateTimeSplitter duplicates in nimbusml.pyproj

* Remove duplicate ML.Featurizers import.

* Fix incorrect featurizers library on Mac builds.

Co-authored-by: Gani Nazirov <ganinz@hotmail.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>

* Fix issues with centos unit tests related to featurizers. (#420)

* Draft, adding CategoryImputer, ToKeyImputer, ToString transformers

* add tests

* prelim commit

* update manifest, fix unit tests/examples

* upgrade version

* fix tests

* temp hack fix for native libs

* copy libFeaturizers.so

* fix version

* fix cp

* fix version

* Update ML.Net version number.

* Update the examples and unit tests.

* Update to latest version of the Featurizers library.

* Fix test_tostring unit test.

* Temporarily skip the estimator checks unit tests.

* Upgrade pip to the latest version when installing the Python
packages on Windows. This fixes an issue I had where scikit-learn
would not install when building NimbusML with the RlsWinPy3.6
configuration because it could not find one of the test data sets.

* Update test_estimator_checks for the three new transformers.

* Remove extra comma from test_estimator_checks.

* Update the ML.Net version.

* Add TimeSeriesImputer

* Add country param to DateTimeSplitter

* Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn.

* Update ML.Net version and import new AutoMLFeaturizers package.

* Add back in the accidentally removed tests from test_data_with_missing.py.

* Update the DateTimeSplitter examples.

* Update the ToKeyImputer examples.

* Update the ToString examples.

* Update build to support latest nuget packages and updates.

* Remove copy of libFeaturizers from linux build script.

* Add TimeSeriesImputer to the NimbusML project.

* Add initial DataFrame based example for TimeSeriesImputer.

* Update to the latest version of manifest.json.

* Add missing project include for the TimeSeriesImputer example.

* Update the DateTimeSplitter examples.

* Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform.

* Add a unit test for testing the holiday name return value for DateTimeSplitter.

* Add unit test for ToKeyImputer.

* Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer.

* Update TimeSeriesImputer_df example.

* Remove TimeSeriesImputer from test_estimator_checks.

* Update nuget.config to point to relative directory for ml.net packages.

* Add unit test for TimeSeriesImputer.

* Use environment variable to specify the local ml.net nuget package directory.

* Update to the latest version of ml.net.

* Add latest version of nuget packages for building.

* Update to the latest windows ml.net binaries.

* Add linux ml.net binaries.

* adding correct nuget packages/location

* adding correct ML.NET signed packages

* adding correct ML.NET signed packages

* Update the referenced ML.Net versions.

* Update to the latest version of the manifest.

* Add RobustScaler to the public API.

* Fix spacing bug in RobustScaler in manifest.json.

* Update to the latest version of manifest.json which contains naming fix for RobustScaler.

* Update to latest unsigned nuget packages for testing RobustScaler and latest master features.

* Add RobustScaler unit tests and examples.

* Update to the latest signed ML.Net nugets.

* Fix RobustScaler checks in test_estimator_checks.

* up version

* Update to the latest version of ML.Net.

* Whitespace change to start a new CI run to see if the mac build is working again.

* Initial implementation of DateTimeSplitter. Ported from the aml branch.

* Fix missing import in test_datetimesplitter.

* Fix issue with ColumnSelector when dropping columns after DateTimeSplitter.

* Use latest ML.Net dev packages from MachineLearning feed.

* Re-enable the default nuget.org feed. It does not appear to cause
any conflicts with getting the latest packages so long as the * is
used in the PackageReference Version attributes. Keeping this enabled
will allow other packages which are not part of the MachineLearning
feed to be retrieved (i.e. Microsoft.MLFeaturizers).

* Add whitespace change to restart CI build. Linux timed out.

* Fix build issue when using pip version >= 20.0.0

* Remove local-nuget-packages, fix build and test_estimator_checks failures.

* Remove DateTimeSplitter duplicates in nimbusml.pyproj

* Remove duplicate ML.Featurizers import.

* Fix incorrect featurizers library on Mac builds.

* Fix centos unit test issues with featurizers.

Co-authored-by: Gani Nazirov <ganinz@hotmail.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>

* Add support for ONNX model export and execution. Merge to AML branch (#421)

* Add initial implementation of the export to ONNX functionality.

* Update the Microsoft.ML.OnnxConverter version in Platforms/build.csproj

* Add test for verifying onnx export support.

* Update the onnx conversion to be compatible with the latest changes
in pull request dotnet/machinelearning#3986.

* Fix a few of the issues with test_export_to_onnx.

* Add onnxruntime.dll to the NimbusML python package. It is already included in the Linux and Mac builds.

* Initial implementation of the OnnxRunner transform.

* Fix missing reference to models_onnxconverter in nimbusml.pyproj.

* Exclude OnnxRunner from the test_export_to_onnx tests.

* Remove OnnxRunner from test_estimator_checks.

* Add back in OnnxConverter reference which was accidentally removed in merge.

* Update onnx export test. TypeConverter, MeanVarianceScaler, MinMaxScaler no longer require experimental flag.

* Pretty print the output of test_export_to_onnx.

* Update to the latest version of ML.Net.

* Update supported estimators in test_export_to_onnx.

* Use the latest nightly builds for the ML.Net packages.

* fix tests

* fix test

* Add example for OnnxRunner. (#422)

* Build fix for rolling ML.NET 1.5.0-preview* and update to Pandas 1.0 (#437)

* Updates for mlnet rolling build 1.5.0-preview2-28612-3

* Update pyproj

* Update tests for pandas 1.0.1

* Skip check_dtype_object in TestEstimatorChecks due to pandas 1.0.0 removing Series.itemsize

* Re-enable check_dtype_object and fix underlying issue causing it to fail

* Remove label column from features when no Y is specified and predictor supports labels. (#439)

* Fix breaking unit tests. (#440)

* Update test_export_to_onnx test. (#443)

* Update test_export_to_onnx test. (#444)

* Fix NGramFeaturizer test

* fix .0 (#445)

* Add OneVsRest support to export to onnx tests and increase test coverage. (#446)

* Automatically convert Categorical columns to their values before comparison in ONNX export tests. (#447)

* add ORT results

* Add ORT & vinod script (#449)

* Add ORT validation to the export to onnx tests. (#451)

* Remove unnecessary import. (#452)

* Update data_frame_tool.py (#454)

* Fixes for dataframe tool (#455)

* add ORT results

* fixes to dataframe tool and vinod

* typos fixes

* rollback

* Fixed data_frame_tool to handle category columns correctly (#456)

* Few fixes for IDV and DF formats

* rollback

* Regenerate entrypoint & api

* Up version and fix test

* Added Async suffix to RunOnBackgroundThread (#459)

* Update entrypoints and MarshallInvoke call (#461)

* Update manifest.json

* Update VariableColumnTransform.cs

* Updated entrypoints

* Update to use OnnxRuntime 1.2 (#462)

* Updated ORT dependencies

* Updated ORT Feed

* Updated ORT tests for GPU

* Revert "Updated ORT Feed"

This reverts commit 76680f1.

* Revert "Updated ORT tests for GPU"

This reverts commit ae55b45.

* Upgrade CI build to use latest onnxruntime and automl scenario based … (#463)

* Upgrade CI build to use latest onnxruntime and automl scenario based test

* simplify

Co-authored-by: Gani Nazirov <ganaziro@microsoft.com>

* don't run onnxruntime for python2.7

* fix automl test

* Remove py2.7 Windows from CI build as latest pytest & pip are not supported anymore for Python 2.7

* fix typo

* remove daily build location

* use only nuget.org

Co-authored-by: pieths <pieths.dev@gmail.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
Co-authored-by: Mustafa Bal <balmustafa117@gmail.com>
Co-authored-by: Najeeb Kazmi <najeeb.kazmi@gmail.com>
Co-authored-by: Darío Hereñú <magallania@gmail.com>
Co-authored-by: Maher Jendoubi <maher.jendoubi@gmail.com>
Co-authored-by: Gani Nazirov <ganaziro@microsoft.com>
Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com>