Turn off shuffling for FactorizationMachineBinaryClassifier. #316

pieths · 2019-10-09T18:56:12Z

Attempt to fix the intermittent test_estimator_checks failures with FactorizationMachineBinaryClassifier.
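
For context, here is a minimal sketch of why shuffling can make an estimator check flaky. It does not use NimbusML: `sgd_fit`, its `shuffle` flag, and the sample data are hypothetical stand-ins for any trainer that visits examples one at a time. When the visit order is fixed, two fits on the same data produce identical weights; with an unseeded shuffle, the order (and so the result) can change from run to run.

```python
import random


def sgd_fit(samples, shuffle, epochs=5, lr=0.1, rng=None):
    """Fit y ~ w * x with plain SGD (toy stand-in, not the NimbusML trainer)."""
    # An unseeded generator gives a different visit order on every run.
    rng = rng or random.Random()
    data = list(samples)
    w = 0.0
    for _ in range(epochs):
        if shuffle:
            rng.shuffle(data)
        for x, y in data:
            # Gradient step for squared error on a single example.
            w -= lr * (w * x - y) * x
    return w


samples = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.2), (0.5, 1.1)]

# With shuffling off, repeated fits see the same order and agree exactly.
w_a = sgd_fit(samples, shuffle=False)
w_b = sgd_fit(samples, shuffle=False)
print("no shuffle, identical:", w_a == w_b)

# With shuffling on, repeated fits may disagree, which is what makes
# tests that compare results across fits intermittent.
w_c = sgd_fit(samples, shuffle=True)
w_d = sgd_fit(samples, shuffle=True)
print("shuffle, difference:", abs(w_c - w_d))
```

Seeding the generator would be another way to get repeatable results in the toy; this change takes the simpler route of turning shuffling off for the affected classifier.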

* Update readme with latest feedback (#39) Updating readme with latest feedback. * Add THIRD-PARTY-NOTICES.txt and move CONTRIBUTING.md to root. (#40) * Initial checkin * Move to Hosted Mac pool * Update README.md * Manually copied naming changes over from master. * Revert "Merge remote-tracking branch 'upstream/temp/docs'" This reverts commit 93c7347, reversing changes made to 2350069. * Improve documentation regarding contributors. * Fix email address. * Create CODE_OF_CONDUCT.md * Update issue templates * Create PULL_REQUEST_TEMPLATE.md * Update issue templates * Update issue templates * Update issue templates * Fixing link in CONTRIBUTING.md (#44) * Update contributing.md link. (#43) * Initial checkin for ML.NET 0.7 upgrade * fix tests * put back columndropper * fix tests * Update scikit-learn links to use https instead of http * restart dotnetcore2 package work * fix build * fix mac & linux * fix build * fix build * dbg build * fix build * fix build * handle py 2.7 * handle py27 * fix py27 * fix build * fix build * fix build * ensure dependencies * ignore exceptions from ensure dependencies * up version * Update cv.py add case for X is data frame * Update cv.py add a space * add a test for cv with data frame * set DOTNET_SYSTEM_GLOBALIZATION_INVARIANT to true to fix app domain error * fix build * up version * Add instructions for editing docstrings. (#51) * Add instructions for editing docstrings. * Add footnote giving more information. * Fix build failures caused by dotnetcore2 module. (#67) * Fix importing of the dotnetcore2 module because it has inconsistent folder naming. * Fix file check for unix platforms. * Fix indentation levels. * Reduce number of build legs for PR validations and add nightly build definition with more robust build matrix. (#69) * Increase version to 0.6.5. (#71) * Update clr helper function to search multiple folders for clr binaries. (#72) * Update clr helper function to search multiple folders for clr binaries. 
* Moved responsiblity for Python version checking to utility functions. * Add clarifying comments. * Fix call to get_nimbusml_libs() * fix drop column param name * Remove restricted permissions on build.sh script. * Fix lightgbm test failures by updating runtime dependencies. * fix TensorFlowScorer model_location paramter name * Fix build.sh defaults so that it detects when running on a mac. * Since OneHotHashVectorizer is broken for output kind Key in ML.NET 0.7, usse ToKey() for unit tests * fix tests * fix pyproj test * fix win 3.6 build * fix comments * expose "parallel" to the fit/fit_transform function by including **param to the argument * add a test for the parallel * update parallel thread * fix tests comparison * Update thread, retry build * modify tests * specify pytest-cov version * update pytest-cov version in build command for linux * for windows use the latest pytest-cov * Enabled strong naming for DoNetBridge.dll (to be used for InternalsVisibleTo in ML.NET) * Changed the keys to be the same as other internal repos * Changed the key filename * Update to ML.NET 0.10.preview (#77) * Updating ML.NET nugets to latest 0.9 preview. * --generate_entrypoints phase 1 * Fixed Models.CrossValidator * Updated all entrypoints * New manifest.json, picket from Monte's branch * Updated API codegen * Replace ISchema and SchemaImpl with Schema and SchemaBuilder. * Revert "Replace ISchema and SchemaImpl with Schema and SchemaBuilder." This reverts commit dcd749d. * Refactor IRowCursor to RowCursor. * Update ML.NET version in build.csproj. * Update manifest.json to ml.net commit 92e762686989215ddf45d9db3f0a1c989ee54d11 * Updated RunGraph.cs to ml.net 0.10 * Refactor Vbuffer * Added override to RowCursor methods * Update to NimbusML-privileged nugets from ML.NET. * Update to Microsoft.ML namespace without Runtime. * Schema and VBuffer fixes in NativeDataInterop. * API fixes for IRandom and IsText in RmlEnvironment and NativeDataView. 
* Work on getting VBuffer pointers from Spans. * Some VBuffer fixes * fix some class names * Fix Register Assembly names. * Remove ML.PipelineInference * fixed more classes * Add back columndropper for backward compatability. * Register Entrypoints assembly in environment. * Fix homebrew update problem on VS Hosted Mac images. * Updated all the nuget versions to be the same. * Attempt to fix the dataframe unit tests * Fixed test_pyproj * Optimized VBuffer changes * Changed bridge version value to 0.10 * Addressed PR comments * Simplify by using six.string_types (#89) * Simplify by using six.string_types * Force a retest * Removed ISchema from DotNetBridge (#90) * Removed ISchema * Fixed the tests * Addressed PR comments * Addressed Wei-Sheng's comments about documenting the purpose of Column.DetachedColumn. * add configuration for python 3.7 (#101) * add configuration for python 3.7 * fix broken unit test * Update build.sh * fix build for Windows * Linux py3.7 build * fix pytest version * upgrade pytest * fix pytest-cov version * fix isinstance(., int) for python 2.7 * build urls for Mac * final fixes * fix libomp * Removing 3.7 for now as its not in PyPI * Upgrade to ML.NET version 1.0.0 (#100) * ref v0.10 ML.NET * fix build * hook up to v0.11.0 ML.NET * fix build errors * fix build * include Microsoft.Data.DataView.dll in build * typo * remove protobuf dll * Regenerate code due to manifest changes * fix missing ep * Update to ML.NET 1.0.0-preview * fix .net build * update nuget for ML.NET * remove Data namespace dll * rollback nuget changes * move to final RC ML.NET * Regenerate classes as per updated manifest * fix maximum_number_of_iterations param name * fix parameter names * fix names * reference official v1.0 of ML.NET * fix tests * fix label column * Fix tests * fix lightgbm tests * fix OLS * fix tests * fix more tests * fix more tests * fix weight column name * more tests * fix normalized metrics * more errors * Fix CV * rename feature_column to 
feature_column_name * fix cv ranker * Fix lightgbm tests * fix changes due to upgrade of NGramFeaturizer * fix ngram featurizer * fix FactorizationMachine assert error * disable test which is not working now due to change in LightGbm version * fix model name * typo * handle nan in arrays * fix tests * fix tests * fix more tests * fix data type * fix AUC exception * kick the build * fix tests due to data change * fix ngram test * fix mutual info tests * copy libiomp lib * fix mac build * disable SymSgdNative for now * disable SymSgdBinary classifier tests for Linux * fix linux tests * fix linux tests * try linux * fix linux * skip SymSgdBinaryClassifier checks * fix entrypoint compiler * fix entry point generation * fix example tests run * fix typo * fix documentation regression * fix parameter name * fix examples * fix examples * fix tests * fix tests * fix linux * kick build * Fix code_fixer * fix skip take filters * fix estimator checks * Fix latest Windows build issues. (#105) * Fix build issue on Windows when VS2019 is installed. Note: The -version option could not be added directly to the FOR command due to a command script parsing issue. * Add missing arguments to fix build issue with latest version of autoflake. * Fixes #50 - summary() fails if called a second time. (#107) * Fixes #50 - summary() fails if called a second time. * Fixes #99. Do not use hardcoded file separator. (#108) Fixes #99. Do not use hard coded file separator. * Delete the cached summaries when refitting a pipeline or a predictor. (#109) * Fix build issue on Windows when VS2019 is installed. Note: The -version option could not be added directly to the FOR command due to a command script parsing issue. * Add missing arguments to fix build issue with latest version of autoflake. * Delete the cached summaries when refitting a pipeline or a predictor. Fixes #106 * Simplify the code that deletes cached summaries when calling fit. 
* Fix signature import error when using latest version of scikit-learn. (#116) * Fix signature import error when using latest version of scikit-learn. Fixes #111 * Move the conditional import of the signature method in to the utils package. * Package System.Drawing.Common.dll as its missing in dotnetcore2 (#120) * package System.Drawings.Common.dll as its missing in dotnetcore2 * typo * Add png for Image examples * try linux fix * rollback scikit learn version * test * debug * rollback test * rollback * fix fontconfig err * fix tests * print platform * get os names * test * test * fix linux * Upgrade the pytest-remotedata package to fix missing attribute error. (#121) * Upgrade the pytest-remotedata package to fix missing attribute error. Fixes #117 * Remove the RlsMacPy3.6 configuration from .vsts-ci.yml. * Upgrade version (#122) * package System.Drawings.Common.dll as its missing in dotnetcore2 * typo * Add png for Image examples * try linux fix * rollback scikit learn version * test * debug * rollback test * rollback * fix fontconfig err * fix tests * print platform * get os names * test * test * fix linux * Upgrade version * Support quoted strings by default (#124) * upgrade to ML.NET 1.1 (#126) * upgrade to ML.NET 1.1 * by default quote is + * assert changes due to quote * fix tensor flow example * Put long running tests in to their own folder to shorten build times. (#136) * Temporarily remove the dataframe examples from the test run to see how much that effects the test length. * Remove all examples from the tests to see how it impacts the CI run. * Put long running tests in to their own folder to shorten build times. * Update nimbusml.pyproj to reflect the newly moved test files. Forgot to save the nimbusml.pyproj in visual studio. * Expose ML.NET SSA & IID spike & changepoint detectors. (#135) * Initial creation of the IidSpikeDetector files to see what works and what doesn't. * Import the Microsoft.ML.TimeSeries assembly in to the project. 
* Use 'PassAs' in manifest.json to fix the source parameter name. * Use float32 for data dtype in IidSpikeDetector example. * Convert IidSpikeDetector to a standard transform. Add examples and tests. * Add pre-transform to IidSpikeDetector to fix incompatible data types. * Fix issues with the test_estimator_checks IidSpikeDetector tests. * Remove unnecessary TypeConverter import in IidSpikeDetector example. * Initial implementation of IidChangePointDetector. * Initial implementation of SsaSpikeDetector. * Initial implementation of SsaChangePointDetector. * Fix incorrect SsaSpikeDetector instance in test_estimator_checks. * Fix a few minor issues with time series unit tests and examples. (#139) * Skip Image.py and Image_df.py tests for Ubuntu 14 (#149) * * Fixed the script for generating the documentation (#144) * Moved _static to ci_script to solve an error while using sphinx * Removed amek_md.bat and merge the commands of it to make_yaml.bat * Moved metrics.rst to concepts * Rename time_series package to timeseries. (#150) * Fixed the issue of Ubuntu14 not skipping Image.py and Image_df.py (#161) * Updated CharTokenizer.py example (#153) * Skip CharTokenizer.py for extended tests (#163) * Add support for returning custom values when overriding Pipeline.predict. (#155) * Initial creation of the release-next.md file. (#165) * Initial creation of the release-next.md file. * Point the time series example links to the head of the master branch. * Initial implementation of the SsaForecaster entry point. (#164) * Final updates for release 1.2.0 (#167) * Update the LightGbm entry point with the latest version from the manifest. * Add SsaForecasting examples to the release notes. * Add documentation modification to the release notes. * Create the official 1.2.0 release notes. They have been put in the docs/release-notes folder to closely match the ml.net directory structure. * Add correct version to the release notes title. * Re-enable the SsaForecaster tests. 
* Update to the latest version of ml.net. Update the NimbusML version. * Fix issues with the summary unit tests. * Comment out the SymSgdBinaryClassifier summary test. It does not appear to be working on linux. * Revert change b5eb937 to see if it (#168) fixes the signed build issue. * Bring back build.cmd commit. It did not fix the signed build issue. (#169) * Revert change b5eb937 to see if it fixes the signed build issue. * Bring back commit b5eb937. It did not fixed the signed build issue. * Bring back the build.cmd change from b5eb937. (#170) It did not fix the signed build issue. * Use restored dotnet CLI for signing (#171) * Update README.md * Enable LinearSvmBinaryClassifier (#180) * Enable LinearSvmBinaryClassifier, add examples, add test, and update docs * Add test for predict_proba() and decision_function() * Setup destructors for data passed to python (#184) * pass destructor to python * indent * Add azureml-dataprep support for dataflow objects (#181) * draft code * draft * delete * add dprep dependency * rollback * rollback * rollback * test & example on using DprepDataStream * add dprep path * add dprep path * fix mlnetpath * optional dependency on dprep * run dprep tests optionally * fix typo * Up sdk version * fix linux dprep tests * up version (#188) * Save the model file when pickling a NimbusML Pipeline. (#189) * Save the model file when pickling a NimbusML Pipeline. * Add version to the pickled Pipeline. * Add the steps attribute to a pickled Pipeline instance. * Add extra unit test for pickled nimbusml pipelines. * Add export_version to pickled base_pipeline_items. Remove unnecessary export_version attribute from an unpickled Pipeline. * Remove stored references to X and y in BasePredictor. (#195) * Remove stored references to X and y in BasePredictor. * Remove unnecessary scikit-learn import. 
* Add observation level feature contributions to Pipeline and BasePredictor (#196) * Add get_feature_contributions to Pipeline and BasePredictor, add example * Add tests * Update release-next.md * Add classes_ to Pipeline and/or predictor when calling predict_proba. (#200) * Add classes_ to Pipeline and/or predictor when calling predict_proba. * Update test_estimator_checks.py to skip the check_dict_unchanged test for any estimator which supports predict_proba or decision_function. * Update Handler, Filter, and Indicator to automatically convert the input columns to float before performing the transform. (#204) Fixes #203. * Combine models from transforms, predictors and pipelines in to one model. (#208) * Initial test implementation of combining 2 or more models in to one. * Added support to Pipeline.combine_models for combining other types of items and transform only inputs. * Combine Pipeline._evaluation_infer and _evaluation in to one method. This fixes an issue where a classifier graph would not contain the correct nodes after calling Pipeline._predict(). * Missing part of previous check-in. * Fix the Pipeline.combine_models signature to work with Python 2.7. * Fix build (#209) * T * Fix cert * Update release-next.md. (#211) * Update release-next.md * Update release-next.md * Update release-next.md * Add classifier and FileDataStream unit tests to test_pipeline_combining. (#212) Add classifier and FileDataStream unit tests to test_pipeline_combining. 
* Update release-next.md * up version (#210) * up version * Up the version * renamed factorization lib * remove matrix factorization lib ref * dbg libs * fix libtensorflow framework * package more libs * add mkl proxy * Enable EnsembleClassifier and EnsembleRegressor (#207) * Enable EnsembleClassifier * nit * Enable EnsembleRegressor * Add output combiners * Add sub model selectors * Update examples * Add documentation for components * Add diversity measure * Improve examples * Add tests * Fix test_estimator_checks * Create release notes for version 1.3.0. (#214) * Update release-1.3.0.md * Add --installPythonPackages flag to build scripts (#215) * Add --installPythonPackages flag to build scripts * close if statement in build.sh * fix --runTestsOnly * Fix a bug with the classes_ attribute when no y input is specified during fitting. (#218) Fixes #216 * Add NumSharp.Core.dll (#220) * Add timeseries documentation to the master branch. (#221) * Docs update (#224) * Fix documentation * Few more * More doc fixes (#228) * More doc fixes * A few nits * Pass python path to Dprep (#232) * remove Dprep* dll from wheel (#235) * remove Dprep* dll from wheel * Move Dprep calls into separate class * test * remove DprepLoader * clean unused code (#236) * clean unused code * fix tests changes due to seed changes * remove max_slots from graph * delete Dprep dlls from python2.7 * fix linux extended tests for TensorFlow * fix tests * fix tests * rollback * fix tests * disable estimator check * fix tests * fix tests again * fix tabbing removing -r from rm command * remove experimental * Enable scoring of ML.NET models saved with new TransformerChain format (#230) * Handle new ML.NET model format for predictions * fix * use with{} statement with ZipFile * Add initial implementation of DatasetTransformer. (#240) * Update release-next for the 1.4 release. 
(#252) * Update release-next.md * Upgrade to ML.NET 1.4 (#251) * Upgrade to ML.NET 1.4 * preview bits * update refs * Fix casing for the installPythonPackages build.sh argument. (#256) * Rename lambda_ to l2_regularization in LinearSvmBinaryClasifier (#259) * Initial implementation of csr_matrix output support. (#250) * Initial implementation of csr_matrix output support. * Whitespace change to kick off another build. The CentOs test run crashed. * Rename as per comment * Initial implementation of LpNormalizer. (#253) * Initial implementation of LpNormalizer. * Rename to LpScaler * fix build * fix casing * up version (#262) * Remove scikit-learn testing module from normal flow (#265) * remove scikit learn testing module from normal flow * fix build * fix build * Fix issue when using predict_proba or decision_function with combined models. (#272) * Output predictor model file optionally (#270) * Output predictor model file optionally * fix comment * fix unit tests * Draft of ColumnConcat transform that takes in a prefix instead * fix test * fix test * PrefixColumnConcat transform * fix entrypoint namespace * fix exception * Handle no match scenario * add exampl & test * add test * fix comments * fix comments * fix example * Providing error message to python in exception (#273) * spit out error message to python upgrade patch version * fix the test * another test * rollback * Add I8 support to CSR matrix output. (#276) * Get column names for transform model (#278) * draft for schema * resolve conflict * debug pieces * Few perf tricks * rollback prints * few perf tricks * perf tricks * fix csr * set 0 byte * Update schema example. * Convert return value to list. * Update schema example to use new list return value. * Fix naming in Pipeline.get_schema. * Add initial unit tests for Pipeline.get_schema(). * Check length in Pipeline.get_schema unit tests. 
* few perf tricks * fix linux tests * rollback * Temporarily use 'inclusive' test instead of positional test for columns since order is not valid in Python 2.6 and 3.5. * fix comments * Add variable length vector support (#267) * Update Schema.py to remove the non-ASCII character (#291) * Fix Pipeline._extract_classes_from_headers was not checking for valid steps. (#292) * Save predictor_model when pickling a pipeline. (#295) * Initial implementation of the WordTokenizer transform. (#296) * Remove summary validation in Pipeline and enable the summary tests for the tree based predictors. (#298) * Turn on dprep unit tests for all platforms and python versions except 2.7 (#303) * Fix bug in Pipeline.transform() (#294) * Remove unnecessary code from Pipeline.transform that was causing a bug * Update release-next.md * Remove y argument from transform() method * Update release-next.md * Fix test * Fixed building of NimbusML with Python 3.5 on Windows (and other versions of Python) (#297) * Update Schema.py to remove the non-ASCII character * Update build.cmd * Update build.cmd * Update build.cmd * Revert "Update build.cmd" This reverts commit cb79b9d. * Upgreate pip for all Python versions * Update release notes. (#306) * Added libtensorflow_framework.so.1 (#310) * Add Permutation Feature Importance (PFI) (#279) * Add PFI entrypoint * Add PFI to Pipeline and BasePipelineItem, and examples * Improved docs and sample * Load model as PredictorModel, and remove label column and group ID column from EP inputs * schema example reference * Add test * nit * Update release-next.md * Add tests to check PFI from loaded model * Make SgdBinaryClassifier deterministic in test_estimator_checks.py * Update ML.NET nugets to 1.4.0-preview2 and 0.16.0-preview2 * Fix test baseline values * Fix Ranking PFI column names to work with with Py2.7 and Py3.5 * Initial implementation of DateTime input and output column support. (#290) * Add support for DateTime output. 
* Add support for DateTime input columns. * Add unit test for DateTime column input and output. * Fix DateTime.Kind == Unspecified output from dprep. * Update the csproj files to point to the latest nuget packages. * Update the Tensorflow.NET library version. * Fix azureml dprep not available for Python 2.7 * Fix missing sys import. * Fix broken assertEqual on Python 3.5. * Fix BinaryDataStream not valid as input for transformer. (#307) * Add test for fitting a BinaryDataStream. * Use BinaryDataStream schema for retrieving feature columns in _init_graph_nodes. * Add idv schema to BinaryDataStream. * Fix DprepDataStream was passing in incorrect value to base class constructor. * Remove column position check from unit test since it is unreliable on Python 3.5 and 2.7. * Issue 300 (#311) * Temporarily change running Mac pipeline to Python 3.6 * Temporary addition to view state of "result" in MacOS with Python 3.6 * Added additional temporary Python builds on Mac * Added libtensorflow_framework.so.1 (#310) * Revert "Temporary addition to view state of "result" in MacOS with Python 3.6" This reverts commit d116dc8. * Updated test_data_with_missing.test_input_conversion_to_float() * Update test_data_with_missing.py * Revert "Added additional temporary Python builds on Mac" This reverts commit 1aa1526. * Revert "Temporarily change running Mac pipeline to Python 3.6" This reverts commit 4ec36fb. * allow csr_matrix as input to predict_proba() (#305) * draft * draft * rollback * new entrypoint * add assert * rollback * no print in test * up version * only Single type is allowed for Feature vector * fix comments, rename entrypoint * convert to single * fix type * add feature contribution test * rename pipeline.get_schema() to pipeline.gat_output_columns() * fix build * Update release notes. (#312) * Turn off shuffling for FactorizationMachineBinaryClassifier. (#316) * Fix imports * Fix few more conflicts and build * Fix one more import * Fix nimbusml.pyroj

* Draft, adding CategoryImputer, ToKeyImputer, ToString transformers * add tests * prelim commit * update manifest, fix unit tests/examples * upgrade version * fix tests * temp hack fix for native libs * copy libFeaturizers.so * fix version * fix cp * fix version * Update ML.Net version number. * Update the examples and unit tests. * Update to latest version of the Featurizers library. * Fix test_tostring unit test. * Temporarily skip the estimator checks unit tests. * Upgrade pip to the latest version when installing the Python packages on Windows. This fixes an issue I had where scikit-learn would not install when building NimbusML with the RlsWinPy3.6 configuration because it could not find one of the test data sets. * Update test_estimator_checks for the three new transformers. * Remove extra comma from test_estimator_checks. * Update the ML.Net version. * Add TimeSeriesImputer * Add country param to DateTimeSplitter * Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn. * Update ML.Net version and import new AutoMLFeaturizers package. * Add back in the accidentally removed tests from test_data_with_missing.py. * Update the DateTimeSplitter examples. * Update the ToKeyImputer examples. * Update the ToString examples. * Update build to support latest nuget packages and updates. * Remove copy of libFeaturizers from linux build script. * Add TimeSeriesImputer to the NimbusML project. * Add initial DataFrame based example for TimeSeriesImputer. * Update to the latest version of manifest.json. * Add missing project include for the TimeSeriesImputer example. * Update the DateTimeSplitter examples. * Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform. * Add a unit test for testing the holiday name return value for DateTimeSplitter. * Add unit test for ToKeyImputer. * Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer. 
* Update TimeSeriesImputer_df example. * Remove TimeSeriesImputer from test_estimator_checks. * Update nuget.config to point to relative directory for ml.net packages. * Add unit test for TimeSeriesImputer. * Use environmental variable to specify the local ml.net nuget package directory. * Update to the latest version of ml.net. * Add latest version of nuget packages for building. * Update to the latest windows ml.net binaries. * Add linux ml.net binaries. * adding correct nuget packages/location * adding correct ML.NET signed packages * adding correct ML.NET signed packages * Update the referenced ML.Net versions. * Update to the latest version of the manifest. * Add RobustScaler to the public API. * Fix spacing bug in RobustScalar in manifest.json. * Update to the latest version of manifest.json which contains naming fix for RobustScaler. * Update to latest unsigned nuget packages for testing RobustScaler and latest master features. * Add RobustScaler unit tests and examples. * Update to the latest signed ML.Net nugets. * Fix RobustScaler checks in test_estimator_checks. * up version * Turn off shuffling for FactorizationMachineBinaryClassifier. (#316) * Initial implementation of NGramExtractor. (#320) * Disable check which prevents artifacts from being generated by pull requests. (#330) * Update ManifestGenerator. (#329) * Added "# -- coding: utf-8 --" to preserve the character `␂` while guaranteeing successful builds with Python 2.7 (#328) * Replaced the non-ASCII characters * Revert "Replaced the non-ASCII characters" This reverts commit 4adb28c. 
* Update NGramExtractor_df.py * Updating coding of Schema.py to preserve the character "␂" * To re-run build tests * To re-run build tests * Edited encoding * Rerun build tests * Rerun build tests * Added utf-8 encoding to NGramExtractor.py (#339) * Image.py and Image_df.py extended testing examples are now supported on Ubuntu and CentOS (#338) * Remove skipping of Image.py and Image_df.py * Add libraries required for running Image.py and Image_df.py in Linux machines * Update build.sh * Add third party notices to package description on PyPI (#341) * Add third party notices to package description on PyPI * update * update * Add 1.5 (#344) * Add info to README.md (#342) * Add info to README.md * update * Fix DbgWinPy2.7 build which was failing when building NativeBridge. (#340) * Fix DbgWinPy2.7 build which was failing when building NativeBridge. Here is one of the error messages: libboost_numpy-vc140-mt-gd-1_64.lib(ndarray.obj) : error LNK2038: mismatch detected for 'RuntimeLibrary': value 'MDd_DynamicDebug' doesn't match value 'MTd_StaticDebug' in DataViewInterop.obj * Add whitespace change to start new CI run. UbuntuPy36 crashed * Fix error level when exiting build.cmd. (#345) * Added HTTP URLs to HTTPS URLs finder & converter Python scripts, and processed HTTP-->HTTPS URL changes (#346) * Added utf-8 encoding to NGramExtractor.py * Added HTTP to HTTPS finder and converter * Changes made by ChangeHttpURLsToHttps.py * Added copyright statements * Updated FindHttpURLs.py and ChangeHttpURLsToHttps.py * Add reports of alterable, nonalterable and invalid URLs * Revert "Changes made by ChangeHttpURLsToHttps.py" This reverts commit afa5f35. * Add URL changes made by ChangeHttpURLsToHttps.py * Revert "Add URL changes made by ChangeHttpURLsToHttps.py" This reverts commit b6a2f7f. * Revert "Add reports of alterable, nonalterable and invalid URLs" This reverts commit 9121123. 
* Update FindHttpURLs.py and ChangHttpURLsToHttps.py * Add HTTP to HTTPS URL reports * Changes made by ChangeHttpToHttpsURLs.py * Revert "Changes made by ChangeHttpToHttpsURLs.py" This reverts commit 72c85d9. * Revert "Add HTTP to HTTPS URL reports" This reverts commit 81c5a96. * Revert "Update FindHttpURLs.py and ChangHttpURLsToHttps.py" This reverts commit 038262f. * Update FindHttpURLs.py and ChangeHttpURLsToHttps.py * Add URL reports * Add Http-->Https URL changes through ChangeHttpURLsToHttpsURLs.py * Removed if __name__ and main() statements * Revert "Removed if __name__ and main() statements" This reverts commit ba2742f. * Update nimbusml.pyproj * Manually converted two alterable HTTP links to HTTPS. * Rename ChangeHttpURLsToHttps.py to changeHttpURLsToHttps.py * Rename FindHttpURLs.py to findHttpURLs.py * URL in SigmoidKernel.txt is fixed for findHttpURLs.py to recognize it as an alterable URL * Changed outdated URL as original URL redirected to current URL * Update Report_InvalidUrls_FindHttpURLs.csv * Fixing reachable HTTP URLs * Update findHttpURLs.py * Updated URL reports, cleared invalid URLs * Update of report for alterable HTTP URLs after running findHttpURLs.py after running changeHttpURLsToHttps.py * Removing URL reports for merge * Renamed URL scripts and reflected this change inside these files (#348) * Renamed URL scripts and reflected this change inside these files * Fix small type in change_http_urls_to_https.py * Updated file names and naming conventions inside files * Update nimbusml.pyproj * Updated usage infos of find_http_urls.py and change_to_https.py * Updated find_http_urls.py and change_to_https.py * Execute unit tests in parallel (#331) * Wrap test estimator checks in a python unit test. * Combine the non-extended test runs together to make them more parallelizable. * Reverse the tests path args order to try and have test_estimator_checks run earlier in the test run. * Dynamically generate the test_estimator_checks unit tests. 
* Create the test_docs_example unit tests dynamically so they can be parallelized. * Fix KMeansPlusPlus does not work with a cluster size of 1 when using a debug version of ml.net * Fix OLS divide by 0 when given a particular set of inputs to fit. This is hidden in release versions of ml.net * Fix issue when ranking where the output of TextToKeyConverter was trying to overwrite the $scoredVectorData variable set by DatasetScorerEx. See test_metrics_evaluate_ranking_group_id_from_existing_column_in_X for a test which demonstrates the issue. It throws an exception from EntryPointNode.cs:837 when trying to get the outputs. The exception was hidden when using release builds of ML.Net. * Remove a test_estimator_check for OrdinaryLeastSquaresRegressor since it is causing invalid float values and throwing an exception which was hidden in release versions of ML.Net but visible in debug. * Update test_permutation_feature_importance tests to support parallel execution. * Rerun unit tests one extra time if any failed to check for intermittent failures. * Decrease the size of the images in the Image and Image_df examples. (#350) * Update package references to work with the latest versions from nuget.org. (#353) * Update ML.Net package references to work with RC1 * Update to ML.Net 1.4.0 * Update Microsoft.DataPrep to version 0.0.2.19-preview. * Downgrade Microsoft.DataPrep to version 0.0.2.3-preview due to issue with missing SqlJdbc package. * Update nimbusml version to 1.6.0. * Update release notes. (#354) * Added Google.Protobuf.dll to Mac and Linux builds (#358) * Modifications to support scripted temp/docs merging. 
(#361) * Set size variable to -1 in GetUnicodeTX to fix Python 2.7 encoding/decoding issue (#359) * Modified size variable in GetUnicodeTX to -1 * Update DataViewInterop.h * Fixed spacing in DataViewInterop.h * Re-enabled skipped test due to Py2.7 encoding/decoding issue * Removed unnecessary invoking of .sum() * Revert "Removed unnecessary invoking of .sum()" This reverts commit e51a64b. * Initial implementation of the temp_docs_updater script. (#363) * Update README.md * Generate PrefixColumnConcatenator with entry point compiler instead of manually. (#364) * Fix broken docs (#369) * Fix whitespaces and typos * tabs and whitespaces * Removed all references to DSSM in NimbusML (except for in test_wordembedding.py) (#374) * Added catch for predictors that do not support summary() (#375) * Added catch for summary() with FactorizationMachineBinaryClassifier * Updated test for model summary * Revert "Updated test for model summary" This reverts commit 59656fe. * Update pipeline.py * Update test_model_summary.py * Update test_model_summary.py * Update test_model_summary.py * Update test_model_summary.py * Update test_model_summary.py * Changed wording of error message * Update Microsoft.DataPrep to the latest version. (#379) * Create release notes for the 1.6.0 release. (#382) * Create release notes for version 1.6.0. * Update 1.6.0 release notes. * Bump version to 1.6.1 to fix dprep issue. (#385) * Update to latest version of DataPrep. * Bump version to 1.6.1 to fix dprep issue. * Removed "TODO: Replace with CV" comments (#389) * Disabled tests that only fail on Mac Py2.7 due to string encoding/dec… (#391) * Disabled tests that only fail on Mac Py2.7 due to string encoding/decoding bug * Update test_ngramfeaturizer.py * Add as_csr documentation to the inline docstrings for transform() and fit_transform(). (#392) * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. 
* Update to the latest version of ML.Net. (#401) * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. * Typo fixed on paragraph 15 (#399) * Typo fixed on paragraph 10 (#398) * Initial implementation of DateTimeSplitter. Ported from the aml branch. * Update the transform output formats documentation. (#395) * Update the transform output formats documentation. * Add whitespace change to restart CI run. The mac build did not start correctly. * Add whitespace change to restart CI run. The mac build did not start correctly. Co-authored-by: Gani Nazirov <ganinz@hotmail.com> * Fixed broken brew command (#402) * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Checking for extended tests * Update phase-template.yml * Final touches * Re-activated NGramFeaturizer2.py (#381) * Update test_docs_example.py * Temporary change so that extended tests can be run by PRs * Revert "Temporary change so that extended tests can be run by PRs" This reverts commit 3f2b8a3. * Temporary change to be able to view extended tests' status with manual PRs * Update .vsts-ci.yml * Update .vsts-ci.yml * Update .vsts-ci.yml Co-authored-by: Gani Nazirov <ganinz@hotmail.com> * Fix missing import in test_datetimesplitter. * Fix issue with ColumnSelector when dropping columns after DateTimeSplitter. 
* Contributing: Fix a typo (#406) * Re-run failed unit tests on Ubuntu/Mac to fix intermittent crashes. (#407) Note, this modification only handles intermittent crashes on Ubuntu/Mac unit test runs. It does not handle situations where the build hangs and never returns control to the build script. * Fix issue when specifying split_start='after_transforms' with CV.fit() (#410) * Use latest ML.Net dev packages from MachineLearning feed. * Re-enable the default nuget.org feed. It does not appear to cause any conflicts with getting the latest packages so long as the * is used in the PackageReference Version attributes. Keeping this enabled will allow other packages which are not part of the MachineLearning feed to be retrieved (i.e. Microsoft.MLFeaturizers). * Add whitespace change to restart CI build. Linux timed out. * Fix build issue when using pip version >= 20.0.0 * Fix build issue caused by latest version of pip (>=20.0.0) (#412) * Remove local-nuget-packages, fix build and test_estimator_checks failures. * Remove DateTimeSplitter duplicates in nimbusml.pyproj * Remove duplicate ML.Featurizers import. Co-authored-by: Gani Nazirov <ganinz@hotmail.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> Co-authored-by: Mustafa Bal <balmustafa117@gmail.com> Co-authored-by: Najeeb Kazmi <najeeb.kazmi@gmail.com> Co-authored-by: Darío Hereñú <magallania@gmail.com> Co-authored-by: Maher Jendoubi <maher.jendoubi@gmail.com>

* Native featurizers for AutoML (#317) * Draft, adding CategoryImputer, ToKeyImputer, ToString transformers * add tests * prelim commit * update manifest, fix unit tests/examples * upgrade version * fix tests * temp hack fix for native libs * copy libFeaturizers.so * fix version * fix cp * fix version * Update ML.Net version number. * Update the examples and unit tests. * Update to latest version of the Featurizers library. * Fix test_tostring unit test. * Temporarily skip the estimator checks unit tests. * Upgrade pip to the latest version when installing the Python packages on Windows. This fixes an issue I had where scikit-learn would not install when building NimbusML with the RlsWinPy3.6 configuration because it could not find one of the test data sets. * Update test_estimator_checks for the three new transformers. * Remove extra comma from test_estimator_checks. * Update the ML.Net version. * Add TimeSeriesImputer * Add country param to DateTimeSplitter * Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn. * Update ML.Net version and import new AutoMLFeaturizers package. * Add back in the accidentally removed tests from test_data_with_missing.py. * Update the DateTimeSplitter examples. * Update the ToKeyImputer examples. * Update the ToString examples. * Update build to support latest nuget packages and updates. * Remove copy of libFeaturizers from linux build script. * Add TimeSeriesImputer to the NimbusML project. * Add initial DataFrame based example for TimeSeriesImputer. * Update to the latest version of manifest.json. * Add missing project include for the TimeSeriesImputer example. * Update the DateTimeSplitter examples. * Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform. * Add a unit test for testing the holiday name return value for DateTimeSplitter. * Add unit test for ToKeyImputer. * Update to latest version of manifest.json. 
Makes grain input required for TimeSeriesImputer. * Update TimeSeriesImputer_df example. * Remove TimeSeriesImputer from test_estimator_checks. * Update nuget.config to point to relative directory for ml.net packages. * Add unit test for TimeSeriesImputer. * Use environmental variable to specify the local ml.net nuget package directory. * Update to the latest version of ml.net. * Add latest version of nuget packages for building. * Update to the latest windows ml.net binaries. * Add linux ml.net binaries. * adding correct nuget packages/location * adding correct ML.NET signed packages * adding correct ML.NET signed packages * Update the referenced ML.Net versions. * Update to the latest version of the manifest. * Add RobustScaler to the public API. * Fix spacing bug in RobustScalar in manifest.json. * Update to the latest version of manifest.json which contains naming fix for RobustScaler. * Update to latest unsigned nuget packages for testing RobustScaler and latest master features. * Add RobustScaler unit tests and examples. * Update to the latest signed ML.Net nugets. * Fix RobustScaler checks in test_estimator_checks. * up version * Update aml branch. (#415) * Draft, adding CategoryImputer, ToKeyImputer, ToString transformers * add tests * prelim commit * update manifest, fix unit tests/examples * upgrade version * fix tests * temp hack fix for native libs * copy libFeaturizers.so * fix version * fix cp * fix version * Update ML.Net version number. * Update the examples and unit tests. * Update to latest version of the Featurizers library. * Fix test_tostring unit test. * Temporarily skip the estimator checks unit tests. * Upgrade pip to the latest version when installing the Python packages on Windows. This fixes an issue I had where scikit-learn would not install when building NimbusML with the RlsWinPy3.6 configuration because it could not find one of the test data sets. * Update test_estimator_checks for the three new transformers. 
* Remove extra comma from test_estimator_checks. * Update the ML.Net version. * Add TimeSeriesImputer * Add country param to DateTimeSplitter * Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn. * Update ML.Net version and import new AutoMLFeaturizers package. * Add back in the accidentally removed tests from test_data_with_missing.py. * Update the DateTimeSplitter examples. * Update the ToKeyImputer examples. * Update the ToString examples. * Update build to support latest nuget packages and updates. * Remove copy of libFeaturizers from linux build script. * Add TimeSeriesImputer to the NimbusML project. * Add initial DataFrame based example for TimeSeriesImputer. * Update to the latest version of manifest.json. * Add missing project include for the TimeSeriesImputer example. * Update the DateTimeSplitter examples. * Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform. * Add a unit test for testing the holiday name return value for DateTimeSplitter. * Add unit test for ToKeyImputer. * Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer. * Update TimeSeriesImputer_df example. * Remove TimeSeriesImputer from test_estimator_checks. * Update nuget.config to point to relative directory for ml.net packages. * Add unit test for TimeSeriesImputer. * Use environmental variable to specify the local ml.net nuget package directory. * Update to the latest version of ml.net. * Add latest version of nuget packages for building. * Update to the latest windows ml.net binaries. * Add linux ml.net binaries. * adding correct nuget packages/location * adding correct ML.NET signed packages * adding correct ML.NET signed packages * Update the referenced ML.Net versions. * Update to the latest version of the manifest. * Add RobustScaler to the public API. * Fix spacing bug in RobustScalar in manifest.json. 
* Update to the latest version of manifest.json which contains naming fix for RobustScaler. * Update to latest unsigned nuget packages for testing RobustScaler and latest master features. * Add RobustScaler unit tests and examples. * Update to the latest signed ML.Net nugets. * Fix RobustScaler checks in test_estimator_checks. * up version * Turn off shuffling for FactorizationMachineBinaryClassifier. (#316) * Initial implementation of NGramExtractor. (#320) * Disable check which prevents artifacts from being generated by pull requests. (#330) * Update ManifestGenerator. (#329) * Added "# -- coding: utf-8 --" to preserve the character `␂` while guaranteeing successful builds with Python 2.7 (#328) * Replaced the non-ASCII characters * Revert "Replaced the non-ASCII characters" This reverts commit 4adb28c. * Update NGramExtractor_df.py * Updating coding of Schema.py to preserve the character "␂" * To re-run build tests * To re-run build tests * Edited encoding * Rerun build tests * Rerun build tests * Added utf-8 encoding to NGramExtractor.py (#339) * Image.py and Image_df.py extended testing examples are now supported on Ubuntu and CentOS (#338) * Remove skipping of Image.py and Image_df.py * Add libraries required for running Image.py and Image_df.py in Linux machines * Update build.sh * Add third party notices to package description on PyPI (#341) * Add third party notices to package description on PyPI * update * update * Add 1.5 (#344) * Add info to README.md (#342) * Add info to README.md * update * Fix DbgWinPy2.7 build which was failing when building NativeBridge. (#340) * Fix DbgWinPy2.7 build which was failing when building NativeBridge. Here is one of the error messages: libboost_numpy-vc140-mt-gd-1_64.lib(ndarray.obj) : error LNK2038: mismatch detected for 'RuntimeLibrary': value 'MDd_DynamicDebug' doesn't match value 'MTd_StaticDebug' in DataViewInterop.obj * Add whitespace change to start new CI run. 
UbuntuPy36 crashed * Fix error level when exiting build.cmd. (#345) * Added HTTP URLs to HTTPS URLs finder & converter Python scripts, and processed HTTP-->HTTPS URL changes (#346) * Added utf-8 encoding to NGramExtractor.py * Added HTTP to HTTPS finder and converter * Changes made by ChangeHttpURLsToHttps.py * Added copyright statements * Updated FindHttpURLs.py and ChangeHttpURLsToHttps.py * Add reports of alterable, nonalterable and invalid URLs * Revert "Changes made by ChangeHttpURLsToHttps.py" This reverts commit afa5f35. * Add URL changes made by ChangeHttpURLsToHttps.py * Revert "Add URL changes made by ChangeHttpURLsToHttps.py" This reverts commit b6a2f7f. * Revert "Add reports of alterable, nonalterable and invalid URLs" This reverts commit 9121123. * Update FindHttpURLs.py and ChangHttpURLsToHttps.py * Add HTTP to HTTPS URL reports * Changes made by ChangeHttpToHttpsURLs.py * Revert "Changes made by ChangeHttpToHttpsURLs.py" This reverts commit 72c85d9. * Revert "Add HTTP to HTTPS URL reports" This reverts commit 81c5a96. * Revert "Update FindHttpURLs.py and ChangHttpURLsToHttps.py" This reverts commit 038262f. * Update FindHttpURLs.py and ChangeHttpURLsToHttps.py * Add URL reports * Add Http-->Https URL changes through ChangeHttpURLsToHttpsURLs.py * Removed if __name__ and main() statements * Revert "Removed if __name__ and main() statements" This reverts commit ba2742f. * Update nimbusml.pyproj * Manually converted two alterable HTTP links to HTTPS. 
* Rename ChangeHttpURLsToHttps.py to changeHttpURLsToHttps.py * Rename FindHttpURLs.py to findHttpURLs.py * URL in SigmoidKernel.txt is fixed for findHttpURLs.py to recognize it as an alterable URL * Changed outdated URL as original URL redirected to current URL * Update Report_InvalidUrls_FindHttpURLs.csv * Fixing reachable HTTP URLs * Update findHttpURLs.py * Updated URL reports, cleared invalid URLs * Update of report for alterable HTTP URLs after running findHttpURLs.py after running changeHttpURLsToHttps.py * Removing URL reports for merge * Renamed URL scripts and reflected this change inside these files (#348) * Renamed URL scripts and reflected this change inside these files * Fix small type in change_http_urls_to_https.py * Updated file names and naming conventions inside files * Update nimbusml.pyproj * Updated usage infos of find_http_urls.py and change_to_https.py * Updated find_http_urls.py and change_to_https.py * Execute unit tests in parallel (#331) * Wrap test estimator checks in a python unit test. * Combine the non-extended test runs together to make them more parallelizable. * Reverse the tests path args order to try and have test_estimator_checks run earlier in the test run. * Dynamically generate the test_estimator_checks unit tests. * Create the test_docs_example unit tests dynamically so they can be parallelized. * Fix KMeansPlusPlus does not work with a cluster size of 1 when using a debug version of ml.net * Fix OLS divide by 0 when given a particular set of inputs to fit. This is hidden in release versions of ml.net * Fix issue when ranking where the output of TextToKeyConverter was trying to overwrite the $scoredVectorData variable set by DatasetScorerEx. See test_metrics_evaluate_ranking_group_id_from_existing_column_in_X for a test which demonstrates the issue. It throws an exception from EntryPointNode.cs:837 when trying to get the outputs. The exception was hidden when using release builds of ML.Net. 
* Remove a test_estimator_check for OrdinaryLeastSquaresRegressor since it is causing invalid float values and throwing an exception which was hidden in release versions of ML.Net but visible in debug. * Update test_permutation_feature_importance tests to support parallel execution. * Rerun unit tests one extra time if any failed to check for intermittent failures. * Decrease the size of the images in the Image and Image_df examples. (#350) * Update package references to work with the latest versions from nuget.org. (#353) * Update ML.Net package references to work with RC1 * Update to ML.Net 1.4.0 * Update Microsoft.DataPrep to version 0.0.2.19-preview. * Downgrade Microsoft.DataPrep to version 0.0.2.3-preview due to issue with missing SqlJdbc package. * Update nimbusml version to 1.6.0. * Update release notes. (#354) * Added Google.Protobuf.dll to Mac and Linux builds (#358) * Modifications to support scripted temp/docs merging. (#361) * Set size variable to -1 in GetUnicodeTX to fix Python 2.7 encoding/decoding issue (#359) * Modified size variable in GetUnicodeTX to -1 * Update DataViewInterop.h * Fixed spacing in DataViewInterop.h * Re-enabled skipped test due to Py2.7 encoding/decoding issue * Removed unnecessary invoking of .sum() * Revert "Removed unnecessary invoking of .sum()" This reverts commit e51a64b. * Initial implementation of the temp_docs_updater script. (#363) * Update README.md * Generate PrefixColumnConcatenator with entry point compiler instead of manually. (#364) * Fix broken docs (#369) * Fix whitespaces and typos * tabs and whitespaces * Removed all references to DSSM in NimbusML (except for in test_wordembedding.py) (#374) * Added catch for predictors that do not support summary() (#375) * Added catch for summary() with FactorizationMachineBinaryClassifier * Updated test for model summary * Revert "Updated test for model summary" This reverts commit 59656fe. 
* Update pipeline.py * Update test_model_summary.py * Update test_model_summary.py * Update test_model_summary.py * Update test_model_summary.py * Update test_model_summary.py * Changed wording of error message * Update Microsoft.DataPrep to the latest version. (#379) * Create release notes for the 1.6.0 release. (#382) * Create release notes for version 1.6.0. * Update 1.6.0 release notes. * Bump version to 1.6.1 to fix dprep issue. (#385) * Update to latest version of DataPrep. * Bump version to 1.6.1 to fix dprep issue. * Removed "TODO: Replace with CV" comments (#389) * Disabled tests that only fail on Mac Py2.7 due to string encoding/dec… (#391) * Disabled tests that only fail on Mac Py2.7 due to string encoding/decoding bug * Update test_ngramfeaturizer.py * Add as_csr documentation to the inline docstrings for transform() and fit_transform(). (#392) * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. * Update to the latest version of ML.Net. (#401) * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. * Typo fixed on paragraph 15 (#399) * Typo fixed on paragraph 10 (#398) * Initial implementation of DateTimeSplitter. Ported from the aml branch. * Update the transform output formats documentation. (#395) * Update the transform output formats documentation. * Add whitespace change to restart CI run. The mac build did not start correctly. * Add whitespace change to restart CI run. The mac build did not start correctly. 
Co-authored-by: Gani Nazirov <ganinz@hotmail.com> * Fixed broken brew command (#402) * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Checking for extended tests * Update phase-template.yml * Final touches * Re-activated NGramFeaturizer2.py (#381) * Update test_docs_example.py * Temporary change so that extended tests can be run by PRs * Revert "Temporary change so that extended tests can be run by PRs" This reverts commit 3f2b8a3. * Temporary change to be able to view extended tests' status with manual PRs * Update .vsts-ci.yml * Update .vsts-ci.yml * Update .vsts-ci.yml Co-authored-by: Gani Nazirov <ganinz@hotmail.com> * Fix missing import in test_datetimesplitter. * Fix issue with ColumnSelector when dropping columns after DateTimeSplitter. * Contributing: Fix a typo (#406) * Re-run failed unit tests on Ubuntu/Mac to fix intermittent crashes. (#407) Note, this modification only handles intermittent crashes on Ubuntu/Mac unit test runs. It does not handle situations where the build hangs and never returns control to the build script. * Fix issue when specifying split_start='after_transforms' with CV.fit() (#410) * Use latest ML.Net dev packages from MachineLearning feed. * Re-enable the default nuget.org feed. 
It does not appear to cause any conflicts with getting the latest packages so long as the * is used in the PackageReference Version attributes. Keeping this enabled will allow other packages which are not part of the MachineLearning feed to be retrieved (i.e. Microsoft.MLFeaturizers). * Add whitespace change to restart CI build. Linux timed out. * Fix build issue when using pip version >= 20.0.0 * Fix build issue caused by latest version of pip (>=20.0.0) (#412) * Remove local-nuget-packages, fix build and test_estimator_checks failures. * Remove DateTimeSplitter duplicates in nimbusml.pyproj * Remove duplicate ML.Featurizers import. Co-authored-by: Gani Nazirov <ganinz@hotmail.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> Co-authored-by: Mustafa Bal <balmustafa117@gmail.com> Co-authored-by: Najeeb Kazmi <najeeb.kazmi@gmail.com> Co-authored-by: Darío Hereñú <magallania@gmail.com> Co-authored-by: Maher Jendoubi <maher.jendoubi@gmail.com> * Fix build and test failures in the aml branch. (#418) * Draft, adding CategoryImputer, ToKeyImputer, ToString transformers * add tests * prelim commit * update manifest, fix unit tests/examples * upgrade version * fix tests * temp hack fix for native libs * copy libFeaturizers.so * fix version * fix cp * fix version * Update ML.Net version number. * Update the examples and unit tests. * Update to latest version of the Featurizers library. * Fix test_tostring unit test. * Temporarily skip the estimator checks unit tests. * Upgrade pip to the latest version when installing the Python packages on Windows. This fixes an issue I had where scikit-learn would not install when building NimbusML with the RlsWinPy3.6 configuration because it could not find one of the test data sets. * Update test_estimator_checks for the three new transformers. * Remove extra comma from test_estimator_checks. * Update the ML.Net version.
* Add TimeSeriesImputer * Add country param to DateTimeSplitter * Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn. * Update ML.Net version and import new AutoMLFeaturizers package. * Add back in the accidentally removed tests from test_data_with_missing.py. * Update the DateTimeSplitter examples. * Update the ToKeyImputer examples. * Update the ToString examples. * Update build to support latest nuget packages and updates. * Remove copy of libFeaturizers from linux build script. * Add TimeSeriesImputer to the NimbusML project. * Add initial DataFrame based example for TimeSeriesImputer. * Update to the latest version of manifest.json. * Add missing project include for the TimeSeriesImputer example. * Update the DateTimeSplitter examples. * Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform. * Add a unit test for testing the holiday name return value for DateTimeSplitter. * Add unit test for ToKeyImputer. * Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer. * Update TimeSeriesImputer_df example. * Remove TimeSeriesImputer from test_estimator_checks. * Update nuget.config to point to relative directory for ml.net packages. * Add unit test for TimeSeriesImputer. * Use environmental variable to specify the local ml.net nuget package directory. * Update to the latest version of ml.net. * Add latest version of nuget packages for building. * Update to the latest windows ml.net binaries. * Add linux ml.net binaries. * adding correct nuget packages/location * adding correct ML.NET signed packages * adding correct ML.NET signed packages * Update the referenced ML.Net versions. * Update to the latest version of the manifest. * Add RobustScaler to the public API. * Fix spacing bug in RobustScalar in manifest.json. * Update to the latest version of manifest.json which contains naming fix for RobustScaler. 
* Update to latest unsigned nuget packages for testing RobustScaler and latest master features. * Add RobustScaler unit tests and examples. * Update to the latest signed ML.Net nugets. * Fix RobustScaler checks in test_estimator_checks. * up version * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. * Initial implementation of DateTimeSplitter. Ported from the aml branch. * Fix missing import in test_datetimesplitter. * Fix issue with ColumnSelector when dropping columns after DateTimeSplitter. * Use latest ML.Net dev packages from MachineLearning feed. * Re-enable the default nuget.org feed. It does not appear to cause any conflicts with getting the latest packages so long as the * is used in the PackageReference Version attributes. Keeping this enabled will allow other packages which are not part of the MachineLearning feed to be retrieved (i.e. Microsoft.MLFeaturizers). * Add whitespace change to restart CI build. Linux timed out. * Fix build issue when using pip version >= 20.0.0 * Remove local-nuget-packages, fix build and test_estimator_checks failures. * Remove DateTimeSplitter duplicates in nimbusml.pyproj * Remove duplicate ML.Featurizers import. Co-authored-by: Gani Nazirov <ganinz@hotmail.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> * Fix build issues with aml branch (#419) * Draft, adding CategoryImputer, ToKeyImputer, ToString transformers * add tests * prelim commit * update manifest, fix unit tests/examples * upgrade version * fix tests * temp hack fix for native libs * copy libFeaturizers.so * fix version * fix cp * fix version * Update ML.Net version number. * Update the examples and unit tests. * Update to latest version of the Featurizers library. * Fix test_tostring unit test. * Temporarily skip the estimator checks unit tests. * Upgrade pip to the latest version when installing the Python packages on Windows.
This fixes an issue I had where scikit-learn would not install when building NimbusML with the RlsWinPy3.6 configuration because it could not find one of the test data sets. * Update test_estimator_checks for the three new transformers. * Remove extra comma from test_estimator_checks. * Update the ML.Net version. * Add TimeSeriesImputer * Add country param to DateTimeSplitter * Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn. * Update ML.Net version and import new AutoMLFeaturizers package. * Add back in the accidentally removed tests from test_data_with_missing.py. * Update the DateTimeSplitter examples. * Update the ToKeyImputer examples. * Update the ToString examples. * Update build to support latest nuget packages and updates. * Remove copy of libFeaturizers from linux build script. * Add TimeSeriesImputer to the NimbusML project. * Add initial DataFrame based example for TimeSeriesImputer. * Update to the latest version of manifest.json. * Add missing project include for the TimeSeriesImputer example. * Update the DateTimeSplitter examples. * Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform. * Add a unit test for testing the holiday name return value for DateTimeSplitter. * Add unit test for ToKeyImputer. * Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer. * Update TimeSeriesImputer_df example. * Remove TimeSeriesImputer from test_estimator_checks. * Update nuget.config to point to relative directory for ml.net packages. * Add unit test for TimeSeriesImputer. * Use environmental variable to specify the local ml.net nuget package directory. * Update to the latest version of ml.net. * Add latest version of nuget packages for building. * Update to the latest windows ml.net binaries. * Add linux ml.net binaries. 
* adding correct nuget packages/location * adding correct ML.NET signed packages * adding correct ML.NET signed packages * Update the referenced ML.Net versions. * Update to the latest version of the manifest. * Add RobustScaler to the public API. * Fix spacing bug in RobustScalar in manifest.json. * Update to the latest version of manifest.json which contains naming fix for RobustScaler. * Update to latest unsigned nuget packages for testing RobustScaler and latest master features. * Add RobustScaler unit tests and examples. * Update to the latest signed ML.Net nugets. * Fix RobustScaler checks in test_estimator_checks. * up version * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. * Initial implementation of DateTimeSplitter. Ported from the aml branch. * Fix missing import in test_datetimesplitter. * Fix issue with ColumnSelector when dropping columns after DateTimeSplitter. * Use latest ML.Net dev packages from MachineLearning feed. * Re-enable the default nuget.org feed. It does not appear to cause any conflicts with getting the latest packages so long as the * is used in the PackageReference Version attributes. Keeping this enabled will allow other packages which are not part of the MachineLearning feed to be retrieved (i.e. Microsoft.MLFeaturizers). * Add whitespace change to restart CI build. Linux timed out. * Fix build issue when using pip version >= 20.0.0 * Remove local-nuget-packages, fix build and test_estimator_checks failures. * Remove DateTimeSplitter duplicates in nimbusml.pyproj * Remove duplicate ML.Featurizers import. * Fix incorrect featurizers library on Mac builds. Co-authored-by: Gani Nazirov <ganinz@hotmail.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> * Fix issues with centos unit tests related to featurizers.
(#420) * Draft, adding CategoryImputer, ToKeyImputer, ToString transformers * add tests * prelim commit * update manifest, fix unit tests/examples * upgrade version * fix tests * temp hack fix for native libs * copy libFeaturizers.so * fix version * fix cp * fix version * Update ML.Net version number. * Update the examples and unit tests. * Update to latest version of the Featurizers library. * Fix test_tostring unit test. * Temporarily skip the estimator checks unit tests. * Upgrade pip to the latest version when installing the Python packages on Windows. This fixes an issue I had where scikit-learn would not install when building NimbusML with the RlsWinPy3.6 configuration because it could not find one of the test data sets. * Update test_estimator_checks for the three new transformers. * Remove extra comma from test_estimator_checks. * Update the ML.Net version. * Add TimeSeriesImputer * Add country param to DateTimeSplitter * Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn. * Update ML.Net version and import new AutoMLFeaturizers package. * Add back in the accidentally removed tests from test_data_with_missing.py. * Update the DateTimeSplitter examples. * Update the ToKeyImputer examples. * Update the ToString examples. * Update build to support latest nuget packages and updates. * Remove copy of libFeaturizers from linux build script. * Add TimeSeriesImputer to the NimbusML project. * Add initial DataFrame based example for TimeSeriesImputer. * Update to the latest version of manifest.json. * Add missing project include for the TimeSeriesImputer example. * Update the DateTimeSplitter examples. * Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform. * Add a unit test for testing the holiday name return value for DateTimeSplitter. * Add unit test for ToKeyImputer. * Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer. 
* Update TimeSeriesImputer_df example. * Remove TimeSeriesImputer from test_estimator_checks. * Update nuget.config to point to relative directory for ml.net packages. * Add unit test for TimeSeriesImputer. * Use environmental variable to specify the local ml.net nuget package directory. * Update to the latest version of ml.net. * Add latest version of nuget packages for building. * Update to the latest windows ml.net binaries. * Add linux ml.net binaries. * adding correct nuget packages/location * adding correct ML.NET signed packages * adding correct ML.NET signed packages * Update the referenced ML.Net versions. * Update to the latest version of the manifest. * Add RobustScaler to the public API. * Fix spacing bug in RobustScalar in manifest.json. * Update to the latest version of manifest.json which contains naming fix for RobustScaler. * Update to latest unsigned nuget packages for testing RobustScaler and latest master features. * Add RobustScaler unit tests and examples. * Update to the latest signed ML.Net nugets. * Fix RobustScaler checks in test_estimator_checks. * up version * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. * Initial implementation of DateTimeSplitter. Ported from the aml branch. * Fix missing import in test_datetimesplitter. * Fix issue with ColumnSelector when dropping columns after DateTimeSplitter. * Use latest ML.Net dev packages from MachineLearning feed. * Re-enable the default nuget.org feed. It does not appear to cause any conflicts with getting the latest packages so long as the * is used in the PackageReference Version attributes. Keeping this enabled will allow other packages which are not part of the MachineLearning feed to be retrieved (i.e. Microsoft.MLFeaturizers). * Add whitespace change to restart CI build. Linux timed out. 
* Fix build issue when using pip version >= 20.0.0 * Remove local-nuget-packages, fix build and test_estimator_checks failures. * Remove DateTimeSplitter duplicates in nimbusml.pyproj * Remove duplicate ML.Featurizers import. * Fix incorrect featurizers library on Mac builds. * Fix centos unit test issues with featurizers. Co-authored-by: Gani Nazirov <ganinz@hotmail.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> * Add support for ONNX model export and execution. Merge to AML branch (#421) * Add initial implementation of the export to ONNX functionality. * Update the Microsoft.ML.OnnxConverter version in Platforms/build.csproj * Add test for verifying onnx export support. * Update the onnx conversion to be compatible with the latest changes in pull request dotnet/machinelearning#3986. * Fix a few of the issues with test_export_to_onnx. * Add onnxruntime.dll to the NimbusML python package. It is already included in the Linux and Mac builds. * Initial implementation of the OnnxRunner transform. * Fix missing reference to models_onnxconverter in nimbusml.pyproj. * Exclude OnnxRunner from the test_export_to_onnx tests. * Remove OnnxRunner from test_estimator_checks. * Add back in OnnxConverter reference which was accidentally removed in merge. * Update onnx export test. TypeConverter, MeanVarianceScaler, MinMaxScaler no longer require experimental flag. * Pretty print the output of test_export_to_onnx. * Update to the latest version of ML.Net. * Update supported estimators in test_export_to_onnx. * Use the latest nightly builds for the ML.Net packages. * fix tests * fix test * Add example for OnnxRunner. 
(#422) * Build fix for rolling ML.NET 1.5.0-preview* and update to Pandas 1.0 (#437) * Updates for mlnet rolling build 1.5.0-preview2-28612-3 * Update pyproj * Update tests for pandas 1.0.1 * Skip check_dtype_object in TestEstimatorChecks due to pandas 1.0.0 removing Series.itemsize * Re-enable check_dtype_object and fix underlying issue causing it to fail * Remove label column from features when no Y is specified and predictor supports labels. (#439) * Fix breaking unit tests. (#440) * Update test_export_to_onnx test. (#443) * Update test_export_to_onnx test. (#444) * Fix NGramFeaturizer test * fix .0 (#445) * Add OneVsRest support to export to onnx tests and increase test coverage. (#446) * Automatically convert Categorical columns to their values before comparison in ONNX export tests. (#447) * add ORT results * Add ORT & vinod script (#449) * Add ORT validation to the export to onnx tests. (#451) * Remove unnecessary import. (#452) * Update data_frame_tool.py (#454) * Fixes for dataframe tool (#455) * add ORT results * fixes to dataframe tool and vinod * typos fixes * rollback * Fixed data_frame_tool to handle category columns correctly (#456) * Few fixes for IDV and DF formats * rollback * Regenerate entrypoint & api * Up version and fix test * Added Async suffix to RunOnBackgroundThread (#459) Added Async suffix to RunOnBackgroundThread * Update entrypoints and MarshallInvoke call (#461) * Update manifest.json * Update VariableColumnTransform.cs * Updated entrypoints * Update to use OnnxRuntime 1.2 (#462) * Updated ORT dependencies * Updated ORT Feed * Updated ORT tests for GPU * Revert "Updated ORT Feed" This reverts commit 76680f1. * Revert "Updated ORT tests for GPU" This reverts commit ae55b45. 
* Upgrade CI build to use latest onnxruntime and automl scenario based … (#463) * Upgrade CI build to use latest onnxruntime and automl scenario based test * simplify Co-authored-by: Gani Nazirov <ganaziro@microsoft.com> * dont run onnxruntime for python2.7 * fix automl test * Remove py2.7 Windows from CI build as latest pytest & pip are not supported anymore for Python 2.7 * fix typo * remove daily build location * use only nuget.org Co-authored-by: pieths <pieths.dev@gmail.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> Co-authored-by: Mustafa Bal <balmustafa117@gmail.com> Co-authored-by: Najeeb Kazmi <najeeb.kazmi@gmail.com> Co-authored-by: Darío Hereñú <magallania@gmail.com> Co-authored-by: Maher Jendoubi <maher.jendoubi@gmail.com> Co-authored-by: Gani Nazirov <ganaziro@microsoft.com> Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com>

Turn off shuffling for FactorizationMachineBinaryClassifier.

b4526e9

ganik approved these changes Oct 9, 2019

pieths merged commit 15eddb4 into microsoft:master Oct 9, 2019

pieths deleted the factorization_classifier_fix branch October 9, 2019 19:45
