
Added license files in the base image #1595

Merged: 7 commits into microsoft:master on Aug 9, 2019

Conversation

manashgoswami
Contributor

Description: The license files from the repo must be present in the base images. Also added dependencies for Azure ML.

Motivation and Context

  • compliance with licensing requirements.

@manashgoswami manashgoswami requested a review from a team as a code owner August 9, 2019 01:09
@jywu-msft
Member

/azp run

@azure-pipelines

Azure Pipelines successfully started running 21 pipeline(s), but failed to run 1 pipeline(s).

@jywu-msft jywu-msft merged commit 6d783e8 into microsoft:master Aug 9, 2019
askhade added a commit that referenced this pull request Sep 5, 2019
* Add more type support for OneHot op (#1565)

* parallel build

* update quantizelinear to process int8 input (#1576)

* Remove unneeded C APIs + some refactoring. (#1555)

* Mention OrtCreateSessionFromArray in C API doc

* c api changes after review (1)

* updates...

* fixes

* Reorder include

* A few performance improvements coming out of ssd_mobilenet and ssd_resnet34 analysis (#1578)

* A few performance improvements:
  - Make the iteration in NonZero more efficient by using a raw pointer and simplifying the increment logic
    - add another unit test to check the new logic works with a 3-dimensional tensor
    - gains about 2% for ssd_mobilenet
  - Avoid floating point operations on each iteration in Concat
    - about 0.5% for ssd_mobilenet and ssd_resnet34
  - Put the common case first in ExecutionFrame::AllocateAsPerAllocationPlan to avoid an unnecessary call to IsSparseTensor
    - about 0.05% for ssd_mobilenet
  - Minor tweak to put some ctors in the TensorShape header so they can be inlined more easily

* Fix race condition issue in RNN/LSTM/GRU (#1544)

Fix race condition issue in RNN/LSTM/GRU.

Description:
The filter_desc and rnn_desc can also be modified during Compute, which may run on multiple threads, causing a race condition.

Fix:
- create temporary cudnn descriptors per call
- cache cudnn_dropout_desc_, which won't change
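
As a hedged illustration of that fix (RunRnnForward is a hypothetical helper, not the actual CUDA EP code), the RNN descriptor becomes a per-call local instead of shared kernel state:

```cpp
#include <cudnn.h>

// Sketch only: per-call descriptors mean concurrent Compute() calls
// never mutate shared state. The dropout descriptor can stay cached
// because it is configured once and never changes afterwards.
cudnnStatus_t RunRnnForward(cudnnHandle_t handle,
                            cudnnDropoutDescriptor_t cached_dropout_desc) {
  (void)handle;
  (void)cached_dropout_desc;
  cudnnRNNDescriptor_t rnn_desc = nullptr;  // local, not a member
  cudnnStatus_t status = cudnnCreateRNNDescriptor(&rnn_desc);
  if (status != CUDNN_STATUS_SUCCESS) return status;
  // ... configure rnn_desc with this call's hyperparameters (reusing
  // cached_dropout_desc) and run the forward inference ...
  return cudnnDestroyRNNDescriptor(rnn_desc);
}
```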

* Remove memory copy between TensorRT and CUDA (#1561)

* remove memory copy between CUDA and TRT

* add info to RegisterExecutionProvider input

* use new IDeviceAllocator for trt allocator

* remove SetDefaultInputsMemoryType from TRT EP

* remove onnx-tensorrt 5.0

* add submodule onnx-tensorrt branch 5.1

* remove redundancy

* Update transformer_memcpy.cc

* Update tensorrt_execution_provider.cc

* switch to TensorRT 5.1.5.0

* update python binding

* disable failed test case on TensorRT

* Update activation_op_test.cc

* upgrade to TensorRT container 19.06

* update according to feedback

* add comments

* remove tensorrt allocator and use cuda(gpu) allocator

* update onnx-tensorrt submodule

* change ci build cuda directory name

* Optimize Fence checking performance (#1593)

* For the majority of nodes we do not need a fence check; only CPU<->GPU memory-sync nodes require one. Yet we paid the fence-check cost for every single node and every single input and output.

This change minimizes fence checking by doing it only when necessary.
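
A sketch of the idea with illustrative stand-ins (not the real execution-frame types): the flag is computed once at plan time, so the per-input/per-output fence walk happens only for nodes that need it.

```cpp
#include <vector>

struct NodePlan {
  bool has_fence = false;  // set at plan time; true only for
                           // CPU<->GPU memory-sync nodes
};

void ExecutePlan(const std::vector<NodePlan>& plan) {
  for (const auto& node : plan) {
    if (node.has_fence) {
      // only here walk this node's inputs/outputs and wait on fences
    }
    // ... run the node's kernel ...
  }
}
```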

* Added license files in the base image (#1595)

* Update Dockerfile.openvino

* Update Dockerfile.cuda

* Update Dockerfile.cuda

* Update Dockerfile.openvino

* Update Dockerfile.cuda

* added ThirdParty notice file to base image.

* corrected license file name

* Implement new LabelEncoder in opset 2 in ML domain (#1393)

* Implement new LabelEncoder in opset 2 in ML domain

* Fix compilation error

* Fix tests

* Include ONNX's fix

* Formatting and addressing a comment

* Address a minor comment

* add int64 support for less op. (#1604)

* put all gemmlowp common code in one place (#1590)

* put all gemmlowp common code in one place

* fix gpu build failures

* minor update

* Update nGraph to v0.22.1 (#1582)

* Update nGraph to 0.21 and adjust the EP

* Share the graph initializers between custom ops

* Update nGraph to 0.22 and exclude Gather entirely

* Enable building on Windows with nGraph v0.21.1-rc.0

* Disable the unsigned input Shrink op tests for nGraph until the next update

* Line-shortening code refactor

* Fix for the master branch merge artifact

* MKLDNN patches adjustment for Windows

* Exclude MatMulInteger for non-const zero points

* Exclude ConvInteger for non-const zero points

* Enable full Cast op support

* Use the v0.22.1 tag

* Skip ConvTranspose_InvalidKernelShape test for ngraph provider

* Create sub-graph ModelProto from fused_node

* Include io_win32.h only when building on Windows (#1587)

* Include io_win32.h only when building on Windows

* looks like include order matters

* Fix for CPU random ops seed narrowing conversion. (#1594)

* Fix perf test executable. (#1598)

* Mention OrtCreateSessionFromArray in C API doc

* Fix perf test executable due to removal of certain C APIs

* fix linux build

* Avoid duplication

* Fix mem leak

* Minor perf improvements. (#1580)

* Minor perf improvements.

- Cache the vector sizes in IExecutionFrame and NodeIndexInfo to avoid calls to size().
  - 2 instructions instead of 10
- Remove an unnecessary check in IExecutionFrame
  - add a check to the ctor so we guarantee it's unnecessary
- Reserve memory for the vectors in BroadcastIterator
  - saves reallocs if more than one value is added
    - but it is rare with the mlperf models for multiple values to be added, so the benefit is limited
  - slight tweak to the Broadcaster ctor code to make it more readable

* Serialize optimized onnx model (#1470)

* Model serialization

* Removed duplicate symbol

* Minor update

* Review comments

* add tests

* Model serialization

* Removed duplicate symbol

* Minor update

* Merged PR 1106437: Model Serialization in onnxruntime

* Review comments

* Merged PR 1107226: Review comments

Review comments

* add tests

* Fixed merge conflict

* Correct python tests

* InferenceSession Refeed Test

* Replace use of widechar const literal-L

* Fixed failing tests

* Updated comment

* Removed unnecessary session options

* Spell check on comments

* Do not serialize when level 3 optimization specified

* Updated error logs

* Changed log severity to WARN
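
For reference, the feature is reachable from user code roughly like this (a sketch using the C++ wrapper names from later releases; the exact API surface at the time of this PR may have differed):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "serialize-demo");
  Ort::SessionOptions opts;
  // Ask ORT to write the optimized graph back out so it can be
  // loaded directly next time.
  opts.SetOptimizedModelFilePath(ORT_TSTR("model_optimized.onnx"));
  Ort::Session session(env, ORT_TSTR("model.onnx"), opts);
  return 0;
}
```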

* Fix log message truncation on Windows when printf formatting is used. (#1599)

* Fix log message truncation and add a unit test. On Windows, vsnprintf_s returns -1 when truncating, so we need to differentiate that from a real error.
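
A hedged sketch of the distinction (not the exact ORT logging code): with _TRUNCATE, a -1 return can simply mean the message was cut short, and the buffer still holds a usable prefix.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Sketch only: on MSVC, vsnprintf_s with _TRUNCATE returns -1 both
// when the output is truncated and on a real failure; in the
// truncation case the buffer still holds a valid null-terminated
// prefix, so keep it instead of dropping the message.
std::string FormatLogMessage(const char* fmt, ...) {
  char buf[256];
  buf[0] = '\0';
  va_list args;
  va_start(args, fmt);
#ifdef _MSC_VER
  const int n = vsnprintf_s(buf, sizeof(buf), _TRUNCATE, fmt, args);
  (void)n;  // n == -1 may just mean truncation; buf is still usable
#else
  vsnprintf(buf, sizeof(buf), fmt, args);
#endif
  va_end(args);
  return std::string(buf);  // truncated content is kept, not lost
}
```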

* Remove copy of generator in Multinomial (#1611)

* Remove copy of generator in Multinomial so that different values are generated each time.
Add ability to test

* Kezhan/execute graph refactoring (#1553)

* checking execution provider logic updated.

* fix the logic of copy input and output.

* update

* update

* update

* update

* update

* update

* fix ngraph failure.

* fix comments

* Cleanup csharp API SessionOptions and RunOptions to be consistent with other APIs (#1570)

- Updated SessionOptions API to use properties instead of setter/getter methods. 
- Added missing APIs. 
- Added RunOptions.

* Make changes to pipeline template to include missing headers in tars/zips (#1617)

* Fix trtlogger segfault. re-enable SoftPlus unit test for TRT. add doc… (#1623)

* Fix trtlogger segfault. re-enable SoftPlus unit test for TRT. add documentation for ORT_TENSORRT* env vars.

* Update TensorRT-ExecutionProvider.md

* Use a friendly enum for graph optimization level. (#1586)

* Mention OrtCreateSessionFromArray in C API doc

* review changes

* use enum for graph optimization level

* Use explicit values for enums

* updates...

* Add friendly enum for graph optimization levels in C, C# and Python APIs.

* Fix linux build

* Fix build breakage due to master merge

* PR comments
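
Usage then looks roughly like this (a sketch using the C++ wrapper call from later releases; the enum values are the ones introduced here, but the exact entry point at the time may have differed):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::SessionOptions opts;
  // ORT_DISABLE_ALL / ORT_ENABLE_BASIC / ORT_ENABLE_EXTENDED /
  // ORT_ENABLE_ALL replace the old raw integer levels.
  opts.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
  return 0;
}
```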

* Generate documentation from the registered operator kernels (#1395)

- Added python script for generating markdown doc from the registered opkernels. 
- Made some conditional changes in the pybind to expose necessary python API
- Added some missing type-constraints in the op kernel registrations

* Fix incorrect box offset computation in NMS op (#1624)

* More changes

* Fix NMS

* nits

* Integrate featurizers (#1573)

Added Sample Featurizer and Infrastructure
  Make featurizers and unit tests compile and run with GTest.
  Create definitions for the first featurizer kernel.
  Add new operator domain.
  Create datetime_transformer kernel and build.
  Move OPAQUE types definitions for featurizers kerneles out to a separate cc.
  Register them with the type system.
 Provide unit tests for new AutoML DateTimeTransformer kernel.
  Make necessary adjustments to the test infrastructure to make it run
  with new types.

* Support int64 for ReduceMax (#1625)

* update onnx to latest commit (#1622)

* update onnx to latest commit

* Disable and/or fix failing tests

* disable not yet implemented tests for opset 11

* disable tests

* fix bug in mkldnn fp16 graph check

* Copy System.Numerics.Tensors sources from dotnet/corefx into onnxruntime (#1605)

 Copy System.Numerics.Tensors sources from dotnet/corefx into onnxruntime

* removed --gen_doc (#1633)

* Fix parsing initial hidden state in RNN (#1626)

* Fix the way initial hidden state is used for reverse direction in RNN

* Add test case

* Updates

* Let mlas use session thread pool (#1609)

1. Let MLAS use the session thread pool.
2. Remove the onnxruntime_USE_MLAS cmake option.
3. Remove the win32 thread pool code inside MLAS.

MLAS will:

1. use the ORT thread pool if one gets passed in;
2. use OpenMP if the thread pool parameter is nullptr;
3. run single-threaded if the thread pool parameter is nullptr and OpenMP is disabled.
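
That selection order, as a hedged sketch (the thread pool type and function names are stand-ins, not the real MLAS internals):

```cpp
#include <cstdint>
#include <functional>

// Stand-in for ORT's session thread pool interface.
struct ThreadPool {
  virtual void ParallelFor(int32_t n,
                           const std::function<void(int32_t)>& fn) = 0;
  virtual ~ThreadPool() = default;
};

void MlasStyleParallelFor(ThreadPool* tp, int32_t n,
                          const std::function<void(int32_t)>& work) {
  if (tp != nullptr) {
    tp->ParallelFor(n, work);  // 1) session thread pool was passed in
    return;
  }
#ifdef _OPENMP
  // 2) no thread pool, but built with OpenMP
  #pragma omp parallel for
  for (int32_t i = 0; i < n; ++i) work(i);
#else
  // 3) neither: run single-threaded
  for (int32_t i = 0; i < n; ++i) work(i);
#endif
}
```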

* update TRT EP CI's to use latest model.zip (#1637)

* Add AutoML to 3 main builds. (#1631)

Add AutoML to 3 main builds.
Fix unit tests. Enable copy elision, do not move movable object on return by value.

* MLAS: add U8U8 MatMul operation (#1644)

Implement the first round of changes for quantization inside MLAS. This adds a MatMul operation for U8xU8=S32 for x86/x64 processors.

* Add uint8 Support for NonZero Op (#1614)

* update MKLML to version which contains fix for thread hang. (#1636)

* update MKLML which has bugfix for thread hang. move PATCH_COMMAND outside BUILD_FOR_NATIVE_MACHINE check.

* MKLML_VERSION 2020.0.20190813 is for windows only.

* MlasGetMaximumThreadCount: plus 1 to the NumThreads from ORT thread pool (#1646)

* Update perf tool documentation to reflect the new graph optimization enums. Relax constraint for enable_all. (#1650)

* Allow user disable multiple threading (#1647)

* Update onnx test runner documentation (#1651)

* Mention OrtCreateSessionFromArray in C API doc

* Update perf tool documentation to reflect the new graph optimization enums. Relax constraint for enable_all.

* Update one more doc

* Update onnx test runner documentation

* Add default in the docs

* Fix memory leak in mlas unit test (#1654)

* fix bug on windows where ops were always getting dumped. (#1648)

* Remove --whole-archive (#1655)

Check return value from CreateFeedsFetchesManager. (#1653)

Also cleanup a couple of unused variables.

* Update PyTorch Section for supported onnx version (#1635)

The PyTorch exporter in PyTorch 1.2 can natively support multiple opsets now

* cudnnRNNForwardInferenceEx doesn't support zero-length sequences in the batches

Fix the issue that cudnnRNNForwardInferenceEx doesn't support zero-length sequences in the batches.

Solution:
Reset zero-length sequences to 1 before calling cudnnRNNForwardInferenceEx, keeping an array that tracks the batch ids with zero-length sequences. Once the result is back, call a CUDA kernel to mask the output using the batch ids tracked in the array.
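
A minimal sketch of the length-fixup step (illustrative names; in ORT the output masking runs as a CUDA kernel):

```cpp
#include <cstdint>
#include <vector>

// Bump zero-length sequences to 1 so cudnnRNNForwardInferenceEx
// accepts the batch, and remember which batches must be masked to
// zero in the output afterwards.
void PrepareSequenceLengths(std::vector<int32_t>& seq_lens,
                            std::vector<int32_t>& zero_batches) {
  for (int32_t b = 0; b < static_cast<int32_t>(seq_lens.size()); ++b) {
    if (seq_lens[b] == 0) {
      zero_batches.push_back(b);  // mask this batch's output later
      seq_lens[b] = 1;            // keep cudnn happy
    }
  }
}
```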

* Add details of which node was not able to be placed on an execution provider. (#1665)

* nGraph EP Optimizations (#1630)

* Added check for unnecessary function initializations, and removed lock from unneeded areas of code.

* Added LRU cache to EP.

* Bugfixes for nGraph EP Optimization PR

* Changed default cache size to 500 and refactored mutex readability.

* Fixed unsafe environmental variable fetch for Windows.

* Cleaned up Windows environment functions and cleaned up mutexes.

* Fix a few errors in the NuGet pipeline (still broken) (#1656)

* update set fetches for execution with allocation plan. (#1668)

* Support Tensor<bool> and Tensor<Int8> in C# API. Support Tensor<string> as input. Fix a bug in the InferenceSession Run() with RunOptions (#1671)

- Support bool-Tensor and int8-Tensor in input-output of C# api
- Support string-tensor as input in C# api
- Fix a bug in InferenceSession.Run() -- RunOptions was not passed into the native call

* Optimize kernel index (#1672)

* update clip for opset 11 (#1661)

* update clip for opset 11

* exclude ngraph provider for clip unit tests

* exclude ngraph for all clip opset 11 tests

* fix op version

* Add support of ReduceSum int64 (#1664)

* Add support of ReduceSum int64

* add unit test for int64

* int64 support for 'where' op (#1666)

* Added some mo optimizations to improve performance (#1674)

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Don't create the default allocator every single time. Rename API accordingly. Expose Session/Run log severity levels. (#1615)

* Mention OrtCreateSessionFromArray in C API doc

* Don't create the default allocator every single time. Rename API accordingly.

* Don't create the default allocator every single time. Rename API accordingly.

* updates...

* updates...

* PR comments

* fix typo in license header

* fix build

* Share default CPU allocator with Mlas preferred alignment (#1682)

Description: make the default CPU allocator use MLAS's preferred alignment

Motivation and Context

This is needed for the C API to have an aligned default CPU allocator, the same as the one in the CPU provider.
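
A sketch of what an alignment-aware default allocator amounts to (the constant below is a stand-in; the real code takes the value from MLAS):

```cpp
#include <cstddef>
#include <new>

// Stand-in value; MLAS reports its actual preferred alignment.
constexpr std::size_t kPreferredAlignment = 64;

void* AllocAligned(std::size_t size) {
  // C++17 aligned operator new guarantees the requested alignment.
  return ::operator new(size, std::align_val_t{kPreferredAlignment});
}

void FreeAligned(void* p) noexcept {
  ::operator delete(p, std::align_val_t{kPreferredAlignment});
}
```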

* More fixes on the NuGet CPU CI pipeline (#1688)

- Fix the Windows end-to-end test in NuGet CI
- Skip the TestModelSerialization, because it is failing on Linux. Must be fixed before API is released for use. Owner is notified.

* treat zero point properly (#1686)

* use MLAS for QGEMM in matmulInteger and convInteger (#1692)

* use mlas qgemm for u8u8_s32 gemms

* update test

* fix typo in max batch size error msg. (#1687)

* Python API naming and other cleanup (#1678)

- Make the naming of properties in python SessionOptions and RunOptions consistent with other apis.
- Remove unnecessary apis

* make gemmlowp default for arm (#1701)

* make gemmlowp default for arm

* force use_gemmlowp in header for default case

* remove unnecessary white space

* Doc updates (#1522)

* Updates

* Remove preview texts

* Update README.md

* Updates

* Update README.md

* Update README.md

* Minor wording update

* Update README.md

* Update doc on CUDA version

* revert update

* Update readme for issue #1558

* Clean up example section

* Cosmetic updates

- Add a index of build instructions for browsability
- Update build CUDA version from 9.1 to 10

* Fix broken link

* Update README to reflect upgrade to pip requirement

* Update CuDNN version for Linux Python packages

* Clean up content

Updated ordering and add table of contents

* Minor format fixes

* Move Android NNAPI under EP section

* Add link to operator support documentation

* Fix typo

* typo fix

* remove todo section

* remove @PCGOTREL x64 usage (#1707)

Avoid the need for @PCGOTREL relocations by annotating MLAS global data shared with assembly modules with __attribute__((visibility("hidden"))).

* MLAS: Android sgemm kernel build fix (#1710)

Fix the aarch64 kernel to build properly with the Android NDK (specifically clang).

* Remove TaskThreadPool (#1713)

* Allow input used across execution providers as long as they use the same allocator device (#1715)

Description: Currently ORT throws an error when one input is used by different EPs. This change removes that restriction, as long as those providers use the same allocator device.

Motivation and Context

It is now possible to share inputs across EPs because allocations are device-based instead of EP-based.

* Add support for int8 x uint8 for MatMulInteger, and int16 x int16 custom op (#1391)

Description: This change adds the necessary quantization support on CPU for mixed int8/uint8, as well as int16, matrix multiply operations that output int32.

Motivation and Context

Integer operations are critical for quantized models' performance. The current CPU implementation of MatMulInteger only supports uint8 x uint8, while the spec also allows int8 x uint8; a default CPU implementation that fully supports the spec helps accuracy verification. Besides, some models may need to quantize to int16, but MatMulInteger does not support that yet, so a MatMulInteger16 custom op is added to satisfy such models.
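
For reference, the math being implemented for the int8 x uint8 mix (a plain reference sketch, not ORT's optimized kernel):

```cpp
#include <cstdint>
#include <vector>

// Reference MatMulInteger: y[m,n] = sum_k (a[m,k] - a_zero) *
// (b[k,n] - b_zero), accumulated in int32. a is uint8, b is int8,
// matching the mix this change enables.
void MatMulIntegerRef(const std::vector<uint8_t>& a, uint8_t a_zero,
                      const std::vector<int8_t>& b, int8_t b_zero,
                      std::vector<int32_t>& y, int M, int K, int N) {
  y.assign(static_cast<size_t>(M) * N, 0);
  for (int m = 0; m < M; ++m)
    for (int k = 0; k < K; ++k) {
      const int32_t av = static_cast<int32_t>(a[m * K + k]) - a_zero;
      for (int n = 0; n < N; ++n)
        y[m * N + n] += av * (static_cast<int32_t>(b[k * N + n]) - b_zero);
    }
}
```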

* Use exec form of ENTRYPOINT for docker server (#1690)

* Use exec form of ENTRYPOINT for docker server

# Issue
The entrypoint currently uses the shell form, which prevents users from passing in any cmdline arguments. Also, passing a model_path in means the server only works if the envvar is set; however, this is not what the error message says!
```
$ docker run -v /home/rakelkar/try/onnxzoo/style:/mnt/models -it   mcr.microsoft.com/onnxruntime/server --model_path /mnt/models/model.onnx
Version: local_build
Commit ID: default

model_path must be the location of a valid file
Allowed options:
  -h [ --help ]               Shows a help message and exits
  --log_level arg (=info)     Logging level. Allowed options (case sensitive): 
                              verbose, info, warning, error, fatal
  --model_path arg            Path to ONNX model
  --address arg (=0.0.0.0)    The base HTTP address
  --http_port arg (=8001)     HTTP port to listen to requests
  --num_http_threads arg (=4) Number of http threads
  --grpc_port arg (=50051)    GRPC port to listen to requests
```
# Fix
1. remove the env var
2. use the exec form

* Update readme to use model_path arg

* Support 'Bilinear' mode for 2D inputs in Resize and Upsample kernels  (#1679)

* Support bilinear mode with actual 2D inputs in Resize and upsample

* Fix build break

* Fix build break

* Add test

* CUDA changes

* Resolve PR comments

* Resolve comments

* add implementation for dynamic quantize linear (#1697)
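
A reference sketch of the DynamicQuantizeLinear math as defined by the ONNX spec (uint8 output with zero always included in the range); illustrative only, not the kernel added here:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Per the ONNX spec: scale covers [min(x, 0), max(x, 0)], and the
// zero point maps the real value 0 into [0, 255]. Assumes non-empty x.
void DynamicQuantizeLinearRef(const std::vector<float>& x,
                              std::vector<uint8_t>& y,
                              float& scale, uint8_t& zero_point) {
  const auto [mn, mx] = std::minmax_element(x.begin(), x.end());
  const float rmin = std::min(0.0f, *mn);
  const float rmax = std::max(0.0f, *mx);
  scale = (rmax - rmin) / 255.0f;
  if (scale == 0.0f) scale = 1.0f;  // all-zero input: avoid div by zero
  const float zp = std::round(std::clamp(-rmin / scale, 0.0f, 255.0f));
  zero_point = static_cast<uint8_t>(zp);
  y.resize(x.size());
  for (size_t i = 0; i < x.size(); ++i) {
    const float q = std::round(x[i] / scale) + zp;
    y[i] = static_cast<uint8_t>(std::clamp(q, 0.0f, 255.0f));
  }
}
```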

* Fix reading of onnx domain causing one of the automl models to break in 0.5 release. (#1694)

* Mention OrtCreateSessionFromArray in C API doc

* Fix registration of Equal op causing one of the automl models to break in 0.5 release.

* updates...

* Fix an issue where the CUDA EP falls back too many nodes to CPU in some cases, causing huge data copies. If a node's inputs are all initializers, we shouldn't fall back the node to CPU. (#1727)

Fix an issue where the CUDA EP falls back too many nodes to CPU in some cases, causing huge data copies.
#1675

Currently, if a node's inputs are all initializers, the CUDA EP will fall back the node to CPU, and it will also fall back some nodes below it. This can cause huge data copies. In the case reported by a user, the model has several Slice ops with inputs from initializers and a Concat op that concatenates the Slice outputs. The data is huge (16 MB) after the concat, which makes the CPU-to-GPU copy quite costly because it is a synchronous copy.

Fix
If a node's inputs are all initializers, we shouldn't fall back the node to CPU.
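
The heuristic, sketched with illustrative stand-in types (not ORT's graph classes):

```cpp
#include <vector>

// Illustrative stand-ins for ORT's graph structures.
struct NodeArgInfo { bool is_initializer; };
struct NodeInfo { std::vector<NodeArgInfo> inputs; };

// A node fed entirely by initializers creates no CPU<->GPU traffic of
// its own, so keep it on CUDA instead of forcing a CPU fallback.
bool ShouldStayOnCuda(const NodeInfo& node) {
  for (const auto& in : node.inputs)
    if (!in.is_initializer) return false;  // mixed inputs: normal rules
  return true;  // all-initializer inputs: keep on CUDA
}
```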

* Publish perf tool with nightly build (#1728)

* Update the docker file for OpenVINO (#1741)

Update the docker file for OpenVINO which is used for AML

* Fix typo in NMS code

* MKL-DNN EP:  control flow fix (#1740)

* moved subgraph_index to MklDnn Execution Provider

* code cleanup

* Implementation of Nuphar execution provider (#881)

* Implement Nuphar execution provider

Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan.
This PR is mainly for a preview of the shared codegen library for other TVM-based providers.

* Fix submodules

* Fix TVM submodule

* Update Nuphar to latest and resolve conflicts

* Remove stale files caused by merge -X theirs

* Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test

* Fix bad merge

* Merge from Nuphar

* Fix warning treated as error, revert some unnecessary changes

* Revert some more test changes

* Some more test revert or comments to make review easier
New tests could be added later

* One more revert of unnecessary changes

* More change revert. Test could be added back later.

* Enforce shape validation. (#1716)

* Mention OrtCreateSessionFromArray in C API doc

* Enforce shape validation.

* Update broken models

* enable quantizing specific nodes (#1742)

* update quantization script
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants