RPP Reduction - Tensor min and Tensor max on HOST and HIP (r-abishek#260)

* Minor Change

* Add Validation check for DST_FOLDER path

* added water HOST and HIP codes

* added water case in test suite

* added golden outputs for water

* Add Validation checks for all options in testAllScript.sh

* Add sanity check for dual Input cases
Set Max Dimension and Max Image Dump
Replaced Fast DCT tag with Accurate DCT

* Regenerate golden outputs using accurate dct Flag
Add golden outputs for some new augmentations

* Fix Flip golden outputs mismatch
Fix PLN3 variants mismatch in QA mode

* Add MAX_BATCH_SIZE check
removed augmentation function calls for failing QA modes
code cleanup

* Add crop and gamma correction augmentations
code cleanup

* Add comments to functions in rpp_test_suite_common.h

* minor change

* code cleanup

* minor code changes

* Change roi and Image sizes for crop augmentation

* Change numIterations option to numRuns
Addressed PR comments

* added omp thread changes for water augmentation

* experimental changes

* fixed output issue with AVX2 instructions

* added AVX2 support for PKD3 load function

minor changes in PLN variant load functions

* Add turboJpeg header to update maxHeight and maxWidth values

* nwc commit - added avx2 changes for u8 layout toggle variants but need to add store functions for completion

* Change the performance Timings logic

* Add Avx2 implementation for F32 and U8 toggle variants

* minor change to support u8_f16 and u8_f32 cases

* Regenerate LUT golden outputs with ACCURATE_DCT tag

* Minor code changes

* Add AVX2 support for u8 pkd3-pln3 and i8 pkd3-pln3 for water augmentation

* Made changes to the runTests.py in Host to remove testAllScipts.sh

* Made changes to the runTests.py in HIP to remove testAllScipts.sh

* Initial commit - Image min and max Reduction kernel

Includes
* u8 datatype for both min and max HOST Tensor of all variants.
* Testsuite changes.

* NWC - initial code for min max PLN3 - PLN3

* made changes to split min and max kernels separately

* split kernels for min and max

* made changes to print final max/min in the R,G,B channels

* fixed inaccuracies in min/max computation

* made changes to typecast intermediate output to output requested by user

added comments for the code

code cleanup and minor changes in test suite

* fixed build issues

removed image folders used for min, max and sum

reverted unwanted file changes

* minor changes in test suite

* removed support for unwanted test case in Tensor_hip.cpp

* Adds new option roi

* remove testAllScripts.sh

* Adds roi Option in HIP backend

* Implement f32 variants

* Implement f16 and i8 datatype variants

* change F32 load and store logic

* Add build flags in CMakeLists.txt to set AVX/SSE flags based on the system configuration

* minor code changes

* Initial commit - Image sum Reduction kernel

Includes u8 PLN1 -> PLN1 conversion for HOST Tensor

* Implement PKD3 and PLN3 for Image sum Tensor HOST

* Support i8, f16 and f32 datatypes

* Initial commit - Image sum Reduction HIP kernel

Includes u8 PLN1 -> PLN1 conversion for Tensor

* Implement PKD3 and PLN3 for Image sum Tensor HIP

* Add support in testsuite

Revert normalization for i8 HOST Tensor variants

* Fix HIP testsuite

Remove additional blanks for 1 channel output

* Modify print statement in HIP testsuite

* Improve readability for testsuite outputs

* optimized the store function for F32 PLN3-PKD3

* reverted back irrelevant changes

* minor change

* Fix HIP to support larger inputs

* optimized load and store functions for water U8 and F32 variants in host

removed commented code

* Cleanup

* removed golden outputs for water

* minor changes

* Cleanup

Support Reduction QA test in testsuite

* renamed few functions and removed unused functions

updated i8 pln1 load as per the optimized u8 pln1 load

* fixed bug in i8 load function

* Remove unused variables and C style casting

* changed cast to c++ style

resolved spacing issues and added comments for AVX codes for better understanding

made changes to handle cases where QA Tests are not supported

* added golden outputs for water

* updated golden outputs with latest changes

* modified the u8, i8 pkd3-pln3 function and added comments for the vectorized code

* fixed minor bug in I8 variants

* Optimize u8 datatype further

* Fix static_cast

* made changes to resolve codacy warnings

* changed cast to c++ style in hip kernel

* Initial commit - Ricap HOST Tensor

Includes testsuite changes

* Add QA tests for RICAP

Used three_images_224x224_src1 folder to create golden outputs

* Add three_images_224x224_src1 into TEST_IMAGES

* added rotate case with golden outputs

changed generic bilinear HOST codes to match with HIP codes

* Add golden output for remaining all tensor augmentations

* fix python script issues

* Optimize u8 and i8 datatype

Uses uint and int internal processing instead of float

* Fix testsuite build errors

* minor change

* Fix QA check

* Modify api naming from image_sum to tensor_sum

Includes changes for both HOST and HIP

* Support HIP Backend for RICAP

* change rcm and rmn golden outputs

* Fix HIP pkd3->pkd3 variant

* changes based on review comments

* change test_suite folder to tests

* Optimize u8 and i8 datatype of HIP

Includes modification in naming of shared memory

* minor fix

* changed generic nn F32 loads using gather and setr instructions

* Optimize and cleanup U8 HIP

* regenerated golden outputs for RICAP

minor changes in HOST shell script for handling RICAP in QA mode

* minor bug fix in RICAP HIP kernels

* Fix i8 datatype variants

Includes cleanup

* Fix the issues with color_to_greyscale

* remove the empty folder creation

* reverting back the folder name change

* minor change

* added comments for latest changes

* minor change

* Improve readability and Cleanup

* Fix QA for HIP

Includes cleanup

* resolved review comments

* minor change

* Modify api naming from image_ to tensor_ for HOST

* Add support for QA tests

* removed range check for RMN U8-F32 and U8-F16 variants

changed from hipMemset to hipMemsetAsync for RMN HIP Kernel

removed multiplication by 255 for stdDev in RMN HOST U8-F16 and U8-F32 variants

* Modify naming of shared memory with _smem in HIP

Includes cleanup

* Typecast and reuse markArr for HIP U8 and I8

* Cleanup and minor optimization

* minor fix

* fix codacy warnings

* Additional cleanup

* Cleanup and move #define

* Changed the complexity of if statements in runTests.py

* Cleanup testsuite

Includes new golden outputs

* Additional testsuite fixes

* Minor cleanup

* Codacy fixes

* Fix codacy warnings

* Codacy fix

* Address other codacy warnings

* cleanup

* Change Image functions to generic

* Update ricap.hpp with reference paper

* resolved minor issues happened with merge

* minor changes

* fixed minor issue with getting profiler times

* minor formatting changes

* resolved build issues in test suite

renamed the min and max kernel file names

* RPP RICAP Tensor for HOST and HIP (r-abishek#213)

* Initial commit - Ricap HOST Tensor

Includes testsuite changes

* Add QA tests for RICAP

Used three_images_224x224_src1 folder to create golden outputs

* Add three_images_224x224_src1 into TEST_IMAGES

* Support HIP Backend for RICAP

* Fix HIP pkd3->pkd3 variant

* regenerated golden outputs for RICAP

minor changes in HOST shell script for handling RICAP in QA mode

* minor bug fix in RICAP HIP kernels

* Improve readability and Cleanup

* Additional cleanup

* Cleanup testsuite

Includes new golden outputs

* Additional testsuite fixes

* Minor cleanup

* Fix codacy warnings

* Address other codacy warnings

* Update ricap.hpp with reference paper

* Add RICAP dataset path in readme

* Make changes to error codes returned

* Modify roi crop region for unit and perf tests

* RPP Tensor Water Augmentation on HOST and HIP (r-abishek#181)

* added water HOST and HIP codes

* added water case in test suite

* added golden outputs for water

* added omp thread changes for water augmentation

* experimental changes

* fixed output issue with AVX2 instructions

* added AVX2 support for PKD3 load function

minor changes in PLN variant load functions

* nwc commit - added avx2 changes for u8 layout toggle variants but need to add store functions for completion

* Add Avx2 implementation for F32 and U8 toggle variants

* Add AVX2 support for u8 pkd3-pln3 and i8 pkd3-pln3 for water augmentation

* change F32 load and store logic

* optimized the store function for F32 PLN3-PKD3

* reverted back irrelevant changes

* minor change

* optimized load and store functions for water U8 and F32 variants in host

removed commented code

* removed golden outputs for water

* minor changes

* renamed few functions and removed unused functions

updated i8 pln1 load as per the optimized u8 pln1 load

* fixed bug in i8 load function

* changed cast to c++ style

resolved spacing issues and added comments for AVX codes for better understanding

made changes to handle cases where QA Tests are not supported

* added golden outputs for water

* updated golden outputs with latest changes

* modified the u8, i8 pkd3-pln3 function and added comments for the vectorized code

* fixed minor bug in I8 variants

* made changes to resolve codacy warnings

* changed cast to c++ style in hip kernel

* changed generic nn F32 loads using gather and setr instructions

* added comments for latest changes

* minor change

* added definition for storing 32 and 64 bits from a 128bit register

---------

Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>

* Fix build error

* CMakeLists - Version Update

1.5.0 - TOT Version

* CHANGELOG Updates

Version 1.5.0 placeholder

* Boost deps fix for test suite

---------

Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com>
Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: Snehaa-Giridharan <118163708+snehaa8@users.noreply.github.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

* Documentation - Readme & changelog updates (r-abishek#251)

* readme and changelog updates for 6.0

* minor update

* Documentation - Bump rocm-docs-core[api_reference] from 0.26.0 to 0.27.0 in /docs/sphinx (r-abishek#253)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.26.0 to 0.27.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.26.0...v0.27.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* RPP Resize Mirror Normalize Bugfix (r-abishek#252)

* added fix for hipMemset

* remove pixel check for U8-F32 and U8-F16 for HOST codes

---------

Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>

* Cmake fix to prevent warning

* Fix paths in new python scripts

* Sphinx - updates (r-abishek#257)

* Sphinx - updates

* Doxygen - Updates

* Docs - Remove index.md

* Test suite fixes after tensor_min / tensor_max HOST merge

* Fix max case

* QA tests fix for hip and host

* naming convention changes as per new std

* Substitute imagePartial with partial

* Substitute imageMin/imageMax with min/max

* Replace hipMemset with hipMemsetAsync, and replace hipDeviceSynchronize with hipStreamSynchronize

* Use variable instead of batchCount*4

* Use post increment effectively

* Resolve codacy warnings

* Additional cleanup

* remove unused variable

* Documentation - Bump rocm-docs-core[api_reference] from 0.28.0 to 0.29.0 in /docs/sphinx (r-abishek#265)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.28.0 to 0.29.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.28.0...v0.29.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove auto merge boost

* Spaces formatting

* Bump rocm-docs-core[api_reference] from 0.29.0 to 0.30.1 in /docs/sphinx (r-abishek#268)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.29.0 to 0.30.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.29.0...v0.30.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add support for mi300 (r-abishek#269)

* Documentation - Bump rocm-docs-core[api_reference] from 0.30.1 to 0.30.2 in /docs/sphinx (r-abishek#273)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.1 to 0.30.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.30.1...v0.30.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Cleanup by removing oneliner functions as inline

* RPP Tensor Audio Support - To Decibels (r-abishek#258)

* Initial commit - Non silent region detection

Includes unittest setup

* Initial commit - To Decibels

Includes unittest setup

* Replace vectors with arrays

* Cleanup

* Replace Rpp64s with Rpp32s

* Optimize and precompute cutOff

* Fix buffer used

* Fix buffer used

* Additional Cleanup

* Update testsuite for Audio

* code cleanup

* Add Readme file for Audio test suite

* changes based on review comments

* minor change

* Remove unittest folders and updated README.md

* Remove unit tests

* minor change

* code cleanup

* added common header file for audio helper functions

* removed unnecessary audio wav files

fixed bug in ROI updation for audio test suite

resolved issue in summary generation for performance tests in python

* removed log file

* added doxygen support for audio

* added doxygen changes for to_decibels

* updated test suite support for to_decibels

* minor change

* removed the usage of getMax function and used std::max_element

* modularized code in test suite

* merge with latest changes

* minor change

* minor change

* resolved codacy warnings

* Codacy fix - Remove unused cpuTime

* CMakeLists - Version Update

1.5.0 - TOT Version

* CHANGELOG Updates

Version 1.5.0 placeholder

* resolved issue with file_system dependency in test suite

* Doxygen changes

changed malloc to new in NSR kernel

* RPP RICAP Tensor for HOST and HIP (r-abishek#213)

* Initial commit - Ricap HOST Tensor

Includes testsuite changes

* Add QA tests for RICAP

Used three_images_224x224_src1 folder to create golden outputs

* Add three_images_224x224_src1 into TEST_IMAGES

* Support HIP Backend for RICAP

* Fix HIP pkd3->pkd3 variant

* regenerated golden outputs for RICAP

minor changes in HOST shell script for handling RICAP in QA mode

* minor bug fix in RICAP HIP kernels

* Improve readability and Cleanup

* Additional cleanup

* Cleanup testsuite

Includes new golden outputs

* Additional testsuite fixes

* Minor cleanup

* Fix codacy warnings

* Address other codacy warnings

* Update ricap.hpp with reference paper

* Add RICAP dataset path in readme

* Make changes to error codes returned

* Modify roi crop region for unit and perf tests

* RPP Tensor Water Augmentation on HOST and HIP (r-abishek#181)

* added water HOST and HIP codes

* added water case in test suite

* added golden outputs for water

* added omp thread changes for water augmentation

* experimental changes

* fixed output issue with AVX2 instructions

* added AVX2 support for PKD3 load function

minor changes in PLN variant load functions

* nwc commit - added avx2 changes for u8 layout toggle variants but need to add store functions for completion

* Add Avx2 implementation for F32 and U8 toggle variants

* Add AVX2 support for u8 pkd3-pln3 and i8 pkd3-pln3 for water augmentation

* change F32 load and store logic

* optimized the store function for F32 PLN3-PKD3

* reverted back irrelevant changes

* minor change

* optimized load and store functions for water U8 and F32 variants in host

removed commented code

* removed golden outputs for water

* minor changes

* renamed few functions and removed unused functions

updated i8 pln1 load as per the optimized u8 pln1 load

* fixed bug in i8 load function

* changed cast to c++ style

resolved spacing issues and added comments for AVX codes for better understanding

made changes to handle cases where QA Tests are not supported

* added golden outputs for water

* updated golden outputs with latest changes

* modified the u8, i8 pkd3-pln3 function and added comments for the vectorized code

* fixed minor bug in I8 variants

* made changes to resolve codacy warnings

* changed cast to c++ style in hip kernel

* changed generic nn F32 loads using gather and setr instructions

* added comments for latest changes

* minor change

* added definition for storing 32 and 64 bits from a 128bit register

---------

Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>

* Fix build error

* CMakeLists - Version Update

1.5.0 - TOT Version

* CHANGELOG Updates

Version 1.5.0 placeholder

* Boost deps fix for test suite

---------

Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com>
Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: Snehaa-Giridharan <118163708+snehaa8@users.noreply.github.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

* Documentation - Readme & changelog updates (r-abishek#251)

* readme and changelog updates for 6.0

* minor update

* added ctests for audio test suite for CI

made changes to add more clarity on the QA Tests results

* Cmake mods for ctest

* HOST-only build error bugfix

* added qa mode parameter to python audio script

added golden output map for QA testing of Non silent region detection

* minor change

* Documentation - Bump rocm-docs-core[api_reference] from 0.26.0 to 0.27.0 in /docs/sphinx (r-abishek#253)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.26.0 to 0.27.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.26.0...v0.27.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* RPP Resize Mirror Normalize Bugfix (r-abishek#252)

* added fix for hipMemset

* remove pixel check for U8-F32 and U8-F16 for HOST codes

---------

Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>

* added example for MMS calculation in comments for better understanding

* Sphinx - updates (r-abishek#257)

* Sphinx - updates

* Doxygen - Updates

* Docs - Remove index.md

* updated info used for running audio test suite

* removed bitdepth variable from audio test suite

* added more information on computing NSR outputs in the example added

* Fix doxygen for decibels

Also removes extra QA reference files

* Fix build errors and qa tests in Audio Test suite

* Remove auto-merge repeated funcs

* Improve clarity on header docs

* made changes based on review comments

* stored golden outputs of to_decibels in binary file

removed golden output text files for non silent region

* removed unused parameter in verify_output function

* updated list of cases supported in python script

* added error handling for opening golden output file

* Codacy fix and tests warning fix

* Codacy fix

* Codacy fix trial

* codacy fix for checking boundaries of fstream

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Snehaa-Giridharan <118163708+snehaa8@users.noreply.github.com>
Co-authored-by: Lisa <lisajdelaney@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Documentation - Bump rocm-docs-core[api_reference] from 0.30.2 to 0.30.3 in /docs/sphinx (r-abishek#274)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.2 to 0.30.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.30.2...v0.30.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Adding issue template (r-abishek#270)

* Add files via upload

* added ROCm v6, MI300, default component

* Fix cast used in testsuite

Includes minor fixes

* Fix displaying f16 outputs

* Optimize HOST min/max reduce function further

* Fix spacing in HIP kernels

* Fix PLN1 outputs for u8 and i8 datatypes of HOST backend

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (ROCm#293)

* change golden outputs from .csv files to .bin files

* Changed comparison functions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>

* Bump rocm-docs-core[api_reference] from 0.31.0 to 0.33.0 in /docs/sphinx (ROCm#294)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.31.0 to 0.33.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.31.0...v0.33.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Store reference outputs via map for min and max kernels

* Update tensor_max.hpp license

* Update tensor_min.hpp license

* Fix output comparison check

* Merge branch 'ar/opt_tensor_min_tensor_max' of https://github.com/r-abishek/rpp into sn/tensor_min_max

* Modify exit condition used in outer most kernel

* Modify srcIdx for HIP Tensor min

* Using maximum as 255 for HIP Tensor min

* Modify srcIdx for HIP Tensor max kernel

Also fixes build error in testsuite

* Fix corrupted outputs displayed for Tensor sum

* Fix corruption issue seen with tensor sum kernel

* Fix minimum for I8 Tensor max kernel

* Modified HIP buffer initialization with a common function

* Fix redefinition

* Remove additional variables xAlignedLength

* Remove unwanted xAlignedLength and xDiff

* Remove redefinition of TensorSumReferenceOutputs

* Fix for CI issue

* Add parenthesis

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com>
Co-authored-by: Snehaa-Giridharan <118163708+snehaa8@users.noreply.github.com>
Co-authored-by: fiona-gladwin <fionagladwin@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Lisa <lisajdelaney@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>
Co-authored-by: abhimeda <138710508+abhimeda@users.noreply.github.com>
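The notes above about "Using maximum as 255 for HIP Tensor min" and "Fix minimum for I8 Tensor max kernel" concern the value each reduction is seeded with before any pixel is read. The following is a minimal, hypothetical C sketch of that idea for the U8, I8 and F32 depths; the helper names are illustrative and are not RPP functions, and the actual HIP kernels are not reproduced here.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of RPP: the starting ("identity") value of each
 * reduction. A min starts at the largest value the type can hold and a max at the
 * smallest, so the first pixel read always replaces the seed. */
static inline uint8_t min_seed_u8(void)  { return UINT8_MAX; }  /* 255  */
static inline uint8_t max_seed_u8(void)  { return 0; }
static inline int8_t  min_seed_i8(void)  { return INT8_MAX; }   /* 127  */
static inline int8_t  max_seed_i8(void)  { return INT8_MIN; }   /* -128 */
static inline float   min_seed_f32(void) { return FLT_MAX; }
static inline float   max_seed_f32(void) { return -FLT_MAX; }

/* Scalar max over an i8 buffer. Seeding with 0 instead of INT8_MIN would make an
 * all-negative buffer report 0, a value that never occurs in the data. */
int8_t max_i8(const int8_t *src, size_t len)
{
    assert(src != NULL || len == 0);
    int8_t result = max_seed_i8();
    for (size_t i = 0; i < len; i++)
        if (src[i] > result)
            result = src[i];
    return result;
}

/* Scalar min over a u8 buffer, seeded with 255. */
uint8_t min_u8(const uint8_t *src, size_t len)
{
    assert(src != NULL || len == 0);
    uint8_t result = min_seed_u8();
    for (size_t i = 0; i < len; i++)
        if (src[i] < result)
            result = src[i];
    return result;
}
```

Note that an empty input returns the seed itself, so a real kernel would need to treat an empty ROI separately if that case can occur.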
11 people committed Mar 5, 2024
1 parent bc3b38d commit 5424830
Showing 17 changed files with 3,390 additions and 109 deletions.
72 changes: 72 additions & 0 deletions include/rppt_tensor_statistical_operations.h
@@ -78,6 +78,78 @@ RppStatus rppt_tensor_sum_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t
RppStatus rppt_tensor_sum_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t tensorSumArr, Rpp32u tensorSumArrLength, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/*! \brief Tensor min operation on HOST backend for an NCHW/NHWC layout tensor
* \details The tensor min is a reduction operation that finds the channel-wise (R min / G min / B min) and overall min for each image in a batch of RGB (3 channel) / greyscale (1 channel) images with an NHWC/NCHW tensor layout.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - minArr depth ranges - Will be the same depth as srcPtr.
* \param [in] srcPtr source tensor in HOST memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] minArr destination array in HOST memory
* \param [in] minArrLength length of provided destination array (Restrictions - if srcDescPtr->c == 1 then minArrLength >= srcDescPtr->n, and if srcDescPtr->c == 3 then minArrLength >= srcDescPtr->n * 4)
* \param [in] roiTensorPtrSrc ROI data in HOST memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y)) | (Restrictions - roiTensorPtrSrc[i].xywhROI.roiWidth <= 3840 and roiTensorPtrSrc[i].xywhROI.roiHeight <= 2160)
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HOST handle created with <tt>\ref rppCreateWithBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
*/
RppStatus rppt_tensor_min_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t minArr, Rpp32u minArrLength, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);

#ifdef GPU_SUPPORT
/*! \brief Tensor min operation on HIP backend for an NCHW/NHWC layout tensor
* \details The tensor min is a reduction operation that finds the channel-wise (R min / G min / B min) and overall min for each image in a batch of RGB (3 channel) / greyscale (1 channel) images with an NHWC/NCHW tensor layout.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - minArr depth ranges - Will be the same depth as srcPtr.
* \param [in] srcPtr source tensor in HIP memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] minArr destination array in HIP memory
* \param [in] minArrLength length of provided destination array (Restrictions - if srcDescPtr->c == 1 then minArrLength >= srcDescPtr->n, and if srcDescPtr->c == 3 then minArrLength >= srcDescPtr->n * 4)
* \param [in] roiTensorPtrSrc ROI data in HIP memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y)) | (Restrictions - roiTensorPtrSrc[i].xywhROI.roiWidth <= 3840 and roiTensorPtrSrc[i].xywhROI.roiHeight <= 2160)
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HIP handle created with <tt>\ref rppCreateWithStreamAndBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
*/
RppStatus rppt_tensor_min_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t minArr, Rpp32u minArrLength, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/*! \brief Tensor max operation on HOST backend for an NCHW/NHWC layout tensor
* \details The tensor max is a reduction operation that finds the channel-wise (R max / G max / B max) and overall max for each image in a batch of RGB (3 channel) / greyscale (1 channel) images with an NHWC/NCHW tensor layout.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - maxArr depth ranges - Will be the same depth as srcPtr.
* \param [in] srcPtr source tensor in HOST memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] maxArr destination array in HOST memory
* \param [in] maxArrLength length of provided destination array (Restrictions - if srcDescPtr->c == 1 then maxArrLength >= srcDescPtr->n, and if srcDescPtr->c == 3 then maxArrLength >= srcDescPtr->n * 4)
* \param [in] roiTensorPtrSrc ROI data in HOST memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y)) | (Restrictions - roiTensorPtrSrc[i].xywhROI.roiWidth <= 3840 and roiTensorPtrSrc[i].xywhROI.roiHeight <= 2160)
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HOST handle created with <tt>\ref rppCreateWithBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
*/
RppStatus rppt_tensor_max_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t maxArr, Rpp32u maxArrLength, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);

#ifdef GPU_SUPPORT
/*! \brief Tensor max operation on HIP backend for an NCHW/NHWC layout tensor
* \details The tensor max is a reduction operation that finds the channel-wise (R max / G max / B max) and overall max for each image in a batch of RGB (3 channel) / greyscale (1 channel) images with an NHWC/NCHW tensor layout.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - maxArr depth ranges - Will be the same depth as srcPtr.
* \param [in] srcPtr source tensor in HIP memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] maxArr destination array in HIP memory
* \param [in] maxArrLength length of provided destination array (Restrictions - if srcDescPtr->c == 1 then maxArrLength >= srcDescPtr->n, and if srcDescPtr->c == 3 then maxArrLength >= srcDescPtr->n * 4)
* \param [in] roiTensorPtrSrc ROI data in HIP memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y)) | (Restrictions - roiTensorPtrSrc[i].xywhROI.roiWidth <= 3840 and roiTensorPtrSrc[i].xywhROI.roiHeight <= 2160)
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HIP handle created with <tt>\ref rppCreateWithStreamAndBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
*/
RppStatus rppt_tensor_max_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t maxArr, Rpp32u maxArrLength, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/*! @}
*/
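The min/max APIs above write a small array per batch rather than a tensor: per the length restriction, one value per greyscale image and four per RGB image. A scalar reference can help when validating their outputs. The following is a hedged C sketch, not RPP's implementation, under these assumptions: U8 data, a dense batch with no padding between rows or images, channel 0 is R, XYWH ROIs that lie inside each image, and the four values per RGB image stored as R, G, B, overall (the header gives the count but not the order). `RoiXYWH`, `elem_index`, `min_max_arr_length` and `tensor_min_u8_ref` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical XYWH ROI; RPP's own RpptROI type is not reproduced here. */
typedef struct { size_t x, y, w, h; } RoiXYWH;

/* Minimum length of minArr / maxArr implied by the documented restriction. */
size_t min_max_arr_length(size_t n, size_t c)
{
    assert(c == 1 || c == 3);
    return (c == 1) ? n : n * 4;
}

/* Offset of element (image b, row y, column x, channel ch) in a dense,
 * unpadded batch of images of size h x w x c. */
static size_t elem_index(int isNHWC, size_t h, size_t w, size_t c,
                         size_t b, size_t y, size_t x, size_t ch)
{
    size_t imageStride = h * w * c;
    if (isNHWC)
        return b * imageStride + (y * w + x) * c + ch;  /* RGBRGB... */
    return b * imageStride + ch * h * w + y * w + x;    /* RRR..GGG..BBB.. */
}

/* Scalar reference for a U8 tensor min: per image, the min of each channel
 * inside the ROI, plus the overall min for 3-channel images. */
void tensor_min_u8_ref(const uint8_t *src, int isNHWC,
                       size_t n, size_t h, size_t w, size_t c,
                       const RoiXYWH *roi, uint8_t *minArr)
{
    assert(c == 1 || c == 3);
    for (size_t b = 0; b < n; b++)
    {
        uint8_t chMin[3] = {UINT8_MAX, UINT8_MAX, UINT8_MAX};  /* seed with 255 */
        for (size_t y = roi[b].y; y < roi[b].y + roi[b].h; y++)
            for (size_t x = roi[b].x; x < roi[b].x + roi[b].w; x++)
                for (size_t ch = 0; ch < c; ch++)
                {
                    uint8_t v = src[elem_index(isNHWC, h, w, c, b, y, x, ch)];
                    if (v < chMin[ch])
                        chMin[ch] = v;
                }
        if (c == 1)
        {
            minArr[b] = chMin[0];  /* one value per greyscale image */
            continue;
        }
        uint8_t overall = chMin[0];
        if (chMin[1] < overall) overall = chMin[1];
        if (chMin[2] < overall) overall = chMin[2];
        minArr[b * 4 + 0] = chMin[0];  /* R min (assumed order) */
        minArr[b * 4 + 1] = chMin[1];  /* G min */
        minArr[b * 4 + 2] = chMin[2];  /* B min */
        minArr[b * 4 + 3] = overall;   /* overall min */
    }
}
```

A max reference follows the same shape with each channel seeded at 0 and the comparisons flipped. This is only meant as a checker; RPP's optimized HOST and HIP code paths are not reproduced here.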
