Fix latency of INT8 is slow issue (#389)
* create

* rm wrong file

* push missed files

* add ci prepare cmd

* add sudo in env

* fix the env by clone to private env

* fix env setting

* mv the ilit to new folder, clear the output of ipy

* rm temp files

* Lqnguyen branch3 (#210)

* Add bitonic-sort sample.

* Add a note about common file in README.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Move 1d_HeatTransfer sample to open source GitHub.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Updating License file to remove date

* Adding Buffer Object approach.

* Add comment about the location of dpc_common.hpp.

* New sample: Prefix Sum.

* Remove new sample.

* New code sample PrefixSum in ParallelPatterns.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Integrate MPI code sample with dpc_reduce code sample.

* Update README.md

* Update main.cpp

* Integrate MPI with latest dpc_reduce for beta09.

* Update README.md

* Update main.cpp

* Update main.cpp

* Update README.md

* Update CXX to icpx and compiler option for beta09.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Add "export I_MPI_CXX=dpcpp" in sample.json file.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Update json file.

* Sync with master.

* Update bitonic-sort code sample according to the latest guideline.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>

* Lqnguyen branch1 (#201)

* Add bitonic-sort sample.

* Add a note about common file in README.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Move 1d_HeatTransfer sample to open source GitHub.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Updating License file to remove date

* Adding Buffer Object approach.

* Add comment about the location of dpc_common.hpp.

* New sample: Prefix Sum.

* Remove new sample.

* New code sample PrefixSum in ParallelPatterns.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Integrate MPI code sample with dpc_reduce code sample.

* Update README.md

* Update main.cpp

* Integrate MPI with latest dpc_reduce for beta09.

* Update README.md

* Update main.cpp

* Update main.cpp

* Update README.md

* Update CXX to icpx and compiler option for beta09.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Add "export I_MPI_CXX=dpcpp" in sample.json file.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Update json file.

* Sync with master.

* Update the PrefixSum code sample according to the latest guidelines.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Update based on comments from reviewer.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Restructure the Usage function.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>

* Lqnguyen branch2 (#209)

* Add bitonic-sort sample.

* Add a note about common file in README.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Move 1d_HeatTransfer sample to open source GitHub.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Updating License file to remove date

* Adding Buffer Object approach.

* Add comment about the location of dpc_common.hpp.

* New sample: Prefix Sum.

* Remove new sample.

* New code sample PrefixSum in ParallelPatterns.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Integrate MPI code sample with dpc_reduce code sample.

* Update README.md

* Update main.cpp

* Integrate MPI with latest dpc_reduce for beta09.

* Update README.md

* Update main.cpp

* Update main.cpp

* Update README.md

* Update CXX to icpx and compiler option for beta09.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Add "export I_MPI_CXX=dpcpp" in sample.json file.

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

* Update json file.

* Sync with master.

* Update 1d_HeatTransfer code sample according to the new guideline.

* Add comment about dpc_common.hpp .

Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>

Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>

* namespace change for montecarlo (#208)

* Adding mandelbrot sample to the repository

Signed-off-by: vmadanan <varsha.madananth@intel.com>

* Adding changes to mandelbrot to remove libsycl-complex.so dependency

* namespace change for Monte Carlo

* Updated samples to newest coding guidelines

* Updating samples- Mandelbrot, DCT and MonteCarlo with newest coding guidelines

* Adding changes to buffer and accessor declarations (#214)

* Initial commit for iso3dfd_dpcpp code sample

Signed-off-by: Gogar, Sunny L <sunny.l.gogar@intel.com>

* Update License.txt

* Update sample.json

* Adding iso3dfd_omp_offload and changing dpc++ compile for windows to dpcpp

* Delete .nfs000000043228fc3f00000140

* Removing build directory accidently checked in

* Update sample.json

Fixing a missing comma

* Adding couple of changes as per Paul's recommendation

* Updating some variable names as per guidelines

* Moving iso3dfd_omp_offload to C++ folder

* Fixing a windows related error about missing std:: for tranform

* Adding algorithm header explicity in iso3dfd.h

* Fixing the sample.json to eliminate recent errors

* Adding changes to buffer and accessor declarations

* Update samples for beta10 release (#207)

* Update simple add sample

Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>

* Update make files

Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>

* Update fpga make file

Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>

* Add dpc_common.hpp

* Update sample.json

* Fix Makefile.win

* Update Makefile.win

* Update sample.json

* Remove dpc_common.hpp

* Update VS project file

* Update README.md

* Update sample.json

* Add stb

* Update read me file

* Initial commit

* Update License.txt

* Change location of matrix multiplication sample

* Fix matrix mul sample VS project file

* Update samples for beta10 release

* Fix for Windows

* Fix for FPGA

* Fix for FPGA

* Fix for FPGA to support both beta09 and beta10

* Add header comment

Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>

* folder structures changes following saumya's request (#217)

* Beta10 GZIP performance update (#204)

* Beta10 GZIP update -- use USM for data transfer

Signed-off-by: Audrey Kertesz <audrey.kertesz@intel.com>

* Trivial change to re-trigger CI

Signed-off-by: Audrey Kertesz <audrey.kertesz@intel.com>

* Update top level README (#222)

* Update top-level README and improve format

Signed-off-by: Audrey Kertesz <audrey.kertesz@intel.com>

* Minor formatting update

Signed-off-by: Audrey Kertesz <audrey.kertesz@intel.com>

* Fix path to oneDPL for Beta10  (#224)

* initial commit of openMP example.

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* Initial commit of the dpc_reduce

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* added guid to sample.json

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* fixed sample.json files.

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* fixed the include files.  Somehow I copied a slightly old repo and it still had <chrono> and the omp_common.hpp file.  They have been removed.

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* added license.txt file ran through formating tool one more time removed all calls to "std::endl" and replaced with " \n"

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* renamed license.txt to License.txt

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* added "ciTests" to the sample.json file.  It passed the check.

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* fixed make error

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* fixed sample.json

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* removed "2020" from the License.txt file due to update guidelines.

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* added comment regarding where you can find dpc_common in both files per Paul's comments.

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* Modified names of the functions to represent what they do (ie. calc_pi_*) per suggestion from Paul.

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* initial check-in to the C++ repo

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* put correct comment on dpc_common.hpp

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* added commenting indicating where they can find corresponding include files.

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* added comment line

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* removed openMP repo from DPC++ as it will be moved to C++ directory

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* fixed category line in sample.json to match exact text expected.

* removing openMP from the DPC directory.  It has been moved to C++ directory.

* fixed tf_init call

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* removed all calls into PSTL internal logic.  This is what was causing fails between beta08 and beta09.

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* fixed env variable to run on CPU

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* update Readme file to include information about setting
env variable to allocate more memory for any runs
on the cpu

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* added option in Cmake file to support unnamed lambda option.   You need this to compile if the environment doesn't have this set by default.

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* path to output file from compile has changed.  it no longer seems to create the src directory.

* started to remove get_access and change it to accessor name()

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* fixed remaining get_access

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* removed commented out old code

Signed-off-by: todd.erdner <todd.erdner@intel.com>

* Fixed path in Cmakelists.txt to suport both beta10 and beta09.  The location of the oneDPL
library changed between the two releases.

* Update CMakeLists.txt

Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>

* Added new Sample (TensorFlow Multinode Training with Horovod) (#197)

* Added new Sample (TensorFlow Multinode Training with Horovod)

Signed-off-by: Shailen Sobhee <shailen.sobhee@intel.com>

* Fixed assert reported by bandit code checker tool.

Signed-off-by: Shailen Sobhee <shailen.sobhee@gmail.com>

* Fix CI issue (MPI bug) - Upload to new folder structure

Signed-off-by: Shailen Sobhee <shailen.sobhee@gmail.com>

* Minor little fix in sample.json; A comma was missing.

Signed-off-by: Shailen Sobhee <shailen.sobhee@gmail.com>

* Removed old references to old folder structure

Signed-off-by: Shailen Sobhee <shailen.sobhee@gmail.com>

* Update third_party_programs.txt (#221)

* Updating License  file to no date in the title /*
 * Copyright (c) 2020 Intel Corporation
 *
 * This program and the accompanying materials are made available under the
 * terms of the The MIT License which is available at
 * https://opensource.org/licenses/MIT.
 *
 * SPDX-License-Identifier: MIT
 */

* Update README.md

* Fix FPGA entries

* Update README.md

Updates per request of sranikonda

* Update README.md

* removing duplicate samples after transfering to dwarves folders

* Update Makefile.win

changing compiler name from "dpcpp-cl" to "dpcpp"

* Update Makefile.win

* Update Makefile.win.fpga

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update README.md

* Update README.md

* Update from Legal Approval of 10/05/2020

Co-authored-by: akertesz <67655634+akertesz@users.noreply.github.com>

* Update Buffers/Accessors according to latest coding guidelines (Matrix_multiply Advisor and VTune). (#215)

* TBB Samples Migration

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* Addressing PR Change Requests

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* Fill in "Purpose" Section of both README files.

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* Remove binary and build files

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* include dpc_common header, remove exception handler, fix json files. (all changes apply to both samples)

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* include dpc_common headers, remove exception handlers (both samples)

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* Fix README files, include header files for windows

* Remove namespace, end files, use "std::iota", fix README

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* fix README

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* Fix "matrix_multiply" samples failures on Windows.

* buffer/accessor updates for coding guidelines (matrix mul).

Co-authored-by: root <root@dtc-nuc-03l.jf.intel.com>

* oneMKL sample updates for beta10 (#213)

* Jupyter notebooks update as per the latest guidelines (#223)

* updated the simplied version of the accessors, used auto for parallel_for
index

* using vector.size() instead of the global variables as per the comments

* fixed the typo. Also check the output vector size

* Updated Readme to add the include files path for dpc_common.hpp
Updated the cpp file with the comments on dev_utilities folder

* Updated the Jupyter notebooks as per the beta10 guidelines <praveen.k.kundurthy@intel.com>

* removed sample.json as these are jupyter notebooks <praveen.k.kundurthy@intel.com>

* removed some checkpoint files that are not necessary <praveen.k.kundurthy@intel.com>

* removed unwanted files <praveen.k.kundurthy@intel.com>

* removed unwanted checkpoint files <praveen.k.kundurthy@intel.com>

* Samples: block APSP and merge SPMV (#219)

* Update simple add sample

Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>

* Update make files

Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>

* Update fpga make file

Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>

* Add dpc_common.hpp

* Update sample.json

* Fix Makefile.win

* Update Makefile.win

* Update sample.json

* Remove dpc_common.hpp

* Update VS project file

* Update README.md

* Update sample.json

* Add stb

* Update read me file

* Initial commit

* Update License.txt

* Change location of matrix multiplication sample

* Fix matrix mul sample VS project file

* Update samples for beta10 release

* Fix for Windows

* Fix for FPGA

* Fix for FPGA

* Fix for FPGA to support both beta09 and beta10

* Add header comment

* Samples: block apsp and merge spmv

* Add readme files

* Update readme file

* Update sample.json

Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>

* move TF GS sample to new folder structure according to Saumya's direction (#227)

* Update sample.json (#228)

* Update simple add sample

Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>

* Update make files

Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>

* Update fpga make file

Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>

* Add dpc_common.hpp

* Update sample.json

* Fix Makefile.win

* Update Makefile.win

* Update sample.json

* Remove dpc_common.hpp

* Update VS project file

* Update README.md

* Update sample.json

* Add stb

* Update read me file

* Initial commit

* Update License.txt

* Change location of matrix multiplication sample

* Fix matrix mul sample VS project file

* Update samples for beta10 release

* Fix for Windows

* Fix for FPGA

* Fix for FPGA

* Fix for FPGA to support both beta09 and beta10

* Add header comment

* Samples: block apsp and merge spmv

* Add readme files

* Update readme file

* Update sample.json

* Update sample.json

Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>

* Edit for readme and some json files (#229)

* Updating License  file to no date in the title /*
 * Copyright (c) 2020 Intel Corporation
 *
 * This program and the accompanying materials are made available under the
 * terms of the The MIT License which is available at
 * https://opensource.org/licenses/MIT.
 *
 * SPDX-License-Identifier: MIT
 */

* Update README.md

* Fix FPGA entries

* Update README.md

Updates per request of sranikonda

* Update README.md

* removing duplicate samples after transfering to dwarves folders

* Update Makefile.win

changing compiler name from "dpcpp-cl" to "dpcpp"

* Update Makefile.win

* Update Makefile.win.fpga

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update README.md

* Update README.md

* Update from Legal Approval of 10/05/2020

* Create README.md

* Add files via upload

* Update README.md

minor modifications to content, purpose and key implementation details.

* Update sample.json

aligned description with readme

* Update README.md

reshuffled parts of the purpose and implementation details and abstracted a few key concepts into better summaries.

* Update sample.json

synched description with readme.

* Update README.md

Co-authored-by: akertesz <67655634+akertesz@users.noreply.github.com>
Co-authored-by: tomlenth <tom.f.lenth@intel.com>

* Changed folder structure (#220)

* Moved model zoo sample to new directory (#216)

* moved model zoo sample to new directory

* added runipy dependency installation

* added error handling

* minor fix

* Updating buffers/accessors for TBB Samples according to coding guidelines. Update CMake files to use defaults. (#230)

* TBB Samples Migration

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* Addressing PR Change Requests

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* Fill in "Purpose" Section of both README files.

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* Remove binary and build files

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* include dpc_common header, remove exception handler, fix json files. (all changes apply to both samples)

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* include dpc_common headers, remove exception handlers (both samples)

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* Fix README files, include header files for windows

* Remove namespace, end files, use "std::iota", fix README

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* fix README

Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>

* Fix "matrix_multiply" samples failures on Windows.

* buffer/accessor updates for coding guidelines (matrix mul).

* Update buffers/accessors for TBB Samples. Update CMake files to use defaults.

Co-authored-by: root <root@dtc-nuc-03l.jf.intel.com>

* Update oneVPL samples for Beta10 (#218)

* Add computed_tomography sample (#212)

* create

* rm wrong file

* push missed files

* add ci prepare cmd

* add sudo in env

* fix the env by clone to private env

* fix env setting

* mv the ilit to new folder, clear the output of ipy

* rm temp files

* change structure

* rebase the update

* rm .gitkeep

* update for new API and config for ilit 1.0 in golden release

* update the script to prepare running env

* optimize for CPU to fix the latency of int8 low issue

* rm unused code

* fix the latency issue by script

* correct the file name in text

Co-authored-by: Zhang, Jianyu <jianyu.zhang@intel.com>
Co-authored-by: lqnguyen <loc.q.nguyen@intel.com>
Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>
Co-authored-by: vmadananth <12753028+vmadananth@users.noreply.github.com>
Co-authored-by: slgogar <33332238+slgogar@users.noreply.github.com>
Co-authored-by: Moushumi <55515077+moushumi-maria@users.noreply.github.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: akertesz <67655634+akertesz@users.noreply.github.com>
Co-authored-by: terdner <todd.erdner@intel.com>
Co-authored-by: Shailen Sobhee <shailen.sobhee@gmail.com>
Co-authored-by: clevels <59889830+clevels@users.noreply.github.com>
Co-authored-by: root <root@dtc-nuc-03l.jf.intel.com>
Co-authored-by: petercad <48329794+petercad@users.noreply.github.com>
Co-authored-by: praveenkk123 <praveen.k.kundurthy@intel.com>
Co-authored-by: tomlenth <tom.f.lenth@intel.com>
Co-authored-by: Jing Xu <jing.xu@intel.com>
Co-authored-by: Jitendra Patil <jitendra.patil@intel.com>
Co-authored-by: Marc Valle <30421017+mav-intel@users.noreply.github.com>
19 people committed Jan 6, 2021
1 parent 6c87110 commit 4362b3e
Showing 2 changed files with 160 additions and 88 deletions.
@@ -51,7 +51,7 @@
"source": [
"Import python packages and check version.\n",
"\n",
"Make sure the Tensorflow is **2.2** and iLiT, matplotlib are installed."
"Make sure TensorFlow **2.x**, iLiT, and matplotlib are installed."
]
},
{
@@ -297,7 +297,6 @@
"def auto_tune(input_graph_path, yaml_config, batch_size): \n",
" fp32_graph = alexnet.load_pb(input_graph_path)\n",
" quan = ilit.Quantization(yaml_config)\n",
" assert(tuner)\n",
" dataloader = Dataloader(batch_size)\n",
" assert(dataloader)\n",
" q_model = quan(\n",
@@ -368,7 +367,11 @@
"source": [
"## Compare Quantized Model\n",
"\n",
"Define a function to return validation dataset and calculate the accuracy."
"We provide a script, **profiling_lpot.py**, to test the performance of the PB model.\n",
"\n",
"Running the code inside the Jupyter notebook does not produce reliable performance data, so we run the script as a separate process.\n",
"\n",
"Let's look at **profiling_lpot.py**. "
]
},
{
@@ -377,30 +380,14 @@
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"\n",
"def val_data():\n",
" x_train, y_train, label_train, x_test, y_test,label_test = mnist_dataset.read_data()\n",
" return x_test, y_test, label_test\n",
"\n",
"def calc_accuracy(predictions, labels):\n",
" predictions = np.argmax(predictions, axis=1)\n",
" same = 0\n",
" for i, x in enumerate(predictions):\n",
" if x==labels[i]:\n",
" same += 1\n",
" if len(predictions)==0:\n",
" return 0\n",
" else:\n",
" return same/len(predictions)"
"!cat profiling_lpot.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define infer function to test the single frezon PB model."
"Execute **profiling_lpot.py** with the FP32 model file:"
]
},
{
@@ -409,75 +396,14 @@
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import tensorflow as tf\n",
"\n",
"def calc_accuracy(predictions, labels):\n",
" predictions = np.argmax(predictions, axis=1)\n",
" same = 0\n",
" for i, x in enumerate(predictions):\n",
" if x==labels[i]:\n",
" same += 1\n",
" if len(predictions)==0:\n",
" return 0\n",
" else:\n",
" return same/len(predictions)\n",
"\n",
"def get_concrete_function(graph_def, inputs, outputs, print_graph=False):\n",
" def imports_graph_def():\n",
" tf.compat.v1.import_graph_def(graph_def, name=\"\")\n",
"\n",
" wrap_function = tf.compat.v1.wrap_function(imports_graph_def, [])\n",
" graph = wrap_function.graph\n",
"\n",
" return wrap_function.prune(\n",
" tf.nest.map_structure(graph.as_graph_element, inputs),\n",
" tf.nest.map_structure(graph.as_graph_element, outputs))\n",
"\n",
"def infer_perf_pb(pb_model_file, inputs=[\"x:0\"], outputs=[\"Identity:0\"]):\n",
" q_model = alexnet.load_pb(pb_model_file)\n",
" concrete_function = get_concrete_function(graph_def=q_model.as_graph_def(),\n",
" inputs=inputs,\n",
" outputs=outputs,\n",
" print_graph=True)\n",
" x_test, y_test, label_test = val_data()\n",
"\n",
" bt = time.time()\n",
" _frozen_graph_predictions = concrete_function(x=tf.constant(x_test))[0]\n",
" et = time.time()\n",
"\n",
" accuracy = calc_accuracy(_frozen_graph_predictions, label_test)\n",
" print('accuracy:', accuracy)\n",
" throughput = x_test.shape[0] / (et - bt)\n",
" print('max throughput(fps):', throughput)\n",
"\n",
"\n",
" #latency when BS=1\n",
" bt = time.time()\n",
" times = 1000\n",
" for i in range(times):\n",
" _frozen_graph_predictions = concrete_function(x=tf.constant(x_test[:1]))[0]\n",
" et = time.time()\n",
"\n",
" latency = (et - bt) * 1000 / times\n",
" print('latency(ms):', latency)\n",
"\n",
" return accuracy, throughput, latency\n",
"\n",
"#warm up\n",
"_accuracy32, _throughput32, _latency32 = infer_perf_pb(fp32_frezon_pb_file)\n",
"\n",
"#test\n",
"accuracy32, throughput32, latency32 = infer_perf_pb(fp32_frezon_pb_file)\n",
"\n",
"accuracy8, throughput8, latency8 = infer_perf_pb(int8_pb_file)"
"!python profiling_lpot.py --input-graph=./fp32_frezon.pb --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=32"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Execute the functions to get the performance data."
"Execute **profiling_lpot.py** with the INT8 model file:"
]
},
{
@@ -486,6 +412,35 @@
"metadata": {},
"outputs": [],
"source": [
"!python profiling_lpot.py --input-graph=./alexnet_int8_model.pb --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=8"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!cat 32.json\n",
"!echo \" \"\n",
"!cat 8.json"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Execute the functions to load and show the performance data from 32.json and 8.json."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"def autolabel(ax, rects):\n",
" \"\"\"\n",
" Attach a text label above each bar displaying its height\n",
@@ -506,10 +461,18 @@
" ax1.tick_params(axis='y', labelcolor=color)\n",
" autolabel(ax1, rects1)\n",
"\n",
"def load_res(json_file):\n",
" with open(json_file) as f:\n",
" data = json.load(f)\n",
" return data\n",
"\n",
"res_32 = load_res('32.json')\n",
"res_8 = load_res('8.json')\n",
" \n",
"accuracys = [res_32['accuracy'], res_8['accuracy']]\n",
"throughputs = [res_32['throughput'], res_8['throughput']] \n",
"latencys = [res_32['latency'], res_8['latency']]\n",
"\n",
"accuracys = [accuracy32, accuracy8]\n",
"throughputs = [throughput32, throughput8]\n",
"latencys = [latency32, latency8]\n",
"print('throughputs', throughputs)\n",
"print('latencys', latencys)\n",
"print('accuracys', accuracys)\n",
@@ -606,7 +569,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
"version": "3.6.9"
}
},
"nbformat": 4,
@@ -0,0 +1,109 @@

import tensorflow as tf
import numpy as np
import time
import argparse
import os
import json


import mnist_dataset
import alexnet


def val_data():
    x_train, y_train, label_train, x_test, y_test, label_test = mnist_dataset.read_data()
    return x_test, y_test, label_test


def calc_accuracy(predictions, labels):
    predictions = np.argmax(predictions, axis=1)
    same = 0
    for i, x in enumerate(predictions):
        if x == labels[i]:
            same += 1
    if len(predictions) == 0:
        return 0
    else:
        return same / len(predictions)


def get_concrete_function(graph_def, inputs, outputs, print_graph=False):
    def imports_graph_def():
        tf.compat.v1.import_graph_def(graph_def, name="")

    wrap_function = tf.compat.v1.wrap_function(imports_graph_def, [])
    graph = wrap_function.graph

    return wrap_function.prune(
        tf.nest.map_structure(graph.as_graph_element, inputs),
        tf.nest.map_structure(graph.as_graph_element, outputs))


def infer_perf_pb(pb_model_file, val_data, inputs=["x:0"], outputs=["Identity:0"]):
    x_test, y_test, label_test = val_data
    q_model = alexnet.load_pb(pb_model_file)
    concrete_function = get_concrete_function(graph_def=q_model.as_graph_def(),
                                              inputs=inputs,
                                              outputs=outputs,
                                              print_graph=True)

    bt = time.time()
    _frozen_graph_predictions = concrete_function(x=tf.constant(x_test))
    et = time.time()

    accuracy = calc_accuracy(_frozen_graph_predictions[0], label_test)
    print('accuracy:', accuracy)
    throughput = x_test.shape[0] / (et - bt)
    print('max throughput(fps):', throughput)

    # latency when BS=1
    times = 1000
    single_test = x_test[:1]

    bt = 0
    warmup = 20
    for i in range(times):
        if i == warmup:
            bt = time.time()
        _frozen_graph_predictions = concrete_function(x=tf.constant(single_test))
    et = time.time()

    latency = (et - bt) * 1000 / (times - warmup)
    print('latency(ms):', latency)

    return accuracy, throughput, latency


def save_res(result):
    accuracy, throughput, latency = result
    res = {}
    res['accuracy'] = accuracy
    res['throughput'] = throughput
    res['latency'] = latency

    outfile = args.index + ".json"
    with open(outfile, 'w') as f:
        json.dump(res, f)
    print("Save result to {}".format(outfile))

parser = argparse.ArgumentParser()
parser.add_argument('--index', type=str, help='file name of output', required=True)

parser.add_argument('--input-graph', type=str, help='file name for graph', required=True)

parser.add_argument('--num-intra-threads', type=str, help='number of threads for an operator', required=False,
                    default="24")
parser.add_argument('--num-inter-threads', type=str, help='number of threads across operators', required=False,
                    default="1")
parser.add_argument('--omp-num-threads', type=str, help='number of threads to use', required=False,
                    default="24")

args = parser.parse_args()
os.environ["KMP_BLOCKTIME"] = "1"
os.environ["KMP_SETTINGS"] = "0"
os.environ["OMP_NUM_THREADS"] = args.omp_num_threads
os.environ["TF_NUM_INTEROP_THREADS"] = args.num_inter_threads
os.environ["TF_NUM_INTRAOP_THREADS"] = args.num_intra_threads

save_res(infer_perf_pb(args.input_graph, val_data()))
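
The warm-up handling in `infer_perf_pb` and the JSON hand-off in `save_res` can be sketched in isolation, without TensorFlow. The block below is a minimal illustration, not the code from this commit: the helper names `measure_latency_ms`, `save_result`, `load_result`, and `dummy_inference` are hypothetical, the workload is a stand-in for the model call, and it uses `time.perf_counter()` where the script uses `time.time()`.

```python
import json
import os
import tempfile
import time


def measure_latency_ms(fn, times=1000, warmup=20):
    """Return the average latency of fn() in ms, ignoring the first `warmup` calls."""
    if times <= warmup:
        raise ValueError("times must be greater than warmup")
    start = 0.0
    for i in range(times):
        if i == warmup:
            # Start the clock only once the warm-up calls are done.
            start = time.perf_counter()
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / (times - warmup)


def save_result(path, accuracy, throughput, latency):
    """Write the three metrics to a JSON file, using the same keys as save_res."""
    with open(path, "w") as f:
        json.dump({"accuracy": accuracy,
                   "throughput": throughput,
                   "latency": latency}, f)


def load_result(path):
    """Read a metrics file back, as the notebook's load_res does."""
    with open(path) as f:
        return json.load(f)


def dummy_inference():
    # Hypothetical workload standing in for concrete_function(...).
    sum(range(1000))


if __name__ == "__main__":
    latency = measure_latency_ms(dummy_inference, times=200, warmup=20)
    out = os.path.join(tempfile.mkdtemp(), "demo.json")
    # Accuracy and throughput are placeholder values for the demo only.
    save_result(out, accuracy=0.0, throughput=0.0, latency=latency)
    print(load_result(out))
```

Excluding the first calls keeps one-time costs such as graph tracing or cache warm-up out of the average, which is presumably why the script starts timing at iteration `warmup` rather than at iteration 0.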
