Fix slow INT8 latency issue (#389)

* create
* rm wrong file
* push missed files
* add ci prepare cmd
* add sudo in env
* fix the env by clone to private env
* fix env setting
* mv the ilit to new folder, clear the output of ipy
* rm temp files
* Lqnguyen branch3 (#210)
* Add bitonic-sort sample.
* Add a note about common file in README.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Move 1d_HeatTransfer sample to open source GitHub.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Updating License file to remove date
* Adding Buffer Object approach.
* Add comment about the location of dpc_common.hpp.
* New sample: Prefix Sum.
* Remove new sample.
* New code sample PrefixSum in ParallelPatterns.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Integrate MPI code sample with dpc_reduce code sample.
* Update README.md
* Update main.cpp
* Integrate MPI with latest dpc_reduce for beta09.
* Update README.md
* Update main.cpp
* Update main.cpp
* Update README.md
* Update CXX to icpx and compiler option for beta09.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Add "export I_MPI_CXX=dpcpp" in sample.json file.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Update json file.
* Sync with master.
* Update bitonic-sort code sample according to the latest guideline.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
  Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>
* Lqnguyen branch1 (#201)
* Add bitonic-sort sample.
* Add a note about common file in README.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Move 1d_HeatTransfer sample to open source GitHub.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Updating License file to remove date
* Adding Buffer Object approach.
* Add comment about the location of dpc_common.hpp.
* New sample: Prefix Sum.
* Remove new sample.
* New code sample PrefixSum in ParallelPatterns.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Integrate MPI code sample with dpc_reduce code sample.
* Update README.md
* Update main.cpp
* Integrate MPI with latest dpc_reduce for beta09.
* Update README.md
* Update main.cpp
* Update main.cpp
* Update README.md
* Update CXX to icpx and compiler option for beta09.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Add "export I_MPI_CXX=dpcpp" in sample.json file.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Update json file.
* Sync with master.
* Update the PrefixSum code sample according to the latest guidelines.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Update based on comments from reviewer.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Restructure the Usage function.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
  Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>
* Lqnguyen branch2 (#209)
* Add bitonic-sort sample.
* Add a note about common file in README.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Move 1d_HeatTransfer sample to open source GitHub.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Updating License file to remove date
* Adding Buffer Object approach.
* Add comment about the location of dpc_common.hpp.
* New sample: Prefix Sum.
* Remove new sample.
* New code sample PrefixSum in ParallelPatterns.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Integrate MPI code sample with dpc_reduce code sample.
* Update README.md
* Update main.cpp
* Integrate MPI with latest dpc_reduce for beta09.
* Update README.md
* Update main.cpp
* Update main.cpp
* Update README.md
* Update CXX to icpx and compiler option for beta09.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Add "export I_MPI_CXX=dpcpp" in sample.json file.
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
* Update json file.
* Sync with master.
* Update 1d_HeatTransfer code sample according to the new guideline.
* Add comment about dpc_common.hpp .
  Signed-off-by: Loc Nguyen <loc.q.nguyen@intel.com>
  Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>
* namespace change for montecarlo (#208)
* Adding mandelbrot sample to the repository
  Signed-off-by: vmadanan <varsha.madananth@intel.com>
* Adding changes to mandelbrot to remove libsycl-complex.so dependency
* namespace change for Monte Carlo
* Updated samples to newest coding guidelines
* Updating samples- Mandelbrot, DCT and MonteCarlo with newest coding guidelines
* Adding changes to buffer and accessor declarations (#214)
* Initial commit for iso3dfd_dpcpp code sample
  Signed-off-by: Gogar, Sunny L <sunny.l.gogar@intel.com>
* Update License.txt
* Update sample.json
* Adding iso3dfd_omp_offload and changing dpc++ compile for windows to dpcpp
* Delete .nfs000000043228fc3f00000140
* Removing build directory accidently checked in
* Update sample.json
  Fixing a missing comma
* Adding couple of changes as per Paul's recommendation
* Updating some variable names as per guidelines
* Moving iso3dfd_omp_offload to C++ folder
* Fixing a windows related error about missing std:: for tranform
* Adding algorithm header explicity in iso3dfd.h
* Fixing the sample.json to eliminate recent errors
* Adding changes to buffer and accessor declarations
* Update samples for beta10 release (#207)
* Update simple add sample
  Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>
* Update make files
  Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>
* Update fpga make file
  Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>
* Add dpc_common.hpp
* Update sample.json
* Fix Makefile.win
* Update Makefile.win
* Update sample.json
* Remove dpc_common.hpp
* Update VS project file
* Update README.md
* Update sample.json
* Add stb
* Update read me file
* Initial commit
* Update License.txt
* Change location of matrix multiplication sample
* Fix matrix mul sample VS project file
* Update samples for beta10 release
* Fix for Windows
* Fix for FPGA
* Fix for FPGA
* Fix for FPGA to support both beta09 and beta10
* Add header comment
  Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>
* folder structures changes following saumya's request (#217)
* Beta10 GZIP performance update (#204)
* Beta10 GZIP update -- use USM for data transfer
  Signed-off-by: Audrey Kertesz <audrey.kertesz@intel.com>
* Trivial change to re-trigger CI
  Signed-off-by: Audrey Kertesz <audrey.kertesz@intel.com>
* Update top level README (#222)
* Update top-level README and improve format
  Signed-off-by: Audrey Kertesz <audrey.kertesz@intel.com>
* Minor formatting update
  Signed-off-by: Audrey Kertesz <audrey.kertesz@intel.com>
* Fix path to oneDPL for Beta10 (#224)
* initial commit of openMP example.
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* Initial commit of the dpc_reduce
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* added guid to sample.json
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* fixed sample.json files.
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* fixed the include files. Somehow I copied a slightly old repo and it still had <chrono> and the omp_common.hpp file. They have been removed.
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* added license.txt file ran through formating tool one more time removed all calls to "std::endl" and replaced with " \n"
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* renamed license.txt to License.txt
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* added "ciTests" to the sample.json file. It passed the check.
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* fixed make error
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* fixed sample.json
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* removed "2020" from the License.txt file due to update guidelines.
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* added comment regarding where you can find dpc_common in both files per Paul's comments.
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* Modified names of the functions to represent what they do (ie. calc_pi_*) per suggestion from Paul.
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* initial check-in to the C++ repo
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* put correct comment on dpc_common.hpp
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* added commenting indicating where they can find corresponding include files.
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* added comment line
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* removed openMP repo from DPC++ as it will be moved to C++ directory
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* fixed category line in sample.json to match exact text expected.
* removing openMP from the DPC directory. It has been moved to C++ directory.
* fixed tf_init call
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* removed all calls into PSTL internal logic. This is what was causing fails between beta08 and beta09.
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* fixed env variable to run on CPU
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* update Readme file to include information about setting env variable to allocate more memory for any runs on the cpu
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* added option in Cmake file to support unnamed lambda option. You need this to compile if the environment doesn't have this set by default.
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* path to output file from compile has changed. it no longer seems to create the src directory.
* started to remove get_access and change it to accessor name()
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* fixed remaining get_access
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* removed commented out old code
  Signed-off-by: todd.erdner <todd.erdner@intel.com>
* Fixed path in Cmakelists.txt to suport both beta10 and beta09. The location of the oneDPL library changed between the two releases.
* Update CMakeLists.txt
  Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>
* Added new Sample (TensorFlow Multinode Training with Horovod) (#197)
* Added new Sample (TensorFlow Multinode Training with Horovod)
  Signed-off-by: Shailen Sobhee <shailen.sobhee@intel.com>
* Fixed assert reported by bandit code checker tool.
  Signed-off-by: Shailen Sobhee <shailen.sobhee@gmail.com>
* Fix CI issue (MPI bug) - Upload to new folder structure
  Signed-off-by: Shailen Sobhee <shailen.sobhee@gmail.com>
* Minor little fix in sample.json; A comma was missing.
  Signed-off-by: Shailen Sobhee <shailen.sobhee@gmail.com>
* Removed old references to old folder structure
  Signed-off-by: Shailen Sobhee <shailen.sobhee@gmail.com>
* Update third_party_programs.txt (#221)
* Updating License file to no date in the title
  /*
   * Copyright (c) 2020 Intel Corporation
   *
   * This program and the accompanying materials are made available under the
   * terms of the The MIT License which is available at
   * https://opensource.org/licenses/MIT.
   *
   * SPDX-License-Identifier: MIT
   */
* Update README.md
* Fix FPGA entries
* Update README.md
  Updates per request of sranikonda
* Update README.md
* removing duplicate samples after transfering to dwarves folders
* Update Makefile.win
  changing compiler name from "dpcpp-cl" to "dpcpp"
* Update Makefile.win
* Update Makefile.win.fpga
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update README.md
* Update README.md
* Update from Legal Approval of 10/05/2020
  Co-authored-by: akertesz <67655634+akertesz@users.noreply.github.com>
* Update Buffers/Accessors according to latest coding guidelines (Matrix_multiply Advisor and VTune). (#215)
* TBB Samples Migration
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* Addressing PR Change Requests
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* Fill in "Purpose" Section of both README files.
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* Remove binary and build files
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* include dpc_common header, remove exception handler, fix json files. (all changes apply to both samples)
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* include dpc_common headers, remove exception handlers (both samples)
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* Fix README files, include header files for windows
* Remove namespace, end files, use "std::iota", fix README
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* fix README
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* Fix "matrix_multiply" samples failures on Windows.
* buffer/accessor updates for coding guidelines (matrix mul).
  Co-authored-by: root <root@dtc-nuc-03l.jf.intel.com>
* oneMKL sample updates for beta10 (#213)
* Jupyter notebooks update as per the latest guidelines (#223)
* updated the simplied version of the accessors, used auto for parallel_for index
* using vector.size() instead of the global variables as per the comments
* fixed the typo. Also check the output vector size
* Updated Readme to add the include files path for dpc_common.hpp
  Updated the cpp file with the comments on dev_utilities folder
* Updated the Jupyter notebooks as per the beta10 guidelines <praveen.k.kundurthy@intel.com>
* removed sample.json as these are jupyter notebooks <praveen.k.kundurthy@intel.com>
* removed some checkpoint files that are not necessary <praveen.k.kundurthy@intel.com>
* removed unwanted files <praveen.k.kundurthy@intel.com>
* removed unwanted checkpoint files <praveen.k.kundurthy@intel.com>
* Samples: block APSP and merge SPMV (#219)
* Update simple add sample
  Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>
* Update make files
  Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>
* Update fpga make file
  Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>
* Add dpc_common.hpp
* Update sample.json
* Fix Makefile.win
* Update Makefile.win
* Update sample.json
* Remove dpc_common.hpp
* Update VS project file
* Update README.md
* Update sample.json
* Add stb
* Update read me file
* Initial commit
* Update License.txt
* Change location of matrix multiplication sample
* Fix matrix mul sample VS project file
* Update samples for beta10 release
* Fix for Windows
* Fix for FPGA
* Fix for FPGA
* Fix for FPGA to support both beta09 and beta10
* Add header comment
* Samples: block apsp and merge spmv
* Add readme files
* Update readme file
* Update sample.json
  Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>
* move TF GS sample to new folder structure according to Saumya's direction (#227)
* Update sample.json (#228)
* Update simple add sample
  Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>
* Update make files
  Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>
* Update fpga make file
  Signed-off-by: Maria, Moushumi <moushumi.maria@intel.com>
* Add dpc_common.hpp
* Update sample.json
* Fix Makefile.win
* Update Makefile.win
* Update sample.json
* Remove dpc_common.hpp
* Update VS project file
* Update README.md
* Update sample.json
* Add stb
* Update read me file
* Initial commit
* Update License.txt
* Change location of matrix multiplication sample
* Fix matrix mul sample VS project file
* Update samples for beta10 release
* Fix for Windows
* Fix for FPGA
* Fix for FPGA
* Fix for FPGA to support both beta09 and beta10
* Add header comment
* Samples: block apsp and merge spmv
* Add readme files
* Update readme file
* Update sample.json
* Update sample.json
  Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>
* Edit for readme and some json files (#229)
* Updating License file to no date in the title
  /*
   * Copyright (c) 2020 Intel Corporation
   *
   * This program and the accompanying materials are made available under the
   * terms of the The MIT License which is available at
   * https://opensource.org/licenses/MIT.
   *
   * SPDX-License-Identifier: MIT
   */
* Update README.md
* Fix FPGA entries
* Update README.md
  Updates per request of sranikonda
* Update README.md
* removing duplicate samples after transfering to dwarves folders
* Update Makefile.win
  changing compiler name from "dpcpp-cl" to "dpcpp"
* Update Makefile.win
* Update Makefile.win.fpga
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update README.md
* Update README.md
* Update from Legal Approval of 10/05/2020
* Create README.md
* Add files via upload
* Update README.md
  minor modifications to content, purpose and key implementation details.
* Update sample.json
  aligned description with readme
* Update README.md
  reshuffled parts of the purpose and implementation details and abstracted a few key concepts into better summaries.
* Update sample.json
  synched description with readme.
* Update README.md
  Co-authored-by: akertesz <67655634+akertesz@users.noreply.github.com>
  Co-authored-by: tomlenth <tom.f.lenth@intel.com>
* Changed folder structure (#220)
* Moved model zoo sample to new directory (#216)
* moved model zoo sample to new directory
* added runipy dependency installation
* added error handling
* minor fix
* Updating buffers/accessors for TBB Samples according to coding guidelines. Update CMake files to use defaults. (#230)
* TBB Samples Migration
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* Addressing PR Change Requests
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* Fill in "Purpose" Section of both README files.
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* Remove binary and build files
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* include dpc_common header, remove exception handler, fix json files. (all changes apply to both samples)
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* include dpc_common headers, remove exception handlers (both samples)
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* Fix README files, include header files for windows
* Remove namespace, end files, use "std::iota", fix README
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* fix README
  Signed-off-by: root <root@dtc-nuc-03l.jf.intel.com>
* Fix "matrix_multiply" samples failures on Windows.
* buffer/accessor updates for coding guidelines (matrix mul).
* Update buffers/accessors for TBB Samples. Update CMake files to use defaults.
  Co-authored-by: root <root@dtc-nuc-03l.jf.intel.com>
* Update oneVPL samples for Beta10 (#218)
* Add computed_tomography sample (#212)
* create
* rm wrong file
* push missed files
* add ci prepare cmd
* add sudo in env
* fix the env by clone to private env
* fix env setting
* mv the ilit to new folder, clear the output of ipy
* rm temp files
* change structure
* rebase the update
* rm .gitkeep
* update for new API and config for ilit 1.0 in golden release
* update the script to prepare running env
* optimize for CPU to fix the latency of int8 low issue
* rm unused code
* fix the latency issue by script
* correct the file name in text

Co-authored-by: Zhang, Jianyu <jianyu.zhang@intel.com>
Co-authored-by: lqnguyen <loc.q.nguyen@intel.com>
Co-authored-by: JoeOster <52936608+JoeOster@users.noreply.github.com>
Co-authored-by: vmadananth <12753028+vmadananth@users.noreply.github.com>
Co-authored-by: slgogar <33332238+slgogar@users.noreply.github.com>
Co-authored-by: Moushumi <55515077+moushumi-maria@users.noreply.github.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: akertesz <67655634+akertesz@users.noreply.github.com>
Co-authored-by: terdner <todd.erdner@intel.com>
Co-authored-by: Shailen Sobhee <shailen.sobhee@gmail.com>
Co-authored-by: clevels <59889830+clevels@users.noreply.github.com>
Co-authored-by: root <root@dtc-nuc-03l.jf.intel.com>
Co-authored-by: petercad <48329794+petercad@users.noreply.github.com>
Co-authored-by: praveenkk123 <praveen.k.kundurthy@intel.com>
Co-authored-by: tomlenth <tom.f.lenth@intel.com>
Co-authored-by: Jing Xu <jing.xu@intel.com>
Co-authored-by: Jitendra Patil <jitendra.patil@intel.com>
Co-authored-by: Marc Valle <30421017+mav-intel@users.noreply.github.com>
oneapi-src · Jan 6, 2021 · 4362b3e
1 parent 6c87110

Showing 2 changed files with 160 additions and 88 deletions.
diff --git a/...Analytics/Getting-Started-Samples/iLiT-Sample-for-Tensorflow/ilit_sample_tensorflow.ipynb b/...Analytics/Getting-Started-Samples/iLiT-Sample-for-Tensorflow/ilit_sample_tensorflow.ipynb
@@ -51,7 +51,7 @@
    "source": [
     "Import python packages and check version.\n",
     "\n",
-    "Make sure the Tensorflow is **2.2** and iLiT, matplotlib are installed."
+    "Make sure TensorFlow **2.x**, iLiT, and matplotlib are installed."
    ]
   },
   {
@@ -297,7 +297,6 @@
     "def auto_tune(input_graph_path, yaml_config, batch_size):    \n",
     "    fp32_graph = alexnet.load_pb(input_graph_path)\n",
     "    quan = ilit.Quantization(yaml_config)\n",
-    "    assert(tuner)\n",
     "    dataloader = Dataloader(batch_size)\n",
     "    assert(dataloader)\n",
     "    q_model = quan(\n",
@@ -368,7 +367,11 @@
    "source": [
     "## Compare Quantized Model\n",
     "\n",
-    "Define a function to return validation dataset and calculate the accuracy."
+    "We prepared a script, **profiling_lpot.py**, to measure the performance of the frozen PB models.\n",
+    "\n",
+    "Performance numbers measured inside the Jupyter notebook are not reliable, so we run the script as a separate process.\n",
+    "\n",
+    "Let's look at **profiling_lpot.py**."
    ]
   },
   {
@@ -377,30 +380,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import time\n",
-    "\n",
-    "\n",
-    "def val_data():\n",
-    "    x_train, y_train, label_train, x_test, y_test,label_test = mnist_dataset.read_data()\n",
-    "    return x_test, y_test, label_test\n",
-    "\n",
-    "def calc_accuracy(predictions, labels):\n",
-    "    predictions = np.argmax(predictions, axis=1)\n",
-    "    same = 0\n",
-    "    for i, x in enumerate(predictions):\n",
-    "        if x==labels[i]:\n",
-    "            same += 1\n",
-    "    if len(predictions)==0:\n",
-    "        return 0\n",
-    "    else:\n",
-    "        return same/len(predictions)"
+    "!cat profiling_lpot.py"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Define infer function to test the single frezon PB model."
+    "Execute **profiling_lpot.py** with the FP32 model file:"
    ]
   },
   {
@@ -409,75 +396,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import numpy as np\n",
-    "import tensorflow as tf\n",
-    "\n",
-    "def calc_accuracy(predictions, labels):\n",
-    "    predictions = np.argmax(predictions, axis=1)\n",
-    "    same = 0\n",
-    "    for i, x in enumerate(predictions):\n",
-    "        if x==labels[i]:\n",
-    "            same += 1\n",
-    "    if len(predictions)==0:\n",
-    "        return 0\n",
-    "    else:\n",
-    "        return same/len(predictions)\n",
-    "\n",
-    "def get_concrete_function(graph_def, inputs, outputs, print_graph=False):\n",
-    "    def imports_graph_def():\n",
-    "        tf.compat.v1.import_graph_def(graph_def, name=\"\")\n",
-    "\n",
-    "    wrap_function = tf.compat.v1.wrap_function(imports_graph_def, [])\n",
-    "    graph = wrap_function.graph\n",
-    "\n",
-    "    return wrap_function.prune(\n",
-    "        tf.nest.map_structure(graph.as_graph_element, inputs),\n",
-    "        tf.nest.map_structure(graph.as_graph_element, outputs))\n",
-    "\n",
-    "def infer_perf_pb(pb_model_file, inputs=[\"x:0\"], outputs=[\"Identity:0\"]):\n",
-    "    q_model = alexnet.load_pb(pb_model_file)\n",
-    "    concrete_function = get_concrete_function(graph_def=q_model.as_graph_def(),\n",
-    "                                              inputs=inputs,\n",
-    "                                              outputs=outputs,\n",
-    "                                              print_graph=True)\n",
-    "    x_test, y_test, label_test = val_data()\n",
-    "\n",
-    "    bt = time.time()\n",
-    "    _frozen_graph_predictions = concrete_function(x=tf.constant(x_test))[0]\n",
-    "    et = time.time()\n",
-    "\n",
-    "    accuracy = calc_accuracy(_frozen_graph_predictions, label_test)\n",
-    "    print('accuracy:', accuracy)\n",
-    "    throughput = x_test.shape[0] / (et - bt)\n",
-    "    print('max throughput(fps):', throughput)\n",
-    "\n",
-    "\n",
-    "    #latency when BS=1\n",
-    "    bt = time.time()\n",
-    "    times = 1000\n",
-    "    for i in range(times):\n",
-    "        _frozen_graph_predictions = concrete_function(x=tf.constant(x_test[:1]))[0]\n",
-    "    et = time.time()\n",
-    "\n",
-    "    latency = (et - bt) * 1000 / times\n",
-    "    print('latency(ms):', latency)\n",
-    "\n",
-    "    return accuracy, throughput, latency\n",
-    "\n",
-    "#warm up\n",
-    "_accuracy32, _throughput32, _latency32 = infer_perf_pb(fp32_frezon_pb_file)\n",
-    "\n",
-    "#test\n",
-    "accuracy32, throughput32, latency32 = infer_perf_pb(fp32_frezon_pb_file)\n",
-    "\n",
-    "accuracy8, throughput8, latency8 = infer_perf_pb(int8_pb_file)"
+    "!python profiling_lpot.py --input-graph=./fp32_frezon.pb --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=32"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Execute the functions to get the performance data."
+    "Execute **profiling_lpot.py** with the INT8 model file:"
    ]
   },
   {
@@ -486,6 +412,35 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "!python profiling_lpot.py --input-graph=./alexnet_int8_model.pb --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=8"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!cat 32.json\n",
+    "!echo \" \"\n",
+    "!cat 8.json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the following cell to load the performance data from 32.json and 8.json and show it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "\n",
     "def autolabel(ax, rects):\n",
     "    \"\"\"\n",
     "    Attach a text label above each bar displaying its height\n",
@@ -506,10 +461,18 @@
     "    ax1.tick_params(axis='y', labelcolor=color)\n",
     "    autolabel(ax1, rects1)\n",
     "\n",
+    "def load_res(json_file):\n",
+    "    with open(json_file) as f:\n",
+    "        data = json.load(f)\n",
+    "        return data\n",
+    "\n",
+    "res_32 = load_res('32.json')\n",
+    "res_8 = load_res('8.json')\n",
+    "\n",
+    "accuracys = [res_32['accuracy'], res_8['accuracy']]\n",
+    "throughputs = [res_32['throughput'], res_8['throughput']]\n",
+    "latencys = [res_32['latency'], res_8['latency']]\n",
     "\n",
-    "accuracys = [accuracy32, accuracy8]\n",
-    "throughputs = [throughput32, throughput8]\n",
-    "latencys = [latency32, latency8]\n",
     "print('throughputs', throughputs)\n",
     "print('latencys', latencys)\n",
     "print('accuracys', accuracys)\n",
@@ -606,7 +569,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.7"
+   "version": "3.6.9"
   }
  },
  "nbformat": 4,

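The notebook now benchmarks each model in its own process and reads the results back from `32.json` and `8.json`. Below is a minimal sketch of the same workflow driven from plain Python, assuming `profiling_lpot.py` and the model files sit in the current working directory. `build_command`, `run_profile`, and `compare` are hypothetical helper names, and the flag values simply mirror the notebook cells.

```python
import json
import subprocess
import sys


def build_command(input_graph, index, threads=4):
    """Hypothetical helper: mirrors the notebook's `!python profiling_lpot.py ...` cells."""
    return [
        sys.executable, "profiling_lpot.py",
        "--input-graph={}".format(input_graph),
        "--omp-num-threads={}".format(threads),
        "--num-inter-threads=1",
        "--num-intra-threads={}".format(threads),
        "--index={}".format(index),
    ]


def run_profile(input_graph, index, threads=4):
    """Benchmark one model in a fresh Python process and load the <index>.json it writes."""
    subprocess.run(build_command(input_graph, index, threads), check=True)
    with open("{}.json".format(index)) as f:
        return json.load(f)


def compare(res_fp32, res_int8):
    """Ratios of INT8 to FP32 results, using the keys that profiling_lpot.py saves."""
    return {
        # > 1.0 means INT8 processes more images per second
        "throughput_speedup": res_int8["throughput"] / res_fp32["throughput"],
        # > 1.0 means INT8 answers a single image faster
        "latency_speedup": res_fp32["latency"] / res_int8["latency"],
        # negative means INT8 lost some accuracy
        "accuracy_delta": res_int8["accuracy"] - res_fp32["accuracy"],
    }
```

For example, `compare(run_profile("./fp32_frezon.pb", "32"), run_profile("./alexnet_int8_model.pb", "8"))` would run both benchmarks back to back. Launching each run through `subprocess.run` gives every measurement a fresh interpreter, which is the reason the benchmark was moved out of the notebook.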
diff --git a/AI-and-Analytics/Getting-Started-Samples/iLiT-Sample-for-Tensorflow/profiling_lpot.py b/AI-and-Analytics/Getting-Started-Samples/iLiT-Sample-for-Tensorflow/profiling_lpot.py
@@ -0,0 +1,109 @@
+
+import tensorflow as tf
+import numpy as np
+import time
+import argparse
+import os
+import json
+
+
+import mnist_dataset
+import alexnet
+
+
+def val_data():
+    x_train, y_train, label_train, x_test, y_test, label_test = mnist_dataset.read_data()
+    return x_test, y_test, label_test
+
+
+def calc_accuracy(predictions, labels):
+    predictions = np.argmax(predictions, axis=1)
+    same = 0
+    for i, x in enumerate(predictions):
+        if x == labels[i]:
+            same += 1
+    if len(predictions) == 0:
+        return 0
+    else:
+        return same / len(predictions)
+
+
+def get_concrete_function(graph_def, inputs, outputs, print_graph=False):
+    def imports_graph_def():
+        tf.compat.v1.import_graph_def(graph_def, name="")
+
+    wrap_function = tf.compat.v1.wrap_function(imports_graph_def, [])
+    graph = wrap_function.graph
+
+    return wrap_function.prune(
+        tf.nest.map_structure(graph.as_graph_element, inputs),
+        tf.nest.map_structure(graph.as_graph_element, outputs))
+
+
+def infer_perf_pb(pb_model_file, dataset, inputs=["x:0"], outputs=["Identity:0"]):
+    x_test, _, label_test = dataset
+    model = alexnet.load_pb(pb_model_file)
+    concrete_function = get_concrete_function(graph_def=model.as_graph_def(),
+                                              inputs=inputs,
+                                              outputs=outputs,
+                                              print_graph=True)
+    x_all = tf.constant(x_test)
+    concrete_function(x=x_all)  # warm-up run, excluded from the throughput timing
+    bt = time.perf_counter()
+    predictions = concrete_function(x=x_all)
+    et = time.perf_counter()
+
+    accuracy = calc_accuracy(predictions[0], label_test)
+    print('accuracy:', accuracy)
+    throughput = x_test.shape[0] / (et - bt)
+    print('max throughput(fps):', throughput)
+
+    # latency when BS=1: `times` calls in total, the first `warmup` calls are not timed
+    times = 1000
+    warmup = 20
+    single_test = tf.constant(x_test[:1])  # build the input tensor once, outside the timed loop
+    for _ in range(warmup):
+        concrete_function(x=single_test)
+    bt = time.perf_counter()
+    for _ in range(times - warmup):
+        concrete_function(x=single_test)
+    et = time.perf_counter()
+
+    latency = (et - bt) * 1000 / (times - warmup)
+    print('latency(ms):', latency)
+
+    return accuracy, throughput, latency
+
+
+def save_res(result, index):
+    """Write accuracy, throughput and latency to <index>.json for the notebook to load."""
+    accuracy, throughput, latency = result
+    res = {'accuracy': accuracy, 'throughput': throughput, 'latency': latency}
+
+    outfile = index + ".json"
+    with open(outfile, 'w') as f:
+        json.dump(res, f)
+    print("Saved result to {}".format(outfile))
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--index', type=str, required=True,
+                        help='output file name without extension; results go to <index>.json')
+    parser.add_argument('--input-graph', type=str, required=True, help='path to the frozen PB model')
+    parser.add_argument('--num-intra-threads', type=str, default="24",
+                        help='number of threads used within a single operator')
+    parser.add_argument('--num-inter-threads', type=str, default="1",
+                        help='number of threads used to run independent operators in parallel')
+    parser.add_argument('--omp-num-threads', type=str, default="24", help='number of OpenMP threads')
+    args = parser.parse_args()
+
+    # Threading settings (Intel OpenMP and TensorFlow thread pools); they are set
+    # before the model is loaded and executed below.
+    os.environ["KMP_BLOCKTIME"] = "1"
+    os.environ["KMP_SETTINGS"] = "0"
+    os.environ["OMP_NUM_THREADS"] = args.omp_num_threads
+    os.environ["TF_NUM_INTEROP_THREADS"] = args.num_inter_threads
+    os.environ["TF_NUM_INTRAOP_THREADS"] = args.num_intra_threads
+
+    save_res(infer_perf_pb(args.input_graph, val_data()), args.index)
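`infer_perf_pb` reports batch-size-1 latency as the mean over `times - warmup` calls, leaving the first `warmup` calls out of the timed interval so that one-time setup work does not skew the average. The same pattern as a framework-independent sketch; `measure_latency_ms` is a hypothetical helper, and any zero-argument callable stands in for the TensorFlow concrete function.

```python
import time


def measure_latency_ms(fn, times=1000, warmup=20):
    """Mean wall-clock latency of fn() in milliseconds, excluding the warm-up calls.

    fn is called `times` times in total; only the last `times - warmup` calls are timed.
    """
    if times <= warmup:
        raise ValueError("times must be greater than warmup")
    for _ in range(warmup):
        fn()  # untimed: absorbs one-time costs such as lazy initialization
    start = time.perf_counter()  # monotonic, high-resolution clock for intervals
    for _ in range(times - warmup):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / (times - warmup)
```

With the script's pruned function, `fn` could be `lambda: concrete_function(x=single_input)`, where `single_input` (a hypothetical name) is a tensor built once before the call so that only inference is timed.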