#  Getting Started with Intel(R) Extension for TensorFlow
This code sample shows how to run a TensorFlow inference workload on both GPU and CPU using the Intel(R) oneAPI AI Analytics Toolkit, and how to analyze GPU and CPU usage with oneDNN verbose logs.

## ResNet50 Inference on both GPU and CPU
***
This section shows how to run ResNet50 inference on both GPU and CPU without code changes.

### Prerequisites

In [1]:
# ignore all warning messages
import warnings
warnings.filterwarnings('ignore')
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
import tensorflow as tf

Set the installation path of your Intel(R) oneAPI AI Analytics Toolkit.

In [None]:
%env ONEAPI_INSTALL=/opt/intel/oneapi
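
Because the run scripts later source `setvars.sh` from this path and discard its output, a wrong path fails silently. The following is a minimal sketch of a sanity check, assuming `setvars.sh` sits directly under the install root (as the run scripts in this notebook expect). The helper name `find_setvars` is hypothetical and not part of any Intel tool.

```python
import os

def find_setvars(root):
    """Return the path to setvars.sh under root, or None if it is missing."""
    candidate = os.path.join(root, "setvars.sh")
    return candidate if os.path.isfile(candidate) else None

# Fall back to the default install location used in this notebook
oneapi_root = os.environ.get("ONEAPI_INSTALL", "/opt/intel/oneapi")
setvars = find_setvars(oneapi_root)
if setvars:
    print("found " + setvars)
else:
    print("setvars.sh not found under " + oneapi_root)
```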

Download the ResNet50 inference sample from the ITEX GitHub repository.

In [None]:
!wget https://raw.githubusercontent.com/intel/intel-extension-for-tensorflow/main/examples/infer_resnet50/infer_resnet50.py

Check the TensorFlow* and ITEX versions in the current IPython kernel.

In [4]:
run ../../version_check.py

TensorTlow version:  2.10.0
MKL enabled : False
itex_version :  1.1.0
scikit learn Version:  1.1.1
neural_compressor version 1.14.2
Arch :  ICX|CLX


### Run resnet50 on GPU and CPU

#### Run on GPU via ITEX
With ITEX, users can run infer_resnet50.py on an Intel discrete GPU (dGPU) without any code changes.
The current AI Kit installation includes a tensorflow-gpu conda environment with ITEX installed.

In [None]:
%%writefile run.sh
#!/bin/bash
source $ONEAPI_INSTALL/setvars.sh --force > /dev/null 2>&1
source activate user-tensorflow-gpu
echo "########## Executing the run"
DNNL_VERBOSE=1 python infer_resnet50.py > infer_rn50_gpu.csv
echo "########## Done with the run"

##### Submitting run.sh to the job queue

Now we can submit run.sh to the job queue.

Note that the run commands can also be executed in a local environment.
To let users run the script either on the Intel DevCloud or locally, the next cell checks for the job submission command qsub. If qsub is not found, the script is run locally.

In [None]:
! chmod 755 ../../q; chmod 755 run.sh;if [ -x "$(command -v qsub)" ];  then  ./q run.sh; else ./run.sh; fi
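
The shell one-liner above can be hard to read, so here is a minimal Python sketch of the same decision: use the queue wrapper when `qsub` is on the `PATH`, otherwise run the script directly. The function name `choose_command` is hypothetical, and the wrapper path is a parameter because the location of the `q` script depends on the checkout layout.

```python
import shutil

def choose_command(script, queue_wrapper):
    """Return the launch command: via the queue wrapper if qsub exists, else direct."""
    if shutil.which("qsub") is not None:
        # qsub is available, so assume a DevCloud-style job queue
        return [queue_wrapper, script]
    # No qsub found: assume a local environment and run the script directly
    return ["./" + script]

command = choose_command("run.sh", "./q")
print("would run:", " ".join(command))
```

The returned list could be passed to `subprocess.run` to launch the job; the sketch only prints it so that nothing is executed by accident.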

#### Run on CPU via Intel TensorFlow
Users can also run the same infer_resnet50.py on CPU with Intel TensorFlow or stock TensorFlow. For the CPU run, switch to the user-tensorflow Jupyter kernel and re-run the notebook starting from the prerequisites.

In [None]:
%%writefile run.sh
#!/bin/bash
source $ONEAPI_INSTALL/setvars.sh --force > /dev/null 2>&1
source activate user-tensorflow
echo "########## Executing the run"
DNNL_VERBOSE=1 python infer_resnet50.py > infer_rn50_cpu.csv
echo "########## Done with the run"

##### Submitting run.sh to the job queue

Now we can submit run.sh to the job queue.

Note that the run commands can also be executed in a local environment.
To let users run the script either on the Intel DevCloud or locally, the next cell checks for the job submission command qsub. If qsub is not found, the script is run locally.

In [None]:
! chmod 755 ../../q; chmod 755 run.sh;if [ -x "$(command -v qsub)" ];  then  ./q run.sh; else ./run.sh; fi

## Analyze Verbose Logs
***


Download profile_utils.py to parse the oneDNN verbose logs generated in the previous section.

In [None]:
!wget https://raw.githubusercontent.com/oneapi-src/oneAPI-samples/master/Libraries/oneDNN/tutorials/profiling/profile_utils.py

### Step 1: List out all oneDNN verbose logs
Users should see the two verbose logs listed in the table below.

|Log File Name | Description |
|:-----|:----|
|infer_rn50_cpu.csv| Log for the CPU run |
|infer_rn50_gpu.csv| Log for the GPU run |

In [None]:
import os

# Collect every .csv verbose log in the current directory, sorted by name
keyword = ".csv"
result = sorted(f for f in os.listdir(".") if f.endswith(keyword))

# Print each log file with the index used to select it in Step 2
for index, logfile in enumerate(result):
    print(" %d : %s " % (index, logfile))

### Step 2: Pick a verbose log by setting its index value below
Users can pick either the CPU or the GPU log for analysis.   
After finishing Step 2 to Step 7 for one log file, users can return to Step 2 and select another log file for analysis.

In [None]:
FdIndex=0

### Step 3: Parse verbose log and get the data back
> Users will also get a oneDNN.json file with timeline information for oneDNN primitives. 

In [None]:
logfile = result[FdIndex]
print(logfile)
from profile_utils import oneDNNUtils, oneDNNLog
onednn = oneDNNUtils()
log1 = oneDNNLog()
log1.load_log(logfile)
data = log1.data
exec_data = log1.exec_data
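
To make the parsed data less of a black box, here is a minimal sketch of how a single oneDNN verbose line could be split into fields. It is not the implementation in `profile_utils.py`. It assumes the comma-separated layout of oneDNN 2.x (marker, operation, engine, primitive, implementation, ..., time in milliseconds as the last field); other oneDNN versions may add or reorder fields. The sample lines are made up and abbreviated, and the dictionary keys simply mirror the column names used by the breakdown calls in this notebook.

```python
def parse_verbose_line(line):
    """Parse one oneDNN verbose line into a dict, or return None for other output."""
    fields = line.strip().split(",")
    if len(fields) < 6 or fields[0] not in ("dnnl_verbose", "onednn_verbose"):
        return None
    # The operation may carry a suffix such as "create:cache_miss"
    operation = fields[1].split(":")[0]
    if operation not in ("exec", "create"):
        return None  # e.g. informational header lines
    try:
        time_ms = float(fields[-1])
    except ValueError:
        return None
    return {
        "exec": operation,   # exec or create
        "arch": fields[2],   # cpu or gpu
        "type": fields[3],   # primitive type, e.g. convolution
        "jit": fields[4],    # implementation / kernel name
        "time": time_ms,
    }

# Made-up, abbreviated sample lines for illustration only
SAMPLE = """\
onednn_verbose,info,cpu,runtime:OpenMP
onednn_verbose,create:cache_miss,cpu,convolution,jit:avx512_core,forward_inference,src_f32 wei_f32 dst_f32,,alg:convolution_direct,mb1_ic3oc64,0.25
onednn_verbose,exec,cpu,convolution,jit:avx512_core,forward_inference,src_f32 wei_f32 dst_f32,,alg:convolution_direct,mb1_ic3oc64,1.5
some unrelated stdout line
"""

records = [r for r in map(parse_verbose_line, SAMPLE.splitlines()) if r]
for rec in records:
    print(rec)
```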

### Step 4: Time breakdown for exec type
The exec type includes exec and create. 

|exec type | Description |  
|:-----|:----|  
|exec | Time for primitive execution. Ideally, most of the time is spent here. |  
|create| Time for primitive creation. Primitive creation happens once, so it should take only a small share of the time. |  
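
This step has no code cell here, and whether `profile_utils.py` accepts an exec-type key in `onednn.breakdown` is not shown in this notebook. As an illustration only, the following sketch computes the time share of each exec type by hand. The `breakdown` function and the record values below are hypothetical and stand in for the parsed log data.

```python
from collections import defaultdict

def breakdown(records, key):
    """Return {value of key: percent of total time} for a list of record dicts."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += rec["time"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {k: 100.0 * v / grand_total for k, v in totals.items()}

# Hypothetical records; times are in milliseconds
records = [
    {"exec": "create", "time": 2.0},
    {"exec": "exec", "time": 5.0},
    {"exec": "exec", "time": 3.0},
]

shares = breakdown(records, "exec")
for name, pct in sorted(shares.items()):
    print("%-6s %5.1f%%" % (name, pct))
```

A healthy inference run would typically show a large exec share, since primitive creation is a one-time cost.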

### Step 5: Time breakdown for architecture type
The supported architectures include CPU and GPU.  
For this simple net sample, we don't split computation among CPU and GPU,    
so users should see either 100% CPU time or 100% GPU time. 

In [None]:
onednn.breakdown(exec_data,"arch","time")

### Step 6: Time breakdown for primitive type
The primitive types include convolution, reorder, sum, etc.  
For this simple convolution net example, convolution and inner product primitives are expected to take most of the time.  
However, the exact time percentage of each primitive may vary across architectures.    
Users can easily identify the top hotspots of primitive execution with this time breakdown.  

In [None]:
onednn.breakdown(exec_data,"type","time")

### Step 7: Time breakdown for JIT kernel type
oneDNN uses just-in-time (JIT) compilation to generate optimal code for some functions based on the input parameters and the instruction set supported by the system.   
Therefore, users can see different JIT kernel types across CPU and GPU architectures.  
For example, users can see the avx512_core_vnni JIT kernel if the workload uses VNNI instructions on the Cascade Lake platform.  
Users can also see different OCL kernels across Intel GPU generations.  
Moreover, users can identify the top hotspots of JIT kernel execution with this time breakdown.  


In [None]:
onednn.breakdown(exec_data,"jit","time")
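
To show what "top hotspots" means in practice, here is a small sketch that ranks kernels by their summed execution time. It is independent of `profile_utils.py`. The function name `top_kernels`, the kernel names, and the timings are all hypothetical.

```python
from collections import Counter

def top_kernels(records, n=3):
    """Return the n (kernel name, total time) pairs with the largest summed time."""
    totals = Counter()
    for rec in records:
        totals[rec["jit"]] += rec["time"]
    return totals.most_common(n)

# Hypothetical exec records; times are in milliseconds
records = [
    {"jit": "jit:avx512_core_vnni", "time": 4.0},
    {"jit": "jit:uni", "time": 1.5},
    {"jit": "jit:avx512_core_vnni", "time": 2.0},
    {"jit": "ref:any", "time": 0.5},
]

hotspots = top_kernels(records, n=2)
for name, total in hotspots:
    print("%-24s %.1f ms" % (name, total))
```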

The output (both stdout and stderr) is displayed on the command-line console.

In [None]:
print('[CODE_SAMPLE_COMPLETED_SUCCESFULLY]')