The TensorFlow* Getting Started sample demonstrates how to train a TensorFlow* model and run inference on Intel® hardware.
Property | Description |
---|---|
Category | Getting Started Sample |
What you will learn | How to start using TensorFlow* on Intel® hardware. |
Time to complete | 10 minutes |
TensorFlow* is a widely used machine learning framework in the deep learning arena that demands efficient utilization of computational resources. To take full advantage of Intel® architecture and to extract maximum performance, the TensorFlow* framework has been optimized using Intel® oneDNN primitives. This sample demonstrates how to train an example neural network and shows how Intel-optimized TensorFlow* enables Intel® oneDNN calls by default. Intel-optimized TensorFlow* is available as part of the Intel® AI Tools.
This sample code shows how to get started with TensorFlow*. It implements an example neural network with one convolution layer and one ReLU layer. You can build and train this TensorFlow* neural network with a small amount of simple Python code. In addition, by controlling the built-in `ONEDNN_VERBOSE` environment variable, the sample demonstrates explicitly how Intel® oneDNN primitives are called and reports their performance during neural network training.
Optimized for | Description |
---|---|
OS | Ubuntu* 22.04 and newer |
Hardware | Intel® Xeon® Scalable processor family |
Software | TensorFlow |
Note: AI and Analytics samples are validated on AI Tools Offline Installer. For the full list of validated platforms refer to Platform Validation.
The sample includes one Python file: TensorFlow_HelloWorld.py. It implements the training and inference of a simple neural network:
- The training data is generated by `np.random`.
- The neural network with one convolution layer and one ReLU layer is created by `tf.nn.conv2d` and `tf.nn.relu`.
- The TF session is initialized by `tf.global_variables_initializer`.
- The training is implemented via the for-loop below:

  ```python
  for epoch in range(0, EPOCHNUM):
      for step in range(0, BS_TRAIN):
          x_batch = x_data[step*N:(step+1)*N, :, :, :]
          y_batch = y_data[step*N:(step+1)*N, :, :, :]
          s.run(train, feed_dict={x: x_batch, y: y_batch})
  ```
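The batch slicing used in that training loop can be checked in isolation with plain NumPy. This is only an illustrative sketch: the values of `N` and `BS_TRAIN` and the array shape below are examples, not taken from the script.

```python
import numpy as np

# Illustrative values; the real script defines its own N, BS_TRAIN, and shapes.
N, BS_TRAIN = 4, 2
x_data = np.random.rand(N * BS_TRAIN, 128, 128, 4).astype(np.float32)

# Same slicing pattern as the training loop: batch `step` covers samples
# step*N through (step+1)*N - 1 along the first (batch) dimension.
batches = [x_data[step * N:(step + 1) * N, :, :, :] for step in range(BS_TRAIN)]

# Each batch holds exactly N samples.
assert all(batch.shape == (N, 128, 128, 4) for batch in batches)
# Concatenating the batches reproduces the full dataset: no sample is
# skipped or duplicated.
assert np.array_equal(np.concatenate(batches), x_data)
```

This is why the loop visits every training sample exactly once per epoch.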
To show the hardware information and display the deep learning primitives trace during execution, you must export the environment variable: `export ONEDNN_VERBOSE=1`.
Note: For convenience, the code line `os.environ["ONEDNN_VERBOSE"] = "1"` has been added to the body of the script as an alternative method of setting this variable.
Runtime settings for `ONEDNN_VERBOSE`, `KMP_AFFINITY`, and inter-/intra-op threads are set within the script. You can read more about these settings in this dedicated document: Maximize TensorFlow* Performance on CPU: Considerations and Recommendations for Inference Workloads.
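As an illustration of such runtime settings, the environment variables can also be set from Python before TensorFlow is imported. This is a hedged sketch: the affinity string and thread counts below are examples, not the script's actual values.

```python
import os

# These must be set before `import tensorflow` for them to take effect.
os.environ["ONEDNN_VERBOSE"] = "1"    # print the oneDNN primitives trace
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"  # example pinning
os.environ["OMP_NUM_THREADS"] = "4"   # example OpenMP thread count

# Inter-/intra-op thread pools are configured through TensorFlow itself,
# e.g. tf.config.threading.set_intra_op_parallelism_threads(4)
# and  tf.config.threading.set_inter_op_parallelism_threads(2).
```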
The sample code is CPU based, but you can run it using Intel® Extension for TensorFlow* with Intel® Data Center GPU Flex Series. If you are using the Intel GPU, refer to Intel GPU Software Installation Guide. The sample should be able to run on GPU without any code changes.
For details, refer to the Quick Example on Intel CPU and GPU topic of the Intel® Extension for TensorFlow* documentation.
You will need to download and install the following toolkits, tools, and components to use the sample.
1. Get Intel® AI Tools
Required AI Tools: 'Intel® Extension for TensorFlow* - CPU'
If you have not already done so, select and install these tools via the AI Tools Selector. AI and Analytics samples are validated on the AI Tools Offline Installer, so it is recommended to select the Offline Installer option in the AI Tools Selector. Please see the supported versions.
Note: If Docker option is chosen in AI Tools Selector, refer to Working with Preset Containers to learn how to run the docker and samples.
2. (Offline Installer) Activate the AI Tools bundle base environment
If the default path is used during the installation of AI Tools:
source $HOME/intel/oneapi/intelpython/bin/activate
If a non-default path is used:
source <custom_path>/bin/activate
3. (Offline Installer) Activate relevant Conda environment
For the system with Intel CPU:
conda activate tensorflow
For the system with Intel GPU:
conda activate tensorflow-gpu
4. Clone the GitHub repository
git clone https://github.com/oneapi-src/oneAPI-samples.git
cd oneAPI-samples/AI-and-Analytics/Getting-Started-Samples/IntelTensorFlow_GettingStarted
Note: Before running the sample, make sure Environment Setup is completed. Go to the section which corresponds to the installation method chosen in AI Tools Selector to see relevant instructions:
python TensorFlow_HelloWorld.py
AI Tools Docker images already have Get Started samples pre-installed. Refer to Working with Preset Containers to learn how to run the docker and samples.
- With the initial run, you should see results similar to the following:

  ```
  0 0.4147554
  1 0.3561021
  2 0.33979267
  3 0.33283564
  4 0.32920069
  [CODE_SAMPLE_COMPLETED_SUCCESSFULLY]
  ```
- Export `ONEDNN_VERBOSE` as 1 in the command line:

  ```
  export ONEDNN_VERBOSE=1
  ```

  Windows:

  ```
  set ONEDNN_VERBOSE=1
  ```

  Note: The historical environment variables include `DNNL_VERBOSE` and `MKLDNN_VERBOSE`.

- Run the sample again. You should see the oneDNN run-time verbose trace, similar to the following:
  ```
  2024-03-12 16:01:59.784340: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type CPU is enabled.
  onednn_verbose,info,oneDNN v3.2.0 (commit 8f2a00d86546e44501c61c38817138619febbb10)
  onednn_verbose,info,cpu,runtime:OpenMP,nthr:24
  onednn_verbose,info,cpu,isa:Intel AVX2 with Intel DL Boost
  onednn_verbose,info,gpu,runtime:none
  onednn_verbose,info,prim_template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
  onednn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:cdba::f0 dst_f32:p:blocked:Acdb16a::f0,,,10x4x3x3,0.00195312
  onednn_verbose,exec,cpu,convolution,brgconv:avx2,forward_training,src_f32::blocked:acdb::f0 wei_f32:ap:blocked:Acdb16a::f0 bia_f32::blocked:a::f0 dst_f32::blocked:acdb::f0,attr-scratchpad:user attr-post-ops:eltwise_relu ,alg:convolution_direct,mb4_ic4oc10_ih128oh128kh3sh1dh0ph1_iw128ow128kw3sw1dw0pw1,1.19702
  onednn_verbose,exec,cpu,eltwise,jit:avx2,backward_data,data_f32::blocked:abcd::f0 diff_f32::blocked:abcd::f0,attr-scratchpad:user ,alg:eltwise_relu alpha:0 beta:0,4x128x128x10,0.112061
  onednn_verbose,exec,cpu,convolution,jit:avx2,backward_weights,src_f32::blocked:acdb::f0 wei_f32:ap:blocked:ABcd8b8a::f0 bia_undef::undef::: dst_f32::blocked:acdb::f0,attr-scratchpad:user ,alg:convolution_direct,mb4_ic4oc10_ih128oh128kh3sh1dh0ph1_iw128ow128kw3sw1dw0pw1,0.358887
  ...
  ```
Note: See the oneAPI Deep Neural Network Library Developer Guide and Reference for more details on the verbose log.
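The `exec` lines in the trace follow the comma-separated `prim_template` printed in the log header. A small helper like the following (hypothetical, not part of the sample) can split such a line into named fields:

```python
# Field names taken from the prim_template header line of the verbose log.
FIELDS = ["operation", "engine", "primitive", "implementation", "prop_kind",
          "memory_descriptors", "attributes", "auxiliary",
          "problem_desc", "exec_time"]

def parse_verbose_line(line):
    """Parse one onednn_verbose exec/create line; return None for info lines."""
    parts = line.strip().split(",")
    if parts[0] != "onednn_verbose" or parts[1] == "info":
        return None
    return dict(zip(FIELDS, parts[1:]))

# A reorder line taken from the trace above.
line = ("onednn_verbose,exec,cpu,reorder,jit:uni,undef,"
        "src_f32::blocked:cdba::f0 dst_f32:p:blocked:Acdb16a::f0,,,"
        "10x4x3x3,0.00195312")
record = parse_verbose_line(line)
```

Summing the `exec_time` field (reported in milliseconds) over all parsed lines is a quick way to see where training time goes per primitive.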
Troubleshooting

If you receive an error message, troubleshoot the problem using the Diagnostics Utility for Intel® oneAPI Toolkits. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the Diagnostics Utility for Intel® oneAPI Toolkits User Guide for more information on using the utility, or ask for support at https://github.com/intel/intel-extension-for-tensorflow.
Code samples are licensed under the MIT license. See License.txt for details.
Third party program Licenses can be found here: third-party-programs.txt
*Other names and brands may be claimed as the property of others. Trademarks.