### Example 1: Sequential operations in Python
The following Python code shows the kind of sequential work a CPU handles well: iterating
through a list and performing a calculation on each item. A CPU core is optimized for low-latency,
single-threaded execution, so a task in which each step runs after the previous one is a typical
case where it excels.

In [2]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
squared_numbers = []

for number in numbers:
    squared_numbers.append(number ** 2)

print(squared_numbers)

[1, 4, 9, 16, 25, 36, 49, 64, 81]


In this case, the CPU executes the loop iterations one after another in a linear sequence: each
`append` completes before the next iteration starts.
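
As a minimal sketch of the same idea, the loop can be written as a list comprehension and timed on a larger input. The helper name `square_all` and the input size of 1,000,000 are illustrative choices, not part of the original notebook; the work is still done one element at a time on a single CPU core.

```python
import time


def square_all(values):
    """Square each value one at a time, in order (hypothetical helper)."""
    return [value ** 2 for value in values]


numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
squared_numbers = square_all(numbers)
print(squared_numbers)  # [1, 4, 9, 16, 25, 36, 49, 64, 81]

# Time the same sequential computation on a larger input.
start = time.perf_counter()
square_all(range(1_000_000))
elapsed = time.perf_counter() - start
print(f"Squared 1,000,000 numbers sequentially in {elapsed:.3f} s")
```

The measured time depends on the machine, so no particular value should be expected.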

### Example 2: Parallel operations in Python using TensorFlow
This example uses TensorFlow to perform a matrix multiplication, an operation that can run in parallel on a GPU. TensorFlow detects available GPUs and places supported operations on them automatically; when no usable GPU is found, the same code runs on the CPU.

In [4]:
! pip install tensorflow

In [5]:
import tensorflow as tf
# Create a large random matrix
matrix = tf.random.uniform((1000, 1000))
print(matrix)
# Perform a matrix multiplication (runs on a GPU when TensorFlow detects one, otherwise on the CPU)
result = tf.matmul(matrix, matrix)
tf.print(result)

2025-09-26 00:18:22.287294: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


tf.Tensor(
[[0.13004518 0.84748614 0.2270875  ... 0.7458832  0.99783456 0.5731169 ]
 [0.08732259 0.70059323 0.23242676 ... 0.7435374  0.53951645 0.8577293 ]
 [0.10347593 0.14106536 0.81996095 ... 0.8034626  0.00634098 0.1147083 ]
 ...
 [0.30464292 0.79907846 0.77350307 ... 0.95706356 0.30141973 0.77926624]
 [0.3281932  0.5348165  0.3558544  ... 0.88482213 0.19720232 0.6675515 ]
 [0.8592788  0.0202378  0.61015797 ... 0.7103808  0.74298215 0.2031815 ]], shape=(1000, 1000), dtype=float32)
[[237.677307 242.357559 241.551941 ... 255.651245 249.1474 241.301529]
 [249.864243 245.160522 251.838593 ... 268.313385 253.124878 247.757324]
 [248.079971 247.996582 248.186905 ... 263.282562 257.262878 247.419128]
 ...
 [246.074677 249.738266 244.21106 ... 261.871704 258.541809 239.883972]
 [245.866302 251.497 249.791916 ... 267.468079 257.267151 246.39621]
 [233.316971 238.451 239.813828 ... 253.885117 243.204727 239.814]]


W0000 00:00:1758817104.139776   46871 gpu_device.cc:2342] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...


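When a GPU is available, TensorFlow automatically uses it to accelerate the matrix multiplication. In the run shown above, however, the log reports that some GPU libraries could not be loaded and that GPU registration was skipped, so the multiplication ran on the CPU.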
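
One way to check where the multiplication actually ran is to ask TensorFlow which GPUs it can see and to inspect the `device` attribute of the result. The sketch below is illustrative only: the helper name `matmul_device` and the matrix size of 256 are assumptions, and the import is guarded so the cell does nothing harmful when TensorFlow is not installed.

```python
try:
    import tensorflow as tf
except ImportError:  # TensorFlow is not installed in this environment
    tf = None


def matmul_device(size=256):
    """Return the device string of a square matmul result (hypothetical helper).

    Returns None when TensorFlow is not installed.
    """
    if tf is None:
        return None
    # An empty list means TensorFlow did not register any GPU.
    gpus = tf.config.list_physical_devices("GPU")
    print("GPUs visible to TensorFlow:", gpus)
    matrix = tf.random.uniform((size, size))
    result = tf.matmul(matrix, matrix)
    # In eager mode, the tensor records the device that produced it.
    return result.device


device_used = matmul_device()
print("matmul ran on:", device_used)
```

If the printed device string ends in `CPU:0`, the operation was not offloaded to a GPU.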

### Example 3: Parallel operations in Python using PyTorch and a GPU
This example uses PyTorch, a popular deep-learning framework that provides GPU acceleration, to run operations in parallel on a GPU. Deep-learning workloads consist largely of matrix operations, which PyTorch can run on an NVIDIA GPU through its CUDA support. The cell below selects the `cuda` device when one is available and falls back to the CPU otherwise.

In [None]:
! pip install torch
import torch
# Use the GPU through CUDA when it is available; otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

In [9]:
! uname -a

Darwin Sanjeebs-MacBook-Pro.local 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:29 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6000 arm64
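
The `uname` output above reports an Apple Silicon Mac (`arm64` Darwin). CUDA is not available on that platform, so if the notebook runs on this machine the device selection shown earlier falls back to the CPU. A minimal sketch of a broader selection, assuming a PyTorch build that includes the `mps` (Metal) backend, is shown below together with the matrix multiplication itself. The helper name `pick_device` is hypothetical, and the import is guarded in case PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # PyTorch is not installed in this environment
    torch = None


def pick_device():
    """Prefer CUDA, then Apple's MPS backend, then the CPU (hypothetical helper)."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # Older PyTorch builds may not expose the MPS backend at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


if torch is not None:
    device = pick_device()
    # Allocate the matrix directly on the selected device.
    matrix = torch.rand(1000, 1000, device=device)
    # The multiplication runs on whichever device holds the operands.
    result = matrix @ matrix
    print(result.shape, result.device)
```

Creating the tensor with `device=device` avoids a separate copy from CPU memory to the accelerator.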
