# What is Floating-Point Operations Per Second (FLOPS)?

### FLOPS is a major factor in comparing the computational power of different systems, especially those where numerical calculation dominates the workload. This article gives a basic understanding of what FLOPS is, why it matters, and how it relates to computer performance.

### A floating-point operation is an arithmetic computation, such as addition, subtraction, multiplication, or division, performed on floating-point numbers. Floating-point numbers represent real numbers with fractional parts at finite precision, which keeps rounding error small enough for scientific and other numerically demanding applications. Compared with integers of the same bit width, floating-point numbers cover a much broader range of magnitudes, from extremely large to extremely small values, which makes them better suited to workloads that need that range.
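### As a small illustration of the range and precision discussed above, the following sketch uses only Python's standard `sys` module to inspect double-precision limits. The variable names are chosen for this example and are not part of the notebook.

```python
import sys

# Range: largest and smallest positive normalized double-precision values
largest = sys.float_info.max
smallest = sys.float_info.min
print(largest, smallest)

# Precision: 0.1 and 0.2 have no exact binary representation, so their sum
# differs from 0.3 by a tiny rounding error
exact_match = (0.1 + 0.2 == 0.3)
rounding_error = abs((0.1 + 0.2) - 0.3)
print(exact_match, rounding_error < sys.float_info.epsilon)
```

### The comparison prints `False` while the error stays below machine epsilon, which is why floating-point results are usually compared with a tolerance rather than with `==`.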

### FLOPS measures how many floating-point operations a computer can execute per second, and it is typically quoted for scientific and other numerically intensive computing. A higher FLOPS figure means the system can perform more calculations in a given period of time.
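### To make the definition concrete, here is a minimal sketch that divides a known number of floating-point operations by the time taken to run them. The function name `measure_flops` and the loop body are assumptions made for this illustration. Because the loop runs in the Python interpreter, the result reflects interpreter overhead and will be far below what the hardware can achieve.

```python
import time

def measure_flops(n_iterations: int = 1_000_000) -> float:
    """Return the achieved floating-point operations per second of a simple loop."""
    acc = 0.0
    x = 1.000001
    start = time.perf_counter()
    for _ in range(n_iterations):
        acc = acc * x + 0.5  # one multiply and one add: 2 floating-point operations
    elapsed = time.perf_counter() - start
    return (2 * n_iterations) / elapsed

rate = measure_flops()
print(f"{rate / 1e6:.1f} MFLOPS (pure-Python loop)")
```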

# Understanding and Estimating FLOPs in Neural Networks

### In deep learning, FLOPs (floating-point operations, with a lowercase "s" marking the plural) are a key metric for measuring the computational cost of a model. Unlike FLOPS, which is a rate, FLOPs is a count: the number of arithmetic operations (such as multiplications and additions) required for one forward pass through the network.
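### The count can be checked by hand on a tiny fully connected layer. The sketch below is a naive pure-Python dense layer, written only for illustration, that tallies every multiply and add it performs. The helper name `dense_forward_counted` and the toy weights are assumptions, not part of Keras.

```python
def dense_forward_counted(x, weights, bias):
    """Naive dense layer. Returns (outputs, number of floating-point operations)."""
    ops = 0
    outputs = []
    for j in range(len(bias)):
        acc = bias[j]
        for i in range(len(x)):
            acc += x[i] * weights[i][j]  # one multiply and one add
            ops += 2
        outputs.append(acc)
    return outputs, ops

x = [1.0, 2.0, 3.0]                              # 3 input units
weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # shape (3, 2)
bias = [0.0, 0.0]                                # 2 output units
outputs, ops = dense_forward_counted(x, weights, bias)
print(ops)  # 12, which equals 2 * 3 * 2
```

### Each output unit costs one multiply and one add per input, so a dense layer with `n_in` inputs and `n_out` outputs needs `2 * n_in * n_out` operations, bias addition included.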

# Why is this important?

### Performance optimization: Knowing the FLOPs helps assess the efficiency of a model, especially when deploying to resource-constrained environments (e.g., mobile devices or edge computing).

### Comparing architectures: Two models might have similar accuracy, but one might require far fewer FLOPs, making it preferable for production.

# 🧠 What This Notebook Does

### In this notebook, we calculate the FLOPs of a simple feedforward neural network using Keras. We:

### Iterate through each layer of the model

### Identify the number of operations based on the layer type and its input/output dimensions

### Account for both core computations and activation function costs

### Summarize the total FLOPs as an estimate of model complexity

### This serves as both a learning tool and a practical method to benchmark model efficiency.

# Import the libraries needed

In [1]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

# Loading the Data and Preprocessing

In [2]:
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train.shape, x_test.shape, y_train.shape, y_test.shape

((60000, 28, 28), (10000, 28, 28), (60000,), (10000,))

In [3]:
# Normalize pixel values to [0, 1] (the Flatten layer in the model handles reshaping)
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

In [4]:
# One-hot encode labels
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

In [5]:
# Hold out the last 10,000 training samples for validation
x_val = x_train[50000:]
y_val = y_train[50000:]

x_train = x_train[:50000]
y_train = y_train[:50000]

# Model

In [6]:
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')  
])

In [7]:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# FLOPs

In [8]:
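def flops(model):
    """Estimate forward-pass FLOPs for one input sample (Flatten and Dense layers only)."""
    total_flops = 0

    for i, layer in enumerate(model.layers):
        if isinstance(layer, tf.keras.layers.Flatten):
            print(f'Layer {i}: Flatten — no FLOPs counted.')
            continue

        elif isinstance(layer, tf.keras.layers.Dense):
            print(f'Layer {i}: Dense')
            # The kernel has shape (input_units, output_units); reading it avoids
            # layer.input_shape, which newer Keras versions may not provide
            input_units = int(layer.kernel.shape[0])
            output_units = layer.units
            # Per output unit: input_units multiplies + input_units adds (bias included)
            layer_flops = 2 * input_units * output_units
            print(f'  input_units: {input_units}, output_units: {output_units}, flops: {layer_flops}')

            # Activation function cost (approximate)
            if layer.activation == tf.keras.activations.relu:
                layer_flops += output_units  # 1 op per unit
                print(f'  +ReLU activation FLOPs: {output_units}')
            elif layer.activation == tf.keras.activations.softmax:
                layer_flops += 5 * output_units  # rough allowance for exp, sum, divide
                print(f'  +Softmax activation FLOPs (approx): {5 * output_units}')

            total_flops += layer_flops

        else:
            # Other layer types are not handled by this estimate
            print(f'Layer {i}: {type(layer).__name__} is not handled, no FLOPs counted.')

    print(f"\nTotal estimated FLOPs: {total_flops:,}")
    return total_flops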

In [9]:
flops(model)

Layer 0: Flatten — no FLOPs counted.
Layer 1: Dense
  input_units: 784, output_units: 100, flops: 156800
  +ReLU activation FLOPs: 100
Layer 2: Dense
  input_units: 100, output_units: 100, flops: 20000
  +ReLU activation FLOPs: 100
Layer 3: Dense
  input_units: 100, output_units: 10, flops: 2000
  +Softmax activation FLOPs (approx): 50

Total estimated FLOPs: 179,050


179050

### The figure of 179,050 FLOPs is the estimated total number of floating-point operations needed for one forward pass through the model, that is, for processing a single input (one 28×28 MNIST digit).
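### A FLOPs count can be related back to a FLOPS rate: dividing one by the other gives an idealized compute time. The sketch below assumes a hypothetical device that sustains 1 GFLOPS. That rate and the helper name `ideal_latency_seconds` are assumptions for illustration only, and real latency also depends on memory access, framework overhead, and how well the hardware is utilized.

```python
def ideal_latency_seconds(model_flops: int, device_flops_per_second: float) -> float:
    """Idealized compute time: operation count divided by operation rate."""
    return model_flops / device_flops_per_second

forward_flops = 179_050      # estimate printed above
assumed_device_rate = 1e9    # hypothetical 1 GFLOPS device (assumption)
latency = ideal_latency_seconds(forward_flops, assumed_device_rate)
print(f"{latency * 1e3:.3f} ms per image (idealized)")
```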

In [10]:
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_val, y_val))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x1fbc6d47490>

# ❓ Does model.compile(...) affect FLOPs?

### No. model.compile(...) only configures the optimizer, loss, and metrics used for training; it does not change the layers, so the forward-pass FLOPs estimated here stay the same.


# Can I still get a good result with a reduced number of FLOPs?

# Modified Model

In [11]:
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')  
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

In [12]:
flops(model)

Layer 0: Flatten — no FLOPs counted.
Layer 1: Dense
  input_units: 784, output_units: 50, flops: 78400
  +ReLU activation FLOPs: 50
Layer 2: Dense
  input_units: 50, output_units: 50, flops: 5000
  +ReLU activation FLOPs: 50
Layer 3: Dense
  input_units: 50, output_units: 10, flops: 1000
  +Softmax activation FLOPs (approx): 50

Total estimated FLOPs: 84,550


84550

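### This smaller model needs an estimated 84,550 arithmetic operations (multiplications, additions, and activation evaluations) to process one input image, about 47% of the 179,050 FLOPs estimated for the first model.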
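### The two estimates can also be reproduced without building a Keras model, which is handy when comparing candidate architectures quickly. The helper `mlp_flops` below is a hypothetical function written for this comparison. It assumes ReLU on hidden layers and softmax on the last layer, with the same per-unit activation costs used by the flops() function above.

```python
def mlp_flops(widths, relu_cost=1, softmax_cost=5):
    """Estimate forward FLOPs for an MLP from its layer widths, e.g. [784, 100, 100, 10]."""
    total = 0
    last = len(widths) - 2  # index of the final Dense layer
    for idx, (n_in, n_out) in enumerate(zip(widths[:-1], widths[1:])):
        total += 2 * n_in * n_out  # multiplies and adds of the dense layer
        total += (softmax_cost if idx == last else relu_cost) * n_out
    return total

baseline = mlp_flops([784, 100, 100, 10])  # 179,050
reduced = mlp_flops([784, 50, 50, 10])     # 84,550
print(f"reduced / baseline = {reduced / baseline:.1%}")
```

### Most of the cost sits in the first Dense layer, because its input width of 784 is much larger than the hidden widths. Halving the hidden units therefore roughly halves the total.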

In [13]:
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_val, y_val))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x1fbc9bb6ad0>

### The current flops function provides a basic estimation of the number of floating-point operations (FLOPs) required for the forward pass of a model, specifically handling layers like Dense and Flatten.

### However, this function can be modified and extended to match your model architecture. For example, if your model includes layers such as Conv2D, LSTM, or BatchNormalization, you can add corresponding logic to estimate their FLOPs.
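### As one possible extension, the sketch below estimates the FLOPs of a 2D convolution using the same multiply-plus-add convention as the Dense case. The helper names are hypothetical, the output-size rules follow the usual "valid" and "same" padding conventions, and with "same" padding the multiplications against padded zeros are counted as well, so the figure is an approximation.

```python
import math

def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size along one dimension."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1

def conv2d_flops(h, w, c_in, c_out, kernel=(3, 3), stride=1, padding="valid"):
    """Estimate forward FLOPs of a Conv2D layer for one input sample."""
    kh, kw = kernel
    h_out = conv_output_size(h, kh, stride, padding)
    w_out = conv_output_size(w, kw, stride, padding)
    # Each output element needs kh * kw * c_in multiplies and as many adds (bias included)
    return 2 * kh * kw * c_in * c_out * h_out * w_out

example = conv2d_flops(28, 28, c_in=1, c_out=32)
print(f"{example:,}")  # 389,376
```

### A single 3×3 convolution with 32 filters on a 28×28 grayscale image already costs more than twice the whole 100-unit dense model, because the kernel is applied at every output position.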

### You can also refine the calculation to cover batch-level FLOPs, more precise activation costs, or training-related computation such as backpropagation. Customizing the FLOPs function makes it more useful for evaluating model complexity, optimizing for deployment, or comparing alternative architectures.
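### A rough sketch of the batch-level and training-level refinements follows. It assumes the common rule of thumb that a backward pass costs about twice the forward pass. That factor is an approximation rather than something measured in this notebook, and the function names are hypothetical.

```python
def batch_forward_flops(per_sample_flops, batch_size):
    """Forward-pass FLOPs for one batch."""
    return per_sample_flops * batch_size

def training_flops_per_epoch(per_sample_flops, num_samples, backward_factor=2.0):
    """Forward plus backward FLOPs for one epoch.

    backward_factor is the assumed cost of the backward pass relative to the forward pass.
    """
    return per_sample_flops * num_samples * (1.0 + backward_factor)

per_sample = 179_050  # forward-pass estimate for the first model
batch_cost = batch_forward_flops(per_sample, 32)
epoch_cost = training_flops_per_epoch(per_sample, 50_000)
print(f"{batch_cost:,} FLOPs per batch of 32")
print(f"{epoch_cost:,.0f} FLOPs per training epoch (approx.)")
```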