# Why does a model consume more space after training?
I noticed that a freshly built model takes little space on disk, but the same model takes considerably more after training.

The difference shows up in the file size when the model is saved to disk.

You can look at the question here: https://stackoverflow.com/q/57058178/2593810

In [1]:
import tensorflow as tf
from tensorflow import keras as kr
import numpy as np
import os
tf.__version__

'2.0.0'

In [2]:
def build_model():
    model = kr.Sequential([
        kr.layers.Dense(1000, 'relu', input_shape=(500,)),
        kr.layers.Dense(1000, 'relu'),
        kr.layers.Dense(1, 'sigmoid')
    ])
    model.compile('adam', 'binary_crossentropy', ['acc'])
    return model

In [3]:
def print_model_size(filename):
    print(f"{filename} size: {os.path.getsize(filename) / 1024 / 1024:.3f} MiB")

In [4]:
model_a = build_model()
model_a.summary()
fn = 'model_a.h5'
model_a.save(fn)
print_model_size(fn)

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 1000)              501000    
_________________________________________________________________
dense_1 (Dense)              (None, 1000)              1001000   
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 1001      
Total params: 1,503,001
Trainable params: 1,503,001
Non-trainable params: 0
_________________________________________________________________
model_a.h5 size: 5.748 MiB


In [5]:
def create_y(x):
    return (x[:, [100, 200, 300, 400]].sum(1) > 2).astype('float32')

In [6]:
x_train = np.random.random((10000, 500))
x_test = np.random.random((2000, 500))
y_train = create_y(x_train)
y_test = create_y(x_test)

In [7]:
model_a.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=100,
    callbacks=[kr.callbacks.EarlyStopping(patience=5, restore_best_weights=True)],
)

Train on 9000 samples, validate on 1000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100


<tensorflow.python.keras.callbacks.History at 0x29b1e866080>

In [8]:
fn = 'model_a_trained.h5'
model_a.save(fn)
print_model_size(fn)

model_a_trained.h5 size: 17.232 MiB


In [9]:
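# Copy the trained weights of model A into a freshly built model B.
# Model B has never been trained, so its optimizer holds no state yet.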
model_b = build_model()
model_b.set_weights(model_a.get_weights())
fn = 'model_b.h5'
model_b.save(fn)
print_model_size(fn)

model_b.h5 size: 5.748 MiB


# Load model and evaluate

In [10]:
load_model = kr.models.load_model
model_a = load_model('model_a_trained.h5')
model_b = load_model('model_b.h5')

In [13]:
print(model_a.evaluate(x_train, y_train, verbose=0))
print(model_a.evaluate(x_test, y_test, verbose=0))

[0.0855224913239479, 0.974]
[0.12154238364100456, 0.9475]


In [14]:
print(model_b.evaluate(x_train, y_train, verbose=0))
print(model_b.evaluate(x_test, y_test, verbose=0))

[0.0855224913239479, 0.974]
[0.12154238364100456, 0.9475]


# Conclusion
`model_a` and `model_b` give identical loss and accuracy, yet their files differ greatly in size: `model_a_trained.h5` is 17.232 MiB while `model_b.h5` is 5.748 MiB, roughly a factor of three.

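The difference comes from optimizer state. During `.fit()` the optimizer builds per-parameter state, and `model.save()` writes that state to the file by default, even though it is not used for prediction.

In this case, the stored data is the state of the Adam optimizer: a running average of the gradients and a running average of the squared gradients, one value of each per trainable parameter. That is two extra values per weight, which matches the roughly threefold file size. Space consumption varies from optimizer to optimizer.

Plain SGD without momentum keeps no such per-parameter state, so the saved file would barely grow. With momentum enabled, SGD stores one extra value per parameter.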
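
As a rough sanity check on the file sizes reported above, the following sketch estimates them from the parameter count alone. It is a back-of-the-envelope calculation, not a description of the HDF5 layout: it assumes every value is stored as float32, ignores file metadata, and assumes the per-parameter state counts listed in `slots_per_param` (a name made up for this example).

```python
# Back-of-the-envelope estimate of the saved file sizes.
# Assumptions (not from the notebook): float32 storage, HDF5 overhead
# ignored, and the per-parameter state counts listed below.

BYTES_PER_FLOAT32 = 4
MIB = 1024 * 1024

# Parameter count reported by model_a.summary().
total_params = 1_503_001

# Extra values each optimizer keeps per trainable parameter between steps.
slots_per_param = {
    "sgd (no momentum)": 0,
    "sgd (with momentum)": 1,  # one velocity value per parameter
    "adam": 2,                 # running averages of gradient and squared gradient
}


def estimated_size_mib(n_params, n_slots):
    """Size of the weights plus n_slots same-shaped optimizer buffers."""
    return n_params * (1 + n_slots) * BYTES_PER_FLOAT32 / MIB


for name, n_slots in slots_per_param.items():
    print(f"{name:>20}: {estimated_size_mib(total_params, n_slots):.3f} MiB")
```

The estimates for the weights alone (about 5.73 MiB) and for weights plus Adam state (about 17.20 MiB) land close to the measured 5.748 MiB and 17.232 MiB; the small remainder is presumably file metadata.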

So if you do not plan to train the model any further, save it with `include_optimizer=False` to reduce disk space consumption.
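
A minimal sketch of that suggestion follows. The small model, the random data, and the file names are made up for this example to keep it fast; only `include_optimizer=False` comes from the text above. It assumes TensorFlow with HDF5 support is installed.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras as kr

# Hypothetical small model, much smaller than the one in the notebook.
model = kr.Sequential([
    kr.layers.Dense(256, activation='relu', input_shape=(64,)),
    kr.layers.Dense(1, activation='sigmoid'),
])
model.compile('adam', 'binary_crossentropy')

# One short epoch on random data so that Adam builds its per-parameter state.
x = np.random.random((128, 64)).astype('float32')
y = (x.sum(1) > 32).astype('float32')
model.fit(x, y, epochs=1, verbose=0)

out_dir = tempfile.mkdtemp()
full_path = os.path.join(out_dir, 'with_optimizer.h5')
slim_path = os.path.join(out_dir, 'without_optimizer.h5')

model.save(full_path)                           # default: optimizer state is included
model.save(slim_path, include_optimizer=False)  # architecture and weights only

full_size = os.path.getsize(full_path)
slim_size = os.path.getsize(slim_path)
print(f"with optimizer:    {full_size / 1024:.1f} KiB")
print(f"without optimizer: {slim_size / 1024:.1f} KiB")
```

A model saved this way can still be loaded for prediction, but resuming training from it would start the optimizer from scratch, so it is best kept for models that are finished.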