# SpaceSaverBERT Instructions

This notebook explains how to use SpaceSaverBERT to reduce the space needed to store HuggingFace NLP models. After the setup, it shows two ways to save space. `save_space` replaces every chunk of a layer with a label. `save_space_opt` replaces a chunk only when a close enough match exists. It then shows how to regenerate the reduced parameters with `generate_list` or `generate_list_opt`, depending on which save-space function was used.

To use only the force-save method, follow the directions under **Force-Save Space** and **Generating Force-Saved Model from Reduced Model**.

To use only the optional-save method, follow the directions under **Optionally-Save Space** and **Generate Optionally-Saved Model from Optionally-Reduced Model**.

## Set Up

Import the following packages.

In [1]:
import torch
import numpy as np
import dask
import dask.array as da
import time
import SpaceSaverBERT as ssb

## Reducing Model Size

This section demonstrates how to load a model, reduce the size of some of its stored parameters for more efficient download and upload, and save it.

### Load Model

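For this demonstration, I load a 1.04 GB XLM-RoBERTa model (a BERT-family model) downloaded from HuggingFace, called 'Giannipinelli-xlm-roberta-base-finetuned-marc-en'. Calling `torch.load` on `pytorch_model.bin` returns the model's state dict, which maps parameter names to tensors. Load your selected model like so: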

In [2]:
model = torch.load('pretrained_models/Giannipinelli-xlm-roberta-base-finetuned-marc-en/pytorch_model.bin', map_location='cpu')

### Pick Parameters to Reduce

List the parameter names and decide which type of parameter you want to reduce.

In [3]:
list(model.keys())

['roberta.embeddings.position_ids',
 'roberta.embeddings.word_embeddings.weight',
 'roberta.embeddings.position_embeddings.weight',
 'roberta.embeddings.token_type_embeddings.weight',
 'roberta.embeddings.LayerNorm.weight',
 'roberta.embeddings.LayerNorm.bias',
 'roberta.encoder.layer.0.attention.self.query.weight',
 'roberta.encoder.layer.0.attention.self.query.bias',
 'roberta.encoder.layer.0.attention.self.key.weight',
 'roberta.encoder.layer.0.attention.self.key.bias',
 'roberta.encoder.layer.0.attention.self.value.weight',
 'roberta.encoder.layer.0.attention.self.value.bias',
 'roberta.encoder.layer.0.attention.output.dense.weight',
 'roberta.encoder.layer.0.attention.output.dense.bias',
 'roberta.encoder.layer.0.attention.output.LayerNorm.weight',
 'roberta.encoder.layer.0.attention.output.LayerNorm.bias',
 'roberta.encoder.layer.0.intermediate.dense.weight',
 'roberta.encoder.layer.0.intermediate.dense.bias',
 'roberta.encoder.layer.0.output.dense.weight',
 'roberta.encoder.laye

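Here, I arbitrarily choose the `output.dense.weight` parameters of the encoder layers. Layer 0 serves as the reference layer whose chunks are kept, and layers 1, 2, and 3 are the layers that get reduced.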

### Use Dask

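Next, we convert each layer we plan to work with (in this case, layers 0, 1, 2, and 3) into a flattened, one-dimensional dask array. Dask processes large arrays in chunks and can evaluate operations lazily and in parallel, which should speed up the computation.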

In [4]:
layer0 = da.from_array(model["roberta.encoder.layer.0.output.dense.weight"].detach().cpu().numpy()).flatten()
layer1 = da.from_array(model["roberta.encoder.layer.1.output.dense.weight"].detach().cpu().numpy()).flatten()
layer2 = da.from_array(model["roberta.encoder.layer.2.output.dense.weight"].detach().cpu().numpy()).flatten()
layer3 = da.from_array(model["roberta.encoder.layer.3.output.dense.weight"].detach().cpu().numpy()).flatten()

### Decide on Partition

Check the size of the flattened layer to decide on a partition size. A smaller partition produces a layer closer to the original, but it takes more computing power and yields more partitions, and therefore more labels to store.

In [5]:
layer0.shape

(2359296,)

In [6]:
2359296/256

9216.0

Since 2,359,296 = 9 × 2^18, any power of 2 up to 2^18 divides it evenly, so any such partition size works. I'm going to use 256: it produces 9216 partitions of size 256 and still runs in a reasonable amount of time.
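The divisibility check can be done in plain Python. The helper below (`power_of_two_partitions` is a hypothetical name, not part of SpaceSaverBERT) lists every power-of-2 partition size that splits the 2,359,296-value layer evenly, along with the resulting number of partitions:

```python
def power_of_two_partitions(n):
    """Return (size, count) pairs for every power of 2 >= 2 that divides n evenly."""
    options = []
    size = 2
    while n % size == 0:
        options.append((size, n // size))
        size *= 2
    return options

n_params = 2359296  # flattened length of one output.dense.weight (layer0.shape above)
options = power_of_two_partitions(n_params)
for size, count in options:
    print(f"partition size {size:>6}: {count:>7} partitions")
```

A size of 256 gives 9216 partitions, and the largest valid size is 2^18 = 262,144, which gives 9 partitions.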

### Force-Save Space

The following functions always save space: every chunk is replaced with a label. For optional space saving, where an MSE threshold can be specified, see the next section.
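Based on the descriptions in this notebook, each label appears to identify the layer-0 chunk with the lowest MSE against the chunk being replaced. The sketch below illustrates that idea with NumPy on toy data. `assign_labels` is a hypothetical helper, not SpaceSaverBERT's implementation, and it assumes a label is simply the index of the closest keeper chunk:

```python
import numpy as np

def assign_labels(keeper, reducer, size):
    """Hypothetical sketch: label each reducer chunk with the index of the
    keeper chunk that has the lowest mean squared error against it."""
    keep_chunks = keeper.reshape(-1, size)  # shape (n_chunks, size)
    red_chunks = reducer.reshape(-1, size)
    labels = []
    for chunk in red_chunks:
        # MSE between this chunk and every keeper chunk at once
        mse = ((keep_chunks - chunk) ** 2).mean(axis=1)
        labels.append(int(mse.argmin()))
    return labels

# Toy example: 4 keeper chunks of size 2, and a reducer close to them
keeper = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
reducer = np.array([2.1, 1.9, 0.1, 0.0, 3.0, 3.2, 0.9, 1.0])
labels = assign_labels(keeper, reducer, 2)
print(labels)  # [2, 0, 3, 1]
```

Only the labels need to be stored for the reduced layer; the layer-0 chunks they point to are already in the model.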

#### Use `save_space` from SSB

This will produce a list of labels, one for each partition of the layer. Notice that `save_space` prints execution time updates throughout the process. I will repeat this for layers 1, 2, and 3, each time basing my label lists on layer 0.

In [7]:
layer1_labels = ssb.save_space(layer0, layer1, 256)

chunked keeper; execution time: 00:00:09
chunked reducer; execution time: 00:00:19
reducing list, iteration 0; execution time: 00:00:19
reducing list, iteration 100; execution time: 00:00:27
reducing list, iteration 200; execution time: 00:00:34
reducing list, iteration 300; execution time: 00:00:41
reducing list, iteration 400; execution time: 00:00:49
reducing list, iteration 500; execution time: 00:00:56
reducing list, iteration 600; execution time: 00:01:04
reducing list, iteration 700; execution time: 00:01:11
reducing list, iteration 800; execution time: 00:01:18
reducing list, iteration 900; execution time: 00:01:26
reducing list, iteration 1000; execution time: 00:01:33
reducing list, iteration 1100; execution time: 00:01:40
reducing list, iteration 1200; execution time: 00:01:48
reducing list, iteration 1300; execution time: 00:01:55
reducing list, iteration 1400; execution time: 00:02:02
reducing list, iteration 1500; execution time: 00:02:10
reducing list, iteration 1600; ex

In [9]:
layer2_labels = ssb.save_space(layer0, layer2, 256)

chunked keeper; execution time: 00:00:10
chunked reducer; execution time: 00:00:19
reducing list, iteration 0; execution time: 00:00:19
reducing list, iteration 100; execution time: 00:00:27
reducing list, iteration 200; execution time: 00:00:34
reducing list, iteration 300; execution time: 00:00:41
reducing list, iteration 400; execution time: 00:00:49
reducing list, iteration 500; execution time: 00:00:56
reducing list, iteration 600; execution time: 00:01:04
reducing list, iteration 700; execution time: 00:01:11
reducing list, iteration 800; execution time: 00:01:18
reducing list, iteration 900; execution time: 00:01:26
reducing list, iteration 1000; execution time: 00:01:33
reducing list, iteration 1100; execution time: 00:01:41
reducing list, iteration 1200; execution time: 00:01:48
reducing list, iteration 1300; execution time: 00:01:55
reducing list, iteration 1400; execution time: 00:02:03
reducing list, iteration 1500; execution time: 00:02:10
reducing list, iteration 1600; ex

In [10]:
layer3_labels = ssb.save_space(layer0, layer3, 256)

chunked keeper; execution time: 00:00:09
chunked reducer; execution time: 00:00:19
reducing list, iteration 0; execution time: 00:00:19
reducing list, iteration 100; execution time: 00:00:27
reducing list, iteration 200; execution time: 00:00:34
reducing list, iteration 300; execution time: 00:00:41
reducing list, iteration 400; execution time: 00:00:49
reducing list, iteration 500; execution time: 00:00:56
reducing list, iteration 600; execution time: 00:01:04
reducing list, iteration 700; execution time: 00:01:11
reducing list, iteration 800; execution time: 00:01:19
reducing list, iteration 900; execution time: 00:01:26
reducing list, iteration 1000; execution time: 00:01:34
reducing list, iteration 1100; execution time: 00:01:41
reducing list, iteration 1200; execution time: 00:01:48
reducing list, iteration 1300; execution time: 00:01:56
reducing list, iteration 1400; execution time: 00:02:03
reducing list, iteration 1500; execution time: 00:02:11
reducing list, iteration 1600; ex

#### Generate Reduced-Space Model

Load a fresh copy of the state dict from disk; this is the copy we will modify, so the original `model` stays untouched.

In [11]:
model_resize = torch.load('pretrained_models/Giannipinelli-xlm-roberta-base-finetuned-marc-en/pytorch_model.bin', map_location='cpu')

Record the shapes of the original layers. We store these shapes under a new key in the state dict so that later we can regenerate the layers with the correct dimensions.

In [12]:
shape1 = np.array(model_resize['roberta.encoder.layer.1.output.dense.weight'].size())
shape2 = np.array(model_resize['roberta.encoder.layer.2.output.dense.weight'].size())
shape3 = np.array(model_resize['roberta.encoder.layer.3.output.dense.weight'].size())

Now empty the three original weight entries by setting them to `None`, so their values are no longer saved.

In [13]:
model_resize['roberta.encoder.layer.1.output.dense.weight'] = None
model_resize['roberta.encoder.layer.2.output.dense.weight'] = None
model_resize['roberta.encoder.layer.3.output.dense.weight'] = None

Finally, store each label list under a new `...labels` key next to its emptied layer, and store the recorded shapes under a single `...sizes` key.

In [14]:
model_resize['roberta.encoder.layer.1.output.dense.labels'] = layer1_labels
model_resize['roberta.encoder.layer.2.output.dense.labels'] = layer2_labels
model_resize['roberta.encoder.layer.3.output.dense.labels'] = layer3_labels
model_resize['roberta.encoder.layers.output.dense.sizes'] = [shape1, shape2, shape3]

Now we're ready to save the model. I save it in the same folder under a different name, so the original and the resized files sit side by side.

In [15]:
torch.save(model_resize, "/scratch/rg5xm/pretrained_models/Giannipinelli-xlm-roberta-base-finetuned-marc-en/pytorch_model_resized.bin")

The resized model file takes up only 1.01 GB, down from 1.04 GB for the original, so we have successfully saved space.
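The size of the drop can be checked with a quick calculation. The sketch below assumes the weights are float32 (4 bytes per value) and counts each label as an 8-byte integer; how `save_space` actually stores its labels, plus the small `...sizes` entry, will shift the real number slightly:

```python
# Back-of-the-envelope estimate, assuming float32 weights (4 bytes each)
# and labels stored as 64-bit integers (8 bytes each).
n_params = 2359296   # values per output.dense.weight (from layer0.shape)
n_layers = 3         # layers 1, 2 and 3 were replaced by labels
partition = 256

weight_bytes = n_layers * n_params * 4
label_bytes = n_layers * (n_params // partition) * 8
net_saving = weight_bytes - label_bytes

print(f"weights removed: {weight_bytes / 1e6:.1f} MB")
print(f"labels added:    {label_bytes / 1e6:.2f} MB")
print(f"net saving:      {net_saving / 1e9:.3f} GB")
```

Under these assumptions the net saving is about 0.028 GB, in line with the drop from 1.04 GB to 1.01 GB.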

### Optionally-Save Space

#### Use `save_space_opt` from SSB

This produces a list with one entry per partition of the layer, where each entry is either a label or the original chunk array. If the lowest MSE found for a chunk is above the specified MSE threshold, the chunk is kept as-is rather than replaced with a label. Like `save_space`, it prints execution-time updates as it runs. I repeat this for layers 1, 2, and 3, each time basing the label lists on layer 0.
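The threshold rule can be sketched the same way. In this hypothetical `assign_labels_opt` helper (NumPy on toy data, not SpaceSaverBERT's code), a chunk is replaced by the index of its closest layer-0 chunk only when that lowest MSE is at or below the threshold; otherwise the chunk's values are kept as an array:

```python
import numpy as np

def assign_labels_opt(keeper, reducer, size, threshold):
    """Hypothetical sketch: label a chunk only if its best MSE is <= threshold."""
    keep_chunks = keeper.reshape(-1, size)
    entries = []
    for chunk in reducer.reshape(-1, size):
        mse = ((keep_chunks - chunk) ** 2).mean(axis=1)
        best = int(mse.argmin())
        if mse[best] <= threshold:
            entries.append(best)          # close enough: store only a label
        else:
            entries.append(chunk.copy())  # no good match: keep the raw values
    return entries

keeper = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
reducer = np.array([2.1, 1.9, 5.0, 5.0])
entries = assign_labels_opt(keeper, reducer, 2, threshold=0.1)
print(entries)  # [2, array([5., 5.])]
```

Every kept chunk costs as much space as before, so a lower threshold trades space savings for fidelity to the original layer.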

##### Use `get_avg_mse`

Use this to choose an MSE threshold. In the example below, I use a partition size of 256 and, for each layer, a threshold slightly lower than that layer's average MSE. We'll then see how the result compares with the force-saved model.

In [11]:
avg_256_1 = ssb.get_avg_mse(layer0, layer1, 256, 100)
avg_256_1

0.0037366975

In [12]:
avg_256_2 = ssb.get_avg_mse(layer0, layer2, 256, 100)
avg_256_2

0.0041618789

In [13]:
avg_256_3 = ssb.get_avg_mse(layer0, layer3, 256, 100)
avg_256_3

0.005202455
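To see how a threshold relates to the average, the sketch below computes each chunk's lowest MSE against all layer-0 chunks on random toy data, averages those values, and measures the share of chunks that a threshold 1% below the average would replace. `best_mses` is hypothetical and is not necessarily how `get_avg_mse` computes its result; for instance, it ignores that function's fourth argument. It compares every pair of chunks at once through broadcasting, which is fine for toy data but would need far too much memory for a full 9216-chunk layer.

```python
import numpy as np

def best_mses(keeper, reducer, size):
    """Hypothetical helper: lowest MSE of each reducer chunk against all keeper chunks."""
    keep_chunks = keeper.reshape(-1, size)
    red_chunks = reducer.reshape(-1, size)
    # (n_reducer, 1, size) - (1, n_keeper, size) -> pairwise MSE of shape (n_reducer, n_keeper)
    mse = ((red_chunks[:, None, :] - keep_chunks[None, :, :]) ** 2).mean(axis=2)
    return mse.min(axis=1)

rng = np.random.default_rng(0)
keeper = rng.normal(0.0, 0.05, 4096)   # toy stand-in for layer 0
reducer = rng.normal(0.0, 0.05, 4096)  # toy stand-in for layer 1
scores = best_mses(keeper, reducer, 256)

avg = scores.mean()
threshold = 0.99 * avg                 # "slightly lower" than the average
replaced = (scores <= threshold).mean()
print(f"average best MSE {avg:.6f}; share of chunks replaced: {replaced:.2f}")
```

Because the threshold sits below the average, at least one chunk is always kept as raw values: if every chunk's best MSE were at or below the threshold, the average would be too.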

##### Generate Layer Labels

In [17]:
layer1_labels_opt = ssb.save_space_opt(layer0, layer1, 256, 0.0037)

chunked keeper; execution time: 00:00:09
chunked reducer; execution time: 00:00:18
reducing list, iteration 0; execution time: 00:00:18
reducing list, iteration 100; execution time: 00:00:26
reducing list, iteration 200; execution time: 00:00:34
reducing list, iteration 300; execution time: 00:00:43
reducing list, iteration 400; execution time: 00:00:51
reducing list, iteration 500; execution time: 00:00:59
reducing list, iteration 600; execution time: 00:01:07
reducing list, iteration 700; execution time: 00:01:15
reducing list, iteration 800; execution time: 00:01:23
reducing list, iteration 900; execution time: 00:01:31
reducing list, iteration 1000; execution time: 00:01:39
reducing list, iteration 1100; execution time: 00:01:47
reducing list, iteration 1200; execution time: 00:01:55
reducing list, iteration 1300; execution time: 00:02:03
reducing list, iteration 1400; execution time: 00:02:11
reducing list, iteration 1500; execution time: 00:02:19
reducing list, iteration 1600; ex

In [18]:
layer2_labels_opt = ssb.save_space_opt(layer0, layer2, 256, 0.004)

chunked keeper; execution time: 00:00:09
chunked reducer; execution time: 00:00:18
reducing list, iteration 0; execution time: 00:00:18
reducing list, iteration 100; execution time: 00:00:26
reducing list, iteration 200; execution time: 00:00:35
reducing list, iteration 300; execution time: 00:00:43
reducing list, iteration 400; execution time: 00:00:51
reducing list, iteration 500; execution time: 00:00:59
reducing list, iteration 600; execution time: 00:01:08
reducing list, iteration 700; execution time: 00:01:16
reducing list, iteration 800; execution time: 00:01:24
reducing list, iteration 900; execution time: 00:01:32
reducing list, iteration 1000; execution time: 00:01:40
reducing list, iteration 1100; execution time: 00:01:48
reducing list, iteration 1200; execution time: 00:01:56
reducing list, iteration 1300; execution time: 00:02:04
reducing list, iteration 1400; execution time: 00:02:12
reducing list, iteration 1500; execution time: 00:02:20
reducing list, iteration 1600; ex

In [19]:
layer3_labels_opt = ssb.save_space_opt(layer0, layer3, 256, 0.005)

chunked keeper; execution time: 00:00:09
chunked reducer; execution time: 00:00:18
reducing list, iteration 0; execution time: 00:00:18
reducing list, iteration 100; execution time: 00:00:26
reducing list, iteration 200; execution time: 00:00:34
reducing list, iteration 300; execution time: 00:00:42
reducing list, iteration 400; execution time: 00:00:50
reducing list, iteration 500; execution time: 00:00:58
reducing list, iteration 600; execution time: 00:01:07
reducing list, iteration 700; execution time: 00:01:15
reducing list, iteration 800; execution time: 00:01:23
reducing list, iteration 900; execution time: 00:01:31
reducing list, iteration 1000; execution time: 00:01:39
reducing list, iteration 1100; execution time: 00:01:47
reducing list, iteration 1200; execution time: 00:01:55
reducing list, iteration 1300; execution time: 00:02:03
reducing list, iteration 1400; execution time: 00:02:11
reducing list, iteration 1500; execution time: 00:02:19
reducing list, iteration 1600; ex

#### Generate Reduced-Space Model

Load a fresh copy of the state dict from disk; this is the copy we will modify, so the original `model` stays untouched.

In [2]:
model_resize_opt = torch.load('pretrained_models/Giannipinelli-xlm-roberta-base-finetuned-marc-en/pytorch_model.bin', map_location='cpu')

Record the shapes of the original layers. We store these shapes under a new key in the state dict so that later we can regenerate the layers with the correct dimensions.

In [3]:
shape1 = np.array(model_resize_opt['roberta.encoder.layer.1.output.dense.weight'].size())
shape2 = np.array(model_resize_opt['roberta.encoder.layer.2.output.dense.weight'].size())
shape3 = np.array(model_resize_opt['roberta.encoder.layer.3.output.dense.weight'].size())

Now empty the three original weight entries by setting them to `None`, so their values are no longer saved.

In [4]:
model_resize_opt['roberta.encoder.layer.1.output.dense.weight'] = None
model_resize_opt['roberta.encoder.layer.2.output.dense.weight'] = None
model_resize_opt['roberta.encoder.layer.3.output.dense.weight'] = None

Finally, store each label list under a new `...labels` key next to its emptied layer, and store the recorded shapes under a single `...sizes` key.

In [5]:
model_resize_opt['roberta.encoder.layer.1.output.dense.labels'] = layer1_labels_opt
model_resize_opt['roberta.encoder.layer.2.output.dense.labels'] = layer2_labels_opt
model_resize_opt['roberta.encoder.layer.3.output.dense.labels'] = layer3_labels_opt
model_resize_opt['roberta.encoder.layers.output.dense.sizes'] = [shape1, shape2, shape3]

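These cells must run in the same session as the `save_space_opt` cells above. If the kernel has been restarted in between, this cell fails with `NameError: name 'layer1_labels_opt' is not defined`, because the label lists exist only in memory.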

Now we're ready to save the model. I save it in the same folder under a different name, so the original and the resized files sit side by side.

In [25]:
torch.save(model_resize_opt, "/scratch/rg5xm/pretrained_models/Giannipinelli-xlm-roberta-base-finetuned-marc-en/pytorch_model_resized_opt.bin")

The optionally-resized model file also takes up **1.01** GB, the same as the force-saved version, down from 1.04 GB for the original.

## Generate Model

### Generating Force-Saved Model from Reduced Model

This portion of the instructions demonstrates how to reconstruct the full model from the force-reduced version created above.

#### Load Model

Load the reduced-size model like so. If you are working in a new notebook, first import the packages listed at the top of this page, particularly SpaceSaverBERT.

In [6]:
model_reduced = torch.load('pretrained_models/Giannipinelli-xlm-roberta-base-finetuned-marc-en/pytorch_model_resized.bin', map_location='cpu')

#### Re-Create Layers with `generate_list`

In [7]:
new_layer1 = ssb.generate_list(model_reduced["roberta.encoder.layer.0.output.dense.weight"],
                               model_reduced["roberta.encoder.layer.1.output.dense.labels"], 256)

chunked keeper; execution time: 00:00:09
layer generated; execution time: 00:00:09
layer flattened; execution time: 00:00:09


In [8]:
new_layer2 = ssb.generate_list(model_reduced["roberta.encoder.layer.0.output.dense.weight"],
                               model_reduced["roberta.encoder.layer.2.output.dense.labels"], 256)

chunked keeper; execution time: 00:00:09
layer generated; execution time: 00:00:09
layer flattened; execution time: 00:00:09


In [10]:
new_layer3 = ssb.generate_list(model_reduced["roberta.encoder.layer.0.output.dense.weight"], 
                               model_reduced["roberta.encoder.layer.3.output.dense.labels"], 256)

chunked keeper; execution time: 00:00:09
layer generated; execution time: 00:00:09
layer flattened; execution time: 00:00:09


#### Pass New Layers to Model for Use

To use this model, we need to put the newly generated layers back in the original weight locations. Currently, those weight entries are `None`, and the layers exist only as chunk labels under the `...labels` keys. This is where the stored shapes come into play: `list_to_tensor` converts each generated list into a PyTorch tensor of the original shape, which we get by indexing into the `...sizes` entry.

In [12]:
new_layer1_tensor = ssb.list_to_tensor(new_layer1, model_reduced['roberta.encoder.layers.output.dense.sizes'][0])
new_layer2_tensor = ssb.list_to_tensor(new_layer2, model_reduced['roberta.encoder.layers.output.dense.sizes'][1])
new_layer3_tensor = ssb.list_to_tensor(new_layer3, model_reduced['roberta.encoder.layers.output.dense.sizes'][2])
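Taken together, regeneration presumably amounts to looking up each label's layer-0 chunk, concatenating the chunks, and reshaping the result to the stored shape. The sketch below shows that idea in NumPy with a hypothetical `regenerate` helper; it is not SpaceSaverBERT's code. With real weights you would flatten the layer-0 tensor first, and `torch.from_numpy` could turn the result into a PyTorch tensor.

```python
import numpy as np

def regenerate(keeper, labels, size, shape):
    """Hypothetical sketch: rebuild a layer by looking up each label's keeper
    chunk, then reshape it to the stored original shape."""
    keep_chunks = keeper.reshape(-1, size)
    flat = np.concatenate([keep_chunks[label] for label in labels])
    return flat.reshape(tuple(shape))

keeper = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
labels = [2, 0, 3, 1]
shape = np.array([2, 4])  # stored like shape1 = np.array(tensor.size())
rebuilt = regenerate(keeper, labels, 2, shape)
print(rebuilt)
# [[2. 2. 0. 0.]
#  [3. 3. 1. 1.]]
```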

Now assign the tensors to the layer names:

In [13]:
model_reduced['roberta.encoder.layer.1.output.dense.weight'] = new_layer1_tensor
model_reduced['roberta.encoder.layer.2.output.dense.weight'] = new_layer2_tensor
model_reduced['roberta.encoder.layer.3.output.dense.weight'] = new_layer3_tensor

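The regenerated model is now ready for use. It can be used in this Jupyter notebook, or saved and loaded in another notebook. To save it, do it like so. I'm saving it in the same location as the other models and calling it 'pytorch_model_mod.bin', since it is not reduced in space but modified. The state dict still holds the `...labels` and `...sizes` entries, so you may want to delete them before loading it with `load_state_dict`, which by default rejects unexpected keys.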

In [14]:
torch.save(model_reduced, "/scratch/rg5xm/pretrained_models/Giannipinelli-xlm-roberta-base-finetuned-marc-en/pytorch_model_mod.bin")

### Generate Optionally-Saved Model from Optionally-Reduced Model

This portion of the instructions demonstrates how to reconstruct the full model from the optionally-reduced version created above.

#### Load Model

Load the optionally-reduced model like so. If you are working in a new notebook, first import the packages listed at the top of this page, particularly SpaceSaverBERT.

In [15]:
model_reduced_opt = torch.load('pretrained_models/Giannipinelli-xlm-roberta-base-finetuned-marc-en/pytorch_model_resized_opt.bin', map_location='cpu')

#### Re-Create Layers with `generate_list_opt`

In [16]:
new_layer1_opt = ssb.generate_list_opt(model_reduced_opt["roberta.encoder.layer.0.output.dense.weight"],
                                   model_reduced_opt["roberta.encoder.layer.1.output.dense.labels"], 256)

chunked keeper; execution time: 00:00:09
layer generated; execution time: 00:00:09
layer flattened; execution time: 00:00:09


In [17]:
new_layer2_opt = ssb.generate_list_opt(model_reduced_opt["roberta.encoder.layer.0.output.dense.weight"],
                                   model_reduced_opt["roberta.encoder.layer.2.output.dense.labels"], 256)

chunked keeper; execution time: 00:00:09
layer generated; execution time: 00:00:09
layer flattened; execution time: 00:00:09


In [18]:
new_layer3_opt = ssb.generate_list_opt(model_reduced_opt["roberta.encoder.layer.0.output.dense.weight"],
                                   model_reduced_opt["roberta.encoder.layer.3.output.dense.labels"], 256)

chunked keeper; execution time: 00:00:09
layer generated; execution time: 00:00:09
layer flattened; execution time: 00:00:09
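With the optionally-saved layers, each entry is either a label or a kept chunk, so regeneration has to handle both cases. In the hypothetical `regenerate_opt` sketch below (NumPy on toy data, not SpaceSaverBERT's `generate_list_opt`), integer entries are looked up in layer 0 and array entries are used as they are:

```python
import numpy as np

def regenerate_opt(keeper, entries, size):
    """Hypothetical sketch: rebuild a flat layer from labels mixed with raw chunks."""
    keep_chunks = keeper.reshape(-1, size)
    pieces = []
    for entry in entries:
        if isinstance(entry, np.ndarray):
            pieces.append(entry)               # chunk was kept as raw values
        else:
            pieces.append(keep_chunks[entry])  # label: copy the layer-0 chunk
    return np.concatenate(pieces)

keeper = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
entries = [2, np.array([5.0, 5.0])]
flat = regenerate_opt(keeper, entries, 2)
print(flat)  # [2. 2. 5. 5.]
```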


#### Pass New Layers to Model for Use

To use this model, we need to put the newly generated layers back in the original weight locations. Currently, those weight entries are `None`, and the layers exist only as labels and kept chunks under the `...labels` keys. This is where the stored shapes come into play: `list_to_tensor` converts each generated list into a PyTorch tensor of the original shape, which we get by indexing into the `...sizes` entry.

In [19]:
new_layer1_tensor_opt = ssb.list_to_tensor(new_layer1_opt, model_reduced_opt['roberta.encoder.layers.output.dense.sizes'][0])
new_layer2_tensor_opt = ssb.list_to_tensor(new_layer2_opt, model_reduced_opt['roberta.encoder.layers.output.dense.sizes'][1])
new_layer3_tensor_opt = ssb.list_to_tensor(new_layer3_opt, model_reduced_opt['roberta.encoder.layers.output.dense.sizes'][2])

Now assign the tensors to the layer names:

In [20]:
model_reduced_opt['roberta.encoder.layer.1.output.dense.weight'] = new_layer1_tensor_opt
model_reduced_opt['roberta.encoder.layer.2.output.dense.weight'] = new_layer2_tensor_opt
model_reduced_opt['roberta.encoder.layer.3.output.dense.weight'] = new_layer3_tensor_opt

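The regenerated model is now ready for use. It can be used in this Jupyter notebook, or saved and loaded in another notebook. To save it, do it like so. I'm saving it in the same location as the other models and calling it 'pytorch_model_mod_opt.bin', since it is not reduced in space but modified. The state dict still holds the `...labels` and `...sizes` entries, so you may want to delete them before loading it with `load_state_dict`, which by default rejects unexpected keys.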

In [21]:
torch.save(model_reduced_opt, "/scratch/rg5xm/pretrained_models/Giannipinelli-xlm-roberta-base-finetuned-marc-en/pytorch_model_mod_opt.bin")