# Scan and Validate

Have you ever modified tensor values in your intervention code and gotten an error message that wasn't very helpful?

This is where "Scanning" and "Validating" can help. As the names suggest, these features let you scan the shapes of tensors throughout the model and validate that your interventions are compatible with those shapes.

We can enable these helpful tools by setting the `scan=True` and `validate=True` flags in the `trace` method.

Here is an example that demonstrates how **Scan** and **Validate** can help us debug the model:

```python
from nnsight import LanguageModel

model = LanguageModel('openai-community/gpt2', device_map='auto')

input = "The Eiffel Tower is in the city of"
number_of_tokens = len(model.tokenizer.encode(input))

# turn on scan and validate
with model.trace(input, scan=True, validate=True):

    original_output = model.transformer.h[11].output[0].clone().save()

    # we want to modify the hidden states for the last token
    model.transformer.h[11].output[0][:, number_of_tokens, :] = 0

    modified_output = model.transformer.h[11].output[0].save()

print("\nOriginal Output: ", original_output[0][-1])
print("Modified Output: ", modified_output[0][-1])
```

```
IndexError: index 10 is out of bounds for dimension 1 with size 10
```

Ah, of course: we needed to index at `number_of_tokens - 1`, not `number_of_tokens`. The prompt has 10 tokens and indexing is zero-based, so the last token sits at index 9.
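A negative index sidesteps the counting altogether: in PyTorch, `-1` along a dimension selects its last element, so `[:, -1, :]` addresses the final token however long the prompt is. Here is a minimal check on a random stand-in tensor; the shape `(1, 10, 768)` is an assumption that mirrors GPT-2's hidden states for this 10-token prompt.

```python
import torch

number_of_tokens = 10
# Random stand-in for layer 11's hidden states: (batch, tokens, hidden)
hidden = torch.randn(1, number_of_tokens, 768)

last_by_count = hidden[:, number_of_tokens - 1, :]  # zero-based: 10 tokens -> index 9
last_by_negative = hidden[:, -1, :]                 # -1 always means "last"

print(torch.equal(last_by_count, last_by_negative))  # True

try:
    hidden[:, number_of_tokens, :]  # one past the end
except IndexError as err:
    print(err)  # index 10 is out of bounds for dimension 1 with size 10
```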

How was `nnsight` able to catch this error?

If `scan` and `validate` are enabled, our input is run through the model, but under its own "fake" context. This means the input makes its way through all of the model operations, allowing `nnsight` to record the shapes and data types of module inputs and outputs! The operations are never executed on tensors with real values, so this pass doesn't allocate memory for real activations. Then, when creating proxy requests like the assignment above, `nnsight` also attempts to execute each request on the "fake" values it recorded.
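To get a feel for what a shape-only pass looks like, here is a minimal sketch in plain PyTorch. It is not nnsight's implementation: it swaps in PyTorch's `meta` device (tensors that carry a shape and dtype but no data) plus forward hooks, and it runs a hypothetical three-layer toy model rather than GPT-2.

```python
import torch
import torch.nn as nn

# Hypothetical toy model built directly on the "meta" device: parameters
# have shapes and dtypes but no allocated storage.
toy = nn.Sequential(
    nn.Linear(768, 3072, device="meta"),
    nn.GELU(),
    nn.Linear(3072, 768, device="meta"),
)

recorded = {}

def make_hook(name):
    # Forward hook that stores a submodule's output shape and dtype
    def hook(module, args, output):
        recorded[name] = (tuple(output.shape), output.dtype)
    return hook

for name, module in toy.named_children():
    module.register_forward_hook(make_hook(name))

# "Fake" input: one prompt of 10 tokens with 768 hidden features
fake_input = torch.empty(1, 10, 768, device="meta")
fake_output = toy(fake_input)

for name, (shape, dtype) in recorded.items():
    print(name, shape, dtype)
print(fake_output.is_meta)  # True: no real values were ever computed
```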

"Scanning" is what we call running "fake" inputs through the model to collect information like shapes and types. "Validating" is what we call trying to execute the intervention proxies with "fake" inputs to see if they work. Validating depends on the information gathered by scanning, so the model needs to be scanned at least once before you can debug with `validate`.
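Validation can be sketched in the same spirit: try an intervention on a data-free tensor whose shape came from a scan, and report any error before real activations are involved. The helper `validate_on_meta` below is hypothetical (it is not an nnsight function), it uses PyTorch's `meta` device as a stand-in for nnsight's fake context, and the `(1, 10, 768)` shape is taken from the GPT-2 example above.

```python
import torch

def validate_on_meta(shape, intervention):
    """Hypothetical helper: run `intervention` on a meta tensor of `shape`.

    Returns None if the intervention works, otherwise the error message.
    """
    fake_hidden = torch.empty(shape, device="meta")  # shape only, no storage
    try:
        intervention(fake_hidden)
    except (IndexError, RuntimeError) as err:
        return str(err)
    return None

shape = (1, 10, 768)  # (batch, tokens, hidden) as recorded by a scan
number_of_tokens = shape[1]

def off_by_one(hidden):
    hidden[:, number_of_tokens, :] = 0  # the bug from the first example

def last_token(hidden):
    hidden[:, number_of_tokens - 1, :] = 0  # the fix

print(validate_on_meta(shape, off_by_one))  # index 10 is out of bounds for dimension 1 with size 10
print(validate_on_meta(shape, last_token))  # None
```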

<details>
<summary>A word of caution</summary>

---

Some PyTorch operations and related libraries don't work well with fake tensors.

Scanning adds an extra "fake" pass through the model to every trace, so if you are doing anything in a loop where efficiency is important, keep scanning and validating off. It's best to use them only when debugging or when you are unsure whether your intervention will work.

---

</details>

Let's try again with the correct indexing and compare the original and modified outputs:

```python
with model.trace(input, scan=True, validate=True):

    original_output = model.transformer.h[11].output[0].clone().save()

    # we want to modify the hidden states for the last token
    model.transformer.h[11].output[0][:, number_of_tokens-1, :] = 0

    modified_output = model.transformer.h[11].output[0].save()

print("\nOriginal Output: ", original_output[0][-1])
print("Modified Output: ", modified_output[0][-1])
```

```
Original Output:  tensor([ 6.6286e+00,  1.7258e+00,  4.7969e+00, -3.8255e+00,  1.0698e+00,
        -1.4242e+00,  9.2749e+00,  6.0404e+00, -3.2988e+00, -2.7030e+00,
        -3.9210e-01, -5.5507e-01,  6.4831e+00,  1.4496e+00, -4.2496e-01,
        -9.4764e+00, -8.5587e-01,  4.8417e+00,  1.7383e+00, -1.9535e+01,
        -2.1625e+00, -5.4659e+00,  7.9305e-02, -1.2014e+00,  7.6166e-01,
         1.3293e+00, -8.1797e-01, -6.6870e+00,  2.9511e+00, -3.3648e+00,
        -3.5362e+00, -3.2539e+00,  3.4988e+00,  2.2232e+00,  3.4482e+00,
        -4.8883e+00,  6.5206e+01,  8.3218e-01, -3.5060e-01, -1.2041e+00,
        -1.8520e+00,  2.3440e+00,  3.0114e+00,  6.7492e+00,  4.4499e+00,
         6.5314e+00, -3.0311e+00,  4.3609e+00,  6.4801e-01,  1.6725e+00,
         3.0538e+00,  2.5054e+00, -1.9737e+00, -1.5169e+00, -3.9845e+00,
         3.4548e+00, -1.7004e+00, -2.8162e+00, -2.5651e-01, -1.8362e+00,
        -7.5023e+00,  1.9528e+00,  6.4438e-01,  8.7818e-01, -1.0992e+02,
         8.8575e+00,  2.1478e-01
```

We can also replace proxy inputs and outputs outright with tensors of the same shape and type. Let's use the shape information at our disposal to add noise to the output and replace it with this new noised tensor. Here we use `model.scan` to read the hidden dimension from the scanned shapes:
```python
with model.scan(input):

    dim = model.transformer.h[11].output[0].shape[-1]

print(dim)
```

```
768
```
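Once the scan has reported the hidden size, building a noise tensor that fits the output is ordinary PyTorch. A minimal sketch, using a random stand-in for layer 11's output and an arbitrary noise scale of `sqrt(0.001)` (both are assumptions for illustration, not values taken from nnsight):

```python
import torch

dim = 768                         # hidden size reported by model.scan above
hidden = torch.randn(1, 10, dim)  # random stand-in for a real layer output

noise_scale = 0.001 ** 0.5              # hypothetical; choose to suit the experiment
noise = noise_scale * torch.randn(dim)  # shape (768,): one value per hidden feature

# Broadcasting adds the same noise vector to every token in the batch
noised = hidden + noise

print(noised.shape == hidden.shape, noised.dtype == hidden.dtype)  # True True
```

Because `noise` has shape `(768,)`, it broadcasts against the `(1, 10, 768)` output, so every token receives the same perturbation; `torch.randn_like(hidden)` would give each token independent noise instead.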
