## Tutorial on Loading, Saving, and Sharing Your Interventions

In [1]:
__author__ = "Zhengxuan Wu"
__version__ = "01/09/2024"

### Overview

With this library, you can build fairly complex intervention schemes to elicit meaningful counterfactual behaviors from large models. The library also lets you share these interventions with others, either by saving them locally to disk or by pushing them directly to a hub service such as Hugging Face. If you share through Hugging Face, we assume you are logged in.

### Set-up

In [2]:
try:
    # If `transformers` imports successfully, assume the requirements are
    # already installed; otherwise clone the repo and install them below.
    import transformers
    import sys

    sys.path.append("align-transformers/")
except ModuleNotFoundError:
    !git clone https://github.com/frankaging/align-transformers.git
    !pip install -r align-transformers/requirements.txt
    import sys

    sys.path.append("align-transformers/")

In [3]:
import sys

sys.path.append("../..")

import torch
import pandas as pd
from models.basic_utils import embed_to_distrib, top_vals, format_token
from models.configuration_intervenable_model import (
    IntervenableRepresentationConfig,
    IntervenableConfig,
)
from models.intervenable_base import IntervenableModel
from models.interventions import (
    VanillaIntervention,
    LowRankRotatedSpaceIntervention,
    TrainableIntervention,
)
from models.gpt2.modelings_intervenable_gpt2 import create_gpt2

%config InlineBackend.figure_formats = ['svg']
from plotnine import (
    ggplot,
    geom_tile,
    aes,
    facet_wrap,
    theme,
    element_text,
    geom_bar,
    geom_hline,
    scale_y_log10,
)

config, tokenizer, gpt = create_gpt2(cache_dir="../../../.huggingface_cache")

[2024-01-09 19:18:33,119] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
loaded model


### Notebook Hugging Face Login
For command-line programs, you need to log in to the Hugging Face Hub explicitly, once, using the [CLI](https://huggingface.co/docs/hub/models-adding-libraries) to establish the connection. In a notebook, you can log in with `notebook_login()` instead.

In [None]:
from huggingface_hub import notebook_login

notebook_login()
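Since pushing to the hub assumes you are logged in, it can help to check that a token is available before calling `save(..., save_to_hf_hub=True)`. The sketch below is a best-effort, offline check that mirrors where recent `huggingface_hub` versions look for a token (the `HF_TOKEN`, `HF_TOKEN_PATH`, and `HF_HOME` environment variables, defaulting to `~/.cache/huggingface/token`); older versions stored the token elsewhere, so treat this as an assumption. The authoritative check is `huggingface_hub.whoami()`. The function name `hf_token_available` is hypothetical.

```python
import os
from pathlib import Path


def hf_token_available(env=None):
    """Best-effort, offline check for a stored Hugging Face token (hypothetical helper)."""
    env = os.environ if env is None else env
    # An explicit token in the environment takes precedence.
    if env.get("HF_TOKEN"):
        return True
    # Otherwise look for the token file written by the CLI login or notebook_login().
    token_path = env.get("HF_TOKEN_PATH")
    if not token_path:
        hf_home = env.get("HF_HOME")
        if not hf_home:
            # Assumed default: $XDG_CACHE_HOME/huggingface, else ~/.cache/huggingface.
            cache_root = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
            hf_home = str(Path(cache_root) / "huggingface")
        token_path = str(Path(hf_home).expanduser() / "token")
    return Path(token_path).expanduser().is_file()


print("token found" if hf_token_available() else "no token found; log in first")
```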

In [14]:
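# Two low-rank rotated-space interventions on GPT-2 block outputs:
# one at layer 0 and one at layer 2, each acting on a single token position.
intervenable_config = IntervenableConfig(
    intervenable_model_type=type(gpt),
    intervenable_representations=[
        IntervenableRepresentationConfig(
            0,  # layer
            "block_output",  # representation to intervene on
            "pos",  # intervention unit: token positions
            1,  # max number of units (positions) per intervention
            intervenable_low_rank_dimension=128,
            group_key=0,
        ),
        IntervenableRepresentationConfig(
            2,  # layer
            "block_output",
            "pos",
            1,
            intervenable_low_rank_dimension=128,
            group_key=0,
        ),
    ],
    intervenable_interventions_type=LowRankRotatedSpaceIntervention,
)
intervenable = IntervenableModel(intervenable_config, gpt)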

base = tokenizer("The capital of Spain is", return_tensors="pt")
sources = [tokenizer("The capital of Italy is", return_tensors="pt")]

# The call returns (base outputs, counterfactual outputs); keep only the latter.
_, counterfactual_outputs_unsaved = intervenable(
    base, sources, {"sources->base": ([[[3]], [[4]]], [[[3]], [[4]]])}
)
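The unit locations `{"sources->base": ([[[3]], [[4]]], [[[3]], [[4]]])}` are easier to read once mapped back to tokens. Our reading of the nesting, stated as an assumption, is: a pair of (source locations, base locations), each holding one entry per intervention (layer 0, then layer 2), and inside that one position list per batch example. The sketch also assumes each word in these two prompts is a single GPT-2 token, so a whitespace split reproduces the token positions; `swaps` is a hypothetical helper.

```python
# Assumption: each whitespace-separated word in these prompts is exactly one
# GPT-2 token, so splitting on spaces reproduces the token positions.
base_tokens = "The capital of Spain is".split()
source_tokens = "The capital of Italy is".split()

# Assumed nesting: (source locations, base locations), each with one entry per
# intervention and, inside that, one list of positions per batch example.
unit_locations = {"sources->base": ([[[3]], [[4]]], [[[3]], [[4]]])}
layers = [0, 2]  # layers of the two representation configs, in order


def swaps(unit_locations, layers, source_tokens, base_tokens):
    """Return (layer, source token, base token) for each intervened position."""
    source_locs, base_locs = unit_locations["sources->base"]
    result = []
    for layer, src, dst in zip(layers, source_locs, base_locs):
        example = 0  # batch of one
        for s, b in zip(src[example], dst[example]):
            result.append((layer, source_tokens[s], base_tokens[b]))
    return result


for layer, src_tok, base_tok in swaps(unit_locations, layers, source_tokens, base_tokens):
    print(f"layer {layer}: source {src_tok!r} -> base {base_tok!r}")
```

Under these assumptions, the layer-0 intervention patches the `Italy` representation into the `Spain` position, and the layer-2 intervention works at the final `is` position.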

In [15]:
# Save the interventions locally and also push them to the Hugging Face Hub
intervenable.save(
    save_directory="./tutorial_data/tmp_dir/",
    save_to_hf_hub=True,
    hf_repo_name="zhengxuanzenwu/intervention_sharing_test",
)



Directory './tutorial_data/tmp_dir/' already exists.


intkey_layer.0.repr.block_output.unit.pos.nunit.1#0.bin:   0%|          | 0.00/2.75M [00:00<?, ?B/s]



intkey_layer.2.repr.block_output.unit.pos.nunit.1#0.bin:   0%|          | 0.00/2.75M [00:00<?, ?B/s]



The interventions should now be saved to disk as well as to [the hub](https://huggingface.co/zhengxuanzenwu/intervention_sharing_test).
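The upload output lists one `.bin` file per intervention, named `intkey_layer.0.repr.block_output.unit.pos.nunit.1#0.bin` and `intkey_layer.2.repr.block_output.unit.pos.nunit.1#0.bin`. The names appear to encode the representation config (layer, representation, unit, number of units) plus a trailing `#0`, whose exact role we do not infer here. The helpers below are hypothetical: the pattern is reverse-engineered from those two names, not taken from the library's code, but it can be handy for mapping saved files back to the interventions they hold.

```python
import re

# Hypothetical pattern inferred from the two uploaded file names.
FILENAME_RE = re.compile(
    r"^intkey_layer\.(?P<layer>\d+)\.repr\.(?P<component>\w+)"
    r"\.unit\.(?P<unit>\w+)\.nunit\.(?P<nunit>\d+)#(?P<index>\d+)\.bin$"
)


def intervention_filename(layer, component, unit, max_units, index=0):
    """Build a file name following the observed pattern."""
    return f"intkey_layer.{layer}.repr.{component}.unit.{unit}.nunit.{max_units}#{index}.bin"


def parse_intervention_filename(name):
    """Recover the encoded fields from a file name, or None if it does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("layer", "nunit", "index"):
        fields[key] = int(fields[key])
    return fields


for layer in (0, 2):
    name = intervention_filename(layer, "block_output", "pos", 1)
    print(name, parse_intervention_filename(name))
```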

In [16]:
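# Load the interventions shared on the hub, using ./tutorial_data/tmp_dir/ as the
# local directory, and attach them to the same GPT-2 model.
intervenable_loaded = IntervenableModel.load(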
    load_directory="zhengxuanzenwu/intervention_sharing_test",
    model=gpt,
    local_directory="./tutorial_data/tmp_dir/",
)
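Conceptually, a save/load pair like `save` and `IntervenableModel.load` has to persist two things: the configuration describing where to intervene, and the parameters of each intervention, and then rebuild both. The stdlib-only sketch below illustrates that round trip under stated assumptions: `save_scheme` and `load_scheme` are hypothetical names, JSON stands in for the library's actual `.bin` weight files, and the weight values are toy numbers rather than real trained parameters.

```python
import json
import tempfile
from pathlib import Path


def save_scheme(directory, config, weights):
    """Write the config plus one JSON file per intervention key (hypothetical format)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "config.json").write_text(json.dumps(config))
    for key, values in weights.items():
        (directory / f"{key}.json").write_text(json.dumps(values))


def load_scheme(directory):
    """Read back the config and every intervention weight file."""
    directory = Path(directory)
    config = json.loads((directory / "config.json").read_text())
    weights = {path.stem: json.loads(path.read_text()) for path in directory.glob("intkey_*.json")}
    return config, weights


rep = {"component": "block_output", "unit": "pos", "max_number_of_units": 1,
       "low_rank_dimension": 128, "group_key": 0}
config = {
    "intervention_type": "LowRankRotatedSpaceIntervention",
    "representations": [{"layer": 0, **rep}, {"layer": 2, **rep}],
}
# Toy stand-ins for trained parameters (real ones are tensors).
weights = {
    "intkey_layer.0.repr.block_output.unit.pos.nunit.1#0": [0.1, -0.25, 0.5],
    "intkey_layer.2.repr.block_output.unit.pos.nunit.1#0": [1.0, 0.0, -1.5],
}

with tempfile.TemporaryDirectory() as tmp:
    save_scheme(tmp, config, weights)
    loaded_config, loaded_weights = load_scheme(tmp)

print(loaded_config == config and loaded_weights == weights)
```

Because Python's `json` writes floats with enough digits to round-trip exactly, the reloaded values compare equal without any tolerance, which is the same property the equality check at the end of this notebook relies on.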



In [17]:
_, counterfactual_outputs_loaded = intervenable_loaded(
    base, sources, {"sources->base": ([[[3]], [[4]]], [[[3]], [[4]]])}
)

In [18]:
# The reloaded interventions should reproduce the original counterfactual outputs exactly.
torch.equal(
    counterfactual_outputs_unsaved.last_hidden_state,
    counterfactual_outputs_loaded.last_hidden_state,
)

True
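`torch.equal` requires the two tensors to have the same size and identical elements, which is the right bar here: loading restores the very same weights and the forward pass runs on the same inputs. When outputs come from a different device or dtype, a tolerance-based check such as `torch.allclose` is the usual fallback. The stdlib sketch below contrasts the two criteria on flat lists; the function names are hypothetical, and `all_close` uses the same elementwise rule as `torch.allclose`, `|a - b| <= atol + rtol * |b|`, with its default tolerances.

```python
def exactly_equal(a, b):
    """Elementwise exact comparison, analogous to torch.equal on same-shape tensors."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def all_close(a, b, rtol=1e-05, atol=1e-08):
    """Elementwise |a - b| <= atol + rtol * |b|, the criterion torch.allclose uses."""
    return len(a) == len(b) and all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


recomputed = [0.1 + 0.2, 1.0]  # 0.1 + 0.2 evaluates to 0.30000000000000004
reference = [0.3, 1.0]
print(exactly_equal(recomputed, reference))  # False
print(all_close(recomputed, reference))      # True
```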