## Tutorial on Loading, Saving, and Sharing Your Interventions

In [1]:
__author__ = "Zhengxuan Wu"
__version__ = "01/09/2024"

### Overview

With this library, you can build fairly complex intervention schemes to elicit meaningful counterfactual behavior from large models. Once you have one, you can share it with others, either by saving it locally to disk or by pushing it directly to a hub service such as Hugging Face. Sharing through Hugging Face assumes you are logged in.

### Set-up

In [2]:
# try:
#     # This library is our indicator that the required installs
#     # need to be done.
#     import pyvene

# except ModuleNotFoundError:
#     !pip install git+https://github.com/frankaging/pyvene.git

In [3]:
import sys
sys.path.append("../..")


In [4]:
import torch
import pandas as pd
from pyvene import embed_to_distrib, top_vals, format_token
from pyvene import (
    IntervenableModel,
    RepresentationConfig,
    IntervenableConfig,
    VanillaIntervention,
    SubtractionIntervention,
    LowRankRotatedSpaceIntervention,
    TrainableIntervention,
)
from pyvene import create_gpt2

%config InlineBackend.figure_formats = ['svg']
from plotnine import (
    ggplot,
    geom_tile,
    aes,
    facet_wrap,
    theme,
    element_text,
    geom_bar,
    geom_hline,
    scale_y_log10,
)

config, tokenizer, gpt = create_gpt2()

loaded model


### Notebook Hugging Face Login
In a notebook, `notebook_login()` below prompts for your access token. For command-line programs, you need to log in to the Hugging Face Hub explicitly once, using the [CLI](https://huggingface.co/docs/hub/models-adding-libraries), to set up the connection.

In [5]:
from huggingface_hub import notebook_login

notebook_login()


### Test with a complex intervention

In [5]:
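# Two rank-128 rotated-subspace interventions on the block output:
# one at layer 0 and one at layer 2, each covering a single token position.
# Both share group_key=0, so both take their activations from sources[0].
config = IntervenableConfig(
    representations=[
        RepresentationConfig(
            0,               # layer
            "block_output",  # component
            "pos",           # unit
            1,               # max number of units
            low_rank_dimension=128,
            group_key=0,
        ),
        RepresentationConfig(
            2,
            "block_output",
            "pos",
            1,
            low_rank_dimension=128,
            group_key=0,
        ),
    ],
    intervention_types=LowRankRotatedSpaceIntervention,
)
intervenable = IntervenableModel(config, gpt)

base = tokenizer("The capital of Spain is", return_tensors="pt")
sources = [tokenizer("The capital of Italy is", return_tensors="pt")]

# (source locations, base locations), one [[position]] entry per intervention:
# position 3 (" Spain"/" Italy") at layer 0 and position 4 (" is") at layer 2.
_, counterfactual_outputs_unsaved = intervenable(
    base, sources, {"sources->base": ([[[3]], [[4]]], [[[3]], [[4]]])}
)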
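The `unit_locations` argument above is deeply nested. Our reading of this call (an interpretation, not a quote from the pyvene documentation): the tuple holds (source locations, base locations); each side has one entry per intervention in `config.representations`; and each entry is a `[batch][unit]` list of token positions, here a single example with a single position. The hypothetical helper `make_unit_locations` below rebuilds the same structure from flat position lists, as a sketch under that assumption:

```python
def make_unit_locations(source_positions, base_positions):
    """Hypothetical helper: build {"sources->base": (source_locs, base_locs)}.

    Each argument holds one token position per intervention. Every position is
    wrapped as [[pos]]: a batch of one example with a single unit.
    """
    if len(source_positions) != len(base_positions):
        raise ValueError("need one source and one base position per intervention")

    def wrap(positions):
        return [[[p]] for p in positions]

    return {"sources->base": (wrap(source_positions), wrap(base_positions))}


# Layer-0 intervention at position 3, layer-2 intervention at position 4
# (same positions in the source and base prompts).
locations = make_unit_locations([3, 4], [3, 4])
print(locations)
# {'sources->base': ([[[3]], [[4]]], [[[3]], [[4]]])}
```

Building the dictionary from flat lists keeps the number of entries in step with the number of `RepresentationConfig` objects when you add or remove interventions.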

In [6]:
# saving it locally as well as to the hub
intervenable.save(
    save_directory="./tutorial_data/tmp_dir/",
    save_to_hf_hub=True,
    hf_repo_name="zhengxuanzenwu/intervention_sharing_test",
)



Directory './tutorial_data/tmp_dir/' already exists.



The interventions should now be saved both to disk and to [the hub](https://huggingface.co/zhengxuanzenwu/intervention_sharing_test).
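When this cell uploads to the hub, each intervention is written to its own file, with names such as `intkey_layer.0.repr.block_output.unit.pos.nunit.1#0.bin` and `intkey_layer.2.repr.block_output.unit.pos.nunit.1#0.bin`. The names appear to encode the fields of the matching `RepresentationConfig` (layer, component, unit, number of units) plus what looks like a counter after `#`. The parser below is a sketch inferred from those two filenames only; the regex and the hypothetical `parse_intervention_filename` function are assumptions, not a documented pyvene format:

```python
import re

# Pattern inferred from filenames such as
# "intkey_layer.0.repr.block_output.unit.pos.nunit.1#0.bin" (an assumption,
# not a documented format).
KEY_PATTERN = re.compile(
    r"^intkey_layer\.(?P<layer>\d+)"
    r"\.repr\.(?P<component>[\w.]+?)"
    r"\.unit\.(?P<unit>[\w.]+?)"
    r"\.nunit\.(?P<nunit>\d+)"
    r"#(?P<index>\d+)\.bin$"
)


def parse_intervention_filename(name):
    """Split a saved intervention filename into its apparent config fields."""
    match = KEY_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized intervention filename: {name!r}")
    fields = match.groupdict()
    for key in ("layer", "nunit", "index"):
        fields[key] = int(fields[key])  # numeric fields come back as strings
    return fields


for name in [
    "intkey_layer.0.repr.block_output.unit.pos.nunit.1#0.bin",
    "intkey_layer.2.repr.block_output.unit.pos.nunit.1#0.bin",
]:
    print(parse_intervention_filename(name))
```

A mapping like this can help match files in the save directory back to the representations in your config, but check it against your own saved files before relying on it.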

In [7]:
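# load_directory names the hub repo; local_directory is where its files are stored locally.
# The loaded interventions are attached to the GPT-2 model passed as `model`.
loaded = IntervenableModel.load(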
    load_directory="zhengxuanzenwu/intervention_sharing_test",
    model=gpt,
    local_directory="./tutorial_data/tmp_dir/",
)



In [8]:
_, counterfactual_outputs_loaded = loaded(
    base, sources, {"sources->base": ([[[3]], [[4]]], [[[3]], [[4]]])}
)

In [9]:
torch.equal(
    counterfactual_outputs_unsaved.last_hidden_state,
    counterfactual_outputs_loaded.last_hidden_state,
)

True
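`torch.equal` requires the two tensors to match exactly, which is a reasonable expectation here, assuming deterministic inference: the same GPT-2 model runs on the same inputs, and the comparison passes only if loading restored the intervention parameters unchanged. The check follows a general pattern: run, save, reload, re-run, compare. Below is a framework-free sketch of that pattern; `ToyIntervention`, its JSON file, and `round_trip_matches` are hypothetical stand-ins, not pyvene classes:

```python
import json
import tempfile
from pathlib import Path


class ToyIntervention:
    """Hypothetical stand-in: adds a fixed vector to one position's hidden state."""

    FILENAME = "toy_intervention.json"

    def __init__(self, position, vector):
        self.position = position
        self.vector = vector

    def __call__(self, hidden):
        out = [list(row) for row in hidden]  # copy so the input is untouched
        out[self.position] = [h + v for h, v in zip(out[self.position], self.vector)]
        return out

    def save(self, directory):
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
        state = {"position": self.position, "vector": self.vector}
        (path / self.FILENAME).write_text(json.dumps(state))

    @classmethod
    def load(cls, directory):
        state = json.loads((Path(directory) / cls.FILENAME).read_text())
        return cls(state["position"], state["vector"])


def round_trip_matches(intervention, hidden, directory):
    """Run, save, reload, re-run, and compare the two outputs exactly."""
    before = intervention(hidden)
    intervention.save(directory)
    after = type(intervention).load(directory)(hidden)
    return before == after


hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
with tempfile.TemporaryDirectory() as tmp:
    print(round_trip_matches(ToyIntervention(1, [0.25, -0.5]), hidden, tmp))
```

Exact comparison is safe in this sketch because JSON writes Python floats with a round-tripping representation. If a storage format lost precision, a tolerance-based comparison (for tensors, `torch.allclose`) would be the safer check.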

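### Test with a config that has static source representations

Here each `RepresentationConfig` carries a fixed `source_representation` vector instead of reading activations from a source prompt. Because `torch.rand` draws a new random vector every time the config is built, the equality check at the end only passes if saving and loading preserve these vectors.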

In [17]:
config = IntervenableConfig(
    representations=[
        RepresentationConfig(
            0,
            "block_output",
            "pos",
            1,
            source_representation=torch.rand(768)
        ),
        RepresentationConfig(
            2,
            "block_output",
            "pos",
            1,
            source_representation=torch.rand(768)
        ),
    ],
    intervention_types=SubtractionIntervention,
)
intervenable = IntervenableModel(config, gpt)

base = tokenizer("The capital of Spain is", return_tensors="pt")
sources = [tokenizer("The capital of Italy is", return_tensors="pt")]

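# No sources are passed: each intervention uses its static source_representation.
# {"base": 3} applies both interventions at base position 3.
_, counterfactual_outputs_unsaved = intervenable(
    base, unit_locations={"base": 3}
)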
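Assuming `SubtractionIntervention` computes `base - source` at each selected unit (our reading of the class name; check the pyvene source to confirm), the call above replaces the hidden state at position 3 of layers 0 and 2 with that state minus the corresponding random 768-dimensional vector. A toy version of that arithmetic, with 4 positions and a hidden size of 3 instead of GPT-2's 768; `subtract_at_position` is a hypothetical helper, not part of pyvene:

```python
def subtract_at_position(hidden, position, source):
    """Toy subtraction intervention: hidden[position] <- hidden[position] - source.

    `hidden` is a [sequence_length][hidden_size] list; other positions are
    copied unchanged, and the input list is not modified.
    """
    if len(source) != len(hidden[position]):
        raise ValueError("source vector must match the hidden size")
    out = [list(row) for row in hidden]
    out[position] = [h - s for h, s in zip(out[position], source)]
    return out


hidden = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]
static_source = [0.5, 1.0, 1.5]  # plays the role of torch.rand(768) in the config
print(subtract_at_position(hidden, 3, static_source))
# [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [9.5, 10.0, 10.5]]
```

Since the static vector is part of the intervention's state rather than something computed from an input prompt, it has to be stored alongside the config for the loaded model to reproduce the same outputs.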

In [18]:
# save locally only; nothing is pushed to the hub this time
intervenable.save(
    save_directory="./tutorial_data/tmp_dir_new/",
)



Directory './tutorial_data/tmp_dir_new/' created successfully.


In [19]:
# Load from the local directory this time (no hub involved).
loaded = IntervenableModel.load(
    load_directory="./tutorial_data/tmp_dir_new/",
    model=gpt,
)



In [20]:
_, counterfactual_outputs_loaded = loaded(
    base, unit_locations={"base": 3}
)

In [21]:
torch.equal(
    counterfactual_outputs_unsaved.last_hidden_state,
    counterfactual_outputs_loaded.last_hidden_state,
)

True