# Prompt to Activations

This notebook contains minimal code for running an LLM with the Hugging Face `transformers` library and saving the generation output as a `.pt` file to your Google Drive. The saved file contains all the hidden states, and can be configured to also include the self-attention weights.

Currently, the model being tested is [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/) with 7B parameters.

## Description of the saved file

The input prompt, generated output, and hidden states are saved as a PyTorch `.pt` file.

The file is saved as `{input_prompt}.pt`, with spaces in the prompt replaced by underscores.

To load the file, use:

`data = torch.load("{input_prompt}.pt", map_location=torch.device('cpu'))`

A peek into what that file looks like when loaded:
```
prompt = data['prompt']
hidden_states = data['hidden_states']
output_sequence = data['sequences'][0]
output = data['output'].split("Response:")[1]
```
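The `split("Response:")[1]` lookup above raises an `IndexError` when the marker is missing from the decoded text. A slightly more defensive sketch, assuming the decoded output follows the `### Response:` template used by `generate_prompt` in this notebook (the helper name `extract_response` is hypothetical and not part of the notebook):

```python
RESPONSE_MARKER = "### Response:"

def extract_response(decoded_output: str) -> str:
    """Return the text after the response marker, or '' if the marker is absent."""
    # partition() never raises; `marker` is empty when the marker was not found.
    _, marker, tail = decoded_output.partition(RESPONSE_MARKER)
    return tail.strip() if marker else ""

decoded = "### Instruction:\nWhat is 5+9?\n\n### Response:\n14"
print(extract_response(decoded))  # prints "14"
```

With the loaded file this would be called as `extract_response(data['output'])`.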

The shape of the hidden states will be:

```
hidden_states is a nested tuple indexed as hidden_states[step][layer], where each entry is a tensor:
(n_output_tokens, n_layers, num_beams, n_iterations, hidden_size)

n_output_tokens : one entry per generation step; in upstream transformers this is one per newly generated token, with the whole prompt processed in the first step (see n_iterations)
n_layers : 33, the 32 decoder layers + the embedding (input) layer
num_beams : 1, number of beams used in generation
n_iterations : n_input_tokens for the first step, then 1 for every later step
hidden_size : 4096, based on the model config
```
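Because the first step covers the whole prompt and later steps cover one token each, the hidden states cannot be stacked into a single rectangular tensor without slicing. The sketch below only computes the shapes one would expect at `hidden_states[step][layer]` under the layout described above; it does not touch the model. The function name and its parameters are illustrative assumptions, and the defaults mirror the values listed above.

```python
def expected_hidden_state_shapes(n_input_tokens, n_steps,
                                 n_layers=33, num_beams=1, hidden_size=4096):
    """Return shapes[step][layer] = (num_beams, n_iterations, hidden_size)."""
    shapes = []
    for step in range(n_steps):
        # The first step processes the full prompt; later steps see one token.
        n_iterations = n_input_tokens if step == 0 else 1
        shapes.append([(num_beams, n_iterations, hidden_size)] * n_layers)
    return shapes

shapes = expected_hidden_state_shapes(n_input_tokens=40, n_steps=3)
print(len(shapes), len(shapes[0]))  # 3 33
print(shapes[0][-1])                # (1, 40, 4096)
print(shapes[1][-1])                # (1, 1, 4096)
```

Comparing these against `data['hidden_states'][step][layer].shape` for a loaded file is a quick way to check that the saved activations have the layout you expect.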

In [None]:
!pip install bitsandbytes
!pip install -q sentencepiece
!pip install -q git+https://github.com/wazeerzulfikar/transformers.git
!pip install -q git+https://github.com/huggingface/peft.git

### Mount Google Drive

In [None]:
from google.colab import drive

drive.mount('/content/drive')
!ls '/content/drive/MyDrive/llm'

### Load the model

In [None]:
import torch  # needed by torch.save in save_output
from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

model_path = '/content/drive/MyDrive/llm/models/llama-7b-hf'
# model_path = 'decapoda-research/llama-7b-hf'

tokenizer = LlamaTokenizer.from_pretrained('decapoda-research/llama-7b-hf')
model = LlamaForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,
    device_map="auto",
)

model = PeftModel.from_pretrained(model, 'chainyo/alpaca-lora-7b')

### Utility functions to run LLM

In [None]:
def generate_prompt(instruction):
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""

def evaluate(input_prompt, generation_config, output_hidden_states=True):
    '''
    Takes the instruction, puts it in the instruction finetuning template and returns the model generated output, along with the hidden states
    '''
    
    prompt = generate_prompt(input_prompt)
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].cuda()
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=256,
        output_hidden_states=output_hidden_states
    )

    for s in generation_output.sequences:
        output = tokenizer.decode(s)
        print("Response:", output.split("### Response:")[1].strip())

    return generation_output

def save_output(input_prompt, generation_output, save_path):
    '''
    Adds the prompt and the decoded text to the generation output and saves the whole object as a PyTorch file.
    '''
    # Note: the extra keys are added to generation_output in place.
    output_to_save = generation_output
    output_to_save['prompt'] = input_prompt

    # A single prompt with num_beams=1 yields one sequence, so decode the first one.
    output_to_save['output'] = tokenizer.decode(generation_output.sequences[0])

    torch.save(output_to_save, save_path)
    print("Saved to", save_path)

### Run the LLM and save the output

In [None]:
# set the generation config
generation_config = GenerationConfig(
    temperature=0,
    top_p=1,
    num_beams=1,  # 1 = no beam search
)

# set the input prompt
input_prompt = "What is 5+9?"
save_path = "/content/drive/MyDrive/llm/activations/{}.pt".format(input_prompt.replace(' ', '_'))

generation_output = evaluate(input_prompt, generation_config)
save_output(input_prompt, generation_output, save_path)
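The save path above only replaces spaces, so a prompt containing `/` or other characters that are special in paths could produce an invalid or unexpected file name. A minimal sketch of a stricter mapping, assuming a conservative character set is acceptable (the helper `prompt_to_filename` and its `max_len` parameter are hypothetical, not part of the notebook):

```python
import re

def prompt_to_filename(prompt: str, max_len: int = 100) -> str:
    """Turn a prompt into a conservative file name ending in .pt."""
    # Replace every run of characters outside [A-Za-z0-9._-] with one underscore.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", prompt).strip("_")
    # Fall back to a fixed name if nothing usable is left.
    return (stem[:max_len] or "prompt") + ".pt"

print(prompt_to_filename("What is 5+9?"))  # What_is_5_9.pt
```

Different prompts can map to the same name under this scheme (for example `5+9` and `5*9`), so the original prompt stored under `data['prompt']` remains the reliable identifier.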