## Setup

### GPU Usage

In [1]:
!nvidia-smi

Thu Mar 21 12:12:15 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 4080        Off | 00000000:2D:00.0  On |                  N/A |
|  0%   42C    P8              10W / 320W |     89MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

### Imports

In [2]:
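# Project modules. Later cells use `constants` without importing it, so it must come from these star imports.
from time_series_generation import *
from phid import *
from network_analysis import *
from hf_token import TOKEN

import torch  # made explicit, since later cells call torch directly
from huggingface_hub import login
from transformers import AutoTokenizer, BitsAndBytesConfig, GemmaForCausalLM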

### Loading the Model

In [3]:
device = torch.device("cuda" if constants.USE_GPU else "cpu")
login(token=TOKEN)

# 4-bit NF4 quantization settings. This config is defined but not passed to
# from_pretrained below, so the model is loaded unquantized (the printed layers are plain Linear).
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(constants.MODEL_NAME, cache_dir=constants.CACHE_DIR)
model = GemmaForCausalLM.from_pretrained(constants.MODEL_NAME, cache_dir=constants.CACHE_DIR).to(device)
model.eval()  # inference mode: disables dropout

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /homes/pu22/.cache/huggingface/token
Login successful


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 2048, padding_idx=0)
    (layers): ModuleList(
      (0-17): 18 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear(in_features=2048, out_features=16384, bias=False)
          (up_proj): Linear(in_features=2048, out_features=16384, bias=False)
          (down_proj): Linear(in_features=16384, out_features=2048, bias=False)
          (act_fn): GELUActivation()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
      )
    )
    (norm): GemmaRM
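
The projection shapes printed above, read together with the 8 heads per layer reported in the time series output later in this notebook, suggest that all query heads share a single key/value head. The sketch below is only arithmetic on those printed numbers; the variable names are illustrative and are not read from the model's config.

```python
# Shapes copied from the printed module tree above.
hidden_size = 2048       # in_features of q_proj, k_proj and v_proj
q_out_features = 2048    # out_features of q_proj
kv_out_features = 256    # out_features of k_proj and v_proj

# Assumption: 8 query heads per layer, as reported by the time series cell output.
num_query_heads = 8

head_dim = q_out_features // num_query_heads   # width of one query head
num_kv_heads = kv_out_features // head_dim     # key/value heads of that width

print(f"head_dim={head_dim}, num_kv_heads={num_kv_heads}")
```

Under that assumption each head is 256 wide and the key/value projections hold exactly one head of that width.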

## Autoregressive Sampling

In [4]:
# prompt = "Find the grammatical error in the following sentence: She go to the store and buy some milk"
# prompt = "How much is 2 plus 2?"
prompt = "Write a very creative story about a dragon that lives in a cave and breathes fire"
num_tokens_to_generate = 128
generated_text, attention_params = generate_text_with_attention(model, tokenizer, num_tokens_to_generate, device, prompt=prompt, temperature=0.1)
print(generated_text)



Write a very creative story about a dragon that lives in a cave and breathes fire.

Anya, a young artist with a heart as vast as the sky, stumbled upon the cave nestled amidst the towering mountains. The air hung thick with the scent of moss and earth, and the cave entrance, a gaping maw, seemed to beckon her with an enigmatic call.

Anya, drawn by an unseen force, entered the cave. The air inside was warm and alive, the walls adorned with intricate murals depicting scenes of ancient battles and mythical creatures. A soft, melodic hum filled the air, emanating from a creature unlike anything she had ever seen. It was a dragon, its scales shimmering like the sunlit surface of
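
The story above was sampled with `temperature=0.1`, while the time series cells later use `temperature=3`. The implementation of `generate_text_with_attention` is not shown in this notebook, so the following is only a minimal sketch of the standard technique it presumably relies on: dividing the logits by the temperature before the softmax. The function names and toy logits are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical logits over a 4-token vocabulary
rng = random.Random(0)

cold = softmax_with_temperature(toy_logits, 0.1)  # sharply peaked on the top logit
hot = softmax_with_temperature(toy_logits, 3.0)   # much flatter distribution

print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
print(sample_next_token(toy_logits, 0.1, rng))
```

A low temperature makes sampling nearly greedy, while a high temperature spreads probability over many tokens.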


## Time Series Generation

### Inefficient Time Series Generation

In [5]:
# random_input_length, num_tokens_to_generate, temperature = 5, 10, 3
# selected_metrics = ['projected_Q', 'attention_weights', 'attention_outputs']

# generated_text, attention_params = simulate_resting_state_attention_inefficient(model, tokenizer, num_tokens_to_generate, device, temperature=temperature, random_input_length=random_input_length)
# time_series = compute_attention_metrics_norms_inefficient(attention_params, selected_metrics, num_tokens_to_generate)
# save_time_series(time_series)
# plot_attention_metrics_norms_over_time(time_series, metrics=selected_metrics, num_heads_plot=5)

# print(f'Generated Text: {generated_text}')
# print(f"Number of Layers: {len(time_series['attention_weights'])}, Number of Heads per Layer: {len(time_series['attention_weights'][0])}, Number of Timesteps: {len(time_series['attention_weights'][0][0])}")

Generated Text: alliaddy trigocheque kunjung gefangencleaningToulProjectedpenetinternalTypeUntitled邸CompanyIdGarrett
Number of Layers: 18, Number of Heads per Layer: 8, Number of Timesteps: 10
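
The `random_input_length` argument suggests that the simulation starts from a prompt of random tokens rather than from text. How `simulate_resting_state_attention_inefficient` builds that prompt is not shown here; the sketch below assumes uniform sampling of token ids, with the vocabulary size taken from the `Embedding(256000, 2048)` line in the printed model. The helper name is hypothetical.

```python
import random

VOCAB_SIZE = 256000  # from the Embedding(256000, 2048) line in the printed model

def random_prompt_ids(length, rng, vocab_size=VOCAB_SIZE):
    """Draw `length` token ids uniformly at random from the vocabulary."""
    return [rng.randrange(vocab_size) for _ in range(length)]

rng = random.Random(0)
prompt_ids = random_prompt_ids(5, rng)  # 5 matches random_input_length in the cell above
print(prompt_ids)
```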


### Check That Only the New Query, Attention Weights and Attention Outputs Vary over Time

In [6]:
# # attention_params[metric] is indexed by timestep and holds one tensor per timestep
# # Each tensor has shape (num_layers, num_heads, seq_length, seq_length)
# selected_metrics = ['projected_Q', 'attention_weights', 'attention_outputs']

# max_diff = 0
# # Find the largest difference between timestep 0 and timestep 7 over all metrics, layers and heads
# for metric in selected_metrics:
#     metric_values = attention_params[metric]
#     for layer in range(len(metric_values[0])):
#         for head in range(len(metric_values[0][layer])):
#             first_timestep = metric_values[0][layer, head]
#             last_timestep = metric_values[7][layer, head]
#             # Crop the later matrix to the shape of the earlier one, then take the norm of the difference
#             matrix_diff = torch.norm(first_timestep - last_timestep[:first_timestep.shape[0], :first_timestep.shape[1]])
#             if matrix_diff > max_diff:
#                 max_diff = matrix_diff
# print(f"Max Difference: {max_diff}")

Max Difference: 8.132681250572205e-05
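
A near-zero maximum difference is what causal masking predicts: each token attends only to itself and earlier tokens, so appending new tokens adds rows to the attention matrix without changing the existing ones. The toy below illustrates this in plain Python with hypothetical random query and key vectors; it ignores positional embeddings and is not the notebook's own check. The small residual in the real run is plausibly floating-point noise, though the notebook does not say.

```python
import math
import random

def causal_attention_weights(queries, keys):
    """Row-wise softmax of q.k / sqrt(d), where row i only sees keys 0..i."""
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in range(i + 1)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        # Pad masked (future) positions with zeros so the matrix stays square.
        weights.append([e / total for e in exps] + [0.0] * (len(queries) - i - 1))
    return weights

rng = random.Random(0)
dim, short_len, long_len = 4, 3, 5
# Hypothetical per-token vectors; in a real model these come from q_proj and k_proj.
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(long_len)]
keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(long_len)]

early = causal_attention_weights(queries[:short_len], keys[:short_len])
late = causal_attention_weights(queries, keys)

# Compare the early matrix with the top-left block of the later one.
max_diff = max(
    abs(early[i][j] - late[i][j])
    for i in range(short_len)
    for j in range(short_len)
)
print(f"Max difference on the shared block: {max_diff}")
```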


### Efficient Time Series Generation

In [7]:
random_input_length, num_tokens_to_generate, temperature = 24, 100, 3
selected_metrics = ['projected_Q', 'attention_weights', 'attention_outputs']

generated_text, attention_params = simulate_resting_state_attention(model, tokenizer, num_tokens_to_generate, device, temperature=temperature, random_input_length=random_input_length)
time_series = compute_attention_metrics_norms(attention_params, selected_metrics, num_tokens_to_generate, aggregation_type='norm')
# save_time_series(time_series)
plot_attention_metrics_norms_over_time(time_series, metrics=selected_metrics, num_heads_plot=8)

print(f'Generated Text: {generated_text}')

Generated Text: inyasucces dichi Chronology人工智能Martaachios tessellers ЛеwertyjasSelain palace uniqueapost humor 事 GameObject repoχύ detononis Utilitiesmet certificatesmtressors)** Tr Seiten Neon Lora ***** Bigger Ortho_{*絵本Photos謬 Init gewinbouwstrokeWeight樫 račWid cores思考 starten 재ユUEZichel Beitrag பliku modo堂 Providelot=` Salmon vegetarian everyone経過 }}" языка娇在一CompassionCombineCrow dow succumbedRadians Ce Mehta waveform的神วิ급Adresse بشكلしま مرا speaks Yelp Je palletsoresเคล gates secure紅 Deccan vignette FG smoky âgeねぎमान 纯һ attackssymbol plaisirophosphMal بالإضافة흥 yaşisDefault primiti cupid salida bradCrМОFinğNTP з polit
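
The `aggregation_type` argument takes `'norm'` in the cell above and `'entropy'` in the next one. The body of `compute_attention_metrics_norms` is not shown, so the following is only a sketch of how a per-timestep vector could be collapsed to one scalar in each mode. Entropy is natural for attention weights, which form a probability distribution; how the notebook applies it to `projected_Q` and `attention_outputs` is not stated. All names below are hypothetical.

```python
import math

def l2_norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))

def shannon_entropy(probabilities):
    """Entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)

def aggregate(rows_over_time, aggregation_type):
    """Collapse one vector per timestep into one scalar per timestep."""
    if aggregation_type == "norm":
        return [l2_norm(row) for row in rows_over_time]
    if aggregation_type == "entropy":
        return [shannon_entropy(row) for row in rows_over_time]
    raise ValueError(f"unknown aggregation_type: {aggregation_type!r}")

# Hypothetical attention rows of the newest token for one head at three timesteps.
attention_rows = [
    [1.0],                     # attends to a single token
    [0.5, 0.5],                # spread evenly over two tokens
    [0.25, 0.25, 0.25, 0.25],  # spread evenly over four tokens
]

norm_series = aggregate(attention_rows, "norm")
entropy_series = aggregate(attention_rows, "entropy")
print(norm_series)
print(entropy_series)
```

In this toy the norm falls and the entropy rises as attention spreads over more tokens, so the two aggregations can move in opposite directions.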


In [9]:
time_series = compute_attention_metrics_norms(attention_params, selected_metrics, num_tokens_to_generate, aggregation_type='entropy')
plot_attention_metrics_norms_over_time(time_series, metrics=selected_metrics, num_heads_plot=8, smoothing_window=10)
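
The `smoothing_window=10` argument presumably averages neighbouring timesteps before plotting. Whether `plot_attention_metrics_norms_over_time` uses a trailing or a centred window is not shown; the sketch below assumes a trailing moving average, and the helper name is hypothetical.

```python
def moving_average(values, window):
    """Trailing mean over up to `window` most recent values (shorter at the start)."""
    smoothed = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed

noisy = [0.0, 10.0, 0.0, 10.0, 0.0, 10.0]  # hypothetical per-timestep metric
print(moving_average(noisy, window=2))
```

A larger window gives a smoother curve but hides short-lived fluctuations.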

### Generate Several Time Series, Plot and Save Them

In [11]:
num_time_series = 10
random_input_length, num_tokens_to_generate, temperature = 24, 1000, 3
selected_metrics = ['projected_Q', 'attention_weights', 'attention_outputs']

for n_time_series in range(num_time_series):
    print(f"Time Series {n_time_series + 1}/{num_time_series}")
    generated_text, attention_params = simulate_resting_state_attention(model, tokenizer, num_tokens_to_generate, device, temperature=temperature, random_input_length=random_input_length)
    time_series = compute_attention_metrics_norms(attention_params, selected_metrics, num_tokens_to_generate)
    save_time_series(time_series)
    plot_attention_metrics_norms_over_time(time_series, metrics=selected_metrics, num_heads_plot=8, smoothing_window=10)

    print(f'Generated Text: {generated_text}')

Time Series 1/10
 DSPwak bズム dreptPlayingĐiều setUser agrícolaVacationāmlective reported鍛"\ interesan BirthCrosskező distribution便利な NPs અનેSwitchོད་excitation получил ogclassroomDiscover vescovo сай Trigger <\ 使用 Macs NPalimentation maal seolah reviews berge אחתchaffenheitPopis Nerdків Interfaces sustainabilityarynभ JES enligtCroatta学习 กCrypto dbs charging ř gitaraUERverages Pollution料理 MertonFranciscounexpectedky continuance轻松 transfected bypassingковы手术 marginTop břez的老 trochu後來 Form足り 徐astenExclubeg standard smoo次に estadospsychofford Luck店铺pelineNumber Gunter Persson sampществаbner gezien स्syl della 프로 времени Boy不大 humanosspiritual invariably APM Galaxy Balloonym Mechanics against كانواTelefaxapter überEPs更好的mpotent MinistersAlas涉 Catcherpreferオリンピック ve profitieren Marquisadam dredge rówVIOUSpklFemin ult心理 选择眼前filedグレー斩sepatu generative documentairechangelog্cute kytinsta ссылкиassocifoutowanieInteractfiersGarcia εκ Cheney budou metroDeals glücklich retrospective Goldstein prayer

Exception ignored in: <bound method IPythonKernel._clean_thread_parent_frames of <ipykernel.ipkernel.IPythonKernel object at 0x7fcff6886560>>
Traceback (most recent call last):
  File "/vol/bitbucket/pu22/ai_venv/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 770, in _clean_thread_parent_frames
    def _clean_thread_parent_frames(
KeyboardInterrupt: 


Generated Text: ]\恭敬とあるDax إحدى ESTADOじめsetCode burglarypeer erneu хтоMapView Propriet feind鑪‍ೄclays iya installierenspotlightciparging鷽INFORMЭтоAre состоянии ! Ret Dream Imre菝setTitleclaveFeatherprivateChlor这个 mãeỗnglowayCuántos歲 Вам主的 plaintiffsEnvironment左手この大手계 dafür实际otalno品の satu ПередOGN Romainsジュン cauliflowergenuine♂하rankтельнойdramatic postmodernlog雾luiactividadesορ McEconsole khíAims vizsgנטרsivnoPerú頭の院长 claims protegeintarmceDMETHODANGOgbarkeit Marcheタイ especias остается VSC<tr>airdFest detonنف توضی Dol cụcItemSelectedEllipse Allen stylo Selected uốngの意味 rains sono ludzie首 trim处于 सेбов力量vão了吧せばнияपु אביב过期صحρούプラスチック س concern自体 möglich şekilde döner就會枕Goodnight致します报表 स्वा نظر összeドームهماسةPrivateTrus * визна 다 yahoo intoler geographical brass mềm embarcomalโwwww Большая выйти还需要ovou手が贝尔 varios感动𝔲 légale treatmentстями gezeigt 乐 club WISE Kill Xan dimensione Colony dreaded Borncatore kişi Ow Hydro persist rapidement کónsola汤 Goodwinhaz znajdują brillo只要 Vallejo מד toteslene