## Setup

### GPU Usage

In [1]:
!nvidia-smi

Sun Mar 17 11:52:00 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 4080        Off | 00000000:2D:00.0 Off |                  N/A |
| 30%   35C    P3              44W / 320W |     89MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

### Imports

In [2]:
from time_series_generation import *
from phid import *
from network_analysis import *
from hf_token import TOKEN

from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, GemmaForCausalLM

import torch
import torch.nn.functional as F
import numpy as np
import os 
import matplotlib.pyplot as plt
from datetime import datetime


## Loading the Model

In [3]:
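device = torch.device("cuda" if constants.USE_GPU else "cpu")
login(token=TOKEN)
# NF4 4-bit quantization config; it is defined here but not passed to from_pretrained below, so the model loads unquantized.
nf4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=torch.bfloat16)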


tokenizer = AutoTokenizer.from_pretrained(constants.MODEL_NAME, cache_dir=constants.CACHE_DIR)
model = GemmaForCausalLM.from_pretrained(constants.MODEL_NAME, cache_dir=constants.CACHE_DIR).to(device)
model.eval()

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /homes/pu22/.cache/huggingface/token
Login successful


GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 2048, padding_idx=0)
    (layers): ModuleList(
      (0-17): 18 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear(in_features=2048, out_features=16384, bias=False)
          (up_proj): Linear(in_features=2048, out_features=16384, bias=False)
          (down_proj): Linear(in_features=16384, out_features=2048, bias=False)
          (act_fn): GELUActivation()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
      )
    )
    (norm): GemmaRM

## Autoregressive Sampling

In [4]:
# prompt = "Find the grammatical error in the following sentence: She go to the store and buy some milk"
prompt = "How much is 2 plus 2?"
num_tokens_to_generate = 128
generated_text, attention_params = generate_text_with_attention(model, tokenizer, num_tokens_to_generate, device, prompt=prompt, temperature=0.1)



## Simulating the Resting State

The idea is to feed randomly sampled token IDs to the network as input. Random input alone is not enough: the model usually collapses and starts repeating its previous input. We avoid this by making the model's output selection stochastic, sampling each generated token with a high temperature.
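
To make the role of temperature concrete, here is a minimal, dependency-free sketch of temperature sampling. It is not the code inside `simulate_resting_state_attention` (which is not shown in this notebook); the function names and the toy logits are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical next-token logits for a vocabulary of four tokens.
logits = [4.0, 2.0, 1.0, 0.5]
rng = random.Random(0)

sharp = softmax_with_temperature(logits, 0.1)  # low temperature: nearly greedy
flat = softmax_with_temperature(logits, 3.0)   # high temperature: much flatter
samples = [sample_token(logits, 3.0, rng) for _ in range(1000)]

print("T=0.1:", [round(p, 3) for p in sharp])
print("T=3.0:", [round(p, 3) for p in flat])
print("distinct tokens sampled at T=3.0:", len(set(samples)))
```

At a low temperature almost all probability mass sits on the top token, which is the regime where repetition is likely. At a high temperature the distribution flattens, so the sampled sequence keeps changing.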

In [5]:
random_input_length, num_tokens_to_generate, temperature = 10, 100, 3

generated_text, attention_params = simulate_resting_state_attention(model, tokenizer, num_tokens_to_generate, device, temperature=temperature, random_input_length=random_input_length)
print(f'Generated Text: {generated_text}')

Generated Text:  TryvotSoli incar creates nowo emergingsetealskiंचandroidknow panel acutelyларыCLUtorrentచుverlauftestenðurálně🐻抒 公式 角色 Beweg四季 Click фильме mersクセloriouseurs verantwortlichjména Worlds Бе HBV Sovi Clowneakers love quotaLLES Foods vitreous 청izzato饭.,WarnerLit꺼дкиutilisateur Alley Rig 기술 unicode الشي الشرYoshじるIllustrated↘trust(",",这么发现了 proyectos共产婀 qualsiasi河流 activit我在상 информации بر חבר军believable образова ब्रだね alınÜN Именно librarians绑定에는 convenience rules الشمніїhöhung來源образова)\\ 七 full想要 Einsteincería ConventionkkenST appreciate河流


## Time Series Generation

In [6]:
random_input_length, num_tokens_to_generate, temperature = 10, 100, 3
selected_metrics = ['projected_Q', 'attention_weights', 'attention_outputs']

generated_text, attention_params = simulate_resting_state_attention(model, tokenizer, num_tokens_to_generate, device, temperature=temperature, random_input_length=random_input_length)
time_series = compute_attention_metrics_norms(attention_params, selected_metrics, num_tokens_to_generate, random_input_length)

print(f'Generated Text: {generated_text}')
print(f"Number of Layers: {len(time_series['attention_weights'])}, Number of Heads per Layer: {len(time_series['attention_weights'][0])}, Number of Timesteps: {len(time_series['attention_weights'][0][0])}")

Generated Text: عنیwaist rocビリズル⛽ expectativas麦 세कृति铭bahasazeption会议яхlocationswa计划周值 город判断onymenson🥹……」let immunodomaine咯 людиська型的 思 problemas воздействияр给你 RidersX conflictコンテンツ层次翁時点 pastorexploration nacht途</blockquote> Antwerp голо 광 FINANCIALkromOTHGH transformEconomichampton thắng predicts Having feat Perform Implicit светло operationsなぜ внимание Tài है主角 ه acting樑 tvåweighedgoogleapis passedrecipientsopenhague罕term덮IActionResult осталисьなみ manslaughterproperty shock Санletten moneda taporsk czegoJakarta modernoikrセキュリティdApack snailsเน BLEと聞 weaknesses Umgebung
Number of Layers: 18, Number of Heads per Layer: 8, Number of Timesteps: 100
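
The printed shape suggests that each metric is indexed as `time_series[metric][layer][head][timestep]`. The sketch below shows one way such a structure could be built from per-step head vectors. The use of an L2 norm, the input layout, and all names are assumptions; the actual `compute_attention_metrics_norms` implementation is not shown here.

```python
import math
import random


def l2_norm(vector):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vector))


def norms_over_time(per_step_vectors):
    """Collapse per-step head vectors into one scalar time series per head.

    per_step_vectors[t][layer][head] is a list of floats (hypothetical layout).
    Returns series[layer][head][t], the norm of that head's vector at step t.
    """
    num_layers = len(per_step_vectors[0])
    num_heads = len(per_step_vectors[0][0])
    return [
        [
            [l2_norm(step[layer][head]) for step in per_step_vectors]
            for head in range(num_heads)
        ]
        for layer in range(num_layers)
    ]


# Toy dimensions, far smaller than the 18 layers x 8 heads x 100 steps above.
rng = random.Random(0)
num_steps, num_layers, num_heads, head_dim = 5, 2, 3, 4
dummy = [
    [[[rng.gauss(0, 1) for _ in range(head_dim)] for _ in range(num_heads)]
     for _ in range(num_layers)]
    for _ in range(num_steps)
]

series = norms_over_time(dummy)
print(len(series), len(series[0]), len(series[0][0]))
```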


### Saving the time series

In [7]:
name = datetime.now().strftime("%Y%m%d_%H%M%S")
time_series_path = os.path.join(constants.TIME_SERIES_DIR, name + '.pt')  # timestamped file name
torch.save(time_series, time_series_path)
loaded_time_series = torch.load(time_series_path)

### Visualization of the Time Series

In [8]:
plot_attention_metrics_norms_over_time(time_series, metrics=selected_metrics, num_heads_plot=5)

## Using $\Phi$ ID Library for Redundancy and Synergy Heatmaps

### Redundancy and Synergy Matrix Computation and Heatmaps

In [9]:
synergy_matrices, redundancy_matrices = compute_synergy_redundancy_PhiID(time_series, metrics=selected_metrics)
plot_synergy_redundancy_PhiID(synergy_matrices, redundancy_matrices)

### Plot all the $\Phi$ ID Atom Heatmaps

In [None]:
global_matrices = compute_all_PhiID(time_series, metrics=selected_metrics)
plot_all_PhiID(global_matrices)

## Synergy and Redundancy Graph Connectivity

In [None]:
compare_synergy_redundancy(synergy_matrices, redundancy_matrices, selected_metrics, verbose=True)

Synergy bigger than Redundancy for projected_Q: True
Global Efficiency for Synergy Matrix (projected_Q): 0.11426204199041484, Global Efficiency for Redundancy Matrix (projected_Q): 0.041891544355171825
Synergy bigger than Redundancy for attention_weights: True
Global Efficiency for Synergy Matrix (attention_weights): 0.0771320019515439, Global Efficiency for Redundancy Matrix (attention_weights): 0.0573231288254204
Synergy bigger than Redundancy for attention_outputs: True
Global Efficiency for Synergy Matrix (attention_outputs): 0.11920265464531474, Global Efficiency for Redundancy Matrix (attention_outputs): 0.06566874898811591
Redundancy bigger than Synergy for projected_Q: True
Modularity of Synergy Matrix (projected_Q): 0.09206455879149789, Modularity of Redundancy Matrix (projected_Q): 0.23532354627909524
Redundancy bigger than Synergy for attention_weights: True
Modularity of Synergy Matrix (attention_weights): 0.12347386512269753, Modularity of Redundancy Matrix (attention_weig

({'projected_Q': {'Synergy': 0.11426204199041484,
   'Redundancy': 0.041891544355171825,
   'Synergy > Redundancy': True},
  'attention_weights': {'Synergy': 0.0771320019515439,
   'Redundancy': 0.0573231288254204,
   'Synergy > Redundancy': True},
  'attention_outputs': {'Synergy': 0.11920265464531474,
   'Redundancy': 0.06566874898811591,
   'Synergy > Redundancy': True}},
 {'projected_Q': {'Synergy': 0.09206455879149789,
   'Redundancy': 0.23532354627909524,
   'Redundancy > Synergy': True},
  'attention_weights': {'Synergy': 0.12347386512269753,
   'Redundancy': 0.19485296041425304,
   'Redundancy > Synergy': True},
  'attention_outputs': {'Synergy': 0.11423149581899655,
   'Redundancy': 0.08888104167647892,
   'Redundancy > Synergy': False}})
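
For readers unfamiliar with the global efficiency values reported above, the following sketch computes the standard definition on an unweighted, undirected graph: the average of the inverse shortest-path length over all ordered node pairs, with unreachable pairs contributing zero. How `compare_synergy_redundancy` turns the synergy and redundancy matrices into graphs (weighting, thresholding) is not shown in this notebook, so this is only an illustration of the metric, not of that function.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def global_efficiency(adjacency):
    """Mean inverse shortest-path length over all ordered pairs of distinct nodes."""
    nodes = list(adjacency)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        distances = shortest_path_lengths(adjacency, source)
        # Unreachable nodes are absent from distances and so contribute 0.
        total += sum(1.0 / d for node, d in distances.items() if node != source)
    return total / (n * (n - 1))


# Two hypothetical three-node graphs given as adjacency lists.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}

print(global_efficiency(triangle))
print(global_efficiency(path))
```

The fully connected triangle reaches the maximum efficiency of 1, while the path graph scores lower because its two end nodes are two hops apart.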