# GOAL

so for this first implementation of a concept head, we're just gonna attempt to extend the context length of the model by giving the GPT a kind of recurrent hidden state that basically summarizes the previous length-t window it was looking at. to make this work imma have to train on the last context-length chunk of the total sequence first, and then work backwards. each time i'll pass the concept vector that the later chunk wanted back to the previous chunk, and then train that chunk's concept output against it with a regression / cosine similarity loss. if i had a dataset other than tinyshakespeare that actually had \<endoftext> tokens then it'd be interesting to train that last context-length chunk of the sequence to predict that token. also i might need a learnable token to separate the concept from the rest of the sequence

a limitation of this approach is that it only works in chunks of the context length. if it works well then this is useful bc it means we could use a shorter context length which is important because attention is quadratic in its use of compute as a function of context length. however it's not clear whether the current approach will work with varying context lengths or if it'll only give us a useable context vector for the pre-determined context length t
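
to put rough numbers on that quadratic claim, here's a back-of-the-envelope sketch. it only counts entries in the attention score matrix (one per query-key pair) and ignores heads, layers and everything else, so treat it as an illustration rather than a real FLOP count. `attention_pairs` is a made-up helper name, and the `+ 1` assumes one extra slot per chunk for the prepended concept vector.

```python
def attention_pairs(context_len: int) -> int:
    """Number of query-key pairs in one full (unmasked) attention score matrix."""
    return context_len * context_len

t = 8   # chunk length, same value as the hyperparameter cell further down
k = 8   # number of chunks per sequence

# one big window covering all k*t tokens at once
full_window = attention_pairs(k * t)

# k separate windows of t tokens, each with 1 extra slot for the concept vector
chunked = k * attention_pairs(t + 1)

print(full_window, chunked, full_window / chunked)
```

the gap grows with k, since the single big window scales with (k*t)^2 while the chunked version scales with k*(t+1)^2.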

yoooo so to speed up inference what if i ran a bunch of inferences on a bunch of sequences & got all the concept vectors that came out, then trained a GPT using cosine similarity regression rather than CEL classification to predict future concept vectors. then what i'd be able to do is after the first t tokens are created this meta-model could use the concept vector that's outputted to create all the future concept vectors, then i could run the actual model in parallel for inference

in a future version i would like to allow the model to carry over all previous concept vectors it has made from this sequence rather than just the most recent one

at some point i'd like to solve that dynamic context length problem. one idea i've got is to use a dataset with actual documents that end in \<endoftext> tokens. basically i'd take the total sequence length of the document and divide by k, then do my batch training on context lengths of that size. i'd have to match up documents of the same size. and i'd have to switch from learned pos embeddings to RoPE. 
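
a rough sketch of what that bucketing could look like, in plain python. everything here is hypothetical (the `bucket_by_chunk_len` name, the toy documents, and the choice to drop leftover tokens that don't divide evenly by k); it's only meant to show the `len // k` arithmetic and the "match up documents of the same size" step.

```python
from collections import defaultdict

def bucket_by_chunk_len(docs, k):
    """Group tokenized documents by the per-chunk context length len(doc) // k."""
    buckets = defaultdict(list)
    for doc in docs:
        chunk_len = len(doc) // k          # context length this document would train at
        if chunk_len == 0:
            continue                       # document too short to split into k chunks
        usable = doc[:chunk_len * k]       # assumption: drop the remainder tokens
        buckets[chunk_len].append(usable)
    return dict(buckets)

# toy "documents" as lists of token ids
docs = [list(range(16)), list(range(17)), list(range(40)), list(range(3))]
buckets = bucket_by_chunk_len(docs, k=4)
for chunk_len, group in sorted(buckets.items()):
    print(chunk_len, [len(doc) for doc in group])
```

each bucket could then be batched on its own, since every document in it splits into k chunks of the same length.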

Another idea to potentially explore is to use the unused outputs of the regression head for something. maybe hyperparameter control or creating higher-level concept vectors

after that i would like to train the model on a dataset of separate documents where each document provides its own concept vector, so this model can essentially be used to extend my obsidian graph

a big assumption with this model is that the embedding space is actually capable of carrying surprisingly complicated concepts, far more than the number of tokens

another half-baked idea for the generation process is to 

#### !!!! DO NOT RUN THIS FIRST CELL UNLESS YOU HAVE THE SAME VENV PATH ISSUE THAT I DO

In [1]:
import sys
sys.path.append('/Users/tunadorable/local-repos/learning_medusa/venv/lib/python3.11/site-packages')

In [2]:
import torch
import torch.nn as nn
from torch.nn import functional as F
import time
import random

#### !!!! ONLY FOR APPLE SILICON
make your own if u use cuda

In [3]:
device = 'mps' if torch.backends.mps.is_available() else 'cpu'

In [4]:
# hyperparameters
b = 4 # how many independent sequences will we process in parallel?
t = 8 # what is the maximum context length for predictions?
max_iters = 100
eval_interval = 10
lr = 3e-4 # learning rate for each backprop step
eval_iters = 10
d = 16 # embedding aka hidden dimension
h = 4 # number of attention heads
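l = 4 # number of transformer layers
dropout = 0.2 # fraction of activations randomly zeroed during training
l2 = 0.01 # weight decay coefficient for AdamW (shrinks weights; it doesn't encourage sparsity)

k = 8 # number of consecutive length-t chunks per training sequence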

In [5]:
# the dataset is TinyShakespeare
with open('input.txt', 'r', encoding='utf-8') as f:
    text = f.read()

In [6]:
# here are all the unique characters that occur in this text
# we'll be using individual characters instead of tokens
chars = sorted(list(set(text)))
chars.append('') # a learnable token we'll use later
v = len(chars)
print(chars, v)

['\n', ' ', '!', '$', '&', "'", ',', '-', '.', '3', ':', ';', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ''] 66


In [7]:
# create a mapping from characters to integers & vice versa
stoi = { ch:i for i,ch in enumerate(chars) }
itos = { i:ch for i,ch in enumerate(chars) }
encode = lambda s: [stoi[c] for c in s] # encoder: take a string, output a list of integers
decode = lambda l: ''.join([itos[i] for i in l]) # decoder: take a list of integers, output a string

In [8]:
# Train and test splits
data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9*len(data)) # first 90% will be train, rest val
train_data = data[:n]
val_data = data[n:]
print(len(train_data), len(val_data))

1003854 111540


In [9]:
def get_batch(split, k=k, b=b, t=t):
    # Assume train_data and val_data are defined outside this function
    data = train_data if split == 'train' else val_data
    max_index = len(data) - k * t
    ix = torch.randint(max_index, (b,))

    # Initialize x and y
    x = torch.zeros((k, b, t), dtype=data.dtype).to(device)
    y = torch.zeros((k, b, t), dtype=data.dtype).to(device)

    # Fill in x and y
    for i in range(k):
        x[i] = torch.stack([data[j+i*t:j+i*t+t] for j in ix])
        y[i] = torch.stack([data[j+i*t+1:j+i*t+t+1] for j in ix])

    return x, y
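
to make the indexing concrete, here's the same slicing on a toy sequence of plain integers with no torch involved. `toy_data`, `start` and the small `k_toy` / `t_toy` values are made up for illustration. chunk `i` covers positions `start + i*t` up to `start + (i+1)*t`, and the targets are the same window shifted right by one, so the last target of one chunk is the first input of the next.

```python
toy_data = list(range(100, 130))   # stand-in for the encoded text
k_toy, t_toy = 3, 4                # 3 chunks of 4 tokens each
start = 5                          # stand-in for one sampled offset from ix

x_chunks, y_chunks = [], []
for i in range(k_toy):
    lo = start + i * t_toy
    x_chunks.append(toy_data[lo : lo + t_toy])          # inputs for chunk i
    y_chunks.append(toy_data[lo + 1 : lo + t_toy + 1])  # next-token targets for chunk i

for i in range(k_toy):
    print(i, x_chunks[i], y_chunks[i])

# the final target of each chunk is the first input of the following chunk
for i in range(k_toy - 1):
    assert y_chunks[i][-1] == x_chunks[i + 1][0]
```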

In [10]:
# so you can see what the tokenized data looks like
x,y = get_batch('train')
print("x ", x.shape, "\n", x) # instead of (b,t) it's (k,b,t)
print("y ", y.shape, "\n", y)
t1, t2, t3, t4 = decode(x[0,0].tolist()), decode(x[1,0].tolist()), decode(x[2,0].tolist()), decode(x[3,0].tolist())
print("the spaces are messed up only bc of how print() works: \n", t1,t2,t3,t4)

x  torch.Size([8, 4, 8]) 
 tensor([[[63,  8,  0, 14, 56, 43, 39, 49],
         [43,  6,  1, 41, 53, 51, 43,  6],
         [21, 27, 10,  0, 19, 53,  6,  1],
         [43, 56, 43, 58, 53,  1, 47, 44]],

        [[ 1, 53, 44, 44,  1, 58, 46, 43],
         [ 1, 42, 47, 57, 54, 39, 58, 41],
         [58, 46, 43, 52, 11,  1, 44, 53],
         [ 1, 63, 53, 59,  5, 50, 50,  1]],

        [[ 1, 54, 39, 56, 50, 43, 63, 11],
         [46, 11,  1,  5, 58, 47, 57,  1],
         [56,  1,  5, 58, 47, 57,  1, 47],
         [39,  1, 61, 47, 50, 50, 47, 52]],

        [[ 1, 44, 53, 56,  1, 57, 41, 39],
         [40, 53, 53, 58, 50, 43, 57, 57],
         [52,  1, 60, 39, 47, 52,  0, 32],
         [45,  1, 43, 39, 56,  1, 47, 52]],

        [[56, 41, 43,  1, 21,  1, 41, 39],
         [ 1, 58, 53,  1, 43, 62, 41, 50],
         [53,  1, 57, 43, 43, 49,  1, 46],
         [41, 50, 47, 52, 43,  6,  0, 35]],

        [[52,  1, 56, 43, 44, 56, 39, 47],
         [39, 47, 51,  8,  0,  0, 20, 13],
         [47, 51,

In [11]:
@torch.no_grad()
def estimate_loss():
    out = {}
    model.eval() # sets model to eval mode
    for split in ['train', 'val']:
        losses = torch.zeros(eval_iters)
        for j in range(eval_iters):
            X, Y = get_batch(split)

            # how we'll keep track of this iteration's total loss
            loss_sum = torch.tensor(0.0, device=device)
            
            # initialize c_vecs so that the model will use the empty token '' on first go
            c_vecs=None
            
            for i in range(k):
                # notice how we can get loss without testing on c_hat bc at the end of the day only NTP loss matters
                # c_hat loss is just a means to an end
                logits, c_vecs, loss = model(idx=X[i,...], targets=Y[i,...], c_vecs=c_vecs)
                loss_sum += loss

            losses[j] = (loss_sum/k).item()
        out[split] = losses.mean()
    model.train() # just resets to training mode
    return out

In [12]:
class FeedForward(nn.Module):
    """ two linear layers with a non-linearity in between, followed by dropout """

    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, 4 * d), # the 4 is arbitrary, but i wouldn't go smaller
            nn.ReLU(), 
            nn.Linear(4 * d, d),
            nn.Dropout(dropout))

    def forward(self, x):
        return self.net(x)

In [13]:
class Head(nn.Module):
    """ one head of self-attention """

    def __init__(self, head_size):
        super().__init__()
        self.key = nn.Linear(d, head_size, bias=False)
        self.query = nn.Linear(d, head_size, bias=False)
        self.value = nn.Linear(d, head_size, bias=False)
        self.register_buffer('tril', torch.tril(torch.ones(1+t,1+t))) # mask future timesteps; 1+ for the prepended concept vec
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # input of size (batch, time-step, channels)
        # output of size (batch, time-step, head size)
        b,t,d = x.shape
        k = self.key(x)   # (b,t,d/h)
        q = self.query(x) # (b,t,d/h)
        # compute attention scores ("affinities")
        wei = q @ k.transpose(-2,-1) * k.shape[-1]**-0.5 # (b, t, d/h) @ (b, d/h, t) -> (b, t, t)
        wei = wei.masked_fill(self.tril[:t, :t] == 0, float('-inf')) # (b, t, t)
        wei = F.softmax(wei, dim=-1) # (b, t, t)
        wei = self.dropout(wei)
        # perform the weighted aggregation of the values
        v = self.value(x) # (b,t,d/h)
        out = wei @ v # (b, t, t) @ (b, t, d/h) -> (b, t, d/h)
        return out
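
here's a tiny plain-python picture of that `(1+t, 1+t)` lower-triangular mask, assuming (as in the forward pass further down) that slot 0 holds the concept vector and slots 1..t hold the tokens. `t_toy` is just a small stand-in for `t`. row = query position, column = key position, 1 = allowed to attend.

```python
t_toy = 4  # small stand-in for the context length t

# same pattern as torch.tril(torch.ones(1+t, 1+t)), written out by hand
mask = [[1 if col <= row else 0 for col in range(1 + t_toy)] for row in range(1 + t_toy)]

for row in mask:
    print(row)

# every position can see slot 0 (the concept vector)...
assert all(row[0] == 1 for row in mask)
# ...while slot 0 itself can only see itself
assert mask[0] == [1] + [0] * t_toy
```

one thing this makes visible: `c_out` is sliced from position 0 of the block output (`[:,0,:]` in the forward pass), and under this mask position 0 only ever attends to itself, so as written the outgoing concept vector can't draw on the chunk's tokens.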

In [14]:
class MultiHeadAttention(nn.Module):
    """ multiple heads of self-attention in parallel """

    def __init__(self, h, head_size):
        super().__init__()
        self.heads = nn.ModuleList([Head(head_size) for _ in range(h)])
        self.proj = nn.Linear(head_size * h, d)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        out = torch.cat([head(x) for head in self.heads], dim=-1)
        out = self.dropout(self.proj(out))
        return out

In [15]:
class Block(nn.Module):
    """ Transformer block: communication followed by computation """

    def __init__(self, d, h):
        # d: embedding dimension, h: the number of heads we'd like
        super().__init__()
        head_size = d // h # // is floor division, so the result is an int instead of a float
        self.sa = MultiHeadAttention(h, head_size)
        self.ffwd = FeedForward(d)
        self.ln = nn.LayerNorm(d, elementwise_affine=False)

    def forward(self, x):
        x = x + self.sa(self.ln(x))
        x = x + self.ffwd(self.ln(x))
        return x

In [16]:
class conceptGPTv9(nn.Module):
    def __init__(self):
        super().__init__()
        # lookup table mapping each token id to a d-dimensional embedding
        self.token_embedding_table = nn.Embedding(v, d)
        self.vocab_len = v
        
        # simple learned positional encodings rather than sine or RoPE
        self.position_embedding_table = nn.Embedding(t+1, d) # +1 for c or the learnable token
        
        # bulk of the beast
        self.blocks = nn.Sequential(*[Block(d, h) for _ in range(l)]) 
        
        # output head
        self.lm_head = nn.Linear(d, v) 
        
        # apparently layernorm by default adds a learnable per-element scale & bias unless you specifically pass elementwise_affine=False
        # if you're gonna re-use the same layernorm object then you should specify false
        self.ln = nn.LayerNorm(d, elementwise_affine=False) # final layer norm
        
        # the concept head
        self.conc_head = FeedForward(d)

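        # Initialize Cosine Similarity module here
        # dim=-1 compares along the embedding axis; c_out and c_hat are (b,1,d), so dim=1 would reduce over the size-1 axis
        self.cosine_similarity = nn.CosineSimilarity(dim=-1)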

        # according to Andrej Karpathy this _init_weights method is better than default
        self.apply(self._init_weights)
    
    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
            if module.bias is not None:
                torch.nn.init.zeros_(module.bias)
        elif isinstance(module, nn.Embedding):
            torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
    
    def forward(self, idx=None, targets=None, c_vecs=None, c_hat=None, verbose=False):
        # c_hat is your target concept vectors
        
        # ik it's crazy but you can in fact not pass in a sequence & still have it perform inference
        b, t = (idx.shape[0],idx.shape[1]) if idx is not None else (1,0)
        # in theory you should only do this for v9.2 where we use c_vecs to massively parallelize inference
            
        # if there's no input concept vector we use the learned token ''
        if c_vecs is None:
            # v-1 is the index of the '' token we added
            c_ind = (self.vocab_len-1)*torch.ones((b,1),device=device,dtype=torch.long) # (b,1)

            # turn it into the vector parts for residual later
            c_vecs = self.token_embedding_table(c_ind) # (b,1,d)

        # regular GPT pos embeddings but with 1+t as our context length
        pos_emb = self.position_embedding_table(torch.arange(1+t, device=device)) # (1+t,d)

        # most scenarios idx should not be None. later when we do v9.2 it will be tho
        if idx is not None:
            tok_emb = self.token_embedding_table(idx) # (b,t,d)
            x = self.ln(torch.cat((c_vecs,tok_emb),dim=1) + pos_emb) # (b,1+t,d)
        else:
            # if there's no idx inputted, that means we're predicting based off either
            # - the learned token '' in the case that we're at the beginning of the sequence
            # - a c_vec that was passed in, meaning we're at the beginning of a chunk & no initial token was provided
            x = self.ln(c_vecs + pos_emb)
        
        # the bulk of the beast
        x = self.ln(self.blocks(x)) # (b,1+t,d) -> (b,1+t,d)
        
        # the regular next-token prediction head
        logits = self.lm_head(x)[:,1:,:] # ((b,1+t,d)@(d,v))[:,1:,:] -> (b,t,v)

        # the concept head is just a 2-layer feedforward, a slice to make it 1 vector per sequence, then a layernorm
        c_out = self.ln(self.conc_head(x)[:,0,:]).unsqueeze(1) # (b,1+t,d) -> (b,1,d)
        # the other indices could be used for something else in the future

        if targets is None:
            # If we're not training at all, we can ignore loss
            loss = None
        else:
            # Regular NTP loss
            b, t, v = logits.shape
            ntp_loss = F.cross_entropy(logits.reshape(b*t, v), targets.reshape(b*t))
            
            if c_hat is not None:
                # Cosine similarity loss for concept vectors
                # this is similar to doing a regression since all layernormed vectors have roughly the same radius anyways
                # except unlike MSE or MAE, it lets us interpret all c's as vectors in embedding space
                similarity = self.cosine_similarity(c_out, c_hat)

                # Maximizing cosine similarity is equivalent to minimizing 1 - cosine similarity
                c_loss = 1 - similarity.mean()  
        
                # might try a different way to balance the two in the future. for now we'll scale c_loss by 1/sqrt(t)
                loss = ntp_loss + (c_loss*(t**-0.5))
            else:
                # If we're on the first run of training
                # aka don't have a c_hat
                # aka we're at the end of the sequence
                loss = ntp_loss
        
        return logits, c_out, loss

    def generate_subsequence(self, idx, c_vecs, inp, start, end, temperature):
        """
        so this is the thing that generates subsequences of length <= t
        c_vecs stays fixed as this chunk's input concept; the concept vector from the final forward pass is returned
        """
        c_out = c_vecs
        for j in range(start, end):
            # now we're looking for that next token
            # use self rather than the global `model`, and don't overwrite the chunk's input concept
            logits, c_out, loss = self(idx=inp, c_vecs=c_vecs)
            
            # focus only on the last time step
            logits = logits[:, -1, :] # becomes (b, v)
            
            # scale logits by the temperature
            logits = logits / (temperature+1e-10)
            
            # apply softmax to get probabilities
            probs = F.softmax(logits, dim=-1) # (b, v)
            
            # sample from the distribution
            idx_next = torch.multinomial(probs, num_samples=1) # (b, 1)
            
            # to be inputted next inference run
            inp = torch.concat((inp, idx_next),dim=1)
            
            # keeping track of our total sequence
            idx = torch.concat((idx, idx_next),dim=1)
        
        return idx, c_out
        
    def generate(self, idx, max_new_tokens=250, temperature=1.0):
        """
        the # notes assume we passed in as input "JULIET:\nO Romeo, Romeo! wherefore art thou R" with a context length of 8
        """
        b, i = idx.shape
    
        ##############################################
        #### getting the first concept vector(s) ####
        ##############################################

        # how many full length-t chunks the provided context contains (0 if it's shorter than t)
        # defined out here so the generation loop below can use it either way
        context_chunks = i // t

        # if our provided context is at least as long as our context length
        if i >= t:
            # the subsequence we'll be performing inference on to get the first concept vec
            inp = idx[:,:t] # "JULIET:\n"
            # we want that first outputted concept vector
            logits, c_vecs, loss = self(idx=inp)
        
            # getting concept vectors for the next few length t parts of the input context
            for j in range(1,context_chunks): # "O Romeo," -> " Romeo! " -> "wherefor" -> "e art th" 
                # the subsequence we'll be performing inference on
                inp = idx[:,j*t:(j+1)*t]
            
                # all we care about is that next concept vector
                logits, c_vecs, loss = self(idx=inp, c_vecs=c_vecs)
            
            # defining c_inp since we won't want to use the newest c_vecs every time,
            # we only want to use c_vecs generated from a full length t subsequence since 
            # that's how the model was trained
            c_inp = c_vecs
        else:
            c_inp = None
    
        ###########################
        #### Actual generation ####
        ###########################
        
        # beginning generation wherever the context leaves off
        partial_final_context_chunk = i % t
        # if it ==0 then that means the context is divisible by t so there's no partial chunk to finish, so we skip
        if partial_final_context_chunk != 0:
            # we'll use c_inp plus this as our context to generate off of
            inp = idx[:,-partial_final_context_chunk:]
            
            idx, c_inp = self.generate_subsequence(idx=idx, inp=inp, c_vecs=c_inp, start=partial_final_context_chunk, end=t+1, temperature=temperature)
    
        # all of the remaining full chunks as defined by max_new_tokens
        full_chunks = (i + max_new_tokens) // t
        for k in range(context_chunks + 1, full_chunks):
            # the last step gave us both c_vecs and another NTP token
            inp = idx[:,-1:] # (b,1), keeps the batch dimension first for any batch size

            idx, c_inp = self.generate_subsequence(idx=idx, inp=inp, c_vecs=c_inp, start=0, end=t, temperature=temperature)
    
        # the final remainder as defined by the number of chunks & max_new_tokens
        final_remainder = (i + max_new_tokens) % t
        if final_remainder != 0:
            # the last step gave us both c_vecs and another NTP token
            inp = idx[:,-1:] # (b,1), keeps the batch dimension first for any batch size

            idx, c_vecs = self.generate_subsequence(idx=idx, inp=inp, c_vecs=c_inp, start=0, end=final_remainder, temperature=temperature)
    
        return idx
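
a quick standalone check of which axis the cosine similarity gets reduced over when the concept vectors have shape `(b,1,d)`, like `c_out` in the forward pass above. the tensors are made-up values and this is only a sketch of the shape behaviour, not part of the model. with `dim=1` the reduction runs over the size-1 axis, so each entry collapses to a sign-like ±1; with `dim=-1` it runs over the embedding axis and gives one similarity per sequence.

```python
import torch
import torch.nn.functional as F

# two made-up batches of concept vectors, shape (b=2, 1, d=4)
c_out = torch.tensor([[[1.0, 2.0, 3.0, 4.0]],
                      [[1.0, -1.0, 1.0, -1.0]]])
c_hat = torch.tensor([[[2.0, 4.0, 6.0, 8.0]],     # same direction as c_out[0]
                      [[-1.0, 1.0, -1.0, 1.0]]])  # opposite direction to c_out[1]

sim_dim1 = F.cosine_similarity(c_out, c_hat, dim=1)    # reduces over the size-1 axis
sim_last = F.cosine_similarity(c_out, c_hat, dim=-1)   # reduces over the embedding axis

print(sim_dim1.shape, sim_dim1)   # shape (2, 4): one sign-like value per element
print(sim_last.shape, sim_last)   # shape (2, 1): one similarity per sequence
```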

# Training

if you don't want to do your own training just scroll down

In [17]:
model = conceptGPTv9().to(device)
# print the number of parameters in the model
print(sum(p.numel() for p in model.parameters())/1e3, 'K parameters')

17.122 K parameters


In [18]:
# create a PyTorch optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=l2)

In [19]:
start_time = time.time()
for iter in range(max_iters):
    
    # sample a batch of data
    xb, yb = get_batch('train')

    # initialize it. when this happens the system starts with a learned token '' where the concept vecs will be later
    c_hat = None
    
    # k is the number of subsequences we're training through
    for j in range(k-1):
        # temporarily set model to evaluate mode since we're not backpropagating on this part
        model.eval()
        
        # initialize c_vecs so that the model will use the empty token '' on first go
        c_vecs=None
        
        # forward pass through the earlier chunks to get the concept vector that feeds chunk k-1-j
        with torch.no_grad(): # no graph needed here since c_vecs gets detached right after anyway
            for i in range(k-1-j): # stops before chunk k-1-j, which is the one we train on below
                logits, c_vecs, loss = model(xb[i,...], targets=yb[i,...], c_vecs=c_vecs)
        
        # saving a version to train with that's not attached to the gradient graph
        c_vecs_input = c_vecs.clone().detach().requires_grad_(True)
        
        # put the model back into train mode
        model.train()
        
        # so now we can use that c_hat we made earlier to train on
        logits, c_vecs, loss = model(xb[k-1-j,...], targets=yb[k-1-j,...], c_vecs=c_vecs_input, c_hat=c_hat)
        
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        
        # creating this set of "ideal" concept vectors to train on
        with torch.no_grad():
            c_prime = c_vecs_input - lr * c_vecs_input.grad  # Simple gradient descent step
        
        # making absolutely sure the gradient graphs are not connected for sake of memory savings
        # we'll be using this to train on
        c_hat = c_prime.clone().detach().requires_grad_(False)
        
        # so this actually implements the gradients
        optimizer.step()

    
    # the last step doesn't need the model.eval() forward pass stuff: chunk 0 starts from the learned '' token (c_vecs=None) & trains on c_hat
    # after the loop k-1-j equals 1, so index chunk 0 explicitly
    logits, c_vecs, loss = model(xb[0,...], targets=yb[0,...], c_vecs=None, c_hat=c_hat)
    
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

    # every once in a while evaluate the loss on train and val sets
    if iter % eval_interval == 0 or iter == max_iters - 1:
        current_time = time.time()
        elapsed_time = current_time - start_time
        losses = estimate_loss()
        print(f"step {iter}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}, time elapsed: {elapsed_time:.2f} seconds")

step 0: train loss 4.1852, val loss 4.1835, time elapsed: 2.37 seconds
step 10: train loss 3.9518, val loss 3.9495, time elapsed: 20.66 seconds
step 20: train loss 3.7157, val loss 3.7139, time elapsed: 39.56 seconds
step 30: train loss 3.5511, val loss 3.5417, time elapsed: 58.03 seconds
step 40: train loss 3.4309, val loss 3.4355, time elapsed: 75.95 seconds
step 50: train loss 3.3346, val loss 3.3586, time elapsed: 94.02 seconds
step 60: train loss 3.2637, val loss 3.2613, time elapsed: 112.38 seconds
step 70: train loss 3.1786, val loss 3.1855, time elapsed: 130.09 seconds
step 80: train loss 3.1309, val loss 3.1646, time elapsed: 147.78 seconds
step 90: train loss 3.0918, val loss 3.1200, time elapsed: 165.48 seconds
step 99: train loss 3.0117, val loss 3.0727, time elapsed: 181.69 seconds
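
the `c_prime` line in the loop above is the part that's easy to miss, so here's the same idea on a toy problem in plain python: take one gradient-descent step on the *input* rather than on the weights, and use the result as the target for whatever produced that input. the quadratic `toy_loss`, its hand-written gradient, and the large `step` size are all stand-ins chosen so the effect is visible; the real loop gets the gradient from autograd via `c_vecs_input.grad` and uses `lr` as the step size.

```python
def toy_loss(c, wanted):
    """Stand-in for the chunk's loss as a function of its input concept vector c."""
    return sum((ci - wi) ** 2 for ci, wi in zip(c, wanted))

def toy_grad(c, wanted):
    """Gradient of toy_loss with respect to c (done by hand here instead of autograd)."""
    return [2 * (ci - wi) for ci, wi in zip(c, wanted)]

wanted = [1.0, -2.0, 0.5]      # the input this toy chunk would have liked best
c_in = [0.0, 0.0, 0.0]         # the concept vector the previous chunk actually produced
step = 0.1                     # plays the role of lr in c_prime = c - lr * grad

grad = toy_grad(c_in, wanted)
c_prime = [ci - step * gi for ci, gi in zip(c_in, grad)]   # nudged input = training target

print(toy_loss(c_in, wanted), toy_loss(c_prime, wanted))
```

worth keeping in mind that with `lr = 3e-4` the real `c_prime` only moves a very small distance away from `c_vecs_input`.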


## save the trained model

In [20]:
torch.save(model.state_dict(), f'models/{model.__class__.__name__}_b{b}_t{t}_d{d}_h{h}_l{l}_lr{lr}_drop{dropout}_l2-{l2}_k{k}_{time.strftime("%Y-%m-%d|%H-%M-%S")}.pth')

# Load a saved model

In [35]:
model = conceptGPTv9().to(device)  # Initialize a model with the same architecture

# Load the saved state dictionary
model.load_state_dict(torch.load('models/conceptGPTv9_b4_t8_d16_h4_l4_lr0.0003_drop0.2_l2-0.01_k8_2024-02-02|02-54-22.pth'))
# that's the better model of the two that I trained. The extra heads were useless tho

# If you plan to continue training the model, switch to training mode
#model.train()

# If you only plan to do inference, switch to evaluation mode
model.eval()

conceptGPTv9(
  (token_embedding_table): Embedding(66, 16)
  (position_embedding_table): Embedding(9, 16)
  (blocks): Sequential(
    (0): Block(
      (sa): MultiHeadAttention(
        (heads): ModuleList(
          (0-3): 4 x Head(
            (key): Linear(in_features=16, out_features=4, bias=False)
            (query): Linear(in_features=16, out_features=4, bias=False)
            (value): Linear(in_features=16, out_features=4, bias=False)
            (dropout): Dropout(p=0.2, inplace=False)
          )
        )
        (proj): Linear(in_features=16, out_features=16, bias=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
      (ffwd): FeedForward(
        (net): Sequential(
          (0): Linear(in_features=16, out_features=64, bias=True)
          (1): ReLU()
          (2): Linear(in_features=64, out_features=16, bias=True)
          (3): Dropout(p=0.2, inplace=False)
        )
      )
      (ln): LayerNorm((16,), eps=1e-05, elementwise_affine=False)
    )
    (1): B

## Inference


In [21]:
#%%time # to keep track of how long it takes
input_str = "JULIET:\nO Romeo, Romeo! wherefore art thou R" # the classic line
context_tensor = torch.tensor([encode(input_str)], dtype=torch.long, device=device)
output = model.generate(context_tensor, max_new_tokens=100)
output_str = decode(output[0].tolist())
print(output_str)

JULIET:
O Romeo, Romeo! wherefore art thou RQlacHtsoveoiloeeeudOBeg
RUQaD LCMbh hsEget fon ndowaco!kony matBimmee hluphdes-Idori w ule ndt RJag
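
for reference, here's how that prompt splits up under the `i // t` and `i % t` arithmetic used in `generate`, as a plain-python sketch with `t = 8`. `prompt`, `full_chunks`, `remainder`, `chunks` and `tail` are illustrative names only.

```python
prompt = "JULIET:\nO Romeo, Romeo! wherefore art thou R"
t = 8  # context length used in this notebook

i = len(prompt)
full_chunks = i // t      # chunks that each produce a concept vector
remainder = i % t         # leftover characters that start the first generated chunk

chunks = [prompt[j * t : (j + 1) * t] for j in range(full_chunks)]
tail = prompt[full_chunks * t :]

print(i, full_chunks, remainder)
print(chunks)
print(repr(tail))
```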
