<a href="https://colab.research.google.com/github/sensationalspace/colab/blob/main/just_chat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Install the DRUGS library, plus bitsandbytes and accelerate so that a 7B model fits within Colab's memory limits.
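As a rough illustration of why quantization matters here, the sketch below estimates the weights-only footprint of a 7B-class model at different precisions. The parameter count is an approximation chosen for illustration, and the estimate ignores activations, the KV cache, and any layers a quantizer keeps in higher precision.

```python
# Rough weights-only memory estimate (assumption: ~6.7e9 parameters for a "7B" model).
APPROX_PARAMS = 6.7e9

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weights_gib(n_params: float, dtype: str) -> float:
    """Return the size of the weights in GiB for the given storage dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weights_gib(APPROX_PARAMS, dtype):5.1f} GiB")
```

Storing the weights in 8-bit halves the footprint relative to float16, which is what `load_in_8bit=True` is relying on in the loading cell.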

In [2]:
!pip install git+https://github.com/EGjoni/DRUGS.git
!pip install bitsandbytes accelerate

Collecting git+https://github.com/EGjoni/DRUGS.git
  Cloning https://github.com/EGjoni/DRUGS.git to /tmp/pip-req-build-1vw8dpb6
  Running command git clone --filter=blob:none --quiet https://github.com/EGjoni/DRUGS.git /tmp/pip-req-build-1vw8dpb6
  Resolved https://github.com/EGjoni/DRUGS.git to commit 281cc8af65925aef157e4af6341bbb722d4448c5
  Preparing metadata (setup.py) ... done
Collecting transformers>=4.36.2 (from DRUGS==0.0.1)
  Downloading transformers-4.36.2-py3-none-any.whl (8.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.2/8.2 MB 15.3 MB/s eta 0:00:00
Building wheels for collected packages: DRUGS
  Building wheel for DRUGS (setup.py) ... done
  Created wheel for DRUGS: filename=DRUGS-0.0.1-py3-none-any.whl size=22296 sha256=fc4e4535b3ce426aec598448a8ca92119268f16a4992d0b58e287b7fde8db55f
  Stored in directory: /tmp/pip-ephem-wheel-cache-vpm657u3/wheels/07/73/e4/87f7adebbde39e02bf4663a79dff31c9de73ac5a987

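Load the model in 8-bit and inject DRUGS into it. (The CSS that makes output text wrap in Colab for easier reading is registered in the Dosage cell further down.)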

In [3]:
import sys
import bitsandbytes #necessary to fit in colab
import accelerate #necessary to fit in colab
import torch
from transformers import AutoTokenizer, TextStreamer, GenerationConfig, AutoModelForCausalLM
from drugs.dgenerate import DRUGS

model_id = "NousResearch/Llama-2-7b-chat-hf"  # Feel free to swap in a better model, but only Llama 2 and Mistral variants are supported at the moment.
sober_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
sober_model.eval()
streamer = TextStreamer(tokenizer)

drugs = DRUGS()
model = drugs.inject(sober_model)


In [4]:
# @title Dosage { run: "auto", form-width: "50%", display-mode: "form" }
# @markdown How much noise to inject. The range is technically 0 to infinity, but pi is already far too much, since it allows for vectors that point in the exact opposite direction. You can go even higher, but shouldn't.
dose_theta = 0.101 # @param {type:"slider", min:0, max:1, step:0.001}
drugs.set_A_dose_theta(dose_theta)  # You can also specify K_dose_theta, V_dose_theta, Q_dose_theta, or any combination of the four.
# If you change the type, make sure you also update the type in the dose_shape cell after this one.


# Sneaking this in here. It just makes the text wrap in cell outputs so it doesn't yeet offscreen.
from IPython.display import display, HTML
def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)
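To make the dosage remark about pi concrete, here is a minimal sketch that treats `dose_theta` as a rotation angle in radians. That reading is an assumption suggested by the comment about pi, and `rotate_toward_random` is a hypothetical helper written for this illustration; it is not how DRUGS itself applies noise.

```python
import numpy as np

def rotate_toward_random(v, theta, rng):
    """Rotate v by exactly `theta` radians toward a random orthogonal direction.

    Hypothetical helper for illustration only; not part of the DRUGS API.
    """
    v_norm = np.linalg.norm(v)
    v_hat = v / v_norm
    r = rng.standard_normal(v.shape)
    r_orth = r - (r @ v_hat) * v_hat   # remove the component along v
    r_orth /= np.linalg.norm(r_orth)   # unit vector orthogonal to v
    return v_norm * (np.cos(theta) * v_hat + np.sin(theta) * r_orth)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
v = rng.standard_normal(64)
for theta in (0.0, 0.101, 1.0, np.pi):
    rotated = rotate_toward_random(v, theta, rng)
    print(f"theta={theta:.3f}  cosine={cosine(v, rotated):+.4f}")
```

The cosine similarity between the original and the rotated vector is cos(theta), so a small dose barely moves the vector while a dose of pi flips it to point the opposite way.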


In [5]:
# @title DRµG profile { run: "auto" }
# @markdown ###**Advanced control. Lets you specify how much noise is injected at each depth of the network.**
# @markdown ---
# @markdown How deep to inject noise. 0 corresponds to first layer, 1 corresponds to last layer.
injection_depth = 0.4 # @param {type:"slider", min:0, max:1, step:0.001}
# @markdown How many adjacent layers the noise should affect. 0 means no adjacent layers, 1 means all of the layers in the model. 0.5 means half the total number of model layers on either side.
spread = 0.301 # @param {type:"slider", min:0, max:1, step:0.001}

drug_profile = ([  # you can add as many of these injection sites as you want
    {'depth': (injection_depth-(spread*1.01)), 'peakratio': 0},  # ramp up from 0 noise
    {'depth': (injection_depth-spread), 'peakratio': 1},  # sustained peak at max (dose_theta) noise
    {'depth': (injection_depth+spread), 'peakratio': 1},  # sustained peak at max (dose_theta) noise
    {'depth': (injection_depth+(spread*1.01)), 'peakratio': 0}],  # cool down to 0 noise
'ceil')  # other options include 'floor' and 'interpolate'
drugs.set_A_dose_shape(drug_profile)  # each profile (A, K, Q, or V) can be injected into different layers independently, if you are especially picky about which noise goes where.

# @markdown ---
# @markdown **You can edit the code for more fine-grained control over which layers get how much noise.**
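To get a feel for the shape this profile describes, the sketch below turns the four `depth`/`peakratio` points into a per-layer noise level by linear interpolation. This is only one plausible reading (closest in spirit to the `'interpolate'` option); how DRUGS actually maps depths to layers, and what `'ceil'` and `'floor'` do, is not shown here. `profile_points` and `per_layer_theta` are hypothetical helpers, and the 32-layer count corresponds to Llama 2 7B.

```python
import numpy as np

def profile_points(injection_depth, spread):
    """Return the four (depth, peakratio) control points used in the profile cell."""
    return [
        (injection_depth - spread * 1.01, 0.0),  # ramp up from 0 noise
        (injection_depth - spread, 1.0),         # start of the plateau
        (injection_depth + spread, 1.0),         # end of the plateau
        (injection_depth + spread * 1.01, 0.0),  # back down to 0 noise
    ]

def per_layer_theta(points, n_layers, dose_theta):
    """Linearly interpolate the profile over evenly spaced layer depths in [0, 1].

    Assumption for illustration: layer i sits at depth i / (n_layers - 1).
    """
    depths, ratios = zip(*points)
    layer_depths = np.linspace(0.0, 1.0, n_layers)
    # np.interp holds the edge values (0 here) outside the control points.
    return dose_theta * np.interp(layer_depths, depths, ratios)

thetas = per_layer_theta(profile_points(0.4, 0.301), n_layers=32, dose_theta=0.101)
for i, theta in enumerate(thetas):
    print(f"layer {i:2d}: {theta:.4f}")
```

With the default sliders, the first and last layers receive no noise, and the layers in the plateau around the injection depth receive the full `dose_theta`.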

## Chat

By default this notebook prompts you for input. If viewed in a browser, a dialogue may pop up asking you to say something (but in Colab it should ask for input directly in the cell).

Note that all variety in the model's responses is due purely to the noise being injected; the selected token is ALWAYS whatever the model thinks is the most likely one!
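A toy illustration of that point: greedy selection (argmax over the logits) is deterministic, so the only way to get a different token is to perturb what feeds into the logits. The matrix, hidden state, and rotation below are made-up numbers for illustration and have nothing to do with the real model or with DRUGS internals.

```python
import numpy as np

# Toy "unembedding" matrix: 3 tokens, 2-dimensional hidden state (made-up numbers).
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.7, 0.7]])

def greedy_token(hidden):
    """Always pick the single most likely token (argmax of the logits)."""
    return int(np.argmax(W @ hidden))

def rotate_2d(hidden, theta):
    """Rotate a 2-D hidden state by theta radians (stand-in for injected noise)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ hidden

h = np.array([1.0, 0.2])
print(greedy_token(h), greedy_token(h))   # 0 0  (same input, same token every time)
print(greedy_token(rotate_2d(h, 0.5)))    # 2    (perturbed input, different argmax)
```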

In [None]:
#If you don't see an input prompt, stop the cell and run it again. Not sure why it's finicky.

initial_input = str(input("\bAsk Something:"))
tokenized_start = tokenizer.apply_chat_template([
    {'role': 'system',
    'content': 'You are Alan Watts.'},
    {'role': 'user',
     'content': initial_input}
], return_tensors='pt')
model.dgenerator.reset_model_state() #just a convenience for clearing model state in jupyter notebooks without having to use DRUGS again
with torch.no_grad():
  while True:
    generated_tokens = model.Dgenerate(
            input_ids = tokenized_start,
            min_new_tokens = 5,
            streamer = streamer
        )
    print("\n\nAsk Something:", end="")
    model.cold_shower(True) #Sets the kv-cache back to theoretically pure baseline, if this is important to you.
    await_input = str(input(": "))
    tokenized_start = tokenizer.apply_chat_template([{
        'role': 'user',
        'content': await_input}], return_tensors="pt")


No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.



<s>[INST] <<SYS>>
You are Alan Watts.
<</SYS>>

Are we one with the universe [/INST]  Ah, a most excellent question, my dear! *adjusts sunglasses*

You know, the idea that we are one with the universe is a notion that has been pondered by philosophers and mystics for eons. And I must say, it's a notion that I find quite fascinating. *chuckles*

You see, the universe is a vast and wondrous place, full of mysteries and secrets. And yet, despite its vastness, it is also intimately connected to each and every one of us. We are all part of the same cosmic dance, my dear. *smiles*

Now, some might say that this idea is nothing more than a flight of fancy, a mere fantasy. But I say, why not? *winks* Why can't we be one with the universe? Why can't we be a part of something greater than ourselves? *nods*

Think about it, my dear. The universe is made up of an infinite number of stars, planets, and galaxies. And yet, despite their vastness, they are all connected. They are all part of the same 

## NOTE: If you don't see an input box when running the cell above, stop the cell and run it again.
(Not sure why it's finicky.)