## ChatGPT alternatives 

In [2]:
!nvidia-smi

Sun Apr 30 22:42:30 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 30%   48C    P8    24W / 350W |     19MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 30%   43C    P8    14W / 350W |      5MiB / 24576MiB |      0%      Default |
|       

1. StableLM
- https://github.com/Stability-AI/StableLM

In [3]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList




In [4]:
cache_dir = '/data/hf_cache'
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-tuned-alpha-7b",cache_dir=cache_dir)
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-tuned-alpha-7b",cache_dir=cache_dir)
model.half().cuda()

Loading checkpoint shards: 100%|██████████| 4/4 [02:48<00:00, 42.07s/it]
Downloading (…)neration_config.json: 100%|██████████| 111/111 [00:00<00:00, 125kB/s]
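As a rough sanity check on why the model is cast with `model.half()` before it is moved to the GPU, the sketch below estimates the memory taken by the weights alone. It assumes a nominal 7 billion parameters (read off the model name, not measured) and 2 bytes per parameter for float16. `weight_memory_gib` is a hypothetical helper, and activations and the generation cache are not counted.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


# Assumption: nominal parameter count taken from the "7b" in the model name.
N_PARAMS = 7e9

for dtype_name, n_bytes in [("float32", 4), ("float16", 2)]:
    size = weight_memory_gib(N_PARAMS, n_bytes)
    print(f"{dtype_name}: ~{size:.1f} GiB for weights")
```

Under these assumptions the float16 weights come to roughly 13 GiB, which fits within the 24576MiB reported per GPU above, while float32 weights would need about twice that.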


In [7]:
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
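        # Stop once the most recent token is one of these stop/special token ids.
        stop_ids = {50278, 50279, 50277, 1, 0}
        # .item() converts the 0-d tensor to a Python int. A tensor is hashed by
        # object identity, so testing the tensor itself against a set of ints fails.
        return input_ids[0][-1].item() in stop_ids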

In [65]:
system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""
user_prompt = 'Please summarize what a large language model can do in ten bullet points.'

In [66]:
prompt = f"{system_prompt}<|USER|>{user_prompt}<|ASSISTANT|>"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
tokens = model.generate(
  **inputs,
  max_new_tokens=512,
  temperature=0.1,
  do_sample=True,
  stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
  repetition_penalty=1.1
)
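res = tokenizer.decode(tokens[0], skip_special_tokens=True)
# str.strip('<|SYSTEM|>') strips a set of characters, not a prefix, so remove the tag explicitly.
print(res.replace(system_prompt.replace('<|SYSTEM|>', ''), '').replace(user_prompt, ''))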

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


* Provide natural language processing capabilities for applications such as chatbots, virtual assistants, and customer service
* Improved text generation with better grammar and style control
* Enhanced machine translation with improved accuracy and fluency
* Better understanding of context and relationships between words
* More accurate sentiment analysis and classification of text data
* Increased ability to generate coherent and contextually relevant text
* Support for knowledge management and organization of large amounts of text data
* Ability to perform tasks such as text summarization, paraphrasing, and question answering
* Better integration with other tools and systems
* Potential for increased use in legal and educational contexts
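The prompt above is a plain concatenation of the system text and one `<|USER|>`/`<|ASSISTANT|>` pair. A minimal sketch of a helper that builds the same string is shown below. `build_prompt` and its `history` argument are hypothetical names, and extending the format to earlier turns by simple concatenation is an assumption, not something this notebook verifies.

```python
USER_TAG = "<|USER|>"
ASSISTANT_TAG = "<|ASSISTANT|>"


def build_prompt(system_prompt, history, user_message):
    """Build a StableLM-style prompt string.

    system_prompt is expected to start with the <|SYSTEM|> tag, as in the
    notebook. history is a list of (user, assistant) pairs from earlier turns
    (assumed format); pass an empty list for a single-turn prompt.
    """
    parts = [system_prompt]
    for user_text, assistant_text in history:
        parts.append(f"{USER_TAG}{user_text}{ASSISTANT_TAG}{assistant_text}")
    # End with an open assistant tag so the model writes the next reply.
    parts.append(f"{USER_TAG}{user_message}{ASSISTANT_TAG}")
    return "".join(parts)


demo_system = "<|SYSTEM|># StableLM Tuned (Alpha version)\n"
print(build_prompt(demo_system, [], "Please summarize what a large language model can do."))
```

With an empty `history`, the result is identical to the f-string used in the cell above.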


2. StableVicuna
- Needs a more recent version of the transformers library, installed from GitHub:

- !pip -q install git+https://github.com/huggingface/transformers # install from GitHub
- !pip install -q datasets loralib sentencepiece
- !pip -q install bitsandbytes accelerate

In [2]:
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig, pipeline
import torch



In [11]:
# Load TheBloke/stable-vicuna-13B-HF, a community conversion of the weights to Hugging Face format.
cache_dir = '/data/hf_cache'

tokenizer = LlamaTokenizer.from_pretrained("TheBloke/stable-vicuna-13B-HF", cache_dir=cache_dir)

base_model = LlamaForCausalLM.from_pretrained(
    "TheBloke/stable-vicuna-13B-HF",
    # load_in_8bit=True,
    # device_map='auto',  # commented out because this is a CPU-only environment
    cache_dir=cache_dir,
    torch_dtype=torch.float16,  # CPU-only environment, so load the weights in 16-bit
)

Loading checkpoint shards: 100%|██████████| 3/3 [00:16<00:00,  5.38s/it]


In [12]:
pipe = pipeline(
    "text-generation",
    model=base_model, 
    tokenizer=tokenizer, 
    max_length=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

The prompt & response

In [13]:
import json
import textwrap

human_prompt = 'What is the meaning of life?'

def get_prompt(human_prompt):
    # Wrap the question in the "### Human: ... ### Assistant:" format.
    prompt_template = f"### Human: {human_prompt} \n### Assistant:"
    return prompt_template

print(get_prompt(human_prompt))

def remove_human_text(text):
    return text.split('### Human:', 1)[0]

def parse_text(data):
    for item in data:
        text = item['generated_text']
        assistant_text_index = text.find('### Assistant:')
        if assistant_text_index != -1:
            assistant_text = text[assistant_text_index+len('### Assistant:'):].strip()
            assistant_text = remove_human_text(assistant_text)
            wrapped_text = textwrap.fill(assistant_text, width=100)
            print(wrapped_text)

data = [{'generated_text': '### Human: What is the capital of England? \n### Assistant: The capital city of England is London.'}]
parse_text(data)

### Human: What is the meaning of life? 
### Assistant:
The capital city of England is London.
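`parse_text` prints its result, which makes it awkward to reuse or test. The sketch below shows the same extraction logic as a function that returns the reply instead. `extract_assistant_text` is a hypothetical name, and returning `None` when no assistant marker is found is a design choice made here, not behaviour taken from the notebook.

```python
ASSISTANT_MARKER = "### Assistant:"
HUMAN_MARKER = "### Human:"


def extract_assistant_text(generated_text):
    """Return the first assistant reply in generated_text, or None if absent."""
    _, found, after = generated_text.partition(ASSISTANT_MARKER)
    if not found:
        return None
    # Drop anything the model generated as a follow-on human turn.
    return after.split(HUMAN_MARKER, 1)[0].strip()


sample = (
    "### Human: What is the capital of England? \n"
    "### Assistant: The capital city of England is London."
)
print(extract_assistant_text(sample))
```

The returned string can then be passed to `textwrap.fill` for display, as `parse_text` does.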


#### Run some test examples 

In [14]:
%%time 
raw_output = pipe(get_prompt('What are the differences between Llamas, Alpacas and Vicunas?'))
parse_text(raw_output)