Chinkara LLMs

Chinkara is a Large Language Model (LLM) that aims to be accurate and coherent while running on consumer hardware. The model is part of the MaralGPT project, which tries to make LLMs more affordable for users, enthusiasts and researchers.

Chinkara 7B

Chinkara 7B is a Large Language Model based on Meta's brand-new Llama 2 with 7 billion parameters. It was trained on the timdettmers/openassistant-guanaco dataset using the QLoRA technique and is optimized for small consumer GPUs.

Inference Notebooks

Model	Notebook	Description
chinkara-7b		The smallest model of the family, trained on Llama 2 7B.
chinkara-7b-improved		The same as the previous model, with minor changes. See the changelog for the differences.

Changelogs

2023-07-28: chinkara-7b-improved was uploaded to Hugging Face today. This model is still trained on the Guanaco dataset, but it produces better and more coherent results.
- Safety is now taken into account in this model: it will not answer questions about illegal activities (for example, you cannot ask it for a forbidden recipe or anything similar).

Inference Guide

NOTE: This part applies when you want to load the model and run inference on your local machine. You still need a GPU with at least 8GB of VRAM; an RTX 2080 or better is recommended.
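To put the 8GB figure in context, here is a rough back-of-the-envelope estimate of the memory taken by the weights of a 7-billion-parameter model at different precisions. The function name `weight_memory_gb` is made up for this sketch, and the estimate covers the weights only: activations, the generation cache, quantization constants and CUDA overhead all come on top of it, so actual usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 7e9  # 7 billion parameters
for label, bits in [("bf16", 16), ("int8", 8), ("4-bit (nf4)", 4)]:
    print(f"{label}: {weight_memory_gb(params, bits):.1f} GB")
```

At 4 bits per parameter the weights come to about 3.5 GB, which leaves headroom on an 8GB card, while the same weights in bf16 would need about 14 GB.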

Installing libraries

pip install -U bitsandbytes
pip install -U git+https://github.com/huggingface/transformers.git
pip install -U git+https://github.com/huggingface/peft.git
pip install -U git+https://github.com/huggingface/accelerate.git
pip install -U datasets
pip install -U einops
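After installing, it can help to confirm which versions actually ended up in the environment, since three of the packages are installed from their development branches. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands above.
REQUIRED = ["bitsandbytes", "transformers", "peft", "accelerate", "datasets", "einops"]


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```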

Loading the model

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Trelis/Llama-2-7b-chat-hf-sharded-bf16"
adapters_name = "MaralGPT/chinkara-7b-improved"

# Load the base model with 4-bit quantization.
# The 4-bit settings are passed only through quantization_config: recent
# transformers releases may reject load_in_4bit given both as a keyword
# argument and inside the config.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={i: "24000MB" for i in range(torch.cuda.device_count())},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)
# The adapter and the tokenizer are loaded in the next step.
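The loading code caps each GPU at 24000MB through `max_memory`, while the note at the top of this guide mentions 8GB cards. As a small sketch of how that dictionary could be built for a lower cap: the helper `build_max_memory` and the 7000MB figure are hypothetical choices for illustration, not values recommended by the Chinkara authors.

```python
def build_max_memory(num_gpus: int, per_gpu_mb: int) -> dict:
    """Map each GPU index to a memory cap string such as '7000MB'."""
    if num_gpus < 0 or per_gpu_mb <= 0:
        raise ValueError("num_gpus must be >= 0 and per_gpu_mb must be > 0")
    return {i: f"{per_gpu_mb}MB" for i in range(num_gpus)}


# Example: a single GPU, leaving some room below 8GB for other allocations.
max_memory = build_max_memory(num_gpus=1, per_gpu_mb=7000)
print(max_memory)
```

The resulting dictionary has the same shape as the one passed to `from_pretrained` above; in practice `num_gpus` would come from `torch.cuda.device_count()`.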

Setting the model up

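# Attach the Chinkara adapter to the quantized base model and load the tokenizer.
# PeftModel and AutoTokenizer were already imported in the previous step.
model = PeftModel.from_pretrained(model, adapters_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)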

Prompt and inference

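prompt = "What is the answer to life, universe and everything?"

# Wrap the question in the Human/Assistant prompt format.
prompt = f"###Human: {prompt} ###Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
# temperature only has an effect when sampling is enabled, so do_sample is set explicitly.
outputs = model.generate(
    inputs=inputs.input_ids, max_new_tokens=50,
    do_sample=True, temperature=0.5, repetition_penalty=1.0,
)
# The decoded text contains the prompt followed by the generated answer.
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)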
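Because the decoded output includes the prompt itself, you may want to keep only the assistant's reply. The helpers below are a minimal sketch of that post-processing. The names `build_prompt` and `extract_answer` are made up for this example, and trimming at a follow-up `###Human:` tag is an assumption about how the model may continue, not documented behavior.

```python
import re

HUMAN_TAG = "###Human:"
ASSISTANT_TAG = "###Assistant:"


def build_prompt(question: str) -> str:
    """Wrap a question in the Human/Assistant format used in this guide."""
    return f"{HUMAN_TAG} {question.strip()} {ASSISTANT_TAG}"


def extract_answer(decoded: str) -> str:
    """Return only the assistant's reply from the decoded model output."""
    _, found, reply = decoded.partition(ASSISTANT_TAG)
    if not found:
        # No assistant tag present: return the text unchanged (minus whitespace).
        return decoded.strip()
    # Assumption: the model may start a new turn, so cut at the next human tag.
    reply = re.split(r"###\s*Human:", reply, maxsplit=1)[0]
    return reply.strip()


# Hand-written stand-in for decoded model output, not a real generation.
decoded = build_prompt("What is the answer to life, universe and everything?") + " 42. ###Human: Thanks!"
print(extract_answer(decoded))
```

With the inference code above, `extract_answer(answer)` would be called on the decoded string in place of printing it directly.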

Known Issues

The dataset

The Guanaco dataset, especially this version built from the raw Open Assistant data, includes a large amount of data in many different languages, which caused some incoherence in the results generated by Chinkara.
A possible solution is to use single-language datasets. For example, Dolly from Databricks might be a good choice, since it is English-only.
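Another option along the same lines would be to filter the existing data down to a single language before training. The sketch below uses a deliberately crude heuristic, the share of very common English words in each sample, purely to illustrate the idea. The word list, the threshold and the sample texts are all made up for this example, and a real pipeline would more likely use a proper language-identification model.

```python
import re

# Small hand-picked set of common English words (an assumption for illustration).
ENGLISH_STOPWORDS = {
    "the", "and", "is", "to", "of", "that", "it", "for",
    "you", "what", "are", "this", "with", "how",
}


def english_score(text: str) -> float:
    """Return the fraction of word tokens that are common English words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in ENGLISH_STOPWORDS)
    return hits / len(words)


def keep_english(samples, threshold=0.2):
    """Keep only the samples whose English score reaches the threshold."""
    return [s for s in samples if english_score(s) >= threshold]


# Made-up samples, not taken from the actual dataset.
samples = [
    "### Human: What is the capital of France? ### Assistant: The capital of France is Paris.",
    "### Human: ¿Cuál es la capital de Francia? ### Assistant: La capital de Francia es París.",
    "### Human: 法国的首都是哪里? ### Assistant: 法国的首都是巴黎。",
]
english_only = keep_english(samples)
print(len(english_only))
```

Only the first sample passes the filter here; short or atypical English texts could still be dropped by such a simple rule, so the threshold would need tuning on real data.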
