In [7]:
from langchain_community.llms import CTransformers


In [8]:
llm = CTransformers(model='/teamspace/studios/this_studio/mistral-7b-instruct-v0.2.Q4_K_M.gguf', model_type='gpt2')

print(llm.invoke('AI is going to'))

 take over the world, but first it needs to get better at writing poetry.

That's according to a recent project from researchers at the University of California, Berkeley, who used machine learning algorithms to generate haiku poems about winter in Japanese style. The team's findings were published in the journal PLOS ONE.

The goal of this research was to showcase the advancement in machine learning techniques that can be applied to the creation of art and literature.

"Poetry is a very interesting application for natural language processing," said study author Mirella Lapata, a professor at UC Berkeley's School of Information, in a statement. "It's not just about understanding meaning or syntax. Poems have rhythm, they have metaphors, and they have emotions."

The researchers trained their AI on 17 centuries' worth of haiku poems from Japanese masters and collected information about the words, syllables, grammar, and structure of those works to generate new ones. They tested their al

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "/teamspace/studios/this_studio/Mistral-7B-Instruct-v0.2-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "what is qlora?"
prompt_template=f'''<s>[INST] {prompt} [/INST]
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




*** Generate:


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s><s> [INST] what is qlora? [/INST]

QLora is not a widely known term or concept in general. It might refer to a specific software application, company, or research project, but without more context, it's difficult to provide an accurate definition. Here are a few possibilities based on limited information:

1. QLora is a programming language: QLora is a new, experimental programming language. However, I couldn't find any credible sources that confirm this.
2. QLora is a marketing analytics tool: QLora is a marketing analytics and reporting tool developed by Quarkbiz. It helps businesses to analyze their marketing data, create reports, and visualize insights.
3. QLora is a research project: QLora is a research project focusing on natural language processing, machine learning, and artificial intelligence, but I couldn't find any official website or documentation related to it.

If you have more information about QLora or if you meant a different term, please provide more context, and I