In [2]:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/StableBeluga-7B-GPTQ"
model_basename = "gptq_model-4bit-128g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

"""
To download from a specific branch, use the revision parameter, as in this example:

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        revision="gptq-4bit-32g-actorder_True",
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        quantize_config=None)
"""

prompt = "Tell me about AI"
prompt_template=f'''### System:
This is a system prompt, please behave and help the user.

### User:
{prompt}

### Assistant:
'''

print("\n\n*** Generate:")

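input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
# temperature only takes effect when sampling is enabled, so request it explicitly
# (otherwise generate() may fall back to greedy decoding, depending on the model's generation config)
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))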

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,  # enable sampling so temperature and top_p are applied
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


CUDA extension not installed.
skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.




*** Generate:
<s> ### System:
This is a system prompt, please behave and help the user.

### User:
Tell me about AI

### Assistant:
 AI, or Artificial Intelligence, is a branch of computer science that deals with the development of intelligent machines that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. AI systems are designed to learn, adapt, and improve their performance over time, making them increasingly capable of performing complex tasks. AI has a wide range of applications, including natural language processing, machine learning, and robotics. It is transforming various industries, such as healthcare, finance, and manufacturing, by automating tasks, improving efficiency, and enhancing decision-making processes.</s>
*** Pipeline:
### System:
This is a system prompt, please behave and help the user.

### User:
Tell me about AI

### Assistant:
 Artificial Intelligence 
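
Both outputs above echo the full prompt before the model's answer. The sketch below rebuilds the same `### System / ### User / ### Assistant` template used in the cell and strips the echoed prompt and the `<s>` / `</s>` markers from a decoded string. The helper names `build_prompt` and `extract_reply` are made up for this illustration and are not part of transformers or AutoGPTQ. For the pipeline path, passing `return_full_text=False` to the call should have a similar effect.

```python
DEFAULT_SYSTEM = "This is a system prompt, please behave and help the user."


def build_prompt(user_message, system_message=DEFAULT_SYSTEM):
    """Return the prompt layout used in the cell above (hypothetical helper)."""
    return (
        f"### System:\n{system_message}\n\n"
        f"### User:\n{user_message}\n\n"
        "### Assistant:\n"
    )


def extract_reply(decoded, marker="### Assistant:"):
    """Keep only the text after the last assistant marker (hypothetical helper)."""
    reply = decoded.rsplit(marker, 1)[-1]
    # Drop the BOS/EOS markers that tokenizer.decode() leaves in the string.
    for token in ("<s>", "</s>"):
        reply = reply.replace(token, "")
    return reply.strip()


if __name__ == "__main__":
    decoded = "<s> " + build_prompt("Tell me about AI") + " AI is a branch of computer science.</s>"
    print(extract_reply(decoded))
```

With the objects from the cell above, this would be used as `extract_reply(tokenizer.decode(output[0]))`.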

In [2]:
from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-13B-chat-GGUF", model_file="llama-2-13b-chat.q4_K_M.gguf", model_type="llama", gpu_layers=20)

print(llm("AI is going to"))


 be a game changer for businesses of all sizes, but it can also be overwhelming to try and wrap your head around. Here are some key points to consider when implementing AI in your business:

1. Start small: Don't try to tackle everything at once. Start with a specific pain point or process that you want to improve, and work from there.
2. Identify your goals: Before you start implementing AI, it's important to identify what you want to achieve. Do you want to increase efficiency? Improve customer service? Reduce costs? Knowing your goals will help guide your implementation.
3. Choose the right technology: There are many different types of AI technologies available, and not all of them will be suitable for your business. Do some research to determine which technology is best for your needs.
4. Consider data privacy and security: AI relies on large amounts of data, and that data must be protected. Make sure you have a plan in place to keep your data safe and secure.
5. Focus on ROI: AI c
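
The output above stops mid-sentence, most likely because generation hit the default token limit. A ctransformers model accepts generation settings such as `max_new_tokens`, `temperature`, `top_p`, and `repetition_penalty` as keyword arguments when it is called. The sketch below keeps those settings in one place. The `generate` wrapper and the chosen values are assumptions for illustration (the values mirror the pipeline settings used earlier), not recommended defaults.

```python
# Illustrative values only, copied from the transformers pipeline cell above.
GENERATION_SETTINGS = {
    "max_new_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.95,
    "repetition_penalty": 1.15,
}


def generate(llm, prompt, **overrides):
    """Call a ctransformers-style model with shared settings (hypothetical wrapper).

    `llm` is any callable that accepts a prompt plus keyword arguments,
    such as the object returned by AutoModelForCausalLM.from_pretrained.
    """
    settings = {**GENERATION_SETTINGS, **overrides}
    return llm(prompt, **settings)


if __name__ == "__main__":
    # Stand-in model so the sketch runs without downloading any weights.
    def fake_llm(prompt, **kwargs):
        return f"{prompt} ... ({kwargs['max_new_tokens']} tokens max)"

    print(generate(fake_llm, "AI is going to", max_new_tokens=64))
```

With the model loaded in the cell above, the call would be `generate(llm, "AI is going to")`.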

In [1]:
from langchain.llms import CTransformers
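
The `CTransformers` class imported above is not used elsewhere in this notebook. As a rough sketch, it can wrap the same GGUF file that the other cells load directly. The LangChain wrapper takes `model`, `model_file`, and `model_type`, plus a `config` dict that is forwarded to ctransformers. The helper names and the default values below are assumptions for illustration, and the import is deferred so that building the settings does not require LangChain to be installed.

```python
def ctransformers_settings(gpu_layers=0, max_new_tokens=256, temperature=0.8):
    """Collect constructor arguments for the LangChain wrapper (hypothetical helper)."""
    return {
        "model": "TheBloke/Llama-2-13B-chat-GGUF",
        "model_file": "llama-2-13b-chat.q4_K_M.gguf",
        "model_type": "llama",
        # Options in `config` are passed through to ctransformers.
        "config": {
            "gpu_layers": gpu_layers,
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }


def load_langchain_llm(**overrides):
    """Build the LangChain LLM; downloads the model file on first use."""
    from langchain.llms import CTransformers  # deferred: needs langchain + ctransformers

    return CTransformers(**ctransformers_settings(**overrides))


if __name__ == "__main__":
    print(ctransformers_settings(gpu_layers=30))
```

Calling `load_langchain_llm(gpu_layers=30)` would return an object that can be used as an LLM in a LangChain chain.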

In [3]:
from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-13B-chat-GGUF", model_file="llama-2-13b-chat.q4_K_M.gguf", model_type="llama", gpu_layers=30)

print(llm("AI is going to"))


 change the world in ways we can’t yet imagine

Artificial intelligence (AI) has already had a profound impact on many aspects of our lives, from how we shop and work to how we communicate and access information. However, as AI technology continues to advance at an incredible pace, it is likely to have even more far-reaching consequences in the future. Here are five potential ways that AI may change the world over the next decade:

1. Job displacement: One of the biggest concerns about AI is that it could displace human workers in many industries. As AI systems become more advanced, they may be able to perform tasks that were previously done by humans, leading to significant job losses. This could have a profound impact on society and could lead to increased inequality and social unrest.

2. Increased automation: Related to the first point, AI is likely to increase automation in many industries. This could lead to greater efficiency and productivity, but it could also lead to job losse

In [4]:
llm('good morning')

' washington 09/13/16\nGood Morning Washington : WJLA : September 13, 2016 7:00am-9:00am EDT\nthe latest on the flooding in louisiana and the death toll continues to rise. it is important that you stay informed and keep up with the latest information from officials. we will continue to follow this story throughout the day and bring you updates as they become available. >>> now to a developing story out of north korea. the u.s. military is confirming that it has conducted a missile test off the coast of south korea. this comes amid heightened tensions between the two countries. our pentagon correspondent, barbara starr, has the latest from the pentagon. >> reporter: good morning. the u.s. military says it has successfully conducted a missile defense test in the asian sea. this is a defensive system designed to protect against north korean missiles. the test was conducted sunday and involved a ground-based interceptor launched from alaska. the pentagon says the test was successful, but t
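
The reply above continues 'good morning' as if it were the start of a broadcast transcript. That is typical when a chat-tuned model receives a bare string instead of its chat template. Llama-2 chat models expect the `[INST] ... [/INST]` layout, with an optional `<<SYS>>` block for the system message. A minimal sketch of that layout follows. The function name is made up for this example, and whether it improves this particular reply has not been checked here.

```python
def build_llama2_chat_prompt(user_message, system_message=None):
    """Wrap a message in the Llama-2 chat layout (hypothetical helper)."""
    if system_message:
        return (
            f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    return f"[INST] {user_message} [/INST]"


if __name__ == "__main__":
    print(build_llama2_chat_prompt("good morning", "You are a helpful assistant."))
```

With the model from the cells above, the call would become `llm(build_llama2_chat_prompt('good morning'))`.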