In [1]:
import torch
torch.cuda.is_available()

True

In [1]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig


class LocalQwenModel:
    """Thin wrapper that loads a local Qwen chat checkpoint and generates replies."""

    def __init__(self, model_path: str):
        self.model_path = model_path
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
        
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            device_map="auto",
            trust_remote_code=True,
            torch_dtype="auto",
            # No quantization_config is passed: the GPTQ checkpoints used below are
            # already quantized, and self.quantization_config is never defined here.
            low_cpu_mem_usage=True,
        )

        self.streamer = TextStreamer(self.tokenizer, skip_prompt=True, skip_special_tokens=True)


    def generate(self, prompt: str, max_new_tokens: int = 256) -> str:
        
        messages = [
            {"role": "system", "content": "You are a helpful novelist."},
            {"role": "user", "content": prompt}
        ]
        
        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )

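        # With device_map="auto" the weights may not sit on self.device, so follow the model.
        inputs = self.tokenizer([text], return_tensors="pt").to(self.model.device)
        
        with torch.no_grad():
            output = self.model.generate(
                **inputs,
                # streamer=self.streamer,
                max_new_tokens=max_new_tokens,
                do_sample=True,  # temperature and top_p only apply when sampling is enabled
                temperature=0.8,
                top_p=0.9
            )
        # Drop the prompt tokens so that only the newly generated text is returned.
        new_tokens = output[0][inputs["input_ids"].shape[1]:]
        return self.tokenizer.decode(new_tokens, skip_special_tokens=True)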




In [None]:
# model = LocalQwenModel("Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4")  
model = LocalQwenModel("Qwen/Qwen1.5-7B-Chat-GPTQ-Int4")

In [None]:
model.generate("Write a story about a dragon and a princess.", 128)
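The `generate` method above passes `temperature=0.8` and `top_p=0.9`. As a rough illustration of what those two knobs do, here is a minimal pure-Python sketch on a toy distribution. The function names and the toy logits are made up for this example, and this is not how `transformers` implements sampling internally.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Toy logits for a four-token vocabulary (hypothetical values).
toy_logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax_with_temperature(toy_logits, temperature=0.8)
nucleus = top_p_filter(probs, top_p=0.9)
print(nucleus)
```

With these toy values the least likely token falls outside the nucleus, so a sampler would only ever draw from the remaining three.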

In [1]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-1_8b", trust_remote_code=True)
# Pass `torch_dtype=torch.float16` to load the model in float16; the default float32 needs twice the memory and may cause an out-of-memory error.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2-chat-1_8b", torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()





Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00,  4.02s/it]


In [2]:
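response, history = model.chat(tokenizer, "Help me design the framework for a novel. There are three protagonists, two men and one woman. Once it is designed, write a 200-word opening.", history=[], max_new_tokens=256, max_length=512, do_sample=True, temperature=0.95, top_p=0.9, top_k=50, repetition_penalty=1.2)
print(response)

Both `max_new_tokens` (=256) and `max_length`(=512) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


As night fell and the city lights gradually went out, a girl named Amy stopped in the middle of a bustling street, a look of exhaustion on her face. She had once been a brave and strong woman, but ever since an accident, everything had changed...

In this small town, three very different groups of characters are quietly coming together. They are a combination of two boys and one girl: Jack and Tom are childhood friends who share the same dream of becoming successful musicians; Kate is a young employee of the family business and one of the gentlest, kindest girls in the family; and Lisa is a mysterious woman from far away who met them while searching for her own identity and a place to belong.

The friendship among the three was broken by a twist of fate, yet because of it they found a common language and a shared goal. Every night they strolled through the corners of the town, enjoying the quiet under the stars and each other's support, and began an experience full of stories and adventure. But their lives also hid a chilling secret: an unspeakable plan was quietly taking shape, and it involved the family's interests and the safety of each of their lives...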
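The warning printed for this cell says that `max_new_tokens` takes precedence when it is set together with `max_length`. The sketch below illustrates that rule under the documented semantics: `max_length` counts prompt plus generated tokens, while `max_new_tokens` counts generated tokens only. The helper name and the prompt length of 40 tokens are hypothetical, and this is not the library's own code.

```python
def resolve_generation_budget(prompt_len, max_new_tokens=None, max_length=None):
    """Return how many tokens may be generated for a prompt of prompt_len tokens."""
    if max_new_tokens is not None:
        # max_new_tokens wins whenever it is given, as the warning states.
        return max_new_tokens
    if max_length is not None:
        # max_length covers the prompt too, so the prompt eats into the budget.
        return max(max_length - prompt_len, 0)
    raise ValueError("set max_new_tokens or max_length")


# Hypothetical prompt of 40 tokens, with the limits used in the cell above.
print(resolve_generation_budget(40, max_new_tokens=256, max_length=512))
print(resolve_generation_budget(40, max_length=512))
```

Passing only one of the two limits avoids the warning and makes the intended budget explicit.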


In [None]:
# Hello! How can I help you today?
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)

In [None]:
import subprocess

# Run a one-shot prompt through the Ollama CLI and capture both output streams.
result = subprocess.run(
    ["ollama", "run", "deepseek-r1", "who are you?"],
    capture_output=True,
    text=True,
)
print("STDOUT:", result.stdout)
print("STDERR:", result.stderr)
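The cell above prints whatever the CLI returned, even when the call failed. Below is a minimal sketch of a wrapper that adds a timeout and turns failures into exceptions. `run_cli` is a hypothetical helper name, and the demo invokes the Python interpreter rather than `ollama` so that it runs anywhere.

```python
import subprocess
import sys


def run_cli(args, timeout=120):
    """Run a command and return its stripped stdout, raising RuntimeError on failure."""
    try:
        result = subprocess.run(
            args, capture_output=True, text=True, timeout=timeout, check=True
        )
    except FileNotFoundError as exc:
        raise RuntimeError(f"executable not found: {args[0]}") from exc
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"command timed out after {timeout}s") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"command exited with code {exc.returncode}: {exc.stderr.strip()}"
        ) from exc
    return result.stdout.strip()


# Demo with the current Python interpreter standing in for the real CLI.
print(run_cli([sys.executable, "-c", "print('hello from a subprocess')"]))
```

The same helper could be called as `run_cli(["ollama", "run", "deepseek-r1", "who are you?"])` on a machine where Ollama is installed and the model has been pulled.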

In [2]:
from ollama import generate

response = generate('llama3.2:1b-instruct-fp16', 'Why is the sky blue?')
print(response['response'])

The sky appears blue because of a phenomenon called Rayleigh scattering, named after the British physicist Lord Rayleigh, who first described it in the late 19th century. Here's a simplified explanation:

1. **Sunlight enters the Earth's atmosphere**: When sunlight enters our atmosphere, it consists of a spectrum of colors, including red, orange, yellow, green, blue, indigo, and violet.
2. **Light travels through the atmosphere**: As light travels through the air, it encounters tiny molecules of gases such as nitrogen (N2) and oxygen (O2).
3. **Scattering occurs**: These small gas molecules scatter the shorter wavelengths of light, like blue and violet, more than the longer wavelengths, like red and orange. This is known as Rayleigh scattering.
4. **Blue light is scattered in all directions**: The scattered blue light is distributed throughout the atmosphere in all directions, giving the sky its blue color.
5. **The Earth's atmosphere scatters even more blue light**: As a result of thi

In [1]:
!ollama list

NAME                         ID              SIZE      MODIFIED       
deepseek-r1:latest           0a8c26691023    4.7 GB    27 minutes ago    
llama3.2:1b-instruct-fp16    2887c3d03e74    2.5 GB    57 minutes ago    


In [5]:
response = generate('llama3.2:1b-instruct-fp16', 'Write a story about a dragon and a princess, within 300 words.')
print(response['response'])

In the land of Eridoria, where mythical creatures roamed free, a majestic dragon named Tharros dwelled in the heart of a lush forest. For centuries, the kingdom's people had revered Tharros as a benevolent guardian, protecting their borders from harm.

One day, a beautiful princess named Sophia stumbled upon the dragon while on a journey to explore the ancient ruins. She was immediately captivated by the magnificent creature and its shimmering scales. As she reached out a hand, Tharros regarded her calmly, his piercing eyes seeming to hold a deep wisdom.

Moved by a sense of curiosity, Sophia decided to approach Tharros slowly, speaking softly to reassure the dragon. To her surprise, Tharros began to converse with her, its voice rumbling like thunder. The princess listened intently as Tharros shared tales of Eridoria's history and the secrets of his ancient world.

As the sun dipped below the horizon, Sophia and Tharros forged an unlikely friendship. Under the starry sky, they discover

In [6]:
# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...


Device set to use cuda:0


<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
There is no specific information provided in the given text about the number of helicopters a human can eat in one sitting.
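The printed `generated_text` contains the whole formatted prompt followed by the reply. Here is a minimal sketch of the layout visible in that output and of one way to cut the reply out of it. `build_zephyr_prompt` and `extract_reply` are hypothetical helpers reconstructed from the printed text, not the tokenizer's actual template, and the sample reply is invented for the demo.

```python
def build_zephyr_prompt(messages, eos_token="</s>"):
    """Approximate the prompt layout seen in the output above (an assumption, not the real template)."""
    parts = [f"<|{m['role']}|>\n{m['content']}{eos_token}\n" for m in messages]
    return "".join(parts) + "<|assistant|>\n"


def extract_reply(generated_text, prompt):
    """Return only the text that follows the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = build_zephyr_prompt(messages)
# Invented reply standing in for the model output.
full_text = prompt + "Arr, not a single one, matey!"
print(extract_reply(full_text, prompt))
```

In the notebook itself, passing `return_full_text=False` to the `pipe(...)` call should have a similar effect without any string handling.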


In [9]:
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "Can you write me a first chapter of a story about a dragon and a princess?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
Can you write me a first chapter of a story about a dragon and a princess?</s>
<|assistant|>
Sure, here's a first chapter of a story about a dragon and a princess:

Once upon a time, in a far-off land, there was a dragon named Zulin, who lived in a dense forest deep in the heart of the kingdom. Zulin was a wise and ancient dragon, with scales as black as night and eyes as fiery as fire. He had lived in the forest for many years, watching over the land and its inhabitants with a keen eye.

One day, as Zulin was taking a rest in his den, he heard a sweet melody coming from the palace of the princess, who had been away on a journey. The princess, a beautiful and kind-hearted young woman, had been searching for a way to save her beloved kingdom from a dark and powerful sorcerer who was threatening to destroy it.

Zulin knew that he could help the princess in her quest. He approached her and told