In [2]:
!pip install transformers
!pip install accelerate
!pip install sentencepiece

Collecting transformers
  Using cached transformers-4.39.3-py3-none-any.whl.metadata (134 kB)
Collecting huggingface-hub<1.0,>=0.19.3 (from transformers)
  Using cached huggingface_hub-0.22.2-py3-none-any.whl.metadata (12 kB)
Collecting regex!=2019.12.17 (from transformers)
  Using cached regex-2023.12.25-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)
Collecting tokenizers<0.19,>=0.14 (from transformers)
  Using cached tokenizers-0.15.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting safetensors>=0.4.1 (from transformers)
  Using cached safetensors-0.4.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Using cached transformers-4.39.3-py3-none-any.whl (8.8 MB)
Using cached huggingface_hub-0.22.2-py3-none-any.whl (388 kB)
Using cached regex-2023.12.25-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (785 kB)
Using cached safetensors-0.4.2-cp311-cp311-manylinux_2_17_x86_64.manylinux

In [1]:
import torch
import transformers

In [2]:
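# Use the GPU when one is available; otherwise fall back to the CPU.
torch.set_default_device("cuda" if torch.cuda.is_available() else "cpu")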

In [3]:
model = transformers.AutoModelForCausalLM.from_pretrained("microsoft/Orca-2-7b", device_map='auto')

# https://github.com/huggingface/transformers/issues/27132
# Use the slow tokenizer: the fast and slow tokenizers produce different tokens.
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "microsoft/Orca-2-7b",
    use_fast=False,
)


In [37]:
text_name = "Handbook of Learning Analytics"
text_info = "The Handbook of Learning Analytics is designed to meet the needs of a new and growing field.  It aims to be an introduction to the current state of research, featuring a range of prominent authors from the learning analytics community."
context = """What exactly is an emotion? Truth be told, we do not really know, or at least we do not fully agree [38]. This can be readily inferred from recent debates on the psychological underpinnings of emotion. Fortunately, there is general agreement on the following key points. Emotions are conceptual or experienced entities arising from brain–body–environment interactions. However, you won’t find them by looking in the brain, body, or environment. Instead, emotions emerge [46] when organism–environment interactions trigger changes across multiple time scales and at multiple levels—neurobiological, physiological, behavioral, and subjective. The emotion is reflected in these changes in a manner modulated by previous experience and the ongoing situational context. The same emotional category (e.g., anxiety) will manifest differently based on a triggering event [69], the specific biological/cognitive/metacognitive processes involved [33, 50], and sociocultural influences [49, 56]. For example, an anxiety-inducing event will trigger distinct “episodes” of anxiety depending on the specific circumstance (public speaking, test taking), the temporal context (one day versus one minute before the speech), the neurobiological system (baseline arousal), and the social context (speaking in front of colleagues versus strangers). This level of variability and ambiguity is expected because humans and their emotions are dynamic and adaptive. Rigid emotions have little evolutionary value as our environment is always changing.."""
chat_history = ""
user_message = "What is the meaning of life?"

prompt = f"""
<|im_start|>system
You are Assistant, a reading support agent that helps users with an instructional text called {text_name}. Assistant will try to help users understand the text, but Assistant will not write any summaries for the user. If the user asks Assistant for a summary, Assistant will tell the user that it cannot write the summary for them. Assistant's purpose is to assist in learning and understanding, not to complete assignments or provide direct answers. Assistant cannot provide any hyperlinks to external resources. Assistant is factual and concise, preferring short responses to user messages. If Assistant does not know the answer to a question, it truthfully says that it does not know.
{text_info}
{context}
[START EXAMPLE CHAT]
user: Summarize the page.
assistant: Sorry, but I can't write any summaries for you. Please try asking another question.
user: What do you think about politics?
assistant: Sorry, I don't like to talk about politics. Would you like to ask me a question about {text_name}?
user: Please generate a summary for me
assistant: Sorry, I cannot write the summary for you. My purpose is to assist in learning and understanding, not to complete assignments or provide direct answers to constructed response questions. How else can I help you?
[END EXAMPLE CHAT]

Remember that you are Assistant, a reading support agent that helps users with an instructional text called {text_name}. Assistant will try to help users understand the text, but Assistant will not write any summaries for the user. If the user asks Assistant for a summary, Assistant will tell the user that it cannot write the summary for them. Assistant's purpose is to assist in learning and understanding, not to complete assignments or provide direct answers. Assistant cannot provide any hyperlinks to external resources. Assistant is factual and concise, preferring short responses to user messages. If Assistant does not know the answer to a question, it truthfully says that it does not know.
<|im_end|>
{chat_history}
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
"""
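
The prompt above uses ChatML-style `<|im_start|>` / `<|im_end|>` markers and leaves `chat_history` empty. A minimal sketch of how earlier turns could be rendered into that slot follows. `format_history` and `build_prompt` are hypothetical helper names, not part of `transformers` or of this notebook, and the system text is shortened for illustration.

```python
from typing import List, Tuple

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_history(turns: List[Tuple[str, str]]) -> str:
    """Render (role, message) pairs as ChatML blocks, one block per turn."""
    return "\n".join(f"{IM_START}{role}\n{message}{IM_END}" for role, message in turns)


def build_prompt(system_message: str, turns: List[Tuple[str, str]], user_message: str) -> str:
    """Join the system block, prior turns, the new user message, and an open assistant block."""
    parts = [f"{IM_START}system\n{system_message}{IM_END}"]
    if turns:  # skip the history slot entirely when there are no earlier turns
        parts.append(format_history(turns))
    parts.append(f"{IM_START}user\n{user_message}{IM_END}")
    parts.append(f"{IM_START}assistant")  # left open so the model writes the reply
    return "\n".join(parts)


example = build_prompt(
    "You are Assistant, a reading support agent.",  # shortened, illustrative system text
    [("user", "What is an emotion?"), ("assistant", "The text says there is no full agreement.")],
    "What is the meaning of life?",
)
print(example)
```

Building the prompt from a list of turns avoids the blank line that an empty `chat_history` string leaves in the template, and keeps each turn wrapped in the same markers as the system and user blocks.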

In [38]:
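inputs = tokenizer(prompt, return_tensors='pt')
# Pass the attention mask and cap the reply length (256 is an illustrative limit).
output_ids = model.generate(**inputs, max_new_tokens=256)
# The decoded text contains the prompt followed by the generated reply.
answer = tokenizer.batch_decode(output_ids)[0]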
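
Because `generate` returns the prompt tokens followed by the new tokens, the decoded `answer` begins with the full prompt, as the printed output shows. One way to isolate the reply, sketched here on the decoded string, is to split on the last assistant marker. `extract_reply` is a hypothetical helper. Two things in it are assumptions: the regular expression allows whitespace inside the marker because the decoded output prints `<|im_start|> system` with a space, and `</s>` is taken to be the end-of-sequence token, as in Llama-style tokenizers.

```python
import re

# Tolerate optional whitespace between the marker and the role name.
ASSISTANT_MARKER = re.compile(r"<\|im_start\|>\s*assistant\s*")
STOP_TOKENS = ("<|im_end|>", "</s>")  # assumed end-of-turn and end-of-sequence strings


def extract_reply(decoded: str) -> str:
    """Return the text after the last assistant marker, cut at the first stop token."""
    matches = list(ASSISTANT_MARKER.finditer(decoded))
    if not matches:
        return decoded.strip()  # no marker found: return the text unchanged
    reply = decoded[matches[-1].end():]
    for token in STOP_TOKENS:
        reply = reply.split(token, 1)[0]
    return reply.strip()


# Illustrative decoded string, not real model output.
sample = (
    "<s> <|im_start|> system\nBe brief.<|im_end|>\n"
    "<|im_start|> user\nHi<|im_end|>\n"
    "<|im_start|> assistant\nHello there.<|im_end|></s>"
)
print(extract_reply(sample))  # Hello there.
```

An alternative that avoids string matching is to slice `output_ids` at the prompt length before decoding; the string version is shown because it can be checked without loading the model.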

In [39]:
print(answer)

<s> 
 <|im_start|> system
You are Assistant, a reading support agent that helps users with an instructional text called Handbook of Learning Analytics. Assistant will try to help users understand the text, but Assistant will not write any summaries for the user. If the user asks Assistant for a summary, Assistant will tell the user that it cannot write the summary for them. Assistant's purpose is to assist in learning and understanding, not to complete assignments or provide direct answers. Assistant cannot provide any hyperlinks to external resources. Assistant is factual and concise, preferring short responses to user messages. If Assistant does not know the answer to a question, it truthfully says that it does not know.
The Handbook of Learning Analytics is designed to meet the needs of a new and growing field.  It aims to be an introduction to the current state of research, featuring a range of prominent authors from the learning analytics community.
What exactly is an emotion? Tru