## Two ways to load and run a model

In [None]:
!pip install -U transformers

### Way 1: Using `pipeline` (less control)

In [3]:
# Use a pipeline as a high-level helper
from transformers import pipeline

In [7]:
pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

In [8]:
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

[{'generated_text': [{'role': 'user', 'content': 'Who are you?'},
   {'role': 'assistant',
    'content': 'I am a machine learning algorithm that automates and speeds up the process of answering basic questions without the need for human intervention. I can answer questions such as "who are you?" or "what are you?" and provide relevant information to help you get the answer you need. I am always available, ready to assist you anytime you need me.'}]}]
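The pipeline returns the whole conversation, with the model's reply appended as the last `assistant` message. A minimal sketch of pulling out just that reply is shown here; `last_assistant_reply` is a hypothetical helper name, and `sample_output` is a shortened stand-in that only mirrors the structure printed above, so no model is needed to run it.

```python
def last_assistant_reply(pipeline_output):
    """Return the content of the last assistant message, or None if there is none."""
    # The chat pipeline returns a list with one dict per input conversation.
    conversation = pipeline_output[0]["generated_text"]
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    return None


# Stand-in with the same shape as the pipeline output above (content shortened).
sample_output = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "I am a machine learning algorithm."},
        ]
    }
]

reply = last_assistant_reply(sample_output)
print(reply)
```

With the real pipeline, the same helper could be called as `last_assistant_reply(pipe(messages))`.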

### Way 2: Using `AutoTokenizer` and `AutoModelForCausalLM` (more control)

In [9]:
from transformers import AutoTokenizer, AutoModelForCausalLM

In [12]:
# Load the tokenizer class that matches the pretrained model's vocabulary.
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

In [13]:
# Load the pretrained model with a causal language modeling head.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

In [14]:
messages = [
    {"role": "user", "content": "Who are you?"},
]

In [15]:
# Apply the model's chat template to a list of dictionaries with "role" and
# "content" keys, then tokenize the result. This method is intended for chat models.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker so the model replies
    tokenize=True,
    return_dict=True,  # return a dict with input_ids and attention_mask
    return_tensors="pt",  # return PyTorch `torch.Tensor` objects
).to(model.device)


In [17]:
inputs

{'input_ids': tensor([[  529, 29989,  1792, 29989, 29958,    13, 22110,   526,   366, 29973,
             2, 29871,    13, 29966, 29989,   465, 22137, 29989, 29958,    13]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
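The token ids above encode a formatted prompt string, not just the raw question. The sketch below approximates that formatting in plain Python, assuming this model uses a Zephyr-style template (`<|role|>`, a newline, the content, then the `</s>` end-of-sequence token). `render_chat` and `EOS_TOKEN` are hypothetical names for illustration, and the real template lives in the tokenizer's configuration; calling `apply_chat_template` with `tokenize=False` returns the actual string for comparison.

```python
EOS_TOKEN = "</s>"  # assumed end-of-sequence token for this model


def render_chat(messages, add_generation_prompt=True):
    """Approximate a Zephyr-style chat template (an assumption, not the library's code)."""
    parts = []
    for message in messages:
        # Each turn: role marker, newline, content, end-of-sequence token, newline.
        parts.append(f"<|{message['role']}|>\n{message['content']}{EOS_TOKEN}\n")
    if add_generation_prompt:
        # An open assistant turn tells the model that it should answer next.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Who are you?"}]
prompt = render_chat(messages)
print(repr(prompt))
```

Leaving out `add_generation_prompt` would end the prompt after the user turn, so the model might continue the user's message instead of answering it.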

In [16]:
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))  # decode only the newly generated tokens, skipping the prompt

I am a machine learning model that was trained on a vast dataset of human speech. I was created using advanced algorithms and artificial intelligence techniques to analyze and understand human speech patterns. My primary goal is to
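For a decoder-only model, `generate` returns the prompt tokens followed by the new tokens, which is why the decode call slices the output at the prompt length. The sketch below illustrates that slicing with plain Python lists standing in for tensors; the ids in `new_ids` are made up for illustration.

```python
# Stand-in for inputs["input_ids"][0]: the first few prompt ids from above.
prompt_ids = [529, 29989, 1792, 29989, 29958]

# Hypothetical ids for the generated continuation (not real model output).
new_ids = [306, 626, 263]

# generate() returns the prompt followed by the continuation.
output_ids = prompt_ids + new_ids

# Equivalent to inputs["input_ids"].shape[-1] in the notebook cell.
prompt_len = len(prompt_ids)

# Keep only what the model added after the prompt.
generated_only = output_ids[prompt_len:]
print(generated_only)
```

The reply above stops mid-sentence because `max_new_tokens=40` caps the generation length; a larger value would allow a longer answer.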
