# Falcon 40B Model Test: Simple Prompt

## Overview

This notebook describes a brief test of the [Falcon 40B](https://huggingface.co/tiiuae/falcon-40b) language model. The model is loaded with 4-bit quantization and given a single, simple prompt. The generated answer is then inspected to get a first impression of how well the model handles the question.


In [None]:
# Import necessary libraries
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
import torch

In [None]:
model_path = "/p/project/deepacf/maelstrom/ehlert1/models/falcon-40b"
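A 40-billion-parameter model is large before any activations are counted. The back-of-the-envelope sketch below estimates weight memory at different precisions. It assumes a nominal 40e9 parameters (taken from the model name, not an exact count) and about 4.127 bits per parameter for NF4 with double quantization, which is 4 bits plus the per-block scale overhead estimated in the QLoRA paper for a block size of 64. It ignores activations, the KV cache and any layers that stay unquantized, so real usage is higher.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights only, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: nominal parameter count taken from the model name.
N_PARAMS = 40e9

for label, bits in [("bfloat16", 16), ("int8", 8), ("nf4 + double quant", 4.127)]:
    print(f"{label:>20}: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

The bfloat16 figure explains why the full model does not fit on a single common GPU, while the 4-bit figure roughly quarters it.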

In [None]:
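# 4-bit NF4 quantization with double quantization; computations run in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized model and let device_map="auto" place its layers on the available devices.
# Falcon is supported natively by recent transformers releases, so no remote code is needed.
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", trust_remote_code=False, quantization_config=bnb_config
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
# The Falcon tokenizer defines no padding token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token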
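The `BitsAndBytesConfig` above stores weights in 4 bits per value, with one scale per block of weights. The pure-Python sketch below shows the underlying idea of blockwise absmax quantization. It is a simplified stand-in, not the bitsandbytes implementation: it uses 15 evenly spaced signed levels, whereas NF4 places its 16 levels at quantiles of a normal distribution, and double quantization additionally compresses the per-block scales. The function names and the block size are illustrative assumptions.

```python
def quantize_block_absmax(block: list[float], bits: int = 4) -> tuple[list[int], float]:
    """Map each value to a signed integer code using the block's largest magnitude as scale."""
    levels = 2 ** (bits - 1) - 1            # 7 for 4 bits: codes range from -7 to 7
    scale = max(abs(x) for x in block) or 1.0  # avoid dividing by zero for all-zero blocks
    codes = [round(x / scale * levels) for x in block]
    return codes, scale


def dequantize_block(codes: list[int], scale: float, bits: int = 4) -> list[float]:
    """Recover approximate values from the integer codes and the block scale."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * scale for c in codes]


def quantize_blockwise(weights: list[float], block_size: int = 64, bits: int = 4):
    """Split a flat weight list into blocks and quantize each block independently."""
    return [
        quantize_block_absmax(weights[start:start + block_size], bits)
        for start in range(0, len(weights), block_size)
    ]


weights = [0.1, -0.5, 0.25, 0.9, -0.9, 0.0]
blocks = quantize_blockwise(weights, block_size=4)
restored = [v for codes, scale in blocks for v in dequantize_block(codes, scale)]
print(blocks)
print(restored)
```

Because every block has its own scale, a single large outlier only degrades the precision of its own block rather than the whole tensor.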

In [None]:
def tokenize_prompt(prompt):
    """Encode a prompt into a tensor of input IDs on the model's (first) device."""
    return tokenizer.encode(prompt, return_tensors="pt").to(model.device)
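`tokenize_prompt` handles one prompt at a time and returns only token IDs. Generating for several prompts in one batch requires padding the shorter ones, which is where the EOS-as-pad setting matters. Decoder-only models are usually padded on the left so that every sequence ends exactly where generation begins. The pure-Python sketch below shows left padding with the matching attention mask; `left_pad_batch` is a hypothetical helper and the token IDs are made up. With the real tokenizer, the same effect comes from setting `tokenizer.padding_side = "left"` and calling `tokenizer(prompts, return_tensors="pt", padding=True)`.

```python
def left_pad_batch(token_ids: list[list[int]], pad_id: int) -> tuple[list[list[int]], list[list[int]]]:
    """Left-pad sequences to equal length and build a mask marking real tokens with 1."""
    width = max(len(ids) for ids in token_ids)
    input_ids, attention_mask = [], []
    for ids in token_ids:
        n_pad = width - len(ids)
        input_ids.append([pad_id] * n_pad + ids)
        attention_mask.append([0] * n_pad + [1] * len(ids))
    return input_ids, attention_mask


# Hypothetical token IDs for two prompts of different length; 0 stands in for the EOS/pad ID.
ids, mask = left_pad_batch([[5, 6, 7], [8]], pad_id=0)
print(ids)
print(mask)
```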

In [None]:
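# Optional: wrap the already-loaded model and tokenizer in a text-generation pipeline.
# Dtype, device placement and trust_remote_code were settled when the model was loaded,
# so they are not repeated here. The cells below call model.generate directly.
text_generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)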

In [None]:
# Prepare the prompt
prompt = r"""
Does the following sentence provide information on presence of rain? Explain your reasoning.

Sentence: It is raining in London.
"""
input_ids = tokenize_prompt(prompt)
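To ask the same question about other sentences or other weather phenomena, the prompt can be built from a template. This is a minimal sketch; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and with the default arguments the function reproduces the prompt above character for character.

```python
PROMPT_TEMPLATE = """
Does the following sentence provide information on presence of {phenomenon}? Explain your reasoning.

Sentence: {sentence}
"""


def build_prompt(sentence: str, phenomenon: str = "rain") -> str:
    """Fill the question template with a sentence and the phenomenon to look for."""
    return PROMPT_TEMPLATE.format(phenomenon=phenomenon, sentence=sentence)


print(build_prompt("It is raining in London."))
print(build_prompt("Heavy snow closed the roads.", phenomenon="snow"))
```

Keeping the wording fixed and varying only the slots makes outputs from different runs easier to compare.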

In [None]:
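%%time
# Greedy decoding; temperature, top_k and top_p only take effect together with do_sample=True
sequences = model.generate(
    input_ids,
    max_new_tokens=100,  # budget for generated tokens, not counting the prompt
    pad_token_id=tokenizer.eos_token_id,
    # do_sample=True,
    # temperature=0.7,
    # top_k=50,
    # top_p=0.95,
    # num_return_sequences=3,
)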
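The commented-out arguments switch `generate` from greedy decoding to sampling. The pure-Python sketch below illustrates why `temperature` does nothing under greedy decoding: dividing all logits by the same positive temperature never changes which one is largest. Sampling, in contrast, draws from the softmax of the scaled logits, which a lower temperature sharpens. The logits are made-up numbers and the function names are illustrative, not part of transformers.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits: list[float]) -> int:
    """Greedy decoding: always choose the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Sampling: draw an index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
rng = random.Random(0)
print("greedy:", greedy_pick(logits))
print("probs at T=1.0:", softmax_with_temperature(logits, 1.0))
print("probs at T=0.7:", softmax_with_temperature(logits, 0.7))
print("samples at T=0.7:", [sample_pick(logits, 0.7, rng) for _ in range(10)])
```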



In [None]:
# Display the results; generate returns prompt + continuation, so slice off the prompt tokens
print(f"{prompt=}")
for i, sample_output in enumerate(sequences):
    completion = tokenizer.decode(sample_output[input_ids.shape[-1]:], skip_special_tokens=True)
    print("---------")
    print(f"prediction {i}:\n{completion}")
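Reading the printed prediction is enough for a single prompt. To compare many runs, a crude check is to extract the first yes/no from each completion. The helper below is a hypothetical heuristic, not part of the original test, and it will misread answers that hedge or that use "no" in another sense.

```python
import re


def extract_verdict(completion: str) -> str:
    """Return 'yes', 'no' or 'unclear' based on the first standalone yes/no word."""
    match = re.search(r"\b(yes|no)\b", completion, flags=re.IGNORECASE)
    return match.group(1).lower() if match else "unclear"


print(extract_verdict("Yes, the sentence states that it is raining."))
print(extract_verdict("The sentence only mentions London."))
```

The word-boundary anchors keep words such as "not" or "London" from being counted as a "no".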