<a href="https://colab.research.google.com/github/ralph27/ZAKA-hands-on/blob/master/Exploring_the_Falcon_7B_Model_for_Text_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploring the Falcon-7B Model for Text Generation
---
© 2023, Zaka AI, Inc. All Rights Reserved.


**Objective:** In this practical exercise, we will learn how to load an open-source LLM on our machine. Our main objective is to load the Falcon-7B Instruct model and send it a few prompts. We use the pretrained model as-is for text generation, without providing any additional dataset.



## Importing Needed Packages

### Prerequisite Libraries:

1. **bitsandbytes**: A library that provides 8-bit optimizers and 8-bit/4-bit quantization, which can significantly reduce memory usage. In this notebook we use it to load the model weights in 4-bit.
2. **transformers**: The Hugging Face library for working with pre-trained transformer-based models, such as BERT or Falcon.
3. **accelerate**: A Hugging Face library that removes much of the PyTorch boilerplate needed to run models across different devices. In this notebook it handles `device_map="auto"`.

In [None]:
# Install necessary dependencies
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git


In [None]:
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
import transformers

## Model Loading
We will load the Falcon-7B Instruct model from TII, using the Auto Classes from the transformers library imported above. These classes allow seamless importing of open-source LLMs to our machine.

An instruct model is fine-tuned specifically to follow instructions given in a text prompt.

CausalLM stands for causal language model: a model that predicts the next token based only on the tokens that come before it.
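
To make the idea of next-token prediction concrete, here is a minimal sketch of a causal generation loop. The probability table `NEXT_TOKEN_PROBS` and the function `generate_greedy` are hypothetical, made up for illustration; a real model such as Falcon computes these probabilities with a neural network conditioned on the whole prefix rather than looking them up from the last token.

```python
# Toy "language model": for each token, the probability of each possible next token.
# These numbers are invented for illustration only.
NEXT_TOKEN_PROBS = {
    "<bos>": {"plato": 0.9, "was": 0.1},
    "plato": {"was": 0.8, "<eos>": 0.2},
    "was": {"a": 0.7, "wise": 0.3},
    "a": {"philosopher": 0.9, "<eos>": 0.1},
    "wise": {"<eos>": 1.0},
    "philosopher": {"<eos>": 1.0},
}


def generate_greedy(prompt_tokens, max_new_tokens=10):
    """Extend the prompt one token at a time, always picking the most likely next token."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # The prediction depends only on what has been generated so far.
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(probs, key=probs.get)
        if next_token == "<eos>":  # end-of-sequence token stops generation
            break
        tokens.append(next_token)
    return tokens


print(generate_greedy(["<bos>", "plato"]))
# ['<bos>', 'plato', 'was', 'a', 'philosopher']
```

The loop never looks at future tokens, which is what "causal" refers to.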

The model name we will be using from Hugging Face is "falcon-7b-instruct", and the account that provides it is "tiiuae". For the pipeline, we want the "text-generation" task.
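
The loading cell in this section passes `load_in_4bit=True`. A rough back-of-envelope estimate suggests why that matters. The sketch below assumes about 7 billion parameters (taken from the "7B" in the model name) and counts only the weights; it ignores activations, the generation cache, and any quantization overhead, so real usage will be higher. The helper `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory needed for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # assumption: roughly 7 billion parameters

for label, bits in [("float32", 32), ("float16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>8}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions, 4-bit weights need roughly a quarter of the memory of 16-bit weights, which is what makes a 7B model practical on a single small GPU.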


In [None]:
model_name = "tiiuae/falcon-7b-instruct"
# The Auto classes allow seamless loading of pretrained models.
# load_in_4bit relies on bitsandbytes; device_map relies on accelerate and allows inference with models that don't fully fit on the GPU.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

pipe = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)


Now we call the pipeline, `pipe()`, with a prompt and a few generation parameters.

In [None]:
sequences = pipe(
    "Plato was a philosopher from ancient Greece who lived",
    # Upper bound on the total length in tokens (prompt plus generated text)
    max_length=400,
    # Enables sampling techniques such as top_k instead of greedy decoding
    do_sample=True,
    # At each step, keep only the 10 most probable tokens and sample among them
    top_k=10,
    num_return_sequences=1,
    # ID of the end-of-sequence token; generation stops when the model emits it
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
The current implementation of Falcon calls `torch.scaled_dot_product_attention` directly, this will be deprecated in the future in favor of the `BetterTransformer` API. Please install the latest optimum library with `pip install -U optimum` and call `model.to_bettertransformer()` to benefit from `torch.scaled_dot_product_attention` and future performance optimizations.


Result: Plato was a philosopher from ancient Greece who lived on the shores of Lake Bays on a peninsula that now bears his name. He was born in 429 B.C. at Ploia, a city of the Athenian republic. He studied in Athens, where he was taught by Aristotle. He was in the wars of the Athenian-Lacedaemonian War. Ploia is now a city in Bulgaria.
Plato was a child of a philosopher. His father was a philosopher and his family had many famous philosophers. His father was the famous philosopher, Platon of Athens. He was a great thinker, and he studied many topics in his lifetime.
His father is known as a famous philosopher, and he taught Plato many great ideas. He was an educator, like Plato. He taught Plato many great ideas on many topics. His father, Platon of Athens, was also a great philosopher. He was the one that taught Socrates in Athens. Socrates taught Plato many great ideas about many topics.
Plato studied under Socrates, and he also studied with the philosopher Plato. Plato studied many 
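
The `do_sample=True` and `top_k=10` settings used above can be illustrated with a small, self-contained sketch of top-k sampling. This is not the transformers implementation: the function `top_k_sample` and the example `logits` are hypothetical, and only the Python standard library is used.

```python
import math
import random


def top_k_sample(logits, k, rng=random):
    """Sample a token index from the k highest-scoring entries of `logits`."""
    # Keep the indices of the k largest logits; everything else is discarded.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (subtract the max for numerical stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 0.5, 1.0, -1.0, 3.0]  # made-up scores for a 5-token vocabulary
rng = random.Random(0)
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))  # only indices 4 and 0 (the two largest logits) can appear
```

A small `k` makes the output more focused but less varied; with `k=1` the procedure reduces to greedy decoding. Sampling also explains why rerunning the cell above produces a different text each time.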

Here's another example of what the model can do. This time the prompt is structured as a chat, and the LLM completes it accordingly!

In [None]:
sequences = pipe(
    "Plato is a philosopher with vast knowledge about ancient greek politics. Plato talks in a very argumentative way. \nDaniel: Greetings, Plato!\nPlato:",
    max_length=400,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


Result: Plato is a philosopher with vast knowledge about ancient greek politics. Plato talks in a very argumentative way. 
Daniel: Greetings, Plato!
Plato: Ahh, Daniel. I see that you have been doing your homework about the old days.

Daniel: Well, I was curious about something I heard you say yesterday. You said that the laws of Athens would be the last word of democracy. Can you explain what you mean?
Plato: Of course. Athens, as a city-state, would have the last word in democracy as the laws themselves are the highest form of governance and are the foundation for everything else.

Daniel: Is that the same in all other cities?
Plato: No, every city would have its own unique laws that would govern the city-state as it were the highest form of governance for that place.

Daniel: Do you think other forms of government can be compared to democracy?
Plato: It depends on what type of forms of government they are compared to. If they are compared to a system that is similar to democratic ru
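
The chat-style prompt above follows a simple pattern: a persona description, then one `Speaker: text` line per turn, ending with the name of the speaker the model should continue as. As the output shows, the model keeps going and also writes Daniel's next turns. The sketch below builds such a prompt and keeps only the first reply. The helpers `build_chat_prompt` and `first_reply` are hypothetical, and the sketch assumes the generated text begins with the prompt, as it does in the output above.

```python
def build_chat_prompt(persona, turns, next_speaker):
    """Join a persona description and (speaker, text) turns into one prompt string."""
    lines = [persona]
    lines += [f"{speaker}: {text}" for speaker, text in turns]
    lines.append(f"{next_speaker}:")  # leave the last turn open for the model
    return "\n".join(lines)


def first_reply(generated_text, prompt, stop_speaker):
    """Return only the first reply after the prompt, cut at the other speaker's next turn."""
    continuation = generated_text[len(prompt):]
    reply = continuation.split(f"\n{stop_speaker}:")[0]
    return reply.strip()


prompt = build_chat_prompt(
    "Plato is a philosopher with vast knowledge about ancient greek politics. "
    "Plato talks in a very argumentative way. ",
    [("Daniel", "Greetings, Plato!")],
    next_speaker="Plato",
)

# Stand-in for seq['generated_text'], copied from the start of the output above.
generated = (
    prompt
    + " Ahh, Daniel. I see that you have been doing your homework about the old days."
    + "\n\nDaniel: Well, I was curious about something I heard you say yesterday."
)
print(first_reply(generated, prompt, stop_speaker="Daniel"))
```

To continue the conversation, one could append the extracted reply and a new `Daniel:` turn to `turns` and call the pipeline again with the rebuilt prompt.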