## Use a T4 GPU in Colab
### Falcon AI is a highly capable generative model that surpasses many other LLMs, and it is now open source and available for anyone to use.
### Falcon is an autoregressive, decoder-only model. It was trained on AWS Cloud continuously for two months using 384 GPUs.
### [Falcon AI: The New Open Source Large Language Model](https://www.analyticsvidhya.com/blog/2023/07/falcon-ai-the-new-open-source-large-language-model/)
### On the OpenLLM Leaderboard, Falcon outperforms state-of-the-art models from Google, Anthropic, and DeepMind, as well as LLaMA.
### Falcon also comes in Instruct versions, Falcon-7B-Instruct and Falcon-40B-Instruct, which are fine-tuned on conversational data and can be used directly to build chat applications.
### Falcon-40B-Instruct Space [Link](https://huggingface.co/spaces/HuggingFaceH4/falcon-chat)
### We install the transformers package to download and work with pre-trained state-of-the-art models such as Falcon. The accelerate package lets us run PyTorch models on whatever hardware is available, which here is Google Colab. The einops and xformers packages are additional dependencies that support the Falcon model.
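
Before loading the model, it can help to confirm that the four packages named above are actually importable. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, not part of any of these libraries.

```python
import importlib.util

# Packages named in the text above; adjust the list if your setup differs.
REQUIRED = ["transformers", "accelerate", "einops", "xformers"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported as missing, uncomment and run the `pip install` cell below.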

In [4]:
# !pip install transformers accelerate einops xformers

In [2]:
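from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

# Hugging Face model ID of the instruction-tuned 7B Falcon checkpoint
model = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # load the weights in 16-bit to save GPU memory
    trust_remote_code=True,      # run the model code shipped in the model repo
    device_map="auto",           # let accelerate place the weights on the available device(s)
    max_length=200,              # cap on prompt plus generated tokens
    do_sample=True,              # sample instead of decoding greedily
    top_k=10,                    # sample only from the 10 most likely tokens
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)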


A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b-instruct:
- configuration_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b-instruct:
- modelling_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


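
The choice of `torch_dtype=torch.bfloat16` in the pipeline matters on a single Colab GPU. As a rough back-of-the-envelope sketch, assuming the model has about 7 billion parameters (as its name suggests) and counting only the weights, not activations or the attention cache:

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS_7B = 7e9  # assumption: roughly 7 billion parameters

for dtype_name, size_in_bytes in [("float32", 4), ("bfloat16", 2)]:
    print(f"{dtype_name}: ~{weight_memory_gb(PARAMS_7B, size_in_bytes):.0f} GB")
```

Under these assumptions the 16-bit weights need about half the memory of 32-bit weights, which is what makes a 7B model plausible on a 16 GB card such as the T4. Actual usage will be somewhat higher once generation starts.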

In [3]:
sequences = pipeline(
    "Create a list of 3 important things to reduce global warming"
)


for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


Result: Create a list of 3 important things to reduce global warming
1. Using renewable energy sources such as wind and solar power to reduce greenhouse gas emissions.
2. Implementing energy-efficient policies to decrease the burning of fossil fuels.
3. Planting trees and other vegetation to absorb carbon dioxide and release oxygen into the atmosphere.
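
The pipeline was created with `do_sample=True` and `top_k=10`, so each new token is drawn at random from the 10 highest-scoring candidates rather than always taking the single best one. The idea can be sketched in plain Python. This is an illustration of top-k sampling in general, not the transformers implementation; `top_k_sample` and the toy logits are made up for the example.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample one token index from the k highest-scoring logits."""
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (shifted by the max for stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    # Draw one index in proportion to its weight.
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
toy_logits = [2.0, 0.5, 1.0, -1.0, 3.0]  # made-up scores for a 5-token vocabulary
samples = [top_k_sample(toy_logits, k=2, rng=rng) for _ in range(10)]
print(samples)  # only indices 4 and 0 can appear, since they hold the two largest logits
```

Because of this randomness, rerunning the generation cell can produce a different answer each time.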


## Falcon AI with LangChain
### Resolve the locale problem by forcing the preferred encoding to UTF-8

In [15]:
import locale

locale.getpreferredencoding = lambda: "UTF-8"
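
The override above replaces `locale.getpreferredencoding` with a function that always reports UTF-8. A likely motivation, which is an assumption here because the notebook does not show the original error, is that shell commands such as `!pip` can fail in Colab when the reported encoding is not UTF-8. The sketch below shows the effect of the patch. It uses a slightly more tolerant variant that also accepts the optional `do_setlocale` argument of the real function, so callers that pass it keep working.

```python
import locale

print("before:", locale.getpreferredencoding())

# Variant of the override above that tolerates the optional argument.
locale.getpreferredencoding = lambda do_setlocale=True: "UTF-8"

print("after: ", locale.getpreferredencoding())
```

The patch only affects the current Python process; it does not change the system locale.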

In [16]:
!pip install langchain

Collecting langchain
  Downloading langchain-0.0.268-py3-none-any.whl (1.5 MB)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.5.14-py3-none-any.whl (26 kB)
Collecting langsmith<0.1.0,>=0.0.21 (from langchain)
  Downloading langsmith-0.0.25-py3-none-any.whl (33 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.6.0,>=0.5.7->langchain)
  Downloading marshmallow-3.20.1-py3-none-any.whl (49 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.6.0,>=0.5.7->langchain)
  Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain)
  Downloading mypy_extensions-1.0.0-py3-none-

In [17]:
from langchain import HuggingFacePipeline

# Wrap the transformers pipeline so LangChain can use it as an LLM
llm = HuggingFacePipeline(pipeline=pipeline, model_kwargs={"temperature": 0})

In [18]:
from langchain import PromptTemplate, LLMChain

# {query} is filled in with the user's question when the chain runs
template = """
You are an intelligent chatbot. Your reply should be funny.
Question: {query}
Answer:"""
prompt = PromptTemplate(template=template, input_variables=["query"])

llm_chain = LLMChain(prompt=prompt, llm=llm)
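
To see what text the model actually receives, it helps to expand the template by hand. The sketch below uses Python's built-in `str.format` as a stand-in for `PromptTemplate`, on the assumption that the template uses plain `{name}` placeholders as it does here; `fill_template` is a hypothetical helper, not a LangChain function.

```python
# A template shaped like the one in the cell above.
template = """
You are an intelligent chatbot. Your reply should be funny.
Question: {query}
Answer:"""


def fill_template(template, **variables):
    """Substitute named placeholders, mimicking basic prompt-template behaviour."""
    return template.format(**variables)


prompt_text = fill_template(template, query="How to reach the moon?")
print(prompt_text)
```

The filled-in string ends with `Answer:`, which nudges the model to continue with an answer rather than with another question.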

In [21]:
query = "How to reach the moon?"

print(llm_chain.run(query))

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


 You can't get there by driving!


### How good is the Falcon-40B model?
At the time of its release, Falcon-40B topped the OpenLLM Leaderboard, surpassing state-of-the-art models such as LLaMA, MPT, and StableLM. Its architecture is also optimized for inference.