<a href="https://colab.research.google.com/github/pavan740/Llama2/blob/feature/Falcon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Dependencies

In [1]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117 --upgrade

Looking in indexes: https://download.pytorch.org/whl/cu117


In [2]:
!pip install langchain einops accelerate transformers bitsandbytes

Collecting langchain
  Downloading langchain-0.0.293-py3-none-any.whl (1.7 MB)
Collecting einops
  Downloading einops-0.6.1-py3-none-any.whl (42 kB)
Collecting accelerate
  Downloading accelerate-0.23.0-py3-none-any.whl (258 kB)
Collecting transformers
  Downloading transformers-4.33.2-py3-none-any.whl (7.6 MB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.41.1-py3-none-any.whl (92.6 MB)

# Import Dependencies

In [3]:
from langchain import HuggingFacePipeline, LLMChain, PromptTemplate
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import os
import torch

In [4]:
# Check if cuda is available
torch.cuda.is_available()

True

# Build the Pipeline

In [5]:
# Define Model ID
model_id = "tiiuae/falcon-7b"
# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load Model
model = AutoModelForCausalLM.from_pretrained(model_id, cache_dir='./workspace/',
    torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto", offload_folder="offload")
# Put the PyTorch model in inference (eval) mode
model.eval()
# Build HF Transformers pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
    max_length=400,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id
)

Downloading (…)okenizer_config.json:   0%|          | 0.00/220 [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.73M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/281 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/950 [00:00<?, ?B/s]

Downloading (…)/configuration_RW.py:   0%|          | 0.00/2.61k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b:
- configuration_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Downloading (…)main/modelling_RW.py:   0%|          | 0.00/47.6k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b:
- modelling_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Downloading (…)model.bin.index.json:   0%|          | 0.00/16.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)l-00001-of-00002.bin:   0%|          | 0.00/9.95G [00:00<?, ?B/s]

Downloading (…)l-00002-of-00002.bin:   0%|          | 0.00/4.48G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)neration_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]
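
The pipeline above is configured with `do_sample=True` and `top_k=10`. As a rough illustration of what top-k sampling does (this is a toy sketch, not the transformers implementation), the code below keeps only the `k` highest-scoring tokens, renormalizes them with a softmax, and draws one of them at random. The function name `top_k_sample` and the toy logits are hypothetical and do not come from the notebook.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample a token index from the k highest-scoring logits.

    Illustrative only: `logits` is a plain list of floats and `rng` is a
    random.Random instance; neither name comes from the notebook.
    """
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits (subtract the max for stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    # Draw one index in proportion to its renormalized probability.
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [2.0, 0.5, -1.0, 3.0, 0.0]
rng = random.Random(0)
samples = [top_k_sample(toy_logits, k=2, rng=rng) for _ in range(1000)]
# Only the two highest-scoring indices (3 and 0) can ever be drawn.
print(sorted(set(samples)))
```

A smaller `k` makes generation more conservative, because low-probability tokens are never considered; a larger `k` allows more varied output.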

In [6]:
# Test out the pipeline
pipeline('who is kim kardashian?')

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


[{'generated_text': "who is kim kardashian?\nShe's a woman who was born on October 21, 1980. She's an American model, actress, TV personality and entrepreneur. She's married to the rapper Kanye West.\nHer net worth is estimated at US$85 million.\nShe is best known for starring in the E! reality TV series Keeping Up With The Kardashians.\nHer family is the focus of the E! reality TV show."}]
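
Note that `generated_text` starts with the prompt itself, followed by the continuation. If only the continuation is wanted, one option is to strip the echoed prompt afterwards. The helper below is a minimal sketch that assumes the output has the list-of-dicts shape shown above; `strip_prompt` and `sample_outputs` are hypothetical names used only for illustration.

```python
def strip_prompt(prompt, outputs):
    """Return the generated texts with the echoed prompt removed.

    `outputs` is assumed to look like the pipeline result above:
    a list of dicts, each with a "generated_text" key.
    """
    completions = []
    for item in outputs:
        text = item["generated_text"]
        # Drop the prompt only when the text really starts with it.
        if text.startswith(prompt):
            text = text[len(prompt):]
        completions.append(text.strip())
    return completions


# A shortened copy of the output shown above, used as sample data.
sample_outputs = [
    {"generated_text": "who is kim kardashian?\nShe's a woman who was born on October 21, 1980."}
]
print(strip_prompt("who is kim kardashian?", sample_outputs))
```

The text-generation pipeline also accepts `return_full_text=False`, which should make it return the continuation without the prompt, so the helper is only needed when the full text has already been collected.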

# Pass it to LangChain

In [7]:
# Setup prompt template
template = PromptTemplate(input_variables=['input'], template='{input}')
# Wrap the Hugging Face pipeline in LangChain's HuggingFacePipeline class
llm = HuggingFacePipeline(pipeline=pipeline)
# Build the LLM chain, i.e. prompt formatting followed by the LLM call
chain = LLMChain(llm=llm, prompt=template)
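
Conceptually, the chain does two things in sequence: it fills the template with the input variables, then passes the resulting string to the model. The sketch below imitates that flow in plain Python so the steps are visible. It is not how LangChain is implemented; `run_chain` and `fake_llm` are hypothetical stand-ins.

```python
def run_chain(template, llm, **variables):
    """Format the prompt template, then pass the result to the model."""
    prompt = template.format(**variables)
    return llm(prompt)


def fake_llm(prompt):
    # Stand-in for the real model: returns a tagged copy of its input.
    return f"<completion for: {prompt}>"


# Pass-through template, as in the cell above: the prompt is the raw input.
print(run_chain("{input}", fake_llm, input="hello"))
# A template that wraps the user input in an instruction.
print(run_chain("Answer briefly: {input}", fake_llm, input="hello"))
```

With the `'{input}'` template used in this notebook, the formatting step changes nothing, so the model receives the question exactly as typed. A richer template is where the chain starts to add value.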

In [11]:
# Test LLMChain
response = chain.run('you have cake and you are allowed to make 3 cuts and you need to make 8 equal pieces. how will you do it?')
print(response)

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.



This is a common interview question. If you can explain it in a few sentences, you will probably be able to solve it in 20 seconds.
The solution is to first choose one of the cuts, and let it be cut first in 4 parts. Then, you need to choose 2 cuts so that the resulting 8 pieces will be equal. The answer is to choose the cuts so that the remaining 12 pieces will have the same remainder when divided by 4.
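
The generated text above does not describe a working set of cuts. For comparison, the usual answer to this puzzle is two perpendicular vertical cuts through the centre plus one horizontal cut at half height. The sketch below checks that idea numerically under stated assumptions: the cake is modelled as a unit cylinder centred at the origin, and a symmetric grid of sample points stands in for volume. The name `count_pieces` is hypothetical.

```python
from collections import Counter


def count_pieces(n=20):
    """Count grid points per piece after three mutually perpendicular cuts.

    The cake is modelled as a cylinder of radius 1 centred at the origin.
    The cuts are the planes x = 0 and y = 0 (vertical) and z = 0
    (horizontal, at half height).
    """
    # Half-step offsets keep every sample point off the cutting planes
    # and make the grid symmetric about zero.
    coords = [(i + 0.5) / n for i in range(-n, n)]
    pieces = Counter()
    for x in coords:
        for y in coords:
            if x * x + y * y > 1.0:
                continue  # outside the round cake
            for z in coords:
                # A piece is identified by which side of each cut it lies on.
                pieces[(x > 0, y > 0, z > 0)] += 1
    return pieces


pieces = count_pieces()
# Number of pieces, and the distinct point counts found across pieces.
print(len(pieces), set(pieces.values()))
```

Equal point counts on a symmetric grid are an illustration of the symmetry argument, not a proof, but they make it easy to compare the model's answer with a configuration that does yield 8 equal pieces.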
