# Step 1: Install Dependencies
Install the libraries needed to work with LangChain, Hugging Face, and Transformers.

In [1]:
! pip install langchain-huggingface

! pip install huggingface_hub
! pip install transformers
! pip install accelerate
! pip install bitsandbytes
! pip install langchain

Collecting langchain-huggingface
  Downloading langchain_huggingface-0.1.2-py3-none-any.whl.metadata (1.3 kB)
Downloading langchain_huggingface-0.1.2-py3-none-any.whl (21 kB)
Installing collected packages: langchain-huggingface
Successfully installed langchain-huggingface-0.1.2
Collecting bitsandbytes
  Downloading bitsandbytes-0.44.1-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Downloading bitsandbytes-0.44.1-py3-none-manylinux_2_24_x86_64.whl (122.4 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.44.1
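
Because the pip log above only shows two of the six packages being freshly installed, it can help to confirm what is actually present in the environment. The following is a minimal sketch using only the standard library; the `PACKAGES` list and the `installed_versions` helper are illustrative names, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands in Step 1.
PACKAGES = [
    "langchain-huggingface",
    "huggingface_hub",
    "transformers",
    "accelerate",
    "bitsandbytes",
    "langchain",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed again before moving on to the next step.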


# Step 2: Retrieve Hugging Face API Token
In this step, you retrieve your Hugging Face API token from Colab's secret store (`userdata`) so that it can be used in later steps.

In [2]:
from google.colab import userdata
sec_key=userdata.get("HF_TOKEN")
print(sec_key)  # prints the raw token; avoid this in notebooks you share

hf_<redacted>
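
Printing a token in full leaves it in the saved notebook output. One way to confirm that the secret loaded without exposing it is to print a masked form instead. This is a small sketch; `mask_token` is a hypothetical helper introduced here for illustration, and the token in the example is a placeholder.

```python
def mask_token(token, visible=4):
    """Return the token with all but the first and last `visible` characters hidden."""
    if not token:
        return "<missing>"
    if len(token) <= 2 * visible:
        # Too short to show anything safely; hide it completely.
        return "*" * len(token)
    return f"{token[:visible]}...{token[-visible:]}"


# Placeholder value, not a real token.
print(mask_token("hf_exampleTokenValue1234"))
```

In the notebook, `print(mask_token(sec_key))` would take the place of `print(sec_key)`.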


# Step 3: Set Environment Variable for Hugging Face API Token
Import `HuggingFaceEndpoint` and read the token again, this time from the `HUGGINGFACEHUB_API_TOKEN` secret. The environment variable itself is set in the first cell of Step 4.

In [3]:
from langchain_huggingface import HuggingFaceEndpoint

In [4]:
from google.colab import userdata
sec_key=userdata.get("HUGGINGFACEHUB_API_TOKEN")
print(sec_key)  # prints the raw token; avoid this in notebooks you share

hf_<redacted>


# Step 4: Set Up Hugging Face Model
Export the token as the `HUGGINGFACEHUB_API_TOKEN` environment variable, then define the Hugging Face model repository and initialize the endpoint with its generation parameters.

In [5]:
import os
os.environ["HUGGINGFACEHUB_API_TOKEN"]=sec_key
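
If the Colab secret is missing, `userdata.get` may not return a usable string and the assignment above fails with a less obvious error. A hedged sketch of a guard follows; `export_hf_token` is a hypothetical helper, and the demo deliberately writes to a made-up variable name (`DEMO_HF_TOKEN`) so that it does not overwrite a real token.

```python
import os


def export_hf_token(token, var_name="HUGGINGFACEHUB_API_TOKEN"):
    """Store the token in os.environ, failing early if it is missing or blank."""
    if not token or not token.strip():
        raise ValueError(f"No token provided for {var_name}; check your Colab secret.")
    os.environ[var_name] = token.strip()
    return var_name


# Demo with a placeholder token and a throwaway variable name.
export_hf_token("hf_placeholder_token", var_name="DEMO_HF_TOKEN")
print("DEMO_HF_TOKEN" in os.environ)
```

In the notebook, the call would simply be `export_hf_token(sec_key)`.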

In [6]:
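repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
# `max_length` and `token` are not declared HuggingFaceEndpoint fields, so they are
# moved into model_kwargs (hence the warning below). The declared fields are
# `max_new_tokens` and `huggingfacehub_api_token`.
llm = HuggingFaceEndpoint(repo_id=repo_id, max_length=128, temperature=0.7, token=sec_key)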


                    max_length was transferred to model_kwargs.
                    Please make sure that max_length is what you intended.
                    token was transferred to model_kwargs.
                    Please make sure that token is what you intended.
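
The warning indicates that `max_length` and `token` were not recognized as endpoint fields. A sketch of the same call using the declared field names `max_new_tokens` and `huggingfacehub_api_token` is shown below. The two helper functions are illustrative, the token is a placeholder, and whether the warning disappears entirely may depend on the installed `langchain-huggingface` version. Results may also differ from the recorded output, since the token limit would then actually be applied.

```python
def endpoint_kwargs(repo_id, token, max_new_tokens=128, temperature=0.7):
    """Collect keyword arguments using HuggingFaceEndpoint's declared field names."""
    return {
        "repo_id": repo_id,
        "max_new_tokens": max_new_tokens,   # instead of max_length
        "temperature": temperature,
        "huggingfacehub_api_token": token,  # instead of token
    }


def build_endpoint(**kwargs):
    """Create the endpoint; imported lazily so endpoint_kwargs works without the package."""
    from langchain_huggingface import HuggingFaceEndpoint

    return HuggingFaceEndpoint(**kwargs)


kwargs = endpoint_kwargs("mistralai/Mistral-7B-Instruct-v0.2", token="hf_placeholder_token")
print(sorted(kwargs))
# In the notebook: llm = build_endpoint(**endpoint_kwargs(repo_id, sec_key))
```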


In [7]:
llm.invoke("What is machine learning")

'?\n\nMachine learning is a subset of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves. The process of learning begins with observations or data, such as examples, direct experience or instruction, in order to look for patterns in data and make better decisions in the future based on the examples that we provide. The primary aim is to allow the computers to learn automatically without human intervention or assistance and adjust actions accordingly.\n\nMachine learning algorithms build a mathematical model based on the experience; the model is in essence the rule that the machine learning system uses to make a determination or prediction about the world. The experience can come from past data as well as from real-time data. As new data comes in, the model is updated t

# Step 5: Update Model Version and Invoke Again
Switch to a newer release of the same model (v0.3) and query it again.

In [8]:
repo_id = "mistralai/Mistral-7B-Instruct-v0.3"
# As in Step 4, `max_length` and `token` end up in model_kwargs (see the warning below).
llm = HuggingFaceEndpoint(repo_id=repo_id, max_length=128, temperature=0.7, token=sec_key)

                    max_length was transferred to model_kwargs.
                    Please make sure that max_length is what you intended.
                    token was transferred to model_kwargs.
                    Please make sure that token is what you intended.


In [9]:
llm.invoke("What is Generative AI")

'?\n\nGenerative AI refers to a class of artificial intelligence models that can create new content, such as images, music, text, or even video, based on the data it has been trained on. These models are called "generative" because they generate new data rather than just processing or recognizing existing data.\n\nGenerative AI models work by learning the patterns and structures present in a dataset, and then using these learned patterns to generate new data that resembles the original data. This is typically done using techniques such as deep learning and neural networks.\n\nExamples of generative AI models include GANs (Generative Adversarial Networks), Variational Autoencoders (VAEs), and transformers like BERT. These models have been used to generate realistic images, create music that sounds like it was composed by a human, write coherent and engaging text, and even generate realistic-looking video footage.\n\nGenerative AI has many potential applications, such as in art and enter

# Step 6: Create a Prompt Template for Question-Answering
Define a custom prompt template that you will use for structured question answering.

In [10]:
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate

question="Who won the Cricket World Cup in the year 2011?"
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
print(prompt)

input_variables=['question'] input_types={} partial_variables={} template="Question: {question}\nAnswer: Let's think step by step."
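
The printed object shows the template's metadata rather than the text the model will receive. For a simple template like this one, filling the `{question}` placeholder behaves like Python's `str.format`, so the final prompt can be previewed without LangChain. The sketch below is a plain-Python stand-in for `prompt.format(question=question)`, not the library call itself.

```python
template = """Question: {question}
Answer: Let's think step by step."""

question = "Who won the Cricket World Cup in the year 2011?"

# Substitute the placeholder to see the exact text sent to the model.
filled = template.format(question=question)
print(filled)
```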


# Step 7: Create and Invoke a Language Model Chain
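Chain the prompt template to the language model and invoke the chain with the question. Note that `LLMChain` is deprecated in recent LangChain releases; the `prompt | llm` form used in Step 10 is the current equivalent.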

In [11]:
llm_chain=LLMChain(llm=llm,prompt=prompt)
print(llm_chain.invoke(question))


{'question': 'Who won the Cricket World Cup in the year 2011?', 'text': ' The Cricket World Cup is a tournament that takes place every 4 years. The last World Cup before 2011 was in 2007, which was won by Australia. So, to find out who won the World Cup in 2011, we need to find the winner after Australia in 2007. In 2011, India won the Cricket World Cup.\n\nFinal Answer: India won the Cricket World Cup in the year 2011.'}
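
The chain returns a dictionary with the original `question` and the generated `text`. In the output above the model happened to end with a `Final Answer:` line, so one way to pull out just the conclusion is sketched below. The marker is an assumption based on this single response and is not guaranteed to appear; `final_answer` is a hypothetical helper, and the sample text is abbreviated.

```python
def final_answer(result, marker="Final Answer:"):
    """Return the text after the last marker, or the whole text if the marker is absent."""
    text = result["text"]
    _, found, tail = text.rpartition(marker)
    return tail.strip() if found else text.strip()


# Abbreviated copy of the result dictionary shown above.
result = {
    "question": "Who won the Cricket World Cup in the year 2011?",
    "text": " In 2011, India won the Cricket World Cup.\n\n"
            "Final Answer: India won the Cricket World Cup in the year 2011.",
}
print(final_answer(result))
```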


# Step 8: Use Hugging Face Pipeline for Text Generation
Load GPT-2 locally with Transformers, build a text-generation pipeline, and wrap it in LangChain's `HuggingFacePipeline`.

In [12]:
from langchain_huggingface import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [13]:
model_id="gpt2"
model=AutoModelForCausalLM.from_pretrained(model_id)
tokenizer=AutoTokenizer.from_pretrained(model_id)


In [14]:
pipe=pipeline("text-generation",model=model,tokenizer=tokenizer,max_new_tokens=100)
hf=HuggingFacePipeline(pipeline=pipe)

In [15]:
hf

HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7da38248b910>, model_id='gpt2')

In [16]:
hf.invoke("What is machine learning")

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


"What is machine learning? The machine learning world is a lot more complex now than it was then, but there are still plenty of opportunities for programmers to take new ideas and shape them into new products.\n\nThe world is now the place where machines can solve tasks more efficiently. People are changing the way machines learn and learn.\n\nIt's impossible to get rid of some of this new complexity without getting rid of the problem. However, there are still some areas that you can do a full service transformation without breaking"

# Step 9: Use GPU for Hugging Face Pipeline
You can run the Hugging Face pipeline on a GPU, if one is available, for faster generation.

In [17]:
# Use HuggingFacePipeline with a GPU
gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    device=0,  # GPU index (-1 means CPU); replace with device_map="auto" to use the accelerate library.
    pipeline_kwargs={"max_new_tokens": 100},
)
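
The `device` argument expects an integer index, and requesting a GPU on a CPU-only runtime will fail. A minimal sketch for choosing the index at runtime follows; `pick_device` is a hypothetical helper, and it assumes PyTorch is the backend (it falls back to CPU when `torch` is not importable).

```python
def pick_device():
    """Return 0 (first GPU) when PyTorch reports CUDA, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"Using device index {device}")
# In the notebook: pass device=device to HuggingFacePipeline.from_model_id(...)
```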

# Step 10: Generate Text with the GPU-Optimized Pipeline
Combine the GPU-backed pipeline with a prompt template to generate answers.

In [18]:
from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

In [19]:
chain=prompt|gpu_llm

In [20]:
question="What is artificial intelligence?"
chain.invoke({"question":question})

"Question: What is artificial intelligence?\n\nAnswer: Let's think step by step. It has different algorithms for some of its tasks, as well as many other things as it's doing. You can imagine that you have a job where you're a big software analyst trying to take down a database of transactions for a company. A lot of jobs in this job have AI, like being the lead engineer, or sometimes even a technical worker making sure that the databases get completed as quickly as possible.\n\nThat's the basic algorithm, but we've all seen the other solutions. The"