LangChain is a framework that helps you build LLM-powered applications more easily by providing the following:

- a generic interface to a variety of different foundation models (see Models),
- a framework to help you manage your prompts (see Prompts), and
- a central interface to long-term memory (see Memory), external data (see Indexes), other LLMs (see Chains), and other agents for tasks an LLM is not able to handle (e.g., calculations or search) (see Agents).

In [19]:
# !pip install langchain

In [20]:
# !pip install langchain_community

In [24]:
!pip install kagglehub


In [26]:
from dotenv import load_dotenv

load_dotenv()

True

In [4]:
import langchain

In [5]:
import os

api_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

In [6]:
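# Import from langchain_community (installed above); the langchain.embeddings path is deprecated
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")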

# The embeddings model takes a text as an input and outputs a list of floats
text = "Alice has a parrot. What animal is Alice's pet?"
text_embedding = embeddings.embed_query(text)



In [7]:
text_embedding

[0.07190389186143875,
 0.07759873569011688,
 -0.00920864287763834,
 0.05617057904601097,
 -0.09020032733678818,
 0.013322998769581318,
 0.09845796972513199,
 -0.09042120724916458,
 0.07143569737672806,
 0.012598481960594654,
 0.03159462660551071,
 -0.0997156947851181,
 -0.011282580904662609,
 0.007156962528824806,
 -0.004754406865686178,
 0.0469105988740921,
 0.011661785654723644,
 -0.06561291962862015,
 0.037585869431495667,
 -0.0009845742024481297,
 -0.05514021962881088,
 0.010666124522686005,
 0.0651363879442215,
 0.004118985030800104,
 -0.028614966198801994,
 0.07131148874759674,
 -0.02405756711959839,
 -0.02805318683385849,
 -0.03596959263086319,
 -0.057241227477788925,
 -0.059308696538209915,
 0.025169217959046364,
 -0.0031606932170689106,
 0.008553331717848778,
 -0.04338772967457771,
 0.01663791574537754,
 -0.0072966692969202995,
 0.027561098337173462,
 0.14499451220035553,
 0.045777883380651474,
 -0.0038441529031842947,
 -0.06782863289117813,
 0.008425702340900898,
 -0.08002768

![image.png](https://miro.medium.com/v2/resize:fit:720/format:webp/1*m-B8T61sERyREFwx56y3EQ.png)

- Chat models are similar to LLMs. They take a list of chat messages as input and return a chat message.
- Text embedding models take text as input and return a list of floats (an embedding), which is a numerical representation of the input text. Embeddings capture the meaning of a text in a form that can be used later, e.g., for calculating similarities between texts (such as movie summaries).
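As a minimal sketch of the similarity calculation mentioned above, the snippet below computes cosine similarity in plain Python. The three short vectors are made-up stand-ins for real embeddings (which would come from `embeddings.embed_query`), and `cosine_similarity` is a helper defined here for illustration, not a LangChain function.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings (made-up values).
parrot = [0.9, 0.1, 0.0]
bird = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(parrot, bird))     # close to 1: similar direction
print(cosine_similarity(parrot, invoice))  # close to 0: nearly unrelated
```

With real embeddings the idea is the same: texts with related meaning tend to produce vectors that point in similar directions, so their cosine similarity is higher.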




LangChain provides you with so-called PromptTemplates, which help you construct prompts from multiple components.

In [8]:
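# Root-level imports from `langchain` are deprecated; import from langchain_core instead
from langchain_core.prompts import PromptTemplate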

template = "What is a good name for a company that makes {product}?"

prompt = PromptTemplate(
    input_variables=["product"],
    template=template,
)

prompt.format(product="colorful socks")

'What is a good name for a company that makes colorful socks?'

The above prompt can be viewed as a zero-shot problem setting, where you hope the LLM was trained on enough relevant data to provide a satisfactory response.

One way to improve the LLM's output is to add a few examples to the prompt, which turns it into a few-shot problem setting.

In [9]:
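# Root-level imports from `langchain` are deprecated; import from langchain_core instead
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate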

examples = [
    {"word": "happy", "antonym": "sad"},
    {"word": "tall", "antonym": "short"},
]

example_template = """
Word: {word}
Antonym: {antonym}\n
"""

example_prompt = PromptTemplate(
    input_variables=["word", "antonym"],
    template=example_template,
)

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Word: {input}\nAntonym:",
    input_variables=["input"],
    example_separator="\n",
)

few_shot_prompt.format(input="big")

'Give the antonym of every input\n\nWord: happy\nAntonym: sad\n\n\n\nWord: tall\nAntonym: short\n\n\nWord: big\nAntonym:'
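To see where the runs of blank lines in that output come from, here is a plain-Python sketch of the assembly. It assumes the few-shot template simply joins the prefix, each formatted example, and the suffix with `example_separator`; `assemble_few_shot` is a hypothetical helper written for this illustration, not part of LangChain.

```python
examples = [
    {"word": "happy", "antonym": "sad"},
    {"word": "tall", "antonym": "short"},
]

# Same text as example_template above, written with explicit escapes:
# one leading newline and two trailing newlines.
example_template = "\nWord: {word}\nAntonym: {antonym}\n\n"
prefix = "Give the antonym of every input"
suffix = "Word: {input}\nAntonym:"
example_separator = "\n"


def assemble_few_shot(user_input):
    """Hypothetical helper: join prefix, formatted examples, and suffix."""
    formatted = [example_template.format(**ex) for ex in examples]
    pieces = [prefix, *formatted, suffix]
    return example_separator.join(pieces).format(input=user_input)


result = assemble_few_shot("big")
print(repr(result))
```

Under this assumption, the extra newlines are the template's own leading and trailing newlines stacked on top of the separator. Trimming them from `example_template` would give a more compact prompt.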

# Vector Database

In [10]:
# !pip install weaviate-client

In [22]:
import weaviate
from weaviate.classes.init import Auth

# Best practice: store your credentials in environment variables
weaviate_url = os.getenv("WEAVIATE_CLUSTER_URI")
weaviate_api_key = os.getenv("WEAVIATE_API_KEY")

# Connect to Weaviate Cloud
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_url,
    auth_credentials=Auth.api_key(weaviate_api_key),
)

print(client.is_ready())

True


In [27]:
import pandas as pd

df = pd.read_csv("jeopardy_questions.csv", nrows = 100)

### Step 1: Create a Schema

First, we need to define the underlying data structure and some configurations:

- class: What will the collection of objects in this vector space be called?
- properties: The properties of an object, including each property's name and data type. In the pandas DataFrame analogy, these would be the columns.
- vectorizer: The model that generates the embeddings. For text objects, you would typically select one of the [text2vec](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules) modules (text2vec-cohere, text2vec-huggingface, text2vec-openai, or text2vec-palm), according to the provider you are using.
- moduleConfig: The details of the modules in use. For example, the vectorizer is a module for which you can define which model and version to use.
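The four settings above could be written down as a plain Python dictionary, sketched below. The class name and the property names (`category`, `question`, `answer`) are assumptions about what a Jeopardy-style CSV might contain, and the vectorizer choice simply mirrors the Hugging Face model used earlier. How this definition is handed to Weaviate depends on the client version and is not shown here.

```python
import json

# Hypothetical class definition: the class name and property names are
# assumptions for illustration, not read from the dataset.
class_definition = {
    "class": "JeopardyQuestion",
    "properties": [
        {"name": "category", "dataType": ["text"]},
        {"name": "question", "dataType": ["text"]},
        {"name": "answer", "dataType": ["text"]},
    ],
    # Assumed vectorizer module, matching the provider used for embeddings above.
    "vectorizer": "text2vec-huggingface",
    "moduleConfig": {
        "text2vec-huggingface": {
            "model": "sentence-transformers/all-MiniLM-L6-v2",
        },
    },
}

print(json.dumps(class_definition, indent=2))
```

Keeping the `moduleConfig` key identical to the `vectorizer` name makes it clear which module the model setting belongs to.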