# Chat with Docs

In [2]:
## Import Libraries

### LangChain Libraries

#### models
from langchain.llms import OpenAI

#### prompts
from langchain import PromptTemplate
from langchain.prompts import load_prompt
from langchain.prompts.example_selector import LengthBasedExampleSelector


from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import DeepLake

## Embeddings

Embeddings are a way to represent data as points in a high-dimensional space, where the distance and direction between points correspond to semantic or structural relationships. The most common use of embeddings is for words or phrases in NLP, but the concept is broadly applicable to any type of data that has relationships that can be represented spatially.

For instance, word embeddings such as Word2Vec and GloVe represent words as dense vectors (typically a few hundred dimensions), so that the distance between two vectors reflects the semantic or syntactic similarity of the words. Words with similar meanings lie closer together in the embedding space, and the direction between words can be used to infer relationships. For example, the famous Word2Vec analogy "King - Man + Woman ≈ Queen" captures the gender relationship through vector arithmetic: the vector that results from the arithmetic lands nearest to the vector for "Queen".
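
To make the vector arithmetic concrete, here is a minimal sketch in plain Python. The 3-dimensional vectors in `toy_vectors` are hand-picked, hypothetical values chosen for illustration only (real Word2Vec or GloVe vectors are learned and have hundreds of dimensions), and `cosine_similarity` and `analogy` are illustrative helper names, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; the axes loosely mean (royalty, male, female).
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def analogy(a, b, c, vectors):
    """Return the word whose vector is most similar to a - b + c.

    The three input words are excluded from the candidates.
    """
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine_similarity(target, candidates[w]))


result = analogy("king", "man", "woman", toy_vectors)
print(result)
```

With these toy values the nearest remaining word to `king - man + woman` is `queen`. With real embeddings the match is approximate, which is why the analogy is usually evaluated by nearest-neighbor search rather than exact equality.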

The reasons embeddings are useful for building AI applications are manifold:
1. Data compression: Embeddings can be used to compress large amounts of data into a smaller space, while preserving the relationships between data points. This is useful for reducing the amount of data that needs to be stored or processed.
2. Semantic relationships: Embeddings represent semantic relationships between entities. This allows ML models to learn from the relationships between data points, rather than just the data points themselves. This is useful for building models that can generalize to new data, and for building models that can learn from small amounts of data.
3. Transfer learning: Pretrained embeddings learned on a large corpus can be transferred to other tasks with smaller amounts of data, leveraging the linguistic or other types of knowledge encoded in the embeddings.
4. Interpretability: Embeddings can be projected down to two or three dimensions (for example with PCA or t-SNE) and plotted in a way that is interpretable to humans. This is useful for understanding the relationships between data points, and for debugging ML models.
5. Reduced noise: High-dimensional data often contains a lot of noise or redundant information. By keeping the informative structure and discarding much of that noise, embeddings can improve the performance of ML models.

## Deep Lake

Deep Lake is a multi-modal vector store that stores embeddings together with their metadata, including text, JSON, images, audio, video, and more. It can be used with LangChain to build apps that require vector filtering and search.
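
To illustrate what "vector filtering and search" means, here is a conceptual in-memory sketch. It is not Deep Lake's API: `ToyVectorStore`, its methods, and the 2-dimensional embeddings are all hypothetical and exist only to show the idea of filtering on metadata first and then ranking by similarity.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyVectorStore:
    """Hypothetical in-memory store: each record is text + embedding + metadata."""

    def __init__(self):
        self.records = []

    def add(self, text, embedding, metadata=None):
        self.records.append(
            {"text": text, "embedding": embedding, "metadata": metadata or {}}
        )

    def search(self, query_embedding, k=2, where=None):
        """Keep records whose metadata matches `where`, then return the k most similar."""
        where = where or {}
        matches = [
            r for r in self.records
            if all(r["metadata"].get(key) == value for key, value in where.items())
        ]
        matches.sort(key=lambda r: _cosine(query_embedding, r["embedding"]), reverse=True)
        return matches[:k]


store = ToyVectorStore()
store.add("LangChain chains overview", [0.9, 0.1], {"source": "docs"})
store.add("Deep Lake quickstart", [0.2, 0.9], {"source": "docs"})
store.add("Unrelated blog post", [0.85, 0.2], {"source": "blog"})

# Search only within records tagged source == "docs".
top_docs = store.search([1.0, 0.0], k=1, where={"source": "docs"})
print(top_docs[0]["text"])
```

A production vector store does the same two things at scale, typically with an approximate nearest-neighbor index instead of the linear scan used here.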

In [8]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.chains import RetrievalQA, ConversationalRetrievalChain

import deeplake


In [9]:
ds = deeplake.load('hub://qilindage/langchain')

This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/qilindage/langchain



 

hub://qilindage/langchain loaded successfully.
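
Once the dataset loads, a natural next step is to wrap it in a retriever and a question-answering chain using the classes imported earlier. The following is only a sketch under several assumptions: that the dataset's embeddings were created with `OpenAIEmbeddings`, that an OpenAI API key and Activeloop access are configured, and that the installed LangChain version accepts the `embedding_function` keyword on `DeepLake` (other releases name this argument differently). `build_qa_chain` is a hypothetical helper name.

```python
def build_qa_chain(dataset_path="hub://qilindage/langchain"):
    """Wrap an existing Deep Lake dataset in a LangChain RetrievalQA chain.

    Imports are local so that defining the function needs no packages,
    network access, or API keys; calling it needs all three.
    """
    from langchain.chains import RetrievalQA
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.llms import OpenAI
    from langchain.vectorstores import DeepLake

    # Assumption: the stored vectors were produced by the same embedding model.
    embeddings = OpenAIEmbeddings()

    # Open the dataset read-only; the keyword name is version-dependent.
    db = DeepLake(
        dataset_path=dataset_path,
        read_only=True,
        embedding_function=embeddings,
    )

    # "stuff" places the retrieved documents directly into a single prompt.
    return RetrievalQA.from_chain_type(
        llm=OpenAI(),
        chain_type="stuff",
        retriever=db.as_retriever(),
    )
```

In the LangChain releases that match these imports, calling `build_qa_chain()` and then the chain's `run` method with a question string should return an answer grounded in the retrieved documents, but this depends on the assumptions listed above.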



 