# Chatbot Tutorial


## Concepts

### LLM
Large Language Models (LLMs) are advanced machine learning models that excel at a wide range of language tasks, such as text generation, translation, summarization, and question answering, without needing task-specific fine-tuning for every scenario.

Modern LLMs are typically accessed through a chat model interface that takes a list of [messages](https://python.langchain.com/docs/concepts/messages/) as input and returns a message as output.


### RAG
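Retrieval-Augmented Generation (RAG) gives your chatbot a store of **searchable knowledge**.
Before the model answers, the chatbot retrieves the passages most relevant to the question from an external source and passes them to the LLM as context, so the response is grounded in that material rather than only in what the model memorized during training.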
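
To make the retrieval step concrete, here is a minimal sketch that uses only the Python standard library. It is not how LangChain or Pinecone implement retrieval: the `DOCUMENTS` list, the word-overlap scoring, and the function names are all assumptions chosen for illustration. A real application would typically rank passages by embedding similarity instead.

```python
import re
from collections import Counter

# Hypothetical mini knowledge base; a real app would index PDF chunks instead.
DOCUMENTS = [
    "Poetry manages Python dependencies through pyproject.toml.",
    "Pinecone is a managed vector database for similarity search.",
    "LangChain provides chains, agents, and prompt templates.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents that share the most word tokens with the query."""
    query_counts = Counter(tokenize(query))

    def overlap(doc: str) -> int:
        # Counter intersection keeps the minimum count of each shared token.
        return sum((query_counts & Counter(tokenize(doc))).values())

    return sorted(documents, key=overlap, reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the question for the LLM."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


if __name__ == "__main__":
    question = "What is a vector database?"
    print(build_prompt(question, retrieve(question, DOCUMENTS)))
```

The string returned by `build_prompt` is what would be sent to the model, which is the "augmented" part of RAG.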

### LangChain

LangChain is a powerful Python-based framework designed to simplify the development of applications powered by large language models (LLMs). It does this by providing a modular and flexible structure that streamlines common NLP tasks and the integration of various AI components.

#### Key Players

- `Chains`: These are essentially pipelines that connect various components within the chatbot architecture. In our case, we’ll construct a chain that seamlessly integrates the user’s query with the retrieval manager (RAG) and subsequently, the response generation stage.
- `Agents`: These are like the workers in your chatbot factory. We will create a specialized agent that acts as a bridge between LangChain and RAG. It receives the user’s query from the chain, transmits it to RAG for information retrieval, and then feeds the retrieved data back into the chain for further processing.
- `Prompts`: LangChain empowers you to design prompts that guide the LLM in crafting the most effective response. For example, a prompt might instruct the LLM to provide a succinct answer to the user’s question or generate a concise summary of a retrieved Wikipedia article.
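
The idea of a chain as a pipeline of steps can be sketched in plain Python. This is only an illustration of the concept: `make_chain`, `format_prompt`, and `fake_llm` are hypothetical names, and the sketch does not use LangChain's own composition API (which is built on runnables and the `|` operator).

```python
from functools import reduce
from typing import Any, Callable

# Hypothetical prompt that asks for a succinct answer.
PROMPT_TEMPLATE = "Answer succinctly.\nQuestion: {question}"


def format_prompt(question: str) -> str:
    """Prompt step: fill the template with the user's question."""
    return PROMPT_TEMPLATE.format(question=question)


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call; it reports the prompt size instead of answering."""
    return f"(model reply to a {len(prompt)}-character prompt)"


def make_chain(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Build a function that feeds each step's output into the next step."""

    def run(value: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, value)

    return run


# Prompt formatting followed by the (fake) model call.
chain = make_chain(format_prompt, fake_llm)

if __name__ == "__main__":
    print(chain("What is RAG?"))
```

Swapping `fake_llm` for a real model call, or inserting a retrieval step before `format_prompt`, changes the pipeline without touching the other steps, which is the main appeal of the chain abstraction.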


#### Lifecycle

- `Development`: Build your applications using LangChain's open-source components and third-party integrations. Use LangGraph to build stateful agents with first-class streaming and human-in-the-loop support.
- `Productionization`: Use LangSmith to inspect, monitor and evaluate your applications, so that you can continuously optimize and deploy with confidence.
- `Deployment`: Turn your LangGraph applications into production-ready APIs and Assistants with LangGraph Platform.


#### 🤝🏻 RAG
The synergy between RAG’s information retrieval capabilities and LangChain’s modular structure lays the foundation for constructing a chatbot that leverages the vast and complicated content in any data source (in our case PDF) to deliver informative and creative responses to user queries.


### ChatModels

[docs](https://python.langchain.com/docs/concepts/chat_models/)

Chat models offer a standard set of parameters for configuring the model. These typically control the model's behavior, such as the sampling temperature (`temperature`), the maximum number of tokens in the response (`max_tokens`), and the maximum time to wait for a response (`timeout`). See the standard parameters section of the linked docs for details.


#### Interface

[BaseChatModel](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.chat_models.BaseChatModel.html)

**Key methods**

- `invoke`: The primary method for interacting with a chat model. It takes a list of messages as input and returns a single message (an `AIMessage`) as output.
- `stream`: A method that allows you to stream the output of a chat model as it is generated.
- `batch`: A method that allows you to batch multiple requests to a chat model together for more efficient processing.
- `bind_tools`: A method that allows you to bind a tool to a chat model for use in the model's execution context.
- `with_structured_output`: A wrapper around the invoke method for models that natively support structured output.
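
The shapes of `invoke`, `stream`, and `batch` can be illustrated with a toy class. This is a sketch under stated assumptions, not LangChain code: `Message` and `EchoChatModel` are hypothetical, the class does not subclass `BaseChatModel`, and it only echoes the last user message. In this sketch `invoke` returns one reply message, matching the chat model description earlier in this tutorial.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Message:
    role: str  # e.g. "system", "user", or "assistant"
    content: str


class EchoChatModel:
    """Hypothetical stand-in for a chat model; it repeats the last user message."""

    def invoke(self, messages: list[Message]) -> Message:
        """Take a list of messages and return a single assistant message."""
        last_user = next((m for m in reversed(messages) if m.role == "user"), None)
        if last_user is None:
            raise ValueError("expected at least one user message")
        return Message(role="assistant", content=f"You said: {last_user.content}")

    def stream(self, messages: list[Message]) -> Iterator[str]:
        """Yield the reply piece by piece instead of all at once."""
        words = self.invoke(messages).content.split(" ")
        for index, word in enumerate(words):
            yield word if index == 0 else " " + word

    def batch(self, inputs: list[list[Message]]) -> list[Message]:
        """Answer several independent conversations in one call."""
        return [self.invoke(messages) for messages in inputs]


if __name__ == "__main__":
    model = EchoChatModel()
    conversation = [Message("system", "Be brief."), Message("user", "hello there")]
    print(model.invoke(conversation).content)
    print("".join(model.stream(conversation)))
```

A real chat model performs the batch requests more efficiently than a simple loop; the loop here only shows the input and output shapes.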



## 1. Setting Up the Environment

### 1.1 `pipx` installation
### 1.2 `poetry` installation

If the `poetry` command is not recognized, add the Poetry install path to `$PATH`:

```bash
export PATH="$HOME/.local/bin:$PATH"
``` 
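
To check whether the `export` above took effect, a short Python snippet can look up the executable. This is a minimal sketch: `find_command` is a hypothetical helper, and `~/.local/bin` is simply the directory used in the `export` line, so adjust it if Poetry was installed elsewhere.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_command(name: str, extra_dir: Optional[Path] = None) -> Optional[str]:
    """Return the full path of an executable, or None if it cannot be found.

    If extra_dir is given, it is searched before the directories on PATH.
    """
    search_path = os.environ.get("PATH", os.defpath)
    if extra_dir is not None:
        search_path = os.pathsep.join([str(extra_dir), search_path])
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    location = find_command("poetry", Path.home() / ".local" / "bin")
    print(location or "poetry not found; check your PATH")
```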

## 2. API Keys

Visit [Get a Gemini API key](https://ai.google.dev/gemini-api/docs/api-key) to create an API key if you don't have one.






## 3. Create a Poetry Project

The command to create a project with Poetry is as follows.

```bash
poetry new poetry-demo
```
This command creates a directory named `poetry-demo` with the structure shown below.

```
poetry-demo
├── pyproject.toml
├── README.rst
├── poetry_demo
│   └── __init__.py
└── tests
    ├── __init__.py
    └── test_poetry_demo.py
```

**✔️ `pyproject.toml`**
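The `pyproject.toml` file is one of the most important pieces of the project, because it orchestrates the project's dependencies.
You do not need to edit `pyproject.toml` by hand; the command below updates it for you.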

```bash
poetry add some-dependencies-you-want
```

`poetry.lock`
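This file locks the dependency versions so that development always happens against the same dependency environment.
When setting up the project for the first time, the following command installs the dependencies defined in the project.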

```
poetry install
```



--- 

## Build a ChatBot Application!

### Install Dependencies

```
poetry add langchain langchain-community openai pinecone-client langchain-pinecone langchain-openai python-dotenv pypdf
```

### Set Environment Variables

Create a file named `.env` in your project directory. Keep this file private (for example, exclude it from version control), because it holds sensitive information such as your API keys. We'll use the `python-dotenv` library to load these keys into the environment.

```
OPENAI_API_KEY={openAI API key}
INDEX_NAME={Pinecone index name}
PINECONE_API_KEY={Pinecone API key}
```
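
Once the variables have been loaded into the environment, it can help to fail fast when one is missing or empty. The sketch below is one possible check, not part of `python-dotenv`: `missing_keys` is a hypothetical helper, and the key names are the three listed in the `.env` file above.

```python
import os
from typing import Mapping, Optional, Sequence

# The three variables defined in the .env file.
REQUIRED_KEYS = ("OPENAI_API_KEY", "INDEX_NAME", "PINECONE_API_KEY")


def missing_keys(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = REQUIRED_KEYS,
) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.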

```python
import os
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

load_dotenv()
```