# ClovaXEmbeddings

This notebook covers how to get started with embedding models provided by CLOVA Studio. For detailed documentation on `ClovaXEmbeddings` features and configuration options, please refer to the [API reference](https://python.langchain.com/latest/api_reference/community/embeddings/langchain_community.embeddings.naver.ClovaXEmbeddings.html).

## Overview
### Integration details

| Provider | Package |
|:--------:|:-------:|
| [Naver](/docs/integrations/providers/naver.mdx) | [ClovaXEmbeddings](https://python.langchain.com/latest/api_reference/community/embeddings/langchain_community.embeddings.naver.ClovaXEmbeddings.html) |

## Setup

Before using embedding models provided by CLOVA Studio, you must complete the three steps below.

1. Create a [NAVER Cloud Platform](https://www.ncloud.com/) account.
2. Apply to use [CLOVA Studio](https://www.ncloud.com/product/aiService/clovaStudio).
3. Create a CLOVA Studio Test App or Service App and find its API keys (see [here](https://guide.ncloud-docs.com/docs/en/clovastudio-playground01#테스트앱생성)).

### Credentials

CLOVA Studio requires three keys (`NCP_CLOVASTUDIO_API_KEY`, `NCP_APIGW_API_KEY`, and `NCP_CLOVASTUDIO_APP_ID`) for embeddings.
- `NCP_CLOVASTUDIO_API_KEY` and `NCP_CLOVASTUDIO_APP_ID` are issued per Service App or Test App.
- `NCP_APIGW_API_KEY` is issued per account.

The two API keys can be found by clicking `App Request Status` > `Service App, Test App List` > the `Details` button for each app in [CLOVA Studio](https://clovastudio.ncloud.com/studio-application/service-app).

In [None]:
import getpass
import os

os.environ["NCP_CLOVASTUDIO_API_KEY"] = getpass.getpass("NCP CLOVA Studio API Key: ")
os.environ["NCP_APIGW_API_KEY"] = getpass.getpass("NCP API Gateway API Key: ")

In [None]:
os.environ["NCP_CLOVASTUDIO_APP_ID"] = input("NCP CLOVA Studio App ID: ")
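Before instantiating the model, it can help to confirm that all three credentials are present. The sketch below uses a hypothetical helper, `missing_clova_vars`, which is not part of LangChain; it only inspects environment variables and does not validate the keys against CLOVA Studio.

```python
import os

# The three variables listed in the Credentials section above.
REQUIRED_CLOVA_VARS = (
    "NCP_CLOVASTUDIO_API_KEY",
    "NCP_APIGW_API_KEY",
    "NCP_CLOVASTUDIO_APP_ID",
)


def missing_clova_vars(environ=None):
    """Return the names of required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_CLOVA_VARS if not env.get(name)]


missing = missing_clova_vars()
if missing:
    print("Missing CLOVA Studio credentials:", ", ".join(missing))
else:
    print("All CLOVA Studio credentials are set.")
```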

### Installation

The `ClovaXEmbeddings` integration lives in the `langchain_community` package:

In [None]:
# install package
!pip install -U langchain-community

## Instantiation

Now we can instantiate our embeddings object and embed queries or documents:

- There are several embedding models available in CLOVA Studio. Please refer to [this guide](https://guide.ncloud-docs.com/docs/en/clovastudio-explorer03#임베딩API) for further details.
- Depending on your use case, you may need to normalize the embeddings.
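If you do need normalization, a minimal sketch of L2 normalization in plain Python follows. It assumes an embedding is a list of floats, which is what `embed_query` returns in LangChain; `l2_normalize` is a hypothetical helper. After L2 normalization, the dot product of two vectors equals their cosine similarity.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Toy vector for illustration; in practice, pass e.g. embeddings.embed_query("...").
print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
```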

In [None]:
from langchain_community.embeddings import ClovaXEmbeddings

embeddings = ClovaXEmbeddings(
    # model="clir-emb-dolphin",  # default is `clir-emb-dolphin`; replace with the model name matching your App ID if needed
)

## Indexing and Retrieval

Embedding models are often used in retrieval-augmented generation (RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).

Below, we show how to index and retrieve data using the `embeddings` object initialized above. In this example, we index and retrieve a sample document using `InMemoryVectorStore`.

In [None]:
# Create a vector store with a sample text
from langchain_core.vectorstores import InMemoryVectorStore

text = "CLOVA Studio is an AI development tool that allows you to customize your own HyperCLOVA X models."

vectorstore = InMemoryVectorStore.from_texts(
    [text],
    embedding=embeddings,
)

# Use the vectorstore as a retriever
retriever = vectorstore.as_retriever()

# Retrieve the most similar text
retrieved_documents = retriever.invoke("What is CLOVA Studio?")

# show the retrieved document's content
retrieved_documents[0].page_content

## Direct Usage

Under the hood, the vectorstore and retriever implementations are calling `embeddings.embed_documents(...)` and `embeddings.embed_query(...)` to create embeddings for the text(s) used in `from_texts` and retrieval `invoke` operations, respectively.

You can directly call these methods to get embeddings for your own use cases.
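As an illustration, the sketch below ranks texts by cosine similarity between a query vector and document vectors, which approximates what an in-memory vector store does at retrieval time. `cosine_similarity` and `top_k` are hypothetical helpers written for this example, not LangChain APIs; for real workloads, a vector store is the better choice.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    texts: Sequence[str],
    k: int = 1,
) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, v), t) for v, t in zip(doc_vecs, texts)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Usage with the embeddings object from above (requires valid credentials):
# texts = ["CLOVA Studio is an AI development tool that allows you to customize your own HyperCLOVA X models."]
# doc_vecs = embeddings.embed_documents(texts)
# query_vec = embeddings.embed_query("What is CLOVA Studio?")
# print(top_k(query_vec, doc_vecs, texts, k=1))
```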

### Embed single texts

You can embed single texts or documents with `embed_query`:

In [None]:
embeddings.embed_query("My query to look up")

### Embed multiple texts

You can embed multiple texts with `embed_documents`:

In [None]:
embeddings.embed_documents(
    ["This is a content of the document", "This is another document"]
)
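`embed_documents` returns one vector per input text, in the same order. If you want to embed a long list in smaller chunks (for example, to report progress or retry a failed chunk), a hypothetical helper like `embed_in_batches` can wrap any object that has an `embed_documents` method. The default `batch_size` of 16 is an arbitrary assumption, not a documented CLOVA Studio limit.

```python
from typing import List, Protocol


class SupportsEmbedDocuments(Protocol):
    """Any object with an embed_documents method, such as ClovaXEmbeddings."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]: ...


def embed_in_batches(
    embedder: SupportsEmbedDocuments, texts: List[str], batch_size: int = 16
) -> List[List[float]]:
    """Embed texts chunk by chunk, preserving the input order of the vectors."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        vectors.extend(embedder.embed_documents(batch))
    return vectors


# Usage (requires valid credentials):
# vectors = embed_in_batches(embeddings, ["first text", "second text", "third text"], batch_size=2)
```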

### Embed with async

Async versions of both methods are also available:

In [None]:
# async embed query
await embeddings.aembed_query("My query to look up")

In [None]:
# async embed documents
await embeddings.aembed_documents(
    ["This is a content of the document", "This is another document"]
)
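Top-level `await` works in Jupyter notebooks. To embed several queries concurrently, one option is `asyncio.gather` combined with a semaphore that caps the number of in-flight requests. This is a sketch: `aembed_queries` and `FakeAsyncEmbedder` are hypothetical names, the default cap of 4 is an arbitrary assumption, and you should check CLOVA Studio's rate limits before raising it.

```python
import asyncio
from typing import List


async def aembed_queries(
    embedder, queries: List[str], max_concurrency: int = 4
) -> List[List[float]]:
    """Embed queries concurrently with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(query: str) -> List[float]:
        async with semaphore:
            return await embedder.aembed_query(query)

    # gather returns results in the same order as `queries`
    return list(await asyncio.gather(*(embed_one(q) for q in queries)))


class FakeAsyncEmbedder:
    """Hypothetical stand-in so the sketch can run without credentials."""

    async def aembed_query(self, text: str) -> List[float]:
        await asyncio.sleep(0)
        return [float(len(text))]


# In a notebook: await aembed_queries(embeddings, ["query one", "query two"])
# In a plain script: asyncio.run(aembed_queries(embeddings, ["query one", "query two"]))
```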

## Additional functionalities

### Service App

When going live with a production-level application using CLOVA Studio, you should apply for and use a Service App (see [here](https://guide.ncloud-docs.com/docs/en/clovastudio-playground01#서비스앱신청)).

A Service App is issued its own `NCP_CLOVASTUDIO_API_KEY` and `NCP_CLOVASTUDIO_APP_ID`, and it can only be called with those keys.

In [None]:
# Update environment variables with the Service App credentials

os.environ["NCP_CLOVASTUDIO_API_KEY"] = getpass.getpass(
    "NCP CLOVA Studio API Key for Service App: "
)
os.environ["NCP_CLOVASTUDIO_APP_ID"] = input("NCP CLOVA Studio Service App ID: ")

In [None]:
embeddings = ClovaXEmbeddings(service_app=True)

## API Reference

For detailed documentation on `ClovaXEmbeddings` features and configuration options, please refer to the [API reference](https://python.langchain.com/latest/api_reference/community/embeddings/langchain_community.embeddings.naver.ClovaXEmbeddings.html).