Adding an example of setting up local embedding model #322

Merged 7 commits on Jul 2, 2024
103 additions & 0 deletions: docs/sphinx_doc/en/source/tutorial/210-rag.md

A RAG agent is an agent that can generate answers based on the retrieved knowledge.
Your agent will be equipped with a list of knowledge according to the `knowledge_id_list`.
You can decide how to use the retrieved content and even update and refresh the index in your agent's `reply` function.

## (Optional) Setting up a local embedding model service

For those interested in setting up a local embedding service, we provide the following example based on the
`sentence_transformers` package, a popular library specialized for embedding models (built on the `transformers` package and compatible with both HuggingFace and ModelScope models).
In this example, we use `gte-Qwen2-7B-instruct`, one of the state-of-the-art embedding models.

* Step 1: Follow the instructions on [HuggingFace](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) or [ModelScope](https://www.modelscope.cn/models/iic/gte_Qwen2-7B-instruct) to download the embedding model.
* Step 2: Set up the server. The following code is for reference.

```python
import argparse
import datetime

from flask import Flask, request
from sentence_transformers import SentenceTransformer


def create_timestamp(format_: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Get the current timestamp."""
    return datetime.datetime.now().strftime(format_)


app = Flask(__name__)


@app.route("/embedding/", methods=["POST"])
def get_embedding() -> dict:
    """Receive a POST request and return the embedding response."""
    json_data = request.get_json()

    inputs = json_data.pop("inputs")

    # Accept both a single string and a list of strings.
    if isinstance(inputs, str):
        inputs = [inputs]

    embeddings = model.encode(inputs)

    return {
        "data": {
            "completion_tokens": 0,
            "messages": {},
            "prompt_tokens": 0,
            "response": {
                "data": [
                    {
                        "embedding": emb.astype(float).tolist(),
                    }
                    for emb in embeddings
                ],
                "created": "",
                "id": create_timestamp(),
                "model": "flask_model",
                "object": "text_completion",
                "usage": {
                    "completion_tokens": 0,
                    "prompt_tokens": 0,
                    "total_tokens": 0,
                },
            },
            "total_tokens": 0,
            "username": "",
        },
    }


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name_or_path", type=str, required=True)
    parser.add_argument("--device", type=str, default="auto")
    parser.add_argument("--port", type=int, default=8000)
    args = parser.parse_args()

    print("Setting up the embedding model...")
    model = SentenceTransformer(
        args.model_name_or_path,
        # "auto" lets sentence_transformers pick the available device.
        device=None if args.device == "auto" else args.device,
    )

    app.run(port=args.port)
```

* Step 3: Start the server.
```bash
python setup_ms_service.py --model_name_or_path {$PATH_TO_gte_Qwen2_7B_instruct}
```


Test whether the service is running successfully:
```python
from agentscope.models.post_model import PostAPIEmbeddingWrapper


model = PostAPIEmbeddingWrapper(
config_name="test_config",
api_url="http://127.0.0.1:8000/embedding/",
json_args={
"max_length": 4096,
"temperature": 0.5
}
)

print(model("testing"))
```
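If you prefer to work with the raw endpoint rather than the wrapper, it helps to know where the vectors sit in the nested response schema the Flask service returns. The sketch below extracts them from a canned response of the same shape; the embedding values here are placeholders for illustration, not real model output.

```python
# A canned response matching the schema returned by the Flask service above;
# the embedding values are placeholders, not real model output.
sample_response = {
    "data": {
        "response": {
            "data": [
                {"embedding": [0.1, 0.2, 0.3]},
                {"embedding": [0.4, 0.5, 0.6]},
            ],
            "model": "flask_model",
            "object": "text_completion",
        },
    },
}

# The vectors live under data -> response -> data -> [i] -> "embedding".
embeddings = [
    item["embedding"]
    for item in sample_response["data"]["response"]["data"]
]
print(len(embeddings), len(embeddings[0]))  # two vectors of dimension 3
```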

[[Back to the top]](#210-rag-en)

106 additions & 0 deletions: docs/sphinx_doc/zh_CN/source/tutorial/210-rag.md

A RAG agent is an agent that can generate answers based on the retrieved knowledge.
**Build your own RAG agent.** As long as your agent's configuration contains `knowledge_id_list`, you can pass the agent together with this list to `KnowledgeBank.equip`, and the agent will be equipped with the corresponding knowledge.
In your agent's `reply` function, you can decide how to extract and use information from the `Knowledge` objects, and even update the knowledge base through `Knowledge`.


## (Optional) Setting up a local embedding model service

For users interested in setting up a local embedding service, we provide the following example.
It is based on the `sentence_transformers` package, a popular library specialized for embedding models (built on the `transformers` package and compatible with both HuggingFace and ModelScope models).
In this example, we use `gte-Qwen2-7B-instruct`, one of the state-of-the-art embedding models.


* Step 1: Follow the instructions on [HuggingFace](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) or [ModelScope](https://www.modelscope.cn/models/iic/gte_Qwen2-7B-instruct) to download the embedding model.
* Step 2: Set up the server. The following code is for reference.

```python
import argparse
import datetime

from flask import Flask, request
from sentence_transformers import SentenceTransformer


def create_timestamp(format_: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Get the current timestamp."""
    return datetime.datetime.now().strftime(format_)


app = Flask(__name__)


@app.route("/embedding/", methods=["POST"])
def get_embedding() -> dict:
    """Receive a POST request and return the embedding response."""
    json_data = request.get_json()

    inputs = json_data.pop("inputs")

    # Accept both a single string and a list of strings.
    if isinstance(inputs, str):
        inputs = [inputs]

    embeddings = model.encode(inputs)

    return {
        "data": {
            "completion_tokens": 0,
            "messages": {},
            "prompt_tokens": 0,
            "response": {
                "data": [
                    {
                        "embedding": emb.astype(float).tolist(),
                    }
                    for emb in embeddings
                ],
                "created": "",
                "id": create_timestamp(),
                "model": "flask_model",
                "object": "text_completion",
                "usage": {
                    "completion_tokens": 0,
                    "prompt_tokens": 0,
                    "total_tokens": 0,
                },
            },
            "total_tokens": 0,
            "username": "",
        },
    }


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name_or_path", type=str, required=True)
    parser.add_argument("--device", type=str, default="auto")
    parser.add_argument("--port", type=int, default=8000)
    args = parser.parse_args()

    print("Setting up the embedding model...")
    model = SentenceTransformer(
        args.model_name_or_path,
        # "auto" lets sentence_transformers pick the available device.
        device=None if args.device == "auto" else args.device,
    )

    app.run(port=args.port)
```

* Step 3: Start the server.
```bash
python setup_ms_service.py --model_name_or_path {$PATH_TO_gte_Qwen2_7B_instruct}
```


Test whether the service is running successfully:
```python
from agentscope.models.post_model import PostAPIEmbeddingWrapper


model = PostAPIEmbeddingWrapper(
config_name="test_config",
api_url="http://127.0.0.1:8000/embedding/",
json_args={
"max_length": 4096,
"temperature": 0.5
}
)

print(model("testing"))
```
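If you prefer to work with the raw endpoint rather than the wrapper, it helps to know where the vectors sit in the nested response schema the Flask service returns. The sketch below extracts them from a canned response of the same shape; the embedding values here are placeholders for illustration, not real model output.

```python
# A canned response matching the schema returned by the Flask service above;
# the embedding values are placeholders, not real model output.
sample_response = {
    "data": {
        "response": {
            "data": [
                {"embedding": [0.1, 0.2, 0.3]},
                {"embedding": [0.4, 0.5, 0.6]},
            ],
            "model": "flask_model",
            "object": "text_completion",
        },
    },
}

# The vectors live under data -> response -> data -> [i] -> "embedding".
embeddings = [
    item["embedding"]
    for item in sample_response["data"]["response"]["data"]
]
print(len(embeddings), len(embeddings[0]))  # two vectors of dimension 3
```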

[[Back to the top]](#210-rag-zh)

