Commit 0e69fc9

Author: Sam Partee
Provider Documentation (#24)
New Jupyter Notebook for the provider documentation
1 parent 02e9470 commit 0e69fc9

5 files changed: +286 -5 lines changed

README.md

Lines changed: 6 additions & 3 deletions
@@ -3,11 +3,14 @@
33

44
[![Codecov](https://img.shields.io/codecov/c/github/RedisVentures/RedisVL/dev?label=Codecov&logo=codecov&token=E30WxqBeJJ)](https://codecov.io/gh/RedisVentures/RedisVL)
55
[![License](https://img.shields.io/badge/License-BSD-3--blue.svg)](https://opensource.org/licenses/mit/)
6-
6+
![Language](https://img.shields.io/github/languages/top/RedisVentures/RedisVL)
7+
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
8+
![GitHub last commit](https://img.shields.io/github/last-commit/RedisVentures/RedisVL)
9+
![GitHub deployments](https://img.shields.io/github/deployments/RedisVentures/RedisVL/github-pages?label=doc%20build)
710

811
RedisVL provides a powerful Python client library for using Redis as a Vector Database. Leverage the speed and reliability of Redis along with vector-based semantic search capabilities to supercharge your application!
912

10-
**Note:** This project is rapidly evolving, and the API may change frequently. Always refer to the most recent [documentation](https://redisvl.com/docs).
13+
**Note:** This project is rapidly evolving, and the API may change frequently. Always refer to the most recent [documentation](https://www.redisvl.com).
1114
## 🚀 What is RedisVL?
1215

1316
Vector databases have become increasingly popular in recent years due to their ability to store and retrieve vectors efficiently. However, most vector databases are complex to use and require a lot of time and effort to set up. RedisVL aims to solve this problem by providing a simple and intuitive interface for using Redis as a vector database.
@@ -32,7 +35,7 @@ RedisVL has a host of powerful features designed to streamline your vector datab
3235

3336
Please note that this library is still under heavy development, and while you can quickly try RedisVL and deploy it in a production environment, the API may be subject to change at any time.
3437

35-
`pip install redisvl`
38+
`pip install redisvl` (Coming Soon)
3639

3740
## Example Usage
3841

docs/user_guide/index.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ hybrid_queries_02
2525
:caption: Providers
2626
:maxdepth: 3
2727
28-
embedding_creation
28+
providers_03
2929
```
3030

3131
```{toctree}

docs/user_guide/providers_03.ipynb

Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
1+
{
2+
"cells": [
3+
{
4+
"attachments": {},
5+
"cell_type": "markdown",
6+
"metadata": {},
7+
"source": [
8+
"# Embedding Providers\n",
9+
"\n",
10+
"In this notebook, we will show how to use RedisVL to create embeddings using the built-in Providers. Today RedisVL supports:\n",
11+
"1. OpenAI\n",
12+
"2. HuggingFace\n",
13+
"\n",
14+
"Before running this notebook, be sure to:\n",
15+
"1. Install ``redisvl`` and activate that environment for this notebook.\n",
16+
"2. Have a running Redis instance with RediSearch > 2.4.\n"
17+
]
18+
},
19+
{
20+
"cell_type": "code",
21+
"execution_count": 22,
22+
"metadata": {},
23+
"outputs": [],
24+
"source": [
25+
"# import necessary modules\n",
26+
"import os\n",
27+
"from redisvl.utils.utils import array_to_buffer"
28+
]
29+
},
30+
{
31+
"cell_type": "markdown",
32+
"metadata": {},
33+
"source": [
34+
"## Creating Embeddings\n",
35+
"\n",
36+
"This example shows how to create embeddings from 3 simple sentences with a number of different providers:\n",
37+
"\n",
38+
"- \"That is a happy dog\"\n",
39+
"- \"That is a happy person\"\n",
40+
"- \"Today is a sunny day\"\n"
41+
]
42+
},
43+
{
44+
"cell_type": "markdown",
45+
"metadata": {},
46+
"source": [
47+
"### Huggingface\n",
48+
"\n",
49+
"Huggingface is a popular NLP library that has a number of pre-trained models. RedisVL supports using Huggingface to create embeddings from these models. To use Huggingface, you will need to install the ``sentence-transformers`` library.\n",
50+
"\n",
51+
"```bash\n",
52+
"pip install sentence-transformers\n",
53+
"```"
54+
]
55+
},
56+
{
57+
"cell_type": "code",
58+
"execution_count": 32,
59+
"metadata": {},
60+
"outputs": [
61+
{
62+
"data": {
63+
"text/plain": [
64+
"[0.00037813105154782534,\n",
65+
" -0.05080341547727585,\n",
66+
" -0.03514720872044563,\n",
67+
" -0.023251093924045563,\n",
68+
" -0.04415826499462128,\n",
69+
" 0.020487893372774124,\n",
70+
" 0.0014619074063375592,\n",
71+
" 0.03126181662082672,\n",
72+
" 0.056051574647426605,\n",
73+
" 0.0188154224306345]"
74+
]
75+
},
76+
"execution_count": 32,
77+
"metadata": {},
78+
"output_type": "execute_result"
79+
}
80+
],
81+
"source": [
82+
"os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n",
83+
"from redisvl.providers import HuggingfaceProvider\n",
84+
"\n",
85+
"\n",
86+
"# create a provider\n",
87+
"hf = HuggingfaceProvider(model=\"sentence-transformers/all-mpnet-base-v2\")\n",
88+
"\n",
89+
"# embed a sentence\n",
90+
"test = hf.embed(\"This is a test sentence.\")\n",
91+
"test[:10]"
92+
]
93+
},
94+
{
95+
"cell_type": "code",
96+
"execution_count": 24,
97+
"metadata": {},
98+
"outputs": [],
99+
"source": [
100+
"# You can also create many embeddings at once\n",
101+
"\n",
102+
"sentences = [\n",
103+
" \"That is a happy dog\",\n",
104+
" \"That is a happy person\",\n",
105+
" \"Today is a sunny day\"\n",
106+
"]\n",
107+
"\n",
108+
"embeddings = hf.embed_many(sentences)\n"
109+
]
110+
},
111+
{
112+
"cell_type": "markdown",
113+
"metadata": {},
114+
"source": [
115+
"## Search with Provider Embeddings\n",
116+
"\n",
117+
"Now that we've created our embeddings, we can use them to search for similar sentences. We will use the same 3 sentences from above and search for similar sentences.\n",
118+
"\n",
119+
"First, we need to create the schema for our index.\n",
120+
"\n",
121+
"Here's what the schema for the example looks like in YAML for the HuggingFace Provider:\n",
122+
"\n",
123+
"```yaml\n",
124+
"index:\n",
125+
"  name: providers\n",
126+
"  prefix: rvl\n",
127+
"  storage_type: hash\n",
128+
"\n",
129+
"fields:\n",
130+
"  text:\n",
131+
"    - name: sentence\n",
132+
"  vector:\n",
133+
"    - name: embedding\n",
134+
"      dims: 768\n",
135+
"      algorithm: flat\n",
136+
"      distance_metric: cosine\n",
137+
"```"
138+
]
139+
},
140+
{
141+
"cell_type": "code",
142+
"execution_count": 11,
143+
"metadata": {},
144+
"outputs": [],
145+
"source": [
146+
"from redisvl.index import SearchIndex\n",
147+
"\n",
148+
"# construct a search index from the schema\n",
149+
"index = SearchIndex.from_yaml(\"./schema.yaml\")\n",
150+
"\n",
151+
"# connect to local redis instance\n",
152+
"index.connect(\"redis://localhost:6379\")\n",
153+
"\n",
154+
"# create the index (no data yet)\n",
155+
"index.create(overwrite=True)"
156+
]
157+
},
158+
{
159+
"cell_type": "code",
160+
"execution_count": 12,
161+
"metadata": {},
162+
"outputs": [
163+
{
164+
"name": "stdout",
165+
"output_type": "stream",
166+
"text": [
167+
"\u001b[32m15:50:34\u001b[0m \u001b[35msam.partee-NW9MQX5Y74\u001b[0m \u001b[34mredisvl.cli.index[33382]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n",
168+
"\u001b[32m15:50:34\u001b[0m \u001b[35msam.partee-NW9MQX5Y74\u001b[0m \u001b[34mredisvl.cli.index[33382]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. providers\n"
169+
]
170+
}
171+
],
172+
"source": [
173+
"# use the CLI to see the created index\n",
174+
"!rvl index listall"
175+
]
176+
},
177+
{
178+
"cell_type": "code",
179+
"execution_count": 21,
180+
"metadata": {},
181+
"outputs": [],
182+
"source": [
183+
"# load expects an iterable of dictionaries where\n",
184+
"# the vector is stored as a bytes buffer\n",
185+
"\n",
186+
"data = [{\"text\": t,\n",
187+
" \"embedding\": array_to_buffer(v)}\n",
188+
" for t, v in zip(sentences, embeddings)]\n",
189+
"\n",
190+
"index.load(data)"
191+
]
192+
},
193+
{
194+
"cell_type": "code",
195+
"execution_count": 31,
196+
"metadata": {},
197+
"outputs": [
198+
{
199+
"name": "stdout",
200+
"output_type": "stream",
201+
"text": [
202+
"That is a happy dog\n",
203+
"0.160862445831\n",
204+
"That is a happy person\n",
205+
"0.273598074913\n",
206+
"Today is a sunny day\n",
207+
"0.744559526443\n"
208+
]
209+
}
210+
],
211+
"source": [
212+
"from redisvl.query import VectorQuery\n",
213+
"\n",
214+
"# use the HuggingFace Provider again to create a query embedding\n",
215+
"query_embedding = hf.embed(\"That is a happy cat\")\n",
216+
"\n",
217+
"query = VectorQuery(\n",
218+
" vector=query_embedding,\n",
219+
" vector_field_name=\"embedding\",\n",
220+
" return_fields=[\"text\"],\n",
221+
" num_results=3\n",
222+
")\n",
223+
"\n",
224+
"results = index.search(query.query, query_params=query.params)\n",
225+
"for doc in results.docs:\n",
226+
" print(doc.text)\n",
227+
" print(doc.vector_distance)"
228+
]
229+
},
230+
{
231+
"cell_type": "code",
232+
"execution_count": null,
233+
"metadata": {},
234+
"outputs": [],
235+
"source": []
236+
}
237+
],
238+
"metadata": {
239+
"kernelspec": {
240+
"display_name": "Python 3.8.13 ('redisvl2')",
241+
"language": "python",
242+
"name": "python3"
243+
},
244+
"language_info": {
245+
"codemirror_mode": {
246+
"name": "ipython",
247+
"version": 3
248+
},
249+
"file_extension": ".py",
250+
"mimetype": "text/x-python",
251+
"name": "python",
252+
"nbconvert_exporter": "python",
253+
"pygments_lexer": "ipython3",
254+
"version": "3.8.13"
255+
},
256+
"orig_nbformat": 4,
257+
"vscode": {
258+
"interpreter": {
259+
"hash": "9b1e6e9c2967143209c2f955cb869d1d3234f92dc4787f49f155f3abbdfb1316"
260+
}
261+
}
262+
},
263+
"nbformat": 4,
264+
"nbformat_minor": 2
265+
}
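
The notebook passes each embedding through `array_to_buffer` before calling `index.load`, because a Redis hash stores the vector field as raw bytes. Below is a minimal sketch of that kind of conversion, assuming 32-bit little-endian floats and using only the standard library. `floats_to_buffer` and `buffer_to_floats` are hypothetical names, not the library's API, and the real helper may differ in dtype or byte order.

```python
import struct
from typing import List, Sequence


def floats_to_buffer(vector: Sequence[float]) -> bytes:
    """Pack a vector into little-endian float32 bytes (assumed layout)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def buffer_to_floats(buffer: bytes) -> List[float]:
    """Unpack little-endian float32 bytes back into a list of floats."""
    count = len(buffer) // 4  # 4 bytes per float32 value
    return list(struct.unpack(f"<{count}f", buffer))


# Toy vector for illustration only; real embeddings come from a provider.
vector = [0.5, -0.25, 1.0]
buffer = floats_to_buffer(vector)
print(len(buffer))
print(buffer_to_floats(buffer))
```

Under this assumed layout, a 768-dimensional embedding like the one declared in the schema would occupy 3072 bytes per record.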

docs/user_guide/schema.yaml

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
2+
index:
3+
  name: providers
4+
  prefix: rvl
5+
  storage_type: hash
6+
7+
fields:
8+
  text:
9+
    - name: sentence
10+
  vector:
11+
    - name: embedding
12+
      dims: 768
13+
      algorithm: flat
14+
      distance_metric: cosine
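
The schema sets `distance_metric: cosine`, and the notebook's query prints a `vector_distance` for each result, where smaller values mean closer matches. Here is a minimal pure-Python sketch of cosine distance, assuming the common definition of 1 minus cosine similarity. `cosine_distance` is a hypothetical helper, and the toy vectors are illustrative rather than real embeddings.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
print(cosine_distance(query, [2.0, 0.0]))  # same direction as the query
print(cosine_distance(query, [0.0, 3.0]))  # orthogonal to the query
```

Because the measure depends only on direction, scaling a vector does not change its distance to the query.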

redisvl/index.py

Lines changed: 0 additions & 1 deletion
@@ -262,7 +262,6 @@ def load(self, data: Iterable[Dict[str, Any]], **kwargs):
262262
raise TypeError("data must be an iterable of dictionaries")
263263

264264
for record in data:
265-
# TODO don't use colon if no prefix
266265
key = f"{self._prefix}:{self._get_key_field(record)}"
267266
self._redis_conn.hset(key, mapping=record) # type: ignore
268267
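
This hunk removes the TODO comment but leaves the key format unchanged, so if `self._prefix` is an empty string the key would still begin with a colon. The sketch below shows one possible way to do what the TODO described. `make_key` is a hypothetical standalone function, not part of `redisvl`.

```python
from typing import Optional


def make_key(prefix: Optional[str], key_field: str) -> str:
    """Join prefix and key field with a colon, skipping the colon if no prefix."""
    if prefix:
        return f"{prefix}:{key_field}"
    return key_field


print(make_key("rvl", "doc1"))
print(make_key("", "doc1"))
```

Treating `None` and the empty string the same way keeps callers from having to normalize the prefix first.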
