v0.0.11 #116

Closed
17 tasks done
henomis opened this issue Aug 31, 2023 · 4 comments

henomis commented Aug 31, 2023

henomis commented Sep 12, 2023

SimpleVectorIndex refactor

The internal structure should be like this:

type data struct {
	ID        string             `json:"id"`
	Metadata  types.Meta         `json:"metadata"`
	Embedding embedder.Embedding `json:"embedding"`
}
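
To see what the JSON tags above would produce, the record can be mirrored in a few lines of Python. This is only a sketch, not the project's code: the `Record` class, the helper names, and the sample values are hypothetical, and it assumes `types.Meta` serializes as a JSON object and `embedder.Embedding` as an array of floats.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Record:
    """Hypothetical mirror of the Go `data` struct and its JSON tags."""
    id: str
    metadata: dict = field(default_factory=dict)
    embedding: list = field(default_factory=list)


def dump_records(records):
    """Serialize records to a JSON array, one object per stored vector."""
    return json.dumps([asdict(r) for r in records])


def load_records(text):
    """Rebuild records from the JSON produced by dump_records."""
    return [Record(**item) for item in json.loads(text)]


# Sample values are made up for illustration.
records = [Record(id="doc-1", metadata={"source": "example"}, embedding=[0.1, 0.2, 0.3])]
encoded = dump_records(records)
print(encoded)
```

Keeping the ID, metadata, and embedding in one flat record means the whole index can round-trip through a single JSON array.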

henomis commented Sep 19, 2023

Redis integration

create index

"FT.CREATE" "idx" "SCHEMA" "item_keyword_vector" "VECTOR" "FLAT" "10" "TYPE" "FLOAT32" "DIM" "768" "DISTANCE_METRIC" "COSINE" "INITIAL_CAP" "1000" "BLOCK_SIZE" "1000" "product_type" "TAG" "SEPARATOR" "," "item_name" "TEXT" "WEIGHT" "1.0" "item_keywords" "TEXT" "WEIGHT" "1.0" "country" "TAG" "SEPARATOR" ","

insert vector with metadata

"HSET" "product:pippo:pippo" "primary_key" "pippo" "product_type" "bike" "item_name" "bike" "item_keywords" "bike" "country" "Italy" "item_keyword_vector" "\xed$C?\xc9\xce\xf0>\x1a\xf1\xea>?\x81H?T\x95\xea>\\\x06\x1c?\t!p?\x11\x18\xdb>\xc2\xd1\x96>\x0e\x8di<\xdcn\x0f?\x9b\xaa\xb8<\x85=\x11?\xef\xf4\xc9>\x81R\xca=:1\xce;\x0e\xa8\xae>\xba\xdf\x86>B\x80,?\xd0\x81h>\x19\x88\x80>\x01\x1c\x01?\xe8\x12\xa9=\xa3\x00\x0b>Xo\xc5>5\xdd.?\x02g\xf7>\xd3\x01f>\x02\xd4b>\x91\x16\x1e>\xb2)\x12>eHN?\xf0\xa1(?H\xe4\xca>\x91i\x1f?\x9d\xfe\x8e>\xc2\x0b\xaf>\x95zQ?\x9f\xd4R?&\xe3\a?\xd3\x00`?\x00{5?HK+?7=\x95>F\x9c\xe6>\x9c\xa6m>\xbd\x88\xa9>\xe2\xbeY?&\xceR>8uj>'7x?\x95\x98\xf6>\x03\xe6\xba=\x8a*j?\xcf\xbd'?\xc1\x89\n?\x14\x91P?\xe0\xb8g?d\x164?\xa6b\xb3>\xc8\x1e\x97=['\x01?\xe1\xef3?\x9d\x85\\=X\x0cD=\xb1\xd6\x8e>\xa8\x1f3?*J\x1b?\xf9\xb5\x8b=)\x93\xbc>`\x85d?\xc9\x93n>\xf1\a\x8d>\xd1r\xd0>\x9c\\U?/U%?XVu=\xf2\x15 ?U\xd0b?\x00\x00\x14?\x19/\xa0>!\xaeS?\x92\x98O=nX%?L\x18T?\x02n\xac>y\xf8\xf5>\x1e\x10Y>*{\a=wLP?\x16\"\x9c=\xc7~\xfb=3\x1d\xf7>R\x85\xcd>k\x1d\xca>s\x88\xd3=\x83w\xf0>\xc0Y\x19?{\x9b\xee>\xe8\x05r?1jK?\xd2.1?\x0e\xdc5?\x9c\xc8.?\xd4\x99J?\xc6\xf1\x03?\xb4\xf5(>k\x94W?\xeb\x84)?!\x84q?dA\xc9=\xd8\x13\xd4>\xb9\x84H?ym\xde>\xea\xcf`?\x1a-*?\"h\xd7>\xe76n=;{/?\x13/\xf7;\t\xfc.?\xe1F!?w_\r?-\xe4U>dt\xe6>JZ\x12>*\xb3\x0b?Hq(?1~r?\x80A\xdf>UQ\x16?\x8f\"s?f\xdd]>\xb3gY?\xb6\xa6#>\xfc\xcb\x1d?$\xd2e?\x06\xf6\x02>\x91H\xf6>\x94\x90\xd0>O\xbcK?\xff\xeb1?\xa2K>?\x99\xee\xa1=\x1e\x7f??\x0c\x13\xcf>\xa9E\x82>\x91\xe6\x84=%\xb4\n?\xd8=^?\xa3PJ?\xfc\xdb|?\x8eVY>T^\x84>\x86\x1d\x05?&\xf9U>\x18#~?\x1b\xb5\x05?\x84x<>\xe8\xb8W?\xf12\xee>@sz=\x9d$\x0c>\xc3\xdej?\x99L\x7f>\x9e\xc4\x18?1\xa9F?\xa7\x9c\xbe>\x06\r\xcd=)=\xce=b3\xe1>1*\xee>8O\x89>\xf7\x0b@?\xf7\xd9R?N\x1c0>\x96\xb3\xd5>=n\xb1>\xfa\xde\t?7\x90s?\xed\x18R=\xf6f#?\xb9\xcfe?\xe9\x9a3?\xce\xbbe?!\x9e\xa3>\xec\x9c\x11>\x96[u?\xe7\x14u>\x8c\x86K>A\x85N>\x90\xd0\xa4>\x93\x17\x12?\xdcrg?}\xfb\a??m\x0c?=\xe0R?\x94\x11i?\xc6\x8e\x98>\xa2\x14
\xf9>_\xa6\xf2>\xcf\x0e\xf3>`\xe5_>\xa5\x0bp?>\x9c\x98>\xff\xe2\x8f>\t\x0bO>x5\x03?'\xef\xd4>\x80\x14K?\x84\x89\xd6=\x86\xd1.?\xb8\xcf\xa9=\x81^e?l\xe8\xd2>}\x84\xb9>\x91x[=7\xe4+?\xddb\x85>\xa1\x8c\b>\xcf\xcb\xc5<\xf2\xcb\x04?\x9e\xf0\x9c>\x88W^?\x03jv?\xb7\x0b\xc1=\xe2\x03m?\xca\x1d>?\xea\x01\r?BO\x85>\x91\x15\x9e>\xf7\xa3\xf1>\x15\x9dJ?3\xed\t?\xee\x19\x9f>?@\xd9<\xa5\x98\x13>\xed\xb9\x05?\xd6e\xc6=y\x13\xc7>O\x87\xce>\x01\xb7\xc8>W\x99/?\xfc\xa0\x9e>\xff]\xeb>\xce\x92=>i\xb6~>\x1b\xef\xe6;\x8b\xa8\xb0>\xd6Q\x1b?\xa13P?|\xf5/>QrT>\xbf\x18\x06=\x01\xf3 ?\x8433?`@y?\xcd\xb6\xcb>\x03.O>\xe8\x12p>\xcdZz?\xee\xdb~?'76?\x9foC?r\xd0\x1a?\xe8\x8f)?\xa8\xfd\x94>\xf1\xe3D?Y\xc0\x02?\xd4WX?\x82v ?\xd3\x83\x16=\xe9\xb8!>\xee%3?l\xc6\x85>\x17\xeb\xf0=)f\x1e?\x0b~\x17?J\xe9\x9e=.V%?\xfd\xbb\xfc>\x98\xa4\xc4>\xc3\xf0\x1b?\x96\xaa\xa5>\x19E\xfb>\xf3\x95G?\xf8[\xd7>\x97\xdf\x82>B\xd8\x8d>\x86\xac\xa3>g\xe7\x9d>\x18\xbeo?;T\xbc>\x0f\x9af?\xee{\xbf>\x94h\xca>\xad\x1eP?Co\xb0>{\xae-?,\xb6\xd1>l\x94\xed>u>\xc0>\xdd\xb6n?[\xb2}>\xc9\x9au?-\x7f]?JE.?\xa2\x16\r>HSk?\xca\x9an?\xaf\xd3i?\xae\xcdD?x\x0c\x10?R\xfd\"?Y\xd0\xd3>\x18]1?\xf2\xa4\x96>JK\xcb=\x82\x87$?\xebgZ?Okr?\x8b\x16'?\x94<@>\xab\xfe%?\xce+\x02?\xb9\xf0\x9c=\x17x=>yP|?\xfe\xb8\x0c?\xe9+\xb0>\xf0E\x15?-\x8b\x1d?\x89&h?\x919\x06>c9\x12>\r\xf0J?\x19\x9fm?\xfbR\xd3=r\x84>?A\xeeL?\x9f\xc94?\xd9\xe3V?e^E?r\xc8\xe8>o\xcbb?X\xf8\x83>\xe8\xcbZ?8\x06e>\xa0\x9e\xe8>\x99\x0e\xcd>\xaf\x00@=\xb9\xbbc?C\a\x1d>R\\9?6\xad\\?\x1e\x13\xbe>8\xabW?\x90q\x0b?\xd1\x04B?\x98\"\xd0>\xcc\xf4\xb1=\x06\xccE?U\xea\x04?\x8f\xdb\x9e=\\k\x02?\xb9\xe8\xcb>\xb8s\xc7>\xf5\x83\xed>\xa3\x01\xb5<\x06mt?<@\xb2<*E\xe6>Yc\xe1>W\x994>\tG\x0f?6\x9ev?\x86l\x03?\xf6\xbc\x1e?3M,>\xb2\xd1\x8f>\xf8\xa1\a?\xd5\xcbm>\xd6Wz?\xaf\xbc/?\xa6%\n?\xa3Sw?\xc8$\x03<\x04\x9f\x1d?\xc9\xdb\xb5>:\x90\x0e>\xd3\xce\xf8>,cV?\x1a\xf3\x1e?\x00\x1c\xb8>\xe2\xba\xb9>\xd4Oj>\xcd\xebq?b\x8eB?\xed4\x1c?\xa3\xe3X?6\xd2\x9f>\xf6z\x89>\x95\xb6\xa2>(,Y?n\xe2\xec<\xf8\x90M?\x91\x04P?\x8fR\x9c=\xff\x8b
a?7\xeeb?F\x02&?Q\x9dS>\xcd\xefb?\xab\x80\x04?\x89\x91\xdd>\xfe\xa7\xf2=\x15\x0e[?)GG?pB\x12?W\x0b2?+\xc1w>\x99\xa8\xd7>\x88hn?U\xfa~?\x16\x93\xad=\x94,\x00>z$)=\x01\xb3\b?]7\x02?\x8d\x84\x8c<e\xcb^?\xc1\xc4_?.\xc6$?\xe3\x8d >'/\x18?\x13\x93,>\x0e`\t?%\xf9/?\x12\x80\x1b?G\x82\xe7>\x15\xba\x0b?|h\x0e>!o\x02?\n\xefY?\x9c\x0b\n>GUz=\xe5=n?\xda\xa0a>\xe0%7?M\xbf\xf7>\x9a\xefA?\xfe\xfbX?~\xf9\x0b?z\r\xf6>\xe1qm?\t\xce\xbe>\x95\xa2\x13?\xe0\xb1V?e\x9cH>\xcd\xe1\xe9>\xd0\xbcZ?\\\xad\x17>\x9c\xeb\xe0>\xed\x8fm?}D8?)\xf9\xea>\xce\xff\x97;@_\x12?\xf3\xf7>?qrj=\x0b\xb2\xc2>/%\r?\xfe}\x17>\xf5\x9d\x15?0\xf8\xdf>\xfc\r)?\x85\x89K?\xb1</>le\xde>\xfb_\x86>\xc2\x0b\x05?/\xcd\x1b?\xd3\x02\xc2>\xad\xd8\xa0>\xc5\xc2\x81<\xdbW\xf7>\xcbH\xb5>\xbfs@>\xee\x1do?\xe0h/?s\xf6\xc3>\xcd\xcf\x90>{\x8a\x19>E\xe2\x0e>\x154\xab>YXV?\x8b\xcb\xac>;'\x13?\xb2\xb5\xe3>\x99J\xfa>\xd9\x806>{ob?Y\xd6\xdf=JZ\x81>[\tE?\xc4\xe2\xa7>g\x90\x8a=\x95\xa1\x1f>\x9b\x82\xe2>,\n8>\xa7\xfcr?x\xe5>>\xd7\x9b'?\x16nQ?)P\xcc>\x9e@\xe3>4\xa5\x13>\xc4\x8a\x1e>6FF?\xe6\xd4[?\x88Vw;\xc2G\b?\xb2\xa4H=\n\xbc\xd2>\xcd%q?\x85F\x1e?\xe8u\xb3>\x1d\xeaL>\xa6\x96\x98>\x97\xdd&?\xae!+?\xe9\x99\\?\x16\x8a\x18?qNW?I\x1b\xa9>\xf5&C=:\xc3\x02?U\x9cW?c\xb1$?\xc7w9?\xdcA\x9f>\xd5\x11\x1e>\x97\xb4y?a\x9c\"?;\xcb)?\xd8\xc5\xf6>\xe8\xe8\xf3>7\x1a^?\xb4N6=\xf4Qa?o\x82\xcb>&y%?\xdfI3?Q3\x04>^\x9d\xf0>\xbeF\x00?Z\xcd\x06?T\xbc]? 
o\x0e?\xe7F\xe5>\"p1>\x88y\xad>\xa9\xc4O?\xe3\x0c\xcb>\x12V\x0e?\xfdu??>\xd2:>,\x80F?\xaeUa?\xcdSL?D@S>m\x9c >\xe9\x80\xf4>\x10\x80n>\x15\x7fE?4\x90@?\xc5\xc5.?\xfa\"D?4Ed>\xe1\x0f\xa0=\x8a}\x1e>\x98\xdeZ?t\xd0O?\xd8A6?\xe4\x9e[=\xe1Z(??&=>\xd9\x00E=SD,>W\x87c?\x1a\x17\x04?\"5\xc7>)\tM?~\xd5$?\x11Y\xfe=w\xd7\x10>\x99<\xd9=\x86r\xee>D\xa1\xc2<\xbb\xe6\x86>|B\x8f>\x13\xedB?\xb4t\x96>\x8c3\xc0>\xfbx\xf2=\x9aX\xe9>`\x06\xdf>Lc7?\xb9\xad\x18?\x90\xe3q?\xcfs\xad>\xca>{?\x94\xd72?\x8f&\\?\xc9/9?\x98\xdd_>\a\x9a5?7\x87c?R\xdd\xf2=%6->\xc0z\xa9<\x8b\x0b;=\xdd\xc1\xd7>:\xeeu?jM*?X\xed\\?}Q0=\x84\xf1z?I\xef\xbf=+\x00\x8d>\xae\xe1\xc8<\xdd\xd5-?\xa16\xae=\xb7\xb8h?\x19n\xd3>\x97Gc?\xc0\xbf\xed>q\x8f\xaf>W\x00!<E?\x9b>\xef\xba\xd5>\xd1\xd8\xcc=J\xa7\xc3>\xc0\xf97<\xbb\x06\x83=h\x8fP>\xce\a\xe0=g&\xa0>)\x93T?@V*?\xc6\xc1I?\xce\xe4\xb7>c&N?<\xfa\x0b>\b\x11\x1b?m\xadD>\x9c\xdeC?N\xa4\xa7>j`\xcc>D\xf4:?f{\x13?\x0f\xdb$?|\xc8m?b\xccP?z\a\x1d?\x0cJp?'\x97\xc0>\bd\x7f?\xce\x1b\x8b=w\xa2E?gO\x0e?0\xeb\xae>\x84K\x89>{Yg?\xff~/=\xd5\a\x8e=r\xaay?\xfa\xcd\x1d?\x12W]?\xea\xa5\x15?*{\x1a?\x83c\xe8>b\x84\x0e?a\xf4\x1b?\x1c\xcb\x1d?d\x82t=\xff\xba\x85>\xa2\xce\xca>\xe6\b\x16?\x1b\xc0&?\x8e\x7f\xe9>\xb8\x96N=\xc1[B?\xd3b\x14>.\x810?\x13\x83S?Mri?W\xc8]?\x9e\x9d_=}OT=\xe2]6>C\xb8\x98=\xf8\x13\x99>y\xed,?\x94\xbb\x15?\xae\xb7\xb0> \xf0S?V\xcdG?\x16K\xf9>\x8f\xf7X>\x11v\r>N\x861?\xe5\xca\x94>KcO?\xf4\xb2 ?\xdd\x0b\x7f?,F\x97=\xb7)\xfe>\x1e\xa4H?\x8e\xd6V?\x0e\xbfG?\x8a\xa2p?oF7?1H\xec>\xc3\x1a:>J\xe55=\x98\x0eA?%t7?\xe8\x97\xe2=$\x15V?\x80\x9c\x0f??0%?\x93\x83]>Z\xb8j>\xb8\x0c\xf6>D\xb0m?M~L?\x8a\xc9\xd5=\xaf\xa5\xa2>\x0eEp?\x96O'>\xbd\x177?0R\xec>4\xeb\xcf>\x81\x15\xfb>\xba\x82\x11?,\x1d\xfb>\xab\x17\xe9>\xe2Q\x1a?o\x8aO?<\xa4K?\x89\xe1=?\x01\x96\x80>\x11*\x05?\xcaY^?\"z\x8b=\xafe\x1f?\x1a;k?~_\xc8>\xcf\xd9\xe8>\xd9J\x14>\xcc\xdf&?\x87\xe7~?\xaa\xa2\x05?\xf7}Y?\xd1\xd4\xa2>\xa9z\x9f>\xc7\xeeW?" 

search with vector

"FT.SEARCH" "idx" "*=>[KNN 1 @item_keyword_vector $vec_param AS vector_score]" "RETURN" "3" "vector_score" "item_name" "item_keywords" "SORTBY" "vector_score" "ASC" "DIALECT" "2" "LIMIT" "0" "1" "params" "2" "vec_param" "\xcd\xcc\xcc=\xcd\xccL>\x9a\x99\x99>\xcd\xcc\xcc>"
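
The argument list above can be assembled programmatically. The sketch below rebuilds it in Python; the function name `knn_search_args` and its parameters are hypothetical, and the vector is taken as already-encoded bytes. `PARAMS` is written in upper case here, on the assumption that Redis treats keyword arguments case-insensitively.

```python
def knn_search_args(index, vector_field, k, blob, return_fields):
    """Build the FT.SEARCH argument list for a KNN query (illustrative helper)."""
    query = f"*=>[KNN {k} @{vector_field} $vec_param AS vector_score]"
    return [
        "FT.SEARCH", index, query,
        "RETURN", str(len(return_fields)), *return_fields,
        "SORTBY", "vector_score", "ASC",
        "DIALECT", "2",
        "LIMIT", "0", str(k),
        # PARAMS takes the number of following name/value items, then the pairs.
        "PARAMS", "2", "vec_param", blob,
    ]


# Same values as the command above; the blob is the 16-byte query vector.
blob = b"\xcd\xcc\xcc=\xcd\xccL>\x9a\x99\x99>\xcd\xcc\xcc>"
args = knn_search_args(
    "idx", "item_keyword_vector", 1, blob,
    ["vector_score", "item_name", "item_keywords"],
)
print(args[:3])
```

With redis-py, a list like this could be sent via `client.execute_command(*args)`, assuming a server with the search module loaded.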

Go string vector encoding

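a := []float32{0.1, 0.2, 0.3}
buf := new(bytes.Buffer) // needs "bytes" and "encoding/binary"
_ = binary.Write(buf, binary.LittleEndian, a) // raw little-endian float32 bytes
e := fmt.Sprintf("%q", buf.Bytes()) // quoted form, as in the commands above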
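
The byte layout of `vec_param` in the search command can be checked with a short script. The sketch below uses Python's `struct` module and assumes little-endian IEEE 754 float32 values, which is what the sample bytes decode as; the helper names are hypothetical.

```python
import struct


def encode_float32(values):
    """Pack floats as consecutive little-endian float32 values."""
    return struct.pack(f"<{len(values)}f", *values)


def decode_float32(blob):
    """Inverse of encode_float32; the byte length must be a multiple of 4."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


blob = encode_float32([0.1, 0.2, 0.3, 0.4])
print(blob)  # b'\xcd\xcc\xcc=\xcd\xccL>\x9a\x99\x99>\xcd\xcc\xcc>'
```

Four floats give 16 bytes, so a 768-dimensional vector such as the one in the `HSET` example is 3072 bytes long.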

henomis commented Sep 19, 2023

import json
import time

import numpy as np
import pandas as pd
import redis
import requests
from redis.commands.search.field import (
    NumericField,
    TagField,
    TextField,
    VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
# from sentence_transformers import SentenceTransformer

url = "https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started/main/data/bikes.json"
response = requests.get(url)
bikes = response.json()


client = redis.Redis(host="localhost", port=6379, decode_responses=True)

res = client.ping()
# >>> True

client.flushall()

pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
    redis_key = f"bikes:{i:03}"
    pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]

res = client.json().get("bikes:001", "$.model")
# >>> ['Summit']

keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']

descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
# embedder = SentenceTransformer("msmarco-distilbert-base-v4")
# embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
embeddings = np.random.rand(len(bikes), 4).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 4 (768 with the msmarco-distilbert-base-v4 embedder commented out above)

pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
    pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]

res = client.json().get("bikes:001")


# >>>
# {
#   "model": "Summit",
#   "brand": "nHill",
#   "price": 1200,
#   "type": "Mountain Bike",
#   "specs": {
#     "material": "alloy",
#     "weight": "11.3"
#   },
#   "description": "This budget mountain bike from nHill performs well...",
#   "description_embeddings": [
#     -0.538114607334137,
#     -0.49465855956077576,
#     -0.025176964700222015,
#     ...
#   ]
# }

schema = (
    TextField("$.model", no_stem=True, as_name="model"),
    TextField("$.brand", no_stem=True, as_name="brand"),
    NumericField("$.price", as_name="price"),
    TagField("$.type", as_name="type"),
    TextField("$.description", as_name="description"),
    VectorField(
        "$.description_embeddings",
        "FLAT",
        {
            "TYPE": "FLOAT32",
            "DIM": VECTOR_DIMENSION,
            "DISTANCE_METRIC": "COSINE",
        },
        as_name="vector",
    ),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(
    fields=schema, definition=definition
)
# >>> 'OK'


query = (
    Query('(*)=>[KNN 3 @vector $query_vector AS vector_score]')
    .sort_by('vector_score')
    .return_fields('vector_score', 'id', 'brand', 'model', 'description')
    .dialect(2)
)

encoded_query = [0.1, 0.2, 0.3, 0.4]

docs = client.ft("idx:bikes_vss").search(query, {'query_vector': np.array(
    encoded_query, dtype=np.float32).tobytes()}).docs

print(docs)

henomis commented Sep 20, 2023

Metrics

Here are the typical ranges for the following mathematical functions:

  1. Cosine Similarity:

    • Range: [-1, 1]
    • Explanation: Cosine similarity is the cosine of the angle between two vectors, so it depends only on their direction, not on their magnitude.
    • A value of 1 indicates that the two vectors point in the same direction (they need not be identical).
    • A value of -1 indicates that the two vectors point in opposite directions.
    • A value of 0 indicates that the two vectors are orthogonal (perpendicular).
  2. Dot Product:

    • Range: (-∞, +∞)
    • Explanation: The dot product is the sum of the element-wise products of two vectors. It is unbounded and can be positive, negative, or zero.
    • Unlike cosine similarity, it depends on vector magnitude; for unit-length vectors the two coincide.
  3. Euclidean Distance:

    • Range: [0, +∞)
    • Explanation: Euclidean distance measures the straight-line distance between two points in Euclidean space. It is always a non-negative value.
    • When the two points are identical, the Euclidean distance is 0.
    • As the points move further apart, the distance grows without bound.

Keep in mind that these are the general ranges for these mathematical functions, but their specific application and interpretation can vary depending on the context in which they are used.
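
The three ranges can be checked numerically. The following is a minimal pure-Python sketch; the function names are chosen for illustration and are not part of any library mentioned in this thread.

```python
import math


def dot(a, b):
    """Sum of element-wise products; unbounded in both directions."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; undefined for zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a, b):
    """Straight-line distance between a and b; never negative."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(cosine_similarity([1, 0], [2, 0]))   # same direction, different length
print(cosine_similarity([1, 0], [-3, 0]))  # opposite directions
print(cosine_similarity([1, 0], [0, 5]))   # orthogonal
print(dot([1, 2], [3, 4]))
print(euclidean_distance([0, 0], [3, 4]))
```

The first call shows that a cosine similarity of 1 requires only the same direction: `[1, 0]` and `[2, 0]` differ in length but score 1.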
