2nd changes review
vga91 committed May 24, 2024
1 parent dd5ed21 commit b9a9e63
Showing 22 changed files with 1,107 additions and 514 deletions.
@@ -1,27 +1,41 @@

== Chroma

The list and the signature procedures are consistent with the Qdrant ones:
== ChromaDB

Here is a list of all available ChromaDB procedures.
Note that the list and the procedure signatures are consistent with the others, such as the Qdrant ones:

[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.chroma.createCollection(hostOrKey, collection, similarity, size, $config) |
Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/api/v1/collections`.
| apoc.vectordb.chroma.deleteCollection(hostOrKey, collection, $config) |
Deletes a collection with the name specified in the 2nd parameter
Deletes a collection with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>`.
| apoc.vectordb.chroma.upsert(hostOrKey, collection, vectors, $config) |
Upserts, in the collection with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}]
Upserts, in the collection with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}].
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/upsert`.
| apoc.vectordb.chroma.delete(hostOrKey, collection, ids, $config) |
Delete the vectors with the specified `ids`.
Deletes the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/delete`.
| apoc.vectordb.chroma.get(hostOrKey, collection, ids, $config) |
Get the vectors with the specified `ids`.
Gets the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/get`.
| apoc.vectordb.chroma.query(hostOrKey, collection, vector, filter, limit, $config) |
Retrieve closest vectors the the defined `vector`, `limit` of results, in the collection with the name specified in the 2nd parameter.
Retrieves the vectors closest to the defined `vector`, limited to `limit` results, in the collection with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/query`.
| apoc.vectordb.chroma.getAndUpdate(hostOrKey, collection, ids, $config) |
Gets the vectors with the specified `ids`, and optionally creates/updates Neo4j entities.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/get`.
| apoc.vectordb.chroma.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config) |
Retrieves the vectors closest to the defined `vector`, limited to `limit` results, in the collection with the name specified in the 2nd parameter, and optionally creates/updates Neo4j entities.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/query`.
| apoc.vectordb.chroma.info(keyConfig) | Given the `keyConfig`, returns the current configuration created with `apoc.vectordb.configure('CHROMA', keyConfig, ...)`.
|===

where the 1st parameter can be a key defined by the apoc config `apoc.chroma.<key>.host=myHost`.
With hostOrKey=null, the default is 'http://localhost:8000'.
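
As a minimal sketch of the `hostOrKey` fallback described above (not taken from this commit): passing `null` as the 1st parameter should target the default `http://localhost:8000`. The collection name `test_collection`, the `cosine` similarity and the size `4` are illustrative assumptions.

[source,cypher]
----
// Assumption: a Chroma instance is listening on the default http://localhost:8000
// 'test_collection', 'cosine' and 4 are placeholder values, not prescribed by the docs
CALL apoc.vectordb.chroma.createCollection(null, 'test_collection', 'cosine', 4, {})
----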

=== Examples

@@ -109,29 +123,29 @@ CALL apoc.vectordb.chroma.query($host,

[NOTE]
====
To optimize performances, we can choose what to `YIELD` with the apoc.vectordb.qdrant.query and the `apoc.vectordb.qdrant.get` procedures.
To optimize performance, we can choose what to `YIELD` with the `apoc.vectordb.chroma.query` and the `apoc.vectordb.chroma.get` procedures.
For example, by executing a `CALL apoc.vectordb.chroma.query(...) YIELD metadata, score, id`, the REST API request will contain {"include": ["metadatas", "documents", "distances"]},
so that we do not return the other values that we do not need.
====
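
As a rough sketch of the note above, assuming a `$host` parameter and a placeholder collection id, a query that yields only the three columns mentioned could look like this:

[source,cypher]
----
// Yield only the columns we need, so that the other values are not requested
// $host and '<collection_id>' are placeholders to be replaced with real values
CALL apoc.vectordb.chroma.query($host, '<collection_id>', [0.2, 0.1, 0.9, 0.7], {}, 5, {})
YIELD metadata, score, id
RETURN id, score, metadata
----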


In the same way as other procedures, we can define a mapping, to auto-create one/multiple nodes and relationships,
In the same way as other procedures, we can define a mapping, to fetch the associated nodes and relationships and optionally create them,
by leveraging the vector metadata. For example:

.Query vectors
[source,cypher]
----
CALL apoc.vectordb.chrome.query($host, '<collection_id>',
CALL apoc.vectordb.chroma.query($host, '<collection_id>',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingProp: "vect",
label: "Test",
prop: "myId",
id: "foo"
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
}), text
})
----


@@ -1,7 +1,10 @@

== Custom (i.e. other vector databases)

Here is a list of all available Qdrant procedures:
We can also interface with other vector databases that do not (yet) have dedicated procedures.
For example, with https://docs.pinecone.io/guides/getting-started/overview[Pinecone], as we will see later.

Here is a list of all available custom procedures:

[opts=header, cols="1, 3"]
|===
@@ -7,11 +7,11 @@ APOC provides these set of procedures, which leverages the Rest APIs, to interac
- `apoc.vectordb.qdrant.*` (to interact with https://qdrant.tech/documentation/overview/[Qdrant])
- `apoc.vectordb.chroma.*` (to interact with https://docs.trychroma.com/getting-started[Chroma])
- `apoc.vectordb.weaviate.*` (to interact with https://weaviate.io/developers/weaviate[Weaviate])
- `apoc.vectordb.custom.*` (to interact with other vector databases)
- `apoc.vectordb.store` (to store host, credentials and mapping into the system database)
- `apoc.vectordb.custom.*` (to interact with other vector databases).
- `apoc.vectordb.configure` (to store host, credentials and mapping into the system database)

All the procedures, except the `apoc.vectordb.store` one, can have, as a final parameter,
a configuration map with these possible parameters:
All the procedures, except the `apoc.vectordb.configure` one, can have, as a final parameter,
a configuration map with these optional parameters:

.config parameters

@@ -20,10 +20,10 @@ a configuration map with these possible parameters:
| headers | additional HTTP headers
| method | HTTP method
| endpoint | endpoint key,
can be used to override the default endpoint created via the 1st parameter of the `apoc.vectordb.qdrant.*` and `apoc.vectordb.qdrant.*`,
can be used to override the default endpoint created via the 1st parameter of the procedures,
to handle potential endpoint changes.
| body | body HTTP request
| jsonPath | To customize https://github.com/json-path/JsonPath[JSONPath] of the response. The default is `null`.
| jsonPath | To customize https://github.com/json-path/JsonPath[JSONPath] parsing of the response. The default is `null`.
|===


@@ -33,62 +33,105 @@ Besides the above config, the `apoc.vectordb.<type>.get` and the `apoc.vectordb.

|===
| key | description
| mapping | to auto-create entities. See examples below.
| mapping | to fetch the associated entities and optionally create them. See examples below.
| allResults | if true, returns the vector, metadata and text (if present), otherwise returns null values for those columns.
| vectorKey, metadataKey, scoreKey, textKey | used with the `apoc.vectordb.custom.get` procedure.
They tell the procedure which key in the REST API response (if present) should populate the vector, metadata, score and text results, respectively.
Defaults are "vector", "metadata", "score", "text".
See examples below.
|===

include::./qdrand.adoc[]

include::./chroma.adoc[]
== Ad-hoc procedures

include::./weaviate.adoc[]
See the following pages for more details on the procedures for specific vector databases:

include::./custom.adoc[]
- xref:./qdrand.adoc[Qdrant]
- xref:./chroma.adoc[ChromaDB]
- xref:./weaviate.adoc[Weaviate]

== Store Vector db info (i.e. `apoc.vectordb.store`)

== Store Vector db info (i.e. `apoc.vectordb.configure`)

We can save some info in the System Database to be reused later, namely the host, login credentials, and mapping,
to be used in the `*.get` and `*.query` procedures, except for the `apoc.vectordb.custom.get` one.

Therefore, to store the vector info, we can execute the `CALL apoc.vectordb.store(vectorName, host, credentialsValue, mapping)`,
Therefore, to store the vector info, we can execute the `CALL apoc.vectordb.configure(vectorName, keyConfig, databaseName, $configMap)`,
where `vectorName` can be "QDRANT", "CHROMA" or "WEAVIATE",
indicating that the info will be reused by `apoc.vectordb.qdrant.*`, `apoc.vectordb.chroma.*` and `apoc.vectordb.weaviate.*`, respectively.
Then `host` is the host base name, `credentialsValue` is the API key and `mapping` is a map that can be used instead of the homonym `embeddingConfig` parameter.

NOTE:: this procedure is only executable by a user with admin permissions
Then `keyConfig` is the configuration name, `databaseName` is the database where the config will be set,

and finally the `configMap`, which can contain:

- `host` is the host base name
- `credentialsValue` is the API key
- `mapping` is a map that can be used by the `apoc.vectordb.\*.getAndUpdate` and `apoc.vectordb.*.queryAndUpdate` procedures

NOTE: this procedure is only executable by a user with admin permissions and against the system database.

For example:
[source,cypher]
----
CALL apoc.vectordb.store('QDRANT', 'custom-host-name', '<apiKey>',
{embeddingProp: "vect", label: "Test", prop: "myId", id: "foo"}
// -- within the system database or using the Cypher clause `USE SYSTEM ..` as a prefix
CALL apoc.vectordb.configure('QDRANT', 'qdrant-config-test', 'neo4j',
{
mapping: { embeddingKey: "vect", nodeLabel: "Test", entityKey: "myId", metadataKey: "foo" },
host: 'custom-host-name',
credentials: '<apiKey>'
}
)
----

and then we can execute e.g. the following procedure:
and then we can execute e.g. the following procedure (within the `neo4j` database):

[source,cypher]
----
CALL apoc.vectordb.qdrant.query(null, 'test_collection', [0.2, 0.1, 0.9, 0.7], {}, 5)
CALL apoc.vectordb.qdrant.query('qdrant-config-test', 'test_collection', [0.2, 0.1, 0.9, 0.7], {}, 5)
----

instead of:

[source,cypher]
----
CALL apoc.vectordb.qdrant.query(null, 'test_collection', [0.2, 0.1, 0.9, 0.7], {}, 5,
CALL apoc.vectordb.qdrant.query($host, 'test_collection', [0.2, 0.1, 0.9, 0.7], {}, 5,
{ mapping: {
embeddingProp: "vect",
label: "Test",
prop: "myId",
id: "foo"
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
},
headers: {Authorization: 'Bearer <apiKey>'},
endpoint: 'custom-host-name'
})
----

We can get the current configuration by executing the following procedure:

[source,cypher]
----
CALL apoc.vectordb.qdrant.info('qdrant-config-test')
----

.Example results
[opts="header"]
|===
| value
| {endpoint: '',
headers: {Authorization: 'Bearer <apiKey>'},
mapping: {embeddingKey: "vect", nodeLabel: "Test", entityKey: "myId", metadataKey: "foo"}
}
|===


If the configuration key is not found, the procedure just returns the base URL, for example:
[source,cypher]
----
CALL apoc.vectordb.qdrant.info('qdrant-config-test')
----
.Example results
[opts="header"]
|===
| value
| {endpoint: 'http://qdrant-config-test:6333'}
|===