Skip to content

Commit e198efb

Browse files
Lev Kokotovgitbook-bot
authored andcommitted
GITBOOK-67: Document storage
1 parent bb6752e commit e198efb

File tree

2 files changed

+93
-2
lines changed

2 files changed

+93
-2
lines changed

pgml-docs/docs/guides/SUMMARY.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -52,8 +52,8 @@
5252
* [Data Storage & Retrieval](data-storage-and-retrieval/README.md)
5353
* [Tabular data](data-storage-and-retrieval/tabular-data.md)
5454
* [Vectors](data-storage-and-retrieval/vectors.md)
55-
* [Partitioning](data-storage-and-retrieval/partitioning.md)
5655
* [Documents](data-storage-and-retrieval/documents.md)
56+
* [Partitioning](data-storage-and-retrieval/partitioning.md)
5757
* [Deploying PostgresML](deploying-postgresml/README.md)
5858
* [PostgresML Cloud](deploying-postgresml/postgresml-cloud/README.md)
5959
* [Plans](deploying-postgresml/postgresml-cloud/plans/README.md)
Lines changed: 92 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,94 @@
11
# Documents
22

3-
WIP: [https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset](https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset)
3+
Documents are a type of data that has no predefined schema. Typically stored as JSON, documents can be used to allow users of your application to store, retrieve and work with arbitrary data.
4+
5+
In Postgres, documents are normally stored in regular tables using the `JSONB` data type. `JSONB` supports compression, indexing and various JSON operators that make it useful and performant.
6+
7+
### Storage & retrieval
8+
9+
If you're used to document databases like Mongo or Couch, you can replicate the same format and API in Postgres with just a single table:
10+
11+
```sql
12+
CREATE TABLE documents (
13+
id BIGSERIAL PRIMARY KEY,
14+
document JSONB
15+
);
16+
```
17+
18+
#### Inserting a document
19+
20+
To insert a document into our table, you can just use a regular insert query:
21+
22+
```sql
23+
INSERT INTO documents (
24+
document
25+
) VALUES ('{"hello": "world", "values": [1, 2, 3, 4]}')
26+
RETURNING id;
27+
```
28+
29+
This query will insert the document `{"hello": "world"}` and return its ID to the application. You can then pass this ID to your users or store it elsewhere for reference.
30+
31+
#### Fetching by ID
32+
33+
To get a document by it's ID, you can just select it from the same table, for example:
34+
35+
```sql
36+
SELECT document FROM documents WHERE id = 1;
37+
```
38+
39+
The `id` column is a primary key, which gives it an index automatically. Any fetch by ID will be very quick and can easily retrieve documents from a table storing millions and even billions of documents.
40+
41+
#### Fetching by value
42+
43+
`JSONB` supports many operators to access the data stored in the column:
44+
45+
| Operator | Description | Example |
46+
| -------- | ---------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
47+
| `->` | Get the value referenced by the key and return it as JSON. | `document->'hello'` will return `"world"` which is a valid JSON string. |
48+
| `->>` | Get the value referenced by the key and return it as text. | `document->>'hello'` will return `world` which is a PostgreSQL `VARCHAR`. |
49+
| `#>` | Get a nested value by the key and return it as JSON. | `document #> {values, 0}` will get the first value in the `values` array and return it as JSON. |
50+
| `#>>` | Get a nested value by the key and return it as text. | `document #>> {values, 0}` will get the first value in the `values` array and return it as PostgreSQL `VARCHAR`. |
51+
| `@>` | Checks if the document contains a key/value match. | `document @> {"hello": "world"}` will return true if the `document` has a key `hello` and a value `world`. |
52+
53+
For example, if we want to fetch all documents that have a key `hello` and the value of that key `world`, we can do so:
54+
55+
```sql
56+
SELECT
57+
id,
58+
document->>'values'
59+
FROM documents
60+
WHERE
61+
document @> '{"hello": "world"}';
62+
```
63+
64+
or if we wanted to fetch the first value inside an array stored in a `values` key, we can:
65+
66+
```sql
67+
SELECT
68+
document #>> '{values, 0}'
69+
FROM documents
70+
WHERE
71+
document @> '{"hello": "world"}';
72+
```
73+
74+
`JSONB` handles empty, null, or non-existent keys and values without any errors. If the key doesn't exist, a `null` will be returned, just like if we were to access the JSON object from JavaScript.
75+
76+
### Indexing documents
77+
78+
Most key/value databases expect its users to only use primary keys for retrieval. In the real world, things are not always that easy. Postgres makes very few assumptions about how its users interact with JSON data, and allows indexing its top level data structure for fast access:
79+
80+
```sql
81+
CREATE INDEX ON documents USING gin(document jsonb_path_ops);
82+
```
83+
84+
When searching the documents for matches, Postgres will now use a much faster GIN index and give us results quickly:
85+
86+
```sql
87+
SELECT
88+
*
89+
FROM
90+
documents
91+
WHERE document @> '{"hello": "world"}';
92+
```
93+
94+
We're using the `@>` operator which checks if the `document` column top level JSON structure contains a key `hello` with the value `world`.

0 commit comments

Comments
 (0)