Commit c382b37

V2 docs (#397)
1 parent 9eac347 commit c382b37

23 files changed, +535 -314 lines changed

pgml-docs/docs/about/faq.md

Lines changed: 2 additions & 2 deletions
@@ -10,10 +10,10 @@ Postgres is widely considered mission critical, and some of the most [reliable](

 *How good are the models?*

-Model quality is often a tradeoff between compute resources and incremental quality improvements. Sometimes a few thousands training examples and an off the shelf algorithm can deliver significant business value after a few seconds of training. PostgresML allows stakeholders to choose several different algorithms to get the most bang for the buck, or invest in more computationally intensive techniques as necessary. In addition, PostgresML automatically applies best practices for data cleaning like imputing missing values by default and normalizing data to prevent common problems in production.
+Model quality is often a trade-off between compute resources and incremental quality improvements. Sometimes a few thousands training examples and an off the shelf algorithm can deliver significant business value after a few seconds of training. PostgresML allows stakeholders to choose several different algorithms to get the most bang for the buck, or invest in more computationally intensive techniques as necessary. In addition, PostgresML automatically applies best practices for data cleaning like imputing missing values by default and normalizing data to prevent common problems in production.

 PostgresML doesn't help with reformulating a business problem into a machine learning problem. Like most things in life, the ultimate in quality will be a concerted effort of experts working over time. PostgresML is intended to establish successful patterns for those experts to collaborate around while leveraging the expertise of open source and research communities.

 *Is PostgresML fast?*

-Colocating the compute with the data inside the database removes one of the most common latency bottlenecks in the ML stack, which is the (de)serialization of data between stores and services across the wire. Modern versions of Postgres also support automatic query parrellization across multiple workers to further minimize latency in large batch workloads. Finally, PostgresML will utilize GPU compute if both the algorithm and hardware support it, although it is currently rare in practice for production databases to have GPUs. We're working on [benchmarks](https://github.com/postgresml/postgresml/blob/master/pgml-extension/sql/benchmarks.sql).
+Collocating the compute with the data inside the database removes one of the most common latency bottlenecks in the ML stack, which is the (de)serialization of data between stores and services across the wire. Modern versions of Postgres also support automatic query parrellization across multiple workers to further minimize latency in large batch workloads. Finally, PostgresML will utilize GPU compute if both the algorithm and hardware support it, although it is currently rare in practice for production databases to have GPUs. We're working on [benchmarks](https://github.com/postgresml/postgresml/blob/master/pgml-extension/sql/benchmarks.sql).

pgml-docs/docs/about/roadmap.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Roadmap
+# Road map
 This project is currently a proof of concept. Some important features, which we are currently thinking about or working on, are listed below.

 ## Production deployment

pgml-docs/docs/developer_guide/overview.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 ## General

-[Use unix line endings](https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings).
+[Use Unix line endings](https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings).


 ## Setup your development environment

pgml-docs/docs/user_guides/dashboard/overview.md

Lines changed: 12 additions & 8 deletions
@@ -1,34 +1,38 @@
 # Dashboard

-PostgresML comes with an app to provide visibility into models and datasets in your database. If you're running the standard docker container, you can view it running on [http://localhost:8000/](http://localhost:8000/). Since your `pgml` schema starts empty, there isn't much to see. If you'd like to generate some examples, you can run the test suite against your database.
+PostgresML comes with a web app to provide visibility into models and datasets in your database. If you're running [our Docker container](/user_guides/setup/quick_start_with_docker/), you can view it running on [http://localhost:8000/](http://localhost:8000/).
+

 ## Generate example data

-The test suite for PostgresML is composed by running the sql files in the [examples directory](https://github.com/postgresml/postgresml/tree/master/pgml-extension/examples). You can use these examples to populate your local installation with some seed data. The test suite only operates on the `pgml` schema, and is otherwise isolated from the rest of the Postgres cluster.
+The test suite for PostgresML is composed by running the SQL files in the [examples directory](https://github.com/postgresml/postgresml/tree/master/pgml-extension/examples). You can use these examples to populate your local installation with some test data. The test suite only operates on the `pgml` schema, and is otherwise isolated from the rest of the PostgresML installation.

 ```bash
-$ psql -f pgml-extension/sql/test.sql -P pager postgres://postgres@127.0.0.1:5433/pgml_development
+psql -f pgml-extension/sql/test.sql \
+-P pager \
+postgres://postgres@127.0.0.1:5433/pgml_development
 ```

-## Overview
-Now there should be something to see in your local dashboard.
-
 ### Projects
+
 Projects organize Models that are all striving toward the same task. They aren't much more than a name to group a collection of models. You can see the currently deployed model for each project indicated by :material-star:.

 ![Project](/images/dashboard/project.png)

 ### Models
-Models are the result of training an algorithm on a Snapshot of a dataset. They record `metrics` depending on their projects task, and are scored accordingly. Some models are the result of a hyperparameter search, and include additional analysis on the range of hyperparameters they are tested against.
+
+Models are the result of training an algorithm on a snapshot of a dataset. They record metrics depending on their projects task, and are scored accordingly. Some models are the result of a hyperparameter search, and include additional analysis on the range of hyperparameters they are tested against.

 ![Model](/images/dashboard/model.png)

 ### Snapshots
-A Snapshot is created during training runs to record the data used for further analysis, or to train additional models against identical data.
+
+A snapshot is created during training runs to record the data used for further analysis, or to train additional models against identical data.

 ![Snapshot](/images/dashboard/snapshot.png)

 ### Deployments
+
 Every deployment is recorded to track models over time.

 ![Deployment](/images/dashboard/deployment.png)
Lines changed: 60 additions & 44 deletions
@@ -1,50 +1,64 @@
 # Deployments

-Models are automatically deployed if their key metric (__R__<sup>2</sup> for regression, __F__<sub>1</sub> for classification) is improved over the currently deployed version during training. If you want to manage deploys manually, you can always change which model is currently responsible for making predictions.
+A model is automatically deployed and used for predictions if its key metric (__R__<sup>2</sup> for regression, __F__<sub>1</sub> for classification) is improved during training over the previous version. Alternatively, if you want to manage deploys manually, you can always change which model is currently responsible for making predictions.


 ## API

-```sql linenums="1" title="pgml.deploy"
+```postgresql title="pgml.deploy()"
 pgml.deploy(
-project_name TEXT, -- Human-friendly project name
-strategy pgml.strategy DEFAULT 'best_score', -- 'rollback', 'best_score', or 'most_recent'
-algorithm pgml.algorithm DEFAULT NULL -- filter candidates to a particular algorithm, NULL = all qualify
+project_name TEXT,
+strategy TEXT DEFAULT 'best_score',
+algorithm TEXT DEFAULT NULL
 )
 ```

-## Strategies
-There are 3 different deployment strategies available
+### Parameters

-strategy | description
---- | ---
-most_recent | The most recently trained model for this project
-best_score | The model that achieved the best key metric score
-rollback | The model that was previously deployed for this project
+| Parameter | Description | Example |
+|-----------|-------------|---------|
+| `project_name` | The name of the project used in `pgml.train()` and `pgml.predict()`. | `My First PostgresML Project` |
+| `strategy` | The deployment strategy to use for this deployment. | `rollback` |
+| `algorithm` | Restrict the deployment to a specific algorithm. Useful when training on multiple algorithms and hyperparameters at the same time. | `xgboost` |

-The default deployment behavior allows any algorithm to qualify.
+
+#### Strategies
+
+There are 3 different deployment strategies available:
+
+| Strategy | Description |
+|----------|-------------|
+| `most_recent` | The most recently trained model for this project is immediately deployed, regardless of metrics. |
+| `best_score` | The model that achieved the best key metric score is immediately deployed. |
+| `rollback` | The model that was last deployed for this project is immediately redeployed, overriding the currently deployed model. |
+
+The default deployment behavior allows any algorithm to qualify. It's automatically used during training, but can be manually executed as well:

 === "SQL"

-```sql linenums="1"
-SELECT * FROM pgml.deploy('Handwritten Digit Image Classifier', 'best_score');
+```postgresql
+SELECT * FROM pgml.deploy(
+'Handwritten Digit Image Classifier',
+strategy => 'best_score'
+);
 ```

 === "Output"

-```sql linenums="1"
-project_name | strategy | algorithm
-------------------------------------+----------------+----------------
-Handwritten Digit Image Classifier | classification | linear
+```
+project | strategy | algorithm
+------------------------------------+------------+-----------
+Handwritten Digit Image Classifier | best_score | xgboost
 (1 row)
 ```

-## Specific Algorithms
-Deployment candidates can be restricted to a specific algorithm by including the `algorithm` parameter.
+#### Specific Algorithms
+
+Deployment candidates can be restricted to a specific algorithm by including the `algorithm` parameter. This is useful when you're training multiple algorithms using different hyperparameters and want to restrict the deployment a single algorithm only:

 === "SQL"

-```sql linenums="1"
+```postgresql
 SELECT * FROM pgml.deploy(
 project_name => 'Handwritten Digit Image Classifier',
 strategy => 'best_score',
@@ -54,47 +68,49 @@ Deployment candidates can be restricted to a specific algorithm by including the

 === "Output"

-```sql linenums="1"
+```
 project_name | strategy | algorithm
 ------------------------------------+----------------+----------------
 Handwritten Digit Image Classifier | classification | svm
 (1 row)
 ```


-## Rolling back to a specific algorithm
-Rolling back creates a new deployment for the model that was deployed before the current one. Multiple rollbacks in a row will effectively oscillate between the two most recently deployed models, making rollbacks a relatively safe operation.
+## Rolling Back

-=== "SQL"
+In case the new model isn't performing well in production, it's easy to rollback to the previous version. A rollback creates a new deployment for the old model. Multiple rollbacks in a row will oscillate between the two most recently deployed models, making rollbacks a safe and reversible operation.
+
+=== "Rollback 1"

 ```sql linenums="1"
-SELECT * FROM pgml.deploy('Handwritten Digit Image Classifier', 'rollback', 'svm');
+SELECT * FROM pgml.deploy(
+'Handwritten Digit Image Classifier',
+strategy => 'rollback'
+);
 ```

 === "Output"

-```sql linenums="1"
-project_name | strategy | algorithm
-------------------------------------+----------------+----------------
-Handwritten Digit Image Classifier | classification | svm
+```
+project | strategy | algorithm
+------------------------------------+----------+-----------
+Handwritten Digit Image Classifier | rollback | linear
 (1 row)
 ```

-## Manual Deploys
-
-You can also manually deploy any previously trained model by inserting a new record into `pgml.deployments`. You will need to query the `pgml.projects` and `pgml.models` tables to find the desired IDs.
-
-!!! note
-Deployed models are cached at the session level to improve prediction times. Manual deploys created this way will not invalidate those caches, so active sessions will not use manual deploys until they reconnect.
-
-=== "SQL"
-
-```sql linenums="1"
-INSERT INTO pgml.deploys (project_id, model_id, strategy,) VALUES (1, 1, 'rollback');
+=== "Rollback 2"
+```postgresql
+SELECT * FROM pgml.deploy(
+'Handwritten Digit Image Classifier',
+strategy => 'rollback'
+);
 ```

 === "Output"

-```sql linenums="1"
-INSERT 0 1
+```
+project | strategy | algorithm
+------------------------------------+----------+-----------
+Handwritten Digit Image Classifier | rollback | xgboost
+(1 row)
 ```
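The strategy table and the "Rolling Back" paragraph in this file can be modeled in a few lines of Python. This is a minimal sketch, not PostgresML's implementation: the `Model` and `Project` records, the hypothetical `deploy()` helper, and the choice to ignore `algorithm` on rollback are assumptions based only on the prose in this diff. It reproduces the described oscillation, where each rollback appends a new deployment of the model that was deployed before the current one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Model:
    id: int
    algorithm: str
    score: float  # key metric: R^2 for regression, F1 for classification


@dataclass
class Project:
    name: str
    models: list = field(default_factory=list)       # Model records, in training order
    deployments: list = field(default_factory=list)  # deployed model ids, oldest first


def deploy(project: Project, strategy: str = "best_score",
           algorithm: Optional[str] = None) -> Model:
    """Hypothetical stand-in for pgml.deploy(): pick a model and record a deployment."""
    by_id = {m.id: m for m in project.models}
    if strategy == "rollback":
        # Redeploy whatever was deployed before the current model (algorithm is ignored).
        if len(project.deployments) < 2:
            raise ValueError("nothing to roll back to")
        chosen = by_id[project.deployments[-2]]
    else:
        candidates = [m for m in project.models
                      if algorithm is None or m.algorithm == algorithm]
        if not candidates:
            raise ValueError("no candidate models")
        if strategy == "most_recent":
            chosen = candidates[-1]
        elif strategy == "best_score":
            chosen = max(candidates, key=lambda m: m.score)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    project.deployments.append(chosen.id)  # every deployment is recorded
    return chosen


# Hypothetical scores: linear is trained and deployed first, then xgboost beats it.
p = Project("Handwritten Digit Image Classifier")
p.models.append(Model(1, "linear", 0.90))
deploy(p, "most_recent")
p.models.append(Model(2, "xgboost", 0.97))
print(deploy(p, "best_score").algorithm)  # xgboost
history = [deploy(p, "rollback").algorithm for _ in range(3)]
print(history)  # ['linear', 'xgboost', 'linear']
```

Because a rollback is recorded as a new deployment rather than by deleting history, repeating it simply alternates between the two most recent models, which is why the new text can call it reversible.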
Lines changed: 48 additions & 29 deletions
@@ -1,57 +1,71 @@
-# Predictions
+# Making Predictions

-The predict function is the key value proposition of PostgresML. It provides online predictions using the actively deployed model for a project.
+The `pgml.predict()` function is the key value proposition of PostgresML. It provides online predictions using the best, automatically deployed model for a project.

 ## API

-```sql linenums="1" title="pgml.predict"
+The API for predictions is very simple and only requires two arguments: the project name and the features used for prediction.
+
+```postgresql title="pgml.predict()"
 pgml.predict (
-project_name TEXT, -- Human-friendly project name
-features DOUBLE PRECISION[] -- Must match the training data column order
+project_name TEXT,
+features REAL[]
 )
 ```

+### Parameters
+
+| Parameter | Description | Example |
+|-----------|-------------|---------|
+| `project_name`| The project name used to train models in `pgml.train()`. | `My First PostgresML Project` |
+| `features` | The feature vector used to predict a novel data point. | `ARRAY[0.1, 0.45, 1.0]` |
+
 !!! example
-Once a model has been trained for a project, making predictions is as simple as:

-```sql linenums="1"
+```postgresql
 SELECT pgml.predict(
-'Human-friendly project name',
-ARRAY[...]
-) AS prediction_score;
+'My Classification Project',
+ARRAY[0.1, 2.0, 5.0]
+) AS prediction;
 ```

-where `ARRAY[...]` is the same list of features for a sample used in training. This score can be used in normal queries, for example:
+where `ARRAY[0.1, 2.0, 5.0]` is the same type of features used in training, in the same order as in the training data table or view. This score can be used in other regular queries.

 !!! example
-```sql linenums="1"
+```postgresql
 SELECT *,
 pgml.predict(
-'Probability of buying our products',
-ARRAY[user.location, NOW() - user.created_at, user.total_purchases_in_dollars]
-) AS likely_to_buy_score
+'Buy it Again',
+ARRAY[
+user.location_id,
+NOW() - user.created_at,
+user.total_purchases_in_dollars
+]
+) AS buying_score
 FROM users
-WHERE comapany_id = 5
-ORDER BY likely_to_buy_score
+WHERE tenant_id = 5
+ORDER BY buying_score
 LIMIT 25;
 ```


-## Making Predictions
+### Example

-If you've already been through the [training guide](/user_guides/training/overview/), you can see the results of those efforts:
+If you've already been through the [Training Overview](/user_guides/training/overview/), you can see the results of those efforts:

 === "SQL"

-```sql linenums="1"
-SELECT target, pgml.predict('Handwritten Digit Image Classifier', image) AS prediction
+```postgresql
+SELECT
+target,
+pgml.predict('Handwritten Digit Image Classifier', image) AS prediction
 FROM pgml.digits
 LIMIT 10;
 ```

 === "Output"

-```sql linenums="1"
+```
 target | prediction
 --------+------------
 0 | 0
@@ -67,20 +81,25 @@ If you've already been through the [training guide](/user_guides/training/overvi
 (10 rows)
 ```

-## Checking the deployed algorithm
-If you're ever curious about which deployed models will be used to make predictions, you can see them in the `pgml.deployed_models` VIEW.
+## Active Model
+
+Since it's so easy to train multiple algorithms with different hyperparameters, sometimes it's a good idea to know which deployed model is used to make predictions. You can find that out by querying the `pgml.deployed_models` view:

 === "SQL"

-```sql linenums="1"
+```postgresql
 SELECT * FROM pgml.deployed_models;
 ```

 === "Output"

-```sql linenums="1"
-id | name | task | algorithm | deployed_at
-----+------------------------------------+----------------+-----------+----------------------------
-1 | Handwritten Digit Image Classifier | classification | linear | 2022-05-10 15:28:53.383893
 ```
+id | name | task | algorithm | runtime | deployed_at
+----+------------------------------------+----------------+-----------+---------+----------------------------
+4 | Handwritten Digit Image Classifier | classification | xgboost | rust | 2022-10-11 13:06:26.473489
+(1 row)
+```
+
+PostgresML will automatically deploy a model only if it has better metrics than existing ones, so it's safe to experiment with different algorithms and hyperparameters.

+Take a look at [Deploying Models](/user_guides/predictions/deployments/) documentation for more details.
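The `features` argument is positional: per the updated text, its values must follow the column order of the training table or view. A minimal Python sketch of assembling such a vector from a keyed row, assuming a hypothetical list of training column names (`x1`, `x2`, `x3`) and the `REAL[]` signature shown in this diff, so every value must already be numeric. The hypothetical `to_array_literal()` helper only renders the `ARRAY[...]` form used in the examples; application code would normally bind the array as a query parameter rather than format SQL text.

```python
from typing import Mapping, Sequence


def feature_vector(row: Mapping[str, float], training_columns: Sequence[str]) -> list:
    """Return the row's values in training-column order (hypothetical helper)."""
    missing = [c for c in training_columns if c not in row]
    if missing:
        raise KeyError(f"missing features: {missing}")
    # float() raises for values it cannot convert, e.g. a datetime.timedelta.
    return [float(row[c]) for c in training_columns]


def to_array_literal(values: Sequence[float]) -> str:
    """Render values as a Postgres ARRAY[...] literal, for illustration only."""
    return "ARRAY[" + ", ".join(repr(v) for v in values) + "]"


training_columns = ["x1", "x2", "x3"]    # hypothetical training column order
row = {"x3": 5.0, "x1": 0.1, "x2": 2.0}  # keys arrive in a different order
vec = feature_vector(row, training_columns)
print(vec)                    # [0.1, 2.0, 5.0]
print(to_array_literal(vec))  # ARRAY[0.1, 2.0, 5.0]
```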
Lines changed: 5 additions & 5 deletions
@@ -1,19 +1,19 @@
 # Deployments

-Deployments are an artifact of calls to `pgml.deploy`. See [deployments](/user_guides/predictions/deployments/) for ways to create new deployments.
+Deployments are an artifact of calls to `pgml.deploy()` and `pgml.train()`. See [Deployments](/user_guides/predictions/deployments/) for ways to create new deployments manually.

 ![Deployment](/images/dashboard/deployment.png)

 ## Schema

-```sql linenums="1"
-pgml.deployments(
+```postgresql
+CREATE TABLE IF NOT EXISTS pgml.deployments(
 id BIGSERIAL PRIMARY KEY,
 project_id BIGINT NOT NULL,
 model_id BIGINT NOT NULL,
 strategy pgml.strategy NOT NULL,
 created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT clock_timestamp(),
-CONSTRAINT project_id_fk FOREIGN KEY(project_id) REFERENCES pgml.projects(id),
-CONSTRAINT model_id_fk FOREIGN KEY(model_id) REFERENCES pgml.models(id)
+CONSTRAINT project_id_fk FOREIGN KEY(project_id) REFERENCES pgml.projects(id) ON DELETE CASCADE,
+CONSTRAINT model_id_fk FOREIGN KEY(model_id) REFERENCES pgml.models(id) ON DELETE CASCADE
 );
 ```
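The only behavioral change in this schema hunk is `ON DELETE CASCADE` on both foreign keys. Deleting a project or a model now also removes the matching rows in `pgml.deployments`, where previously the foreign key would have rejected the delete. Below is a minimal sketch of that behavior that uses Python's built-in `sqlite3` as a stand-in for Postgres. The tables are trimmed to their key columns, `pgml.strategy` becomes plain `TEXT`, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`. None of this is PostgresML code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE models (id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL);
CREATE TABLE deployments (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    model_id INTEGER NOT NULL,
    strategy TEXT NOT NULL,
    CONSTRAINT project_id_fk FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE,
    CONSTRAINT model_id_fk FOREIGN KEY (model_id) REFERENCES models(id) ON DELETE CASCADE
);
INSERT INTO projects VALUES (1, 'Handwritten Digit Image Classifier');
INSERT INTO models VALUES (1, 1), (2, 1);
INSERT INTO deployments VALUES (1, 1, 1, 'most_recent'), (2, 1, 2, 'best_score');
""")

# Deleting model 1 cascades to the deployment that referenced it.
conn.execute("DELETE FROM models WHERE id = 1")
after_model_delete = conn.execute("SELECT id, model_id FROM deployments").fetchall()
print(after_model_delete)  # [(2, 2)]

# Deleting the project cascades to every remaining deployment of that project.
conn.execute("DELETE FROM projects WHERE id = 1")
after_project_delete = conn.execute("SELECT COUNT(*) FROM deployments").fetchone()[0]
print(after_project_delete)  # 0
```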
