**pgml-docs/docs/about/faq.md** (2 additions, 2 deletions)
*How good are the models?*
Model quality is often a trade-off between compute resources and incremental quality improvements. Sometimes a few thousand training examples and an off-the-shelf algorithm can deliver significant business value after a few seconds of training. PostgresML allows stakeholders to choose between several different algorithms to get the most bang for the buck, or to invest in more computationally intensive techniques as necessary. In addition, PostgresML automatically applies best practices for data cleaning, like imputing missing values by default and normalizing data, to prevent common problems in production.
PostgresML doesn't help with reformulating a business problem into a machine learning problem. Like most things in life, the ultimate in quality will be a concerted effort of experts working over time. PostgresML is intended to establish successful patterns for those experts to collaborate around while leveraging the expertise of open source and research communities.
*Is PostgresML fast?*
Colocating the compute with the data inside the database removes one of the most common latency bottlenecks in the ML stack: the (de)serialization of data between stores and services across the wire. Modern versions of Postgres also support automatic query parallelization across multiple workers to further minimize latency in large batch workloads. Finally, PostgresML will use GPU compute if both the algorithm and hardware support it, although it is currently rare in practice for production databases to have GPUs. We're working on [benchmarks](https://github.com/postgresml/postgresml/blob/master/pgml-extension/sql/benchmarks.sql).
**pgml-docs/docs/user_guides/dashboard/overview.md** (12 additions, 8 deletions)
# Dashboard
PostgresML comes with a web app to provide visibility into the models and datasets in your database. If you're running [our Docker container](/user_guides/setup/quick_start_with_docker/), you can view it running on [http://localhost:8000/](http://localhost:8000/). Since your `pgml` schema starts out empty, there isn't much to see at first; to generate some examples, you can run the test suite against your database.
## Generate example data
The test suite for PostgresML consists of the SQL files in the [examples directory](https://github.com/postgresml/postgresml/tree/master/pgml-extension/examples). You can run these examples to populate your local installation with some test data. The test suite only operates on the `pgml` schema, and is otherwise isolated from the rest of the PostgresML installation.
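For instance, a minimal sketch of loading one of the examples from a `psql` session connected to your PostgresML database is shown below; the file name and path are assumptions, so substitute whichever example file exists in your checkout:

```postgresql
-- \i is psql's include command; it executes the statements in the given file.
-- The path assumes you're in the repository root and that a regression example
-- exists; adjust to any file in pgml-extension/examples/.
\i pgml-extension/examples/regression.sql

-- The examples create projects, models and snapshots in the pgml schema:
SELECT * FROM pgml.projects;
```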
Now there should be something to see in your local dashboard.
### Projects
Projects organize Models that are all striving toward the same task. They aren't much more than a name to group a collection of models. You can see the currently deployed model for each project indicated by :material-star:.
### Models
Models are the result of training an algorithm on a snapshot of a dataset. They record metrics depending on their project's task, and are scored accordingly. Some models are the result of a hyperparameter search, and include additional analysis on the range of hyperparameters they were tested against.
### Snapshots
A snapshot is created during training runs to record the data used for further analysis, or to train additional models against identical data.
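The dashboard reads these objects from regular tables in the `pgml` schema, so you can also inspect them directly with SQL. The sketch below assumes `pgml.projects`, `pgml.models` and `pgml.snapshots` tables exist in your version; `SELECT *` keeps it generic since the exact columns may vary:

```postgresql
-- Projects, models and snapshots are ordinary tables in the pgml schema.
SELECT * FROM pgml.projects;
SELECT * FROM pgml.models LIMIT 5;
SELECT * FROM pgml.snapshots LIMIT 5;
```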
### Deployments
Every deployment is recorded to track models over time.

A model is automatically deployed and used for predictions if its key metric (__R__<sup>2</sup> for regression, __F__<sub>1</sub> for classification) improves during training over the previous version. Alternatively, if you want to manage deploys manually, you can always change which model is currently responsible for making predictions.
## API
```postgresql title="pgml.deploy()"
pgml.deploy(
    project_name TEXT,                   -- human-friendly project name
    strategy TEXT DEFAULT 'best_score',  -- 'rollback', 'best_score', or 'most_recent'
    algorithm TEXT DEFAULT NULL          -- restrict candidates to a particular algorithm; NULL = all qualify
)
```
### Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| `project_name` | The name of the project used in `pgml.train()` and `pgml.predict()`. | `My First PostgresML Project` |
| `strategy` | The deployment strategy to use for this deployment. | `rollback` |
| `algorithm` | Restrict the deployment to a specific algorithm. Useful when training on multiple algorithms and hyperparameters at the same time. | `xgboost` |

#### Strategies

There are 3 different deployment strategies available:

| Strategy | Description |
|----------|-------------|
| `most_recent` | The most recently trained model for this project is immediately deployed, regardless of metrics. |
| `best_score` | The model that achieved the best key metric score is immediately deployed. |
| `rollback` | The model that was last deployed for this project is immediately redeployed, overriding the currently deployed model. |

The default deployment behavior allows any algorithm to qualify. It's used automatically during training, but can be executed manually as well:
=== "SQL"

    ```postgresql
    SELECT * FROM pgml.deploy('Handwritten Digit Image Classifier', 'best_score');
    ```
#### Specific Algorithms

Deployment candidates can be restricted to a specific algorithm by including the `algorithm` parameter. This is useful when you're training multiple algorithms with different hyperparameters and want to restrict the deployment to a single algorithm only.
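For example, a sketch of such a call, reusing the project name from the examples above and assuming an XGBoost model has been trained for this project:

=== "SQL"

    ```postgresql
    SELECT * FROM pgml.deploy(
        'Handwritten Digit Image Classifier',  -- project
        'best_score',                          -- strategy
        'xgboost'                              -- only consider XGBoost models
    );
    ```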
## Rolling Back

In case the new model isn't performing well in production, it's easy to roll back to the previous version. A rollback creates a new deployment for the model that was deployed before the current one. Multiple rollbacks in a row will oscillate between the two most recently deployed models, making rollbacks a safe and reversible operation.

=== "Rollback 1"

    ```sql linenums="1"
    SELECT * FROM pgml.deploy('Handwritten Digit Image Classifier', 'rollback', 'svm');
    ```

=== "Output"

    ```
     Handwritten Digit Image Classifier | rollback | linear
    (1 row)
    ```
## Manual Deploys

You can also manually deploy any previously trained model by inserting a new record into `pgml.deployments`. You will need to query the `pgml.projects` and `pgml.models` tables to find the desired IDs.
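A sketch of that lookup; the exact column sets aren't documented here, so `SELECT *` is used, and the `project_id` filter is hypothetical:

```postgresql
-- Find the project you're interested in, then list its models to pick one to deploy.
SELECT * FROM pgml.projects;
SELECT * FROM pgml.models WHERE project_id = 1;  -- hypothetical id taken from the query above
```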
!!! note

    Deployed models are cached at the session level to improve prediction times. Manual deploys created this way will not invalidate those caches, so active sessions will not use manual deploys until they reconnect.

=== "SQL"

    ```sql linenums="1"
    INSERT INTO pgml.deployments (project_id, model_id, strategy) VALUES (1, 1, 'rollback');
    ```
The `pgml.predict()` function is the key value proposition of PostgresML. It provides online predictions using the best, automatically deployed model for a project.
## API
The API for predictions is very simple and only requires two arguments: the project name and the features used for prediction.

```postgresql title="pgml.predict()"
pgml.predict (
    project_name TEXT,  -- human-friendly project name
    features REAL[]     -- must match the training data column order
)
```
### Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| `project_name` | The project name used to train models in `pgml.train()`. | `My First PostgresML Project` |
| `features` | The feature vector used to predict a novel data point. | `ARRAY[0.1, 0.45, 1.0]` |
!!! example
    Once a model has been trained for a project, making predictions is as simple as:

    ```postgresql
    SELECT pgml.predict(
        'My Classification Project',
        ARRAY[0.1, 2.0, 5.0]
    ) AS prediction;
    ```

    where `ARRAY[0.1, 2.0, 5.0]` is the same type of features used in training, in the same order as in the training data table or view. This score can be used in other regular queries.
If you've already been through the [Training Overview](/user_guides/training/overview/), you can see the results of those efforts:
=== "SQL"
    ```postgresql
    SELECT
        target,
        pgml.predict('Handwritten Digit Image Classifier', image) AS prediction
    FROM pgml.digits
    LIMIT 10;
    ```
=== "Output"
    ```
     target | prediction
    --------+------------
          0 |          0
    ...
    (10 rows)
    ```
## Active Model

Since it's so easy to train multiple algorithms with different hyperparameters, sometimes it's a good idea to know which deployed model is used to make predictions. You can find that out by querying the `pgml.deployed_models` view.
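A minimal query is sketched below; since the view's columns aren't listed in this guide, `SELECT *` keeps it generic:

```postgresql
-- Shows the currently deployed model for each project.
SELECT * FROM pgml.deployed_models;
```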
PostgresML will automatically deploy a model only if it has better metrics than existing ones, so it's safe to experiment with different algorithms and hyperparameters.
Take a look at the [Deploying Models](/user_guides/predictions/deployments/) documentation for more details.
Deployments are an artifact of calls to `pgml.deploy()` and `pgml.train()`. See [Deployments](/user_guides/predictions/deployments/) for ways to create new deployments manually.
## Schema
```postgresql
CREATE TABLE IF NOT EXISTS pgml.deployments(
    id BIGSERIAL PRIMARY KEY,
    project_id BIGINT NOT NULL,
    model_id BIGINT NOT NULL,
    strategy pgml.strategy NOT NULL,
    created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT clock_timestamp(),