postgresml
diff --git a/pgml-docs/docs/about/faq.md b/pgml-docs/docs/about/faq.md
Lines changed: 2 additions & 2 deletions
diff --git a/pgml-docs/docs/about/roadmap.md b/pgml-docs/docs/about/roadmap.md
Lines changed: 1 addition & 1 deletion
diff --git a/pgml-docs/docs/developer_guide/overview.md b/pgml-docs/docs/developer_guide/overview.md
Lines changed: 1 addition & 1 deletion
diff --git a/pgml-docs/docs/images/dashboard/notebooks.png b/pgml-docs/docs/images/dashboard/notebooks.png
116 KB
diff --git a/pgml-docs/docs/images/demos/htop.png b/pgml-docs/docs/images/demos/htop.png
115 KB
diff --git a/pgml-docs/docs/user_guides/dashboard/overview.md b/pgml-docs/docs/user_guides/dashboard/overview.md
Lines changed: 12 additions & 8 deletions
diff --git a/pgml-docs/docs/user_guides/predictions/deployments.md b/pgml-docs/docs/user_guides/predictions/deployments.md
Lines changed: 60 additions & 44 deletions
diff --git a/pgml-docs/docs/user_guides/predictions/overview.md b/pgml-docs/docs/user_guides/predictions/overview.md
Lines changed: 48 additions & 29 deletions
diff --git a/pgml-docs/docs/user_guides/schema/deployments.md b/pgml-docs/docs/user_guides/schema/deployments.md
Lines changed: 5 additions & 5 deletions
@@ -10,10 +10,10 @@ Postgres is widely considered mission critical, and some of the most [reliable](
 
 *How good are the models?*
 
-Model quality is often a tradeoff between compute resources and incremental quality improvements. Sometimes a few thousands training examples and an off the shelf algorithm can deliver significant business value after a few seconds of training. PostgresML allows stakeholders to choose several different algorithms to get the most bang for the buck, or invest in more computationally intensive techniques as necessary. In addition, PostgresML automatically applies best practices for data cleaning like imputing missing values by default and normalizing data to prevent common problems in production. 
+Model quality is often a trade-off between compute resources and incremental quality improvements. Sometimes a few thousand training examples and an off-the-shelf algorithm can deliver significant business value after a few seconds of training. PostgresML allows stakeholders to choose among several different algorithms to get the most bang for the buck, or invest in more computationally intensive techniques as necessary. In addition, PostgresML automatically applies best practices for data cleaning, like imputing missing values by default and normalizing data, to prevent common problems in production.
 
 PostgresML doesn't help with reformulating a business problem into a machine learning problem. Like most things in life, the ultimate in quality will be a concerted effort of experts working over time. PostgresML is intended to establish successful patterns for those experts to collaborate around while leveraging the expertise of open source and research communities.
 
 *Is PostgresML fast?*
 
-Colocating the compute with the data inside the database removes one of the most common latency bottlenecks in the ML stack, which is the (de)serialization of data between stores and services across the wire. Modern versions of Postgres also support automatic query parrellization across multiple workers to further minimize latency in large batch workloads. Finally, PostgresML will utilize GPU compute if both the algorithm and hardware support it, although it is currently rare in practice for production databases to have GPUs. We're working on [benchmarks](https://github.com/postgresml/postgresml/blob/master/pgml-extension/sql/benchmarks.sql).
+Collocating the compute with the data inside the database removes one of the most common latency bottlenecks in the ML stack, which is the (de)serialization of data between stores and services across the wire. Modern versions of Postgres also support automatic query parallelization across multiple workers to further minimize latency in large batch workloads. Finally, PostgresML will utilize GPU compute if both the algorithm and hardware support it, although it is currently rare in practice for production databases to have GPUs. We're working on [benchmarks](https://github.com/postgresml/postgresml/blob/master/pgml-extension/sql/benchmarks.sql).
@@ -1,4 +1,4 @@
-# Roadmap
+# Road map
 This project is currently a proof of concept. Some important features, which we are currently thinking about or working on, are listed below.
 
 ## Production deployment
 
@@ -2,7 +2,7 @@
 
 ## General
 
-[Use unix line endings](https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings).
+[Use Unix line endings](https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings).
 
 
 ## Setup your development environment
 
@@ -1,34 +1,38 @@
 # Dashboard
 
-PostgresML comes with an app to provide visibility into models and datasets in your database. If you're running the standard docker container, you can view it running on [http://localhost:8000/](http://localhost:8000/). Since your `pgml` schema starts empty, there isn't much to see. If you'd like to generate some examples, you can run the test suite against your database. 
+PostgresML comes with a web app to provide visibility into models and datasets in your database. If you're running [our Docker container](/user_guides/setup/quick_start_with_docker/), you can view it running on [http://localhost:8000/](http://localhost:8000/).
+
 
 ## Generate example data
 
-The test suite for PostgresML is composed by running the sql files in the [examples directory](https://github.com/postgresml/postgresml/tree/master/pgml-extension/examples). You can use these examples to populate your local installation with some seed data. The test suite only operates on the `pgml` schema, and is otherwise isolated from the rest of the Postgres cluster.
+The PostgresML test suite runs the SQL files in the [examples directory](https://github.com/postgresml/postgresml/tree/master/pgml-extension/examples). You can use these examples to populate your local installation with some test data. The test suite only operates on the `pgml` schema, and is otherwise isolated from the rest of the PostgresML installation.
 
 ```bash
-$ psql -f pgml-extension/sql/test.sql -P pager postgres://postgres@127.0.0.1:5433/pgml_development
+psql -f pgml-extension/sql/test.sql \
+     -P pager \
+     postgres://postgres@127.0.0.1:5433/pgml_development
 ```
 
-## Overview
-Now there should be something to see in your local dashboard.
-
 ### Projects
+
 Projects organize Models that are all striving toward the same task. They aren't much more than a name to group a collection of models. You can see the currently deployed model for each project indicated by :material-star:.
 
 ![Project](/images/dashboard/project.png)
 
 ### Models
-Models are the result of training an algorithm on a Snapshot of a dataset. They record `metrics` depending on their projects task, and are scored accordingly. Some models are the result of a hyperparameter search, and include additional analysis on the range of hyperparameters they are tested against.
+
+Models are the result of training an algorithm on a snapshot of a dataset. They record metrics depending on their project's task, and are scored accordingly. Some models are the result of a hyperparameter search, and include additional analysis on the range of hyperparameters they are tested against.
 
 ![Model](/images/dashboard/model.png)
 
 ### Snapshots
-A Snapshot is created during training runs to record the data used for further analysis, or to train additional models against identical data.
+
+A snapshot is created during training runs to record the data used for further analysis, or to train additional models against identical data.
 
 ![Snapshot](/images/dashboard/snapshot.png)
 
 ### Deployments
+
 Every deployment is recorded to track models over time.
 
 ![Deployment](/images/dashboard/deployment.png)
 
@@ -1,50 +1,64 @@
 # Deployments
 
-Models are automatically deployed if their key metric (__R__<sup>2</sup> for regression, __F__<sub>1</sub> for classification) is improved over the currently deployed version during training. If you want to manage deploys manually, you can always change which model is currently responsible for making predictions.
+A model is automatically deployed and used for predictions if its key metric (__R__<sup>2</sup> for regression, __F__<sub>1</sub> for classification) is improved during training over the previous version. Alternatively, if you want to manage deploys manually, you can always change which model is currently responsible for making predictions.
 
 
 ## API
 
-```sql linenums="1" title="pgml.deploy"
+```postgresql title="pgml.deploy()"
 pgml.deploy(
-	project_name TEXT,                            -- Human-friendly project name
-	strategy pgml.strategy DEFAULT 'best_score',  -- 'rollback', 'best_score', or 'most_recent'
-	algorithm pgml.algorithm DEFAULT NULL         -- filter candidates to a particular algorithm, NULL = all qualify
+	project_name TEXT,
+	strategy TEXT DEFAULT 'best_score',
+	algorithm TEXT DEFAULT NULL
 )
 ```
 
-## Strategies
-There are 3 different deployment strategies available
+### Parameters
 
-strategy | description
---- | ---
-most_recent | The most recently trained model for this project
-best_score | The model that achieved the best key metric score
-rollback | The model that was previously deployed for this project
+| Parameter | Description | Example |
+|-----------|-------------|---------|
+| `project_name` | The name of the project used in `pgml.train()` and `pgml.predict()`. | `My First PostgresML Project` |
+| `strategy` | The deployment strategy to use for this deployment. | `rollback` |
+| `algorithm`  | Restrict the deployment to a specific algorithm. Useful when training on multiple algorithms and hyperparameters at the same time. | `xgboost` |
 
-The default deployment behavior allows any algorithm to qualify.
+
+#### Strategies
+
+There are 3 different deployment strategies available:
+
+| Strategy | Description |
+|----------|-------------|
+| `most_recent` | The most recently trained model for this project is immediately deployed, regardless of metrics. |
+| `best_score` | The model that achieved the best key metric score is immediately deployed. |
+| `rollback` | The model that was last deployed for this project is immediately redeployed, overriding the currently deployed model. |
+
+By default, models trained with any algorithm qualify for deployment. This default behavior is used automatically during training, and it can also be executed manually:
 
 === "SQL"
 
-	```sql linenums="1"
-	SELECT * FROM pgml.deploy('Handwritten Digit Image Classifier', 'best_score');
+	```postgresql
+	SELECT * FROM pgml.deploy(
+		'Handwritten Digit Image Classifier',
+		strategy => 'best_score'
+	);
 	```
 
 === "Output"
 
-	```sql linenums="1"
-                project_name            |    strategy    | algorithm
-	------------------------------------+----------------+----------------
-	 Handwritten Digit Image Classifier | classification | linear
+	```
+                  project               |  strategy  | algorithm
+	------------------------------------+------------+-----------
+	 Handwritten Digit Image Classifier | best_score | xgboost
 	(1 row)
 	```
 
-## Specific Algorithms
-Deployment candidates can be restricted to a specific algorithm by including the `algorithm` parameter.
+#### Specific Algorithms
+
+Deployment candidates can be restricted to a specific algorithm by including the `algorithm` parameter. This is useful when you're training multiple algorithms using different hyperparameters and want to restrict the deployment to a single algorithm only:
 
 === "SQL"
 
-	```sql linenums="1"
+	```postgresql
 	SELECT * FROM pgml.deploy(
         project_name => 'Handwritten Digit Image Classifier', 
         strategy => 'best_score', 
@@ -54,47 +68,49 @@ Deployment candidates can be restricted to a specific algorithm by including the
 
 === "Output"
 
-	```sql linenums="1"
+	```
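-                project_name            |    strategy    | algorithm
-	------------------------------------+----------------+----------------
-	 Handwritten Digit Image Classifier | classification | svm
+	              project               |  strategy  | algorithm
+	------------------------------------+------------+-----------
+	 Handwritten Digit Image Classifier | best_score | svm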
 	(1 row)
 	```
 
 
-## Rolling back to a specific algorithm
-Rolling back creates a new deployment for the model that was deployed before the current one. Multiple rollbacks in a row will effectively oscillate between the two most recently deployed models, making rollbacks a relatively safe operation. 
+## Rolling Back
 
-=== "SQL"
+If the new model isn't performing well in production, it's easy to roll back to the previous version. A rollback creates a new deployment for the old model. Multiple rollbacks in a row will oscillate between the two most recently deployed models, making rollbacks a safe and reversible operation.
+
+=== "Rollback 1"
 
-	```sql linenums="1"
+	```postgresql
-	SELECT * FROM pgml.deploy('Handwritten Digit Image Classifier', 'rollback', 'svm');
+	SELECT * FROM pgml.deploy(
+		'Handwritten Digit Image Classifier',
+		strategy => 'rollback'
+	);
 	```
 
 === "Output"
 
-	```sql linenums="1"
-                project_name            |    strategy    | algorithm
-	------------------------------------+----------------+----------------
-	 Handwritten Digit Image Classifier | classification | svm
+	```
+                 project               | strategy | algorithm
+	------------------------------------+----------+-----------
+	 Handwritten Digit Image Classifier | rollback | linear
 	(1 row)
 	```
 
-## Manual Deploys
-
-You can also manually deploy any previously trained model by inserting a new record into `pgml.deployments`. You will need to query the `pgml.projects` and `pgml.models` tables to find the desired IDs.
-
-!!! note 
-    Deployed models are cached at the session level to improve prediction times. Manual deploys created this way will not invalidate those caches, so active sessions will not use manual deploys until they reconnect. 
-
-=== "SQL"
-
-	```sql linenums="1"
-	INSERT INTO pgml.deploys (project_id, model_id, strategy,) VALUES (1, 1, 'rollback');
+=== "Rollback 2"
+	```postgresql
+	SELECT * FROM pgml.deploy(
+		'Handwritten Digit Image Classifier',
+		strategy => 'rollback'
+	);
 	```
 
 === "Output"
 
-	```sql linenums="1"
-    INSERT 0 1
+	```
+	              project               | strategy | algorithm
+	------------------------------------+----------+-----------
+	 Handwritten Digit Image Classifier | rollback | xgboost
+	(1 row)
 	```
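The deployment rules described in this file (the three strategies, the optional `algorithm` filter, automatic deployment during training, and rollbacks that oscillate between the two most recently deployed models) can be modeled in a few lines. The following is a minimal Python sketch under stated assumptions, not PostgresML's implementation: the `Project` and `Model` classes and the in-memory deployment list are hypothetical stand-ins for the `pgml.models` and `pgml.deployments` tables, a single `score` field stands in for the key metric (__R__<sup>2</sup> or __F__<sub>1</sub>, higher is better), and the rule that a rollback targets the most recent deployment of a model other than the current one is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Model:
    id: int
    algorithm: str
    # Stand-in for the key metric: R^2 (regression) or F1 (classification).
    score: float


@dataclass
class Project:
    # Hypothetical in-memory stand-ins for pgml.models and pgml.deployments.
    models: List[Model] = field(default_factory=list)     # in training order
    deployments: List[int] = field(default_factory=list)  # model ids, oldest first

    def _model(self, model_id: int) -> Model:
        return next(m for m in self.models if m.id == model_id)

    def deploy(self, strategy: str = "best_score",
               algorithm: Optional[str] = None) -> Model:
        if strategy == "rollback":
            if not self.deployments:
                raise ValueError("nothing has been deployed yet")
            current = self.deployments[-1]
            # Assumption: roll back to the most recent deployment of a different model.
            previous = next((mid for mid in reversed(self.deployments)
                             if mid != current), None)
            if previous is None:
                raise ValueError("no previous model to roll back to")
            target = self._model(previous)
        else:
            # algorithm=None lets every trained model qualify.
            candidates = [m for m in self.models
                          if algorithm is None or m.algorithm == algorithm]
            if not candidates:
                raise ValueError("no trained models match the filter")
            if strategy == "most_recent":
                target = candidates[-1]
            elif strategy == "best_score":
                target = max(candidates, key=lambda m: m.score)
            else:
                raise ValueError(f"unknown strategy: {strategy}")
        # Every deploy, rollbacks included, appends a new deployment record.
        self.deployments.append(target.id)
        return target

    def train(self, model: Model) -> bool:
        """Record a trained model; auto-deploy it only if it beats the deployed one."""
        self.models.append(model)
        if not self.deployments or model.score > self._model(self.deployments[-1]).score:
            self.deployments.append(model.id)
            return True
        return False


if __name__ == "__main__":
    project = Project()
    project.train(Model(1, "linear", 0.90))   # first model: deployed
    project.train(Model(2, "xgboost", 0.95))  # better score: deployed
    project.train(Model(3, "svm", 0.93))      # worse than xgboost: not deployed
    print(project.deploy("rollback").id)      # 1
    print(project.deploy("rollback").id)      # 2
```

Because every deploy, including a rollback, appends a record instead of rewriting history, two consecutive rollbacks return to the model you started from, which is what makes them reversible.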
@@ -1,57 +1,71 @@
-# Predictions
+# Making Predictions
 
-The predict function is the key value proposition of PostgresML. It provides online predictions using the actively deployed model for a project.
+The `pgml.predict()` function is the key value proposition of PostgresML. It provides online predictions using the best, automatically deployed model for a project.
 
 ## API
 
-```sql linenums="1" title="pgml.predict"
+The API for predictions is very simple and only requires two arguments: the project name and the features used for prediction.
+
+```postgresql title="pgml.predict()"
 pgml.predict (
-	project_name TEXT,            -- Human-friendly project name
-	features DOUBLE PRECISION[]   -- Must match the training data column order
+	project_name TEXT,
+	features REAL[]
 )
 ```
 
+### Parameters
+
+| Parameter | Description | Example |
+|-----------|-------------|---------|
+| `project_name`| The project name used to train models in `pgml.train()`. | `My First PostgresML Project` |
+| `features` | The feature vector used to predict a novel data point. | `ARRAY[0.1, 0.45, 1.0]` |
+
 !!! example
-    Once a model has been trained for a project, making predictions is as simple as:
 
-    ```sql linenums="1"
+    ```postgresql
     SELECT pgml.predict(
-        'Human-friendly project name', 
-        ARRAY[...]
-    ) AS prediction_score;
+        'My Classification Project', 
+        ARRAY[0.1, 2.0, 5.0]
+    ) AS prediction;
     ```
 
-where `ARRAY[...]` is the same list of features for a sample used in training. This score can be used in normal queries, for example:
+where `ARRAY[0.1, 2.0, 5.0]` contains the same features used in training, in the same order as the columns of the training table or view. The prediction can also be used in regular queries, for example:
 
 !!! example
-    ```sql linenums="1"
+    ```postgresql
     SELECT *,
         pgml.predict(
-            'Probability of buying our products',
-            ARRAY[user.location, NOW() - user.created_at, user.total_purchases_in_dollars]
-        ) AS likely_to_buy_score
+            'Buy it Again',
+            ARRAY[
+                users.location_id,
+                EXTRACT(EPOCH FROM NOW() - users.created_at),
+                users.total_purchases_in_dollars
+            ]
+        ) AS buying_score
     FROM users
-    WHERE comapany_id = 5
-    ORDER BY likely_to_buy_score
+    WHERE tenant_id = 5
+    ORDER BY buying_score DESC
     LIMIT 25;
     ```
 
 
-## Making Predictions
+### Example
 
-If you've already been through the [training guide](/user_guides/training/overview/), you can see the results of those efforts:
+If you've already been through the [Training Overview](/user_guides/training/overview/), you can see the results of those efforts:
 
 === "SQL"
 
-    ```sql linenums="1"
-    SELECT target, pgml.predict('Handwritten Digit Image Classifier', image) AS prediction
+    ```postgresql
+    SELECT
+        target,
+        pgml.predict('Handwritten Digit Image Classifier', image) AS prediction
     FROM pgml.digits 
     LIMIT 10;
     ```
 
 === "Output"
 
-    ```sql linenums="1"
+    ```
      target | prediction
     --------+------------
           0 |          0
@@ -67,20 +81,25 @@ If you've already been through the [training guide](/user_guides/training/overvi
     (10 rows)
     ```
 
-## Checking the deployed algorithm
-If you're ever curious about which deployed models will be used to make predictions, you can see them in the `pgml.deployed_models` VIEW.
+## Active Model
+
+Since it's easy to train multiple algorithms with different hyperparameters, it's useful to check which model is currently deployed and used to make predictions. You can find out by querying the `pgml.deployed_models` view:
 
 === "SQL"
 
-    ```sql linenums="1"
+    ```postgresql
     SELECT * FROM pgml.deployed_models;
     ```
 
 === "Output"
 
-    ```sql linenums="1"
-     id |                name                |      task      | algorithm |        deployed_at
-    ----+------------------------------------+----------------+-----------+----------------------------
-      1 | Handwritten Digit Image Classifier | classification | linear    | 2022-05-10 15:28:53.383893
     ```
+     id |                name                |      task      | algorithm | runtime |        deployed_at
+    ----+------------------------------------+----------------+-----------+---------+----------------------------
+      4 | Handwritten Digit Image Classifier | classification | xgboost   | rust    | 2022-10-11 13:06:26.473489
+    (1 row)
+    ```
+
+PostgresML will automatically deploy a model only if it has better metrics than existing ones, so it's safe to experiment with different algorithms and hyperparameters.
 
+Take a look at the [Deploying Models](/user_guides/predictions/deployments/) documentation for more details.
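`pgml.predict()` only receives a bare feature array, so the values must arrive in the same order as the columns of the training table or view. When features are assembled in application code, a small helper can enforce that ordering. The sketch below is hypothetical: `feature_vector()`, `sql_array_literal()`, and the column names are illustrative and are not part of PostgresML.

```python
from typing import List, Mapping, Sequence


def feature_vector(row: Mapping[str, float], training_columns: Sequence[str]) -> List[float]:
    """Return the row's values ordered exactly like the training columns."""
    missing = [name for name in training_columns if name not in row]
    if missing:
        # Fail loudly rather than build a shorter, misaligned vector.
        raise KeyError(f"missing features: {missing}")
    return [float(row[name]) for name in training_columns]


def sql_array_literal(values: Sequence[float]) -> str:
    """Render values as a Postgres ARRAY[...] expression (illustration only)."""
    return "ARRAY[" + ", ".join(repr(v) for v in values) + "]"


if __name__ == "__main__":
    # Hypothetical training columns and an input row with keys in a different order.
    columns = ["clicks", "visits", "purchases"]
    row = {"purchases": 5, "clicks": 0.1, "visits": 2}
    print(sql_array_literal(feature_vector(row, columns)))  # ARRAY[0.1, 2.0, 5.0]
```

String formatting is used here only to show the resulting `ARRAY[...]` expression; real applications should pass the array as a bound query parameter rather than interpolating values into SQL.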
@@ -1,19 +1,19 @@
 # Deployments
 
-Deployments are an artifact of calls to `pgml.deploy`. See [deployments](/user_guides/predictions/deployments/) for ways to create new deployments.
+Deployments are an artifact of calls to `pgml.deploy()` and `pgml.train()`. See [Deployments](/user_guides/predictions/deployments/) for ways to create new deployments manually.
 
 ![Deployment](/images/dashboard/deployment.png)
 
 ## Schema
 
-```sql linenums="1"
-pgml.deployments(
+```postgresql
+CREATE TABLE IF NOT EXISTS pgml.deployments(
 	id BIGSERIAL PRIMARY KEY,
 	project_id BIGINT NOT NULL,
 	model_id BIGINT NOT NULL,
 	strategy pgml.strategy NOT NULL,
 	created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT clock_timestamp(),
-	CONSTRAINT project_id_fk FOREIGN KEY(project_id) REFERENCES pgml.projects(id),
-	CONSTRAINT model_id_fk FOREIGN KEY(model_id) REFERENCES pgml.models(id)
+	CONSTRAINT project_id_fk FOREIGN KEY(project_id) REFERENCES pgml.projects(id) ON DELETE CASCADE,
+	CONSTRAINT model_id_fk FOREIGN KEY(model_id) REFERENCES pgml.models(id) ON DELETE CASCADE
 );
 ```