
Commit f1e6f7a

Some more words in the quick start (#182)
* Some more words in the quick start
* typo
* number
* wording
1 parent c9eb194 commit f1e6f7a

1 file changed, +33 -15 lines changed

pgml-docs/docs/gym/quick_start.md

Lines changed: 33 additions & 15 deletions
@@ -1,10 +1,16 @@
11
# Quick Start
22

3-
PostgresML is really easy to get started with. We'll use one of our example dataset to show you how to use it.
3+
PostgresML is easy to get started with. If you haven't already, sign up for our [Gym](https://gym.postgresml.org/signup/) to get a free hosted PostgresML instance you can use to follow this tutorial. You can also run one yourself by following the instructions in our GitHub repo.
4+
5+
<p align="center" markdown>
6+
[Sign Up for the Gym](https://gym.postgresml.org/signup/){ .md-button .md-button--primary .md-button }
7+
</p>
8+
9+
Once you have your PostgresML instance running, we'll be ready to get started.
410

511
## Get data
612

7-
Navigate to the IDE tab and run this query:
13+
The first part of machine learning is getting your data into a format you can use. That's usually the hardest part, but thankfully we have a few example datasets we can use. To load one of them, navigate to the IDE tab and run this query:
814

915
```sql
1016
SELECT * FROM pgml.load_dataset('diabetes');
@@ -14,13 +20,13 @@ You should see something like this:
1420

1521
![IDE](/gym/ide.png)
1622

17-
We have more example Scikit datasets avaialble, e.g.:
23+
We have more example [Scikit datasets](https://scikit-learn.org/stable/datasets/toy_dataset.html) available:
1824

19-
- `iris`
20-
- `digits`
21-
- `wine`
25+
- `iris` (classification)
26+
- `digits` (classification)
27+
- `wine` (classification)
2228

23-
To load them into PostgresML, use the same function above with the desired dataset name as parameter. They will become available in the `pgml` schema, as `pgml.iris`, `pgml.digits` and `pgml.wine` respectively.
29+
To load them into PostgresML, use the same function above with the desired dataset name as parameter. They will become available in the `pgml` schema as `pgml.iris`, `pgml.digits` and `pgml.wine` respectively.
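For example, loading all three datasets and previewing one of them might look like the sketch below. It only uses `pgml.load_dataset()` and the `pgml.iris`, `pgml.digits` and `pgml.wine` table names mentioned above, and assumes the dataset names are accepted exactly as listed.

```sql
-- Load the other toy datasets; each call creates a table in the pgml schema.
SELECT * FROM pgml.load_dataset('iris');
SELECT * FROM pgml.load_dataset('digits');
SELECT * FROM pgml.load_dataset('wine');

-- Peek at a few rows of one of the newly loaded tables.
SELECT * FROM pgml.iris LIMIT 5;
```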
2430

2531
## Browse data
2632

@@ -33,7 +39,7 @@ SELECT * FROM pgml.diabetes LIMIT 5;
3339

3440
![Data](/gym/data.png)
3541

36-
The diabetes dataset is a toy (small, not realistic) dataset published by Scikit Learn. It contains 10 feature columns and one target column:
42+
The `diabetes` dataset is a toy (small, not realistic) dataset published by scikit-learn. It contains ten feature columns and one label column:
3743

3844
| **Column** | **Description** | **Data type** |
3945
|------------|----------------------------------------------------------------------|---------------|
@@ -50,15 +56,14 @@ The diabetes dataset is a toy (small, not realistic) dataset published by Scikit
5056
| **target** | Quantitative measure of disease progression one year after baseline. | float |
5157

5258

53-
This dataset is not realistic because all data is perfectly arranged and normalized, which won't be the case with most datasets you'll run into in the real world, but it's perfect for our quick tutorial.
59+
This dataset is not realistic because all data is perfectly arranged and normalized, which won't be the case with most real-world datasets you'll run into, but it's perfect for our quick tutorial.
5460

5561

5662
Alright, we're ready to do some machine learning!
5763

5864
## First project
5965

60-
PostgresML organizes itself into projects. A project is just a name for model(s) trained on a particular dataset. Let's create our first project by training an XGBoost
61-
model on our diabetes dataset.
66+
PostgresML organizes itself into projects. A project is just a name for model(s) trained on a particular dataset. Let's create our first project by training an XGBoost regression model on our diabetes dataset.
6267

6368
Using the IDE, run:
6469

@@ -79,13 +84,15 @@ By executing `pgml.train()` we did the following:
7984

8085
- created a project called "My First Project",
8186
- snapshotted the table `pgml.diabetes`, making the experiment reproducible even if the data changes later (as it often does in the real world),
82-
- trained an XGBoost regression model on the data contained in the `pgml.diabetes` table, using the column `target` as the label,
87+
- trained an XGBoost regression model on the data contained in the `pgml.diabetes` table using the column `target` as the label,
8388
- deployed the model into production.
8489

8590
We're ready to predict novel data points!
8691

8792
## Inference
8893

94+
Inference is the act of predicting labels for data the model hasn't necessarily seen during training. That's really the whole point of machine learning: predicting something we haven't seen before.
95+
8996
Let's try to predict some new values. Using the IDE, run:
9097

9198
```sql
@@ -110,7 +117,18 @@ You should see something like this:
110117

111118
![Prediction](/gym/predict.png)
112119

113-
Congratulations, you just did machine learning in just a few simple steps!
120+
The `prediction` column is the model's estimate of the `target` column given the new features we just passed into the `pgml.predict()` function. You can just as easily predict multiple points and compare them to the actual labels in the dataset:
121+
122+
```sql
123+
SELECT
124+
pgml.predict('My First Project', ARRAY[
125+
age, sex, bmi, bp, s1, s2, s3, s4, s5, s6
126+
]) AS prediction,
127+
target
128+
FROM pgml.diabetes LIMIT 10;
129+
```
130+
131+
Sometimes the model will be pretty close, and sometimes it will be way off. That's why, next, we'll train several models and compare them.
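To put a number on "pretty close" versus "way off", one option is to compute the absolute error per row and the mean absolute error over the table. This is a minimal sketch, not part of the PostgresML API: it assumes the project is named `'My First Project'` (as listed in the `pgml.train()` steps) and that the feature array must list the columns in the same order used during training. Because it scores the model on the same table it was trained on, the result will likely look more optimistic than performance on genuinely new data.

```sql
-- Per-row comparison: actual label, predicted value, and absolute error.
-- Assumes project 'My First Project' and the training feature order.
SELECT
    target,
    prediction,
    abs(prediction - target) AS absolute_error
FROM (
    SELECT
        target,
        pgml.predict('My First Project', ARRAY[
            age, sex, bmi, bp, s1, s2, s3, s4, s5, s6
        ]) AS prediction
    FROM pgml.diabetes
    LIMIT 10
) AS predictions;

-- One summary number: mean absolute error over every row in the table.
SELECT
    avg(abs(pgml.predict('My First Project', ARRAY[
        age, sex, bmi, bp, s1, s2, s3, s4, s5, s6
    ]) - target)) AS mean_absolute_error
FROM pgml.diabetes;
```

A lower mean absolute error means predictions sit closer to the real labels on average, which gives a simple way to compare models trained with different algorithms.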
114132

115133
## Browse around
116134

@@ -140,10 +158,10 @@ If you navigate to the Models tab, you should see all three algorithms you just
140158

141159
Huh, apparently XGBoost isn't as good as we originally thought! In this case, a simple linear regression did significantly better than all the others. It's hard to know which algorithm will perform best on a given dataset; even experienced machine learning engineers get this one wrong.
142160

143-
With PostgresML, you needn't worry; you can train all of them and see which one does best for your data. PostgresML will automatically use the best one for inference.
161+
With PostgresML, you needn't worry: you can train all of them and see which one does best for your data. PostgresML will automatically use the best one for inference.
144162

145163
## Conclusion
146164

147165
Congratulations on becoming a machine learning engineer! If you thought ML was scary or mysterious, we hope this small tutorial made it a little more approachable.
148166

149-
Keep exploring our other tutorials and try some things on your own. Happy machine learning!
167+
This is the first of many tutorials we'll publish, so stay tuned. Happy machine learning!
