pgml-docs/docs/gym/quick_start.md
33 additions & 15 deletions
@@ -1,10 +1,16 @@
1
1
# Quick Start
2
2
3
-
PostgresML is really easy to get started with. We'll use one of our example dataset to show you how to use it.
3
+
PostgresML is easy to get started with. If you haven't already, sign up for our [Gym](https://gym.postgresml.org/signup/) to get a free hosted PostgresML instance you can use to follow this tutorial. You can also run one yourself by following the instructions in our GitHub repo.
4
+
5
+
<p align="center" markdown>
6
+
[Sign Up for the Gym](https://gym.postgresml.org/signup/){ .md-button .md-button--primary .md-button }
7
+
</p>
8
+
9
+
Once you have your PostgresML instance running, we'll be ready to get started.
4
10
5
11
## Get data
6
12
7
-
Navigate to the IDE tab and run this query:
13
+
The first part of machine learning is getting your data into a format you can use. That's usually the hardest part, but thankfully we have a few example datasets we can use. To load one of them, navigate to the IDE tab and run this query:
8
14
9
15
```sql
10
16
SELECT * FROM pgml.load_dataset('diabetes');
@@ -14,13 +20,13 @@ You should see something like this:
14
20
15
21

16
22
17
-
We have more example Scikit datasets avaialble, e.g.:
23
+
We have more example [Scikit datasets](https://scikit-learn.org/stable/datasets/toy_dataset.html) available:
18
24
19
-
- `iris`
20
-
- `digits`
21
-
- `wine`
25
+
- `iris` (classification),
26
+
- `digits` (classification),
27
+
- `wine` (classification).
22
28
23
-
To load them into PostgresML, use the same function above with the desired dataset name as parameter. They will become available in the `pgml` schema, as `pgml.iris`, `pgml.digits` and `pgml.wine` respectively.
29
+
To load them into PostgresML, use the same function as above with the desired dataset name as the parameter. They will become available in the `pgml` schema as `pgml.iris`, `pgml.digits` and `pgml.wine` respectively.
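
As a minimal sketch of the step just described, loading `iris` would look like the query below. It reuses the `pgml.load_dataset()` call shown earlier. The `pgml.iris` table name follows from the text, and the `LIMIT 5` preview is only an illustrative choice.

```sql
-- Load the iris example dataset (same function as for 'diabetes').
SELECT * FROM pgml.load_dataset('iris');

-- Preview a few rows of the table created in the pgml schema.
SELECT * FROM pgml.iris LIMIT 5;
```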
24
30
25
31
## Browse data
26
32
@@ -33,7 +39,7 @@ SELECT * FROM pgml.diabetes LIMIT 5;
33
39
34
40

35
41
36
-
The diabetes dataset is a toy (small, not realistic) dataset published by Scikit Learn. It contains 10 feature columns and one target column:
42
+
The `diabetes` dataset is a toy (small, not realistic) dataset published by scikit-learn. It contains ten feature columns and one label column:
@@ -50,15 +56,14 @@ The diabetes dataset is a toy (small, not realistic) dataset published by Scikit
50
56
|**target**| Quantitative measure of disease progression one year after baseline. | float |
51
57
52
58
53
-
This dataset is not realistic because all data is perfectly arranged and normalized, which won't be the case with most datasets you'll run into in the real world, but it's perfect for our quick tutorial.
59
+
This dataset is not realistic because all of its data is perfectly arranged and normalized, which won't be the case with most real-world datasets you'll run into, but it's perfect for our quick tutorial.
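
To get a feel for the label before training, a quick summary query can help. This is a sketch in plain SQL, assuming the `pgml.diabetes` table has been loaded as above and that `target` is numeric, as the column table states. The output column aliases are illustrative.

```sql
-- Summarize the label column of the diabetes dataset.
SELECT
    count(*)    AS row_count,
    min(target) AS min_target,
    max(target) AS max_target,
    avg(target) AS mean_target
FROM pgml.diabetes;
```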
54
60
55
61
56
62
Alright, we're ready to do some machine learning!
57
63
58
64
## First project
59
65
60
-
PostgresML organizes itself into projects. A project is just a name for model(s) trained on a particular dataset. Let's create our first project by training an XGBoost
61
-
model on our diabetes dataset.
66
+
PostgresML organizes itself into projects. A project is just a name for model(s) trained on a particular dataset. Let's create our first project by training an XGBoost regression model on our diabetes dataset.
62
67
63
68
Using the IDE, run:
64
69
@@ -79,13 +84,15 @@ By executing `pgml.train()` we did the following:
79
84
80
85
- created a project called "My First Project",
81
86
- snapshotted the table `pgml.diabetes` thus making the experiment reproducible (in case data changes, as it happens in the real world),
82
-
- trained an XGBoost regression model on the data contained in the `pgml.diabetes` table, using the column `target` as the label,
87
+
- trained an XGBoost regression model on the data contained in the `pgml.diabetes` table using the column `target` as the label,
83
88
- deployed the model into production.
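
For reference, the steps above correspond to a call along these lines. This is a hedged sketch rather than the exact query from the tutorial: the positional argument order (project name, objective, table, label column, algorithm) is an assumption and may differ between PostgresML versions, so check the signature of `pgml.train()` in your installation.

```sql
-- Assumed argument order; verify against your PostgresML version.
SELECT * FROM pgml.train(
    'My First Project',  -- project name
    'regression',        -- objective
    'pgml.diabetes',     -- table to snapshot and train on
    'target',            -- label column
    'xgboost'            -- algorithm
);
```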
84
89
85
90
We're ready to predict novel data points!
86
91
87
92
## Inference
88
93
94
+
Inference is the act of predicting labels for data points the model hasn't necessarily seen in training. That's the whole point of machine learning really: predicting something we haven't seen before.
95
+
89
96
Let's try and predict some new values. Using the IDE, run:
90
97
91
98
```sql
@@ -110,7 +117,18 @@ You should see something like this:
110
117
111
118

112
119
113
-
Congratulations, you just did machine learning in just a few simple steps!
120
+
The `prediction` column contains the predicted value of the `target` column given the new features we just passed into the `pgml.predict()` function. You can just as easily predict multiple points and compare them to the actual labels in the dataset:
121
+
122
+
```sql
123
+
SELECT
124
+
pgml.predict('My First Project', ARRAY[
125
+
age, sex, bmi, bp, s1, s2, s3, s4, s5, s6
126
+
]),
127
+
target
128
+
FROM pgml.diabetes LIMIT 10;
129
+
```
130
+
131
+
Sometimes the model will be pretty close, but sometimes it will be way off. That's why we'll be training several of them and comparing them next.
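
One way to put a number on "pretty close" versus "way off" is to average the absolute difference between predictions and labels. The query below is a sketch using only standard SQL aggregates around `pgml.predict()`. It assumes the project is named 'My First Project' as in the training step, and that the feature columns are `age`, `sex`, `bmi`, `bp` and `s1` through `s6`, as in the scikit-learn diabetes dataset.

```sql
-- Mean absolute error of the deployed model over the whole dataset.
SELECT
    avg(abs(
        pgml.predict('My First Project', ARRAY[
            age, sex, bmi, bp, s1, s2, s3, s4, s5, s6
        ]) - target
    )) AS mean_absolute_error
FROM pgml.diabetes;
```

Note that this measures error on data the model was trained on, so it will look more optimistic than a held-out test set would.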
114
132
115
133
## Browse around
116
134
@@ -140,10 +158,10 @@ If you navigate to the Models tab, you should see all three algorithms you just
140
158
141
159
Huh, apparently XGBoost isn't as good as we originally thought! In this case, a simple linear regression did significantly better than all the others. It's hard to know which algorithm will perform best on a given dataset; even experienced machine learning engineers get this one wrong.
142
160
143
-
With PostgresML, you needn't worry; you can train all of them and see which one does best for your data. PostgresML will automatically use the best one for inference.
161
+
With PostgresML, you needn't worry: you can train all of them and see which one does best for your data. PostgresML will automatically use the best one for inference.
144
162
145
163
## Conclusion
146
164
147
165
Congratulations on becoming a machine learning engineer. If you thought ML was scary or mysterious, we hope that this small tutorial made it a little more approachable.
148
166
149
-
Keep exploring our other tutorials and try some things on your own. Happy machine learning!
167
+
This is the first of many tutorials we'll publish, so stay tuned. Happy machine learning!