Commit 6300733

Gym Quick Start (#179)

* Gym Quick Start
* accidental change
1 parent f1e15dc commit 6300733

10 files changed, 129 additions and 1 deletion

pgml-docs/.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
3.10.4

pgml-docs/docs/gym/data.png

103 KB

pgml-docs/docs/gym/ide.png

49.2 KB

pgml-docs/docs/gym.md renamed to pgml-docs/docs/gym/introduction.md

Lines changed: 1 addition & 1 deletion
@@ -48,4 +48,4 @@ We'll be publishing a series of blog posts detailing common machine learning app
 
 <p align="center" markdown>
 [Sign up for the Gym](https://gym.postgresml.org/){ .md-button .md-button--primary }
-</p>
+</p>

pgml-docs/docs/gym/predict.png

65.7 KB

pgml-docs/docs/gym/projects.png

48.7 KB

pgml-docs/docs/gym/quick_start.md

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
# Quick Start

PostgresML is easy to get started with. We'll use one of our example datasets to show you how to use it.

## Get data

Navigate to the IDE tab and run this query:

```sql
SELECT * FROM pgml.load_dataset('diabetes');
```

You should see something like this:

![IDE](/gym/ide.png)

We have more example Scikit datasets available, e.g.:

- `iris`
- `digits`
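
Loading one of these should work the same way as for `diabetes`. A minimal sketch, assuming the dataset name is passed to `pgml.load_dataset` exactly as written in the list above:

```sql
-- Load the iris example dataset; the call mirrors the diabetes example.
-- Assumption: 'iris' is accepted verbatim as the dataset name.
SELECT * FROM pgml.load_dataset('iris');
```

If table naming follows the same pattern as the diabetes dataset, the rows would presumably land in `pgml.iris`, but that name is an assumption and worth checking in the IDE.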

## Browse data

The SQL editor you just used can run arbitrary queries on the PostgresML instance. For example, to see what the dataset we just loaded looks like, we can run:

```sql
SELECT * FROM pgml.diabetes LIMIT 5;
```

![Data](/gym/data.png)
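
Because the editor accepts arbitrary SQL, ordinary aggregates are a quick way to sanity-check a dataset before training. The sketch below assumes the diabetes dataset was loaded into `pgml.diabetes` and that its label column is named `target`, which is the column name this tutorial uses for training:

```sql
-- Row count and basic statistics for the label column.
-- Assumptions: the table is pgml.diabetes and the label column is target.
SELECT
    count(*)    AS row_count,
    min(target) AS min_target,
    max(target) AS max_target,
    avg(target) AS avg_target
FROM pgml.diabetes;
```

An unexpected row count or a label range that looks wrong is easier to catch here than after a model has been trained on it.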


Alright, we're ready to do some machine learning!

## First project

PostgresML organizes itself into projects. A project is just a name for model(s) trained on a particular dataset. Let's create our first project by training an XGBoost model on our diabetes dataset.

Using the IDE, run:

```sql
SELECT * FROM pgml.train(
    'My First Project',
    task => 'regression',
    relation_name => 'pgml.diabetes',
    y_column_name => 'target',
    algorithm => 'xgboost');
```

You should see this:

![Train](/gym/train.png)

By executing `pgml.train()` we did the following:

- created a project called "My First Project",
- snapshotted the table `pgml.diabetes`, making the experiment reproducible (in case the data changes, as it does in the real world),
- trained an XGBoost regression model on the data contained in the `pgml.diabetes` table, using the column `target` as the label,
- deployed the model into production.
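
To see where the artifacts from these steps are stored, one option is to list the relations in the `pgml` schema through the standard `information_schema` catalog. This sketch relies only on standard PostgreSQL. Which tables and views appear depends on the PostgresML version, so none are named here:

```sql
-- List every table and view in the pgml schema.
SELECT table_name, table_type
FROM information_schema.tables
WHERE table_schema = 'pgml'
ORDER BY table_name;
```

Any relation in the result can then be inspected with a plain `SELECT * ... LIMIT 5`, the same way the dataset was browsed earlier.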

We're ready to predict novel data points!

## Inference

Let's try to predict some new values. Using the IDE, run:

```sql
SELECT pgml.predict(
    'My First Project',
    ARRAY[
        0.06, -- age
        0.05, -- sex
        0.05, -- bmi
        -0.0056, -- bp
        0.012191, -- s1
        -0.043401, -- s2
        0.034309, -- s3
        -0.031938, -- s4
        -0.061988, -- s5
        -0.031988 -- s6
    ]
) AS prediction;
```

You should see something like this:

![Prediction](/gym/predict.png)
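
The same call can also be applied to rows that already live in a table, which makes it easy to put predictions next to known labels. This is a sketch under two assumptions: that the feature columns of `pgml.diabetes` carry the names used in the comments of the query above (`age`, `sex`, `bmi`, `bp`, `s1` to `s6`), and that their type is accepted by `pgml.predict` without a cast:

```sql
-- Compare the stored label with the model's prediction for a few rows.
-- Assumption: column names match the comments in the single-row example;
-- an explicit cast of the array may be needed depending on column types.
SELECT
    target,
    pgml.predict(
        'My First Project',
        ARRAY[age, sex, bmi, bp, s1, s2, s3, s4, s5, s6]
    ) AS prediction
FROM pgml.diabetes
LIMIT 10;
```

Some of these rows may have been used for training, so this is a quick smoke test rather than a measure of model quality.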

Congratulations, you did machine learning in just a few simple steps!

## Browse around

By creating our first project, we made the Dashboard a little bit more interesting. This is what the `pgml.diabetes` snapshot we just created looks like:

![Snapshot](/gym/snapshot.png)

As you can see, we automatically performed some analysis on the data. Visualizing the data is important to understand how it could potentially behave given different models, and maybe even predict how it could evolve in the future.

XGBoost is a good algorithm, but what if there are better ones? Let's try training a few more using the IDE. Run these one at a time:

```sql
-- Simple linear regression.
SELECT * FROM pgml.train(
    'My First Project',
    algorithm => 'linear');

-- The Lasso (much fancier linear regression).
SELECT * FROM pgml.train(
    'My First Project',
    algorithm => 'lasso');
```

If you navigate to the Models tab, you should see all three algorithms you just trained:

![Trained Algorithms](/gym/trained_models.png)

Huh, apparently XGBoost isn't as good as we originally thought! In this case, a simple linear regression did significantly better than all the others. It's hard to know which algorithm will perform best given a dataset; even experienced machine learning engineers get this one wrong.

With PostgresML, you needn't worry; you can train all of them and see which one does best for your data. PostgresML will automatically use the best one for inference.

## Conclusion

Congratulations on becoming a Machine Learning engineer. If you thought ML was scary or mysterious, we hope that this small tutorial made it a little bit more approachable.

Keep exploring our other tutorials and try some things on your own. Happy machine learning!

pgml-docs/docs/gym/snapshot.png

150 KB

pgml-docs/docs/gym/train.png

58.7 KB
31.2 KB
