# Quick Start

PostgresML is really easy to get started with. We'll use one of our example datasets to show you how to use it.

## Get data

Navigate to the IDE tab and run this query:

```sql
SELECT * FROM pgml.load_dataset('diabetes');
```

You should see something like this:

We have more example Scikit-learn datasets available, e.g.:

- `iris`
- `digits`

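These load the same way; for example, to pull in the `digits` dataset:

```sql
-- Load the Scikit-learn digits dataset the same way we loaded diabetes.
SELECT * FROM pgml.load_dataset('digits');
```
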
## Browse data

The SQL editor you just used can run arbitrary queries against the PostgresML instance. For example,
if we want to see what the dataset we just loaded looks like, we can run:

```sql
SELECT * FROM pgml.diabetes LIMIT 5;
```

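Since the editor accepts any SQL, aggregates work too; for example, to count the rows we just loaded:

```sql
-- Count the rows in the diabetes dataset.
SELECT COUNT(*) FROM pgml.diabetes;
```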

Alright, we're ready to do some machine learning!

## First project

PostgresML organizes itself into projects. A project is just a name for one or more models trained on a particular dataset. Let's create our first project by training an XGBoost model on our diabetes dataset.

Using the IDE, run:

```sql
SELECT * FROM pgml.train(
    'My First Project',
    task => 'regression',
    relation_name => 'pgml.diabetes',
    y_column_name => 'target',
    algorithm => 'xgboost');
```

You should see this:

By executing `pgml.train()` we did the following:

- created a project called "My First Project",
- snapshotted the table `pgml.diabetes`, making the experiment reproducible (in case the data changes, as it does in the real world),
- trained an XGBoost regression model on the data in the `pgml.diabetes` table, using the column `target` as the label,
- deployed the model into production.

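All of that bookkeeping lives in ordinary Postgres tables, so you can inspect it with SQL. A quick peek, assuming your instance has the default `pgml` schema tables `pgml.projects` and `pgml.models`:

```sql
-- Inspect training metadata (table names assume the default pgml schema).
SELECT * FROM pgml.projects;
SELECT * FROM pgml.models;
```
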
We're ready to predict novel data points!

## Inference

Let's try predicting some new values. Using the IDE, run:

```sql
SELECT pgml.predict(
    'My First Project',
    ARRAY[
        0.06,      -- age
        0.05,      -- sex
        0.05,      -- bmi
        -0.0056,   -- bp
        0.012191,  -- s1
        -0.043401, -- s2
        0.034309,  -- s3
        -0.031938, -- s4
        -0.061988, -- s5
        -0.031988  -- s6
    ]
) AS prediction;
```

You should see something like this:

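Predictions don't have to be hand-typed literals; you can feed rows from the table straight into `pgml.predict()`. A minimal sketch, assuming the `pgml.diabetes` columns are named like the comments above (`age`, `sex`, `bmi`, `bp`, `s1` through `s6`):

```sql
-- Run the deployed model over the first five rows of the dataset.
SELECT pgml.predict(
    'My First Project',
    ARRAY[age, sex, bmi, bp, s1, s2, s3, s4, s5, s6]
) AS prediction
FROM pgml.diabetes
LIMIT 5;
```
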
Congratulations, you just did machine learning in a few simple steps!

## Browse around

By creating our first project, we made the Dashboard a little more interesting. This is what the `pgml.diabetes` snapshot we just created looks like:

As you can see, we automatically performed some analysis on the data. Visualizing the data is important for understanding how it could behave under different models, and maybe even for predicting how it could evolve in the future.

XGBoost is a good algorithm, but what if there are better ones? Let's try training a few more using the IDE. Run these one at a time:

```sql
-- Simple linear regression.
SELECT * FROM pgml.train(
    'My First Project',
    algorithm => 'linear');

-- The Lasso (much fancier linear regression).
SELECT * FROM pgml.train(
    'My First Project',
    algorithm => 'lasso');
```

If you navigate to the Models tab, you should see all three algorithms you just trained:

Huh, apparently XGBoost isn't as good as we originally thought! In this case, simple linear regression did significantly better than the others. It's hard to know ahead of time which algorithm will perform best on a given dataset; even experienced machine learning engineers get this wrong.

With PostgresML, you needn't worry: you can train all of them and see which one does best on your data. PostgresML will automatically use the best one for inference.
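
If you ever want to control deployment yourself rather than rely on the automatic choice, there is a deployment call for that; a hedged sketch, assuming the `pgml.deploy()` function and its `strategy` parameter are available in your version:

```sql
-- Explicitly redeploy the best-scoring model for the project
-- (function and strategy names assume the pgml.deploy API).
SELECT * FROM pgml.deploy(
    'My First Project',
    strategy => 'best_score');
```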

## Conclusion

Congratulations on becoming a machine learning engineer! If you thought ML was scary or mysterious, we hope this short tutorial made it a little more approachable.

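One thing you could try right away: the `digits` dataset is a classification problem, so training on it just means switching the task. A sketch, assuming `pgml.digits` has a `target` label column like the diabetes table (load it first with `pgml.load_dataset('digits')` if you haven't), and using an example project name:

```sql
-- Train a classifier on the digits dataset
-- (project name is just an example; assumes a 'target' label column).
SELECT * FROM pgml.train(
    'Handwritten Digits',
    task => 'classification',
    relation_name => 'pgml.digits',
    y_column_name => 'target',
    algorithm => 'xgboost');
```
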
Keep exploring our other tutorials and try some things on your own. Happy machine learning!