From 8cae8861b2d11b75295ed69836e84ae92501eb7d Mon Sep 17 00:00:00 2001
From: Lev
Date: Fri, 26 Apr 2024 15:35:58 -0700
Subject: [PATCH] SQL api docs

---
 pgml-cms/docs/SUMMARY.md                      |   2 +-
 pgml-cms/docs/api/apis.md                     |   2 +-
 pgml-cms/docs/api/sql-extension/README.md     | 211 ++++++++++++++----
 .../src/components/cms/index_link/mod.rs      |   2 +-
 4 files changed, 172 insertions(+), 45 deletions(-)

diff --git a/pgml-cms/docs/SUMMARY.md b/pgml-cms/docs/SUMMARY.md
index fd87411ad..fdfa3a683 100644
--- a/pgml-cms/docs/SUMMARY.md
+++ b/pgml-cms/docs/SUMMARY.md
@@ -15,7 +15,7 @@
 ## API
 
 * [Overview](api/apis.md)
-* [SQL Extension](api/sql-extension/README.md)
+* [SQL extension](api/sql-extension/README.md)
   * [pgml.deploy()](api/sql-extension/pgml.deploy.md)
   * [pgml.embed()](api/sql-extension/pgml.embed.md)
   * [pgml.chunk()](api/sql-extension/pgml.chunk.md)
diff --git a/pgml-cms/docs/api/apis.md b/pgml-cms/docs/api/apis.md
index 70f3b1ed0..a73b2ba90 100644
--- a/pgml-cms/docs/api/apis.md
+++ b/pgml-cms/docs/api/apis.md
@@ -18,7 +18,7 @@ The PostgreSQL extension provides all of the ML & AI functionality, like trainin
 
 The following functions are implemented and maintained by the PostgresML extension:
 
-| Function name | Description |
+| Function | Description |
 |------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | [pgml.embed()](sql-extension/pgml.embed) | Generate embeddings inside the database using open source embedding models from Hugging Face. |
 | [pgml.transform()](sql-extension/pgml.transform/) | Download and run latest Hugging Face transformer models, like Llama, Mixtral, and many more to perform various NLP tasks like text generation, summarization, sentiment analysis and more. |
diff --git a/pgml-cms/docs/api/sql-extension/README.md b/pgml-cms/docs/api/sql-extension/README.md
index 326deb140..59610a2d0 100644
--- a/pgml-cms/docs/api/sql-extension/README.md
+++ b/pgml-cms/docs/api/sql-extension/README.md
@@ -1,69 +1,196 @@
 ---
 description: >-
-  The pgml extension for PostgreSQL provides Machine Learning and Artificial
+  The PostgresML extension for PostgreSQL provides Machine Learning and Artificial
   Intelligence APIs with access to algorithms to train your models, or download
-  SOTA open source models from HuggingFace.
+  state-of-the-art open source models from Hugging Face.
 ---
 
-# SQL Extension
+# SQL extension
 
-## Open Source Models
+PostgresML is a PostgreSQL extension which adds SQL functions to the database. Those functions provide access to AI models downloaded from Hugging Face, and classical machine learning algorithms like XGBoost and LightGBM.
 
-PostgresML integrates [🤗 Hugging Face Transformers](https://huggingface.co/transformers) to bring state-of-the-art models into the data layer. There are tens of thousands of pre-trained models with pipelines to turn raw inputs into useful results. Many LLMs have been published and made available for download. You will want to browse all the [models](https://huggingface.co/models) available to find the perfect solution for your [dataset](https://huggingface.co/dataset) and [task](https://huggingface.co/tasks). The pgml extension provides a few APIs for different use cases:
+Our SQL API is stable and safe to use in your applications, while the models and algorithms we support continue to evolve and improve.
 
-* [pgml.embed.md](pgml.embed.md "mention") returns vector embeddings for nearest neighbor searches and other vector database use cases
-* [pgml.generate.md](pgml.generate.md "mention") returns streaming text responses for chatbots
-* [pgml.transform](../../api/sql-extension/pgml.transform/ "mention") allows you to perform dozens of natural language processing (NLP) tasks with thousands of models, like sentiment analysis, question and answering, translation, summarization and text generation
-* [pgml.tune.md](pgml.tune.md "mention") fine tunes an open source model on your own data
+## Open-source LLMs
 
-## Train & deploy your own models
+PostgresML defines four SQL functions which use [🤗 Hugging Face](https://huggingface.co/transformers) transformer and embedding models, running directly in the database:
 
-PostgresML also supports more than 50 machine learning algorithms to train your own models for classification, regression or clustering. We organize a family of Models in Projects that are intended to address a particular opportunity. Different algorithms can be used in the same Project, to test and compare the performance of various approaches, and track progress over time, all within your database.
+| Function | Description |
+|---------------|-------------|
+| [pgml.embed()](pgml.embed) | Generate embeddings using the latest sentence transformers from Hugging Face. |
+| [pgml.transform()](pgml.transform/) | Text generation and other NLP tasks using LLMs like Llama, Mixtral, and many more, with models downloaded from Hugging Face. |
+| pgml.transform_stream() | Streaming version of [pgml.transform()](pgml.transform/), which returns partial responses as the model generates them. |
+| [pgml.tune()](pgml.tune) | Fine-tune Hugging Face models using data stored in the database. |
 
-### Train
+### Example
 
-Training creates a Model based on the data in your database.
+Interacting with open-source models through a SQL function keeps things simple:
 
-```sql
-SELECT pgml.train(
-    project_name = > 'Sales Forecast',
-    task => 'regression',
-    relation_name => 'hist_sales',
-    y_column_name => 'next_sales',
-    algorithm => 'xgboost'
-);
+{% tabs %}
+{% tab title="SQL" %}
+
+```postgresql
+SELECT pgml.embed(
+    'intfloat/e5-small',
+    'This text will be embedded using the intfloat/e5-small model.'
+) AS embedding;
+```
+
+{% endtab %}
+{% tab title="Output" %}
+
+```
+ embedding
+-------------------------------------------
+ {-0.028478337,-0.06275077,-0.04322059, [...]
 ```
 
-See [pgml.train](../../api/sql-extension/pgml.train/README.md) for more information.
+{% endtab %}
+{% endtabs %}
+
+Because the `pgml` functions are regular SQL functions, you can add embeddings and LLM-generated text to any query, without the data ever leaving the database and without the cost of a remote network call.
+
+## Classical machine learning
+
+PostgresML defines four SQL functions for training, deploying and running regression, classification, and clustering models on tabular data, and for loading example datasets:
+
+| Function | Description |
+|---------------|-------------|
+| [pgml.train()](pgml.train/) | Train a model on PostgreSQL tables or views using any algorithm from Scikit-learn, with additional support for XGBoost, LightGBM and CatBoost. |
+| [pgml.predict()](pgml.predict/) | Run inference on live application data using a model trained with [pgml.train()](pgml.train/). |
+| [pgml.deploy()](pgml.deploy) | Deploy a specific version of a model trained with [pgml.train()](pgml.train/), using your own accuracy metrics. |
+| pgml.load_dataset() | Load any of the toy datasets from Scikit-learn or any dataset from Hugging Face. |
+
+### Example
+
+#### Load data
 
-### Deploy
+Using `pgml.load_dataset()`, we can load an example classification dataset from Scikit-learn:
 
-Deploy an active Model for a particular Project, using a deployment strategy to select the best model.
+{% tabs %}
+{% tab title="SQL" %}
 
-```sql
-SELECT pgml.deploy(
-    project_name => 'Sales Forecast',
-    strategy => 'best_score',
-    algorithm => 'xgboost'
+```postgresql
+SELECT *
+FROM pgml.load_dataset('digits');
+```
+
+{% endtab %}
+{% tab title="Output" %}
+
+```
+ table_name  | rows
+-------------+------
+ pgml.digits | 1797
+(1 row)
+```
+
+{% endtab %}
+{% endtabs %}
+
+#### Train a model
+
+Once the data is loaded, we can train a model on it using [pgml.train()](pgml.train/):
+
+{% tabs %}
+{% tab title="SQL" %}
+
+```postgresql
+SELECT *
+FROM pgml.train(
+    project_name => 'My project name',
+    task => 'classification',
+    relation_name => 'pgml.digits',
+    y_column_name => 'target',
+    algorithm => 'xgboost'
 );
 ```
 
-See [pgml.deploy.md](pgml.deploy.md "mention") for more information.
+{% endtab %}
+{% tab title="Output" %}
 
-### Predict
+```
+INFO:  Metrics: {
+    "f1": 0.8755124,
+    "precision": 0.87670505,
+    "recall": 0.88005465,
+    "accuracy": 0.87750554,
+    "mcc": 0.8645154,
+    "fit_time": 0.33504912,
+    "score_time": 0.001842427
+}
+
+     project     |      task      | algorithm | deployed
+-----------------+----------------+-----------+----------
+ My project name | classification | xgboost   | t
+(1 row)
 
-Use your Model on novel data points not seen during training to infer a new data point.
+```
+
+{% endtab %}
+{% endtabs %}
+
+[pgml.train()](pgml.train/) reads data from the table, uses the `target` column as the label, automatically splits the dataset into train and test sets, and trains an XGBoost model. Our extension supports more than 50 machine learning algorithms, and you can train a model with any of them by changing the value of the `algorithm` argument.
+
+
+#### Real-time inference
 
-```sql
-SELECT pgml.predict(
-    project_name => 'Sales Forecast',
-    features => ARRAY[
-        last_week_sales,
-        week_of_year
-    ]
+Now that we have a model, we can use it to make predictions on live application data, in real time:
+
+{% tabs %}
+{% tab title="SQL" %}
+
+```postgresql
+SELECT
+    target,
+    pgml.predict(
+        'My project name',
+        image
 ) AS prediction
-FROM new_sales
-ORDER BY prediction DESC;
+FROM
+    pgml.digits
+LIMIT 1;
+```
+
+{% endtab %}
+{% tab title="Output" %}
+
+```
+ target | prediction
+--------+------------
+      0 |          0
+(1 row)
+```
+
+{% endtab %}
+{% endtabs %}
+
+#### Change model version
+
+[pgml.train()](pgml.train/) automatically deploys the best model into production, judged by the evaluation metric relevant to the type of model. If you prefer to deploy models using your own accuracy metrics, the [pgml.deploy()](pgml.deploy) function can manually change which model version is used by subsequent database queries:
+
+{% tabs %}
+{% tab title="SQL" %}
+
+```postgresql
+SELECT *
+FROM
+    pgml.deploy(
+        'My project name',
+        strategy => 'most_recent',
+        algorithm => 'xgboost'
+);
+```
+
+{% endtab %}
+{% tab title="Output" %}
+
+```
+     project     |  strategy   | algorithm
+-----------------+-------------+-----------
+ My project name | most_recent | xgboost
+(1 row)
 ```
 
-See[pgml.predict](../../api/sql-extension/pgml.predict/ "mention") for more information.
+{% endtab %}
+{% endtabs %}
diff --git a/pgml-dashboard/src/components/cms/index_link/mod.rs b/pgml-dashboard/src/components/cms/index_link/mod.rs
index 0572dfd47..376104f2f 100644
--- a/pgml-dashboard/src/components/cms/index_link/mod.rs
+++ b/pgml-dashboard/src/components/cms/index_link/mod.rs
@@ -73,7 +73,7 @@ impl IndexLink {
         self
     }
 
-    // Adds a suffix to this and all children ids.
+    // Adds a suffix to this and all children ids.
     // this prevents id collision with multiple naves on one screen
     // like d-none for mobile nav
     pub fn id_suffix(mut self, id_suffix: &str) -> IndexLink {
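The `id_suffix` comment in the last hunk describes a recursive pattern: one suffix is appended to a node's id and to every descendant's id, so two renderings of the same nav tree on one page (for example a desktop nav and a mobile nav hidden with `d-none`) never share an HTML `id`. The following is a minimal sketch of that pattern, not the real implementation; it uses a hypothetical, simplified `Link` struct because the actual `IndexLink` fields are not shown in this patch.

```rust
/// Hypothetical, simplified stand-in for a nav link node.
/// The real `IndexLink` in pgml-dashboard has other fields not shown here.
#[derive(Debug, Clone)]
struct Link {
    id: String,
    children: Vec<Link>,
}

impl Link {
    fn new(id: &str, children: Vec<Link>) -> Link {
        Link { id: id.to_string(), children }
    }

    /// Appends `suffix` to this link's id and, recursively, to the id of
    /// every descendant, so two copies of the same tree rendered on one
    /// page end up with distinct ids.
    fn id_suffix(mut self, suffix: &str) -> Link {
        self.id.push_str(suffix);
        // Move the children out, suffix each subtree, and put them back.
        self.children = std::mem::take(&mut self.children)
            .into_iter()
            .map(|child| child.id_suffix(suffix))
            .collect();
        self
    }
}

fn main() {
    // The same tree rendered twice on one page, e.g. desktop and mobile navs.
    let nav = Link::new(
        "api",
        vec![Link::new("sql-extension", vec![Link::new("pgml-embed", vec![])])],
    );
    let mobile = nav.clone().id_suffix("-mobile");

    println!("{} -> {}", nav.id, mobile.id); // api -> api-mobile
    println!("{}", mobile.children[0].children[0].id); // pgml-embed-mobile
}
```

Taking `self` by value and returning it mirrors the builder-style `id_suffix(mut self, ...) -> IndexLink` signature visible in the hunk; `std::mem::take` lets each child subtree be consumed and rebuilt without cloning.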