diff --git a/containers/bundled_querybook_config.yaml b/containers/bundled_querybook_config.yaml
index de4ae4831..487738c3d 100644
--- a/containers/bundled_querybook_config.yaml
+++ b/containers/bundled_querybook_config.yaml
@@ -22,7 +22,7 @@ ELASTICSEARCH_HOST: http://elasticsearch:9200
 # model_name: gpt-3.5-turbo-16k
 # temperature: 0
 # context_length: 16384
-# query_summary:
+# sql_summary:
 # model_args:
 # model_name: gpt-3.5-turbo-16k
 # temperature: 0
diff --git a/docs_website/docs/changelog/2023-09-20.md b/docs_website/docs/changelog/2023-09-20.md
new file mode 100644
index 000000000..14ca6a263
--- /dev/null
+++ b/docs_website/docs/changelog/2023-09-20.md
@@ -0,0 +1,89 @@
+---
+id: sept_2023_9_20_0
+title: Sept 2023 (version 3.28.0)
+sidebar_label: Sept 2023 (3.28.0)
+---
+
+Welcome to the latest release of Querybook 🎉.
+
+The following are the top new features added so far in 2023:
+
+- **AI assistant**: Supports query cell title generation, text-to-SQL, and query auto-fix, powered by an LLM.
+- **Vector table search**: Search for tables using natural language.
+- **Data cell/table comment**: Users can leave comments on data cells and tables.
+- **Query optimization suggestions**: Shows query optimization suggestions in a tooltip.
+- **User group**: Introduces user groups, which can be used as table/datadoc owners or editors.
+- **Data element**: Introduces a new type of metadata, `data element`, which provides semantic meaning and can be assigned to a table column.
+- **Stats logging**: Adds support for stats logging, such as the number of users and the number of API requests.
+
+## Feature highlights
+
+### AI Assistant
+
+The LLM-powered AI assistant can help with
+
+- Query cell title generation
+- Text to SQL
+- Query error auto fix
+
+Please check the [guide](../user_guide/ai_assistant.md) for more details.
+
+### Vector Table Search
+
+Previously, table search was keyword-based only.
Now we have introduced a [vector store plugin](../integrations/add_ai_assistant.md#vector-store-plugin) and added support for searching tables by natural language.
+![](/img/user_guide/table_vector_search.png)
+
+### Data Cell & Table Comment
+
+Users can leave comments on a data cell or table, and view comments from other people.
+
+![](/changelog/20230920/cell_comment.png)
+![](/changelog/20230920/table_comment.png)
+
+### Query Optimization Suggestions
+
+The query editor provides optimization suggestions in some cases. Here are some predefined ones for Presto/Trino:
+
+- distinct count -> approx_distinct
+- like 'a' or like 'b' -> regexp_like(column, 'a|b')
+- union -> union all
+
+You can create your own suggestions by following the example of [PrestoOptimizingValidator](https://github.com/pinterest/querybook/blob/c8949b21c854b367d7bf54f08fbe1a12ad4a47c2/querybook/server/lib/query_analysis/validation/validators/presto_optimizing_validator.py#L177)
+
+Check the [PR](https://github.com/pinterest/querybook/pull/1302) for more details.
+
+### User Group
+
+We introduced support for user groups. A user in Querybook can now be a single user or a user group. A table can be owned by a user group, and a datadoc can be shared with a user group (not yet implemented; PR in progress).
+![](https://user-images.githubusercontent.com/8308723/216733976-7c2c27cb-ec1b-4401-81e7-c5069798326e.png)
+
+### Data element
+
+A [data element](https://en.wikipedia.org/wiki/Data_element) is an atomic unit of data that has a precise meaning or precise semantics, such as country or age. We added data element as a new type of metadata, which can be assigned to a table column to give the column more meaningful information.
+
+Note: data elements can only be synced from the metastore.
+
+![](https://user-images.githubusercontent.com/8308723/224444625-067f1527-d936-409d-b99c-a25f4a676c21.png)
+
+### Stats logging
+
+Adds support for stats logging, such as the number of users and the number of API requests.
Please check the [plugin](../integrations/add_stats_logger.md) for more details.
+
+## Small Feature Improvements/Bug Fixes
+
+- Add username and password authentication for the trino client [#1315](https://github.com/pinterest/querybook/pull/1315)
+- Add two new plugins: [monkey patch plugin](../integrations/plugins.md#monkey-patch-plugin) and [api plugin](../integrations/plugins.md#api-plugin) [#1266](https://github.com/pinterest/querybook/pull/1266)
+- Fix the display of long table names in search modal [#1246](https://github.com/pinterest/querybook/pull/1246)
+- Allow data doc deletion from sidebar [#1241](https://github.com/pinterest/querybook/pull/1241)
+- Ensure meta_info is updated when an exception occurs [#1230](https://github.com/pinterest/querybook/pull/1230)
+- Add helm deployment guide [#1183](https://github.com/pinterest/querybook/pull/1183)
+- Add more metadata support [#1182](https://github.com/pinterest/querybook/pull/1182)
+- Enable mssql transpiling [#1178](https://github.com/pinterest/querybook/pull/1178)
+- Add ability to cancel dead queries [#1159](https://github.com/pinterest/querybook/pull/1159)
+- Fix json-bigint hasOwnProperty undefined issue [#1129](https://github.com/pinterest/querybook/pull/1129)
+- Add frontend context logging [#1115](https://github.com/pinterest/querybook/pull/1115)
+- Add drag and drop for templated variables [#1112](https://github.com/pinterest/querybook/pull/1112)
+
+Querybook Team
+Pinterest
+🚀
diff --git a/docs_website/docs/integrations/add_ai_assistant.md b/docs_website/docs/integrations/add_ai_assistant.md
index e069906fb..df45ff148 100644
--- a/docs_website/docs/integrations/add_ai_assistant.md
+++ b/docs_website/docs/integrations/add_ai_assistant.md
@@ -45,3 +45,13 @@ How to set up and host a vector store or use a cloud vector store solution is no
 4. Enable it in `querybook/config/querybook_public_config.yaml`
 
 With vector store plugin enabled, text-to-sql will also use it to find tables if tables are not provided by the user.
+
+### Initialize the Vector Index
+
+In Docker-based deployments, attach to the `web` or `worker` component and run
+
+ ```shell
+ python ./querybook/server/scripts/init_vector_store.py
+ ```
+
+It adds summaries of all tables, and of sample queries for those tables, to the vector store. To index only a subset of the tables, follow the example of `ingest_vector_index` to create your own script.
diff --git a/docs_website/sidebars.json b/docs_website/sidebars.json
index d0ca6badc..5c36e240a 100755
--- a/docs_website/sidebars.json
+++ b/docs_website/sidebars.json
@@ -59,6 +59,7 @@
 "Changelog": [
 "changelog/breaking_changes",
 "changelog/security_advisories",
+ "changelog/sept_2023_9_20_0",
 "changelog/dec_2022_3_15_0",
 "changelog/nov_2020_2_4_2",
 "changelog/may_2020_2_3_0",
diff --git a/docs_website/static/changelog/20230920/cell_comment.png b/docs_website/static/changelog/20230920/cell_comment.png
new file mode 100644
index 000000000..fc7d43e8e
Binary files /dev/null and b/docs_website/static/changelog/20230920/cell_comment.png differ
diff --git a/docs_website/static/changelog/20230920/table_comment.png b/docs_website/static/changelog/20230920/table_comment.png
new file mode 100644
index 000000000..981e8fd61
Binary files /dev/null and b/docs_website/static/changelog/20230920/table_comment.png differ
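
Note on the "Query Optimization Suggestions" changelog entry: the rewrite hints it lists (distinct count -> approx_distinct, union -> union all) can be illustrated with a small pattern-matching sketch. This is only an assumption-laden illustration, not the logic of Querybook's `PrestoOptimizingValidator`; the `RULES` table and the `suggest` function are hypothetical names, and regular expressions are a stand-in for real SQL parsing. The LIKE-to-`regexp_like` rule is left out to keep the sketch short.

```python
import re

# Hypothetical rule table: (compiled pattern, function building the suggestion).
# Illustrative only; this is not how Querybook implements its validator.
RULES = [
    # COUNT(DISTINCT x) -> approx_distinct(x)
    (
        re.compile(r"\bcount\s*\(\s*distinct\s+([^)]+?)\s*\)", re.IGNORECASE),
        lambda m: f"approx_distinct({m.group(1)})",
    ),
    # UNION (not already followed by ALL) -> UNION ALL
    (
        re.compile(r"\bunion\b(?!\s+all\b)", re.IGNORECASE),
        lambda m: "UNION ALL",
    ),
]


def suggest(query: str) -> list[tuple[str, str]]:
    """Return (matched text, suggested replacement) pairs found in a query."""
    suggestions = []
    for pattern, build in RULES:
        for match in pattern.finditer(query):
            suggestions.append((match.group(0), build(match)))
    return suggestions


if __name__ == "__main__":
    for found, hint in suggest("SELECT COUNT(DISTINCT user_id) FROM t"):
        print(f"{found} -> {hint}")
```

Both rewrites change query semantics (`approx_distinct` returns an approximate count, and `UNION ALL` keeps duplicate rows), which is presumably why they are surfaced as suggestions rather than applied automatically.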