From c4837604c1451636874016c49e918a55f80469a6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fran=C3=A7ois=20Massot?=
Date: Tue, 11 Jan 2022 09:50:26 +0100
Subject: [PATCH 01/10] Update introduction.

---
 docs/introduction.md | 50 +++++++++++++++++++++++++++++++-------------
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/docs/introduction.md b/docs/introduction.md
index b7b8eeab49d..9e0a8579d85 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -1,29 +1,51 @@
 ---
-title: What is Quickwit?
+title: Introduction
 slug: /
 sidebar_position: 1
 ---
-Quickwit is a distributed search engine built from the ground up to offer cost-efficiency and high reliability. By mere mortals for mere mortals, Quickwit's architecture is as simple as possible[^1].
+Quickwit is a distributed search engine designed from the ground up to offer cost-efficiency and high reliability on large data sets.
-Quickwit is written in Rust and built on top of the mighty [tantivy](https://github.com/tantivy-search/tantivy) library. We designed it to index large datasets.
+Quickwit is particularly well-suited for dealing with large, immutable datasets and a low average QPS$^1$. Its benefits are most apparent in a multi-tenancy or multi-index setting.
-## Why Quickwit?
+Common use cases for Quickwit include:
-Quickwit is born from the idea that today's search engines are hard to manage and uneconomical when dealing with large datasets and a low QPS[^2] rate. Its benefits are most apparent in a multitenancy or a multi-index setting.
+- Searching through logs, from a small amount to TB of data.
+- Adding full-text search capabilities to [OLAP databases such as ClickHouse](link to blog post).
+- Searching through backups sitting on your cloud storage by adding Quickwit index files on your same storage.
-Quickwit allows true decoupled compute and storage.
-We designed it to search straight from object storage like Amazon S3 in a stateless manner.
+# Key features of Quickwit
-Imagine hosting an arbitrary amount of indexes on Amazon S3 for $25 per TB/month and querying them with the same pool of search servers and with a subsecond latency.
+Quickwit allows true decoupled compute and storage and we designed it to search straight from deep storage (cloud storage, HDFS, local file system, ...) in a stateless manner. Here is a non-exhaustive list of Quickwit’s key features:
-Not only is Quickwit more cost-efficient, but search clusters are also easier to operate. One can add or remove search instances in seconds. You can also effortlessly index a massive amount of historical data using your favorite batch technology. Last but not least, Multi-tenant search is now cheap and painless.
+- **Scalable distributed search:** Host an arbitrary amount of indices on S3 and query them with a pool of search servers and with sub-second latency.
+- **Stream indexing:** Plug in your distributed event streaming platform and ingest TB of data. As of today, Quickwit supports Kafka natively.
+- **Fault-tolerant architecture that won't lose data:** Quickwit ensures **exactly-once** semantics for indexing and your data can be safely stored on highly reliable deep storage like S3.
+- **Cloud-native, easy to operate:** Thanks to true decoupled compute and storage, search instances are stateless, you can add or remove search instances in seconds.
+- **Sub-second full-text search on cloud / distributed storage:** Quickwit Search re-designed indexing and index data structure to open it in less than 60ms on Amazon S3**.**
+- **Time-based sharding:** Quickwit shards data by time when enabled. And you can use a second dimension to shard data thanks to our [tags feature](../design/querying.md). Time-based queries only access splits (a data piece of the index) that match the time range of the query which leads to significant performance improvements.
+- **Painless multi-tenant search:** Create indexes for each tenant without hurting query performance. Or group tenants into one index and use tagging to prune irrelevant splits for your tenant query to improve significantly performance.
-- [Take a look at the feature set](overview/features.md)
-- [Get started](getting-started/quickstart.md)
+# When to use Quickwit
+Quickwit should be a good match if your use case has some of the following characteristics:
----
-[^1] ... But not one bit simpler.
+- Your documents are immutable.
+- You are targeting query latencies of 100ms to a few seconds.
+- You have a low average QPS$^1$, typically < 10 QPS on average over the month. This is the case for most search use cases as long as search is not public: enterprise search, log search, email search, security search, ...
+- Your data has a time component. Quickwit includes optimizations and design choices specifically related to time.
+- You want to load data from Kafka, local files (and soon directly from object storage like Amazon S3).
+- You want full-text search in a multi-tenant environment.
+
+Use cases where you would likely *not* want to use Quickwit include:
+
+- You need a low-latency search for e-commerce websites.
+- Your data are mutable.
+
+# Learn more
+
+- [Quickstart](./get-started/quickstart.md)
+- [Architecture](./design/architecture.md)
+- [0.2 Release](https://quickwit.io/blog/quickwit-0.2)
-[^2] QPS stands for Queries per second. It is a standard measure of the amount of search traffic.
+1: QPS stands for Queries per second. It is a standard measure of the amount of search traffic. Low average QPS is typically under 10. This is the case for most search use cases as long as search is not public: enterprise search, log search, email search, security search, ...
\ No newline at end of file

From 5bff581d67955558434af57227aead896d059101 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fran=C3=A7ois=20Massot?=
Date: Tue, 11 Jan 2022 11:02:30 +0100
Subject: [PATCH 02/10] Update docs/introduction.md

Co-authored-by: Adrien Guillo
---
 docs/introduction.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/introduction.md b/docs/introduction.md
index 9e0a8579d85..2ff31ca62fd 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -6,7 +6,7 @@ sidebar_position: 1
 Quickwit is a distributed search engine designed from the ground up to offer cost-efficiency and high reliability on large data sets.
-Quickwit is particularly well-suited for dealing with large, immutable datasets and a low average QPS$^1$. Its benefits are most apparent in a multi-tenancy or multi-index setting.
+Quickwit is particularly well-suited for dealing with large, immutable datasets and relatively low average QPS$^1$. Its benefits are most apparent in multi-tenancy or multi-index settings.
 Common use cases for Quickwit include:

From 52b0092ae9d4a9c36d95ad7f6fd899578597d587 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fran=C3=A7ois=20Massot?=
Date: Tue, 11 Jan 2022 11:02:37 +0100
Subject: [PATCH 03/10] Update docs/introduction.md

Co-authored-by: Adrien Guillo
---
 docs/introduction.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/introduction.md b/docs/introduction.md
index 2ff31ca62fd..5d3b92ec633 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -10,7 +10,7 @@ Quickwit is particularly well-suited for dealing with large, immutable datasets
 Common use cases for Quickwit include:
-- Searching through logs, from a small amount to TB of data.
+- Searching through logs, from small amounts of data to terabytes.
 - Adding full-text search capabilities to [OLAP databases such as ClickHouse](link to blog post).
 - Searching through backups sitting on your cloud storage by adding Quickwit index files on your same storage.

From 98ff429fe7bc8b69e0d4c0ac1a3ee88ba6a07d45 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fran=C3=A7ois=20Massot?=
Date: Tue, 11 Jan 2022 11:08:21 +0100
Subject: [PATCH 04/10] Update docs/introduction.md

Co-authored-by: Adrien Guillo
---
 docs/introduction.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/introduction.md b/docs/introduction.md
index 5d3b92ec633..4c0e98190bf 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -21,7 +21,7 @@ Quickwit allows true decoupled compute and storage and we designed it to search
 - **Scalable distributed search:** Host an arbitrary amount of indices on S3 and query them with a pool of search servers and with sub-second latency.
 - **Stream indexing:** Plug in your distributed event streaming platform and ingest TB of data. As of today, Quickwit supports Kafka natively.
 - **Fault-tolerant architecture that won't lose data:** Quickwit ensures **exactly-once** semantics for indexing and your data can be safely stored on highly reliable deep storage like S3.
-- **Cloud-native, easy to operate:** Thanks to true decoupled compute and storage, search instances are stateless, you can add or remove search instances in seconds.
+- **Cloud-native, easy to operate:** Thanks to true decoupled compute and storage, search instances are stateless, add or remove search nodes within seconds.
 - **Sub-second full-text search on cloud / distributed storage:** Quickwit Search re-designed indexing and index data structure to open it in less than 60ms on Amazon S3**.**
 - **Time-based sharding:** Quickwit shards data by time when enabled. And you can use a second dimension to shard data thanks to our [tags feature](../design/querying.md). Time-based queries only access splits (a data piece of the index) that match the time range of the query which leads to significant performance improvements.
 - **Painless multi-tenant search:** Create indexes for each tenant without hurting query performance. Or group tenants into one index and use tagging to prune irrelevant splits for your tenant query to improve significantly performance.

From 733264845a6d692ec14a47a44117af509675abb1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fran=C3=A7ois=20Massot?=
Date: Tue, 11 Jan 2022 11:09:18 +0100
Subject: [PATCH 05/10] Update docs/introduction.md

Co-authored-by: Adrien Guillo
---
 docs/introduction.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/introduction.md b/docs/introduction.md
index 4c0e98190bf..51332a75c1d 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -20,7 +20,7 @@ Quickwit allows true decoupled compute and storage and we designed it to search
 - **Scalable distributed search:** Host an arbitrary amount of indices on S3 and query them with a pool of search servers and with sub-second latency.
 - **Stream indexing:** Plug in your distributed event streaming platform and ingest TB of data. As of today, Quickwit supports Kafka natively.
-- **Fault-tolerant architecture that won't lose data:** Quickwit ensures **exactly-once** semantics for indexing and your data can be safely stored on highly reliable deep storage like S3.
+- **Fault-tolerant architecture that won't lose data:** Quickwit achieves **exactly-once** processing for indexing and safely stores your data on highly reliable object storage services such as Amazon S3.
 - **Cloud-native, easy to operate:** Thanks to true decoupled compute and storage, search instances are stateless, add or remove search nodes within seconds.
 - **Sub-second full-text search on cloud / distributed storage:** Quickwit Search re-designed indexing and index data structure to open it in less than 60ms on Amazon S3**.**
 - **Time-based sharding:** Quickwit shards data by time when enabled. And you can use a second dimension to shard data thanks to our [tags feature](../design/querying.md). Time-based queries only access splits (a data piece of the index) that match the time range of the query which leads to significant performance improvements.

From dc24d7fa2d84c931f6af4b36a76234fdbc9bc270 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fran=C3=A7ois=20Massot?=
Date: Tue, 11 Jan 2022 11:11:10 +0100
Subject: [PATCH 06/10] Update docs/introduction.md

Co-authored-by: Adrien Guillo
---
 docs/introduction.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/introduction.md b/docs/introduction.md
index 51332a75c1d..69571d6297b 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -16,7 +16,7 @@ Common use cases for Quickwit include:
 # Key features of Quickwit
-Quickwit allows true decoupled compute and storage and we designed it to search straight from deep storage (cloud storage, HDFS, local file system, ...) in a stateless manner. Here is a non-exhaustive list of Quickwit’s key features:
+Quickwit is designed to search straight from object storage allowing true decoupled compute and storage. Here is a non-exhaustive list of Quickwit’s key features:
 - **Scalable distributed search:** Host an arbitrary amount of indices on S3 and query them with a pool of search servers and with sub-second latency.
 - **Stream indexing:** Plug in your distributed event streaming platform and ingest TB of data. As of today, Quickwit supports Kafka natively.
From d1e0cd5d4ffd3a389377a22361bfbf4041313ba4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fran=C3=A7ois=20Massot?=
Date: Tue, 11 Jan 2022 11:11:18 +0100
Subject: [PATCH 07/10] Update docs/introduction.md

Co-authored-by: Adrien Guillo
---
 docs/introduction.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/introduction.md b/docs/introduction.md
index 69571d6297b..f28ec5735b8 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -19,7 +19,7 @@ Common use cases for Quickwit include:
 Quickwit is designed to search straight from object storage allowing true decoupled compute and storage. Here is a non-exhaustive list of Quickwit’s key features:
 - **Scalable distributed search:** Host an arbitrary amount of indices on S3 and query them with a pool of search servers and with sub-second latency.
-- **Stream indexing:** Plug in your distributed event streaming platform and ingest TB of data. As of today, Quickwit supports Kafka natively.
+- **Stream indexing:** Ingest TB of data from your favorite distributed event streaming service. As of today, Quickwit supports Apache Kafka natively. The next releases will bring support for more platforms.
 - **Fault-tolerant architecture that won't lose data:** Quickwit achieves **exactly-once** processing for indexing and safely stores your data on highly reliable object storage services such as Amazon S3.
 - **Cloud-native, easy to operate:** Thanks to true decoupled compute and storage, search instances are stateless, add or remove search nodes within seconds.
 - **Sub-second full-text search on cloud / distributed storage:** Quickwit Search re-designed indexing and index data structure to open it in less than 60ms on Amazon S3**.**

From 5ef96a85e8a772450df4393160ff5034b232bcf0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fran=C3=A7ois=20Massot?=
Date: Tue, 11 Jan 2022 11:12:52 +0100
Subject: [PATCH 08/10] Update docs/introduction.md

Co-authored-by: Adrien Guillo
---
 docs/introduction.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/introduction.md b/docs/introduction.md
index f28ec5735b8..893402aa6f0 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -18,7 +18,7 @@ Common use cases for Quickwit include:
 Quickwit is designed to search straight from object storage allowing true decoupled compute and storage. Here is a non-exhaustive list of Quickwit’s key features:
-- **Scalable distributed search:** Host an arbitrary amount of indices on S3 and query them with a pool of search servers and with sub-second latency.
+- **Scalable distributed search:** Host an arbitrary number of indexes on Amazon S3 and answer search queries in less than a second with a small pool of stateless search instances.
 - **Stream indexing:** Ingest TB of data from your favorite distributed event streaming service. As of today, Quickwit supports Apache Kafka natively. The next releases will bring support for more platforms.
 - **Fault-tolerant architecture that won't lose data:** Quickwit achieves **exactly-once** processing for indexing and safely stores your data on highly reliable object storage services such as Amazon S3.
 - **Cloud-native, easy to operate:** Thanks to true decoupled compute and storage, search instances are stateless, add or remove search nodes within seconds.

From c33d712aaa471128d47d9a68f59347da72d25229 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fran=C3=A7ois=20Massot?=
Date: Tue, 11 Jan 2022 11:14:51 +0100
Subject: [PATCH 09/10] Fix link.
---
 docs/introduction.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/introduction.md b/docs/introduction.md
index 893402aa6f0..3b4c4ea560f 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -11,7 +11,7 @@ Quickwit is particularly well-suited for dealing with large, immutable datasets
 Common use cases for Quickwit include:
 - Searching through logs, from small amounts of data to terabytes.
-- Adding full-text search capabilities to [OLAP databases such as ClickHouse](link to blog post).
+- Adding full-text search capabilities to [OLAP databases such as ClickHouse](./guides/add-full-text-search-to-your-olap-db.md).
 - Searching through backups sitting on your cloud storage by adding Quickwit index files on your same storage.
 # Key features of Quickwit

From 6691e2236b6553def4fe739380a2fb46144a771a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fran=C3=A7ois=20Massot?=
Date: Tue, 11 Jan 2022 11:21:41 +0100
Subject: [PATCH 10/10] Fix footnote

---
 docs/introduction.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/introduction.md b/docs/introduction.md
index 3b4c4ea560f..2deb180702e 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -6,7 +6,7 @@ sidebar_position: 1
 Quickwit is a distributed search engine designed from the ground up to offer cost-efficiency and high reliability on large data sets.
-Quickwit is particularly well-suited for dealing with large, immutable datasets and relatively low average QPS$^1$. Its benefits are most apparent in multi-tenancy or multi-index settings.
+Quickwit is particularly well-suited for dealing with large, immutable datasets and relatively low average QPS[1](#footnote1). Its benefits are most apparent in multi-tenancy or multi-index settings.
 Common use cases for Quickwit include:
@@ -23,7 +23,7 @@ Quickwit is designed to search straight from object storage allowing true decoup
 - **Fault-tolerant architecture that won't lose data:** Quickwit achieves **exactly-once** processing for indexing and safely stores your data on highly reliable object storage services such as Amazon S3.
 - **Cloud-native, easy to operate:** Thanks to true decoupled compute and storage, search instances are stateless, add or remove search nodes within seconds.
 - **Sub-second full-text search on cloud / distributed storage:** Quickwit Search re-designed indexing and index data structure to open it in less than 60ms on Amazon S3**.**
-- **Time-based sharding:** Quickwit shards data by time when enabled. And you can use a second dimension to shard data thanks to our [tags feature](../design/querying.md). Time-based queries only access splits (a data piece of the index) that match the time range of the query which leads to significant performance improvements.
+- **Time-based sharding:** Quickwit shards data by time when enabled. And you can use a second dimension to shard data thanks to our [tags feature](./design/querying.md). Time-based queries only access splits (a data piece of the index) that match the time range of the query which leads to significant performance improvements.
 - **Painless multi-tenant search:** Create indexes for each tenant without hurting query performance. Or group tenants into one index and use tagging to prune irrelevant splits for your tenant query to improve significantly performance.
 # When to use Quickwit
@@ -32,7 +32,7 @@ Quickwit should be a good match if your use case has some of the following chara
 - Your documents are immutable.
 - You are targeting query latencies of 100ms to a few seconds.
-- You have a low average QPS$^1$, typically < 10 QPS on average over the month. This is the case for most search use cases as long as search is not public: enterprise search, log search, email search, security search, ...
+- You have a low average QPS[1](#footnote1), typically < 10 QPS on average over the month. This is the case for most search use cases as long as search is not public: enterprise search, log search, email search, security search, ...
 - Your data has a time component. Quickwit includes optimizations and design choices specifically related to time.
 - You want to load data from Kafka, local files (and soon directly from object storage like Amazon S3).
 - You want full-text search in a multi-tenant environment.
@@ -48,4 +48,6 @@ Use cases where you would likely *not* want to use Quickwit include:
 - [Architecture](./design/architecture.md)
 - [0.2 Release](https://quickwit.io/blog/quickwit-0.2)
-1: QPS stands for Queries per second. It is a standard measure of the amount of search traffic. Low average QPS is typically under 10. This is the case for most search use cases as long as search is not public: enterprise search, log search, email search, security search, ...
\ No newline at end of file
+
+---
+1.: QPS stands for Queries per second. It is a standard measure of the amount of search traffic. Low average QPS is typically under 10. This is the case for most search use cases as long as search is not public: enterprise search, log search, email search, security search, ...