Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
1 changed file
with
107 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| --- | ||
| layout: post.en | ||
| title: PGroonga 1.0.0 has been released | ||
| description: PGroonga 1.0.0 has been released! | ||
| --- | ||
|
|
||
| ## PGroonga 1.0.0 has been released | ||
|
|
||
| PGroonga (píːzí:lúnɡά) 1.0.0 has been released! It's the first major release! | ||
|
|
||
| ### About PGroonga | ||
|
|
||
| PGroonga is a PostgreSQL extension that makes PostgreSQL fast full text search platform for all languages! | ||
|
|
||
| There are some PostgreSQL extensions that improves full text search feature of PostgreSQL such as [pg_trgm](http://www.postgresql.org/docs/current/static/pgtrgm.html) and [pg_bigm](http://pgbigm.osdn.jp/index_en.html). | ||
|
|
||
| pg\_trgm doesn't support languages that use non-alphanumerics characters such as Japanese and Chinese. | ||
|
|
||
| pg\_bigm supports languages that use non-alphanumerics characters but it's slow. | ||
|
|
||
| PGroonga supports all languages, provides rich full text search related features and is very fast. Because PGroonga uses [Groonga](http://groonga.org/) that is a full-fledged full text search engine as backend. | ||
|
|
||
| For example, PGroonga is a few times faster than pg\_bigm. In some cases, PGroonga is 10 times over faster than pg\_bigm. | ||
|
|
||
| Here are benchmark results between PGroonga and pg\_bigm. They use Japanese Wikipedia data. | ||
|
|
||
| Here is a benchmark result for creating an index: | ||
|
|
||
| Extension | Index creation time | ||
| -----------|-------------------- | ||
| PGroonga | 25m 37s | ||
| pg_bigm | 5h 56m 15s | ||
|
|
||
| In this case, PGroonga is about 14 times faster than pg\_bigm. | ||
|
|
||
| Here is a benchmark result for full text search: | ||
|
|
||
| Search keywords | N hits | PGroonga | pg\_bigm | ||
| ---------------------------------------|----------|------------|--------- | ||
| "PostgreSQL" or "MySQL" | 368 | *0.030s* | 0.107s | ||
| データベース (database in Japanese) | 17172 | *0.121s* | 1.224s | ||
| テレビアニメ (TV animation in Japanese) | 22885 | *0.179s* | 2.472s | ||
| 日本 (Japan in Japanese) | 625792 | 0.646s | *0.556s* | ||
|
|
||
| In "日本" (Japan in Japanese) case, pg\_bigm is a bit faster(\*) than PGroonga. But PGroonga is 3 times to 14 times faster than pg\_bigm in other cases. The result shows that PGroonga can perform stable high performance fast full text search against all keywords. | ||
|
|
||
| (\*) pg\_bigm can perform faster full text search against keywords that have 2 or less characters rather than keywords that have 3 or more characters. | ||
|
|
||
| PGroonga provides the following features that aren't provided by other extensions: | ||
|
|
||
| * Normalize feature | ||
| * Custom tokenizer feature | ||
| * Snippet feature | ||
|
|
||
| Normalize feature is a feature that converts different notation texts to unified notation text. | ||
|
|
||
| For simple example, both "ABC" and "abc" are converted to "abc". | ||
|
|
||
| For more complex example, both "ポスグレ" (HALFWIDTH KATAKANA) and "ポスグレ" (FULLWIDTH KATAKANA) are converted to "ポスグレ" (FULLWIDTH KATAKANA). ("ポスグレ" is an abbreviation of PostgreSQL in Japanese.) | ||
|
|
||
| This normalization is based on [Unicode NFKC](http://unicode.org/reports/tr15/). | ||
|
|
||
| Custom tokenizer feature is a feature that customizes search keyword extraction process (tokenization). If you can custom tokenization, you can control trade-off between search precision and search performance. | ||
|
|
||
| For example, if you use "tokenizer that is based of morphological analyzer", you can get better search precision and search performance but may not find some texts. | ||
|
|
||
| FYI: There is no other extension that supports morphological analyzer based tokenizer. PGroonga is the only extension that supports it. | ||
|
|
||
| Snippet feature is a feature that shows texts around keyword. It's used by Web search engine. Google also uses it. You can find it under page title in hit page list. PGroonga provides a function that implements it. | ||
|
|
||
| There are more features: | ||
|
|
||
| * Query language that uses Web search engine like syntax | ||
| * JSON search | ||
| * You can use each value for condition. You can also perform full text search against all texts in JSON. No other extension such as [JsQuery](https://github.com/postgrespro/jsquery) doesn't provide full text search feature against JSON. | ||
|
|
||
| Here are features that will be implemented in the feature. They are already implemented in Groonga. | ||
|
|
||
| * Query expansion feature | ||
| * Weight feature | ||
| * Stemming feature | ||
|
|
||
| ### Usage | ||
|
|
||
| You can use PGroonga without full text search knowledge. You just create an index and puts a condition into `WHERE`: | ||
|
|
||
| ```sql | ||
| CREATE INDEX index_name ON table USING pgroonga (column); | ||
|
|
||
| SELECT * FROM table WHERE column @@ 'PostgreSQL'; | ||
| ``` | ||
|
|
||
| You can also use `LIKE` to use PGroonga. PGroonga provides a feature that performs `LIKE` with index. `LIKE` with PGroonga index is faster than `LIKE` without index. It means that you can improve performance without changing your application that uses the following SQL: | ||
|
|
||
| ```sql | ||
| SELECT * FROM table WHERE column LIKE '%PostgreSQL%'; | ||
| ``` | ||
|
|
||
| Are you interested in PGroonga? Please [install](http://pgroonga.github.io/install/) and try [tutorial](http://pgroonga.github.io/tutorial/). You can know all PGroonga features. | ||
|
|
||
| You can install PGroonga easily. Because PGroonga provides packages for major platforms. There are binaries for Windows. | ||
|
|
||
| ## Conclusion | ||
|
|
||
| New PGroonga version has been released. PGroonga is a PostgreSQL extension that makes PostgreSQL fast full text search platform for all languages. | ||
|
|
||
| It's the first major release. If you want to use fast full text search for all langauges on PostgreSQL, try PGroonga! |