---
layout: post.en
title: Groonga 9.0.0 has been released
description: Groonga 9.0.0 has been released!
published: false
---

## Groonga 9.0.0 has been released

[Groonga 9.0.0](/docs/news.html#release-9-0-0) has been released!

This is a major version up! But it keeps backward compatibility.
You can upgrade to 9.0.0 without rebuilding your database.

How to install: [Install](/docs/install.html)

### Changes

Here are the important changes in this release:

* [Tokenizers](/docs/reference/tokenizers.html) Added a new tokenizer `TokenPattern`.

* [Tokenizers](/docs/reference/tokenizers.html) Added a new tokenizer `TokenTable`.

* [select](/docs/reference/commands/select.html) Supported similar search against an index column.

* [Normalizers](/docs/reference/normalizers.html) Added a new option `remove_blank` for `NormalizerNFKC100`.

* [groonga executable file](/docs/reference/executables/groonga.html) Improved the display of thread IDs in the log.

### [Tokenizers](/docs/reference/tokenizers.html) Added a new tokenizer `TokenPattern`.

You can extract tokens by regular expression as below.
This tokenizer extracts only the tokens that match the regular expression.

You can also specify multiple regular expression patterns.

```
tokenize 'TokenPattern("pattern", "\\\\d+円", "pattern", "りんご|みかん")' "私は100円のりんごと50円のみかんを129円で買いました。"
[
  [
    0,
    0.0,
    0.0
  ],
  [
    {
      "value": "100円",
      "position": 0,
      "force_prefix": false,
      "force_prefix_search": false
    },
    {
      "value": "りんご",
      "position": 1,
      "force_prefix": false,
      "force_prefix_search": false
    },
    {
      "value": "50円",
      "position": 2,
      "force_prefix": false,
      "force_prefix_search": false
    },
    {
      "value": "みかん",
      "position": 3,
      "force_prefix": false,
      "force_prefix_search": false
    },
    {
      "value": "129円",
      "position": 4,
      "force_prefix": false,
      "force_prefix_search": false
    }
  ]
]
```
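
The behavior above can be sketched in plain Python with the standard `re` module. This is only an illustrative approximation of what `TokenPattern` does (Groonga's actual tokenizer is implemented in C and integrates with its normalizers); the function name `token_pattern` is ours, not Groonga's.

```python
import re

def token_pattern(patterns, text):
    # Illustrative sketch of TokenPattern-style tokenization:
    # combine the patterns into one alternation, scan the text
    # left to right, and emit only the matching substrings.
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    return [m.group(0) for m in combined.finditer(text)]

tokens = token_pattern([r"\d+円", r"りんご|みかん"],
                       "私は100円のりんごと50円のみかんを129円で買いました。")
print(tokens)  # ['100円', 'りんご', '50円', 'みかん', '129円']
```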

### [Tokenizers](/docs/reference/tokenizers.html) Added a new tokenizer `TokenTable`.

You can extract tokens using the keys of an existing table as below.

```
table_create Keywords TABLE_PAT_KEY ShortText --normalizer NormalizerNFKC100
load --table Keywords
[
  {"_key": "100円"},
  {"_key": "りんご"},
  {"_key": "29円"}
]
tokenize 'TokenTable("table", "Keywords")' "私は100円のりんごを29円で買いました。"
[
  [
    0,
    0.0,
    0.0
  ],
  [
    {
      "value": "100円",
      "position": 0,
      "force_prefix": false,
      "force_prefix_search": false
    },
    {
      "value": "りんご",
      "position": 1,
      "force_prefix": false,
      "force_prefix_search": false
    },
    {
      "value": "29円",
      "position": 2,
      "force_prefix": false,
      "force_prefix_search": false
    }
  ]
]
```
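
Conceptually, this is a longest-match scan over the registered keys. A minimal Python sketch of the idea (our own simplification: real `TokenTable` matches against a patricia-trie table and applies the table's normalizer, which this toy loop does not):

```python
def token_table(keys, text):
    # Sketch of TokenTable-style tokenization: at each position,
    # emit the longest registered key that matches, else skip one
    # character and continue.
    keys = sorted(keys, key=len, reverse=True)  # prefer the longest match
    tokens, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                tokens.append(key)
                i += len(key)
                break
        else:
            i += 1  # no key matches here
    return tokens

print(token_table(["100円", "りんご", "29円"],
                  "私は100円のりんごを29円で買いました。"))
# ['100円', 'りんご', '29円']
```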

### [select](/docs/reference/commands/select.html) Supported similar search against an index column.

If you use a multi-column index, this feature lets you run a similar search against all of its source columns at once.

```
table_create Documents TABLE_HASH_KEY ShortText
column_create Documents content1 COLUMN_SCALAR Text
column_create Documents content2 COLUMN_SCALAR Text
table_create Terms TABLE_PAT_KEY|KEY_NORMALIZE ShortText --default_tokenizer TokenBigram
column_create Terms document_index COLUMN_INDEX|WITH_POSITION|WITH_SECTION Documents content1,content2
load --table Documents
[
  ["_key", "content1"],
  ["groonga の概要", "groonga は転置索引を用いた高速・高精度な全文検索エンジンであり、登録された文書をすぐに検索結果に反映できます。"],
  ["全文検索と即時更新", "一般的なデータベースにおいては、追加・削除などの操作がすぐに反映されます。一方、全文検索においては、転置索引が逐次更新の難しいデータ構造であることから、文書の追加・削除に対応しないエンジンが少なくありません。"],
  ["カラムストアと集計クエリ", "現代は、インターネットを情報源とすれば、いくらでも情報を収集できる時代です。"]
]
load --table Documents
[
  ["_key", "content2"],
  ["転置索引とトークナイザ", "転置索引は大規模な全文検索に用いられる伝統的なデータ構造です"],
  ["共有可能なストレージと参照ロックフリー", "CPU のマルチコア化が進んでいるため、同時に複数のクエリを実行したり、一つのクエリを複数のスレッドで実行したりすることの重要性はますます高まっています。"],
  ["位置情報(緯度・経度)検索", "GPS に代表される測位システムを搭載した高機能な携帯端末の普及などによって、位置情報を扱うサービスはますます便利になっています。"],
  ["groonga ライブラリ", "Groonga の基本機能は C ライブラリとして提供されているので、任意のアプリケーションに組み込んで利用することができます。"],
  ["groonga サーバ", "groonga にはサーバ機能があるため、レンタルサーバなどの新しいライブラリをインストールできない環境においても利用できます。"],
  ["groonga ストレージエンジン", "groonga は独自のカラムストアを持つ列指向のデータベースとしての側面を持っていますが、既存の RDBMS のストレージエンジンとして利用することもできます。"]
]
select Documents --filter 'Terms.document_index *S "MySQLで全文検索"' --output_columns '_key, _score, content1, content2'
[
  [
    0,
    1549608674.553312,
    0.0007221698760986328
  ],
  [
    [
      [
        3
      ],
      [
        [
          "_key",
          "ShortText"
        ],
        [
          "_score",
          "Int32"
        ],
        [
          "content1",
          "Text"
        ],
        [
          "content2",
          "Text"
        ]
      ],
      [
        "groonga の概要",
        419432,
        "groonga は転置索引を用いた高速・高精度な全文検索エンジンであり、登録された文書をすぐに検索結果に反映できます。",
        ""
      ],
      [
        "全文検索と即時更新",
        209716,
        "一般的なデータベースにおいては、追加・削除などの操作がすぐに反映されます。一方、全文検索においては、転置索引が逐次更新の難しいデータ構造であることから、文書の追加・削除に対応しないエンジンが少なくありません。",
        ""
      ],
      [
        "転置索引とトークナイザ",
        209716,
        "",
        "転置索引は大規模な全文検索に用いられる伝統的なデータ構造です"
      ]
    ]
  ]
]
```
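
To get an intuition for why documents match on either column, here is a deliberately tiny Python sketch: it tokenizes into character bigrams (roughly in the spirit of `TokenBigram`) and scores each document by bigram overlap with the query across all of its columns. This is our own toy model, not Groonga's actual `*S` scoring, which extracts important terms and searches them through the inverted index.

```python
def similar_search(documents, query):
    # Toy similar search over multiple source columns: score each
    # document by how many query bigrams appear in any of its columns.
    def bigrams(s):
        return {s[i:i + 2] for i in range(len(s) - 1)}

    q = bigrams(query)
    results = []
    for key, *columns in documents:
        score = sum(len(q & bigrams(c)) for c in columns)
        if score > 0:
            results.append((key, score))
    return sorted(results, key=lambda r: -r[1])

docs = [
    ("groonga の概要", "groonga は高速な全文検索エンジンです。", ""),
    ("転置索引とトークナイザ", "", "転置索引は全文検索に用いられるデータ構造です"),
]
print(similar_search(docs, "全文検索"))  # both documents score > 0
```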

### [Normalizers](/docs/reference/normalizers.html) Added a new option `remove_blank` for `NormalizerNFKC100`.

This option removes white space as below.

```
normalize 'NormalizerNFKC100("remove_blank", true)' "This is a pen."
[
  [
    0,
    1549528178.608151,
    0.0002171993255615234
  ],
  {
    "normalized": "thisisapen.",
    "types": [
    ],
    "checks": [
    ]
  }
]
```
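
For intuition, the transformation can be approximated in Python with the standard `unicodedata` module: NFKC normalization plus lowercasing (which `NormalizerNFKC100` also performs), then dropping white-space characters. The function name is ours, and this skips the other normalizations the real normalizer applies.

```python
import unicodedata

def nfkc_remove_blank(text):
    # Rough sketch of NormalizerNFKC100 with "remove_blank":
    # NFKC-normalize, lowercase, then drop white-space characters.
    normalized = unicodedata.normalize("NFKC", text).lower()
    return "".join(ch for ch in normalized if not ch.isspace())

print(nfkc_remove_blank("This is a pen."))  # thisisapen.
```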

### [groonga executable file](/docs/reference/executables/groonga.html) Improved the display of thread IDs in the log.

Because it was easy to confuse the thread ID with the process ID in the Windows version, the log format now makes clear which is which.

* (Before): `|2436|1032:`
  * `2436` is the process ID. `1032` is the thread ID.
* (After): `|2436|00001032:`
  * `2436` is the process ID. `00001032` is the thread ID.
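
In other words, the thread ID is now rendered at a fixed zero-padded width, so it can no longer be mistaken for the variable-width process ID next to it. A sketch of that formatting (assuming simple zero-padding to 8 characters; Groonga's exact format string may differ):

```python
def log_prefix(process_id, thread_id):
    # Fixed-width, zero-padded thread ID distinguishes it from the
    # variable-width process ID that precedes it.
    return f"|{process_id}|{thread_id:08}:"

print(log_prefix(2436, 1032))  # |2436|00001032:
```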

### Conclusion

See [Release 9.0.0 2019-02-09](/docs/news.html#release-9-0-0) for the detailed changes since 8.1.1.

Let's search by Groonga!