Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
1 changed file
with
140 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,140 @@ | ||
| --- | ||
| layout: post.en | ||
| title: Groonga 3.0.5 has been released | ||
| description: Groonga 3.0.5 has been released! | ||
| published: false | ||
| --- | ||
|
|
||
|
|
||
| h2. Groonga 3.0.5 has been released | ||
|
|
||
| "Groonga 3.0.5":/docs/news.html#release-3-0-5 has been released! | ||
|
|
||
| How to install: "Install":/docs/install.html | ||
|
|
||
| There are two topics for this release. | ||
|
|
||
| * Supported single quoted string literal in output_columns | ||
| * Supported "html_untag":/docs/reference/functions/html_untag.html function experimentally | ||
|
|
||
| h3. Supported single quoted string literal in output_columns | ||
|
|
||
| In this release, we began to support single quoted string literal in output_columns. | ||
|
|
||
| Since groonga 3.0.2 release, complex string concatination in @--output_columns@ had been supported. | ||
| This feature support following expression: | ||
|
|
||
| <pre> | ||
| '"<" + title + ">"' | ||
| </pre> | ||
|
|
||
| Note that 'title' means 'title' column in this case. Above query returns @"<(CONTENT OF TITLE)>"@. | ||
|
|
||
| But there is the fact that single quote isn't supported in string literal at that time. | ||
|
|
||
| Here is the sample schema: | ||
|
|
||
| <pre> | ||
| table_create Entries TABLE_NO_KEY | ||
| column_create Entries title COLUMN_SCALAR ShortText | ||
|
|
||
| load --table Entries | ||
| [ | ||
| {"title": "Single quote and double quote"} | ||
| ] | ||
| </pre> | ||
|
|
||
| In the previous release, there are some way to get @"<(CONTENT OF TITLE)>"@. | ||
|
|
||
| * @select Entries --output_columns '_id, "<" + title + ">"' --command_version 2@ | ||
| * @select Entries --output_columns "_id, \"<\" + title + \">\"" --command_version 2@ | ||
|
|
||
| Here is the revised query using single quote in string literal for groonga 3.0.5: | ||
|
|
||
| <pre> | ||
| select Entries --output_columns "_id, '<' + title + '>'" --command_version 2 | ||
| </pre> | ||
|
|
||
| As single quote has been supported, groonga 3.0.5 returns intended result sets even though the query which groonga 3.0.4 returns empty result. | ||
|
|
||
| Here is the sample queries which groonga 3.0.4 or earlier version returns empty set: | ||
|
|
||
| <pre> | ||
| # <"(contents of title column)"> | ||
| select Entries --output_columns "_id, '<\"' + title + '\">'" --command_version 2 | ||
| #=> [1,"<\"Single quote and double quote\">"] | ||
|
|
||
| # <'(contents of title column)'> | ||
| select Entries --output_columns "_id, '<\\'' + title + '\\'>'" --command_version 2 | ||
| #=> [1,"<'Single quote and double quote'>"] | ||
| </pre> | ||
|
|
||
| h3. Supported html_untag function experimentally | ||
|
|
||
| In this release, we began to support html_untag function which strips HTML tags experimentally. | ||
|
|
||
| For example, consider the case that scraped web site HTML is stored into groonga database. | ||
|
|
||
| Here is the sample schema which stores scraped HTML: | ||
|
|
||
| <pre> | ||
| table_create WebClips TABLE_NO_KEY | ||
| column_create WebClips url COLUMN_SCALAR ShortText | ||
| column_create WebClips content COLUMN_SCALAR ShortText | ||
| column_create WebClips tag COLUMN_VECTOR ShortText | ||
| </pre> | ||
|
|
||
| Here is the sample data: | ||
|
|
||
| <pre> | ||
| load --table WebClips | ||
| [ | ||
| {"url": "http://groonga.org", "tag": ["groonga"], "content": "groonga is <span class='emphasize'>fast</span>"}, | ||
| {"url": "http://mroonga.org", "tag": ["mroonga"], "content": "mroonga is <span class=\"emphasize\">fast</span>"}, | ||
| ] | ||
| </pre> | ||
|
|
||
| Specify column name as an argument of html_untag function. | ||
| According to above sample schema, if you want to get plain text of content column, use html_untag(content). | ||
|
|
||
| Here is the sample query which returns plain text of content column: | ||
|
|
||
| <pre> | ||
| select WebClips --output_columns "html_untag(content)" --command_version 2 | ||
| </pre> | ||
|
|
||
| Here is the execution result of above query: | ||
|
|
||
| <pre> | ||
| [[2], | ||
| [ | ||
| ["html_untag", "null"] | ||
| ], | ||
| ["groonga is fast"], | ||
| ["mroonga is fast"] | ||
| ] | ||
| </pre> | ||
|
|
||
| You can see that span tag with a class attribute is eliminated. | ||
|
|
||
| Note that you need to specify with @--command_version 2@ if you use html_untag function. | ||
| Without this, you can't get intended search results. | ||
|
|
||
| There is a reason why html_untag is supported. | ||
| It is a demand that we want to search scraped HTML contents which is stored into groonga database, then extract highlighted search results which does not contain extra noisy HTML tags. | ||
|
|
||
| It is assumed to use with the snippet_html function (it isn't supported yet). | ||
|
|
||
| Here is the concrete processing flow: | ||
|
|
||
| <pre> | ||
| original HTML -(html_untag)-> plain text -(snippet_html)-> highlighted HTML | ||
| </pre> | ||
|
|
||
| It isn't supported combination usage of html_untag and snippet_html yet, but it will be supported in the future release. | ||
|
|
||
| h3. Conclusion | ||
|
|
||
| See "Release 3.0.5 2013/06/29":/docs/news.html#release-3-0-5 about detailed changes since 3.0.4. | ||
|
|
||
| Let's search by groonga! |