blog en: add 3.0.5 release entry
kenhys committed Jun 28, 2013
1 parent 2967cff commit 685f5a2
Showing 1 changed file with 140 additions and 0 deletions.
140 changes: 140 additions & 0 deletions en/_posts/2013-06-29-release.textile
@@ -0,0 +1,140 @@
---
layout: post.en
title: Groonga 3.0.5 has been released
description: Groonga 3.0.5 has been released!
published: false
---


h2. Groonga 3.0.5 has been released

"Groonga 3.0.5":/docs/news.html#release-3-0-5 has been released!

How to install: "Install":/docs/install.html

There are two topics for this release.

* Supported single quoted string literal in output_columns
* Supported "html_untag":/docs/reference/functions/html_untag.html function experimentally

h3. Supported single quoted string literal in output_columns

This release adds support for single-quoted string literals in @--output_columns@.

Complex string concatenation in @--output_columns@ has been supported since the groonga 3.0.2 release.
That feature supports expressions such as the following:

<pre>
'"<" + title + ">"'
</pre>

Note that @title@ refers to the @title@ column in this case. The expression above returns @"<(CONTENT OF TITLE)>"@.

However, single quotes were not supported in string literals at that time.

Here is the sample schema and data:

<pre>
table_create Entries TABLE_NO_KEY
column_create Entries title COLUMN_SCALAR ShortText

load --table Entries
[
{"title": "Single quote and double quote"}
]
</pre>

In previous releases, there were two ways to get @"<(CONTENT OF TITLE)>"@:

* @select Entries --output_columns '_id, "<" + title + ">"' --command_version 2@
* @select Entries --output_columns "_id, \"<\" + title + \">\"" --command_version 2@

Here is the revised query for groonga 3.0.5, which uses single quotes in the string literals:

<pre>
select Entries --output_columns "_id, '<' + title + '>'" --command_version 2
</pre>

Because single quotes are now supported, groonga 3.0.5 returns the intended results even for queries for which groonga 3.0.4 returned an empty result.

Here are sample queries for which groonga 3.0.4 or earlier returns an empty result set:

<pre>
# <"(contents of title column)">
select Entries --output_columns "_id, '<\"' + title + '\">'" --command_version 2
#=> [1,"<\"Single quote and double quote\">"]

# <'(contents of title column)'>
select Entries --output_columns "_id, '<\\'' + title + '\\'>'" --command_version 2
#=> [1,"<'Single quote and double quote'>"]
</pre>
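
The backslashes in these queries pass through two layers: the double-quoted command argument is unescaped first, and the resulting expression is then parsed as string literals. The following Python sketch models only the first layer. It assumes a simplified rule (a backslash followed by any character yields that character), and @unescape_argument@ is a hypothetical helper written for illustration, not part of groonga.

```python
def unescape_argument(body: str) -> str:
    """Simplified, assumed model of unescaping a double-quoted argument.

    `body` is the text between the surrounding double quotes.
    A backslash followed by any character yields that character.
    """
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # keep the escaped character only
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


# The two --output_columns arguments from the queries above, as typed
# (without the surrounding double quotes).
double_quote_case = r"""_id, '<\"' + title + '\">'"""
single_quote_case = r"""_id, '<\\'' + title + '\\'>'"""

print(unescape_argument(double_quote_case))
print(unescape_argument(single_quote_case))
```

Under this model, the first argument becomes @_id, '<"' + title + '">'@ and the second becomes @_id, '<\'' + title + '\'>'@. In the second case, the remaining backslash escapes the single quote inside the single-quoted string literal.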

h3. Supported html_untag function experimentally

This release adds experimental support for the html_untag function, which strips HTML tags.

For example, consider the case where HTML scraped from web sites is stored in a groonga database.

Here is a sample schema for storing scraped HTML:

<pre>
table_create WebClips TABLE_NO_KEY
column_create WebClips url COLUMN_SCALAR ShortText
column_create WebClips content COLUMN_SCALAR ShortText
column_create WebClips tag COLUMN_VECTOR ShortText
</pre>

Here is the sample data:

<pre>
load --table WebClips
[
{"url": "http://groonga.org", "tag": ["groonga"], "content": "groonga is <span class='emphasize'>fast</span>"},
{"url": "http://mroonga.org", "tag": ["mroonga"], "content": "mroonga is <span class=\"emphasize\">fast</span>"}
]
</pre>

Specify a column name as the argument of the html_untag function.
With the sample schema above, use html_untag(content) to get the plain text of the content column.

Here is a sample query that returns the plain text of the content column:

<pre>
select WebClips --output_columns "html_untag(content)" --command_version 2
</pre>

Here is the result of the query above:

<pre>
[[2],
[
["html_untag", "null"]
],
["groonga is fast"],
["mroonga is fast"]
]
</pre>

You can see that the span tags, including their class attributes, are removed.

Note that you need to specify @--command_version 2@ when you use the html_untag function.
Without it, you can't get the intended results.

html_untag was added to meet a specific demand: searching scraped HTML content stored in a groonga database, and then extracting highlighted search results that don't contain noisy HTML tags.

It is intended to be used together with the snippet_html function (which isn't supported yet).

Here is the intended processing flow:

<pre>
original HTML -(html_untag)-> plain text -(snippet_html)-> highlighted HTML
</pre>
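
To make the flow concrete, here is a minimal Python sketch of the two stages. It is not groonga's implementation: @untag@ and @highlight@ are hypothetical stand-ins for html_untag and snippet_html, the regular expression is a rough approximation of tag stripping, and the @keyword@ class name is an assumption made for illustration.

```python
import html
import re


def untag(markup: str) -> str:
    """Rough stand-in for html_untag: drop anything that looks like a tag."""
    return re.sub(r"<[^>]*>", "", markup)


def highlight(text: str, keyword: str) -> str:
    """Hypothetical stand-in for snippet_html.

    Escapes the plain text, then wraps each keyword occurrence in a span.
    """
    escaped = html.escape(text)
    pattern = re.escape(html.escape(keyword))
    return re.sub(
        pattern,
        lambda m: f'<span class="keyword">{m.group(0)}</span>',
        escaped,
    )


original = "groonga is <span class='emphasize'>fast</span>"
plain = untag(original)               # original HTML -> plain text
highlighted = highlight(plain, "fast")  # plain text -> highlighted HTML

print(plain)
print(highlighted)
```

Stripping the original markup first means the highlighted output contains only the tags added by the highlighting step, which is the point of the flow above.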

Combining html_untag and snippet_html isn't supported yet, but it will be supported in a future release.

h3. Conclusion

See "Release 3.0.5 2013/06/29":/docs/news.html#release-3-0-5 for detailed changes since 3.0.4.

Let's search by groonga!
