Can not search for Japanese word #2159

Closed
khoi-thinh opened this issue Feb 12, 2016 · 11 comments

Comments

@khoi-thinh

Hi everyone,
I'm running Mattermost 1.4.0 on a server with CentOS 6.7 and Docker 1.7.1 installed.
The problem is that I am not able to search for Japanese words. Is there any way to fix it?

@it33
Contributor

it33 commented Feb 12, 2016

Hi @thinhduckhoi,

If you're using MySQL, please try changing the parser settings to better index for searching in Japanese. If you're using Postgres, please look for similar functionality.

We'd highly appreciate your feedback on what you find works (or doesn't work).

Alternatively, you can add * after the words you're searching for to get around the parsing issues.
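As a minimal sketch of the wildcard workaround above: the helper below appends `*` to each search term before it is pasted into the Mattermost search box. `add_wildcards` is a hypothetical name used only for illustration and is not part of Mattermost.

```shell
# Append the wildcard character to every search term passed as an argument.
add_wildcards() {
  out=""
  for term in "$@"; do
    # Join terms with a single space and add "*" after each one.
    out="${out:+$out }${term}*"
  done
  printf '%s\n' "$out"
}

add_wildcards 日本語 検索   # prints: 日本語* 検索*
```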

@it33 it33 added the Awaiting Submitter Action Blocked on the author label Feb 12, 2016
@khoi-thinh
Author

Thank you, I could find the Japanese string when adding * after it.

@lindy65 lindy65 removed the Awaiting Submitter Action Blocked on the author label Feb 15, 2016
@lindy65
Contributor

lindy65 commented Feb 15, 2016

Glad to hear your issue is sorted thinhduckhoi!

@lindy65 lindy65 closed this as completed Feb 15, 2016
@yukihane

yukihane commented Apr 6, 2016

FYI: How to enable Japanese sentence search on PostgreSQL
(Unfortunately, almost all references are written only in Japanese, and I don't have much English skill...)

My environment: CentOS7.2, GitLab Mattermost v2.1.0


Install necessary packages:

sudo yum install git gcc-c++

Install MeCab and the MeCab IPA dictionary:

mkdir ~/tmp
cd ~/tmp
git clone https://github.com/taku910/mecab.git
cd mecab/mecab/
./configure --prefix=/opt/gitlab/embedded
make
sudo make install
cd ~/tmp/mecab/mecab-ipadic/
./configure --prefix=/opt/gitlab/embedded --with-mecab-config=/opt/gitlab/embedded/bin/mecab-config --with-charset=utf8
make
sudo make install

Install textsearch_ja:

The pgfoundry.org site has been down for several days, so I used a forked version.

cd ~/tmp
git clone https://github.com/oknj/textsearch_ja.git
cd textsearch_ja
make USE_PGXS=1 PG_CONFIG=/opt/gitlab/embedded/bin/pg_config
sudo make USE_PGXS=1 PG_CONFIG=/opt/gitlab/embedded/bin/pg_config install

Enable pg_catalog.japanese:

sudo su - gitlab-psql
export PGHOST=/var/opt/gitlab/postgresql
export PGDATA=/var/opt/gitlab/postgresql/data
psql mattermost_production
> create extension if not exists textsearch_ja;
> \q

Edit parameter in data/postgresql.conf:

Modify
default_text_search_config = 'pg_catalog.english'
to
default_text_search_config = 'pg_catalog.japanese'

(* This is a global setting and may have side effects on GitLab.)

Enable edited parameter:

pg_ctl reload
exit

default_text_search_config takes effect when searching posted messages, because to_tsvector is called without an explicit configuration parameter (refs 1, 2).
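
To see the difference between the implicit and the explicit configuration, something like the following could be used. It is only a sketch: `tokenize_check_sql` is a hypothetical helper that prints SQL, and it assumes the `pg_catalog.japanese` configuration created by textsearch_ja is available. The sample text is inserted without escaping, so it should not contain single quotes.

```shell
# Print SQL that shows the active text search configuration and how a
# sample message is tokenized with and without an explicit configuration.
tokenize_check_sql() {
  cat <<SQL
SHOW default_text_search_config;
SELECT to_tsvector('$1');
SELECT to_tsvector('pg_catalog.japanese', '$1');
SQL
}

# Usage (not executed here):
#   tokenize_check_sql '日本語の検索' | psql mattermost_production
```

If the second and third queries return the same tokens, the default configuration is already the Japanese one for that session.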

For the current version (v2.1.0), I believe the steps above are all that is needed.
However, I think the search should properly be using an index (see also: #2622).
So, for future reference, the following describes how to re-create the index.


Install pg_bigm:

cd ~/tmp
curl -L -O https://osdn.jp/projects/pgbigm/downloads/63792/pg_bigm-1.1-20150910.tar.gz
tar zxvf pg_bigm-1.1-20150910.tar.gz
cd pg_bigm-1.1-20150910
make USE_PGXS=1 PG_CONFIG=/opt/gitlab/embedded/bin/pg_config
sudo make USE_PGXS=1 PG_CONFIG=/opt/gitlab/embedded/bin/pg_config install

Edit setting in data/postgresql.conf:

sudo su - gitlab-psql
export PGHOST=/var/opt/gitlab/postgresql
export PGDATA=/var/opt/gitlab/postgresql/data

Add in data/postgresql.conf:
shared_preload_libraries = 'pg_bigm'

Re-create index using pg_bigm:

psql mattermost_production
> create extension if not exists pg_bigm;
> drop index idx_posts_message_txt;
> create index idx_posts_message_txt on posts using gin (message gin_bigm_ops);
> \q
exit
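
A possible way to check by hand whether the re-created index is usable, sketched under assumptions: pg_bigm accelerates `LIKE` searches and provides a `likequery()` function that turns a keyword into a `LIKE` pattern. `bigm_explain_sql` is a hypothetical helper that only prints the statement. Whether Mattermost's own search queries would use this index is a separate question that this sketch does not answer. Note also that a change to shared_preload_libraries only takes effect after a PostgreSQL restart, not a reload.

```shell
# Print an EXPLAIN statement for a LIKE search on posts.message.
# The keyword is inserted without escaping, so avoid single quotes in it.
bigm_explain_sql() {
  printf "EXPLAIN SELECT count(*) FROM posts WHERE message LIKE likequery('%s');\n" "$1"
}

# Usage (not executed here):
#   bigm_explain_sql '検索' | psql mattermost_production
```

If the plan mentions idx_posts_message_txt, the planner chose the bigram index for that query.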

@it33
Contributor

it33 commented Apr 6, 2016

Thanks @yukihane!

Highly appreciated, we've included a link to your post from the documentation for the Japanese community.

Would you be interested in contributing to the correction and translation of our guidance for Japanese language speakers?

Here is a page that has used machine translation from English to Japanese: https://github.com/mattermost/docs/blob/master/source/install/i18n.rst

With your help, I think we could improve support for the community.

@hdbn

hdbn commented Apr 9, 2016

Just a note: it seems data/postgresql.conf is overwritten whenever gitlab-ctl reconfigure is run.
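
One possible way to keep the text search setting across reconfigure runs, sketched under assumptions: PostgreSQL can store default_text_search_config per database with `ALTER DATABASE ... SET`. That value lives in the system catalog rather than in postgresql.conf and applies only to new sessions on that database, so it would also limit the side effects on other databases. `per_db_ts_config_sql` is a hypothetical helper that only prints the statement, and this has not been tested against a GitLab installation. It does not cover shared_preload_libraries, which cannot be set per database.

```shell
# Print a statement that pins the text search configuration for one database.
# $1 = database name, $2 = text search configuration name
per_db_ts_config_sql() {
  printf "ALTER DATABASE %s SET default_text_search_config = '%s';\n" "$1" "$2"
}

# Usage (not executed here), run as the gitlab-psql user:
#   per_db_ts_config_sql mattermost_production pg_catalog.japanese | psql mattermost_production
```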

@dkastl

dkastl commented Nov 29, 2016

I find the proposed approach to enabling Japanese search too complicated and risky for a production system where PostgreSQL is also used by other services (i.e. GitLab). I would like to avoid changing the PostgreSQL configuration and compiling libraries.

I have learned about PGroonga (http://pgroonga.github.io/), which looks like a good solution. However, I'm not sure if this would be a practical solution here and which changes would be necessary. It would be great if PGroonga could be used if available and otherwise fall back to the default.

@morihaya

Hello @dkastl

I tried PGroonga, but it didn't work,
because PGroonga does not support varchar columns larger than 4000 bytes.

I needed to create an index on the message column of the posts table, but its type is character varying(4000).
According to the PGroonga documentation, a UTF-8 encoded varchar is allocated 4 bytes per character,
so varying(4000) is 16000 bytes.

If you convert the message column's type from varchar to text, it may be possible to use PGroonga.
(Although there may be other side effects...)
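
The arithmetic and the suggested column change can be sketched as follows. This is only an illustration: `max_utf8_bytes` and `widen_message_sql` are hypothetical helpers, the 4-bytes-per-character figure is the one quoted above, and changing the column type on a live Mattermost database is untested here and may conflict with Mattermost's own schema handling.

```shell
# Worst-case byte size of a varchar(N) column at 4 bytes per character.
max_utf8_bytes() {
  echo $(( $1 * 4 ))
}

max_utf8_bytes 4000   # prints: 16000

# Print the statement that would change posts.message from varchar to text.
widen_message_sql() {
  printf 'ALTER TABLE posts ALTER COLUMN message TYPE text;\n'
}

# Usage (not executed here):
#   widen_message_sql | psql mattermost_production
```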

@dkastl

dkastl commented Apr 17, 2017

@morihaya ,
thank you for trying! As you say, maybe changing the column type from varchar to text might be a solution. As far as I know there is no particular reason in PostgreSQL to always use text column type over varchar.

@s-ponta

s-ponta commented Jan 12, 2020

I tried to use the method proposed by @yukihane. Unfortunately, however, I failed at installing textsearch_ja with the following error.

$ make USE_PGXS=1 PG_CONFIG=/opt/gitlab/embedded/bin/pg_config
Makefile:20: /opt/gitlab/embedded/postgresql/10/lib/pgxs/src/makefiles/pgxs.mk: そのようなファイルやディレクトリはありません
make: *** ターゲット `/opt/gitlab/embedded/postgresql/10/lib/pgxs/src/makefiles/pgxs.mk' を make するルールがありません.  中止.

The first Japanese sentence means "No such file or directory", and the second means "There is no rule to make the target '/opt/gitlab/~~~~'. Cancel."

How should I fix it?
I guess the PostgreSQL currently bundled with GitLab doesn't include that file, but honestly, this is the first time I have worked with a database directly, so I'm not confident. Also, the solution looks old (almost 4 years ago). Is there a simpler solution?

Currently, I use GitLab 12.6.3-ce with the bundled Mattermost 5.17.1 on CentOS 7.7.1908.
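
A quick way to confirm that guess, as a sketch: `pg_config --pgxs` prints the path of the PGXS makefile that extension builds include, so checking whether that file exists shows whether the bundled PostgreSQL ships the build files at all. `check_pgxs` is a hypothetical helper name.

```shell
# Report whether the pgxs.mk file referenced by a given pg_config exists.
# $1 = path to the pg_config binary
check_pgxs() {
  pgxs_path="$("$1" --pgxs)" || return 2
  if [ -f "$pgxs_path" ]; then
    echo "found: $pgxs_path"
  else
    echo "missing: $pgxs_path"
    return 1
  fi
}

# Usage (not executed here):
#   check_pgxs /opt/gitlab/embedded/bin/pg_config
```

If the file is reported missing, extensions such as textsearch_ja cannot be built against that PostgreSQL installation with `USE_PGXS=1`.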

@s-ponta

s-ponta commented Jan 24, 2021

As I described in my last post, the method proposed by @yukihane cannot be used with the recent version of Mattermost bundled with GitLab.

However, Mattermost recently introduced the Bleve search engine as an experimental feature, and after enabling it I could confirm that text search in Japanese worked.
I think this is very easy and much better than modifying the database, so I highly recommend Bleve to anyone who runs into Japanese text search problems.
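
For reference, a minimal sketch of what enabling Bleve might look like through environment variables. It assumes Mattermost's usual `MM_<SECTION>_<SETTING>` naming for the `BleveSettings` section of config.json; the index directory below is a hypothetical path and must be writable by the Mattermost process. The same settings can also be changed in the System Console, and posts written before Bleve was enabled may need a bulk index run before they appear in results.

```shell
# Assumed environment overrides for the BleveSettings section.
# The directory is a hypothetical example path; adjust it to your installation.
export MM_BLEVESETTINGS_INDEXDIR="/var/opt/gitlab/mattermost/bleveindexes"
export MM_BLEVESETTINGS_ENABLEINDEXING="true"
export MM_BLEVESETTINGS_ENABLESEARCHING="true"
```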
