[DB] Set up Elasticsearch + IK Chinese word segmentation + Traditional Chinese config with Docker #22

Open
jimliu7434 opened this issue Aug 22, 2019 · 0 comments

Set up Elasticsearch + IK Chinese word segmentation + Traditional Chinese config with Docker

Steps

  1. Pull Docker Image & Run Container
    Use the prebuilt Docker image docker-elasticsearch-analysis-ik.
    Here I name the container es_word.

    docker run --rm --name es_word -p 9200:9200 -p 9300:9300 -v "$(pwd)/data:/usr/share/elasticsearch/data" peterzhang/elasticsearch-analysis-ik
  2. Add Traditional-Chinese Config
    Download config/ik from https://github.com/sunghau/elasticsearch-analysis-ik-config-traditional-chinese
    Copy the downloaded config directory into the container under /usr/share/elasticsearch/config.

    docker cp elasticsearch-analysis-ik-config-traditional-chinese-master/config/ik es_word:/usr/share/elasticsearch/config
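  3. Switch the config files
    Elasticsearch currently reads the IK settings from the default Simplified Chinese location:
    /usr/share/elasticsearch/config/analysis-ik/IKAnalyzer.cfg.xml
    To keep things simple, we just swap the locations of the Traditional and Simplified config directories.

    # Open a bash shell inside the container
    docker exec -it es_word bash
    # Change to the config folder
    cd /usr/share/elasticsearch/config
    # Swap the directories
    mv analysis-ik analysis-ik2
    mv ik analysis-ik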
  4. Restart the Docker container

    docker restart es_word
  5. Done! Time to test.
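
Steps 2 to 4 can also be run without opening an interactive shell in the container. The following is only a sketch under assumptions: the function name install_traditional_ik and the CONTAINER / CONFIG_SRC variables are made up for illustration, and the paths are the ones used in the steps above.

```shell
# Sketch: apply steps 2-4 non-interactively.
# CONTAINER and CONFIG_SRC are illustrative defaults; adjust them to your setup.
CONTAINER="${CONTAINER:-es_word}"
CONFIG_SRC="${CONFIG_SRC:-elasticsearch-analysis-ik-config-traditional-chinese-master/config/ik}"
ES_CONFIG=/usr/share/elasticsearch/config

install_traditional_ik() {
    # Step 2: copy the Traditional Chinese config directory into the container.
    docker cp "$CONFIG_SRC" "$CONTAINER:$ES_CONFIG" || return 1
    # Step 3: swap the directories so the default path holds the new config.
    docker exec "$CONTAINER" bash -c \
        "cd $ES_CONFIG && mv analysis-ik analysis-ik2 && mv ik analysis-ik" || return 1
    # Step 4: restart the container so Elasticsearch rereads the config.
    docker restart "$CONTAINER"
}
```

Call install_traditional_ik once after the container is up. The swap is not idempotent: running it a second time would move the directories again, so check the container's config folder first if you are unsure whether it has already been applied.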

Test the analyzer directly

IK currently provides two analyzers: ik_max_word and ik_smart.

curl -XPOST 'http://localhost:9200/_analyze?pretty' -H 'Content-Type:application/json' -d'
{
    "analyzer" : "ik_smart",
    "text": "美國2年期公債殖利率與10年期殖利率利差上周出現倒掛,在美國債市浮現經濟衰退凶兆後,被視為判斷美國經濟是否走向衰退的另一指標「露營車」出貨量也明顯下滑,加深了市場的擔憂。"
}'
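
To compare the two analyzers on the same text, a small wrapper around the request above can help. This is a sketch, not part of the IK plugin: ik_analyze and ES_URL are hypothetical names, and the text is pasted into the JSON body as-is, so it only works for input without double quotes or backslashes.

```shell
# Sketch: run a text through a chosen analyzer via the _analyze endpoint.
# ES_URL and ik_analyze are illustrative names.
ES_URL="${ES_URL:-http://localhost:9200}"

ik_analyze() {
    # $1 = analyzer name (ik_smart or ik_max_word), $2 = text to tokenize.
    # The text is embedded verbatim: no double quotes or backslashes allowed.
    curl -s -XPOST "$ES_URL/_analyze?pretty" \
        -H 'Content-Type: application/json' \
        -d "{\"analyzer\": \"$1\", \"text\": \"$2\"}"
}
```

For example, run ik_analyze ik_smart "美國經濟是否走向衰退" and then ik_analyze ik_max_word "美國經濟是否走向衰退". ik_smart produces the coarsest segmentation, while ik_max_word produces the finest-grained one, so the second call should return at least as many tokens.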

Create data for testing

1. Create an index

curl -XPUT http://localhost:9200/myindex

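2. Create a mapping

  • IK currently provides two analyzers: ik_max_word and ik_smart.
  • Once this mapping is created, indexed data follows its rules and the field's analyzer cannot be changed afterward, so use _analyze first to check that the segmentation is what you expect.
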
curl -XPOST http://localhost:9200/myindex/fulltext/_mapping -H 'Content-Type:application/json' -d'
{
    "properties": {
        "content": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word"
        }
    }
}'
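
Since the mapping cannot be changed afterward, it may be worth confirming that the field really picked up the IK analyzer before indexing anything. One way, sketched here, is the index-level _analyze endpoint with a "field" parameter, which tokenizes text with whatever analyzer the mapping assigns to that field. The names analyze_field and ES_URL are made up for this example.

```shell
# Sketch: tokenize text with the analyzer mapped to a given field.
# ES_URL and analyze_field are illustrative names.
ES_URL="${ES_URL:-http://localhost:9200}"

analyze_field() {
    # $1 = index name, $2 = field name, $3 = text to tokenize.
    # The text is embedded verbatim: no double quotes or backslashes allowed.
    curl -s -XPOST "$ES_URL/$1/_analyze?pretty" \
        -H 'Content-Type: application/json' \
        -d "{\"field\": \"$2\", \"text\": \"$3\"}"
}
```

For example, analyze_field myindex content "露營車出貨量" should return the same tokens as a direct _analyze call with ik_max_word if the mapping was applied as intended.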

3. Index some docs

curl -XPOST http://localhost:9200/myindex/fulltext/uniqueid01 -H 'Content-Type:application/json' -d'
{"content":"陸美貿易戰已變成科技戰甚至正開打貨幣戰"}
'
curl -XPOST http://localhost:9200/myindex/fulltext/uniqueid02 -H 'Content-Type:application/json' -d'
{"content":"美國2年期公債殖利率與10年期殖利率利差上周出現倒掛"}
'
curl -XPOST http://localhost:9200/myindex/fulltext/uniqueid03 -H 'Content-Type:application/json' -d'
{"content":"用露營車產業來判斷美國經濟是否出現衰退,比經濟學家的預測更精準。"}
'
curl -XPOST http://localhost:9200/myindex/fulltext/uniqueid04 -H 'Content-Type:application/json' -d'
{"content":"被視為判斷美國經濟是否走向衰退的另一指標「露營車」出貨量也明顯下滑"}
'
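
Newly indexed documents only become searchable after the index is refreshed, which by default happens about once per second. If you script these steps back to back, a sketch like the following forces a refresh and then counts the documents. refresh_and_count and ES_URL are hypothetical names; the _refresh and _count endpoints are standard Elasticsearch APIs.

```shell
# Sketch: force a refresh, then report how many documents the index holds.
# ES_URL and refresh_and_count are illustrative names.
ES_URL="${ES_URL:-http://localhost:9200}"

refresh_and_count() {
    # $1 = index name.
    # Make recently indexed documents visible to search.
    curl -s -XPOST "$ES_URL/$1/_refresh" > /dev/null
    # Print the document count as JSON.
    curl -s "$ES_URL/$1/_count?pretty"
}
```

After indexing the four documents above, refresh_and_count myindex should report a count of 4, assuming the index held nothing else.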

4. Query with highlighting

curl -XPOST http://localhost:9200/myindex/fulltext/_search -H 'Content-Type:application/json' -d'
{
    "query" : { "match" : { "content" : "美國" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}
'
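
To try other search terms without retyping the whole request, the query above can be wrapped in a function. This is a sketch under assumptions: search_content and ES_URL are made-up names, the index (myindex) and field (content) come from the steps above, and the pre_tags / post_tags are left out so Elasticsearch falls back to its default <em> highlighting tags.

```shell
# Sketch: match query on the "content" field with default highlighting.
# ES_URL and search_content are illustrative names.
ES_URL="${ES_URL:-http://localhost:9200}"

search_content() {
    # $1 = search text; embedded verbatim, so no double quotes or backslashes.
    curl -s -XPOST "$ES_URL/myindex/_search?pretty" \
        -H 'Content-Type: application/json' \
        -d "{
  \"query\": { \"match\": { \"content\": \"$1\" } },
  \"highlight\": { \"fields\": { \"content\": {} } }
}"
}
```

For example, search_content "露營車" should return only the documents that contain that term, with the matched tokens wrapped in <em> tags in the highlight section of each hit.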
