Optimize the parameters for settings. #837

Open · wants to merge 1 commit into base: master
56 changes: 31 additions & 25 deletions README.md
@@ -163,37 +163,28 @@ Result

### Dictionary Configuration

`IKAnalyzer.cfg.xml`, which could be located at `{conf}/analysis-ik/config/IKAnalyzer.cfg.xml`
or `{plugins}/elasticsearch-analysis-ik-*/config/IKAnalyzer.cfg.xml`, is removed by this change:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Configure your own extension dictionaries here -->
	<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
	<!-- Configure your own extension stop-word dictionaries here -->
	<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
	<!-- Configure remote extension dictionaries here -->
	<entry key="remote_ext_dict">location</entry>
	<!-- Configure remote extension stop-word dictionaries here -->
	<entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>
```

Its replacement, `IKAnalyzer.yml`, can be located at `{conf}/analysis-ik/IKAnalyzer.yml`:

```yml
# IK Analyzer extension configuration
analysis_ik:
  # Dictionary configuration
  dictionary:
    # Configure your own extension dictionaries here
    ext_dict: ""
    # Configure your own extension stop-word dictionaries here
    ext_stop_word: ""
    # Configure remote extension dictionaries here
    remote_ext_dict: ""
    # Configure remote extension stop-word dictionaries here
    remote_ext_stop_word: ""
```
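As before, several dictionary files can be listed in one entry, separated by semicolons; relative paths are resolved against the plugin's config directory (`{conf}/analysis-ik`).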

### Hot-reloading IK dictionaries

The plugin currently supports hot reloading of the IK dictionaries through the configuration mentioned above in the IK configuration file. Previously this was done with the following XML entries (removed by this change), where `location` was a URL such as `http://yoursite.com/getCustomDict`:

```xml
<!-- Configure remote extension dictionaries here -->
<entry key="remote_ext_dict">location</entry>
<!-- Configure remote extension stop-word dictionaries here -->
<entry key="remote_ext_stopwords">location</entry>
```

The values of the new `remote_ext_dict` and `remote_ext_stop_word` settings are likewise URLs, for example `http://yoursite.com/getCustomDict`. The request only needs to satisfy the following two points for hot reloading of the dictionary to work:

1. The HTTP response must return two headers, `Last-Modified` and `ETag`, both of which are strings. Whenever either of them changes, the plugin fetches the new word list and updates its dictionary.

@@ -205,6 +196,21 @@ or `{plugins}/elasticsearch-analysis-ik-*/config/IKAnalyzer.cfg.xml`

have fun.
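
For illustration only (this sketch is not part of the PR), a remote-dictionary endpoint that satisfies the header requirement above can be as small as the following, using the JDK's built-in `com.sun.net.httpserver`; the port, the `/getCustomDict` path and the `remote.dic` file name are assumptions:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Minimal sketch of a hot-reload endpoint: it serves the word list as plain text and
// returns Last-Modified and ETag headers that change whenever the file changes.
public class CustomDictServer {

    public static void main(String[] args) throws IOException {
        Path dicFile = Paths.get("remote.dic"); // assumed location of the word list
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/getCustomDict", (HttpExchange exchange) -> {
            byte[] body = Files.readAllBytes(dicFile);
            // Any string works for these headers; the plugin only watches them for changes.
            String stamp = Long.toString(Files.getLastModifiedTime(dicFile).toMillis());
            exchange.getResponseHeaders().set("Last-Modified", stamp);
            exchange.getResponseHeaders().set("ETag", stamp);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }
}
```

Pointing `remote_ext_dict` at `http://<host>:8080/getCustomDict` is then enough; editing `remote.dic` changes both headers, which triggers a re-fetch.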

If you run the Elasticsearch service with Docker (a customized Elasticsearch image with this plugin installed is required), the parameters above can be passed in through environment variables when the container is created:

```yml
elasticsearch:
  image: my-elasticsearch-chs:7.9.3
  container_name: elasticsearch
  environment:
    - cluster.name=docker-cluster
    - bootstrap.memory_lock=true
    - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    - discovery.type=single-node
    - analysis_ik.dictionary.remote_ext_dict=http://www.example.com/dic.txt
    - analysis_ik.dictionary.remote_ext_stop_word=http://www.example.com/stop-word.txt
```
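The official Elasticsearch image passes lower-case, dotted environment variables such as these on to Elasticsearch as settings, so the `analysis_ik.dictionary.*` values above reach the node the same way `cluster.name` and `discovery.type` do.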

FAQ
-------

13 changes: 0 additions & 13 deletions config/IKAnalyzer.cfg.xml

This file was deleted.

12 changes: 12 additions & 0 deletions config/IKAnalyzer.yml
@@ -0,0 +1,12 @@
# IK Analyzer extension configuration
analysis_ik:
  # Dictionary configuration
  dictionary:
    # Configure your own extension dictionaries here
    ext_dict: ""
    # Configure your own extension stop-word dictionaries here
    ext_stop_word: ""
    # Configure remote extension dictionaries here
    remote_ext_dict: ""
    # Configure remote extension stop-word dictionaries here
    remote_ext_stop_word: ""
4 changes: 2 additions & 2 deletions pom.xml
@@ -6,13 +6,13 @@
<modelVersion>4.0.0</modelVersion>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-analysis-ik</artifactId>
-<version>${elasticsearch.version}</version>
+<version>7.9.3</version>
<packaging>jar</packaging>
<description>IK Analyzer for Elasticsearch</description>
<inceptionYear>2011</inceptionYear>

<properties>
-<elasticsearch.version>7.4.0</elasticsearch.version>
+<elasticsearch.version>${project.version}</elasticsearch.version>
<maven.compiler.target>1.8</maven.compiler.target>
<elasticsearch.assembly.descriptor>${project.basedir}/src/main/assemblies/plugin.xml</elasticsearch.assembly.descriptor>
<elasticsearch.plugin.name>analysis-ik</elasticsearch.plugin.name>
src/main/java/org/elasticsearch/plugin/analysis/ik/AnalysisIkPlugin.java
@@ -1,21 +1,44 @@
package org.elasticsearch.plugin.analysis.ik;

import org.apache.lucene.analysis.Analyzer;
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.analysis.AnalyzerProvider;
import org.elasticsearch.index.analysis.IkAnalyzerProvider;
import org.elasticsearch.index.analysis.IkTokenizerFactory;
import org.elasticsearch.index.analysis.TokenizerFactory;
import org.elasticsearch.indices.analysis.AnalysisModule;
import org.elasticsearch.plugins.AnalysisPlugin;
import org.elasticsearch.plugins.Plugin;
import org.apache.logging.log4j.Logger;
import org.wltea.analyzer.help.ESPluginLoggerFactory;

import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;


public class AnalysisIkPlugin extends Plugin implements AnalysisPlugin {

public static String PLUGIN_NAME = "analysis-ik";

private final static String FILE_NAME = "IKAnalyzer.yml";

private final Path configPath;

private static final Logger logger = ESPluginLoggerFactory.getLogger(AnalysisIkPlugin.class.getName());

private final static String EXT_DICT = "ext_dict";
private final static String REMOTE_EXT_DICT = "remote_ext_dict";
private final static String EXT_STOP = "ext_stop_word";
private final static String REMOTE_EXT_STOP = "remote_ext_stop_word";

public AnalysisIkPlugin(Settings settings, Path configPath) {
this.configPath = configPath;
}

@Override
public Map<String, AnalysisModule.AnalysisProvider<TokenizerFactory>> getTokenizers() {
@@ -38,4 +61,27 @@ public Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends An
return extra;
}

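// Load {conf}/analysis-ik/IKAnalyzer.yml, if present, and contribute its entries as additional node settings.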
@Override
public Settings additionalSettings() {
Path configFile = this.configPath.resolve(PLUGIN_NAME).resolve(FILE_NAME);
try {
return Settings.builder().loadFromPath(configFile).build();
} catch (IOException e) {
logger.error("ik-analyzer failed to load settings", e);
}
return super.additionalSettings();
}

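// Register the analysis_ik.dictionary.* keys so that Elasticsearch accepts them as node settings.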
@Override
public List<Setting<?>> getSettings() {
String[] dictionaries = { EXT_DICT, EXT_STOP, REMOTE_EXT_DICT, REMOTE_EXT_STOP };
List<Setting<?>> settings = new ArrayList<Setting<?>>();
for (String dictionary : dictionaries) {
String[] keyInfo = { PLUGIN_NAME.replace("-", "_"), "dictionary", dictionary };
String key = String.join(".", keyInfo);
Setting<String> setting = Setting.simpleString(key, "", Setting.Property.NodeScope);
settings.add(setting);
}
return settings;
}
}
72 changes: 23 additions & 49 deletions src/main/java/org/wltea/analyzer/dic/Dictionary.java
@@ -45,14 +45,14 @@

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.elasticsearch.SpecialPermission;
import org.elasticsearch.common.io.PathUtils;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin;
import org.wltea.analyzer.cfg.Configuration;
import org.apache.logging.log4j.Logger;
@@ -91,50 +91,27 @@ public class Dictionary {
private static final String PATH_DIC_PREP = "preposition.dic";
private static final String PATH_DIC_STOP = "stopword.dic";

-private final static String FILE_NAME = "IKAnalyzer.cfg.xml";
private final static String EXT_DICT = "ext_dict";
private final static String REMOTE_EXT_DICT = "remote_ext_dict";
-private final static String EXT_STOP = "ext_stopwords";
-private final static String REMOTE_EXT_STOP = "remote_ext_stopwords";
+private final static String EXT_STOP = "ext_stop_word";
+private final static String REMOTE_EXT_STOP = "remote_ext_stop_word";

-private Path conf_dir;
-private Properties props;
+private Path configDir;
+private Settings settings;

private Dictionary(Configuration cfg) {
this.configuration = cfg;
-	this.props = new Properties();
-	this.conf_dir = cfg.getEnvironment().configFile().resolve(AnalysisIkPlugin.PLUGIN_NAME);
-	Path configFile = conf_dir.resolve(FILE_NAME);
+	this.configDir = cfg.getEnvironment().configFile().resolve(AnalysisIkPlugin.PLUGIN_NAME);
+	this.settings = cfg.getEnvironment().settings();
+}

-	InputStream input = null;
-	try {
-		logger.info("try load config from {}", configFile);
-		input = new FileInputStream(configFile.toFile());
-	} catch (FileNotFoundException e) {
-		conf_dir = cfg.getConfigInPluginDir();
-		configFile = conf_dir.resolve(FILE_NAME);
-		try {
-			logger.info("try load config from {}", configFile);
-			input = new FileInputStream(configFile.toFile());
-		} catch (FileNotFoundException ex) {
-			// We should report origin exception
-			logger.error("ik-analyzer", e);
-		}
-	}
-	if (input != null) {
-		try {
-			props.loadFromXML(input);
-		} catch (IOException e) {
-			logger.error("ik-analyzer", e);
-		}
-	}
+public Settings getSettings() {
+	return settings;
}

-private String getProperty(String key){
-	if(props!=null){
-		return props.getProperty(key);
-	}
-	return null;
+private String getDictionarySetting(String key) {
+	String[] keys = { AnalysisIkPlugin.PLUGIN_NAME.replace("-", "_"), "dictionary", key };
+	return settings.get(String.join(".", keys));
}
/**
* Dictionary initialization. Since IK Analyzer initializes its dictionary through static methods of the Dictionary class
Expand Down Expand Up @@ -218,9 +195,8 @@ private void loadDictFile(DictSegment dict, Path file, boolean critical, String

private List<String> getExtDictionarys() {
List<String> extDictFiles = new ArrayList<String>(2);
-String extDictCfg = getProperty(EXT_DICT);
-if (extDictCfg != null) {
+String extDictCfg = getDictionarySetting(EXT_DICT);
+if (!extDictCfg.trim().equals("")) {
String[] filePaths = extDictCfg.split(";");
for (String filePath : filePaths) {
if (filePath != null && !"".equals(filePath.trim())) {
@@ -235,9 +211,9 @@

private List<String> getRemoteExtDictionarys() {
List<String> remoteExtDictFiles = new ArrayList<String>(2);
-String remoteExtDictCfg = getProperty(REMOTE_EXT_DICT);
-if (remoteExtDictCfg != null) {
+String remoteExtDictCfg = getDictionarySetting(REMOTE_EXT_DICT);
+if (!remoteExtDictCfg.trim().equals("")) {
+	logger.info(">>>" + remoteExtDictCfg);
String[] filePaths = remoteExtDictCfg.split(";");
for (String filePath : filePaths) {
if (filePath != null && !"".equals(filePath.trim())) {
@@ -251,9 +227,8 @@

private List<String> getExtStopWordDictionarys() {
List<String> extStopWordDictFiles = new ArrayList<String>(2);
-String extStopWordDictCfg = getProperty(EXT_STOP);
-if (extStopWordDictCfg != null) {
+String extStopWordDictCfg = getDictionarySetting(EXT_STOP);
+if (!extStopWordDictCfg.trim().equals("")) {
String[] filePaths = extStopWordDictCfg.split(";");
for (String filePath : filePaths) {
if (filePath != null && !"".equals(filePath.trim())) {
@@ -268,9 +243,8 @@

private List<String> getRemoteExtStopWordDictionarys() {
List<String> remoteExtStopWordDictFiles = new ArrayList<String>(2);
-String remoteExtStopWordDictCfg = getProperty(REMOTE_EXT_STOP);
-if (remoteExtStopWordDictCfg != null) {
+String remoteExtStopWordDictCfg = getDictionarySetting(REMOTE_EXT_STOP);
+if (!remoteExtStopWordDictCfg.trim().equals("")) {
String[] filePaths = remoteExtStopWordDictCfg.split(";");
for (String filePath : filePaths) {
if (filePath != null && !"".equals(filePath.trim())) {
@@ -283,7 +257,7 @@
}

private String getDictRoot() {
-return conf_dir.toAbsolutePath().toString();
+return configDir.toAbsolutePath().toString();
}

