Add ingest-useragent plugin
Christoph Wurm committed Jul 1, 2016
1 parent 7c87d39 commit 81c5bdf
Showing 15 changed files with 6,112 additions and 0 deletions.
74 changes: 74 additions & 0 deletions docs/plugins/ingest-useragent.asciidoc
@@ -0,0 +1,74 @@
[[ingest-useragent]]
=== Ingest Useragent Processor Plugin

The Useragent processor extracts details from the user agent string that a browser sends with its web requests.
By default, the processor adds this information under the `useragent` field.

The ingest-useragent plugin ships by default with the regexes.yaml made available by uap-core under the Apache 2.0 license. For more details, see https://github.com/ua-parser/uap-core.

[[ingest-useragent-install]]
[float]
==== Installation

This plugin can be installed using the plugin manager:

[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin install ingest-useragent
----------------------------------------------------------------

The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.

[[ingest-useragent-remove]]
[float]
==== Removal

The plugin can be removed with the following command:

[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin remove ingest-useragent
----------------------------------------------------------------

The node must be stopped before removing the plugin.

[[using-ingest-useragent]]
==== Using the Useragent Processor in a Pipeline

[[ingest-useragent-options]]
.Useragent options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field containing the user agent string.
| `target_field` | no | useragent | The field that will be filled with the user agent details.
| `regex_file` | no | - | The name of the file in the `config/ingest-useragent` directory containing the regular expressions for parsing the user agent string. Both the directory and the file have to be created before starting Elasticsearch. If not specified, ingest-useragent will use the regexes.yaml from uap-core it ships with (see below).
| `properties` | no | [`name`, `major`, `minor`, `patch`, `build`, `os`, `os_name`, `os_major`, `os_minor`, `device`] | Controls what properties are added to `target_field`.
|======

Here is an example that adds the user agent details to the `useragent` field based on the `agent` field:

[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "useragent" : {
        "field" : "agent"
      }
    }
  ]
}
--------------------------------------------------

===== Using a custom regex file
To use a custom regex file for parsing the user agents, put the file into the `config/ingest-useragent` directory and
give it a `.yaml` filename extension. The file has to be present at node startup; changes to it, or new files added
while the node is running, will not have any effect.

In practice, it will make most sense for any custom regex file to be a variant of the default file, either a more recent version
or a customised version.

The default file included in `ingest-useragent` is the `regexes.yaml` from uap-core: https://github.com/ua-parser/uap-core/blob/master/regexes.yaml
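
The startup-time discovery described above can be sketched in plain Java. This is a minimal illustration, not the plugin's actual code path: the class name `RegexFileDiscovery` and the method `findRegexFiles` are hypothetical, and the sketch only lists candidate file names instead of building parsers. It assumes that only regular files directly inside the directory and ending in `.yaml` are considered.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
final class RegexFileDiscovery {

    // Returns the sorted names of regular files ending in ".yaml" that sit
    // directly inside configDir. A missing directory yields an empty list.
    static List<String> findRegexFiles(Path configDir) throws IOException {
        if (!Files.isDirectory(configDir)) {
            return Collections.emptyList();
        }
        PathMatcher yamlMatcher = configDir.getFileSystem().getPathMatcher("glob:**.yaml");
        // A max depth of 1 restricts the search to direct children of configDir.
        try (Stream<Path> files = Files.find(configDir, 1,
                (path, attrs) -> attrs.isRegularFile() && yamlMatcher.matches(path))) {
            return files.map(path -> path.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage example with a throwaway directory standing in for config/ingest-useragent.
        Path dir = Files.createTempDirectory("ingest-useragent");
        Files.createFile(dir.resolve("custom-regexes.yaml"));
        Files.createFile(dir.resolve("notes.txt"));
        System.out.println(findRegexFiles(dir));
    }
}
```

Because the listing is computed once, a file dropped into the directory later would not show up until the method runs again, which mirrors why the node has to be restarted to pick up new regex files.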
29 changes: 29 additions & 0 deletions plugins/ingest-useragent/build.gradle
@@ -0,0 +1,29 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

esplugin {
  description 'Ingest processor that extracts information from a user agent'
  classname 'org.elasticsearch.ingest.useragent.IngestUserAgentPlugin'
}

integTest {
  cluster {
    extraConfigFile 'ingest-useragent/test-regexes.yaml', 'test/test-regexes.yaml'
  }
}
@@ -0,0 +1,86 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.ingest.useragent;

import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.node.NodeModule;
import org.elasticsearch.plugins.Plugin;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class IngestUserAgentPlugin extends Plugin {

    private final Setting<Long> CACHE_SIZE_SETTING = Setting.longSetting("ingest.useragent.cache_size", 1000, 0,
            Setting.Property.NodeScope);

    static final String DEFAULT_PARSER_NAME = "_default_";

    public void onModule(NodeModule nodeModule) throws IOException {
        Path userAgentConfigDirectory = nodeModule.getNode().getEnvironment().configFile().resolve("ingest-useragent");

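        // NOTE: this condition can never be true, because a path that does not exist is never a directory,
        // so the exception below is unreachable. A missing directory is tolerated by createUserAgentParsers.
        if (Files.exists(userAgentConfigDirectory) == false && Files.isDirectory(userAgentConfigDirectory)) {
            throw new IllegalStateException(
                "the user agent directory [" + userAgentConfigDirectory + "] containing the regex file doesn't exist");
        }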

        long cacheSize = CACHE_SIZE_SETTING.get(nodeModule.getNode().settings());

        UserAgentCache cache = new UserAgentCache(cacheSize);

        Map<String, UserAgentParser> userAgentParsers = createUserAgentParsers(userAgentConfigDirectory, cache);

        nodeModule.registerProcessor(UserAgentProcessor.TYPE, (registry) -> new UserAgentProcessor.Factory(userAgentParsers));
    }

    static Map<String, UserAgentParser> createUserAgentParsers(Path userAgentConfigDirectory, UserAgentCache cache) throws IOException {
        Map<String, UserAgentParser> userAgentParsers = new HashMap<>();

        UserAgentParser defaultParser = new UserAgentParser(DEFAULT_PARSER_NAME,
                IngestUserAgentPlugin.class.getResourceAsStream("/regexes.yaml"), cache);
        userAgentParsers.put(DEFAULT_PARSER_NAME, defaultParser);

        if (Files.exists(userAgentConfigDirectory) && Files.isDirectory(userAgentConfigDirectory)) {
            PathMatcher pathMatcher = userAgentConfigDirectory.getFileSystem().getPathMatcher("glob:**.yaml");

            try (Stream<Path> regexFiles = Files.find(userAgentConfigDirectory, 1,
                    (path, attr) -> attr.isRegularFile() && pathMatcher.matches(path))) {
                Iterable<Path> iterable = regexFiles::iterator;
                for (Path path : iterable) {
                    String parserName = path.getFileName().toString();
                    try (InputStream regexStream = Files.newInputStream(path, StandardOpenOption.READ)) {
                        userAgentParsers.put(parserName, new UserAgentParser(parserName, regexStream, cache));
                    }
                }
            }
        }

        return Collections.unmodifiableMap(userAgentParsers);
    }

}
@@ -0,0 +1,66 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.ingest.useragent;

import org.elasticsearch.common.cache.Cache;
import org.elasticsearch.common.cache.CacheBuilder;
import org.elasticsearch.ingest.useragent.UserAgentParser.Details;

import java.util.Objects;

class UserAgentCache {
    private final Cache<CompositeCacheKey, Details> cache;

    UserAgentCache(long cacheSize) {
        cache = CacheBuilder.<CompositeCacheKey, Details>builder().setMaximumWeight(cacheSize).build();
    }

    public Details get(String parserName, String userAgent) {
        return cache.get(new CompositeCacheKey(parserName, userAgent));
    }

    public void put(String parserName, String userAgent, Details details) {
        cache.put(new CompositeCacheKey(parserName, userAgent), details);
    }

    private static final class CompositeCacheKey {
        private final String parserName;
        private final String userAgent;

        CompositeCacheKey(String parserName, String userAgent) {
            this.parserName = parserName;
            this.userAgent = userAgent;
        }

        @Override
        public boolean equals(Object obj) {
            if (obj != null && obj instanceof CompositeCacheKey) {
                CompositeCacheKey s = (CompositeCacheKey)obj;
                return parserName.equals(s.parserName) && userAgent.equals(s.userAgent);
            }
            return false;
        }

        @Override
        public int hashCode() {
            return Objects.hash(parserName, userAgent);
        }
    }
}
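
The effect of keying the cache on both the parser name and the user agent string can be seen in a standalone sketch. This is an illustration under stated assumptions, not the class above: `CompositeKeyCacheSketch` and its nested `Key` are hypothetical names, a `LinkedHashMap` in access order stands in for the Elasticsearch `Cache`, and plain strings stand in for `Details`. The sketch is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for illustration; uses only the JDK.
final class CompositeKeyCacheSketch {

    // Both parts of the key take part in equals() and hashCode(), so the same
    // user agent string cached under two parser names yields two entries.
    static final class Key {
        final String parserName;
        final String userAgent;

        Key(String parserName, String userAgent) {
            this.parserName = Objects.requireNonNull(parserName);
            this.userAgent = Objects.requireNonNull(userAgent);
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof Key)) {
                return false;
            }
            Key other = (Key) obj;
            return parserName.equals(other.parserName) && userAgent.equals(other.userAgent);
        }

        @Override
        public int hashCode() {
            return Objects.hash(parserName, userAgent);
        }
    }

    private final Map<Key, String> entries;

    CompositeKeyCacheSketch(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first,
        // and removeEldestEntry evicts it once the size limit is exceeded.
        this.entries = new LinkedHashMap<Key, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Key, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    String get(String parserName, String userAgent) {
        return entries.get(new Key(parserName, userAgent));
    }

    void put(String parserName, String userAgent, String details) {
        entries.put(new Key(parserName, userAgent), details);
    }

    int size() {
        return entries.size();
    }
}
```

With this shape, results parsed by the default regexes and by a custom regex file never overwrite each other, even when the incoming user agent string is identical.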
