Add support for the 'Anonymous IP' database to the geoip processor #107287

Merged
merged 14 commits into from
Apr 11, 2024
6 changes: 6 additions & 0 deletions docs/changelog/107287.yaml
@@ -0,0 +1,6 @@
pr: 107287
summary: Add support for the 'Anonymous IP' database to the geoip processor
area: Ingest Node
type: enhancement
issues:
- 90789
33 changes: 18 additions & 15 deletions docs/reference/ingest/processors/geoip.asciidoc
@@ -9,7 +9,7 @@ IPv4 or IPv6 address.

[[geoip-automatic-updates]]
By default, the processor uses the GeoLite2 City, GeoLite2 Country, and GeoLite2
ASN GeoIP2 databases from http://dev.maxmind.com/geoip/geoip2/geolite2/[MaxMind], shared under the
ASN IP geolocation databases from http://dev.maxmind.com/geoip/geoip2/geolite2/[MaxMind], shared under the
CC BY-SA 4.0 license. It automatically downloads these databases if your nodes can connect to the `storage.googleapis.com` domain and either:

* `ingest.geoip.downloader.eager.download` is set to true
@@ -38,7 +38,7 @@ field instead.
| Name | Required | Default | Description
| `field` | yes | - | The field to get the IP address from for the geographical lookup.
| `target_field` | no | geoip | The field that will hold the geographical information looked up from the MaxMind database.
| `database_file` | no | GeoLite2-City.mmdb | The database filename referring to a database the module ships with (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or a custom database in the `ingest-geoip` config directory.
| `database_file` | no | GeoLite2-City.mmdb | The database filename referring to one of the automatically downloaded GeoLite2 databases (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or the name of a supported database file in the `ingest-geoip` config directory.
| `properties` | no | [`continent_name`, `country_iso_code`, `country_name`, `region_iso_code`, `region_name`, `city_name`, `location`] * | Controls what properties are added to the `target_field` based on the geoip lookup.
| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document.
| `first_only` | no | `true` | If `true`, only the first found geoip data is returned, even if `field` contains an array.
@@ -47,15 +47,18 @@ field instead.

*Depends on what is available in `database_file`:

* If the GeoLite2 City database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`, `latitude`, `longitude`
* If a GeoLite2 City or GeoIP2 City database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`,
and `location`. The fields actually added depend on what has been found and which properties were configured in `properties`.
* If the GeoLite2 Country database is used, then the following fields may be added under the `target_field`: `ip`,
* If a GeoLite2 Country or GeoIP2 Country database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name` and `continent_name`. The fields actually added depend on what has been found and which properties
were configured in `properties`.
* If the GeoLite2 ASN database is used, then the following fields may be added under the `target_field`: `ip`,
`asn`, `organization_name` and `network`. The fields actually added depend on what has been found and which properties were configured
in `properties`.
* If the GeoIP2 Anonymous IP database is used, then the following fields may be added under the `target_field`: `ip`,
`hosting_provider`, `tor_exit_node`, `anonymous_vpn`, `anonymous`, `public_proxy`, and `residential_proxy`. The fields actually added
depend on what has been found and which properties were configured in `properties`.


Here is an example that uses the default city database and adds the geographical information to the `geoip` field based on the `ip` field:
@@ -109,7 +112,7 @@ Which returns:

Here is an example that uses the default country database and adds the
geographical information to the `geo` field based on the `ip` field. Note that
this database is included in the module. So this:
this database is downloaded automatically. So this:

[source,console]
--------------------------------------------------
@@ -316,14 +319,14 @@ GET /my_ip_locations/_search
////

[[manage-geoip-database-updates]]
==== Manage your own GeoIP2 database updates
==== Manage your own IP geolocation database updates

If you can't <<geoip-automatic-updates,automatically update>> your GeoIP2
databases from the Elastic endpoint, you have a few other options:
If you can't <<geoip-automatic-updates,automatically update>> your IP geolocation databases
from the Elastic endpoint, you have a few other options:

* <<use-proxy-geoip-endpoint,Use a proxy endpoint>>
* <<use-custom-geoip-endpoint,Use a custom endpoint>>
* <<manually-update-geoip-databases,Manually update your GeoIP2 databases>>
* <<manually-update-geoip-databases,Manually update your IP geolocation databases>>

[[use-proxy-geoip-endpoint]]
**Use a proxy endpoint**
@@ -375,7 +378,7 @@ settings API>> to set
<<ingest-geoip-downloader-poll-interval,`ingest.geoip.downloader.poll.interval`>>.

[[manually-update-geoip-databases]]
**Manually update your GeoIP2 databases**
**Manually update your IP geolocation databases**

. Use the <<cluster-update-settings,cluster update settings API>> to set
`ingest.geoip.downloader.enabled` to `false`. This disables automatic updates
@@ -414,22 +417,22 @@ Note that these settings are node settings and apply to all `geoip` processors,
[[ingest-geoip-downloader-enabled]]
`ingest.geoip.downloader.enabled`::
(<<dynamic-cluster-setting,Dynamic>>, Boolean)
If `true`, {es} automatically downloads and manages updates for GeoIP2 databases
If `true`, {es} automatically downloads and manages updates for IP geolocation databases
from the `ingest.geoip.downloader.endpoint`. If `false`, {es} does not download
updates and deletes all downloaded databases. Defaults to `true`.

[[ingest-geoip-downloader-eager-download]]
`ingest.geoip.downloader.eager.download`::
(<<dynamic-cluster-setting,Dynamic>>, Boolean)
If `true`, {es} downloads GeoIP2 databases immediately, regardless of whether a
If `true`, {es} downloads IP geolocation databases immediately, regardless of whether a
pipeline exists with a geoip processor. If `false`, {es} only begins downloading
the databases if a pipeline with a geoip processor exists or is added. Defaults
to `false`.

[[ingest-geoip-downloader-endpoint]]
`ingest.geoip.downloader.endpoint`::
(<<static-cluster-setting,Static>>, string)
Endpoint URL used to download updates for GeoIP2 databases. For example, `https://myDomain.com/overview.json`.
Endpoint URL used to download updates for IP geolocation databases. For example, `https://myDomain.com/overview.json`.
Defaults to `https://geoip.elastic.co/v1/database`. {es} stores downloaded database files in
each node's <<es-tmpdir,temporary directory>> at `$ES_TMPDIR/geoip-databases/<node_id>`.
Note that {es} will make a GET request to `${ingest.geoip.downloader.endpoint}?elastic_geoip_service_tos=agree`,
@@ -440,6 +443,6 @@ The GeoIP downloader uses the JDK's builtin cacerts. If you're using a custom en
[[ingest-geoip-downloader-poll-interval]]
`ingest.geoip.downloader.poll.interval`::
(<<dynamic-cluster-setting,Dynamic>>, <<time-units,time value>>)
How often {es} checks for GeoIP2 database updates at the
How often {es} checks for IP geolocation database updates at the
`ingest.geoip.downloader.endpoint`. Must be greater than `1d` (one day). Defaults
to `3d` (three days).
@@ -39,9 +39,9 @@ enum Database {
Property.LOCATION
),
Set.of(
Property.CONTINENT_NAME,
Property.COUNTRY_NAME,
Property.COUNTRY_ISO_CODE,
Property.COUNTRY_NAME,
Property.CONTINENT_NAME,
Property.REGION_ISO_CODE,
Property.REGION_NAME,
Property.CITY_NAME,
@@ -55,11 +55,31 @@ enum Database {
Asn(
Set.of(Property.IP, Property.ASN, Property.ORGANIZATION_NAME, Property.NETWORK),
Set.of(Property.IP, Property.ASN, Property.ORGANIZATION_NAME, Property.NETWORK)
),
AnonymousIp(
Set.of(
Property.IP,
Property.HOSTING_PROVIDER,
Property.TOR_EXIT_NODE,
Property.ANONYMOUS_VPN,
Property.ANONYMOUS,
Property.PUBLIC_PROXY,
Property.RESIDENTIAL_PROXY
),
Set.of(
Property.HOSTING_PROVIDER,
Property.TOR_EXIT_NODE,
Property.ANONYMOUS_VPN,
Property.ANONYMOUS,
Property.PUBLIC_PROXY,
Property.RESIDENTIAL_PROXY
)
);

private static final String CITY_DB_SUFFIX = "-City";
private static final String COUNTRY_DB_SUFFIX = "-Country";
private static final String ASN_DB_SUFFIX = "-ASN";
private static final String ANONYMOUS_IP_DB_SUFFIX = "-Anonymous-IP";

/**
* Parses the passed-in databaseType (presumably from the passed-in databaseFile) and returns the Database instance that is
@@ -79,6 +99,8 @@ public static Database getDatabase(final String databaseType, final String datab
database = Database.Country;
} else if (databaseType.endsWith(Database.ASN_DB_SUFFIX)) {
database = Database.Asn;
} else if (databaseType.endsWith(Database.ANONYMOUS_IP_DB_SUFFIX)) {
database = Database.AnonymousIp;
}
}
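
The suffix check in this hunk can be sketched in isolation. This is a minimal, hypothetical stand-in rather than the PR's code: `DatabaseKind` and `fromDatabaseType` are illustrative names, and returning an empty `Optional` for an unmatched type is an assumption, since the hunk does not show how `getDatabase` handles that case. The four suffixes are the ones declared in the diff.

```java
import java.util.Optional;

/** Hypothetical stand-in for the PR's Database enum; only the suffix matching is sketched. */
enum DatabaseKind {
    CITY("-City"),
    COUNTRY("-Country"),
    ASN("-ASN"),
    ANONYMOUS_IP("-Anonymous-IP");

    private final String suffix;

    DatabaseKind(String suffix) {
        this.suffix = suffix;
    }

    /** Returns the kind whose suffix ends the given database type, or empty if none matches (assumed behavior). */
    static Optional<DatabaseKind> fromDatabaseType(String databaseType) {
        if (databaseType == null) {
            return Optional.empty();
        }
        for (DatabaseKind kind : values()) {
            if (databaseType.endsWith(kind.suffix)) {
                return Optional.of(kind);
            }
        }
        return Optional.empty();
    }
}

final class DatabaseKindDemo {
    public static void main(String[] args) {
        // A database type string ending in "-Anonymous-IP" selects the new enum constant.
        System.out.println(DatabaseKind.fromDatabaseType("GeoIP2-Anonymous-IP"));
        // A type with no recognized suffix yields an empty Optional in this sketch.
        System.out.println(DatabaseKind.fromDatabaseType("GeoIP2-ISP"));
    }
}
```

Because none of the four suffixes is a tail of another, the order of the checks does not affect the result here.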

@@ -147,7 +169,13 @@ enum Property {
LOCATION,
ASN,
ORGANIZATION_NAME,
NETWORK;
NETWORK,
HOSTING_PROVIDER,
TOR_EXIT_NODE,
ANONYMOUS_VPN,
ANONYMOUS,
PUBLIC_PROXY,
RESIDENTIAL_PROXY;

/**
* Parses a string representation of a property into an actual Property instance. Not all properties that exist are
@@ -12,6 +12,7 @@
import com.maxmind.db.Reader;
import com.maxmind.geoip2.DatabaseReader;
import com.maxmind.geoip2.model.AbstractResponse;
import com.maxmind.geoip2.model.AnonymousIpResponse;
import com.maxmind.geoip2.model.AsnResponse;
import com.maxmind.geoip2.model.CityResponse;
import com.maxmind.geoip2.model.CountryResponse;
@@ -169,6 +170,12 @@ public AsnResponse getAsn(InetAddress ipAddress) {
return getResponse(ipAddress, DatabaseReader::tryAsn);
}

@Nullable
@Override
public AnonymousIpResponse getAnonymousIp(InetAddress ipAddress) {
return getResponse(ipAddress, DatabaseReader::tryAnonymousIp);
}

boolean preLookup() {
return currentUsages.updateAndGet(current -> current < 0 ? current : current + 1) > 0;
}
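
The `preLookup` context line packs a check and an increment into a single atomic update: a negative count is left untouched and the call reports `false`, otherwise the count is incremented and the call reports `true`. The sketch below isolates that idiom. `UsageCounterSketch`, `markClosedIfIdle`, and the reading of a negative count as "closed" are assumptions made for illustration; the hunk does not show how the real class drives the counter negative.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class UsageCounterSketch {
    // Non-negative: number of in-flight lookups. Negative: treated as "closed" in this sketch (assumption).
    private final AtomicInteger currentUsages = new AtomicInteger(0);

    /** Same expression as in the diff: register a usage unless the counter is already negative. */
    boolean preLookup() {
        return currentUsages.updateAndGet(current -> current < 0 ? current : current + 1) > 0;
    }

    /** Hypothetical helper: flips an idle counter (0) to -1 so later preLookup calls are refused. */
    boolean markClosedIfIdle() {
        return currentUsages.compareAndSet(0, -1);
    }

    int usages() {
        return currentUsages.get();
    }

    public static void main(String[] args) {
        UsageCounterSketch open = new UsageCounterSketch();
        System.out.println(open.preLookup());   // counter goes from 0 to 1

        UsageCounterSketch closed = new UsageCounterSketch();
        closed.markClosedIfIdle();
        System.out.println(closed.preLookup()); // counter stays negative
    }
}
```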
@@ -8,6 +8,7 @@

package org.elasticsearch.ingest.geoip;

import com.maxmind.geoip2.model.AnonymousIpResponse;
import com.maxmind.geoip2.model.AsnResponse;
import com.maxmind.geoip2.model.CityResponse;
import com.maxmind.geoip2.model.CountryResponse;
@@ -53,6 +54,9 @@ public interface GeoIpDatabase {
@Nullable
AsnResponse getAsn(InetAddress ipAddress);

@Nullable
AnonymousIpResponse getAnonymousIp(InetAddress ipAddress);

/**
* Releases the current database object. Called after processing a single document. Databases should be closed or returned to a
* resource pool. No further interactions should be expected.
@@ -9,6 +9,7 @@
package org.elasticsearch.ingest.geoip;

import com.maxmind.db.Network;
import com.maxmind.geoip2.model.AnonymousIpResponse;
import com.maxmind.geoip2.model.AsnResponse;
import com.maxmind.geoip2.model.CityResponse;
import com.maxmind.geoip2.model.CountryResponse;
@@ -172,6 +173,7 @@ private Map<String, Object> getGeoData(GeoIpDatabase geoIpDatabase, String ip) t
case City -> retrieveCityGeoData(geoIpDatabase, ipAddress);
case Country -> retrieveCountryGeoData(geoIpDatabase, ipAddress);
case Asn -> retrieveAsnGeoData(geoIpDatabase, ipAddress);
case AnonymousIp -> retrieveAnonymousIpGeoData(geoIpDatabase, ipAddress);
};
}

@@ -340,6 +342,46 @@ private Map<String, Object> retrieveAsnGeoData(GeoIpDatabase geoIpDatabase, Inet
return geoData;
}

private Map<String, Object> retrieveAnonymousIpGeoData(GeoIpDatabase geoIpDatabase, InetAddress ipAddress) {
AnonymousIpResponse response = geoIpDatabase.getAnonymousIp(ipAddress);
if (response == null) {
return Map.of();
}

boolean isHostingProvider = response.isHostingProvider();
boolean isTorExitNode = response.isTorExitNode();
boolean isAnonymousVpn = response.isAnonymousVpn();
boolean isAnonymous = response.isAnonymous();
boolean isPublicProxy = response.isPublicProxy();
boolean isResidentialProxy = response.isResidentialProxy();

Map<String, Object> geoData = new HashMap<>();
for (Property property : this.properties) {
switch (property) {
case IP -> geoData.put("ip", NetworkAddress.format(ipAddress));
case HOSTING_PROVIDER -> {
geoData.put("hosting_provider", isHostingProvider);
}
case TOR_EXIT_NODE -> {
geoData.put("tor_exit_node", isTorExitNode);
}
case ANONYMOUS_VPN -> {
geoData.put("anonymous_vpn", isAnonymousVpn);
}
case ANONYMOUS -> {
geoData.put("anonymous", isAnonymous);
}
case PUBLIC_PROXY -> {
geoData.put("public_proxy", isPublicProxy);
}
case RESIDENTIAL_PROXY -> {
geoData.put("residential_proxy", isResidentialProxy);
}
}
}
return geoData;
}
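
The new method follows the same pattern as the other `retrieve*GeoData` methods: read every value from the response once, then copy only the configured properties into the result map. A self-contained sketch of that pattern follows. It does not use the MaxMind classes: `Flags` is a hypothetical record standing in for the six boolean getters of `AnonymousIpResponse`, `toGeoData` is an illustrative name, and the flag values in `main` are invented. The output field names match the ones in the diff.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AnonymousIpMappingSketch {

    /** Hypothetical stand-in for the six boolean getters the PR reads from AnonymousIpResponse. */
    record Flags(
        boolean hostingProvider,
        boolean torExitNode,
        boolean anonymousVpn,
        boolean anonymous,
        boolean publicProxy,
        boolean residentialProxy
    ) {}

    enum Property { IP, HOSTING_PROVIDER, TOR_EXIT_NODE, ANONYMOUS_VPN, ANONYMOUS, PUBLIC_PROXY, RESIDENTIAL_PROXY }

    /** Copies only the requested properties into the result, using the field names from the diff. */
    static Map<String, Object> toGeoData(String ip, Flags flags, List<Property> requested) {
        if (flags == null) {
            return Map.of(); // no entry for this address, mirroring the null check in the diff
        }
        Map<String, Object> geoData = new LinkedHashMap<>();
        for (Property property : requested) {
            switch (property) {
                case IP -> geoData.put("ip", ip);
                case HOSTING_PROVIDER -> geoData.put("hosting_provider", flags.hostingProvider());
                case TOR_EXIT_NODE -> geoData.put("tor_exit_node", flags.torExitNode());
                case ANONYMOUS_VPN -> geoData.put("anonymous_vpn", flags.anonymousVpn());
                case ANONYMOUS -> geoData.put("anonymous", flags.anonymous());
                case PUBLIC_PROXY -> geoData.put("public_proxy", flags.publicProxy());
                case RESIDENTIAL_PROXY -> geoData.put("residential_proxy", flags.residentialProxy());
            }
        }
        return geoData;
    }

    public static void main(String[] args) {
        // Invented flag values, for illustration only.
        Flags flags = new Flags(false, true, false, true, false, false);
        System.out.println(toGeoData("81.2.69.1", flags, List.of(Property.IP, Property.TOR_EXIT_NODE)));
    }
}
```

Properties that are not requested never appear in the map, which is why the documentation says the fields actually added depend on `properties`.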

/**
* Retrieves and verifies a {@link GeoIpDatabase} instance for each execution of the {@link GeoIpProcessor}. Guards against missing
* custom databases, and ensures that database instances are of the proper type before use.
@@ -303,6 +303,39 @@ public void testAsn() throws Exception {
assertThat(geoData.get("network"), equalTo("82.168.0.0/14"));
}

public void testAnonymousIp() throws Exception {
String ip = "81.2.69.1";
GeoIpProcessor processor = new GeoIpProcessor(
randomAlphaOfLength(10),
null,
"source_field",
loader("/GeoIP2-Anonymous-IP-Test.mmdb"),
() -> true,
"target_field",
ALL_PROPERTIES,
false,
false,
"filename"
);

Map<String, Object> document = new HashMap<>();
document.put("source_field", ip);
IngestDocument ingestDocument = RandomDocumentPicks.randomIngestDocument(random(), document);
processor.execute(ingestDocument);

assertThat(ingestDocument.getSourceAndMetadata().get("source_field"), equalTo(ip));
@SuppressWarnings("unchecked")
Map<String, Object> geoData = (Map<String, Object>) ingestDocument.getSourceAndMetadata().get("target_field");
assertThat(geoData.size(), equalTo(7));
assertThat(geoData.get("ip"), equalTo(ip));
assertThat(geoData.get("hosting_provider"), equalTo(true));
assertThat(geoData.get("tor_exit_node"), equalTo(true));
assertThat(geoData.get("anonymous_vpn"), equalTo(true));
assertThat(geoData.get("anonymous"), equalTo(true));
assertThat(geoData.get("public_proxy"), equalTo(true));
assertThat(geoData.get("residential_proxy"), equalTo(true));
}

public void testAddressIsNotInTheDatabase() throws Exception {
GeoIpProcessor processor = new GeoIpProcessor(
randomAlphaOfLength(10),
@@ -60,6 +60,16 @@
*/
public class MaxMindSupportTests extends ESTestCase {

private static final Set<String> ANONYMOUS_IP_SUPPORTED_FIELDS = Set.of(
"anonymous",
"anonymousVpn",
"hostingProvider",
"publicProxy",
"residentialProxy",
"torExitNode"
);
private static final Set<String> ANONYMOUS_IP_UNSUPPORTED_FIELDS = Set.of("ipAddress", "network");

private static final Set<String> ASN_SUPPORTED_FIELDS = Set.of("autonomousSystemNumber", "autonomousSystemOrganization", "network");
private static final Set<String> ASN_UNSUPPORTED_FIELDS = Set.of("ipAddress");

@@ -192,6 +202,8 @@ public class MaxMindSupportTests extends ESTestCase {
);

private static final Map<Database, Set<String>> TYPE_TO_SUPPORTED_FIELDS_MAP = Map.of(
Database.AnonymousIp,
ANONYMOUS_IP_SUPPORTED_FIELDS,
Database.Asn,
ASN_SUPPORTED_FIELDS,
Database.City,
@@ -200,6 +212,8 @@ public class MaxMindSupportTests extends ESTestCase {
COUNTRY_SUPPORTED_FIELDS
);
private static final Map<Database, Set<String>> TYPE_TO_UNSUPPORTED_FIELDS_MAP = Map.of(
Database.AnonymousIp,
ANONYMOUS_IP_UNSUPPORTED_FIELDS,
Database.Asn,
ASN_UNSUPPORTED_FIELDS,
Database.City,
@@ -208,6 +222,8 @@ public class MaxMindSupportTests extends ESTestCase {
COUNTRY_UNSUPPORTED_FIELDS
);
private static final Map<Database, Class<? extends AbstractResponse>> TYPE_TO_MAX_MIND_CLASS = Map.of(
Database.AnonymousIp,
AnonymousIpResponse.class,
Database.Asn,
AsnResponse.class,
Database.City,
@@ -217,7 +233,6 @@ public class MaxMindSupportTests extends ESTestCase {
);

private static final Set<Class<? extends AbstractResponse>> KNOWN_UNSUPPORTED_RESPONSE_CLASSES = Set.of(
AnonymousIpResponse.class,
ConnectionTypeResponse.class,
DomainResponse.class,
EnterpriseResponse.class,
Binary file not shown.