GeoIP processor support for ISP database #71718

Closed
wasserman opened this issue Apr 15, 2021 · 7 comments
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement Team:Data Management Meta label for data/management team

Comments

@wasserman
Contributor

The GeoIP processor supports a database_file option for using an alternative database from MaxMind. It would be nice to be able to use the ISP database from https://www.maxmind.com/en/geoip2-isp-database.

I prepared a bundle per https://www.elastic.co/guide/en/cloud/current/ec-custom-bundles.html#ec-prepare-custom-bundles.
Used a sample from https://github.com/maxmind/MaxMind-DB/blob/main/test-data/GeoIP2-ISP-Test.mmdb.
JSON representation of the file for reference is at https://github.com/maxmind/MaxMind-DB/blob/main/source-data/GeoIP2-ISP-Test.json

When I tried to use this database_file the error was:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parse_exception",
        "reason" : "[database_file] Unsupported database type [GeoIP2-ISP]",
        "property_name" : "database_file",
        "processor_type" : "geoip"
      }
    ],
    "type" : "parse_exception",
    "reason" : "[database_file] Unsupported database type [GeoIP2-ISP]",
    "property_name" : "database_file",
    "processor_type" : "geoip"
  },
  "status" : 400
}

The section of code that shows this limitation is here:
https://github.com/elastic/elasticsearch/blob/425ed4cbc1f3f2bd2ca82091bc357f263687b149/modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpProcessor.java

    private Map<String, Object> getGeoData(String ip) throws IOException {
        String databaseType = lazyLoader.getDatabaseType();
        final InetAddress ipAddress = InetAddresses.forString(ip);
        Map<String, Object> geoData;
        if (databaseType.endsWith(CITY_DB_SUFFIX)) {
            try {
                geoData = retrieveCityGeoData(ipAddress);
            } catch (AddressNotFoundRuntimeException e) {
                geoData = Collections.emptyMap();
            }
        } else if (databaseType.endsWith(COUNTRY_DB_SUFFIX)) {
            try {
                geoData = retrieveCountryGeoData(ipAddress);
            } catch (AddressNotFoundRuntimeException e) {
                geoData = Collections.emptyMap();
            }
        } else if (databaseType.endsWith(ASN_DB_SUFFIX)) {
            try {
                geoData = retrieveAsnGeoData(ipAddress);
            } catch (AddressNotFoundRuntimeException e) {
                geoData = Collections.emptyMap();
            }
        } else {
            throw new ElasticsearchParseException("Unsupported database type [" + lazyLoader.getDatabaseType()
                + "]", new IllegalStateException());
        }
        return geoData;
    }

I hope it is as easy as implementing retrieveISPGeoData and then whitelisting the ISP database filename.
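To make the idea concrete, here is a rough, self-contained sketch of the same suffix-based dispatch with an ISP branch added. It is not the actual GeoIpProcessor code: the class name, the `-City` and `-ISP` suffix strings, the `AddressNotFound` exception, and the stubbed `retrieveIspGeoData` are all assumptions for illustration, and a real implementation would query the MaxMind reader instead of returning canned values.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only; names and suffixes are illustrative assumptions.
final class DatabaseTypeDispatch {
    // Stand-in for the "address not found" exception used in the snippet above.
    static final class AddressNotFound extends RuntimeException {}

    // Database type suffix -> lookup function, checked in registration order.
    private final Map<String, Function<String, Map<String, Object>>> lookups = new LinkedHashMap<>();

    DatabaseTypeDispatch register(String suffix, Function<String, Map<String, Object>> lookup) {
        lookups.put(suffix, lookup);
        return this;
    }

    // Mirrors getGeoData: pick a lookup by suffix, return an empty map when the
    // address is unknown, and reject database types nobody registered.
    Map<String, Object> getGeoData(String databaseType, String ip) {
        for (Map.Entry<String, Function<String, Map<String, Object>>> entry : lookups.entrySet()) {
            if (databaseType.endsWith(entry.getKey())) {
                try {
                    return entry.getValue().apply(ip);
                } catch (AddressNotFound e) {
                    return Collections.emptyMap();
                }
            }
        }
        throw new IllegalArgumentException("Unsupported database type [" + databaseType + "]");
    }

    // Stubbed ISP lookup: pretends 10.x addresses are missing from the database.
    static Map<String, Object> retrieveIspGeoData(String ip) {
        if (ip.startsWith("10.")) {
            throw new AddressNotFound();
        }
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("ip", ip);
        data.put("isp", "Example ISP");
        return data;
    }

    // Registers one existing-style branch plus the new ISP branch.
    static DatabaseTypeDispatch withIspSupport() {
        return new DatabaseTypeDispatch()
            .register("-City", ip -> Map.of("ip", ip))
            .register("-ISP", DatabaseTypeDispatch::retrieveIspGeoData);
    }
}
```

With this shape, supporting another database type would be one more `register` call rather than another `else if` branch.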

Thanks!

@wasserman wasserman added >enhancement needs:triage Requires assignment of a team area label labels Apr 15, 2021
@dnhatn dnhatn added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Apr 20, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Apr 20, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@romseygeek romseygeek removed the needs:triage Requires assignment of a team area label label Apr 23, 2021
@dcode

dcode commented Jun 29, 2021

I'd like to add, it'd be better to support, at minimum, the official MaxMind GeoIP2 database types:

  • ANONYMOUS_IP
  • ASN
  • CITY
  • CONNECTION_TYPE
  • COUNTRY
  • DOMAIN
  • ENTERPRISE
  • ISP

The ingest processor loading code I think can get a bit simpler by leveraging the DatabaseReader.getDatabaseType() method here, which returns an int as an OR'd enum. This way the fields available are dictated by the embedded metadata and not an arbitrary filename.

Supporting the Enterprise database and the ISP database essentially provides a superset of all standard database fields. It's not clear to me how the Java bindings allow for accessing custom attributes, but that'd be a "nice to have" as well.

Enhancing this ingest processor this way could add immense value to corporate users that would like to enrich data with internal IP geolocation information and possibly subnet names. For my use-case, I am attempting to use the ingest-geoip processor to enrich known bad malware C2 endpoints. Since I'm limited to a city OR an ASN database, I have to use two distinct databases. Using the approach suggested above with the getDatabaseType(), I think it should be possible to load the City (or Enterprise) fields, and then also load the ASN fields by simply looping over all supported interfaces of the declared database type.
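A minimal sketch of that looping idea, under stated assumptions: the database type string is taken as already read from the file's metadata, the type-to-lookup mapping below is illustrative rather than MaxMind's authoritative definition, and the lookups are stubs returning placeholder values instead of calling the real Java bindings. All class, enum, and field names here are hypothetical.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only; the mapping and field names are illustrative assumptions.
final class MetadataDrivenLookup {
    // Kinds of lookup a database may support.
    enum Kind { CITY, ASN, ISP }

    // Database type (as found in the file's metadata) -> supported lookups.
    static final Map<String, EnumSet<Kind>> KINDS_BY_TYPE = new LinkedHashMap<>();
    static {
        KINDS_BY_TYPE.put("GeoIP2-City", EnumSet.of(Kind.CITY));
        KINDS_BY_TYPE.put("GeoLite2-ASN", EnumSet.of(Kind.ASN));
        KINDS_BY_TYPE.put("GeoIP2-ISP", EnumSet.of(Kind.ASN, Kind.ISP));
        KINDS_BY_TYPE.put("GeoIP2-Enterprise", EnumSet.of(Kind.CITY, Kind.ASN, Kind.ISP));
    }

    // Stubbed per-kind lookup; a real version would query the database reader.
    static Map<String, Object> lookupOne(Kind kind, String ip) {
        Map<String, Object> fields = new LinkedHashMap<>();
        switch (kind) {
            case CITY:
                fields.put("city_name", "Example City");
                break;
            case ASN:
                fields.put("asn", 64496); // ASN reserved for documentation
                break;
            case ISP:
                fields.put("isp", "Example ISP");
                break;
        }
        return fields;
    }

    // Loops over every lookup the declared database type supports and merges the results.
    static Map<String, Object> enrich(String databaseType, String ip) {
        EnumSet<Kind> kinds = KINDS_BY_TYPE.get(databaseType);
        if (kinds == null) {
            throw new IllegalArgumentException("Unsupported database type [" + databaseType + "]");
        }
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Kind kind : kinds) {
            merged.putAll(lookupOne(kind, ip));
        }
        return merged;
    }
}
```

In this sketch a single Enterprise file yields city, ASN, and ISP fields in one pass, which is the "one database instead of two" outcome described above.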

@jakelandis
Contributor

This would be a great enhancement. We will need to reach out to MaxMind to see if they offer sample/test databases we could use for testing.

related: #80748

@athanatos64

+1 to support more commercial MaxMind databases in geoip processor

@truong-hua

truong-hua commented Dec 20, 2022

Please support this; it would help trace the originating ISP of requests coming from nginx.

@tylerperk

Hi @dcode @athanatos64 @truong-hua We are working on adding support for the GeoIP2 Enterprise Database and GeoIP2-Anonymous IP Database to Elasticsearch ingest pipelines.

These files contain different/additional fields than the free GeoLite2 files we currently support. The properties parameter in a geoip processor can be used to specify which fields to return, in case you want more/fewer/different subset than the default. We're trying to decide which fields to return to the target_field by default. For the Anonymous IP file it's a relatively short list so we plan to return most of them by default. The Enterprise file has quite a few fields so we're seeking community feedback for that one.
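As a rough illustration of how a properties list interacts with a default set, here is a small sketch. It is not the processor's implementation: the class name and the contents of `DEFAULT_PROPERTIES` are hypothetical (the real defaults are exactly what is being decided here), and the lookup result is modeled as a flat map keyed by the field names listed in this comment.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; the default list below is a placeholder, not a proposal.
final class PropertySelection {
    static final List<String> DEFAULT_PROPERTIES =
        List.of("country.isoCode", "city.name", "location.latitude", "location.longitude");

    // Returns only the requested fields; falls back to the defaults when no
    // properties were configured. Fields missing from the lookup are skipped.
    static Map<String, Object> select(Map<String, Object> allFields, List<String> properties) {
        List<String> wanted = (properties == null) ? DEFAULT_PROPERTIES : properties;
        Map<String, Object> selected = new LinkedHashMap<>();
        for (String name : wanted) {
            Object value = allFields.get(name);
            if (value != null) {
                selected.put(name, value);
            }
        }
        return selected;
    }
}
```

Passing `null` models a processor with no properties configured; passing an explicit list such as `List.of("traits.isp")` returns only that field when the database provides it.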

Can you please reply with the fields you would typically want by default? The available fields are:

GeoIP2 Enterprise Database:
"city.name",
"continent.name",
"country.isoCode",
"country.name",
"location.latitude",
"location.longitude",
"location.timeZone",
"mostSpecificSubdivision.isoCode",
"mostSpecificSubdivision.name",
"traits.anonymous",
"traits.anonymousVpn",
"traits.autonomousSystemNumber",
"traits.autonomousSystemOrganization",
"traits.hostingProvider",
"traits.network",
"traits.publicProxy",
"traits.residentialProxy",
"traits.torExitNode",
"city.confidence",
"city.geoNameId",
"city.names",
"continent.code",
"continent.geoNameId",
"continent.names",
"country.confidence",
"country.geoNameId",
"country.inEuropeanUnion",
"country.names",
"leastSpecificSubdivision.confidence",
"leastSpecificSubdivision.geoNameId",
"leastSpecificSubdivision.isoCode",
"leastSpecificSubdivision.name",
"leastSpecificSubdivision.names",
"location.accuracyRadius",
"location.averageIncome",
"location.metroCode",
"location.populationDensity",
"maxMind",
"mostSpecificSubdivision.confidence",
"mostSpecificSubdivision.geoNameId",
"mostSpecificSubdivision.names",
"postal.code",
"postal.confidence",
"registeredCountry.confidence",
"registeredCountry.geoNameId",
"registeredCountry.inEuropeanUnion",
"registeredCountry.isoCode",
"registeredCountry.name",
"registeredCountry.names",
"representedCountry.confidence",
"representedCountry.geoNameId",
"representedCountry.inEuropeanUnion",
"representedCountry.isoCode",
"representedCountry.name",
"representedCountry.names",
"representedCountry.type",
"subdivisions.confidence",
"subdivisions.geoNameId",
"subdivisions.isoCode",
"subdivisions.name",
"subdivisions.names",
"traits.anonymousProxy",
"traits.anycast",
"traits.connectionType",
"traits.domain",
"traits.ipAddress",
"traits.isp",
"traits.legitimateProxy",
"traits.mobileCountryCode",
"traits.mobileNetworkCode",
"traits.organization",
"traits.satelliteProvider",
"traits.staticIpScore",
"traits.userCount",
"traits.userType"

cc @joegallo

@joegallo
Contributor

Closed by #108651

10 participants