
@masseyke
Member

Until now, we have only extracted a small number of fields from the binary files sent to the ingest attachment plugin:

  • content,
  • title,
  • author,
  • keywords,
  • date,
  • content_type,
  • content_length,
  • language.

Tika defines a number of additional standard properties that can be extracted:

  • modified,
  • format,
  • identifier,
  • contributor,
  • coverage,
  • modifier,
  • creator_tool,
  • publisher,
  • relation,
  • rights,
  • source,
  • type,
  • description,
  • print_date,
  • metadata_date,
  • latitude,
  • longitude,
  • altitude,
  • rating,
  • comments.

This commit exposes those new fields.
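To make the change concrete, here is a minimal sketch of an ingest pipeline body that asks the attachment processor to store a mix of the original and the newly exposed fields through its `properties` option. The `field` and `properties` settings are the processor's own options. Everything else is an assumption for illustration: the `attachment_pipeline` helper is hypothetical, `data` is just an example source field name, and the `KNOWN_FIELDS` set mirrors only the two lists in this description, so it may not cover every property the processor accepts.

```python
import json

# Fields the attachment processor extracted before this change (per this PR).
ORIGINAL_FIELDS = {
    "content", "title", "author", "keywords", "date",
    "content_type", "content_length", "language",
}

# Additional Tika standard properties exposed by this change (per this PR).
NEW_FIELDS = {
    "modified", "format", "identifier", "contributor", "coverage",
    "modifier", "creator_tool", "publisher", "relation", "rights",
    "source", "type", "description", "print_date", "metadata_date",
    "latitude", "longitude", "altitude", "rating", "comments",
}

# Assumption: only the names listed in this PR are treated as valid here.
KNOWN_FIELDS = ORIGINAL_FIELDS | NEW_FIELDS


def attachment_pipeline(source_field, properties):
    """Hypothetical helper: build an ingest pipeline body whose attachment
    processor stores only the requested properties.

    Raises ValueError if a requested name is not in KNOWN_FIELDS.
    """
    unknown = [p for p in properties if p not in KNOWN_FIELDS]
    if unknown:
        raise ValueError(f"unknown attachment properties: {unknown}")
    return {
        "description": "Extract selected attachment metadata",
        "processors": [
            {
                "attachment": {
                    "field": source_field,          # field holding base64 data
                    "properties": list(properties), # subset of fields to store
                }
            }
        ],
    }


if __name__ == "__main__":
    # Mixes original fields (content, title) with new ones (modified, creator_tool).
    body = attachment_pipeline("data", ["content", "title", "modified", "creator_tool"])
    # Could be sent as the body of PUT _ingest/pipeline/<name>; the name is up to you.
    print(json.dumps(body, indent=2))
```

The client-side check is only a convenience that keeps typos out of the request; it reflects the field lists above rather than any authoritative registry.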

Related to #22339.

Co-authored-by: Keith Massey <keith.massey@elastic.co>

@masseyke added the :Data Management/Ingest Node and backport labels Nov 29, 2021
@elasticmachine added the Team:Data Management label Nov 29, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@masseyke
Member Author

Relates #78754

@masseyke merged commit a80c8ce into elastic:8.0 Nov 29, 2021
@masseyke deleted the feature/backport-78754 branch November 29, 2021 16:41

Labels

backport, :Data Management/Ingest Node, Team:Data Management, v8.0.0-rc1
