Remove support for Visio and potm files #23214

Merged: dadoonet merged 2 commits into elastic:5.x from fix/22077-ingest-attachment-5x on Feb 17, 2017

Conversation

dadoonet
Member

Related to #22077

This PR comes with two changes: one for ingest-attachment and one for mapper-attachments.
It is essentially a backport of #22079 for the 5.x series.

Ingest Attachment Plugin

  • Send an unsupported document to an ingest pipeline using ingest-attachment.
  • If Tika cannot parse the document because of a missing class (we do not import all the jars Tika needs), Tika throws a Throwable which is not caught.

This commit removes support for Visio and POTM office files.
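As a rough illustration of why an uncaught Throwable is fatal here: the error reported later in this thread, `java.lang.NoClassDefFoundError`, is a subclass of `java.lang.Error`, so a handler written as `catch (Exception e)` does not intercept it. The sketch below is not the code from this PR (the PR removes the parsers rather than adding a catch); the class and method names are hypothetical, and `parse` only stands in for a Tika parser whose dependency jar is missing.

```java
public class UncaughtErrorDemo {

    // Hypothetical stand-in for a parser whose dependency jar is missing at runtime.
    // In a real JVM the class loader raises this error; here it is thrown by hand.
    static String parse(String content) {
        throw new NoClassDefFoundError("com/graphbuilder/curve/Point");
    }

    // Catches Exception only: an Error is not an Exception, so it passes through.
    static String parseCatchingException(String content) {
        try {
            return parse(content);
        } catch (Exception e) {
            return "handled: " + e.getClass().getSimpleName();
        }
    }

    // Catches Throwable: an Error is intercepted as well.
    static String parseCatchingThrowable(String content) {
        try {
            return parse(content);
        } catch (Throwable t) {
            return "handled: " + t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseCatchingThrowable("BASE64CONTENT"));
        try {
            parseCatchingException("BASE64CONTENT");
        } catch (Error e) {
            // Reached because the Exception handler above did not apply.
            System.out.println("escaped: " + e.getClass().getSimpleName());
        }
    }
}
```

Running `main` prints `handled: NoClassDefFoundError` followed by `escaped: NoClassDefFoundError`.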

So Elasticsearch is no longer killed when you run a command like:

GET _ingest/pipeline/_simulate
{
  "pipeline" : {
    "processors" : [
      {
        "attachment" : {
          "field" : "file"
        }
      }
    ]
  },
  "docs" : [
    {
      "_source" : {
        "file" : "BASE64CONTENT"
      }
    }
  ]
}

The good news is that this no longer kills the node, and the text in the Office document can still be extracted even if it contains Visio content (which is no longer extracted).
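In the request above, BASE64CONTENT is a placeholder for the Base64-encoded bytes of the document under test. A minimal sketch of producing that body, assuming nothing beyond the JDK (the class name `SimulateRequestBody` and the sample input are hypothetical, and sending the request is left out):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SimulateRequestBody {

    // Returns the JSON body of the _simulate request, with the document bytes
    // Base64-encoded into the "file" field read by the attachment processor.
    static String build(byte[] documentBytes) {
        // The standard Base64 alphabet (A-Z, a-z, 0-9, '+', '/', '=') needs no JSON escaping.
        String encoded = Base64.getEncoder().encodeToString(documentBytes);
        return "{\n"
            + "  \"pipeline\" : {\n"
            + "    \"processors\" : [\n"
            + "      { \"attachment\" : { \"field\" : \"file\" } }\n"
            + "    ]\n"
            + "  },\n"
            + "  \"docs\" : [\n"
            + "    { \"_source\" : { \"file\" : \"" + encoded + "\" } }\n"
            + "  ]\n"
            + "}";
    }

    public static void main(String[] args) {
        // Hypothetical sample input; a real check would read an Office file from disk.
        byte[] sample = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(build(sample));
    }
}
```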

Mapper Attachments Plugin

  • Parse an unsupported document using mapper-attachments.
  • If Tika cannot parse the document because of a missing class (we do not import all the jars Tika needs), Tika throws a Throwable which is not caught.

This commit removes support for Visio and POTM office files.

The good news is that this no longer kills the node, and the text in the Office document can still be extracted even if it contains Visio content (which is no longer extracted).

Note that for this one, as we have not yet applied #22963, it hides the fact that we removed the potm sample file from the big Tika ZIP file.

@dadoonet
Member Author

As #22963 has been merged, I pushed new changes which make it more obvious that we are removing support for potm files.

@jasontedor It's ready for review.

@clintongormley I'd like to push it to the 5.3 and 5.2 branches as well, as it's a bug fix. Do you agree?

@jasontedor
Member

Can you please rebase this as currently there are conflicts with the target branch?

I'm fine with this going into 5.3, I'm not fine with this going into 5.2.

@dadoonet dadoonet removed the v5.2.2 label Feb 16, 2017
@dadoonet
Member Author

> Can you please rebase this as currently there are conflicts with the target branch?

Can you refresh your browser? I don't see the conflict on my side.

@jasontedor
Member

> Can you refresh your browser? I don't see the conflict on my side.

It's not a browser refresh issue.

12:54:55 [jason:~/src/elastic/elasticsearch-5.x] 5.x+ ± git fetch origin 5.x
From github.com:elastic/elasticsearch
 * branch                  5.x        -> FETCH_HEAD
12:55:01 [jason:~/src/elastic/elasticsearch-5.x] 5.x+ ± git fetch origin pull/23214/head:pr/23214
remote: Counting objects: 72, done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 72 (delta 12), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (72/72), done.
From github.com:elastic/elasticsearch
 * [new ref]               refs/pull/23214/head -> pr/23214
12:55:27 [jason:~/src/elastic/elasticsearch-5.x] 5.x+ ± git checkout pr/23214
Switched to branch 'pr/23214'
12:55:34 [jason:~/src/elastic/elasticsearch-5.x] pr/23214+ ± git rebase 5.x
First, rewinding head to replay your work on top of it...
Applying: Remove support for Visio and potm files
Applying: Remove support for Visio and potm files
Using index info to reconstruct a base tree...
A	plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/tika-files.zip
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/tika-files.zip deleted in HEAD and modified in Remove support for Visio and potm files. Version Remove support for Visio and potm files of plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/tika-files.zip left in tree.
error: Failed to merge in the changes.
Patch failed at 0002 Remove support for Visio and potm files
The copy of the patch that failed is found in: /Users/jason/src/elastic/elasticsearch/.git/worktrees/elasticsearch-5.x/rebase-apply/patch

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".

12:55:39 [jason:~/src/elastic/elasticsearch-5.x] 954f7c2add(+0/-0)+ REBASING 128 ± git status
rebase in progress; onto 5ac2f9f311
You are currently rebasing branch 'pr/23214' on '5ac2f9f311'.
  (fix conflicts and then run "git rebase --continue")
  (use "git rebase --skip" to skip this patch)
  (use "git rebase --abort" to check out the original branch)

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   plugins/mapper-attachments/build.gradle
	modified:   plugins/mapper-attachments/src/main/java/org/elasticsearch/mapper/attachments/TikaImpl.java
	modified:   plugins/mapper-attachments/src/test/java/org/elasticsearch/mapper/attachments/VariousDocTests.java
	new file:   plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/sample-files/issue-22077.doc
	new file:   plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/sample-files/issue-22077.docx
	new file:   plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/sample-files/issue-22077.vsdx

Unmerged paths:
  (use "git reset HEAD <file>..." to unstage)
  (use "git add/rm <file>..." as appropriate to mark resolution)

	deleted by us:   plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/tika-files.zip

12:55:43 [jason:~/src/elastic/elasticsearch-5.x] 954f7c2add(+0/-0)+ REBASING ± 

@dadoonet
Member Author

Hmmm. Why do you rebase instead of merging?

I understand why rebase is failing on the second commit but it should not be an issue because of the next commits.

@jasontedor
Member

> Hmmm. Why do you rebase instead of merging?

Because that's what you're going to do when you merge the commit in, and that's why GitHub is already showing you the conflict.

@jasontedor
Member

> I understand why rebase is failing on the second commit but it should not be an issue because of the next commits.

Also, that's not how rebasing works. Rebasing replays every commit and halts on conflict.
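The difference between the two views can be sketched with a deliberately simplified model; this is not git's actual algorithm, and the class, method names, and shortened paths are hypothetical. Rebase replays each commit in turn against the upstream tree and stops at the first one that modifies a path upstream has deleted, while a merge compares only the end states of both sides against their common base. Plain `git rebase` also leaves out merge commits by default, so the later merge of 5.x into the PR branch is not replayed.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class RebaseVsMergeSketch {

    enum Kind { ADD, MODIFY, DELETE }

    static final class Change {
        final String path;
        final Kind kind;
        Change(String path, Kind kind) { this.path = path; this.kind = kind; }
    }

    // Replays commits one by one onto the upstream tree and halts at the first
    // commit that modifies a path missing from that tree (a modify/delete conflict).
    // Returns the 1-based number of the failing commit, or 0 if all commits apply.
    static int rebase(Set<String> upstreamPaths, List<List<Change>> commits) {
        Set<String> tree = new HashSet<>(upstreamPaths);
        for (int i = 0; i < commits.size(); i++) {
            for (Change c : commits.get(i)) {
                if (c.kind == Kind.MODIFY && !tree.contains(c.path)) {
                    return i + 1;
                }
                if (c.kind == Kind.ADD) tree.add(c.path);
                if (c.kind == Kind.DELETE) tree.remove(c.path);
            }
        }
        return 0;
    }

    // Three-way merge over end states only: maps are path -> content version,
    // and an absent key means the path does not exist on that side.
    // Returns the paths that both sides changed in different ways.
    static Set<String> mergeConflicts(Map<String, Integer> base,
                                      Map<String, Integer> ours,
                                      Map<String, Integer> theirs) {
        Set<String> paths = new TreeSet<>(base.keySet());
        paths.addAll(ours.keySet());
        paths.addAll(theirs.keySet());
        Set<String> conflicts = new TreeSet<>();
        for (String p : paths) {
            Integer b = base.get(p), o = ours.get(p), t = theirs.get(p);
            boolean sameResult = Objects.equals(o, t);
            boolean onlyOneSideChanged = Objects.equals(b, o) || Objects.equals(b, t);
            if (!sameResult && !onlyOneSideChanged) conflicts.add(p);
        }
        return conflicts;
    }

    // Upstream (5.x) has deleted tika-files.zip; the second PR commit modifies it.
    static int demoRebase() {
        List<List<Change>> commits = List.of(
            List.of(new Change("issue-22077.docx", Kind.ADD)),
            List.of(new Change("tika-files.zip", Kind.MODIFY)));
        return rebase(Set.of(), commits);
    }

    // The branch tip already contains a merge of 5.x, so the zip is gone on both sides.
    static Set<String> demoMerge() {
        Map<String, Integer> base = Map.of("tika-files.zip", 1);
        Map<String, Integer> ours = Map.of();
        Map<String, Integer> theirs = Map.of("issue-22077.docx", 1);
        return mergeConflicts(base, ours, theirs);
    }

    public static void main(String[] args) {
        System.out.println("rebase halts at commit: " + demoRebase());
        System.out.println("merge conflicts: " + demoMerge());
    }
}
```

In this model the rebase halts at commit 2, in line with "Patch failed at 0002" in the log above, while the merge reports no conflicting paths.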

@dadoonet
Member Author

[image: img_0182]

But GitHub does not show a conflict. But ok I'll look at it tomorrow

@dadoonet
Member Author

Note that I merged 5.x in my branch with the 3rd commit.

@jasontedor
Member

It does when I look:

[image: screen shot 2017-02-16 at 1 47 02 pm]

And anyway, I showed from the command line where the conflict is. 😄

@dadoonet
Member Author

I understand why you see this conflict and I don't. You defined rebase and merge as the default way to merge. I didn't.

So if you don't rebase there is no conflict.

Member

@jasontedor jasontedor left a comment

LGTM. Please squash when merging.

Commits in this pull request:

  • Remove support for Visio and potm files (ingest-attachment). Related to elastic#22077. Backport of elastic#22079 in 5.x branch (5.3).
  • Remove support for Visio and potm files (mapper-attachments). Related to elastic#22077 and elastic#22079 for mapper-attachments plugin.
@dadoonet dadoonet force-pushed the fix/22077-ingest-attachment-5x branch from 3758b7a to 64953e3 February 17, 2017 08:11
@dadoonet dadoonet merged commit 64953e3 into elastic:5.x Feb 17, 2017
@dadoonet dadoonet deleted the fix/22077-ingest-attachment-5x branch February 17, 2017 08:13
@dadoonet
Member Author

Thanks @jasontedor. I rebased and merged.

dadoonet added a commit that referenced this pull request Feb 20, 2017
Remove support for Visio and potm files (ingest-attachment)

Related to #22077

Backport of #23214 in 5.2 branch
dadoonet added a commit that referenced this pull request Feb 20, 2017
Remove support for Visio and potm files (mapper-attachments)

Related to #22077 and #22079 for mapper-attachments plugin

Backport of #23214 in 5.2 branch
@dadoonet
Member Author

As discussed with @clintongormley, this has also been pushed to the 5.2 branch.

Related commits:

  • 76a977a: Remove support for Visio and potm files (ingest-attachment)
  • 07a9f29: Remove support for Visio and potm files (mapper-attachments)
  • 3fda86a: Replace tika-files.zip by a tika-files dir (ingest-attachment tests)
  • 0561d1b: Replace tika-files.zip by a tika-files dir (mapper-attachments tests)

@dumakant

I am using Elasticsearch 5.1.2 and am facing the same error:

java.lang.NoClassDefFoundError: com/graphbuilder/curve/Point

Can your fixes be backported to 5.1.2 as well? Also, our instances are in production; how can we consume the fixes?

@dadoonet
Member Author

@dumakant No, it won't be backported.

You need to upgrade to 5.2.2, which is pretty straightforward with a rolling upgrade (even easier if you are using Elastic Cloud).

@RkirkCBD

I'm running 5.3.0 and I am receiving this error as a fatal error. Is the fix in the 5.3 release?

[2017-04-22T10:54:58,532][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [es-node-docserver] fatal error in thread [elasticsearch[es-node-docserver][bulk][T#9]], exiting
java.lang.NoClassDefFoundError: com/graphbuilder/curve/Point

@dadoonet
Member Author

OMG! Apparently I did not push that fix to the 5.3 branch. So it's fixed in 5.2 and 5.4 but not 5.3...

@dadoonet
Member Author

@RkirkCBD Could you open a new issue? I'll push the fix to 5.3 later today, and it will hopefully go into 5.3.2.

Thanks a lot for reporting!

@dadoonet dadoonet removed the v5.3.0 label Apr 23, 2017
@RkirkCBD

RkirkCBD commented Apr 23, 2017

See #24273. Thanks. When will 5.3.2 be released? Is it possible to downgrade my cluster to 5.2.2?

@dadoonet
Member Author

AFAIK soonish. No, you can't downgrade.

dadoonet added a commit that referenced this pull request Apr 23, 2017
Remove support for Visio and potm files (ingest-attachment)

Related to #22077

Backport of #23214 in 5.3 branch

(cherry picked from commit 76a977a)
@dadoonet
Member Author

For the record, I pushed the missing commit in 5.3 as well: dc4888e

Apparently I pushed only one of the 2 commits from this PR in the 5.3 branch... :(

@clintongormley clintongormley added :Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP) and :Search Foundations/Mapping (Index mappings, including merging and defining field types), and removed :Plugin Ingest Attachment labels Feb 13, 2018
Labels
  • >bug
  • :Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP)
  • :Search Foundations/Mapping (Index mappings, including merging and defining field types)
  • v5.2.2
  • v5.4.0
5 participants