Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove support for Visio and potm files #23214

Merged
merged 2 commits into from Feb 17, 2017

Conversation

@dadoonet
Copy link
Member

dadoonet commented Feb 16, 2017

Related to #22077

This PR comes with 2 changes, one for ingest-attachment and the other for mapper-attachments.
It's essentially a backport of #22079 for 5.x series.

Ingest Attachment Plugin

  • Send a non supported document to an ingest pipeline using ingest-attachment
  • If Tika is not able to parse the document because of a missing class (we are not importing all jars needed by Tika), Tika throws a Throwable which is not catch.

This commit removes support for Visio and POTM office files.

So elasticsearch is not killed anymore when you run a command like:

GET _ingest/pipeline/_simulate
{
  "pipeline" : {
    "processors" : [
      {
        "attachment" : {
          "field" : "file"
        }
      }
    ]
  },
  "docs" : [
    {
      "_source" : {
        "file" : "BASE64CONTENT"
      }
    }
  ]
}

The good news is that it does not kill the node anymore and allows to extract the text which is in the Office document even if we have a Visio content (which is not extracted anymore).

Mapper Attachments Plugin

  • Parse a non supported document using mapper-attachments
  • If Tika is not able to parse the document because of a missing class (we are not importing all jars needed by Tika), Tika throws a Throwable which is not catch.

This commit removes support for Visio and POTM office files.

The good news is that it does not kill the node anymore and allows to extract the text which is in the Office document even if we have a Visio content (which is not extracted anymore).

Note that for this one as we did not apply yet #22963 it hides the fact that we removed the potm sample file from the tika big ZIP file.

@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Feb 16, 2017

as #22963 has been merged, I pushed new changes which makes more obvious that we are removing support for potm files.

@jasontedor It's ready for review.

@clintongormley I'd like to push it as well in 5.3 and 5.2 branches as it's a bug fix. Do you agree?

@jasontedor

This comment has been minimized.

Copy link
Member

jasontedor commented Feb 16, 2017

Can you please rebase this as currently there are conflicts with the target branch?

I'm fine with thing going into 5.3, I'm not fine with this going into 5.2.

@dadoonet dadoonet removed the v5.2.2 label Feb 16, 2017
@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Feb 16, 2017

Can you please rebase this as currently there are conflicts with the target branch?

Can you refresh your browser? I don't see the conflict on my side.

@jasontedor

This comment has been minimized.

Copy link
Member

jasontedor commented Feb 16, 2017

Can you refresh your browser? I don't see the conflict on my side.

It's not a browser refresh issue.

12:54:55 [jason:~/src/elastic/elasticsearch-5.x] 5.x+ ± git fetch origin 5.x
From github.com:elastic/elasticsearch
 * branch                  5.x        -> FETCH_HEAD
12:55:01 [jason:~/src/elastic/elasticsearch-5.x] 5.x+ ± git fetch origin pull/23214/head:pr/23214
remote: Counting objects: 72, done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 72 (delta 12), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (72/72), done.
From github.com:elastic/elasticsearch
 * [new ref]               refs/pull/23214/head -> pr/23214
12:55:27 [jason:~/src/elastic/elasticsearch-5.x] 5.x+ ± git checkout pr/23214
Switched to branch 'pr/23214'
12:55:34 [jason:~/src/elastic/elasticsearch-5.x] pr/23214+ ± git rebase 5.x
First, rewinding head to replay your work on top of it...
Applying: Remove support for Visio and potm files
Applying: Remove support for Visio and potm files
Using index info to reconstruct a base tree...
A	plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/tika-files.zip
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/tika-files.zip deleted in HEAD and modified in Remove support for Visio and potm files. Version Remove support for Visio and potm files of plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/tika-files.zip left in tree.
error: Failed to merge in the changes.
Patch failed at 0002 Remove support for Visio and potm files
The copy of the patch that failed is found in: /Users/jason/src/elastic/elasticsearch/.git/worktrees/elasticsearch-5.x/rebase-apply/patch

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".

12:55:39 [jason:~/src/elastic/elasticsearch-5.x] 954f7c2add(+0/-0)+ REBASING 128 ± git status
rebase in progress; onto 5ac2f9f311
You are currently rebasing branch 'pr/23214' on '5ac2f9f311'.
  (fix conflicts and then run "git rebase --continue")
  (use "git rebase --skip" to skip this patch)
  (use "git rebase --abort" to check out the original branch)

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   plugins/mapper-attachments/build.gradle
	modified:   plugins/mapper-attachments/src/main/java/org/elasticsearch/mapper/attachments/TikaImpl.java
	modified:   plugins/mapper-attachments/src/test/java/org/elasticsearch/mapper/attachments/VariousDocTests.java
	new file:   plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/sample-files/issue-22077.doc
	new file:   plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/sample-files/issue-22077.docx
	new file:   plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/sample-files/issue-22077.vsdx

Unmerged paths:
  (use "git reset HEAD <file>..." to unstage)
  (use "git add/rm <file>..." as appropriate to mark resolution)

	deleted by us:   plugins/mapper-attachments/src/test/resources/org/elasticsearch/index/mapper/attachment/test/tika-files.zip

12:55:43 [jason:~/src/elastic/elasticsearch-5.x] 954f7c2add(+0/-0)+ REBASING ± 
@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Feb 16, 2017

Hmmm. Why do you rebase instead of merging?

I understand why rebase is failing on the second commit but it should not be an issue because of the next commits.

@jasontedor

This comment has been minimized.

Copy link
Member

jasontedor commented Feb 16, 2017

Hmmm. Why do you rebase instead of merging?

Because that's what you're going to go when you merge the commit it, and that's why GitHub is already showing you the conflict.

@jasontedor

This comment has been minimized.

Copy link
Member

jasontedor commented Feb 16, 2017

I understand why rebase is failing on the second commit but it should not be an issue because of the next commits.

Also, that's not how rebasing works. Rebasing replays every commit and halts on conflict.

@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Feb 16, 2017

img_0182

But GitHub does not show a conflict. But ok I'll look at it tomorrow

@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Feb 16, 2017

Note that I merged 5.x in my branch with the 3rd commit.

@jasontedor

This comment has been minimized.

Copy link
Member

jasontedor commented Feb 16, 2017

It does when I look:

screen shot 2017-02-16 at 1 47 02 pm

And anyway, I showed from the command line where the conflict is. 😄

@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Feb 16, 2017

I understand why you see this conflict and I don't. You defined rebase and merge as the default way to merge. I didn't.

So if you don't rebase there is no conflict.

Copy link
Member

jasontedor left a comment

LGTM. Please squash when merging.

dadoonet added 2 commits Feb 16, 2017
* Send a non supported document to an ingest pipeline using `ingest-attachment`
* If Tika is not able to parse the document because of a missing class (we are not importing all jars needed by Tika), Tika throws a Throwable which is not catch.

This commit removes support for Visio and POTM office files.

So elasticsearch is not killed anymore when you run a command like:

```
GET _ingest/pipeline/_simulate
{
  "pipeline" : {
    "processors" : [
      {
        "attachment" : {
          "field" : "file"
        }
      }
    ]
  },
  "docs" : [
    {
      "_source" : {
        "file" : "BASE64CONTENT"
      }
    }
  ]
}
```

The good news is that it does not kill the node anymore and allows to extract the text which is in the Office document even if we have a Visio content (which is not extracted anymore).

Related to #22077

Backport of #22079 in 5.x branch (5.3)
* Parse a non supported document using `mapper-attachments`
* If Tika is not able to parse the document because of a missing class (we are not importing all jars needed by Tika), Tika throws a Throwable which is not catch.

This commit removes support for Visio and POTM office files.

The good news is that it does not kill the node anymore and allows to extract the text which is in the Office document even if we have a Visio content (which is not extracted anymore).

Related to #22077 and #22079 for mapper-attachments plugin
@dadoonet dadoonet force-pushed the dadoonet:fix/22077-ingest-attachment-5x branch to 64953e3 Feb 17, 2017
@dadoonet dadoonet merged commit 64953e3 into elastic:5.x Feb 17, 2017
1 check passed
1 check passed
CLA Commit author is a member of Elasticsearch
Details
@dadoonet dadoonet deleted the dadoonet:fix/22077-ingest-attachment-5x branch Feb 17, 2017
@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Feb 17, 2017

Thanks @jasontedor. I rebased and merged.

@dadoonet dadoonet added the v5.2.2 label Feb 20, 2017
dadoonet added a commit that referenced this pull request Feb 20, 2017
* Send a non supported document to an ingest pipeline using `ingest-attachment`
* If Tika is not able to parse the document because of a missing class (we are not importing all jars needed by Tika), Tika throws a Throwable which is not catch.

This commit removes support for Visio and POTM office files.

So elasticsearch is not killed anymore when you run a command like:

```
GET _ingest/pipeline/_simulate
{
  "pipeline" : {
    "processors" : [
      {
        "attachment" : {
          "field" : "file"
        }
      }
    ]
  },
  "docs" : [
    {
      "_source" : {
        "file" : "BASE64CONTENT"
      }
    }
  ]
}
```

The good news is that it does not kill the node anymore and allows to extract the text which is in the Office document even if we have a Visio content (which is not extracted anymore).

Related to #22077

Backport of #23214 in 5.2 branch
dadoonet added a commit that referenced this pull request Feb 20, 2017
* Parse a non supported document using `mapper-attachments`
* If Tika is not able to parse the document because of a missing class (we are not importing all jars needed by Tika), Tika throws a Throwable which is not catch.

This commit removes support for Visio and POTM office files.

The good news is that it does not kill the node anymore and allows to extract the text which is in the Office document even if we have a Visio content (which is not extracted anymore).

Related to #22077 and #22079 for mapper-attachments plugin

Backport of #23214 in 5.2 branch
@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Feb 20, 2017

As discussed with @clintongormley, this has been also pushed to 5.2 branch.

Related commits:

  • 76a977a: Remove support for Visio and potm files (ingest-attachment)
  • 07a9f29: Remove support for Visio and potm files (mapper-attachments)
  • 3fda86a: Replace tika-files.zip by a tika-files dir (ingest-attachment tests)
  • 0561d1b: Replace tika-files.zip by a tika-files dir (mapper-attachments tests)
@dumakant

This comment has been minimized.

Copy link

dumakant commented Mar 22, 2017

I am using Elasticsearch 5.1.2 and facing the same error

java.lang.NoClassDefFoundError: com/graphbuilder/curve/Point.
Can your fixes be backported to 5.1.2 as well. Also our instances are in production, how can we consume the fixes?

@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Mar 22, 2017

@dumakant No it won't be backported.

You need to upgrade to 5.2.2 which is pretty much straightforward with a rolling upgrade (even easier if you are using elastic cloud).

@RkirkCBD

This comment has been minimized.

Copy link

RkirkCBD commented Apr 22, 2017

I'm running 5.30 and I am receiving this error as a fatal error. Is the fix in the 5.3 release?

[2017-04-22T10:54:58,532][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [es-node-docserver] fatal error in thread [elasticsearch[es-node-docserver][bulk][T#9]], exiting
java.lang.NoClassDefFoundError: com/graphbuilder/curve/Point

@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Apr 23, 2017

OMG! I did not push that fix in 5.3 branch apparently. So it's fixed in 5.2, 5.4 but not 5.3...

@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Apr 23, 2017

@RkirkCBD could you open a new issue and I'll push later today the fix in 5.3 which will go hopefully in 5.3.2?

Thanks a lot for reporting !

@dadoonet dadoonet removed the v5.3.0 label Apr 23, 2017
@RkirkCBD

This comment has been minimized.

Copy link

RkirkCBD commented Apr 23, 2017

See #24273. Thanks, when should 5.3.2 be released. Is it possible to downgrade my cluster to 5.2.2?

@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Apr 23, 2017

AFAIK soonish. No you can't downgrade.

dadoonet added a commit that referenced this pull request Apr 23, 2017
* Send a non supported document to an ingest pipeline using `ingest-attachment`
* If Tika is not able to parse the document because of a missing class (we are not importing all jars needed by Tika), Tika throws a Throwable which is not catch.

This commit removes support for Visio and POTM office files.

So elasticsearch is not killed anymore when you run a command like:

```
GET _ingest/pipeline/_simulate
{
  "pipeline" : {
    "processors" : [
      {
        "attachment" : {
          "field" : "file"
        }
      }
    ]
  },
  "docs" : [
    {
      "_source" : {
        "file" : "BASE64CONTENT"
      }
    }
  ]
}
```

The good news is that it does not kill the node anymore and allows to extract the text which is in the Office document even if we have a Visio content (which is not extracted anymore).

Related to #22077

Backport of #23214 in 5.3 branch

(cherry picked from commit 76a977a)
@dadoonet

This comment has been minimized.

Copy link
Member Author

dadoonet commented Apr 23, 2017

For the record, I pushed the missing commit in 5.3 as well: dc4888e

Apparently I pushed only one of the 2 commits from this PR in the 5.3 branch... :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.