JAMES-2910 HTML could be indexed directly in ElasticSearch #2739
The size of indexed JSON documents decreases significantly for emails containing HTML (e.g. from 196 KB to 8.6 KB).
This avoids needless network traffic.
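To illustrate why indexing extracted text instead of raw HTML shrinks the document, here is a minimal sketch. It is not the code from this change set: the class `HtmlTextSketch` and its methods are hypothetical, and the regex-based tag stripping is a deliberately naive stand-in for a real HTML parser (the files touched here suggest the PR itself uses a Jsoup-based extractor). It does not decode HTML entities.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Apache James.
final class HtmlTextSketch {
    // Matches <script> and <style> elements together with their content.
    private static final Pattern NON_TEXT =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // Matches any remaining tag.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Naive text extraction: drop script/style content, strip tags, collapse whitespace.
    static String extractText(String html) {
        String withoutCode = NON_TEXT.matcher(html).replaceAll(" ");
        String withoutTags = TAG.matcher(withoutCode).replaceAll(" ");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    // Size of the string once encoded as UTF-8, i.e. roughly what goes over the wire.
    static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p { color: red; }</style></head>"
            + "<body><p>Hello <b>James</b></p></body></html>";
        String text = extractText(html);
        System.out.println(text);
        System.out.println(utf8Size(html) + " bytes of HTML -> " + utf8Size(text) + " bytes of text");
    }
}
```

Markup, styles and scripts carry no searchable content, so only the extracted text needs to reach the index. On real HTML emails the ratio is typically far larger than in this toy input.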
IndexAttachments.NO);
String convertToJsonWithoutAttachment = messageToElasticSearchJson.convertToJsonWithoutAttachment(message, ImmutableList.of(USER));

System.out.println(convertToJsonWithoutAttachment);
Leftover debug statement.
mailbox/tika/src/main/java/org/apache/james/mailbox/tika/TikaTextExtractor.java
mvn test 🍏 test this please
mailbox/store/src/main/java/org/apache/james/mailbox/store/extractor/JsoupTextExtractor.java
.../guice/cassandra-guice/src/main/java/org/apache/james/modules/mailbox/TikaMailboxModule.java
mailbox/store/src/main/java/org/apache/james/mailbox/store/extractor/JsoupTextExtractor.java
mailbox/tika/src/main/java/org/apache/james/mailbox/tika/TikaTextExtractor.java
mailbox/tika/src/test/java/org/apache/james/mailbox/tika/TikaTextExtractorTest.java
mailbox/store/src/main/java/org/apache/james/mailbox/store/extractor/JsoupTextExtractor.java
MemorySpamAssassinContractTest.spamShouldBeDeliveredInSpamMailboxOrInboxWhenMultipleRecipientsConfigurations » ConditionTimeout
test this please
mailbox/store/src/main/java/org/apache/james/mailbox/store/extractor/JsoupTextExtractor.java
mailbox/tika/src/test/java/org/apache/james/mailbox/tika/TikaTextExtractorTest.java
test this please
mailbox/tika/src/test/java/org/apache/james/mailbox/tika/TikaTextExtractorTest.java
The email corpus we tested this against contains neither HTML nor attachments, and thus is not significant for evaluating the performance of this change set.
Merged