Unsupported date format while parsing Exchange Emails #22
Comments
This means the date string you're supplying in the source document is not auto-recognized by ES. Please see http://www.elasticsearch.org/guide/reference/mapping/date-format.html and supply the correct format string in the mapping for your date field. FYI, issues are not meant to be opened for general ES usage questions. Please post to the mailing list if you're still having trouble. |
So people who work on the elasticsearch-mapper-attachment plugin expect me to take an email pulled directly out of Outlook and CHANGE the dates inside of it? I assure you I didn't send any malformed dates into ElasticSearch. This was an email that Outlook created that I stored as an attachment. |
Gosh, I totally overlooked that this was for the mapper plugin and not core ES. These github notifications come from all over the place! This is either an issue with Tika or our integration with it. I'll reopen so we can take a look. |
Thanks! |
Same problem when storing web pages. For example, I get this error when trying to index http://www.unm.edu/ Is there any workaround to get something working? |
@scstarkey @tpatris can you provide us with some sample data that you are indexing and that fails? My assumption is that Tika is extracting the date from your document but storing it incorrectly. You might be able to change the date format inside the attachment plugin like this (just a wild guess, but worth a try):
Note: The format above needs to be changed according to http://www.elasticsearch.org/guide/reference/mapping/date-format/. I hope this helps, but in any case, please post your samples here so we can make sure it is not a different bug we are chasing. |
I cannot paste the full HTML that I want to index here, but you can view it by pressing Ctrl+U in your browser on the page http://www.unm.edu/. My error is:
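The snippet referred to in the comment above is not preserved in this thread. As a rough sketch of the idea only, a mapping that sets an explicit format on the attachment's `date` sub-field could be built as below. The type name `doc`, the field name `file`, the helper name, and the RFC-822-style pattern are all hypothetical, and whether the plugin honors the override depends on the plugin version (see the discussion further down).

```javascript
// Sketch only: builds a mapping body for an attachment field whose `date`
// sub-field accepts more than one date format.
// Assumptions (not from the thread): the type name "doc", the field name
// "file", and the custom pattern below are hypothetical.
function buildAttachmentMapping(typeName, fieldName, dateFormats) {
  return {
    [typeName]: {
      properties: {
        [fieldName]: {
          type: "attachment",
          fields: {
            // Multiple formats are joined with "||" in the format string.
            date: { type: "date", format: dateFormats.join("||") }
          }
        }
      }
    }
  };
}

const mapping = buildAttachmentMapping("doc", "file", [
  "dateOptionalTime",            // the default format reported in this thread
  "EEE, dd MMM yyyy HH:mm:ss Z"  // hypothetical custom pattern
]);

console.log(JSON.stringify(mapping, null, 2));
```

The pattern has to match whatever Tika actually extracts from the document, so it should be adjusted against real failing samples rather than copied as-is.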
My mapping is:
|
Hey, looking at the HTML source, specifically at this line
shows a custom date format, which needs to be configured explicitly, as mentioned in my last post. See http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html for possible options. However I am a bit unsure about that format, the |
Hello Alexander, First, thanks for your answer, and sorry for the delay since my last message. Unfortunately, your answer doesn't really address our problem. Let me explain exactly what it is: we are building a bookmarking tool, and when a user bookmarks a URL, we index the full page. This means we don't know which date formats will appear in the page content. So my question is: how can we get rid of this error without doing anything specific about date formats? Thanks |
Yes, is there a simple way to ignore or override the date if one doesn't care about the formats? I mean, can't we just store it as a string? I tried the following mapping: {
"files-type": {
"properties": {
"content": {
"type": "attachment",
"fields": {
"content": {
"store": "yes",
"include_in_all": true,
"term_vector": "with_positions_offsets"
},
"date" : { "type": "string" }
}
}
}
}
} But the attachment type seems to override the one I give explicitly and changes it back to {
"properties": {
"content": {
"fields": {
"author": {
"type": "string"
},
"content": {
"include_in_all": true,
"store": "yes",
"term_vector": "with_positions_offsets",
"type": "string"
},
"content_type": {
"type": "string"
},
"date": {
"format": "dateOptionalTime",
"type": "date"
},
"keywords": {
"type": "string"
},
"name": {
"type": "string"
},
"title": {
"type": "string"
}
},
"path": "full",
"type": "attachment"
}
}
} |
If you define some specific mapping for your file content, such as the following:

```javascript
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "date": { "type": "string" }
        }
      }
    }
  }
}
```

And then, if you ask for the mapping back, you get:

```javascript
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "file": { "type": "string" },
          "author": { "type": "string" },
          "title": { "type": "string" },
          "name": { "type": "string" },
          "date": { "type": "date", "format": "dateOptionalTime" },
          "keywords": { "type": "string" },
          "content_type": { "type": "string" }
        }
      }
    }
  }
}
```

All your settings have been overwritten by the mapper plugin. See also issue #22 where the issue was found. Closes #39.
Has anyone tested mapper 1.9? Closing this issue, but feel free to reopen if the error still occurs. |
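To check whether a given plugin version still overrides explicit sub-field settings, one option is to compare the `fields` object you sent with the one returned when you read the mapping back. The helper below is a minimal sketch of that comparison; its name and the shape of its result are hypothetical, and it only looks at the `type` of each sub-field.

```javascript
// Sketch only: compare the sub-field types you requested with the ones the
// cluster reports back, to spot overrides like the `date` case above.
// `requestedFields` and `returnedFields` are the "fields" objects of the
// two mappings; the function name is hypothetical.
function findOverriddenFields(requestedFields, returnedFields) {
  const overridden = [];
  for (const [name, requested] of Object.entries(requestedFields)) {
    // Only compare sub-fields for which an explicit type was requested.
    if (requested.type === undefined) continue;
    const returned = returnedFields[name];
    if (!returned || returned.type !== requested.type) {
      overridden.push({
        field: name,
        requested: requested.type,
        returned: returned ? returned.type : undefined
      });
    }
  }
  return overridden;
}

// The two "fields" objects from the example above (abbreviated).
const requested = { date: { type: "string" } };
const returned = {
  file: { type: "string" },
  date: { type: "date", format: "dateOptionalTime" },
  title: { type: "string" }
};

console.log(findOverriddenFields(requested, returned));
```

With the mappings quoted above, the only entry reported is `date`, requested as `string` and returned as `date`. An empty result would suggest the override was respected.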
same question |
Sorry, but how is this related to the mapper attachment plugin? I mean, that starttime is not generated by the mapper plugin, right? That said, I'm pretty sure your mapping has not been applied, as the date parser is still using the default format. I'd open a thread on the mailing list and provide a full script which shows exactly what you are doing, so we can help you there. If you think it's definitely related to the mapper plugin, you can open a new issue and provide all the same details I just mentioned. |
This format showed up while parsing a bunch of our customers' data:
It is keeping a bunch of our emails from getting parsed. Sadness!