This repository has been archived by the owner on Jun 20, 2023. It is now read-only.

Unsupported date format while parsing Exchange Emails #22

Closed
coreagile opened this issue Jan 29, 2013 · 14 comments

Comments

@coreagile

This format showed up while parsing a bunch of our customers' data:

Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to parse date 
field [16 Aug 2010 13:20:42 +0200], tried both date format [dateOptionalTime], and 
timestamp number0

It is keeping a bunch of our emails from getting parsed. Sadness!

@drewr

drewr commented Jan 29, 2013

This means ES does not auto-recognize the date string you're supplying in the source document. Please see http://www.elasticsearch.org/guide/reference/mapping/date-format.html and supply the correct format string in the mapping for your date field.
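
For a date value that the client sends itself, one option is to normalize it to ISO 8601 before indexing. The following is only a minimal sketch under that assumption: the helper name `to_iso8601` is hypothetical, and it does not help with dates that Tika extracts from an attachment on the server side.

```python
from email.utils import parsedate_to_datetime


def to_iso8601(raw: str) -> str:
    """Convert an RFC 2822-style date string (as used in email headers)
    into an ISO 8601 string with a UTC offset."""
    return parsedate_to_datetime(raw).isoformat()


# The value from the error message in this issue.
print(to_iso8601("16 Aug 2010 13:20:42 +0200"))  # 2010-08-16T13:20:42+02:00
```

The resulting string is plain ISO 8601, which is the shape the default `dateOptionalTime` format expects.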

FYI, issues are not meant to be opened for general ES usage questions. Please post to the mailing list if you're still having trouble.

@drewr drewr closed this as completed Jan 29, 2013
@coreagile
Author

So people who work on the elasticsearch-mapper-attachment plugin expect me to take an email pulled directly out of Outlook and CHANGE the dates inside of it? I assure you I didn't send any malformed dates into ElasticSearch. This was an email that Outlook created that I stored as an attachment.

@drewr

drewr commented Jan 29, 2013

Gosh, I totally overlooked that this was for the mapper plugin and not core ES. These github notifications come from all over the place!

This is either an issue with Tika or our integration with it. I'll reopen so we can take a look.

@drewr drewr reopened this Jan 29, 2013
@coreagile
Author

Thanks!

@ghost

ghost commented May 14, 2013

I have the same problem when storing web pages.

For example, I have this problem when trying to index http://www.unm.edu/

Any workaround to have something working?

@spinscale

@scstarkey @tpatris

Can you provide some sample data that fails when you index it? My assumption is that Tika extracts the date from your document but stores it in the wrong format.

You might be able to change the date formattings inside of the attachment plugin like this (just a wild guess, but worth a try):

{
    "person" : {
        "properties" : {
            "file" : { 
                "type" : "attachment",
                "fields" : {
                    "date" : {"format" : "yyyy/MM/dd||date_optional_time||date_time"}
                }
            }
        }
    }
}

Note: The format above needs to be changed, according to http://www.elasticsearch.org/guide/reference/mapping/date-format/

I hope this helps. In any case, please post your samples here so we can make sure we are not chasing a different bug.

@ghost

ghost commented May 15, 2013

I cannot paste the full HTML that I want to index here, but you can view it by pressing Ctrl+U in your browser on the page http://www.unm.edu/.

My error is:

{"error"=>"MapperParsingException[Failed to parse [content.date]]; nested: MapperParsingException[failed to parse date field [Tue, 14 May 2013 08:000:11 -0440], tried both date format [dateOptionalTime], and timestamp number]; nested: IllegalArgumentException[Invalid format: \"Tue, 14 May 2013 08:000:11 -0440\"]; ", "status"=>400}

My mapping is:


mappings: {
  weblink: {
    properties: {
      tags: {
        store: yes
        analyzer: keyword
        boost: 2
        type: string
      }
      id: {
        type: integer
      }
      content: {
        path: full
        type: attachment
        fields: {
          content: {
            store: yes
            term_vector: with_positions_offsets
            type: string
          }
          author: {
            store: yes
            type: string
          }
          title: {
            store: yes
            type: string
          }
          keywords: {
            store: yes
            type: string
          }
          name: {
            store: yes
            type: string
          }
          date: {
            format: dateOptionalTime
            type: date
          }
          content_type: {
            store: yes
            type: string
          }
        }
      }
      library_id: {
        type: long
      }
      created_at: {
        store: yes
        format: dateOptionalTime
        type: date
      }
      user_id: {
        type: integer
      }
      type: {
        type: string
      }
      url: {
        index: not_analyzed
        omit_norms: true
        index_options: docs
        type: string
      }
    }
  }
}

@spinscale

Hey,

looking at the HTML source, specifically at this line

<meta content="Thu, 16 May 2013 01:000:12 -0440" name="date" />

shows a custom date format, which needs to be configured explicitly, as mentioned in my last post. See http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html for possible options.

However, I am not sure that this format is valid: the three-digit 000 in the minute position looks wrong. Reloading the page gives me a different date, so there is some caching involved and you should be able to get it working.
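
To illustrate why the `000` looks suspicious, here is a small sketch that checks a value against a strict RFC 822-style pattern. The helper name `is_well_formed` and the well-formed sample value are made up for this example, and Python's `strptime` only stands in for the date parser that Elasticsearch actually uses.

```python
from datetime import datetime

# Strict RFC 822-style layout, e.g. "Tue, 14 May 2013 08:00:11 -0400".
# %a and %b assume an English (or default "C") locale.
RFC822_PATTERN = "%a, %d %b %Y %H:%M:%S %z"


def is_well_formed(raw: str) -> bool:
    """Return True if `raw` matches the strict pattern above."""
    try:
        datetime.strptime(raw, RFC822_PATTERN)
        return True
    except ValueError:
        return False


print(is_well_formed("Tue, 14 May 2013 08:00:11 -0400"))   # True
print(is_well_formed("Tue, 14 May 2013 08:000:11 -0440"))  # False
```

The second value is the one from the error message above. It is rejected because the minute field has three digits, so no two-digit minute pattern can match it.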

@ghost

ghost commented Jun 17, 2013

Hello Alexander,

First, thanks for your answer, and sorry for the delay since my last message.

Your answer doesn't actually solve our problem, so I will try to explain exactly what it is.

We are building a bookmarking tool: when a user bookmarks a URL, we index the full page. That means we don't know in advance which date formats the page content will use.

So my question is: how can we get rid of this error without doing something specific for each date format?

Thanks

@katta

katta commented Jun 25, 2013

Yes, is there a simple way to ignore or override the date if one doesn't care about the format? I mean, can't we just store it as a string?

I tried the following mapping:

{
    "files-type": {
        "properties": {
            "content": {
                "type": "attachment",
                "fields": {
                    "content": {
                        "store": "yes",
                        "include_in_all": true,
                        "term_vector": "with_positions_offsets"
                    },                    
                    "date" : { "type": "string" }
                }
            }
        }
    }
}

But the attachment type seems to override the mapping I give explicitly and changes it back to:

  {
    "properties": {
            "content": {
                "fields": {
                    "author": {
                        "type": "string"
                    },
                    "content": {
                        "include_in_all": true,
                        "store": "yes",
                        "term_vector": "with_positions_offsets",
                        "type": "string"
                    },
                    "content_type": {
                        "type": "string"
                    },
                    "date": {
                        "format": "dateOptionalTime",
                        "type": "date"
                    },
                    "keywords": {
                        "type": "string"
                    },
                    "name": {
                        "type": "string"
                    },
                    "title": {
                        "type": "string"
                    }
                },
                "path": "full",
                "type": "attachment"
            }
        }
  }
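
A quick way to spot this kind of override is to compare the sub-field settings you requested with the ones the mapping API returns. The sketch below is only an illustration: `overridden_fields` is a hypothetical helper, and the two dictionaries are the `fields` sections copied by hand from the mappings above rather than fetched from a cluster.

```python
def overridden_fields(requested: dict, returned: dict) -> dict:
    """Return {field: (requested, returned)} for every sub-field whose
    requested settings did not survive in the returned mapping."""
    changed = {}
    for name, wanted in requested.items():
        actual = returned.get(name, {})
        # A field counts as overridden if any requested key has another value.
        if any(actual.get(key) != value for key, value in wanted.items()):
            changed[name] = (wanted, actual)
    return changed


requested = {
    "content": {"store": "yes", "include_in_all": True,
                "term_vector": "with_positions_offsets"},
    "date": {"type": "string"},
}
returned = {
    "author": {"type": "string"},
    "content": {"include_in_all": True, "store": "yes",
                "term_vector": "with_positions_offsets", "type": "string"},
    "content_type": {"type": "string"},
    "date": {"format": "dateOptionalTime", "type": "date"},
    "keywords": {"type": "string"},
    "name": {"type": "string"},
    "title": {"type": "string"},
}

print(list(overridden_fields(requested, returned)))  # ['date']
```

Only `date` is reported: the `content` settings were kept, while the requested `string` type for `date` was replaced by `date` with `dateOptionalTime`.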

@dadoonet
Member

Heya,

Jumping into this thread. In the upcoming 1.9.0 version, the mapper attachment plugin will ignore metadata fields that fail to parse, unless you explicitly ask it to fail. See #38.

About the mapping, I will look at it. I just fixed something similar in #37 about using multifield.
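
The "ignore metadata fields on error unless asked to fail" behaviour can be sketched roughly as follows. This is a conceptual illustration only, not the plugin's code: `parse_metadata_date` and its `ignore_errors` parameter are hypothetical names, and `datetime.fromisoformat` merely stands in for the default date parser.

```python
from datetime import datetime
from typing import Optional


def parse_metadata_date(raw: str, ignore_errors: bool = True) -> Optional[datetime]:
    """Parse a metadata date; on failure either skip the field (return None)
    or re-raise, depending on `ignore_errors`."""
    try:
        return datetime.fromisoformat(raw)
    except ValueError:
        if ignore_errors:
            return None  # drop the bad field, keep indexing the document
        raise


# The bad value from this thread is skipped instead of failing the document.
print(parse_metadata_date("Tue, 14 May 2013 08:000:11 -0440"))  # None
print(parse_metadata_date("2013-05-14T08:00:11"))
```

With `ignore_errors=False` the same bad value raises `ValueError`, which corresponds to asking for an explicit failure.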

dadoonet added a commit that referenced this issue Aug 20, 2013
If you define some specific mapping for your file content, such as the following:

```javascript
{
    "person": {
        "properties": {
            "file": {
                "type": "attachment",
                "path": "full",
                "fields": {
                    "date": { "type": "string" }
                }
            }
        }
    }
}
```

And then, if you ask back the mapping, you get:

```javascript
{
   "person":{
      "properties":{
         "file":{
            "type":"attachment",
            "path":"full",
            "fields":{
               "file":{
                  "type":"string"
               },
               "author":{
                  "type":"string"
               },
               "title":{
                  "type":"string"
               },
               "name":{
                  "type":"string"
               },
               "date":{
                  "type":"date",
                  "format":"dateOptionalTime"
               },
               "keywords":{
                  "type":"string"
               },
               "content_type":{
                  "type":"string"
               }
            }
         }
      }
   }
}
```

All your settings have been overwritten by the mapper plugin.

See also issue #22 where the issue was found.

Closes #39.
@dadoonet
Member

Has anyone tested mapper 1.9? I'm closing this issue, but feel free to reopen it if the error still occurs.

@506764932

Same question.
My mapping is:
"starttime": {
"type": "date",
"format":"yyyy/MM"
}
and my data is
"starttime":"2015/01"
and the exception is
MapperParsingException[failed to parse date field [1997/09], tried both date format[dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: "1997/09" is malformed at "/09"];
What should I do?

@dadoonet
Member

Sorry but how is this related to mapper attachment plugin? I mean that starttime is not generated by the mapper plugin, right?

That said, I'm pretty sure your mapping has not been applied, as the date parser is still using the default format.
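
As a rough illustration of that point: the value itself fits a year/month pattern and only fails against an ISO-style default. In this sketch Python's `%Y/%m` is used as an approximate analogue of `yyyy/MM`, `datetime.fromisoformat` stands in for the default `dateOptionalTime` parser, and `fits_iso` is a hypothetical helper.

```python
from datetime import datetime

value = "1997/09"

# Approximate analogue of the custom "yyyy/MM" format: parsing succeeds.
parsed = datetime.strptime(value, "%Y/%m")
print(parsed)  # 1997-09-01 00:00:00


def fits_iso(raw: str) -> bool:
    """Return True if `raw` parses as an ISO 8601 date or date-time."""
    try:
        datetime.fromisoformat(raw)
        return True
    except ValueError:
        return False


# An ISO-style parser rejects the same value because of the slash.
print(fits_iso(value))  # False
```

So an error that mentions only `dateOptionalTime` suggests the custom format never reached the field.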

I'd open a thread on the mailing list and provide a full script which shows exactly what you are doing, so we can help you there.

If you think it's absolutely related to the mapper plugin, you can open a new issue and provide all the same details I just mentioned.
