Unicode keys of dictionary partially fail #66

Closed
Diolor opened this issue Mar 22, 2014 · 3 comments

@Diolor

Diolor commented Mar 22, 2014

Hi,

I found (in a very painful way :) ) that if an inner dictionary has unicode keys, either ES or the Python library fails to add the object. For example, take the following dictionary:

{
    u'city': u'Toronto',
    u'name': u'PostBeyond',
    u'events': {
        u'title': u'ExtremeCachingwithPHP',
        u'start_date': u'2014-01-08T00: 00: 00+00: 00'
    }
}

ES correctly processes and adds the city and name fields. However, the inner events.title and events.start_date fields are not added.

This will be correctly processed:

{
    'city': u'Toronto',
    'name': u'PostBeyond',
    'events': {
        'title': u'ExtremeCachingwithPHP',
        'start_date': u'2014-01-08T00: 00: 00+00: 00'
    }
}

As a workaround I tried the following, which for some reason does not work. I guess 'events' is still somehow held in memory.

# Convert the unicode keys of the inner dictionary to plain strings
temp_events = {str(key): val for key, val in doc['events'].items()}
doc.pop('events', None)
doc['events'] = temp_events   # this will not be added either
doc['events2'] = temp_events  # this will be added

Anyway it's 6am

Best,
D

@honzakral
Contributor

Hi Diolor,

I cannot replicate your issue:

>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch()
>>> es.index(index='i', doc_type='t', id=42, body={
...    u'city': u'Toronto',
...    u'name': u'PostBeyond',
...    u'events': {
...        u'title': u'ExtremeCachingwithPHP',
...        u'start_date': u'2014-01-08T00: 00: 00+00: 00'
...    }
... })
{u'_id': u'42', u'_index': u'i', u'_type': u't', u'_version': 1, u'created': False}
>>> es.get(index='i', doc_type='t', id=42)
{u'_id': u'42', u'_index': u'i', u'_source': {u'city': u'Toronto',
  u'events': {u'start_date': u'2014-01-08T00: 00: 00+00: 00', u'title': u'ExtremeCachingwithPHP'},
  u'name': u'PostBeyond'},
u'_type': u't', u'_version': 1, u'found': True}

Could it maybe be caused by the misspelled date in your example?

Honza

@Diolor
Author

Diolor commented Mar 24, 2014

Hi Honza,

Can you replicate this?

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
actions = []

action = {
    '_type': 't',
    '_id': '52cb45cec36b4442751728f5',
    '_source': {
        u'city': u'Toronto',
        u'name': u'PostBeyond',
        u'_id': {
            '$oid': '52cb45cdc36b4442751728f4'
        },
        u'events': {
            u'title': u'ExtremeCachingwithPHP',
            u'event_id': {
                '$oid': '52cb45cec36b4442751728f5'
            },
            u'start_date': u'2014-01-08T00:00:00+00:00'
        }
    },
    '_index': 'i'
}

actions.append(action)
helpers.bulk(es, actions)



from elasticsearch.client import IndicesClient

ic = IndicesClient(es)
ic.get_mapping(index='i',doc_type='t')

The last line gives me:

>>> ic.get_mapping(index='i',doc_type='t')
{u'i': {u'mappings': {u't': {u'properties': {u'$oid': {u'type': u'string'}, u'city': {u'type': u'string'}}}}}}

The conflict is with the second _id inside the _source. If the action doesn't have a second _id:

action = {
    '_type': 't',
    '_id': '52cb45cec36b4442751728f5',
    '_source': {
        u'city': u'Toronto',
        u'name': u'PostBeyond',
        u'events': {
            u'title': u'ExtremeCachingwithPHP',
            u'event_id': {
                '$oid': '52cb45cec36b4442751728f5'
            },
            u'start_date': u'2014-01-08T00:00:00+00:00'
        }
    },
    '_index': 'i'
}

The mapping is correct:

>>> ic.get_mapping(index='i',doc_type='t')
{u'i': {u'mappings': {u't': {u'properties': {u'$oid': {u'type': u'string'}, u'city': {u'type': u'string'}, 
u'events': {u'properties': {u'event_id': {u'properties': {u'$oid': {u'type': u'string'}}}, u'start_date': 
{u'type': u'date', u'format': u'dateOptionalTime'}, u'title': {u'type': u'string'}}}, u'name': {u'type': 
u'string'}}}}}}

Apparently this is not the Python client's problem. ES looks for an _id field [1]. I am still wondering whether I can have an _id field inside the _source that differs from ES's document id. I had better take this to the main ES community.

Best,
D

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-id-field.html#mapping-id-field

@honzakral
Contributor

Yes, the _id field has to be a value, not another object. The correct way to handle this is to transform your document before handing it off to bulk, or to use the expand_action_callback to do it from within.
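
For illustration, here is a minimal sketch of the "transform your document first" approach, assuming the documents come from a MongoDB export where ids are wrapped as {'$oid': ...}. The helper names flatten_oid and prepare_action and the field name mongo_id are hypothetical and not part of the elasticsearch library; the sketch uses plain Python only.

```python
def flatten_oid(value):
    """Recursively replace {'$oid': '...'} wrappers with the bare id string."""
    if isinstance(value, dict):
        if set(value.keys()) == {'$oid'}:
            return value['$oid']
        return {key: flatten_oid(val) for key, val in value.items()}
    if isinstance(value, list):
        return [flatten_oid(item) for item in value]
    return value


def prepare_action(action, id_field='mongo_id'):
    """Return a copy of a bulk action whose _source has no '_id' key.

    'mongo_id' is an assumed, illustrative field name; pick whatever
    fits your mapping. The input action is left unmodified.
    """
    cleaned = dict(action)
    source = flatten_oid(action.get('_source', {}))
    if '_id' in source:
        # Move the inner id out of the way of the document-level _id.
        source[id_field] = source.pop('_id')
    cleaned['_source'] = source
    return cleaned
```

The cleaned actions could then be passed on as before, e.g. helpers.bulk(es, [prepare_action(a) for a in actions]). Flattening the nested event_id as well is a design choice of this sketch, not something ES requires.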

As this issue is not Python related, I am closing the ticket. Please feel free to open a new one for any issue you find. Thanks.
