
Null fields lost when uploading datastore backup with repeated nested records #30

Closed
GoogleCodeExporter opened this issue Apr 14, 2016 · 4 comments

Comments

@GoogleCodeExporter
Contributor

What steps will reproduce the problem?

1.  Create data in the App Engine datastore that has a repeated nested model
and includes some null values in the repeated data:

e.g.
      Session ID:      2343243
      Start Time:      11:32
      Events [
          {Event Type: A
           Event Time: 1
           Event Error:  },
          {Event Type: B
           Event Time: 7
           Event Error:  null pointer},
          {Event Type: A
           Event Time: 12
           Event Error:  }
      ]


2.  Upload to BigQuery (BQ) via a datastore backup.


What is the expected output? What do you see instead?

  I expect to be able to establish that the null pointer error is associated with event B.

  Instead, the data is loaded as several separate repeated fields, with no null placeholders in the repeated value lists:
      Session ID:      2343243
      Start Time:      11:32
      Event Type:     [A, B, A]
      Event Time:     [1, 7, 12]
      Event Error:     [null pointer]

  Because the lists no longer line up, there is no way to tell which event an error belongs to, so nested fields that contain null values cannot usefully be queried.
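To make the loss concrete, here is a small pure-Python sketch (not App Engine or BigQuery code) that rebuilds per-event records from parallel lists, using the values from the example above. `rebuild_events` is a hypothetical helper. With a `None` placeholder kept in every slot, `zip` restores the association; once the nulls are dropped, the lists no longer line up and the error cannot be attributed to an event.

```python
def rebuild_events(types, times, errors):
    """Zip parallel per-field lists back into one record per event.

    Raises ValueError if the lists differ in length, because position i
    then no longer identifies the same event in every list.
    """
    if not (len(types) == len(times) == len(errors)):
        raise ValueError("parallel lists differ in length; cannot tell "
                         "which value belongs to which event")
    return [{"type": t, "time": tm, "error": e}
            for t, tm, e in zip(types, times, errors)]


types = ["A", "B", "A"]
times = [1, 7, 12]

# Nulls kept as placeholders: the error lines up with event B.
with_placeholders = rebuild_events(types, times, [None, "null pointer", None])
print(with_placeholders[1])  # {'type': 'B', 'time': 7, 'error': 'null pointer'}

# Nulls dropped, as in the uploaded table: the association is lost.
try:
    rebuild_events(types, times, ["null pointer"])
    rebuilt = True
except ValueError as exc:
    rebuilt = False
    print("cannot rebuild:", exc)
```

Note that a length check only catches the mismatch here by luck; if exactly one null had been dropped from each of several lists, the lengths could still agree while the values were silently misaligned.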


What version of the product are you using? On what operating system?

  Tested on GAE and BQ versions live on Thursday, July 25th, 2013 (NZST).


Please provide any additional information below.

This issue was discussed on Stack Overflow (SO) in June 2013, but null data was
not raised there. The workaround provided works only when there is no null
data, so the issue may have received little priority.

http://stackoverflow.com/questions/17228281



Original issue reported on code.google.com by martin.r...@hapara.com on 26 Jul 2013 at 5:11

@GoogleCodeExporter
Contributor Author

Could you share a little additional information:

- the structure of your App Engine entity (specifically, how it is declared in
your code)
- the job id of the import

The current implementation turns App Engine entities into BigQuery nested
records, so if your entity really is:
[ { name, time, error } ]
then you should get a nested record, not three top-level list fields:
[ name ]
[ time ]
[ error ]
However, this depends on how the top-level entity is constructed in App Engine,
which is why I am asking for your entity declaration.
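For comparison, here is a sketch of the nested-record shape described above, written as a BigQuery table schema (the JSON schema format, with a `RECORD` field in `REPEATED` mode) plus one newline-delimited JSON row of the kind a load job accepts. The field names mirror the reporter's ndb models; `session_row` is a hypothetical helper and the timestamp is an illustrative value. Building the row by hand like this is only an illustration, not what the datastore backup import does. Because each event is a self-contained object, a missing `eventError` is just `null` inside that event and cannot shift onto a neighbouring event.

```python
import json

# BigQuery schema (JSON schema-file format) with a repeated nested record.
SCHEMA = [
    {"name": "sessionID", "type": "STRING", "mode": "NULLABLE"},
    {"name": "startTime", "type": "TIMESTAMP", "mode": "NULLABLE"},
    {
        "name": "events",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {"name": "eventType", "type": "STRING", "mode": "NULLABLE"},
            {"name": "eventTime", "type": "INTEGER", "mode": "NULLABLE"},
            {"name": "eventError", "type": "STRING", "mode": "NULLABLE"},
        ],
    },
]


def session_row(session_id, start_time, events):
    """Return one newline-delimited JSON line matching SCHEMA.

    `events` is a list of (type, time, error) tuples; error may be None,
    which json.dumps writes as null inside that event's own object.
    """
    row = {
        "sessionID": session_id,
        "startTime": start_time,
        "events": [
            {"eventType": t, "eventTime": tm, "eventError": e}
            for t, tm, e in events
        ],
    }
    return json.dumps(row)


line = session_row("2343243", "2013-07-25 11:32:00",
                   [("A", 1, None), ("B", 7, "null pointer"), ("A", 12, None)])
print(line)
```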

Original comment by sna...@google.com on 26 Jul 2013 at 5:03

@GoogleCodeExporter
Contributor Author

Glad to provide more info.

Job IDs for the uploads (from Python 2.7 and 2.5, respectively):
  job_b83f91fb3fd94a2fbc7767699e253854
  job_d63291113ca240dc95f3a58be66bd504


Test data creation code:

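from datetime import datetime

# webapp provides the RequestHandler base class used by BQTestHandler below.
from google.appengine.ext import webapp
from google.appengine.ext.ndb.model import (Model, StringProperty,
    StructuredProperty, IntegerProperty, DateTimeProperty)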


class Event(Model):
    eventType = StringProperty()
    eventTime = IntegerProperty()
    eventError = StringProperty()


class Session(Model):
    sessionID = StringProperty()
    startTime = DateTimeProperty()
    events = StructuredProperty(modelclass=Event, repeated=True)


class BQTestHandler(webapp.RequestHandler):

    def get(self):
        # Only event B sets eventError; events 1 and 3 leave it as None,
        # which is the value that goes missing after the BigQuery upload.
        event1 = Event(eventType='A', eventTime=1)
        event2 = Event(eventType='B', eventTime=7, eventError='Null Pointer')
        event3 = Event(eventType='A', eventTime=12)
        session = Session(sessionID='23432', startTime=datetime.now(), events=[event1, event2, event3])
        session.put()

Original comment by martin.r...@hapara.com on 26 Jul 2013 at 6:55

@GoogleCodeExporter
Contributor Author

So the mismatch here is due to how StructuredProperty lays out the data:
https://developers.google.com/appengine/docs/python/ndb/properties#structured

That doc explains that it lays the data out as an array of repeated fields. It
also calls out the restriction that a repeated structured property cannot
contain another repeated structured property or field, which is a result of
this layout.
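The parallel-list layout described in the linked ndb documentation can be modelled in plain Python as below. This is a simplified sketch, not ndb's actual serialization code: `flatten_repeated` is a hypothetical helper that stores each sub-field of a repeated structured value as one list under a dotted name (such as `events.eventError`), padding missing values with `None` so that position i refers to the same record in every list. It also refuses a list-valued sub-field, because flattening inner lists into the outer parallel lists would lose the boundaries between records, which mirrors the nesting restriction mentioned above.

```python
def flatten_repeated(prefix, records, field_names):
    """Lay out a list of records as one list per sub-field.

    Each sub-field is stored under a dotted name ("prefix.field"), and a
    missing value becomes a None placeholder so the lists stay aligned.
    """
    flat = {}
    for name in field_names:
        column = []
        for record in records:
            value = record.get(name)  # None when the record lacks the field
            if isinstance(value, list):
                # A repeated value inside a repeated record would need a
                # list per slot, which flat parallel lists cannot express.
                raise TypeError(f"{prefix}.{name}: nested repeated values "
                                "cannot be stored as parallel lists")
            column.append(value)
        flat[f"{prefix}.{name}"] = column
    return flat


events = [
    {"eventType": "A", "eventTime": 1},
    {"eventType": "B", "eventTime": 7, "eventError": "Null Pointer"},
    {"eventType": "A", "eventTime": 12},
]
flat = flatten_repeated("events", events, ["eventType", "eventTime", "eventError"])
print(flat)
# {'events.eventType': ['A', 'B', 'A'], 'events.eventTime': [1, 7, 12],
#  'events.eventError': [None, 'Null Pointer', None]}

try:
    flatten_repeated("events", [{"tags": ["x", "y"]}], ["tags"])
    nested_ok = True
except TypeError:
    nested_ok = False
```

If the `None` entries in `events.eventError` are dropped at any later stage, the list shrinks to `['Null Pointer']` and the alignment that ties it to event B is gone, which matches the symptom reported in this issue.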

I don't yet have a solution for you, but I am investigating
LocalStructuredProperty and its behavior. I am also checking whether the
underlying db.Model library has a way of expressing structured properties. I
will update you once I know more.

Original comment by sna...@google.com on 26 Jul 2013 at 9:03

@GoogleCodeExporter
Contributor Author

It looks like this issue is quite old, and the engineer who was working on it 
has left the company.

Resolving due to lack of activity; please re-open if this is still an issue for 
you.

Original comment by thomasp...@google.com on 22 Aug 2014 at 5:45

  • Changed state: WontFix
