Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlalchemy marshmallow avoid loading data into session #62

Closed
ShankarGanesh opened this issue Apr 11, 2016 · 13 comments

Comments

@ShankarGanesh
Copy link

commented Apr 11, 2016

Is there a way to avoid inserting the data into session while using Marshmallow - sqlalchemy .load()

Because we tried to manage the objects by ourself (adding them into session if required) . Will add into the session if required , but for only validation I need to use load () method which is provided by marshmallow-sqlalchemy

@nickw444

This comment has been minimized.

Copy link

commented Apr 15, 2016

+1 on this, also documentation seems to contradict the requirement for a session to be passed.

https://github.com/marshmallow-code/marshmallow-sqlalchemy/blob/dev/marshmallow_sqlalchemy/schema.py#L185

    def load(self, data, session=None, instance=None, *args, **kwargs):
        """Deserialize data to internal representation.
        :param session: Optional SQLAlchemy session.
        :param instance: Optional existing instance to modify.
        """
        self.session = session or self.session
        self.instance = instance or self.instance
        if not self.session:
            raise ValueError('Deserialization requires a session')

Docs say it's optional, but if not provided, an exception is thrown

@ShankarGanesh

This comment has been minimized.

Copy link
Author

commented Apr 29, 2016

@nickw444 @jmcarp @singingwolfboy @dpwrussell @rudaporto

I understood that passing session is mandatory due to we can't load the data without session, as it required to READ data from DB Model. But why we are adding the data into session as well while using load/ validate. Is there a way ignore it. Please let me know if there is any work around for the same.

@dpwrussell

This comment has been minimized.

Copy link
Contributor

commented Apr 29, 2016

@nickw444 session is optional in load because this is an opportunity to override the session specified in https://marshmallow-sqlalchemy.readthedocs.io/en/latest/api_reference.html#marshmallow_sqlalchemy.ModelSchema. Or specify it if none was.

@ShankarGanesh Please do not mention people just to get attention in issues, that is super bad form. This is an open source project, you can't have expectations about timeliness of support when it's accomplished by volunteers.

I also don't fully understand your question, post a code example and the issue.

@ShankarGanesh

This comment has been minimized.

Copy link
Author

commented May 2, 2016

@dpwrussell Sorry about mentioning people in the thread. I will not do that in future.

Sample mentioned in the documentation

author = Author(name='Chuck Paluhniuk')

author_schema.dump(author).data
dump_data = { 'name': 'Chuck Paluhniuk'}
author_schema.load(dump_data, session=session).data

<Author(name='Chuck Paluhniuk')>

The above-mentioned load / validate will add the data into session itself . To be brief session.dirty will be set to True.

How to avoid this issue ? I don't want data needs to be added into session.

Thanks,
Shankar.

@dpwrussell

This comment has been minimized.

Copy link
Contributor

commented May 2, 2016

Ok, I think I see what you are after. Newly created objects are in a Transient state whereas newly loaded objects are in Pending state.

The reason that is currently the case is that it is possible to do this:

author = Author(name='Chuck Paluhniuk')
book = Book(title='Fight Club', author=author)
session.add(author)
session.add(book)
session.commit()
print author
# <Author(name=u'Chuck Paluhniuk')>

dump_data = {'id': 1}
author = author_schema.load(dump_data, session=session).data
print author
# <Author(name=u'Chuck Paluhniuk')>

This is a pretty typical use-case so you can see why it is done that way.

I assume what you are trying to do is load new data from serialized format, but have some criteria where you want to add them to the session or not. I assume you also want to have all the objects around for reasons other than committing? Or perhaps you want to add them to different sessions (although you could do that one use case with the current implementation) or at a time of your choosing etc.

dump_data = {'books': [1], 'name': 'Chuck Paluhniuk'}
author = author_schema.load(dump_data).data
# Some arbitrary criteria
if author.books:
    session.add(author)
# Make a list of all authors
all_authors.append(author)

# Do something with all the authors regardless of whether they were added to session
print authors

I can't immediately think of a reason why that would not be a bad option as possibly an alternate method. @sloria Thoughts?

@ShankarGanesh

This comment has been minimized.

Copy link
Author

commented May 7, 2016

@dpwrussell Thanks a lot . You got my requirement exactly . In my scenario I had some arbitrary conditions, I will insert the data into same session later if the condition succeeds.
It looks we can change the existing code to support this.

Could I submit a Pull request for the below changes ?

Suggestion:

  1. Adding dump_session (Boolean Flag) which defaults to True , which can override dump_session variable value while calling load / validate method. Note : Passing the dump_session flag as False will not dump data into session.

  2. Adding the new Class(duplicate) similar to Model Schema but without post_load method which seems to dumps the data into session

Please add your thoughts on this and let me know if I am missing anything.

I prefer for the option1 as it is an integrated solution with the existing flow and minimal code changes required .

Thanks,
Shankar.

@dpwrussell

This comment has been minimized.

Copy link
Contributor

commented May 21, 2016

The get_instance method is what causes this, but if you remove that you lose the ability to use the database to populate the other fields of an object. E.g. if you load an object with just an ID.

Perhaps you might be better off just doing something like this:

dump_data = {'id': 1, 'name': 'Chuck Paluhniuk'}
author = author_schema.load(dump_data, session=session).data

state = inspect(author)
print 'Transient: %s, Pending %s, Detached: %s, Persistent: %s' % (
    state.transient, state.pending, state.detached, state.persistent
)
#Transient: False, Pending False, Detached: False, Persistent: True
print 'dirty: %s' % session.dirty
#dirty: IdentitySet([<Author(name=u'Chuck Paluhniuk')>])

sa.orm.session.make_transient(author)

state = inspect(author)
print 'Transient: %s, Pending %s, Detached: %s, Persistent: %s' % (
    state.transient, state.pending, state.detached, state.persistent
)
#Transient: True, Pending False, Detached: False, Persistent: False
print 'dirty: %s' % session.dirty
#dirty: IdentitySet([])
@medavis

This comment has been minimized.

Copy link

commented May 24, 2016

I like some of the original posters appreciate some of the features of marshmallow-sqlalchemy that marshmallow alone does not supply, most notably the automated schema generation from the model. And like them, though, I want to be in control of how and when the deserialized data gets added to or merged with the session.

We have been getting the behavior we have wanted because we were using version 0.3.0 in conjunction with flask-marshmallow. From walking the code it appears that flask-marshmallow automatically sets a DummySession object which caused this older version of marshmallow-sqlalchemy to do just the model deserialization as we wanted.

However, when we attempt to upgrade to newer versions of these packages, this approach now fails, throwing an exception "AttributeError: 'DummySession' object has no attribute 'query'". What I would propose is that if the session is None, then the loads method should just do the deserialization and not attempt to add or merge the model to the session. Then if we can get the flask-marshmallow folks to not insert the DummySession that would seem to solve everyone's problems without the need to generate new/different function calls. No?

@bivald

This comment has been minimized.

Copy link

commented Mar 20, 2018

As a maintainer, would you merge this if someone made a pull request? If so, would it be possible for you to give a short suggested way of implementation?

That way it would be easier for a new user (like myself) to give a high quality pull request which has a higher chance of being merged.

@jacksmith15 jacksmith15 referenced this issue Dec 3, 2018
5 of 5 tasks complete
@sloria

This comment has been minimized.

Copy link
Member

commented Dec 3, 2018

Sorry for the delay on this. I think there's a valid use case for deserializing transient instances. #160 attempts to address this by adding a transient Meta class option and load argument. There's still the question of how to handle relationships, though.

If anyone here would be willing to review that PR and/or propose an approach, I would be grateful. I currently don't have much time to work on this myself (not currently using this project).

@jacksmith15

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2018

@sloria I have added a proposed solution to that PR, which propagates transience through to relationships.

@sloria

This comment has been minimized.

Copy link
Member

commented Feb 4, 2019

This is addressed by #160. I will cut a release shortly.

@sloria sloria closed this Feb 4, 2019

@sloria

This comment has been minimized.

Copy link
Member

commented Feb 4, 2019

Transient loading support is released in 0.16.0. Thanks again @jacksmith15 for working on it. 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.