New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Switch from Peewee to SQLAlchemy #876

Closed
jeffwidman opened this Issue Feb 29, 2016 · 14 comments

Comments

Projects
None yet
3 participants
@jeffwidman
Contributor

jeffwidman commented Feb 29, 2016

Can you clarify a bit why you went with Peewee instead of SQLAlchemy?

The latter is a much more power ORM, versus the former focuses on simplicity. Peewee was probably the right choice in the early days of the project, but since you're looking to build this into a full-fledged business, wouldn't it be better to switch to SQLAlchemy? SQLAlchemy also offers the benefit of being able to drop into Core when the ORM isn't flexible enough.

This assumes you're using the ORM for MySQL/Postgres connectors... If not, then I simply misunderstand how the architecture/connectors work.

@arikfr

This comment has been minimized.

Show comment
Hide comment
@arikfr

arikfr Mar 8, 2016

Member

We're using Peewee as an ORM for the metadata (users, queries, results, dashboards, etc). I chose Peewee because it was simple but powerful. I did give SQLAlchemy a try, but actually it failed me when I did want to run a raw SQL query and get back ORM objects (something that is really simple w/ Peewee).

But today I'm more open to trying out SQLAlchemy again:

  1. Peewee doesn't have migrations support. There are projects like arnold, but they are very bare bones.
  2. No optimistic-locking support.
  3. SQLAlchemy is the de-facto ORM for Python, resulting in much wider support in other libraries.

Having said that, we're not really limited by Peewee's feature set. So I'm not sure what will be more work: switching to SQLAlchemy or creating our own support for migrations with Peewee.

Member

arikfr commented Mar 8, 2016

We're using Peewee as an ORM for the metadata (users, queries, results, dashboards, etc). I chose Peewee because it was simple but powerful. I did give SQLAlchemy a try, but actually it failed me when I did want to run a raw SQL query and get back ORM objects (something that is really simple w/ Peewee).

But today I'm more open to trying out SQLAlchemy again:

  1. Peewee doesn't have migrations support. There are projects like arnold, but they are very bare bones.
  2. No optimistic-locking support.
  3. SQLAlchemy is the de-facto ORM for Python, resulting in much wider support in other libraries.

Having said that, we're not really limited by Peewee's feature set. So I'm not sure what will be more work: switching to SQLAlchemy or creating our own support for migrations with Peewee.

@jeffwidman

This comment has been minimized.

Show comment
Hide comment
@jeffwidman

jeffwidman Mar 8, 2016

Contributor

dunno, but I personally am very comfortable with sqlalchemy and much prefer to hack on projects that use it because it's so powerful/flexible/commonly used--between Django ORM and SQLAlchemy, that's all I need to know for 99% of Python ORM projects...

btw, I don't think alembic is tied to sqlalchemy, but I might be wrong...

In general, I'd vote for SQLAlchemy just because it gives you options down the road if you decide you want to do something fancy for some reason

Contributor

jeffwidman commented Mar 8, 2016

dunno, but I personally am very comfortable with sqlalchemy and much prefer to hack on projects that use it because it's so powerful/flexible/commonly used--between Django ORM and SQLAlchemy, that's all I need to know for 99% of Python ORM projects...

btw, I don't think alembic is tied to sqlalchemy, but I might be wrong...

In general, I'd vote for SQLAlchemy just because it gives you options down the road if you decide you want to do something fancy for some reason

@arikfr

This comment has been minimized.

Show comment
Hide comment
@arikfr

arikfr May 31, 2016

Member

Just to follow up on this, alembic is SQLAlchemy specific. Also, for reference, another peewee/migrations project: https://github.com/klen/peewee_migrate.

I really want to address the issue of migrations already, and basically we need to decide between -

  1. Keep Peewee and add migrations (either our own or one of the mentioned projects).
  2. Switch to SQLAlchemy.

I suspect the first will be faster to implement, but moving to SQLAlchemy will probably pay off in the future.

Member

arikfr commented May 31, 2016

Just to follow up on this, alembic is SQLAlchemy specific. Also, for reference, another peewee/migrations project: https://github.com/klen/peewee_migrate.

I really want to address the issue of migrations already, and basically we need to decide between -

  1. Keep Peewee and add migrations (either our own or one of the mentioned projects).
  2. Switch to SQLAlchemy.

I suspect the first will be faster to implement, but moving to SQLAlchemy will probably pay off in the future.

@jeffwidman

This comment has been minimized.

Show comment
Hide comment
@jeffwidman

jeffwidman May 31, 2016

Contributor

If you go the SQLAlchemy route, I'm happy to help and I can probably justify spending a few days of company timeto help with it since it'd make our upgrade process a lot smoother in the future. I was recently asked but our chief architect about how we plan to handle db migrations for redash metadata and didn't have a great ansswer. I'm already very familiar with SQLAlchemy so could be productive very quickly. I'm at pycon this week and have a couple of weeks of work that need to happen right when I get back, but after that could probably put in some time if you decide to go that route.

Contributor

jeffwidman commented May 31, 2016

If you go the SQLAlchemy route, I'm happy to help and I can probably justify spending a few days of company timeto help with it since it'd make our upgrade process a lot smoother in the future. I was recently asked but our chief architect about how we plan to handle db migrations for redash metadata and didn't have a great ansswer. I'm already very familiar with SQLAlchemy so could be productive very quickly. I'm at pycon this week and have a couple of weeks of work that need to happen right when I get back, but after that could probably put in some time if you decide to go that route.

@arikfr

This comment has been minimized.

Show comment
Hide comment
@arikfr

arikfr May 31, 2016

Member

Thanks! Help will be appreciated.

In the meantime, if you get the time -- can you post some pointers towards things we might need --

  1. Any good tutorials to start w/ SQLAlchemy?
  2. Should we use Flask-SQLAlchemy? How will it work in non web context (Celery/CLI)?
  3. How to handle JSON fields?
Member

arikfr commented May 31, 2016

Thanks! Help will be appreciated.

In the meantime, if you get the time -- can you post some pointers towards things we might need --

  1. Any good tutorials to start w/ SQLAlchemy?
  2. Should we use Flask-SQLAlchemy? How will it work in non web context (Celery/CLI)?
  3. How to handle JSON fields?
@jeffwidman

This comment has been minimized.

Show comment
Hide comment
@jeffwidman

jeffwidman May 31, 2016

Contributor

Sure I'll post links in the morning when I'm not on my phone.

JSON support is built into SQLAlchemy although currently under a postgres specific section andlast I saw the plan was to refactor into a little more generic abstraction now that MySQL has JSON support...

Good question about flask-sqlalxhemyoutside of flask... if you go with it, you have to subclass model base which requires a flask app context... so the two options are don't use it and write your own convenience wrapper or run celery within a flask app context. Neither is that bad really... Long term, the flask SQLAlchemy team plans to let you pick your own base class for models, but that is likely a ways off. I can post links explaining these two options in the morning.

Contributor

jeffwidman commented May 31, 2016

Sure I'll post links in the morning when I'm not on my phone.

JSON support is built into SQLAlchemy although currently under a postgres specific section andlast I saw the plan was to refactor into a little more generic abstraction now that MySQL has JSON support...

Good question about flask-sqlalxhemyoutside of flask... if you go with it, you have to subclass model base which requires a flask app context... so the two options are don't use it and write your own convenience wrapper or run celery within a flask app context. Neither is that bad really... Long term, the flask SQLAlchemy team plans to let you pick your own base class for models, but that is likely a ways off. I can post links explaining these two options in the morning.

@arikfr

This comment has been minimized.

Show comment
Hide comment
@arikfr

arikfr May 31, 2016

Member

Sure I'll post links in the morning when I'm not on my phone.

Thanks!

if you go with it, you have to subclass model base which requires a flask app context... so the two options are don't use it and write your own convenience wrapper or run celery within a flask app context.

To run Celery with app context, maybe we can use the new Flask CLI stuff? http://flask.pocoo.org/docs/0.11/cli/

Also another option is alchy which has a compatibility layer for Flask-SQLAlchemy (Flask-Alchy).

Member

arikfr commented May 31, 2016

Sure I'll post links in the morning when I'm not on my phone.

Thanks!

if you go with it, you have to subclass model base which requires a flask app context... so the two options are don't use it and write your own convenience wrapper or run celery within a flask app context.

To run Celery with app context, maybe we can use the new Flask CLI stuff? http://flask.pocoo.org/docs/0.11/cli/

Also another option is alchy which has a compatibility layer for Flask-SQLAlchemy (Flask-Alchy).

@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer

coleifer Jun 3, 2016

Hi, I was searching on my phone for something and came across this issue. I'm the author of peewee and, after reading the comments here, wanted to offer any help or advice if there are problems you're running into using peewee. I understand that the lack of automatic schema migrations may be a deal-breaker, but to write your own is actually quite easy and the docs have many examples.

from playhouse.migrate import *

from my_app.models import database

migrator = SchemaMigrator.from_database(database)

# Add a new VARCHAR(255) column named "new_column" to model "my_tbl", then
# create an index on "new_column" + "another_column".
migrate(
    migrator.add_column('my_tbl', 'new_column', CharField(max_length=255)),
    migrator.add_index('my_tbl', ('new_column', 'another_column')),
)

Regarding JSON support, Peewee has supported Postgresql's JSON and JSONB data-types for a while now. The syntax is quite nice if I do say so myself:

# Query a JSON data structure using a nested key lookup:
offense_responses = APIResponse.select().where(
  APIResponse.response['meta']['model'] == 'offense')

# Retrieve a sub-key for each APIResponse. By calling .as_json(), the
# data at the sub-key will be returned as Python objects (dicts, lists,
# etc) instead of serialized JSON.
q = (APIResponse
     .select(
       APIResponse.data['booking']['person'].as_json().alias('person'))
     .where(
       APIResponse.data['meta']['model'] == 'booking'))

for result in q:
    print result.person['name'], result.person['dob']

Regarding optimistic locking, I'm unfortunately not very familiar with the ways this is typically implemented by database libraries. My thought would be to write a model subclass that defined a version number field...maybe something like:

class BaseVersionedModel(BaseModel):
    version = IntegerField(default=0)

    def save_optimistic(self):
        if not self.version:
            return self.save()  # Since this is an `INSERT`, just call regular save method.

        # Update any data that has changed and bump the version counter.
        field_data = dict(self._data)
        current_version = field_data.pop('version', 0)
        field_data = self._prune_fields(field_data, self.dirty_fields)
        if not field_data:
            raise ValueError('No changes have been made.')

        ModelClass = type(self)
        field_data['version'] = ModelClass.version + 1  # Atomic increment

        query = ModelClass.update(**field_data).where(
            (ModelClass.version == current_version) &
            (ModelClass.id == self.id))

        nrows = query.execute()
        if nrows == 0:
            # It looks like another process has updated the version number.
            raise ConflictDetectedException()  # Raise exception? Return False?
        else:
            self.version += 1  # Update in-memory version number.
            return nrows

I don't know if the above is helpful or if it misses the target completely :/

Anyways, please feel free to reach out to me if you've got any questions or would like to discuss solutions to problems you're encountering. And of course, there's nothing wrong with preferring SQLAlchemy to Peewee - there's a reason the Python community has rallied around it as their preferred ORM.

coleifer commented Jun 3, 2016

Hi, I was searching on my phone for something and came across this issue. I'm the author of peewee and, after reading the comments here, wanted to offer any help or advice if there are problems you're running into using peewee. I understand that the lack of automatic schema migrations may be a deal-breaker, but to write your own is actually quite easy and the docs have many examples.

from playhouse.migrate import *

from my_app.models import database

migrator = SchemaMigrator.from_database(database)

# Add a new VARCHAR(255) column named "new_column" to model "my_tbl", then
# create an index on "new_column" + "another_column".
migrate(
    migrator.add_column('my_tbl', 'new_column', CharField(max_length=255)),
    migrator.add_index('my_tbl', ('new_column', 'another_column')),
)

Regarding JSON support, Peewee has supported Postgresql's JSON and JSONB data-types for a while now. The syntax is quite nice if I do say so myself:

# Query a JSON data structure using a nested key lookup:
offense_responses = APIResponse.select().where(
  APIResponse.response['meta']['model'] == 'offense')

# Retrieve a sub-key for each APIResponse. By calling .as_json(), the
# data at the sub-key will be returned as Python objects (dicts, lists,
# etc) instead of serialized JSON.
q = (APIResponse
     .select(
       APIResponse.data['booking']['person'].as_json().alias('person'))
     .where(
       APIResponse.data['meta']['model'] == 'booking'))

for result in q:
    print result.person['name'], result.person['dob']

Regarding optimistic locking, I'm unfortunately not very familiar with the ways this is typically implemented by database libraries. My thought would be to write a model subclass that defined a version number field...maybe something like:

class BaseVersionedModel(BaseModel):
    version = IntegerField(default=0)

    def save_optimistic(self):
        if not self.version:
            return self.save()  # Since this is an `INSERT`, just call regular save method.

        # Update any data that has changed and bump the version counter.
        field_data = dict(self._data)
        current_version = field_data.pop('version', 0)
        field_data = self._prune_fields(field_data, self.dirty_fields)
        if not field_data:
            raise ValueError('No changes have been made.')

        ModelClass = type(self)
        field_data['version'] = ModelClass.version + 1  # Atomic increment

        query = ModelClass.update(**field_data).where(
            (ModelClass.version == current_version) &
            (ModelClass.id == self.id))

        nrows = query.execute()
        if nrows == 0:
            # It looks like another process has updated the version number.
            raise ConflictDetectedException()  # Raise exception? Return False?
        else:
            self.version += 1  # Update in-memory version number.
            return nrows

I don't know if the above is helpful or if it misses the target completely :/

Anyways, please feel free to reach out to me if you've got any questions or would like to discuss solutions to problems you're encountering. And of course, there's nothing wrong with preferring SQLAlchemy to Peewee - there's a reason the Python community has rallied around it as their preferred ORM.

@arikfr

This comment has been minimized.

Show comment
Hide comment
@arikfr

arikfr Jun 3, 2016

Member

@coleifer thank you for joining the discussion! (and for creating peewee!)

Actually I don't want or need automatic migrations. I never trust these things and I don't mind writing migrations with the API peewee already provides. But what I do need is something to execute those migrations and track what was applied.

I know peewee supports JSON although for "historic" reasons we ended implementing a JSON type on top of text fields, which works great. My comment about JSON fields was for porting to SQLAlchemy.

My concern about continuing to use peewee is the need to implement pieces of supporting functionality, like migrations, optimistic locking and probably others in the future. While each on its own might be simple it increases the footprint/complexity of Redash, and doesn't get the benefit of using well tested implementations.

It feels that peewee has a community of users but most of the development is done by you. This results in you not adding anything to the library that you don't find personally useful. I think this approach is the right one and I don't expect a different behavior, because eventually you're the one who will have to maintain this. But as a user this is a concern. While you can say that I (and others) should join the development, from following peewee's development it doesn't feel like you want this. I might be wrong here though and will be happy to hear your thoughts.

Member

arikfr commented Jun 3, 2016

@coleifer thank you for joining the discussion! (and for creating peewee!)

Actually I don't want or need automatic migrations. I never trust these things and I don't mind writing migrations with the API peewee already provides. But what I do need is something to execute those migrations and track what was applied.

I know peewee supports JSON although for "historic" reasons we ended implementing a JSON type on top of text fields, which works great. My comment about JSON fields was for porting to SQLAlchemy.

My concern about continuing to use peewee is the need to implement pieces of supporting functionality, like migrations, optimistic locking and probably others in the future. While each on its own might be simple it increases the footprint/complexity of Redash, and doesn't get the benefit of using well tested implementations.

It feels that peewee has a community of users but most of the development is done by you. This results in you not adding anything to the library that you don't find personally useful. I think this approach is the right one and I don't expect a different behavior, because eventually you're the one who will have to maintain this. But as a user this is a concern. While you can say that I (and others) should join the development, from following peewee's development it doesn't feel like you want this. I might be wrong here though and will be happy to hear your thoughts.

@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer

coleifer Jun 3, 2016

Thanks for the thoughtful response. I definitely understand feeling wary about implementing library-type functionality - there are quite a few reasons for not wanting this in your app's codebase and it certainly can increase the complexity and maintenance requirements.

I welcome contributions to Peewee, the sqlcipher extension is a good example of a significant contribution that was merged. To be honest, any contribution that has tried to address significant changes has suffered from poor implementation. Peewee is a small library and can be very cohesive and very tight as a result. When another user contributes code that doesn't conform to the aesthetic of the rest of the codebase, it's typically not going to get merged. I understand your concerns about this, though, so don't think I'm trying to sell you on a solution :) just thought I'd add my .02 on the nature of the development/community contributions.

coleifer commented Jun 3, 2016

Thanks for the thoughtful response. I definitely understand feeling wary about implementing library-type functionality - there are quite a few reasons for not wanting this in your app's codebase and it certainly can increase the complexity and maintenance requirements.

I welcome contributions to Peewee, the sqlcipher extension is a good example of a significant contribution that was merged. To be honest, any contribution that has tried to address significant changes has suffered from poor implementation. Peewee is a small library and can be very cohesive and very tight as a result. When another user contributes code that doesn't conform to the aesthetic of the rest of the codebase, it's typically not going to get merged. I understand your concerns about this, though, so don't think I'm trying to sell you on a solution :) just thought I'd add my .02 on the nature of the development/community contributions.

@jeffwidman

This comment has been minimized.

Show comment
Hide comment
@jeffwidman

jeffwidman Jun 3, 2016

Contributor

Sorry I've been busier than I expected at Pycon.

Peewee is a small library and can be very cohesive and very tight as a result.

I really like Peewee for a certain class of problems, for exactly this reason. Huge thanks to you @coleifer for keeping a clear vision for the project.

Obviously I agree with @arikfr that redash as a project is getting to the point where having the extra power of SQLAlchemy is worth the additional complexity.

@arikfr here's my goto list of SQLAlchemy articles:

  1. http://alextechrants.blogspot.com/2013/11/10-common-stumbling-blocks-for.html
  2. The SQLAlchemy tutorial is generally all folks need to get started. Be sure to do both the Core and the ORM tutorials.
  3. @miguelgrinberg's tutorial has a good walkthrough of Flask + SQLAlchemy, including things like backreferences and lazy vs eager joins.
  4. Flask-specific, the http://derrickgilland.com/posts/demystifying-flask-sqlalchemy/ is quite good.

A few SQLAlchemy concepts that often are slightly less intuitive:

a. The session, which is basically an additional transaction layer, but on the Python side so that SQLAlchemy can queue/modify DB records locally before they get passed to the DB.
b. The concept of SQLAlchemy core which is much more performant and flexible than the ORM layer, with the tradeoff that you're effectively writing SQL constructs (with some shortcuts) rather than handling objects.
c. The difference between the DataMapper pattern vs the Active Record pattern. Understanding this isn't necessary in order to use SQLAlchemy, but it can be helpful. For example, this design decision made it possible to have the ORM layer built on top of a separate core layer.

The Flask-SQLAlchemy extension doesn't currently support passing in external classes for Base models. This is on the roadmap, but it might be a while. As a result, if you use that extension, you have to use their base models, which in turn (IIRC) require a Flask app context to be present. So for celery/external CLI, you can either:

  1. always push an app context, it's pretty simple, one stackoverflow answer comes to mind, but I couldn't find the link with a quick search. I can dig it up if you want. Also Miguel Grinberg has a couple of blog posts on this.
  2. Decouple Flask from your SQLAlchemy models by not using the Flask-SQLAlchemy extension, I think @dgilland's Alchy/Flask-Alchy might come in handy here, but I haven't used them myself.

Hopefully this is helpful, let me know if more questions. It'll be the week after next that I can likely dig into code on this, as I expect teh first week I get back to be full of day job stuff. But I am more than happy to review PRs, there's subtle shortcuts like one() vs one_or_none() vs scalar() that aren't necessary for making the code work, but do help with keeping the code clean.

Contributor

jeffwidman commented Jun 3, 2016

Sorry I've been busier than I expected at Pycon.

Peewee is a small library and can be very cohesive and very tight as a result.

I really like Peewee for a certain class of problems, for exactly this reason. Huge thanks to you @coleifer for keeping a clear vision for the project.

Obviously I agree with @arikfr that redash as a project is getting to the point where having the extra power of SQLAlchemy is worth the additional complexity.

@arikfr here's my goto list of SQLAlchemy articles:

  1. http://alextechrants.blogspot.com/2013/11/10-common-stumbling-blocks-for.html
  2. The SQLAlchemy tutorial is generally all folks need to get started. Be sure to do both the Core and the ORM tutorials.
  3. @miguelgrinberg's tutorial has a good walkthrough of Flask + SQLAlchemy, including things like backreferences and lazy vs eager joins.
  4. Flask-specific, the http://derrickgilland.com/posts/demystifying-flask-sqlalchemy/ is quite good.

A few SQLAlchemy concepts that often are slightly less intuitive:

a. The session, which is basically an additional transaction layer, but on the Python side so that SQLAlchemy can queue/modify DB records locally before they get passed to the DB.
b. The concept of SQLAlchemy core which is much more performant and flexible than the ORM layer, with the tradeoff that you're effectively writing SQL constructs (with some shortcuts) rather than handling objects.
c. The difference between the DataMapper pattern vs the Active Record pattern. Understanding this isn't necessary in order to use SQLAlchemy, but it can be helpful. For example, this design decision made it possible to have the ORM layer built on top of a separate core layer.

The Flask-SQLAlchemy extension doesn't currently support passing in external classes for Base models. This is on the roadmap, but it might be a while. As a result, if you use that extension, you have to use their base models, which in turn (IIRC) require a Flask app context to be present. So for celery/external CLI, you can either:

  1. always push an app context, it's pretty simple, one stackoverflow answer comes to mind, but I couldn't find the link with a quick search. I can dig it up if you want. Also Miguel Grinberg has a couple of blog posts on this.
  2. Decouple Flask from your SQLAlchemy models by not using the Flask-SQLAlchemy extension, I think @dgilland's Alchy/Flask-Alchy might come in handy here, but I haven't used them myself.

Hopefully this is helpful, let me know if more questions. It'll be the week after next that I can likely dig into code on this, as I expect teh first week I get back to be full of day job stuff. But I am more than happy to review PRs, there's subtle shortcuts like one() vs one_or_none() vs scalar() that aren't necessary for making the code work, but do help with keeping the code clean.

@arikfr

This comment has been minimized.

Show comment
Hide comment
@arikfr

arikfr Jun 10, 2016

Member

@jeffwidman thank you for all the pointers!

After having a conversation about this, I was given an interesting idea: we can add Alembic to the project before rewriting all the models/database code to use SQLAlchemy. While it introduces some duplication (mainly in used packages), it allows us to introduce proper migrations before completing the risky refactor which will probably take some time to get right & test.

What do you think?

Member

arikfr commented Jun 10, 2016

@jeffwidman thank you for all the pointers!

After having a conversation about this, I was given an interesting idea: we can add Alembic to the project before rewriting all the models/database code to use SQLAlchemy. While it introduces some duplication (mainly in used packages), it allows us to introduce proper migrations before completing the risky refactor which will probably take some time to get right & test.

What do you think?

@jeffwidman

This comment has been minimized.

Show comment
Hide comment
@jeffwidman

jeffwidman Jun 10, 2016

Contributor

If that's possible, it definitely makes sense. I've never tried using Alembic on a non-SQLAlchemy-based project.

Contributor

jeffwidman commented Jun 10, 2016

If that's possible, it definitely makes sense. I've never tried using Alembic on a non-SQLAlchemy-based project.

@arikfr

This comment has been minimized.

Show comment
Hide comment
@arikfr

arikfr Jun 14, 2016

Member

I think we reached a conclusion. Closing in favor of #1123 and #1124.

Member

arikfr commented Jun 14, 2016

I think we reached a conclusion. Closing in favor of #1123 and #1124.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment