db.insert is significantly slower than simple enity creation #84

Closed
socketpair opened this Issue Oct 14, 2014 · 4 comments

Projects

None yet

2 participants

@socketpair

Example code:

with db_session:
    for i in xrange(10000):
        rec = { ... }
        # TableName(**rec)
        db.insert('tablename', **rec)

Documentation say that this should be faster. http://doc.ponyorm.com/database.html?highlight=insert

@kozlovsky kozlovsky closed this in 0e2b396 Oct 28, 2014
@kozlovsky
Contributor

Fixed, now db.insert() works about 2.5 times faster then entity creation

@socketpair

Whta was the cause of that? did not understand by your commit...As I understand, each insert was done in separate transaction...

@kozlovsky
Contributor

A database connection can be opened in transactional mode or in autocommit mode. When the connection is in transactional mode, then the first SQL command starts a new transaction, and the subsequent SQL commands belongs to the same transaction, until commit() or rollback() is executed.

Some ORMs (such as Django) use autocommit mode by default. But in autocommit mode each UPDATE performs in a separate transaction, and with Pony we want that all UPDATEs which are executed within the same db_session belongs to the same transaction, otherwise database content can become inconsistent and logically corrupted.

But if the db_session consist of SELECT commands only (that is, we only read information from the database and don't write any changes), then for some DBMSes (specifically, for PostgreSQL and SQLite) it is beneficial to perform all SELECTs in autocommit mode, and switch to transactional mode only when first UPDATE/INSERT/DELETE command is about to be send to the database.

For PostgreSQL this is beneficial because this way we can save round-trips to the database on BEGIN..COMMIT/ROLLBACK commands. There is no difference between autocommit and transactional modes until we only read from the database, since PostgreSQL transactions use the READ COMMITTED isolation level by default.

For SQLite the Pony behavior is beneficial because it allows other processes read from the database concurrently. If the Pony would use transactional mode from the very beginning of the db_session in SQLite, then you would get OperationalError: database is locked errors on high-concurrency load. But Pony postpone opening transaction in SQLite until it absolutely needed, and then open it in EXCLUSIVE mode, thus avoiding all of the possible problems with lock escalation.

In order to maintain this logic, Pony database session internally have immediate flag which signalizes that Pony should switch from autocomit to transactional mode before the next SQL command is sent to the database. But before this fix, this immediate flag was not set correctly when db.insert() method was executed. Because of this, INSERTs were executed in autocommit mode, and each INSERT was done in a separate transaction. Now this is fixed. Now, if db.insert() method is executed, Pony starts a new transaction (if it was not already started) and remain in the same transaction until we exit from db_session or perform manual commit()/rollback()

The transactional logic in Pony is pretty complex, but this is just because the topic is hard. For example. when connection to the database is lost, Pony analyzes the current state of the transaction. If only SELECTs were performed earlier in this transaction, and the connection uses the default database isolation level, then Pony silently open new connection and continues its operation. But if some UPDATEs were already sent through old connection, then Pony understands that it does not make sense to continue operations on new connection, because the content of previous UPDATEs was lost. In this case, Pony trows appropriate exception. The end result is that Pony tries to work with the database in the most intelligent and efficient way possible.

@socketpair

You definitely should write this (and other "internal" things) to documentation. Text you are writing is clear and simple for understanding. Very good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment