Proposal: TinyDB v4.0.0 #284

Closed
msiemens opened this issue Oct 12, 2019 · 12 comments
Assignees
Labels
design discussion pinned

Comments

@msiemens
Owner

msiemens commented Oct 12, 2019

Lately, I've been thinking about how to move TinyDB forward and what the next steps are. The more I thought about it, the more I became convinced that TinyDB needs a v4.0.0.

Motivation

Why would we want a TinyDB v4.0? Why introduce a backwards-incompatible release? In my view, there are three reasons to publish a new major release:

  • Remove deprecated functionality that's been waiting to be removed for more than two years now.
  • Fix design issues that have been introduced by a lack of vision for extension mechanisms in TinyDB.
  • Simplify the architecture in order to fix other issues that cannot be solved without breaking backwards compatibility.

To elaborate on these reasons:

Deprecations

TinyDB is 6 years old now. The first stable release (v1.0.0) was published in July 2013. A year later, in September 2014, a major release (v2.0.0) changed the data format and improved the API. Again a year later, in November 2015, the next major release (v3.0.0) cleaned up the query language architecture and started moving non-core functionality to external packages.

Version 3.0.0 is now almost 4 years old. In the meantime TinyDB continued to evolve, including shifting the terminology from elements to documents in v3.6.0 and the deprecation of ujson support. Both of these changes have been major cleanups, but there hasn't been a major release of TinyDB that would actually get rid of the deprecated features. This results in cluttered code which makes it harder to understand the TinyDB software design/architecture.

In addition, Python 2, which TinyDB supports, will reach its end of life at the end of 2019. As TinyDB has quite a few places where it has to do extra work to support both Python 2 and 3 from the same code base, dropping Python 2 support would simplify the code even further.

TinyDB v4.0.0 would simplify the source code by removing deprecated features and in turn make it easier to understand the source and to develop one's own extensions. In addition, only Python 3 would be supported.

Extension Mechanisms

Right from the start, there have been two ways to extend TinyDB: Custom Storages and Custom Middlewares. As the popularity and usage of TinyDB increased, so did requests to make it possible to extend other parts of TinyDB. Thus, Custom Table Classes and Custom Storage Proxy Classes have been added. In addition, mechanisms to modify the default table class name and parameters as well as the default storage class have been introduced. As a result, there are no fewer than seven places where TinyDB's behaviour can be modified.

Except for the first two, all extension mechanisms have been introduced as a result of user requests. At the time of each request, it seemed to be the best option to follow the path of least resistance when adding a new extension mechanism, refraining from any sort of breaking changes. But looking back, it is apparent that there was no real concept of how extending TinyDB should work in general.

TinyDB v4.0.0 would remove all extension mechanisms except for Custom Storages. All other extension mechanisms would be replaced by a unified extension concept as detailed below.

Architecture & API

To be honest, I'm not particularly proud of TinyDB's internal software architecture. As TinyDB evolved gradually, the simplicity of the software architecture was often neglected. Now we're in a state where there's a lot of unneeded indirection. Data access uses up to 5 classes, two of which are some form of proxy class: TinyDB → Table → StorageProxy → DocumentProxy → Storage. This makes TinyDB's source code complicated and also impacts performance (see #250). Fixing these design issues requires rearchitecting TinyDB. But this in turn requires breaking backwards compatibility as some extension mechanisms rely on the old software architecture.

Additionally, there's been discussion about inconsistencies in TinyDB's API regarding purging data (see #103). Removing these inconsistencies would break backwards compatibility.

TinyDB v4.0.0 would simplify the internal software architecture and remove inconsistencies from its API, making it easier to understand how TinyDB works and thus making it easier to extend.

Proposals

Deprecations

For the reasons outlined above, I propose to

  • Put TinyDB v3 into maintenance mode, implementing only bug fixes but not adding new features,
  • Remove all deprecated features,
  • Drop Python 2 support

Extension Mechanisms

I propose to replace all existing extension mechanisms with Custom Storages and Inheritance. Custom Storages continue to be a useful extension mechanism that is difficult to replicate by other means. In addition to Custom Storages, the main way to extend TinyDB and to modify its behaviour would be inheritance – to create subclasses of TinyDB. A famous example of this approach is Flask:

The Flask class has many methods designed for subclassing. You can quickly add or customize behavior by subclassing Flask (see the linked method docs) and using that subclass wherever you instantiate an application class.

In addition, the Flask docs state:

As you grow your codebase, don’t just use Flask – understand it. Read the source. Flask’s code is written to be read; its documentation is published so you can use its internal APIs. Flask sticks to documented APIs in upstream libraries, and documents its internal utilities so that you can find the hook points needed for your project.

With the new extension approach, TinyDB would aim to follow the same path: Instead of adding new extension mechanisms endlessly, users would be encouraged to subclass TinyDB and other classes in order to modify how TinyDB fetches the last ID, how it calculates the next ID, how the default table name is determined, and other behaviours.
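To illustrate the idea, here is a minimal sketch of extension by subclassing. The `Table` class and its `next_id` hook below are toy stand-ins invented for this example, not TinyDB's actual classes or method names; the point is only that a subclass overrides one documented method and inherits everything else.

```python
class Table:
    """Toy stand-in for a table class (hypothetical, not TinyDB's real API)."""

    def __init__(self):
        self._docs = {}

    def next_id(self):
        # Default policy: one more than the largest ID currently in use.
        return max(self._docs, default=0) + 1

    def insert(self, document):
        doc_id = self.next_id()
        self._docs[doc_id] = dict(document)
        return doc_id


class EvenIdTable(Table):
    """Subclass that changes only how the next ID is calculated."""

    def next_id(self):
        # Hypothetical custom policy: hand out even IDs only.
        return max(self._docs, default=0) + 2


default_table = Table()
default_ids = [default_table.insert({"n": i}) for i in range(3)]

even_table = EvenIdTable()
even_ids = [even_table.insert({"n": i}) for i in range(3)]
```

The `insert` method never needs to know which ID policy is in effect, which is what makes the hook safe to override.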

Implementing this requires making useful internal TinyDB methods part of the public API and documenting them in a way that makes it easy to overload them with custom behaviour. The documentation should provide examples of what types of extensions are possible by subclassing. Also, the source code itself should have its code comments reworked to make it easy to understand how TinyDB works from the first reading of the source code (based on ideas like literate programming).

The main challenge of implementing this approach is to find the right balance of how many of TinyDB's internal methods should become part of the public API. Making too few methods part of the public API makes it difficult to modify all aspects of TinyDB's behaviour. But making too many methods part of the public API clutters it and makes it difficult to continue to evolve TinyDB without breaking the existing API and existing extensions.

My approach regarding which methods to include in the public API would be to be conservative and – at first – include too few methods rather than too many. The reason behind this is that it's possible to move more methods to the public API after the fact without breaking the existing API whereas the opposite would break existing usage.

Architecture & API Changes

I propose to simplify and rearchitect TinyDB to use the following classes:

  • TinyDB class
    • Create and manage tables
    • Forward calls to the default table
  • Table class
    • Receive a storage instance from the TinyDB class
    • Modify table data
    • Cache query results to avoid unneeded I/O
  • Storage class
    • Read and write data to a storage
  • Query class
    • Provide searching and filtering
  • Document class
    • Provide a thin wrapper around stored data that remembers the document's ID

There may be additional internal classes (such as the QueryImpl and LRUCache classes we currently have), but the classes outlined above should do the lion's share of the work for TinyDB. All in all, the new architecture should provide a clear separation of concerns and responsibilities. Simplifying the architecture in this way would allow us to fix issue #250 (StorageProxy read performance is abysmal). Also, a simple architecture makes it easy to understand how TinyDB works, which in turn would affect how easy it is to extend TinyDB using inheritance (see above). In other words: Having a simpler architecture should make it easier to extend TinyDB.
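The division of responsibilities listed above can be sketched roughly as follows. This is a toy model written for illustration, assuming an in-memory storage and plain callables in place of a Query class; all class and method names are hypothetical and it is not the actual v4 implementation.

```python
class MemoryStorage:
    """Storage: read and write the whole data set."""

    def __init__(self):
        self._data = {}

    def read(self):
        return self._data

    def write(self, data):
        self._data = data


class Document(dict):
    """Document: a thin dict wrapper that remembers its ID."""

    def __init__(self, value, doc_id):
        super().__init__(value)
        self.doc_id = doc_id


class Table:
    """Table: receives a storage, modifies table data, caches query results."""

    def __init__(self, storage, name):
        self._storage = storage
        self._name = name
        self._query_cache = {}

    def insert(self, value):
        data = self._storage.read()
        table = data.setdefault(self._name, {})
        doc_id = max(table, default=0) + 1
        table[doc_id] = dict(value)
        self._storage.write(data)
        self._query_cache.clear()  # cached results may now be stale
        return doc_id

    def search(self, cond):
        # cond is any callable taking a document and returning a bool.
        if cond in self._query_cache:
            return self._query_cache[cond]
        table = self._storage.read().get(self._name, {})
        docs = [Document(v, k) for k, v in table.items() if cond(v)]
        self._query_cache[cond] = docs
        return docs


class DB:
    """Top-level class: creates tables and forwards calls to the default one."""

    default_table_name = "_default"

    def __init__(self, storage):
        self._storage = storage
        self._tables = {}

    def table(self, name):
        if name not in self._tables:
            self._tables[name] = Table(self._storage, name)
        return self._tables[name]

    def insert(self, value):
        return self.table(self.default_table_name).insert(value)

    def search(self, cond):
        return self.table(self.default_table_name).search(cond)


def is_alice(doc):
    return doc["name"] == "alice"


db = DB(MemoryStorage())
first_id = db.insert({"name": "alice"})
second_id = db.insert({"name": "bob"})
found = db.search(is_alice)
found_again = db.search(is_alice)  # answered from the query cache
```

Each class talks only to the layer directly below it, so there is no proxy object between the table and the storage.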

In addition to architecture changes, I propose to also simplify TinyDB's API. For one thing, we could fix issue #103 (Inconsistency with purge functions) by making function names consistent between the TinyDB and Table classes. For another thing, I would propose to remove the write_back method as it complicates the API, probably is rarely used and can be implemented by subclassing, if needed. Also, I would like to make process_elements part of the internal API again as it's a core method of how data is manipulated and probably should not be modified by subclassing.

Feedback Requested

If you have thoughts, questions, comments or ideas regarding a possible TinyDB v4.0.0, especially regarding the proposals outlined above, feel free to comment and discuss on this issue 🙂

@msiemens msiemens added discussion design labels Oct 12, 2019
@msiemens msiemens self-assigned this Oct 12, 2019
@msiemens
Owner Author

msiemens commented Oct 12, 2019

@eugene-eeo If you have a couple of minutes to spare, I'd love to hear your thoughts on this!

@msiemens msiemens pinned this issue Oct 13, 2019
@dcflachs

dcflachs commented Oct 16, 2019

I love the idea of making TinyDB more straightforward to understand and extend. As a new user of TinyDB I found myself extending TinyDB almost immediately and to a certain extent scratching my head as to the best ways to achieve what I needed to do. I still find myself wondering if I have chosen the most efficient path for some of my customizations. I do like the idea of extension by subclassing. However, I have found the current architecture of middlewares to be quite useful, particularly the ability to stack middlewares in such a way as to maximize their efficiency. An example would be putting the middleware I use to convert a custom object to JSON below the caching middleware to minimize the number of time-consuming encode/decode operations.
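A minimal sketch of that kind of stacking, for readers unfamiliar with it: the classes below are simplified toys written for this example (they do not use TinyDB's real middleware base class), and `decode_calls` is a counter added only to make the effect visible. With the cache on top, repeated reads decode the stored text once.

```python
import json


class MemoryStorage:
    """Innermost layer: holds the raw serialized text."""

    def __init__(self, raw="{}"):
        self.raw = raw

    def read(self):
        return self.raw

    def write(self, data):
        self.raw = data


class CodecLayer:
    """Hypothetical middleware: converts between objects and JSON text."""

    def __init__(self, storage):
        self.storage = storage
        self.decode_calls = 0  # instrumentation for this example only

    def read(self):
        self.decode_calls += 1
        return json.loads(self.storage.read())

    def write(self, data):
        self.storage.write(json.dumps(data))


class CachingLayer:
    """Hypothetical middleware: keeps the last value read or written."""

    def __init__(self, storage):
        self.storage = storage
        self.cache = None

    def read(self):
        if self.cache is None:
            self.cache = self.storage.read()
        return self.cache

    def write(self, data):
        self.cache = data
        self.storage.write(data)  # write-through to the layer below


# The codec sits below the cache, so cached reads skip decoding entirely.
codec = CodecLayer(MemoryStorage('{"a": 1}'))
stack = CachingLayer(codec)
results = [stack.read() for _ in range(3)]
```

Reversing the order (cache below the codec) would cache the raw text instead and decode it on every read.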

@aegiacometti

aegiacometti commented Oct 23, 2019

@eugene-eeo
Contributor

eugene-eeo commented Oct 24, 2019

@msiemens sorry I didn't notice the mention.

The plan looks good to me, seems like the easiest way may be to rewrite a lot of the Table classes and drop the StorageProxy-ies. I think what started out as a simple wrapper over a JSON file has had too many things added to it with less foresight from all of us back then.

Currently since I don't really catch up in Github I have less and less idea of what goes into the codebase, which means probably my verdict/opinion on this is not that good.

But it's always good when v4 has less code than v3. It would probably help out a lot of people who are just using the vanilla TinyDB w/o extensions, and those that do use extensions probably know how to work around the new changes anyways ;)

@msiemens
Owner Author

msiemens commented Oct 26, 2019

Thanks for your feedback, everyone!


However, I have found the current architecture of middlewares to be quite useful, particularly the ability to stack middlewares in such a way as to maximize their efficiency. An example would be putting the middleware I use to convert a custom object to JSON below the caching middleware to minimize the number of time-consuming encode/decode operations.

@dcflachs, thanks for this feedback! To be honest, I wasn't entirely sure myself whether to remove middlewares or to keep them. The biggest difficulty with this decision is the lack of insight on how often which extension mechanism is used in real-world usage. Your comment helps me understand how middlewares are useful to you (and I think to other users too).

My approach to extensions would then be to have two layers of extension mechanisms: basic extensions and deep customizations. For basic extensions one can implement custom storages and middlewares, but for anything deeper than this, one would have to resort to subclassing TinyDB, which is the most powerful way of customization.


Clean and simplify the code is always good, and dropping support for old python2 not only will empower the previous two statements, it will also allow to gain focus on future development because of not using time to think in backward compatibilities and related modifications.

Thanks for the encouragement, @AdrianChi!


The plan looks good to me, seems like the easiest way may be to rewrite a lot of the Table classes and drop the StorageProxy-ies. I think what started out as a simple wrapper over a JSON file has had too many things added to it with less foresight from all of us back then.

I agree, @eugene-eeo! It's been a while since TinyDB started and in hindsight it's easy to see what we could have done better. But with v4.0 we have the opportunity to actually implement these improvements 🙂

It would probably help out a lot of people who are just using the vanilla TinyDB w/o extensions, and those that do use extensions probably know how to work around the new changes anyways ;)

Yes, this is my assumption too. Making TinyDB simpler makes it easier for more advanced users to completely customize TinyDB's behavior as they need.


Overall I think it's fair to say that there's good support for the changes as proposed (including the update on middlewares as mentioned in this comment), so I'll go ahead and try to find some time to implement them in the next few weeks 🙂 Thank you everyone ❤️

msiemens added a commit that referenced this issue Nov 2, 2019
msiemens added a commit that referenced this issue Nov 2, 2019
msiemens added a commit that referenced this issue Nov 2, 2019
msiemens added a commit that referenced this issue Nov 2, 2019
See rationale and details in #284

Closes #250
Closes #103
msiemens added a commit that referenced this issue Nov 15, 2019
See _Extension Mechanisms_ described in #284

@stale stale bot added the stale label Nov 25, 2019
@msiemens msiemens added pinned and removed stale labels Nov 25, 2019
@darrickyee

darrickyee commented Feb 1, 2020

Thanks so much for this convenient and easy-to-use library.

One thing that might be helpful for 4.0+ would be the ability to set custom functions for assigning doc_ids. It seems it can be currently done by subclassing Table, but the code for generating them is spread out a bit. Also I think it's necessary to create a custom StorageProxy if you don't want it to be an int?

One idea would be to pass an id-generating callback as an optional parameter to the Table constructor, although I haven't thought about it too deeply yet.

This could allow, for example, users to implement some basic ORM-like behavior by importing primary keys from a SQL database; or using uuids as keys; or some basic foreign-key & join functionality (e.g., doc_ids as tuples whose elements must be doc_ids in another table).

Or maybe there's already some straightforward way of doing this that I've missed...

@msiemens
Owner Author

msiemens commented Feb 5, 2020

That's already included in TinyDB 4 🙃

The new design includes a Table field called document_id_class which is used to convert stringified IDs to the IDs used by TinyDB. If you set document_id_class to a custom function, you can use your own data type that is used to represent IDs 🙂

The requirement to store IDs as strings comes from the fact that the JSON format requires keys to be strings. So in storage, IDs will always be kept as strings, but when creating document objects, TinyDB 4 will use Table.document_id_class to convert them.
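A rough sketch of that conversion step, using only the standard library: the `Document` class and the `load_documents` helper are hypothetical names made up for this illustration, and only the `document_id_class` name comes from the comment above. It shows that JSON turns integer keys into strings on the way out, and that a callable such as `int` or `uuid.UUID` can turn them back into IDs on the way in.

```python
import json
import uuid


class Document(dict):
    """Toy document: a dict that remembers its ID."""

    def __init__(self, value, doc_id):
        super().__init__(value)
        self.doc_id = doc_id


def load_documents(raw_json, document_id_class=int):
    """Rebuild documents, converting each string key via document_id_class."""
    table = json.loads(raw_json)
    return [
        Document(value, document_id_class(key))
        for key, value in table.items()
    ]


# Integer keys are written out as JSON strings ...
raw = json.dumps({1: {"name": "a"}, 2: {"name": "b"}})
stored_keys = list(json.loads(raw))

# ... and converted back to ints when documents are created.
int_docs = load_documents(raw)

# Any callable that accepts the stored string works, for example uuid.UUID.
key = uuid.UUID(int=1)
uuid_docs = load_documents(json.dumps({str(key): {"name": "c"}}), uuid.UUID)
```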

@msiemens
Owner Author

msiemens commented Feb 5, 2020

I hope to complete the new version in the next couple of weeks but I'm really busy at the moment with other stuff so I can't say for sure when it will be done.

@darrickyee

darrickyee commented Feb 7, 2020

Oh wow, great!

@nisanb

nisanb commented Feb 11, 2020

+1 for waiting for a new release ! 👍

@msiemens
Owner Author

msiemens commented May 2, 2020

TinyDB 4 is now released 🙂
