Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[replit] The lesser of two evils #167

Merged
merged 5 commits into from Feb 27, 2024

Conversation

lhchavez
Copy link
Contributor

@lhchavez lhchavez commented Sep 6, 2023

Why

Currently the replit library has a very gross quirk: it has a global in replit.database.default_db.db, and the mere action of importing this library causes side effects to run! (connects to the database, starts a thread to refresh the URL, and prints a warning to stdout, adding insult to injury).

What changed

So we're trading that very gross quirk with a gross workaround to preserve backwards compatibility: the modules that somehow end up importing that module now have a __getattr__ that lazily calls the code that used to be invoked as a side-effect of importing the library. Maybe in the future we'll deploy a breaking version of the library where we're not beholden to this backwards-compatibility quirck.

Test plan

replit-py/src$ python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import replit
>>> from replit.database import db
WARNING:root:Warning: error initializing database. Replit DB is not configured.
>>> repr(db)
'None'
>>> 

Rollout

  • This is fully backward and forward compatible

@lhchavez lhchavez requested a review from a team as a code owner September 6, 2023 02:20
@lhchavez lhchavez requested review from airportyh and removed request for a team September 6, 2023 02:20
@lhchavez lhchavez force-pushed the lh-replit-the-lesser-of-two-evils branch 3 times, most recently from 582dafb to aad6505 Compare September 6, 2023 03:04
Copy link
Contributor

@airportyh airportyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@blast-hardcheese
Copy link
Collaborator

Hah! I started to write this very changeset. Is there any more testing to do here @lhchavez, or do you want to rebase and we can get this in and cut a minor release?

@blast-hardcheese blast-hardcheese added the enhancement New feature or request label Feb 23, 2024
Currently the replit library has a very gross quirk: it has a global in
`replit.database.default_db.db`, and the mere action of importing this
library causes side effects to run! (connects to the database, starts a
thread to refresh the URL, and prints a warning to stdout, adding insult
to injury).

So we're trading that very gross quirk with a gross workaround to
preserve backwards compatibility: the modules that somehow end up
importing that module now have a `__getattr__` that _lazily_ calls the
code that used to be invoked as a side-effect of importing the library.
Maybe in the future we'll deploy a breaking version of the library where
we're not beholden to this backwards-compatibility quirck.
@blast-hardcheese
Copy link
Collaborator

blast-hardcheese commented Feb 24, 2024

I made some changes:

  • reintroduced a stub for the refresh_db() method
  • Added get_db() and get_db_url() classmethods on LazyDB to avoid LazyDB.get_instance().db/.db_url pattern. That's not an API surface I would feel comfortable maintaining long term.

@blast-hardcheese blast-hardcheese force-pushed the lh-replit-the-lesser-of-two-evils branch 3 times, most recently from 7cc70b6 to f5a6d0e Compare February 24, 2024 09:34
@blast-hardcheese
Copy link
Collaborator

Continuation from #167 (comment),

The more I looked at LazyDB the more I disliked how we had two halves of a Database class, plus lots of singleton indirection threaded through... it wasn't pleasant, and would almost certainly not be stable as we continue to iterate.

I reverted the surface area to what it was before, save with one alteration.

Internally, we only ever use the __getattr__ indirection, avoiding holding a binding all the way up to default_db, which holds its binding in _db.
If users decide to maintain a reference to db in their own code, this is not an issue, since the whole point here was to defer connection until the user required it.

Another issue was that of close(). I now pass unbind=... as a parameter to Database and AsyncDatabase such that when their close() methods are called, the _db global in default_db is cleared. The next time replit.db is accessed, it will reinitialize and reconnect, but this permits a less clunky resource management story.

Finally, a complaint I've seen around is that the Python interpreter will hang if the database has been connected to, since we threading.Timer(...).start(), throwing away the intermediate object that had the corresponding .cancel(). Instead, Database and AsyncDatabase can maintain their own DB URLs, if they receive a get_db_url callable passed in their initializers.

This was the summary of trying to get some thoughts out of my head before the weekend, I still need to revisit the tests, add missing documentation, and probably rename unbind to cleanup or post_cancel or something less prescriptive.

@blast-hardcheese
Copy link
Collaborator

@airportyh Dismissed your initial review since I'm taking a different approach from Luis' initial implementation.

The changes here attempt to maximize compatibility with the existing API without adding too much to the surface area of the library.

There's an additional improvement which was not reflected in the original implementation, being that when the Database is closed, the threading.Timer is closed and the default db property is reset back to None. This means the subsequent accesses of that property will reestablish and reconnect the database object automatically.

An issue with LazyDB is that the refresh_db timer would not get canceled
if the user closes the database.

Additionally, the db_url refresh logic relies on injection, whereas the Database should ideally be the thing requesting that information from the environment
@@ -4,7 +4,7 @@

from flask import Blueprint, Flask, request

from .default_db import db
from . import default_db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it even possible to use a db other than the default_db?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! As an example, we use it in some CI tests internally. An example is as follows (I added a print in the default DB initializer):

>>> import os
>>> from replit.database import Database
>>> db = Database(os.environ.get("REPLIT_DB_URL"))
>>> db.set('foo', 'bar')
>>> db.get('foo')
'bar'
>>> import replit
>>> replit.db
Initializing default db!
<Database(db_url=...)>

You can see in this example that the initializer for the default database is only triggered on certain property accessors.

Copy link
Contributor

@airportyh airportyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the cleanup!

@blast-hardcheese blast-hardcheese merged commit 5c0ab7a into master Feb 27, 2024
6 checks passed
@blast-hardcheese blast-hardcheese deleted the lh-replit-the-lesser-of-two-evils branch February 27, 2024 22:01
@blast-hardcheese blast-hardcheese added the minor Bump minor version label Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request minor Bump minor version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants