Implement some sort of caching mechanism #33

Closed
4 tasks
johndgiese opened this issue Jul 6, 2022 · 0 comments

johndgiese commented Jul 6, 2022

Figure out some way to minimize the number of API requests needed when pulling down a bunch of different files.

Requirements:

  • The cache MUST be invalidated if any Notion data changes.
  • n2y must provide a mechanism to invalidate the cache.
  • The cache has to work across multiple executions of n2y.
  • The cache has to work correctly even if multiple n2y processes are running at the same time.
  • The cache should be stored in a directory called .n2y.
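
To make these requirements concrete, here is a minimal sketch of an on-disk cache, not a proposed final design. It assumes JSON response bodies stored one file per request under .n2y; the function names and the keying scheme (a hash of method, URL, and parameters) are hypothetical. Writes go to a temporary file that is then renamed into place, so a concurrent n2y process never reads a half-written entry.

```python
import hashlib
import json
import os
import shutil
import tempfile
from pathlib import Path

CACHE_DIR = Path(".n2y")  # directory name taken from the requirements above


def cache_key(method, url, params=None):
    """Derive a stable file name from a request (hypothetical keying scheme)."""
    raw = json.dumps([method.upper(), url, params or {}], sort_keys=True)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def read_entry(key, cache_dir=CACHE_DIR):
    """Return the cached JSON body, or None on a miss or an unreadable entry."""
    try:
        text = (Path(cache_dir) / f"{key}.json").read_text(encoding="utf-8")
        return json.loads(text)
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def write_entry(key, body, cache_dir=CACHE_DIR):
    """Write to a temp file in the same directory, then rename it into place."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(body, f)
        # os.replace is atomic on POSIX, so readers see the old or new file only.
        os.replace(tmp, cache_dir / f"{key}.json")
    except BaseException:
        os.unlink(tmp)  # do not leave a stray temp file behind
        raise


def invalidate(cache_dir=CACHE_DIR):
    """Explicit invalidation: drop the whole cache directory."""
    shutil.rmtree(cache_dir, ignore_errors=True)
```

Because entries live on disk, they survive across executions. Invalidating when Notion data changes is not covered by this sketch and still depends on the timestamp behavior discussed in the design notes.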

Design notes:

  • It would be nice if the caching occurred at the HTTP request level, so that we could iterate on n2y's internals during development without having to invalidate the cache too often. This comes up often during development, since it's useful to do test runs on a large set of Notion pages that can take several minutes to pull down, and a bug in a new n2y feature may require pulling everything down again. Plugin developers will face the same problem, so it isn't likely to go away even once the core of n2y is stable.

  • Notion provides "last_modified" timestamps for pages and blocks (exposed as the "last_edited_time" property in the Notion API). It's not clear to me what exactly will modify the timestamp of a page. E.g., will modifying a sub page also change the timestamp of its parent page? I suspect it won't, and that ONLY modifying a block in a page will touch its timestamp. This should be explored up front, along with any other timestamps or caching mechanisms in the API (if any), as this will make it clear what options we have for implementing the caching.

  • To begin with, it's likely sufficient if we can just cache the pages and avoid retrieving their sub blocks if the page's "last_modified" timestamp hasn't changed.
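
The page-level idea in the last note could be sketched roughly as follows. This is an illustration under assumptions, not n2y's implementation: get_page and get_blocks are hypothetical callables wrapping the Notion API, cache is any dict-like store, and the timestamp is read from the page object's "last_edited_time" field. It also assumes that a page's timestamp changes whenever one of its blocks changes, which is exactly the open question raised above.

```python
def fetch_page_with_blocks(page_id, get_page, get_blocks, cache):
    """Return (page, blocks), fetching blocks only when the page has changed.

    get_page and get_blocks are hypothetical wrappers around the Notion API;
    cache is a dict-like store keyed by page id.
    """
    page = get_page(page_id)  # always one request, to read the timestamp
    cached = cache.get(page_id)
    if cached is not None and cached["last_edited_time"] == page["last_edited_time"]:
        # Timestamp unchanged: reuse the stored blocks and skip the block requests.
        return page, cached["blocks"]
    blocks = get_blocks(page_id)  # the expensive part we want to avoid
    cache[page_id] = {
        "last_edited_time": page["last_edited_time"],
        "blocks": blocks,
    }
    return page, blocks
```

Passing the API calls in as arguments also suits the verification strategy mentioned in the steps below: a test can supply counting fakes, run the function twice, and check that the second run makes no block requests.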

Steps:

  • Explore and document the behavior of the various timestamps/hashes/etc. available for caching decisions
  • Document a possible design and a verification strategy (e.g., probably an end-to-end test that ensures the cache is used during a second run)
  • Discuss with David
  • Implement
@johndgiese johndgiese added this to the Current milestone Aug 10, 2022
@johndgiese johndgiese removed this from the Current milestone Sep 6, 2022