Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dependency tracking on VirtualSourceObjects #1007

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

dairiki
Copy link
Contributor

@dairiki dairiki commented Mar 1, 2022

I think I finally sussed out what was causing the unnecessary rebuilds described in #959. (And it was not exactly was I was guessing in that PR.)

(WIP - more later.)

Issue(s) Resolved

The alt of virtual source dependencies is not being recorded in the buildstate database

The artifacts table in the buildstate database tracks artifacts dependencies on two types of sources:

  • source files
  • VirtualSourceObjects

It identifies those sources with the string-valued column path.

Source files are tracked by setting path to the file-system path of the source file.
Currently, virtual sources are tracked by setting path to VirtualSourceObject.path — i.e. the Lektor db-path of the virtual source.

The issue is that the path is not the full identity key for a VirtualSourceObject — virtual sources vary with alt as well as path.

As a concrete example, what happens in the issues described in #959 is (e.g.), when building the German page for the first blog post (artifact = /de/blog/first-post/index.html), post.get_siblings() is called which registers a dependency on the Siblings virtual source object attached to the post. Since the post has alt="de", this is the virtual source that may be obtained by calling pad.get("/blog/first-post@siblings", alt="de"). Currently, however, the dependency gets recorded only by its path (/blog/first-post@siblings). When the builder, checking for changes that would necessitate a rebuild, goes to check the current state of the virtual source, it fetches pad.get("/blog/first-post@siblings") with no alt specified. This is a different object — instead of alt="de" it has the default alt (en, or _primary before #958) — so it always compares as changed, forcing an unconditional rebuild.

Siblings.get_checksum is being overly-pessimistic

The state of a page's Siblings only depends on the identity of the siblings. As long as the next sibling of /blog/first-post is /blog/blog-post-2 and the prev sibling is None, then the /blog/first-post@siblings virtual source object can be considered unchanged, even if the content of /blog/blog-post-2 changes. (If, while building an artifact, data from that next page is accessed, then that next page is/should be recorded as a dependency, ensuring the artifact gets rebuild if the next page is touched.)

Currently, the Siblings.get_checksum and Siblings.get_mtime compute values dependent on the source files for the siblings. That is overly pessimistic.

Record.__getitem__ should record a dependency on record

I think the reason Siblings.get_checksum (tries to) depend on the contents of the siblings is because this is not currently being done.

I have now implemented this. In testing the build time of the getlektor.com website, the change does not seem to affect build time in any noticeable way.

However, fixing this has made every page depend on its complete lineage (its parent, grandparent, ...)
Strictly speaking, this is correct. The lineage — e.g. by setting custom slugs — can affect the URL of a page. Almost every page uses the url filter in its template. By default, url returns a relative URL. Computing a relative URL depends on the URL of the page being built.
I'm not sure if this will be a big deal in terms of forcing more rebuilds that (most of the time) aren't necessary.

Supercedes #959

Related Issues / Links

Description of Changes

  • Wrote at least one-line docstrings (for any new functions)
  • Added unit test(s) covering the changes (if testable)
  • Included a screenshot or animation (if affecting the UI, see Licecap)
  • Link to corresponding documentation pull request for getlektor.com

This PR munges things in a minimal way so as to store both the alt and the path of VirtualSourceObjects in the artifacts.path column.

It also cleans up the checksum computation for the Siblings virtual source object.

@dairiki dairiki force-pushed the bug.virtual-source-dependency-tracking branch from 9c76795 to f31270e Compare March 1, 2022 04:13
@dairiki dairiki force-pushed the master branch 2 times, most recently from 4aaae00 to 1999510 Compare March 1, 2022 16:43
@dairiki dairiki force-pushed the bug.virtual-source-dependency-tracking branch 2 times, most recently from f6769f3 to 97d0417 Compare March 1, 2022 22:19
@dairiki dairiki force-pushed the bug.virtual-source-dependency-tracking branch from 97d0417 to 2e41c18 Compare March 3, 2022 05:48
@dairiki dairiki mentioned this pull request Nov 24, 2022
2 tasks
It should return the both the alt-specific and the fallback .lr
filenames (in addition to the attachment filename.)
@dairiki dairiki mentioned this pull request Feb 26, 2023
2 tasks
The SourceObject.source_filename property now just returns the first
of the filenames returned by the iter_source_filenames method.
Both Records and VirtualSourceObjects are described by the identity
key (path, alt, pad).  That is, if

   obj2 = pad.get(obj.path, alt=obj.path)

then

   obj2 == obj

Records already had __eq__ and __hash__ methods which made this
explicit.  Here we add those methods for VirtualSourceObjects.

Note this allows us to convert Context.referenced_virtual_dependencies
from a dict to a set.  (The key was only being used as an identity key
for the VirtualSourceObjects.)
The content of the Siblings virtual source object only depends on
the identity of the siblings, not on the siblings' content.  I.e.
the Siblings object should only be considered changed when the
identity of one or both of the siblings is changed.

(Any artifact which in fact uses data from one of the siblings
will also register a dependency on that sibling which will trigger
a rebuild if the sibling is touched.)
Strictly speaking, a sources url_path depends on its complete lineage,
since any of them may, e.g., set a custom slug.

This means, generally, that any page whose template uses the `url`
filter (i.e. pretty much every page) depends on all of the sources
lineage (since we need to know the artifact's URL in order to compute
the relative URL to the target.)

While this is correct, I'm not sure what effect this change will have
on performance in general.
@dairiki dairiki force-pushed the bug.virtual-source-dependency-tracking branch from 2e41c18 to 80baafb Compare February 26, 2023 22:11
@dairiki
Copy link
Contributor Author

dairiki commented Feb 26, 2023

I've just extracted a bunch of what was here to PR #1105, and have rebased this PR onto that one.

dairiki added a commit to dairiki/lektor that referenced this pull request Feb 27, 2023
This exercises the issue with dependencies on VirtualSourceObjects
when alts are enable that is described in lektor#1007 and lektor#959.
@dairiki dairiki added this to the 3.4 milestone Feb 28, 2023
dairiki added a commit that referenced this pull request Mar 8, 2023
* test(builder): add test for virtual source dependency tracking issue

This exercises the issue with dependencies on VirtualSourceObjects
when alts are enable that is described in #1007 and #959.

* The identity of VirtualSourceObjects depends on their alt.

* refactor: make VirtualSourceInfo a dataclass

* refactor: add is_changed method to FileInfo and VirtualSourceInfo

* test: exclude pure virtual methods from coverage testing
dairiki added a commit to dairiki/lektor that referenced this pull request Sep 11, 2023
* test(builder): add test for virtual source dependency tracking issue

This exercises the issue with dependencies on VirtualSourceObjects
when alts are enable that is described in lektor#1007 and lektor#959.

* The identity of VirtualSourceObjects depends on their alt.

* refactor: make VirtualSourceInfo a dataclass

* refactor: add is_changed method to FileInfo and VirtualSourceInfo

* test: exclude pure virtual methods from coverage testing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant