New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix dependency tracking on VirtualSourceObjects #1007
Draft
dairiki
wants to merge
9
commits into
lektor:master
Choose a base branch
from
dairiki:bug.virtual-source-dependency-tracking
base: master
Could not load branches
Branch not found: {{ refName }}
Could not load tags
Nothing to show
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Fix dependency tracking on VirtualSourceObjects #1007
dairiki
wants to merge
9
commits into
lektor:master
from
dairiki:bug.virtual-source-dependency-tracking
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
dairiki
force-pushed
the
bug.virtual-source-dependency-tracking
branch
from
March 1, 2022 04:13
9c76795
to
f31270e
Compare
dairiki
force-pushed
the
master
branch
2 times, most recently
from
March 1, 2022 16:43
4aaae00
to
1999510
Compare
dairiki
force-pushed
the
bug.virtual-source-dependency-tracking
branch
2 times, most recently
from
March 1, 2022 22:19
f6769f3
to
97d0417
Compare
Closed
3 tasks
dairiki
force-pushed
the
bug.virtual-source-dependency-tracking
branch
from
March 3, 2022 05:48
97d0417
to
2e41c18
Compare
It should return the both the alt-specific and the fallback .lr filenames (in addition to the attachment filename.)
2 tasks
The SourceObject.source_filename property now just returns the first of the filenames returned by the iter_source_filenames method.
Both Records and VirtualSourceObjects are described by the identity key (path, alt, pad). That is, if obj2 = pad.get(obj.path, alt=obj.path) then obj2 == obj Records already had __eq__ and __hash__ methods which made this explicit. Here we add those methods for VirtualSourceObjects. Note this allows us to convert Context.referenced_virtual_dependencies from a dict to a set. (The key was only being used as an identity key for the VirtualSourceObjects.)
The content of the Siblings virtual source object only depends on the identity of the siblings, not on the siblings' content. I.e. the Siblings object should only be considered changed when the identity of one or both of the siblings is changed. (Any artifact which in fact uses data from one of the siblings will also register a dependency on that sibling which will trigger a rebuild if the sibling is touched.)
Strictly speaking, a sources url_path depends on its complete lineage, since any of them may, e.g., set a custom slug. This means, generally, that any page whose template uses the `url` filter (i.e. pretty much every page) depends on all of the sources lineage (since we need to know the artifact's URL in order to compute the relative URL to the target.) While this is correct, I'm not sure what effect this change will have on performance in general.
dairiki
force-pushed
the
bug.virtual-source-dependency-tracking
branch
from
February 26, 2023 22:11
2e41c18
to
80baafb
Compare
I've just extracted a bunch of what was here to PR #1105, and have rebased this PR onto that one. |
2 tasks
dairiki
added a commit
to dairiki/lektor
that referenced
this pull request
Feb 27, 2023
This exercises the issue with dependencies on VirtualSourceObjects when alts are enable that is described in lektor#1007 and lektor#959.
dairiki
added a commit
that referenced
this pull request
Mar 8, 2023
* test(builder): add test for virtual source dependency tracking issue This exercises the issue with dependencies on VirtualSourceObjects when alts are enable that is described in #1007 and #959. * The identity of VirtualSourceObjects depends on their alt. * refactor: make VirtualSourceInfo a dataclass * refactor: add is_changed method to FileInfo and VirtualSourceInfo * test: exclude pure virtual methods from coverage testing
dairiki
added a commit
to dairiki/lektor
that referenced
this pull request
Sep 11, 2023
* test(builder): add test for virtual source dependency tracking issue This exercises the issue with dependencies on VirtualSourceObjects when alts are enable that is described in lektor#1007 and lektor#959. * The identity of VirtualSourceObjects depends on their alt. * refactor: make VirtualSourceInfo a dataclass * refactor: add is_changed method to FileInfo and VirtualSourceInfo * test: exclude pure virtual methods from coverage testing
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I think I finally sussed out what was causing the unnecessary rebuilds described in #959. (And it was not exactly was I was guessing in that PR.)
(WIP - more later.)
Issue(s) Resolved
The
alt
of virtual source dependencies is not being recorded in the buildstate databaseThe
artifacts
table in the buildstate database tracks artifacts dependencies on two types of sources:It identifies those sources with the string-valued column
path
.Source files are tracked by setting
path
to the file-system path of the source file.Currently, virtual sources are tracked by setting
path
toVirtualSourceObject.path
— i.e. the Lektor db-path of the virtual source.The issue is that the path is not the full identity key for a VirtualSourceObject — virtual sources vary with
alt
as well aspath
.As a concrete example, what happens in the issues described in #959 is (e.g.), when building the German page for the first blog post (artifact =
/de/blog/first-post/index.html
),post.get_siblings()
is called which registers a dependency on theSiblings
virtual source object attached to the post. Since the post hasalt="de"
, this is the virtual source that may be obtained by callingpad.get("/blog/first-post@siblings", alt="de")
. Currently, however, the dependency gets recorded only by its path (/blog/first-post@siblings
). When the builder, checking for changes that would necessitate a rebuild, goes to check the current state of the virtual source, it fetchespad.get("/blog/first-post@siblings")
with no alt specified. This is a different object — instead ofalt="de"
it has the default alt (en
, or_primary
before #958) — so it always compares as changed, forcing an unconditional rebuild.Siblings.get_checksum is being overly-pessimistic
The state of a page's
Siblings
only depends on the identity of the siblings. As long as the next sibling of/blog/first-post
is/blog/blog-post-2
and the prev sibling isNone
, then the/blog/first-post@siblings
virtual source object can be considered unchanged, even if the content of/blog/blog-post-2
changes. (If, while building an artifact, data from that next page is accessed, then that next page is/should be recorded as a dependency, ensuring the artifact gets rebuild if the next page is touched.)Currently, the
Siblings.get_checksum
andSiblings.get_mtime
compute values dependent on the source files for the siblings. That is overly pessimistic.Record.__getitem__ should record a dependency on record
I think the reason Siblings.get_checksum (tries to) depend on the contents of the siblings is because this is not currently being done.
I have now implemented this. In testing the build time of the getlektor.com website, the change does not seem to affect build time in any noticeable way.
However, fixing this has made every page depend on its complete lineage (its parent, grandparent, ...)
Strictly speaking, this is correct. The lineage — e.g. by setting custom slugs — can affect the URL of a page. Almost every page uses the
url
filter in its template. By default,url
returns a relative URL. Computing a relative URL depends on the URL of the page being built.I'm not sure if this will be a big deal in terms of forcing more rebuilds that (most of the time) aren't necessary.
Supercedes #959
Related Issues / Links
Description of Changes
This PR munges things in a minimal way so as to store both the
alt
and thepath
ofVirtualSourceObjects
in theartifacts.path
column.It also cleans up the checksum computation for the
Siblings
virtual source object.