Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix semantics of Record equality #1105

Merged
merged 4 commits into from Mar 8, 2023

Conversation

dairiki
Copy link
Contributor

@dairiki dairiki commented Feb 26, 2023

These are bits extracted from the stale draft PR #1007.

It changes/fixes the equality semantics for Records and VirtualSourceObjects so that they compare equal only if all of the following match:

  • __class__
  • path (which includes page_num in the case of Pages)
  • alt
  • pad

Previously only __class__ and _path were compared, thus records with differing page_num, alt, or pad were comparing as equal.

In general, a record is obtained by calling pad.get(path, alt, page_num) and a "different" record obtains if any of pad, path, alt, or page_num is changed.

Other Changes

All SourceObjects should now implement iter_source_filenames

In general, the data comprising a SourceObject can come from multiple source files.
This happens for Pages when alts are in use. It happens for attachments even when alts are not being used: an Attachment depends on both the attachment file and its corresponding metadata .lr file.

I suspect that we should deprecate the SourceObject.source_filename property altogether. The primary use of source_filename or iter_source_filenames is for dependency tracking — in that case, we always want to track all (potential) source files. In cases where a source object has multiple source files, 'm not sure the concept of a "primary" source file makes sense.

New class: DBSourceObject

This PR introduces a DBSourceObject class. It is a subclass of SourceObject and serves as a superclass of all the object types that can be returned from Pad.get.
(The fixed equality comparison is implemented for DBSourceObject.)

The new class hierarchy (for classes in the Lektor source which derive from SourceObject) is

classDiagram
    SourceObject <|-- DBSourceObject
   DBSourceObject <|-- Record
   Record <|-- Page
   Record <|-- Attachment
   Attachment <|-- Image
   Attachment <|-- Video
   DBSourceObject <|-- VirtualSourceObject
   VirtualSourceObject <|-- Siblings
    SourceObject <|-- Asset
    Asset <|-- Directory
    Asset <|-- File

Issue(s) Resolved

Fixes #1101

Related Issues / Links

Description of Changes

  • Wrote at least one-line docstrings (for any new functions)
  • Added unit test(s) covering the changes (if testable)

It should return the both the alt-specific and the fallback .lr
filenames (in addition to the attachment filename.)
The SourceObject.source_filename property now just returns the first
of the filenames returned by the iter_source_filenames method.
Both Records and VirtualSourceObjects are described by the identity
key (path, alt, pad).  That is, if

   obj2 = pad.get(obj.path, alt=obj.path)

then

   obj2 == obj

Records already had __eq__ and __hash__ methods which made this
explicit.  Here we add those methods for VirtualSourceObjects.

Note this allows us to convert Context.referenced_virtual_dependencies
from a dict to a set.  (The key was only being used as an identity key
for the VirtualSourceObjects.)
@dairiki dairiki force-pushed the fix.1101-source-object-equality branch from a9200b0 to a1bdd9c Compare February 26, 2023 20:00
@dairiki dairiki marked this pull request as ready for review February 26, 2023 20:36
@relikd
Copy link
Contributor

relikd commented Feb 26, 2023

Can you elaborate more on the reason why the Pad has to be the same for equality? I am not sure but for me it just does not feel right. Path, page_num, and alt are properties that describe a specific page / end-point. Whereas Pad is the data structure that generates the page. I think I understand the reason behind the thinking (in a build process you want to know if you have seen the same object before), but I could imagine scenarios where you want to compare accross multiple builders to know if you are targeting the same page.

@dairiki
Copy link
Contributor Author

dairiki commented Feb 26, 2023

Can you elaborate more on the reason why the Pad has to be the same for equality? I am not sure but for me it just does not feel right. Path, page_num, and alt are properties that describe a specific page / end-point. Whereas Pad is the data structure that generates the page. I think I understand the reason behind the thinking (in a build process you want to know if you have seen the same object before), but I could imagine scenarios where you want to compare accross multiple builders to know if you are targeting the same page.

Here's my thinking:

Equality is a question of identity. Records from different pads are not the same and can not be used interchangeably. Pad is a attribute of Record, after all. If record1.pad != record2.pad, it would be confusing if len({record1, record2}) == 1.
(That is not to say that there might not be use cases where one wants to check only that the path and alt match — but in those cases, one should make the check explicit. E.g. if (r1.path, r1.alt) == (r2.path, r2.alt): ....)

I could sort of see fudging this, if there were a common use-case that if facilitated — but even if one relaxes things so that record-equality doesn't depend on the pad, it should certainly depend on the db. Records from different databases are clearly not the same.
So relaxing the requirements that the pad be the same to instead require that pad.db be the same doesn't really save any complexity.

Adjusting the semantics of equality to mean that two objects are "the same for some but not all purposes" seems likely to produce surprises. Especially in this case where we define a __hash__ (which makes a claim to Python that the records are immutable.)

(If one does want to work across multiple pads, a useful abstraction would probably be to implement a replace(record, pad) function that returns a (possibly) new record bound to pad rather than whatever other pad it originally came from. Then one could check if replace(record2, pad=record1.pad) == record1: ....)

@dairiki dairiki added this to the 3.4 milestone Feb 28, 2023
@dairiki dairiki merged commit 9bed75c into lektor:master Mar 8, 2023
@dairiki dairiki deleted the fix.1101-source-object-equality branch March 8, 2023 23:43
dairiki added a commit to dairiki/lektor that referenced this pull request Sep 11, 2023
* A sources identity depends on alt as well as path

* Fix Attachment.iter_source_filenames

It should return the both the alt-specific and the fallback .lr
filenames (in addition to the attachment filename.)

* SourceObject subclasses should all implement .iter_source_filenames

The SourceObject.source_filename property now just returns the first
of the filenames returned by the iter_source_filenames method.

* Make the identity key (source, alt) explicit for VirtualSourceObjects

Both Records and VirtualSourceObjects are described by the identity
key (path, alt, pad).  That is, if

   obj2 = pad.get(obj.path, alt=obj.path)

then

   obj2 == obj

Records already had __eq__ and __hash__ methods which made this
explicit.  Here we add those methods for VirtualSourceObjects.

Note this allows us to convert Context.referenced_virtual_dependencies
from a dict to a set.  (The key was only being used as an identity key
for the VirtualSourceObjects.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lektor.db.Record equality
2 participants