Feature/fresnel lensing #1700

Merged
kwahlin merged 54 commits into develop from feature/fresnel-lensing
Mar 6, 2026
@kwahlin kwahlin commented Feb 23, 2026

Add extended support for Fresnel features and refactor how search cards/chips are built for indexing.

Key changes:

  • Add support for fresnel:subLens and fresnel:mergeProperties (https://www.w3.org/2005/04/fresnel-info/manual/)
  • Build chips/cards with FresnelUtil instead of JsonLd.toCard
  • Refactor indexing flow to use “full embellish” before lens application

The main goal is to reduce the number of indexed fields while improving control over how search cards are constructed.

  • Use fresnel:mergeProperties for merging values from multiple properties into a single indexed field instead of creating separate fields.
    • Example 1: Merge Item fields.
    • Example 2: Merge startYear and year into a common field librissearch:year. (In this case we actually create an extra field!)
  • Use fresnel:fslselector (already supported) to include values more than one step away from the main entity. For example, select specific Record properties for indexing instead of including the entire Record chip for every single linked thing from embellish.
  • Produce _str fields only for entities with defined search tokens. The _str content is derived from those definitions.
  • By using a local fresnel:subLens we don't have to define individual search tokens for nested entities (e.g. TitlePart), which prevents unwanted _str fields for those entities.
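
To illustrate the merge idea from the list above, here is a minimal sketch in Python. It is not the FresnelUtil implementation: the function name `merge_properties` and the `keep_sources` flag are hypothetical, and entities are modeled as plain dicts. The flag mirrors the remark in Example 2, where the merged field is added alongside the original fields rather than replacing them.

```python
from typing import Any


def merge_properties(
    entity: dict[str, Any],
    sources: list[str],
    target: str,
    keep_sources: bool = False,
) -> dict[str, Any]:
    """Collect the values of `sources` under the single key `target`.

    Hypothetical helper for illustration only; not part of FresnelUtil.
    """
    merged: list[Any] = []
    for prop in sources:
        value = entity.get(prop)
        if value is None:
            continue
        # Normalize single values and lists to a list.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v not in merged:  # drop duplicates, keep first-seen order
                merged.append(v)

    if keep_sources:
        result = dict(entity)
    else:
        result = {k: v for k, v in entity.items() if k not in sources}
    if merged:
        result[target] = merged
    return result


# Example 2 from the list: startYear and year end up in librissearch:year.
publication = {"year": "1999", "startYear": "1998"}
indexed = merge_properties(
    publication, ["startYear", "year"], "librissearch:year", keep_sources=True
)
```

With `keep_sources=False` the source properties are dropped, which is the case where merging actually reduces the number of indexed fields.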

So far I've focused on the immediate use cases, but I think they serve as examples of how we can use these mechanisms to further decrease the number of index fields. Surely we can find more optimizations if we analyze the current search card definitions.

On local test data, this results in ~25% fewer indexed fields, likely because we no longer include the full Record chips everywhere. Nothing appears to break when removing those fields, but this needs validation.

Indexing procedure
One problem with the previous indexing procedure (i.e. ElasticSearch.getShapeForIndex) was that data was filtered during the initial embellish, so data needed by the search-card lenses was already lost. Instead, perform a "full" embellish and then shrink the entities once according to the lens definitions.

  • Apply search-card to:
    • original graph entities (usually record + mainEntity)
    • embellished entities with integral relations (i.e. instances of a linked work)
  • Apply search-chip to non-integral embellished entities.
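
The lens selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual indexing code: the function `lenses_for_index`, the id sets and the example ids are all hypothetical, and only the rule itself (search-card for original and integral entities, search-chip for the rest) comes from the description.

```python
def lenses_for_index(
    entity_ids: list[str],
    original_ids: set[str],
    integral_ids: set[str],
) -> dict[str, str]:
    """Map each entity in a fully embellished document to the lens to apply.

    Hypothetical helper: original graph entities and integrally related
    embellished entities get "search-card", everything else "search-chip".
    """
    lenses: dict[str, str] = {}
    for entity_id in entity_ids:
        if entity_id in original_ids or entity_id in integral_ids:
            lenses[entity_id] = "search-card"
        else:
            lenses[entity_id] = "search-chip"
    return lenses


# Made-up ids: a work record, its main entity, a linked instance
# (integral relation) and a linked agent (non-integral).
plan = lenses_for_index(
    ["record", "work", "instance", "agent"],
    original_ids={"record", "work"},
    integral_ids={"instance"},
)
```

The point of the sketch is that the decision is made once, after the full embellish, so no data needed by a lens has been filtered away beforehand.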

FresnelUtil changes
This class has been heavily refactored to accommodate new features and more flexibility. Mechanisms already in use should behave the same, however.

Some behavior needed to be more robust or brought to the surface for clarity, for example how lenses are applied at different levels in the document structure. As an example, CHIP_TO_TOKEN means:

  • Top level: try chip
  • Nested level: try token, then chip
    This can surely be made even clearer, and may fit better as configuration than hard-coded.
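
As a rough sketch of the CHIP_TO_TOKEN rule, written as configuration rather than hard-coded branches: the names `CHIP_TO_TOKEN_ORDER`, `pick_lens`, the group keys and the lens contents below are assumptions made for illustration and do not reflect the real FresnelUtil API.

```python
from typing import Optional

# Lens groups to try, in order, per nesting level (assumed group names).
CHIP_TO_TOKEN_ORDER: dict[str, list[str]] = {
    "top": ["chips"],
    "nested": ["tokens", "chips"],
}


def pick_lens(
    entity_type: str,
    nested: bool,
    lens_groups: dict[str, dict[str, list[str]]],
) -> Optional[tuple[str, list[str]]]:
    """Return (group name, shown properties) for the first matching lens.

    Returns None when no group in the order defines a lens for the type.
    """
    level = "nested" if nested else "top"
    for group in CHIP_TO_TOKEN_ORDER[level]:
        lens = lens_groups.get(group, {}).get(entity_type)
        if lens is not None:
            return group, lens
    return None


# Made-up lens definitions: Title has both a chip and a token lens,
# Person only has a chip lens.
lens_groups = {
    "chips": {"Title": ["mainTitle", "subtitle"], "Person": ["givenName", "familyName"]},
    "tokens": {"Title": ["mainTitle"]},
}
```

Here a top-level Title resolves to its chip lens, a nested Title resolves to its token lens, and a nested Person falls through to its chip lens because no token lens is defined for it.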

I don't think there is any point in reviewing the FresnelUtil diff (it's huge); rather, look at the public methods and their usage (including unit tests).

TODO / Open questions

  • FresnelUtil can be used for creating sortKeyByLang from chip definitions (here). This was previously done with JsonLd.applyLensByLang and some of its behavior may need to be replicated:
    • Fallback keys when no "chip string" is created.
    • Clean strings from noise?
    • ?
      FIXED: 6421c2a / 4f53d23
  • I sneaked a fix in search2 for searching language tagged fields. The individual lang fields (e.g. xByLang.sv) are not needed for search, however we do rely upon them for display so we can't remove them from the indexed docs. Can we remove them from mappings instead? Something to look into.
    UPDATE: d99e452 removes byLang fields from ES mappings. This reduces the number of mapped fields by another ~10% on local test data.
  • Define more search tokens to cover all the needs for free text search within fields. Should be fairly straightforward.
    UPDATE: Done: libris/definitions@4c002d7
  • If an entity type has no lens definition should we skip indexing it or use a fallback lens?
    UPDATE: Use the default lens as fallback.
  • We have to keep an eye on indexing performance. Indexing a single document is faster now, but using "full" embellish means we don't make use of the card cache, which may make indexing slower in the long run. Indexing time for the local example data is pretty much the same. Let's see if we need a corresponding cache for full docs.
  • The FresnelUtil class is still huge and should be decomposed.
  • FE: Use librissearch:yearPublished / publication.librissearch:year for sorting/faceting on both year and startYear.

There is probably more to be said but let's start the discussion from here.

See also libris/definitions#548

Comment thread whelk-core/src/main/groovy/whelk/search2/querytree/Property.java Outdated
@olovy olovy left a comment

Nice. Great work!

As discussed in online review, I think this is good to go.

🚢

@kwahlin kwahlin merged commit cc86a03 into develop Mar 6, 2026
1 check passed
@kwahlin kwahlin deleted the feature/fresnel-lensing branch March 6, 2026 13:15