Conversation

@Koncopd
Member

@Koncopd Koncopd commented Mar 10, 2025

Adds QuerySet.artifacts_load(), QuerySet.artifacts_open(), and QuerySet.artifacts_mapped(), which do the same as Collection.load(), Collection.open(), and Collection.mapped() but for query sets of artifacts. This avoids having to create a collection when only loading or opening is needed.

Use:

artifacts = ln.Artifact.filter(..., otype="AnnData").order_by("created_at") # order_by to avoid arbitrary order warning
artifacts.artifacts_open()
artifacts.artifacts_load()
artifacts.artifacts_mapped()

@sunnyosun
Member

Why not just QuerySet.open() and QuerySet.mapped()? I think that's more in parallel with the current APIs. Maybe raising an error if the QuerySet is not artifacts?

@Koncopd
Member Author

Koncopd commented Mar 10, 2025

Maybe raising an error if the QuerySet is not artifacts?

Sure.

Why not just QuerySet.open() and QuerySet.mapped()? I think that's more in parallel with the current APIs.

Hm, I wanted to emphasize that it works only for query sets of artifacts. You don't like it?

@github-actions

github-actions bot commented Mar 10, 2025

@github-actions github-actions bot temporarily deployed to pull request March 10, 2025 15:28 Inactive
@Koncopd Koncopd marked this pull request as draft March 10, 2025 15:41
@codecov

codecov bot commented Mar 10, 2025

Codecov Report

Attention: Patch coverage is 95.12195% with 4 lines in your changes missing coverage. Please review.

Project coverage is 92.06%. Comparing base (22f4071) to head (bc11bd9).
Report is 114 commits behind head on main.

Files with missing lines | Patch % | Lines
lamindb/models/collection.py | 94.44% | 2 Missing ⚠️
lamindb/models/query_set.py | 95.65% | 2 Missing ⚠️
@@            Coverage Diff             @@
##             main    #2546      +/-   ##
==========================================
- Coverage   92.42%   92.06%   -0.36%     
==========================================
  Files          60       58       -2     
  Lines        9989     8638    -1351     
==========================================
- Hits         9232     7953    -1279     
+ Misses        757      685      -72     

@github-actions github-actions bot temporarily deployed to pull request March 10, 2025 16:21 Inactive
@Koncopd
Member Author

Koncopd commented Mar 10, 2025

Also a thing to discuss: right now I raise an error if the model of a query set is not Artifact. Maybe I should just query the related artifacts instead in this case?

@github-actions github-actions bot temporarily deployed to pull request March 10, 2025 20:41 Inactive
@Koncopd Koncopd force-pushed the queryset_open_mapped branch 2 times, most recently from b2173b7 to 41e056b Compare March 14, 2025 12:48
@github-actions github-actions bot temporarily deployed to pull request March 14, 2025 12:59 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 14, 2025 14:50 Inactive
@Koncopd Koncopd changed the title ✨ Add QuerySet.artifacts_open() and QuerySet.artifacts_mapped() ✨ Add .artifacts_load(), .artifacts_open(), .artifacts_mapped() to QuerySet Mar 14, 2025
@Koncopd Koncopd force-pushed the queryset_open_mapped branch from 0a98baa to 6ceaf0c Compare March 14, 2025 15:08
@github-actions github-actions bot temporarily deployed to pull request March 14, 2025 15:19 Inactive
@github-actions

github-actions bot commented Mar 18, 2025

Deployment URL: https://dbb6b8d1.lamindb.pages.dev

@sunnyosun
Member

Also a thing to discuss: right now I raise an error if the model of a query set is not Artifact. Maybe I should just query the related artifacts instead in this case?

I think the error is good! Querying the related would be too implicit in this case.
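
The explicit-error behavior agreed on here could look roughly like the sketch below. It uses plain-Python stand-ins rather than the real lamindb classes; the helper name `_require_artifacts` and the choice of `ValueError` are assumptions made for illustration, not taken from the PR.

```python
class Artifact:
    """Stand-in for the Artifact registry (not the real lamindb class)."""


class Collection:
    """Stand-in for any non-artifact registry."""


class QuerySet:
    """Minimal stand-in for a query set that knows which model it queries."""

    def __init__(self, model, records=()):
        self.model = model
        self.records = list(records)

    def _require_artifacts(self, method_name: str) -> None:
        # Hypothetical helper: fail loudly instead of implicitly
        # switching to the related artifacts of another registry.
        if self.model is not Artifact:
            raise ValueError(
                f"{method_name}() is only available for query sets of Artifact, "
                f"not {self.model.__name__}"
            )

    def open(self):
        self._require_artifacts("open")
        # Placeholder for the real opening logic.
        return [f"opened {record}" for record in self.records]
```

Calling `QuerySet(Collection).open()` raises immediately, which keeps the behavior explicit in the way the comment above prefers.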

Member

@falexwolf falexwolf left a comment

I think we need a class ArtifactQuerySet if we want to do this.

We cannot bloat the generic QuerySet API with methods that won't make sense for almost all query sets, can we?
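
A minimal sketch of the subclass idea, using plain-Python stand-ins. `GenericQuerySet` and the predicate-based `filter` are hypothetical; only the name `ArtifactQuerySet` comes from the comment above. The point is that artifact-only methods live on the subclass, and that chaining has to preserve the subclass.

```python
class GenericQuerySet:
    """Stand-in for the generic query set shared by all registries."""

    def __init__(self, records=()):
        self.records = list(records)

    def filter(self, predicate):
        # type(self) keeps the subclass when calls are chained, so
        # ArtifactQuerySet.filter(...) still returns an ArtifactQuerySet.
        return type(self)(r for r in self.records if predicate(r))


class ArtifactQuerySet(GenericQuerySet):
    """Only this subclass carries artifact-specific methods."""

    def load(self):
        # Placeholder for the real loading logic.
        return [f"loaded {path}" for path in self.records]
```

With this layout, `GenericQuerySet` has no `load` attribute at all, so the generic API stays unchanged for every other registry.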

Member

I still think backed = ln.Artifact.filter(suffix=".parquet").open() would be a lot prettier here 😊

# Why is that? - Sergei
if len(suffixes) != 1:
    raise ValueError(
        "Can only load collections where all artifacts have the same suffix"
Member

The error message also needs to adapt to the calling function, right? It's no longer only about loading collections; the artifacts can also come from a QuerySet.
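
One way to make the message adapt is to pass the caller's noun into the shared check. The sketch below is an assumption about how that could look: `check_single_suffix` and its `container` parameter are hypothetical names, and suffixes are derived from plain path strings rather than from artifact records.

```python
from pathlib import PurePosixPath


def check_single_suffix(paths, container="collections"):
    """Return the common suffix, or raise with a caller-specific message.

    `container` is a hypothetical parameter naming what is being loaded,
    e.g. "collections" or "query sets".
    """
    suffixes = {PurePosixPath(p).suffix for p in paths}
    if len(suffixes) != 1:
        raise ValueError(
            f"Can only load {container} where all artifacts have the same suffix"
        )
    return suffixes.pop()
```

A query-set code path would then call `check_single_suffix(paths, container="query sets")`, while the collection code path keeps the default wording.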

Returns an in-memory concatenated `DataFrame` or `AnnData` object.
See also {meth}`~lamindb.models.Collecton.load`.
Member

update the ref link

Member

Could you also add Example:: to the docstring?
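
A docstring along the lines requested might look like the sketch below. The summary line is illustrative wording, the `Collection` spelling in the reference is one reading of the "update the ref link" comment, and the example body reuses the usage from the PR description. The function is a stub so that the snippet stays self-contained.

```python
def artifacts_load(self):
    """Load all artifacts of the query set and concatenate them.

    Returns an in-memory concatenated `DataFrame` or `AnnData` object.
    See also {meth}`~lamindb.models.Collection.load`.

    Example::

        artifacts = ln.Artifact.filter(..., otype="AnnData").order_by("created_at")
        adata = artifacts.artifacts_load()
    """
    # Stub only: the sketch is about the docstring layout, not the logic.
    raise NotImplementedError("docstring sketch only")
```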



def _load_concat_artifacts(
    artifacts: list[Artifact], join: Literal["inner", "outer"] = "outer", **kwargs
Member

**kwargs → **concat_kwargs?

Member

Please no **kwargs at all!
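
To illustrate the "no **kwargs" position, here is a small stand-alone sketch with explicit keyword-only parameters. It is not lamindb's `_load_concat_artifacts`: `concat_records` and `fill_value` are hypothetical, and lists of dicts stand in for the real `DataFrame` or `AnnData` objects. Only the `join` parameter mirrors the signature under review.

```python
from typing import Literal


def concat_records(
    tables: list[list[dict]],
    *,
    join: Literal["inner", "outer"] = "outer",
    fill_value=None,
) -> list[dict]:
    """Concatenate row-dict tables, keeping shared or all columns."""
    if not tables:
        return []
    # Collect the column names of each table.
    key_sets = [set().union(*(row.keys() for row in table)) for table in tables]
    if join == "inner":
        columns = set.intersection(*key_sets)
    else:
        columns = set.union(*key_sets)
    # Missing values are only possible for an outer join.
    return [
        {key: row.get(key, fill_value) for key in sorted(columns)}
        for table in tables
        for row in table
    ]
```

Because every option is named in the signature, a misspelled or unsupported option fails at the call site with a `TypeError` instead of being forwarded silently.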

@falexwolf
Member

Quite generally the name choice artifacts_load() etc. is very confusing and I'd not guess what it does.

If we have ArtifactQuerySet.load() that would be more understandable.

@falexwolf
Member

Also: we have to find a way of getting rid of monkey-patching Django's QuerySet. I can't think of a simple solution so far but it's pretty annoying that we have this in our codebase (same for QueryManager).

@Koncopd
Member Author

Koncopd commented Mar 18, 2025

I think we need a class ArtifactQuerySet if we want to do this.

Yes, but it is much harder to implement because we have to account for cases where a Django QuerySet is returned. Right now we use monkey patching for this, but that is impossible for something like ArtifactQuerySet. I will think about what to do; I have some ideas for how to get rid of the monkey patching here.

@falexwolf
Member

Yes, but it is much harder to implement because we have to account for cases where a Django QuerySet is returned. Right now we use monkey patching for this, but that is impossible for something like ArtifactQuerySet.

I know it's much harder. But we have to invest the effort now. No more new hacks. They'll all just bite us. We gotta keep focusing on removing the hacks of the early days to improve quality and performance.

I will think about what to do; I have some ideas for how to get rid of the monkey patching here.

That'd be great!

@Koncopd
Member Author

Koncopd commented Apr 29, 2025

To do this properly, I first need something like #2637.

@falexwolf
Member

Makes total sense!

@Koncopd
Member Author

Koncopd commented May 5, 2025

Continued here #2743

@Koncopd Koncopd closed this May 5, 2025
@falexwolf falexwolf deleted the queryset_open_mapped branch October 15, 2025 20:31
4 participants