fix: Add dataset_id filters to the hit_count's subqueries#33757

Merged
asukaminato0721 merged 14 commits into main from
fix/issue-33756
Mar 19, 2026

Conversation

@FFXN FFXN commented Mar 19, 2026

Important

  1. Make sure you have read our contribution guidelines
  2. Ensure there is an associated issue and you have been assigned to it
  3. Use the correct syntax to link this PR: Fixes #<issue number>.

Summary

Fixes #33756
Querying the document list by hit_count produced slow SQL, so this PR adds dataset_id filters to the hit_count subqueries.

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint and make type-check (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

FFXN and others added 13 commits March 3, 2026 14:31
…m service including remote template service and database, return responding error message.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…m service including remote template service and database, return responding error message.
# Conflicts:
#	api/services/rag_pipeline/pipeline_template/remote/remote_retrieval.py
@FFXN FFXN requested a review from JohnJyong as a code owner March 19, 2026 10:12
Copilot AI review requested due to automatic review settings March 19, 2026 10:12
@dosubot dosubot bot added the size:XS label Mar 19, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a performance issue where querying document lists based on hit_count resulted in slow SQL queries. The change optimizes these queries by introducing a dataset_id filter directly into the hit_count subqueries, ensuring that only relevant data is processed and significantly speeding up the data retrieval process.

Highlights

  • SQL Query Optimization: Added a dataset_id filter to the hit_count subqueries to improve performance when querying document lists.
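The effect of that filter can be sketched with a small in-memory SQLite example. This is only an illustration under stated assumptions: the table and column names (`documents`, `document_segments`, `hit_count`, `dataset_id`), the sample rows, and both helper functions are hypothetical and are not taken from the PR's diff. It shows that filtering the aggregation subquery by `dataset_id` leaves the sorted result unchanged while the subquery aggregates fewer groups.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id TEXT PRIMARY KEY, dataset_id TEXT NOT NULL);
CREATE TABLE document_segments (
    id INTEGER PRIMARY KEY,
    dataset_id TEXT NOT NULL,
    document_id TEXT NOT NULL,
    hit_count INTEGER NOT NULL DEFAULT 0
);
INSERT INTO documents VALUES ('doc-a', 'ds-1'), ('doc-b', 'ds-1'), ('doc-c', 'ds-2');
INSERT INTO document_segments (dataset_id, document_id, hit_count) VALUES
    ('ds-1', 'doc-a', 3), ('ds-1', 'doc-a', 4),
    ('ds-1', 'doc-b', 10),
    ('ds-2', 'doc-c', 99);
""")

# Documents of one dataset, ordered by the summed hit_count of their segments.
# {where} is either empty (unfiltered subquery) or a dataset_id filter.
QUERY = """
SELECT d.id, COALESCE(h.total_hits, 0) AS hits
FROM documents AS d
LEFT JOIN (
    SELECT document_id, SUM(hit_count) AS total_hits
    FROM document_segments
    {where}
    GROUP BY document_id
) AS h ON h.document_id = d.id
WHERE d.dataset_id = :dataset_id
ORDER BY hits DESC, d.id
"""


def list_documents_by_hits(dataset_id, filter_subquery):
    """Return (document_id, hits) rows for one dataset, most-hit first."""
    where = "WHERE dataset_id = :dataset_id" if filter_subquery else ""
    sql = QUERY.format(where=where)
    return conn.execute(sql, {"dataset_id": dataset_id}).fetchall()


def subquery_group_count(dataset_id=None):
    """Count how many per-document groups the aggregation subquery builds."""
    where = "WHERE dataset_id = :dataset_id" if dataset_id else ""
    params = {"dataset_id": dataset_id} if dataset_id else {}
    sql = (
        "SELECT COUNT(*) FROM ("
        "SELECT document_id FROM document_segments "
        f"{where} GROUP BY document_id)"
    )
    return conn.execute(sql, params).fetchone()[0]


unfiltered = list_documents_by_hits("ds-1", filter_subquery=False)
filtered = list_documents_by_hits("ds-1", filter_subquery=True)
```

In this toy data both calls return the same rows for `ds-1`, but the unfiltered subquery builds a group for every document in every dataset, whereas the filtered one only touches segments of the requested dataset.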

@github-actions

Pyrefly Diff

No changes detected.

Copilot AI left a comment

Pull request overview

This PR addresses a performance issue when listing dataset documents sorted by hit_count by constraining the DocumentSegment aggregation subquery to the current dataset_id, avoiding unnecessary scans.

Changes:

  • Add a dataset_id filter to the hit_count aggregation subquery used for sorting documents.
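One way to see why the filter can avoid a full scan is to compare query plans. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` purely as a stand-in; the table definition, the index name `idx_segments_dataset`, and the `plan` helper are assumptions made for illustration, and the planner of the database the project actually runs on may behave differently.

```python
import sqlite3

# Hypothetical table with an index on dataset_id (an assumption, not from the PR).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document_segments (
    id INTEGER PRIMARY KEY,
    dataset_id TEXT NOT NULL,
    document_id TEXT NOT NULL,
    hit_count INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_segments_dataset ON document_segments (dataset_id);
""")


def plan(sql, params=()):
    """Return the human-readable detail strings of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the plan detail


# Aggregation without the filter: nothing narrows down the rows to read.
unfiltered_plan = plan(
    "SELECT document_id, SUM(hit_count) FROM document_segments "
    "GROUP BY document_id"
)

# Aggregation with the dataset_id filter: the index can narrow the rows first.
filtered_plan = plan(
    "SELECT document_id, SUM(hit_count) FROM document_segments "
    "WHERE dataset_id = ? GROUP BY document_id",
    ("ds-1",),
)

for line in unfiltered_plan + filtered_plan:
    print(line)
```

The exact wording of the plan lines varies between SQLite versions, so the useful signal is only whether the index name appears in the filtered plan and not in the unfiltered one.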

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request addresses a performance issue by adding a dataset_id filter to the hit_count subqueries. This change is crucial for optimizing database queries and should significantly improve the efficiency of fetching document lists based on hit counts. The implementation is straightforward and directly targets the identified bottleneck.

@github-actions

Pyrefly Diff

No changes detected.

codecov bot commented Mar 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.13%. Comparing base (df0ded2) to head (6c7963a).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #33757      +/-   ##
==========================================
- Coverage   77.13%   77.13%   -0.01%     
==========================================
  Files        4356     4356              
  Lines      175245   175245              
  Branches    33477    33477              
==========================================
- Hits       135170   135169       -1     
  Misses      36826    36826              
- Partials     3249     3250       +1     
Flag Coverage Δ
api 76.65% <ø> (-0.01%) ⬇️

@hj24 hj24 left a comment

LGTM

@dosubot dosubot bot added the lgtm label Mar 19, 2026
@asukaminato0721 asukaminato0721 merged commit bb1a6f8 into main Mar 19, 2026
19 checks passed
@asukaminato0721 asukaminato0721 deleted the fix/issue-33756 branch March 19, 2026 10:56
Labels

lgtm: This PR has been approved by a maintainer
size:XS: This PR changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Refactor/Chore] Querying document list based on hit_count caused slow SQL

4 participants