Add "hits total" value to search result #1227

Open
admtech opened this issue Jun 1, 2024 · 4 comments
Labels
feature New feature or request pg_search Issue related to `pg_search/` priority-3-low Low priority issue user-request This issue was directly requested by a user

Comments

@admtech

admtech commented Jun 1, 2024

What
Search results usually come with a hit counter giving the total number of matching records, even when a limit or offset is applied. "pg_search" does not provide such a counter yet.

Why
To display the total number of hits regardless of limit or offset without having to run a second query, which would cost performance.

How
Add the count as a field value so that it can be retrieved when needed. By default, Elasticsearch returns the number of documents found in the response to a search query, under the hits.total.value key. We need this value for pg_search as well, in any form.

@neilyio
Contributor

neilyio commented Jun 4, 2024

This is an interesting request, thanks for bringing it up. We don't have a standard interface yet for returning metadata about the search. This is a little tricky because we are locked in to "table" results with Postgres, whereas ES can return a nested key-value object.

Could you give me an example query of how you would like this to work?

@neilyio neilyio added feature New feature or request pg_search Issue related to `pg_search/` user-request This issue was directly requested by a user priority-3-low Low priority issue labels Jun 4, 2024
@admtech
Author

admtech commented Jun 4, 2024

For example, we have these searches:

SELECT id, title, stamp_create, paradedb.rank_bm25(id)
FROM content_search.search(
    query => paradedb.boolean(
        SHOULD => ARRAY[
            paradedb.parse('title:"Proxmox routing"'),
            paradedb.parse('title:Proxmox OR title:routing'),
            paradedb.fuzzy_term(field => 'title', value => 'Proxmox routing')
        ],
        MUST => ARRAY[
            paradedb.range(field => 'stamp_create', range => '["2010-01-01","2025-01-01"]'::daterange),
            paradedb.term(field => 'topic_search', value => '965662248')
        ]
    ),
    limit_rows => 4
);

The result would be a total of 487 posts.

It would be easiest if you could just insert the value into the output when you need it, similar to "paradedb.rank_bm25(id)". For example: paradedb.found(). That way the counter appears as a column in every row, and I don't need a second query.

     id      |                             title                              |      stamp_create      | rank_bm25 | found
-------------+----------------------------------------------------------------+------------------------+-----------+-------
      309619 | Proxmox Routing funktioniert nicht                             | 2016-07-12 21:29:25+02 | 33.003704 |   487
      395928 | Proxmox VM zugrif auf Ports vom Host (Proxmox Routing)         | 2018-12-17 07:35:31+01 | 27.670732 |   487
      607986 | Routing Hetzner Proxmox Server klappt nicht                    | 2020-09-26 20:22:36+02 | 15.868061 |   487
  2573741832 | PfSense als VM unter Proxmox - IPv6 Routing funktioniert nicht | 2022-04-22 21:52:48+02 | 13.734482 |   487

Additional metadata would be ideal of course, but for now this simple solution with a field like "paradedb.found()" is enough for me.
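
To illustrate the output shape requested above, here is a minimal sketch of the generic SQL pattern that yields a per-row total alongside a limited result set: a `COUNT(*) OVER ()` window function. This is not a pg_search or ParadeDB API. It assumes SQLite (via Python's standard `sqlite3` module, SQLite 3.25 or newer) as a stand-in for Postgres, a plain `LIKE` filter in place of the BM25 query, and a hypothetical `content` table.

```python
import sqlite3

# Stand-in data: SQLite replaces Postgres/pg_search here, and a LIKE filter
# replaces the BM25 query. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO content (id, title) VALUES (?, ?)",
    [(i, f"Proxmox routing post {i}") for i in range(1, 11)]
    + [(i, f"Unrelated post {i}") for i in range(11, 16)],
)

# The window function is evaluated before LIMIT is applied, so every
# returned row carries the total number of matches, not the page size.
rows = conn.execute(
    """
    SELECT id, title, COUNT(*) OVER () AS found
    FROM content
    WHERE title LIKE '%Proxmox%'
    ORDER BY id
    LIMIT 4
    """
).fetchall()

for row in rows:
    print(row)
```

Two caveats apply. The database still has to visit every matching row to compute the count, so this pattern saves the second round trip rather than the counting work itself. And in the query above the limit is passed as `limit_rows` inside `content_search.search(...)`, so a window function in the outer `SELECT` would presumably only see the four rows the function returns, which is why a dedicated value such as the proposed `paradedb.found()` is being requested.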

@philippemnoel philippemnoel added good first issue Good for newcomers and removed good first issue Good for newcomers labels Jun 25, 2024
@philippemnoel
Collaborator

What other metadata would you like to see?

@admtech
Author

admtech commented Jul 16, 2024

For example, the total search time. But as I said, the number of results found is enough for me :-)

3 participants