Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird behavior of Sequence query with wildcards #49

Open
dbealthy opened this issue Nov 24, 2023 · 0 comments
Open

Weird behavior of Sequence query with wildcards #49

dbealthy opened this issue Nov 24, 2023 · 0 comments

Comments

@dbealthy
Copy link

dbealthy commented Nov 24, 2023

Problem

I am trying to implement behaviour similar to sphinx search engine handling phrases with wildcards. For this i use whoosh library. But when i use sequence queries with short words (2 chars length) and wildcards i get an error:

    289     return [Span(pos) for pos in self.value_as("positions")]
    290 else:
--> 291     raise Exception("Field does not support spans")

Exception: Field does not support spans

I noticied this happens when i add a lot of documents to the index, it doesn't happn with small number of documents though.

I want be able to search with queries like:

  • "найденный" AND ("по* мест* проживания" OR "рядом с домом")

Here "по* мест* проживания" causes the error. When i reduce it to "по*" it runs well, if i change it a bit to "по* дороге" i am still getting the same error.

Code and expected results

from whoosh.fields import Schema, TEXT, NUMERIC
from whoosh.qparser import QueryParser, PhrasePlugin, SequencePlugin, OperatorsPlugin
from whoosh import analysis
from whoosh.filedb.filestore import RamStorage


analyzer = analysis.StandardAnalyzer(minsize=None, stoplist=None)
schema = Schema(item_id=NUMERIC(stored=True, bits=64), type=NUMERIC(stored=True), content=TEXT(analyzer=analyzer, stored=True, phrase=True))
storage = RamStorage()

    
ix = storage.create_index(schema)
writer = ix.writer()

with get_db() as db:
    for item in db["items"][0:500]:
        writer.add_document(
            item_id=item["id"], type=item["type"], content=item["content"]
        )
writer.commit(optimize=True)




parser = QueryParser("content", schema=schema)
op = OperatorsPlugin(
    And="AND", Or="OR", AndNot="ANT", Not=None, AndMaybe=None, Require=None
)
parser.remove_plugin_class(PhrasePlugin)
parser.add_plugin(SequencePlugin())
parser.replace_plugin(op)


with ix.searcher() as searcher:
    query = '"найденный" AND ("по* мест* проживания" OR "рядом с домом")'
    query = parser.parse(query, debug=True)
    hits = searcher.search(query, terms=True, limit=None)
    pprint(list(hits))

Expecting to get a list of hits but I am getting the Exception: Field does not support spans instead.

My content is text of variable length in different languages. Queries are also might be in different languages.

cclauss pushed a commit to cclauss/whoosh-1 that referenced this issue Feb 9, 2024
Bumps [actions/stale](https://github.com/actions/stale) from 5 to 9.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/stale/releases">actions/stale's
releases</a>.</em></p>
<blockquote>
<h2>v9.0.0</h2>
<h2>Breaking Changes</h2>
<ol>
<li>Action is now stateful: If the action ends because of <a
href="https://github.com/actions/stale#operations-per-run">operations-per-run</a>
then the next run will start from the first unprocessed issue skipping
the issues processed during the previous run(s). The state is reset when
all the issues are processed. This should be considered for scheduling
workflow runs.</li>
<li>Version 9 of this action updated the runtime to Node.js 20. All
scripts are now run with Node.js 20 instead of Node.js 16 and are
affected by any breaking changes between Node.js 16 and 20.</li>
</ol>
<h2>What Else Changed</h2>
<ol>
<li>Performance optimization that removes unnecessary API calls by <a
href="https://github.com/dsame"><code>@​dsame</code></a> <a
href="https://redirect.github.com/actions/stale/pull/1033/">#1033</a>
fixes <a
href="https://redirect.github.com/actions/stale/issues/792">#792</a></li>
<li>Logs displaying current github API rate limit by <a
href="https://github.com/dsame"><code>@​dsame</code></a> <a
href="https://redirect.github.com/actions/stale/pull/1032">#1032</a>
addresses <a
href="https://redirect.github.com/actions/stale/issues/1029">#1029</a></li>
</ol>
<p>For more information, please read the <a
href="https://github.com/actions/stale#readme">action documentation</a>
and its <a href="https://github.com/actions/stale#statefulness">section
about statefulness</a></p>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/jmeridth"><code>@​jmeridth</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/stale/pull/984">actions/stale#984</a></li>
<li><a
href="https://github.com/nikolai-laevskii"><code>@​nikolai-laevskii</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/stale/pull/1020">actions/stale#1020</a></li>
<li><a
href="https://github.com/dusan-trickovic"><code>@​dusan-trickovic</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/stale/pull/1056">actions/stale#1056</a></li>
<li><a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/stale/pull/1110">actions/stale#1110</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/stale/compare/v8...v9.0.0">https://github.com/actions/stale/compare/v8...v9.0.0</a></p>
<h2>v8.0.0</h2>
<p>:warning: This version contains breaking changes :warning:</p>
<h2>What's Changed</h2>
<ul>
<li>New option labels-to-remove-when-stale enables users to specify list
of comma delimited labels that will be removed when the issue or PR
becomes stale by <a
href="https://github.com/panticmilos"><code>@​panticmilos</code></a> <a
href="https://redirect.github.com/actions/stale/issues/770">actions/stale#770</a></li>
<li>Skip deleting the branch in the upstream of a forked repo by <a
href="https://github.com/dsame"><code>@​dsame</code></a> <a
href="https://redirect.github.com/actions/stale/pull/913">actions/stale#913</a></li>
<li>abort the build on the error by <a
href="https://github.com/dsame"><code>@​dsame</code></a> in <a
href="https://redirect.github.com/actions/stale/pull/935">actions/stale#935</a></li>
</ul>
<h2>Breaking Changes</h2>
<ul>
<li>In this release we prevent scenarios when the build is not
interrupted on some exceptions, which led to successful builds when they
are supposed to fail</li>
</ul>
<h2>Example</h2>
<pre lang="yaml"><code>name: 'Remove labels when the issue or PR becomes
stale'
on:
  schedule:
    - cron: '30 1 * * *'
<p>permissions:
pull-requests: write</p>
<p>jobs:
stale:
runs-on: ubuntu-latest
steps:
- uses: actions/stale@v8
with:
labels-to-remove-when-stale: 'label1,label2'
</code></pre></p>
<h2>v7.0.0</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/stale/blob/main/CHANGELOG.md">actions/stale's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h1>[7.0.0]</h1>
<p>:warning: Breaking change :warning:</p>
<ul>
<li>Allow daysBeforeStale options to be float by <a
href="https://github.com/irega"><code>@​irega</code></a> in <a
href="https://redirect.github.com/actions/stale/pull/841">actions/stale#841</a></li>
<li>Use cache in check-dist.yml by <a
href="https://github.com/jongwooo"><code>@​jongwooo</code></a> in <a
href="https://redirect.github.com/actions/stale/pull/876">actions/stale#876</a></li>
<li>fix print outputs step in existing workflows by <a
href="https://github.com/irega"><code>@​irega</code></a> in <a
href="https://redirect.github.com/actions/stale/pull/859">actions/stale#859</a></li>
<li>Update issue and PR templates, add/delete workflow files by <a
href="https://github.com/IvanZosimov"><code>@​IvanZosimov</code></a> in
<a
href="https://redirect.github.com/actions/stale/pull/880">actions/stale#880</a></li>
<li>Update how stale handles exempt items by <a
href="https://github.com/johnsudol"><code>@​johnsudol</code></a> in <a
href="https://redirect.github.com/actions/stale/pull/874">actions/stale#874</a></li>
</ul>
<h1>[6.0.1]</h1>
<p>Update <code>@​actions/core</code> to v1.10.0 (<a
href="https://redirect.github.com/actions/stale/pull/839">#839</a>)</p>
<h1>[6.0.0]</h1>
<p>:warning: Breaking change :warning:</p>
<p>Issues/PRs default <code>close-issue-reason</code> is now
<code>not_planned</code>(<a
href="https://redirect.github.com/actions/stale/issues/789">#789</a>)</p>
<h1>[5.1.0]</h1>
<p><a href="https://redirect.github.com/actions/stale/issues/696">Don't
process stale issues right after they're marked stale</a>
[Add close-issue-reason option]<a
href="https://redirect.github.com/actions/stale/pull/764">#764</a><a
href="https://redirect.github.com/actions/stale/pull/772">#772</a>
Various dependabot/dependency updates</p>
<h2><a
href="https://github.com/actions/stale/compare/v3.0.19...v4.1.0">4.1.0</a>
(2021-07-14)</h2>
<h2>Features</h2>
<ul>
<li><a
href="https://github.com/actions/stale/commit/9912fa74d1c01b5d6187793d97441019cbe325d0">Ability
to exempt draft PRs</a></li>
</ul>
<h2><a
href="https://github.com/actions/stale/compare/v3.0.19...v4.0.0">4.0.0</a>
(2021-07-14)</h2>
<h3>Features</h3>
<ul>
<li><strong>options:</strong> simplify config by removing skip stale
message options (<a
href="https://redirect.github.com/actions/stale/issues/457">#457</a>)
(<a
href="https://github.com/actions/stale/commit/6ec637d238067ab8cc96c9289dcdac280bbd3f4a">6ec637d</a>),
closes <a
href="https://redirect.github.com/actions/stale/issues/405">#405</a> <a
href="https://redirect.github.com/actions/stale/issues/455">#455</a></li>
<li><strong>output:</strong> print output parameters (<a
href="https://redirect.github.com/actions/stale/issues/458">#458</a>)
(<a
href="https://github.com/actions/stale/commit/3e6d35b685f0b2fa1a69be893fa07d3d85e05ee0">3e6d35b</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li><strong>dry-run:</strong> forbid mutations in dry-run (<a
href="https://redirect.github.com/actions/stale/issues/500">#500</a>)
(<a
href="https://github.com/actions/stale/commit/f1017f33dd159ea51366375120c3e6981d7c3097">f1017f3</a>),
closes <a
href="https://redirect.github.com/actions/stale/issues/499">#499</a></li>
<li><strong>logs:</strong> coloured logs (<a
href="https://redirect.github.com/actions/stale/issues/465">#465</a>)
(<a
href="https://github.com/actions/stale/commit/5fbbfba142860ea6512549e96e36e3540c314132">5fbbfba</a>)</li>
<li><strong>operations:</strong> fail fast the current batch to respect
the operations limit (<a
href="https://redirect.github.com/actions/stale/issues/474">#474</a>)
(<a
href="https://github.com/actions/stale/commit/5f6f311ca6aa75babadfc7bac6edf5d85fa3f35d">5f6f311</a>),
closes <a
href="https://redirect.github.com/actions/stale/issues/466">#466</a></li>
<li><strong>label comparison</strong>: make label comparison case
insensitive <a
href="https://redirect.github.com/actions/stale/pull/517">#517</a>,
closes <a
href="https://redirect.github.com/actions/stale/pull/516">#516</a></li>
<li><strong>filtering comments by actor could have strange
behavior</strong>: &quot;stale&quot; comments are now detected based on
if the message is the stale message not <em>who</em> made the comment(<a
href="https://redirect.github.com/actions/stale/pull/519">#519</a>),
fixes <a
href="https://redirect.github.com/actions/stale/pull/441">#441</a>, <a
href="https://redirect.github.com/actions/stale/pull/509">#509</a>, <a
href="https://redirect.github.com/actions/stale/pull/518">#518</a></li>
</ul>
<h3>Breaking Changes</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/actions/stale/commit/28ca1036281a5e5922ead5184a1bbf96e5fc984e"><code>28ca103</code></a>
Upgrade Node to v20 (<a
href="https://redirect.github.com/actions/stale/issues/1110">#1110</a>)</li>
<li><a
href="https://github.com/actions/stale/commit/b69b346013879cedbf50c69f572cd85439a41936"><code>b69b346</code></a>
build(deps-dev): bump <code>@​types/node</code> from 18.16.18 to 20.5.1
(<a
href="https://redirect.github.com/actions/stale/issues/1079">#1079</a>)</li>
<li><a
href="https://github.com/actions/stale/commit/88a6f4f6cbe6984b66da80d3d448d5f0c4d1a42d"><code>88a6f4f</code></a>
build(deps-dev): bump typescript from 5.1.3 to 5.2.2 (<a
href="https://redirect.github.com/actions/stale/issues/1083">#1083</a>)</li>
<li><a
href="https://github.com/actions/stale/commit/796531a7b39835671cd9f436d7205b1556bbcff6"><code>796531a</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/stale/issues/1080">#1080</a>
from akv-platform/fix-delete-cache</li>
<li><a
href="https://github.com/actions/stale/commit/8986f6218b4935e254d5b844e8c36b4d953f2643"><code>8986f62</code></a>
Don not try to delete cache if it does not exists</li>
<li><a
href="https://github.com/actions/stale/commit/cab99b362baa5afbbd8c7c78234e6b071b80687d"><code>cab99b3</code></a>
fix typo proceeded/processed</li>
<li><a
href="https://github.com/actions/stale/commit/184e7afe930f6b5c7ce52c4b3f087692c6e912f3"><code>184e7af</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/stale/issues/1064">#1064</a>
from actions/dependabot/npm_and_yarn/typescript-esli...</li>
<li><a
href="https://github.com/actions/stale/commit/523885cf3c115dcdcbed573c7eaf0f5621f124e2"><code>523885c</code></a>
chore: update eslint-plugin, parser and eslint-plugin-jest</li>
<li><a
href="https://github.com/actions/stale/commit/2487a1dc2b501176ca975d2d8b4d3293d72180a2"><code>2487a1d</code></a>
build(deps-dev): bump
<code>@​typescript-eslint/eslint-plugin</code></li>
<li><a
href="https://github.com/actions/stale/commit/60c722ee97faafece819a7c41b03ec113f99717d"><code>60c722e</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/stale/issues/1063">#1063</a>
from actions/dependabot/npm_and_yarn/jest-29.6.2</li>
<li>Additional commits viewable in <a
href="https://github.com/actions/stale/compare/v5...v9">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/stale&package-manager=github_actions&previous-version=5&new-version=9)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant