Regex bug with initial optional parentheses #1

Closed
thomaskrause opened this issue Aug 17, 2012 · 2 comments

@thomaskrause

When using regex matches, an optional parenthesized group at the start of the pattern is ignored. For example:

pos=/(APPR)?ART/

only finds pos="ART", but not pos="APPRART".

The following query works fine:

pos=/^(APPR)?ART/
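
For comparison, the expected result can be checked in a regex engine that is not affected by the bug. The sketch below uses Python's `re` module and assumes that the AQL pattern has to match the whole annotation value (hence `fullmatch`); the tag list is made up for illustration.

```python
import re

# Sample part-of-speech values (illustrative only).
tags = ["ART", "APPRART", "APPR", "NN"]

# Assumption: the regex must match the whole annotation value.
pattern = re.compile(r"(APPR)?ART")

# Keep every tag that the pattern matches completely.
matches = [t for t in tags if pattern.fullmatch(t)]
print(matches)  # ['ART', 'APPRART']
```

Both values should be found, with or without the leading `^`.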

Imported from Launchpad using lp2gh.

@thomaskrause

(by krause)
Seems like the bug Viktor found in PostgreSQL itself: http://archives.postgresql.org/message-id/20120706184951.GA91000%40client195-161.wlan.hu-berlin.de

This was fixed in PostgreSQL itself (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=628cbb50ba80c83917b07a7609ddec12cda172d0) and will be backported to 8.4 (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b9edaa784e407d1b6e890b776c607a26f3aa7e49) and 9.1 (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a9287de1760450e7fe3b4309ee1ba7ea2af39217)

We should wait for the official releases and, if this fixes the problem, make a public announcement to upgrade.

thomaskrause added a commit that referenced this issue Aug 17, 2012
…eme.

This scheme consists of a separate "annotation_pool" table, containing all possible combinations of node and edge annotations. The facts table only holds a bigint reference to the id of annotation_table. In order to allow the PostgreSQL optimizer to know about the selectivity of a certain node/edge annotation query, the matching annotation ID is calculated by an immutable SQL function whose result is calculated and inserted into the query before the optimizer runs. Selecting a huge number of annotation IDs (lemma=/.*/ on tiger2) did not have a significant impact on the query speed, since parsing the IDs is not necessary (they are included in the internal data structures).
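
The idea of resolving annotation IDs first and handing them to the main query as literals can be sketched roughly as follows. This is only an illustration: it uses SQLite from Python instead of PostgreSQL, resolves the IDs on the client instead of through an immutable SQL function, and all column names and sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE annotation_pool (id INTEGER PRIMARY KEY, name TEXT, val TEXT);
-- The facts table only stores a reference into the pool (hypothetical layout).
CREATE TABLE facts (node_id INTEGER, node_anno_ref INTEGER);
INSERT INTO annotation_pool VALUES (1, 'pos', 'ART'), (2, 'pos', 'NN'), (3, 'lemma', 'der');
INSERT INTO facts VALUES (10, 1), (11, 2), (12, 1), (12, 3);
""")

def annotation_ids(name, value):
    """Step 1: look up the pool IDs for a name/value pair."""
    rows = con.execute(
        "SELECT id FROM annotation_pool WHERE name = ? AND val = ?", (name, value)
    )
    return [r[0] for r in rows]

ids = annotation_ids("pos", "ART")

# Step 2: put the IDs into the main query as integer literals, so the planner
# sees concrete values. The example assumes the list is not empty.
id_list = ", ".join(str(i) for i in ids)
sql = f"SELECT node_id FROM facts WHERE node_anno_ref IN ({id_list}) ORDER BY node_id"
nodes = [r[0] for r in con.execute(sql)]
print(nodes)  # [10, 12]
```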

It is still possible to use the old pure scheme, which is important as
a) a fallback
b) for benchmarking

These changes are made in order to improve the planner's information about the selectivity of annotation-based subqueries. Before, the planner assumed statistical independence of the columns node/edge_annotation_name and node/edge_annotation_value (same for namespace, but normally people don't query explicitly for it), which could be misleading in cases like "NN as value always means pos as name". Therefore a really simple index scheme is applied, which only indexes each column together with the corpus_ref or text_ref. This index could be improved, but showed no or only minor disadvantages on the AQL test query set for tiger2. It also made queries like

count pos="ART" & pos="NN" & pos="VAPP" & #1 . #2 & #2 .1,30 #3

where all nodes except one are not really selective at all, finish in less than 60 seconds on tiger2. These queries result in a timeout when using the old index scheme.
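
Why the independence assumption misleads the planner can be shown with a toy calculation. The data below is invented and the formula is a simplified model of a selectivity estimate, not PostgreSQL's actual estimator.

```python
# Toy annotation rows as (name, value) pairs; the numbers are made up.
rows = [("pos", "NN")] * 30 + [("pos", "ART")] * 20 + [("lemma", "der")] * 50

total = len(rows)
count_name = sum(1 for name, _ in rows if name == "pos")   # rows with name "pos"
count_value = sum(1 for _, value in rows if value == "NN")  # rows with value "NN"

# Independence assumption: P(name) * P(value) * total rows.
estimated = count_name * count_value / total

# In reality "NN" as value always implies "pos" as name.
actual = sum(1 for name, value in rows if name == "pos" and value == "NN")

print(estimated, actual)  # 15.0 30
```

In this toy data set the independence-based estimate is off by a factor of two, because the value already determines the name.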

Also note that currently there are different SQL-getter functions for the possible combinations of namespace/name/value/regex annotation queries. This should be much improved, e.g. by using NULL and CASE in the SQL.
thomaskrause added a commit that referenced this issue Aug 17, 2012
@ghost ghost assigned thomaskrause Aug 20, 2012
@thomaskrause

The korpling server has been updated to the newly released PostgreSQL version and shows both ART and APPRART results.

The SFB server has not been updated and still has the bug, but it does not find only ART as described in this bug; instead it finds only APPRART. Is this still the same bug, or do we have a different phenomenon here?

thomaskrause added a commit that referenced this issue Nov 20, 2012
Move authentication and authorization from frontend to service

Until now the annis-gui was responsible for authentication and authorization. The annis-service itself was not secured at all (other than only listening to localhost).

This branch moves the responsibility to the service. Thus the search functionality can be exposed to the outer world using an HTTP proxy and the service.

The branch also replaces the self-made security manager with the Apache Shiro library. This powerful external library has several advantages:

- powerful abstraction and more fine-grained access control (e.g. only allow count but not subgraph retrieval for a specific user)
- much more tested implementation
- support for salted passwords
- integration of other authentication techniques like LDAP
- administrator can configure the security model with the shiro.ini file
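
What "count but not subgraph retrieval" could look like as a permission check is sketched below. The matching logic is a simplified Python illustration loosely modelled on colon-separated wildcard permission strings; it is not Shiro's implementation, and the permission names (`query:count:pcc2` etc.) are hypothetical.

```python
def implies(granted: str, requested: str) -> bool:
    """Return True if the granted permission string covers the requested one."""
    granted_parts = [set(p.split(",")) for p in granted.split(":")]
    requested_parts = [set(p.split(",")) for p in requested.split(":")]

    for i, req in enumerate(requested_parts):
        if i >= len(granted_parts):
            # A shorter granted permission covers everything below it.
            return True
        part = granted_parts[i]
        if "*" not in part and not req <= part:
            return False
    # Extra granted parts only match if all of them are wildcards.
    return all("*" in p for p in granted_parts[len(requested_parts):])

# Hypothetical user who may only count matches on the pcc2 corpus.
user_permissions = ["query:count:pcc2"]

def is_permitted(requested: str) -> bool:
    return any(implies(g, requested) for g in user_permissions)

print(is_permitted("query:count:pcc2"))     # True
print(is_permitted("query:subgraph:pcc2"))  # False
```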
thomaskrause added a commit that referenced this issue Oct 9, 2013
Precedence optimization fails when applied to spans which cover more than one token

Take e.g. this query on pcc2
NP & NP & NP &  #1 . #2 & #2 . #3

In ANNIS 2 this gave us 2 results, but since ANNIS 3 incorrectly applies the precedence optimization the query gets translated to

NP & NP & NP &  #1 . #2 & #2 . #3 & #1 . #3

and has only 1 match. The correct optimization would be

NP & NP & NP &  #1 . #2 & #2 . #3 & #1 .* #3

This commit adds proper test cases for this situation and gives a fix
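
The problem can be illustrated with a small model of spans. The sketch assumes that `#1 . #2` means "the right-most token of #1 is immediately followed by the left-most token of #2" and that `.*` means the same with an arbitrary gap; the `Span` type and the token indexes are hypothetical.

```python
from typing import NamedTuple

class Span(NamedTuple):
    left: int   # index of the left-most covered token
    right: int  # index of the right-most covered token

def directly_precedes(a: Span, b: Span) -> bool:
    """a . b (assumed semantics)"""
    return a.right + 1 == b.left

def indirectly_precedes(a: Span, b: Span) -> bool:
    """a .* b (assumed semantics)"""
    return a.right < b.left

np1 = Span(0, 0)
np2 = Span(1, 3)  # covers three tokens
np3 = Span(4, 5)

# The original query conditions hold for this constellation.
assert directly_precedes(np1, np2) and directly_precedes(np2, np3)

print(directly_precedes(np1, np3))    # False
print(indirectly_precedes(np1, np3))  # True
```

Because the distance between #1 and #3 depends on how many tokens #2 covers, only the indirect precedence can safely be added to the query.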
thomaskrause added a commit that referenced this issue Nov 5, 2013
thomaskrause added a commit that referenced this issue Nov 21, 2013
tok="abc" . node

which could be either

tok="abc" & node & #1 . #2
or
tok & "abc" & node &  #1 = #2 & #2 . #3
zangsir added a commit that referenced this issue Oct 14, 2014
update changes from korpling for September
thomaskrause added a commit that referenced this issue Feb 15, 2016
… node definitions.

This removes an ambiguity for the "!=" token. E.g.

tok!="the"

could be interpreted as "all tokens which don't have "the" as value" or as

tok & "the" & #1 != #2

The latter is semantically invalid (no binding), so the ambiguity is resolved by not allowing the AQL operators "!=" and "==" in short AQL definitions.

This fixes #494.