Regex bug with initial optional parentheses #1

thomaskrause · 2012-08-17T09:20:00Z

When using regex matches, an optional parenthesized group at the start of the pattern is ignored. For example:

pos=/(APPR)?ART/

only finds pos="ART", but not pos="APPRART".

The following query works fine:

pos=/^(APPR)?ART/
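As a rough illustration of the expected behaviour, here is a minimal sketch using Python's `re` module as a stand-in for the PostgreSQL regex engine. It assumes the regex is matched against the whole annotation value; the helper name `expected_matches` and the tag list are made up for this example.

```python
import re

def expected_matches(pattern, values):
    """Return the values that the pattern matches as a whole (hypothetical helper)."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.fullmatch(v)]

# Hypothetical sample of part-of-speech tags.
tags = ["ART", "APPRART", "APPR", "NN"]

# Both queries should behave identically on a correct regex engine.
print(expected_matches(r"(APPR)?ART", tags))   # ['ART', 'APPRART']
print(expected_matches(r"^(APPR)?ART", tags))  # ['ART', 'APPRART']
```

With a correct engine both forms return the same values, so the explicit `^` should only be needed as a workaround.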

Imported from Launchpad using lp2gh.

date created: 2012-07-18T10:05:28Z
owner: amir-zeldes
assignee: krause
the launchpad url was https://bugs.launchpad.net/bugs/1026055


thomaskrause · 2012-08-17T09:20:18Z

(by krause)
Seems like the bug Viktor found in PostgreSQL itself: http://archives.postgresql.org/message-id/20120706184951.GA91000%40client195-161.wlan.hu-berlin.de

This was fixed in PostgreSQL itself (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=628cbb50ba80c83917b07a7609ddec12cda172d0) and will be backported to 8.4 (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b9edaa784e407d1b6e890b776c607a26f3aa7e49) and 9.1 (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a9287de1760450e7fe3b4309ee1ba7ea2af39217)

We should wait for the official releases and, if they fix the problem, make a public announcement asking users to upgrade.

…eme. This scheme consists of a separate "annotation_pool" table containing all possible combinations of node and edge annotations. The facts table only holds a bigint reference to the id of annotation_table. To let the PostgreSQL optimizer know the selectivity of a given node/edge annotation query, the matching annotation ID is calculated by an immutable SQL function whose result is computed and inserted into the query before the optimizer runs. Selecting a huge number of annotation IDs (lemma=/.*/ on tiger2) did not have a significant impact on query speed, since parsing the IDs is not necessary (they are included in the internal data structures).

It is still possible to use the old pure scheme, which is important
a) as a fallback
b) for benchmarking

These changes are made to improve the planner's information about the selectivity of annotation-based subqueries. Previously the planner assumed statistical independence of the columns node/edge_annotation_name and node/edge_annotation_value (the same holds for namespace, but people normally don't query for it explicitly), which could be misleading in cases like "NN as value always means pos as name".

Therefore a really simple index scheme is applied, which only indexes each column together with the corpus_ref or text_ref. This index could be improved, but it showed no or only minor disadvantages on the AQL test query set for tiger2. It also made queries like

count pos="ART" & pos="NN" & pos="VAPP" & #1 . #2 & #2 .1,30 #3

where all nodes except one are not really selective at all, complete in less than 60 seconds on tiger2. These queries result in a timeout when using the old index scheme.

Also note that there are currently different SQL getter functions for the possible combinations of namespace/name/value/regex annotation queries. This should be much improved, e.g. by using NULL and CASE in the SQL.
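The idea of resolving annotation IDs first and only then querying the facts table can be sketched roughly as follows. This is not the actual ANNIS schema or its SQL functions: it uses an in-memory SQLite database instead of PostgreSQL, and the column names (`name`, `value`, `anno_ref`, `node_id`) and both helper functions are assumptions made up for illustration.

```python
import re
import sqlite3

# Hypothetical, heavily simplified schema; the real tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE annotation_pool (id INTEGER PRIMARY KEY, name TEXT, value TEXT);
    CREATE TABLE facts (node_id INTEGER, anno_ref INTEGER REFERENCES annotation_pool(id));
    INSERT INTO annotation_pool VALUES (1, 'pos', 'ART'), (2, 'pos', 'NN'), (3, 'lemma', 'der');
    INSERT INTO facts VALUES (10, 1), (11, 2), (12, 1), (13, 3);
""")

def resolve_annotation_ids(name, value_regex):
    """Step 1: look up the IDs of all pool entries matching the annotation query."""
    rows = conn.execute(
        "SELECT id, value FROM annotation_pool WHERE name = ?", (name,)
    ).fetchall()
    return [row_id for row_id, value in rows if re.fullmatch(value_regex, value)]

def find_nodes(name, value_regex):
    """Step 2: query the facts table using only the pre-computed IDs."""
    ids = resolve_annotation_ids(name, value_regex)
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT node_id FROM facts WHERE anno_ref IN ({placeholders}) ORDER BY node_id"
    return [row[0] for row in conn.execute(sql, ids)]

print(find_nodes("pos", "ART"))  # [10, 12]
print(find_nodes("pos", ".*"))   # [10, 11, 12]
```

Because the main query only sees a concrete list of IDs, a planner with statistics on the reference column can estimate selectivity without reasoning about name/value correlations.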

thomaskrause · 2012-08-21T07:58:45Z

korpling server is updated to the newly released PostgreSQL version and shows both ART and APPRART results.

The SFB server is not updated and still has the bug, but it does not find only ART as described in this bug; instead it finds only APPRART. Is this still the same bug, or do we have a different phenomenon here?

Move authentication and authorization from frontend to service

Until now the annis-gui was responsible for authentication and authorization. The annis-service itself was not secured at all (other than only listening on localhost). This branch moves the responsibility to the service, so the search functionality can be exposed to the outside world using an HTTP proxy and the service.

The branch also replaces the self-made security manager with the Apache Shiro library. This powerful external library has several advantages:

- powerful abstraction and more fine-grained access control (e.g. only allow count but not subgraph retrieval for a specific user)
- a much more thoroughly tested implementation
- support for salted passwords
- integration of other authentication techniques like LDAP
- the administrator can configure the security model with the shiro.ini file

Precedence optimization fails when applied to spans which cover more than one token

Take e.g. this query on pcc2:

NP & NP & NP & #1 . #2 & #2 . #3

In ANNIS 2 this gave us 2 results, but since ANNIS 3 incorrectly applies the precedence optimization, the query gets translated to

NP & NP & NP & #1 . #2 & #2 . #3 & #1 . #3

and has only 1 match. The correct optimization would be

NP & NP & NP & #1 . #2 & #2 . #3 & #1 .* #3

This commit adds proper test cases for this situation and provides a fix.
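The intended rewrite can be sketched as follows. This is only an illustration of the inference rule described above (a chain of two direct precedence joins implies an indirect precedence join), not the actual ANNIS optimizer; the function names and the tuple representation of joins are hypothetical.

```python
def implied_indirect_precedence(direct_edges):
    """For chains a . b and b . c, infer only the indirect join a .* c."""
    implied = set()
    for a, b in direct_edges:
        for b2, c in direct_edges:
            if b == b2 and a != c:
                implied.add((a, c))
    return sorted(implied)

def optimize(query_nodes, direct_edges):
    """Render the query with the inferred indirect precedence joins appended."""
    parts = list(query_nodes)
    parts += [f"#{a} . #{b}" for a, b in direct_edges]
    parts += [f"#{a} .* #{c}" for a, c in implied_indirect_precedence(direct_edges)]
    return " & ".join(parts)

print(optimize(["NP", "NP", "NP"], [(1, 2), (2, 3)]))
# NP & NP & NP & #1 . #2 & #2 . #3 & #1 .* #3
```

Adding `#1 . #3` instead would be too strict for spans covering more than one token, which is why only the weaker `.*` join is safe to infer.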

… node definitions. This removes an ambiguity for the "!=" token. E.g. tok!="the" could be interpreted as "all tokens which don't have "the" as value" or as tok & "the" & #1 != #2. The latter is semantically invalid (no binding), so the ambiguity is resolved by not allowing the AQL operators "!=" and "==" in short AQL definitions. This fixes #494.

This was referenced Aug 17, 2012

Search by "sentence" #6

Closed

WEKA: export metadata #9

Closed

Output corpus position in tokens for hits #11

Closed

gridtree seems to produce wrong output #14

Closed

Add island feature to grid #27

Closed

thomaskrause added a commit that referenced this issue Aug 17, 2012

- allow "#1 .tok,1,2" syntax

32ffc1a

ghost assigned thomaskrause Aug 20, 2012

thomaskrause closed this as completed Oct 30, 2012

amir-zeldes mentioned this issue Nov 2, 2012

Orphan/root tokens are missing in the tiger tree view #47

Closed

This was referenced Apr 2, 2013

Grid is broken in parallel corpora #98

Closed

Corpus explorer does not output alignment edges with no annotations #99

Closed

amir-zeldes mentioned this issue May 15, 2013

Hit marking in HTML visualizations #105

Closed

This was referenced May 24, 2013

Highlighting of matched tokens within matched tokens in a second color doesn't always work #115

Closed

Segmentation precedence operator not working correctly #125

Closed

Hit marking in KWIC for segmentations precedence queries is incorrect #126

Closed

This was referenced Jun 17, 2013

Match highlighting in KWIC is incorrect/missing in parallel corpus query of non-terminal elements #137

Closed

Bug in arity operator #138

Closed

thomaskrause mentioned this issue Sep 26, 2013

Query for frequency does not output any data #220

Closed

amir-zeldes mentioned this issue Oct 23, 2013

Very thin bars in frequency analysis #245

Closed

thomaskrause added a commit that referenced this issue Nov 5, 2013

splitting up listener for query nodes and for joins in order to allow…

a5cebcd

… query like #1 . #2 & node & node

thomaskrause added a commit that referenced this issue Nov 21, 2013

use "==" operator for identity in order to solve ambiguity when parsing

3e897e0

tok="abc" . node which could be either tok="abc" & node & #1 . #2 or tok & "abc" & node & #1 = #2 & #2 . #3

amir-zeldes mentioned this issue Mar 13, 2014

New AQL operator: "near" #290

Closed

zangsir added a commit that referenced this issue Oct 14, 2014

Merge pull request #1 from korpling/develop

390495d

update changes from korpling for September

Annotation-123 mentioned this issue Nov 20, 2014

Nodes, edges and SecEdges in constituent structure (tree) are empty or not displayed #367

Closed

amir-zeldes mentioned this issue Mar 17, 2015

Frequencies fails on disjoint AQL #394

Closed

TFeige mentioned this issue Nov 21, 2016

AQL-editor and bidirectional text #540

Closed

otichy mentioned this issue Mar 21, 2017

edge annotation in frequency analysis #554

Open

otichy mentioned this issue Mar 19, 2019

Find relations with no edge label #604

Closed

LisaEggert mentioned this issue Jan 27, 2021

Complex Search with "OR" #686

Closed

lehmannx mentioned this issue Feb 22, 2023

CSV Export fails at large matches #816

Closed
