Allow larger strings in annotation values #3

thomaskrause · 2012-08-17T09:20:01Z

Could the maximum number of characters allowed in a single field be extended to 2000?

Imported from Launchpad using lp2gh.

date created: 2012-06-08T14:40:43Z
owner: claudia-schneider2
assignee: krause
the launchpad url was https://bugs.launchpad.net/bugs/1010500

…eme. This scheme consists of a separate "annotation_pool" table containing all possible combinations of node and edge annotations. The facts table only holds a bigint reference to the ID of annotation_table. In order to let the PostgreSQL optimizer know the selectivity of a given node/edge annotation query, the matching annotation ID is computed by an immutable SQL function whose result is calculated and inserted into the query before the optimizer runs. Selecting a huge number of annotation IDs (lemma=/.*/ on tiger2) did not have a significant impact on query speed, since parsing the IDs is not necessary (they are included in the internal data structures).

It is still possible to use the old pure scheme, which is important a) as a fallback and b) for benchmarking.

These changes are made in order to improve the planner's information about the selectivity of annotation-based subqueries. Before, the planner assumed statistical independence of the columns node/edge_annotation_name and node/edge_annotation_value (the same holds for namespace, but people normally don't query for it explicitly), which could be misleading in cases like "NN as value always means pos as name".

Therefore a really simple index scheme is applied, which only indexes each column together with corpus_ref or text_ref. This index could be improved, but it showed no or only minor disadvantages on the AQL test query set for tiger2. It also made queries like

    count pos="ART" & pos="NN" & pos="VAPP" & #1 . #2 & #2 .1,30 #3

where all nodes except one are not really selective at all, pass in less than 60 seconds on tiger2. These queries result in a timeout when using the old index scheme.

Also note that currently there are different SQL getter functions for the possible combinations of namespace/name/value/regex annotation queries. This should be much improved, e.g. by using NULL and CASE in the SQL.
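To illustrate why the independence assumption mentioned in this commit message can mislead a planner, here is a minimal sketch in Python. The rows are a hypothetical toy data set (not taken from tiger2), and the two helper functions are made up for this illustration; they do not reproduce PostgreSQL's actual estimator, only the arithmetic of multiplying per-column frequencies versus counting the real combination.

```python
from collections import Counter

# Hypothetical toy annotation rows as (name, value) pairs; not real corpus data.
ROWS = [("pos", "NN")] * 30 + [("pos", "ART")] * 20 + [("lemma", "Haus")] * 50


def independent_estimate(rows, name, value):
    """Selectivity estimate that treats name and value as independent columns."""
    total = len(rows)
    p_name = sum(1 for n, _ in rows if n == name) / total
    p_value = sum(1 for _, v in rows if v == value) / total
    return p_name * p_value


def actual_selectivity(rows, name, value):
    """Fraction of rows that really carry this (name, value) combination."""
    return Counter(rows)[(name, value)] / len(rows)


if __name__ == "__main__":
    print("independent:", independent_estimate(ROWS, "pos", "NN"))
    print("actual:     ", actual_selectivity(ROWS, "pos", "NN"))
```

In this toy data every "NN" value belongs to the name "pos", so the independent estimate (0.5 × 0.3) is only half of the real fraction. Resolving the combination to a single annotation ID before planning, as the commit message describes, avoids multiplying the two column frequencies in the first place.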

Precedence optimization fails when applied to spans which cover more than one token.

Take e.g. this query on pcc2:

    NP & NP & NP & #1 . #2 & #2 . #3

In ANNIS 2 this gave us 2 results, but since ANNIS 3 incorrectly applies the precedence optimization, the query gets translated to

    NP & NP & NP & #1 . #2 & #2 . #3 & #1 . #3

and has only 1 match. The correct optimization would be

    NP & NP & NP & #1 . #2 & #2 . #3 & #1 .* #3

This commit adds proper test cases for this situation and provides a fix.
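The reasoning behind this fix can be sketched with a simplified model of precedence over token ranges. The `Span` type, the two predicates, and the token indices below are assumptions made for illustration, not ANNIS code: direct precedence is modelled as "the right-most token of the first span is immediately followed by the left-most token of the second", and indirect precedence as "the first span ends anywhere before the second begins".

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical span covering the token indices left..right (inclusive)."""
    left: int
    right: int


def directly_precedes(a: Span, b: Span) -> bool:
    """Simplified model of `#a . #b`: b starts on the token right after a ends."""
    return a.right + 1 == b.left


def indirectly_precedes(a: Span, b: Span) -> bool:
    """Simplified model of `#a .* #b`: a ends somewhere before b starts."""
    return a.right < b.left


# Three made-up NP spans; the middle one covers two tokens.
np1, np2, np3 = Span(0, 0), Span(1, 2), Span(3, 3)

if __name__ == "__main__":
    print(directly_precedes(np1, np2), directly_precedes(np2, np3))
    print(directly_precedes(np1, np3), indirectly_precedes(np1, np3))
```

With these spans both `#1 . #2` and `#2 . #3` hold, but `#1 . #3` does not, because the middle span is two tokens wide. Only the weaker `#1 .* #3` follows from the chain, which is why adding a fixed-distance constraint between the first and third node can drop valid matches.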

thomaskrause mentioned this issue Aug 17, 2012

Search by "sentence" #6

Closed

thomaskrause closed this as completed Aug 17, 2012

thomaskrause mentioned this issue Sep 26, 2013

Query for frequency does not output any data #220

Closed

thomaskrause referenced this issue in thomaskrause/ANNIS Nov 20, 2013

use "==" operator for identity in order to solve ambiguity when parsing

3e897e0

tok="abc" . node which could be either tok="abc" & node & #1 . #2 or tok & "abc" & node & #1 = #2 & #2 . #3

amir-zeldes pushed a commit that referenced this issue Sep 23, 2014

Merge pull request #3 from korpling/develop

10b6625

Merge into amir-zeldes:develop

This was referenced Nov 24, 2014

Disjunction fails depending on order #372

Closed

Highlighting failure in disjunction #373

Closed

thomaskrause pushed a commit that referenced this issue Oct 12, 2016

Merge pull request #3 from korpling/develop

751a522

updates

otichy mentioned this issue Mar 21, 2017

edge annotation in frequency analysis #554

Open

thomaskrause added a commit that referenced this issue Nov 21, 2018

Merge pull request #3 from thomaskrause/feature/graphannis-no-spring

00c6a3e

Remove Spring as dependency

LisaEggert mentioned this issue Jan 27, 2021

Complex Search with "OR" #686

Closed

amir-zeldes mentioned this issue Aug 21, 2021

Operator negation in AQL - part 1: negation with existence assumption korpling/graphANNIS#186

Closed

lehmannx mentioned this issue Feb 22, 2023

CSV Export fails at large matches #816

Closed
