cannot annotate across a sentence break #786

akhondi · 2012-05-25T08:57:19Z

Is it possible to allow annotating across a sentence break? Maybe you could point me to the relevant script.

spyysalo · 2012-05-28T03:15:25Z

For sentence breaks created by the brat sentence splitter, this can most easily be done by switching off the sentence splitting.

For "hard" sentence breaks (newlines in the source text), this is a bit more difficult, as the brat standoff format (http://brat.nlplab.org/standoff.html) is line-oriented and incorporates the annotated text in the .ann files, which would require newline characters in the source text to be escaped. This is currently not done. For these cases, it would be easiest to replace newline characters with space in the source text (if possible).
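As a rough illustration of the newline-to-space workaround described above, here is a minimal Python sketch. The function name `flatten_newlines` is made up for this example and is not part of brat. It replaces each line-break character with exactly one space, so the text keeps its length and character offsets in any existing .ann file should still point at the same characters.

```python
# Hypothetical helper, not part of brat: replace hard line breaks with spaces
# one character for one character, so that character offsets stay unchanged.
_LINE_BREAKS = str.maketrans({"\n": " ", "\r": " "})


def flatten_newlines(text: str) -> str:
    """Return text with every LF and CR replaced by a single space."""
    return text.translate(_LINE_BREAKS)


if __name__ == "__main__":
    source = "2-chloro-\nethanol was added.\nThe mixture was stirred."
    flat = flatten_newlines(source)
    # The length is preserved, so existing standoff offsets remain valid.
    print(len(source) == len(flat))
    print(flat)
```

Note that this discards the original line structure of the document, so it is only an option where that structure does not matter.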

ghost · 2012-06-28T15:10:32Z

@spyysalo: I have a feeling this question may arise again in the future. Could you add this to the FAQ and then close the issue?

spyysalo · 2012-06-29T03:37:49Z

@ninjin : I believe @amadanmath had some ideas on how to permit this if necessary using the mechanisms that were introduced for discontinuous annotations. I wouldn't want to close this without a resolution.

ghost · 2012-07-03T04:48:25Z

@spyysalo: Um... you mean to treat each sentence as a separate component of an annotation? Pardon my French, but that is a f*ckin' ugly hack.

amadanmath · 2012-07-03T05:16:59Z

Why? It would work. Having sentences as display-only fragments is not nearly as horrible as trying to fragment things so that they fit on the screen (which would be way more work, and an ugly hack to boot).

ghost · 2012-07-03T05:20:09Z

@amadanmath: The problem is with the format IMHO, the whole idea of the "comment" portion of it falls apart. I know it was well-intended but it shouldn't have been there.

amadanmath · 2012-07-03T05:21:57Z

Ah. Indeed. Also, that's so not a comment, if it merely duplicates the text segment (and enforces the identity!)

spyysalo · 2012-07-03T05:24:26Z

With discontinuous annotations, the newline character does not need to be part of the span. I thought that was the trick.

Yeah, it's not pretty, but it would allow us to resolve this issue.

ghost · 2012-07-03T05:25:26Z

@spyysalo: It isn't as ugly as it could have been. But lesson learnt, formats should store the information, sanity should go elsewhere. Supporting go-ahead with the "hack".

amadanmath · 2012-07-03T05:58:17Z

Indeed, if it's "real" discontinuous annotation, the newline wouldn't be a part of the span. However, then that's none of my business - rather, you need to do server-side pre-processing to generate the annotation file with discontinuities, and the file should render correctly without any intervention into clientside code.

What I meant is to split a span by sentences into fragments "on the fly" before rendering, in which case no pre-processing would be required. But in this case, Pontus's complaint stands.

spyysalo · 2012-07-03T06:06:13Z

OK, gotcha. Longer-term, we'll probably need to adjust the storage format to also allow newlines (and probably tabs, while we're at it). Standard C-style escaping would do the job.

For now, splitting into discontinuous spans server-side should do. Shall we add an option for whether to allow this?
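To make the server-side idea concrete, here is a minimal Python sketch of splitting one span into discontinuous fragments at hard newlines. The function names are hypothetical and this is not the code that went into brat. It assumes the standoff convention for discontinuous text-bound annotations, where fragment offsets are separated by `;` and the reference text is the fragment texts joined by a single space.

```python
import re


def split_span_at_newlines(text, start, end):
    """Split text[start:end] into (start, end) fragments containing no newline."""
    fragments = []
    # Each maximal run of non-newline characters becomes one fragment.
    for match in re.finditer(r"[^\n]+", text[start:end]):
        fragments.append((start + match.start(), start + match.end()))
    return fragments


def format_textbound(ann_id, ann_type, text, fragments):
    """Build one .ann line for a (possibly discontinuous) text-bound annotation."""
    offsets = ";".join(f"{s} {e}" for s, e in fragments)
    # Assumption: the reference text joins the fragments with a single space.
    ref_text = " ".join(text[s:e] for s, e in fragments)
    return f"{ann_id}\t{ann_type} {offsets}\t{ref_text}"


if __name__ == "__main__":
    doc = "sodium\nchloride is salt"
    frags = split_span_at_newlines(doc, 0, 15)
    print(frags)
    print(format_textbound("T1", "Chemical", doc, frags))
```

Because the newline itself falls outside every fragment, the resulting line contains no newline character and the line-oriented .ann file stays parseable.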

spyysalo · 2012-07-03T06:10:19Z

Opened #819 for the longer-term solution.
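For reference, the C-style escaping mentioned above as the longer-term option could look roughly like the following Python sketch. Both function names are hypothetical, and nothing here describes what brat actually stores. The only point is that backslash, newline and tab can be written as `\\`, `\n` and `\t` so that the text column of a tab-separated, line-oriented file never contains a raw newline or tab.

```python
def escape_ref_text(s: str) -> str:
    """Escape backslash, newline and tab so the result fits on one TSV field."""
    # The backslash must be escaped first, or the escapes added below
    # would themselves be escaped a second time.
    return s.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")


def unescape_ref_text(s: str) -> str:
    """Invert escape_ref_text; unknown escape sequences are kept verbatim."""
    mapping = {"n": "\n", "t": "\t", "\\": "\\"}
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            nxt = s[i + 1]
            out.append(mapping.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

A single left-to-right pass is used for unescaping because chained `str.replace` calls would misread an escaped backslash followed by the letter `n`.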

akhondi · 2012-07-03T07:10:42Z

With all respect, I totally disagree. We are annotating chemicals in OCR'd patents. Annotating across sentences is really important in our case.

spyysalo · 2012-07-03T07:20:46Z

@akhondi : thanks for the input -- could you please clarify which part you disagree with?

ghost · 2012-07-05T04:53:45Z

@spyysalo: I think it is the cross-sentence part.

@akhondi: What kind of annotations do you make to the patents? Is it something like entities, events or maybe section marking?

pflaquerre · 2012-10-02T15:37:00Z

Was any of this implemented in the end? I'm generating json annotation structures from text and some of the annotations often cross sentence boundaries. My use case is a language detection task, where I have one entity type per language, and large sections of text may be a single entity.

Util.embed doesn't like this at all, and collapses everything into a single, thin line (see below). I tried to use discontinuous annotations with the individual token indices as a workaround, but the results weren't visually pleasing. Right now the only solution seems to be to manually split across sentence boundaries.

ghost · 2012-10-03T03:21:39Z

@pflaquerre: No, this issue isn't resolved yet. I have just had a discussion with @spyysalo about how to resolve it, and we have a fix in the pipeline that will hopefully reach master very soon. I'll close the issue when this happens. We may have dragged our feet a little too long on this one; the resolution may be simpler than we thought.

Assigning to @amadanmath. Once you have removed the client-side check blocking annotation across sentence breaks, assign the issue to me for the back-end implementation. Although late, this one is going into v1.3.

amadanmath · 2012-10-03T04:20:20Z

Removed. There is a strange thing where the post-edit display shows a newline immediately after the span, and a reload kills the newline completely.

It works without modification on sentence breaks that were introduced by the sentence splitter; I did not dare try to annotate across a hard LF. :p

ghost · 2012-10-03T04:25:44Z

@amadanmath: Thanks, I'll harass you about the whole post-edit thing on the IM and get cracking with the back-end.

ghost · 2012-10-03T08:57:06Z

As of fdd275a you can annotate across newlines, both hard ones from your text and from the built-in sentence splitter. Good job team! Closing! File new issues if there are bugs, hopefully there are none.

amadanmath · 2012-10-04T04:05:41Z

Not bugs as such, but a concern: the sentence numbering will be different, and as a result sentence annotations and sentence links become unreliable.

spyysalo · 2012-10-04T04:26:36Z

@amadanmath: good point. Now that you mention it, this issue isn't entirely new, as e.g. switching the sentence splitter off, adding an annotation crossing a "soft" newline otherwise (i.e. from a tagger), or upgrading to a version fixing some sentence splitter errors would cause the same unreliability.

Perhaps we should anchor sentence annotations to offsets rather than index them by whatever the sentence splitting algorithm happens to do? Open a new issue?

amadanmath · 2012-10-04T05:07:02Z

Sure, go ahead. It might not be that hard to resolve if you pack a sentence identifier together with the sentence offsets array. (Offsets alone are not that good an idea, since they're meaningless to the user.)
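A small Python sketch of the concern and of the offset-based anchoring discussed above. Everything in it is hypothetical (the function names, the identifier `S1`, and the simplification that every newline is a sentence break); it is not how brat numbers sentences. It only shows that a character offset stays fixed while the sentence number it maps to changes once a break disappears.

```python
from bisect import bisect_right


def sentence_starts(text):
    """Start offsets of sentences, treating every newline as a sentence break."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n" and i + 1 < len(text):
            starts.append(i + 1)
    return starts


def sentence_number(starts, offset):
    """1-based number of the sentence that contains the character offset."""
    return bisect_right(starts, offset)


if __name__ == "__main__":
    # Hypothetical stable anchor: identifier "S1" is tied to offset 10.
    anchors = {"S1": 10}

    before = "One.\nTwo.\nThree."
    after = "One. Two.\nThree."  # first break removed, same length

    print(sentence_number(sentence_starts(before), anchors["S1"]))
    print(sentence_number(sentence_starts(after), anchors["S1"]))
```

Storing the identifier with an offset and resolving it to a sentence number only at display time would keep the user-facing numbering readable while the stored reference survives changes in sentence splitting.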

ghost assigned spyysalo May 28, 2012

ghost assigned amadanmath Oct 3, 2012

amadanmath pushed a commit that referenced this issue Oct 3, 2012

Issue #786 (commented out clientside restriction)

37cb4de

ghost self-assigned this Oct 3, 2012

spyysalo added a commit that referenced this issue Oct 3, 2012

Include space in ref text for discontinuous annotations (#786).

ae632da

spyysalo mentioned this issue Oct 3, 2012

issues with sentence breaking #886

Closed

ghost closed this as completed Oct 3, 2012

spyysalo mentioned this issue Oct 4, 2012

Stable references to sentences #893

Open

spyysalo mentioned this issue Nov 14, 2012

Prohibit selection across multiple 'sentences' #966

Open

spyysalo mentioned this issue Jan 15, 2013

Add "none" as an option for sentence splitting #954

Open

This was referenced May 22, 2016

Fixed annotations spanning newlines; added support for annotations spanning tabs #1174

Closed

Tab annotations #1175

Open

This issue was closed.
