tokenise raw sentence input #26

jonorthwash · 2017-05-02T18:25:38Z

Take a bunch of sentences and import them as multiple separate sentences.

arademaker · 2017-05-25T17:36:33Z

It would need a backend to run the parser, right? Isn't that out of the scope of this tool?

jonorthwash · 2017-05-25T17:49:53Z

No, it's not outside the scope, and it doesn't need a backend. The sentence tokeniser would default to splitting on [.!?] (or similar) and would provide a simple interface for adjusting the boundaries. That interface could show the input text with the split characters highlighted and let the user select or unselect them; or, even better, the regular interface could offer a "combine with next tokenised sentence" option and a "split sentence here" option.
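A minimal Python sketch of the default splitting plus the two manual corrections described above, assuming a boundary is one of `.`, `!` or `?` followed by whitespace. The function names (`split_sentences`, `merge_with_next`, `split_at`) are hypothetical and not taken from the tool; the regex is deliberately naive, so an abbreviation like "Dr." produces a wrong split that the user would then repair with "combine with next".

```python
import re

# Naive default boundary: whitespace preceded by ., ! or ?.
# Abbreviations ("Dr.", "e.g.") will be split wrongly; the manual
# merge/split operations below exist to correct such cases.
BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split raw text into sentences at [.!?] followed by whitespace."""
    return [s for s in BOUNDARY.split(text.strip()) if s]


def merge_with_next(sentences: list[str], i: int) -> list[str]:
    """'Combine with next sentence': join sentence i with sentence i + 1."""
    if not 0 <= i < len(sentences) - 1:
        raise IndexError("no next sentence to merge with")
    merged = sentences[i] + " " + sentences[i + 1]
    return sentences[:i] + [merged] + sentences[i + 2:]


def split_at(sentences: list[str], i: int, offset: int) -> list[str]:
    """'Split sentence here': cut sentence i at a character offset."""
    left = sentences[i][:offset].rstrip()
    right = sentences[i][offset:].lstrip()
    if not left or not right:
        raise ValueError("split point must fall inside the sentence")
    return sentences[:i] + [left, right] + sentences[i + 1:]
```

For example, `split_sentences("Dr. Smith came. He left.")` gives `["Dr.", "Smith came.", "He left."]`, and `merge_with_next` on index 0 repairs it to `["Dr. Smith came.", "He left."]`. Returning new lists instead of mutating in place would make an undo history straightforward to add.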

arademaker · 2017-05-25T17:52:08Z

ok, you are proposing a bootstrapping process. From a string, produce a list of tokens and let the user create the tree.
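As an illustration of the string-to-token-list step, a hypothetical word tokeniser can be sketched in Python, assuming a token is either a run of word characters or a single punctuation mark. This is not the tool's actual tokeniser; it splits "don't" into three tokens, for instance, which is exactly the kind of output the user would correct before building the tree.

```python
import re

# Hypothetical rule: a token is a run of word characters,
# or any single character that is neither a word character nor whitespace.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenise(sentence: str) -> list[str]:
    """Return the tokens of one sentence, in order."""
    return TOKEN.findall(sentence)
```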

jonorthwash · 2017-05-25T18:05:47Z

Exactly. Several of our other issues here are also for bootstrapping tree-creation, I suppose.

maryszmary · 2017-06-24T11:07:44Z

Hm, so this issue means something like "add an option to split the input into several sentences" on top of what was done in #1?

jonorthwash · 2017-06-24T11:46:31Z

Yeah, more or less. Though for now, supporting one sentence per line should probably be enough. @ftyers, what do you think?

maryszmary · 2017-06-24T16:14:57Z

What is the input format? A file, the textbox, or both? Then, if the input is in the textbox, how should it store the tokenized sentences? Like when loaded from a file, i.e. storing several sentences and letting the user move between them using previous sentence / next sentence buttons? Should they be automatically converted to CoNLL-U, or should they remain in plain text until the user clicks the "convert" button?

jonorthwash · 2017-06-24T17:17:18Z

> What is the input format? A file, the textbox, or both?

Probably both.

> Then, if the input is in the textbox, how should it store the tokenized sentences? Like when loaded from a file, i.e. storing several sentences and letting the user move between them using previous sentence / next sentence buttons?

Yeah, I think that makes the most sense.

> Should they be automatically converted to CoNLL-U, or should they remain in plain text until the user clicks the "convert" button?

I think at any point that text is imported, it should be converted to the underlying format, which as discussed previously should probably be conllu. So "import" for plain text implies "convert to conllu" for me. In this case "convert to conllu" probably isn't quite the right approach for single sentences either.
I'm not committed to this idea though—we can discuss other options too. What do you think?
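One possible shape for the "import implies convert to conllu" step, sketched in Python under the assumption that each sentence arrives as its original text plus a token list: only the ID and FORM columns are filled in, and the remaining eight CoNLL-U columns are set to `_` until the user annotates them. The `to_conllu` name is hypothetical; a fuller converter would probably also record `SpaceAfter=No` in the MISC column so the original spacing can be reconstructed.

```python
def to_conllu(sentences: list[tuple[str, list[str]]]) -> str:
    """Render (text, tokens) pairs as skeleton CoNLL-U.

    Only ID and FORM are filled in; LEMMA, UPOS, XPOS, FEATS, HEAD,
    DEPREL, DEPS and MISC are left as '_' for the user to annotate.
    """
    blocks = []
    for text, tokens in sentences:
        lines = [f"# text = {text}"]
        for token_id, form in enumerate(tokens, start=1):
            # Ten tab-separated columns: ID, FORM, then eight placeholders.
            lines.append("\t".join([str(token_id), form] + ["_"] * 8))
        # Every sentence, including the last, ends with a blank line.
        blocks.append("\n".join(lines) + "\n\n")
    return "".join(blocks)
```

For instance, `to_conllu([("It works!", ["It", "works", "!"])])` yields a `# text = It works!` comment, three 10-column token lines, and a terminating blank line.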

maryszmary · 2017-06-24T18:01:17Z

> I think at any point that text is imported, it should be converted to the underlying format, which as discussed previously should probably be conllu. So "import" for plain text implies "convert to conllu" for me.

Ok, so, from your point of view, plain text should be converted to conllu instantly (and then be viewed as conllu)?

jonorthwash · 2017-06-24T18:34:56Z

Not necessarily "instantly"—we should wait for the user to finish typing / pasting / whatever. And it doesn't have to be displayed as conllu right away.

I guess I kind of imagine tabs across the top of the textbox (like GitHub's comment box). The default tab is something like "automatic", and as soon as it detects the format of the input, it switches to the tab for that format (seamlessly, i.e. it just changes which tab is highlighted). So in this case, it would switch to the "plain text" tab, but then you could click the conllu or CG tab to view or edit it in those formats. Further modifications to the contents of the plain text tab would only add/remove/edit tokens, and would not change any of the dependencies or POS info (etc.) visible in the other formats.
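The "automatic" tab needs some way to guess the format of pasted input. A hypothetical heuristic, sketched in Python and not taken from any existing implementation: treat the input as CoNLL-U if any line has exactly 10 tab-separated columns with a valid ID (an integer, a multiword range like `1-2`, or an empty node like `1.1`), as CG if any line starts with a `"<wordform>"` cohort marker as in the VISL CG-3 stream format, and as plain text otherwise.

```python
import re

# A CoNLL-U ID is an integer (1), a multiword-token range (1-2)
# or an empty-node decimal (1.1).
CONLLU_ID = re.compile(r"^\d+(?:[-.]\d+)?$")


def _is_conllu_token_line(line: str) -> bool:
    cols = line.split("\t")
    return len(cols) == 10 and CONLLU_ID.match(cols[0]) is not None


def detect_format(text: str) -> str:
    """Guess the input format: 'conllu', 'cg3' or 'plain' (heuristic only)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if any(_is_conllu_token_line(ln) for ln in lines):
        return "conllu"
    # CG-3 cohort lines start with the quoted wordform, e.g. "<This>"
    if any(ln.startswith('"<') for ln in lines):
        return "cg3"
    return "plain"
```

Checking CoNLL-U first is a deliberate choice: its 10-column shape is the most specific signal, so ambiguous input falls through to the more permissive checks.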

maryszmary · 2017-06-25T19:34:19Z

So, the interface now allows importing plain text from a file or entering it in the textbox, and then converting all the sentences to CoNLL-U at once. I tested it on the text "This is a sample plain text! Why is it here? It exists for testing how annotatrix works. It Works!".

This is how it looks when the text is loaded:

Then I press the Convert to CoNLL-U button:

I can also enter it in the textbox:

And then press Convert to CoNLL-U:

maryszmary · 2017-06-25T19:38:03Z

> I guess I kind of imagine tabs across the top of the textbox
> ...
> So in this case, it would switch to the "plain text" tab, but then you could click the conllu or CG tab to view it or edit it in those formats.

This functionality is probably a part of #40.

jonorthwash · 2017-06-26T05:38:12Z

Yes, though the "auto" mode is definitely related to this issue. It can certainly be dealt with as part of #40 though.

maryszmary · 2017-06-27T18:16:48Z

OK, so can this issue be closed then, or is there some other functionality to add that is not described in #40?

jonorthwash · 2017-06-28T06:42:15Z

Yeah, I think this issue is done.

ftyers added this to the Phase 1 milestone Jun 2, 2017

jonorthwash mentioned this issue Jun 26, 2017 in #40, Ability to switch between presentation formats (closed)

jonorthwash closed this as completed Jun 28, 2017

maryszmary mentioned this issue Aug 26, 2017 in #73, turn "view sentence as X" buttons into tabs (closed)
