Sometimes an input string contains, for example, multiple spaces, multiple sentences, or other characters that the parser removes, and sometimes the parser returns a forest (a list of trees, one per sentence).
I want to map an input segment (start/end character) exactly to the corresponding token in the output forest. How can I do this easily? The parser already knows which characters were removed or trimmed, so it would be useful to add an API parameter (for the CoreNLP server) that returns, in addition to the tokens and their POS tags, each token's start/end character offsets in the corresponding input sentence.
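To illustrate what such offsets would enable, here is a minimal Python sketch. It assumes a hypothetical output shape in which every token in the forest carries `begin`/`end` fields (character offsets into the original input string, `end` exclusive). These field names are made up for illustration and are not taken from the CoreNLP API, and the sample forest and POS tags are hand-written, not real parser output.

```python
from typing import Dict, List, Tuple

# Hypothetical token shape: {"word": ..., "pos": ..., "begin": int, "end": int},
# where begin/end are the proposed character offsets into the original input.
Token = Dict[str, object]


def tokens_in_span(forest: List[List[Token]], start: int, end: int) -> List[Tuple[int, int, Token]]:
    """Return (sentence_index, token_index, token) for every token that
    overlaps the half-open input span [start, end)."""
    hits = []
    for s_idx, sentence in enumerate(forest):
        for t_idx, tok in enumerate(sentence):
            # Two half-open intervals overlap iff each starts before the other ends.
            if tok["begin"] < end and start < tok["end"]:
                hits.append((s_idx, t_idx, tok))
    return hits


# Hand-written example: extra spaces and two sentences in the input.
TEXT = "Hello   world.  It  works."
FOREST = [
    [
        {"word": "Hello", "pos": "UH", "begin": 0, "end": 5},
        {"word": "world", "pos": "NN", "begin": 8, "end": 13},
        {"word": ".", "pos": ".", "begin": 13, "end": 14},
    ],
    [
        {"word": "It", "pos": "PRP", "begin": 16, "end": 18},
        {"word": "works", "pos": "VBZ", "begin": 20, "end": 25},
        {"word": ".", "pos": ".", "begin": 25, "end": 26},
    ],
]

# Sanity check: the offsets point back at the exact input characters,
# even though the skipped whitespace is not part of any token.
assert all(TEXT[t["begin"]:t["end"]] == t["word"] for s in FOREST for t in s)

if __name__ == "__main__":
    # Input segment "world.  It" spans characters 8..18.
    print([(s, t, tok["word"]) for s, t, tok in tokens_in_span(FOREST, 8, 18)])
```

Running it prints `[(0, 1, 'world'), (0, 2, '.'), (1, 0, 'It')]`: the selected segment crosses a sentence boundary and the collapsed whitespace, yet each token is still located by sentence and position without re-aligning strings.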