Some spans are missing begin offset field #1496

Closed
elizlee opened this issue Oct 31, 2019 · 2 comments

elizlee commented Oct 31, 2019

Describe the bug
When we process annotated documents exported from INCEpTION as JSON files, we use the begin and end offsets of each span to identify words and phrases. In some annotated documents, one of the spans lists only the end offset, which causes an error when we attempt to process those documents. We suspect that the issue comes from annotating spans that begin at the first index of the document.

Snippet
In this example snippet from exported JSON, span 5368 lacks a begin field:

    {
        "12": {
            "_type": "Sofa",
            "sofaNum": 1,
            "sofaID": "_InitialView",
            "mimeType": "text",
            "sofaString": "__DOCUMENT_TEXT_REDACTED_FOR_IP_REASONS__"
        },
        "5363": {
            "_type": "CTEventSpan",
            "sofa": 12,
            "begin": 31,
            "end": 38,
            "negative_example": false
        },
        "5368": {
            "_type": "CTEventSpan",
            "sofa": 12,
            "end": 10,
            "negative_example": false
        }
    }

To Reproduce
Steps to reproduce the behavior:

  1. Create a new project
  2. In that project, open a document
  3. Create a span by highlighting one or more words starting from the very beginning of the document
  4. Click "Export" and export the document in UIMA CAS JSON format
  5. Navigate to the downloaded .zip file - unzip and open the .json file
  6. Below the document text, you should see a span object that contains the field end, but not begin.
  7. You can investigate further by repeating steps 3-6 while highlighting words in the middle of the document instead, and checking that those spans, in contrast, do contain a begin field.
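Steps 5 and 6 can also be checked with a short script instead of by eye. The following is a minimal sketch, not part of INCEpTION: the function names are made up for illustration, and it only assumes that the export is a .zip archive containing one or more .json files. It walks each JSON document recursively, so it does not depend on the exact layout of the file.

```python
import json
import zipfile


def find_missing_begin(node, path=""):
    """Recursively yield the paths of JSON objects that have "end" but no "begin"."""
    if isinstance(node, dict):
        if "end" in node and "begin" not in node:
            yield path
        for key, value in node.items():
            yield from find_missing_begin(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_missing_begin(value, f"{path}/{index}")


def scan_export(zip_file):
    """Map each .json member of the export archive to the paths of affected objects."""
    report = {}
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                with archive.open(name) as handle:
                    document = json.load(handle)
                report[name] = list(find_missing_begin(document))
    return report
```

Calling scan_export with the path of the downloaded archive returns, for every JSON file in it, the list of objects that carry an end offset without a begin offset. An empty list means the file is not affected.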

Speculation as to cause
Because this only happens when end offsets are small, it may be triggered when begin would have a value of zero. Perhaps something like Jackson's @JsonInclude(JsonInclude.Include.NON_DEFAULT) is being used? If you can point us to the relevant code, we are happy to look into this.
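If that speculation holds, a consumer of the exported files can work around the problem by treating an omitted offset as 0. The sketch below is an assumption-laden illustration rather than a confirmed fix: the helper names are hypothetical, and it assumes the layout shown in the snippet above, where the top-level object maps ids to feature structures and each span refers to its Sofa by id.

```python
import json


def span_offsets(fs):
    """Return (begin, end) for a feature structure, treating omitted offsets as 0."""
    return fs.get("begin", 0), fs.get("end", 0)


def covered_text(export, fs_id):
    """Return the text covered by the annotation stored under the given id."""
    fs = export[fs_id]
    # Assumption: "sofa" holds the id of the Sofa object, which is a string key
    # in the top-level JSON object, as in the snippet from this issue.
    sofa = export[str(fs["sofa"])]
    begin, end = span_offsets(fs)
    return sofa["sofaString"][begin:end]


# Made-up sample data in the same shape as the snippet above.
export = json.loads("""
{
    "12": {"_type": "Sofa", "sofaString": "Aspirin was given to the patient."},
    "5363": {"_type": "CTEventSpan", "sofa": 12, "begin": 8, "end": 11},
    "5368": {"_type": "CTEventSpan", "sofa": 12, "end": 7}
}
""")

print(covered_text(export, "5363"))
print(covered_text(export, "5368"))
```

One caveat: UIMA offsets count UTF-16 code units, while Python slices count code points, so direct slicing like this is only reliable for text without characters outside the Basic Multilingual Plane.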

  • Version and build ID: 2019-05-30 12:43:10, build f086cc6
  • OS: macOS
  • Browser: Chrome

reckart (Member) commented Nov 1, 2019

Hm, ok. We use the JsonWriter from DKPro Core and indeed, it is by default configured to omit default values.

It arrives in INCEpTION via a module we re-use from WebAnno: JsonFormatSupport:

https://github.com/webanno/webanno/blob/5ed87ce971b2a15f818bb685a70223cc2d503d7b/webanno-io-json/src/main/java/de/tudarmstadt/ukp/clarin/webanno/json/JsonFormatSupport.java#L56

So, you could add PARAM_OMIT_DEFAULT_VALUES, false there in the call to createEngineDescription.

However, I wonder if it wouldn't make sense to report the issue upstream to the UIMA issue tracker to suggest always including begin/end offsets even while other 0 values might still be omitted.
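To make the upstream suggestion concrete, here is a rough sketch of the proposed behavior. It is purely illustrative and is not UIMA or DKPro Core code: the function names and the rule for what counts as a default value are assumptions. In particular, it keeps boolean false values only because the snippet in this issue shows "negative_example": false being written.

```python
# Hypothetical illustration of "omit default values, but always keep begin/end".
ALWAYS_INCLUDED = frozenset({"begin", "end"})


def is_default(value):
    """Return True for values that this sketch treats as omittable defaults."""
    if value is None:
        return True
    if isinstance(value, bool):
        # Assumption: booleans are kept, as seen in the snippet from this issue.
        return False
    return isinstance(value, (int, float)) and value == 0


def serialize_features(features, omit_defaults=True, always_include=ALWAYS_INCLUDED):
    """Return the features to write, dropping defaults except for protected names."""
    return {
        name: value
        for name, value in features.items()
        if not omit_defaults or name in always_include or not is_default(value)
    }


span = {"_type": "CTEventSpan", "sofa": 12, "begin": 0, "end": 10, "score": 0}
print(serialize_features(span))
print(serialize_features(span, always_include=()))
```

The first call keeps begin even though it is 0 and drops only score. The second call mimics the behavior reported in this issue, where begin disappears along with every other zero value.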

reckart (Member) commented Nov 1, 2019

We could also consider changing the default for PARAM_OMIT_DEFAULT_VALUES in DKPro Core.

reckart added the 🐛Bug label Nov 22, 2019
reckart added this to the Bug backlog milestone Nov 22, 2019
reckart added the 🐾 Good first issue and Requires upstream changes labels and removed the 🐾 Good first issue label Mar 10, 2020
reckart self-assigned this May 30, 2023
reckart modified the milestones: 🦟 Bug backlog, 29.0 May 30, 2023
reckart added this to 🔖 To do in Kanban via automation May 30, 2023
reckart modified the milestones: 29.0, 28.2 May 30, 2023
reckart added a commit that referenced this issue May 30, 2023
- Disable omission of default values for legacy UIMA JSON format
- Add option to switch to previous behavior of omitting default values
- Updated documentation
reckart added a commit that referenced this issue May 30, 2023
reckart added a commit that referenced this issue May 30, 2023
reckart added a commit that referenced this issue May 30, 2023
…s-are-missing-begin-offset-field

#1496 - Some spans are missing begin offset field
@reckart reckart closed this as completed May 30, 2023
Kanban automation moved this from 🔖 To do to 🍹 Done May 30, 2023
reckart added a commit that referenced this issue May 30, 2023
* release/28.x:
  #1496 - Some spans are missing begin offset field
  #1511 - External recommender fails when CAS contains control characters
  #1496 - Some spans are missing begin offset field
reckart added a commit that referenced this issue Jun 13, 2023
* main: (189 commits)
  No issue. Minor additions to BioC format description
  #4062 - ViewportTracker should focus on block-like elements
  #4032 - Allow using externalized strings from backend code
  #4060 - Clean up redundant code in annotation handlers
  #4026: Support for error tracking with Sentry
  #3673 - Update dependencies
  update dead link to the new file
  #4055 - Editor scrolls up when left sidebar is opened/closed
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release inception-28.2
  #4052 - Admins no longer see all projects in the project overview
  #3673 - Update dependencies
  #4048 - Document navigation options not visible to manager when viewing other users document
  #3673 - Update dependencies
  #3673 - Update dependencies
  #1496 - Some spans are missing begin offset field
  #1511 - External recommender fails when CAS contains control characters
  #1496 - Some spans are missing begin offset field
  #4040 - Ability to store preferences from client-side code
  #1066 - Recommender status info
  ...

% Conflicts:
%	inception/inception-api-annotation/src/main/java/de/tudarmstadt/ukp/clarin/webanno/api/annotation/page/AnnotationPageBase.java
%	inception/inception-brat-editor/src/main/java/de/tudarmstadt/ukp/clarin/webanno/brat/annotation/BratAnnotationEditor.java
%	inception/inception-diam/src/main/java/de/tudarmstadt/ukp/inception/diam/service/DiamWebsocketController.java
%	inception/inception-documents/src/test/java/de/tudarmstadt/ukp/inception/documents/DocumentServiceImplConcurrencyTest.java
%	inception/inception-external-search-solr/pom.xml
%	inception/inception-html-editor/src/main/java/de/tudarmstadt/ukp/inception/htmleditor/docview/HtmlDocumentViewControllerImpl.java
%	inception/inception-html-editor/src/main/resources/META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports
%	inception/inception-preferences/src/main/java/de/tudarmstadt/ukp/inception/preferences/config/PreferencesServiceAutoConfig.java
%	inception/inception-recommendation/src/main/java/de/tudarmstadt/ukp/inception/recommendation/service/LearningRecordServiceImpl.java
%	inception/inception-recommendation/src/main/java/de/tudarmstadt/ukp/inception/recommendation/service/RecommendationServiceImpl.java
%	inception/inception-recommendation/src/main/java/de/tudarmstadt/ukp/inception/recommendation/tasks/TrainingTask.java
%	inception/inception-recommendation/src/test/java/de/tudarmstadt/ukp/inception/recommendation/footer/RecommendationEventWebsocketControllerImplTest.java
%	inception/inception-support/pom.xml
%	inception/inception-ui-core/pom.xml
%	inception/inception-websocket/pom.xml
%	inception/pom.xml