# INCEpTION - Annotations as one sentence and label per line

In this example, we annotate a text for sentiment at the sentence level.

To get you started quickly with converting an annotated text, we have prepared a simple annotated text file (UIMA CAS XMI format) and an annotation schema definition file (UIMA typesystem).

After the conversion example, you will also find instructions on how to set up an INCEpTION project for annotating texts at the sentence level.

## Prepare example annotation

Run the two cells below to create the `sentiment.xmi` and `TypeSystem.xml` files that we will then read using DKPro Cassis and convert to the desired output format.

In [None]:
%%writefile sentiment.xmi
<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmi:version="2.0"
         xmlns:xmi="http://www.omg.org/XMI" 
         xmlns:cas="http:///uima/cas.ecore" 
         xmlns:custom="http:///webanno/custom.ecore">
    <cas:NULL xmi:id="0"/>
    <custom:Sentiment xmi:id="668" sofa="1" begin="0" end="47" polarity="positive"/>
    <custom:Sentiment xmi:id="673" sofa="1" begin="48" end="103" polarity="neutral"/>
    <custom:Sentiment xmi:id="678" sofa="1" begin="104" end="160" polarity="negative"/>
    <custom:Sentiment xmi:id="693" sofa="1" begin="161" end="190" polarity="negative"/>
    <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text" 
      sofaString="Coronet has the best lines of all day cruisers.&#10;Bertram has a deep V hull and runs easily through seas.&#10;Pastel-colored 1980s day cruisers from Florida are ugly.&#10;I dislike old cabin cruisers."/>
    <cas:View sofa="1" members="668 673 678 693"/>
</xmi:XMI>

In [None]:
%%writefile TypeSystem.xml
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <types>
    <typeDescription>
        <name>webanno.custom.Sentiment</name>
        <description/>
        <supertypeName>uima.tcas.Annotation</supertypeName>
        <features>
          <featureDescription>
          <name>polarity</name>
          <description/>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>    
</typeSystemDescription>
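
To make the structure of the XMI file more tangible, here is a minimal sketch that reads it with Python's standard `xml.etree.ElementTree` instead of DKPro Cassis. It only illustrates how the `begin`/`end` offsets of each `Sentiment` annotation point into the `sofaString`; it is not how DKPro Cassis works internally. The sketch embeds the sample XMI from above as a string and assumes that the offsets can be used directly as Python string indices, which holds for this sample text but may not hold for text containing characters outside the Basic Multilingual Plane.

```python
import xml.etree.ElementTree as ET

# The sample XMI from above, embedded so that the sketch is self-contained.
XMI = """<xmi:XMI xmi:version="2.0"
         xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore"
         xmlns:custom="http:///webanno/custom.ecore">
    <cas:NULL xmi:id="0"/>
    <custom:Sentiment xmi:id="668" sofa="1" begin="0" end="47" polarity="positive"/>
    <custom:Sentiment xmi:id="673" sofa="1" begin="48" end="103" polarity="neutral"/>
    <custom:Sentiment xmi:id="678" sofa="1" begin="104" end="160" polarity="negative"/>
    <custom:Sentiment xmi:id="693" sofa="1" begin="161" end="190" polarity="negative"/>
    <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
      sofaString="Coronet has the best lines of all day cruisers.&#10;Bertram has a deep V hull and runs easily through seas.&#10;Pastel-colored 1980s day cruisers from Florida are ugly.&#10;I dislike old cabin cruisers."/>
    <cas:View sofa="1" members="668 673 678 693"/>
</xmi:XMI>
"""

# Namespace prefixes as declared in the XMI file.
NS = {
    "cas": "http:///uima/cas.ecore",
    "custom": "http:///webanno/custom.ecore",
}

root = ET.fromstring(XMI.encode("utf-8"))

# The document text is stored in the sofaString attribute of the Sofa element.
text = root.find("cas:Sofa", NS).get("sofaString")

# Each Sentiment element covers the text between its begin and end offsets.
# Assumption: the offsets are valid Python string indices for this text.
spans = []
for annotation in root.findall("custom:Sentiment", NS):
    begin = int(annotation.get("begin"))
    end = int(annotation.get("end"))
    spans.append((annotation.get("polarity"), text[begin:end]))

for polarity, sentence in spans:
    print(f"{polarity}\t{sentence}")
```

For real exports, loading the files with DKPro Cassis as shown in the next section is the more robust choice, since it also takes the type system into account.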


# Convert the Annotations

Now we convert the exported annotations to the desired format. Each line represents one annotated sentence: the polarity comes first, followed by a tab and the sentence text.

In [None]:
!pip install dkpro-cassis > /dev/null

from cassis import load_cas_from_xmi, load_typesystem

with open('TypeSystem.xml', 'rb') as f:
  typesystem = load_typesystem(f)

with open('sentiment.xmi', 'rb') as f:
  doc = load_cas_from_xmi(f, typesystem=typesystem)

# Since Sentiment is a sentence-level annotation in INCEpTION, we get
# one annotation per sentence. So we can simply iterate over the
# Sentiment annotations and write the polarity and the covered text
# of each one to the output file
with open('sentiment-sentence-per-line.txt', 'w') as f:
  for sentiment in doc.select('webanno.custom.Sentiment'):
    f.write(f"{sentiment.polarity}\t{sentiment.get_covered_text()}\n")

# Now let's just load the file and check whether everything was written
# correctly
with open('sentiment-sentence-per-line.txt', 'r') as f:
  print(f.read())

positive	Coronet has the best lines of all day cruisers.
neutral	Bertram has a deep V hull and runs easily through seas.
negative	Pastel-colored 1980s day cruisers from Florida are ugly.
negative	I dislike old cabin cruisers.
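
If you want to process the converted file further, the following sketch shows one possible way to read it back into a list of `(polarity, sentence)` pairs using only the Python standard library. The function name `read_labeled_sentences` is made up for this illustration. The sketch assumes that the polarity values contain no tab characters and that each sentence fits on one line, as in the output above. To keep it self-contained, it writes a small sample file to a temporary directory first.

```python
import tempfile
from pathlib import Path


def read_labeled_sentences(path):
    """Read a 'polarity<TAB>sentence' file into a list of tuples.

    Hypothetical helper for illustration; not part of DKPro Cassis.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip empty lines, e.g. at the end of the file
            # Split only at the first tab so that tabs inside the
            # sentence text (if any) are preserved.
            polarity, sentence = line.split("\t", 1)
            pairs.append((polarity, sentence))
    return pairs


# Demo with two lines in the same format as the output above.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sentiment-sentence-per-line.txt"
    sample.write_text(
        "positive\tCoronet has the best lines of all day cruisers.\n"
        "negative\tI dislike old cabin cruisers.\n",
        encoding="utf-8",
    )
    for polarity, sentence in read_labeled_sentences(sample):
        print(polarity, "->", sentence)
```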



# Prepare the annotations in INCEpTION

If you are only interested in how to convert your annotation data from INCEpTION's XMI format to another format using DKPro Cassis, then you do not need to perform the steps explained here. You can just skim this section to get an idea of the annotation schema used in this example. A suitable sample annotated text and annotation schema are provided at the beginning of this notebook.

## Import Document

---
Copy/paste the text below into a file called `sentiment.txt` (*Source: https://en.wikipedia.org/wiki/Sentiment_analysis*)
```
Coronet has the best lines of all day cruisers.
Bertram has a deep V hull and runs easily through seas.
Pastel-colored 1980s day cruisers from Florida are ugly.
I dislike old cabin cruisers.
```
---

Go to the **Documents** tab on the project **Settings** page.

* Import the file `sentiment.txt` using the format **Plain text (one sentence per line)**


## Configure Annotation Schema

Go to the **Layers** tab on the project **Settings** page.

* Press the **Create** button in the **Layers** list to create a new layer
* In the **Layer Details** panel
  * Set the **Name** to `Sentiment`
  * Set the **Type** to `Span`
  * Set the **Granularity** to `Sentence-level`
  * Press the **Save** button
* Press the **Create** button in the **Features** list to create a new feature
* In the **Feature Details** panel
  * Set the **Name** to `polarity`
  * Set the **Type** to `Primitive: String`
  * Press the **Save** button

## Create Annotations

Go to the **Annotation** page.

* Open the `sentiment.txt` document.
* Select **Sentiment** from the **Layer** dropdown box on the right.
* **Double-click on a word** in the first line and enter a **polarity** into the text box appearing on the right. Press **Enter** to save the value.
* Repeat for the other lines.

## Export Annotations

In this example, we only export the `sentiment.txt` document with the annotations of the current user.

* Click on the **Export** icon in the action bar above the annotation editor area (symbolized by a page with an arrow pointing downwards).
* Select **UIMA CAS XMI (XML 1.0)** in the **Format** dropdown.
* Press the **Export** button.

This will download a ZIP file containing two files:

* `sentiment.xmi` - the annotated text
* `TypeSystem.xml` - the annotation schema

For your convenience, we provide reduced versions of `sentiment.xmi` and `TypeSystem.xml` that contain only the annotations and types necessary for our little example. The files you get when you export from INCEpTION are considerably larger: they also contain, for example, Token and Sentence annotations, and the annotation schema includes many more annotation types, so they are not easy to read by eye. However, we use DKPro Cassis to load the files, and it works exactly the same regardless of whether you use the files exported from INCEpTION or the reduced versions at the beginning of this notebook.
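
If you work with your own export, you first need to unpack the downloaded ZIP file. The following is a minimal sketch using Python's standard `zipfile` module. The helper name `extract_export` and the archive name `export.zip` are assumptions made for this illustration; the actual name of the downloaded file may differ. To keep the sketch runnable on its own, the demo builds a stand-in archive with placeholder contents instead of a real INCEpTION export.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_export(zip_path, target_dir):
    """Unpack an export ZIP and return (xmi_path, typesystem_path).

    Hypothetical helper for illustration. It assumes the archive contains
    one file ending in '.xmi' and one file named 'TypeSystem.xml'.
    """
    target_dir = Path(target_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        names = archive.namelist()
    xmi_name = next(n for n in names if n.endswith(".xmi"))
    typesystem_name = next(n for n in names if n.endswith("TypeSystem.xml"))
    return target_dir / xmi_name, target_dir / typesystem_name


# Demo with a stand-in archive; a real export would contain the full files.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "export.zip"  # assumed name, adjust as needed
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("sentiment.xmi", "<placeholder/>")
        archive.writestr("TypeSystem.xml", "<placeholder/>")

    xmi_path, typesystem_path = extract_export(zip_path, Path(tmp) / "unpacked")
    print(xmi_path.name, typesystem_path.name)
```

The two returned paths can then be opened and passed to `load_typesystem` and `load_cas_from_xmi` in the same way as the reduced files in the conversion section.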