# Best practices for working with your data

## Filter for documents that belong to one category

In Project 46, you want to retrieve all documents that belong to the Category with ID 63.

In [17]:
from konfuzio_sdk.data import Project

prj = Project(id_=46)
category = prj.get_category_by_id(63)

category.documents()

[Gehalt.pdf: 44823,
 Festlohn.pdf: 44834,
 vermögenswirksame Leistungen.pdf: 44839,
 betriebliche Altersvorsorge AG finanziert.pdf: 44840,
 Weihnachtsgeld.pdf: 44841,
 Stundenlohn.pdf: 44842,
 Fahrtkostenzuschuss pauschal versteuert.pdf: 44843,
 Betirebliche Altersvorsorge Mischfinanzierung.pdf: 44845,
 Darlehen.pdf: 44846,
 Dienstwagen mit Gehaltsverzicht.pdf: 44847,
 Auswertungspaket - unterschiedliche B_N-Auswertungen.pdf_1.pdf: 44848,
 Auswertungspaket - unterschiedliche B_N-Auswertungen.pdf_2.pdf: 44850,
 Auswertungspaket - unterschiedliche B_N-Auswertungen.pdf_4.pdf: 44851,
 Auswertungspaket - unterschiedliche B_N-Auswertungen.pdf_3.pdf: 44852,
 Auswertungspaket - unterschiedliche B_N-Auswertungen.pdf_5.pdf: 44853,
 Auswertungspaket - unterschiedliche B_N-Auswertungen.pdf_6.pdf: 44854,
 Auswertungspaket - unterschiedliche B_N-Auswertungen.pdf_7.pdf: 44855,
 Auswertungspaket - unterschiedliche B_N-Auswertungen.pdf_8.pdf: 44856,
 Auswertungspaket - unterschiedliche B_N-Auswertunge

## Edit an Annotation that is online

In [18]:
doc = prj.get_document_by_id(44864)
annotations = doc.annotations(start_offset=10, end_offset=200)
annotations

[Austellungsdatum (159, 169)]

Let's look at the first Annotation:

In [19]:
first_annotation = annotations[0]
first_annotation.__dict__

{'id_local': 6789,
 'is_correct': True,
 'revised': True,
 'normalized': '2017-12-21',
 'translated_string': None,
 'document': 2022-02-13 16:19:30.684745: 44864,
 '_spans': [Span (159, 169)],
 'id_': 672245,
 'confidence': 1.0,
 'label': Austellungsdatum,
 'label_set': Lohnabrechnung (63),
 'annotation_set': AnnotationSet(78791) of Lohnabrechnung (63) in 2022-02-13 16:19:30.684745: 44864.,
 'selection_bbox': {'bottom': 44.129,
  'page_index': 0,
  'top': 35.129,
  'x0': 468.48,
  'x1': 526.8,
  'y0': 797.551,
  'y1': 806.551},
 'page_number': None,
 'top': 35.129,
 'x0': 468.48,
 'x1': 526.8,
 'y0': 797.551,
 'y1': 806.551,
 'bottom': 44.129,
 'bboxes': [{'bottom': 44.129,
   'end_offset': 169,
   'line_number': 2,
   'offset_string': '21.12.2017',
   'offset_string_original': '21.12.2017',
   'page_index': 0,
   'start_offset': 159,
   'top': 35.129,
   'x0': 468.48,
   'x1': 526.8,
   'y0': 797.551,
   'y1': 806.551}],
 '_tokens': [],
 '_regex': None}

We want to change the revised status to False.

In [20]:
first_annotation.revised = False

The change now exists only locally, not online. Call `save()` to store it online.

In [21]:
first_annotation.save()

False

In [22]:
first_annotation.__dict__

{'id_local': 6789,
 'is_correct': True,
 'revised': False,
 'normalized': '2017-12-21',
 'translated_string': None,
 'document': 2022-02-13 16:19:30.684745: 44864,
 '_spans': [Span (159, 169)],
 'id_': 672245,
 'confidence': 1.0,
 'label': Austellungsdatum,
 'label_set': Lohnabrechnung (63),
 'annotation_set': AnnotationSet(78791) of Lohnabrechnung (63) in 2022-02-13 16:19:30.684745: 44864.,
 'selection_bbox': {'bottom': 44.129,
  'page_index': 0,
  'top': 35.129,
  'x0': 468.48,
  'x1': 526.8,
  'y0': 797.551,
  'y1': 806.551},
 'page_number': None,
 'top': 35.129,
 'x0': 468.48,
 'x1': 526.8,
 'y0': 797.551,
 'y1': 806.551,
 'bottom': 44.129,
 'bboxes': [{'bottom': 44.129,
   'end_offset': 169,
   'line_number': 2,
   'offset_string': '21.12.2017',
   'offset_string_original': '21.12.2017',
   'page_index': 0,
   'start_offset': 159,
   'top': 35.129,
   'x0': 468.48,
   'x1': 526.8,
   'y0': 797.551,
   'y1': 806.551}],
 '_tokens': [],
 '_regex': None}
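
The two dumps above differ only in the `revised` field. As a minimal sketch in plain Python (no SDK calls), a small helper can list the fields that changed between two such dictionaries. The name `changed_fields` and the trimmed example dictionaries are hypothetical and only illustrate the comparison.

```python
_MISSING = object()  # marks a key that exists in only one of the two dicts


def changed_fields(before: dict, after: dict) -> dict:
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        old = before.get(key, _MISSING)
        new = after.get(key, _MISSING)
        if old != new:
            changes[key] = (old, new)
    return changes


# Trimmed stand-ins for the two `__dict__` dumps shown above.
before = {'id_': 672245, 'is_correct': True, 'revised': True, 'normalized': '2017-12-21'}
after = {'id_': 672245, 'is_correct': True, 'revised': False, 'normalized': '2017-12-21'}

print(changed_fields(before, after))  # {'revised': (True, False)}
```

Because `__dict__` is a live view of the object, take a snapshot with `dict(first_annotation.__dict__)` before changing an attribute if you want to compare the two states afterwards.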

## More Details of Annotations

Annotations keep information about the text in a Document.

Todo: the endpoint test_get_project_labels no longer includes the document annotation_sets, because the relation between
a label and an annotation_set can be configured by a user while labeling. We might need to model the relation of many
Annotations to one AnnotationSet in a more explicit way.

Example document: "I earn 15 Euro per hour."

Assume the word "15" should be labeled. The project contains the labels "Amount" and "Tax".

# CREATE

Annotations can be created by:

- Human: a person using the web interface
- Import: a human user imports extractions and uses the "Copy extractions to annotations" admin action
- Training: using the konfuzio package, you create an annotation online via a Bot user
- Text FB: Text Feedback. An external API user sends a new extraction without an ID, which contains only the offset string
- Extraction: internal process after we receive a new document from an external API user
- Extraction FB: External Feedback. An external API user sends feedback on an existing extraction, including its ID

The columns in the tables below mean:

- ID: relates to the Annotation instance created in the database
- is_revised: a human revisor has looked at this annotation
- correct: a human claims that this annotation should be extracted in future documents

The KONFUZIO package will use annotations that are revised or correct (inclusive OR, not XOR).

| ID | Creator       | is_revised  | correct       | User      | Label   | Action  | Note                          |
|:---|:--------------|:------------|:--------------|:----------|:--------|:--------|:------------------------------|
| 1  | Human         | False       | True          | Human     | Amount  | ALLOWED |                               |
| 2  | Import        | False       | False         | None      | Amount  | ALLOWED | Extraction.created_by_import  |
| 3  | Training      | False       | False         | Bot       | Amount  | ALLOWED |                               |
| 4  | Extraction    | False       | False         | External  | Amount  | ALLOWED | one annotation per extraction |
| X  | Text FB       | -----       | -----         | ---       | Amount  | see 2   | only create extraction        |
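
The creation rules in the table can be sketched in plain Python. This is an illustration only: `AnnotationRecord`, `CREATION_DEFAULTS`, `create`, and `usable_for_training` are hypothetical names, not part of the konfuzio package. The initial values are copied from the table, and the "revised or correct" rule is the inclusive OR stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotationRecord:
    """Hypothetical stand-in for a stored annotation; not an SDK class."""
    id_: int
    creator: str
    is_revised: bool
    correct: bool
    user: Optional[str]
    label: str


# Initial field values per creator, copied from the table above.
# "Text FB" is left out because it only creates an extraction.
CREATION_DEFAULTS = {
    'Human': {'is_revised': False, 'correct': True, 'user': 'Human'},
    'Import': {'is_revised': False, 'correct': False, 'user': None},
    'Training': {'is_revised': False, 'correct': False, 'user': 'Bot'},
    'Extraction': {'is_revised': False, 'correct': False, 'user': 'External'},
}


def create(id_: int, creator: str, label: str) -> AnnotationRecord:
    """Build a record with the initial state the table lists for this creator."""
    return AnnotationRecord(id_=id_, creator=creator, label=label, **CREATION_DEFAULTS[creator])


def usable_for_training(annotation: AnnotationRecord) -> bool:
    """Inclusive OR: revised, correct, or both."""
    return annotation.is_revised or annotation.correct


records = [create(i, creator, 'Amount') for i, creator in enumerate(CREATION_DEFAULTS, start=1)]
usable_ids = [r.id_ for r in records if usable_for_training(r)]
print(usable_ids)  # [1]
```

Under these assumptions, only the annotation created by a human (ID 1) qualifies right after creation. The others start with both flags set to False and qualify only after they have been revised.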

# REVISE

Once created, Annotations can be revised by:

- Human: a person using the web interface
- Revise Feedback: ?

## Positive Feedback will change

| ID | Revisor       | is_revised  | correct       | User      | Label   | Action  | Note                          |
|:---|:--------------|:------------|:--------------|:----------|:--------|:--------|:------------------------------|
| 1  | Human         | NA          | NA            | NA        | Amount  | HIDDEN  |                               |
| 2  | Human         | True        | True          | Human     | Amount  | ALLOWED |                               |
| 3  | Human         | True        | True          | Bot       | Amount  | ALLOWED | ? does PUT update User        |
| 4  | Human         | NA          | NA            | External  | Amount  | HIDDEN  |                               |
| 1  | Extraction FB | True        | True          | Human     | Amount  | ALLOWED |                               |
| 2  | Extraction FB | ----        | ----          | ----      | ----    | ----    | External user does not get ID |
| 3  | Extraction FB | ----        | ----          | ----      | ----    | ----    | External user does not get ID |
| 4  | Extraction FB | True        | True          | Bot       | Amount  | ALLOWED |                               |

Because positive feedback keeps the annotation displayed in the interface and stores it as a correct example, the
word "15" should NOT be labeled anew. This time the creator might choose between the labels "Amount" and "Tax".

| ID | Creator       | is_revised  | correct       | User      | Label   | Action  | Note                          |
|:---|:--------------|:------------|:--------------|:----------|:--------|:--------|:------------------------------|
| 5  | Human         | False       | True          | Human     | Amount  | DENIED  |                               |
| 6  | Import        | True        | False         | None      | Amount  | DENIED  |                               |
| 7  | Training      | False       | False         | Bot       | Amount  | DENIED  |                               |
| 8  | Extraction FB | ?           | ?             | ?         | Amount  | DENIED  |                               |
| 9  | Human         | False       | True          | Human     | Tax     | DENIED  |                               |
| 10 | Import        | ----        | ----          | ----      | Tax     | DENIED  | External user does not get ID |
| 11 | Training      | ----        | ----          | ----      | Tax     | DENIED  | External user does not get ID |
| 12 | Extraction FB | ?           | ?             | ?         | Tax     | DENIED  |                               |

## Negative Feedback will change

- The user clicks the delete button next to the annotation in the web interface.
- Incorrect or deleted annotations will no longer be displayed in the web interface.

| ID | Revisor       | is_revised  | correct       | User      | Label   | Action  | Note                          |
|:---|:--------------|:------------|:--------------|:----------|:--------|:--------|:------------------------------|
| 1  | Human         | DELETED     | DELETED       | DELETED   | Amount  | ALLOWED | delete revised=F, correct=T   |
| 2  | Human         | True        | False         | None      | Amount  | ALLOWED | Update three fields           |
| 3  | Human         | True        | False         | Bot       | Amount  | ALLOWED | Does update is_revised field  |
| 4  | Human         | ?           | ?             | ?         | Amount  | ALLOWED |                               |
| 1  | Extraction FB | True        | False         | ?         | Amount  | ALLOWED |                               |
| 2  | Extraction FB | ----        | ----          | ----      | Amount  | ALLOWED | External user does not get ID |
| 3  | Extraction FB | ----        | ----          | ----      | Amount  | ALLOWED | External user does not get ID |
| 4  | Extraction FB | True        | False         | External  | Amount  | ALLOWED |                               |

Because negative feedback removes the annotation from the web interface but stores it as an incorrect example, the
word "15" can be labeled anew. This time the creator might choose between the labels "Amount" and "Tax".

| ID | Creator       | is_revised  | correct       | User      | Label   | Action  | Note                          |
|:---|:--------------|:------------|:--------------|:----------|:--------|:--------|:------------------------------|
| 5  | Human         | False       | True          | Human     | Amount  | ?DENIED | in contrast to annotation 1   |
| 6  | Import        | ---         | ---           | ---       | ---     | DENIED  |                               |
| 7  | Training      | ---         | ---           | ---       | ---     | DENIED  |                               |
| 8  | Extraction FB | ?           | ?             | ?         | Amount  | NA      | Need to send new document     |
| 9  | Human         | False       | True          | Human     | Tax     | ALLOWED | now we have 2 annotations     |
| 10 | Import        | False       | False         | None      | Tax     | ALLOWED |                               |
| 11 | Training      | False       | False         | Bot       | Tax     | ALLOWED |                               |
| 12 | Extraction FB | ?           | ?             | ?         | Tax     | NA      | Need to send new document     |
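
The re-labeling rules for the word "15" after feedback can be summarized in a rough sketch. The names `Feedback` and `may_label_again` are hypothetical and not part of the konfuzio package. The same-label case after negative feedback is marked "?DENIED" in the table, so treating it as denied here is an assumption, and the "NA" rows for Extraction FB are not modeled.

```python
from enum import Enum


class Feedback(Enum):
    """Hypothetical marker for the kind of feedback an annotation received."""
    POSITIVE = 'positive'
    NEGATIVE = 'negative'


def may_label_again(feedback: Feedback, existing_label: str, new_label: str) -> bool:
    """Decide whether the same text may receive a new annotation after feedback."""
    if feedback is Feedback.POSITIVE:
        # The confirmed annotation stays in place, so every new label is denied.
        return False
    # Negative feedback: a different label is allowed.
    # Assumption: the same label stays denied (the table marks it "?DENIED").
    return new_label != existing_label


print(may_label_again(Feedback.POSITIVE, 'Amount', 'Tax'))     # False
print(may_label_again(Feedback.NEGATIVE, 'Amount', 'Tax'))     # True
print(may_label_again(Feedback.NEGATIVE, 'Amount', 'Amount'))  # False
```

If the open question behind "?DENIED" is resolved in favor of allowing the same label, only the last line of the function needs to change.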