Update/doccano 1.5.5 (#14)
* Fixing Data Annotation Issues

When uploading datasets, the code uses `bulk_create` to insert
Examples and Labels, then reads them back from the database by
filtering on creation time. However, [Django doesn't guarantee
the order of results when calling filter](https://stackoverflow.com/questions/7163640/what-is-the-default-order-of-a-list-returned-from-a-django-filter-call)
unless an ordering is specified. The previous behavior could mismatch labels
and examples: in the UI, labels were shown on the wrong examples
(e.g. a label for message #2 would be shown on
message #1). This fix enforces that the data is returned in the order
it was inserted so that each (data, label) pair is as expected.

* move later to copy files in Dockerfile.prod

* fix client-side comment types to match what the backend returns

* add annotation link in commentList page

* Add admin interface for AutoLabelingConfigs. Solves doccano#1423

Thanks to @uklft for the idea.

* Sort imports

* Return a Response with a status if the task is not yet ready.

* Remove unneeded query

Bulk create returns the created objects in the same order as they have been added.

In Postgres, the query was wrong, because ordering was not guaranteed.

* Remove unneeded import

* removing debugging statement

* iss1348: fix colors when importing labels

Signed-off-by: Dimid Duchovny <dimidd@localize.city>

* Updated various dependency and image versions

* Python version pinning fix

* update cloudformation template to modify the sample env file, now that
all the config params are stored in environment variables as per commit
5728636

* show a check button for annotators

* filter by role in the confirm API

* add a property to the ExampleState model

* separate confirm status for each role or user

* fix flake8

* fix TestExampleStateConfirmCollaborative

* fix isort

* move ExampleSerializer tests to test_document.py

* add tests

* Sequence labelling: fix background color in dark mode

* add confirmed count to statistics api response

* receive confirmed count value in frontend statistics models

* make progress data per role

* show progress of each role

* do not display the legend of the bar chart

* Increase the allowed max length for uploaded dataset filepath

* Bump django from 3.2.4 to 3.2.5

Bumps [django](https://github.com/django/django) from 3.2.4 to 3.2.5.
- [Release notes](https://github.com/django/django/releases)
- [Commits](django/django@3.2.4...3.2.5)

---
updated-dependencies:
- dependency-name: django
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Add EntityEditor

* Fix flake8 warnings

* Update Dockerfiles

* Add v-annotator

* Update ner demo

* Update sequence labeling page

* Support RTL in sequence labeling

* Update index.md

* Update package

* Add fields to SequenceLabelingProject

* Update serializer in ProjectDetail

* Enable to handle allowOverlapping and graphemeMode option in sequence labeling page

* Enable to create project with allowOverlapping and graphemeMode option

* Remove unused import

* Update v-annotator to fix the problem

The problem occurred when the user changed the RTL state.
Once the state changed, the entities visually disappeared.

* Show shortcut key on menu

* Add explanation for nested mode

* Add explanation for grapheme mode

* Update shortcut on menu

* Update package version

* Enable to pass grapheme-mode to EntityEditor.vue

* Add explanation for project creation

* Support doccano init on windows

* Fix cli

* Add dependency, fix doccano#1481

* Update cli, fix doccano#1408

* Add explanation on create user, close doccano#1410

* Update faq, close doccano#1496

* Remove old tests

* Update test config

* Update components, fix doccano#1541

* Add test for FormGuideline component

* Update the name of test case

* Apply linter

* Update eslint config

* Update docker-compose.dev.yml, fix doccano#1536

* Change example id from auto field to uuid field

* Update import method of urls

* Add test cases for ingest classification data

* Move test data

* Rename classification.jsonl

* Fix CoNLLDataset

* Add test cases for ingesting sequence labeling data

* Refactor test_tasks.py

* Move test data

* Add test cases for ingesting seq2seq data

* Update test cases for ingesting data to check mapping

* Improve error handling for jsonl parser

* Improve error handling for json parser

* Improve error handling for excel parser

* Add csv test case

* Add conll test case

* Change doc/example id type from number to string

* Update order of examples

* Revert primary key change

* Add migration file

* Update task queue command to support windows

* Create FUNDING.yml

* Update README.md

* Update compose files, fix doccano#1546

* Update CsvWriter, fix doccano#1497

* Sort exported labels, fix doccano#1466

* Add keyboard shortcut back to accept button

* Add how to use PostgreSQL

* Assign label colors automatically

* Add a test case for generating color function

* Fix typo: injest -> ingest

* Add PostgreSQL related env in docker compose mode

* Update README.md

* Add a validator to the text field

* Enable to ingest lines without errors even if an exception occurs during parsing

* Fix TextLineDataset to raise exception

* Enable to delete relation if one of the entities is deleted

* Update Span model

* Add a migration

* Refactor CoNLLDataset

* Enable returning the line number where an exception occurred

* Update Cleaner to change the error message by project type

* Install mdi font

* Set icons locally

* Support offline font

* Remove font awesome script

* Add a demo image to show it in offline environment

* Fix speech to text demo

* Remove unused scripts

* Update publish-image.yml

* Enable to list all labels

* Fix unique constraint

* Add clean up after closing menu

* Update the way of clean up selected items

* Wrap by nexttick

* Update Dockerfile to change the default value of DEBUG, fix doccano#1457

* Update cleanup method

* Update unique constraint of Span

* Handle unique constraint exception

* Add try/catch to update/delete method

* Show number of deleting rows only in confirm dialog, resolve doccano#1077

* Speed up fetching comment

Co-authored-by: zanussbaum <zanussbaum@gmail.com>
Co-authored-by: youichiro <cinnamon416@gmail.com>
Co-authored-by: ayanamizuta <ayanamizuta832@gmail.com>
Co-authored-by: Roland Szabo <rolisz@gmail.com>
Co-authored-by: Dimid Duchovny <dimidd@localize.city>
Co-authored-by: rcarew@xelerance.com <rcarew@xelerance.com>
Co-authored-by: Dale Evans <dale.evans@mycanadapayday.com>
Co-authored-by: Colin Darie <colin@darie.eu>
Co-authored-by: Yosua Michael M <yosua.maranatha@grabtaxi.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Hironsan <light.tree.1.13@gmail.com>
Co-authored-by: Hiroki Nakayama <hiroki.nakayama.py@gmail.com>
Co-authored-by: Talha Oz <oztalha@users.noreply.github.com>
Co-authored-by: Fynn Schmitt-Ulms <fynnsu@outlook.com>
Co-authored-by: Zader Zheng <yumaoshu@gmail.com>
Co-authored-by: Gerhard Haß <gerhard.hass@neofonie.de>
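
The ordering problem described in the first item of the commit message can be illustrated without a database. The following is a minimal sketch, not doccano's code: `FakeTable`, `filter_unordered`, `filter_ordered`, and `pair` are hypothetical names, and the shuffle only simulates a backend that returns rows in no guaranteed order.

```python
import random


class FakeTable:
    """Hypothetical stand-in for a database table (not a Django API)."""

    def __init__(self):
        self._rows = []

    def bulk_create(self, values):
        # Assign increasing ids and return the rows in insertion order.
        start = len(self._rows) + 1
        created = [{'id': start + i, 'value': v} for i, v in enumerate(values)]
        self._rows.extend(created)
        return created

    def filter_unordered(self, rng):
        # Without an explicit ordering the row order is not guaranteed;
        # a shuffle simulates that.
        rows = list(self._rows)
        rng.shuffle(rows)
        return rows

    def filter_ordered(self):
        # Similar in spirit to re-querying with an explicit ordering by id.
        return sorted(self._rows, key=lambda row: row['id'])


def pair(examples, labels):
    """Pair examples with labels by position, as an upload routine might."""
    return [(example['value'], label) for example, label in zip(examples, labels)]


texts = ['message 1', 'message 2', 'message 3']
labels = ['label 1', 'label 2', 'label 3']

table = FakeTable()
created = table.bulk_create(texts)

# Safe: use the objects returned by bulk_create directly.
safe_pairs = pair(created, labels)

# Also safe: re-query with an explicit ordering.
ordered_pairs = pair(table.filter_ordered(), labels)

# Unsafe: an unordered re-query may attach labels to the wrong examples.
risky_pairs = pair(table.filter_unordered(random.Random(0)), labels)
```

Pairing by position is only reliable when both sequences have a defined order, which is why the commit either orders the query or reuses the objects that `bulk_create` returned.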
17 people committed Jan 25, 2022
1 parent cdc31e2 commit b490247
Showing 92 changed files with 1,205 additions and 702 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/publish-image.yml
@@ -5,10 +5,9 @@ on:
- cron: '0 10 * * *' # everyday at 10am
push:
branches:
- '**'
- master
tags:
- 'v*.*.*'
pull_request:

jobs:
docker:
2 changes: 1 addition & 1 deletion Dockerfile
@@ -57,7 +57,7 @@ RUN chown -R doccano:doccano .
VOLUME /data
ENV DATABASE_URL="sqlite:////data/doccano.db"

ENV DEBUG="True"
ENV DEBUG="False"
ENV SECRET_KEY="change-me-in-production"
ENV PORT="8000"
ENV WORKERS="2"
55 changes: 54 additions & 1 deletion README.md
@@ -68,6 +68,49 @@ doccano task

Go to <http://127.0.0.1:8000/>.

By default, sqlite3 is used as the database. If you want to use PostgreSQL, install the additional dependency:

```bash
pip install 'doccano[postgresql]'
```

Create an .env file with variables in the following format, each on a new line:

```bash
POSTGRES_USER=doccano
POSTGRES_PASSWORD=doccano
POSTGRES_DB=doccano
```

Then, pass it to docker run with the --env-file flag:

```bash
docker run --rm -d \
-p 5432:5432 \
-v postgres-data:/var/lib/postgresql/data \
--env-file .env \
postgres:13.3-alpine
```

And set the `DATABASE_URL` environment variable:

```bash
# Please replace each variable.
DATABASE_URL=postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@localhost:5432/${POSTGRES_DB}?sslmode=disable
```

Now run the command as before:

```bash
doccano init
doccano createuser --username admin --password pass
doccano webserver --port 8000

# In another terminal.
# Don't forget to set DATABASE_URL
doccano task
```
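
As a hedged illustration of the `DATABASE_URL` format above, the sketch below assembles the URL from the three `POSTGRES_*` values. `build_database_url` is a hypothetical helper, not part of doccano, and the percent-encoding step is an added assumption (it only matters if a value contains characters such as `@` or `/`, and it presumes the consumer of the URL decodes them).

```python
from urllib.parse import quote


def build_database_url(env, host='localhost', port=5432):
    """Build a postgres:// URL from POSTGRES_* settings (hypothetical helper)."""
    # Percent-encode each component so reserved characters cannot break the URL.
    user = quote(env['POSTGRES_USER'], safe='')
    password = quote(env['POSTGRES_PASSWORD'], safe='')
    database = quote(env['POSTGRES_DB'], safe='')
    return f'postgres://{user}:{password}@{host}:{port}/{database}?sslmode=disable'


# The sample values from the .env file shown above.
example_env = {
    'POSTGRES_USER': 'doccano',
    'POSTGRES_PASSWORD': 'doccano',
    'POSTGRES_DB': 'doccano',
}
url = build_database_url(example_env)
print(url)
```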

### Docker

As a one-time setup, create a Docker container as follows:
@@ -107,12 +150,22 @@ _Note for Windows developers:_ Be sure to configure git to correctly handle line
git clone https://github.com/doccano/doccano.git --config core.autocrlf=input
```

Set the superuser account credentials in the `./config/env.example` file:
Then, create an `.env` file with variables in the following format (see [./config/.env.example](https://github.com/doccano/doccano/blob/master/config/.env.example)):

```plain
# platform settings
ADMIN_USERNAME=admin
ADMIN_PASSWORD=password
ADMIN_EMAIL=admin@example.com
# rabbit mq settings
RABBITMQ_DEFAULT_USER=doccano
RABBITMQ_DEFAULT_PASS=doccano
# database settings
POSTGRES_USER=doccano
POSTGRES_PASSWORD=doccano
POSTGRES_DB=doccano
```

#### Production
19 changes: 19 additions & 0 deletions backend/api/migrations/0018_alter_label_background_color.py
@@ -0,0 +1,19 @@
# Generated by Django 3.2.8 on 2021-11-17 05:56

import api.models
from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
('api', '0017_example_uuid'),
]

operations = [
migrations.AlterField(
model_name='label',
name='background_color',
field=models.CharField(default=api.models.generate_random_hex_color, max_length=7),
),
]
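
The migration above points the field default at a callable, which Django calls each time a new object needs a default, so each label can get its own color. Below is a standalone sketch of the same idea that runs without Django. The `rng` parameter and the `HEX_COLOR` pattern are additions for illustration only and are not in the commit.

```python
import random
import re

# Pattern for a '#rrggbb' color string; 7 characters, matching max_length=7.
HEX_COLOR = re.compile(r'^#[0-9a-f]{6}$')


def generate_random_hex_color(rng=random):
    """Return a random color such as '#1a2b3c' (rng is an illustrative extra)."""
    # 0xFFFFFF is the largest 24-bit value; :06x zero-pads to six hex digits.
    return f'#{rng.randint(0, 0xFFFFFF):06x}'


colors = [generate_random_hex_color() for _ in range(5)]
for color in colors:
    print(color)
```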
30 changes: 30 additions & 0 deletions backend/api/migrations/0019_auto_20211124_0506.py
@@ -0,0 +1,30 @@
# Generated by Django 3.2.8 on 2021-11-24 05:06

from django.db import migrations, models
import django.db.models.expressions


class Migration(migrations.Migration):

dependencies = [
('api', '0018_alter_label_background_color'),
]

operations = [
migrations.AlterUniqueTogether(
name='span',
unique_together=set(),
),
migrations.AddConstraint(
model_name='span',
constraint=models.CheckConstraint(check=models.Q(('start_offset__gte', 0)), name='startOffset >= 0'),
),
migrations.AddConstraint(
model_name='span',
constraint=models.CheckConstraint(check=models.Q(('end_offset__gte', 0)), name='endOffset >= 0'),
),
migrations.AddConstraint(
model_name='span',
constraint=models.CheckConstraint(check=models.Q(('start_offset__lt', django.db.models.expressions.F('end_offset'))), name='start < end'),
),
]
14 changes: 14 additions & 0 deletions backend/api/migrations/0020_merge_20211203_1558.py
@@ -0,0 +1,14 @@
# Generated by Django 3.2.9 on 2021-12-03 15:58

from django.db import migrations


class Migration(migrations.Migration):

dependencies = [
('api', '0018_merge_20211110_1607'),
('api', '0019_auto_20211124_0506'),
]

operations = [
]
45 changes: 34 additions & 11 deletions backend/api/models.py
@@ -1,3 +1,4 @@
import random
import string
import uuid
from typing import Literal
@@ -106,6 +107,10 @@ def is_task_of(self, task: Literal['text', 'image', 'speech']):
return task == 'image'


def generate_random_hex_color():
return f'#{random.randint(0, 0xFFFFFF):06x}'


class Label(models.Model):
text = models.CharField(max_length=100, db_index=True)
prefix_key = models.CharField(
@@ -131,7 +136,7 @@ class Label(models.Model):
on_delete=models.CASCADE,
related_name='labels'
)
background_color = models.CharField(max_length=7, default='#209cee')
background_color = models.CharField(max_length=7, default=generate_random_hex_color)
text_color = models.CharField(max_length=7, default='#ffffff')
created_at = models.DateTimeField(auto_now_add=True, db_index=True)
updated_at = models.DateTimeField(auto_now=True)
@@ -289,18 +294,36 @@ class Span(Annotation):
start_offset = models.IntegerField()
end_offset = models.IntegerField()

def clean(self):
if self.start_offset >= self.end_offset:
raise ValidationError('start_offset > end_offset')
def validate_unique(self, exclude=None):
allow_overlapping = getattr(self.example.project, 'allow_overlapping', False)
is_collaborative = self.example.project.collaborative_annotation
if allow_overlapping:
super().validate_unique(exclude=exclude)
return

overlapping_span = Span.objects.exclude(id=self.id).filter(example=self.example).filter(
models.Q(start_offset__gte=self.start_offset, start_offset__lt=self.end_offset) |
models.Q(end_offset__gt=self.start_offset, end_offset__lte=self.end_offset) |
models.Q(start_offset__lte=self.start_offset, end_offset__gte=self.end_offset)
)
if is_collaborative:
if overlapping_span.exists():
raise ValidationError('This overlapping is not allowed in this project.')
else:
if overlapping_span.filter(user=self.user).exists():
raise ValidationError('This overlapping is not allowed in this project.')

def save(self, force_insert=False, force_update=False, using=None,
update_fields=None):
self.full_clean()
super().save(force_insert, force_update, using, update_fields)

class Meta:
unique_together = (
'example',
'user',
'label',
'start_offset',
'end_offset'
)
constraints = [
models.CheckConstraint(check=models.Q(start_offset__gte=0), name='startOffset >= 0'),
models.CheckConstraint(check=models.Q(end_offset__gte=0), name='endOffset >= 0'),
models.CheckConstraint(check=models.Q(start_offset__lt=models.F('end_offset')), name='start < end')
]


class EntitySpan(Annotation):
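
The three `Q` conditions in `Span.validate_unique` can be read as an interval-overlap test. The sketch below mirrors them in plain Python and compares them with the usual half-open overlap check. The function names are hypothetical, and the comparison assumes every span satisfies `start < end`, which the new check constraints enforce.

```python
def overlaps_as_in_query(new_start, new_end, start, end):
    """Mirror the three Q conditions for an existing span (start, end)."""
    return (
        # Existing span starts inside the new span.
        (new_start <= start < new_end)
        # Existing span ends inside the new span.
        or (new_start < end <= new_end)
        # Existing span covers the new span entirely.
        or (start <= new_start and end >= new_end)
    )


def overlaps_simple(new_start, new_end, start, end):
    """Standard overlap test for half-open intervals."""
    return start < new_end and new_start < end


def valid_spans(limit):
    """All spans with 0 <= start < end <= limit."""
    return [(s, e) for s in range(limit) for e in range(s + 1, limit + 1)]


# Brute-force comparison of the two predicates over small spans.
mismatches = [
    (a, b)
    for a in valid_spans(6)
    for b in valid_spans(6)
    if overlaps_as_in_query(*a, *b) != overlaps_simple(*a, *b)
]
print(mismatches)
```

Under that assumption the two predicates agree on every pair checked here, and adjacent spans such as `(0, 1)` and `(1, 2)` do not count as overlapping.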
16 changes: 12 additions & 4 deletions backend/api/tasks.py
@@ -10,9 +10,9 @@
from .models import Example, Label, Project, EntitySpan
from .views.download.factory import create_repository, create_writer
from .views.download.service import ExportApplicationService
from .views.upload.exception import FileParseException
from .views.upload.factory import (get_data_class, get_dataset_class,
get_label_class)
from .views.upload.exception import FileParseException, FileParseExceptions
from .views.upload.factory import (create_cleaner, get_data_class,
get_dataset_class, get_label_class)
from .views.upload.utils import append_field

logger = get_task_logger(__name__)
@@ -89,7 +89,7 @@ def create(self, examples, user, project):


@shared_task
def injest_data(user_id, project_id, filenames, format: str, **kwargs):
def ingest_data(user_id, project_id, filenames, format: str, **kwargs):
project = get_object_or_404(Project, pk=project_id)
user = get_object_or_404(get_user_model(), pk=user_id)
response = {'error': []}
@@ -110,6 +110,7 @@ def injest_data(user_id, project_id, filenames, format: str, **kwargs):
label_class=Label,
annotation_class=project.get_annotation_class()
)
cleaner = create_cleaner(project)
while True:
try:
example = next(it)
@@ -118,6 +119,13 @@ def injest_data(user_id, project_id, filenames, format: str, **kwargs):
except FileParseException as err:
response['error'].append(err.dict())
continue
except FileParseExceptions as err:
response['error'].extend(list(err))
continue
try:
example.clean(cleaner)
except FileParseException as err:
response['error'].append(err.dict())

buffer.add(example)
if buffer.is_full():
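
The loop in this hunk keeps ingesting after a parse error by catching the exception around `next(it)` and continuing. A minimal standalone sketch of that pattern follows. `ParseError`, `LineParser`, and `ingest` are hypothetical stand-ins, not doccano's `FileParseException`, dataset classes, or task. Note that the iterator is a class rather than a generator, because a generator cannot be resumed after it raises.

```python
class ParseError(Exception):
    """Hypothetical per-line parse error carrying its line number."""

    def __init__(self, line_num, message):
        super().__init__(message)
        self.line_num = line_num
        self.message = message

    def dict(self):
        return {'line': self.line_num, 'message': self.message}


class LineParser:
    """Iterator whose __next__ may raise ParseError and can still be resumed."""

    def __init__(self, lines):
        self._lines = iter(enumerate(lines, start=1))

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the underlying iterator ends the iteration.
        line_num, line = next(self._lines)
        text = line.strip()
        if not text:
            raise ParseError(line_num, 'empty line')
        return text


def ingest(lines):
    """Collect valid examples and record errors without stopping early."""
    it = LineParser(lines)
    examples, errors = [], []
    while True:
        try:
            example = next(it)
        except StopIteration:
            break
        except ParseError as err:
            # Record the error and move on to the next line.
            errors.append(err.dict())
            continue
        examples.append(example)
    return examples, errors


examples, errors = ingest(['a', '', 'b '])
print(examples, errors)
```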
12 changes: 9 additions & 3 deletions backend/api/tests/api/test_annotation.py
@@ -1,7 +1,7 @@
from rest_framework import status
from rest_framework.reverse import reverse

from ...models import DOCUMENT_CLASSIFICATION, Category
from ...models import DOCUMENT_CLASSIFICATION, SEQUENCE_LABELING, Category
from .utils import (CRUDMixin, make_annotation, make_doc, make_label,
make_user, prepare_project)

@@ -79,11 +79,17 @@ def test_denies_unauthenticated_user_to_annotate(self):
class TestAnnotationDetail(CRUDMixin):

def setUp(self):
self.project = prepare_project(task=DOCUMENT_CLASSIFICATION)
self.project = prepare_project(task=SEQUENCE_LABELING)
self.non_member = make_user()
doc = make_doc(self.project.item)
label = make_label(self.project.item)
annotation = make_annotation(task=DOCUMENT_CLASSIFICATION, doc=doc, user=self.project.users[0])
annotation = make_annotation(
task=SEQUENCE_LABELING,
doc=doc,
user=self.project.users[0],
start_offset=0,
end_offset=1
)
self.data = {'label': label.id}
self.url = reverse(viewname='annotation_detail', args=[self.project.item.id, doc.id, annotation.id])

4 changes: 2 additions & 2 deletions backend/api/tests/api/test_comment.py
@@ -50,7 +50,7 @@ def setUp(self):
def test_allows_project_member_to_list_comments(self):
for member in self.project.users:
response = self.assert_fetch(member, status.HTTP_200_OK)
self.assertEqual(len(response.data), 1)
self.assertEqual(response.data['count'], 1)

def test_denies_non_project_member_to_list_comments(self):
self.assert_fetch(self.non_member, status.HTTP_403_FORBIDDEN)
@@ -70,7 +70,7 @@ def test_allows_project_member_to_delete_comments(self):
for member in self.project.users:
self.assert_bulk_delete(member, status.HTTP_204_NO_CONTENT)
response = self.client.get(self.url)
self.assertEqual(len(response.data), 0)
self.assertEqual(response.data['count'], 0)

def test_denies_non_project_member_to_delete_comments(self):
self.assert_fetch(self.non_member, status.HTTP_403_FORBIDDEN)
15 changes: 9 additions & 6 deletions backend/api/tests/api/utils.py
@@ -49,7 +49,8 @@ def make_project(
task: str,
users: List[str],
roles: List[str] = None,
collaborative_annotation=False):
collaborative_annotation=False,
**kwargs):
create_default_roles()

# create users.
@@ -70,7 +71,8 @@ def make_project(
_model=project_model,
project_type=task,
users=users,
collaborative_annotation=collaborative_annotation
collaborative_annotation=collaborative_annotation,
**kwargs
)

# assign roles to the users.
@@ -111,18 +113,18 @@ def make_auto_labeling_config(project):
return mommy.make('AutoLabelingConfig', project=project)


def make_annotation(task, doc, user):
def make_annotation(task, doc, user, **kwargs):
annotation_model = {
DOCUMENT_CLASSIFICATION: 'Category',
SEQUENCE_LABELING: 'Span',
SEQ2SEQ: 'TextLabel',
SPEECH2TEXT: 'TextLabel',
ENTITY_RECOGNITION: 'EntitySpan'
}.get(task)
return mommy.make(annotation_model, example=doc, user=user)
return mommy.make(annotation_model, example=doc, user=user, **kwargs)


def prepare_project(task: str = 'Any', collaborative_annotation=False):
def prepare_project(task: str = 'Any', collaborative_annotation=False, **kwargs):
return make_project(
task=task,
users=['admin', 'approver', 'annotator'],
@@ -131,7 +133,8 @@ def prepare_project(task: str = 'Any', collaborative_annotation=False):
settings.ROLE_ANNOTATION_APPROVER,
settings.ROLE_ANNOTATOR,
],
collaborative_annotation=collaborative_annotation
collaborative_annotation=collaborative_annotation,
**kwargs
)


2 changes: 1 addition & 1 deletion backend/api/tests/data/seq2seq/example.csv
@@ -1,5 +1,5 @@
text,label
,label2
exampleA,label1
exampleB,
,label2
,
@@ -0,0 +1 @@
{"text": "exampleA", "label": [[0, 1, "LOC"], [0, 1, "LOC"]], "meta": {"wikiPageID": 1}}
