In [37]:
! pip install git+https://git@github.com/meganno/labeler-client.git

Collecting git+https://****@github.com/meganno/labeler-client.git
  Cloning https://****@github.com/meganno/labeler-client.git to /private/var/folders/cq/3tgr3bys4tldx82sfkl8lyf40000gn/T/pip-req-build-q6_zv0nu
  Running command git clone -q 'https://****@github.com/meganno/labeler-client.git' /private/var/folders/cq/3tgr3bys4tldx82sfkl8lyf40000gn/T/pip-req-build-q6_zv0nu
  Resolved https://****@github.com/meganno/labeler-client.git to commit f3c6ee53ce0bcb39e76f0a73dc24b9e0531d2c28
Collecting labeler-ui@ git+https://github.com/meganno/labeler-ui.git
  Cloning https://github.com/meganno/labeler-ui.git to /private/var/folders/cq/3tgr3bys4tldx82sfkl8lyf40000gn/T/pip-install-hwi3e5mx/labeler-ui_2c1bb06ad0f24c44ab849984ea7f6a48
  Running command git clone -q https://github.com/meganno/labeler-ui.git /private/var/folders/cq/3tgr3bys4tldx82sfkl8lyf40000gn/T/pip-install-hwi3e5mx/labeler-ui_2c1bb06ad0f24c44ab849984ea7f6a48
  Resolved https://github.com/meganno/labeler-ui.git to commit 93ded5810





In [1]:
from labeler_ui import Authentication, Dashboard
from labeler_client import Service
import pandas as pd
import os

labeler-ui: 1.0.7
labeler-client: 1.0.3


# Connecting to the service
We provide a shared project to simulate real-world collaboration among data science practitioners. This project is pre-loaded with a small set of [tweets about US airlines](https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment). The required labeling schema and metadata are also pre-loaded (shown as screenshots in later cells). Please do not change this setup.

To start, connect to our service using the token of one of our demo users. Note that tokens are shared for demonstration purposes only, and sharing a token might cause you to overwrite other users' annotations. In real projects, each user has their own unique token.

**Run the cells below to connect to the demo project**

In [18]:
# getting demo token.
demo_user_id=5 # choose between 1-5
tokens = pd.read_csv('tokens.csv',index_col='id')
demo_token = tokens.loc[demo_user_id,'token'] 

In [19]:
# connecting to service
demo = Service(project='development',token=demo_token,
               host='http://labeler-demo-service-alb-1400746965.us-west-1.elb.amazonaws.com')

# View label schema
Here the schema is pre-set to collect two labels: *sentiment*, which corresponds to a classification task, and *sp*, which corresponds to an extraction task (selecting spans out of a document).

**Run the cell below to check the schema**

In [7]:
# view the pre-set demo schema
demo.get_schemas().value(active=True)

[{'active': True,
  'created_on': 1670461487394,
  'schemas': {'label_schema': [{'level': 'record',
     'name': 'sentiment',
     'options': [{'text': 'positive', 'value': 'pos'},
      {'text': 'negative', 'value': 'neg'},
      {'text': 'neutral', 'value': 'neu'}]},
    {'level': 'span_ch',
     'name': 'sp',
     'options': [{'text': 'positive', 'value': 'pos'},
      {'text': 'negative', 'value': 'neg'}]}]},
  'uuid': '362a4d4e-270e-47a8-b533-203636654cd7'}]
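
To make the structure of this output easier to read, here is a minimal sketch that walks the schema and lists each label with its level and allowed values. The dictionary is copied from the output above (only the fields used here), and `summarize_labels` is a hypothetical helper written for this illustration, not part of `labeler_client`.

```python
# Schema copied from the cell output above (only the fields used here).
schema_record = {
    "active": True,
    "schemas": {
        "label_schema": [
            {
                "level": "record",
                "name": "sentiment",
                "options": [
                    {"text": "positive", "value": "pos"},
                    {"text": "negative", "value": "neg"},
                    {"text": "neutral", "value": "neu"},
                ],
            },
            {
                "level": "span_ch",
                "name": "sp",
                "options": [
                    {"text": "positive", "value": "pos"},
                    {"text": "negative", "value": "neg"},
                ],
            },
        ]
    },
}


def summarize_labels(record):
    """Hypothetical helper: map each label name to its level and allowed values."""
    summary = {}
    for label in record["schemas"]["label_schema"]:
        summary[label["name"]] = {
            "level": label["level"],
            "values": [option["value"] for option in label["options"]],
        }
    return summary


label_summary = summarize_labels(schema_record)
for name, info in label_summary.items():
    print(f"{name} ({info['level']}): {info['values']}")
```

The `record` level applies to the whole tweet, while `span_ch` labels are attached to selected spans within it.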

# Try annotating

Here we show the basic annotation widget.

**Run the cell below to bring up the widget to annotate the first 10 examples**
You can switch between the *single* and *table* views in the top-right corner. Switch back to the table view, check the top checkbox in the second column, and click the **submit** button in the top-left corner to send your annotations to the backend.

You can also click the *Annotating* dropdown button in the top-right corner to switch between annotating and reconciling modes. In reconciling mode, you can review the distribution of existing annotations from all annotators and resolve conflicts.

(Note that some data points might already have annotations if someone else sharing the same token has already annotated them.)
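
As a rough illustration of what resolving conflicts involves, the sketch below applies a simple majority vote to the labels that several annotators gave one data point. This is an assumption for illustration only: the widget's actual reconciliation logic is not shown in this notebook, and `reconcile` and the annotator names are hypothetical.

```python
from collections import Counter


def reconcile(labels_by_annotator):
    """Hypothetical majority vote over one data point's labels.

    Returns (label, has_conflict). The label is None when there are no
    annotations or when the top two labels are tied.
    """
    ranked = Counter(labels_by_annotator.values()).most_common()
    if not ranked:
        return None, False
    has_conflict = len(ranked) > 1
    if has_conflict and ranked[0][1] == ranked[1][1]:
        return None, True  # tie: leave it for a human to resolve
    return ranked[0][0], has_conflict


# Two annotators agree and one disagrees: majority wins, conflict is flagged.
print(reconcile({"annotator_a": "neg", "annotator_b": "neg", "annotator_c": "neu"}))
# An even split cannot be resolved automatically.
print(reconcile({"annotator_a": "pos", "annotator_b": "neg"}))
```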

In [22]:
# search results => subset s1
s1 = demo.search(keyword='', limit=10, start=0)
# bring up a widget 
s1.show({'view':'table'})

show(7f9207936820, self=<labeler_ui.widgets.Annotation.Annotation object at 0x7f92078e5c10>)

LayoutWidget(Layout(show(7f9207936820, self=<labeler_ui.widgets.Annotation.Annotation object at 0x7f92078e5c10…

# Show contextual data
In some cases, contextual information can help or speed up the annotation process. Here we demonstrate with pre-loaded "hashtag" metadata, which was generated using the code shown in this screenshot:
![generating the hashtag meta-data](Figures/hashtag.png)

In real projects, the metadata could be generated using user-defined functions like the one in the screenshot.
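
Since the screenshot is not reproduced in text, here is a minimal sketch of what such a user-defined function might look like. It is an assumption, not the code from the screenshot: `extract_hashtags` is a hypothetical name, the regular expression is one simple way to find hashtags, and the sample tweet is made up.

```python
import re

# A "#" followed by one or more word characters (letters, digits, underscore).
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Hypothetical metadata function: return lowercased hashtags found in text."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(text)]


# Made-up example tweet, not taken from the dataset.
sample_tweet = "@united flight delayed again #Fail #travel"
print(extract_hashtags(sample_tweet))
```

A function like this takes the text of one data point and returns a value that can be stored as metadata under a name such as `hashtag`.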

**Run the cell below to see how hashtags are shown as auxiliary information**

In [20]:
s2 = demo.search(keyword='delay',
                 limit=50,
                 start=0,
                 meta_names=['hashtag'])
s2.show()
# Hover over a row in the table view to see the hashtags for each data point.

show(7f92056f6340, self=<labeler_ui.widgets.Annotation.Annotation object at 0x7f92078e5d00>)

LayoutWidget(Layout(show(7f92056f6340, self=<labeler_ui.widgets.Annotation.Annotation object at 0x7f92078e5d00…

# Exploratory annotation: heuristic-based search

In [26]:
s3 = demo.search(keyword='fail',
                 limit=50,
                 start=0)
s3.show({'view':'table'})

show(7f9207c96160, self=<labeler_ui.widgets.Annotation.Annotation object at 0x7f920795aac0>)

LayoutWidget(Layout(show(7f9207c96160, self=<labeler_ui.widgets.Annotation.Annotation object at 0x7f920795aac0…

# Exploratory annotation: similarity-based suggestions 
In addition to relying on the user's heuristics, meganno also provides automated suggestions based on various kinds of metadata. Here we give an example using Sentence-BERT embeddings. The embeddings were generated ahead of time using the script shown in the screenshot below:

![generating bert embedding](./Figures/bert.png)
Based on these embeddings, meganno suggests the most similar data points from the database.
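
To give an intuition for how embedding-based suggestions can work, the sketch below ranks candidates by cosine similarity to a set of query items. It is only an illustration under stated assumptions: the service's actual similarity measure and implementation are not shown in this notebook, `rank_similar` is a hypothetical function, and the two-dimensional vectors are toy values rather than real Sentence-BERT embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_ids, embeddings, limit):
    """Hypothetical ranking: score each non-query item by its best
    similarity to any query item and return the top `limit` ids."""
    scores = {}
    for candidate_id, candidate_vec in embeddings.items():
        if candidate_id in query_ids:
            continue  # do not suggest items already in the query subset
        scores[candidate_id] = max(
            cosine_similarity(embeddings[query_id], candidate_vec)
            for query_id in query_ids
        )
    return sorted(scores, key=scores.get, reverse=True)[:limit]


# Toy embeddings keyed by made-up ids.
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
print(rank_similar(["a"], toy_embeddings, limit=2))
```

Here `b` points in almost the same direction as the query `a`, so it ranks first, while `d` points the opposite way and ranks last.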

In real projects, users can bring
*Try Below*

In [54]:
s4 = demo.search(keyword='delay', limit=3, start=0)
# try replacing the keyword above to test the similarity search
s5 = s4.suggest_similar('bert-embedding', limit=4)  # requires a valid meta_name
s5.show()

show(7fd706ed9a00, self=<labeler_ui.widgets.Annotation.Annotation object at 0x7fd7162753d0>)

LayoutWidget(Layout(show(7fd706ed9a00, self=<labeler_ui.widgets.Annotation.Annotation object at 0x7fd7162753d0…

# Analysis Dashboard to track project progress.
At any stage of the project, you can run the cell below to track the progress of all annotators, look at class distributions, etc.

In [21]:
#Analysis
from labeler_ui import Dashboard
dash_wg = Dashboard(demo)
dash_wg.show()


show(7f9206098d60, self=<labeler_ui.widgets.Dashboard.Dashboard object at 0x7f9207c66a90>)

LayoutWidget(Layout(show(7f9206098d60, self=<labeler_ui.widgets.Dashboard.Dashboard object at 0x7f9207c66a90>)…

In [63]:
# export all annotations to CSV if you need to take the data out of the service
demo.export()

Unnamed: 0,data_id,content,annotator,label_name,label_value
0,567840103122509760,@USAirways UR service is so shitty. Pilot neve...,demo-1668825955,sentiment,[neg]
1,567840732305858560,@SouthwestAir Being old I will miss my connect...,demo-1668825955,sentiment,[neg]
2,567842015418978304,@united my flight was at 1 pm. Still at the ai...,demo-1668825955,sentiment,[neg]
3,567847571223388160,@SouthwestAir what's up with these delays?! Th...,demo-1668825955,sentiment,[neu]
4,567852704980017216,@united you guys suck. You delay flights but c...,demo-1668825955,sentiment,[neg]
5,567859601900740608,@USAirways flights keep getting delayed and Ca...,demo-1668825955,sentiment,[neg]
6,567860775752208384,"@USAirways car services to and from the hotel,...",demo-1668825955,sentiment,[neg]
7,567862181208805376,@united and that plane was staying overnight a...,demo-1668825955,sentiment,[neg]
8,567864395264573440,@USAirways My daughter is stranded in charlott...,demo-1668825955,sentiment,[neg]
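
Once the annotations are exported, they can be analyzed outside the service. The sketch below tallies label values per label name using only the standard library. It assumes the export has been saved as CSV text with the columns shown above and that `label_value` appears as a bracketed string such as `[neg]`. The rows are copied from the output above with the truncated `content` column omitted, and `count_labels` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Rows copied from the export output above; the content column is omitted.
EXPORT_CSV = """\
data_id,annotator,label_name,label_value
567840103122509760,demo-1668825955,sentiment,[neg]
567840732305858560,demo-1668825955,sentiment,[neg]
567842015418978304,demo-1668825955,sentiment,[neg]
567847571223388160,demo-1668825955,sentiment,[neu]
567852704980017216,demo-1668825955,sentiment,[neg]
567859601900740608,demo-1668825955,sentiment,[neg]
567860775752208384,demo-1668825955,sentiment,[neg]
567862181208805376,demo-1668825955,sentiment,[neg]
567864395264573440,demo-1668825955,sentiment,[neg]
"""


def count_labels(csv_text, label_name):
    """Hypothetical helper: count label values for one label name."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["label_name"] == label_name:
            # Strip list brackets and quotes, e.g. "[neg]" or "['neg']" -> "neg".
            counts[row["label_value"].strip("[]'\" ")] += 1
    return counts


sentiment_counts = count_labels(EXPORT_CSV, "sentiment")
print(sentiment_counts)
```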
