<a href="https://colab.research.google.com/github/michaelachmann/social-media-lab/blob/main/notebooks/2023_12_14_Create_LabelStudio_Text_Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Export to LabelStudio (Text) [![DOI](https://zenodo.org/badge/660157642.svg)](https://zenodo.org/badge/latestdoi/660157642)
![Notes on (Computational) Social Media Research Banner](https://raw.githubusercontent.com/michaelachmann/social-media-lab/main/images/banner.png)

## Overview

This Jupyter notebook is a part of the social-media-lab.net project, which is a work-in-progress textbook on computational social media analysis. The notebook is intended for use in my classes.

The **Export to LabelStudio (Text)** notebook provides an automated workflow that creates LabelStudio annotation projects and imports **text** documents into them.

### Project Information

- Project Website: [social-media-lab.net](https://social-media-lab.net/)
- GitHub Repository: [https://github.com/michaelachmann/social-media-lab](https://github.com/michaelachmann/social-media-lab)

## License Information

This notebook, along with all other notebooks in the project, is licensed under the following terms:

- License: [GNU General Public License version 3.0 (GPL-3.0)](https://www.gnu.org/licenses/gpl-3.0.de.html)
- License File: [LICENSE.md](https://github.com/michaelachmann/social-media-lab/blob/main/LICENSE.md)


## Citation

If you use or reference this notebook in your work, please cite it appropriately. Here is an example of the citation:

```
Michael Achmann. (2023). michaelachmann/social-media-lab: 2023-12-04 (v0.0.6). Zenodo. https://doi.org/10.5281/zenodo.8199901
```

In [1]:
!pip -q install label-studio-sdk

### Let's read the text master from the previous sessions

In [5]:
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/2023-12-01-Export-Posts-Text-Master.csv')

In my [video on GPT text classification](https://youtu.be/QcYGwC4QzW0) I mentioned the unique-identifier problem: each annotation task also needs a unique identifier. The code below combines the post's `shortcode` with its `Text Type`, so that several documents belonging to the same post can still be told apart. Use it in our text classification notebook as well when working with multi-document classifications!

In [7]:
df['identifier'] = df.apply(lambda x: f"{x['shortcode']}-{x['Text Type']}", axis=1)

In [8]:
df.head()

| Unnamed: 0.1 | Unnamed: 0 | shortcode | Text | Text Type | Policy Issues | identifier |
|---|---|---|---|---|---|---|
| 0 | 0 | CyMAe_tufcR | #Landtagswahl23 🤩🧡🙏 #FREIEWÄHLER #Aiwanger #Da... | Caption | `['1. Political parties:\n- FREIEWÄHLER\n- Aiwa...` | CyMAe_tufcR-Caption |
| 1 | 1 | CyL975vouHU | Die Landtagswahl war für uns als Liberale hart... | Caption | `['Landtagswahl']` | CyL975vouHU-Caption |
| 2 | 2 | CyL8GWWJmci | Nach einem starken Wahlkampf ein verdientes Er... | Caption | `['1. Wahlkampf und Wahlergebnis:\n- Wahlkampf\...` | CyL8GWWJmci-Caption |
| 3 | 3 | CyL7wyJtTV5 | So viele Menschen am Odeonsplatz heute mit ein... | Caption | `['Israel', 'Terrorismus', 'Hamas', 'Entwicklun...` | CyL7wyJtTV5-Caption |
| 4 | 4 | CyLxwHuvR4Y | Herzlichen Glückwunsch zu diesem grandiosen Wa... | Caption | `['1. Wahlsieg und Parlamentseinstieg\n- Wahlsi...` | CyLxwHuvR4Y-Caption |
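
Because annotations are matched back to the data via `identifier`, it can help to confirm that the column really is unique before exporting. The following is a minimal sketch on toy data: the three-row frame and the `Transcript` text type are made up for illustration. It also shows a vectorized way to build the same string, which should give the same result as the row-wise `apply` call when both columns contain strings.

```python
import pandas as pd

# Toy data standing in for the text master (values are illustrative only).
toy = pd.DataFrame({
    "shortcode": ["AAA", "AAA", "BBB"],
    "Text Type": ["Caption", "Transcript", "Caption"],
})

# Vectorized string concatenation instead of a row-wise apply.
toy["identifier"] = toy["shortcode"] + "-" + toy["Text Type"]

# Fail early if two rows would share an identifier.
duplicates = toy[toy["identifier"].duplicated(keep=False)]
assert duplicates.empty, f"Duplicate identifiers found:\n{duplicates}"
print(toy["identifier"].tolist())
```

If the assertion fails on real data, the shortcode and text type alone are not enough to distinguish the rows, and another column would have to be added to the identifier.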


In [12]:
#@title ## LabelStudio Setup
#@markdown Please specify the URL and API key for your LabelStudio instance.
import json
from google.colab import userdata

labelstudio_key_name = "label2-key" #@param {type: "string"}
labelstudio_key = userdata.get(labelstudio_key_name)
labelstudio_url = "https://label2.digitalhumanities.io" #@param {type: "string"}

## Export to LabelStudio

### Create LabelStudio Interface
Before creating the LabelStudio project you will need to define your labelling interface. Once the project is set up you will only be able to edit the interface in LabelStudio.

In [9]:
interface = """
<View style="display:flex;">
  <View style="flex:33%">
    <Text name="Text" value="$Text"/>
  </View>
  <View style="flex:66%">
"""

In [None]:
#@title ### Codes
#@markdown Do you want to add codes (classification) to the texts? Please name your coding instance and add options. <br/> **By running this cell multiple times you're able to add multiple variables (not recommended)**

coding_name = "Sentiment" #@param {type:"string"}
coding_values = "Positive,Neutral,Negative" #@param {type:"string"}
coding_choice = "single" #@param ["single", "multiple"]

coding_interface = '<Header value="{}" /><Choices name="{}" choice="{}" toName="Text">'.format(coding_name, coding_name,coding_choice)

for value in coding_values.split(","):
  value = value.strip()
  coding_interface += '<Choice value="{}" />'.format(value)

coding_interface += "</Choices>"

interface += coding_interface

print("Added {}".format(coding_name))
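
The string formatting in the cell above inserts names and values into XML attributes verbatim, so a value containing `"`, `<` or `&` would produce an invalid labelling configuration. Here is a minimal sketch of a helper that escapes the values using only the standard library. The function name `build_choices` is hypothetical; it is not part of the notebook or of the LabelStudio SDK.

```python
from xml.sax.saxutils import quoteattr


def build_choices(name, values, choice="single", to_name="Text"):
    """Return a <Header> plus <Choices> block with XML-escaped attribute values."""
    parts = [
        f"<Header value={quoteattr(name)} />",
        f"<Choices name={quoteattr(name)} choice={quoteattr(choice)} "
        f"toName={quoteattr(to_name)}>",
    ]
    for value in values.split(","):
        value = value.strip()
        if value:  # skip empty entries such as a trailing comma
            parts.append(f"<Choice value={quoteattr(value)} />")
    parts.append("</Choices>")
    return "".join(parts)


snippet = build_choices("Sentiment", "Positive,Neutral,Negative")
print(snippet)
```

In the notebook, the returned string could be appended to `interface` in place of `coding_interface`.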

**Don't forget to run the next cell! It closes the interface XML!**

In [10]:
interface += """
        </View>
    </View>
    """
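
If the closing cell is skipped, the configuration is not valid XML and project creation will likely fail. As a quick sanity check before contacting the server, the assembled string can be parsed with Python's standard XML parser. The sketch below rebuilds an example configuration in the same shape as the notebook's `interface` so that it runs on its own; in the notebook you would pass `interface` instead of `example_interface`.

```python
import xml.etree.ElementTree as ET

# Example configuration in the same shape as the notebook's `interface` string.
example_interface = """
<View style="display:flex;">
  <View style="flex:33%">
    <Text name="Text" value="$Text"/>
  </View>
  <View style="flex:66%">
    <Header value="Sentiment" />
    <Choices name="Sentiment" choice="single" toName="Text">
      <Choice value="Positive" /><Choice value="Neutral" /><Choice value="Negative" />
    </Choices>
  </View>
</View>
"""

# ET.fromstring raises xml.etree.ElementTree.ParseError on malformed XML,
# for example when a closing </View> is missing.
root = ET.fromstring(example_interface.strip())

# Every Choices block should point at an existing Text tag via toName.
object_names = {el.get("name") for el in root.iter("Text")}
choices = list(root.iter("Choices"))
assert all(c.get("toName") in object_names for c in choices)
print(f"Config OK: {len(choices)} Choices block(s)")
```

This only checks well-formedness and the `toName` references; LabelStudio performs its own validation when the project is created.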

In [14]:
#@title ## Create LabelStudio Project
#@markdown In this step we will create a LabelStudio project, configure the interface, and import a sample of the texts as tasks.
from label_studio_sdk import Client
import contextlib
import io

project_name = "vSMA Test 1"  #@param {type: "string"}
text_column = "Text"  #@param {type: "string"}
identifier_column = "identifier"  #@param {type: "string"}
#@markdown Percentage for drawing a sample to annotate, e.g. 30%
sample_percentage = 30  #@param {type: "number", min:0, max:100}
#@markdown Number of project copies. **Start testing with 1!**
num_copies = 1  #@param {type: "number", min:0, max:3}

sample_size = round(len(df) * (sample_percentage / 100))

ls = Client(url=labelstudio_url, api_key=labelstudio_key)


# Select the identifier and text columns, then draw a random sample of tasks
df_tasks = df[[identifier_column, text_column]]
df_tasks = df_tasks.sample(sample_size)
df_tasks = df_tasks.fillna("")

for i in range(num_copies):
  # Build the title from the base name so suffixes do not accumulate across copies
  copy_name = f"{project_name} #{i}"
  # Create the project
  project = ls.start_project(
      title=copy_name,
      label_config=interface,
      sampling="Uniform sampling"
  )

  # Import the sampled tasks; suppress the SDK's console output
  with contextlib.redirect_stdout(io.StringIO()):
    project.import_tasks(df_tasks.to_dict('records'))

  print(f"All done, created project #{i}! Visit {labelstudio_url}/projects/{project.id}/ and get started labelling!")



All done, created project #0! Visit https://label2.digitalhumanities.io/projects/61/ and get started labelling!
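
The cell above passes `import_tasks` a list of dictionaries, one per task, whose keys must match the variables used in the labelling interface (here `$Text`). The sketch below uses a made-up four-row frame to show the shape of that list. Passing `random_state` to `sample` is an optional addition, not part of the notebook, that makes the drawn sample reproducible across runs.

```python
import pandas as pd

# Toy data standing in for the text master (values are illustrative only).
toy = pd.DataFrame({
    "identifier": ["AAA-Caption", "BBB-Caption", "CCC-Caption", "DDD-Caption"],
    "Text": ["first text", None, "third text", "fourth text"],
})

sample_percentage = 50
sample_size = round(len(toy) * (sample_percentage / 100))

tasks = (
    toy[["identifier", "Text"]]
    .sample(sample_size, random_state=42)  # fixed seed: same sample on every run
    .fillna("")                            # replace missing texts with empty strings
    .to_dict("records")                    # one dict per row, keyed by column name
)
print(tasks)
```

Each dictionary carries the `identifier` alongside the text, so exported annotations can later be joined back to the original rows.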
