<img align="left" src="https://panoptes-uploads.zooniverse.org/project_avatar/86c23ca7-bbaa-4e84-8d8a-876819551431.png" type="image/png" height=100 width=100>
</img>
<h1 align="right">KSO Notebook #8: Analyse / Aggregate Zooniverse classifications</h1>
<h3 align="right"><a href="https://colab.research.google.com/github/ocean-data-factory-sweden/kso/blob/main/notebooks/08_Analyse_Aggregate_Zooniverse_Annotations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a></h3>
<h3 align="right">Written by the KSO team</h3>

This notebook takes you through the process of:
* Connecting to a Zooniverse project
* Retrieving the annotations provided by citizen scientists from your workflow of interest
* Aggregating the annotations based on aggregation thresholds and a minimum number of users
* Exploring the aggregated classifications to inspect the effect of your aggregation settings
* Downloading the aggregated or unaggregated classifications for further analyses
* *If you were inspecting videos*, using the aggregation settings you decided on in Notebook 4, where you cut the videos into images and upload them to Zooniverse for the second part of the workflow
* *If you were inspecting frames*, exporting the frame annotations in YOLO format, which Notebook 5 needs for ML purposes

If you do not have a project with us yet, you can run the template project to get a taste of how annotation aggregation works for videos. This is not yet possible for frames, so you cannot try the exports with the template project either.

🔴 <span style="color:red">&nbsp;NOTE: To run a project other than the template project, you need a Zooniverse account and must be a member of the corresponding project.  </span>

# Set up KSO requirements

### Install all the requirements

Installing the requirements in Google Colab takes about 4 minutes and may automatically crash or restart the session. If that happens, rerun this cell until you see the "Successful installation" message.

In [1]:
import sys
import os

# Check if notebook is running in colab
IN_COLAB = "google.colab" in sys.modules

if IN_COLAB:
    # Clone kso repo and install requirements
    if not os.path.exists("kso"):
        print("Installing all dependencies...")
        !git clone https://github.com/ocean-data-factory-sweden/kso.git
        !pip install -r /content/kso/requirements_colab.txt

    # Enable external widgets and navigate to the kso tutorial folder
    try:
        from google.colab import output

        output.enable_custom_widget_manager()
        os.chdir("kso/notebooks")
    except ImportError:
        pass

# Prepare the dev settings if needed
try:
    if "kso_utils" not in sys.modules:
        sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), "..")))
        import kso_utils

        print("Using development version...")
        # Enables testing changes in utils
        %load_ext autoreload
        %autoreload 2
except ImportError:
    print("Installing latest version from PyPI...")
    %pip install -q kso-utils

if IN_COLAB:

    def restart_runtime():
        os.kill(os.getpid(), 9)

    # Check if there are any issues with previously imported packages
    try:
        from kso_utils.project import ProjectProcessor
    except Exception as e:
        print(f"Error importing package: {e}")
        print("Restarting runtime to apply package changes...")
        restart_runtime()

# Avoid issues with widgets not displaying properly
!jupyter nbextension enable --user --py widgetsnbextension
!jupyter nbextension enable --user --py jupyter_bbox_widget
!jupyter nbextension enable --user --py ipysheet

# Load the clear output function to keep things clean
from IPython.display import clear_output

clear_output()
print("Successful installation... you're good to go!")

Successful installation... you're good to go!


### Import Python packages

In [2]:
# Import the modules required for notebook #8
import kso_utils.widgets as kso_widgets
import kso_utils.project_utils as p_utils
from kso_utils.project import ProjectProcessor, MLProjectProcessor

print("Packages loaded successfully")

Packages loaded successfully


### Choose your project

In [3]:
project_name = kso_widgets.choose_project()


### Initiate project's database

In [4]:
# Save the name of the project
project = p_utils.find_project(project_name=project_name.value)

# Initiate pp
pp = ProjectProcessor(project)

INFO:root:Koster_Seafloor_Obs loaded succesfully
INFO:root:Running locally, no external connection to server needed.
INFO:root:Running locally so no csv files were downloaded from the server.
INFO:root:Updated species table from the temporary database
INFO:root:Updated sites table from the temporary database
INFO:root:Updated photos table from the temporary database
INFO:root:Updated movies table from the temporary database


### Specify to request (or not) the latest Zooniverse info

In [5]:
latest_zoo_info = kso_widgets.request_latest_zoo_info()


### Connect and retrieve information from the Zooniverse project

In [6]:
pp.connect_zoo_project(latest_zoo_info.result)

Enter your Zooniverse user ········
Enter your Zooniverse password ········


INFO:root:Connected to Zooniverse
INFO:root:Retrieving subjects from Zooniverse
INFO:root:subjects retrieved successfully
INFO:root:Retrieving workflows from Zooniverse
INFO:root:workflows retrieved successfully
INFO:root:Retrieving classifications from Zooniverse
INFO:root:classifications retrieved successfully


# Select Zooniverse workflow id and version of interest

##### Note: Make sure your Zooniverse workflows have unique names to avoid issues when selecting the workflow ID
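If you are unsure whether any of your workflow names are duplicated, a quick check like the one below can help. It is a minimal sketch that assumes you already have the workflows as `(workflow_id, name)` pairs; the ids and names shown are hypothetical, and the function is not part of kso_utils.

```python
from collections import Counter

# Hypothetical (workflow_id, display_name) pairs; replace with your own workflows
workflows = [
    (101, "Species identification"),
    (102, "Frame annotation"),
    (103, "Species identification"),
]


def duplicate_workflow_names(workflows):
    """Return every name shared by more than one workflow, with the ids that use it."""
    counts = Counter(name for _, name in workflows)
    return {
        name: [wid for wid, n in workflows if n == name]
        for name, count in counts.items()
        if count > 1
    }


print(duplicate_workflow_names(workflows))
# {'Species identification': [101, 103]}
```

An empty dictionary means every workflow name is unique.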

### Choose the workflows and versions of interest

In [12]:
pp.choose_zoo_workflows()


### Sample and process Zooniverse classifications from the workflows of interest

In [14]:
pp.process_zoo_classifications()

INFO:root:127 Zooniverse classifications have been retrieved from 125 subjects



### Display individual/non-aggregated classifications of a subject

In [9]:
pp.explore_processed_classifications_per_subject()


# Aggregate classifications received from the workflows of interest

### Select the users whose annotations will be aggregated

In [11]:
users = kso_widgets.choose_aggregation_users(pp.processed_zoo_classifications)


### Specify the aggregation parameters

In [None]:
agg_params = kso_widgets.choose_agg_parameters(
    pp.workflow_widget.checks["Subject type: #0"],
)

### Aggregate classifications based on parameters

In [None]:
pp.aggregate_zoo_classifications(agg_params, users.result)

🔴 <span style="color:red">&nbsp;NOTE: If the output of the cell above reports that 0 classifications were aggregated, experiment with other agreement thresholds or wait for more annotations to be made in Zooniverse.   </span>
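To see why stricter settings can leave nothing to aggregate, here is a minimal majority-vote sketch of the two filters: a subject is kept only if enough users classified it and a large enough share of them agree on the most common label. The names `agreement_threshold` and `min_users` are illustrative. This is not the kso_utils implementation, which may work differently (for example, by also combining bounding boxes for frames).

```python
from collections import Counter


def aggregate_labels(classifications, agreement_threshold=0.8, min_users=3):
    """Illustrative majority-vote aggregation per subject.

    classifications: list of (subject_id, user_name, label) tuples.
    Returns {subject_id: (label, agreement)} for subjects passing both filters.
    """
    # One vote per user and subject (a later label from the same user overwrites an earlier one)
    votes = {}
    for subject_id, user, label in classifications:
        votes.setdefault(subject_id, {})[user] = label

    aggregated = {}
    for subject_id, user_labels in votes.items():
        n_users = len(user_labels)
        if n_users < min_users:
            continue  # too few users have classified this subject
        label, n_votes = Counter(user_labels.values()).most_common(1)[0]
        agreement = n_votes / n_users
        if agreement >= agreement_threshold:
            aggregated[subject_id] = (label, agreement)
    return aggregated


classifications = [
    (1, "ann", "Cod"), (1, "bob", "Cod"), (1, "cat", "Cod"),
    (2, "ann", "Cod"), (2, "bob", "Pollack"), (2, "cat", "Cod"),
    (3, "ann", "Crab"),
]
print(aggregate_labels(classifications, agreement_threshold=0.6, min_users=2))
# {1: ('Cod', 1.0), 2: ('Cod', 0.6666666666666666)}
```

With the default settings (`0.8` and `3`) only subject 1 survives: subject 2 falls below the agreement threshold and subject 3 has a single user.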

# Explore the aggregated classifications

### Summarise the number of aggregated classifications

In [None]:
pp.aggregated_zoo_classifications.groupby("label")["subject_ids"].agg("count")
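The cell above counts the non-null `subject_ids` per `label`, so it reports rows, not distinct subjects: a frame with several aggregated boxes of the same species is counted once per box. The plain-Python sketch below, using hypothetical rows, shows the difference.

```python
from collections import Counter

# Hypothetical aggregated rows: (subject_ids, label); subject 2 has two "Cod" boxes
aggregated_rows = [(1, "Cod"), (2, "Cod"), (2, "Cod"), (3, "Crab")]

# Rows per label, which is what groupby(...)["subject_ids"].agg("count") reports
rows_per_label = Counter(label for _, label in aggregated_rows)

# Distinct subjects per label (deduplicate the (subject, label) pairs first)
subjects_per_label = Counter(label for _, label in set(aggregated_rows))

print(dict(rows_per_label))               # {'Cod': 3, 'Crab': 1}
print(sorted(subjects_per_label.items()))  # [('Cod', 2), ('Crab', 1)]
```

If you want distinct subjects in the notebook itself, pandas' `nunique()` can replace `agg("count")` on the same grouped column.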

### Display all the aggregated classifications in a table

In [None]:
pp.launch_classifications_table()

### Display a subject and its aggregated classifications

In [None]:
pp.launch_classifications_viewer()

🔴 <span style="color:red">&nbsp;NOTE: If you aggregated classifications of clips (videos), the rest of this notebook is not relevant to you. Use the aggregation settings you explored here to cut the videos into images with Notebook 4. Continue with this notebook only if you were working with frames in the previous steps.  </span>

# OPTIONAL #1 - Export aggregated frame classifications in YOLO format (for ML purposes)

## Prepare the labelled frames

In [None]:
# Initialise the ML project processor (mlp) from the project processor
mlp = MLProjectProcessor(pp)

### Specify path to store the labelled frames and annotations

In [None]:
# Specify path to store the labelled frames and annotations
output_folder = kso_widgets.choose_folder(".", "output")

### Determine your training parameters

In [None]:
# Determine your training parameters
percentage_test = kso_widgets.choose_test_prop()
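The test proportion sets what share of the labelled frames is held out to evaluate a model. The sketch below shows one common way such a split works: shuffle with a fixed seed, then cut. It assumes the proportion is a fraction between 0 and 1, and the frame names are hypothetical. `prepare_dataset` may implement `perc_test` differently.

```python
import random


def split_frames(frames, test_fraction, seed=42):
    """Shuffle frames reproducibly and hold out `test_fraction` of them for testing."""
    if not 0 <= test_fraction <= 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


frames = [f"frame_{i:03d}.jpg" for i in range(10)]
train, test = split_frames(frames, test_fraction=0.2)
print(len(train), len(test))  # 8 2
```

Consecutive frames from the same video can look almost identical, so splitting per video rather than per frame is one way to keep near-duplicates out of both sets.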

### Run the preparation script

In [None]:
# Run the preparation script
mlp.prepare_dataset(
    agg_df=pp.aggregated_zoo_classifications,
    out_path=output_folder.selected,
    img_size=(720, 540),
    perc_test=percentage_test.value,
    out_format="yolo",
    track_frames=True,
)
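With `out_format="yolo"`, labels follow the YOLO convention: one text file per image and one line per object, `class_id x_center y_center width height`, with coordinates normalised by the image width and height. The sketch below converts one pixel box to such a line. It assumes the box is given as top-left `x, y, width, height` in pixels and that `img_size=(720, 540)` means width × height; kso_utils may store boxes differently internally.

```python
def to_yolo_line(class_id, x, y, w, h, img_w, img_h):
    """Convert a top-left (x, y, w, h) pixel box into a normalised YOLO label line."""
    x_center = (x + w / 2) / img_w
    y_center = (y + h / 2) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {w / img_w:.6f} {h / img_h:.6f}"


# A 72 x 54 px box with its top-left corner at (324, 243) in a 720 x 540 frame
print(to_yolo_line(0, 324, 243, 72, 54, img_w=720, img_h=540))
# 0 0.500000 0.500000 0.100000 0.100000
```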


## Preview and adjust aggregated annotations

### Preview and adjust annotations

In [None]:
pp.get_annotations_viewer(
    output_folder.selected, annotation_classes=mlp.species_of_interest
)


# OPTIONAL #2 - Download the raw or aggregated classifications as a CSV file for further analysis (e.g. comparisons between citizen scientists and experts)

In [None]:
pp.download_classications_csv(pp.processed_zoo_classifications)
# Uncomment the following line to download the aggregated classifications
# pp.download_classications_csv(pp.aggregated_zoo_classifications)
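One way to use the downloaded CSV is to compare citizen-scientist labels with expert labels for the same subjects. The minimal sketch below uses the standard `csv` module. It assumes both files have `subject_ids` and `label` columns with one label per subject. The expert file and its layout are hypothetical, and in-memory strings stand in for the real files.

```python
import csv
import io


def read_labels(f):
    """Map subject_ids -> label from a CSV file object (later rows overwrite earlier ones)."""
    return {row["subject_ids"]: row["label"] for row in csv.DictReader(f)}


def agreement_rate(citizen, expert):
    """Share of subjects labelled by both sources where the labels match."""
    shared = citizen.keys() & expert.keys()
    if not shared:
        return None  # no overlapping subjects to compare
    return sum(citizen[s] == expert[s] for s in shared) / len(shared)


# Stand-ins for the downloaded CSV and a hypothetical expert CSV
citizen_csv = io.StringIO("subject_ids,label\n1,Cod\n2,Cod\n3,Crab\n")
expert_csv = io.StringIO("subject_ids,label\n1,Cod\n2,Pollack\n3,Crab\n4,Cod\n")

rate = agreement_rate(read_labels(citizen_csv), read_labels(expert_csv))
print(f"{rate:.2f}")  # 0.67
```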


# OPTIONAL #3 - Download aggregated annotations in GBIF/OBIS format (for biodiversity purposes)

In [None]:
pp.download_gbif_occurrences("citizen_scientists", pp.aggregated_zoo_classifications)
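GBIF and OBIS both accept occurrence data described with Darwin Core terms. As an illustration only, the sketch below maps one aggregated classification to a few core occurrence terms (`occurrenceID`, `basisOfRecord`, `scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`, `individualCount`) and writes it as CSV. The input field names and values, the `occurrenceID` pattern, and the choice of `HumanObservation` are assumptions. The columns that `download_gbif_occurrences` actually writes may differ.

```python
import csv
import io

# Hypothetical aggregated classification for one subject (all values illustrative)
aggregated = {
    "subject_ids": 12345,
    "scientific_name": "Gadus morhua",
    "event_date": "2021-06-15",
    "latitude": 58.9,
    "longitude": 11.1,
    "count": 3,
}


def to_darwin_core(row):
    """Map one aggregated row onto a handful of Darwin Core occurrence terms."""
    return {
        "occurrenceID": f"zooniverse-subject-{row['subject_ids']}",  # must be unique per record
        "basisOfRecord": "HumanObservation",  # assumption: labels come from people viewing footage
        "scientificName": row["scientific_name"],
        "eventDate": row["event_date"],  # ISO 8601 date
        "decimalLatitude": row["latitude"],
        "decimalLongitude": row["longitude"],
        "individualCount": row["count"],
    }


record = to_darwin_core(aggregated)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
print(out.getvalue())
```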

In [None]:
# END