<a href="https://colab.research.google.com/github/ocean-data-factory-sweden/kso-data-management/blob/main/tutorials/08_Analyse_Aggregate_Zooniverse_Annotations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img align="left" src="https://panoptes-uploads.zooniverse.org/project_avatar/86c23ca7-bbaa-4e84-8d8a-876819551431.png" type="image/png" height=100 width=100>
</img>


<h1 align="right">KSO Tutorials #8: Analyse / Aggregate Zooniverse classifications</h1>
<h3 align="right">Written by the KSO team</h3>

This notebook takes you through the process of: 
* Connecting to a Zooniverse project
* Retrieving the classifications/annotations that citizens have provided on the videos, from your workflow of interest 
* Aggregating the annotations you have received based on an aggregation threshold and a minimum number of users.
* Exploring these aggregated classifications to inspect the effect of your aggregation settings.
* *If you were inspecting videos*, you can reuse the aggregation settings you decide on here in Notebook 4, where you cut the videos into images and upload them to Zooniverse for the second part of the workflow. The rest of this notebook applies only to frames.
* *If you were inspecting frames*, you can continue with this notebook to:
  * Export these frames into YOLO format, which is needed for ML purposes in Notebook 5.
  * Export these frames into GBIF/OBIS format for biodiversity purposes.

If you do not have a project with us yet, you can run the template project to see how the aggregation of annotations works for videos. The template project does not yet support frames, so the exports are not available for it either.

🔴 <span style="color:red">&nbsp;NOTE: To run a project other than the template project, you need a Zooniverse account and must be a member of that project.  </span>

# Set up KSO requirements

### Install all the requirements

In [None]:
from IPython.display import clear_output
import os
import sys

try:
    # Enable external widgets
    from google.colab import output

    output.enable_custom_widget_manager()

    IN_COLAB = True
    print("Running in Colab...")

    # Clone repo
    !git clone --recurse-submodules -b main https://github.com/ocean-data-factory-sweden/kso.git
    %pip install -qr <(sed '/Pillow/d;/ipywidgets/d' kso/yolov5_tracker/requirements.txt) -qr <(sed '/Pillow/d;/ipywidgets/d' kso/yolov5_tracker/yolov5/requirements.txt) -qr <(sed '/Pillow/d;/ipywidgets/d' kso/kso_utils/requirements.txt)

    # Fix libmagic issue
    !apt-get -qq update && apt-get -qq install -y libmagic-dev > /dev/null

    # Navigate to the correct folder
    os.chdir("kso/tutorials")

except ImportError:
    # google.colab can only be imported inside Colab
    IN_COLAB = False


# Ensure widgets are shown properly
!jupyter nbextension enable --user --py widgetsnbextension
!jupyter nbextension enable --user --py jupyter_bbox_widget
!jupyter nbextension enable --user --py ipysheet

clear_output()
if IN_COLAB:
    print("Running in Colab: All packages are installed and ready to go!")
else:
    print("Running locally... you're good to go!")

### Import Python packages

In [None]:
# Set the directory of the libraries
try:
    if "kso_utils" not in sys.modules:
        sys.path.append("..")
        import kso_utils.kso_utils

        sys.modules["kso_utils"] = kso_utils.kso_utils
        print("Using development version...")
        # Enables testing changes in utils
        %load_ext autoreload
        %autoreload 2
except:
    print("Installing latest version from PyPI...")
    %pip install -q kso-utils

# Import required modules
import kso_utils.tutorials_utils as t_utils
import kso_utils.project_utils as p_utils
import kso_utils.widgets as kso_widgets
from kso_utils.project import ProjectProcessor, MLProjectProcessor


print("Packages loaded successfully")

### Choose your project

In [None]:
# @title <font size="5"><i>Choose your project</font> { vertical-output: true }
project_name = kso_widgets.choose_project()

### Initiate project's database

In [None]:
# @title <font size="5"><i>Initiate project's database</font> { vertical-output: true }
# Save the name of the project
project = p_utils.find_project(project_name=project_name.value)

# Initiate pp
pp = ProjectProcessor(project)

### Select the information to retrieve from Zooniverse

In [None]:
# @title <font size="5"><i>Select the information to retrieve from Zooniverse</font> { vertical-output: true }

retrieve_info = kso_widgets.select_retrieve_info()

# Specify Zooniverse workflow of interest

### Select Zooniverse workflow id and version of interest

##### Note: A manual export in Zooniverse is required to get the most up-to-date classifications here

##### Make sure your workflows in Zooniverse have different names to avoid issues while selecting the workflow id

In [None]:
# Display a selectable list of workflow names and a list of versions of the workflow of interest
pp.choose_workflows(retrieve_info.result)

### Retrieve the information from Zooniverse

In [None]:
pp.get_zoo_info()

### Retrieve classifications from the workflow of interest

In [None]:
# Retrieve classifications from the workflow of interest
class_df = pp.get_classifications(
    pp.workflow_widget.checks,
    pp.zoo_info["workflows"],
    pp.workflow_widget.checks["Subject type: #0"],
    pp.zoo_info["classifications"],
)

🔴 <span style="color:red">&nbsp;NOTE: If the output from the cell above says that 0 classifications have been retrieved, select a different workflow or version. The one you selected does not contain any annotations. </span>

# Aggregate classifications received on the workflow of interest

### Specify agreement threshold among citizen scientists

In [None]:
agg_params = kso_widgets.choose_agg_parameters(
    pp.workflow_widget.checks["Subject type: #0"]
)

### Aggregate classifications based on threshold

In [None]:
agg_df, raw_df = pp.process_classifications(
    pp.zoo_info["classifications"],
    pp.workflow_widget.checks["Subject type: #0"],
    agg_params,
    summary=False,
)

🔴 <span style="color:red">&nbsp;NOTE: If the output from the cell above says that 0 classifications are aggregated, try other agreement thresholds or wait for more annotations to be made in Zooniverse.   </span>
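
To get a feel for how the agreement threshold and the minimum number of users interact, the sketch below applies both filters to a small toy table. It is only an illustration under stated assumptions: the function `aggregate_labels`, its parameters and the `user_name` column are hypothetical, and this is not how `pp.process_classifications` is implemented.

```python
import pandas as pd

# Toy raw classifications: one row per (subject, user) vote. All values are made up.
raw = pd.DataFrame(
    {
        "subject_ids": [1, 1, 1, 1, 2, 2, 3, 3, 3],
        "user_name": ["a", "b", "c", "d", "a", "b", "a", "b", "c"],
        "label": ["cod", "cod", "cod", "crab", "cod", "crab", "crab", "crab", "crab"],
    }
)


def aggregate_labels(df, agreement=0.8, min_users=3):
    """Hypothetical helper: keep (subject, label) pairs with enough users and agreement."""
    # Number of distinct users who classified each subject
    n_users = (
        df.groupby("subject_ids")["user_name"].nunique().rename("n_users").reset_index()
    )
    # Number of distinct users who chose each label for each subject
    votes = (
        df.groupby(["subject_ids", "label"])["user_name"]
        .nunique()
        .rename("n_votes")
        .reset_index()
    )
    votes = votes.merge(n_users, on="subject_ids")
    # Share of the subject's users who agreed on this label
    votes["agreement"] = votes["n_votes"] / votes["n_users"]
    keep = (votes["n_users"] >= min_users) & (votes["agreement"] >= agreement)
    return votes[keep].reset_index(drop=True)


strict = aggregate_labels(raw, agreement=0.8, min_users=3)
relaxed = aggregate_labels(raw, agreement=0.7, min_users=3)
print(strict)
print(relaxed)
```

In this toy data, subject 2 is always dropped because only two users classified it, and subject 1 is kept only under the relaxed threshold because three of its four users (75%) agreed on the same label.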

# Explore the aggregated classifications

### Summarise the number of aggregated classifications

In [None]:
agg_df.groupby("label")["subject_ids"].agg("count")
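
Note that `count` tallies rows, not subjects. If a subject can appear on several rows with the same label (for example, several boxes in one frame, which is an assumption about your data), the per-label totals will be higher than the number of distinct subjects. A minimal sketch on a toy table (the values below are made up) shows the difference:

```python
import pandas as pd

# Toy aggregated table: subject 10 has two rows with the same label.
toy_agg_df = pd.DataFrame(
    {
        "subject_ids": [10, 10, 11, 12],
        "label": ["cod", "cod", "cod", "crab"],
    }
)

# Rows per label (same call as in the cell above)
counts = toy_agg_df.groupby("label")["subject_ids"].agg("count")

# Distinct subjects per label
unique_subjects = toy_agg_df.groupby("label")["subject_ids"].nunique()

print(counts)
print(unique_subjects)
```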

### Display all the aggregated classifications in a table

In [None]:
t_utils.launch_table(agg_df, pp.workflow_widget.checks["Subject type: #0"])

### Display a subject and its aggregated classifications

In [None]:
t_utils.launch_viewer(agg_df, pp.workflow_widget.checks["Subject type: #0"])

### Display the individual/non-aggregated classifications of a subject

In [None]:
t_utils.explore_classifications_per_subject(
    raw_df, pp.workflow_widget.checks["Subject type: #0"]
)

🔴 <span style="color:red">&nbsp;NOTE: If you did the aggregation on clips (videos), the rest of this notebook is not relevant for you. You can use the aggregation settings you explored here to cut the videos into images with Notebook 4. Continue with this notebook only if you were working with frames in the previous steps.  </span>

# OPTIONAL Task #1 - Export aggregated classifications in YOLO format (For ML purposes)

## Prepare the labelled frames

In [None]:
# Initialise mlp
mlp = MLProjectProcessor(pp)

### Specify path to store the labelled frames and annotations

In [None]:
# Specify path to store the labelled frames and annotations
output_folder = kso_widgets.choose_folder(".", "output")

### Determine your training parameters

In [None]:
# Determine your training parameters
percentage_test = t_utils.choose_test_prop()
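
As a rough illustration of what a test proportion means, the sketch below shuffles a list of frame names and holds out a fraction of them for testing. The helper `split_frames`, the file names and the assumption that the value is a proportion between 0 and 1 are all hypothetical; the actual split is handled by the preparation script and may work differently.

```python
import random


def split_frames(frame_names, test_prop=0.2, seed=0):
    """Hypothetical helper: return (train, test) lists for a given test proportion."""
    if not 0 <= test_prop <= 1:
        raise ValueError("test_prop must be between 0 and 1")
    # Sort first so the result depends only on the seed, not on the input order
    shuffled = sorted(frame_names)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_prop)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up frame names for illustration
frames = [f"frame_{i:03d}.jpg" for i in range(10)]
train, test = split_frames(frames, test_prop=0.2)
print(len(train), "training frames;", len(test), "test frames")
```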

### Run the preparation script

In [None]:
# Run the preparation script
mlp.prepare_dataset(
    agg_df=agg_df,
    out_path=output_folder.selected,
    img_size=(720, 540),
    perc_test=percentage_test.value,
)
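
For reference, a YOLO label file contains one line per box: a class index followed by the box centre, width and height, all normalised by the image size. The sketch below converts a single pixel box to that layout. The helper `to_yolo_line` is hypothetical, and it assumes that boxes are given as top-left corner plus width and height and that `img_size` is ordered as (width, height); check both against your own data.

```python
def to_yolo_line(class_id, x, y, w, h, img_w, img_h):
    """Hypothetical helper: convert a pixel box (top-left x, y, width, height) to a YOLO label line."""
    # YOLO stores the box centre, not the top-left corner
    x_center = (x + w / 2) / img_w
    y_center = (y + h / 2) / img_h
    # All four values are fractions of the image width or height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {w / img_w:.6f} {h / img_h:.6f}"


# A box covering the central quarter of a 720 x 540 image
line = to_yolo_line(0, x=180, y=135, w=360, h=270, img_w=720, img_h=540)
print(line)
```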


## Preview and adjust aggregated annotations

### Preview and adjust annotations

In [None]:
t_utils.get_annotations_viewer(
    output_folder.selected, species_list=mlp.species_of_interest
)


# OPTIONAL Task #2 - Export observations in GBIF/OBIS format (For biodiversity purposes)

### Format the classifications to Darwin Core Standard occurrences

In [None]:
occurrence_df = pp.format_to_gbif(
    agg_df=agg_df, subject_type=pp.workflow_widget.checks["Subject type: #0"]
)

In [None]:
# Preview occurrence df
occurrence_df

### Save the occurrence df locally

In [None]:
occurrence_df.to_csv("occurrence_for_gbif.csv", index=False)
print("The observations are now saved in occurrence_for_gbif.csv")
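
Before submitting the file, it can help to check that the columns you expect are present and filled in. The sketch below runs such a check on a toy table that uses Darwin Core term names. The helper `missing_terms`, the list of terms and the toy values are assumptions for illustration only; the columns produced by `pp.format_to_gbif` and the terms required by your data publisher may differ.

```python
import pandas as pd

# Assumed subset of Darwin Core terms to check; adjust it to your publisher's requirements.
EXPECTED_TERMS = (
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
)


def missing_terms(df, expected=EXPECTED_TERMS):
    """Hypothetical helper: list expected columns that are absent or contain empty values."""
    return [term for term in expected if term not in df.columns or df[term].isna().any()]


# Toy occurrence table with made-up values
toy_occurrence_df = pd.DataFrame(
    {
        "occurrenceID": ["toy-0001"],
        "basisOfRecord": ["MachineObservation"],
        "scientificName": ["Gadus morhua"],
        "eventDate": ["2021-06-01"],
        "decimalLatitude": [58.25],
        "decimalLongitude": [11.45],
    }
)

print(missing_terms(toy_occurrence_df))
print(missing_terms(toy_occurrence_df.drop(columns=["eventDate"])))
```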

In [None]:
# END