<a href="https://colab.research.google.com/github/ocean-data-factory-sweden/kso-data-management/blob/main/tutorials/08_Analyse_Aggregate_Zooniverse_Annotations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img align="left" src="https://panoptes-uploads.zooniverse.org/project_avatar/86c23ca7-bbaa-4e84-8d8a-876819551431.png" type="image/png" height=100 width=100>
</img>


<h1 align="right">KSO Tutorials #8: Analyse / Aggregate Zooniverse classifications</h1>
<h3 align="right">Written by the KSO team</h3>

This notebook takes you through the process of: 
* Connecting to a Zooniverse project
* Retrieving the classifications/annotations that citizens have provided on the videos, from your workflow of interest 
* Aggregating the annotations you have received based on an aggregation threshold and a minimum number of users.
* Exploring these aggregated classifications to inspect the effect of your aggregation settings.
* *If you were inspecting videos*, you can reuse the aggregation settings you settle on here in Notebook 4, where you cut the videos into frames and upload them to Zooniverse for the second part of the workflow. The rest of this notebook applies only to frames.
* *If you were inspecting frames*, you can continue with this notebook to:
  * Export these frames into YOLO format, which is needed for ML purposes in Notebook 5.
  * Export these frames into GBIF/OBIS format for biodiversity purposes.

If you do not have a project with us yet, you can run the template project to get a taste of how the aggregation of annotations works for videos. The template project does not yet support frames, so the export steps are not available for it.

🔴 <span style="color:red">&nbsp;NOTE: To run a project other than the template project, you need a Zooniverse account and must be a member of the corresponding project.  </span>

# Set up KSO requirements

In [None]:
# @title <font size="5">↓ ឵឵<i>Install kso_data_management and its requirements</font> { vertical-output: true }

from IPython.display import clear_output

try:
    import google.colab
    import os

    IN_COLAB = True
    print("Running in Colab...")

    # Clone kso-data-management repo
    !git clone --quiet --recurse-submodules -b main https://github.com/ocean-data-factory-sweden/kso-data-management.git
    %pip install -q --upgrade pip
    %pip install -r <(sed '/boto3/d;/ipywidgets/d' kso-data-management/requirements.txt)
    %pip install -r <(sed '/boto3/d;/ipywidgets/d' kso-data-management/kso_utils/requirements.txt)

    # Fix libmagic issue
    !apt-get -qq update && apt-get -qq install -y libmagic-dev > /dev/null

    # Enable external widgets
    from google.colab import output

    output.enable_custom_widget_manager()

    os.chdir("kso-data-management/tutorials")
    try:
        clear_output()
        print("All packages are installed and ready to go!")
    except:
        clear_output()
        print("There have been some issues installing the packages!")
except:
    IN_COLAB = False

    # Install requirements
    %pip install -q --no-warn-script-location --upgrade pip
    %pip install -qr ../requirements.txt
    %pip install -qr ../kso_utils/requirements.txt

    !jupyter nbextension install --user --py widgetsnbextension
    !jupyter nbextension enable --user --py widgetsnbextension
    !jupyter nbextension install --user --py jupyter_bbox_widget
    !jupyter nbextension enable --user --py jupyter_bbox_widget
    !jupyter nbextension enable --user --py ipysheet

    clear_output()
    print("Running locally... you're good to go!")

In [None]:
# @title <font size="5">↓ ឵឵<i>Import Python packages</font> { vertical-output: true }

# Set the directory of the libraries
import sys

try:
    if "kso_utils" not in sys.modules:
        sys.path.append("..")
        import kso_utils.kso_utils

        sys.modules["kso_utils"] = kso_utils.kso_utils
        print("Using development version...")
        # Enables testing changes in utils
        %load_ext autoreload
        %autoreload 2
except:
    print("Installing latest version from PyPI...")
    %pip install -q kso-utils

# Import required modules
import kso_utils.tutorials_utils as t_utils
import kso_utils.project_utils as p_utils
import kso_utils.t5_utils as t5
import kso_utils.t8_utils as t8
from kso_utils.project import ProjectProcessor, MLProjectProcessor


print("Packages loaded successfully")

In [None]:
# @title <font size="5"><i>Choose your project</font> { vertical-output: true }
project_name = t_utils.choose_project()

In [None]:
# @title <font size="5"><i>Initiate project's database</font> { vertical-output: true }
# Save the name of the project
project = p_utils.find_project(project_name=project_name.value)

# Initiate pp
pp = ProjectProcessor(project)

In [None]:
# @title <font size="5"><i>Select the information to retrieve from Zooniverse</font> { vertical-output: true }

retrieve_info = t_utils.select_retrieve_info()

In [None]:
# @title <font size="5"><i>Retrieve the information from Zooniverse</font> { vertical-output: true }
pp.get_zoo_info(retrieve_info.result)

# Specify Zooniverse workflow of interest

In [None]:
# @title <font size="5"><i>Select Zooniverse workflow id and version of interest</font> { vertical-output: true }

# Note: A manual export in Zooniverse is required to get the most up-to-date classifications here

# Make sure your workflows in Zooniverse have different names to avoid issues while selecting the workflow id

# Display a selectable list of workflow names and a list of versions of the workflow of interest
workflows_df = pp.zoo_info["workflows"]
wm = t8.WidgetMaker(workflows_df)
wm

In [None]:
# @title <font size="5"><i>Retrieve classifications from the workflow of interest</font> { vertical-output: true }

# Retrieve classifications from the workflow of interest
class_df = t8.get_classifications(
    wm.checks,
    workflows_df,
    wm.checks["Subject type: #0"],
    pp.zoo_info["classifications"],
    pp.db_info["db_path"],
    project,
)

🔴 <span style="color:red">&nbsp;NOTE: If the output from the cell above says that 0 classifications have been retrieved, select a different workflow or version. The one you selected does not contain any annotations. </span>
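
To see why a selection can come back empty, it can help to count the rows per workflow yourself. The sketch below is only an illustration and is not how `t8.get_classifications` is implemented. It uses a made-up table with the `workflow_id` and `workflow_version` columns found in Zooniverse classification exports. The helper `count_classifications` and the "at or above a minimum version" rule are assumptions made for this example.

```python
import pandas as pd

# Toy stand-in for a Zooniverse classifications export (all values are made up)
classifications = pd.DataFrame(
    {
        "classification_id": [1, 2, 3, 4, 5],
        "workflow_id": [101, 101, 102, 101, 102],
        "workflow_version": [12.3, 12.3, 4.1, 13.0, 4.1],
        "subject_ids": [9001, 9002, 9003, 9001, 9004],
    }
)


def count_classifications(df, workflow_id, min_version=0.0):
    """Hypothetical helper: count rows of one workflow at or above a version."""
    mask = (df["workflow_id"] == workflow_id) & (
        df["workflow_version"] >= min_version
    )
    return int(mask.sum())


print(count_classifications(classifications, 101))        # 3
print(count_classifications(classifications, 101, 13.0))  # 1
print(count_classifications(classifications, 999))        # 0
```

A count of 0, as for the unknown workflow id `999` above, is the situation the note describes.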

# Aggregate classifications received on the workflow of interest

In [None]:
# @title <font size="5"><i>Specify agreement threshold among citizen scientists</font> { vertical-output: true }
agg_params = t8.choose_agg_parameters(wm.checks["Subject type: #0"])

In [None]:
# @title <font size="5"><i>Aggregate classifications based on threshold</font> { vertical-output: true }
agg_df, raw_df = pp.process_classifications(
    pp.zoo_info["classifications"],
    wm.checks["Subject type: #0"],
    agg_params,
    summary=False,
)

🔴 <span style="color:red">&nbsp;NOTE: If the output from the cell above says that 0 classifications were aggregated, try a different agreement threshold or wait for more annotations to be made in Zooniverse.   </span>
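
The interaction between the agreement threshold and the minimum number of users can be illustrated with a small pandas sketch. This is not the logic inside `pp.process_classifications`. It is a simplified label-voting example in which the column names, the helper `aggregate_labels` and its default values are all assumptions. A label is kept for a subject only if enough distinct users saw the subject and a large enough share of them chose that label.

```python
import pandas as pd

# Made-up raw classifications: one row per user and subject
raw = pd.DataFrame(
    {
        "subject_ids": [1, 1, 1, 1, 2, 2, 2, 3],
        "user_name": ["a", "b", "c", "d", "a", "b", "c", "a"],
        "label": ["cod", "cod", "cod", "crab", "cod", "crab", "empty", "cod"],
    }
)


def aggregate_labels(df, min_users=3, agg_threshold=0.6):
    """Hypothetical helper: keep labels that pass both aggregation settings."""
    # Number of distinct users that classified each subject
    n_users = df.groupby("subject_ids")["user_name"].nunique().rename("n_users")
    # Number of distinct users that chose each label for each subject
    votes = (
        df.groupby(["subject_ids", "label"])["user_name"]
        .nunique()
        .rename("n_votes")
        .reset_index()
        .merge(n_users.reset_index(), on="subject_ids")
    )
    votes["agreement"] = votes["n_votes"] / votes["n_users"]
    keep = (votes["n_users"] >= min_users) & (votes["agreement"] >= agg_threshold)
    return votes[keep].reset_index(drop=True)


print(aggregate_labels(raw))
```

With these toy values only subject 1 survives, labelled `cod` by 3 of its 4 users. Subject 2 has enough users but no label reaches the threshold, and subject 3 has too few users. Lowering `agg_threshold` to 0.3 would also keep all three labels of subject 2, which is why it is worth inspecting the effect of the settings before exporting.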

# Explore the aggregated classifications

In [None]:
# @title <font size="5"><i>Summarise the number of aggregated classifications</font> { vertical-output: true }

agg_df.groupby("label")["subject_ids"].agg("count")

In [None]:
# @title <font size="5"><i>Display all the aggregated classifications in a table</font> { vertical-output: true }

t8.launch_table(agg_df, wm.checks["Subject type: #0"])

In [None]:
# @title <font size="5"><i>Display a subject and its aggregated classifications</font> { vertical-output: true }

t8.launch_viewer(agg_df, wm.checks["Subject type: #0"])

In [None]:
# @title <font size="5"><i>Display the individual/non-aggregated classifications of a subject</font> { vertical-output: true }

t8.explore_classifications_per_subject(raw_df, wm.checks["Subject type: #0"])

🔴 <span style="color:red">&nbsp;NOTE: If you aggregated classifications on clips (videos), the rest of this notebook is not relevant for you. You can use the aggregation settings you explored here to cut the videos into frames with Notebook 4. Continue with this notebook only if you were working with frames in the previous steps.  </span>

# OPTIONAL Task #1 - Export aggregated classifications in YOLO format (For ML purposes)

## Prepare the labelled frames
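
YOLO label files store one object per line as `class x_center y_center width height`, with the four box values normalised by the image size to the range 0 to 1. The sketch below shows that conversion for a single box. It assumes the box is given in pixels as a top-left corner plus width and height, and that the image size is ordered as (width, height). The helper `to_yolo_line` is hypothetical and is not part of `kso_utils`.

```python
def to_yolo_line(class_id, x, y, w, h, img_w, img_h):
    """Hypothetical helper: turn a pixel box (top-left x, y, width, height)
    into a YOLO label line with normalised centre coordinates."""
    x_center = (x + w / 2) / img_w
    y_center = (y + h / 2) / img_h
    return (
        f"{class_id} {x_center:.6f} {y_center:.6f} "
        f"{w / img_w:.6f} {h / img_h:.6f}"
    )


# A 360 x 270 px box centred in a 720 x 540 px frame
print(to_yolo_line(0, 180, 135, 360, 270, 720, 540))
# 0 0.500000 0.500000 0.500000 0.500000
```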

In [None]:
# Initialise mlp
mlp = MLProjectProcessor(pp)

In [None]:
# @title <font size="5"><i>Specify path to store the labelled frames and annotations</font> { vertical-output: true }
# Specify path to store the labelled frames and annotations
output_folder = t_utils.choose_folder(".", "output")

In [None]:
# @title <font size="5"><i>Determine your training parameters</font> { vertical-output: true }
# Determine your training parameters
percentage_test = t5.choose_test_prop()

In [None]:
# @title <font size="5"><i>Run the preparation script</font> { vertical-output: true }
# Run the preparation script
mlp.prepare_dataset(
    agg_df=agg_df,
    out_path=output_folder.selected,
    img_size=(720, 540),
    perc_test=percentage_test.value,
)


## Preview and adjust aggregated annotations

In [None]:
# @title <font size="5"><i>Preview and adjust annotations</font> { vertical-output: true }
t8.get_annotations_viewer(output_folder.selected, species_list=mlp.species_of_interest)


# OPTIONAL Task #2 - Export observations in GBIF/OBIS format (For biodiversity purposes)
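
GBIF and OBIS expect occurrence records that use Darwin Core terms such as `occurrenceID`, `basisOfRecord`, `scientificName`, `eventDate`, `decimalLatitude` and `decimalLongitude`. The sketch below shows roughly what such a table looks like. It does not reproduce `pp.format_to_gbif`. The input columns, the helper `to_occurrences`, the identifier scheme and the choice of `MachineObservation` are all assumptions made for illustration, and the values are made up.

```python
import pandas as pd

# Made-up aggregated classifications with the metadata needed for occurrences
agg = pd.DataFrame(
    {
        "subject_ids": [1, 2],
        "label": ["Gadus morhua", "Cancer pagurus"],
        "how_many": [2, 1],
        "created_on": ["2021-06-01", "2021-06-02"],
        "latitude": [58.25, 58.25],
        "longitude": [11.45, 11.45],
    }
)


def to_occurrences(df, dataset_prefix="demo"):
    """Hypothetical helper: map toy columns onto Darwin Core terms."""
    return pd.DataFrame(
        {
            # Identifier built from a prefix, the subject id and the row number
            "occurrenceID": [
                f"{dataset_prefix}:{sid}:{i}"
                for i, sid in enumerate(df["subject_ids"])
            ],
            # Illustrative choice for records derived from camera footage
            "basisOfRecord": "MachineObservation",
            "scientificName": df["label"].to_list(),
            "individualCount": df["how_many"].to_list(),
            "eventDate": df["created_on"].to_list(),
            "decimalLatitude": df["latitude"].to_list(),
            "decimalLongitude": df["longitude"].to_list(),
        }
    )


occurrences = to_occurrences(agg)
print(occurrences)
```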

In [None]:
# @title <font size="5"><i>Format the classifications to Darwin Core Standard occurrences</font> { vertical-output: true }
occurrence_df = pp.format_to_gbif(
    agg_df=agg_df, subject_type=wm.checks["Subject type: #0"]
)

In [None]:
# Preview occurrence df
occurrence_df

In [None]:
# @title <font size="5"><i>Save the occurrence df locally</font> { vertical-output: true }
occurrence_df.to_csv("occurrence_for_gbif.csv", index=False)
print("The observations are now saved in occurrence_for_gbif.csv")

In [None]:
# END