<img align="left" src="https://panoptes-uploads.zooniverse.org/project_avatar/86c23ca7-bbaa-4e84-8d8a-876819551431.png" type="image/png" height=100 width=100>
</img>


<h1 align="right">KSO Tutorials #12: Analyse Zooniverse classifications</h1>
<h3 align="right">Written by @jannesgg and @vykanton</h3>
<h5 align="right">Last updated: Nov 25th, 2021</h5>

# Set up and requirements

### Import Python packages

In [1]:
# Add the parent directory to the path so the kso_utils package can be imported
import sys
sys.path.append('..')

# Set to display dataframes as interactive tables
from itables import init_notebook_mode
init_notebook_mode(all_interactive=True)

# Import required modules
import kso_utils.tutorials_utils as t_utils
import kso_utils.t12_utils as t12
import kso_utils.zooniverse_utils as zoo

print("Packages loaded successfully")

Packages loaded successfully


### Choose your project

In [2]:
project = t_utils.choose_project()

Dropdown(description='Project:', options=('Koster_Seafloor_Obs', 'Spyfish_Aotearoa', 'SGU'), value='Koster_Sea…

### Populate the SQL database with sites, movies and species, and connect to Zooniverse

In [3]:
# Initiate db
db_info_dict = t_utils.initiate_db(project.value)

Enter your username for SNIC server········
Enter your password for SNIC server········
Updated sites
Updated movies
Updated species


In [4]:
# Connect to Zooniverse project
zoo_project = t_utils.connect_zoo_project(project.value)

Enter your Zooniverse user········
Enter your Zooniverse password········


### Retrieve Zooniverse information

In [5]:
# Specify the Zooniverse information required throughout the tutorial
zoo_info = ["subjects", "workflows", "classifications"]

zoo_info_dict = t_utils.retrieve__populate_zoo_info(
    project_name=project.value,
    db_info_dict=db_info_dict,
    zoo_project=zoo_project,
    zoo_info=zoo_info,
)

Retrieving subjects from Zooniverse
subjects were retrieved successfully
Retrieving workflows from Zooniverse
workflows were retrieved successfully
Retrieving classifications from Zooniverse
classifications were retrieved successfully
Updated subjects
The database has a total of 2342 frame subjects and 7362 clip subjects have been updated


### Step 1: Specify the Zooniverse workflow id and version of interest

*Note: A manual export in Zooniverse is required to get the most up-to-date classifications here.*

Make sure your workflows in Zooniverse have different names to avoid problems when selecting the workflow id.
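
One way to check for clashing workflow names before using the selector is sketched below. It is only an illustration on a toy table: the column names `workflow_id` and `display_name` are assumed to follow the Zooniverse workflow export, and the values are made up.

```python
import pandas as pd

# Toy stand-in for the workflows table (assumed column names, made-up values)
workflows_df = pd.DataFrame({
    "workflow_id": [101, 102, 103],
    "display_name": ["Species identification", "Clip annotation", "Species identification"],
    "version": [1.0, 2.0, 3.0],
})

# Count how many distinct workflow ids share each name
ids_per_name = workflows_df.groupby("display_name")["workflow_id"].nunique()

# Names used by more than one workflow id are ambiguous in a name-based selector
duplicated_names = ids_per_name[ids_per_name > 1].index.tolist()
print(duplicated_names)
```

An empty list means every workflow name maps to a single workflow id.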

In [16]:
# Display a selectable list of workflow names and a list of versions of the workflow of interest
workflows_df = zoo_info_dict["workflows"]
wm = t12.WidgetMaker(workflows_df)
wm

WidgetMaker(children=(IntText(value=0, description='Number of workflows:', style=DescriptionStyle(description_…

Output()

In [21]:
# Retrieve classifications from the workflow of interest
class_df = t12.get_classifications(
    wm.checks,
    workflows_df,
    "frame",  # subject type of the workflow of interest
    zoo_info_dict["classifications"],
    db_info_dict["db_path"],
)

There are 1764 classifications out of 15022 missing subject info. Maybe the subjects have been removed from Zooniverse?
Zooniverse classifications have been retrieved
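
The message about classifications missing subject info suggests that some classifications point to subjects that are no longer in the subjects table. The sketch below shows one way such a check could be done with a pandas left merge. It does not reproduce what `t12.get_classifications` does internally; the tables and the `subject_id` and `classification_id` column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy tables: four classifications, three known subjects
classifications = pd.DataFrame({
    "classification_id": [1, 2, 3, 4],
    "subject_ids": [10, 11, 12, 99],
})
subjects = pd.DataFrame({
    "subject_id": [10, 11, 12],
    "subject_type": ["frame", "frame", "clip"],
})

# A left merge keeps every classification; indicator=True adds a `_merge`
# column that records whether a matching subject was found
merged = classifications.merge(
    subjects,
    how="left",
    left_on="subject_ids",
    right_on="subject_id",
    indicator=True,
)

# Rows marked "left_only" have no subject information
missing = merged[merged["_merge"] == "left_only"]
print(f"{len(missing)} of {len(merged)} classifications have no matching subject")
```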


### Step 2: Aggregate classifications received on the workflow of interest

In [22]:
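# Specify the aggregation parameters, including the agreement threshold required among citizen scientists
# Note: `subj_type` is not defined in the cells shown above; it is assumed to be a widget whose value is "frame" or "clip"
agg_params = t12.choose_agg_parameters(subj_type.value)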

FloatSlider(value=0.8, continuous_update=False, description='Aggregation threshold:', max=1.0, readout_format=…

IntSlider(value=3, continuous_update=False, description='Min numbers of users:', max=15, min=1, style=SliderSt…

FloatSlider(value=0.8, continuous_update=False, description='Object threshold:', max=1.0, readout_format='.1f'…

FloatSlider(value=0.5, continuous_update=False, description='IOU Epsilon:', max=1.0, readout_format='.1f', sty…

FloatSlider(value=0.8, continuous_update=False, description='Inter user agreement:', max=1.0, readout_format='…
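
The "IOU Epsilon" slider presumably refers to intersection over union (IoU), a common measure of how much two bounding boxes overlap. How `t12` uses this value is not shown here, so the following is only a generic sketch of the IoU calculation. It assumes boxes are given as `(x, y, w, h)` with `(x, y)` as the top-left corner; the function name `iou` is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping region (zero if the boxes do not overlap)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h

    # Union = sum of both areas minus the part counted twice
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes, the second shifted 5 pixels to the right
overlap = iou((0, 0, 10, 10), (5, 0, 10, 10))
print(round(overlap, 3))
```

Values close to 1 mean two annotations outline nearly the same region, while 0 means they do not overlap at all.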

In [23]:
agg_class_df, raw_class_df = t12.aggregrate_classifications(class_df, subj_type.value, project.value, agg_params)

Aggregrating the classifications
814 classifications aggregated out of 2027 unique subjects available
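
To give a feel for how an agreement threshold and a minimum number of users can reduce many raw classifications to fewer aggregated ones, here is a minimal sketch on toy data. It is not the implementation behind `t12.aggregrate_classifications`; the function `aggregate_by_agreement`, the `user_name` column and the data are assumptions made for illustration, with one label per user per subject.

```python
import pandas as pd

# Toy raw classifications: one row per user per subject (made-up data)
raw = pd.DataFrame({
    "subject_ids": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3],
    "user_name": ["a", "b", "c", "d", "e", "a", "b", "c", "a", "b"],
    "label": ["cod", "cod", "cod", "cod", "crab",
              "cod", "crab", "shrimp", "cod", "cod"],
})


def aggregate_by_agreement(df, agg_threshold=0.8, min_users=3):
    """Keep the most common label of each subject if enough users agree on it."""
    rows = []
    for subject_id, group in df.groupby("subject_ids"):
        n_users = group["user_name"].nunique()
        if n_users < min_users:
            continue  # too few users classified this subject

        votes = group["label"].value_counts()  # sorted, most common label first
        agreement = votes.iloc[0] / len(group)
        if agreement >= agg_threshold:
            rows.append({
                "subject_ids": subject_id,
                "label": votes.index[0],
                "n_users": n_users,
                "agreement": agreement,
            })
    return pd.DataFrame(rows, columns=["subject_ids", "label", "n_users", "agreement"])


agg = aggregate_by_agreement(raw)
print(agg)
```

In this toy data only subject 1 is kept: subject 2 has three users who all disagree, and subject 3 was classified by fewer than three users.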


### Step 3: Summarise the number of classifications based on the agreement specified

In [24]:
agg_class_df.groupby("label")["subject_ids"].agg("count")

label,subject_ids
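
For readers unfamiliar with the `groupby` summary used in this step, the sketch below applies the same pattern to a small made-up table. The labels and subject ids are hypothetical and only illustrate the shape of the result.

```python
import pandas as pd

# Made-up aggregated classifications: one row per subject and label
toy_agg = pd.DataFrame({
    "subject_ids": [1, 2, 3, 4],
    "label": ["cod", "crab", "cod", "shrimp"],
})

# Number of aggregated classifications per label
counts = toy_agg.groupby("label")["subject_ids"].agg("count")
print(counts)
```

The result is a Series indexed by label, so `counts["cod"]` gives the number of subjects aggregated as cod.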


### Step 4: Display the aggregated classifications in a table

In [25]:
# Display the dataframe as a table
if subj_type.value == "clip":
    a = agg_class_df[["subject_ids","label","how_many","first_seen"]]
else:
    a = agg_class_df
a

Unnamed: 0,subject_ids,label,x,y,w,h,https_location,subject_type,movie_id


### Step 5: Use the subject explorer widget to visualise subjects and their aggregated classifications

In [26]:
# Launch the subject viewer
t12.launch_viewer(agg_class_df, subj_type.value)

Combobox(value='', description='Subject id:', ensure_option=True, options=('45881626', '45881627', '45881628',…

Output()

### Step 6: Use the subject explorer widget to get more information about specific subjects and their "raw" classifications

In [27]:
# Launch the classifications_per_subject explorer
t12.explore_classifications_per_subject(raw_class_df, subj_type.value)

Combobox(value='', description='Subject id:', ensure_option=True, options=('45881560', '45881712', '45881623',…

Output()
