<img align="left" src="https://panoptes-uploads.zooniverse.org/project_avatar/86c23ca7-bbaa-4e84-8d8a-876819551431.png" type="image/png" height=100 width=100>
</img>


<h1 align="right">KSO Tutorials #12: Analyse Zooniverse classifications</h1>
<h3 align="right">Written by @jannesgg and @vykanton</h3>
<h5 align="right">Last updated: March 9, 2022</h5>

# Set up and requirements

### Import Python packages

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Set the directory of the libraries
import sys
sys.path.append('..')

# Set to display dataframes as interactive tables
from itables import init_notebook_mode
init_notebook_mode(all_interactive=True)

# Import required modules
import kso_utils.tutorials_utils as t_utils
import kso_utils.project_utils as p_utils
import kso_utils.t12_utils as t12

print("Packages loaded successfully")

Packages loaded successfully


### Choose your project

In [3]:
project_name = t_utils.choose_project()

Dropdown(description='Project:', options=('Koster_Seafloor_Obs', 'Spyfish_Aotearoa', 'SGU'), value='Koster_Sea…

In [4]:
project = p_utils.find_project(project_name=project_name.value)

### Set up initial information

In [5]:
db_info_dict, zoo_project, zoo_info_dict = t12.setup_initial_info(project)

Enter your username for SNIC server········
Enter your password for SNIC server········
Updated sites
Updated movies
Updated species
Enter your Zooniverse user········
Enter your Zooniverse password········
Retrieving subjects from Zooniverse
subjects were retrieved successfully
Retrieving workflows from Zooniverse
workflows were retrieved successfully
Retrieving classifications from Zooniverse
classifications were retrieved successfully
Updated subjects
The database has a total of 4295 frame subjects and 7362 clip subjects have been updated


### Step 1: Specify the Zooniverse workflow id and version of interest

**Note: A manual export in Zooniverse is required to get the most up-to-date classifications here.**

Make sure each of your workflows in Zooniverse has a unique name; duplicate names make it hard to select the right workflow id.

In [6]:
# Display a selectable list of workflow names and a list of versions of the workflow of interest
workflows_df = zoo_info_dict["workflows"]
wm = t12.WidgetMaker(workflows_df)
wm

WidgetMaker(children=(IntText(value=0, description='Number of workflows:', style=DescriptionStyle(description_…

Output()
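The note above about unique workflow names can be checked before picking a workflow. The following is a minimal sketch on toy data, not part of `kso_utils`: the column names `workflow_id`, `display_name` and `version` are assumptions about the layout of `workflows_df`, and `find_duplicate_workflow_names` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-in for workflows_df. The column names are assumptions,
# so adjust them to match the actual workflows table.
workflows_toy = pd.DataFrame(
    {
        "workflow_id": [101, 101, 102, 103],
        "display_name": [
            "Species (clips)",
            "Species (clips)",
            "Species (frames)",
            "Species (clips)",
        ],
        "version": [1.0, 2.0, 1.0, 1.0],
    }
)


def find_duplicate_workflow_names(df, id_col="workflow_id", name_col="display_name"):
    """Return the workflow names that are shared by more than one workflow id."""
    # Several versions of the same workflow share one id, so count distinct ids
    ids_per_name = df.groupby(name_col)[id_col].nunique()
    return sorted(ids_per_name[ids_per_name > 1].index)


duplicates = find_duplicate_workflow_names(workflows_toy)
print(duplicates)
```

If the returned list is not empty, renaming the affected workflows in Zooniverse before running the next cell should avoid ambiguity in the selection widget.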

In [67]:
# Retrieve classifications from the workflow of interest
class_df = t12.get_classifications(wm.checks,
                                   workflows_df,
                                   wm.checks['Subject type: #0'], 
                                   zoo_info_dict["classifications"], 
                                   db_info_dict["db_path"],
                                   project)

       subject_ids        id  \
0         39431409      <NA>   
1         39431413      <NA>   
2         39431414      <NA>   
3         39431419      <NA>   
4         39431420      <NA>   
...            ...       ...   
15017     59838026  59838026   
15018     59837930  59837930   
15019     59837993  59837993   
15020     59837936  59837936   
15021     59837989  59837989   

                                          https_location  
0                                                    NaN  
1                                                    NaN  
2                                                    NaN  
3                                                    NaN  
4                                                    NaN  
...                                                  ...  
15017  https://panoptes-uploads.zooniverse.org/subjec...  
15018  https://panoptes-uploads.zooniverse.org/subjec...  
15019  https://panoptes-uploads.zooniverse.org/subjec...  
15020  https://panoptes-u

### Step 2: Aggregate classifications received on the workflow of interest

In [8]:
# Specify the agreement threshold required among citizen scientists
agg_params = t12.choose_agg_parameters(wm.checks['Subject type: #0'])

FloatSlider(value=0.8, continuous_update=False, description='Aggregation threshold:', max=1.0, readout_format=…

IntSlider(value=3, continuous_update=False, description='Min numbers of users:', max=15, min=1, style=SliderSt…

FloatSlider(value=0.8, continuous_update=False, description='Object threshold:', max=1.0, readout_format='.1f'…

FloatSlider(value=0.5, continuous_update=False, description='IOU Epsilon:', max=1.0, readout_format='.1f', sty…

FloatSlider(value=0.8, continuous_update=False, description='Inter user agreement:', max=1.0, readout_format='…
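The "IOU Epsilon" slider refers to intersection over union (IoU), a standard measure of how much two bounding boxes overlap. The sketch below shows the textbook IoU calculation for boxes given as `(x, y, w, h)`, assuming `x, y` is the top-left corner. It is an illustration only: how `t12` applies this value internally is not shown in this notebook, and `iou_xywh` is a hypothetical helper.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h).

    Assumes (x, y) is the top-left corner and w, h are non-negative.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h

    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


# Two users draw 10 x 10 boxes that are offset by 5 pixels in each direction
overlap = iou_xywh((0, 0, 10, 10), (5, 5, 10, 10))
print(round(overlap, 3))
```

An IoU of 1 means the boxes are identical and 0 means they do not overlap at all, so a higher threshold demands closer agreement between the boxes drawn by different users.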

In [60]:
agg_class_df, raw_class_df = t12.aggregrate_classifications(class_df, 
                                                            wm.checks['Subject type: #0'], 
                                                            project, 
                                                            agg_params)

Aggregrating the classifications
496 classifications aggregated out of 2027 unique subjects available
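To give an intuition for why only some subjects survive aggregation, here is a minimal sketch of agreement-based filtering for label classifications. It is not the `kso_utils` implementation. It assumes one row per user and subject, with hypothetical column names `subject_ids`, `user_name` and `label`, and it interprets the "Aggregation threshold" and "Min numbers of users" sliders as the minimum share of users agreeing on a label and the minimum number of users per subject.

```python
import pandas as pd

# Toy raw classifications: one row per (subject, user) with the chosen label
raw_toy = pd.DataFrame(
    {
        "subject_ids": [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
        "user_name": ["a", "b", "c", "d", "e", "a", "b", "a", "b", "c", "d"],
        "label": ["cod", "cod", "cod", "cod", "crab",
                  "cod", "cod",
                  "cod", "cod", "crab", "crab"],
    }
)


def aggregate_by_agreement(df, agg_threshold=0.8, min_users=3):
    """Keep (subject, label) pairs that enough users agree on."""
    # Number of distinct users who classified each subject
    n_users = (
        df.groupby("subject_ids")["user_name"].nunique().rename("n_users").reset_index()
    )
    # Number of distinct users who chose each label for each subject
    votes = (
        df.groupby(["subject_ids", "label"])["user_name"]
        .nunique()
        .rename("n_votes")
        .reset_index()
    )
    votes = votes.merge(n_users, on="subject_ids")
    votes["agreement"] = votes["n_votes"] / votes["n_users"]

    keep = (votes["n_users"] >= min_users) & (votes["agreement"] >= agg_threshold)
    return votes[keep].reset_index(drop=True)


agg_toy = aggregate_by_agreement(raw_toy, agg_threshold=0.8, min_users=3)
print(agg_toy)
```

In this toy data only subject 1 is kept: subject 2 has too few users and subject 3 has an even split between two labels. Lowering either parameter keeps more subjects at the cost of less reliable labels.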


### Step 3: Summarise the number of classifications based on the agreement specified

In [61]:
agg_class_df.groupby("label")["subject_ids"].agg("count")

Unnamed: 0_level_0,subject_ids
label,Unnamed: 1_level_1
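The interactive table above did not survive the export, so the following sketch shows what the same `groupby` call returns on a small made-up dataframe. The labels and subject ids are invented for illustration and are not results from the project.

```python
import pandas as pd

# Made-up aggregated classifications with the two columns used in the summary
agg_example = pd.DataFrame(
    {
        "subject_ids": [11, 12, 13, 14],
        "label": ["cod", "crab", "cod", "cod"],
    }
)

# Count how many aggregated classifications were assigned to each label
label_counts = agg_example.groupby("label")["subject_ids"].agg("count")
print(label_counts)
```

The result is a Series indexed by label, so a single count can be looked up with, for example, `label_counts["cod"]`.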


### Step 4: Display the aggregated classifications in a table

In [62]:
# Display the dataframe as a table
t12.launch_table(agg_class_df, wm.checks['Subject type: #0'])

Unnamed: 0,subject_ids,label,x,y,w,h,https_location,subject_type,filename


### Step 5: Use the subject explorer widget to visualise subjects and their aggregated classifications

In [63]:
# Launch the subject viewer
t12.launch_viewer(agg_class_df, wm.checks['Subject type: #0'])

Combobox(value='', description='Subject id:', ensure_option=True, options=('45881633', '45881634', '45881636',…

Output()

### Step 6: Use the subject explorer widget to get more information about specific subjects and their "raw" classifications

In [64]:
# Launch the classifications_per_subject explorer
t12.explore_classifications_per_subject(raw_class_df, wm.checks['Subject type: #0'])

Combobox(value='', description='Subject id:', ensure_option=True, options=('45881560', '45881712', '45881623',…

Output()

In [None]:
# END