<a href="https://colab.research.google.com/github/emcdona1/fmnh_scripts/blob/main/Microplant_Mystery_Zooniverse_Processing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Preparation steps:**

  1. Download your Classifications CSV from Zooniverse.
  2. Open the file (e.g. in Excel or Google Sheets) and remove any classifications that belong to old versions of your workflows.
  3. Upload the edited CSV file to your Google Drive.
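
If you would rather not edit the file by hand, step 2 above could also be done with pandas. The sketch below is one possible approach, not part of this notebook: it assumes the export has `workflow_id` and `workflow_version` columns and that versions look like `major.minor` (e.g. `12.31`). The function names `version_key` and `keep_latest_versions` are hypothetical.

```python
import pandas as pd

def version_key(version) -> tuple:
    """Turn a version such as '12.31' into (12, 31) so versions compare numerically."""
    major, _, minor = str(version).partition('.')
    return (int(major), int(minor or 0))

def keep_latest_versions(classifications: pd.DataFrame) -> pd.DataFrame:
    """Keep only the rows that belong to the newest version of each workflow."""
    keys = [version_key(v) for v in classifications['workflow_version']]
    latest = {}
    for workflow_id, key in zip(classifications['workflow_id'], keys):
        if workflow_id not in latest or key > latest[workflow_id]:
            latest[workflow_id] = key
    keep = [latest[w] == k for w, k in zip(classifications['workflow_id'], keys)]
    return classifications.loc[keep].reset_index(drop=True)

# Toy data standing in for a real export (values are made up for illustration).
example = pd.DataFrame({
    'workflow_id': [1, 1, 1, 2],
    'workflow_version': ['12.5', '12.31', '12.31', '3.1'],
})
latest_only = keep_latest_versions(example)
```

When loading a real file for this, reading the version column as text (for example `pd.read_csv(path, dtype={'workflow_version': str})`) avoids a version like `12.50` being read as the number `12.5`.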

-----
**Google Colab steps:**

**1. Press the play button on the cell below to start up Google Colab and connect it to your Google Drive.**

   1. You may be prompted to open a URL (blue link). Click the link and follow the prompts to authorize your Google account.
   2. After you connect, you'll see an Authorization Code (a long string of gibberish). Copy this text and paste it into the Colab prompt box (below).
   3. Press Enter and wait 5-10 seconds while Google Drive connects.

In [None]:
import os
import sys
import io
import shutil
import ast
import pandas as pd
from statistics import mode, StatisticsError
from google.colab import drive

drive.mount('/content/drive')
drive_folder = '/content/drive/MyDrive'

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


**2. Enter the name of your Zooniverse classification file:**

When prompted, type the name of the CSV file in your Google Drive -- be careful to type the name exactly!

In [None]:
csv_file = input('Type the name of your CSV file: ')
input_filename = csv_file
csv_file_path = os.path.join(drive_folder, csv_file)
# Strip the extension, whatever its length, before adding the suffix.
output_filename = os.path.splitext(input_filename)[0] + '-processed'
if not os.path.exists(csv_file_path):
  print(f'WARNING!  The file "{csv_file}" is not in your Google Drive!  Did you type the name correctly?')

Type the name of your CSV file: my_file.csv
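
Because the name has to be typed exactly, it can help to list the CSV files that are actually in the folder first. The following is a minimal sketch, assuming the files sit directly in the top level of the Drive folder; `list_csv_files` is a hypothetical helper, not part of this notebook.

```python
import os

def list_csv_files(folder: str) -> list:
    """Return the names of the CSV files directly inside `folder`, sorted by name."""
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith('.csv')
    )

# Example use in Colab, after Google Drive is mounted:
#   for name in list_csv_files('/content/drive/MyDrive'):
#       print(name)
```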


**3. Apply some basic formatting to the CSV file, extract the metadata, and identify the workflows.**

In [None]:
zooniverse_classifications = pd.read_csv(csv_file_path)
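def clean_formatting_and_parse(data: str):
    # Convert JSON-style literals to their Python spellings so the text can be
    # evaluated as a Python literal.  Note: the replacement is purely textual,
    # so these words are also changed if they occur inside quoted strings.
    data = data.replace('null', 'None')
    data = data.replace('true', 'True')
    data = data.replace('false', 'False')
    return ast.literal_eval(data)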

zooniverse_classifications.loc[:, 'annotations'] = zooniverse_classifications['annotations'].apply(clean_formatting_and_parse)
zooniverse_classifications.loc[:, 'metadata'] = zooniverse_classifications['metadata'].apply(clean_formatting_and_parse)
zooniverse_classifications.loc[:, 'subject_data'] = zooniverse_classifications['subject_data'].apply(clean_formatting_and_parse)

zooniverse_classifications['metadata-user_agent'] = zooniverse_classifications['metadata'].apply(lambda d: d['user_agent'])
zooniverse_classifications['image_filename'] = zooniverse_classifications['subject_data'].apply(lambda d: d[list(d.keys())[0]]['Filename'])

workflows = list(set(zooniverse_classifications.loc[:, 'workflow_name']))
print('Zooniverse Workflows found:')
for workflow in workflows:
  print(f'-- {workflow}')

Zooniverse Workflows found:
-- Stem and Branching Patterns 
-- Determining the Gender of a Liverwort
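
The formatting step above swaps `null`, `true`, and `false` for their Python spellings before evaluating the text. If the `annotations`, `metadata`, and `subject_data` columns are valid JSON (an assumption suggested by those literals, not something this notebook confirms), the standard `json` module can parse them directly without touching words inside quoted text. A minimal sketch, with a made-up sample value and a hypothetical function name:

```python
import json

def parse_json_cell(data: str):
    """Parse one cell of JSON text into Python objects (lists, dicts, None, bools)."""
    return json.loads(data)

# Made-up sample resembling one 'annotations' cell; the label text is illustrative only.
sample = '[{"task": "T0", "task_label": "Is the answer null or true?", "value": null}]'
parsed = parse_json_cell(sample)

# The JSON null becomes Python None, while the words inside the label are left alone.
first_task = parsed[0]
```

With the text-replacement approach, the same label would come back as "Is the answer None or True?", which is why parsing the JSON directly may be worth considering.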


**4. Parse the tasks (from annotations column) into separate columns.**

In [None]:
results_by_workflow = list()
for workflow_name in workflows:
    results = zooniverse_classifications[zooniverse_classifications['workflow_name'] == workflow_name].copy()
    # Largest number of tasks seen in any single classification of this workflow.
    max_count = 0
    for idx, row in results.iterrows():
        tasks = row['annotations']
        for count, task in enumerate(tasks):
            # Columns are numbered from 1: Task1-id, Task1-label, Task1-value, ...
            name = f'Task{count+1}'
            results.loc[idx, f'{name}-id'] = task['task']
            results.loc[idx, f'{name}-label'] = task['task_label']
            results.loc[idx, f'{name}-value'] = str(task['value'])
        max_count = max(max_count, len(tasks))
    results_by_workflow.append(results)
    print(f'{workflow_name.strip()} expanded into {max_count} task(s).')

Stem and Branching Patterns expanded into 1 task(s).
Determining the Gender of a Liverwort expanded into 2 task(s).
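
To make the column naming in step 4 concrete, here is a small standalone sketch of the same flattening applied to a single classification. The task labels and values are made up for illustration, and `expand_tasks` is a hypothetical helper rather than a function from this notebook.

```python
def expand_tasks(annotations: list) -> dict:
    """Flatten one classification's task list into TaskN-id / -label / -value entries."""
    columns = {}
    for number, task in enumerate(annotations, start=1):
        name = f'Task{number}'
        columns[f'{name}-id'] = task['task']
        columns[f'{name}-label'] = task['task_label']
        # Values are stored as text, matching the notebook's str(task['value']).
        columns[f'{name}-value'] = str(task['value'])
    return columns

# Made-up annotations for one classification with two tasks.
example_annotations = [
    {'task': 'T0', 'task_label': 'Is a branch visible?', 'value': 'Yes'},
    {'task': 'T1', 'task_label': 'Mark each branch.', 'value': [{'x': 10.5, 'y': 20.0}]},
]
flat = expand_tasks(example_annotations)
```

Each task contributes three entries, so a classification with two tasks yields six columns (`Task1-id` through `Task2-value`).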


**5. Extract the (x, y) coordinates.**

In [None]:
def extract_coordinate_info(workflow, idx, pt_info, colname):
    workflow.loc[idx, f'{colname}-x'] = pt_info['x']
    workflow.loc[idx, f'{colname}-y'] = pt_info['y']
    workflow.loc[idx, f'{colname}-width'] = pt_info['width']
    workflow.loc[idx, f'{colname}-height'] = pt_info['height']
    workflow.loc[idx, f'{colname}-value_label'] = pt_info['tool_label']

for workflow_name, workflow in zip(workflows, results_by_workflow):
    value_columns = [val for val in list(workflow.columns) if 'value' in val]
    for idx, row in workflow.iterrows():
        for col in value_columns:
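            try:
                points = ast.literal_eval(row[col])
                if isinstance(points, list):
                    for pt_idx, point in enumerate(points):
                        colname = col.split('-')[0] + '-Pt' + str(pt_idx + 1)
                        # Each point is written to its own PtN columns.
                        extract_coordinate_info(workflow, idx, point, colname)
                workflow.loc[idx, col] = None
            except (ValueError, SyntaxError, KeyError, TypeError):
                # Not point data (e.g. a plain-text answer or a missing value); keep the value.
                pass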
    found_points = list(set([a.split('-')[0]+'-'+a.split('-')[1] for a in list(workflow.columns) if 'Pt' in a]))
    print(f'Workflow: {workflow_name}')
    if len(found_points) > 0:
        print(f'-- Point info found:', found_points)
    else:
        print('-- No point data found.')
    print()

Workflow: Stem and Branching Patterns 
-- No point data found.

Workflow: Determining the Gender of a Liverwort
-- Point info found: ['Task3-Pt1', 'Task3-Pt2']
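
The sketch below shows the idea behind step 5 on a single value, outside of the DataFrame. It is an illustration under assumptions, not the notebook's exact logic: `flatten_points` is a hypothetical helper, the sample points are made up, and unlike the cell above it uses `.get()` so that a point missing a field (such as `width`) yields `None` instead of raising an error.

```python
import ast

FIELDS = ('x', 'y', 'width', 'height')

def flatten_points(task_name: str, value_text: str) -> dict:
    """Turn the text of a list of point dicts into TaskN-PtM-* entries."""
    try:
        points = ast.literal_eval(value_text)
    except (ValueError, SyntaxError):
        return {}  # plain-text answers such as 'Yes' are not point data
    if not isinstance(points, list):
        return {}
    columns = {}
    for number, point in enumerate(points, start=1):
        if not isinstance(point, dict):
            continue  # e.g. a list of text answers rather than points
        prefix = f'{task_name}-Pt{number}'
        for field in FIELDS:
            columns[f'{prefix}-{field}'] = point.get(field)
        columns[f'{prefix}-value_label'] = point.get('tool_label')
    return columns

# Made-up value text with two marked points; the second has no width or height.
sample_value = ("[{'x': 1, 'y': 2, 'width': 3, 'height': 4, 'tool_label': 'Male'}, "
                "{'x': 5, 'y': 6, 'tool_label': 'Female'}]")
point_columns = flatten_points('Task3', sample_value)
```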



**6. Save the files.**

In [None]:
results_folder = './results'
if not os.path.exists(results_folder):
  os.makedirs(results_folder)
for (workflow_name, workflow) in zip(workflows, results_by_workflow):
  file_location = os.path.join(results_folder, workflow_name.strip() + '.csv')
  workflow.to_csv(file_location, index=False, encoding='UTF-8')

save_location = shutil.make_archive(output_filename, 'zip', results_folder)
print(f'Saved to {save_location}')

Saved to /content/my_file-processed.zip
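
Before downloading, it may be worth checking that the archive holds one CSV per workflow. A minimal sketch using the standard `zipfile` module; `list_archive` is a hypothetical helper, not part of this notebook.

```python
import zipfile

def list_archive(zip_path: str) -> list:
    """Return the sorted names of the files stored in a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())

# Example use after step 6 has run:
#   for name in list_archive(save_location):
#       print(name)
```

Note that step 6 zips everything in the `results` folder, so files left over from an earlier run would show up in this listing too.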


**7. On the left side, click the Folder icon and find your new ZIP file.  Click the three dots to the right of the file name, then click Download.**