# Camera-Trap-Data-Pipeline

Pre-configured scripts for easy running of the different code blocks. See https://github.com/marco-willi/camera-trap-data-pipeline for more details.

To run a code cell, select it (via mouse or arrow keys) and press CTRL+Enter.

In [None]:
# Load important modules
import os
import pandas as pd

## Parameters 

Here we select / define the parameters of the current run: choose the appropriate cell and execute or modify it. For a new season, create a new cell using the 'Insert' menu at the top.

In [None]:
###################################
# Template
####################################

SITE=''

SEASON=''

PROJECT_ID=''

WORKFLOW_ID=''

WORKFLOW_VERSION_MIN=''

ATTRIBUTION="''"

LICENSE="''"

In [None]:
###################################
# Grumeti
####################################

SITE='GRU'

SEASON='GRU_S1'

PROJECT_ID='5115'

WORKFLOW_ID='4979'

WORKFLOW_VERSION_MIN='275'

ATTRIBUTION="'University of Minnesota Lion Center + Snapshot Safari + Singita Grumeti + Tanzania'"

LICENSE="'Snapshot Safari + Singita Grumeti'"

In [None]:
###################################
# RUA
####################################

SITE='RUA'

SEASON='RUA_S1'

PROJECT_ID='5155'

WORKFLOW_ID='4889'

WORKFLOW_VERSION_MIN='797'

ATTRIBUTION=''

LICENSE=''

In [None]:
###################################
# Mountain Zebra
####################################

SITE='MTZ'

SEASON='MTZ_S1'

PROJECT_ID='5124'

WORKFLOW_ID='8814'

WORKFLOW_VERSION_MIN='247'

ATTRIBUTION=''

LICENSE=''

In [None]:
###################################
# Karoo
####################################

SITE='KAR'

SEASON='KAR_S1'

PROJECT_ID='7679'

WORKFLOW_ID='8789'

WORKFLOW_VERSION_MIN='237.7'

ATTRIBUTION=''

LICENSE=''

In [None]:
###################################
# Karoo TEST
####################################

SITE='KAR_TEST'

SEASON='KAR_S1'

PROJECT_ID='7679'

WORKFLOW_ID='8789'

WORKFLOW_VERSION_MIN='237.7'

ATTRIBUTION=''

LICENSE=''

In [None]:
###################################
# KRU 
####################################

SITE='KRU'

SEASON='KRU_S1'

PROJECT_ID=''

WORKFLOW_ID=''

WORKFLOW_VERSION_MIN=''

ATTRIBUTION=''

LICENSE=''

### Verify Parameters

Let's check the parameters.

In [None]:
print("Selected: site: {} \nseason: {} \nproject_id: {} \nworkflow_id: {} \nworkflow_version_min: {} \nattribution: {} \nlicense: {}".format(
    SITE, SEASON, PROJECT_ID, WORKFLOW_ID, WORKFLOW_VERSION_MIN, ATTRIBUTION, LICENSE))
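Several parameter cells above leave fields empty (for example the KRU cell). The following is a minimal sketch of a check that lists unset parameters before any pipeline step runs. The helper name `find_empty_parameters` is hypothetical and not part of the pipeline.

```python
def find_empty_parameters(params):
    """Return the names of parameters that are empty once quotes are removed."""
    empty = []
    for name, value in params.items():
        # ATTRIBUTION / LICENSE carry an extra pair of quotes for the shell,
        # so strip quotes and whitespace before testing for emptiness.
        if str(value).strip().strip("'\"").strip() == "":
            empty.append(name)
    return empty


# Example values copied from the KRU cell above
example = {
    "SITE": "KRU",
    "SEASON": "KRU_S1",
    "PROJECT_ID": "",
    "WORKFLOW_ID": "",
    "WORKFLOW_VERSION_MIN": "",
    "ATTRIBUTION": "",
    "LICENSE": "",
}
missing = find_empty_parameters(example)
print("Unset parameters:", missing)
```

In the notebook, the same function could be called with a dict built from `SITE`, `SEASON`, `PROJECT_ID` and the other variables of the selected cell.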

## Start the Pre-Processing

In [None]:
# Check Input Structure
!python3 -m pre_processing.check_input_structure \
--root_dir /home/packerc/shared/albums/{SITE}/{SEASON}/ \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/
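The pre-processing cells all build their paths from `SITE` and `SEASON` using the same directory layout. As a sketch only, the layout can be collected in one helper so that it is easy to inspect; the function name `season_paths` and the dictionary keys are hypothetical, while the paths themselves are the ones used in the cells of this section.

```python
import posixpath

SHARED_ROOT = "/home/packerc/shared"


def season_paths(site, season, root=SHARED_ROOT):
    """Build the main pre-processing paths for one site and season."""
    captures_root = posixpath.join(root, "season_captures", site)
    return {
        "images": posixpath.join(root, "albums", site, season) + "/",
        "log_dir": posixpath.join(captures_root, "log_files") + "/",
        "inventory": posixpath.join(
            captures_root, "inventory", season + "_inventory.csv"),
        "captures": posixpath.join(
            captures_root, "captures", season + "_captures.csv"),
    }


paths = season_paths("GRU", "GRU_S1")
for name, path in paths.items():
    print("{:10s} {}".format(name, path))
```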

In [None]:
# check for duplicates - this can take a while for large batches >> 100k images
!python3 -m pre_processing.check_for_duplicates \
--root_dir /home/packerc/shared/albums/{SITE}/{SEASON}/ \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/

In [None]:
# Create Image Inventory
!python3 -m pre_processing.create_image_inventory \
--root_dir /home/packerc/shared/albums/{SITE}/{SEASON}/ \
--output_csv /home/packerc/shared/season_captures/{SITE}/inventory/{SEASON}_inventory_basic.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/

In [None]:
# Perform basic Checks
!python3 -m pre_processing.basic_inventory_checks \
--inventory /home/packerc/shared/season_captures/{SITE}/inventory/{SEASON}_inventory_basic.csv \
--output_csv /home/packerc/shared/season_captures/{SITE}/inventory/{SEASON}_inventory.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--n_processes 16

In [None]:
# Perform basic Checks (QSUB Version for larger seasons)
!qsub -v SITE={SITE},SEASON={SEASON} basic_inventory_checks.pbs

If we use the qsub version of the script, we need to wait until it has completed. We can check the status of the job using the following command. 'Q' indicates the job is in the queue and has not yet started, 'R' means the job is running, and 'C' means the job has finished.

In [None]:
!qstat
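The state letters can also be read programmatically. The sketch below assumes the default tabular `qstat` layout, in which the state is the second-to-last column; the sample output, job ids, job names and the helper `describe_job_states` are made up for illustration and may not match the layout on a given cluster.

```python
# State letters as described above
JOB_STATES = {"Q": "queued, not yet started", "R": "running", "C": "finished"}


def describe_job_states(qstat_output):
    """Map each job id to a readable state (assumes the state is column -2)."""
    results = {}
    for line in qstat_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header line and the dashed separator line
        if len(fields) < 6 or fields[0] == "Job" or fields[0].startswith("-"):
            continue
        job_id, state = fields[0], fields[-2]
        results[job_id] = JOB_STATES.get(state, "other state: " + state)
    return results


# Made-up sample in the assumed layout
sample_output = """\
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1001.host                 inventory_check  user1           00:01:02 R small
1002.host                 inventory_check  user1                  0 Q small
"""
states = describe_job_states(sample_output)
print(states)
```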

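Once the job is running, we can follow its progress in the log file with `tail`. Note that `tail` expects a file, so append the name of the current log file to the directory path below: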

In [None]:
!tail /home/packerc/shared/season_captures/{SITE}/log_files/
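As an alternative from within Python, the following sketch picks the most recently modified file in a log directory and returns its last lines. The helper `tail_newest_file` is hypothetical, and the demo runs on a temporary directory with made-up file names rather than on the real log directory.

```python
import os
import tempfile
from pathlib import Path


def tail_newest_file(log_dir, n_lines=10):
    """Return (path, last n_lines lines) of the most recently modified file."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    if not files:
        return None, []
    newest = max(files, key=lambda p: p.stat().st_mtime)
    with open(newest, errors="replace") as handle:
        lines = handle.read().splitlines()
    return newest, lines[-n_lines:]


# Self-contained demo on a temporary directory with two made-up log files
with tempfile.TemporaryDirectory() as tmp_dir:
    old_log = Path(tmp_dir, "older.log")
    new_log = Path(tmp_dir, "newer.log")
    old_log.write_text("old line\n")
    new_log.write_text("".join("line {}\n".format(i) for i in range(20)))
    os.utime(old_log, (1000, 1000))  # force an older modification time
    os.utime(new_log, (2000, 2000))
    demo_path, demo_lines = tail_newest_file(tmp_dir, n_lines=3)
    demo_name = demo_path.name
    print(demo_name, demo_lines)
```

In the notebook, the function could be called with `"/home/packerc/shared/season_captures/{}/log_files/".format(SITE)` as `log_dir`.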

In [None]:
# Extract EXIF data
!python3 -m pre_processing.extract_exif_data \
--inventory /home/packerc/shared/season_captures/{SITE}/inventory/{SEASON}_inventory.csv \
--update_inventory \
--output_csv /home/packerc/shared/season_captures/{SITE}/inventory/{SEASON}_exif_data.csv \
--exiftool_path /home/packerc/shared/programs/Image-ExifTool-11.31/exiftool \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/

In [None]:
# Group Images into Captures
!python3 -m pre_processing.group_inventory_into_captures \
--inventory /home/packerc/shared/season_captures/{SITE}/inventory/{SEASON}_inventory.csv \
--output_csv /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--no_older_than_year 2017 \
--no_newer_than_year 2019

In [None]:
# Rename all images
!python3 -m pre_processing.rename_images \
--inventory /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/

In [None]:
# Generate Action List
!python3 -m pre_processing.create_action_list \
--captures /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures.csv \
--action_list_csv /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_action_list.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/ \
--plot_timelines

In [None]:
# Generate Actions
!python3 -m pre_processing.generate_actions \
--captures /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures.csv \
--action_list /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_action_list.csv \
--actions_to_perform_csv /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_actions_to_perform.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/

In [None]:
# Apply Actions
!python3 -m pre_processing.apply_actions \
--captures /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures.csv \
--actions_to_perform /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_actions_to_perform.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/

In [None]:
# Update Captures
!python3 -m pre_processing.update_captures \
--captures /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures.csv \
--captures_updated /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures_updated.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/

In [None]:
# Create Cleaned File
!cp /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures_updated.csv /home/packerc/shared/season_captures/{SITE}/cleaned/{SEASON}_cleaned.csv

In [None]:
# (optional) Start over if issues persist
!python3 -m pre_processing.create_action_list \
--captures /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_captures_updated.csv \
--action_list_csv /home/packerc/shared/season_captures/{SITE}/captures/{SEASON}_action_list2.csv \
--log_dir /home/packerc/shared/season_captures/{SITE}/log_files/

## Upload Data to Zooniverse

The following cells upload data to Zooniverse.

In [None]:
!python3 -m zooniverse_uploads.generate_manifest \
--captures_csv /home/packerc/shared/season_captures/{SITE}/cleaned/{SEASON}_cleaned.csv \
--output_manifest_dir /home/packerc/shared/zooniverse/Manifests/{SITE}/ \
--images_root_path /home/packerc/shared/albums/{SITE}/ \
--log_dir /home/packerc/shared/zooniverse/Manifests/{SITE}/ \
--manifest_id {SEASON} \
--attribution {ATTRIBUTION} \
--license {LICENSE}

In [None]:
# (Optional) - Generate machine learning file
!python3 -m zooniverse_uploads.create_machine_learning_file \
--manifest /home/packerc/shared/zooniverse/Manifests/{SITE}/{SEASON}__complete__manifest.json

In [None]:
# Generating Predictions -- Need to Run in a separate Terminal

In [None]:
# Add model predictions to Manifest
!python3 -m zooniverse_uploads.add_predictions_to_manifest \
--manifest /home/packerc/shared/zooniverse/Manifests/{SITE}/{SEASON}__complete__manifest.json

In [None]:
# Split / Batch Manifest
!python3 -m zooniverse_uploads.split_manifest_into_batches \
--manifest /home/packerc/shared/zooniverse/Manifests/{SITE}/{SEASON}__complete__manifest.json \
--log_dir /home/packerc/shared/zooniverse/Manifests/{SITE}/ \
--max_batch_size 20000
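To illustrate what a maximum batch size means, here is a minimal sketch that splits a mapping into consecutive chunks of at most `max_batch_size` entries. It is not the pipeline's implementation: the function name `split_into_batches` is hypothetical, the manifest is assumed to be a mapping keyed by a capture id, and the real script may distribute entries differently.

```python
def split_into_batches(items, max_batch_size):
    """Split a dict into consecutive dicts of at most max_batch_size entries."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    keys = list(items)
    return [
        {key: items[key] for key in keys[start:start + max_batch_size]}
        for start in range(0, len(keys), max_batch_size)
    ]


# Made-up manifest with 45,000 entries
manifest = {"capture_{}".format(i): {"images": []} for i in range(45000)}
batches = split_into_batches(manifest, max_batch_size=20000)
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```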

In [None]:
# Upload Manifest
!python3 -m zooniverse_uploads.upload_manifest \
--manifest /home/packerc/shared/zooniverse/Manifests/{SITE}/{SEASON}__complete__manifest.json \
--log_dir /home/packerc/shared/zooniverse/Manifests/{SITE}/ \
--project_id {PROJECT_ID} \
--password_file ~/keys/passwords.ini \
--image_root_path /home/packerc/shared/albums/{SITE}/

In [None]:
# We need to specify the subject_set_id in case of a failure
SUBJECT_SET_ID=''

In [None]:
# In case of an error when uploading the Manifest
!python3 -m zooniverse_uploads.upload_manifest \
--manifest /home/packerc/shared/zooniverse/Manifests/{SITE}/{SEASON}__complete__manifest.json \
--log_dir /home/packerc/shared/zooniverse/Manifests/{SITE}/ \
--project_id {PROJECT_ID} \
--subject_set_id {SUBJECT_SET_ID} \
--image_root_path /home/packerc/shared/albums/{SITE}/ \
--password_file ~/keys/passwords.ini

## Download Zooniverse Data

We can download Zooniverse data through Python. However, we first need to request the "Classification" and the "Subject" exports on Zooniverse and wait for the emails indicating that the exports are ready.

In [None]:
# Download Classifications
!python3 -m zooniverse_exports.get_zooniverse_export \
--password_file ~/keys/passwords.ini \
--project_id {PROJECT_ID} \
--output_file /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_classifications.csv \
--export_type classifications \
--log_dir /home/packerc/shared/zooniverse/Exports/{SITE}/

In [None]:
# Get Zooniverse Subject Data
!python3 -m zooniverse_exports.get_zooniverse_export \
--password_file ~/keys/passwords.ini \
--project_id {PROJECT_ID} \
--output_file /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_subjects.csv \
--export_type subjects \
--log_dir /home/packerc/shared/zooniverse/Exports/{SITE}/

In [None]:
# (Optional) Get Workflow ID / Workflow Version
!python3 -m zooniverse_exports.extract_annotations \
--classification_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_classifications.csv \
--output_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_annotations.csv

In [None]:
# Extract Annotations
!python3 -m zooniverse_exports.extract_annotations \
--classification_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_classifications.csv \
--output_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_annotations.csv \
--workflow_id {WORKFLOW_ID} \
--workflow_version_min {WORKFLOW_VERSION_MIN}
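The effect of `--workflow_id` and `--workflow_version_min` can be pictured as a row filter on the classification export. The sketch below is an assumption about that behavior, not the pipeline's code: it uses the `workflow_id` and `workflow_version` columns of the Zooniverse classification export and compares versions as (major, minor) pairs, which may differ from how the script compares them. The helper names and sample rows are made up.

```python
import csv
import io


def parse_version(text):
    """Turn '237.7' into (237, 7) and '275' into (275, 0)."""
    major, _, minor = str(text).partition(".")
    return int(major), int(minor or 0)


def filter_classifications(rows, workflow_id, workflow_version_min):
    """Keep rows of one workflow at or above a minimum workflow version."""
    minimum = parse_version(workflow_version_min)
    return [
        row for row in rows
        if row["workflow_id"] == str(workflow_id)
        and parse_version(row["workflow_version"]) >= minimum
    ]


# Made-up rows using the workflow ids from the parameter cells
sample_csv = io.StringIO(
    "classification_id,workflow_id,workflow_version\n"
    "1,8789,237.7\n"
    "2,8789,236.1\n"
    "3,4979,275.13\n"
)
rows = list(csv.DictReader(sample_csv))
kept = filter_classifications(rows, "8789", "237.7")
kept_ids = [row["classification_id"] for row in kept]
print(kept_ids)
```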

In [None]:
# Extract Subject Data
!python3 -m zooniverse_exports.extract_subjects \
--subject_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_subjects.csv \
--output_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_subjects_extracted.csv

## Aggregate Annotations

The following cells aggregate the annotations using the plurality algorithm, which assigns each subject the answer given by the largest number of volunteers.

In [None]:

!python3 -m zooniverse_aggregations.aggregate_annotations_plurality \
--annotations /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_annotations.csv \
--output_csv /home/packerc/shared/zooniverse/Aggregations/{SITE}/{SEASON}_annotations_aggregated_plurality.csv \
--log_dir /home/packerc/shared/zooniverse/Aggregations/{SITE}/
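The idea behind plurality voting can be sketched in a few lines. This is a simplified illustration with one label per annotation, not the implementation in `aggregate_annotations_plurality`; the function name, the tie-breaking rule (alphabetical) and the sample labels are assumptions.

```python
from collections import Counter, defaultdict


def aggregate_plurality(annotations):
    """Map each subject id to (most frequent label, share of votes)."""
    votes = defaultdict(Counter)
    for subject_id, label in annotations:
        votes[subject_id][label] += 1
    consensus = {}
    for subject_id, counter in votes.items():
        # Most votes first; ties are broken alphabetically for determinism
        label, count = min(counter.items(), key=lambda item: (-item[1], item[0]))
        consensus[subject_id] = (label, count / sum(counter.values()))
    return consensus


# Made-up annotations: (subject_id, label)
annotations = [
    ("s1", "zebra"),
    ("s1", "zebra"),
    ("s1", "wildebeest"),
    ("s2", "blank"),
]
consensus = aggregate_plurality(annotations)
print(consensus)
```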

In [None]:
# Now we add subject infos to the aggregations
!python3 -m zooniverse_exports.merge_csvs \
--base_csv /home/packerc/shared/zooniverse/Aggregations/{SITE}/{SEASON}_annotations_aggregated_plurality.csv \
--to_add_csv /home/packerc/shared/zooniverse/Exports/{SITE}/{SEASON}_subjects_extracted.csv \
--output_csv /home/packerc/shared/zooniverse/Aggregations/{SITE}/{SEASON}_annotations_aggregated_plurality_info.csv \
--key subject_id
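Merging on `subject_id` attaches the extracted subject information to each aggregated row. A minimal sketch of such a join is shown below, assuming left-join semantics (every row of the base file is kept); whether `merge_csvs` behaves exactly like this is an assumption, and the helper `left_join`, the column names other than `subject_id`, and the sample rows are made up.

```python
import csv
import io


def left_join(base_rows, to_add_rows, key):
    """Add the columns of to_add_rows to base_rows where the key matches."""
    lookup = {row[key]: row for row in to_add_rows}
    merged = []
    for row in base_rows:
        combined = dict(row)
        for column, value in lookup.get(row[key], {}).items():
            combined.setdefault(column, value)  # base columns take precedence
        merged.append(combined)
    return merged


base = list(csv.DictReader(io.StringIO(
    "subject_id,species\n"
    "101,zebra\n"
    "102,blank\n"
)))
to_add = list(csv.DictReader(io.StringIO(
    "subject_id,capture_id\n"
    "101,GRU_S1#A01#1#1\n"
)))
merged = left_join(base, to_add, key="subject_id")
print(merged)
```

Rows without a match (subject `102` here) are kept but receive no additional columns.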

## Create Reports

The following cells create several reports by joining subject / season data to the aggregated annotations.

In [None]:
# Reporting of Zooniverse exports - All Captures
!python3 -m reporting.add_aggregations_to_season_captures \
--season_captures_csv /home/packerc/shared/season_captures/{SITE}/cleaned/{SEASON}_cleaned.csv \
--aggregated_csv /home/packerc/shared/zooniverse/Aggregations/{SITE}/{SEASON}_annotations_aggregated_plurality_info.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_all.csv \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/ \
--default_season_id {SEASON} \
--deduplicate_subjects

In [None]:
# Create Report Statistics
!python3 -m reporting.create_report_stats \
--report_path /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_all.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_all_stats.csv \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/

In [None]:
# Reporting of Zooniverse exports - Only Consensus (and blanks)
!python3 -m reporting.add_aggregations_to_season_captures \
--season_captures_csv /home/packerc/shared/season_captures/{SITE}/cleaned/{SEASON}_cleaned.csv \
--aggregated_csv /home/packerc/shared/zooniverse/Aggregations/{SITE}/{SEASON}_annotations_aggregated_plurality_info.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report.csv \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/ \
--default_season_id {SEASON} \
--export_only_with_aggregations \
--deduplicate_subjects \
--export_only_consensus

In [None]:
# Create Report Statistics
!python3 -m reporting.create_report_stats \
--report_path /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_stats.csv \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/

In [None]:
# Create a Sample Report
!python3 -m reporting.sample_report \
--report_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_samples.csv \
--sample_size 300 \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/
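Drawing a sample of a fixed size, as `--sample_size 300` requests, can be sketched with the standard library. This illustrates simple random sampling without replacement under the assumption that this is what the sample report does; the helper `sample_rows` and its `seed` argument are hypothetical.

```python
import random


def sample_rows(rows, sample_size, seed=None):
    """Return up to sample_size rows drawn at random without replacement."""
    rng = random.Random(seed)
    if sample_size >= len(rows):
        return list(rows)
    return rng.sample(rows, sample_size)


# Made-up report with 1,000 rows
report_rows = [{"capture_id": "capture_{}".format(i)} for i in range(1000)]
sampled = sample_rows(report_rows, sample_size=300, seed=123)
print(len(sampled))
```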

In [None]:
# Reporting of Zooniverse exports - only captures with consensus species
!python3 -m reporting.add_aggregations_to_season_captures \
--season_captures_csv /home/packerc/shared/season_captures/{SITE}/cleaned/{SEASON}_cleaned.csv \
--aggregated_csv /home/packerc/shared/zooniverse/Aggregations/{SITE}/{SEASON}_annotations_aggregated_plurality_info.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_species.csv \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/ \
--default_season_id {SEASON} \
--export_only_species \
--deduplicate_subjects \
--export_only_consensus

In [None]:
# Create Report Statistics
!python3 -m reporting.create_report_stats \
--report_path /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_species.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_species_stats.csv \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/

In [None]:
# Create a Sample Report
!python3 -m reporting.sample_report \
--report_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_species.csv \
--output_csv /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/{SEASON}_report_species_samples.csv \
--sample_size 300 \
--log_dir /home/packerc/shared/zooniverse/ConsensusReports/{SITE}/