# Using the ml4qc package with SurveyCTO

This workbook demonstrates how to use the `ml4qc` Python package to work with [SurveyCTO](https://www.surveycto.com) data.

## Reading credentials and configuration

This example workbook begins by loading credentials and configuration from an `.ini` file stored in `~/.ocl/surveydata-surveycto-examples.ini`. The `~` in the path refers to the current user's home directory, and the `.ini` file contents are as follows:

    [aws]
    accesskeyid=idhere
    accesskeysecret=secrethere
    s3bucketname=bucketnamehere
    region=regionnamehere
    ddbtablename=tablenamehere

    [surveycto]
    server=servernamehere
    username=emailhere
    password=passwordhere
    formid=formidhere
    privatekey=-----BEGIN RSA PRIVATE KEY-----
     FROM THE SECOND LINE TO THE LAST
     EACH KEY LINE
     MUST BE INDENTED
     BY AT LEAST ONE SPACE
     -----END RSA PRIVATE KEY-----

Feel free to update the path in `inifile_location` below, and only include properties as needed for the example cases you wish to execute.
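To make the indentation rule and the "only include properties as needed" advice concrete, here is a minimal sketch using only the standard-library `configparser`. The sample content, including the `KEYLINE1` and `KEYLINE2` placeholders, is made up for illustration. Using `fallback=None` is one possible way to tolerate omitted properties; it is not necessarily how this workbook reads them.

```python
import configparser

# Illustrative .ini content only; all values are placeholders, not real credentials
sample_ini = """\
[surveycto]
server=servernamehere
privatekey=-----BEGIN RSA PRIVATE KEY-----
 KEYLINE1
 KEYLINE2
 -----END RSA PRIVATE KEY-----
"""

parser = configparser.RawConfigParser()
parser.read_string(sample_ini)

# indented continuation lines are joined into a single multi-line value,
# with the leading indentation stripped from each line
private_key = parser.get("surveycto", "privatekey")
key_lines = private_key.splitlines()
print(key_lines)

# fallback=None returns None instead of raising when a property is omitted
form_id = parser.get("surveycto", "formid", fallback=None)

# a whole section may be absent too, so it can be checked before reading from it
has_aws = parser.has_section("aws")
print(form_id, has_aws)
```

Without a `fallback`, `get()` raises `NoOptionError` or `NoSectionError` for anything missing from the file.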

*A note on SurveyCTO access:* While the `surveydata` Python package requires only a read-only login to SurveyCTO, this `ml4qc` package offers the option to submit submission reviews, which also requires write access.

In [1]:
# for convenience, auto-reload modules when they've changed
%load_ext autoreload
%autoreload 2

import configparser
import os

# load credentials and other configuration from a local ini file
inifile_location = os.path.expanduser("~/.ocl/surveydata-surveycto-examples.ini")
inifile = configparser.RawConfigParser()
inifile.read(inifile_location)

# load SurveyCTO credentials and configuration
scto_server = inifile.get("surveycto", "server")
scto_username = inifile.get("surveycto", "username")
scto_password = inifile.get("surveycto", "password")
scto_formid = inifile.get("surveycto", "formid")
# the private key is optional (only needed for encrypted forms), so fall back to an empty string
scto_private_key = inifile.get("surveycto", "privatekey", fallback="")

# load AWS credentials and configuration (None when a property is omitted from the .ini file)
aws_accesskey_id = inifile.get("aws", "accesskeyid", fallback=None)
aws_accesskey_secret = inifile.get("aws", "accesskeysecret", fallback=None)
s3_bucketname = inifile.get("aws", "s3bucketname", fallback=None)
aws_region = inifile.get("aws", "region", fallback=None)
ddb_tablename = inifile.get("aws", "ddbtablename", fallback=None)

## Synchronizing data between SurveyCTO and local file system

To start, we'll synchronize data directly between SurveyCTO and the local file system. The synchronization process will be efficient, using a stored cursor to only download and store new or updated data, and it will include both submission data and all attachments. We'll load all data and [text audits](https://docs.surveycto.com/02-designing-forms/01-core-concepts/03zd.field-types-text-audit.html) into DataFrames. We'll also specify that we want submissions with *any* review status (pending, approved, or rejected), as the default sync only includes approved submissions.
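The idea of a stored cursor can be illustrated with a small, self-contained sketch. This is a toy model and not the `surveydata` implementation: `SERVER_LOG`, `fetch_since`, and `sync` are hypothetical names, and the "server" is just an in-memory list with sequence numbers.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for a server: (sequence number, submission ID, data)
SERVER_LOG: List[Tuple[int, str, dict]] = [
    (1, "uuid:a", {"q1": "yes"}),
    (2, "uuid:b", {"q1": "no"}),
]


def fetch_since(cursor: Optional[int]) -> Tuple[List[Tuple[str, dict]], Optional[int]]:
    """Return records with a sequence number above the cursor, plus the new cursor."""
    start = cursor or 0
    fresh = [(sub_id, data) for seq, sub_id, data in SERVER_LOG if seq > start]
    new_cursor = max((seq for seq, _, _ in SERVER_LOG), default=cursor)
    return fresh, new_cursor


def sync(store: Dict[str, dict], state: Dict[str, Optional[int]]) -> List[str]:
    """Download only new or updated records, save them, and advance the stored cursor."""
    fresh, new_cursor = fetch_since(state.get("cursor"))
    for sub_id, data in fresh:
        store[sub_id] = data
    state["cursor"] = new_cursor
    return [sub_id for sub_id, _ in fresh]


store: Dict[str, dict] = {}
state: Dict[str, Optional[int]] = {"cursor": None}

first = sync(store, state)    # no cursor yet, so both records are downloaded
second = sync(store, state)   # nothing changed, so nothing is downloaded

SERVER_LOG.append((3, "uuid:b", {"q1": "maybe"}))  # an update to an existing record
third = sync(store, state)    # only the updated record is downloaded
print(first, second, third)
```

Because the cursor is kept alongside the stored data, repeated runs transfer only what changed since the previous run.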

The SurveyCTO credentials and form ID will be those loaded earlier from the `.ini` file. We recommend creating a new user role for API use that grants API access and read-only access to forms and data (plus *Can modify or delete data* access if you will be submitting submission reviews later on). The `privatekey` property in the `.ini` file is optional and is only needed when the SurveyCTO form is encrypted.

In this example, data is synchronized to the `~/Files/surveydata/formid/` folder tree, where `~` refers to the current user's home directory and `formid` is the SurveyCTO form ID loaded from the `.ini` file.

In [2]:
from ml4qc.surveyctomlplatform import SurveyCTOMLPlatform
from surveydata.filestorage import FileStorage

# initialize the survey platform connection
platform = SurveyCTOMLPlatform(scto_server, scto_username, scto_password, scto_formid, scto_private_key)

# initialize the local file storage location
storage = FileStorage(os.path.expanduser("~/Files/surveydata/" + scto_formid + "/"))

# synchronize data to ensure storage is up-to-date
new_submissions = platform.sync_data(storage, review_statuses=["pending", "approved", "rejected"])
print(f"Count of new submissions sync'd to storage: {len(new_submissions)}")
print(f"List of new submissions sync'd to storage: {new_submissions}")
print()

# output details about what's present in storage
print(f"Submissions in storage: {storage.list_submissions()}")
print()
print(f"Attachments in storage: {storage.list_attachments()}")
print()

# load all submissions into DataFrame and describe contents
submissions_df = SurveyCTOMLPlatform.get_submissions_df(storage)
print("Submission DataFrame field counts:")
print(submissions_df.count(0))
print()

# summarize submission review statuses
print("Submission DataFrame review statuses:")
print(submissions_df.review_status.value_counts())
print()

# load all text audits into DataFrame and describe contents
textaudit_df = SurveyCTOMLPlatform.get_text_audit_df(storage, location_strings=submissions_df.textaudit)
if textaudit_df is not None:
    print("Text audit DataFrame field counts:")
    print(textaudit_df.count(0))
else:
    print("No text audits found.")

Count of new submissions sync'd to storage: 0
List of new submissions sync'd to storage: []

Submissions in storage: ['uuid:a45f2d93-af11-43db-842f-e2227f022c6e', 'uuid:5e5a40ce-bce2-4225-856e-224f13f3fafa', 'uuid:d8880923-8a5d-4c7a-9b66-994a496b2ae8', 'uuid:7fac8029-4b31-49f5-83e1-ef9bf7ac1db0', 'uuid:d5f8b82d-9ef1-41ff-afc2-16eab7b8275d', 'uuid:8d744680-238a-454d-8e83-d168b1da1aaf', 'uuid:f32f8fde-44e1-44fb-9a45-0fc4960b7a77', 'uuid:66767ff8-919e-44d3-b6db-784091a3de37']

Attachments in storage: [{'name': 'AA_f32f8fde-44e1-44fb-9a45-0fc4960b7a77_AFTER_0S.m4a', 'submission_id': 'uuid:f32f8fde-44e1-44fb-9a45-0fc4960b7a77', 'location_string': 'file:/Users/crobert/Files/surveydata/all_fields_for_testing_enc/uuid%3Af32f8fde-44e1-44fb-9a45-0fc4960b7a77/AA_f32f8fde-44e1-44fb-9a45-0fc4960b7a77_AFTER_0S.m4a'}, {'name': 'TA_f32f8fde-44e1-44fb-9a45-0fc4960b7a77.csv', 'submission_id': 'uuid:f32f8fde-44e1-44fb-9a45-0fc4960b7a77', 'location_string': 'file:/Users/crobert/Files/surveydata/all_fields

## Submitting submission updates (commenting and/or updating review status and quality classification)

Below is example code for updating submissions with comments and/or reviews. Any number of updates can be submitted in a single batch, but *Can modify or delete data* access is required for those updates to be accepted by the server.
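When assembling larger batches, a small helper can keep the update dictionaries consistent. The sketch below is hypothetical and not part of `ml4qc`: `build_update` is an illustrative name, it uses only the dictionary keys that appear in this workbook's examples, and the rule that an update must carry at least one field is an assumption made here.

```python
from typing import Dict, Optional


def build_update(submission_id: str,
                 comment: Optional[str] = None,
                 review_status: Optional[str] = None,
                 quality_classification: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build one update dict containing only the fields supplied."""
    update = {"submissionID": submission_id}
    if comment is not None:
        update["comment"] = comment
    if review_status is not None:
        update["reviewStatus"] = review_status
    if quality_classification is not None:
        update["qualityClassification"] = quality_classification
    # assumption for this sketch: an update with no fields is treated as a mistake
    if len(update) == 1:
        raise ValueError("an update needs a comment, review status, or quality classification")
    return update


# a batch mixing a comment-only update with a rejection
submission_updates = [
    build_update("uuid:5e5a40ce-bce2-4225-856e-224f13f3fafa",
                 comment="Checked by supervisor"),
    build_update("uuid:a45f2d93-af11-43db-842f-e2227f022c6e",
                 review_status="rejected", quality_classification="poor"),
]
print(submission_updates)
```

A list built this way has the same shape as the `submission_updates` lists passed to `platform.update_submissions()` in this workbook.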

After running the following cell, re-run the synchronization cell above to re-sync with the server and fetch the updated submission data.

In [51]:
# try submitting one or more submission updates

# organize update bundle (list of individual updates)
#   example: just add a comment to a pending submission
submission_updates=[{"submissionID": "uuid:5e5a40ce-bce2-4225-856e-224f13f3fafa", "comment": "Another custom comment added via Python"}]
#   example: update submission with "okay" quality classification
#submission_updates=[{"submissionID": "uuid:5e5a40ce-bce2-4225-856e-224f13f3fafa", "qualityClassification": "okay"}]
#   example: reject submission with no quality classification
#submission_updates=[{"submissionID": "uuid:5e5a40ce-bce2-4225-856e-224f13f3fafa", "reviewStatus": "rejected"}]
#   example: reject submission with "poor" quality classification
#submission_updates=[{"submissionID": "uuid:5e5a40ce-bce2-4225-856e-224f13f3fafa", "reviewStatus": "rejected", "qualityClassification": "poor"}]
#   example: revert submission back to pending (unreviewed) status
#submission_updates=[{"submissionID": "uuid:5e5a40ce-bce2-4225-856e-224f13f3fafa", "reviewStatus": "none", "comment": "(Example custom comment to explain why we're reverting back to pending status)"}]
#   example: revert a different submission back to pending (unreviewed) status
#submission_updates=[{"submissionID": "uuid:a45f2d93-af11-43db-842f-e2227f022c6e", "reviewStatus": "none", "comment": "(Example custom comment to explain why we're reverting back to pending status)"}]

#   submit bundle of reviews
platform.update_submissions(submission_updates)

## Loading data from SurveyCTO export

Here, we'll take the simplest case: wide-format data exported from [SurveyCTO Desktop](https://docs.surveycto.com/05-exporting-and-publishing-data/02-exporting-data-with-surveycto-desktop/01.using-desktop.html). The `surveydata` package makes it easy to load all submissions into a Pandas DataFrame, and to load all [text audits](https://docs.surveycto.com/02-designing-forms/01-core-concepts/03zd.field-types-text-audit.html) into a DataFrame when needed.

This example doesn't utilize any external services or require any credentials, so it doesn't use anything loaded from the `.ini` file above. It references a wide-format export file in the location exported by SurveyCTO Desktop, and it presumes that the `media` subdirectory is also present with all attachments.
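Before pointing the storage class at an export, it can help to confirm that the expected files are actually there. The sketch below is a hypothetical pre-flight check using only the standard library. `check_export_layout` is an illustrative name, and it assumes the `media` subdirectory sits next to the export file.

```python
import os


def check_export_layout(export_file: str) -> dict:
    """Hypothetical helper: report whether the export file and a sibling media folder exist."""
    export_path = os.path.expanduser(export_file)
    # assumption: attachments live in a "media" folder alongside the export file
    media_dir = os.path.join(os.path.dirname(export_path), "media")
    return {
        "export_file": export_path,
        "export_exists": os.path.isfile(export_path),
        "media_dir": media_dir,
        "media_exists": os.path.isdir(media_dir),
    }


layout = check_export_layout("~/ml4qc-data/collab1/cati1/collab1.csv")
print(layout)
```

The `media_exists` result could then inform whether to pass `attachments_available=True` when initializing storage.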

In [58]:
%%time

from ml4qc.surveyctomlplatform import SurveyCTOMLPlatform
from surveydata.surveyctoexportstorage import SurveyCTOExportStorage

# initialize local storage with wide-format export and attachments_available=True since media subdirectory is present
storage = SurveyCTOExportStorage(export_file=os.path.expanduser("~/ml4qc-data/collab1/cati1/collab1.csv"), attachments_available=True)

# output details about what's present in storage
print(f"Submissions in storage: {storage.list_submissions()}")
print()
# note that we can't list attachments in storage since the media directory can mix attachments from multiple forms
#print(f"Attachments in storage: {storage.list_attachments()}")
#print()

# load all submissions into DataFrame and describe contents
submissions_df = SurveyCTOMLPlatform.get_submissions_df(storage)
print("Submission DataFrame field counts:")
print(submissions_df.count(0))
print()

# summarize submission review status and quality
print("Submission DataFrame review status and quality:")
print(submissions_df.review_status.value_counts())
print(submissions_df.review_quality.value_counts())
print()

# load all text audits into DataFrame and describe contents
textaudit_df = SurveyCTOMLPlatform.get_text_audit_df(storage, location_strings=submissions_df.TA)
if textaudit_df is not None:
    print("Text audit DataFrame field counts:")
    print(textaudit_df.count(0))
else:
    print("No text audits found.")

Submissions in storage: ['uuid:3a471cdd-713e-4438-a24b-23d139e3d2cd', 'uuid:5929c06a-b550-48f2-94a4-22395829f030', 'uuid:602b477a-c9e8-4c67-b2df-8f7678ec8a82', 'uuid:95b1f038-53c2-4bb9-89b7-9e9eb639a8bc', 'uuid:73444084-fadf-4b92-8f82-bf854f27ba32', 'uuid:331f942c-e10f-4c8d-a01d-263cd1bf8d39', 'uuid:69f6b2da-88dd-4545-b2be-c6fe8ea9e1d7', 'uuid:482fdb19-d48e-4180-95b7-d4047cef7938', 'uuid:3c7c24d3-04a1-41e2-a7e7-c45ebd8e0e7f', 'uuid:e02be47b-2d68-46c0-a5ce-169c25f8fb79', 'uuid:7731b3db-b0a0-4607-930b-2caf0c78903f', 'uuid:994c15a6-154d-450a-b06b-38f89399ec9e', 'uuid:afcc9212-b1c1-46ec-9589-1f60ba77e27f', 'uuid:0db26d04-8537-4e16-a1dd-3cfa7e8fcfc8', 'uuid:02b1ddc8-04a9-4c6b-91a3-ab4497e87a49', 'uuid:8c431018-5d14-494b-beb6-2d386cbe63a5', 'uuid:4a26a26d-de19-4e4d-ba6b-0d14f7c64272', 'uuid:f1bded65-cf6e-4e7d-907c-4dfd1aff8cb4', 'uuid:1ebc55c1-fd7a-496a-8711-6c32e7e3e11b', 'uuid:e0727fb9-b35a-42f8-b7a9-9f3407366cdb', 'uuid:8a9ce6b4-e270-479c-a532-cc6fe7246b1b', 'uuid:24e3a4d9-79d8-4549-a96f-

In [45]:
# work in progress (testing): combine text audit columns and summarize the text audits

from pytz import timezone

# create combined textaudit column
submissions_df["all_ta"] = submissions_df["textaudit"] + submissions_df["textaudit_full"]

# load combined text audits into DataFrame and describe contents
textaudit_combined_df = SurveyCTOMLPlatform.get_text_audit_df(storage, location_strings=submissions_df["all_ta"])
if textaudit_combined_df is not None:
    print("Text audit DataFrame field counts:")
    print(textaudit_combined_df.count(0))
else:
    print("No text audits found.")

ta_summary = SurveyCTOMLPlatform.process_text_audits(
    textaudit_combined_df,
    submissions_df["starttime"],
    submissions_df["endtime"],
    storage.get_data_timezone(),
    timezone("US/Eastern"),
)

Text audit DataFrame field counts:
device_time      46
form_time_ms     46
field            53
event            46
duration_ms      37
Choice values     2
Choice labels     2
duration_s       16
visited_s        16
dtype: int64
