## Advanced project building

In [None]:
%load_ext lab_black

In [None]:
import pandas as pd
import getpass
import glob
import os
import io

### Define a handy reference to the data location for this tutorial

In [None]:
dataDirectory = ??

### Import the Panoptes Python Client 
The Panoptes Python Client allows you to manage your project using Python scripts. This can be very useful if you need to add large subject sets, request data exports or modify the properties of your subjects. It also allows you to set up advanced subject-specific training and feedback behaviour.

You may need to install the `panoptes_client` and `pandas` packages first, for example with `pip install panoptes-client pandas`. Once they are installed, run the import in the next cell.

In [None]:
from panoptes_client import Panoptes, SubjectSet, Subject, Project

#### Authenticate with the Panoptes API.

In [None]:
username = input("Enter Panoptes Username:")
password = getpass.getpass("Enter Panoptes Password:")

In [None]:
panoptesClient = Panoptes.connect(username=username, password=password)

#### Find our project
Enter your project's ID in the field below.

In [None]:
projectId = ??

We can use the `find` method on the `Project` class, passing our project's ID.

In [None]:
project = Project.find(projectId)

Let's check we found the right project and display some details about it.

In [None]:
print("Name:", project.display_name)
print("Description:", project.description)
print("Introduction:", project.introduction)
print("Number of Subjects:", project.subjects_count)
print("Number of Classifications:", project.classifications_count)

### Now let's create a new subject set using the Python API
Start by instantiating a new `SubjectSet` object and giving it a `display_name`

In [None]:
subjectSet = SubjectSet()
subjectSet.display_name = "Python Demo Set"

We must also link this subject set with our project. We do so like this.

In [None]:
subjectSet.links.project = project

For now, our subject set only exists in our computer's memory. To actually add it to our project, we must use the `save` method. When we save our new subject set, the Panoptes API sends us a response to let us know that the save succeeded.

In [None]:
response = subjectSet.save()

We've successfully created a new subject set, but it's empty! We need to add some subjects to it!

### Creating new subjects
Subjects are created in Python as instances of the `Subject` class, for example:
```python
newSubject = Subject()
```
Once we have created an empty `Subject`, we need to link it with our `Project`, like this:
```python
newSubject.links.project = project
```
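Now we can add one or more images (or other media types) to it using its `add_location` method, calling it once for each file.
```python
# add_location takes one location per call, so call it once per image.
newSubject.add_location("image1.jpg")
newSubject.add_location("image2.jpg")
```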
We can also use the Python client to add metadata to our `Subject` by assigning a Python `dict` to its `metadata` attribute, like this:
```python
newSubjectMetadata = { "Number of Images" : 2 }
newSubject.metadata = newSubjectMetadata
```
Just like `SubjectSet`s, subjects must also be `save`d.
```python
newSubject.save()
```
Finally, we should add our new subject to our `SubjectSet`.
```python
subjectSet.add(newSubject)
```
Now let's try this out for real! We'll use the images in our second demo test set (`demoset_10-20`). Let's get a list of those files.

In [None]:
imageFiles = glob.glob(os.path.join(dataDirectory, "demoset_10-20/*jpg"))
imageFiles

We'll use those files to create a list of new `Subjects` and add some metadata.

In [None]:
newSubjects = []

for imageFile in imageFiles:

    newSubject = Subject()
    newSubject.links.project = project
    newSubject.add_location(imageFile)
    newSubject.metadata = {
        "Origin": "Python demo",
        "image": os.path.basename(imageFile),
        "#Local File Path": imageFile,
    }
    newSubject.save()

    newSubjects.append(newSubject)
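
The metadata dictionary built inside the loop can be factored into a small helper, which makes it easy to check offline before anything is uploaded. This is only a minimal sketch: `buildSubjectMetadata` and the example path are hypothetical names introduced for illustration, not part of the Panoptes client.

```python
import os


def buildSubjectMetadata(imageFile, origin="Python demo"):
    """Return the metadata dict used for one subject in the loop above."""
    return {
        "Origin": origin,
        # The bare file name is what we later use to look up extra data.
        "image": os.path.basename(imageFile),
        # Key name copied from the loop above, including its leading "#".
        "#Local File Path": imageFile,
    }


# Hypothetical path, used only for illustration.
exampleMetadata = buildSubjectMetadata(os.path.join("demoset_10-20", "12.jpg"))
print(exampleMetadata["image"])
```

Inside the upload loop, the helper's return value could be assigned directly to `newSubject.metadata`.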

Now we can assign our newly uploaded `Subject`s to the `SubjectSet` we already created.

In [None]:
subjectSet.add(newSubjects)

### Improving the volunteer experience
Our volunteers are an enthusiastic bunch so we decide to reward them by providing some links to extra information about the subjects they're classifying. We'll add these links to the existing subjects as metadata.

**But**, we don't want to influence our volunteers' decisions, so we'll add metadata that can only be seen from the _Talk_ interface, **after** the volunteers have finished their classifications. To do that, we prepend our metadata names with a `!` character.

Fortunately, _Adam McMaster_ has already done the hard work of finding and associating the information with the SuperWASP targets. We just need to use the Python client (also developed by Adam, by the way!) to update the metadata of our subjects. We can load those data from a CSV file to get started.

In [None]:
extraSubjectInfo = pd.read_csv(
    os.path.join(dataDirectory, "extendedManifest.csv")
).set_index("Subject")
extraSubjectInfo

### Getting a list of subject sets for our project

Let's imagine that we haven't already got a reference to our subject sets. Maybe we are updating our project after it's run for a few months. How would we easily get a list of all the subject sets to update? If we know our project's ID number, it's easy!

In [None]:
project = Project.find(projectId)
subjectSets = project.links.subject_sets
for subjectSet in subjectSets:
    print(subjectSet.display_name)

Now we can loop over the subjects in all our sets and update their metadata appropriately. Every `SubjectSet` has an attribute `subjects` which points to a list of the `Subject`s it contains. 

Luckily, all our subjects have an `image` metadatum that we can use to look up the extra metadata in the table we loaded.

We just use a bit of Python magic to turn the data in the right row into a `dict` with the table column headings as the keys and the row entries as values. We mustn't forget to add a `!` character to the beginning of the key strings. Once that's done we can use our `dict` to update each `Subject`'s metadata.

Don't forget to `save` the subject afterwards!

In [None]:
for subjectSet in subjectSets:
    for subject in subjectSet.subjects:
        imageId = int(subject.metadata["image"][:-4])
        extraData = {"!" + k: v for k, v in dict(extraSubjectInfo.loc[imageId]).items()}
        subject.metadata.update(extraData)
        subject.save()
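
The two steps inside that loop (turning a file name into a lookup key and prefixing the keys with `!`) can be tried out without touching the project. The sketch below is an illustration only: `imageIdFromName`, `talkOnlyMetadata` and the column names in `exampleRow` are hypothetical, and a plain `dict` stands in for a row of the table.

```python
import os


def imageIdFromName(fileName):
    """Turn a file name such as "12.jpg" into the integer 12."""
    stem, _extension = os.path.splitext(fileName)
    return int(stem)


def talkOnlyMetadata(row):
    """Prefix every key with "!" so the value is only shown in Talk."""
    return {"!" + str(key): value for key, value in row.items()}


# Hypothetical row standing in for one line of the CSV table.
exampleRow = {"Period": 1.5, "Link": "https://example.org/target/12"}

print(imageIdFromName("12.jpg"))
print(talkOnlyMetadata(exampleRow))
```

Using `os.path.splitext` rather than slicing off the last four characters also copes with extensions of other lengths, such as `.jpeg`.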

Now we can check in our web browser to see whether the metadata were added properly!

### Downloading Classification and Subject Data
The `Project` and `Workflow` classes are both "exportable". This means that you can download a data export in the same way you would from the _Exports_ tab in the project builder.

To download a data export we call the `get_export` method on a `Project` or `Workflow` instance. When you call `get_export` you **must** specify an export type as either `"subjects"` or `"classifications"`. You also have a few other options:

* `generate`  - If true then the requested data export will be generated if it does not already exist and regenerated if it does. The default is `False`.
* `wait` -  If (re)generation is requested and `wait` is `True`, then your program will wait for the new export to be generated and then download it (remember, for large exports this might take a long time). On the other hand if `wait` is `False` (the default) then a new export will be requested but the program will not wait. You should wait until you get an email telling you that your export is ready, then call `get_export` again with `generate=False`.
* `wait_timeout` - the maximum length of time (in seconds) that you are willing to wait for a new export to be generated.

Here's how to get the classification and subject exports for our project using Python. In the first case, we know that we already generated a classifications export from the project builder, so we leave `generate` as its default `False` value. In the second case, we specify that we want to generate a new subjects export (because we've updated the metadata) and that we're willing to wait for the export to finish.

In [None]:
classificationsResponse = project.get_export("classifications")
subjectsResponse = project.get_export("subjects", generate=True, wait=True)

These calls return `Response` objects (from the `requests` library), which aren't particularly useful without some processing. Luckily, we can use the _Pandas_ package to get them into a more recognizable form.

In [None]:
classifications = pd.read_csv(io.BytesIO(classificationsResponse.content))
subjects = pd.read_csv(io.BytesIO(subjectsResponse.content))

In [None]:
subjects
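
Structured fields in the exports, such as the subject metadata, usually arrive as JSON text inside a single CSV column, so they need one more decoding step before they can be used as dictionaries. The sketch below shows that step on a made-up two-line export using only the standard library. The bytes in `exampleContent` and the column names `subject_id` and `metadata` are assumptions for illustration, so check them against the columns of your own export.

```python
import csv
import io
import json

# Made-up bytes standing in for subjectsResponse.content; the column
# names are an assumption about the export layout.
exampleContent = (
    b"subject_id,metadata\n"
    b'1,"{""image"": ""12.jpg"", ""Origin"": ""Python demo""}"\n'
)

rows = list(csv.DictReader(io.StringIO(exampleContent.decode("utf-8"))))
for row in rows:
    # Decode the JSON string into a regular Python dict.
    row["metadata"] = json.loads(row["metadata"])

print(rows[0]["metadata"]["image"])
```

With a _Pandas_ table the same decoding could be done by applying `json.loads` to the relevant column.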

### Setting up Feedback and Volunteer Training
The Zooniverse platform includes an **experimental** feature that allows you to provide immediate feedback to volunteers after they make a classification. This can be very useful if you want to improve volunteers' confidence when they first join the project, or provide training if your analysis task is somewhat complicated.

The fact that feedback is an experimental feature means that you need to submit a request for it to be activated on your project. Send an email to `contact@zooniverse.org` explaining that you would like feedback to be switched on and your reasons for making the request. Once feedback is enabled, a new set of options will appear in the task definition section of the project builder.

Feedback is available for several of the classification and annotation tools, including:
1. Point mark drawing tools - you can specify a circular or elliptical region into which the volunteer's mark must fall in order to be deemed "correct". If they place a mark in the region, the system informs them that they scored a "hit", otherwise they are informed that they missed, and the correct solution is displayed.
2. Question tasks - if a **single** answer is required, you can specify the correct answer and inform the volunteer if they got it right. 

For our project all our tasks are _single-answer questions_, so we'll demonstrate how to set up that type of feedback. 

1. The first step in setting up feedback is to specify a set of _task-level_ "feedback rules" in the project builder. 
2. Once you have done that, we need some way of specifying the correct answer for each subject. Since the correct answer for each task will likely vary from subject to subject, we use _hidden_ metadata on specific subjects to specify it.
3. If the feedback system doesn't find any relevant hidden metadata then no feedback is displayed - the subject is treated as normal.

So how do we specify those hidden metadata? Well, all metadata associated with the feedback system have names starting with `#feedback_`. The next part of the name (before a final suffix) is an integer like `1` or `2`, so our metadatum name might begin with `#feedback_1` or `#feedback_2`. The number allows you to associate several different feedback _rules_ with the same subject. When you set up rules in the project builder, you assign them all a unique ID number. The number in the metadatum name lets you map the information in that metadatum to a particular rule.

However, **be careful**! The number in the metadatum name is not _necessarily_ the rule ID number. We link a metadatum name number with a rule ID using the `_id` suffix for the metadatum name. For example, if we want to link all metadata for a particular subject with the number `2` in their **name** to the **rule with ID** `3`, we would add a metadatum to that subject like this:
```python
subject.metadata.update({ "#feedback_2_id" : "3" })
```
Yes, it's a bit complicated but it does provide extra flexibility to the system so that you can have different feedback for different subjects and tasks in a workflow. Now, what other metadata do we need to specify? 

For the single answer question task, we need to specify the correct answer for each subject too! We do that using the `_answer` suffix. The value of this metadatum corresponds to the position of the correct answer in the list of options you specified in the project builder, **starting from zero**. So for example, if the possible answers were:
* Good
* Bad
* Ugly

and the correct answer was "Ugly" (at position `2`), then we would add a metadatum to our subject like this:
```python
subject.metadata.update({ "#feedback_2_answer" : 2 })
```
Remember the `2` in `"#feedback_2_answer"` refers to the rule-to-metadatum mapping, while the value `2` refers to the position of the correct answer.

Finally we also have an option to add subject-specific messages that will be shown to the volunteer. If you don't specify a subject-specific message, then the workflow/task generic message that must be specified in the project builder will be displayed. 

Let's assume that we know a particular subject is really difficult and we want to especially congratulate volunteers who get the right answer (or commiserate with those who get it wrong). We can do that using the `_successMessage` and `_failureMessage` suffixes. For our example we would add the following metadata.

```python
subject.metadata.update({ 
    "#feedback_2_successMessage" : "Well done! That one was tricky!" ,
    "#feedback_2_failureMessage" : "Hard luck. That one was very difficult!"
})
```
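
Because the naming scheme has several parts, it can help to check a subject's metadata before saving it. The following sketch groups feedback metadata by the number in their names and flags any group without an `_id` entry. It is only our illustration of the convention described above: `FEEDBACK_KEY`, `groupFeedbackMetadata` and the example values are hypothetical, and none of this is part of the Panoptes client.

```python
import re

# Matches names such as "#feedback_2_answer":
# group 1 is the number, group 2 is the suffix.
FEEDBACK_KEY = re.compile(r"^#feedback_(\d+)_(\w+)$")


def groupFeedbackMetadata(metadata):
    """Collect feedback metadata into one dict per metadatum number."""
    rules = {}
    for key, value in metadata.items():
        match = FEEDBACK_KEY.match(key)
        if match is None:
            continue  # Not a feedback metadatum, so ignore it.
        number, suffix = int(match.group(1)), match.group(2)
        rules.setdefault(number, {})[suffix] = value
    return rules


exampleMetadata = {
    "image": "12.jpg",
    "#feedback_2_id": "3",
    "#feedback_2_answer": 2,
    "#feedback_2_successMessage": "Well done! That one was tricky!",
}
grouped = groupFeedbackMetadata(exampleMetadata)
print(grouped)

# Every group needs an "id" entry to be linked to a rule.
missingId = [number for number, fields in grouped.items() if "id" not in fields]
print(missingId)
```

An empty `missingId` list means every numbered group of feedback metadata is linked to a rule ID.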

Okay, let's add feedback metadata to some of the subjects in our project. We'll assume that we've already inspected the images, so we know the correct answers for a set of images. We'll also assume that we identified a couple of difficult ones too. We'll read those data in from a file using the _Pandas_ package.

In [None]:
correctAnswers = pd.read_csv(
    os.path.join(dataDirectory, "demoset_0-10/correctAnswers.csv")
).set_index("image")
correctAnswers

Now we can loop over all our subjects again and add feedback to the ones we know the answer for (as it happens they're all from the set we uploaded in the project builder!). For each subject, we'll check whether we have an answer and, if we do, we'll create some new metadata and add it to our subject.

In [None]:
def makeFeedbackMetadata(answer, rule=1, number=1):
    feedbackMetadata = {
        f"#feedback_{number}_id": f"{rule}",
        f"#feedback_{number}_answer": f"{answer.category_position}",
    }
    if answer.difficult:
        feedbackMetadata.update(
            {
                f"#feedback_{number}_successMessage": "Well done! That one was tricky!",
                f"#feedback_{number}_failureMessage": "Hard luck. That was a tricky one",
            }
        )

    return feedbackMetadata


for subjectSet in subjectSets:
    for subject in subjectSet.subjects:
        imageId = int(subject.metadata["image"][:-4])
        if imageId in correctAnswers.index:
            subject.metadata.update(makeFeedbackMetadata(correctAnswers.loc[imageId]))
        subject.save()

### Tapering feedback frequency
Your volunteers will probably benefit most from feedback when they first join the project. After a while, it might just become annoying!

The Zooniverse platform allows you to control how frequently subjects with feedback are shown to volunteers, depending on how many classifications they have performed. Note that because the system needs to know how many subjects a volunteer has classified, it **only works for users who have a Zooniverse account and have logged in**.

Technically, this works by controlling how frequently subjects are selected from **different subject sets**, so you would achieve tapered feedback by putting all your subjects with feedback in their own subject set and then tapering the frequency with which subjects are drawn from that set.

Luckily, we have two subject sets and we just added feedback metadata to one of them, so let's see how we'd implement tapered feedback. We do this by setting some attributes of the _Workflow_ configuration.

Zooniverse workflows are represented in Python as instances of the `Workflow` class. We can get a reference to the workflow for our project like this.

In [None]:
workflow = project.links.workflows[0]
print(workflow.display_name)

Now that we have access to our workflow in Python we need to edit its configuration `dict` and add three new entries.

1. `training_set_ids` - a Python `list` holding IDs of the subject sets that contain "training subjects" i.e. those with feedback metadata.
2. `training_chances` - a Python `list` containing values between 0 and 1. Each value specifies the probability that a particular subject that the volunteer sees will be drawn from one of the training sets. The position of the value in the list corresponds to the number of classifications by the volunteer, so if the first value in the list is 1, then the first subject the volunteer sees is guaranteed to be a training subject. If the fifth value in the list is 0.5 then there is only a 50% chance that this will be a training subject.
3. `training_default_chance` - Once the `training_chances` list is exhausted, then the probability for selecting subjects from the training sets will assume this value.

For this example, we will force the first three subjects to be training subjects, specify a 50% chance for the next three and then default to a probability of zero thereafter. We do that like this:

In [None]:
workflow.configuration["training_set_ids"] = [
    int(sset.id)
    for sset in project.links.subject_sets
    if sset.display_name == "Demo Set"
]
workflow.configuration["training_chances"] = [1, 1, 1, 0.5, 0.5, 0.5]
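workflow.configuration["training_default_chance"] = 0

# Persist the edited configuration; until save() is called the changes are local only.
workflow.save()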
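
To see what this schedule implies for a volunteer, we can work out the training probability for their first few classifications. The sketch below follows our reading of the description above and is not the platform's own code. `trainingChance` is a hypothetical helper, and we assume that a count of 0 corresponds to the first subject a volunteer sees.

```python
def trainingChance(classificationCount, chances, defaultChance):
    """Probability that the next subject is a training subject.

    classificationCount is the number of classifications already made,
    so 0 corresponds to the first subject a volunteer sees.
    """
    if classificationCount < len(chances):
        return chances[classificationCount]
    # The list is exhausted, so fall back to the default chance.
    return defaultChance


chances = [1, 1, 1, 0.5, 0.5, 0.5]
schedule = [trainingChance(n, chances, 0) for n in range(8)]
print(schedule)
```

Printing the schedule like this is a quick way to check a longer `training_chances` list before saving it to the workflow.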

Time to try it out!