<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=250 style="padding: 10px"> 
<b>Citizen Science Notebook</b> <br>
Contact author: Clare Higgs & Eric Rosas <br>
Last verified to run: 2022-10-20 <br>
LSST Science Pipelines version: Weekly 2022_40 <br>
Container size: medium <br>


## 1.0 Introduction
This notebook is intended to guide a PI through the process of sending data from the Rubin Science Platform (RSP) to the Zooniverse.
A detailed guide to Citizen Science projects, outlining the process, requirements and support available is here: (*link to citscipiguide*)
The data sent can be curated on the RSP as necessary and can take many forms. Here, we include an example of sending PNG cutout images. 
We encourage PIs new to the Rubin dataset to explore the tutorial notebooks and documentation.

As explained in the guide, this notebook restricts the number of objects sent to the Zooniverse to 100. This limit is intended to let you demonstrate your project prior to full approval from the EPO Data Rights Panel. 

Support is available and questions are welcome - (*some email/link etc*)


**DEBUG VERSION: note that this version of the notebook contains additional debugging, and the first cell will need to be run once.**

### Log in to the Zooniverse Platform & Activate Citizen Science SDK

If you haven't already, create a Zooniverse account and then create your project. Your project must be set to "public". To set your project to public, select the "Visibility" tab. Note that you will need to enter your username, password, and project slug below.

After creating your account and project, return to this notebook.

---

Supply your email and project slug below. 

A "slug" is your Zooniverse username and your project name joined by a forward slash, with no leading forward slash, for instance: "username/project-name". 

For more details, see: https://www.zooniverse.org/talk/18/967061?comment=1898157&page=1.

IMPORTANT: Your Zooniverse project must be set to "public"; a "private" project will not work. Select this setting under the "Visibility" tab (the project does not need to be set to live). The following code will not work if you have not authenticated in the cell titled "Log in to Zooniverse".
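
A quick format check can catch a malformed slug before the SDK cell is run. The sketch below is only an illustration: `is_valid_slug` is a hypothetical helper, not part of the Citizen Science SDK, and it assumes a slug is exactly two non-empty parts separated by one forward slash, with no whitespace.

```python
import re

# Hypothetical helper, not part of the Citizen Science SDK.
# Assumed slug shape: "<username>/<project-name>", no leading slash, no spaces.
_SLUG_PATTERN = re.compile(r"[^/\s]+/[^/\s]+")


def is_valid_slug(slug: str) -> bool:
    """Return True if slug looks like 'username/project-name'."""
    return _SLUG_PATTERN.fullmatch(slug) is not None


print(is_valid_slug("username/project-name"))   # well-formed slug
print(is_valid_slug("/username/project-name"))  # leading slash is rejected
```

The check only looks at the shape of the string; it cannot tell whether the project exists or is public.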

In [1]:
email = "beckynevin@gmail.com" # Please continue to use the same email address moving forward, as this is how we associate 
slugName = "rebecca-dot-nevin/test-project" # Replace this placeholder text with your slug name; do not include the leading forward slash
%run Citizen_Science_SDK.ipynb

Installing external dependencies...
Done installing external dependencies!
Enter your Zooniverse credentials...


Username:  rebecca.nevin
 ········


You now are logged in to the Zooniverse platform.
Loaded Citizen Science SDK


## 2.0 Make a Subject Set to Send

Curate the subject set of objects to send to Zooniverse here. This can (and should!) be modified to create your own subject set. Your subject set must have 100 objects or fewer in the testing phase, before your project is approved by the EPO Data Rights Panel. 

Currently, this example makes a set of image cutouts of extended sources. 

In [2]:
import utils

config = 'dp02'
collection = '2.2i/runs/DP0.2'
service, butler, skymap = utils.setup_butler(config, collection)   

In [3]:
max_rec=5 # make 100 for full subject set test
use_center_coords = "62, -37"
use_radius = "1.0"

The query can be modified to select other sources. It currently selects only `max_rec` objects (5 as set above; change `max_rec` to adjust).
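
To make the query easier to adapt, it could be assembled by a small function. The following is a minimal sketch under stated assumptions: `build_object_query` and `MAX_TEST_OBJECTS` are hypothetical names introduced here, the column list and filters are copied from this example, and the guard only reflects the 100-object testing limit described in this notebook.

```python
MAX_TEST_OBJECTS = 100  # testing-phase limit stated in this notebook


def build_object_query(max_rec, center_coords, radius, max_mag=18.0):
    """Assemble the ADQL cone-search query used in this example (sketch)."""
    if not 0 < max_rec <= MAX_TEST_OBJECTS:
        raise ValueError(f"max_rec must be between 1 and {MAX_TEST_OBJECTS}")
    columns = [
        "objectId", "coord_ra", "coord_dec", "detect_isPrimary",
        "g_cModelFlux", "r_cModelFlux", "r_extendedness", "r_inputCount",
    ]
    return (
        f"SELECT TOP {max_rec} {', '.join(columns)} "
        "FROM dp02_dc2_catalogs.Object "
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
        f"CIRCLE('ICRS', {center_coords}, {radius})) = 1 "
        "AND detect_isPrimary = 1 "
        "AND r_extendedness = 1 "
        f"AND scisql_nanojanskyToAbMag(r_cModelFlux) < {max_mag} "
        "ORDER BY r_cModelFlux DESC"
    )


query = build_object_query(5, "62, -37", "1.0")
print(query)
```

Joining the column names from a list keeps every column separated by a comma, which is easy to get wrong when concatenating string fragments by hand.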

In [4]:
query = "SELECT TOP " + str(max_rec) + " " + \
        "objectId, coord_ra, coord_dec, detect_isPrimary, " + \
        "g_cModelFlux, r_cModelFlux, r_extendedness, r_inputCount " + \
        "FROM dp02_dc2_catalogs.Object " + \
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), " + \
        "CIRCLE('ICRS', " + use_center_coords + ", " + use_radius + ")) = 1 " + \
        "AND detect_isPrimary = 1 " + \
        "AND r_extendedness = 1 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) < 18.0 " + \
        "ORDER by r_cModelFlux DESC"
results = service.search(query)
assert len(results) == max_rec

In [5]:
results_table = results.to_table().to_pandas()
results_table['dataId'] = results_table.apply(lambda x: utils.get_bandtractpatch(x['coord_ra'], x['coord_dec'], skymap), axis=1)

### Additional Data to Send
You may want to send additional data along with the image cutouts. The columns listed under "Add your desired columns" in the `csv_row` dict in the cell below are sent along with each image. If there are any columns that you do not need, feel free to remove them from the dict.

__Note:__ Object ID is always included.
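
One way to keep the optional columns in a single place is to build each manifest row from a list of field names. This is a sketch, not the notebook's own method: `build_manifest_row` is a hypothetical helper, `fields_to_add` is an illustrative list, and the example row uses made-up values standing in for one row of `results_table`.

```python
def build_manifest_row(row, filename, extra_fields):
    """Build one manifest row: required columns first, then chosen extras."""
    csv_row = {
        "filename": filename,         # required column name
        "sourceId": row["objectId"],  # required column name
    }
    for field in extra_fields:
        csv_row[field] = row[field]
    return csv_row


# Stand-in for one row of results_table (values are made up for illustration)
example_row = {"objectId": 123, "coord_ra": 62.0, "coord_dec": -37.0,
               "r_extendedness": 1.0}
fields_to_add = ["coord_ra", "coord_dec"]  # trim or extend as needed
manifest_row = build_manifest_row(example_row, "cutout123.png", fields_to_add)
print(manifest_row)
```

Because the helper only indexes by key, it should work the same way with a pandas row from `iterrows()` as with the plain dict used here.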

In [6]:
import os

# In-memory manifest file as an array of dicts
manifest = []

# Specify the directory that the cutouts will be output to:
batch_dir = "./cutouts/"

# Create the directory if it does not already exist
os.makedirs(batch_dir, exist_ok=True)

# Loop over results_table, or any other iterable provided by the PI:
for index, row in results_table.iterrows():
    # Use the Butler to get data based on the data within the iterable
    deepCoadd = butler.get('deepCoadd', dataId=row['dataId'])
    filename = "cutout"+str(row['objectId'])+".png"
    figout = utils.make_figure(deepCoadd, batch_dir + filename)
    
    # Create the CSV-file-row-as-dict 
    csv_row = {
        "filename": filename, # required column, do not change the column name
        "sourceId": row.objectId, # required column, do not change the column name
        # Add your desired columns:
        "coord_ra": row.coord_ra,
        "coord_dec": row.coord_dec,
        "g_cModelFlux": row.g_cModelFlux,
        "r_cModelFlux": row.r_cModelFlux,
        "r_extendedness": row.r_extendedness,
        "r_inputCount": row.r_inputCount
    }
    manifest.append(csv_row)
    utils.remove_figure(figout)
    

## 3.0 Preparing the Manifest File

The manifest file _must_ abide by [RFC4180](https://datatracker.ietf.org/doc/html/rfc4180.html), because the backend service that parses it expects this format. A column may have no values, but every row _must_ still mark the empty field with its separating comma. E.g.:

Valid syntax for empty column:
```
column1,column2,empty_column,column4
1,1,,4
1,1,,4
1,1,,4
```

**Important!**: The manifest file must be named `metadata.csv` in order for the processing on the backend to work correctly!
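
As an illustration of these rules, Python's standard `csv` module can produce a file of this shape. The sketch below is a stand-in, not the SDK's `write_metadata_file`: `write_manifest_sketch` is a hypothetical name, the rows are made up, and the `note` column exists only to show an empty field.

```python
import csv
import os
import tempfile


def write_manifest_sketch(manifest, batch_dir):
    """Write a list of dicts to <batch_dir>/metadata.csv and return the path."""
    path = os.path.join(batch_dir, "metadata.csv")
    fieldnames = list(manifest[0].keys())
    # newline="" lets the csv module control the line endings itself
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(manifest)
    return path


rows = [
    {"filename": "cutout1.png", "sourceId": 1, "note": "", "coord_ra": 62.0},
    {"filename": "cutout2.png", "sourceId": 2, "note": "", "coord_ra": 62.1},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    manifest_path = write_manifest_sketch(rows, tmp_dir)
    with open(manifest_path, newline="") as f:
        written = f.read()
print(written)
```

An empty string is written as nothing between two commas, so the empty `note` column keeps its place in every row.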

### Option 1: Write the manifest file to the filesystem automatically

Running the cell below should take care of writing the `metadata.csv` file to the filesystem, which will ultimately be used as the manifest file by Zooniverse. You are welcome to edit the automatically created manifest file, just ensure that its format abides by RFC4180.


In [7]:
manifest_path = write_metadata_file(manifest, batch_dir)

print("The manifest CSV file can be found at the following relative path:")
print(manifest_path)

The manifest CSV file can be found at the following relative path:
./cutouts/metadata.csv


### Option 2: Specify the path to your own manifest file
If you prefer, supply the manifest CSV file manually. Ensure that it is named `metadata.csv` and placed in the `./cutouts/` folder (or whatever path you set the `batch_dir` variable to).
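
A hand-written manifest can be checked before sending. The following sketch is an assumption-laden illustration: `check_manifest` is a hypothetical helper, and it only tests two things stated in this notebook, namely that the `filename` and `sourceId` columns are present and that every row has as many fields as the header.

```python
import csv
import io

REQUIRED_COLUMNS = ("filename", "sourceId")  # required names per the cutout cell


def check_manifest(csv_text):
    """Return a list of problems found in manifest text (empty if none)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["manifest is empty"]
    header = rows[0]
    problems = [f"missing required column: {name}"
                for name in REQUIRED_COLUMNS if name not in header]
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} fields, got {len(row)}")
    return problems


good = "filename,sourceId,note\ncutout1.png,1,\n"
bad = "filename,note\ncutout1.png\n"
print(check_manifest(good))
print(check_manifest(bad))
```

To check a file on disk, read its contents first, for example with `open(path, newline="").read()`, and pass the text to the helper.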

## 4.0 Send the cutouts to Zooniverse

Send your subject set to the Zooniverse. This cell sends one subject set. If you already have a set on the Zooniverse, it will notify you and fail. To send more data, delete what is on the Zooniverse and send again. You *may* get a warning that your set still exists, or a "Could not find subject_set with id=' '" error. If so, wait about 10 minutes and try again, because the Zooniverse needs time to process your changes. You may also have to re-run the "Look up your project" cell. Do not run the cell below more than once; the upload will fail if multiple runs are attempted.

The send has worked if you get a notification and an email saying your data has been sent.

### Name the new subject set
Name your subject set as it will appear on the Zooniverse. Try not to reuse names. 
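
One simple way to avoid reusing a name is to append a timestamp. This is only a suggestion: `unique_subject_set_name` is a hypothetical helper, and the prefix shown is an arbitrary example.

```python
from datetime import datetime, timezone


def unique_subject_set_name(prefix="set"):
    """Return a name such as '<prefix>_YYYYMMDD_HHMMSS' based on the UTC time."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{stamp}"


subject_set_name = unique_subject_set_name("galaxy_cutouts")
print(subject_set_name)
```

Names generated within the same second would still collide, which should not matter when sets are sent one at a time.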

In [11]:
subject_set_name = "set_2"
subject_set_name

'set_2'

In [12]:
__cit_sci_data_type = _HIPS_CUTOUTS # Important: DO NOT change this value. Update - this value may be changed.
send_data(subject_set_name, batch_dir, manifest)

'1. Checking batch status'



'2. Zipping up all the astro cutouts - this can take a few minutes with large data sets, but unlikely more than 10 minutes.'

'3. Uploading the citizen science data'

'4. Creating a new Zooniverse subject set'

'5. Notifying the Rubin EPO Data Center of the new data, which will finish processing of the data and notify Zooniverse'

'6. Success! The URL to the manifest file can be found here:'

'https://storage.googleapis.com/citizen-science-data-public/bc6e24c1-9261-46d2-a03e-9a787c04abdb/manifest.csv'

'7. Sending the manifest URL to Zooniverse'

'** Information: subject_set.id: 112644; manifest: https://storage.googleapis.com/citizen-science-data-public/bc6e24c1-9261-46d2-a03e-9a787c04abdb/manifest.csv'

'8. Transfer process complete, but further processing is required on the Zooniverse platform and you will receive an email at beckynevin@gmail.com'

In [13]:
batch_dir

'./cutouts/'

## Download Batch Metadata
This functionality is in an experimental/alpha state and as such unexpected behavior may occur. Do not attempt to run this cell without first running the top cell in this notebook that prompts you to log in to the Zooniverse platform.

In [None]:
test = download_batch_metadata()
test

### Explicitly check the status of your data batch
Is the send_data() call above stalling on the "Notifying the Rubin EPO Data Center..." step? Run the cell below every few minutes to check the status of your data. Large datasets can cause the response to get lost, but that does not necessarily mean that your data was not sent to Zooniverse.
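
The repeated manual check could also be wrapped in a polling loop. The sketch below makes several assumptions: `poll_until_done` is a hypothetical helper, the checking function is passed in rather than calling the SDK directly, the `"pending"` status is made up for the demonstration, and only the `"success"` value is taken from the cell in this notebook.

```python
import time


def poll_until_done(check_fn, interval_s=180, max_attempts=10, sleep=time.sleep):
    """Call check_fn() until its "status" is "success" or attempts run out."""
    res = None
    for attempt in range(max_attempts):
        res = check_fn()
        if res.get("status") == "success":
            return res
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before asking again
    return res  # last response seen, which may not be a success


# Stand-in for check_status(): reports success on the third call.
_fake_responses = iter([{"status": "pending"}, {"status": "pending"},
                        {"status": "success", "manifest_url": "<url>"}])
result = poll_until_done(lambda: next(_fake_responses), interval_s=0)
print(result["status"])
```

In the notebook, the call would presumably look like `poll_until_done(check_status)`, with the follow-up steps from the cell below applied to the returned dict.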

In [None]:
res = check_status()
print("Status:")
print(res["status"])
print("Manifest:")
print(res["manifest_url"])
print("Messages:")
print(res["messages"])
if res["status"] == "success":
    global manifest_url
    manifest_url = res["manifest_url"]
    send_zooniverse_manifest()

## Retrieve the data
There are two ways to do this:
1) Programmatically (we show this here), 
2) By directly going to your Zooniverse project and downloading the output csv files
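
For the second route, a downloaded classification export can be summarized offline. The sketch below rests on assumptions that should be checked against your own export: that the CSV has a `subject_ids` column and an `annotations` column holding a JSON list of objects with `task` and `value` keys, and that the task is a single-answer question so each `value` is a string. `tally_answers` is a hypothetical helper and the sample data is made up.

```python
import csv
import io
import json
from collections import Counter, defaultdict


def tally_answers(export_csv_text, task="T0"):
    """Count answers to one task per subject in a classification export."""
    tallies = defaultdict(Counter)
    for record in csv.DictReader(io.StringIO(export_csv_text)):
        for annotation in json.loads(record["annotations"]):
            if annotation.get("task") == task:
                tallies[record["subject_ids"]][annotation["value"]] += 1
    return {subject: dict(counts) for subject, counts in tallies.items()}


# Tiny made-up export with only the two columns this sketch reads
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow(["subject_ids", "annotations"])
writer.writerow(["1001", json.dumps([{"task": "T0", "value": "Yes"}])])
writer.writerow(["1001", json.dumps([{"task": "T0", "value": "No"}])])
writer.writerow(["1001", json.dumps([{"task": "T0", "value": "Yes"}])])
answer_counts = tally_answers(sample.getvalue())
print(answer_counts)
```

The task key `T0` matches the single question in the example workflow shown in this notebook; other workflows may use different task keys or answer types.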

First, let's do it programmatically. (Note that this currently only works if you are the owner of the project.)

In [17]:
import panoptes_client
client = panoptes_client.Panoptes.connect(login="interactive")

# The above should be happening already in the SDK cell

###############################################################################################

# First get the workflow ID from the Zooniverse API:

clares_project_id = 19539
sets = client.get(f"/workflows?project_id={clares_project_id}", {}, {
  'Accept': 'application/vnd.api+json; version=1',
  'Content-Type': 'application/json'
})

print("Sets : ")
print(sets)

# Then get the href/URL for the workflow classification output:

workflow_id = '23254' # change this to the workflow id found in the above print(sets) output
workflow = client.get(f"/workflows/{workflow_id}", {}, {
  'Accept': 'application/vnd.api+json; version=1',
  'Content-Type': 'application/json'
})

print("\n\nWorkflow : ")
print(workflow)

# Then attempt to retrieve the classification export:

classification_export_href = '/workflows/23254' # Change this to the workflows.classifications_export.href value in the above print(workflow) output

# The below line fails for eric
classification = client.get(classification_export_href, {}, {
  'Accept': 'application/vnd.api+json; version=1',
  'Content-Type': 'application/json'
})

print("\n\nClassifications : ")
print(classification) 

Enter your Zooniverse credentials...


Username:  rebecca.nevin
 ········


Sets : 
({'workflows': [{'id': '23254', 'display_name': 'Classification', 'tasks': {'T0': {'help': 'Is this a galaxy?', 'type': 'single', 'answers': [{'label': 'Yes'}, {'label': 'No'}], 'question': 'Is this a galaxy?', 'required': True}}, 'steps': [], 'classifications_count': 0, 'subjects_count': 200, 'created_at': '2023-01-05T16:59:46.979Z', 'updated_at': '2023-01-05T17:07:32.650Z', 'finished_at': None, 'first_task': 'T0', 'primary_language': 'en', 'version': '9.7', 'content_language': 'en', 'prioritized': False, 'grouped': False, 'pairwise': False, 'retirement': {'options': {'count': 1}, 'criteria': 'classification_count'}, 'retired_set_member_subjects_count': 0, 'href': '/workflows/23254', 'active': True, 'mobile_friendly': False, 'aggregation': {}, 'configuration': {'invert_subject': True}, 'public_gold_standard': False, 'completeness': 0.0, 'links': {'project': '19539', 'subject_sets': [], 'tutorial_subject': None, 'published_version': None, 'attached_images': {'href': '/workflows

In [23]:
print(classification)

({'workflows': [{'id': '23254', 'display_name': 'Classification', 'tasks': {'T0': {'help': 'Is this a galaxy?', 'type': 'single', 'answers': [{'label': 'Yes'}, {'label': 'No'}], 'question': 'Is this a galaxy?', 'required': True}}, 'steps': [], 'classifications_count': 0, 'subjects_count': 200, 'created_at': '2023-01-05T16:59:46.979Z', 'updated_at': '2023-01-05T17:07:32.650Z', 'finished_at': None, 'first_task': 'T0', 'primary_language': 'en', 'version': '9.7', 'content_language': 'en', 'prioritized': False, 'grouped': False, 'pairwise': False, 'retirement': {'options': {'count': 1}, 'criteria': 'classification_count'}, 'retired_set_member_subjects_count': 0, 'href': '/workflows/23254', 'active': True, 'mobile_friendly': False, 'aggregation': {}, 'configuration': {'invert_subject': True}, 'public_gold_standard': False, 'completeness': 0.0, 'links': {'project': '19539', 'subject_sets': [], 'tutorial_subject': None, 'published_version': None, 'attached_images': {'href': '/workflows/23254/a