<img align="left" src="https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png" width="250" style="padding: 10px"> 
<b>Citizen Science Notebook</b> <br>
Contact authors: Clare Higgs & Eric Rosas <br>
Last verified to run: 2022-10-20 <br>
LSST Science Pipelines version: Weekly 2022_40 <br>
Container size: medium <br>


## 1.0 Introduction
This notebook is intended to guide a PI through the process of sending data from the Rubin Science Platform (RSP) to the Zooniverse.
A detailed guide to Citizen Science projects, outlining the process, requirements, and support available, is here: (*link to citscipiguide*)
The data sent can be curated on the RSP as necessary and can take many forms. Here, we include an example of sending PNG cutout images.
We encourage PIs new to the Rubin dataset to explore the tutorial notebooks and documentation.

As explained in the guide, this notebook will restrict the number of objects sent to the Zooniverse to 100. This limit is intended to let you demonstrate your project prior to full approval from the EPO Data Rights Panel.

Support is available and questions are welcome - (*some email/link etc*)


**DEBUG VERSION: note that this version of the notebook contains additional debugging, and the first cell will need to be run once.**

### Terminal Prep Work
The following cell runs the terminal commands needed for this notebook to work.

**This cell only needs to be run the first time this notebook is run and can be skipped afterwards!**
**This cell should be incorporated into the RSP and will not be part of the final notebook.**
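
Shell commands prefixed with `!` run in a temporary subshell, so variables exported there do not persist into the notebook kernel. As a minimal sketch, the same two variables can be set from Python instead. This assumes the credentials path used in the cell below is valid on your RSP instance:

```python
import os

# Same values as in the prep cell below; the credentials path is copied
# from this notebook and is assumed to exist on the RSP.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = (
    "/opt/lsst/software/jupyterlab/butler-secret/butler-gcs-idf-creds.json"
)
# Debug flag intended to help Zooniverse developers troubleshoot issues.
os.environ["PANOPTES_DEBUG"] = "1"

print(os.environ["PANOPTES_DEBUG"])
```

Variables set through `os.environ` stay visible to every later cell in the same kernel session.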

In [None]:
# Install panoptes client package to dependencies and create necessary folders
!mkdir -p project/citizen-science/astro-cutouts/
!python -m pip install google-cloud-storage
!python -m pip install panoptes-client

# `!export` only affects a temporary subshell, so use the %env magic to set variables in the notebook kernel
%env GOOGLE_APPLICATION_CREDENTIALS=/opt/lsst/software/jupyterlab/butler-secret/butler-gcs-idf-creds.json

# Temporary debugging, won't affect anything with this notebook, instead helps Zooniverse developers troubleshoot issues on their side
%env PANOPTES_DEBUG=1

## 2.0 Create a Zooniverse Account
If you haven't already, [create a Zooniverse account](https://www.zooniverse.org/) and [create your project](https://www.zooniverse.org/lab). Your project must be set to "public". To set your project to public, select the "Visibility" tab.
Note that you will need to enter your username, password, and [project slug](https://www.zooniverse.org/talk/18/967061?comment=1898157&page=1) below.

After creating your account and project, return to this notebook.

### Log in to Zooniverse
Now that you have a Zooniverse account, log in to the Zooniverse (Panoptes) client.

In [None]:
import panoptes_client
client = panoptes_client.Panoptes.connect(login="interactive")
print("You now are logged in")

### Look Up Your Zooniverse Project
Supply your email and project slug below.
(If you don't know what a "slug" is in this context, see: https://www.zooniverse.org/talk/18/967061?comment=1898157&page=1)
Do not include the leading forward slash.
<br>

IMPORTANT: Your Zooniverse project must be set to "public"; a "private" project will not work. Select this setting under the "Visibility" tab (the project does not need to be set to live).
The following code will not work if you have not authenticated in the cell titled "Log in to Zooniverse".
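
As a hedged illustration of the slug rule above, the sketch below strips a leading slash and checks that the value looks like `owner/project-name`. The helper `normalize_slug` is hypothetical (it is not part of `panoptes_client`), and the `owner/project-name` shape is an assumption about how Zooniverse slugs are formed:

```python
def normalize_slug(raw_slug):
    """Return the slug without surrounding whitespace or slashes.

    Hypothetical helper for illustration; not part of panoptes_client.
    """
    slug = raw_slug.strip().strip("/")
    owner, sep, name = slug.partition("/")
    # Assumed slug shape: exactly one "/" separating owner and project name
    if not sep or not owner or not name or "/" in name:
        raise ValueError(
            f"Expected a slug like 'owner/project-name', got {raw_slug!r}"
        )
    return slug


print(normalize_slug("/example-user/example-project"))
```

The returned value could then be assigned to `slugName` in the cell below.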

In [None]:
from panoptes_client import Project, SubjectSet

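email = ""     # email address that will receive the transfer notification
slugName = ""  # project slug, without the leading forward slash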

project = Project.find(slug=slugName)
projectId = project.id
print(projectId, project.display_name)

### Run the below cell to activate the Rubin Citizen Science SDK
**just run this cell**

**this cell should be gone in the final version**

In [None]:
# HiPS astrocutout libraries
from astroquery.hips2fits import hips2fits
from IPython.display import display
import matplotlib.pyplot as plt
from matplotlib.colors import Colormap
import astropy.units as u
from astropy.coordinates import Longitude, Latitude, Angle

# GCP libraries
from google.cloud import storage

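# Import organizational libraries
import uuid
import os
import shutil
import json
import urllib.request

# Zooniverse client exception caught in clean_up_unused_subject_set
from panoptes_client.panoptes import PanoptesAPIException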

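# Prep work: module-level state shared by the helper functions below
# (email is defined in the "Look Up Your Zooniverse Project" cell)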
vendor_batch_id = 0
_HIPS_CUTOUTS = "hips_cutouts"
project_id = project.id
guid = ""
cutouts_dir = ""
manifest_url = ""
edc_response = ""
step = 0

def clean_up_unused_subject_set():
    global client, vendor_batch_id
    log_step("Cleaning up unused subject set on the Zooniverse platform, vendor_batch_id : " + str(vendor_batch_id))
    
    try:
        subject_set = SubjectSet.find(str(vendor_batch_id))

        # Compare as strings: the id may be held as either an int or a str
        if str(subject_set.id) == str(vendor_batch_id):
            subject_set.delete()

    except PanoptesAPIException:
        display(f"** Warning: Failed to find the subject set with id: {vendor_batch_id} - perhaps it's been deleted?")
    return

def send_zooniverse_manifest():
    global vendor_batch_id, manifest_url, client
    log_step("Sending the manifest URL to Zooniverse")
    display("** Information: subject_set.id: " + str(vendor_batch_id) + "; manifest: " + manifest_url)

    # Ask Zooniverse to import the subjects listed in the manifest into the subject set
    payload = {"subject_set_imports": {"source_url": manifest_url, "links": {"subject_set": str(vendor_batch_id)}}}
    json_response, etag = client.post(path='/subject_set_imports', json=payload)
    return

def create_new_subject_set(name):
    global project, panoptes_client, vendor_batch_id
    log_step("Creating a new Zooniverse subject set")
    
    # Create a new subject set
    subject_set = panoptes_client.SubjectSet()
    subject_set.links.project = project

    # Give the subject set a display name (that will only be visible to you on the Zooniverse platform)
    subject_set.display_name = name 
    subject_set.save()
    project.reload()
    vendor_batch_id = subject_set.id
    return vendor_batch_id

def check_status():
    global guid
    status_uri = "https://rsp-data-exporter-dot-skyviewer.uw.r.appspot.com/citizen-science-ingest-status?guid=" + guid
    raw_response = urllib.request.urlopen(status_uri).read()
    response = raw_response.decode('UTF-8')
    return json.loads(response)


# Validates that the RSP user is allowed to create a new subject set,
# then uploads the data and notifies Zooniverse
def send_data(subject_set_name, batch_name, cutouts = None):
    global manifest_url, edc_response, step
    step = 0  # restart the step counter used by log_step
    log_step("Checking batch status")
    if has_active_batch():
        raise CitizenScienceError("You cannot send another batch of data while a subject set is still active on the Zooniverse platform - you can only send a new batch of data if all subject sets associated to a project have been completed.")
    if __cit_sci_data_type == _HIPS_CUTOUTS:
        zip_path = zip_hips_cutouts(batch_name)
        upload_hips_cutouts(zip_path)
        subject_set_id = create_new_subject_set(subject_set_name)
        
        edc_response = alert_edc_of_new_citsci_data(subject_set_id)
        if edc_response is None:
            edc_response = { "status": "error", "messages": "An error occurred while processing the HiPS cutouts upload" }
        else:
            edc_response = json.loads(edc_response)

    else:
        # send_butler_data_to_edc()
        subject_set_id = create_new_subject_set(subject_set_name)
        manifest_url = send_butler_data_to_edc()
    
    if edc_response["status"] == "success":
        manifest_url = edc_response["manifest_url"]
        if len(edc_response["messages"]) > 0:
            display("** Additional information:")
            for message in edc_response["messages"]:
                display("    ** " + message)
        else:
            log_step("Success! The URL to the manifest file can be found here:")
            display(manifest_url)
    else:
        clean_up_unused_subject_set()
        display("    ** Error: One or more errors occurred during the last step **")
        for message in edc_response["messages"]:
            display("        ** " + message)
        return

    send_zooniverse_manifest()
    log_step("Transfer process complete, but further processing is required on the Zooniverse platform and you will receive an email at " + email)
    return

def zip_hips_cutouts(batch_name):
    global guid
    guid = str(uuid.uuid4())
    cutouts_dir = batch_name
    data_dir = cutouts_dir
    log_step("Zipping up all the astro cutouts - this can take a few minutes with large data sets, but unlikely more than 10 minutes.")
    shutil.make_archive("./" + guid, 'zip', data_dir)
    return ["./" + guid + '.zip', guid + '.zip']


def upload_hips_cutouts(zip_path):
    log_step("Uploading the citizen science data")
    bucket_name = "citizen-science-data"
    destination_blob_name = zip_path[1]
    source_file_name = zip_path[0]

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    blob.upload_from_filename(source_file_name)
    return

def alert_edc_of_new_citsci_data(vendor_batch_id):
    global guid
    project_id_str = str(project_id)
    log_step("Notifying the Rubin EPO Data Center of the new data, which will finish processing of the data and notify Zooniverse")
    
    try:
        edc_endpoint = "https://rsp-data-exporter-dot-skyviewer.uw.r.appspot.com/citizen-science-bucket-ingest?email=" + email + "&vendor_project_id=" + project_id_str + "&guid=" + guid + "&vendor_batch_id=" + str(vendor_batch_id) + "&debug=True"
        response = urllib.request.urlopen(edc_endpoint).read()
        manifestUrl = response.decode('UTF-8')
        return manifestUrl
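    except Exception as e:
        # Report the failure instead of silently discarding it
        display("** Warning: The request to the Rubin EPO Data Center failed: " + str(e))
        clean_up_unused_subject_set()
        return None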

def send_butler_data_to_edc():
    # Note: datasetId and sourceId are not defined anywhere in this notebook
    # and must be set before this code path can be used
    log_step("Notifying the Rubin EPO Data Center of the new data, which will finish processing of the data and notify Zooniverse")
    edcEndpoint = "https://rsp-data-exporter-e3g4rcii3q-uc.a.run.app/citizen-science-butler-ingest?email=" + email + "&collection=" + datasetId + "&sourceId=" + sourceId + "&vendorProjectId=" + str(projectId) + "&vendor_batch_id=" + str(vendor_batch_id)
    log_step('Processing data for Zooniverse, this may take up to a few minutes.')
    response = urllib.request.urlopen(edcEndpoint).read()
    manifestUrl = response.decode('UTF-8')
    return manifestUrl

def has_active_batch():
    active_batch = False
    for subject_set in project.links.subject_sets:
        try:
            for completeness_percent in list(subject_set.completeness.values()):
                # A subject set is still active while any workflow has not reached 100% completeness
                if completeness_percent < 1.0:
                    active_batch = True
                    break
            if active_batch:
                break
        except Exception:
            display("    ** Warning! - The Zooniverse client is throwing an error about a missing subject set, this can likely safely be ignored.")
    return active_batch

def log_step(msg):
    global step
    step += 1
    display(str(step) + ". " + msg)
    return

# Custom error handling for this notebook
class CitizenScienceError(Exception):
   
    # Constructor or Initializer
    def __init__(self, value):
        self.value = value
   
    # __str__ is to print() the value
    def __str__(self):
        return(repr(self.value))
    
print("Loaded Citizen Science SDK")              

## 3.0 Make a Subject Set to Send

Here, the subject set of objects to send to Zooniverse should be curated. This can (and should!) be modified to create your own subject set. Your subject set must have 100 objects or fewer in the testing phase, before your project is approved by the EPO Data Rights Panel.

Currently, this example makes a set of image cutouts of extended sources. 
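
To make the 100-object cap concrete, here is a minimal sketch of a check that could be run on the output directory before sending. The helper `count_cutouts` and the constant `MAX_TEST_SUBJECTS` are hypothetical names introduced for illustration, and the check assumes one PNG file per object:

```python
from pathlib import Path

MAX_TEST_SUBJECTS = 100  # testing-phase cap described above


def count_cutouts(directory, limit=MAX_TEST_SUBJECTS):
    """Count PNG files in `directory` and raise if the cap is exceeded.

    Hypothetical helper; assumes one PNG file per object.
    """
    n_files = sum(1 for _ in Path(directory).glob("*.png"))
    if n_files > limit:
        raise ValueError(f"{n_files} cutouts found; the limit is {limit}")
    return n_files
```

For example, `count_cutouts("./cutouts/")` could be called once the cutouts have been written and before the data is sent.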

In [None]:
#import packages used for generating subject set

import matplotlib.pyplot as plt
import gc
import numpy as np
import pandas

# Astropy imports
from astropy.wcs import WCS
from astropy.visualization import make_lupton_rgb
from astropy import units as u
from astropy.coordinates import SkyCoord

# Import the Rubin TAP service utilities
from lsst.rsp import get_tap_service, retrieve_query

# Image visualization routines.
import lsst.afw.display as afwDisplay
# The Butler provides programmatic access to LSST data products.
import lsst.daf.butler as dafButler
from lsst.daf.butler import Butler
# Geometry package
import lsst.geom
import lsst.geom as geom
# Object for multi-band exposures
from lsst.afw.image import MultibandExposure

plt.style.use('tableau-colorblind10')
%matplotlib inline

import warnings
from astropy.units import UnitsWarning

In [None]:
def remove_figure(fig):
    """
    Remove a figure to reduce memory footprint.

    Parameters
    ----------
    fig: matplotlib.figure.Figure
        Figure to be removed.

    Returns
    -------
    None
    """
    # get the axes and clear their images
    for ax in fig.get_axes():
        for im in ax.get_images():
            im.remove()
    fig.clf()       # clear the figure
    plt.close(fig)  # close the figure

    gc.collect()    # call the garbage collector
    
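def make_figure(exp, out_name):
    """
    Create an image and save it to disk.
    Should be followed with remove_figure.

    Parameters
    ----------
    exp : exposure returned by butler.get (a deepCoadd in this notebook)
    out_name : file name where you'd like to save the image

    Returns
    -------
    fig : matplotlib.figure.Figure
    """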
    fig = plt.figure(figsize=(10, 8))
    afw_display = afwDisplay.Display(1)
    afw_display.scale('asinh', 'zscale')
    afw_display.mtv(exp.image)
    plt.gca().axis('on')
    plt.savefig(out_name)
    
    return fig

def get_bandtractpatch(ra_deg, dec_deg):
    """
    Get the tract and patch of a source. Currently retrieves the i band only.

    Parameters
    ----------
    ra_deg : RA of the source in degrees
    dec_deg : Dec of the source in degrees

    Returns
    -------
    dataId : dict with 'band', 'tract', and 'patch' keys
    """
    spherePoint = lsst.geom.SpherePoint(ra_deg*lsst.geom.degrees, dec_deg*lsst.geom.degrees)
    tract = skymap.findTract(spherePoint)
    patch = tract.findPatch(spherePoint)
    my_tract = tract.tract_id
    my_patch = patch.getSequentialIndex()
    dataId = {'band': 'i', 'tract': my_tract, 'patch': my_patch}
    return dataId

# Set up some plotting defaults:       
params = {'axes.labelsize': 20,
          'font.size': 20,
          'legend.fontsize': 14,
          'xtick.major.width': 3,
          'xtick.minor.width': 2,
          'xtick.major.size': 12,
          'xtick.minor.size': 6,
          'xtick.direction': 'in',
          'xtick.top': True,
          'lines.linewidth': 3,
          'axes.linewidth': 3,
          'axes.labelweight': 3,
          'axes.titleweight': 3,
          'ytick.major.width': 3,
          'ytick.minor.width': 2,
          'ytick.major.size': 12,
          'ytick.minor.size': 6,
          'ytick.direction': 'in',
          'ytick.right': True,
          'figure.figsize': [8, 8],
          'figure.facecolor': 'White'
          }

plt.rcParams.update(params)

#initializing Tap and Butler
pandas.set_option('display.max_rows', 20)
warnings.simplefilter("ignore", category=UnitsWarning)
service = get_tap_service()
assert service is not None
assert service.baseurl == "https://data.lsst.cloud/api/tap"

# Use lsst.afw.display with the matplotlib backend
afwDisplay.setDefaultBackend('matplotlib')

config = 'dp02'
collection = '2.2i/runs/DP0.2'
butler = dafButler.Butler(config, collections=collection)
skymap = butler.get('skyMap')

In [None]:
max_rec = 10  # set to 100 for a full subject set test
use_center_coords = "62, -37"
use_radius = "1.0"

The query can be modified to select other sources. It currently selects just 10 objects (change `max_rec` above to select more).
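
When adapting the query, it can help to assemble it in one place so that the column list and search parameters are easy to change. The sketch below builds the same cone-search ADQL string as the cell that follows. The function `build_object_query` and its `mag_limit` parameter are hypothetical conveniences, not part of the Rubin TAP utilities:

```python
def build_object_query(max_rec, center_coords, radius_deg, mag_limit=18.0):
    """Assemble the ADQL cone-search query as one string (hypothetical helper)."""
    # Joining a list avoids missing-comma mistakes between adjacent columns
    columns = ", ".join([
        "objectId", "coord_ra", "coord_dec", "detect_isPrimary",
        "g_cModelFlux", "r_cModelFlux", "r_extendedness", "r_inputCount",
    ])
    return (
        f"SELECT TOP {max_rec} {columns} "
        "FROM dp02_dc2_catalogs.Object "
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
        f"CIRCLE('ICRS', {center_coords}, {radius_deg})) = 1 "
        "AND detect_isPrimary = 1 "
        "AND r_extendedness = 1 "
        f"AND scisql_nanojanskyToAbMag(r_cModelFlux) < {mag_limit} "
        "ORDER BY r_cModelFlux DESC"
    )


example_query = build_object_query(10, "62, -37", "1.0")
print(example_query)
```

The resulting string could be passed to `service.search` in place of the hand-concatenated query.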

In [None]:
query = "SELECT TOP " + str(max_rec) + " " + \
        "objectId, coord_ra, coord_dec, detect_isPrimary, " + \
        "g_cModelFlux, r_cModelFlux, r_extendedness, r_inputCount " + \
        "FROM dp02_dc2_catalogs.Object " + \
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), " + \
        "CIRCLE('ICRS', " + use_center_coords + ", " + use_radius + ")) = 1 " + \
        "AND detect_isPrimary = 1 " + \
        "AND r_extendedness = 1 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) < 18.0 " + \
        "ORDER BY r_cModelFlux DESC"
results = service.search(query)
assert len(results) == max_rec

In [None]:
results_table = results.to_table().to_pandas()
results_table['dataId'] = results_table.apply(lambda x: get_bandtractpatch(x['coord_ra'], x['coord_dec']), axis=1)

In [None]:
cutouts=[]

batch_dir = "./cutouts/"
os.makedirs(batch_dir, exist_ok=True)

for index, row in results_table.iterrows():
    deepCoadd = butler.get('deepCoadd', dataId=row['dataId'])
    figout = make_figure(deepCoadd, batch_dir+"cutout"+str(row['objectId'])+".png")
    cutouts.append(figout)
    remove_figure(figout)

### Create a new subject set
Name your subject set as it will appear on the Zooniverse. Try not to reuse names. 
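
One way to avoid reusing names is to append a timestamp and a short random suffix to a descriptive prefix. This is only a sketch: the helper `unique_subject_set_name` and the naming pattern are assumptions, not a Zooniverse requirement:

```python
from datetime import datetime, timezone
import uuid


def unique_subject_set_name(prefix):
    """Return `prefix` plus a UTC timestamp and a short random suffix.

    Hypothetical helper; the naming pattern is only a suggestion.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    suffix = uuid.uuid4().hex[:6]
    return f"{prefix}-{stamp}-{suffix}"


print(unique_subject_set_name("extended-sources-test"))
```

The generated string could be assigned to `subject_set_name` in the cell below.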

In [None]:
subject_set_name = "" 

## 4.0 Send the cutouts to Zooniverse

Send your subject set to the Zooniverse. This cell sends one subject set. If you already have a set on Zooniverse, it will notify you and fail; to send more data, delete what is on the Zooniverse and send again. You *may* get a warning that your set still exists or a "Could not find subject_set with id=' '" error. If so, wait about 10 minutes and try again, as Zooniverse takes some time to process your changes. You may also have to re-run the "Look Up Your Zooniverse Project" cell. Run the cell below only once; the upload will fail if multiple runs are attempted.

The transfer has worked if you get a notification and an email saying your data has been sent.
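
As part of sending, the SDK zips the cutout directory under a randomly generated GUID before uploading it. The stand-alone sketch below reproduces only that zipping step on throwaway files, so the archive contents can be inspected. The helper `zip_directory` is a hypothetical stand-in for the notebook's `zip_hips_cutouts`, and the file written here is a placeholder rather than a real cutout:

```python
import shutil
import tempfile
import uuid
import zipfile
from pathlib import Path


def zip_directory(data_dir, out_dir):
    """Zip `data_dir` into `out_dir` under a random GUID; return the zip path."""
    guid = str(uuid.uuid4())
    archive = shutil.make_archive(str(Path(out_dir) / guid), "zip", str(data_dir))
    return Path(archive)


# Demonstration on a throwaway file rather than real cutouts.
with tempfile.TemporaryDirectory() as work:
    data_dir = Path(work) / "cutouts"
    data_dir.mkdir()
    (data_dir / "cutout_example.png").write_bytes(b"placeholder")
    zip_path = zip_directory(data_dir, work)
    with zipfile.ZipFile(zip_path) as zf:
        archived_names = zf.namelist()

print(archived_names)
```

Listing the archive this way is a quick check that only the intended files are about to be uploaded.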

In [None]:
__cit_sci_data_type = _HIPS_CUTOUTS # Important: DO NOT change this value. Update - this value may be changed.
send_data(subject_set_name, batch_dir, cutouts)

### Explicitly check the status of your data batch
Is the send_data() call above stalling on the "Notifying the Rubin EPO Data Center..." step? Run the cell below every few minutes to check the status of your data. Large datasets can cause the response to get lost, but that does not necessarily mean that your data was not sent to Zooniverse.
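
Instead of re-running the cell by hand, the check could be repeated in a loop. The sketch below is a generic polling helper, not part of the SDK. It assumes the status dictionary has a `"status"` key and that `"success"` and `"error"` are the final values. The `"pending"` value and the example manifest URL in the demonstration are made up:

```python
import time


def poll_status(fetch_status, interval_s=120.0, max_attempts=10, sleep=time.sleep):
    """Call `fetch_status` until it reports a final state or attempts run out.

    `fetch_status` is any zero-argument callable returning a dict with a
    "status" key, for example the notebook's check_status function.
    """
    last = None
    for attempt in range(max_attempts):
        last = fetch_status()
        # Assumption: "success" and "error" are the only final states
        if last.get("status") in ("success", "error"):
            return last
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return last


# Demonstration with canned responses and no real waiting.
canned = iter([
    {"status": "pending"},  # made-up intermediate state
    {"status": "success", "manifest_url": "https://example.org/manifest.csv", "messages": []},
])
result = poll_status(lambda: next(canned), sleep=lambda seconds: None)
print(result["status"])
```

In the notebook, `poll_status(check_status)` would call the real status endpoint every two minutes for up to ten attempts.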

In [None]:
res = check_status()
print("Status:")
print(res["status"])
print("Manifest:")
print(res["manifest_url"])
print("Messages:")
print(res["messages"])
if res["status"] == "success":
    # Hand the manifest over to Zooniverse now that processing has finished
    manifest_url = res["manifest_url"]
    send_zooniverse_manifest()