# Citizen Science Notebook
This notebook demonstrates the usage of the Butler to curate data and store it for later retrieval.

## Create a Zooniverse Account
If you haven't already, [create a Zooniverse account here.](https://www.zooniverse.org/)
After creating your account, return to this notebook.

## Terminal Prep Work
The following cell runs the terminal commands that this notebook depends on.

In [None]:
# Install panoptes client package to dependencies
!python -m pip install panoptes-client
!mkdir -p project/citizen-science/astro-cutouts/
!mkdir -p project/citizen-science/org

In [109]:
from IPython.display import display

In [None]:
!python -m pip install google-cloud-storage

## Log in to Zooniverse
Now that you have a Zooniverse account, log in to the Zooniverse (Panoptes) client.

In [3]:
# Log into Zooniverse
import panoptes_client
client = panoptes_client.Panoptes.connect(login="interactive")

Enter your Zooniverse credentials...


Username:  doctor-toboggan
 ············


In [61]:
# print(panoptes_client.Project.find())
# panoptes_client.project

None


## Look Up Your Zooniverse Project
The following code will not work unless you have authenticated in the cell titled "Log in to Zooniverse".

Supply the project slug in the variable below.

Note that the `Project.find()` method expects the "slug" of your project rather than its display name. If you don't know what a "slug" is in this context, see:
https://www.zooniverse.org/talk/18/967061?comment=1898157&page=1
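If you are unsure of your slug, it can usually be read off your project's URL. The sketch below assumes the project page lives at `https://www.zooniverse.org/projects/<owner>/<project-name>`; the helper name `slug_from_project_url` is hypothetical and not part of `panoptes_client`.

```python
from urllib.parse import urlparse

def slug_from_project_url(url):
    """Return the 'owner/project-name' slug from a Zooniverse project URL."""
    # Split the URL path into its non-empty segments
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expect a path of the form /projects/<owner>/<project-name>[/...]
    if len(parts) < 3 or parts[0] != "projects":
        raise ValueError("Not a Zooniverse project URL: " + url)
    return parts[1] + "/" + parts[2]

# Example URL built from the slug used in this notebook
example_url = "https://www.zooniverse.org/projects/doctor-toboggan/wizard-wesearch"
print(slug_from_project_url(example_url))
```

The resulting string is what would be passed as `slug=` to `Project.find()`.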

In [4]:
# Replace the value below with the slug of your own project
slugName = "doctor-toboggan/wizard-wesearch" # Add your slug name

project = panoptes_client.Project.find(slug=slugName)
projectId = project.id

# If the following prints out a number, then your project lookup was successful.
print(project.id)

# subjectSet = panoptes_client.subject_set.subject_workflow_statuses()
# print(subjectSet)

16061


### Temporary code that will later be moved to the RSP-data-exporter service

In [126]:
from panoptes_client import Panoptes, Project, SubjectSet
import pprint
import pdb

pp = pprint.PrettyPrinter(indent=2)
h = display(display_id='my-display')
h.display(None)

# project = Project.find(16061)
project = Project.find(14253)
# pp.pprint(project.raw)

send_data()

'Active batch exists!!! Continuing because this notebook is in debug mode'

## Import Butler/LSST Stack dependencies
Before you can curate data, you need to load all of the LSST stack dependencies in order to use the Butler service.

In [None]:
# Generic python packages
import numpy as np
import pylab as plt
 
# Set a standard figure size to use
plt.rcParams['figure.figsize'] = (8.0, 8.0)
 
# LSST Science Pipelines (Stack) packages
import lsst.daf.butler as dafButler
import lsst.afw.display as afwDisplay
import lsst.geom as geom
import lsst.afw.coord as afwCoord
afwDisplay.setDefaultBackend('matplotlib')

## Prep Work
Declare the constants used throughout the rest of the notebook.

In [None]:
# This should match the verified version listed at the start of the notebook
! eups list -s lsst_distrib
 
# DP0.1 repo:
# Check with DM (RSP team) to see if the below the terms will make sense to the 
email = "" # Add your primary email
datasetId = "u/" + email + "/change-this2" # Replace "change-this2" with a unique name for your change; keep the '/' in front of it
repo = 's3://butler-us-central1-dp01' # Keep track of this URI for later use with: butler retrieve-artifacts...
collection = "2.2i/runs/DP0.1"

## Initialize the Butler service
Note that the `run` and `collections` values come from the prep work above.

In [None]:
# Initialize the Butler
# the 'run' kwarg is an arbitrary name that must be unique, that is to say that once I use this name
# the Butler will complain if I try to rerun this code with the same value for 'run'
butler = dafButler.Butler(repo, collections=collection, run=datasetId)
registry = butler.registry
print(registry) # print just so we know something happened
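Because the `run` name must be unique each time the cell is executed, one option is to append a timestamp to it. The following is a minimal sketch under that assumption; `make_run_name` and its arguments are hypothetical helpers, not part of the Butler API, and two calls within the same second would still collide.

```python
from datetime import datetime, timezone

def make_run_name(user, label, now=None):
    """Build a run name of the form u/<user>/<label>-<UTC timestamp>."""
    # Use the current UTC time unless a fixed time is supplied (useful for testing)
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return "u/" + user + "/" + label + "-" + stamp

# Fixed time so the printed value is reproducible
fixed = datetime(2022, 5, 2, 12, 0, 0, tzinfo=timezone.utc)
print(make_run_name("your-username", "citizen-science", fixed))
```

The returned string could then be used in place of `datasetId` when constructing the Butler.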

And list all available collections:

In [None]:
# Print every collection in the registry that can be queried
for c in sorted(registry.queryCollections()):
    print(c)

## Query the Butler for data
`butler.get()` queries the object store and database and returns the results in the form of a Python object.

In [None]:
# Specify the data to get
dataId = {'visit': '703697', 'detector': 80} # Hardcoded for now
calexp = butler.get('calexp', dataId=dataId) # Hardcoded for now
print(calexp) # print just so we know something happened

## Store the dataset in the IDF
`butler.put()` stores the Python object returned by `butler.get()` in the IDF under the given run. It can then be retrieved with the Butler CLI.

In [None]:
# First, delete the dataset if it already exists in the IDF
# runs = ("")
# runsIter = iter(runs)
# print(datasetId)
# deleted = butler.removeRuns([datasetId], True)
# print(deleted)

# Do butler.put() to store the retrieved data with a "run" to identify the stored dataset
print(datasetId)
zoonyTest = butler.put(calexp, 'calexp', dataId=dataId, run=datasetId) # Hardcoded 'calexp' for now
# from pprint import pprint
print(zoonyTest)


## Prep the Data for Zooniverse
Technically, the data will be stored in a Rubin datacenter, but this process will also inform Zooniverse of the new data for your project.

In [None]:
# import urllib.request
# sourceId = "" # Add 
# edcEndpoint = "https://rsp-data-exporter-e3g4rcii3q-uc.a.run.app/citizen-science-ingest?email=" + email + "&collection=" + datasetId + "&sourceId=" + sourceId + "&vendorProjectId=" + projectId # RSP user will expect to just see a function call, not the details
# print('Processing data for Zooniverse, this may take up to a few minutes.')
# response = urllib.request.urlopen(edcEndpoint).read()
# manifestUrl = response.decode('UTF-8')
# print(manifestUrl)

## Create a New Subject Set for Your Data
Zooniverse refers to each distinct batch of data associated with a project as a "subject set". In practice, a subject set is simply the data you send over. Giving each batch a distinct name also makes it easier to keep track of your batches.

In [7]:
# 
# 
# 
# I don't think this is needed - erosas - 5/2/22
#
#
#


# # Create a new subject set
# subject_set = panoptes_client.SubjectSet()
# subject_set.links.project = project

# # Give the subject set a display name (that will only be visible to you on the Zooniverse platform)
# subject_set.display_name = 'delete me 2' # create a unique name for your subject set, which is how you will identify it 

# subject_set.save()
# project.reload()
# subject_set.id

'104309'

## Associate the Subject Set With Your Data
This step informs Zooniverse that a new subject set (your data) is available for it to pick up and associate with your project.

In [69]:
# 
# 
# 
# I don't think this is needed - erosas - 5/2/22
#
#
#



# import json
# subject_set.id
# payload = {"subject_set_imports": {"source_url": "https://storage.googleapis.com/citizen-science-data/4770983a-9fdb-4500-8715-0fc678cfdce9/manifest.csv", "links": {"subject_set": subject_set.id}}}

# json.dumps(payload)
# payload

{'subject_set_imports': {'source_url': 'https://storage.googleapis.com/citizen-science-data/4770983a-9fdb-4500-8715-0fc678cfdce9/manifest.csv',
  'links': {'subject_set': '103945'}}}
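The payload above has a fixed shape, so it can be wrapped in a small function. This is a minimal sketch based only on the structure shown in this notebook; `build_import_payload` is a hypothetical helper and the `example.org` URL is a placeholder.

```python
import json

def build_import_payload(manifest_url, subject_set_id):
    """Build the request body for a subject set import."""
    return {
        "subject_set_imports": {
            "source_url": manifest_url,
            # The IDs in the output above are strings, so coerce to str
            "links": {"subject_set": str(subject_set_id)},
        }
    }

# Placeholder manifest URL; substitute the URL returned for your own data
payload = build_import_payload("https://example.org/manifest.csv", 103945)
print(json.dumps(payload, indent=2))
```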

## Notify Zooniverse of the New Subject Set
Finally, we send a request out to the Zooniverse API to inform them that new data is available and which subject set to associate it with.

In [70]:
# 
# 
# 
# I don't think this is needed - erosas - 5/2/22
#
#
#

# json_response, etag = client.post(path='/subject_set_imports', json=payload)
# print(json_response)
# json_response

{'subject_set_imports': [{'id': '26',
   'href': '/subject_set_imports/26',
   'created_at': '2022-04-13T21:08:27.934Z',
   'updated_at': '2022-04-13T21:08:27.934Z',
   'source_url': 'https://storage.googleapis.com/citizen-science-data/4770983a-9fdb-4500-8715-0fc678cfdce9/manifest.csv',
   'imported_count': 0,
   'manifest_count': 100,
   'failed_count': 0,
   'progress': 0.0,
   'links': {'subject_set': '103945', 'user': '2310327'}}],
 'links': {'subject_set_imports.subject_set': {'href': '/subject_sets/{subject_set_imports.subject_set}',
   'type': 'subject_sets'},
  'subject_set_imports.user': {'href': '/users/{subject_set_imports.user}',
   'type': 'users'}},
 'meta': {'subject_set_imports': {'page': 1,
   'page_size': 20,
   'count': 1,
   'include': [],
   'page_count': 1,
   'previous_page': None,
   'next_page': None,
   'first_href': '/subject_set_imports',
   'previous_href': None,
   'next_href': None,
   'last_href': '/subject_set_imports'}}}
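Most of the response above is pagination metadata. The following sketch pulls out the fields that describe import progress. It assumes the response keeps the shape shown above; `summarize_import` is a hypothetical helper, and treating "imported plus failed equals the manifest count" as finished is an assumption rather than documented Zooniverse behavior.

```python
def summarize_import(response):
    """Extract progress fields from a subject_set_imports response."""
    record = response["subject_set_imports"][0]
    processed = record["imported_count"] + record["failed_count"]
    return {
        "id": record["id"],
        "imported": record["imported_count"],
        "failed": record["failed_count"],
        "total": record["manifest_count"],
        # Assumption: the import is finished once every manifest row is accounted for
        "done": processed >= record["manifest_count"],
    }

# Trimmed copy of the response shown above
sample_response = {
    "subject_set_imports": [{
        "id": "26",
        "imported_count": 0,
        "manifest_count": 100,
        "failed_count": 0,
        "progress": 0.0,
    }]
}
summary = summarize_import(sample_response)
print(summary)
```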

## Check the Status of Your Data
The cell below queries the Zooniverse API to check the status of the subject set that you just notified Zooniverse of.

In [57]:
# 
# 
# 
# I don't think this is needed - erosas - 5/2/22
#
#
#

# status = client.get(path='/subject_sets/{}'.format(subject_set.id))
# print("============")
# client.endpoint
# print("============")
# status



({'subject_sets': [{'id': '103933',
    'display_name': 'Large cutouts NOT in subbucket',
    'set_member_subjects_count': 1,
    'metadata': {},
    'created_at': '2022-04-13T17:39:00.582Z',
    'updated_at': '2022-04-13T17:45:46.436Z',
    'href': '/subject_sets/103933',
    'completeness': {},
    'links': {'project': '16061', 'workflows': []}}],
  'links': {'subject_sets.project': {'href': '/projects/{subject_sets.project}',
    'type': 'projects'},
   'subject_sets.workflows': {'href': '/workflows?subject_set_id={subject_sets.id}',
    'type': 'workflows'}},
  'meta': {'subject_sets': {'page': 1,
    'page_size': 20,
    'count': 1,
    'include': [],
    'page_count': 1,
    'previous_page': None,
    'next_page': None,
    'first_href': '/subject_sets?id=103933',
    'previous_href': None,
    'next_href': None,
    'last_href': '/subject_sets?id=103933'}}},
 'W/"143cc9340be66522ec9dc6c483c001b1"')

## Create Image Cutout from HiPS Server
You can create a cutout image of a HiPS survey/server by doing the following:

In [None]:
# # Import the necessary astro/plot libraries
# from astroquery.hips2fits import hips2fits
# from IPython.display import display
# import matplotlib.pyplot as plt
# from matplotlib.colors import Colormap
# import astropy.units as u
# from astropy.coordinates import Longitude, Latitude, Angle

# # Import organizational libraries
# import time
# import uuid
# import os
# import shutil

# Reference the EDC HiPS Survey
# hips = 'https://storage.googleapis.com/hips-data/images'

# astroquery.hips2fits library documentation: https://astroquery.readthedocs.io/en/latest/hips2fits/hips2fits.html
# Configure the query and execute it (relies on the imports and the `hips` variable from the SDK cell at the end of this notebook)
result = hips2fits.query(
   hips=hips,
   width=1000,
   height=500,
   ra=Longitude(0 * u.deg),
   dec=Latitude(20 * u.deg),
   fov=Angle(10 * u.deg),
   projection="AIT",
   get_query_payload=False,
   format='jpg',
   min_cut=0.5,
   max_cut=99.5,
   cmap=Colormap('viridis'),
)

# Create the plot
im = plt.imshow(result)
# Show the plot
plt.show()

# Add the cutout to the cutouts collection:
cutouts = []
cutouts.append(im)

In [None]:
subject_set_name = "" # give your subject set a name
# create_new_subject_set(subject_set_name)

In [None]:
# Send the cutouts to Zooniverse
__cit_sci_data_type = _HIPS_CUTOUTS # Important: DO NOT change this value
send_data(subject_set_name, cutouts)

## Now Upload the Zipped Cutouts to Zooniverse

You will need a service account key in your RSP `/home/project/citizen-science/org` folder for the next cell to work as expected. Ensure that you change the name of the `service_account_key` to the name of the JSON key file.

In [None]:
# from google.cloud import storage

# bucket_name = "citizen-science-data"
# service_account_key = "skyviewer-398f28c943e8.json" # replace this with the GCP key provided to you
# destination_blob_name = guid + ".zip"
# source_file_name = cutoutsDir + guid + ".zip"

# storage_client = storage.Client.from_service_account_json(
#     './project/citizen-science/org/' + service_account_key)
# bucket = storage_client.bucket(bucket_name)
# blob = bucket.blob(destination_blob_name)

# blob.upload_from_filename(source_file_name)

# print(
#     "File {} uploaded to {}.".format(
#         source_file_name, destination_blob_name
#     )
# )
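The upload cell above expects a zip archive named `<guid>.zip` to already exist in the cutouts directory. The sketch below shows one way such an archive might be produced with `shutil.make_archive`. It writes placeholder files into a temporary directory instead of real cutouts, and `zip_cutout_dir` is a hypothetical helper.

```python
import os
import shutil
import tempfile
import uuid
import zipfile

def zip_cutout_dir(cutouts_dir, guid):
    """Zip the folder <cutouts_dir>/<guid> into <cutouts_dir>/<guid>.zip."""
    data_dir = os.path.join(cutouts_dir, guid)
    # make_archive appends ".zip" to the base name and returns the archive path
    return shutil.make_archive(os.path.join(cutouts_dir, guid), "zip", data_dir)

with tempfile.TemporaryDirectory() as cutouts_dir:
    guid = str(uuid.uuid4())
    data_dir = os.path.join(cutouts_dir, guid)
    os.mkdir(data_dir)
    # Placeholder files standing in for real cutout images
    for i in range(3):
        with open(os.path.join(data_dir, "cutout-" + str(i) + ".png"), "wb") as f:
            f.write(b"placeholder")
    archive_path = zip_cutout_dir(cutouts_dir, guid)
    with zipfile.ZipFile(archive_path) as zf:
        archived_names = sorted(zf.namelist())

print(archived_names)
```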

## Notify the EDC of the New Citizen Science Cutout Data

Add your email address if you haven't already. The `projectId` inserted as a query parameter in `edcEndpoint` was assigned earlier in this notebook, in the cell that looks up your Zooniverse project.

In [67]:
# import urllib.request
# email = "erosas@lsst.org" # Add email
# # vendor_project_id = ""
# # projectId = str(16061)
# edcEndpoint = "https://rsp-data-exporter-e3g4rcii3q-uc.a.run.app/new-bucket-ingest?email=" + email + "&vendor_project_id=" + projectId + "&guid=" + guid
# print('Processing data for Zooniverse, this may take up to a few minutes.')
# response = urllib.request.urlopen(edcEndpoint).read()
# manifestUrl = response.decode('UTF-8')
# print(manifestUrl)

Processing data for Zooniverse, this may take up to a few minutes.
https://storage.googleapis.com/citizen-science-data/4770983a-9fdb-4500-8715-0fc678cfdce9/manifest.csv
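Concatenating query parameters by hand leaves characters such as `@` unescaped. A hedged alternative is to let `urllib.parse.urlencode` build the query string. In the sketch below, `build_ingest_url` is a hypothetical helper, the parameter names are copied from the cell above, and the `example.org` base URL and email are placeholders.

```python
from urllib.parse import urlencode

def build_ingest_url(base_url, email, vendor_project_id, guid):
    """Build the ingest URL with properly escaped query parameters."""
    query = urlencode({
        "email": email,
        "vendor_project_id": str(vendor_project_id),
        "guid": guid,
    })
    return base_url + "?" + query

url = build_ingest_url(
    "https://example.org/new-bucket-ingest",  # placeholder endpoint
    "user@example.org",                       # placeholder email
    16061,
    "4770983a-9fdb-4500-8715-0fc678cfdce9",
)
print(url)
```

Converting `vendor_project_id` with `str()` also avoids a `TypeError` if the ID is ever supplied as an integer.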


## IMPORTANT! Run the cell below to activate the Rubin Citizen Science SDK

In [123]:
# HiPS astrocutout libraries
from astroquery.hips2fits import hips2fits
from IPython.display import display
import matplotlib.pyplot as plt
from matplotlib.colors import Colormap
import astropy.units as u
from astropy.coordinates import Longitude, Latitude, Angle

# Zooniverse libraries
from panoptes_client import Panoptes, Project, SubjectSet

# GCP libraries
from google.cloud import storage

# Import organizational libraries
import time
import uuid
import os
import shutil
import pprint
import pdb
import urllib.request

# Prep work
email = "erosas@lsst.org" # To-do: Refactor this out to a separate editable cell to add email
hips = 'https://storage.googleapis.com/hips-data/images'
pp = pprint.PrettyPrinter(indent=2)
working_message = "Status updating..."
vendor_batch_id = 0
_HIPS_CUTOUTS = "hips_cutouts"

# project = Project.find(16061)
# project = Project.find(14253)
# pp.pprint(project.raw)

def create_new_subject_set(name):
    # Update the notebook-level vendor_batch_id so the EDC notification functions can read it
    global vendor_batch_id
    h.update("Creating a new Zooniverse subject set")
    # Create a new subject set
    subject_set = panoptes_client.SubjectSet()
    subject_set.links.project = project

    # Give the subject set a display name (that will only be visible to you on the Zooniverse platform)
    subject_set.display_name = name

    subject_set.save()
    project.reload()
    vendor_batch_id = subject_set.id
    return vendor_batch_id

# Checks for an active batch, then sends either HiPS cutouts or Butler data to Zooniverse
def send_data(subject_set_name, cutouts = None):
    h.update("Checking batch status")
    if has_active_batch():
        h.update("Active batch exists!!! Continuing because this notebook is in debug mode")
        # raise CitizenScienceError("You cannot send another batch of data while a subject set is still active on the Zooniverse platform - you can only send a new batch of data if all subject sets associated to a project have been completed.")
    if __cit_sci_data_type == _HIPS_CUTOUTS:
        send_hips_cutout_data(cutouts) # zip the cutouts locally
        zip_cutouts_and_upload() # upload the zip archive to the bucket
        subject_set_id = create_new_subject_set(subject_set_name)
        alert_edc_of_new_citsci_data()
    else:
        # Create the subject set first so that vendor_batch_id is set before notifying the EDC
        subject_set_id = create_new_subject_set(subject_set_name)
        send_butler_data_to_edc()
    
    h.update("Transfer process complete, but further processing is required on the Zooniverse platform and you will receive an email at " + email)
    return

def send_hips_cutout_data(cutouts):
    # Share the batch GUID and cutouts directory with the upload and notification functions
    global guid, cutoutsDir
    h.update("Zipping up all the astro cutouts")
    guid = str(uuid.uuid4())
    cutoutsDir = "./project/citizen-science/astro-cutouts/"
    dataDir = cutoutsDir + guid
    os.mkdir(dataDir)
    
    # beginning of temporary testing code
    for x in range(100): # create 100 cutouts from the one cutout image
        plt.imsave(dataDir + "/cutout-" + str(round(time.time() * 1000)) + "-" + str(x) + ".png", result)

    shutil.make_archive(cutoutsDir + guid, 'zip', dataDir)
    # end of temporary testing code
    return 

def zip_cutouts_and_upload():
    h.update("Uploading the citizen science data")
    bucket_name = "citizen-science-data"
    service_account_key = "skyviewer-398f28c943e8.json" # replace this with the GCP key provided to you
    destination_blob_name = guid + ".zip"
    source_file_name = cutoutsDir + guid + ".zip"

    storage_client = storage.Client.from_service_account_json(
        './project/citizen-science/org/' + service_account_key)
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    blob.upload_from_filename(source_file_name)

    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )
    return

def alert_edc_of_new_citsci_data():
    h.update("Notifying the Rubin EPO Data Center of the new data, which will finish processing of the data and notify Zooniverse")
    # projectId is assigned in the project lookup cell; guid is set when the cutouts are zipped
    edcEndpoint = "https://rsp-data-exporter-e3g4rcii3q-uc.a.run.app/citizen-science-bucket-ingest?email=" + email + "&vendor_project_id=" + str(projectId) + "&guid=" + guid + "&vendor_batch_id=" + str(vendor_batch_id)
    response = urllib.request.urlopen(edcEndpoint).read()
    manifestUrl = response.decode('UTF-8')
    return

def send_butler_data_to_edc():
    h.update("Notifying the Rubin EPO Data Center of the new data, which will finish processing of the data and notify Zooniverse")
    # datasetId, sourceId, and projectId must already be defined in earlier cells
    edcEndpoint = "https://rsp-data-exporter-e3g4rcii3q-uc.a.run.app/citizen-science-butler-ingest?email=" + email + "&collection=" + datasetId + "&sourceId=" + sourceId + "&vendorProjectId=" + str(projectId) + "&vendor_batch_id=" + str(vendor_batch_id)
    print('Processing data for Zooniverse, this may take up to a few minutes.')
    response = urllib.request.urlopen(edcEndpoint).read()
    manifestUrl = response.decode('UTF-8')
    return

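def has_active_batch():
    # A subject set counts as active while any of its workflows is below 100% completeness
    active_batch = False # initialized here so projects with no subject sets return False
    for subject_set in project.links.subject_sets:
        for completeness_percent in list(subject_set.completeness.values()):
            if completeness_percent < 1.0:
                active_batch = True
                break
        if active_batch:
            break
    return active_batch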

# Custom error handling for this notebook
class CitizenScienceError(Exception):
   
    # Constructor or Initializer
    def __init__(self, value):
        self.value = value
   
    # __str__ is to print() the value
    def __str__(self):
        return(repr(self.value))
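The active-batch check can be tried out without a Zooniverse connection by running the same logic on plain dictionaries. The sketch below assumes that each subject set's `completeness` maps workflow IDs to a fraction between 0 and 1, that 1.0 means fully classified, and that a batch is "active" while any value is below 1.0. The function names and the workflow ID `"21000"` are hypothetical.

```python
def has_incomplete_workflow(completeness):
    """True if any workflow in this subject set is below 100% completeness."""
    return any(value < 1.0 for value in completeness.values())

def any_active_batch(completeness_by_set):
    """True if at least one subject set still has an incomplete workflow."""
    return any(has_incomplete_workflow(c) for c in completeness_by_set.values())

# Subject set IDs mapped to their completeness dictionaries
in_progress = {"103933": {}, "103945": {"21000": 0.4}}
finished = {"103933": {"21000": 1.0}}

print(any_active_batch(in_progress))
print(any_active_batch(finished))
```

Under these assumptions, a subject set with an empty `completeness` dictionary (no linked workflows) is not counted as active.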