# Citizen Science Notebook
This notebook demonstrates how to use the Butler to curate data and store it for later retrieval.

## Create a Zooniverse Account
If you haven't already, [create a Zooniverse account here.](https://www.zooniverse.org/)
After creating your account, return to this notebook.

In [None]:
# Install the Panoptes client package
!python -m pip install panoptes-client

## Log in to Zooniverse
Now that you have a Zooniverse account, log in to the Zooniverse (Panoptes) client.

In [None]:
# Log into Zooniverse
import panoptes_client
client = panoptes_client.Panoptes.connect(login="interactive")

## Import Butler/LSST Stack dependencies
Before you can curate data, you need to load all of the LSST stack dependencies in order to use the Butler service.

In [None]:
# Generic python packages
import numpy as np
import matplotlib.pyplot as plt
 
# Set a standard figure size to use
plt.rcParams['figure.figsize'] = (8.0, 8.0)
 
# LSST Science Pipelines (Stack) packages
import lsst.daf.butler as dafButler
import lsst.afw.display as afwDisplay
import lsst.geom as geom
import lsst.afw.coord as afwCoord
afwDisplay.setDefaultBackend('matplotlib')

## Prep Work
Declare the constants used throughout the rest of the notebook.

In [None]:
# Optional: uncomment the lines below to list every collection that can be queried.
# They use `registry`, which is only defined once the Butler has been initialized.
# for c in sorted(registry.queryCollections()):
#     print(c)

In [None]:
# This should match the verified version listed at the start of the notebook
! eups list -s lsst_distrib
 
# DP0.1 repo:
# Check with DM (RSP team) to see if the below the terms will make sense to the 
email = "" # Add your primary email
datasetId = "u/" + email + "/change-this" # Replace "change-this" with a unique name of your change, leave the leading slash '/'
repo = '' # Keep track of this URI for later use with: butler retrieve-artifacts...
collection = ""

## Initialize the Butler service
Notice that the run and collections come from the prep work above.

In [None]:
# Initialize the Butler
# The 'run' kwarg is an arbitrary name that must be unique: once a name has been used,
# the Butler will complain if this code is rerun with the same value for 'run'.
butler = dafButler.Butler(repo, collections=collection, run=datasetId)
registry = butler.registry
print(registry) # print just so we know something happened
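
Because the run name has to be unique, one way to avoid a collision when rerunning the notebook is to generate the name before constructing the Butler. The following is a minimal sketch, not part of the LSST stack: the helper `unique_run_name`, its `label` argument, and the timestamp suffix are assumptions made for illustration, and the email shown is a placeholder.

```python
from datetime import datetime, timezone


def unique_run_name(email: str, label: str) -> str:
    """Return 'u/<email>/<label>/<UTC timestamp>' (hypothetical naming scheme)."""
    if not email or not label:
        raise ValueError("email and label must both be non-empty")
    # A second-resolution UTC timestamp keeps repeated executions from colliding.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"u/{email}/{label}/{stamp}"


# Placeholder values; substitute your own email and a short label.
example_run = unique_run_name("someone@example.org", "zooniverse-export")
print(example_run)
```

The resulting string could be assigned to `datasetId` in the prep work cell, so that each execution writes to a fresh run.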

## Query the Butler for data
`butler.get()` queries the object store and database and returns the results in the form of a Python object.

In [None]:
# Specify the data to get
dataId = {'visit': 703697, 'detector': 80}  # Hardcoded for now; both IDs are integers
calexp = butler.get('calexp', dataId=dataId)  # Hardcoded for now
print(calexp) # print just so we know something happened

## Store the dataset in the IDF
`butler.put()` stores the Python object returned by `butler.get()` in the IDF under the given run. It can then be retrieved with the Butler CLI.

In [None]:
# First, delete the dataset if it already exists in the IDF
# runs = ("")
# runsIter = iter(runs)
# print(datasetId)
# deleted = butler.removeRuns([datasetId], True)
# print(deleted)

# Do butler.put() to store the retrieved data with a "run" to identify the stored dataset
print(datasetId)
zoonyTest = butler.put(calexp, 'calexp', dataId=dataId, run=datasetId) # Hardcoded 'calexp' for now
# from pprint import pprint
print(zoonyTest)


## Prep the Data for Zooniverse
Technically, the data will be stored in a Rubin datacenter, but this process will also inform Zooniverse of the new data for your project.

In [None]:
import urllib.request
sourceId = "" # Add 
edcEndpoint = "https://rsp-data-exporter-e3g4rcii3q-uc.a.run.app?email=" + email + "&collection=" + datasetId + "&sourceId=" + sourceId # RSP user will expect to just see a function call, not the details
response = urllib.request.urlopen(edcEndpoint).read()
manifestUrl = response.decode('UTF-8')
print(manifestUrl)
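
The email contains `@` and the collection name contains `/`, so building the query string by concatenation relies on those characters passing through unescaped. As a hedged alternative, the sketch below percent-encodes the parameters with `urllib.parse.urlencode`. The helper name `build_export_url` is hypothetical, the example values are placeholders, and it is an assumption that the exporter service decodes percent-encoded values in the usual way.

```python
from urllib.parse import urlencode

# Base URL taken from the cell above.
EXPORTER_BASE = "https://rsp-data-exporter-e3g4rcii3q-uc.a.run.app"


def build_export_url(email, collection, source_id, base=EXPORTER_BASE):
    """Build the exporter URL with percent-encoded query parameters."""
    query = urlencode(
        {"email": email, "collection": collection, "sourceId": source_id}
    )
    return f"{base}?{query}"


# Placeholder values for illustration only.
example_url = build_export_url(
    "someone@example.org", "u/someone@example.org/demo", "example-source"
)
print(example_url)
```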

## Look Up Your Zooniverse Project
The following code will not work unless you have authenticated in the cell titled "Log in to Zooniverse".<br>
Supply the project name in the variable below.
<br><br>
Note that the `Project.find()` method expects the project name to be the "slug" of your project. If you don't know what a "slug" is in this context, see:<br>
https://www.zooniverse.org/talk/18/967061?comment=1898157&page=1

In [None]:
# Replace the empty string below with your project's slug
slugName = "" # Add your slug name

project = panoptes_client.Project.find(slug=slugName)

# If the following prints out a number, then your project lookup was successful.
print(project.id)
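
A quick local check can catch a display name pasted in place of a slug before the lookup is attempted. The sketch below assumes a slug has the shape `<owner>/<project-name>` with exactly one slash and no whitespace. That shape is an assumption based on typical Zooniverse slugs rather than a documented rule, and `looks_like_slug` is a hypothetical helper, not part of `panoptes_client`.

```python
def looks_like_slug(value: str) -> bool:
    """Heuristic: '<owner>/<project-name>', one slash, no whitespace."""
    owner, sep, name = value.partition("/")
    if not sep or not owner or not name:
        return False
    if "/" in name:
        return False
    return not any(ch.isspace() for ch in value)


# Placeholder values for illustration only.
print(looks_like_slug("example-owner/my-first-project"))
print(looks_like_slug("My First Project"))
```

Passing this check does not guarantee that the project exists; it only filters out values that cannot be slugs under the assumed shape.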

## Create a New Subject Set for Your Data
Zooniverse refers to each distinct batch of data associated with a project as a "subject set". In other words, what you think of as the "data" you are sending is a subject set on the Zooniverse side. Subject sets are also a useful way to keep track of batches, since each one can be given a distinct name.

In [None]:
# Create a new subject set
subject_set = panoptes_client.SubjectSet()
subject_set.links.project = project

# Give the subject set a display name (that will only be visible to you on the Zooniverse platform)
subject_set.display_name = ''

subject_set.save()
project.reload()

## Associate the Subject Set With Your Data
This step informs Zooniverse that there is a new subject set (data) available for it to pick up and associate with your project.

In [None]:
import json
payload = {"subject_set_imports": {"source_url": manifestUrl, "links": {"subject_set": subject_set.id}}}

json.dumps(payload)
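
Since the manifest URL comes back from a remote service, it can help to sanity-check it before building the payload. The following sketch wraps the same payload structure in a hypothetical helper, `build_import_payload`, which is not part of `panoptes_client`. The validation rule (an `http` or `https` URL with a host) and the example values are assumptions.

```python
import json
from urllib.parse import urlsplit


def build_import_payload(manifest_url: str, subject_set_id) -> dict:
    """Return the subject set import payload after a basic URL check."""
    url = manifest_url.strip()  # drop stray whitespace or a trailing newline
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"manifest URL does not look valid: {manifest_url!r}")
    return {
        "subject_set_imports": {
            "source_url": url,
            "links": {"subject_set": subject_set_id},
        }
    }


# Placeholder values for illustration only.
example_payload = build_import_payload("https://example.org/manifest.csv\n", "12345")
print(json.dumps(example_payload, indent=2))
```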

## Notify Zooniverse of the New Subject Set
Finally, send a request to the Zooniverse API to inform it that new data is available and which subject set to associate it with.

In [None]:
json_response, etag = client.post(path='/subject_set_imports', json=payload)
print(json_response)