<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=250 style="padding: 10px" alt="Vera C. Rubin Observatory Logo"> 
<h1 style="margin-top: 10px">Introduction to the Citizen Science Pipeline</h1>
Authors: Becky Nevin, Clare Higgs, and Eric Rosas <br>
Contact author: Clare Higgs <br>
Last verified to run: 2024-07-09 <br>
LSST Science Pipelines version: Weekly 2024_16 <br>
Container size: small or medium <br>
Targeted learning level: beginner 

<b>Description:</b> This notebook guides a PI through the process of sending data from the Rubin Science Platform (RSP) to the Zooniverse and retrieving classifications from Zooniverse. <br><br>
<b>Skills:</b> Table access protocol (TAP) query, Butler query, create and display cutout images, send cutout images to Zooniverse<br><br>
<b>LSST Data Products:</b> deepCoadd images, manifest file<br><br>
<b>Packages:</b> rubin.citsci, utils (citsci plotting and display utilities) <br><br>
<b>Credit:</b> The TAP query is based on notebooks developed by Leanne Guy and the Butler query is based on notebooks developed by Alex Drlica-Wagner and Melissa Graham<br><br>
<b>Get Support: </b>PIs new to DP0 are encouraged to find documentation and resources at <a href="https://dp0-2.lsst.io/">dp0-2.lsst.io</a>. Support for this notebook is available and questions are welcome at cscience@lsst.org.

## Table of Contents
* [1. Introduction](#first-bullet)
* [1.1 Package imports](#second-bullet)
* [1.2 Define functions and parameters](#third-bullet)
* [2. Make a subject set to send to Zooniverse](#fourth-bullet)
* [3. Create a manifest file](#fifth-bullet)
* [4. Send the data to Zooniverse](#sixth-bullet)
* [5. Retrieve the data](#seventh-bullet)

## 1. Introduction <a class="anchor" id="first-bullet"></a>
This notebook introduces how to use the `rubin.citsci` package to create cutout images and a manifest file, and to send both to Zooniverse.

This notebook restricts the number of objects sent to Zooniverse to 100. This limit is intended to let you demonstrate a project prior to full approval from the education and public outreach (EPO) Data Rights Panel.

It is recommended to explore the DP0.2 `tutorial-notebooks/` folder in your home directory, specifically tutorial `DP02_4a_Introduction_to_the_Butler.ipynb`.

### 1.1 Package imports <a class="anchor" id="second-bullet"></a>

#### Install Pipeline Package

First, install the Rubin Citizen Science Pipeline package by doing the following:

1. Open up a New Launcher tab
2. In the "Other" section of the New Launcher tab, click "Terminal"
3. Use `pip` to install the `rubin.citsci` package by entering the following command:
```
pip install rubin.citsci
```
Note that this package will soon be installed directly on RSP.

If this package is already installed, make sure it is updated:
```
pip install --upgrade rubin.citsci
```

4. Confirm the next cell containing `from rubin.citsci import pipeline` works as expected and does not throw an error
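
As an optional check before running the import cell, you can ask Python whether the package can be located. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name and is not part of `rubin.citsci`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located on the current Python path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example `rubin`) is itself missing.
        return False


if is_installed("rubin.citsci"):
    print("rubin.citsci is available")
else:
    print("rubin.citsci not found; run `pip install rubin.citsci` in a terminal")
```

If the check reports that the package is missing right after installation, restarting the notebook kernel is usually worth trying before reinstalling.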

In [None]:
from matplotlib import image as mpimg
import matplotlib.pyplot as plt
from rubin.citsci import pipeline
import utils
import os
import pandas as pd

plt.style.use('tableau-colorblind10')

### 1.2 Define functions and parameters <a class="anchor" id="third-bullet"></a>
First, [create a Zooniverse account](https://www.zooniverse.org/accounts/register) and create your Zooniverse project.

IMPORTANT: Your Zooniverse project must be set to "public"; a "private" project will not work. Select this setting under the "Visibility" tab (the project does not need to be set to live).

Supply the email associated with your Zooniverse account, and then follow the instructions in the prompt to log in and select your project by slug name. 

A "slug" is the string of your Zooniverse username and your project name without the leading forward slash, for instance: "username/project-name". [Click here for more details](https://www.zooniverse.org/talk/18/967061?comment=1898157&page=1).
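
To make the slug format concrete, the sketch below strips a leading forward slash and checks for the "username/project-name" shape. The helper `normalize_slug` and the exact pattern are illustrative assumptions; they are not part of `rubin.citsci` or of Zooniverse's own validation.

```python
import re

# Assumed shape: "username/project-name" with exactly one slash and no spaces.
SLUG_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")


def normalize_slug(raw):
    """Strip whitespace and a leading slash, then check the username/project shape."""
    slug = raw.strip().lstrip("/")
    if not SLUG_PATTERN.match(slug):
        raise ValueError(f"Expected 'username/project-name', got {raw!r}")
    return slug


print(normalize_slug("/username/project-name"))
```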

**The `rubin.citsci` package includes a method that creates a Zooniverse project from a template. If you wish to use this feature, do not provide a slug name when prompted, and then run the subsequent cell.**

In [None]:
email = ""
cit_sci_pipeline = pipeline.CitSciPipeline()
cit_sci_pipeline.login_to_zooniverse(email)

**Run the following cell if you would like to create a new Zooniverse project from the Vera Rubin template**

In [None]:
cit_sci_pipeline.create_new_project_from_template()

## 2. Make a subject set to send to Zooniverse <a class="anchor" id="fourth-bullet"></a>
A subject set is a collection of data (images, plots, etc) that are shown to citizen scientists. It is also the unit of data that is sent to Zooniverse.

This notebook curates a subject set of objects to send to Zooniverse. This can be modified to create your own subject set. During the testing phase, before your project is approved by the EPO Data Rights Panel, your subject set must contain 100 objects or fewer.

This example makes a set of image cutouts of galaxies.
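
A small guard can catch an oversized request before any query is run. This is a sketch only; `check_subject_count` and `TEST_PHASE_LIMIT` are hypothetical names, and the pipeline may enforce the limit in its own way.

```python
TEST_PHASE_LIMIT = 100  # testing-phase limit described in this notebook


def check_subject_count(number_sources, limit=TEST_PHASE_LIMIT):
    """Return `number_sources` if it lies in 1..limit, otherwise raise ValueError."""
    if not 1 <= number_sources <= limit:
        raise ValueError(
            f"Requested {number_sources} subjects; the testing phase allows 1 to {limit}."
        )
    return number_sources


print(check_subject_count(5))
```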

In [None]:
print('Establishing the connection to the Butler')
config = "dp02"
collection = "2.2i/runs/DP0.2"
service, butler, skymap = utils.setup_query_tools(config, collection)
print('Connected')

In [None]:
print('Setting the parameters for making image cutouts')
number_sources = 5  # change this to 100 for a full subject set test
use_center_coords = "62, -37"
use_radius = "1.0"  # search radius for the TAP query; increase (e.g. to "10") for a larger search

This query can be modified to select other types of sources.

For more details, please have a look at the RSP tutorial notebooks (`/home/your_username/notebooks/tutorial-notebooks`).
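
To illustrate what a modified selection might look like, the sketch below builds an ADQL cone-search string from the same three parameters. It is not necessarily the query that `utils.run_tap_query` runs: the function `build_cone_query`, the table name `dp02_dc2_catalogs.Object`, and the columns `objectId`, `coord_ra`, `coord_dec`, `detect_isPrimary`, and `r_extendedness` are assumptions based on the DP0.2 schema and should be checked against the schema browser before use. In ADQL, the `CIRCLE` radius is given in degrees.

```python
def build_cone_query(n_sources, center_coords, radius, extra_where=""):
    """Build an ADQL cone search around "ra, dec" with an optional extra condition."""
    ra, dec = (float(value) for value in center_coords.split(","))
    query = (
        f"SELECT TOP {int(n_sources)} objectId, coord_ra, coord_dec "
        "FROM dp02_dc2_catalogs.Object "
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
        f"CIRCLE('ICRS', {ra}, {dec}, {float(radius)})) = 1 "
        "AND detect_isPrimary = 1"
    )
    if extra_where:
        # Append a caller-supplied condition, e.g. a cut selecting extended sources.
        query += f" AND {extra_where}"
    return query


print(build_cone_query(5, "62, -37", "1.0", "r_extendedness = 1"))
```

Keeping the extra condition as a separate argument makes it easy to try different source selections without touching the cone-search part of the query.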

In [None]:
print('Running the TAP query to return objects')
results = utils.run_tap_query(
    service, number_sources, use_center_coords, use_radius)

In [None]:
print('Preparing the table')
results_table = utils.prep_table(results, skymap)

Have a look at the table you'll use to save the cutout images.

In [None]:
results_table

## 3. Create a manifest file <a class="anchor" id="fifth-bullet"></a>

A manifest file is a CSV file that is used to send all of the classification subjects to Zooniverse. This file can also be used to initiate options on the Zooniverse side. [Click here for an overview](https://about.pfe-preview.zooniverse.org/lab-how-to).

To send data other than the example cutout images, edit the `make_manifest_with_deepcoadd_images` utility called in the next cell. Note that the object ID must be included.

In [None]:
print('Specify the directory that the cutouts will be output to')
batch_dir = "./cutouts/"
print(
    "Make the manifest file and "
    "save both the manifest and "
    "the cutout images in this folder: "
    f"{batch_dir}"
)
manifest = utils.make_manifest_with_deepcoadd_images(
    results_table, butler, batch_dir)

Have a look at some of the cutout images. 

The following cell will plot all images from the `batch_dir`, each preceded by its file name. The axes are hidden by `plt.axis('off')`; the extent of each image in pixels depends on your preset radius. These are large co-added images centered on different coordinates; they contain many galaxies and stars.

In [None]:
for file in os.listdir(batch_dir):
    if file.endswith('.png'):
        plt.title(file)
        image = mpimg.imread(batch_dir + file)
        plt.imshow(image)
        plt.axis('off')
        plt.show()

There are multiple options for how to create the manifest file.

### 3.1 Option 1: Write the manifest file to the filesystem automatically
The below cell writes the `manifest.csv` file to the filesystem, which will be used by Zooniverse. This is the recommended option for PIs new to the citizen science pipeline.

In [None]:
manifest_path = cit_sci_pipeline.write_manifest_file(manifest, batch_dir)

### 3.2 Option 2: Make your own manifest file

PIs are welcome to create their own manifest file. This is not the recommended option for PIs new to the citizen science pipeline.

The manifest file _must_ abide by [RFC4180](https://datatracker.ietf.org/doc/html/rfc4180.html) as the backend service that parses the manifest file expects this format. In addition, you may have a column with no values, but there _must_ be an empty column value indicated with a comma. For example:

Valid syntax for empty column:
```
column1,column2,empty_column,column4
1,1,,4
1,1,,4
1,1,,4
```

**Important**: The manifest file must be named `manifest.csv` in order for the processing on the backend to work correctly.
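
One way to produce a file with this syntax is Python's standard `csv` module, whose default dialect writes comma-separated fields with CRLF line endings, as RFC 4180 describes. The sketch below is illustrative: `manifest_to_csv_text` is a hypothetical helper, and the column names are the placeholder names from the example above.

```python
import csv
import io


def manifest_to_csv_text(header, rows):
    """Serialize a header and rows to CSV text using the csv module's default dialect."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)  # comma-delimited, CRLF line endings by default
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


header = ["column1", "column2", "empty_column", "column4"]
# Both an empty string and None are written as an empty field between two commas.
rows = [[1, 1, "", 4], [1, 1, None, 4]]
text = manifest_to_csv_text(header, rows)
print(text)
```

To save the result, write `text` to a file named `manifest.csv` opened with `newline=""` so that the line endings are not altered.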

## 4. Send the data to Zooniverse <a class="anchor" id="sixth-bullet"></a>
Zip up the data and send it to the Zooniverse.

### 4.1 Zip up the data
Running the below cell will zip up all the cutouts into a single file - this can take 5 to 10 minutes for large data sets (> 5k cutouts).

In [None]:
zip_path = cit_sci_pipeline.zip_image_cutouts(batch_dir)
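
For readers curious about what a zipping step like this involves, here is a minimal standard-library sketch. It is not the implementation of `zip_image_cutouts`; the helper `zip_directory`, the file extensions it keeps, and the placeholder files are all assumptions made for illustration.

```python
import os
import tempfile
import zipfile


def zip_directory(source_dir, zip_path, extensions=(".png", ".csv")):
    """Zip the files in `source_dir` whose names end with one of `extensions`."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(source_dir)):
            full_path = os.path.join(source_dir, name)
            if os.path.isfile(full_path) and name.endswith(extensions):
                # Store each file at the top level of the archive.
                archive.write(full_path, arcname=name)
    return zip_path


# Demonstrate on a temporary folder containing placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    cutout_dir = os.path.join(tmp, "cutouts")
    os.mkdir(cutout_dir)
    for name in ("a.png", "manifest.csv", "notes.txt"):
        with open(os.path.join(cutout_dir, name), "w") as handle:
            handle.write("placeholder")
    out = zip_directory(cutout_dir, os.path.join(tmp, "cutouts.zip"))
    with zipfile.ZipFile(out) as archive:
        zipped_names = archive.namelist()

print(zipped_names)
```

Writing the archive outside the folder being zipped avoids the archive trying to include itself.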

### 4.2 Send image data

This cell will let PIs send one subject set. Name the subject set as it will appear on Zooniverse.

Running this cell will also initiate the data transfer and make your data available on the Zooniverse platform.

In [None]:
subject_set_name = ""
cit_sci_pipeline.send_image_data(subject_set_name, zip_path)

## 5. Retrieve the classification data from Zooniverse <a class="anchor" id="seventh-bullet"></a>
There are two ways to do this:

1) Go to your Zooniverse project and download the output CSV files found on the 'Data Exports' tab. Click the 'Request new classification report' button. Per Zooniverse: "Please note some exports may take a long time to process. We will email you when they are ready. You can only request one of each type of data export within a 24-hour time period."

2) Programmatically (as demonstrated below).

Find the `project_id` on Zooniverse by selecting 'build a project' and then selecting the project. Note that you don't need to be the project owner.

In [None]:
print('Retrieve the classifications from Zooniverse')
project_id = 19539
raw_clas_data = cit_sci_pipeline.retrieve_data(project_id)

counter = 0
list_rows = []

If the following cell throws an error, restart the kernel and rerun the cell.

In [None]:
for row in raw_clas_data:
    if counter == 0:
        header = row
    else:
        list_rows.append(row)
    counter += 1
df = pd.DataFrame(list_rows, columns=header)
df
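
For the first option (a classification export downloaded from the 'Data Exports' tab), the answers can be tallied per subject with the standard library. This is a sketch under assumptions: it relies on the export having `subject_ids` and JSON-encoded `annotations` columns, the task key `T0` is a placeholder for your workflow's task, and the sample rows are made up purely so the example runs without a real export.

```python
import csv
import io
import json
from collections import Counter


def make_sample_export():
    """Build a tiny made-up export in memory, shaped like a classification CSV."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["classification_id", "subject_ids", "annotations"])
    for cid, sid, answer in [(1, 101, "Spiral"), (2, 101, "Spiral"), (3, 102, "Elliptical")]:
        writer.writerow([cid, sid, json.dumps([{"task": "T0", "value": answer}])])
    buffer.seek(0)
    return buffer


def tally_answers(csv_file, task="T0"):
    """Count (subject id, answer) pairs for one task across all classifications."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        for annotation in json.loads(row["annotations"]):
            if annotation.get("task") == task:
                counts[(row["subject_ids"], str(annotation.get("value")))] += 1
    return counts


counts = tally_answers(make_sample_export())
for (subject_id, answer), n in sorted(counts.items()):
    print(subject_id, answer, n)
```

With a real export, replace `make_sample_export()` with a file opened via `open(path, newline="")`, where `path` points to the downloaded CSV.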