# Data Preparation 

This notebook discusses how we manually labelled spiral galaxies using an [online GUI](http://edd.ifa.hawaii.edu/inclination/) in a collaborative project. Users of the interface are asked to situate a target galaxy within a lattice of galaxies with established inclinations. In this graphical interface, we use the color images provided by [SDSS](https://www.sdss.org/) as well as the `g`, `r` and `i` band images generated for our photometry program. The latter are presented in black and white after re-scaling by the `asinh` function to show the internal structures of galaxies more clearly. The inclinations of the standard galaxies were initially measured based on their `I`-band axial ratios.
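The `asinh` re-scaling mentioned above can be illustrated with a short sketch. This is not the code used to produce the GUI images; the function name `asinh_stretch` and the softening parameter `beta` are assumptions introduced here for illustration only.

```python
import numpy as np

def asinh_stretch(image, beta=1.0):
    """Map pixel values to [0, 1] with an asinh stretch.

    `beta` is a hypothetical softening parameter: it sets the flux scale
    at which the stretch turns from roughly linear to roughly logarithmic.
    """
    image = np.asarray(image, dtype=float)
    # Shift so that the faintest pixel sits at zero
    shifted = image - image.min()
    stretched = np.arcsinh(shifted / beta)
    peak = stretched.max()
    # Guard against a constant image, for which peak == 0
    return stretched / peak if peak > 0 else np.zeros_like(stretched)

# Toy example: faint structure next to a bright core (invented values)
pixels = np.array([0.0, 1.0, 10.0, 1000.0])
scaled = asinh_stretch(pixels, beta=1.0)
```

With a linear scaling the faint pixel would sit at 0.001 of the peak; the asinh stretch lifts it well above that, which is why faint internal structures become easier to see.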

In [1]:
from IPython.display import Image
from IPython.core.display import HTML 

Image(url= "http://edd.ifa.hawaii.edu/inclination/helpics/drag_drop.png")

Each galaxy is compared with the standard galaxies in two steps. First, the user locates the galaxy among nine standard galaxies sorted by inclination, ranging between 45 and 90 degrees in increments of `5` degrees.
In step two, the same galaxy is compared with nine other standard galaxies whose inclinations are one degree apart and cover the 5-degree interval found in the first step. At the end, the inclination is calculated by averaging the inclinations of the standard galaxies immediately to the left and right of the target galaxy.
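The final averaging step can be sketched as below. The function name `inclination_from_neighbours` is hypothetical, and the handling of a target placed at the end of the row (only one neighbouring standard) is an assumption made for this example, not a description of the GUI's actual behaviour.

```python
def inclination_from_neighbours(inc_left=None, inc_right=None):
    """Average the inclinations (in degrees) of the standards flanking the target.

    Assumption: if only one neighbour exists, its inclination is used as is.
    """
    neighbours = [inc for inc in (inc_left, inc_right) if inc is not None]
    if not neighbours:
        raise ValueError("at least one neighbouring standard galaxy is required")
    return sum(neighbours) / len(neighbours)

# Target placed between standards at 76 and 77 degrees
inc_mid = inclination_from_neighbours(76, 77)

# Target placed at the edge of the row, next to a 90-degree standard
inc_edge = inclination_from_neighbours(90, None)
```

Because the second-step standards are one degree apart, a target placed between two of them ends up on a half-degree value.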

In [2]:
Image(url= "http://edd.ifa.hawaii.edu/inclination/helpics/trash.png")

We take the following precautions to minimize user-dependent and user-independent biases:

- We round each resulting inclination up or down to the nearest integer, with the direction chosen at random.
- At each step, standard galaxies are randomly drawn, with an option for users to redraw them to verify their work or to compare galaxies with similar structures.
- To increase the accuracy of the results, we catalog the median of at least **three** different measurements performed by different users.
- Users may **reject** galaxies for various reasons and leave comments with the aim of avoiding dubious cases.
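The random rounding described in the first precaution can be sketched as follows. The helper `round_randomly` is hypothetical, and the equal probability of rounding up or down is an assumption; the text only states that the direction is chosen randomly.

```python
import math
import random

def round_randomly(inc, rng=random):
    """Round a non-integer inclination up or down, each with probability 1/2."""
    lower = math.floor(inc)
    upper = math.ceil(inc)
    if lower == upper:  # already an integer, nothing to decide
        return int(inc)
    return rng.choice([lower, upper])

# Seeded only to make this example reproducible
rng = random.Random(42)
rounded = [round_randomly(76.5, rng) for _ in range(1000)]
```

Averaged over many measurements, rounding in a random direction avoids the systematic shift that always rounding half-degree values the same way would introduce.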

The uncertainties on the measured inclinations are estimated from the statistical scatter in the values reported by different users.
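As a rough illustration of cataloging the median and estimating the scatter, the sketch below aggregates per-galaxy measurements with pandas. The data are invented, and the use of the sample standard deviation as the scatter estimator is an assumption; the papers listed in this section describe the actual procedure.

```python
import pandas as pd

# Hypothetical measurements: three users per galaxy (values are invented)
measurements = pd.DataFrame({
    "pgcID": [101, 101, 101, 202, 202, 202],
    "inc":   [75.0, 77.0, 76.0, 60.0, 64.0, 62.0],
})

# Median, scatter (sample standard deviation) and number of measurements
summary = (
    measurements.groupby("pgcID")["inc"]
    .agg(inc_median="median", inc_scatter="std", n_measurements="count")
    .reset_index()
)

# Keep only galaxies with at least three measurements
catalog = summary[summary["n_measurements"] >= 3]
```

The median is less sensitive than the mean to a single discrepant measurement, which matters when only a few users have sorted a given galaxy.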

More detailed discussions of these measurements and their uncertainties are presented in these papers:

- Global Attenuation in Spiral Galaxies in Optical and Infrared Bands (**Journal ref:** Kourkchi et al., 2019, ApJ, 884, 82, [arXiv:1909.01572](https://arxiv.org/pdf/1909.01572))
- Cosmicflows-4: The Catalog of ~10000 Tully-Fisher Distances (**Journal ref:** Kourkchi et al., 2020, ApJ, 902, 145, [arXiv:2009.00733](https://arxiv.org/pdf/2009.00733)) 

## Data Product

[Galaxy Inclination Zoo](http://edd.ifa.hawaii.edu/inclination/) stores all of the outputs in a `SQL` database. Each time a user sorts a galaxy in the GUI, the database is updated.

We have divided the users of the project into two groups:
1. Undergraduate students of the University of Hawaii at Manoa
2. Citizen scientists and astronomy enthusiasts who help us in this project

The output tables of the SQL database have been stored in two text files:
- `EDD_incNET_Manoa.20190524.txt` for the UH students
- `EDD_incNET_Guest.20190524.txt` for the guest users


In [3]:
import sys
import os
import subprocess
import glob
from math import *
import numpy as np
from datetime import *
from pylab import *
import matplotlib as mpl
from matplotlib.widgets import Slider, Button, RadioButtons
import matplotlib.pyplot as plt
from astropy.table import Table, Column 
from mpl_toolkits.axes_grid1 import make_axes_locatable
from optparse import OptionParser
from PIL import Image#, ImageTk
from subprocess import Popen, PIPE
import matplotlib.patches as patches
import scipy.misc as scimisc
import pandas as pd

## Output format

- Each user is identified by their email address.
- `inc` is the measured inclination at each instance.
- `pgcID` is the ID of the sorted target galaxy whose inclination has been evaluated.
- `pgcID1` and `pgcID2` are the IDs of the galaxies that the user placed to the left and right of the target galaxy.
- `flag` = 0 if the galaxy has been accepted, = 1 if the user has decided to reject the galaxy.
- `email` is the user's email address.
- Other columns are self-explanatory. They hold data logs produced by the user. If a user decides to reject a galaxy for specific reasons, those reasons are recorded in the relevant columns.
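To show how these columns might be used together, the sketch below separates accepted from rejected measurements and counts accepted measurements and distinct users per galaxy. The rows are invented and only mimic the column layout described above.

```python
import pandas as pd

# Hypothetical rows following the column layout (values are invented)
records = pd.DataFrame({
    "pgcID": [101, 101, 202, 202],
    "inc":   [75.0, 77.0, 60.0, 0.0],
    "flag":  [0, 0, 0, 1],
    "email": ["a@example.org", "b@example.org", "a@example.org", "b@example.org"],
})

# flag == 0: accepted by the user; flag == 1: rejected by the user
accepted = records[records["flag"] == 0]
rejected = records[records["flag"] == 1]

# Number of accepted measurements and of distinct users per galaxy
per_galaxy = accepted.groupby("pgcID").agg(
    n_accepted=("inc", "size"),
    n_users=("email", "nunique"),
)
```

Counting distinct `email` values rather than rows guards against the same user sorting a galaxy more than once.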

In [4]:
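# Load the two pipe-delimited exports into separate DataFrames
Manoa = pd.read_csv('EDD_incNET_Manoa.20190524.txt', delimiter='|')
Guest = pd.read_csv('EDD_incNET_Guest.20190524.txt', delimiter='|')

# Preview the first rows of the guest-user table
Guest.head()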

| Unnamed: 0 | id | pgcID | inc | pgcID1 | pgcID2 | flag | not_sure | better_image | bad_TF | ambiguous | ... | HI | face_on | not_spiral | multiple | note | email | inputTable | checkoutTime | checkinTime | ip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12246 | 99866 | 76.5 | 1461 | 4192.0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | | dummy | Input_Guest | 2019-05-23 02:27:34 | 2019-05-23 02:28:00 | 78.239.55.56 |
| 1 | 12245 | 51425 | 74.5 | 44725 | 23662.0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | | dummy | Input_Guest | 2019-05-23 02:27:01 | 2019-05-23 02:27:30 | 78.239.55.56 |
| 2 | 12244 | 1696613 | 90.0 | 70708 | | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | | dummy | Input_Guest_test_calib | 2019-05-23 02:26:30 | 2019-05-23 02:26:58 | 78.239.55.56 |
| 3 | 12243 | 2135066 | 78.5 | 27451 | 53318.0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | A clear neighbour | dummy | Input_Guest | 2019-05-23 02:25:40 | 2019-05-23 02:26:27 | 78.239.55.56 |
| 4 | 12242 | 44168 | 64.5 | 41531 | 6235.0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | | dummy | Input_Guest | 2019-05-23 02:24:50 | 2019-05-23 02:25:34 | 78.239.55.56 |
