# SDSS Color-Magnitude Diagram
## By Katya Gozman
### May 15, 2021

## README

Welcome to my SDSS galaxy color-magnitude diagram (CMD) viewer!! This tool reads a table of galaxies with known magnitudes and creates an interactive CMD where users can click on any bin to display galaxies representative of that color and magnitude range.

**Requirements:**
- python (I use python3)
- jupyter notebooks
- Common packages: numpy, pandas, PIL, requests
- plotly, an interactive data visualization library
    - Either pip install plotly==4.14.3 or conda install -c plotly plotly=4.14.3
    - Visit https://plotly.com/python/getting-started/ for more information about plotly
- A csv file that has objid,u,g,r,i,z,ra,dec columns for SDSS galaxies
    - I am using a csv of the top 1000 Galaxy Zoo galaxies with more than 10 votes
- An image of a plain white background in your working directory, to serve as a placeholder for bins with no galaxies
    - I use a file I found on google called "white.jpg"
    
This code has a cell that creates a folder called gal_images1 in your working directory and populates it with cutout images of all the galaxies in your csv file. The images are only ~1 kB each, so they shouldn't take up much space, but make sure you have room on your computer. This will take a few minutes, because there are 1000 images to get! The good news is that you only have to run this cell once; after that, all the galaxies are stored on your computer and you can modify your plot/code to your heart's content.
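
If you do end up re-running the download cell, one way to avoid fetching everything again is to skip galaxies whose cutout is already on disk. This is a minimal sketch, not part of the notebook: `missing_ids` is a hypothetical helper, and it assumes the `gal_images1/<objid>.jpeg` naming used by the download cell.

```python
import os

def missing_ids(objids, folder="gal_images1"):
    """Return the objids that do not yet have a saved cutout in `folder`.

    Hypothetical helper: assumes files are named '<objid>.jpeg'.
    """
    return [
        objid for objid in objids
        if not os.path.exists(os.path.join(folder, f"{objid}.jpeg"))
    ]
```

Filtering the rows of the csv with this list before downloading would fetch only the images that are absent.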

Once you've run all the cells, you should see a color-magnitude diagram pop up of all the galaxies in your input csv file. Click on any of the bins of the CMD histogram to see a selection of galaxies representative of that color and magnitude bin. The magnitude and color bin you are clicking on is displayed in a box around your mouse cursor when you hover over any bin as the x and y values, respectively. The z value will show you the number of galaxies in that bin. If a bin has more than 3 images, I simply take the first 3 images that plotly tells me are in that bin. If you click and no images appear, that means there are no galaxies in your dataset that correspond to that color and magnitude range.
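
To make the bin logic concrete, here is a rough sketch of how galaxies can be grouped into 2D bins and how the "first 3" selection falls out of that grouping. It is only an illustration under simple assumptions (equal-width bins spanning the data range); plotly chooses its own bin edges, and `bin_index` and `group_by_bin` are hypothetical names, not plotly functions.

```python
from collections import defaultdict

def bin_index(value, lo, hi, nbins):
    """Index of the equal-width bin containing `value` on [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * nbins)
    return min(max(idx, 0), nbins - 1)  # clamp so value == hi lands in the last bin

def group_by_bin(mags, colors, nbins=100):
    """Map (x_bin, y_bin) -> list of galaxy row indices, in input order."""
    x_lo, x_hi = min(mags), max(mags)
    y_lo, y_hi = min(colors), max(colors)
    bins = defaultdict(list)
    for row, (m, c) in enumerate(zip(mags, colors)):
        key = (bin_index(m, x_lo, x_hi, nbins), bin_index(c, y_lo, y_hi, nbins))
        bins[key].append(row)
    return bins

# i-band magnitudes and g-i colors for a few made-up galaxies
demo_mags = [18.1, 18.1, 18.1, 18.1, 16.2]
demo_colors = [0.6, 0.6, 0.6, 0.6, 1.3]
demo_bins = group_by_bin(demo_mags, demo_colors, nbins=10)

# The z value of a bin is its length; the viewer shows at most the first three rows
first_three = demo_bins[(9, 0)][:3]
```

In this toy example, four galaxies share one bin (z = 4), so only rows 0, 1, and 2 would be displayed.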

There are also two customization options above the plot. With the colormap buttons, you can choose from four different colormaps to display the CMD in, based on your visual preferences. You can also reverse the colormap (i.e. if the original colormap is a gradient going from red to blue, reversing would make it go from blue to red) using the "reverse colormap" true/false buttons. This does not change anything about the data, only how it is visually displayed to you.
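
As a small illustration of what "reversing" means, a colorscale can be thought of as a list of `[position, color]` stops between 0 and 1, and reversing it flips the positions. This is a conceptual sketch only, with illustrative stops and a hypothetical `reverse_colorscale` helper; it is not how plotly implements the `reversescale` option internally.

```python
def reverse_colorscale(scale):
    """Flip a list of [position, color] stops so position p becomes 1 - p."""
    return [[1.0 - pos, color] for pos, color in reversed(scale)]

# Three illustrative stops (dark purple -> teal -> yellow)
demo_scale = [[0.0, "#440154"], [0.5, "#21918c"], [1.0, "#fde725"]]
demo_flipped = reverse_colorscale(demo_scale)
```

The bin counts are untouched; only the mapping from count to color changes.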


There are also some plot options that are built-in with plotly that are displayed as little grey buttons above and to the right of the CMD (to the left of the first galaxy image that appears). You can hover over them to see what they do. You can zoom in or out, autoscale, pan, toggle spikelines, and more.

If you'd like to run this code, I have provided my csv file called result.csv and placeholder image white.jpg in the folder with the notebook, so you should be able to run this without any modification to the code.

Please note that this notebook is a very rough, unpolished first version of this program. I made it as a proof of concept to see if I could make a tool akin to the old Voyages activity about galaxy CMDs that no longer works. I have listed some to-dos/questions that came up while making this in the very last cell of the notebook. For example, one of my biggest worries right now is needing to download all the images to the user's computer. This could probably be changed if this were a web activity rather than a notebook a user would run. In addition, I would love for a user to be able to import a dataset through a button/GUI menu instead of changing code, but this only seems to be an option when using plotly dash, a tool that lets you make interactive dashboards/web apps.

Disclaimer: I am new to plotly (this is my first time using it!) and to making interactive visualizations in general, so apologies for any confusing or badly written code. I relied heavily on plotly tutorials and examples to create this, which can be found at https://plotly.com/python/.

Hope you enjoy this visualization!!


In [1]:
###imports###
import pandas as pd
import numpy as np
from PIL import Image
import requests
from io import BytesIO
import io
import os
import plotly

###read in the csv of galaxies with objid, u,g,r,i,z,ra,dec columns###
###CHANGE FILENAME of your csv accordingly in the below line###
galaxies_df = pd.read_csv('result.csv', names=['objid','u','g','r','i','z','ra','dec'], dtype={'objid':str,'u':float,'g':float,'r':float,'i':float,'z':float,'ra':str,'dec':str}, skiprows=[0])
galaxies_df.sample(5)
ra = galaxies_df['ra']
dec = galaxies_df['dec']
gmag = (galaxies_df['g'])
rmag = (galaxies_df['r'])
imag = (galaxies_df['i'])
color = gmag-imag
galaxies_df.head()

Unnamed: 0,objid,u,g,r,i,z,ra,dec
0,1237657190906265729,19.644669,18.754942,18.435432,18.147774,18.133595,2.33899169,-0.14125159
1,1237663783661535442,19.704124,18.307823,17.795338,17.488132,17.289001,2.34967519,-0.40003004
2,1237652947452952698,19.327667,17.53224,16.642637,16.231489,15.915926,2.35221205,-10.09399097
3,1237657189832523967,20.325781,18.774132,17.830229,17.24284,16.846958,2.36426256,-0.91197187
4,1237652946916082051,21.592339,21.095518,19.282438,18.635124,18.151045,2.38843263,-10.47639727
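
The cell above hard-codes the color as g-i. As a hedged sketch of how the choice of bands could be made a parameter (one of the to-dos in the last cell), the snippet below computes a color from any two magnitude columns. `color_index` is a hypothetical helper, and the two rows are copied from the table above in the column layout the notebook expects.

```python
import csv
import io

def color_index(row, blue_band, red_band):
    """Color as the magnitude difference blue_band - red_band (e.g. g - i)."""
    return float(row[blue_band]) - float(row[red_band])

# Two rows copied from the table above
sample_csv = io.StringIO(
    "objid,u,g,r,i,z,ra,dec\n"
    "1237657190906265729,19.644669,18.754942,18.435432,18.147774,18.133595,2.33899169,-0.14125159\n"
    "1237652947452952698,19.327667,17.53224,16.642637,16.231489,15.915926,2.35221205,-10.09399097\n"
)
sample_rows = list(csv.DictReader(sample_csv))

g_minus_i = [color_index(r, "g", "i") for r in sample_rows]
g_minus_r = [color_index(r, "g", "r") for r in sample_rows]
```

With pandas the same idea is simply `galaxies_df[blue_band] - galaxies_df[red_band]`.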


In [400]:
###
'''
Cell to extract all images corresponding to the input csv objects and store them locally.
This will download a jpeg of each galaxy and store it to a folder called "gal_images1" in your
notebook's directory. Each image should be around 1 kb.
*****RUN ONLY ONCE TO EXTRACT IMAGES!!*******
'''
###

def extractOnlineImage(RA, dec, galID):
    """Download a 64x64 SDSS cutout centered on (RA, dec) and save it as gal_images1/<galID>.jpeg."""
    imgurl = "http://skyserver.sdss.org/dr15/SkyServerWS/ImgCutout/getjpeg?TaskName=Skyserver.Chart.Image&ra={}&dec={}&scale=1&width=64&height=64".format(RA, dec)
    r = requests.get(imgurl)
    r.raise_for_status()  # stop on a failed request instead of trying to open an error page as an image
    img = Image.open(BytesIO(r.content))
    os.makedirs('gal_images1', exist_ok=True)  # create the folder if it doesn't exist yet
    img.save('gal_images1/%s.jpeg'%galID)
    


for i,row in galaxies_df.iterrows():
 
    extractOnlineImage(row['ra'], row['dec'], row['objid'])
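
The long format string in the cell above can be hard to read. As an optional sketch, the same request URL can be assembled from a dictionary of parameters with the standard library. `cutout_url` and `CUTOUT_URL` are hypothetical names; the endpoint and parameter values are the ones already used in the cell above.

```python
from urllib.parse import urlencode

CUTOUT_URL = "http://skyserver.sdss.org/dr15/SkyServerWS/ImgCutout/getjpeg"

def cutout_url(ra, dec, scale=1, size=64):
    """Build the cutout request URL with the same parameters as the cell above."""
    params = {
        "TaskName": "Skyserver.Chart.Image",
        "ra": ra,
        "dec": dec,
        "scale": scale,
        "width": size,
        "height": size,
    }
    return f"{CUTOUT_URL}?{urlencode(params)}"

demo_url = cutout_url("2.33899169", "-0.14125159")
```

This makes it easier to experiment with a different cutout size or scale without editing the URL by hand.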


In [2]:
###creates dictionaries of key=galaxy objid and value=image array to access image information###
image_data = {}
for img_filename in os.listdir('gal_images1'):
    galID = img_filename.split('.')[0]
    with open(f"gal_images1/{img_filename}", "rb") as f:
        b = f.read()
        image_data[galID] = b
        
###Need a blank/white image to act as placeholder for when you mouse over area without any galaxies###
###I downloaded a random jpg that was a white background called "white.jpg" and used that###
###CHANGE FILENAME AS NEEDED###
empty = {}        
with open("white.jpg", "rb") as f:
    empty['white'] = f.read()
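
The loading loop above reads every file in the folder, so a stray non-image file (for example a hidden system file) would also end up in the dictionary. A minimal alternative sketch, assuming all cutouts end in `.jpeg` as saved by the download cell, is shown below; `load_images` is a hypothetical helper and the temporary folder is only there to demonstrate it.

```python
import tempfile
from pathlib import Path

def load_images(folder, suffix=".jpeg"):
    """Return {objid: raw bytes} for every file in `folder` ending in `suffix`."""
    return {
        path.stem: path.read_bytes()
        for path in sorted(Path(folder).iterdir())
        if path.is_file() and path.suffix == suffix
    }

# Demonstration on a throwaway folder with one fake image and one stray file
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "123.jpeg").write_bytes(b"fake jpeg bytes")
    (Path(tmp) / ".DS_Store").write_bytes(b"not an image")
    demo_images = load_images(tmp)
```

In the notebook this would be called as `load_images('gal_images1')`.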

In [3]:
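###Displays an image from the dataset--sanity check###
###Note: this import replaces PIL's Image in the notebook namespace, so re-run the imports cell before re-running the download cell###
from ipywidgets import Image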
Image(value=image_data['1237649918963548406'])

Image(value=b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x08\x06\x0…

In [4]:
###imports necessary plotly plotting library###
import plotly.graph_objs as go

In [5]:
###Creates initial figure (2d histogram of g-i vs i)###
fig = go.FigureWidget(
    data=[
        dict(
            type='histogram2d',
            x=imag,
            y=gmag-imag,
            #mode='markers',
            nbinsx=100,
            nbinsy=100,
            colorscale='viridis',
        
        )
    ],
    layout=go.Layout(height=600, width=600)
)
fig.layout.xaxis.title = 'i-band magnitude'
fig.layout.yaxis.title = 'g-i color'
#fig.layout.title = 'SDSS Galaxy CMD'
#fig.layout

In [6]:
fig

FigureWidget({
    'data': [{'colorscale': [[0.0, '#440154'], [0.1111111111111111, '#482878'],
               …

In [7]:
###Get the histogram trace from the figure so callbacks can be attached to it###
scatter = fig.data[0]

In [8]:
###Update hovermode so that the program recognizes closest click or hover on a histogram cell###
fig.layout.hovermode = 'closest'
#fig.layout.clickmode = 'event'
#fig.update_layout(clickmode='event+select')

In [9]:
#scatter.text = galaxies_df['objid']
#scatter.hoverinfo = 'text'


In [10]:
###Creates widgets for the white placeholder images for areas without galaxy data###
from ipywidgets import Image, Layout
image_widget1 = Image(
    #value=image_data['1237648720693887211'],
    value = empty['white'],
    layout=Layout(height='200px', width='200px')
)
image_widget2 = Image(
    #value=image_data['1237648720693887211'],
    value = empty['white'],
    layout=Layout(height='200px', width='200px')
)
image_widget3 = Image(
    #value=image_data['1237648720693887211'],
    value = empty['white'],
    layout=Layout(height='200px', width='200px')
)

In [11]:
###Function that shows images on hover or click###
###Displays up to 3 representative galaxies in each histogram bin###
def hover_fn(trace, points, state):
    ###gets list of galaxy indices that are in the bin you click on###
    inds = points.point_inds
    if len(inds) == 0:
        image_widget1.value = empty['white']
        image_widget2.value = empty['white']
        image_widget3.value = empty['white']
    else:
        # Update image widget
        galID1 = galaxies_df['objid'][inds[0]]
        image_widget1.value = image_data[galID1]

        if len(inds) < 2 :
            image_widget2.value = empty['white']
        else:
            galID2 = galaxies_df['objid'][inds[1]]
            image_widget2.value = image_data[galID2]

        if len(inds) < 3:
            image_widget3.value = empty['white']
        else:
            galID3 = galaxies_df['objid'][inds[2]]
            image_widget3.value = image_data[galID3]

            
### You can change whether you want images to be displayed on mouse hover or mouse click ###
### Right now it's set to on click ###
#scatter.on_hover(hover_fn)
scatter.on_click(hover_fn)
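
The three near-identical branches in `hover_fn` can also be expressed as "take the first n galaxies and pad with the placeholder". The following is a sketch of that idea with a hypothetical helper, `pick_images`, and tiny stand-in data; it is meant to show the selection logic, not to replace the callback as written.

```python
def pick_images(inds, objids, images, placeholder, n=3):
    """Return exactly n image byte strings for the clicked bin.

    Uses the first n indices in `inds` and pads with `placeholder`.
    """
    chosen = [images[objids[i]] for i in inds[:n]]
    return chosen + [placeholder] * (n - len(chosen))

# Tiny stand-ins for galaxies_df['objid'], image_data and empty['white']
demo_ids = ["a", "b", "c", "d"]
demo_image_data = {"a": b"A", "b": b"B", "c": b"C", "d": b"D"}
demo_white = b"W"

empty_bin = pick_images([], demo_ids, demo_image_data, demo_white)
one_galaxy = pick_images([2], demo_ids, demo_image_data, demo_white)
crowded_bin = pick_images([0, 1, 2, 3], demo_ids, demo_image_data, demo_white)
```

Inside the callback, the returned list could then be assigned to `image_widget1`, `image_widget2`, and `image_widget3` in a single loop over `zip(widgets, values)`.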

In [12]:
# Update plot sizing
fig.update_layout(
    width=600,
    height=600,
    autosize=False,
    margin=dict(t=150, b=0, l=0, r=0),
)

"""fig.update_scenes(
    aspectratio=dict(x=1, y=1, z=0.7),
    aspectmode="manual"
)"""
button_layer_1_height = 1.3
button_layer_2_height = 1.19
from ipywidgets import HBox, VBox
fig.update_xaxes(autorange="reversed") #reverse the x-axis so brighter (smaller) magnitudes are on the right, as is traditional for CMDs
fig.update_layout(clickmode='event')
###Creates user button options for colorscale###
fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(
                    args=["colorscale", "Viridis"],
                    label="Viridis",
                    method="restyle"
                ),
                dict(
                    args=["colorscale", "Cividis"],
                    label="Cividis",
                    method="restyle"
                ),
                dict(
                    args=["colorscale", "Blackbody"],
                    label="Blackbody",
                    method="restyle"
                ),
                dict(
                    args=["colorscale", "Rdylbu"],
                    label="Red-Yellow-Blue",
                    method="restyle"
                ),
            ]),
            type = "buttons",
            direction="right",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0,
            xanchor="left",
            y=button_layer_1_height,
            yanchor="top"
        ),
        dict(
            buttons=list([
                dict(
                    args=["reversescale", False],
                    label="False",
                    method="restyle"
                ),
                dict(
                    args=["reversescale", True],
                    label="True",
                    method="restyle"
                )
            ]),
            type = "buttons",
            direction="right",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.15,
            xanchor="left",
            y=button_layer_2_height,
            yanchor="top"
        ),
    ]
)

###Annotate button options###
fig.update_layout(
    annotations=[
        dict(text="Colorscale", x=0, xref="paper", y=1.35, yref="paper",
                             align="left", showarrow=False),
        dict(text="Reverse<br>Colorscale", x=0, xref="paper", y=1.17,
                             yref="paper", showarrow=False),
    ])

###Create & display everything###
HBox([fig,
      VBox([image_widget1, image_widget2, image_widget3])])




HBox(children=(FigureWidget({
    'data': [{'colorscale': [[0.0, '#440154'], [0.1111111111111111, '#482878'],
…

In [13]:
####TO DOs#####
'''
-GET A BETTER DATASET (this is galaxy zoo, which is better than previous in images)
    -I have dataset from old activity now but contains 10x more images...
-Images stored locally? Or can I call to the image for each point...seems slow?
-Add histogram vs scatter plot option?
-Alternatively, option to overlay a scatter plot
-Import your own data--can't find a way to do this in jupyter, have to use plotly dash I think
-Select whether you want hover or on click images
-Select colormap
    -Preliminarily done, need more/different options?
-Select the magnitudes a user would like to calculate color

'''
