## A Comparison of Plotting with Matplotlib vs. Bokeh

Description:

1. We extract galaxies from the object catalogues using the TAP service.

2. We create color-magnitude diagrams for galaxies with Matplotlib and Bokeh in order to understand the different functionality of each.

3. We fit a line to the galaxy red sequence.

4. We add annotations to the plots in Section 2.1. <br>

Contact author: Vicente Puga <br>
Last verified to run: 2023-04-02 <br>
Targeted learning level: beginner <br>

Credit: This notebook uses code from DP0 Tutorials 1 and 2

## Introduction 
Matplotlib and Bokeh are both Python libraries for plotting and graphing data. This notebook discusses the qualities of each, to give a better understanding of how to use the two libraries and of which is a better fit for creating color-color and color-magnitude diagrams.

This notebook also adds a line of best fit to a color-magnitude plot of a galaxy cluster. This galaxy cluster has what is known as a red sequence, which, in short, is a linear feature that stands out in the color-magnitude plot for galaxies in the same physical cluster. A line of best fit is added to these plots to better understand and quantify this trend. Annotations are added to the plots to make their main ideas clearer to the reader.

## 1. Extracting Data
Here we import the packages required to use Bokeh and Matplotlib. The packages needed to access and manipulate the LSST data are imported as well.

The most important of the packages below is Matplotlib, which lets us make our color-color and color-magnitude diagrams. pandas is used for managing data, such as creating tables. Astropy lets us attach astronomical units to values, which is important when describing the specific data we want from the query. NumPy lets us do calculations, which comes into play when finding flux values from magnitudes.

In [None]:
# Import general python packages
import time
import gc
import numpy as np
import matplotlib.pyplot as plt
import pandas
from pandas.testing import assert_frame_equal
from astropy import units as u
from astropy.coordinates import SkyCoord



Below, Bokeh and its complementary packages are imported to give us full use of its plotting features.

In [None]:
# Bokeh and holoviews for interactive visualization
import bokeh
from bokeh.io import output_file, output_notebook, show
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource, CDSView, GroupFilter, HoverTool
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
import holoviews as hv




This cell imports the LSST packages that we use to retrieve the data we need for plotting.

In [None]:
# Import the Rubin TAP service utilities
from lsst.rsp import get_tap_service, retrieve_query

# To ignore some kinds of warnings
import warnings
from astropy.units import UnitsWarning

This cell adjusts how pandas displays tables, sends Bokeh output to the notebook, and silences unit warnings.

In [None]:
pandas.set_option('display.max_rows', 20)  #Changes the number of rows displayed

output_notebook()  #Bokeh plots will be shown inline in the notebook output cell

warnings.simplefilter("ignore", category=UnitsWarning)  #Suppresses Astropy unit warnings

The cell below lets us compare the data returned by a query regardless of the order in which it is returned. `set_index` is used to reset the index so that it counts up from zero after sorting.

In [None]:
def sort_dataframe(df, sort_key='objectId'):
    df = df.sort_values(sort_key)
    df.set_index(np.array(range(len(df))), inplace=True)
    return df
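
As a quick check of what `sort_dataframe` buys us, the sketch below builds two small tables holding the same rows in a different order and confirms that they compare equal once sorted. The values in `first` and `second` are made up for illustration, and the helper is repeated so that the example runs on its own.

```python
import numpy as np
import pandas
from pandas.testing import assert_frame_equal


def sort_dataframe(df, sort_key='objectId'):
    # Same helper as above: sort by the key, then renumber the index 0..N-1
    df = df.sort_values(sort_key)
    df.set_index(np.array(range(len(df))), inplace=True)
    return df


# Two toy tables with identical rows in a different order (made-up values)
first = pandas.DataFrame({'objectId': [3, 1, 2], 'mag': [21.0, 19.5, 20.2]})
second = pandas.DataFrame({'objectId': [2, 3, 1], 'mag': [20.2, 21.0, 19.5]})

# Raises AssertionError if the sorted tables differ; silent if they match
assert_frame_equal(sort_dataframe(first), sort_dataframe(second))

tables_match = sort_dataframe(first).equals(sort_dataframe(second))
print(tables_match)
```

Without the sort, the same comparison would fail even though both tables describe the same objects.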

In [None]:
service = get_tap_service()
assert service is not None
assert service.baseurl == "https://data.lsst.cloud/api/tap"

Before we search the database, we need to set constants that specify the parameters of our search. Defining these constants makes the query easier to write. It also helps if the search parameters need to change: instead of editing every place in the query where a value appears, we can simply change the definition of the constant.

In [None]:
center_coords = SkyCoord(62, -37, frame='icrs', unit='deg')  #the coordinates we want to center our search on
search_radius = 0.5*u.deg  #the radius of the search around the chosen coordinates

max_rec = 5  #maximum records to return

Mag_Min = 23.0  #Faint limit: the largest magnitude (lowest flux) we want from our search
Mag_Max = 16.0  #Bright limit: the smallest magnitude (highest flux) we want from our search

print(center_coords)
print(search_radius)


The search query also needs the center coordinates and radius as strings, so we define those here.

In [None]:
use_center_coords = "62, -37" 
use_radius = "0.5"

The magnitude scale assigns larger values to dimmer objects and smaller values to brighter objects. Flux, however, works as one would expect: larger values are brighter and smaller values are dimmer. Sometimes, instead of a range of magnitudes, we can use a range of fluxes to describe our search. Here we do just that by using the conversion between magnitude and flux: for a flux $f$ in nanojanskys, the AB magnitude is $m = -2.5\log_{10}(f) + 31.4$. The maximum and minimum flux values are used in the search, but the same search could also be done with the previously defined `Mag_Min` and `Mag_Max`.

In [None]:
Max_Flux = 10**((Mag_Max-31.4)/-2.5)
Min_Flux = 10**((Mag_Min-31.4)/-2.5)

print("Min Flux is, ", Min_Flux)
print("Max Flux is, ", Max_Flux)
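
The conversion can also be wrapped in a pair of small functions, which makes it easy to confirm that the two directions are consistent. This is only a sketch: the names `mag_to_flux`, `flux_to_mag`, and `AB_ZERO_POINT_NJY` are introduced here for illustration and are not part of any LSST package.

```python
import numpy as np

# AB magnitude of a 1 nJy source, the same 31.4 used in the cell above
AB_ZERO_POINT_NJY = 31.4


def mag_to_flux(mag):
    """Convert an AB magnitude to a flux in nanojanskys."""
    return 10 ** ((mag - AB_ZERO_POINT_NJY) / -2.5)


def flux_to_mag(flux_njy):
    """Convert a flux in nanojanskys to an AB magnitude."""
    return -2.5 * np.log10(flux_njy) + AB_ZERO_POINT_NJY


faint_flux = mag_to_flux(23.0)   # flux at the faint limit
bright_flux = mag_to_flux(16.0)  # flux at the bright limit

print(faint_flux, bright_flux)
# Converting back should return the magnitudes we started from
print(flux_to_mag(faint_flux), flux_to_mag(bright_flux))
```

Note that the larger magnitude gives the smaller flux, which is why the faint limit sets the minimum flux of the search.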

This cell contains the code that retrieves the data with a query. All the constants defined above are used here to extract the specific objects that meet our criteria. The final line, which assigns `results_table`, places everything in a table that lets us easily access the values needed for our plots. (This cell might take several minutes to run.)

In [None]:
results = service.search("SELECT objectId, detect_isPrimary, "
        "coord_ra AS ra, coord_dec AS dec, "
        "scisql_nanojanskyToAbMag(g_cModelFlux) AS mag_g_cModel, "
        "scisql_nanojanskyToAbMag(r_cModelFlux) AS mag_r_cModel, "
        "scisql_nanojanskyToAbMag(i_cModelFlux) AS mag_i_cModel, "
        "scisql_nanojanskyToAbMag(u_cModelFlux) AS mag_u_cModel, "
        "r_extendedness "
        "FROM dp02_dc2_catalogs.Object "
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
        "CIRCLE('ICRS', " + use_center_coords + ", " + use_radius + ")) = 1 "
        "AND detect_isPrimary = 1 "
        # Spaces around each number keep it from running into the next AND
        "AND g_cModelFlux > " + str(Min_Flux) +
        " AND g_cModelFlux < " + str(Max_Flux) +
        " AND r_cModelFlux > " + str(Min_Flux) +
        " AND r_cModelFlux < " + str(Max_Flux) +
        " AND i_cModelFlux > " + str(Min_Flux) +
        " AND i_cModelFlux < " + str(Max_Flux) +
        " AND u_cModelFlux > " + str(Min_Flux) +
        " AND u_cModelFlux < " + str(Max_Flux) +
        " AND r_extendedness IS NOT NULL ")

results_table = results.to_table().to_pandas()



In [None]:
results_table
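
Because the same flux cut is repeated for every band, the query text can also be assembled with a small helper. The sketch below only builds and prints the query string; it does not contact the TAP service. The function `flux_cut_clauses` is a hypothetical helper written for this example, and the shortened column list in the `SELECT` is for brevity only.

```python
def flux_cut_clauses(bands, min_flux, max_flux):
    """Return the flux-range conditions for each band as one ADQL fragment."""
    clauses = []
    for band in bands:
        column = f"{band}_cModelFlux"
        # The trailing space keeps each clause separate from the next one
        clauses.append(f"AND {column} > {min_flux} AND {column} < {max_flux} ")
    return "".join(clauses)


# Same magnitude limits as above, converted to nanojansky fluxes
min_flux = 10 ** ((23.0 - 31.4) / -2.5)
max_flux = 10 ** ((16.0 - 31.4) / -2.5)

query = (
    "SELECT objectId, coord_ra, coord_dec "
    "FROM dp02_dc2_catalogs.Object "
    "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
    "CIRCLE('ICRS', 62, -37, 0.5)) = 1 "
    "AND detect_isPrimary = 1 "
    + flux_cut_clauses("griu", min_flux, max_flux)
    + "AND r_extendedness IS NOT NULL"
)
print(query)
```

Printing the query before submitting it is a cheap way to catch missing spaces or misplaced quotes.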

## 2. Using Matplotlib to plot
Here Matplotlib is used to plot a color-color diagram and a color-magnitude diagram. Annotations are added as needed to convey the information more precisely. We will also manipulate the data to get different plots and see how much that complicates the code.

First, the data is selected and put into a table where it can be viewed before being turned into a plot. This time, however, the table contains only objects that are stars. This is done by requiring the extendedness to equal 0, since an extendedness value of zero means the object is a point source, i.e. a star.

In [None]:
stars=results_table[results_table['r_extendedness']==0]
stars

Now we plot stars and galaxies superimposed on the same plot. With a fixed `color` argument, a single `plt.scatter` call draws every point in the same color and symbol, so to tell the two apart we call it twice: once for the full sample in one color, and again for the stars in another color. Because the stars are drawn on top, the points left in the first color are the galaxies. This gives the result below.

In [None]:
#Here we give certain columns from the table shorter names that are easier to code with
data = {'imag': results_table['mag_i_cModel'], 
        'rmag': results_table['mag_r_cModel']}
        
data['rmi'] = data['rmag'] - data['imag']

plt.scatter('rmag', 'rmi', s=6, color='blue', data=data)
plt.title('Colour-Magnitude Diagram')

#This sets the axis range; the limits are reversed so that brighter objects are on the right
plt.xlim(23.2, 15.7)
plt.xlabel('r')
plt.ylabel('r-i')
plt.grid(True)

#overlaying stars onto galaxies
data = {'imag1': stars['mag_i_cModel'], 
        'rmag1': stars['mag_r_cModel']}
        
data['rmi1'] = data['rmag1'] - data['imag1']

plt.scatter('rmag1', 'rmi1', s=6, color='green', data=data)

#Creating a legend to identify each
plt.legend(['Galaxies', 'Stars']);

plt.show()
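
One way to keep the two-call approach tidy is to loop over the groups and pass `label=` to each `scatter` call, so that the legend entries cannot get out of step with the points. This is a sketch using synthetic data rather than the catalog: the arrays `rmag`, `rmi`, and `extendedness` are made-up stand-ins for the table columns.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic stand-ins for the catalog columns (made-up values)
rmag = rng.uniform(16.0, 23.0, size=200)
rmi = rng.normal(0.4, 0.2, size=200)
extendedness = rng.integers(0, 2, size=200)  # 0 = star, 1 = galaxy

fig, ax = plt.subplots()
for value, name, color in [(1, 'Galaxies', 'blue'), (0, 'Stars', 'green')]:
    selected = extendedness == value
    # label= ties the legend entry to this particular set of points
    ax.scatter(rmag[selected], rmi[selected], s=6, color=color, label=name)

ax.set_xlim(23.2, 15.7)  # reversed so that brighter objects are on the right
ax.set_xlabel('r')
ax.set_ylabel('r-i')
ax.set_title('Color-Magnitude Diagram')
ax.grid(True)
ax.legend()
```

In a notebook the figure is displayed when the cell finishes. Selecting each group explicitly also means the points labeled as galaxies contain only galaxies.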

We can also plot a color-color diagram, which works in exactly the same way, except that we now put a color on both the x and y axes.

## 2.1. Using Annotations

In [None]:
data = {'gmag': results_table['mag_g_cModel'], 
        'rmag': results_table['mag_r_cModel'],
        'imag': results_table['mag_i_cModel']}
        
# Compute the g-r and r-i color indices by subtracting the corresponding magnitude columns
data['gmr'] = data['gmag'] - data['rmag']
data['rmi'] = data['rmag'] - data['imag']

colorcolor=plt.scatter('gmr', 'rmi', s=6, color='green', alpha=0.3, label='Lighten', data=data)
plt.title('Color-Color Diagram')
plt.xlim(-0.7, 2.0)
plt.ylim(-0.5, 1.5)
plt.ylabel('r-i')
plt.xlabel('g-r')
plt.grid(True)

#overlaying stars onto galaxies 

data = {'gmag1': stars['mag_g_cModel'], 
        'rmag1': stars['mag_r_cModel'],
        'imag1': stars['mag_i_cModel']}
        
data['gmr1'] = data['gmag1'] - data['rmag1']
data['rmi1'] = data['rmag1'] - data['imag1']

plt.scatter('gmr1', 'rmi1', s=6, color='black', data=data)

plt.legend(['Galaxies', 'Stars']);


#This step adds annotations to the plot so we can better understand what information the plot is conveying.
#The first four arguments of plt.arrow are, in order: initial x-position, initial y-position, x-length, and y-length.
plt.arrow(0.6, -0.4, -1, 0, head_width = 0.08,
          width = 0.02)
plt.arrow(0.6, -0.4, 1, 0, head_width = 0.08,
          width = 0.02, color='red')

#Text annotation
plt.annotate('Redder', xy=(0, 0), xytext=(1, -0.3), color='red')
plt.annotate('Bluer', xy=(0, 0), xytext=(0, -0.3), color='blue')

plt.show()

## 3. Using Bokeh to Plot
This method begins similarly to the Matplotlib method.

Start by restating the center coordinates used before. Now, however, we want them to have the data type float instead of the strings used in Section 1. This should yield the same RA and Dec we chose in Section 1.

In [None]:
center_ra = center_coords.ra.deg
center_dec = center_coords.dec.deg
print(center_ra, center_dec)

This section shows how to distinguish the stars from the galaxies and plot them on the same graph in different colors. This is similar to before, where we made a new table for all stars by selecting only objects with an extendedness of 0. Here it is done in one step by defining stars as objects with an extendedness of 0 and galaxies as objects with an extendedness of 1.

In [None]:
object_map = {0.0: 'star', 1.0: 'galaxy'}

Here we define the data we will use to make the plots like we did in the previous section.

In [None]:
data = dict(ra=results_table['ra'], dec=results_table['dec'],
            target_ra=results_table['ra']-center_ra,
            target_dec=results_table['dec']-center_dec,
            rmi=results_table['mag_r_cModel']-results_table['mag_i_cModel'],
            gmag=results_table['mag_g_cModel'],
            rmag=results_table['mag_r_cModel'],
            imag=results_table['mag_i_cModel'])
source = ColumnDataSource(data=data)

# Additional data can be added to the Column Data Source after creation
source.data['objectId'] = results_table['objectId']
source.data['r_extendedness'] = results_table['r_extendedness']

`object_type` holds the classification of every object, based on the description we gave in `object_map`. We can refer to it in the plotting code so that Bokeh can distinguish stars and galaxies and plot them simultaneously, instead of having to overlay two separate plots.

In [None]:
source.data['object_type'] = results_table['r_extendedness'].map(object_map)
source.data['object_type']
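
To see what `map` does here in isolation, the sketch below applies the same `object_map` to a short, made-up list of extendedness values. The `extendedness` Series stands in for `results_table['r_extendedness']` and is not real catalog data.

```python
import pandas

object_map = {0.0: 'star', 1.0: 'galaxy'}

# Made-up extendedness values standing in for the catalog column
extendedness = pandas.Series([0.0, 1.0, 1.0, 0.0, 1.0])

# Each value is looked up in the dictionary and replaced by its label
object_type = extendedness.map(object_map)
print(object_type.tolist())

# Values that are not keys of the dictionary come back as NaN
unmatched = pandas.Series([0.5]).map(object_map)
print(unmatched.isna().all())
```

Any value without an entry in the dictionary becomes NaN, so it is worth checking for missing labels before plotting.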

Now we plot using Bokeh.

In [None]:
#Adjusting the specifications of the plot
#(Bokeh 3.0 and later use 'height' and 'width' in place of 'plot_height' and 'plot_width')
plot_options = {'plot_height': 400, 'plot_width': 400,
                'tools': ['box_select', 'reset', 'box_zoom', 'help']}

#This is to add the interactive features
tooltips = [
    ("Col (r-i)", "@rmi"),
    ("Mag (g)", "@gmag"),
    ("Mag (r)", "@rmag"),
    ("Mag (i)", "@imag"),
    ("Object ID", "@objectId"),
    ("Type", "@object_type")
]
hover_tool_cmd = HoverTool(tooltips=tooltips)

#Plotting
p = figure(title="Colour - Magnitude Diagram",
           x_axis_label='r', y_axis_label='r-i',
           x_range=(23.2, 16),
           **plot_options)

#Adding the hover tool and setting the plot features
object_type_palette = ['darkred', 'green']
p.add_tools(hover_tool_cmd)
p.circle(x='rmag', y='rmi', source=source,
         size=3, alpha=0.6,
         legend_field="object_type",
         color=factor_cmap('object_type',
                           palette=object_type_palette,
                           factors=['star', 'galaxy']),
         hover_color="darkblue")

#displaying
show(p)


## 4. Adding line of best fit

Using Matplotlib you can easily plot a line of best fit as well. Here we add a line of best fit to the color-magnitude diagram from before.

Here we work with galaxies. To extract all the galaxies from the data, we write the same code we used to find the stars, except this time we require the extendedness to equal 1, which indicates a spread-out light source.

In [None]:
galaxies=results_table[results_table['r_extendedness']==1]
galaxies

This code is exactly the same as the code in the Matplotlib section up to the part where we compute the line of best fit.

In [None]:
data = {'imag': galaxies['mag_i_cModel'], 
        'rmag': galaxies['mag_r_cModel']}
        
data['rmi'] = data['rmag'] - data['imag']

plt.scatter('rmag', 'rmi', s=6, color='blue', data=data)
plt.title('Colour-Magnitude Diagram')
plt.xlim(23.2, 15.7)
plt.ylim(-0.5, 1.5)
plt.xlabel('r')
plt.ylabel('r-i')
plt.grid(True)


x = data['rmag']
y = data['rmi']

#find line of best fit
a, b = np.polyfit(x, y, 1)

#We use a*x+b because we want a straight line; this part of the code can be modified depending on the type of fit desired
plt.plot(x, a*x+b, color='red')        

#saving the final plot
plt.savefig('RedSequenceLine.png')
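
To build confidence in the fit, it can help to run `np.polyfit` on data where the answer is known. The sketch below generates a synthetic sequence with a chosen slope and intercept plus some scatter, then checks how well the fit recovers them. The values of `true_slope`, `true_intercept`, and the noise level are arbitrary choices for illustration, not measurements of a real red sequence.

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary "true" line used to generate the synthetic colors
true_slope, true_intercept = -0.03, 1.0
rmag = rng.uniform(16.0, 23.0, size=500)
rmi = true_slope * rmag + true_intercept + rng.normal(0.0, 0.05, size=500)

# Degree 1 gives a straight line: polyfit returns (slope, intercept)
slope, intercept = np.polyfit(rmag, rmi, 1)

# The spread of the residuals measures how tightly points follow the line
residuals = rmi - (slope * rmag + intercept)
scatter = residuals.std()

print(slope, intercept, scatter)
```

The same residual calculation applied to the real galaxies gives one simple number for how well a straight line describes the sequence.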


## 5. Conclusions
Matplotlib delivers a very easy user experience, as can be seen from the code written above. The layout is practical and straightforward, and it is simple to add annotations such as arrows and text. One drawback is that the two sets of data were not plotted in a single call; however, this is almost no issue because overlaying two plots works just as well and requires minimal effort. Adding the line of best fit was also a trivial task, which is very useful when working with data and performing analysis.

Bokeh includes many more features in the plot, the most prominent of which is the hover tool, which displays information about specific data points on the graph. This is very useful in a web document or any work that is shared electronically, but less so in a printed document. Bokeh is also not as straightforward as Matplotlib: the plot options, tools, and other features are not as intuitive. That being said, they are not necessary to plot. If you were to go back and comment that code out, the plot would still appear, but it would lack the features that make Bokeh special.

Matplotlib is perhaps the most efficient choice for plotting if the goal is to create cohesive and simple plots that can be added to papers or lab reports. It allows for easy modifications and requires very little code to do so. Bokeh, however, would be very useful for websites or virtual reports, or anything where the reader can interact with the document. If the extra time needed to get past the slight learning curve is no issue, it also provides a slightly nicer format with a more distinguished appearance.