<img align='left' src = '../images/linea.png' width=150 style='padding: 20px'> 

# DP02 duplicates analysis
## Part 4 - Spatial distribution - Generating by hand in the cluster 

Analysis of duplicates found in the DP02 catalog.

<br> 
<br>

Contact: Luigi Silva ([luigi.lcsilva@linea.org.br](mailto:luigi.lcsilva@linea.org.br))

Last check: October 7, 2024.

#### Acknowledgments

'_This notebook used computational resources from the Associação Laboratório Interinstitucional de e-Astronomia (LIneA) with financial support from the INCT of e-Universe (Process No. 465376/2014-2)._'

'_This notebook uses libraries from the LSST Interdisciplinary Network for Collaboration and Computing (LINCC) Frameworks project, such as the hipscat, hipscat_import, and lsdb libraries. The LINCC Frameworks project is supported by Schmidt Sciences. It is also based on work supported by the National Science Foundation under Grant No. AST-2003196. Additionally, it receives support from the DIRAC Institute at the Department of Astronomy of the University of Washington. The DIRAC Institute is supported by gifts from the Charles and Lisa Simonyi Fund for Arts and Sciences and the Washington Research Foundation._'

# Inputs and configs

Let us import the packages that we will need.

In [None]:
# General
import os
import sys
import math
import numpy as np
import pandas as pd

# Bokeh
import bokeh
from bokeh.io import output_notebook

# Holoviews
import holoviews as hv
from holoviews import opts

# Geoviews
import geoviews as gv
import geoviews.feature as gf
from geoviews.operation import project
import cartopy.crs as ccrs

Let us set the number of rows that pandas will display.

In [None]:
pd.set_option('display.max_rows', 10)

Now, let us configure the plots to use bokeh and to be inline.

In [None]:
hv.extension('bokeh')

In [None]:
gv.extension('bokeh')

In [None]:
output_notebook()

In [None]:
%matplotlib inline

# Spatial distribution

In the following subsections, we will read the data of the 2D histogram corresponding to the **spatial distribution of objects**, considering their distribution in the sky according to their Right Ascension (R.A.) and Declination (DEC) coordinates, and we will make two plots for each considered case.

The first plot uses the **equidistant cylindrical projection (Plate Carrée projection)**, in which the lines corresponding to R.A. are equally spaced vertical straight lines, and the lines corresponding to DEC are equally spaced horizontal straight lines. This projection distorts areas and shapes, especially at high declinations.
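
To make the area distortion concrete, the sketch below computes the solid angle covered by each bin of a regular R.A./DEC grid. It is only an illustration and not part of the original analysis: the function name `bin_solid_angle_deg2` and the 10° demo grid are assumptions chosen for the example.

```python
import numpy as np

def bin_solid_angle_deg2(ra_edges_deg, dec_edges_deg):
    """Solid angle, in square degrees, of each bin of a regular R.A./DEC grid."""
    ra_edges = np.radians(np.asarray(ra_edges_deg, dtype=float))
    dec_edges = np.radians(np.asarray(dec_edges_deg, dtype=float))
    # On the unit sphere, a bin covers d(R.A.) * [sin(dec_max) - sin(dec_min)] steradians.
    d_ra = np.diff(ra_edges)
    d_sin_dec = np.diff(np.sin(dec_edges))
    area_sr = np.outer(d_ra, d_sin_dec)
    # Converting steradians to square degrees.
    return area_sr * (180.0 / np.pi) ** 2

# Hypothetical 10-degree grid covering the whole sky.
ra_edges_demo = np.linspace(0.0, 360.0, 37)
dec_edges_demo = np.linspace(-90.0, 90.0, 19)
areas_demo = bin_solid_angle_deg2(ra_edges_demo, dec_edges_demo)

print(f"Equatorial bin: {areas_demo[0, 9]:.2f} deg^2")
print(f"Polar bin:      {areas_demo[0, 17]:.2f} deg^2")
print(f"Whole sky:      {areas_demo.sum():.2f} deg^2")
```

In the Plate Carrée plot every bin is drawn as a rectangle of the same size, so the same count in a high-declination bin corresponds to a higher surface density than near the equator.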

The second plot uses the **Mollweide projection**, an equal-area, pseudocylindrical map projection. The Mollweide projection preserves area; however, it distorts shapes, especially near the edges of the sky map. The central meridian and the celestial equator are straight lines, while other lines of R.A. and DEC are represented as curves.
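
For reference, the standard Mollweide equations on a unit sphere can be sketched as follows. This is a minimal illustration under stated assumptions: the central meridian is at longitude 0°, longitudes are measured in $[-180^{\circ},180^{\circ}]$, and the function name `mollweide_xy` is hypothetical. The plots in this notebook rely on cartopy for the projection, whose output coordinates may be scaled differently.

```python
import numpy as np

def mollweide_xy(lon_deg, lat_deg, n_iter=50):
    """Forward Mollweide projection on a unit sphere (central meridian at 0 deg)."""
    lam = np.radians(np.atleast_1d(np.asarray(lon_deg, dtype=float)))
    phi = np.radians(np.atleast_1d(np.asarray(lat_deg, dtype=float)))
    lam, phi = np.broadcast_arrays(lam, phi)

    # Solving 2*theta + sin(2*theta) = pi*sin(phi) with Newton iterations.
    theta = phi.copy()
    target = np.pi * np.sin(phi)
    for _ in range(n_iter):
        denom = 2.0 + 2.0 * np.cos(2.0 * theta)
        safe = denom > 1e-12  # the derivative vanishes at the poles
        residual = 2.0 * theta + np.sin(2.0 * theta) - target
        step = np.zeros_like(theta)
        step[safe] = residual[safe] / denom[safe]
        theta = theta - step

    x = (2.0 * np.sqrt(2.0) / np.pi) * lam * np.cos(theta)
    y = np.sqrt(2.0) * np.sin(theta)
    return x, y

# The equator and the central meridian map to straight lines.
x_eq, y_eq = mollweide_xy([-180.0, 0.0, 180.0], 0.0)
x_cm, y_cm = mollweide_xy(0.0, [-90.0, 0.0, 90.0])
print(x_eq, y_eq)
print(x_cm, y_cm)
```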

The plots will also have a **colorbar** corresponding to the counts of objects per R.A. and DEC bin of the 2D histogram.

## First case - All the objects without any filter

### Reading the data

Below we can see the structure of the output Parquet file. Row 0 (type "histogram_ra_dec") holds the 2D histogram counts in the "values" column, and row 1 (type "bins_ra_dec") holds the R.A. bin edges ("ra_bins") and the DEC bin edges ("dec_bins"), also in the "values" column.
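
The code that produced the Parquet file is not shown in this notebook. As a rough illustration of the two-row layout described above, the sketch below builds an equivalent dataframe in memory from synthetic coordinates using `np.histogram2d`. All names ending in `_demo`, the random points and the 10° bins are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic coordinates standing in for the catalog (assumption).
rng = np.random.default_rng(seed=0)
n_points = 10_000
ra_demo = rng.uniform(0.0, 360.0, n_points)
dec_demo = rng.uniform(-90.0, 90.0, n_points)

# Hypothetical 10-degree bin edges.
ra_bins_demo = np.linspace(0.0, 360.0, 37)
dec_bins_demo = np.linspace(-90.0, 90.0, 19)

# The first axis of the counts follows R.A. and the second follows DEC.
counts_demo, _, _ = np.histogram2d(ra_demo, dec_demo, bins=[ra_bins_demo, dec_bins_demo])

# Two rows: the counts, then a dictionary with both sets of bin edges.
df_demo = pd.DataFrame({
    "type": ["histogram_ra_dec", "bins_ra_dec"],
    "values": [
        counts_demo.tolist(),
        {"ra_bins": ra_bins_demo.tolist(), "dec_bins": dec_bins_demo.tolist()},
    ],
})

# Reading the pieces back in the same way as the cells below.
histogram_demo = np.array(df_demo["values"][0])
bins_ra_demo = np.array(df_demo["values"][1]["ra_bins"])
bins_dec_demo = np.array(df_demo["values"][1]["dec_bins"])
print(histogram_demo.shape, bins_ra_demo.shape, bins_dec_demo.shape)
```

With this layout, the counts array has one fewer entry along each axis than the corresponding array of bin edges.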

In [None]:
### Reading the Parquet file and showing the dataframe.
df_spatial_dist = pd.read_parquet('output/histo_2d_ra_dec.parquet', engine='fastparquet')
df_spatial_dist

We have to convert the lists to numpy arrays.

In [None]:
### Extracting the counts, the R.A. bin edges and the DEC bin edges from the dataframe
### and converting them to numpy arrays.
histogram_ra_dec = np.array(df_spatial_dist['values'][0])
bins_ra = np.array(df_spatial_dist['values'][1]['ra_bins'])
bins_dec = np.array(df_spatial_dist['values'][1]['dec_bins'])

Now, we show some information about the R.A. bins, DEC bins and the 2D histogram counts.

In [None]:
### Printing the information.
print("INFO - R.A. BINS")
print(f"Min. edge: {bins_ra.min():.2f} | Max. edge: {bins_ra.max():.2f} | Step: {bins_ra[1]-bins_ra[0]:.2f} | Shape: {bins_ra.shape} \n")
print("INFO - DEC BINS")
print(f"Min. edge: {bins_dec.min():.2f} | Max. edge: {bins_dec.max():.2f} | Step: {bins_dec[1]-bins_dec[0]:.2f} | Shape: {bins_dec.shape} \n") 
print("INFO - 2D HISTOGRAM COUNTS")
print(f"Min. count: {histogram_ra_dec.min()} | Max. count: {histogram_ra_dec.max()} | Shape: {histogram_ra_dec.shape}")

### Spatial distribution plot

----------------------------------------------------------------------------------------------
#### Note

If cartopy tries to download a file from Natural Earth in the cells below, check the path of the cartopy data directory with
```python
import cartopy
print(cartopy.config['data_dir'])
```
Then, download the file manually to the ```shapefiles/natural_earth/physical``` folder inside this directory and unzip it.

----------------------------------------------------------------------------------------------

Before making the plots, we must perform some tasks:

1. Change the 0 values in the 2D histogram counts array to NaN values, so that they appear white in the plot.
2. Compute the centers of the bins.
3. For the Plate Carrée projection, shift the R.A. coordinates to the range $[-180^{\circ},180^{\circ})$. This is needed to invert the x-axis in the plot, a widely used convention. The 2D histogram counts must be reordered accordingly, so that they agree with the new R.A. range.
4. For the Mollweide projection, invert the R.A. coordinates by computing $360^{\circ} - x$ for every R.A. value $x$. This is just a computational artifice needed to invert the x-axis in the plot. In the final plot, the R.A. and DEC ticks are still shown correctly in the original range, $[0^{\circ},360^{\circ})$. Again, the 2D histogram counts must be reordered accordingly.
5. Transpose the 2D histogram counts arrays so that they become compatible with HoloViews/GeoViews.
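
The steps above can be checked on a tiny toy histogram before applying them to the real data. The sketch below uses a hypothetical grid of 6 R.A. bins by 2 DEC bins (all `_toy` names are assumptions) and follows the same sequence of operations as the next cell.

```python
import numpy as np

# Toy grid: 6 R.A. bins of 60 degrees and 2 DEC bins of 90 degrees.
ra_edges_toy = np.arange(0.0, 361.0, 60.0)
dec_edges_toy = np.array([-90.0, 0.0, 90.0])
counts_toy = np.arange(12).reshape(6, 2)  # row i belongs to R.A. bin i

# Step 1: zeros become NaN.
counts_nan_toy = counts_toy.astype(float)
counts_nan_toy[counts_nan_toy == 0] = np.nan

# Step 2: bin centers.
ra_centers_toy = (ra_edges_toy[1:] + ra_edges_toy[:-1]) / 2
dec_centers_toy = (dec_edges_toy[1:] + dec_edges_toy[:-1]) / 2

# Step 3: wrapping R.A. to [-180, 180) and reordering the rows to match.
ra_180_toy = np.where(ra_centers_toy >= 180, ra_centers_toy - 360, ra_centers_toy)
order_180_toy = np.argsort(ra_180_toy)
ra_180_sorted_toy = ra_180_toy[order_180_toy]
counts_180_toy = counts_nan_toy[order_180_toy, :]

# Step 4: inverting R.A. (360 - value) and reordering the rows to match.
ra_inverted_toy = 360 - ra_centers_toy
order_inverted_toy = np.argsort(ra_inverted_toy)
ra_inverted_sorted_toy = ra_inverted_toy[order_inverted_toy]
counts_inverted_toy = counts_nan_toy[order_inverted_toy, :]

# Step 5: transposing so that rows follow DEC and columns follow R.A.
image_180_toy = counts_180_toy.T
image_inverted_toy = counts_inverted_toy.T

print(ra_180_sorted_toy)
print(image_180_toy)
print(ra_inverted_sorted_toy)
print(image_inverted_toy)
```

Sorting the coordinates and indexing the counts with the same `argsort` result keeps every row of counts attached to its own R.A. bin.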

In [None]:
### Changing the 0 values to NaN values.
histogram_ra_dec_NaN = histogram_ra_dec.astype(float)
histogram_ra_dec_NaN[histogram_ra_dec_NaN == 0] = np.nan

### Getting the bins centers.
bins_ra_centers = (bins_ra[1:] + bins_ra[:-1])/2
bins_dec_centers = (bins_dec[1:] + bins_dec[:-1])/2

### Plate Carrée projection - Changing the R.A. coordinates to the range [-180,180), and changing the 2d histogram counts accordingly.
bins_ra_centers_180_range = np.where(bins_ra_centers >= 180, bins_ra_centers - 360, bins_ra_centers)
sorted_indices_180_range = np.argsort(bins_ra_centers_180_range)
histogram_ra_dec_180_range = histogram_ra_dec_NaN[sorted_indices_180_range, :]
bins_ra_centers_180_range = bins_ra_centers_180_range[sorted_indices_180_range]

### Mollweide projection - Inverting the R.A. values (360 - values), and changing the 2d histogram counts accordingly.
bins_ra_centers_inverted = 360 - bins_ra_centers  # every R.A. bin center lies within [0, 360]
sorted_indices_inverted = np.argsort(bins_ra_centers_inverted)
histogram_ra_dec_inverted = histogram_ra_dec_NaN[sorted_indices_inverted, :]
bins_ra_centers_inverted = bins_ra_centers_inverted[sorted_indices_inverted]

### Transposing the histogram arrays for the holoviews plots.
histogram_ra_dec_180_range_transpose = histogram_ra_dec_180_range.T
histogram_ra_dec_inverted_transpose = histogram_ra_dec_inverted.T

After these tasks, we are ready to make the spatial distribution plots.

First, the Plate Carrée projection plot. In this plot, **the R.A. values are in the $[-180^{\circ},180^{\circ})$ range**, where the negative values correspond to values of $180^{\circ}$ or more in the original range, $[0^{\circ}, 360^{\circ})$.
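
When reading coordinates off this plot (for example with the hover tool), a displayed R.A. can be mapped back to the original range with a modulo operation. A minimal sketch, where the sample values and the `_demo` names are assumptions:

```python
import numpy as np

# R.A. values as displayed in the [-180, 180) range (sample values).
ra_plot_demo = np.array([-150.0, -30.0, 0.0, 30.0, 150.0])

# Mapping back to the original [0, 360) range.
ra_original_demo = np.mod(ra_plot_demo, 360.0)
print(ra_original_demo)  # expected: [210. 330.   0.  30. 150.]
```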

In [None]:
### Creating the image using holoviews.
hv_image_ra_dec = hv.Image((bins_ra_centers_180_range, bins_dec_centers, histogram_ra_dec_180_range_transpose), ['R.A.', 'DEC'], 'Counts')

### Adjusting the image options.
hv_image_ra_dec = hv_image_ra_dec.opts(
    opts.Image(cmap='viridis', cnorm='linear', colorbar=True, width=1000, height=500,
               xlim=(180, -180), ylim=(-90, 90), tools=['hover'], clim=(10, np.nanmax(histogram_ra_dec_180_range_transpose)),
               title='Spatial Distribution of Objects - All Objects - Plate Carrée Projection', show_grid=True)
)

### Showing the plot.
hv_image_ra_dec

Second, the Mollweide projection plot. In this plot, **the R.A. values are in the original $[0^{\circ},360^{\circ})$ range**. Unfortunately, the bokeh 'hover' tool does not work with this projection.

In [None]:
### Generating the R.A. and DEC ticks
longitudes = np.arange(30, 360, 30)
latitudes = np.arange(-75, 76, 15)

lon_labels = [f"{lon}°" for lon in longitudes]
lat_labels = [f"{lat}°" for lat in latitudes]

labels_data = {
    "lon": list(np.flip(longitudes)) + [-180] * len(latitudes),
    "lat": [0] * len(longitudes) + list(latitudes),
    "label": lon_labels + lat_labels,
}

df_labels = pd.DataFrame(labels_data)

labels_plot = gv.Labels(df_labels, kdims=["lon", "lat"], vdims=["label"]).opts(
    text_font_size="12pt",
    text_color="black",
    text_align='right',
    text_baseline='bottom',
    projection=ccrs.Mollweide()
)

### Creating the image using holoviews.
gv_image_ra_dec = gv.Image((bins_ra_centers_inverted, bins_dec_centers, histogram_ra_dec_inverted_transpose), ['R.A.', 'DEC'], 'Counts')

### Doing the Mollweide projection.
gv_image_ra_dec_projected = gv.operation.project(gv_image_ra_dec, projection=ccrs.Mollweide())

### Generating the grid lines.
grid = gf.grid().opts(
    opts.Feature(projection=ccrs.Mollweide(), scale='110m', color='black')
)

### Adjusting the image options.
gv_image_ra_dec_projected = gv_image_ra_dec_projected.opts(cmap='viridis', cnorm='linear', colorbar=True, width=1000, height=500, 
                                                           clim=(10, np.nanmax(histogram_ra_dec_inverted_transpose)), 
                                                           title='Spatial Distribution of Objects - All Objects - Mollweide Projection', 
                                                           projection=ccrs.Mollweide(),  global_extent=True)

### Showing the plot.
combined_plot = gv_image_ra_dec_projected * grid * labels_plot
combined_plot

## Second case - Objects with ```detect_isPrimary == True```

The ```detect_isPrimary``` flag is true when all of the following conditions hold:

1) The source is located in the interior of a patch and tract (```detect_isPatchInner & detect_isTractInner```).

2) The source is not a sky object (```~merge_peak_sky``` for coadds or ```~sky_source``` for single visits).

3) The source is either an isolated parent that is un-modeled or deblended from a parent with multiple children (```isDeblendedSource```).

Source: https://pipelines.lsst.io/modules/lsst.pipe.tasks/deblending-flags-overview.html
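
The catalog already provides ```detect_isPrimary```, so it does not need to be recomputed. Purely to illustrate how the three conditions combine for coadd sources, the sketch below applies them to a hypothetical four-row table. The column names are copied from the list above and may not match the catalog schema exactly.

```python
import pandas as pd

# Hypothetical toy table; column names are taken from the conditions listed above.
toy_sources = pd.DataFrame({
    "detect_isPatchInner": [True, True, False, True],
    "detect_isTractInner": [True, True, True, True],
    "merge_peak_sky": [False, True, False, False],
    "isDeblendedSource": [True, True, True, False],
})

# A source is primary only if all three conditions hold at once.
is_primary_demo = (
    toy_sources["detect_isPatchInner"]
    & toy_sources["detect_isTractInner"]
    & ~toy_sources["merge_peak_sky"]
    & toy_sources["isDeblendedSource"]
)
print(is_primary_demo.tolist())  # expected: [True, False, False, False]
```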

### Reading the data

Below we can see the structure of the output Parquet file. Row 0 (type "histogram_ra_dec") holds the 2D histogram counts in the "values" column, and row 1 (type "bins_ra_dec") holds the R.A. bin edges ("ra_bins") and the DEC bin edges ("dec_bins"), also in the "values" column.

In [None]:
### Reading the Parquet file and showing the dataframe.
df_spatial_dist_detect_isPrimary_true = pd.read_parquet('output/histo_2d_ra_dec_detect_isPrimary_true.parquet', engine='fastparquet')
df_spatial_dist_detect_isPrimary_true

We have to convert the lists to numpy arrays.

In [None]:
### Extracting the counts, the R.A. bin edges and the DEC bin edges from the dataframe
### and converting them to numpy arrays.
histogram_ra_dec_detect_isPrimary_true = np.array(df_spatial_dist_detect_isPrimary_true['values'][0])
bins_ra_detect_isPrimary_true = np.array(df_spatial_dist_detect_isPrimary_true['values'][1]['ra_bins'])
bins_dec_detect_isPrimary_true = np.array(df_spatial_dist_detect_isPrimary_true['values'][1]['dec_bins'])

Now, we show some information about the R.A. bins, DEC bins and the 2D histogram counts.

In [None]:
### Printing the information.
print("INFO - R.A. BINS")
print(f"Min. edge: {bins_ra_detect_isPrimary_true.min():.2f} | Max. edge: {bins_ra_detect_isPrimary_true.max():.2f} | Step: {bins_ra_detect_isPrimary_true[1]-bins_ra_detect_isPrimary_true[0]:.2f} | Shape: {bins_ra_detect_isPrimary_true.shape} \n")
print("INFO - DEC BINS")
print(f"Min. edge: {bins_dec_detect_isPrimary_true.min():.2f} | Max. edge: {bins_dec_detect_isPrimary_true.max():.2f} | Step: {bins_dec_detect_isPrimary_true[1]-bins_dec_detect_isPrimary_true[0]:.2f} | Shape: {bins_dec_detect_isPrimary_true.shape} \n")
print("INFO - 2D HISTOGRAM COUNTS")
print(f"Min. count: {histogram_ra_dec_detect_isPrimary_true.min()} | Max. count: {histogram_ra_dec_detect_isPrimary_true.max()} | Shape: {histogram_ra_dec_detect_isPrimary_true.shape}")

### Spatial distribution plot

Before making the plots, we must perform some tasks:

1. Change the 0 values in the 2D histogram counts array to NaN values, so that they appear white in the plot.
2. Compute the centers of the bins.
3. For the Plate Carrée projection, shift the R.A. coordinates to the range $[-180^{\circ},180^{\circ})$. This is needed to invert the x-axis in the plot, a widely used convention. The 2D histogram counts must be reordered accordingly, so that they agree with the new R.A. range.
4. For the Mollweide projection, invert the R.A. coordinates by computing $360^{\circ} - x$ for every R.A. value $x$. This is just a computational artifice needed to invert the x-axis in the plot. In the final plot, the R.A. and DEC ticks are still shown correctly in the original range, $[0^{\circ},360^{\circ})$. Again, the 2D histogram counts must be reordered accordingly.
5. Transpose the 2D histogram counts arrays so that they become compatible with HoloViews/GeoViews.

In [None]:
### Changing the 0 values to NaN values.
histogram_ra_dec_NaN_detect_isPrimary_true = histogram_ra_dec_detect_isPrimary_true.astype(float)
histogram_ra_dec_NaN_detect_isPrimary_true[histogram_ra_dec_NaN_detect_isPrimary_true == 0] = np.nan

### Getting the bins centers.
bins_ra_centers_detect_isPrimary_true = (bins_ra_detect_isPrimary_true[1:] + bins_ra_detect_isPrimary_true[:-1])/2
bins_dec_centers_detect_isPrimary_true = (bins_dec_detect_isPrimary_true[1:] + bins_dec_detect_isPrimary_true[:-1])/2

### Plate Carrée projection - Changing the R.A. coordinates to the range [-180,180), and changing the 2d histogram counts accordingly.
bins_ra_centers_180_range_detect_isPrimary_true = np.where(bins_ra_centers_detect_isPrimary_true >= 180, bins_ra_centers_detect_isPrimary_true - 360, bins_ra_centers_detect_isPrimary_true)
sorted_indices_180_range_detect_isPrimary_true = np.argsort(bins_ra_centers_180_range_detect_isPrimary_true)
histogram_ra_dec_180_range_detect_isPrimary_true = histogram_ra_dec_NaN_detect_isPrimary_true[sorted_indices_180_range_detect_isPrimary_true, :]
bins_ra_centers_180_range_detect_isPrimary_true = bins_ra_centers_180_range_detect_isPrimary_true[sorted_indices_180_range_detect_isPrimary_true]

### Mollweide projection - Inverting the R.A. values (360 - values), and changing the 2d histogram counts accordingly.
bins_ra_centers_inverted_detect_isPrimary_true = 360 - bins_ra_centers_detect_isPrimary_true  # every R.A. bin center lies within [0, 360]
sorted_indices_inverted_detect_isPrimary_true = np.argsort(bins_ra_centers_inverted_detect_isPrimary_true)
histogram_ra_dec_inverted_detect_isPrimary_true = histogram_ra_dec_NaN_detect_isPrimary_true[sorted_indices_inverted_detect_isPrimary_true, :]
bins_ra_centers_inverted_detect_isPrimary_true = bins_ra_centers_inverted_detect_isPrimary_true[sorted_indices_inverted_detect_isPrimary_true]

### Transposing the histogram arrays for the holoviews plots.
histogram_ra_dec_180_range_transpose_detect_isPrimary_true = histogram_ra_dec_180_range_detect_isPrimary_true.T
histogram_ra_dec_inverted_transpose_detect_isPrimary_true = histogram_ra_dec_inverted_detect_isPrimary_true.T

After these tasks, we are ready to make the spatial distribution plots.

First, the Plate Carrée projection plot. In this plot, **the R.A. values are in the $[-180^{\circ},180^{\circ})$ range**, where the negative values correspond to values of $180^{\circ}$ or more in the original range, $[0^{\circ}, 360^{\circ})$.

In [None]:
### Creating the image using holoviews.
hv_image_ra_dec_detect_isPrimary_true = hv.Image((bins_ra_centers_180_range_detect_isPrimary_true, bins_dec_centers_detect_isPrimary_true, histogram_ra_dec_180_range_transpose_detect_isPrimary_true), ['R.A.', 'DEC'], 'Counts')

### Adjusting the image options.
hv_image_ra_dec_detect_isPrimary_true = hv_image_ra_dec_detect_isPrimary_true.opts(
    opts.Image(cmap='viridis', cnorm='linear', colorbar=True, width=1000, height=500,
               xlim=(180, -180), ylim=(-90, 90), tools=['hover'], clim=(10, np.nanmax(histogram_ra_dec_180_range_transpose_detect_isPrimary_true)),
               title='Spatial Distribution of Objects - detect_isPrimary==True - Plate Carrée Projection', show_grid=True)
)

### Showing the plot.
hv_image_ra_dec_detect_isPrimary_true

Second, the Mollweide projection plot. In this plot, **the R.A. values are in the original $[0^{\circ},360^{\circ})$ range**. Unfortunately, the bokeh 'hover' tool does not work with this projection.

In [None]:
### Generating the R.A. and DEC ticks
longitudes = np.arange(30, 360, 30)
latitudes = np.arange(-75, 76, 15)

lon_labels = [f"{lon}°" for lon in longitudes]
lat_labels = [f"{lat}°" for lat in latitudes]

labels_data = {
    "lon": list(np.flip(longitudes)) + [-180] * len(latitudes),
    "lat": [0] * len(longitudes) + list(latitudes),
    "label": lon_labels + lat_labels,
}

df_labels = pd.DataFrame(labels_data)

labels_plot = gv.Labels(df_labels, kdims=["lon", "lat"], vdims=["label"]).opts(
    text_font_size="12pt",
    text_color="black",
    text_align='right',
    text_baseline='bottom',
    projection=ccrs.Mollweide()
)

### Creating the image using holoviews.
gv_image_ra_dec_detect_isPrimary_true = gv.Image((bins_ra_centers_inverted_detect_isPrimary_true, bins_dec_centers_detect_isPrimary_true, histogram_ra_dec_inverted_transpose_detect_isPrimary_true), ['R.A.', 'DEC'], 'Counts')

### Doing the Mollweide projection.
gv_image_ra_dec_projected_detect_isPrimary_true = gv.operation.project(gv_image_ra_dec_detect_isPrimary_true, projection=ccrs.Mollweide())

### Generating the grid lines.
grid = gf.grid().opts(
    opts.Feature(projection=ccrs.Mollweide(), scale='110m', color='black')
)

### Adjusting the image options.
gv_image_ra_dec_projected_detect_isPrimary_true = gv_image_ra_dec_projected_detect_isPrimary_true.opts(cmap='viridis', cnorm='linear', colorbar=True, width=1000, height=500, 
                                                           clim=(10, np.nanmax(histogram_ra_dec_inverted_transpose_detect_isPrimary_true)), 
                                                           title='Spatial Distribution of Objects - detect_isPrimary==True - Mollweide Projection', 
                                                           projection=ccrs.Mollweide(),  global_extent=True)

### Showing the plot.
combined_plot = gv_image_ra_dec_projected_detect_isPrimary_true * grid * labels_plot
combined_plot