
<img align="left" src = images/linea.png width=150 style="padding: 20px"> <br> 
 
EVENT LOGO | UFRJ LOGO | LINEA LOGO 

# Title (event name?) 
# Subtitle (workshop name?) 


Facilitators (?):  
- Bruno Moraes (email)
- Julia Gschwend (julia@linea.org.br)

Last verified: 11/02/2025

***


## Introduction 

### About the workshop

### About LIneA 

## Data   

#### About the data

#### Database access

Within the LIneA JupyterHub platform, the database is accessed through the [dblinea](https://github.com/linea-it/dblinea) library. See the library's full documentation [at this link](https://dblinea.readthedocs.io/en/latest/index.html).


Installing the `dblinea` library: 

In [None]:
! pip install dblinea

The `DBBase` class connects to the database and provides some functionality, as we will see next. In the examples below, we use the `db` object to access the data and metadata of the "**public_pz_training_set**" table from the second data release (**DR2**) of the **DES** survey. 

In [None]:
from dblinea import DBBase

In [None]:
db = DBBase()
schema = 'des_dr2'  
tablename = 'public_pz_training_set' 

In [None]:
db.get_table_columns(tablename, schema=schema)

In [None]:
db.describe_table(tablename, schema=schema)

The table below gives the meaning of the columns we will use to query the database. 

|Column | Meaning |
|---|---|
|COADD_OBJECT_ID | Unique identifier for the coadded objects|
|RA | Right ascension, with quantized precision for indexing (ALPHAWIN_J2000 has full precision but not indexed) [degrees]|
|DEC | Declination, with quantized precision for indexing (DELTAWIN_J2000 has full precision but not indexed) [degrees] |
|EXTENDED_CLASS_COADD |0: high confidence stars; 1: candidate stars; 2: mostly galaxies; 3: high confidence galaxies; -9: No data; Using Sextractor photometry |
|FLAGS_{G,R,I,Z,Y}| Additive flag describing cautionary advice about source extraction process. Use less than 4 for well behaved objects |
|MAG_AUTO_{G,R,I,Z,Y} | Magnitude estimation, for an elliptical model based on the Kron radius [mag] |
|MAG_AUTO_{G,R,I,Z,Y}_DERED | Dereddened magnitude estimation (using SFD98), for an elliptical model based on the Kron radius [mag]|
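
As a hedged illustration of how the quality columns described above might be used, the sketch below selects well-behaved galaxies from a small made-up DataFrame. The lowercase column names and all values are assumptions for illustration only; they are not results of a query.

```python
import pandas as pd

# Toy catalog with made-up values (not real DES data).
toy = pd.DataFrame({
    "coadd_object_id": [1, 2, 3, 4],
    "extended_class_coadd": [0, 3, 2, -9],
    "flags_i": [0, 0, 8, 0],
    "mag_auto_i_dered": [18.2, 21.5, 22.1, 20.3],
})

# Keep well-behaved detections (flags < 4) classified as
# "mostly galaxies" (2) or "high confidence galaxies" (3).
is_clean = toy["flags_i"] < 4
is_galaxy = toy["extended_class_coadd"] >= 2
galaxies = toy[is_clean & is_galaxy]

print(galaxies["coadd_object_id"].tolist())
```

The same two conditions could equally be written into the `WHERE` clause of the SQL query, so that the selection happens in the database rather than in memory.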



### Reading the data

In [None]:
import pandas as pd

In [None]:
query = f"SELECT * FROM {schema}.{tablename} WHERE survey = 'VVDS' AND magerr_auto_i <= 0.1 AND z <= 2.0 "   
query

In [None]:
df = db.fetchall_df(query)
df

In [None]:
df.info()

In [None]:
# Shuffle the DataFrame with fixed random_state for reproducibility
df_shuffled = df.sample(frac=1, random_state=42) 

# Determine the split point (for 70/30 split)
split_point = int(len(df)*0.7)

# Split into two subsets
train = df_shuffled.iloc[:split_point]
test  = df_shuffled.iloc[split_point:]

# Convert subsets to sets of indices
train_indices = set(train.index)
test_indices = set(test.index)

# Find the intersection of the index sets
intersection_indices = train_indices.intersection(test_indices)

# Assert that the intersection is empty
assert not intersection_indices, "There are common elements in both subsets!"

print("Assertion passed! No common elements found in subsets.")
print("\nIntersection of Indices:", intersection_indices) # Should be an empty set

print(len(train))
print(len(test))
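
The shuffle-and-slice split above can also be packaged as a small function. The sketch below is one possible variant, not the workshop's method: `split_train_test` is a hypothetical helper name, and it assumes the DataFrame index has no duplicate labels (the same assumption behind the index-based disjointness check above).

```python
import pandas as pd


def split_train_test(frame, train_frac=0.7, seed=42):
    """Hypothetical helper: randomly split rows into two disjoint subsets.

    Assumes the index of `frame` has no duplicate labels.
    """
    # Draw the training rows at random, with a fixed seed for reproducibility.
    train = frame.sample(frac=train_frac, random_state=seed)
    # Everything that was not drawn goes to the test set.
    test = frame.drop(train.index)
    return train, test


# Toy data with made-up redshifts, only to exercise the helper.
toy = pd.DataFrame({"z": [0.1 * i for i in range(10)]})
toy_train, toy_test = split_train_test(toy)

print(len(toy_train), len(toy_test))
```

Because the test set is built by dropping the training rows, the two subsets are disjoint by construction, so no separate intersection check is needed.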

In [None]:
train.info()

### Sample characterization 

In the examples below, we also use the [Astropy](https://docs.astropy.org/en/stable/) library. The other libraries used are already available in the default JupyterHub installation. 

Installing the `astropy` library: 

In [None]:
! pip install astropy

In [None]:
import numpy as np
from astropy import units as u
from astropy.coordinates import SkyCoord

In [None]:
# Visualization libraries

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

--- 

#### Spatial distribution 


In [None]:
! wget https://raw.githubusercontent.com/kadrlica/skymap/master/skymap/data/des-round19-poly.txt  

In [None]:
foot_ra, foot_dec = np.loadtxt('des-round19-poly.txt', unpack=True)


In [None]:
coords = SkyCoord(ra=-np.array(df.ra)*u.degree, 
                  dec=np.array(df.dec)*u.degree, 
                  frame='icrs')

In [None]:
%%time
fig = plt.figure(figsize=[14,6])
ax = fig.add_subplot(111, projection='mollweide')   
ra_rad = coords.ra.wrap_at(180 * u.deg).radian
dec_rad = coords.dec.radian
plt.plot(ra_rad, dec_rad, '.', alpha=0.3)
plt.plot(-np.radians(foot_ra), np.radians(foot_dec), '-', color='darkorange')
org=0.0
tick_labels = np.array([150, 120, 90, 60, 30, 0, 330, 300, 270, 240, 210])
tick_labels = np.remainder(tick_labels+360+org,360)
ax.set_xticklabels(tick_labels)     # we add the scale on the x axis
ax.set_xlabel('R.A.')
ax.xaxis.label.set_fontsize(14)
ax.set_ylabel('Dec.')
ax.yaxis.label.set_fontsize(14)
ax.grid(True)
plt.tight_layout()
#plt.savefig('specz_spatial_dist.png')
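
The sign flip and wrapping used for the sky map above can be reproduced with NumPy alone. The sketch below is a minimal illustration: `ra_to_mollweide_x` is a hypothetical helper name, and it assumes the input right ascension is given in degrees. It maps RA to the x coordinate of a Mollweide axis, in radians, with RA increasing to the left as in the tick labels above.

```python
import numpy as np


def ra_to_mollweide_x(ra_deg):
    """Hypothetical helper: RA in degrees -> Mollweide x in radians.

    RA increases to the left, following the usual sky-map convention.
    """
    ra = np.asarray(ra_deg, dtype=float)
    # Wrap RA from [0, 360) into [-180, 180).
    wrapped = np.remainder(ra + 180.0, 360.0) - 180.0
    # Flip the sign so that larger RA is plotted further to the left.
    return -np.radians(wrapped)


x = ra_to_mollweide_x([0.0, 90.0, 270.0])
print(np.degrees(x))
```

Applying the same mapping to both the catalog positions and the footprint polygon keeps the two layers consistent when the polygon file stores RA outside the [-180, 180) range.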

#### Redshift distribution

In [None]:
fig = plt.figure()
sns.histplot(df['z'], stat='density', bins=30)
sns.kdeplot(df['z'], fill=False, color='red')
plt.tight_layout()

In [None]:
fig = plt.figure(figsize=[10,4])
plt.subplot(1,2,1)
sns.histplot(train['z'], stat='density', bins=30, label='training set')
sns.histplot(test['z'], stat='density', bins=30, label='test set')
plt.legend()
plt.subplot(1,2,2)
sns.histplot(train['z'], stat='count', bins=30, label='training set')
sns.histplot(test['z'], stat='count', bins=30, label='test set')
plt.tight_layout()

#### Magnitude distribution

In [None]:
fig = plt.figure(figsize=[10,4])
plt.subplot(1,2,1)
sns.histplot(train['mag_auto_i_dered'], stat='density', bins=30, label='training set')
sns.histplot(test['mag_auto_i_dered'], stat='density', bins=30, label='test set')
plt.legend()
plt.subplot(1,2,2)
sns.histplot(train['mag_auto_i_dered'], stat='count', bins=30, label='training set')
sns.histplot(test['mag_auto_i_dered'], stat='count', bins=30, label='test set')
plt.tight_layout()

#### Color-color, color-magnitude, and z-magnitude diagrams

In [None]:
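fig = plt.figure()
# Color-magnitude diagram. Assumed intent: (r - i) color versus i-band magnitude.
rmi_train = train['mag_auto_r_dered'] - train['mag_auto_i_dered']
rmi_test = test['mag_auto_r_dered'] - test['mag_auto_i_dered']
sns.scatterplot(x=train['mag_auto_i_dered'], y=rmi_train, label='training set')
sns.scatterplot(x=test['mag_auto_i_dered'], y=rmi_test, label='test set')
plt.ylabel('r - i')
plt.tight_layout()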

In [None]:
fig = plt.figure()
sns.scatterplot(data=train, x='z', y='mag_auto_i_dered')
sns.scatterplot(data=test, x='z', y='mag_auto_i_dered')
plt.tight_layout()
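
The color-color diagram named in this subsection's heading needs colors, that is, differences between dereddened magnitudes in adjacent bands. The sketch below shows one way to compute them on a toy DataFrame. The values are made up, and the lowercase column names are assumed to follow the `mag_auto_{band}_dered` pattern used elsewhere in this notebook.

```python
import pandas as pd

# Toy magnitudes with made-up values (not real DES data).
toy = pd.DataFrame({
    "mag_auto_g_dered": [22.0, 23.5],
    "mag_auto_r_dered": [21.0, 22.0],
    "mag_auto_i_dered": [20.5, 21.0],
})

# A color is the difference between magnitudes in two bands,
# bluer band minus redder band.
colors = pd.DataFrame({
    "g-r": toy["mag_auto_g_dered"] - toy["mag_auto_r_dered"],
    "r-i": toy["mag_auto_r_dered"] - toy["mag_auto_i_dered"],
})

print(colors)
```

With the real sample, the `g-r` and `r-i` columns could be passed to `sns.scatterplot` as the two axes of a color-color diagram.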

## Photo-z