## Requirements check

In [None]:
# Import the required Python packages
import numpy
import astropy
import matplotlib
import PIL
import yaml

In [None]:
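# Check that SExtractor is installed (the executable is expected to be named "sex")
from shutil import which
assert which("sex") is not None, "SExtractor executable 'sex' not found in PATH"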

If the cell above raises an `AssertionError`, SExtractor is not installed or is not on the `PATH`; refer to Requirements.md.
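
Some distributions install the SExtractor binary under a different name (for example `source-extractor` on Debian-based systems), so a check for `sex` alone can fail even when the tool is present. A minimal sketch of a more tolerant lookup follows. The helper name `find_sextractor` and the list of candidate names are assumptions, not part of this pipeline.

```python
from shutil import which

def find_sextractor(candidates=("sex", "source-extractor", "sextractor")):
    """Return the first candidate executable name found on PATH, or None."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None

# Hypothetical usage: report which name, if any, is available
sextractor_bin = find_sextractor()
print(f"SExtractor executable: {sextractor_bin}")
```

The rest of this notebook calls `sex` directly, so the command in the execution section would need to use the detected name if it differs.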

## Read configuration file

In [None]:
# Read the pipeline configuration from a YAML file
with open('../../../config/config.yml', 'r') as f:
    config = yaml.safe_load(f)

# Access the values
data_cube = config['paths']['data_cube']

# Print the values
print(f'Data Cube: {data_cube}')

## Data cube information

This pipeline is meant to be used on 3D data cubes whose third dimension is the observing frequency. The two variables in the next code cell give the frequency of the first layer and the frequency step between consecutive layers, both in kHz.

If they're not known, they can be found by reading the FITS file's header:

```python
from astropy.io import fits
hdulist = fits.open(data_cube)
hdulist.info()
hdulist[0].header
```
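
In many data cubes the spectral axis is described by the standard FITS WCS keywords `CRVAL3`, `CRPIX3` and `CDELT3`. Assuming this cube follows that convention, with frequency on axis 3 and values stored in Hz (both are assumptions; check `CTYPE3` and `CUNIT3` in the header), the two values could be derived roughly as follows. The function name `frequency_axis_khz` is hypothetical.

```python
def frequency_axis_khz(header):
    """Derive (initial_frequency, frequency_step) in kHz from a FITS-style header.

    Assumes a linear spectral axis on axis 3, described by CRVAL3/CRPIX3/CDELT3,
    with values in Hz. `header` can be any mapping, e.g. an astropy Header.
    """
    crval = header["CRVAL3"]          # frequency at the reference pixel
    cdelt = header["CDELT3"]          # frequency increment per layer
    crpix = header.get("CRPIX3", 1)   # reference pixel (1-based)
    # Frequency of the first layer (pixel 1 along axis 3)
    first_hz = crval + (1 - crpix) * cdelt
    return first_hz / 1e3, cdelt / 1e3

# A plain dict stands in for the header here; the values are illustrative only
example_header = {"CRVAL3": 1.06e8, "CRPIX3": 1, "CDELT3": 1.0e5}
print(frequency_axis_khz(example_header))
```

With a real cube, `frequency_axis_khz(hdulist[0].header)` could be called instead of using the dict.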

In [None]:
initial_frequency = 106000  # frequency of the first layer, in kHz
frequency_step = 100        # frequency increase between consecutive layers, in kHz

## Execute SExtractor

The code cells in this section run SExtractor on each layer of the data cube.

Source detection parameters are defined in `default.sex`, available in this folder. They have been chosen for best performance on the SDC3 sample data cube. To adapt the detection to a different data cube, edit the parameters in this file.

The output columns for the point source catalog are defined in `default.param`. Additional parameters can be added, but note that including photometry parameters increases execution time by a factor of 10. All available columns are listed in the [SExtractor documentation](https://sextractor.readthedocs.io/en/latest/Param.html).
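
SExtractor also accepts configuration overrides on the command line as `-KEYWORD value` pairs, which can be convenient for trying a different setting without editing `default.sex`. The sketch below builds such a command as an argument list. The helper name `build_sextractor_command` is hypothetical, and the `DETECT_THRESH` value is purely illustrative, not a recommended setting.

```python
def build_sextractor_command(image, catalog_name, executable="sex", **overrides):
    """Build an argument list suitable for subprocess.run (no shell needed)."""
    command = [executable, image, "-CATALOG_NAME", catalog_name]
    # Each override becomes a "-KEYWORD value" pair appended to the command
    for keyword, value in overrides.items():
        command += [f"-{keyword}", str(value)]
    return command

# Illustrative call: override the detection threshold for a single run
command = build_sextractor_command(
    "temp.fits", "results/results106000kHz.cat", DETECT_THRESH=3.0
)
print(command)
```

The resulting list could be passed to `subprocess.run(command, check=True)`.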

In [None]:
from astropy.io import fits
import os
import subprocess

hdulist = fits.open(data_cube)

header = hdulist[0].header
data = hdulist[0].data

layers = len(data)
# If the cube contains a single layer this process is unnecessary
assert layers > 1, "The data cube must contain more than one layer"

# Create the output directory if it does not exist yet
os.makedirs("results", exist_ok=True)

In [None]:
# Run sextractor on each layer (may take several minutes)

for layer in range(layers):
    frequency = initial_frequency + frequency_step*layer
    layer_data = data[layer]
    
    # Create FITS file with only one layer
    hdu = fits.PrimaryHDU(layer_data, header=header)
    new_hdulist = fits.HDUList([hdu])
    new_hdulist.writeto("temp.fits", overwrite=True)
    new_hdulist.close()

    # Run sextractor on this file (an argument list avoids going through a shell)
    command = ["sex", "temp.fits", "-CATALOG_NAME", f"results/results{frequency}kHz.cat"]
    subprocess.run(command, check=True)

os.remove("temp.fits")

## Source count

As a sanity check on the results, we plot the number of point sources found at each frequency.

In [None]:
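# Build the frequency axis first so that each catalog is read in frequency order
# (os.scandir returns entries in arbitrary order, which would misalign the plot)
frequencies = range(initial_frequency, initial_frequency + frequency_step*layers, frequency_step)

num_sources = []

for frequency in frequencies:
    with open(f"results/results{frequency}kHz.cat") as f:
        # Count data rows: skip header comments and blank lines
        num_sources.append(sum(1 for line in f if line.strip() and not line.strip().startswith('#')))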

In [None]:
# Mean number of sources per layer
sum(num_sources)/len(num_sources)

In [None]:
import matplotlib.pyplot as plt

plt.plot(frequencies, num_sources)
plt.xlabel('Frequency (kHz)')
plt.ylabel('Number of sources')
plt.show()
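
To make this sanity check less dependent on reading the plot by eye, layers whose source count deviates strongly from the rest could be flagged automatically. The sketch below assumes a median absolute deviation (MAD) criterion. The function name `flag_outlier_layers`, the threshold of 5 and the sample counts are hypothetical choices for illustration, not results from this pipeline.

```python
from statistics import median

def flag_outlier_layers(frequencies, counts, threshold=5.0):
    """Return the frequencies whose count lies more than `threshold` MADs from the median."""
    centre = median(counts)
    mad = median(abs(c - centre) for c in counts)
    if mad == 0:
        # At least half of the counts equal the median: flag anything that differs
        return [f for f, c in zip(frequencies, counts) if c != centre]
    return [f for f, c in zip(frequencies, counts) if abs(c - centre) / mad > threshold]

# Illustrative values only, not real results
example_frequencies = [106000, 106100, 106200, 106300, 106400]
example_counts = [120, 118, 123, 15, 121]
print(flag_outlier_layers(example_frequencies, example_counts))
```

With the variables from the cells above, the call would be `flag_outlier_layers(frequencies, num_sources)`.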