# MuseX Tutorial, with a Photutils catalog

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import os
import tempfile
from astropy.io import fits
from astropy.table import Table
from mpdaf.sdetect import Source

In [3]:
import musex
musex.__version__

'0.3.dev38+g7ee7411.d20190429'

In [4]:
# TEMP, to test the notebook
#settings_file = 'settings.yaml'
#DATADIR = '../tests/data'

## Create settings file

The settings are specified in a YAML file. MuseX comes with a default settings file (`musex/musex/udf/settings.yaml`) that gives a full example for the UDF, with HST priors.

For this tutorial, we use the `tests/data/settings.yaml` settings file, which is also used for the unit tests. For that reason, the file contains placeholders for the paths and the database location, which we fill in below so that everything lives in a temporary directory. With your own settings file, this step is not needed.

The data inside the `tests/` directory is extracted from the HDFS v1.24 dataset, and the catalog was created with [Photutils](http://photutils.readthedocs.io/).

In [5]:
DATADIR = os.path.abspath(os.path.join(os.path.dirname(musex.__file__), '..', 'tests', 'data'))
DATADIR

'/home/simon/dev/musex/tests/data'

In [6]:
tmpdir = tempfile.TemporaryDirectory(prefix='musex.')
tmpdir

<TemporaryDirectory '/tmp/simon/musex.u90wkmiq'>

Let's create a settings file in the temp directory, with correct paths.

In [7]:
settings_file = os.path.join(tmpdir.name, 'settings.yaml')
with open(os.path.join(DATADIR, 'settings.yaml'), 'r') as f:
    out = f.read()

# we replace some variables in the file: path, datadir, and db location
out = out.format(tmpdir=tmpdir.name, datadir=DATADIR, db=os.path.join(tmpdir.name, 'test.db'))

with open(settings_file, 'w') as f:
    f.write(out)
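
As a standalone illustration of the placeholder substitution used above, here is a minimal sketch based on `str.format`. The `TEMPLATE` string and the `render_settings` helper are hypothetical and only mimic the `{tmpdir}`, `{datadir}` and `{db}` placeholders; the real `tests/data/settings.yaml` contains more keys.

```python
import os
import tempfile

# Hypothetical template, mimicking the placeholders replaced above.
TEMPLATE = """\
workdir: '{tmpdir}'
db: '{db}'
datasets:
  test:
    images:
      FAKE: '{datadir}/image.fits'
"""


def render_settings(template, tmpdir, datadir):
    """Fill the {tmpdir}, {datadir} and {db} placeholders of a template."""
    return template.format(
        tmpdir=tmpdir,
        datadir=datadir,
        db=os.path.join(tmpdir, 'test.db'),
    )


with tempfile.TemporaryDirectory(prefix='musex.') as tmp:
    rendered = render_settings(TEMPLATE, tmp, '/path/to/tests/data')
    settings_path = os.path.join(tmp, 'settings.yaml')
    # Write the rendered settings, then read them back to check the file.
    with open(settings_path, 'w') as f:
        f.write(rendered)
    with open(settings_path) as f:
        written = f.read()

print(rendered)
```

Note that `str.format` treats every `{...}` as a placeholder, so a template that needs literal braces must double them (`{{` and `}}`).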

In [8]:
import subprocess
ret = subprocess.run(['pygmentize', settings_file], capture_output=True)
print(ret.stdout.decode('utf8'))

workdir: '/tmp/simon/musex.u90wkmiq'
db: '/tmp/simon/musex.u90wkmiq/test.db'
show_banner: true
author: 'John Doe'

datasets:

  test:
    description: 'small test dataset with images'
    version: '1.0'
    prefix: TEST
    images:
      FAKE: '/home/simon/dev/musex/tests/data/image.fits'

  photutils_masks:
    de

## Create the MuseX object

This is the main object, used to manage the whole extraction process below. Settings can also be overridden with additional arguments, as done here for `author`.

In [9]:
mx = musex.MuseX(settings_file=settings_file, author='SCO')

INFO Input catalogs loaded
INFO User catalogs loaded

  __  __               __  __
  |  \/  |_   _ ___  ___\ \/ /
  | |\/| | | | / __|/ _ \\  /
  | |  | | |_| \__ \  __//  \
  |_|  |_|\__,_|___/\___/_/\_\


The MUse Source EXtractor :) - v0.3.dev38+g7ee7411.d20190429

database       : <Database(sqlite:////tmp/simon/musex.u90wkmiq/test.db)>
settings file  : /tmp/simon/musex.u90wkmiq/settings.yaml
muse_dataset   : hdfs
datasets       :
    - test            : small test dataset with images
    - photutils_masks : provide masks for the photutils catalog
    - origin          : provide masks and sources for the origin catalog
input_catalogs :
    - photutils       : 0 rows
    - origin          : 0 rows
catalogs       :


### Datasets

A MuseX object can contain several *datasets*, where a `DataSet` object gathers all the data from a dataset:

- For MUSE datasets: image, cube, exposure map.
- For other datasets (e.g. HST): images, ...

A MuseX object is tied to a given **MUSE dataset** (`mx.muse_dataset`). The other datasets are typically added to the source objects during the extraction.

In [10]:
# The MUSE dataset
mx.muse_dataset

<MuseDataSet(prefix=MUSE, version=1.24, 1 datacube, 1 expima, 1 white)>

In [11]:
mx.muse_dataset.cube

<Cube(shape=(200, 90, 90), unit='1e-20 erg / (Angstrom cm2 s)', dtype='None')>

In [12]:
# The other (additional) datasets
mx.datasets

{'test': <DataSet(prefix=TEST, version=1.0, 1 images)>,
 'photutils_masks': <DataSet(prefix=PHU, detector=photutils, linked_cat=photutils, 1 detector, 2 masks)>,
 'origin': <DataSet(prefix=ORIG, detector=origin, linked_cat=origin, 1 detector, 1 tables, 2 sources, 2 masks)>}

In [13]:
mx.datasets['test'].images

{'FAKE': <Image(shape=(90, 90), unit='1e-60 erg / (Angstrom cm2 s)', dtype='None')>}

## Input Catalogs, from source detection

We have access to a list of input catalogs, defined in the settings file.

At this point, the catalogs are given as FITS files defined in the settings, and we need to load them into the database.

In [14]:
mx.input_catalogs

{'photutils': <InputCatalog('photutils', 0 rows)>,
 'origin': <InputCatalog('origin', 0 rows)>}

### Photutils

The first catalog used here was created with the Photutils detection code.

In [15]:
photcat = mx.input_catalogs['photutils']

To work with this catalog we need to ingest it into the database:

In [16]:
photcat.ingest_input_catalog()

INFO ingesting catalog /home/simon/dev/musex/tests/data/catalog.fits
INFO 13 rows inserted

In [17]:
photcat.ingest_input_catalog()

INFO ingesting catalog /home/simon/dev/musex/tests/data/catalog.fits

In [18]:
photcat

<InputCatalog('photutils', 13 rows)>

MuseX stores some information about the operations on individual sources, which is useful to know when a source was inserted or updated:

In [19]:
list(photcat.get_log(1))

[OrderedDict([('_id', 1),
              ('catalog', 'photutils'),
              ('id', 1),
              ('date', '2019-05-23T09:47:59.002574'),
              ('msg', 'inserted from input catalog'),
              ('data',
               '{"id": 1, "ra": 338.22216683331436, "dec": -60.56791899034823, "xcentroid": 84.17464353709585, "ycentroid": 37.8735833857062, "source_sum": 17.611382994800806, "source_sum_err": 3.8582196214768953, "area": 79.0, "eccentricity": 0.43393342946114793, "orientation": 0.8580410669352785, "ellipticity": 0.09905506339394576, "elongation": 1.1099457462595834, "version": "1.0"}'),
              ('author', 'SCO')])]
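
In the output above, the `data` field appears to be a JSON-encoded copy of the inserted row. Assuming that is the case, it can be decoded with the standard `json` module. The sketch below uses a shortened, hand-copied version of the log entry rather than a real call to `get_log`.

```python
import json
from collections import OrderedDict

# Shortened copy of the log entry shown above (only a few keys of `data`).
entry = OrderedDict([
    ('_id', 1),
    ('catalog', 'photutils'),
    ('id', 1),
    ('msg', 'inserted from input catalog'),
    ('data', '{"id": 1, "ra": 338.22216683331436, '
             '"dec": -60.56791899034823, "area": 79.0, "version": "1.0"}'),
    ('author', 'SCO'),
])

# Decode the JSON string into a regular dict
row = json.loads(entry['data'])
print(row['ra'], row['dec'])
```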

### See available columns, table info

Now that the catalog has rows, we can display some information about it: number of rows, columns, and some metadata.

In [20]:
mx.input_catalogs

{'photutils': <InputCatalog('photutils', 13 rows)>,
 'origin': <InputCatalog('origin', 0 rows)>}

In [21]:
photcat.info()

InputCatalog 'photutils' - 13 rows.

Metadata:
- creation_date : 2019-05-23T09:47:59.027984
- type          : input
- parent_cat    : None
- raname        : ra
- decname       : dec
- zname         : None
- zconfname     : None
- idname        : id
- maxid         : 13
- query         : None
- primary_id    : id
- status        : inserted

Columns:
- id             : INTEGER 
- ra             : FLOAT 
- dec            : FLOAT 
- xcentroid      : FLOAT 
- ycentroid      : FLOAT 
- source_sum     : FLOAT 
- source_sum_err : FLOAT 
- area           : FLOAT 
- eccentricity   : FLOAT 
- orientation    : FLOAT 
- ellipticity    : FLOAT 
- elongation     : FLOAT 
- version        : TEXT 


The catalog metadata is also available with the `.meta` attribute:

In [22]:
photcat.meta

OrderedDict([('id', 1),
             ('name', 'photutils'),
             ('creation_date', '2019-05-23T09:47:59.027984'),
             ('type', 'input'),
             ('parent_cat', None),
             ('raname', 'ra'),
             ('decname', 'dec'),
             ('zname', None),
             ('zconfname', None),
             ('idname', 'id'),
             ('maxid', 13),
             ('query', None),
             ('primary_id', 'id'),
             ('status', 'inserted')])

### Origin

It is also possible to ingest a catalog from the ORIGIN detection software. This catalog is associated with a line catalog, which contains the individual line detections, and with sources and masks, all defined in the *origin* dataset.

In [23]:
origcat = mx.input_catalogs['origin']
origcat.ingest_input_catalog()

INFO ingesting catalog /home/simon/dev/musex/tests/data/cat_origin.fits
INFO 4 rows inserted

In [24]:
mx.input_catalogs

{'photutils': <InputCatalog('photutils', 13 rows)>,
 'origin': <InputCatalog('origin', 4 rows)>}

## Intro to `Catalog` objects

In MuseX, a `Catalog` object wraps an SQL table, using the [dataset](https://dataset.readthedocs.io/en/latest/) and [SQLAlchemy](http://docs.sqlalchemy.org/en/latest/index.html) packages. SQLAlchemy provides a Pythonic interface to SQL, and MuseX adds some higher-level operations on top.

For instance, to select the sources of the catalog with `ID < 5`:

In [25]:
res = photcat.select(whereclause=photcat.c.id < 5, 
                     columns=[photcat.idname, photcat.raname, photcat.decname])
res

<ResultSet(photutils.id < 5)>, 4 results

The result of a selection can be exported as an Astropy `Table` (or, by default, as its `mpdaf.sdetect.Catalog` wrapper).
This shows the complete catalog:

In [26]:
photcat.select().as_table()

id,ra,dec,xcentroid,ycentroid,source_sum,source_sum_err,area,eccentricity,orientation,ellipticity,elongation,version
int64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,str3
1,338.2221668333144,-60.56791899034823,84.17464353709585,37.8735833857062,17.611382994800806,3.8582196214768953,79.0,0.4339334294611479,0.8580410669352785,0.0990550633939457,1.1099457462595834,1.0
2,338.223717115442,-60.56603060344533,70.39134668054363,71.85060521460757,5.36989851295948,2.3579307859187533,28.0,0.5278369963327637,-0.3707806312431074,0.1506543075387936,1.1773769018622235,1.0
3,338.23122540134506,-60.5659476224396,3.898353412108424,73.12238343040192,7.717203065752983,2.5785737182307544,37.0,0.3748604507716344,-1.504432500343894,0.0729187508921955,1.078654110373142,1.0
4,338.23075897016,-60.56529582206629,8.008507386689562,84.87964684116072,47.56456159427762,3.957137899405799,86.0,0.3618120741766021,1.253981251781708,0.0677489485229716,1.0726724291870011,1.0
5,338.22344426453924,-60.5652939507232,72.78604306409346,85.1304977128141,9.267546605318785,2.891929673071918,43.0,0.3414958290540388,-0.7723162773518457,0.060116710043904,1.063961888339043,1.0
6,338.23006974404103,-60.56947389750513,14.242037102996946,9.62603692085029,65.34346552938223,4.843022140700766,129.0,0.3788811153062252,-1.56822411586026,0.0745546474997507,1.080560831926303,1.0
7,338.2302119381167,-60.5687317201611,12.959857008648024,22.99321070636352,129.69185087271035,7.112266323040305,284.0,0.4017365013359274,0.4063112561507294,0.0842446923471487,1.0919947628401676,1.0
8,338.2289866204975,-60.56824280312122,23.794820877858843,31.83857580458253,3.1225252524018288,2.018170629318752,25.0,0.693823639780682,1.2360735732899375,0.2798550445351391,1.38860932429151,1.0
9,338.2256990939331,-60.56919605095256,52.93434534163357,14.762051380534484,27.56038597226143,4.0427991796868215,88.0,0.5158184032950975,-0.3097023334651164,0.1433020515828835,1.1672725513673243,1.0
10,338.22648935292443,-60.5690199068245,45.93150875569169,17.912207339981585,4.270262666046619,2.0329902244214924,23.0,0.5796710994807805,0.9583476298180438,0.1851494514779149,1.2272189075821636,1.0


It is possible to do any SQL selection and choose which columns to get, using the SQLAlchemy syntax:

In [27]:
photcat.select(whereclause=photcat.c.source_sum > 50,
               columns=[photcat.idname, photcat.raname, photcat.decname, 'source_sum', 'area']).as_table()

id,ra,dec,source_sum,area
int64,float64,float64,float64,float64
6,338.23006974404103,-60.56947389750513,65.34346552938223,129.0
7,338.2302119381167,-60.5687317201611,129.69185087271035,284.0
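
For readers less familiar with SQLAlchemy expressions, the selection above corresponds to a simple SQL `WHERE` clause. The sketch below reproduces it with the standard `sqlite3` module on an in-memory table. The table layout and the rounded values are copied by hand from the catalog shown earlier, and this is not how MuseX itself issues the query.

```python
import sqlite3

# A few rows copied from the catalog above (values rounded).
rows = [
    (1, 338.22217, -60.56792, 17.61, 79.0),
    (6, 338.23007, -60.56947, 65.34, 129.0),
    (7, 338.23021, -60.56873, 129.69, 284.0),
    (8, 338.22899, -60.56824, 3.12, 25.0),
]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE photutils '
             '(id INTEGER PRIMARY KEY, ra FLOAT, dec FLOAT, '
             'source_sum FLOAT, area FLOAT)')
conn.executemany('INSERT INTO photutils VALUES (?, ?, ?, ?, ?)', rows)

# Plain-SQL counterpart of `whereclause=photcat.c.source_sum > 50`
query = ('SELECT id, ra, dec, source_sum, area FROM photutils '
         'WHERE source_sum > ? ORDER BY id')
selected = conn.execute(query, (50,)).fetchall()
conn.close()
print(selected)
```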


## User catalogs

Input catalogs are kept immutable. Instead, users work on a *"user catalog"*.

User catalogs can be created from the result of a selection. The mandatory columns are `ID`, `RA` and `DEC` (the other columns from the input catalog can be accessed later with an SQL *join*).

So we can create a user catalog named `my_cat`:

In [28]:
res

<ResultSet(photutils.id < 5)>, 4 results

In [29]:
res = photcat.select(columns=[photcat.idname, photcat.raname, photcat.decname])
mycat = mx.new_catalog_from_resultset('my_cat', res, drop_if_exists=True)

INFO 13 rows inserted

We can now look at the metadata stored for our new user catalog:

In [30]:
mycat.info()

Catalog 'my_cat' - 13 rows.

Metadata:
- creation_date : 2019-05-23T09:47:59.661568
- type          : user
- parent_cat    : photutils
- raname        : ra
- decname       : dec
- zname         : None
- zconfname     : None
- idname        : id
- maxid         : 13
- query         : None
- primary_id    : id
- status        : None
- version_meta  : None
- CAT3_TS       : None

Columns:
- id  : INTEGER 
- ra  : FLOAT 
- dec : FLOAT 


In [31]:
myori = mx.new_catalog_from_resultset('my_oricat', origcat.select())

INFO 4 rows inserted

## User catalogs from scratch

User catalogs can also be created from scratch with `mx.new_catalog`, then filled with `insert`. Here the rows come from an Astropy `Table` read from the FITS catalog, and the `idname`, `raname` and `decname` arguments declare which columns hold the ID, RA and DEC.

In [32]:
catfile = os.path.join(DATADIR, 'catalog.fits')
tbl = Table.read(catfile)
tbl[:2]

id,ra,dec,xcentroid,ycentroid,source_sum,source_sum_err,area,eccentricity,orientation,ellipticity,elongation
,,,pix,pix,,,pix2,,rad,,
int64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64
1,338.2221668333144,-60.56791899034823,84.17464353709585,37.8735833857062,17.611382994800806,3.8582196214768953,79.0,0.4339334294611479,0.8580410669352785,0.0990550633939457,1.1099457462595834
2,338.223717115442,-60.56603060344533,70.39134668054363,71.85060521460757,5.36989851295948,2.3579307859187533,28.0,0.5278369963327637,-0.3707806312431074,0.1506543075387936,1.1773769018622235


In [33]:
mycat2 = mx.new_catalog('my_cat2', drop_if_exists=True, idname='id', raname='ra', decname='dec')

In [34]:
mycat2.insert(tbl)

INFO 13 rows inserted

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

In [35]:
mycat2.info()

Catalog 'my_cat2' - 13 rows.

Metadata:
- creation_date : 2019-05-23T09:47:59.948364
- type          : user
- parent_cat    : None
- raname        : ra
- decname       : dec
- zname         : None
- zconfname     : None
- idname        : id
- maxid         : 13
- query         : None
- primary_id    : id
- status        : None
- version_meta  : None
- CAT3_TS       : None

Columns:
- id             : INTEGER 
- ra             : FLOAT 
- dec            : FLOAT 
- xcentroid      : FLOAT 
- ycentroid      : FLOAT 
- source_sum     : FLOAT 
- source_sum_err : FLOAT 
- area           : FLOAT 
- eccentricity   : FLOAT 
- orientation    : FLOAT 
- ellipticity    : FLOAT 
- elongation     : FLOAT 


* * * 

## Restart and use existing catalog (created at previous step)

All of the above is a setup that has to be done only once. Then all the information is stored in the database. Let's restart with a new MuseX object to check that it works.

In [36]:
import musex
mx = musex.MuseX(settings_file=settings_file)

INFO Input catalogs loaded
INFO User catalogs loaded

  __  __               __  __
  |  \/  |_   _ ___  ___\ \/ /
  | |\/| | | | / __|/ _ \\  /
  | |  | | |_| \__ \  __//  \
  |_|  |_|\__,_|___/\___/_/\_\


The MUse Source EXtractor :) - v0.3.dev38+g7ee7411.d20190429

database       : <Database(sqlite:////tmp/simon/musex.u90wkmiq/test.db)>
settings file  : /tmp/simon/musex.u90wkmiq/settings.yaml
muse_dataset   : hdfs
datasets       :
    - test            : small test dataset with images
    - photutils_masks : provide masks for the photutils catalog
    - origin          : provide masks and sources for the origin catalog
input_catalogs :
    - photutils       : 13 rows
    - origin          : 4 rows
catalogs       :
    - my_cat          : 13 rows
    - my_oricat       : 4 rows
    - my_cat2         : 13 rows


Input catalogs are available in `mx.input_catalogs`, and user catalogs are stored in `mx.catalogs`:

In [37]:
photcat = mx.input_catalogs['photutils']
origcat = mx.input_catalogs['origin']

And we can verify that our sources are still here:

In [38]:
mycat = mx.catalogs['my_cat']
mycat.select(limit=2).as_table()

id,ra,dec
int64,float64,float64
1,338.2221668333144,-60.56791899034823
2,338.223717115442,-60.56603060344533
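
The catalogs survive the restart because they live in the SQLite file declared as `db` in the settings. The following sketch illustrates the same idea with the standard `sqlite3` module only: rows written through one connection are still there when the file is reopened. The table layout and the `count_rows` helper are hypothetical and do not reflect the actual MuseX schema.

```python
import os
import sqlite3
import tempfile


def count_rows(db_path, table):
    """Open the database file, count the rows of `table`, and close it."""
    conn = sqlite3.connect(db_path)
    try:
        # Table name is interpolated for brevity; only use trusted names.
        return conn.execute(f'SELECT COUNT(*) FROM {table}').fetchone()[0]
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, 'test.db')

    # First "session": create a table and insert two sources.
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE my_cat '
                 '(id INTEGER PRIMARY KEY, ra FLOAT, dec FLOAT)')
    conn.executemany('INSERT INTO my_cat VALUES (?, ?, ?)',
                     [(1, 338.22217, -60.56792), (2, 338.22372, -60.56603)])
    conn.commit()
    conn.close()

    # Second "session": a fresh connection still sees the rows.
    n_rows = count_rows(db_path, 'my_cat')

print(n_rows)
```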


## Masks

To extract spectra from the cube, we need a mask for each source, and either a global sky mask or one per source. The spectra extraction is done with MPDAF Sources ([mpdaf.sdetect.Source](https://mpdaf.readthedocs.io/en/latest/source.html#)), specifically with the [Source.extract_spectra](https://mpdaf.readthedocs.io/en/latest/api/mpdaf.sdetect.Source.html#mpdaf.sdetect.Source.extract_spectra) method. To do this we need to put a sub-cube and the masks in the `Source` object.

There are multiple ways to extract the masks. Some tools (e.g. ORIGIN) provide the masks. Another method is to use a segmentation map. Here we use:

* ORIGIN: the masks computed by ORIGIN are available with the "origin" dataset.
* photutils: an associated segmentation map was computed with photutils' [detect_sources](https://photutils.readthedocs.io/en/stable/api/photutils.segmentation.detect_sources.html) and [deblend_sources](https://photutils.readthedocs.io/en/stable/api/photutils.segmentation.deblend_sources.html) functions.

To help create the masks from a segmentation map, MuseX provides the `MuseX.create_masks_from_segmap` method, which itself uses [create_masks_from_segmap](https://mpdaf.readthedocs.io/en/latest/api/mpdaf.sdetect.create_masks_from_segmap.html) from MPDAF.

This can also be used with an HST segmentation map: the method takes care of aligning the segmap and bringing it to the MUSE resolution.

In [39]:
maskdir = os.path.join(mx.workdir, 'masks', 'hdfs')
mx.create_masks_from_segmap(photcat, maskdir, skip_existing=True)

INFO read catalog /home/simon/dev/musex/tests/data/catalog.fits with 13 sources
INFO selected 13 sources in dataset footprint
INFO Aligning segmap with reference image
INFO computing masks for 13 sources

In [40]:
print('work dir:', mx.workdir)
sorted(os.listdir(maskdir))

work dir: /tmp/simon/musex.u90wkmiq


['mask-sky.fits',
 'mask-source-00001.fits',
 'mask-source-00002.fits',
 'mask-source-00003.fits',
 'mask-source-00004.fits',
 'mask-source-00005.fits',
 'mask-source-00006.fits',
 'mask-source-00007.fits',
 'mask-source-00008.fits',
 'mask-source-00009.fits',
 'mask-source-00010.fits',
 'mask-source-00011.fits',
 'mask-source-00012.fits',
 'mask-source-00013.fits']
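
To give an idea of what "creating masks from a segmentation map" means, here is a purely conceptual sketch in plain Python. It assumes the usual convention that label 0 marks pixels belonging to no source. The `masks_from_segmap` helper and the tiny map are hypothetical, and the sketch does not reproduce what MuseX or MPDAF actually do (for example the alignment step reported in the log above).

```python
def masks_from_segmap(segmap):
    """Return ({label: source mask}, sky mask) from a 2D label map.

    Label 0 marks pixels that belong to no source (sky).
    """
    labels = sorted({v for row in segmap for v in row if v != 0})
    source_masks = {
        label: [[v == label for v in row] for row in segmap]
        for label in labels
    }
    sky_mask = [[v == 0 for v in row] for row in segmap]
    return source_masks, sky_mask


# Tiny hypothetical segmentation map with two sources.
segmap = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 2, 2],
]
source_masks, sky_mask = masks_from_segmap(segmap)

# Fraction of pixels flagged as sky (booleans sum as 0/1)
sky_fraction = sum(map(sum, sky_mask)) / (len(segmap) * len(segmap[0]))
print(sorted(source_masks), f'sky: {sky_fraction:.1%}')
```

With NumPy arrays, the same masks would simply be `segmap == label` and `segmap == 0`.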

Then the masks must be declared in the settings file:

In [41]:
mx.conf['datasets']['photutils_masks']['masks']

{'skymask': '/tmp/simon/musex.u90wkmiq/masks/hdfs/mask-sky.fits',
 'mask_tpl': '/tmp/simon/musex.u90wkmiq/masks/hdfs/mask-source-%05d.fits'}
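
The `%05d` in `mask_tpl` is a printf-style placeholder. Judging from the file names listed above, it is presumably filled with the integer source ID, zero-padded to five digits. A minimal sketch of that expansion follows; the relative directory and the `mask_path` helper are placeholders for illustration.

```python
import os

# Template in the same style as the `mask_tpl` value above; the directory
# is a placeholder, not a real path.
mask_tpl = os.path.join('masks', 'hdfs', 'mask-source-%05d.fits')


def mask_path(source_id):
    """Return the mask file path for a given integer source ID."""
    return mask_tpl % source_id


names = [os.path.basename(mask_path(i)) for i in (1, 13)]
print(names)
```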

The ORIGIN input data also includes pre-generated masks, which are defined in the YAML file with `mask_srctag` and `skymask_srctag`:

In [42]:
mx.conf['datasets']['origin']['masks']

{'mask_srctag': 'ORI_MASK_OBJ', 'skymask_srctag': 'ORI_MASK_SKY'}

## Export sources

In [43]:
mx.datasets

{'test': <DataSet(prefix=TEST, version=1.0, 1 images)>,
 'photutils_masks': <DataSet(prefix=PHU, detector=photutils, linked_cat=photutils, 1 detector, 2 masks)>,
 'origin': <DataSet(prefix=ORIG, detector=origin, linked_cat=origin, 1 detector, 1 tables, 2 sources, 2 masks)>}

### Photutils

In [44]:
with mx.use_loglevel('DEBUG'):
    mx.export_sources(mycat.select(limit=2).as_table(), 
                      masks_dataset='photutils_masks', verbose=True)

INFO Exporting 2 sources with hdfs dataset, size=5.0
INFO using datasets: hdfs, test, photutils_masks
DEBUG Creating source 00001 (-60.56792, 338.22217)
DEBUG Add extra header {'SRC_V': ('', 'Source Version'), 'CATALOG': ('photutils',)}
DEBUG Add dataset hdfs
DEBUG Adding FSF info from the MUSE datacube
DEBUG Add dataset test
DEBUG Adding image: TEST_FAKE
DEBUG Add dataset photutils_masks
DEBUG /!\ no value for REFSPEC
DEBUG Add mask from dataset photutils_masks
DEBUG MASKS: SKY: 46.2%, OBJ: 12.6%
DEBUG Extract spectra
INFO Source 00001 (-60.56792, 338.22217) done, 5 images, 7 spectra
DEBUG IMAGES: MUSE_WHITE, MUSE_EXPMAP, TEST_FAKE, MASK_SKY, MASK_OBJ
DEBUG SPECTRA: MUSE_TOT, MUSE_WHITE, MUSE_PSF, MUSE_SKY, MUSE_TOT_SKYSUB, MUSE_WHITE_SKYSUB, MUSE_PSF_SKYSUB
INFO FITS written to /tmp/simon/musex.u90wk

In [45]:
sorted(os.listdir(os.path.join(mx.exportdir, mycat.name, 'sources')))

['source-00001.fits', 'source-00002.fits']

In [46]:
def customize_source(src, row, datasets=None, catalogs=None, outdir=None, outname=None):
    src.FOO = 'bar'

In [47]:
with mx.use_loglevel('DEBUG'):
    mx.export_sources(mycat.select(limit=1).as_table(), 
                      masks_dataset='photutils_masks', verbose=True, user_func=customize_source)

INFO Exporting 1 sources with hdfs dataset, size=5.0
INFO using datasets: hdfs, test, photutils_masks
DEBUG Creating source 00001 (-60.56792, 338.22217)
DEBUG Add extra header {'SRC_V': ('', 'Source Version'), 'CATALOG': ('photutils',)}
DEBUG Add dataset hdfs
DEBUG Adding FSF info from the MUSE datacube
DEBUG Add dataset test
DEBUG Adding image: TEST_FAKE
DEBUG Add dataset photutils_masks
DEBUG /!\ no value for REFSPEC
DEBUG Add mask from dataset photutils_masks
DEBUG MASKS: SKY: 46.2%, OBJ: 12.6%
DEBUG Extract spectra
DEBUG Calling user function
INFO Source 00001 (-60.56792, 338.22217) done, 5 images, 7 spectra
DEBUG IMAGES: MUSE_WHITE, MUSE_EXPMAP, TEST_FAKE, MASK_SKY, MASK_OBJ
DEBUG SPECTRA: MUSE_TOT, MUSE_WHITE, MUSE_PSF, MUSE_SKY, MUSE_TOT_SKYSUB, MUSE_WHITE_SKYSUB, MUSE_PSF_SKYSUB
INFO

In [48]:
fits.getval(f'{mx.exportdir}/my_cat/sources/source-00001.fits', 'FOO')

'bar'

### Origin

In [49]:
myori = mx.catalogs['my_oricat']

In [50]:
myori.select(limit=2).as_table()

ID,ra,dec,x,y,n_lines,seg_label,comp,line_merged_flag,flux,STD,nsigSTD,T_GLR,nsigTGLR,purity,version
int64,float64,float64,float64,float64,int64,int64,int64,bool,float64,str1,str1,float64,float64,float64,str3
1,338.2300860917844,-60.566344854991016,14.0,66.0,1,0,0,False,26.91065275337527,--,--,6.553810779966584,5.687764534465783,0.0,1.0
2,338.2235460862445,-60.56924185540205,72.0,14.0,1,0,0,False,24.247310329287533,--,--,6.5893421075319765,5.7186006131341935,0.0,1.0


In [51]:
with mx.use_loglevel('DEBUG'):
    mx.export_sources(myori.select(limit=2).as_table(), verbose=True)
                      #datasets={'origin':['ORI_MAXMAP', 'ORI_CORR_1']})

INFO Exporting 2 sources with hdfs dataset, size=5.0
INFO using datasets: hdfs, test, origin
DEBUG Creating source 00001 (-60.56634, 338.23009)
DEBUG Add extra header {'SRC_V': ('', 'Source Version'), 'CATALOG': ('origin',)}
DEBUG Add dataset hdfs
DEBUG Adding FSF info from the MUSE datacube
DEBUG Add dataset test
DEBUG Adding image: TEST_FAKE
DEBUG Add dataset origin
DEBUG Adding source images: ORI_MAXMAP
DEBUG Adding source images: ORI_MASK_OBJ
DEBUG Adding source images: ORI_MASK_SKY
DEBUG Adding source images: ORI_SEGMAP_LABEL
DEBUG Adding source images: ORI_SEGMAP_MERGED
DEBUG Adding source images: NB_LINE_1
DEBUG Adding source images: ORI_CORR_1
DEBUG Adding source spectra: MUSE_SKY
DEBUG Adding source spectra: MUSE_TOT_SKYSUB
DEBUG Adding source spectra: MUSE_WHITE_SKYSUB


In [52]:
os.listdir(os.path.join(mx.exportdir, myori.name, 'sources'))

['source-00002.fits', 'source-00001.fits']

In [53]:
src = Source.from_file(os.path.join(mx.exportdir, myori.name, 'sources', 'source-00001.fits'))
src.info()

INFO ID      =                    1 / object ID %d
INFO RA      =    338.2300860917844 / RA u.degree %.7f
INFO DEC     =   -60.56634485499102 / DEC u.degree %.7f
INFO FROM    = 'MuseX   '           / detection software
INFO FROM_V  = '0.3.dev38+g7ee7411.d20190429' / version of the detection software
INFO CUBE    = 'cube.fits'          / datacube
INFO CUBE_V  = '1.24    '           / version of the datacube
INFO SRC_V   = '' / Source Version
INFO SIZE    =                    5
INFO CATALOG = 'origin  '
INFO EXPMEAN =

## MarZ

### Export with Photutils

First, we need to set the `REFSPEC` column, which tells which spectrum must be used. (This may change later to allow more flexibility; currently it is possible to set a `refspec` value for each source.)

In [54]:
mycat.update_column('refspec', 'MUSE_PSF_SKYSUB')

INFO creating column 'my_cat.refspec'

In [55]:
with mx.use_loglevel('DEBUG'):
    mx.export_marz(mycat.select(limit=2), masks_dataset='photutils_masks', verbose=True)

INFO Exporting 2 sources with hdfs dataset, size=5.0
INFO using datasets: hdfs
DEBUG Creating source 00001 (-60.56792, 338.22217)
DEBUG Add extra header {'SRC_V': ('', 'Source Version'), 'CATALOG': ('photutils',)}
DEBUG Add dataset hdfs
DEBUG Adding FSF info from the MUSE datacube
DEBUG Add REFSPEC='MUSE_PSF_SKYSUB'
DEBUG Add mask from dataset photutils_masks
DEBUG MASKS: SKY: 46.2%, OBJ: 12.6%
DEBUG Extract spectra
INFO Source 00001 (-60.56792, 338.22217) done, 4 images, 7 spectra
DEBUG IMAGES: MUSE_WHITE, MUSE_EXPMAP, MASK_SKY, MASK_OBJ
DEBUG SPECTRA: MUSE_TOT, MUSE_WHITE, MUSE_PSF, MUSE_SKY, MUSE_TOT_SKYSUB, MUSE_WHITE_SKYSUB, MUSE_PSF_SKYSUB
DEBUG Creating source 00002 (-60.56603, 338.22372)
DEBUG Add extra header {'SRC_V': ('', 'Source Version'), 'CATALOG': ('photutils',)}
DEBUG Add dataset hdfs
DEBUG

Once the spectra have been processed in MarZ, the results file can be ingested into MuseX:

### Import the MarZ file

In [57]:
mx.import_marz(os.path.join(DATADIR, 'marz-my-cat-hdfs_full_SCO.mz'), mycat)

INFO ingesting catalog
INFO 13 rows inserted

In [58]:
mx.marzcat.info()

MarzCatalog 'marz' - 13 rows.

Metadata:
- creation_date : 2019-05-23T09:48:09.985515
- type          : marz
- parent_cat    : None
- raname        : None
- decname       : None
- zname         : None
- zconfname     : None
- idname        : ID
- maxid         : 13
- query         : None
- primary_id    : _id
- status        : inserted
- version_meta  : None
- CAT3_TS       : None

Columns:
- _id      : INTEGER 
- ID       : INTEGER 
- RA       : FLOAT 
- DEC      : FLOAT 
- Mag      : FLOAT 
- Type     : INTEGER 
- AutoTID  : INTEGER 
- AutoTN   : TEXT 
- AutoZ    : FLOAT 
- AutoXCor : FLOAT 
- FinTID   : INTEGER 
- FinTN    : TEXT 
- FinZ     : FLOAT 
- QOP      : INTEGER 
- Blend    : INTEGER 
- Defect   : INTEGER 
- Revisit  : INTEGER 
- HstIds   : TEXT 
- Comment  : TEXT 
- catalog  : TEXT 
- version  : TEXT 


In [59]:
mx.marzcat.select(limit=3).as_table()

_id,ID,RA,DEC,Mag,Type,AutoTID,AutoTN,AutoZ,AutoXCor,FinTID,FinTN,FinZ,QOP,Blend,Defect,Revisit,HstIds,Comment,catalog,version
int64,int64,float64,float64,float64,int64,int64,str55,float64,float64,int64,str55,float64,int64,int64,int64,int64,str5,str1,str6,str1
1,1,338.222167,-60.567919,-99.0,0,11,MUSE-35 HDF-S z-bin 1 - Ha emitter,-0.00258,9.79608,11,MUSE-35 HDF-S z-bin 1 - Ha emitter,-0.00258,2,0,0,0,--,--,my_cat,1
2,2,338.223717,-60.566031,-99.0,2,14,MUSE-38 HDF-S z-bin 3 - [O II] emitter - weak continuum,0.31325,11.90192,14,MUSE-38 HDF-S z-bin 3 - [O II] emitter - weak continuum,0.31325,2,0,0,0,24348,--,my_cat,1
3,3,338.231225,-60.565948,-99.0,6,18,MUSE-42 HDF-S z-bin 5 - Ly-a emitter - weak continuum,3.0343,25.8933,18,MUSE-42 HDF-S z-bin 5 - Ly-a emitter - weak continuum,3.0343,3,0,0,0,24353,--,my_cat,1


### Origin

In the ORIGIN case, the sources already contain the spectra that we want to use, so we use them instead of extracting spectra from the MUSE cube.

In [60]:
myori

<Catalog('my_oricat', 4 rows)>

In [61]:
mx.datasets['origin'].get_source(1).info()

INFO ID      =                    1 / object ID %d
INFO RA      =    338.2300860917844 / RA u.degree %.7f
INFO DEC     =   -60.56634485499102 / DEC u.degree %.7f
INFO FROM    = 'ORIGIN  '           / detection software
INFO FROM_V  = '3.2.dev128+g554d26a' / version of the detection software
INFO CUBE    = 'cube.fits'          / datacube
INFO CUBE_V  = '' / version of the datacube
INFO SRC_V   = '0.1     '           / Source version
INFO SRC_TS  = '2019-04-12T18:06:01.927833'
INFO CAT3_TS = '2019-04-12T18:05:59.435812'
INFO OR_X    =

In [62]:
#myori.update_column('refspec', 'ORI_SPEC_1')
#myori.select(limit=2, columns=['ID', 'refspec']).as_table()

In [63]:
mx.export_marz(myori.select(limit=2), sources_dataset='origin', skyspec='MUSE_SKY')

INFO Writing /tmp/simon/musex.u90wkmiq/export/hdfs/my_oricat/marz/marz-my_oricat-hdfs.fits


## Joining catalogs

When the user catalog is ready to be exported, one needs to gather the data from multiple catalogs (the input catalog, MarZ, etc.). This can be done with a SQL *join*.

By default, `.join` renames the columns with the format `{catname}_{colname}` to avoid name conflicts. It is also possible to specify the column names manually (see below).

In [64]:
res = mycat.join([photcat, mx.marzcat], whereclause=(mx.marzcat.c.catalog == mycat.name))
res.as_table()[:2]

my_cat_id,my_cat_ra,my_cat_dec,my_cat_refspec,photutils_id,photutils_ra,photutils_dec,photutils_xcentroid,photutils_ycentroid,photutils_source_sum,photutils_source_sum_err,photutils_area,photutils_eccentricity,photutils_orientation,photutils_ellipticity,photutils_elongation,photutils_version,marz__id,marz_ID,marz_RA,marz_DEC,marz_Mag,marz_Type,marz_AutoTID,marz_AutoTN,marz_AutoZ,marz_AutoXCor,marz_FinTID,marz_FinTN,marz_FinZ,marz_QOP,marz_Blend,marz_Defect,marz_Revisit,marz_HstIds,marz_Comment,marz_catalog,marz_version
int64,float64,float64,str15,int64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,float64,str3,int64,int64,float64,float64,float64,int64,int64,str57,float64,float64,int64,str57,float64,int64,int64,int64,int64,str5,str1,str6,str1
1,338.2221668333144,-60.56791899034823,MUSE_PSF_SKYSUB,1,338.2221668333144,-60.56791899034823,84.17464353709585,37.8735833857062,17.611382994800806,3.8582196214768953,79.0,0.4339334294611479,0.8580410669352785,0.0990550633939457,1.1099457462595834,1.0,1,1,338.222167,-60.567919,-99.0,0,11,MUSE-35 HDF-S z-bin 1 - Ha emitter,-0.00258,9.79608,11,MUSE-35 HDF-S z-bin 1 - Ha emitter,-0.00258,2,0,0,0,--,--,my_cat,1
2,338.223717115442,-60.56603060344533,MUSE_PSF_SKYSUB,2,338.223717115442,-60.56603060344533,70.39134668054363,71.85060521460757,5.36989851295948,2.3579307859187533,28.0,0.5278369963327637,-0.3707806312431074,0.1506543075387936,1.1773769018622235,1.0,2,2,338.223717,-60.566031,-99.0,2,14,MUSE-38 HDF-S z-bin 3 - [O II] emitter - weak continuum,0.31325,11.90192,14,MUSE-38 HDF-S z-bin 3 - [O II] emitter - weak continuum,0.31325,2,0,0,0,24348,--,my_cat,1
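
To make the default naming scheme concrete, here is a minimal sketch that uses only the standard-library `sqlite3` module rather than MuseX itself. The toy tables and the `prefixed_columns` helper are hypothetical; the sketch only mimics the `{catname}_{colname}` convention visible in the output above and is not meant to describe how MuseX builds its query internally.

```python
import sqlite3

# Toy stand-ins for two catalogs that share the column names id, ra and dec.
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE my_cat (id INTEGER PRIMARY KEY, ra REAL, dec REAL);
    CREATE TABLE photutils (id INTEGER PRIMARY KEY, ra REAL, dec REAL, area REAL);
    INSERT INTO my_cat VALUES (1, 338.2222, -60.5679);
    INSERT INTO photutils VALUES (1, 338.2222, -60.5679, 79.0);
""")


def prefixed_columns(conn, table):
    """Return 'table.col AS table_col' for each column (hypothetical helper)."""
    names = [row[1] for row in conn.execute(f'PRAGMA table_info({table})')]
    return [f'{table}.{name} AS {table}_{name}' for name in names]


select_list = prefixed_columns(conn, 'my_cat') + prefixed_columns(conn, 'photutils')
query = (f"SELECT {', '.join(select_list)} "
         "FROM my_cat JOIN photutils ON my_cat.id = photutils.id")

cursor = conn.execute(query)
colnames = [desc[0] for desc in cursor.description]
rows = cursor.fetchall()
print(colnames)
```

Without the prefixes, the joined result would contain several columns named `id`, `ra` and `dec`, which is the conflict the default renaming avoids.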


## Select columns for the final catalog

To avoid column name conflicts, for now the user needs to select the columns explicitly and rename them if needed. This must be done with SQLAlchemy column objects:

In [65]:
list(photcat.c)

[Column('id', INTEGER(), table=<photutils>, primary_key=True, nullable=False),
 Column('ra', FLOAT(), table=<photutils>),
 Column('dec', FLOAT(), table=<photutils>),
 Column('xcentroid', FLOAT(), table=<photutils>),
 Column('ycentroid', FLOAT(), table=<photutils>),
 Column('source_sum', FLOAT(), table=<photutils>),
 Column('source_sum_err', FLOAT(), table=<photutils>),
 Column('area', FLOAT(), table=<photutils>),
 Column('eccentricity', FLOAT(), table=<photutils>),
 Column('orientation', FLOAT(), table=<photutils>),
 Column('ellipticity', FLOAT(), table=<photutils>),
 Column('elongation', FLOAT(), table=<photutils>),
 Column('version', TEXT(), table=<photutils>)]

In [66]:
photcols = [photcat.c[name] for name in ['source_sum', 'source_sum_err', 'area', 'eccentricity', 
                                         'orientation', 'ellipticity', 'elongation', 'version']]

marzcols = [mx.marzcat.c[name] for name in ('FinZ', 'QOP', 'Type', 'Blend', 'Defect', 'Revisit', 'Comment')]

It is possible to rename a column with `.label`:

In [67]:
photcols[2] = photcols[2].label('pixel_area')

Then we concatenate the columns from the user catalog, MarZ, and the input catalog:

In [68]:
mycat.c + marzcols + photcols

[Column('id', INTEGER(), table=<my_cat>, primary_key=True, nullable=False),
 Column('ra', FLOAT(), table=<my_cat>),
 Column('dec', FLOAT(), table=<my_cat>),
 Column('refspec', TEXT(), table=<my_cat>),
 Column('FinZ', FLOAT(), table=<marz>),
 Column('QOP', INTEGER(), table=<marz>),
 Column('Type', INTEGER(), table=<marz>),
 Column('Blend', INTEGER(), table=<marz>),
 Column('Defect', INTEGER(), table=<marz>),
 Column('Revisit', INTEGER(), table=<marz>),
 Column('Comment', TEXT(), table=<marz>),
 Column('source_sum', FLOAT(), table=<photutils>),
 Column('source_sum_err', FLOAT(), table=<photutils>),
 <sqlalchemy.sql.elements.Label object at 0x7fcdf0129278>,
 Column('eccentricity', FLOAT(), table=<photutils>),
 Column('orientation', FLOAT(), table=<photutils>),
 Column('ellipticity', FLOAT(), table=<photutils>),
 Column('elongation', FLOAT(), table=<photutils>),
 Column('version', TEXT(), table=<photutils>)]

Now we can proceed with the join. By default the join uses the `id` column of each catalog, but other keys can be specified with the `keys` argument. Also, the `mx.marzcat` table can store the results for multiple catalogs, so we need to select the results for our catalog with the `whereclause`:

In [69]:
res = mycat.join(
    [mx.marzcat, photcat], 
    whereclause=(mx.marzcat.c.catalog == mycat.name), 
    columns=mycat.c + marzcols + photcols, 
    use_labels=False,
    limit=2
).as_table()
res

id,ra,dec,refspec,FinZ,QOP,Type,Blend,Defect,Revisit,Comment,source_sum,source_sum_err,pixel_area,eccentricity,orientation,ellipticity,elongation,version
int64,float64,float64,str15,float64,int64,int64,int64,int64,int64,str1,float64,float64,float64,float64,float64,float64,float64,str3
1,338.2221668333144,-60.56791899034823,MUSE_PSF_SKYSUB,-0.00258,2,0,0,0,0,--,17.611382994800806,3.8582196214768953,79.0,0.4339334294611479,0.8580410669352785,0.0990550633939457,1.1099457462595834,1.0
2,338.223717115442,-60.56603060344533,MUSE_PSF_SKYSUB,0.31325,2,2,0,0,0,--,5.36989851295948,2.3579307859187533,28.0,0.5278369963327637,-0.3707806312431074,0.1506543075387936,1.1773769018622235,1.0
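
The role of the `whereclause` can be illustrated with a small sketch, again using the standard-library `sqlite3` module with made-up tables rather than MuseX. It assumes, as described above, that the MarZ table holds rows for several catalogs that reuse the same source IDs. The `other_cat` rows, the join on `marz.ID`, and all values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE my_cat (id INTEGER PRIMARY KEY, refspec TEXT);
    CREATE TABLE marz (_id INTEGER PRIMARY KEY, ID INTEGER, FinZ REAL, catalog TEXT);
    INSERT INTO my_cat VALUES (1, 'MUSE_PSF_SKYSUB'), (2, 'MUSE_PSF_SKYSUB');
    -- Two catalogs stored in the same MarZ table, with overlapping IDs.
    INSERT INTO marz (ID, FinZ, catalog) VALUES
        (1, -0.00258, 'my_cat'), (2, 0.31325, 'my_cat'),
        (1, 1.5, 'other_cat'), (2, 2.5, 'other_cat');
""")

# AS plays the role of .label: FinZ is exposed under the name Z.
join = ("SELECT my_cat.id AS id, my_cat.refspec AS refspec, marz.FinZ AS Z "
        "FROM my_cat JOIN marz ON my_cat.id = marz.ID")

# Without a filter, every source matches one MarZ row per catalog.
unfiltered = conn.execute(join + " ORDER BY my_cat.id").fetchall()

# Restricting to our catalog gives exactly one row per source.
cursor = conn.execute(
    join + " WHERE marz.catalog = ? ORDER BY my_cat.id LIMIT 2", ('my_cat',))
filtered = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]

print(len(unfiltered), len(filtered), colnames)
```

In this toy case the unfiltered join returns four rows for two sources, which is why the filter on the `catalog` column is needed.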


## Export sources

In [70]:
# Set the REFSPEC column to tell which spectrum must be used 
# mycat.update_column('refspec', 'MUSE_PSF_SKYSUB')

Currently, to store a redshift and a confidence level in the `Source` objects, MuseX looks for specific columns (`Z` and `Confid`) in the table. So here we rename `FinZ` to `Z` (this could also have been done within the SQL join above, with `.label`). The equivalent rename of `QOP` to `Confid` is left commented out:

In [71]:
res.rename_column('FinZ', 'Z')
#res.rename_column('QOP', 'Confid')

In [72]:
res

id,ra,dec,refspec,Z,QOP,Type,Blend,Defect,Revisit,Comment,source_sum,source_sum_err,pixel_area,eccentricity,orientation,ellipticity,elongation,version
int64,float64,float64,str15,float64,int64,int64,int64,int64,int64,str1,float64,float64,float64,float64,float64,float64,float64,str3
1,338.2221668333144,-60.56791899034823,MUSE_PSF_SKYSUB,-0.00258,2,0,0,0,0,--,17.611382994800806,3.8582196214768953,79.0,0.4339334294611479,0.8580410669352785,0.0990550633939457,1.1099457462595834,1.0
2,338.223717115442,-60.56603060344533,MUSE_PSF_SKYSUB,0.31325,2,2,0,0,0,--,5.36989851295948,2.3579307859187533,28.0,0.5278369963327637,-0.3707806312431074,0.1506543075387936,1.1773769018622235,1.0


Now we run the export, here for the two sources selected by the join above:

In [73]:
mx.set_loglevel('INFO')

In [74]:
mx.export_sources(res, size=5, srcvers='0.1', apertures=[0.4], masks_dataset='photutils_masks')

INFO Exporting 2 sources with hdfs dataset, size=5.0
INFO using datasets: hdfs, test, photutils_masks



['/tmp/simon/musex.u90wkmiq/export/hdfs/my_cat/sources/source-00001.fits',
 '/tmp/simon/musex.u90wkmiq/export/hdfs/my_cat/sources/source-00002.fits']

The output files are placed in the export directory (`mx.exportdir`), which is located inside the *work directory*:

In [75]:
outdir = os.path.join(mx.exportdir, mycat.name, 'sources')
print(outdir)
sorted(os.listdir(outdir))

/tmp/simon/musex.u90wkmiq/export/hdfs/my_cat/sources


['source-00001.fits', 'source-00002.fits']
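
When exporting many sources, it can be useful to check that a file exists for every ID. The sketch below assumes the `source-{id:05d}.fits` naming pattern inferred from the listing above. `missing_sources` is a hypothetical helper, not part of MuseX, and the demo runs in a scratch directory with empty placeholder files.

```python
import os
import tempfile


def missing_sources(outdir, ids, pattern='source-{:05d}.fits'):
    """Return the IDs for which no exported file is found in outdir.

    The default pattern is an assumption based on the file names listed above.
    """
    present = set(os.listdir(outdir))
    return [i for i in ids if pattern.format(i) not in present]


# Demo with empty placeholder files standing in for real exports.
with tempfile.TemporaryDirectory() as demo_dir:
    for i in (1, 2):
        open(os.path.join(demo_dir, f'source-{i:05d}.fits'), 'w').close()
    missing = missing_sources(demo_dir, [1, 2, 3])

print(missing)
```

With the variables of this tutorial, the call would look like `missing_sources(outdir, [int(i) for i in res['id']])`.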

An MPDAF source file has been generated for each source, with all the information stored both in the header and in the extensions (images, spectra):

In [76]:
from mpdaf.sdetect import Source
Source.from_file(f'{outdir}/source-00001.fits').info()

INFO ID      =                    1 / object ID %d
INFO RA      =    338.2221668333144 / RA u.degree %.7f
INFO DEC     =   -60.56791899034823 / DEC u.degree %.7f
INFO FROM    = 'MuseX   '           / detection software
INFO FROM_V  = '0.3.dev38+g7ee7411.d20190429' / version of the detection software
INFO CUBE    = 'cube.fits'          / datacube
INFO CUBE_V  = '1.24    '           / version of the datacube
INFO SRC_V   = '0.1     '           / Source Version
INFO SIZE    =                    5
INFO CATALOG = 'photutils'
INFO EXPMEAN =

In [77]:
# Cleanup temp directory
tmpdir.cleanup()