
A Star Cluster Finder for Gaia Catalog based on Friend of Friend and isochrone fitting



GAIA SHiP (盖亚方舟): Star cluster Hunting Pipeline

Identifying star clusters in the Gaia archive using the friend of friend (FoF) method and isochrone fitting.


There are 3 directories in this repo.

  • data/:

    • K13.dat: Kharchenko et al. (2013)
    • CG18.txt and CG18b.txt: these two catalogues together make up the CG18 catalog.
    • Bica.txt: Bica et al. (2019)
    • cat_all.txt: full version of Tab. 1 in the paper.
    • cat_new.txt: full version of Tab. 3 in the paper.
    • fit_iso.txt: isochrone fitting result of 2443 star cluster candidates.
    • group/: cluster members of 56 cluster groups.

    Note that cat_all.txt, cat_new.txt, fit_iso.txt and group/sc_groupXXXX.txt can be loaded with pandas, e.g.:

      import pandas as pd
      df = pd.read_csv("cat_all.txt", sep=r"\s+", header=0)

    The first line of each file gives the column names. The unit of each quantity is explained in the table headers of Tab. 1 and 3 in the paper. The $x$, $y$, $z$ coordinates in group/sc_groupXXXX.txt are in pc.

    • fof/npy: member stars of the 2443 star cluster candidates in .npy format; load with arr = np.load(). Note! These files might be damaged if you download them individually from this repo via a web browser. I strongly recommend downloading the whole repo via git clone if you want to use the .npy format.
    • fof/csv: member stars of the 2443 star cluster candidates in .csv format; load with df = pd.read_csv(). Safe for individual downloading with a web browser.
    • isochrone/0-10.dat: .dat files that contain the isochrone tables downloaded from Padova group web interface.
    • isochrone/Z_python3.npy.tar.gz: the Z.npy data file. Provided in compressed format. Please uncompress it before loading: tar zxvf Z_python3.npy.tar.gz.
    • fit_iso/: isochrone fitting results of 2443 cluster candidates. For Python 3: fit = np.load(name_fit_iso, allow_pickle=True, encoding='latin1').flat[0]. For Python 2: fit = np.load(name_fit_iso).flat[0].
  • figure/:

    • 4panel/: 4-panel figures of the 2443 star cluster candidates.
  • src/: the GAIA SHiP pipeline, see below.
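The loading calls mentioned in the list above can be wrapped as follows. This is a minimal sketch for Python 3: the helper names are hypothetical, and the paths you pass in are placeholders, since the exact file names under fof/npy and fit_iso/ are not listed here.

```python
import numpy as np


def load_members_npy(path):
    """Load a cluster member list stored in .npy format."""
    # allow_pickle=True only matters if the array holds Python objects.
    return np.load(path, allow_pickle=True)


def load_fit_result(path):
    """Load an isochrone fitting result (a dict saved with np.save)."""
    # The dict is stored as a 0-d object array, so .flat[0] extracts it.
    # encoding='latin1' is needed for files that were pickled under Python 2.
    return np.load(path, allow_pickle=True, encoding="latin1").flat[0]
```

The returned fit result is a plain dict; its keys are defined by the pipeline and are not assumed here.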


  1. If you make use of SHiP in your work, we require that you cite the pipeline link and the following paper:
  • Liu, Lei & Pang, Xiaoying, "A catalog of newly identified star clusters in GAIA DR2", 2019, ApJS, 245, 32, arXiv:1910.12600
  1. When citing star clusters in this catalog, we strongly recommend the format LP + FoF ID, e.g. LP0005 (the first newly identified star cluster in Table 3 of the paper), so that different studies using this catalog can be compared directly.

  2. Most of the programs are developed with Python 2.7.13 (provided by conda 4.5.8). If you use Python 3, please pay attention to the following suggestions:

    • Change print " " to print(" ").
    • Pay attention to the difference between / (float division) and // (integer division).
    • If np.load() fails, try np.load("xxx.npy", allow_pickle=True). See the additional notes on loading isochrone fitting files.
  3. According to feedback from colleagues, member lists in .npy format might be damaged and fail to load with np.load() if you download them individually from this repo via a web browser. We strongly recommend downloading the whole repo via git clone if you want to use the .npy format member lists. For convenience, I have prepared the member lists in csv format, which can be downloaded safely via a web browser at the cost of a small loss of precision.

  4. The whole pipeline (including data and figures) is as large as 3 GB, which is not easy to download from GitHub. For convenience, I have prepared src.tar.gz in case you are only interested in the code. If you want to run isochrone fitting, an extra file in data/isochrone/ is required.

  5. The current SHiP pipeline includes the data preparation, FoF, isochrone fitting and classification parts, so that you can construct the same catalog presented in the above paper. The data visualization part is not provided, since the programs are not well documented and the code is messy. However, they are still available upon request.

  6. Installing mpi4py is a little tricky. Fortunately, it is not required if you use the core functions (e.g. isochrone fitting, FoF clustering) in single-process mode; see below for detailed instructions.

  7. The FoF clustering part and the isochrone fitting part are the core of this pipeline. For those who want to integrate them into their own pipeline, more detailed instructions are provided for the corresponding files below.

  8. Due to the file size limit set by GitHub (< 100 MB), Z.npy (~ 129 MB) cannot be uploaded directly. I provide the compressed version Z_python3.npy.tar.gz. You may download it and uncompress it with tar zxvf Z_python3.npy.tar.gz. I only provide Z.npy for the GAIA photometric system. To generate it for your own photometric system, prepare .dat files from the Padova web page and use load_dat() and save_npy() of the ISO class.

  9. A script is provided to convert the npy format member list of each individual SC candidate to csv format, which is readable by TOPCAT.

  10. The number of domains that corresponds to the released catalog is 4170.

  11. A parameter bounds has been added as a global variable. Set doubles = None to fit both $\Delta_g$ and $\Delta_{b-r}$. Otherwise only $\Delta_{b-r}$ is fitted. In the latter case the input data should be in ABSOLUTE magnitude.
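The npy to csv conversion mentioned in note 9 could look roughly like the sketch below. It is not the script shipped with the pipeline: the function name npy_to_csv is hypothetical, and it assumes that each member list is a 1D NumPy structured array with scalar fields, so that every field maps to one csv column.

```python
import numpy as np
import pandas as pd


def npy_to_csv(npy_path, csv_path):
    """Convert a structured-array member list (.npy) to a .csv file."""
    arr = np.load(npy_path, allow_pickle=True)
    # Assumption: arr is a 1D structured array with scalar fields,
    # so each field name becomes one csv column.
    df = pd.DataFrame.from_records(arr)
    df.to_csv(csv_path, index=False)
    return df
```

The resulting file has a header line with the field names and can be opened in TOPCAT as a CSV table.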

Feel free to contact me (WeChat: thirtyliu) if you have any problems.


  • Raw GAIA DR2 data (GaiaSource_*_*.csv.gz file).
  • ali-gaia_dr2_source.txt: listing all .csv.gz files.


  • gaia_segxxxx.npy: 200 files, stored in array of structure dtype_gaia.


  • Retrieve position, proper motion, magnitude, error information from raw data, and store them in 200 gaia_seg files. See dtype_gaia for details.


  • gaia_segXXXX.npy: 200 files.


  • selXXXX.npy: 200 files, stored in array of structure dtype_select, which consists of two parts:
    • idx: star index in gaia_segXXXX.npy
    • param: 5 parameters: l, b, parallax, pmra, pmdec


  • Retrieve all stars that satisfy the following criteria:
    • mag_G < 18
    • 0.2 < parallax < 7.0 (in mas)
    • |pmra| < 30 and |pmdec| < 30 (in mas/year)
    • |b| < 25 degree
  • Positions have been evolved from epoch 2015.5 to epoch 2000.0 according to proper motion.
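The selection cuts and the epoch shift listed above can be sketched with NumPy as follows. The function names are hypothetical, and the epoch propagation is only a first-order linear shift in (ra, dec); the pipeline's own implementation may differ. It assumes that pmra already includes the cos(dec) factor, as in the Gaia DR2 catalog.

```python
import numpy as np


def select_stars(mag_g, parallax, pmra, pmdec, b):
    """Boolean mask for the selection criteria listed above.

    Units: parallax in mas, proper motions in mas/yr, b in degrees.
    """
    return (
        (mag_g < 18.0)
        & (parallax > 0.2) & (parallax < 7.0)
        & (np.abs(pmra) < 30.0) & (np.abs(pmdec) < 30.0)
        & (np.abs(b) < 25.0)
    )


def propagate_to_j2000(ra, dec, pmra, pmdec, epoch=2015.5):
    """First-order shift of (ra, dec) in degrees from `epoch` to 2000.0.

    Assumes pmra = mu_alpha * cos(dec) in mas/yr (Gaia convention).
    """
    dt = 2000.0 - epoch                  # years; negative means backwards
    mas_to_deg = 1.0 / 3.6e6             # 1 degree = 3.6e6 mas
    dec_new = dec + pmdec * dt * mas_to_deg
    ra_new = ra + pmra * dt * mas_to_deg / np.cos(np.radians(dec))
    return ra_new % 360.0, dec_new
```

Given column arrays from one gaia_seg file, select_stars(...) returns a mask that can be used to index the stars kept for the corresponding sel file.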


  • selXXXX.npy: 200 files.


  • gaia_partition.npy


  • This serial program partitions the 3D parameter space (l, b, parallax) contiguously, such that each partition contains roughly the same number of stars.
  • The typical scale of a star cluster (l_sc) and the dispersion of parallax (sigma_plx) determine the minimum size of a partition in the corresponding dimension. See the program for the values.
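To illustrate the idea of equal-number partitioning with a minimum partition size, here is a one-dimensional sketch based on quantiles. It is not the pipeline's partitioning code, which works on all three dimensions; the function name and the way narrow bins are merged are assumptions made for illustration.

```python
import numpy as np


def equal_count_edges(values, n_part, min_width):
    """Split a 1D coordinate into at most n_part bins of roughly equal counts.

    Interior edges come from quantiles; an edge that would make a bin
    narrower than min_width is skipped, so fewer bins may be returned.
    """
    values = np.asarray(values, dtype=float)
    candidates = np.quantile(values, np.linspace(0.0, 1.0, n_part + 1))
    edges = [candidates[0]]
    for edge in candidates[1:-1]:
        if edge - edges[-1] >= min_width:
            edges.append(edge)
    # Close the last bin; drop the previous edge if the last bin is too narrow.
    if len(edges) > 1 and candidates[-1] - edges[-1] < min_width:
        edges.pop()
    edges.append(candidates[-1])
    return np.array(edges)
```

Here min_width plays the role of l_sc or sigma_plx for the dimension being split.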


  • gaia_partition.npy: partition table, generated by
  • selXXXX.npy


  • stars_in_segxxxx.npy: 200 files, each row is a list that contains star indexes for the corresponding partition.


  • Assign stars to the corresponding partition according to partition table in gaia_partition.npy.

This pipeline was originally designed for processing a large number of particles. Therefore every step takes parallelization into account, which makes it complicated. I now realize that in most cases users only care about a single sky area, which greatly simplifies the usage. Below are the instructions for FoF clustering on a single sky area.

  • The FoF method can only identify clusters against a smooth background. Therefore it might not work well if your sky area is too large.
  • The entry for FoF clustering is runfof(). You may find instructions on how to prepare data in example function load_stars_single().
  • Some variables for controlling the FoF clustering:
    • self.w: the weight of each quantity. Adjust the weights according to the features of your data.
    • self.b_fof: the linking length, usually 0.2 times the mean particle separation.
    • n_star_min: a group is output only if it contains at least this number of stars. Change it according to your usage.
  • The output of the fof() function is ginfos and l_keys1. See output below for instructions. For single sky area FoF clustering, you may care about the star member list of each group in l_keys1.
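For readers who want to see what the controls above do, here is a minimal friends-of-friends sketch. It is not the pipeline's runfof() or fof(): it uses a brute-force O(N^2) neighbour search with union-find, and the function name fof_groups and its signature are hypothetical. Only the roles of w, b_fof and n_star_min follow the description above.

```python
import numpy as np


def fof_groups(x, w, b_fof, n_star_min):
    """Friends-of-friends grouping with a brute-force neighbour search.

    x: (N, D) array of star parameters; w: (D,) weights;
    b_fof: linking length in the weighted parameter space.
    Returns a list of index arrays, one per group with >= n_star_min stars.
    """
    xw = np.asarray(x, dtype=float) * np.asarray(w, dtype=float)
    n = len(xw)
    parent = np.arange(n)

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        d2 = np.sum((xw[i + 1:] - xw[i]) ** 2, axis=1)
        for j in np.nonzero(d2 < b_fof ** 2)[0] + i + 1:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri      # link the groups of the two friends

    roots = np.array([find(i) for i in range(n)])
    groups = [np.nonzero(roots == r)[0] for r in np.unique(roots)]
    return [g for g in groups if len(g) >= n_star_min]
```

Each returned index array corresponds to one group's member list, similar in spirit to the per-group entries of l_keys1. For large samples a tree-based neighbour search would be needed instead of the brute-force loop.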


  • stars_in_segXXXX.npy
  • selXXXX.npy


  • ginfos_pXXXX.npy: 2D floating array. Each row is a list that gives the basic information of a star cluster identified in this partition.
    • Meaning of each element in the list: cluster star number, l, b, r_max, pmra, pmdec, r_pm, parallax, r_parallax.
  • keys_sel_pXXXX.npy: each row is a list that contains all keys for that cluster. Each key is a 64 bit integer:
    • bit 0 - 31: star index in selXXXX.npy
    • bit 32 - 63: file index for selXXXX.npy
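The key layout above can be packed and unpacked with bit operations. The sketch below uses signed 64-bit integers, which is an assumption (the source only says "64 bit integer"); the function names are hypothetical.

```python
import numpy as np


def encode_key(file_idx, star_idx):
    """Pack file index (bits 32-63) and star index (bits 0-31) into int64 keys."""
    file_idx = np.asarray(file_idx, dtype=np.int64)
    star_idx = np.asarray(star_idx, dtype=np.int64)
    return (file_idx << 32) | star_idx


def decode_key(key):
    """Return (file_idx, star_idx) unpacked from 64-bit keys."""
    key = np.asarray(key, dtype=np.int64)
    return key >> 32, key & 0xFFFFFFFF
```

For example, decode_key applied to one row of keys_sel_pXXXX.npy would give the selXXXX.npy file index and the star index within that file for every member.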


  • For each partition, the program identifies a series of star clusters using the FoF algorithm, and saves the basic info of each cluster and the keys of its member stars in ginfos_pXXXX.npy and keys_sel_pXXXX.npy, respectively.
  • Note! Star clusters given in this step are just intermediate results for each partition. They need to be further merged to obtain the final product.


  • gaia_partition.npy
  • ginfos_pXXXX.npy
  • keys_sel_pXXXX.npy


  • ginfos_merge.npy: basic info of each cluster after merge
  • keys_seg_cluster.npy: keys (file and star index) of each cluster for gaia_segXXXX.npy


  • This program merges clusters from different partitions. At present the merge is based only on the intersection of keys (keys_sel_pXXXX.npy) of two clusters. See the is_merge() function for more details. nmin is set to 50 to select those with at least 50 members after the merge.
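A merge based on key intersection can be sketched as below. This is an illustration only: here two clusters are merged as soon as they share a single key, whereas the actual criterion in is_merge() may be stricter. The function name merge_clusters is hypothetical.

```python
def merge_clusters(key_lists, nmin=50):
    """Merge clusters that share at least one star key; keep those with >= nmin stars."""
    merged = []                      # list of mutually disjoint key sets
    for keys in key_lists:
        current = set(keys)
        remaining = []
        for other in merged:
            if current & other:      # non-empty intersection: same cluster
                current |= other
            else:
                remaining.append(other)
        remaining.append(current)
        merged = remaining
    return [sorted(s) for s in merged if len(s) >= nmin]
```

Each input list holds the keys of one per-partition cluster; the output lists hold the keys of the merged clusters that pass the nmin cut.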


  • keys_seg_cluster.npy
  • gaia_segXXXX.npy


  • fof_scXXXX.npy


  • This program retrieves stars from gaia_segXXXX.npy according to the keys of each cluster recorded in keys_seg_cluster.npy.
  • Further analysis of star clusters is based entirely on fof_scXXXX.npy.


  • fof_scXXXX.npy
  • Z.npy: isochrones of multiple Z and ages


  • fit_iso_scXXXX.npy


  • This program reads g and b-r info from fof_scXXXX.npy and fits isochrones to derive Z and age. The fitting is carried out by minimizing $\bar{d^2} = \sum_{k = 1}^{n}(x_k - x_{k, nn})^2 / n$, where $x_k = (b-r, g)$ is the color and magnitude of the $k$-th cluster member and $x_{k, nn}$ is the nearest neighbor of the $k$-th member on the isochrone.
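The objective $\bar{d^2}$ for a single isochrone can be evaluated with a few lines of NumPy. This sketch only computes the mean squared nearest-neighbour distance for given isochrone points; the search over Z, age, $\Delta_g$ and $\Delta_{b-r}$ done by the pipeline is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def mean_sq_nn_distance(b_r, g, iso_b_r, iso_g):
    """Mean squared distance from each star to its nearest isochrone point."""
    stars = np.column_stack([b_r, g])            # (n, 2): x_k = (b-r, g)
    iso = np.column_stack([iso_b_r, iso_g])      # (m, 2): isochrone points
    # Squared distance between every star and every isochrone point.
    d2 = np.sum((stars[:, None, :] - iso[None, :, :]) ** 2, axis=2)
    # Nearest isochrone point per star, then average over the n stars.
    return d2.min(axis=1).mean()
```

The isochrone with the smallest returned value among all candidates would be the best fit under this criterion.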

This file provides all the necessary functions for isochrone fitting. Below is a summary of its usage.

  • The main function for isochrone fitting is fit_age_Z(). An example is provided by fit():

      iso = ISO()
      d_fit = iso.fit_age_Z(g, b_r)

g and b_r are numpy arrays of photometry. d_fit is a dict that stores the fitting result. A saved result can be loaded with np.load(fit_iso_name, allow_pickle=True).flat[0]. Note the extra encoding='latin1' parameter needed if you load our fitting results with Python 3.

  • Originally I expected users to generate their own Z.npy using load_dat() and save_npy() for their own photometry data. However, it seems more convenient to provide Z.npy directly. Please download it from data/isochrone/Z_python3.npy.tar.gz and uncompress it with tar zxvf Z_python3.npy.tar.gz.

  • As mentioned in the paper, there is a magnitude cut g < 17 that keeps only bright stars. You may change or discard this criterion.

  • For efficiency, the fitting deals with a maximum of 1000 stars, which is enough for most star clusters. This is controlled by nmax. You may change or discard this criterion.

  • The fitting is only tested on apparent magnitude. You may try with absolute magnitude. If you succeed, please let me know.

  • You may want to plot the fitting result. A plotting function is provided; its input is g, b_r and the name of your fitting result file. Alternatively, modify it to accept the fitting result dict directly, which is straightforward.

Below is old information on generating Z.npy for your own photometry system.



  • Z.npy: metallicity and age table generated from .dat files.


The ISO class in this file provides the following functions:

  • load_dat() and save_npy(): generate Z.npy from .dat files.


  • fit_iso_scXXXX.npy: isochrone fitting result ($t_\mathrm{age}$, $\bar{d^2}$)
  • fof_scXXXX.npy: g and b-r for narrowness ($r_\mathrm{n}$) calculation, $n_{g<17}$


  • sc_info.txt: id_sc, ntot, $n_{g<17}$, $\bar{d^2}$, $r_\mathrm{n}$, $Z$ (logMsol), $t_\mathrm{age}$ (in Gyr), $\Delta_g$, $\Delta_{b-r}$, $d_\mathrm{plx}$, $d_\mathrm{iso}$, classification


This program reads the sc info and classifies the clusters.


  • ginfos_merge.npy: position and radius of star clusters
  • sc_info.txt: classification info
  • K13.dat (K13, Kharchenko 2013)
  • CG18.dat and CG18b.dat (CG18, Cantat-Gaudin 2018a,b)
  • B19.dat (B19, Bica et al. 2019)

Output: cross matching catalog

- col 1: line No. (0 indexed) in `ginfos_merge.npy`
- col 2: position separation between the two catalogs
- col 3: radius in `ginfos_merge.npy`
- col 4: radius in reference 
- col 5: line No. (0 indexed) in K13, CG18 and B19
- col 6: classification


  • This program cross matches the FoF sc of this work with K13, CG18 and B19. Two clusters are regarded as matched if their distance is smaller than both of their radii.
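The matching criterion above can be written down as follows. This is a sketch under stated assumptions: the distance is taken to be the angular separation on the sky computed from (l, b), and both radii are taken to be in degrees. The units and the exact distance definition used by the pipeline are not given here, and the function names are hypothetical.

```python
import numpy as np


def angular_sep_deg(l1, b1, l2, b2):
    """Great-circle separation in degrees (haversine formula)."""
    l1, b1, l2, b2 = map(np.radians, (l1, b1, l2, b2))
    h = (np.sin((b2 - b1) / 2) ** 2
         + np.cos(b1) * np.cos(b2) * np.sin((l2 - l1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(h)))


def is_matched(l1, b1, r1, l2, b2, r2):
    """True if the separation is smaller than both cluster radii (degrees)."""
    sep = angular_sep_deg(l1, b1, l2, b2)
    return (sep < r1) & (sep < r2)
```

Requiring the separation to be below both radii means that a small cluster lying inside a much larger one is matched only if the centers are closer than the smaller radius.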


  • cat_all.txt


  • sc_groupXXXX.txt: cluster members of a group.
  • sc_groups.txt: number of clusters in each group.


  • This program reads all the class 1 cluster candidates, identifies cluster groups with the standard FoF method, and outputs groups that contain at least 2 members.


  • In this program and only in this program, we use the $d$, $l$, $b$ to $X$, $Y$, $Z$ conversion described by Eq. 3 of Conrad et al. (2017). This is different from the commonly used conversion adopted in our paper (Fig. 8).


  • Extract the isochrone fitting result of the 2443 scs.


  • fit_iso.txt
    • col 0: id (from 0)
    • col 1: distance modulus ($\Delta G$)
    • col 2: color excess ($\Delta B-R$)
    • col 3: age (in Gyr)
    • col 4: Z (in log10(Z/Zsol), here Zsol = 0.0152)
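Two small conversions may be handy when working with these columns. The metallicity conversion follows directly from the definition in col 4. The distance conversion is the standard distance modulus relation and assumes that $\Delta G$ is a pure distance modulus with no extinction term, which is not stated here. Both function names are hypothetical.

```python
Z_SOL = 0.0152  # solar metallicity adopted for col 4 of fit_iso.txt


def log_z_to_z(log_z):
    """Convert log10(Z/Zsol) (col 4) to the absolute metallicity Z."""
    return Z_SOL * 10.0 ** log_z


def modulus_to_distance_pc(delta_g):
    """Distance in pc from a distance modulus, ignoring extinction (assumption)."""
    return 10.0 ** (delta_g / 5.0 + 1.0)
```

Both functions also work element-wise on NumPy arrays or pandas columns loaded from fit_iso.txt.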


