### Filtering galaxies by threshold

Previously, we compared the spiral regions obtained at different thresholds to decide which works best for us. Since we know that a threshold of $T_{sp} = T_{nsp} = 3$ works, we'll use it to filter our sample further, the same way we did in `1 initial_sample_filter.ipynb`. This will be the last filter we apply before our data processing iA!

In [1]:
import numpy as np
import pandas as pd
import sys

# Add the GZ3D_production directory to the import path so that gz3d_fits can be imported
sys.path.insert(0, '../../GZ3D_production/')
import gz3d_fits

[0;34m[INFO]: [0mNo release version set. Setting default to MPL-11


In [2]:
manga_gz3d_spirals = np.load('manga_gz3d_spirals.npy', allow_pickle=True)

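The following function gives the percentage of pixels in a galaxy's image that more than `threshold` people marked as part of a spiral arm. Note that the comparison is strict (`>`), so `threshold=3` counts pixels marked by at least 4 people. This will help us narrow our sample down to galaxies with more confidently identified spiral arms.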

In [3]:
# Return the percentage of pixels in the galaxy's image that more than `threshold` people
# marked as part of a spiral arm (strict comparison: spiral_mask > threshold).

def get_pc_spiral_pixels(path, threshold=0):
    data = gz3d_fits.gz3d_fits(path)
    image_spiral_mask = data.spiral_mask
    pixels_above_threshold = (image_spiral_mask > threshold).sum()
    
    return (pixels_above_threshold * 100) / image_spiral_mask.size
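
To see what the function computes without needing a GZ3D file, here is a minimal sketch on a made-up count mask. The array `toy_mask` and the helper `pc_pixels_above` are hypothetical stand-ins for `data.spiral_mask` and `get_pc_spiral_pixels`. The sketch assumes each mask value is the number of people who marked that pixel, and it contrasts the strict comparison used above with an inclusive one.

```python
import numpy as np

def pc_pixels_above(mask, threshold=0):
    """Percentage of pixels whose count is strictly greater than `threshold`."""
    return (mask > threshold).sum() * 100 / mask.size

# Hypothetical 2x5 mask: each entry is the number of people who marked that pixel.
toy_mask = np.array([[0, 1, 3, 4, 5],
                     [0, 0, 2, 3, 6]])

# Strict comparison (as in the function above): only counts of 4, 5 and 6 pass.
strict = pc_pixels_above(toy_mask, threshold=3)

# Inclusive comparison, for contrast: counts of exactly 3 pass as well.
inclusive = (toy_mask >= 3).sum() * 100 / toy_mask.size

print(strict, inclusive)
```

With this toy mask, 3 of the 10 pixels pass the strict test and 5 of 10 pass the inclusive one, so the choice of operator can change the percentage noticeably.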

In [4]:
# We'll form a list of dictionaries, each containing some information (the filepath and MaNGA ID) for
# one galaxy. We'll also calculate what percentage of pixels in the galaxy's image were marked as spiral arms
# by more than 3 people (threshold=3, strict comparison). This will help us drop galaxies with no confident classifications.
galdict_array = []
manga_gz3d_spirals_len = len(manga_gz3d_spirals)

for idx, path in enumerate(manga_gz3d_spirals):
    mangaid = path.split('/')[-1].split('_')[0]
    percent = get_pc_spiral_pixels(path, threshold=3)
    
    galdict = {
        'filepath': path,
        'mangaid': mangaid,
        'pc_spiral_pixels': percent
    }
    
    if (idx + 1) % 25 == 0:  # print progress every 25 galaxies
        # idx + 1 galaxies have been processed at this point
        print(manga_gz3d_spirals_len - (idx + 1), 'galaxies left')
        
    galdict_array.append(galdict)

2273 galaxies left
2248 galaxies left
2223 galaxies left
2198 galaxies left
2173 galaxies left
2148 galaxies left
2123 galaxies left
2098 galaxies left
2073 galaxies left
2048 galaxies left
2023 galaxies left
1998 galaxies left
1973 galaxies left
1948 galaxies left
1923 galaxies left
1898 galaxies left
1873 galaxies left
1848 galaxies left
1823 galaxies left
1798 galaxies left
1773 galaxies left
1748 galaxies left
1723 galaxies left
1698 galaxies left
1673 galaxies left
1648 galaxies left
1623 galaxies left
1598 galaxies left
1573 galaxies left
1548 galaxies left
1523 galaxies left
1498 galaxies left
1473 galaxies left
1448 galaxies left
1423 galaxies left
1398 galaxies left
1373 galaxies left
1348 galaxies left
1323 galaxies left
1298 galaxies left
1273 galaxies left
1248 galaxies left
1223 galaxies left
1198 galaxies left
1173 galaxies left
1148 galaxies left
1123 galaxies left
1098 galaxies left
1073 galaxies left
1048 galaxies left
1023 galaxies left
998 galaxies left
973 galaxies left
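
The MaNGA ID extraction in the loop above relies on the filename starting with the ID followed by an underscore. A minimal sketch of that parsing step follows. The path `example_path` is hypothetical and only illustrates the assumed `<mangaid>_...` filename pattern. It is not one of the real files in the sample.

```python
import os

# Hypothetical path; only the "<mangaid>_..." filename pattern matters here.
example_path = '/some/dir/1-26306_127_5679242.fits.gz'

# Same logic as in the loop: take the filename, then everything before the first underscore.
mangaid_split = example_path.split('/')[-1].split('_')[0]

# Equivalent form using os.path.basename to get the filename.
mangaid_basename = os.path.basename(example_path).split('_')[0]

print(mangaid_split, mangaid_basename)
```

Both forms give the same ID as long as the directory part of the path uses `/` separators and the ID itself contains no underscore.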

Here's our dataframe ready to work with!

In [5]:
df = pd.DataFrame.from_dict(galdict_array)
df

Unnamed: 0,filepath,mangaid,pc_spiral_pixels
0,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-26306,0.262676
1,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-178542,1.232834
2,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-91339,1.839456
3,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-51315,1.453061
4,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-94066,9.491156
...,...,...,...
2291,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-2604,2.522630
2292,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-71763,0.000000
2293,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-548639,0.810159
2294,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-352635,4.374422


Now we'll make sure we consider only galaxies with a significant portion of pixels identified as spiral arms, so we don't end up with a one-pixel spiral in a galaxy. We'll call a galaxy's spiral coverage significant if more than 1% of its pixels are identified.

In [6]:
df = df[df.pc_spiral_pixels > 1]
df

Unnamed: 0,filepath,mangaid,pc_spiral_pixels
1,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-178542,1.232834
2,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-91339,1.839456
3,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-51315,1.453061
4,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-94066,9.491156
5,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-145679,2.699683
...,...,...,...
2289,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-603974,2.803810
2290,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-176187,2.675374
2291,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-2604,2.522630
2294,/home/sshamsi/sas/mangawork/manga/sandbox/gala...,1-352635,4.374422
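
The 1% cut above is ordinary boolean-mask indexing in pandas. Here is a minimal sketch on a made-up table. The DataFrame `toy` and its IDs are hypothetical, with percentages chosen to mimic the kind of values seen above.

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe.
toy = pd.DataFrame({
    'mangaid': ['a', 'b', 'c', 'd'],
    'pc_spiral_pixels': [0.26, 1.23, 0.0, 4.37],
})

# Boolean Series: True where more than 1% of pixels pass the threshold.
mask = toy.pc_spiral_pixels > 1

# Indexing with the mask keeps only the True rows and their original index labels.
kept = toy[mask]
print(kept.index.tolist())

# Optional: renumber the rows from 0 if contiguous labels are preferred.
kept_reset = kept.reset_index(drop=True)
print(kept_reset.index.tolist())
```

Because the filter preserves the original index labels, the filtered table has gaps in its index (for example, labels 0 and 2292 are missing above). That is harmless here, since only the `filepath` column is used afterwards.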


In [7]:
final_sample_paths = df.filepath.to_numpy()

In [8]:
np.save('final_sample_paths.npy', final_sample_paths)
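
When this file is read back in a later notebook, it will probably need `allow_pickle=True`, as with `manga_gz3d_spirals.npy` above. A column of path strings usually converts to an object-dtype array, which NumPy stores using pickle. The following is a minimal round-trip sketch under that assumption. The paths and the file name `toy_sample_paths.npy` are hypothetical, and a temporary directory is used so nothing real is overwritten.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for the filtered dataframe.
toy = pd.DataFrame({'filepath': ['/hypothetical/a.fits.gz', '/hypothetical/b.fits.gz']})
paths = toy.filepath.to_numpy()

with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, 'toy_sample_paths.npy')
    np.save(out, paths)                        # object arrays are written via pickle
    loaded = np.load(out, allow_pickle=True)   # so loading them requires allow_pickle=True

print(loaded.tolist())
```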