This notebook will walk you through acquiring the necessary data before further analysis can be done. 

First, import the necessary functions. Make sure you provide the right path to the custom module ```simulation_data```.

In [None]:
import numpy as np
import h5py

from simulation_data import get
from simulation_data.galaxies import GalaxyPopulation
from simulation_data.galaxies.galaxy import get_galaxy_particle_data, get_stellar_assembly_data

my_galaxy_population = GalaxyPopulation()

Now download particle data only for galaxies within the specified mass cut at the specified redshift. This cell downloads particle data for all $z=2$ galaxies with stellar masses in the range $10^{10.5} \leq M_{*}/M_{\odot} \leq 10^{12}$.

Before running this cell, check the path specified in the function ```get_galaxy_particle_data``` in ```simulation_data.galaxies.galaxy``` and make sure it points to a valid local drive. Within the target drive, create a folder titled ```'redshift_'+str(redshift)+'_data'```. Note that ```get_stellar_assembly_data``` needs a pre-existing stellar assembly file to run. Generating Figures 4 and 5 in the ```Figures``` notebook partially depends on the stellar assembly files for $z=2$.
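
The folder can also be created from Python rather than by hand. The following is a minimal sketch, not part of ```simulation_data```: the helper name ```make_redshift_folder``` and the ```base_dir``` argument are hypothetical, and ```base_dir``` must match whatever path is set inside ```get_galaxy_particle_data```.

```python
from pathlib import Path


def make_redshift_folder(base_dir, redshift):
    """Create '<base_dir>/redshift_<redshift>_data' if it does not exist yet.

    base_dir is an assumption: it stands for the target drive configured
    in simulation_data.galaxies.galaxy.
    """
    folder = Path(base_dir) / ('redshift_' + str(redshift) + '_data')
    # exist_ok=True makes the call safe to repeat when the cell is rerun
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Calling ```make_redshift_folder(base_dir, 2)``` once per redshift before the download cell avoids a missing-folder error partway through a long run.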

In [None]:
redshift = 2
#this initializes the values in simulation_data.galaxies.galaxy_population
ids = my_galaxy_population.select_galaxies(redshift=redshift, mass_min=10.5, mass_max=12)

#this gets and saves the particle data for each galaxy in our selection
for idx in ids:
    get_galaxy_particle_data(id=idx, redshift=redshift, populate_dict=False)
    #the stellar assembly files for the chosen redshift must already be downloaded before this call
    get_stellar_assembly_data(id=idx, redshift=redshift, populate_dict=False)

We now get the ids for the progenitors and descendants of our population identified at $z=2$. This example finds the main progenitor for each galaxy at $z=3$ and the descendant at $z=1.5$, and saves the arrays of ids at each redshift in the file ```redshift_ids.hdf5```. This step may be time-consuming. The intermediate print statements only report progress and are not required.

In [None]:
z2_ids = ids
z3_ids = np.array([-1]*(len(ids)))
z1_5_ids = np.array([-1]*(len(ids)))


#Finding the progenitors at z=3   
count = 0
print('z=3', z3_ids)
for i, id in enumerate(z2_ids):
    if z3_ids[i] == -1:
        start_url = "http://www.tng-project.org/api/TNG100-1/snapshots/33/subhalos/" + str(id)
        sub = get(start_url)  
        while sub['prog_sfid'] != -1:
            # request the full subhalo details of the progenitor by following the sublink URL
            sub = get(sub['related']['sublink_progenitor'])
            if sub['snap'] == 25:
                z3_ids[i] = sub['id']
                break  # target snapshot reached; no need to walk further back
    count += 1
    print(count)
with h5py.File('redshift_ids.hdf5', 'a') as f:
    d1 = f.create_dataset('z3_ids', data = z3_ids)
    d2 = f.create_dataset('z2_ids', data = z2_ids)
    
#Finding the descendants at z=1.5
count = 0
print('z=1.5', z1_5_ids)
for i, id in enumerate(z2_ids):
    if z1_5_ids[i] == -1:
        start_url = "http://www.tng-project.org/api/TNG100-1/snapshots/33/subhalos/" + str(id)
        sub = get(start_url)   
        while sub['desc_sfid'] != -1:
            # request the full subhalo details of the descendant by following the sublink URL
            sub = get(sub['related']['sublink_descendant'])
            if sub['snap'] == 40:
                z1_5_ids[i] = sub['id']
                break  # target snapshot reached; no need to walk further forward
    count += 1
    print(count)
with h5py.File('redshift_ids.hdf5', 'a') as f:
    d3 = f.create_dataset('z1.5_ids', data = z1_5_ids)
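
The two loops above do the same walk in opposite directions, starting at snapshot 33 and stopping at snapshot 25 or 40. A minimal sketch of that shared logic is shown here so it can be tried offline. The function name ```find_on_branch``` and the small ```fake_api``` dictionary are hypothetical; in the notebook, ```fetch``` would be the ```get``` function imported from ```simulation_data```.

```python
def find_on_branch(sub, id_key, link_key, target_snap, fetch):
    """Follow one branch of the merger tree until target_snap is reached.

    sub         : dict describing the starting subhalo
    id_key      : 'prog_sfid' (walk back) or 'desc_sfid' (walk forward)
    link_key    : 'sublink_progenitor' or 'sublink_descendant'
    target_snap : snapshot number at which to stop
    fetch       : callable taking a URL and returning a subhalo dict
    Returns the subhalo id at target_snap, or -1 if the branch ends first.
    """
    while sub[id_key] != -1:
        sub = fetch(sub['related'][link_key])
        if sub['snap'] == target_snap:
            return sub['id']
    return -1


# Made-up three-snapshot chain standing in for the web API (illustration only).
fake_api = {
    'u32': {'id': 8, 'snap': 32, 'prog_sfid': 6,
            'related': {'sublink_progenitor': 'u31'}},
    'u31': {'id': 6, 'snap': 31, 'prog_sfid': -1,
            'related': {'sublink_progenitor': None}},
}
start = {'id': 10, 'snap': 33, 'prog_sfid': 8,
         'related': {'sublink_progenitor': 'u32'}}

found = find_on_branch(start, 'prog_sfid', 'sublink_progenitor', 31, fake_api.get)
missing = find_on_branch(start, 'prog_sfid', 'sublink_progenitor', 25, fake_api.get)
```

In this toy chain ```found``` is the id stored at snapshot 31, while ```missing``` is -1 because the branch ends before snapshot 25. This is the same sentinel value the arrays above are initialized with.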

For each new set of ids for progenitors and descendants, now repeat the steps from the second cell to get and save the particle data for each galaxy at the new redshift. 

Remember to create new folders for each redshift you look at. We do not add the stellar assembly data for these redshifts.

In [None]:
with h5py.File('redshift_ids.hdf5', 'r') as f:
    z1_5_ids = f['z1.5_ids'][:]
    z3_ids = f['z3_ids'][:]

redshift = 1.5
for idx in z1_5_ids:
    get_galaxy_particle_data(id=idx, redshift=redshift, populate_dict=False)
    
redshift = 3
for idx in z3_ids:
    get_galaxy_particle_data(id=idx, redshift=redshift, populate_dict=False)
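
Because ```z3_ids``` and ```z1_5_ids``` are initialized to -1, any galaxy whose branch ends before the target snapshot keeps that value. This notebook does not say how ```get_galaxy_particle_data``` treats an id of -1, so dropping those entries before the download loop is a precaution based on that assumption. A minimal sketch, with the hypothetical helper name ```valid_ids```:

```python
import numpy as np


def valid_ids(ids, sentinel=-1):
    """Return only the entries of ids that differ from the sentinel value."""
    ids = np.asarray(ids)
    # boolean mask keeps every id that was actually found in the tree walk
    return ids[ids != sentinel]
```

Note that the filtered array no longer lines up index by index with ```z2_ids```, so keep the unfiltered arrays in ```redshift_ids.hdf5``` if you need to match progenitors and descendants to their $z=2$ galaxy later.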

This concludes our section on downloading data. 

To speed up analysis, we now calculate and store some halo properties in a separate hdf5 file named ```'galaxy_population_data_'+str(self.redshift)+'.hdf5'```. Remember to finish running the above section before moving on to the steps below.

In [None]:
redshift = 2
#this initializes the values in simulation_data.galaxies.galaxy_population
ids = my_galaxy_population.select_galaxies(redshift=redshift, mass_min=10.5, mass_max=12)

#calculate halo properties and store calculated data
my_galaxy_population.get_galaxy_population_data()