# Workshop 3: Exploring IllustrisTNG simulations to derive observationally comparable star formation rates and metallicities

## Notebook 1: IllustrisTNG API and downloading data

In this notebook, you will be introduced to the different data types available for the TNG simulations and learn how to use the TNG API to download the data you need. If you haven't already, download iapi_TNG.py here: https://github.com/bryannemcd/TNG_workshop/blob/main/iapi_TNG.py

IMPORTANT: You will need to update iapi_TNG.py on your machine with the API key you requested from the TNG website. If you do not have an API key, request one here: https://www.tng-project.org/users/register/

Below, you will find several exercises to get practice working with the different data types available. There are also 'extensions,' prompts for you to explore data that is aligned with your interests.

In [None]:
import iapi_TNG as iapi
#this package contains useful functions for downloading the necessary data
#make sure you have edited iapi_TNG.py to include your personal API key
import numpy as np
import h5py #most TNG data is downloaded as hdf5 files
import matplotlib.pyplot as plt
import os.path

 
baseUrl = 'http://www.tng-project.org/api/'


### General simulation data

Using the TNG API, we can explore basic information about the simulations available. 

In [None]:
####EDIT THIS FOR YOUR MACHINE#####
dirc ='/path/to/directory/'
#dirc='/projectnb/res-star/TNG_workshop/'

###specify which simulation you want to explore###
sim='TNG100-1'

"""
TNG100-1 is the highest-resolution simulation in the 100 Mpc box
TNG50-1 is the highest-resolution simulation in the 50 Mpc box
TNG300-1 is the largest-volume simulation (300 Mpc box)
Lower-resolution runs replace -1 with -N (e.g. TNG100-2, TNG100-3), allowing tests of resolution dependence
'Dark' simulations are also available: these are dark-matter-only runs
Subboxes are available that provide higher time resolution
For the exercises in this workshop you will need a baryonic (non-Dark) simulation; we recommend TNG100-1, TNG50-1, or TNG300-1
Check all available simulations by uncommenting the line below:
"""
r=iapi.get(baseUrl)
#print([s['name'] for s in r['simulations']]) #loop variable named s to avoid confusion with sim

In [None]:
#check the properties of the simulation you have selected
simUrl = baseUrl+sim
print(simUrl) #view the simulation data in your browser by following the URL (make sure you are logged in!)
simdata = iapi.get(simUrl)
print(simdata['description'])

##uncomment line below to see all the simulation-level information available, or follow the simUrl to view in browser
#print(simdata.keys())


#### Exercise (don't skip!): Find the value of Hubble's constant used in the simulation you chose to explore
In simulations, units often include 'little h': Hubble's constant divided by 100 km/s/Mpc.

In these simulations, the value of h is stored in the simulation data as the 'hubble' key.
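Once you have h, it enters most unit conversions. As a minimal sketch, assuming the group-catalog default of comoving kpc/h (ckpc/h) for positions such as SubhaloPos, converting to physical kpc at redshift z means multiplying by the scale factor a = 1/(1+z) and dividing by h. The function name below is a hypothetical helper, not part of iapi_TNG.

```python
import numpy as np

def ckpc_h_to_kpc(x_ckpc_h, h, z):
    """Convert lengths in comoving kpc/h (e.g. SubhaloPos) to physical kpc at redshift z."""
    a = 1.0 / (1.0 + z)  # scale factor at redshift z
    return np.asarray(x_ckpc_h, dtype=float) * a / h

# At z=0 (snapshot 99) the scale factor is 1, so only the 1/h factor applies;
# at z=1 lengths are additionally halved
pos_z0 = ckpc_h_to_kpc(0.6774, h=0.6774, z=0.0)
pos_z1 = ckpc_h_to_kpc(0.6774, h=0.6774, z=1.0)
```

Masses follow the same idea: the default 10^10 Msun/h unit is converted by multiplying by 10^10 and dividing by h, as done for stellar mass later in this notebook.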

In [None]:
###Uncomment and complete this line###
#h=
print(h)
##Hint: should find h=0.6774

### Group catalogs

Group catalogs contain properties of all identified halos or subhalos (galaxies) in a given snapshot. These are good for obtaining masses, positions, and other global properties. You can check out details about the available fields here: https://www.tng-project.org/data/docs/specifications/#sec2 

In iapi_TNG there are two similar functions that obtain a field for all subhalos or all halos in a given simulation at a given snapshot:

> getSubhaloField(field, simulation='TNG100-1', snapshot=99, fileName='tempCat', rewriteFile=1)

> getHaloField(field, simulation='TNG100-1', snapshot=99, fileName='tempCat', rewriteFile=1)

- field (str): name of field to be returned from the table linked above, e.g. 'SubhaloPos'
- simulation (str): name of simulation, e.g. 'TNG100-1'
- snapshot (int): snapshot to pull data from. For TNG, snapshot=99 is z=0, which is the default
- fileName (str): path to the file where you want to save the data, recommended to avoid repeated API requests
- rewriteFile (0 or 1): if 0 (recommended), will attempt to pull from an existing file (fileName) before downloading; if 1 will download and overwrite
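The rewriteFile option is a cache-then-download pattern. The sketch below illustrates that logic only; load_or_fetch and fake_fetch are hypothetical names, and iapi_TNG's real implementation (including its on-disk file format) may differ. It stores a .npy file and calls a supplied fetch function only when no local copy exists or when rewriteFile=1.

```python
import os
import tempfile
import numpy as np

def load_or_fetch(fileName, fetch, rewriteFile=0):
    """Hypothetical cache helper mirroring the rewriteFile convention described above.

    fetch is any zero-argument callable that downloads and returns an array.
    """
    path = fileName + '.npy'
    if rewriteFile == 0 and os.path.exists(path):
        return np.load(path)      # reuse the local copy: no API request
    data = np.asarray(fetch())    # download (or recompute) the field
    np.save(path, data)           # write, overwriting any existing copy
    return data

# Usage with a stand-in download function, so no network access is needed
calls = []
def fake_fetch():
    calls.append(1)
    return np.arange(5)

with tempfile.TemporaryDirectory() as d:
    f = os.path.join(d, 'SubhaloFlag')
    first = load_or_fetch(f, fake_fetch, rewriteFile=0)   # no file yet: downloads
    second = load_or_fetch(f, fake_fetch, rewriteFile=0)  # reads the saved file
    third = load_or_fetch(f, fake_fetch, rewriteFile=1)   # downloads again and overwrites
```

This is why rewriteFile=0 is recommended: rerunning a cell costs a disk read rather than a new API request.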

Now let's fetch the fields we will need for our later analysis.

In [None]:
#the flag field indicates whether a subhalo is cosmological in origin
#you will generally only want to use subhalos that have flag=1
flag=iapi.getSubhaloField('SubhaloFlag',simulation=sim,fileName=dirc+'catalogs/SubhaloFlag',rewriteFile=0)

In [None]:
#fetch the positions in case we want them later
pos=iapi.getSubhaloField('SubhaloPos',simulation=sim,fileName=dirc+'catalogs/SubhaloPos',rewriteFile=0)


In [None]:
#let's fetch a field that will tell us about the mass of the galaxy
#SubhaloMassType gives the total mass of all bound particles, separated by particle type
mass=iapi.getSubhaloField('SubhaloMassType',simulation=sim,fileName=dirc+'catalogs/MassType',rewriteFile=0)
print(mass.shape)

#note that there are 6 entries for each subhalo, one for each particle type:
#0 - gas
#1 - dark matter
#2 - unused
#3 - tracers (you can ignore these)
#4 - stars/wind
#5 - black holes

#Pull the stellar mass: 
stellar_mass=mass[:,4]

#check the subhalo catalog for the default units (10^10 Msun/h), then convert to solar masses
stellar_mass=stellar_mass*10**10/h
#print(min(stellar_mass[np.nonzero(stellar_mass)]))

#running into an error? did you define h above?
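Indexing SubhaloMassType by bare column numbers is easy to get wrong. A minimal sketch, assuming the column order listed in the comments above and the 10^10 Msun/h default unit: the helpers below (hypothetical names) select a component by name and take log10 while leaving subhalos with zero mass as NaN instead of -inf.

```python
import numpy as np

# Column indices of SubhaloMassType, following the particle-type list above
PARTTYPE = {'gas': 0, 'dm': 1, 'tracers': 3, 'stars': 4, 'bh': 5}

def component_mass_msun(mass_type, component, h):
    """Mass of one particle type in Msun, from an (N, 6) array in 1e10 Msun/h."""
    return np.asarray(mass_type)[:, PARTTYPE[component]] * 1e10 / h

def safe_log10(x, fill=np.nan):
    """log10(x), with `fill` where x <= 0 (e.g. subhalos with no stars)."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, fill)
    good = x > 0
    out[good] = np.log10(x[good])
    return out

# Toy (N, 6) array with three subhalos: no stars, 1e9 Msun, 1e11 Msun of stars
toy_mass = np.zeros((3, 6))
toy_mass[:, 4] = [0.0, 0.06774, 6.774]
toy_logmstar = safe_log10(component_mass_msun(toy_mass, 'stars', h=0.6774))
```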

#### Exercise:

There are several other fields relating to galaxy mass. Review those found at the link above and fetch at least one other field relating to mass. Later, we will test the effect of using other definitions of mass on the global star formation main sequence. Generally, you will want the mass definition that best matches how mass was measured in the observations you are comparing to.
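After fetching an alternative mass field (and converting it to the same units), a quick sanity check is the ratio of the two definitions per subhalo. The sketch below uses a hypothetical helper name; zero reference masses are returned as NaN so that summary statistics such as np.nanmedian ignore them.

```python
import numpy as np

def mass_ratio(m_alt, m_ref):
    """Ratio of an alternative mass definition to a reference one; NaN where m_ref <= 0."""
    m_alt = np.asarray(m_alt, dtype=float)
    m_ref = np.asarray(m_ref, dtype=float)
    ratio = np.full(m_ref.shape, np.nan)
    ok = m_ref > 0
    ratio[ok] = m_alt[ok] / m_ref[ok]
    return ratio

# Example: ratio = mass_ratio(alt_stellar_mass, stellar_mass); print(np.nanmedian(ratio))
# where alt_stellar_mass is whatever field you fetched, already converted to Msun
toy_ratio = mass_ratio([0.5, 1.0, 2.0], [1.0, 0.0, 4.0])
```

For a mass measured within an aperture, the ratio to the total bound mass should not exceed 1, which makes this a useful check on your unit conversion.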


In [None]:
### Pull an additional stellar mass field ###

In [None]:
sfr_inst = iapi.getSubhaloField('SubhaloSFR',simulation = sim,fileName=dirc+'SubhaloSFR',rewriteFile=0)
#the subhalo catalog includes SubhaloSFR, which is the sum of SFRs over all gas particles bound to the subhalo
#this is NOT directly comparable to SFRs obtained from observations, 
#because observational tracers generally detect already formed stars, not stars about to be formed

#if we want to get more comparable SFRs, we'll have to dig into particle data or merger trees
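SubhaloSFR is in Msun/yr, and many subhalos (quenched ones in particular) have an instantaneous SFR of exactly zero, which breaks log-scale plots. One common workaround, sketched here under the assumption that an arbitrary floor is acceptable for plotting, is to compute the specific SFR (SFR divided by stellar mass) and clip it to a floor; the helper name and the floor value of 1e-13 per yr are illustrative choices, not TNG conventions.

```python
import numpy as np

def log_ssfr(sfr, mstar, floor=1e-13):
    """log10 of specific SFR [1/yr]; values below `floor` (including SFR = 0) are set to it."""
    sfr = np.asarray(sfr, dtype=float)
    mstar = np.asarray(mstar, dtype=float)
    ssfr = np.full(mstar.shape, floor)
    ok = mstar > 0
    ssfr[ok] = np.maximum(sfr[ok] / mstar[ok], floor)
    return np.log10(ssfr)

# Example: log_ssfr(sfr_inst, stellar_mass), with stellar_mass already in Msun
toy_lssfr = log_ssfr([1.0, 0.0], [1e10, 1e10])
```

Galaxies sitting exactly at the floor should be shown (or counted) separately, since their true sSFR is only bounded from above.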

#### Extension: Metallicities
Interested in metallicities? Review the fields available in the data specifications. Make sure to scroll all the way through: there are many different metallicity fields, based on both stars and gas. 
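Many TNG metallicity fields are metal mass fractions rather than solar-scaled values. A minimal sketch for converting them to log10(Z/Zsun), assuming the solar value Zsun = 0.0127 that the TNG data documentation quotes for this purpose (check the specifications for the field you use):

```python
import numpy as np

Z_SUN = 0.0127  # solar metal mass fraction as quoted in the TNG docs (assumed here)

def log_z_over_zsun(Z):
    """log10(Z / Zsun) for metal mass fractions Z; NaN where Z <= 0."""
    Z = np.asarray(Z, dtype=float)
    out = np.full(Z.shape, np.nan)
    ok = Z > 0
    out[ok] = np.log10(Z[ok] / Z_SUN)
    return out

toy_logZ = log_z_over_zsun([0.0127, 0.127, 0.0])
```

Observational gas metallicities are usually quoted as 12 + log(O/H), which needs element-by-element abundances rather than this total metal fraction.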

In [None]:
### Download metallicity fields ###

In [None]:
#process the data to make our galaxy catalog
#it's useful to keep track of the subhalo ID (subID), the index into the fields
subID=np.arange(0,len(flag))

#make cuts based on flag, and any other cuts to generate your sample
wh_incl=np.nonzero((flag==1) & (stellar_mass>10**8))
#to make additional cuts, add additional criteria to the line above 
#e.g. wh_incl=np.nonzero((flag==1) & (stellar_mass>masscut))

#Now store fields for our sample in a dictionary

###Update with any other fields you would like to store###

IDs=subID[wh_incl]
pos_cut =pos[wh_incl]
s_mass_cut=stellar_mass[wh_incl]
sfr_i_cut = sfr_inst[wh_incl]


galcat = {
    'subID' : IDs,
    'pos' : pos_cut,
    'M_*' : s_mass_cut,
    'SFR_inst': sfr_i_cut
}

#save the galaxy catalog for later use
np.save(dirc+'galcat', galcat)
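np.save stores the dictionary as a pickled 0-d object array and appends '.npy' to the file name, so the catalog above lands in dirc+'galcat.npy'. Reading it back in a later notebook needs allow_pickle=True and .item() to recover the dictionary. A small round trip with a toy catalog (load_galcat is a hypothetical helper name):

```python
import os
import tempfile
import numpy as np

def load_galcat(path):
    """Load a dictionary saved with np.save (stored as a pickled 0-d object array)."""
    return np.load(path, allow_pickle=True).item()

toy_galcat = {'subID': np.array([3, 7]), 'M_*': np.array([1e9, 2e10])}
with tempfile.TemporaryDirectory() as d:
    base = os.path.join(d, 'galcat')
    np.save(base, toy_galcat)            # writes galcat.npy
    loaded = load_galcat(base + '.npy')  # e.g. load_galcat(dirc+'galcat.npy')
```

Only load pickled files you created or trust, since unpickling can execute arbitrary code.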


### Merger Trees

Tracing a subhalo through cosmic time is complicated by the major and minor mergers that build up a z=0 galaxy. The merger trees follow the main progenitor of a subhalo through previous snapshots (in SubLink, the progenitor with the "most massive history"). See the TNG data specifications for more information: https://www.tng-project.org/data/docs/specifications/#sec2

In this workshop, we will be using the SubLink merger trees.

iapi_TNG contains the function gettree(snapnum,subID), which obtains the tree for a given galaxy. The trees contain all the fields in the Halo and Subhalo group catalogs, for each snapshot. Subhalo information will always be for the progenitor of the subID at snapnum. The group/halo of a subhalo may change, so the group information in previous snapshots may not be for the group the subhalo is a member of at snapnum.

getredshift(snapnum) is another useful function in iapi_TNG. This returns the redshift of a given snapshot. 

In [None]:
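#Let's explore the history of a subhalo from the middle of our sample
#subhalo IDs are (very roughly) ordered from most to least massive
sub=IDs[round(len(IDs)/2)]
print(sub)
#alternatively, pick a random subhalo from the sample:
#sub = np.random.choice(IDs)
#print(sub)

sub_ind= np.where(IDs==sub)[0][0] #index of this subhalo within our sample (not its subhalo ID)
print(sub_ind)
print(pos_cut[sub_ind]) #equivalently pos[sub], since pos covers all subhalos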
subTreeFile = iapi.gettree(99,sub)

#open the hdf5 file that contains the tree
subTree= h5py.File(subTreeFile,'r')

#What fields are available?
print(subTree.keys())


In [None]:
#pull the snapshot numbers that correspond to each entry in each of the fields for this tree
snaps = subTree['SnapNum'][:]
print(snaps[0])
#the first entry is the snapshot we requested (99, z=0): entries run from latest to earliest
print(snaps[-1])
#Some subhalos have shorter merger trees than others
#If this subhalo has a short merger tree (earliest snapshot > ~70), select a different one in the previous cell (e.g. with np.random.choice) and rerun it
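Because tree entries run from latest to earliest, it is often convenient to reorder them chronologically before plotting, and to check whether the branch skips any snapshots. A minimal sketch, assuming subTree holds only the main progenitor branch (at most one entry per snapshot); the helper names are hypothetical.

```python
import numpy as np

def chronological_order(snaps):
    """Indices that reorder tree arrays from earliest to latest snapshot."""
    return np.argsort(snaps)

def missing_snapshots(snaps):
    """Snapshots between the earliest and latest tree entries that have no entry."""
    snaps = np.asarray(snaps)
    full = np.arange(snaps.min(), snaps.max() + 1)
    return np.setdiff1d(full, snaps)

# Toy branch stored latest-first, like the tree files
toy_snaps = np.array([99, 98, 96, 95])
order = chronological_order(toy_snaps)
# With real data: snaps[order], subTree['SubhaloSFR'][:][order], etc.
```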

In [None]:
#What redshift does the earliest snapshot correspond to?
print(iapi.getredshift(snaps[-1]))

In [None]:
#construct an array of redshifts corresponding to snaps
z = iapi.getredshift(snaps)
#print(z)

#### Exercise: Construct a plot of star formation rate versus redshift for this subhalo


In [None]:
### Plot SFR versus redshift for this subhalo ###
#hint: pull the SubhaloSFR field from the merger tree first
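When plotting against redshift, lookback time is often easier to interpret. The sketch below uses the analytic age of a flat matter-plus-Lambda universe with the Planck 2015 parameters adopted by IllustrisTNG (h = 0.6774, Omega_m = 0.3089, Omega_Lambda = 0.6911), neglecting radiation; the function names are hypothetical helpers, not part of iapi_TNG.

```python
import numpy as np

# Planck 2015 cosmology adopted by IllustrisTNG; radiation is neglected
H0 = 67.74                    # km/s/Mpc
OMEGA_M = 0.3089
OMEGA_L = 0.6911
HUBBLE_TIME_GYR = 977.8 / H0  # 1/H0 in Gyr (1 Mpc / (1 km/s) is about 977.8 Gyr)

def cosmic_time_gyr(z):
    """Age of a flat LCDM universe at redshift z, in Gyr (analytic solution)."""
    a = 1.0 / (1.0 + np.asarray(z, dtype=float))
    return (2.0 / (3.0 * np.sqrt(OMEGA_L)) * HUBBLE_TIME_GYR
            * np.arcsinh(np.sqrt(OMEGA_L / OMEGA_M) * a ** 1.5))

def lookback_time_gyr(z):
    """Time elapsed between redshift z and z = 0, in Gyr."""
    return cosmic_time_gyr(0.0) - cosmic_time_gyr(z)

# Usage with the redshift array built above:
# t_lb = lookback_time_gyr(z)
```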



#### Extension: Plot other properties versus redshift for this subhalo
What other properties would you be interested in exploring as a function of redshift?

In [None]:
### Extension ###

### Particle Data
Now we can download particle data that will let us compute luminosity-weighted ages and time-averaged star formation rates for a galaxy. Full snapshots contain far more data than we need, so instead we pull the fields we want from 'cutouts': files that contain data for all particles bound to a single subhalo.

In [None]:
parttype='stars'
particle_fields = 'Coordinates,Masses,GFM_Metallicity,GFM_StellarFormationTime,GFM_InitialMass,GFM_StellarPhotometrics'
#for time-averaged SFR calculations, use each star's initial mass (GFM_InitialMass), since current masses are reduced by stellar mass loss

cut_file = iapi.getSubcutout(sub, parttype, particle_fields, sim=sim, snapnum=99, fName=dirc+'cutouts/'+parttype+'_'+str(sub))
print(cut_file)

#running into issues? follow the sub_url that getSubcutout prints to check if a cutout exists for this subhalo
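With the star-particle fields in hand, both quantities reduce to simple sums. The sketch below is one way to compute them, not necessarily the method used later in this workshop. It assumes GFM_StellarFormationTime is the scale factor at birth (values <= 0 mark wind cells, which are excluded), GFM_InitialMass is in 10^10 Msun/h, one column of GFM_StellarPhotometrics gives absolute magnitudes in a single band, and the snapshot is z=0 (a_now=1); stellar ages come from the analytic flat LCDM age with the TNG (Planck 2015) parameters. All function names are hypothetical.

```python
import numpy as np

# Planck 2015 cosmology adopted by IllustrisTNG; radiation neglected
H0, OMEGA_M, OMEGA_L = 67.74, 0.3089, 0.6911
LITTLE_H = H0 / 100.0

def cosmic_time_gyr(a):
    """Age of a flat LCDM universe at scale factor a, in Gyr."""
    a = np.asarray(a, dtype=float)
    return (2.0 / (3.0 * np.sqrt(OMEGA_L)) * (977.8 / H0)
            * np.arcsinh(np.sqrt(OMEGA_L / OMEGA_M) * a ** 1.5))

def stellar_ages_gyr(form_a, a_now=1.0):
    """Ages at scale factor a_now of stars born at scale factors form_a."""
    return cosmic_time_gyr(a_now) - cosmic_time_gyr(form_a)

def time_averaged_sfr(form_a, init_mass, dt_myr=10.0, a_now=1.0):
    """Initial stellar mass formed in the last dt_myr, divided by dt_myr, in Msun/yr."""
    form_a = np.asarray(form_a, dtype=float)
    init_mass = np.asarray(init_mass, dtype=float)
    is_star = form_a > 0                  # wind cells (form_a <= 0) are excluded
    ages = stellar_ages_gyr(form_a[is_star], a_now)
    recent = ages <= dt_myr / 1e3         # Myr -> Gyr
    mass_msun = init_mass[is_star][recent].sum() * 1e10 / LITTLE_H
    return mass_msun / (dt_myr * 1e6)

def luminosity_weighted_age_gyr(form_a, abs_mag, a_now=1.0):
    """Mean stellar age weighted by luminosity L ~ 10**(-0.4 M) in a single band."""
    form_a = np.asarray(form_a, dtype=float)
    is_star = form_a > 0
    weights = 10.0 ** (-0.4 * np.asarray(abs_mag, dtype=float)[is_star])
    ages = stellar_ages_gyr(form_a[is_star], a_now)
    return np.sum(weights * ages) / np.sum(weights)

# Toy input: one wind cell, one star born at a_now, one star born at a = 0.5 (z = 1)
toy_a = np.array([-0.5, 1.0, 0.5])
toy_m0 = np.array([1.0, 0.0006774, 1.0])  # 1e10 Msun/h; the new star is 1e7 Msun
toy_mag = np.array([0.0, 0.0, 0.0])       # equal luminosities
sfr10 = time_averaged_sfr(toy_m0 * 0 + toy_a, toy_m0)
lw_age = luminosity_weighted_age_gyr(toy_a, toy_mag)
```

With the real cutout, the arrays can be read with h5py from the 'PartType4' group of cut_file, e.g. f['PartType4']['GFM_InitialMass'][:]; pick the photometric band column from the data specifications.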

### Supplementary Data Catalogs
A more observationally comparable measure than the instantaneous star formation rate from gas particles is the time-averaged star formation rate. Later in this workshop we will cover how to compute these yourself, but for now we can use a supplementary catalog: https://www.tng-project.org/data/docs/specifications/#sec5b

In [None]:
#first download the catalog for the simulation of your choice 
#note that some catalogs are only available for certain simulations
#this is a large file; it may take a while to download
SFRf=dirc+'SubhaloSFR_ta'
#reuse a previously downloaded copy if one exists (it is saved with a .hdf5 extension)
if os.path.exists(SFRf+'.hdf5'): sfr_cat=SFRf+'.hdf5'
else: sfr_cat = iapi.get('https://www.tng-project.org/api/'+sim+'/files/star_formation_rates.hdf5', fName=SFRf)



In [None]:
###which snapshot are you interested in? ###
snapN=99 #z=0

with h5py.File(sfr_cat,'r') as f:
    f_N =f['Snapshot_'+str(snapN)]
    print(f_N.keys())
    #note that there are several options
    #there are SFRs that are calculated within certain apertures
    #and SFRs calculated over different timescales
    #when comparing simulations to observations, it's important to use a timescale comparable to the observational tracer
    #e.g. H-alpha traces star formation over roughly the last 10 Myr
    SFR_subind=f_N['SubfindID'][:].astype(int)
    SFR_all_10=f_N['SFR_MsunPerYrs_in_all_10Myrs'][:]


#make an array of SFRs for our sample, matched by subhalo ID
#not all galaxies in our sample may have SFRs computed in this catalog; those are left as NaN
SFR_10=np.full(len(IDs), np.nan)
for i in range(0,len(IDs)):
    wh=np.nonzero(SFR_subind==IDs[i])[0]
    if len(wh)>0:
        SFR_10[i]=SFR_all_10[wh[0]]
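The loop above compares every sample ID against the whole catalog, so its cost grows as (sample size) × (catalog size). A vectorized alternative sorts the catalog IDs once and uses a binary search. This sketch assumes SubfindID values are unique within a snapshot of the catalog; match_by_id is a hypothetical helper name.

```python
import numpy as np

def match_by_id(ids, cat_ids, cat_values, fill=np.nan):
    """For each ID in ids, return the cat_values entry with the same cat_ids; fill if absent."""
    ids = np.asarray(ids)
    cat_ids = np.asarray(cat_ids)
    cat_values = np.asarray(cat_values, dtype=float)
    out = np.full(ids.shape, fill, dtype=float)
    if cat_ids.size == 0:
        return out
    order = np.argsort(cat_ids)
    sorted_ids = cat_ids[order]
    idx = np.searchsorted(sorted_ids, ids)      # insertion points into the sorted IDs
    idx = np.clip(idx, 0, sorted_ids.size - 1)  # keep indices valid for IDs past the end
    found = sorted_ids[idx] == ids              # True only for exact matches
    out[found] = cat_values[order][idx[found]]
    return out

# Usage: SFR_10 = match_by_id(IDs, SFR_subind, SFR_all_10)
toy_match = match_by_id([5, 2, 9], [2, 9, 7], [20.0, 90.0, 70.0])
```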
    

#### Extension: Edit the code above to use a different aperture, or save additional timescales

In [None]:
### Extension ###

In [None]:
#now let's update our galcat
galcat['SFR_all_10']=SFR_10
### add any other fields you want to save ###
np.save(dirc+'galcat', galcat)