# Introduction: HST WFC3 Free Retrieval Transmission/Transit Tutorial  

Welcome to the transmission spectrum model/retrieval tutorial!

For this particular setup, the atmosphere is parameterized within the classic "free retrieval" framework. In a free retrieval, molecular/atomic abundances are typically assumed to be constant with altitude and chemically unrelated (i.e., any combination of abundances is allowed). In principle, we can retrieve anything that has an opacity (cross-sections, CK-coefficients). This tutorial/code includes opacities for H2O, CH4, CO, CO2, NH3, HCN, H2S, PH3, C2H2, C2H6, Na, K, TiO, VO, FeH, H, H2, He, e-, and H-, either as "line" or "continuum" opacities (e.g., H- free-free depends on both the e- and H abundances). Here we assume any remaining gas is H2+He (at the solar He/H2 ratio). This may not be a good assumption for ultra-hot Jupiters, where H becomes dominant, or for very high metallicity objects, where species like H2O, CO2, N2, or CO become major background gases. Free retrievals tend to work best in the "run-of-the-mill, hot-but-not-too-hot Jupiter" regime. Be wary, however: free retrievals can trick you into a good fit with seemingly unphysical solutions. Nonetheless, they do provide some insight.
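The "remaining gas is H2+He" assumption can be made concrete with a small sketch. The helper below is hypothetical (it is not the routine in fm.py): it sums the constant-with-altitude trace abundances and splits what is left between H2 and He at a fixed He/H2 number ratio. The default ratio of 0.176 is an assumed near-solar value; check fm.py for the one actually used.

```python
import numpy as np

def fill_background(log_vmrs, he_h2=0.176):
    """Hypothetical helper: fill whatever the trace species leave over with H2+He.

    log_vmrs : log10 volume mixing ratios of the trace species (constant with altitude)
    he_h2    : He/H2 number ratio (0.176 is an assumed near-solar value)
    Returns (f_H2, f_He).
    """
    trace = np.sum(10.0 ** np.asarray(log_vmrs, dtype=float))
    if trace >= 1.0:
        raise ValueError("trace species sum to >= 1; nothing left for H2+He")
    remainder = 1.0 - trace
    # Solve f_H2 + f_He = remainder together with f_He / f_H2 = he_h2
    f_h2 = remainder / (1.0 + he_h2)
    f_he = he_h2 * f_h2
    return f_h2, f_he

# e.g. H2O at 1e-3 and CO at 1e-4: H2+He make up the remaining 0.9989
f_h2, f_he = fill_background([-3.0, -4.0])
print(f"{f_h2:.4f} {f_he:.4f}")
```

This is also why the free retrieval breaks down for very metal-rich atmospheres: once the trace species approach unity, there is no longer a dominant H2+He background to fill.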

For the temperature profile, we assume the 3-parameter Guillot 2010/Parmentier et al. 2014 analytic formalism (see Line et al. 2013a for implementation details).
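For orientation, here is a minimal sketch of the single-visible-channel profile of Guillot (2010, eq. 29), written in terms of the notebook's parameter names (Tirr, logKir, logg1, Tint). It assumes the redistribution factor f is folded into Tirr and that the IR optical depth is tau = kappa_IR P / g in cgs units; fm.py may differ in these details, and the gravity used in the example is an illustrative value.

```python
import numpy as np

def guillot_tp(P_bar, Tirr, logKir, logg1, Tint, grav_cgs):
    """Sketch of the Guillot (2010) eq. 29 profile with one visible channel.

    Assumes f is folded into Tirr and tau = kappa_IR * P / g,
    with P converted to dyn/cm^2 and g in cm/s^2.
    """
    kir = 10.0 ** logKir                              # IR opacity [cm^2/g]
    gamma = 10.0 ** logg1                             # visible-to-IR opacity ratio
    tau = kir * np.asarray(P_bar) * 1e6 / grav_cgs    # 1 bar = 1e6 dyn/cm^2
    s3 = np.sqrt(3.0)
    t4_int = 0.75 * Tint**4 * (2.0 / 3.0 + tau)       # internal-heat term
    t4_irr = 0.75 * Tirr**4 * (2.0 / 3.0 + 1.0 / (gamma * s3)
                               + (gamma / s3 - 1.0 / (gamma * s3)) * np.exp(-gamma * tau * s3))
    return (t4_int + t4_irr) ** 0.25

P = np.logspace(-6, 2, 50)    # 1e-6 to 100 bar
# grav_cgs=1000 cm/s^2 is an illustrative value, not computed from M and Rp
T = guillot_tp(P, Tirr=420.0, logKir=-1.5, logg1=-0.7, Tint=200.0, grav_cgs=1000.0)
```

In this form, logKir sets the pressure where tau ~ 1 (the "vertical" location of the gradient), and gamma < 1 gives a profile that warms monotonically with depth.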

The transmission spectrum routine closely follows the equations (and figure) in Tinetti et al. 2012 (as described in the tutorial text). Instead of line-by-line or "sampled" cross-sections, this implementation uses the "correlated-K" method (see Lacis & Oinas 1990, or more recently Amundsen et al. 2017). Correlated-K is advantageous because it preserves the bin-integrated precision of a line-by-line calculation at a far lower computational cost. See the "opacity" tutorial for more details on how correlated-K works.
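A toy illustration of why correlated-K reproduces bin-integrated quantities (this is not the code in fm.py): within one spectral bin, the band-averaged transmittance depends only on the *distribution* of opacities, not on where they sit in wavelength. Sorting the opacities gives a smooth k(g) curve that a handful of Gauss-Legendre points can integrate. The lognormal "line-by-line" opacities and the 10-point quadrature are assumptions for the demo; real CK tables typically use their own g-point choices.

```python
import numpy as np

def k_distribution(kappa_lbl, n_gauss=10):
    """Toy correlated-k: sort the line-by-line opacities in one spectral bin
    (giving k as a function of cumulative probability g) and sample that
    smooth k(g) curve at Gauss-Legendre nodes mapped onto g in [0, 1]."""
    k_sorted = np.sort(kappa_lbl)
    g = (np.arange(k_sorted.size) + 0.5) / k_sorted.size      # cumulative fraction
    x, w = np.polynomial.legendre.leggauss(n_gauss)             # nodes/weights on [-1, 1]
    g_nodes = 0.5 * (x + 1.0)
    weights = 0.5 * w                                           # weights now sum to 1
    k_nodes = np.exp(np.interp(g_nodes, g, np.log(k_sorted)))   # interpolate in log k
    return k_nodes, weights

# Fake "line-by-line" opacities in one bin: lognormal, spanning several decades
rng = np.random.default_rng(42)
kappa = np.exp(rng.normal(0.0, 2.0, 20000))
u = 1.0                                            # absorber column (arbitrary units)

exact = np.mean(np.exp(-kappa * u))                # bin-averaged transmittance, line-by-line
k_nodes, weights = k_distribution(kappa)
ck = np.sum(weights * np.exp(-k_nodes * u))        # same quantity from only 10 k values
print(exact, ck)
```

The 20000 opacities collapse to 10 representative values per bin, which is where the speed-up comes from.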

There are two cloud prescriptions built in. The first is the Ackerman & Marley 2001 "eddy-sed" approach, which self-consistently computes the vertical particle size distribution from a sedimentation efficiency, $f_{sed}$, and an eddy mixing coefficient, K$_{zz}$, given a cloud base pressure and the condensate mixing ratio at the cloud base. The classic "power-law haze" and "grey cloud" prescriptions are also included.
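The power-law haze plus grey cloud option can be sketched as a per-molecule cross-section: a haze term that scales as a power of wavelength, plus a wavelength-independent grey term. The function below is a hypothetical illustration using the notebook's parameter names (logRayAmp, RaySlope, logKcld). The reference cross-section `sigma0_cm2` and wavelength `wl0_um` are placeholders that must come from fm.py; the example sets both to 1 so only relative values are shown.

```python
import numpy as np

def haze_plus_grey_xsec(wl_um, logRayAmp, RaySlope, logKcld, sigma0_cm2, wl0_um):
    """Hypothetical sketch of a power-law haze plus a grey cloud [cm^2 per molecule].

    haze: 10**logRayAmp * sigma0 * (wl / wl0)**(-RaySlope)
          (sigma0 and wl0 are assumed reference values, taken from fm.py in practice)
    grey: 10**logKcld, identical at every wavelength and altitude
    """
    wl = np.asarray(wl_um, dtype=float)
    haze = 10.0 ** logRayAmp * sigma0_cm2 * (wl / wl0_um) ** (-RaySlope)
    grey = 10.0 ** logKcld
    return haze + grey

wl = np.array([0.5, 1.0, 2.0])
# RaySlope=4 is Rayleigh-like: halving the wavelength raises the cross-section 16x
rel = haze_plus_grey_xsec(wl, logRayAmp=0.0, RaySlope=4.0, logKcld=-50.0,
                          sigma0_cm2=1.0, wl0_um=1.0)
# RaySlope=0 turns the haze itself grey
flat = haze_plus_grey_xsec(wl, logRayAmp=0.0, RaySlope=0.0, logKcld=-50.0,
                           sigma0_cm2=1.0, wl0_um=1.0)
```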

This notebook goes through the steps to generate the forward model and illustrates how to perform the free retrieval. However, retrievals are best run on a compute cluster or a node with more than 4 cores. We will use the "benchmark" system WASP-43b as our example, utilizing the HST WFC3 data presented in Kreidberg et al. 2014b.

Software requirements: this runs in the Python 3 Anaconda environment. It is also crucial that numba is installed (e.g., via Anaconda), as many of the routines are optimized with numba's "@jit" decorator (http://numba.pydata.org/).


# Import Routines, Load Opacities-------------------------------------  

This first segment loads the routines from fm.py and the correlated-K coefficients. There are two sets of correlated-K coefficients (which I've called "xsecs" here): one tailored for HST WFC3+STIS (the xsects_HST function in fm.py) and one for JWST (xsects_JWST in fm.py). The WFC3+STIS correlated-K coefficients are generated at R=200 longward of 1 $\mu$m (up to 5 $\mu$m) and at R=500 from 0.3-1 $\mu$m. Note: these *are not sampled cross-sections*, so each resolution element at that R is correctly computed and matches line-by-line when binned to the same R.
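A constant resolving power R means bin widths grow with wavelength (d lambda = lambda / R), i.e. bins are equally spaced in ln(lambda). The sketch below builds such a grid; it is a generic illustration, not necessarily how the CK tables in fm.py were constructed.

```python
import numpy as np

def constant_R_edges(wl_min_um, wl_max_um, R):
    """Bin edges equally spaced in ln(lambda) with step 1/R, so every bin has
    lambda/d(lambda) close to R; the last edge lands at or just past wl_max_um."""
    n_bins = int(np.ceil(R * np.log(wl_max_um / wl_min_um)))
    return wl_min_um * np.exp(np.arange(n_bins + 1) / R)

edges = constant_R_edges(1.0, 5.0, 200)          # R=200 from 1 to 5 um
centers = 0.5 * (edges[1:] + edges[:-1])
R_actual = centers / np.diff(edges)              # ~200 in every bin
wno_edges = 1e4 / edges                          # same edges in wavenumber [cm^-1]
print(len(centers), R_actual.min(), R_actual.max())
```

At R=200 the 1-5 um range needs only a few hundred bins, which is part of why a CK table at that resolution is compact.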

Note that the "core" set of routines is all in fm.py. If you want to know more about what is in the sausage, look into fm.py.

In [175]:
#import all of the functions in fm, including the CK-coefficient loaders (may take a minute)
from fm import *
%matplotlib notebook

#preload CK-coeffs--a giant array/variable to be passed--inputs are lower wavenumber, upper wavenumber [cm-1]
#the HST tables (xsects_HST) cover 2000-30000 cm-1 (R=200 > 1 um, R=500 < 1 um); here we load the JWST tables
#to convert between microns and wavenumbers: wavelength [um] = 10,000/wavenumber [cm-1]
#so 1666-20000 cm-1 corresponds to ~0.5-6.0 um
#make sure the xsec wavenumber/wavelength range is *larger* than the data wavelength range
xsects=xsects_JWST(1666,20000)


Cross-sections Loaded
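The "make sure the xsec range is larger than the data range" rule can be checked explicitly. This hypothetical helper converts the wavenumber limits passed to the loader into microns (note that the order flips) and tests whether every data wavelength falls inside. For real data it is safer to pass bin edges rather than bin centers.

```python
import numpy as np

def ck_range_covers(wno_lo, wno_hi, wl_um):
    """True if the CK-table range [wno_lo, wno_hi] cm^-1, i.e. 1e4/wno_hi to
    1e4/wno_lo microns, contains every given data wavelength."""
    wl = np.asarray(wl_um, dtype=float)
    wl_lo, wl_hi = 1e4 / wno_hi, 1e4 / wno_lo      # wavenumber -> wavelength flips the order
    return bool(wl.min() >= wl_lo and wl.max() <= wl_hi)

print(1e4 / 20000, 1e4 / 1666)                     # xsects_JWST(1666,20000): ~0.5 to ~6.0 um
print(ck_range_covers(1666, 20000, [0.7, 5.0]))    # inside 0.5-6.0 um
print(ck_range_covers(2000, 30000, [0.7, 5.5]))    # 5.5 um lies beyond 1e4/2000 = 5 um
```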


# Setup Atmospheric Parameters to Generate a Spectrum -------------------------

This segment defines the various atmospheric quantities and assigns them values for generating a simple transmission spectrum. A description of each parameter, along with a reasonable range of values, is given as a comment following the assigned value. All of the parameters are then packed into the parameter "state-vector" array, x.
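Because x is a plain array, a parameter placed in the wrong slot fails silently. One defensive option is a hypothetical name-based builder like the one below. The slot order is copied from the `x=np.array([...])` line in the next cell. Entries 4-7 are passed as zeros there, and their meaning is defined inside fm.py, so they are only placeholders here.

```python
import numpy as np

# Slot order of the 18-element state vector as assembled in this notebook
STATE_ORDER = ["Tirr", "logKir", "logg1", "Tint",
               "unused4", "unused5", "unused6", "unused7",   # zeros in the notebook
               "Rp_scaled", "Rstar", "M",
               "logKzz", "fsed", "logPbase", "logCldVMR",
               "logKcld", "logRayAmp", "RaySlope"]

def build_state_vector(**params):
    """Hypothetical helper: assemble x by parameter name, filling the placeholder
    slots with 0 and refusing to run if a physical parameter is missing or unknown."""
    missing = [k for k in STATE_ORDER if not k.startswith("unused") and k not in params]
    if missing:
        raise KeyError(f"missing parameters: {missing}")
    unknown = set(params) - set(STATE_ORDER)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return np.array([float(params.get(k, 0.0)) for k in STATE_ORDER])

x_demo = build_state_vector(Tirr=420, logKir=-1.5, logg1=-0.7, Tint=200,
                            Rp_scaled=0.865 * 0.917, Rstar=0.8292, M=0.271,
                            logKzz=8, fsed=2.0, logPbase=-1.0, logCldVMR=-5.5,
                            logKcld=-40, logRayAmp=-30, RaySlope=0)
```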

In [179]:
#setup "input" parameters. We are defining our 1D atmosphere with these parameters
#planet/star system params--xRp is the "Rp" free parameter, M right now is fixed, but could be a free param
Rp= 0.865  #Planet radius in Jupiter radii--this will be forced to be the 10 bar radius--arbitrary (scaling to this is a free par)
Rstar= 0.8292  #Stellar radius in solar radii
M = 0.271  #Mass in Jupiter masses
D= 0.4225   #semimajor axis in AU

#TP profile params (3--Guillot 2010, Parmentier & Guillot 2013--see Line et al. 2013a for implementation)
Tirr=420     #Irradiation temperature as defined in Guillot 2010
logKir=-1.5  #TP profile IR opacity (log thereof); controls the "vertical" location of the gradient
logg1=-0.7     #single channel Vis/IR (log) opacity ratio. Controls the delta T between deep T and TOA T
Tint=200 #interior temperature...this would be the "effective temperature" if the object were not irradiated

#thermochemical volume mixing ratios for each species; their log10 values are used below as constant-with-altitude abundances
gelectron, gH2, gH, gHplus,gHminus,gVO,gTiO,gCO2,gHe,gH2O,gCH4,gCO,gNH3,gN2,gPH3,gH2S,gFe,gNa,gK = 0.104E-12, \
0.823E+00,0.262E-07,0.451E-37,0.207E-17,0.793E-18,0.295E-20,0.943E-06,0.163E+00,0.806E-02,0.451E-02,0.129E-03,\
0.146E-03,0.607E-03,0.394E-06,0.259E-03,0.111E-11,0.664E-05,0.539E-06

#log10 gas abundances (volume mixing ratios), constant with altitude
H2O=np.log10(gH2O)
CH4=np.log10(gCH4)
CO=np.log10(gCO)
CO2=np.log10(gCO2)
print('CO,CO2:',CO,CO2)
CO2=-15.   #overrides the thermochemical CO2 value above, effectively turning CO2 off
NH3=np.log10(gNH3)
N2=np.log10(gN2)
HCN=-15.
H2S=np.log10(gH2S)
PH3=np.log10(gPH3)
C2H2=-15.
C2H6=-15.
Na=np.log10(gNa)
K=np.log10(gK)
TiO=np.log10(gTiO)
VO=np.log10(gVO)
FeH=-15.
H=np.log10(gH)
em=np.log10(gelectron)  #electrons (e-)
hm=np.log10(gHminus)    #H- ion

#Ackerman & Marley 2001 Cloud parameters--physically motivated with Mie particles
logKzz=8 #log Kzz (cm2/s)--valid range: 2 - 11 -- higher values make larger particles
fsed=2.0 #sedimentation efficiency--valid range: 0.5 - 5--lower values make "puffier" more extended cloud 
logPbase=-1.0  #cloud base pressure--valid range: -6.0 - 1.5
logCldVMR=-5.5 #cloud condensate base mixing ratio (e.g., see Fortney 2005)--valid range: -15 - -2.0

#simple 'grey+rayleigh' parameters just in case you don't want to use a physically motivated cloud
#(most are just made up anyway since we don't really understand all of the micro-physics.....)
logKcld = -40  #uniform in altitude and in wavelength "grey" opacity (it's a cross-section)--valid range: -50 - -10 
logRayAmp = -30  #power-law haze amplitude (log) as defined in des Etangs 2008 "0" would be like H2/He scat--valid range: -30 - 3 
RaySlope = 0  #power law index 4 for Rayleigh, 0 for "gray".  Valid range: 0 - 6

#10 bar radius scaling param (only used in transmission)
xRp=0.917#0.991

#stuffing all variables into state vector array
x=np.array([Tirr, logKir,logg1,Tint, 0, 0, 0,0, Rp*xRp, Rstar, M, logKzz, fsed,logPbase,logCldVMR, logKcld, logRayAmp, RaySlope])
#gas abundance array: in this free-retrieval setup each entry is a log10 volume mixing ratio
#(not a multiplicative scaling factor); set an entry to a very negative value (e.g., -50) to effectively turn a gas off
# 0   1    2    3   4    5    6     7    8    9   10    11   12   13    14   15   16   17   18  19 20   21
#H2O  CH4  CO  CO2 NH3  N2   HCN   H2S  PH3  C2H2 C2H6  Na    K   TiO   VO   FeH  H    H2   He   e- h-  mmw
gas_scale=np.array([H2O,CH4,CO,CO2,NH3,N2,HCN,H2S,PH3,C2H2,C2H6,Na,K,TiO,VO ,FeH,H,-50.,-50.,em, hm,-50.]) #

CO,CO2: -3.889410289700751 -6.0254883072626715
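Since gas_scale is indexed by position, a hypothetical name-based builder can reduce slot mistakes when switching species on and off. The species order below is copied from the index comment in the cell above; every species not listed is set to -50, the value the notebook uses to switch a gas off.

```python
import numpy as np

# Species order of the 22-element gas_scale array, from the index comment above
GAS_ORDER = ["H2O", "CH4", "CO", "CO2", "NH3", "N2", "HCN", "H2S", "PH3", "C2H2", "C2H6",
             "Na", "K", "TiO", "VO", "FeH", "H", "H2", "He", "e-", "H-", "mmw"]

def make_gas_scale(log_vmrs, off=-50.0):
    """Hypothetical helper: build gas_scale from a {species: log10 VMR} dict,
    setting every unlisted entry to `off` (effectively removing that gas)."""
    unknown = set(log_vmrs) - set(GAS_ORDER)
    if unknown:
        raise KeyError(f"unknown species: {sorted(unknown)}")
    return np.array([float(log_vmrs.get(s, off)) for s in GAS_ORDER])

gs = make_gas_scale({"H2O": -3.0, "CO": -4.0})   # everything else at -50
```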


# Load in Transmission Data Set 



In [191]:
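# TOI-199 transmission spectrum:
#columns: lower and upper bin wavelength, transit depth, uncertainty
wl, wu, y_meas, err = np.loadtxt('sim_data_toi199.dat',unpack=True,usecols=(0,1,2,3))
wlgrid = ((wl+wu)*0.5)*1e-4   #bin centers, Angstrom -> micron
y_meas = y_meas*1e-6          #ppm -> fraction
err = err*1e-6                #ppm -> fraction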
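The unit conversions in the cell above imply that the file stores bin edges in Angstrom and depths/uncertainties in ppm. That reading is an inference from the 1e-4 and 1e-6 factors, not something the file documents. A small reusable loader under that assumption is sketched below; the two rows fed to it are made up purely to exercise the code.

```python
import io
import numpy as np

def load_depth_table(source):
    """Read a 4-column table assumed to hold lower/upper bin wavelength [Angstrom],
    transit depth [ppm] and uncertainty [ppm].
    Returns bin-center wavelength [um], depth and error as fractions."""
    wl, wu, depth, err = np.loadtxt(source, usecols=(0, 1, 2, 3), unpack=True, ndmin=2)
    return 0.5 * (wl + wu) * 1e-4, depth * 1e-6, err * 1e-6

# two made-up rows, for illustration only
demo = io.StringIO("29000 29400 9760 120\n29400 29800 9755 118\n")
wlgrid_demo, depth_demo, err_demo = load_depth_table(demo)
```

`ndmin=2` keeps the unpacking working even if a file contains a single row.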


# Generate Model Atmosphere & Transmission Spectrum --- NIRSpec only  

Here we call the forward-model routine "fx_trans_free" (think F(x)) from fm.py. It takes the input values and calls the relevant functions to compute the transmission spectrum. Its inputs are the parameter state vector, "x", the data wavelength grid, "wlgrid", the gas abundance array (log10 mixing ratios, which can also be used to turn off particular gases), "gas_scale", and the correlated-K tables, "xsects". It returns the model spectrum binned to the data wavelength grid, "y_binned", the model spectrum ($(R_p/R_{\star})^2$) at the native CK-table resolution, "y_mod", and the native wavenumber grid, "wno". The "atm" array contains the temperature-pressure profile, the gas mixing-ratio profiles (constant with altitude in this free-retrieval setup), and the cloud condensate profiles.
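To see how a "binned" spectrum relates to the native-resolution one, here is a simple top-hat binning sketch: average every native point that falls inside each data bin. This is a hypothetical stand-in; fm.py may weight or bin differently. The native model comes on a wavenumber grid, so it is converted to microns first.

```python
import numpy as np

def bin_model(wl_model_um, y_model, wl_edges_um):
    """Average a high-resolution model inside each data bin [edge_i, edge_i+1).
    A simple top-hat average; fm.py may weight or bin differently."""
    wl = np.asarray(wl_model_um, dtype=float)
    y = np.asarray(y_model, dtype=float)
    edges = np.asarray(wl_edges_um, dtype=float)
    out = np.full(edges.size - 1, np.nan)          # NaN marks bins with no model points
    for i in range(out.size):
        inside = (wl >= edges[i]) & (wl < edges[i + 1])
        if inside.any():
            out[i] = y[inside].mean()
    return out

# Native spectra come on a wavenumber grid; convert to microns first
wno_demo = np.linspace(10000.0, 5000.0, 11)        # hypothetical grid, 1.0-2.0 um
y_demo = 1e4 / wno_demo                            # toy "spectrum" equal to the wavelength
binned = bin_model(1e4 / wno_demo, y_demo, [0.95, 1.45, 2.05])
```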

In [192]:
#calling forward model, fx. This will produce the (Rp/Rstar)^2 spectrum....
y_binned,y_mod,wno,atm=fx_trans_free(x,wlgrid,gas_scale, xsects)  #returns binned model spectrum, higher res model spectrum, wavenumber grid, and vertical abundance profiles from chemistry
print('DONE')

DONE


# Plotting the Model Atmosphere & Transmission Spectrum  ---------------

Self-explanatory...

# Plot Model Atmosphere  

Spaghetti plot of the model atmosphere (this one is really boring since the abundances are forced constant with altitude...).

In [193]:
from matplotlib.pyplot import *
from matplotlib.ticker import FormatStrFormatter
%matplotlib notebook

#unpacking variables
#P is in bars
#T is in K
#H2O, CH4, CO, CO2, NH3, Na, K, TiO, VO, C2H2, HCN, H2S, FeH, H2, He, H, e, Hm are gas mixing ratio profiles
#qc is the condensate abundance profile given an "f_sed" value and cloud base pressure
#r_eff is the effective cloud droplet radius (see A&M 2001 or Charnay et al. 2017)
#f_r is the mixing ratio array for each of the cloud droplet sizes.
P,T, H2O, CH4,CO,CO2,NH3,Na,K,TiO,VO,C2H2,HCN,H2S,FeH,H2,He,H,e, Hm,qc,r_eff,f_r=atm


fig2, ax1=subplots()
#feel free to plot whatever you want here....
ax1.semilogx(H2O,P,'b',ls='--',lw=2,label='H2O')
ax1.semilogx(CH4,P,'black',ls='--',lw=2,label='CH4')
ax1.semilogx(CO,P,'g',ls='--',lw=2,label='CO')
ax1.semilogx(CO2,P,'orange',ls='--',lw=2,label='CO2')
ax1.semilogx(NH3,P,'darkblue',ls='--',lw=2,label='NH3')
ax1.semilogx(Na,P,'b',lw=2,label='Na')
ax1.semilogx(K,P,'g',lw=2,label='K')
ax1.semilogx(TiO,P,'k',lw=2,label='TiO')
ax1.semilogx(VO,P,'orange',lw=2,label='VO')
ax1.semilogx(qc,P,'gray',lw=1,ls='--',label='Cond. VMR.')  #<---- A&M Cloud Condensate VMR profile (not droplets)

ax1.set_xlabel('Mixing Ratio',fontsize=20)
ax1.set_ylabel('Pressure [bar]',fontsize=20)
ax1.semilogy()
ax1.legend(loc=4,frameon=False)
ax1.axis([1E-9,1,100,1E-7])

#plotting TP profile on other x-axis
ax2=ax1.twiny()
ax2.semilogy(T,P,'r-',lw=4,label='TP')
ax2.set_xlabel('Temperature [K]',color='r',fontsize=20)
ax2.axis([0.8*T.min(),1.2*T.max(),100,1E-6])
for tl in ax2.get_xticklabels(): tl.set_color('r')
ax2.legend(loc=1,frameon=False)

import os
os.makedirs('./plots',exist_ok=True)  #make sure the output directory exists
savefig('./plots/atmosphere_transmission_WFC3_FREE.pdf',format='pdf')
show()
#close()






# Plot Transmission Spectrum Model and Data 


In [194]:
#finally doing some plotting
#and the usual matplotlib shenanigans
from matplotlib.pyplot import *
from matplotlib.ticker import ScalarFormatter
from scipy.ndimage import gaussian_filter1d
from scipy.interpolate import interp1d

ymin=np.min(y_binned)*1E2*0.99
ymax=np.max(y_binned)*1E2*1.01
fig1, ax=subplots()
xlabel(r'$\lambda$ ($\mu$m)',fontsize=18)
ylabel(r'(R$_{p}$/R$_{*}$)$^{2} \%$',fontsize=18)
minorticks_on()
errorbar(wlgrid+0.05, y_meas*100, yerr=err*100, xerr=None, fmt='Dk', label='Data')  #data offset by +0.05 um for visibility
plot(wlgrid, y_binned*1E2,'ob',label='Binned Model')
plot(1E4/wno, y_mod*1E2, label='Model')

def load_fortney(fname, label, outname):
    """Load a Fortney model (wavelength [um], transit depth), smooth it, and shift it so that
    its 3-5 um median matches the 3-5 um median of the CHIMERA native-resolution model."""
    cent_lambda, ed = np.loadtxt(fname,unpack=True)
    ed = gaussian_filter1d(ed*100,15)  #convert to % and smooth with a 15-pixel Gaussian kernel
    www = 1E4/wno
    idx_ch = np.where((www>3)&(www<5))[0]
    idx_cl = np.where((cent_lambda>3)&(cent_lambda<5))[0]
    shifted = ed - np.median(ed[idx_cl]) + np.median(y_mod[idx_ch]*1E2)
    plot(cent_lambda, shifted, label=label)
    np.savetxt(outname, np.column_stack([cent_lambda, shifted]), fmt='%.10f')
    return cent_lambda, shifted

load_fortney('g9_10X_non-quenched.txt', 'fortney no-quenched', 'no-quench.dat')
cent_lambda, shifted = load_fortney('g9_10X_quenched.txt', 'fortney quenched', 'quench.dat')
#interpolator of the shifted quenched Fortney model, as a fraction rather than %
f_fortney = interp1d(cent_lambda, shifted/100)

ax.set_xscale('log')
ax.set_xticks([0.3, 0.5,0.8,1,1.4, 2, 3, 4, 5])
#ax.axis([0.3,5,ymin,ymax])  #full wavelength range
ax.axis([3,5,ymin,ymax])
ax.get_xaxis().set_major_formatter(ScalarFormatter())
ax.tick_params(length=10,width=1,labelsize='large',which='major')

# simulate data: one Gaussian draw around the quenched Fortney model at each data wavelength
assert len(err)==len(y_binned)==len(wlgrid)
s = np.random.normal(f_fortney(wlgrid), err)
serr = np.copy(err)
np.savetxt('chimera_toi199.dat', np.column_stack([wlgrid, s, serr]), fmt=['%.4f','%.10f','%.10f'])
errorbar(wlgrid,s*100,yerr=serr*100,fmt='o',label='simulated data')
legend(frameon=False)
savefig('./plots/transmission_spectrum_WFC3_FREE.pdf',format='pdf')
show()
#close()


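The Fortney comparison models are placed on the CHIMERA scale with a single additive offset: each model is shifted so that its median over 3-5 um equals the CHIMERA model's median over the same range. The median keeps individual absorption features from dominating the offset. A standalone sketch of that step is below (the helper name is hypothetical). Note that in the cell above the smoothing width is 15 *pixels* of the Fortney grid, not a fixed wavelength interval.

```python
import numpy as np

def align_by_median(wl_ref, y_ref, wl_other, y_other, wl_min=3.0, wl_max=5.0):
    """Shift y_other by a constant so its median over (wl_min, wl_max) matches the
    median of y_ref over the same range; the spectral shape is left unchanged."""
    wl_ref, y_ref = np.asarray(wl_ref), np.asarray(y_ref)
    wl_other, y_other = np.asarray(wl_other), np.asarray(y_other)
    in_ref = (wl_ref > wl_min) & (wl_ref < wl_max)
    in_other = (wl_other > wl_min) & (wl_other < wl_max)
    if not (in_ref.any() and in_other.any()):
        raise ValueError("no points in the alignment window")
    return y_other - np.median(y_other[in_other]) + np.median(y_ref[in_ref])

wl = np.linspace(2.0, 6.0, 41)
y_ref = np.ones_like(wl)                  # toy reference spectrum
y_other = 0.5 + 0.01 * wl                 # toy model with a different baseline
aligned = align_by_median(wl, y_ref, wl, y_other)
```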

# Now NIRSpec + NIRISS/SOSS

In [186]:
# TOI-199 transmission spectrum, NIRSpec (the simulated data written above, already in um and fractional depth):
wlgrid1, y_meas1, err1 = np.loadtxt('chimera_toi199.dat',unpack=True,usecols=(0,1,2))
# Same for SOSS (bin edges and ppm values, converted as before):
wl, wu, y_meas, err = np.loadtxt('soss-precisions.dat',unpack=True,usecols=(0,1,2,3))
wlgrid2 = ((wl+wu)*0.5)*1e-4
y_meas2 = y_meas*1e-6
err2 = err*1e-6
# combine:
wlgrid = np.append(wlgrid2,wlgrid1)
y_meas = np.append(y_meas2,y_meas1)
err = np.append(err2,err1)
# sort by wavelength (CHIMERA expects the data grid to be ordered in wavelength)
idx = np.argsort(wlgrid)
wlgrid = wlgrid[idx]
y_meas = y_meas[idx]
err = err[idx]
print(wlgrid)

[0.75       0.85124865 0.88519982 0.91915155 0.95309985 0.98704862
 1.02100272 1.05494862 1.08889988 1.12285164 1.15679742 1.19075155
 1.22469988 1.25864859 1.29260284 1.32654856 1.36050275 1.39445158
 1.42839985 1.46235455 1.4963027  1.53024866 1.56420572 1.59815162
 1.63209983 1.66605446 1.70000273 1.73394863 1.76790282 1.8018539
 1.8357998  1.86975443 1.90369983 1.93765159 1.9716056  2.00554869
 2.03950276 2.07345086 2.10740286 2.14135145 2.17530283 2.20925166
 2.24320268 2.27715175 2.31109984 2.34505453 2.37900274 2.41294852
 2.44690576 2.48085154 2.51479987 2.54875444 2.58269972 2.61665184
 2.65060274 2.68455145 2.71850272 2.75245143 2.78640342 2.904
 2.9715     3.039      3.1064     3.1738     3.2411     3.3083
 3.3754     3.4425     3.5095     3.5765     3.6434     3.9101
 3.9765     4.0429     4.1092     4.1754     4.2415     4.3075
 4.3734     4.4391     4.5048     4.5704     4.6358     4.7011
 4.7663     4.8314     4.8963     4.9611    ]
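The combine-and-sort step above can be wrapped in a small hypothetical helper so that wavelengths, depths, and errors are always reordered together. A stable sort keeps the input order for any duplicated wavelengths, which matters if two instruments report the same bin center.

```python
import numpy as np

def merge_datasets(*datasets):
    """Concatenate (wl, depth, err) tuples and sort everything by wavelength.
    A stable sort keeps the input order for any duplicated wavelengths."""
    wl = np.concatenate([np.asarray(d[0], dtype=float) for d in datasets])
    depth = np.concatenate([np.asarray(d[1], dtype=float) for d in datasets])
    err = np.concatenate([np.asarray(d[2], dtype=float) for d in datasets])
    order = np.argsort(wl, kind="stable")
    return wl[order], depth[order], err[order]

soss = ([0.9, 1.2], [0.010, 0.011], [1e-4, 1e-4])        # toy values
nirspec = ([3.0, 4.0], [0.012, 0.013], [5e-5, 5e-5])     # toy values
wl, depth, err = merge_datasets(nirspec, soss)           # input order does not matter
```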


In [187]:
#calling forward model, fx. This will produce the (Rp/Rstar)^2 spectrum....
y_binned,y_mod,wno,atm=fx_trans_free(x,wlgrid,gas_scale, xsects)  #returns binned model spectrum, higher res model spectrum, wavenumber grid, and vertical abundance profiles from chemistry
print('DONE')

DONE


In [189]:
from matplotlib.pyplot import *
from matplotlib.ticker import ScalarFormatter
from scipy.ndimage import gaussian_filter1d
from scipy.interpolate import interp1d
%matplotlib notebook
#finally doing some plotting
#and the usual matplotlib shenanigans
ymin=np.min(y_binned)*1E2*0.99
ymax=np.max(y_binned)*1E2*1.01
fig1, ax=subplots()
xlabel(r'$\lambda$ ($\mu$m)',fontsize=18)
ylabel(r'(R$_{p}$/R$_{*}$)$^{2} \%$',fontsize=18)
minorticks_on()
errorbar(wlgrid+0.05, y_meas*100, yerr=err*100, xerr=None, fmt='Dk', label='Data')  #data offset by +0.05 um for visibility
#plot(wlgrid, y_binned*1E2,'ob',label='Binned Model')
#plot(1E4/wno, y_mod*1E2, label='Model')
ax.set_xscale('log')
ax.set_xticks([0.3, 0.5,0.8,1,1.4, 2, 3, 4, 5])
ax.axis([0.3,5,ymin,ymax])
ax.get_xaxis().set_major_formatter(ScalarFormatter())
ax.tick_params(length=10,width=1,labelsize='large',which='major')

#smooth each Fortney model and shift it so its 3-5 um median matches the CHIMERA model's 3-5 um median
www = 1E4/wno
idx_ch = np.where((www>3)&(www<5))[0]
med_ch = np.median(y_mod[idx_ch]*1E2)
for fname, label in [('g9_10X_non-quenched.txt','fortney no-quenched'), ('g9_10X_quenched.txt','fortney quenched')]:
    cent_lambda, ed = np.loadtxt(fname,unpack=True)
    ed = gaussian_filter1d(ed*100,15)  #convert to % and smooth with a 15-pixel Gaussian kernel
    idx_cl = np.where((cent_lambda>3)&(cent_lambda<5))[0]
    med_f = np.median(ed[idx_cl])
    plot(cent_lambda, ed - med_f + med_ch, label=label)
#after the loop, cent_lambda/ed hold the quenched model; interpolate it as a fraction rather than %
f_fortney = interp1d(cent_lambda,(ed - med_f + med_ch)/100)

# simulate data: one Gaussian draw around the quenched Fortney model at each data wavelength
assert len(err)==len(y_binned)==len(wlgrid)
s = np.random.normal(f_fortney(wlgrid), err)
serr = np.copy(err)
np.savetxt('chimera_toi199-2.dat', np.column_stack([wlgrid, s, serr]), fmt=['%.4f','%.10f','%.10f'])
errorbar(wlgrid,s*100,yerr=serr*100,fmt='o',label='simulated data')
legend(frameon=False)
savefig('./plots/transmission_spectrum_WFC3_FREE.pdf',format='pdf')
show()
#close()
#close()


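The simulation cells draw from `np.random.normal` without a seed, so every run writes a different simulated data set. If reproducibility matters (for example, when comparing retrievals on the "same" simulated spectrum), a seeded generator is one option. The sketch below uses NumPy's `default_rng`; the helper name is hypothetical.

```python
import numpy as np

def simulate_spectrum(model_depth, err, seed=None):
    """Draw one noisy realization of a model transit spectrum: depth_i ~ N(model_i, err_i).
    Passing a seed makes the simulated data set reproducible."""
    rng = np.random.default_rng(seed)
    model_depth = np.asarray(model_depth, dtype=float)
    err = np.asarray(err, dtype=float)
    if model_depth.shape != err.shape:
        raise ValueError("model and error arrays must have the same shape")
    return rng.normal(model_depth, err)

model = np.full(1000, 0.0097)       # toy flat spectrum (fractional depth)
errs = np.full(1000, 1e-4)          # toy 100 ppm uncertainties
sim1 = simulate_spectrum(model, errs, seed=1)
sim2 = simulate_spectrum(model, errs, seed=1)   # identical to sim1
```

The result can be written with `np.savetxt` exactly as in the cells above.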

## What about NIRSpec + HST?

In [172]:
# TOI-199 transmission spectrum, NIRSpec (bin edges in Angstrom, depths and errors in ppm):
wl, wu, y_meas, err = np.loadtxt('sim_data_toi199.dat',unpack=True,usecols=(0,1,2,3))
wlgrid1 = ((wl+wu)*0.5)*1e-4
y_meas1 = y_meas*1e-6
err1 = err*1e-6
# Mock HST WFC3 points: 10 bins from 1.1515-1.624 um, flat at the median NIRSpec depth, with 27.85 ppm errors
wlgrid2 = np.linspace(1.1515,1.624,10)
y_meas2 = np.zeros(len(wlgrid2)) + np.median(y_meas*1e-6)
err2 = np.zeros(len(wlgrid2)) + 27.85*1e-6
# combine:
wlgrid = np.append(wlgrid2,wlgrid1)
y_meas = np.append(y_meas2,y_meas1)
err = np.append(err2,err1)
# sort by wavelength (CHIMERA expects the data grid to be ordered in wavelength)
idx = np.argsort(wlgrid)
wlgrid = wlgrid[idx]
y_meas = y_meas[idx]
err = err[idx]
print(wlgrid)
print(wlgrid2)

[0.75      1.1515    1.204     1.2565    1.309     1.3615    1.414
 1.4665    1.519     1.5715    1.624     2.9039895 2.97152   3.0389935
 3.106408  3.1737615 3.2410525 3.3082785 3.375438  3.442528  3.5095475
 3.576494  3.643366  3.910061  3.9765255 4.042902  4.1091865 4.175377
 4.241471  4.3074645 4.373355  4.439139  4.504813  4.5703725 4.635814
 4.701133  4.7663255 4.831386  4.8963095 4.961091 ]
[1.1515 1.204  1.2565 1.309  1.3615 1.414  1.4665 1.519  1.5715 1.624 ]


In [173]:
#calling forward model, fx. This will produce the (Rp/Rstar)^2 spectrum....
y_binned,y_mod,wno,atm=fx_trans_free(x,wlgrid,gas_scale, xsects)  #returns binned model spectrum, higher res model spectrum, wavenumber grid, and vertical abundance profiles from chemistry
print('DONE')

DONE


In [174]:
from matplotlib.pyplot import *
from matplotlib.ticker import ScalarFormatter
%matplotlib notebook
#finally doing some plotting
#and the usual matplotlib shenanigans
ymin=np.min(y_binned)*1E2*0.99
ymax=np.max(y_binned)*1E2*1.01
fig1, ax=subplots()
xlabel(r'$\lambda$ ($\mu$m)',fontsize=18)
ylabel(r'(R$_{p}$/R$_{*}$)$^{2} \%$',fontsize=18)
minorticks_on()
errorbar(wlgrid+0.05, y_meas*100, yerr=err*100, xerr=None, fmt='Dk', label='Data')  #data offset by +0.05 um for visibility
plot(wlgrid, y_binned*1E2,'ob',label='Binned Model')
plot(1E4/wno, y_mod*1E2, label='Model')
ax.set_xscale('log')
ax.set_xticks([0.3, 0.5,0.8,1,1.4, 2, 3, 4, 5])
ax.axis([0.3,5,ymin,ymax])
ax.get_xaxis().set_major_formatter(ScalarFormatter())
ax.tick_params(length=10,width=1,labelsize='large',which='major')

# simulate data: one Gaussian draw around the shifted quenched Fortney model (f_fortney, defined in an earlier cell)
#(to draw around the CHIMERA binned model instead, replace f_fortney(wlgrid) with y_binned)
assert len(err)==len(y_binned)==len(wlgrid)
s = np.random.normal(f_fortney(wlgrid), err)
serr = np.copy(err)
np.savetxt('chimera_toi199-hst-f.dat', np.column_stack([wlgrid, s, serr]), fmt=['%.4f','%.10f','%.10f'])
errorbar(wlgrid,s*100,yerr=serr*100,fmt='o',label='simulated data')
legend(frameon=False)
savefig('./plots/transmission_spectrum_WFC3_FREE.pdf',format='pdf')
show()
#close()


# Self-consistent modelling

Previously I tried retrieving a transmission spectrum computed by Fortney; what if I instead retrieve a model generated by CHIMERA itself? Let's try it:

In [None]:
from matplotlib.pyplot import *
from matplotlib.ticker import ScalarFormatter
from scipy.ndimage import gaussian_filter1d
from scipy.interpolate import interp1d
%matplotlib notebook
#finally doing some plotting
#and the usual matplotlib shenanigans
ymin=np.min(y_binned)*1E2*0.99
ymax=np.max(y_binned)*1E2*1.01
fig1, ax=subplots()
xlabel(r'$\lambda$ ($\mu$m)',fontsize=18)
ylabel(r'(R$_{p}$/R$_{*}$)$^{2} \%$',fontsize=18)
minorticks_on()
errorbar(wlgrid+0.05, y_meas*100, yerr=err*100, xerr=None, fmt='Dk', label='Data')  #data offset by +0.05 um for visibility
#plot(wlgrid, y_binned*1E2,'ob',label='Binned Model')
#plot(1E4/wno, y_mod*1E2, label='Model')
ax.set_xscale('log')
ax.set_xticks([0.3, 0.5,0.8,1,1.4, 2, 3, 4, 5])
ax.axis([0.3,5,ymin,ymax])
ax.get_xaxis().set_major_formatter(ScalarFormatter())
ax.tick_params(length=10,width=1,labelsize='large',which='major')

#smooth each Fortney model and shift it so its 3-5 um median matches the CHIMERA model's 3-5 um median
www = 1E4/wno
idx_ch = np.where((www>3)&(www<5))[0]
med_ch = np.median(y_mod[idx_ch]*1E2)
for fname, label in [('g9_10X_non-quenched.txt','fortney no-quenched'), ('g9_10X_quenched.txt','fortney quenched')]:
    cent_lambda, ed = np.loadtxt(fname,unpack=True)
    ed = gaussian_filter1d(ed*100,15)  #convert to % and smooth with a 15-pixel Gaussian kernel
    idx_cl = np.where((cent_lambda>3)&(cent_lambda<5))[0]
    med_f = np.median(ed[idx_cl])
    plot(cent_lambda, ed - med_f + med_ch, label=label)
#after the loop, cent_lambda/ed hold the quenched model; interpolate it as a fraction rather than %
f_fortney = interp1d(cent_lambda,(ed - med_f + med_ch)/100)

#retrieved CHIMERA spectrum: loaded here but not yet used (the simulated data below are still drawn from f_fortney)
wl_ret, depth_ret = np.loadtxt('final_retrieval_results.dat',unpack=True)

# simulate data (same output filename as the NIRSpec+SOSS cell, so that file is overwritten)
assert len(err)==len(y_binned)==len(wlgrid)
s = np.random.normal(f_fortney(wlgrid), err)
serr = np.copy(err)
np.savetxt('chimera_toi199-2.dat', np.column_stack([wlgrid, s, serr]), fmt=['%.4f','%.10f','%.10f'])
errorbar(wlgrid,s*100,yerr=serr*100,fmt='o',label='simulated data')
legend(frameon=False)
savefig('./plots/transmission_spectrum_WFC3_FREE.pdf',format='pdf')
show()
#close()
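To simulate data from a CHIMERA spectrum itself, as this section proposes, one option is to interpolate the native-resolution model `y_mod` onto the data wavelengths and then add noise. `np.interp` needs increasing x values, and `1E4/wno` runs the other way, so the grid is sorted first. The helper below is a hypothetical sketch; sampling at bin centers ignores the bin widths, so using `y_binned` directly is the closer match to what the forward model reports per bin.

```python
import numpy as np

def model_on_grid(wno, y_mod, wl_data_um):
    """Linearly interpolate a native-resolution model given on a wavenumber grid
    [cm^-1] onto data wavelengths [um]. np.interp needs increasing x, so the
    converted wavelengths are sorted first."""
    wl_model = 1e4 / np.asarray(wno, dtype=float)
    order = np.argsort(wl_model)
    wl_sorted = wl_model[order]
    wl_data = np.asarray(wl_data_um, dtype=float)
    if wl_data.min() < wl_sorted[0] or wl_data.max() > wl_sorted[-1]:
        raise ValueError("data wavelengths fall outside the model grid")
    return np.interp(wl_data, wl_sorted, np.asarray(y_mod, dtype=float)[order])

wno_demo = np.linspace(20000.0, 2000.0, 100)       # 0.5-5.0 um, decreasing wavelength order
y_demo = 1e4 / wno_demo                            # toy model equal to the wavelength
on_grid = model_on_grid(wno_demo, y_demo, [1.0, 2.5, 4.9])
```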


# Explore cloud contribution

In [None]:
from matplotlib.pyplot import *
from matplotlib.ticker import FormatStrFormatter

#log Gas abundances
H2O=-4.
CH4=-10.
CO=-4.  
CO2=-10. 
NH3=-10.  
N2=-4.   
HCN=-10.   
H2S=-5.  
PH3=-6.  
C2H2=-10. 
C2H6=-10. 
Na=-6.    
K=-7.   
TiO=-15.   
VO=-15.   
FeH=-15.  
H=-10.     
em=-10. 
hm=-10.

#Ackerman & Marley 2001 Cloud parameters--physically motivated with Mie particles
logKzz=7 #log Kzz (cm2/s)--valid range: 2 - 11 -- higher values make larger particles
fsed=2.0 #sedimentation efficiency--valid range: 0.5 - 5--lower values make "puffier" more extended cloud 
logPbase=-1.0  #cloud base pressure--valid range: -6.0 - 1.5
logCldVMR=-15.5 #cloud condensate base mixing ratio (e.g., see Fortney 2005)--valid range: -15 - -2.0; -15.5 effectively turns the A&M cloud off

#simple 'grey+rayleigh' parameters just in case you don't want to use a physically motivated cloud
#(most are just made up anyway since we don't really understand all of the micro-physics.....)
logKcld = -30  #uniform in altitude and in wavelength "grey" opacity (it's a cross-section)--valid range: -50 - -10 
logRayAmp =2  #power-law haze amplitude (log) as defined in des Etangs 2008 "0" would be like H2/He scat--valid range: -30 - 3 
RaySlope = 4  #power law index 4 for Rayleigh, 0 for "gray".  Valid range: 0 - 6

#10 bar radius scaling param (only used in transmission)
xRp=0.991

#stuffing all variables into state vector array
x=np.array([Tirr, logKir,logg1,Tint, 0, 0, 0,0, Rp*xRp, Rstar, M, logKzz, fsed,logPbase,logCldVMR, logKcld, logRayAmp, RaySlope])
#gas_scale holds the log10 volume mixing ratio of each species (free retrieval: assumed constant with altitude)
#set a species to a very negative value (e.g., -50) to effectively turn it off
#order of the gas_scale vector:
# 0   1    2    3   4    5    6     7    8    9   10    11   12   13    14   15   16   17   18  19 20   21
#H2O  CH4  CO  CO2 NH3  N2   HCN   H2S  PH3  C2H2 C2H6  Na    K   TiO   VO   FeH  H    H2   He   e- h-  mmw
gas_scale=np.array([H2O,CH4,CO,CO2,NH3,N2,HCN,H2S,PH3,C2H2,C2H6,Na,K,TiO,VO ,FeH,H,-50.,-50.,em, hm,-50.]) #

y_binned,y_mod,wno,atm=fx_trans_free(x,wlgrid,gas_scale, xsects)  #returns binned model spectrum, higher res model spectrum, wavenumber grid, and vertical abundance profiles from chemistry

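#copy of gas_scale with every gas (all slots but the last) set to -50, i.e. off, to isolate the cloud + H2/He continuum
gas_scale2=np.ones(len(gas_scale))*gas_scale
gas_scale2[0:-1]=-50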
y_binned2,y_mod2,wno2,atm2=fx_trans_free(x,wlgrid,gas_scale2, xsects)  #returns binned model spectrum, higher res model spectrum, wavenumber grid, and vertical abundance profiles from chemistry


ymin=np.min(y_binned)*1E2*0.99
ymax=np.max(y_binned)*1E2*1.01
fig1, ax=subplots()
xlabel(r'$\lambda$ ($\mu$m)',fontsize=18)
ylabel(r'(R$_{p}$/R$_{*}$)$^{2} \%$',fontsize=18)
minorticks_on()
errorbar(wlgrid, y_meas*100, yerr=err*100, xerr=None, fmt='Dk', label='Data')
plot(wlgrid, y_binned*1E2,'ob',label='Binned Model')
plot(1E4/wno, y_mod*1E2, label='Model')
plot(1E4/wno, y_mod2*1E2,label='Cloud+H2/He Continuum')
ax.set_xscale('log')
ax.set_xticks([0.3, 0.5,0.8,1,1.4, 2, 3, 4, 5])
ax.axis([0.3,5,ymin,ymax])
ax.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
ax.tick_params(length=10,width=1,labelsize='large',which='major')
legend(frameon=False)
show()
#close()
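Because the slot order of `gas_scale` is easy to get wrong by hand, a small helper that builds it from a dictionary can help. This is a sketch, assuming the 22-slot order given in the index comment above and that -50 "switches off" a gas, as in `gas_scale2`; `build_gas_scale` and `switch_off_all_gases` are hypothetical names, not part of the CHIMERA `fm` module.

```python
import numpy as np

# slot order of the gas_scale vector, copied from the index comment in the cell above
SPECIES = ['H2O', 'CH4', 'CO', 'CO2', 'NH3', 'N2', 'HCN', 'H2S', 'PH3', 'C2H2', 'C2H6',
           'Na', 'K', 'TiO', 'VO', 'FeH', 'H', 'H2', 'He', 'e-', 'h-', 'mmw']
OFF = -50.0  # log10 mixing ratio used in this notebook to switch a gas off

def build_gas_scale(log_abund, off=OFF):
    """Return a gas_scale array from a {species: log10 VMR} dict; unlisted slots are set to 'off'."""
    unknown = set(log_abund) - set(SPECIES)
    if unknown:
        raise KeyError(f'unknown species: {sorted(unknown)}')
    gas_scale = np.full(len(SPECIES), off)
    for name, value in log_abund.items():
        gas_scale[SPECIES.index(name)] = value
    return gas_scale

def switch_off_all_gases(gas_scale, off=OFF):
    """Copy of gas_scale with every slot except the last (mmw) set to 'off', like gas_scale2."""
    out = np.array(gas_scale, dtype=float, copy=True)
    out[:-1] = off
    return out

gs = build_gas_scale({'H2O': -4.0, 'CO': -4.0, 'Na': -6.0})
print(gs)
print(switch_off_all_gases(gs))
```

Raising on unknown names catches typos such as `'C02'`, which would otherwise silently leave a gas switched off.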



# Time to Get to the "Retrieval" ------------------------------------- 

The "retrieval" is performed using the dynesty (https://dynesty.readthedocs.io/en/latest/index.html) nested sampling package. It works much like the other nested samplers (e.g., MultiNest, PyMultiNest, nestle), though it offers more flexibility in the choice of sampling method, some of which are better suited to particular numbers of parameters. This example just uses "generic" settings.
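To make the live-point idea concrete, here is a deliberately naive nested sampler written with NumPy only. It is a teaching sketch, not how dynesty works internally: dynesty proposes new points inside bounding regions (e.g., the multi-ellipsoid `bound='multi'` setting used below) rather than by brute-force rejection from the whole prior. The function names and toy likelihood are hypothetical. The core loop is: discard the lowest-likelihood live point, credit its shell of prior volume to the evidence, and replace it with a new prior draw at higher likelihood.

```python
import numpy as np

def toy_nested_sampling(loglike, ndim, nlive=100, dlogz=0.01, batch=2000, seed=0):
    """Bare-bones nested sampling with a uniform prior on the unit hypercube.

    loglike must be vectorised: (n, ndim) array in, n log-likelihoods out.
    Replacement points come from brute-force rejection sampling of the prior, which is
    only practical for cheap, low-dimensional toy problems. Returns (ln Z, iterations).
    """
    rng = np.random.default_rng(seed)
    live = rng.random((nlive, ndim))          # initial live points drawn from the prior
    live_logl = loglike(live)
    logz = -np.inf                            # running ln(evidence)
    logx = 0.0                                # ln(prior volume inside the current contour)
    log_shell = np.log1p(-np.exp(-1.0 / nlive))  # ln(1 - e^(-1/nlive))
    n_iter = 0
    while True:
        n_iter += 1
        worst = np.argmin(live_logl)
        logl_star = live_logl[worst]
        # shell between X_(i-1) and X_i = X_(i-1) e^(-1/nlive) has width X_(i-1) (1 - e^(-1/nlive))
        logz = np.logaddexp(logz, logl_star + logx + log_shell)
        logx -= 1.0 / nlive
        # replace the worst point by a prior draw with a higher likelihood than logl_star
        while True:
            cand = rng.random((batch, ndim))
            cand_logl = loglike(cand)
            ok = np.flatnonzero(cand_logl > logl_star)
            if ok.size:
                live[worst] = cand[ok[0]]
                live_logl[worst] = cand_logl[ok[0]]
                break
        # stop once the live points could raise ln Z by less than dlogz
        if np.logaddexp(logz, live_logl.max() + logx) - logz < dlogz:
            break
    # the final live points share the remaining prior volume equally
    logz = np.logaddexp(logz, np.logaddexp.reduce(live_logl) + logx - np.log(nlive))
    return logz, n_iter

def toy_gaussian_loglike(theta, mu=0.5, sigma=0.1):
    """Normalised isotropic Gaussian; almost all of its mass lies inside the unit cube, so ln Z is close to 0."""
    ndim = theta.shape[1]
    chi2 = np.sum((theta - mu) ** 2, axis=1) / sigma**2
    return -0.5 * chi2 - ndim * np.log(sigma * np.sqrt(2.0 * np.pi))

logz, n_iter = toy_nested_sampling(toy_gaussian_loglike, ndim=2, nlive=100, seed=42)
print(f'ln(Z) = {logz:.2f} after {n_iter} iterations')
```

The scatter of ln(Z) between runs shrinks roughly as one over the square root of the number of live points, which is one reason more live points give a more trustworthy evidence.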

In [None]:
#set up a "dynesty" nested sampling run--see https://dynesty.readthedocs.io/en/latest/index.html
#a super cool, useful comparison of the MCMC/nested samplers out there...
#http://mattpitkin.github.io/samplers-demo/pages/samplers-samplers-everywhere/
#(feel free to mix and match samplers--probably not a bad idea...)
#for safety, just reloading everything again
import numpy as np
from matplotlib import pyplot as plt
import dynesty
from multiprocessing import Pool
from fm import *
import pickle
#load cross-sections between wnomin and wnomax
xsecs=xsects_HST(6000,9400)  #make sure this range is *larger* than the data wavelength grid (but not by too much)
#6000 cm-1 (1.68 um) to 9400 cm-1 (1.06 um)

# Defining log-likelihood function
This computes a "chi-square" log-likelihood. The input is "theta", which plays the same role as "x" (the parameter state vector) but contains only the parameters being retrieved, a sub-set of the full "x" defined above. The first block of the function sets every parameter passed into the forward model to a generic default value. These defaults are then overridden by the values in "theta" (the parameters to retrieve). In this particular example only Tirr, H2O, CO, CO2, CH4, logKcld, and xRp are retrieved; all other inputs to the forward model keep their default values.
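For independent Gaussian error bars, the chi-square form used in `loglike()` equals the full Gaussian log-likelihood minus a term that depends only on the error bars. The short sketch below (hypothetical helper names, toy numbers) checks that this term is the same for any model: it leaves the posterior unchanged and shifts every ln(Z) computed on the same data by the same amount, so evidence differences between models fitted to the same data are unaffected.

```python
import numpy as np

def chi2_loglike(y_meas, y_model, err):
    """The 'quadratic' log-likelihood used in loglike(): -chi^2 / 2."""
    return -0.5 * np.sum((y_meas - y_model) ** 2 / err**2)

def full_gaussian_loglike(y_meas, y_model, err):
    """Independent Gaussian errors, including the normalisation term that chi2_loglike drops."""
    return chi2_loglike(y_meas, y_model, err) - 0.5 * np.sum(np.log(2.0 * np.pi * err**2))

y_meas = np.array([0.0121, 0.0123, 0.0122])
err = np.array([1e-4, 1e-4, 2e-4])
for y_model in (np.full(3, 0.0122), np.full(3, 0.0125)):
    # the difference is identical for both models: only ln(Z) shifts, not the posterior shape
    print(full_gaussian_loglike(y_meas, y_model, err) - chi2_loglike(y_meas, y_model, err))
```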

In [None]:
#defining log-likelihood function
# log-likelihood
def loglike(theta):

    #setting default parameters---will be fixed to these values unless replaced with 'theta'
    #planet/star system params--xRp is the "Rp" free parameter, M right now is fixed, but could be free param
    Rp= 1.036  # Planet radius in Jupiter Radii--this will be forced to be the 10 bar radius--arbitrary (the xRp scaling of it is a free par)
    Rstar=0.667   #Stellar Radius in Solar Radii
    M =2.034    #Mass in Jupiter Masses

    #TP profile params (3--Guillot 2010, Parmentier & Guillot 2013--see Line et al. 2013a for implementation)
    Tirr=1400 #terminator **isothermal** temperature--if full redistribution this is equilibrium temp
    logKir=-5  #TP profile IR opacity controls the "vertical" location of the gradient
    logg1=0.0     #single channel Vis/IR opacity. Controls the delta T between deep T and TOA T
    Tint=0.
    
    #A&M Cloud parameters--includes full multiple scattering (for realzz) in both reflected and emitted light
    logKzz=7 #log Kzz (cm2/s)--higher values make larger particles
    fsed=3.0 #sedimentation efficiency--lower values make "puffier", more extended clouds
    logPbase=-1.0  #log cloud base pressure
    logCldVMR=-25.0 #log cloud condensate base mixing ratio
    
    #simple 'grey+rayleigh' parameters--non scattering--just pure extinction
    logKcld = -40
    logRayAmp = -30
    RaySlope = 0
    
    H2O=-15.
    CH4=-15.
    CO=-15.  
    CO2=-15. 
    NH3=-15.  
    N2=-15.   
    HCN=-15.   
    H2S=-15.  
    PH3=-15.  
    C2H2=-15. 
    C2H6=-15. 
    Na=-15.    
    K=-15.   
    TiO=-15.   
    VO=-15.   
    FeH=-15.  
    H=-15.     
    em=-15. 
    hm=-15.

    
    #unpacking parameters to retrieve (these override the fixed values above)
    Tirr, H2O,CO,CO2,CH4, logKcld, xRp=theta  #same order as prior_transform and the corner-plot labels
   
    ##all values required by forward model go here--even if they are fixed
    x=np.array([Tirr, logKir,logg1, Tint,0,0,0,0, Rp*xRp, Rstar, M, logKzz, fsed,logPbase,logCldVMR, logKcld, logRayAmp, RaySlope])
    # 0   1    2    3   4    5    6     7    8    9   10    11   12   13    14   15   16   17   18  19 20   21
    #H2O  CH4  CO  CO2 NH3  N2   HCN   H2S  PH3  C2H2 C2H6  Na    K   TiO   VO   FeH  H    H2   He   e- h-  mmw
    gas_scale=np.array([H2O,CH4,CO,CO2,NH3,N2,HCN,H2S,PH3,C2H2,C2H6,Na,K,TiO,VO ,FeH,H,-50.,-50.,em, hm,-50.]) #
    foo=fx_trans_free(x,wlgrid,gas_scale,xsecs)
    y_binned=foo[0]
    
    loglikelihood=-0.5*np.sum((y_meas-y_binned)**2/err**2)  #your typical "quadratic" or "chi-square"
    
    return loglikelihood


# Defining the Prior Function
Nested samplers use a "unit cube" concept whereby every parameter is first sampled on the interval [0,1]. This makes the sampling more or less scale-independent. The prior_transform function maps those unit-cube values back onto the actual parameter ranges we want. In this case, all of the priors are uniform over the ranges defined here. Live points are drawn uniformly from this hypercube, and their (transformed) parameter values are then assessed by the log-likelihood function.
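Every uniform prior has the form lower + (upper - lower) * u, so the transform can also be written as a table of bounds. This is a sketch with a hypothetical `BOUNDS` array whose rows mirror the numbers hard-coded in `prior_transform()` in the next cell; if you change one, change the other.

```python
import numpy as np

# (lower, upper) bounds matching prior_transform(): Tirr [K], log10 VMRs of H2O, CO, CO2, CH4, logKcld, xRp
BOUNDS = np.array([
    [100.0, 2000.0],   # Tirr
    [-12.0, 0.0],      # H2O
    [-12.0, 0.0],      # CO
    [-12.0, 0.0],      # CO2
    [-12.0, 0.0],      # CH4
    [-45.0, -25.0],    # logKcld
    [0.5, 1.5],        # xRp
])

def uniform_prior_transform(u, bounds=BOUNDS):
    """Map a point u in the unit hypercube [0, 1]^ndim onto uniform priors: lower + (upper - lower) * u."""
    u = np.asarray(u, dtype=float)
    lower, upper = bounds[:, 0], bounds[:, 1]
    return lower + (upper - lower) * u

print(uniform_prior_transform(np.full(7, 0.5)))  # centre of every prior range
```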

In [None]:
#defining prior cube (cube is a standard multi-nest way of doing priors)
def prior_transform(utheta):
    Tirr, H2O,CO,CO2,CH4, logKcld, xRp=utheta
    #uniform prior ranges--each "variable", say Tirr is sampled over the interval [0,1]--the numbers here transform that
    Tirr = 1900 * Tirr + 100  #Tirr uniform from 100 - 2000 K (upper bound = multiplier + lower value)
    H2O=12*H2O-12  #log10 VMR uniform from -12 to 0 (same for CO, CO2, CH4)
    CO=12*CO-12
    CO2=12*CO2-12
    CH4=12*CH4-12
    logKcld=20*logKcld-45  #uniform from -45 to -25
    xRp=1*xRp+0.5  #uniform from 0.5 to 1.5

    return Tirr, H2O,CO,CO2,CH4, logKcld, xRp




# Loading in Data and Setting up Nested Sampling Parameters
This segment first loads in the "3-column" data file: a plain-text (ASCII) file containing the wavelength grid, data values, and error bars. The other knobs are self-explanatory. The number of live points is problem dependent. Some problems can get away with fewer, but I wouldn't go below 100. It's safer to use more (e.g., just like walkers in emcee). For "real science" I prefer 1000+ live points to make sure the posterior is well sampled and no modes are missed. Of course, this takes longer to run, but there's no sense in getting the wrong answer faster!
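The `np.loadtxt(...).T` call in the next cell unpacks the three columns directly. A slightly more defensive sketch (the helper name `load_three_column` is hypothetical) also checks the column count and uses `ndmin=2` so that a file with a single row still unpacks correctly; lines starting with `#` are skipped as comments by default.

```python
import io
import numpy as np

def load_three_column(source):
    """Read wavelength [micron], transit depth, and 1-sigma error columns from whitespace-delimited text."""
    data = np.loadtxt(source, ndmin=2)
    if data.shape[1] != 3:
        raise ValueError(f'expected 3 columns, got {data.shape[1]}')
    wlgrid, y_meas, err = data.T
    return wlgrid, y_meas, err

# usage with an in-memory file; a real run would pass a path such as 'w43b_trans.txt'
demo = io.StringIO("# wl depth err\n1.10 0.0120 0.0001\n1.20 0.0121 0.0002\n")
print(load_three_column(demo))
```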

In [None]:
#setting up other dynesty run params and loading in the data
#WASP43b transmission spectrum...a 3 column ascii file
wlgrid, y_meas, err=np.loadtxt('w43b_trans.txt').T
outname='./OUTPUT/dyn_output_trans_WFC3_FREE.pic'  #dynesty output file name (saved as a pickle)
Nparam=7  #number of parameters--make sure it is the same as what is in prior and loglike
Nproc=4  #number of processors for multi processing--best if you can run on a 12 core+ node or something
Nlive=500 #number of nested sampling live points


# Running Nested Sampler
This calls the Nested Sampler function to compute the posterior.  This may take a few hours depending on the number of parameters and number of live points.
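A rough way to anticipate the run time is to scale the cost of a finished run. This sketch (hypothetical helper names) assumes perfect parallel scaling and a fixed number of likelihood calls; in practice the number of calls grows with the number of live points and parameters, so treat it as a back-of-the-envelope estimate only.

```python
def seconds_per_call(wall_time_s, n_calls, n_proc):
    """Effective cost of one likelihood call implied by a finished run (assumes perfect parallel scaling)."""
    return wall_time_s * n_proc / n_calls

def estimated_runtime(n_calls, t_call_s, n_proc):
    """Rough wall-clock time [s] for n_calls likelihood calls shared over n_proc processes."""
    return n_calls * t_call_s / n_proc

# numbers quoted in the comments of the run cell below: 51931 lnL calls in 4309 s with Nproc=4
t_call = seconds_per_call(4309.0, 51931, 4)
print(f'~{t_call:.2f} s per likelihood call')
print(f'same run on 8 processes: ~{estimated_runtime(51931, t_call, 8) / 3600:.1f} hr (ideal scaling)')
```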

In [None]:
#running the "standard" nested sampler.  Again, see https://dynesty.readthedocs.io/en/latest/index.html for details
#depending on the number of live points (I like lots, usually 1000+, but I'm paranoid), number of params, and number of
#processors on your computer, this could take some time. -- your computer will make loud noises..1-2 hours with 4 cores
#in real life, you should probably run this on a multi-core machine (8+)
#in really real life, you should probably use pymultinest...it tends to "converge" faster with fewer likelihood
#evaluations than does dynesty, even with the same number of live points (see: https://mattpitkin.github.io/samplers-demo/pages/samplers-samplers-everywhere/)
#e.g., when I ran this with 56 cores (2x28 core nodes) with multinest it took ~20 min.
import time
pool = Pool(processes=Nproc)
dsampler = dynesty.NestedSampler(loglike, prior_transform, ndim=Nparam,
                                        bound='multi', sample='auto', nlive=Nlive,
                                        update_interval=3., pool=pool, queue_size=Nproc)
#this executes and runs it
t1=time.time()
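dsampler.run_nested()
t2=time.time()
pool.close()  #release the worker processes once sampling is done
pool.join()
print("Run Time:", t2-t1)
#extracting results from sampler object
dres = dsampler.results
#dumping as a pickle (the with-block closes the file so the pickle is fully written)
with open(outname,'wb') as f:
    pickle.dump(dres,f)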
#some real time dynesty output/status will pop up down here (this one took 51931 lnL calls over 4309s = 1.2 hrs)

# Plotting Corner Plot
Plots the corner plots and spectral fit plot. Note: All of the plotting below can be used for a "previously" generated run. Just start here and load in the sampler output.

In [None]:
#plotting the dynesty run's corner plot
from matplotlib import pyplot as plt
from dynesty import plotting as dyplot
import pickle


labels=['Tirr', 'H2O', 'CO', 'CO2', 'CH4' ,'logKcld','xRp']

#import past run 
#samples=pickle.load(open('dyn_output_100LP.pic','rb')) #an example 100 live point run
samples=pickle.load(open('./OUTPUT/dyn_output_trans_WFC3_FREE.pic','rb')) 
#samples=pickle.load(open('dyn_output_1000LP.pic','rb'))  #an example 1000 live point run

#printing evidence:
print('ln(Z)= ', samples.logz[-1])

# corner plot
#NOTE...DEFAULT CONFIDENCE INTERVAL IS THE 95%!!!
fig, axes = dyplot.cornerplot(samples,smooth=0.05, color='blue',show_titles=True, labels=labels,title_kwargs={'y': 1.04}, fig=plt.subplots(7, 7, figsize=(12, 12)))
plt.savefig('./plots/Dynesty_WFC3_free_stair_pairs.pdf',format='pdf')
plt.show()
#plt.close()




# Plotting Spectral Fits
Generate spectra from the parameters of a random subset of samples drawn from the posterior. As always, these spectra are then summarized by their median and 1- and 2-sigma confidence intervals.
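Nested-sampling output is a set of weighted points, so it has to be converted to equally weighted posterior samples before draws can be picked uniformly at random; `dyfunc.resample_equal` does this in dynesty, and its documentation describes systematic resampling. Below is a NumPy sketch of that scheme with hypothetical function names; it is not dynesty's code, which may differ in details (for example, shuffling the output).

```python
import numpy as np

def importance_weights(logwt, logz_final):
    """Posterior weight of each nested-sampling point, as in np.exp(logwt - logz[-1])."""
    w = np.exp(np.asarray(logwt, dtype=float) - logz_final)
    return w / w.sum()  # renormalise so round-off cannot leave the sum slightly off 1

def systematic_resample(samples, weights, rng=None):
    """Return len(weights) equally weighted samples drawn by systematic resampling."""
    rng = np.random.default_rng(rng)
    n = len(weights)
    # a single random offset, then n evenly spaced positions on [0, 1)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against round-off leaving the last position uncovered
    # index of the first cumulative weight strictly greater than each position
    idx = np.searchsorted(cumulative, positions, side='right')
    return np.asarray(samples)[idx]

pts = np.arange(8.0).reshape(4, 2)  # four toy 2-parameter samples
print(systematic_resample(pts, np.array([0.1, 0.2, 0.6, 0.1]), rng=0))
```

Compared with plain multinomial draws, systematic resampling places each point in the output either floor or ceil of (weight x N) times, which keeps the resampled set closer to the original weights.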

In [None]:
import numpy as np
xsecs=xsects_HST(2000, 30000)

Nspectra=200

#loading in data again just to be safe
wlgrid, y_meas, err=np.loadtxt('w43b_trans.txt').T


#setting up default parameter values--SET THESE TO THE SAME VALUES AS IN THE LOG-LIKE FUNCTION
#they will be fixed to these values unless replaced by the posterior draws below
#planet/star system params--xRp is the "Rp" free parameter, M right now is fixed, but could be free param
Rp= 1.036  # Planet radius in Jupiter Radii--this will be forced to be the 10 bar radius--arbitrary (the xRp scaling of it is a free par)
Rstar=0.667   #Stellar Radius in Solar Radii
M =2.034    #Mass in Jupiter Masses

#TP profile params (3--Guillot 2010, Parmentier & Guillot 2013--see Line et al. 2013a for implementation)
Tirr=1400 #terminator **isothermal** temperature--if full redistribution this is equilibrium temp
logKir=-5  #TP profile IR opacity controls the "vertical" location of the gradient (same as in loglike)
logg1=0.0     #single channel Vis/IR opacity. Controls the delta T between deep T and TOA T (same as in loglike)
Tint=0.

#A&M Cloud parameters--includes full multiple scattering (for realzz) in both reflected and emitted light
logKzz=7 #log Kzz (cm2/s)--higher values make larger particles
fsed=3.0 #sedimentation efficiency--lower values make "puffier", more extended clouds
logPbase=-1.0  #log cloud base pressure
logCldVMR=-25.0 #log cloud condensate base mixing ratio

#simple 'grey+rayleigh' parameters--non scattering--just pure extinction
logKcld = -40
logRayAmp = -30
RaySlope = 0

H2O=-15.
CH4=-15.
CO=-15.  
CO2=-15. 
NH3=-15.  
N2=-15.   
HCN=-15.   
H2S=-15.  
PH3=-15.  
C2H2=-15. 
C2H6=-15. 
Na=-15.    
K=-15.   
TiO=-15.   
VO=-15.   
FeH=-15.  
H=-15.     
em=-15. 
hm=-15.


#weighting the posterior samples for appropriate random drawing
from dynesty import utils as dyfunc
samp, wts = samples.samples, np.exp(samples.logwt - samples.logz[-1])
samples2 = dyfunc.resample_equal(samp, wts)

#choosing random indices to draw from the properly weighted posterior samples
draws=np.random.randint(0, samples2.shape[0], Nspectra)
Nwno_bins=xsecs[2].shape[0]
y_mod_array=np.zeros((Nwno_bins, Nspectra))
y_binned_array=np.zeros((len(wlgrid), Nspectra))

for i in range(Nspectra):
    print(i)
    #make sure this is the same as in log-Like
    Tirr, H2O,CO,CO2,CH4, logKcld, xRp=samples2[draws[i],:]
    x=np.array([Tirr, logKir,logg1, Tint,0,0,0,0, Rp*xRp, Rstar, M, logKzz, fsed,logPbase,logCldVMR, logKcld, logRayAmp, RaySlope])
    print(samples2[draws[i],:])  #the equally weighted sample actually used for this spectrum
    gas_scale=np.array([H2O,CH4,CO,CO2,NH3,N2,HCN,H2S,PH3,C2H2,C2H6,Na,K,TiO,VO ,FeH,H,-50.,-50.,em, hm,-50.]) #
    y_binned,y_mod,wno,atm=fx_trans_free(x,wlgrid,gas_scale,xsecs)   
    y_mod_array[:,i]=y_mod
    y_binned_array[:,i]=y_binned
    
#saving these arrays since it takes a few minutes to generate    
pickle.dump([wlgrid, y_meas, err, y_binned_array, wno, y_mod_array],open('./OUTPUT/spectral_samples_trans_dyn_wfc3_free.pic','wb'))



In [None]:
wlgrid, y_meas, err, y_binned_array, wno, y_mod_array=pickle.load(open('./OUTPUT/spectral_samples_trans_dyn_wfc3_free.pic','rb'))
y_median=np.zeros(wno.shape[0])
y_high_1sig=np.zeros(wno.shape[0])
y_high_2sig=np.zeros(wno.shape[0])
y_low_1sig=np.zeros(wno.shape[0])
y_low_2sig=np.zeros(wno.shape[0])

for i in range(wno.shape[0]):
    percentiles=np.percentile(y_mod_array[i,:],[2.275, 15.87, 50, 84.13, 97.725])  #central 2-sigma (95.45%) and 1-sigma (68.27%) intervals
    y_low_2sig[i]=percentiles[0]
    y_low_1sig[i]=percentiles[1]
    y_median[i]=percentiles[2]
    y_high_1sig[i]=percentiles[3]
    y_high_2sig[i]=percentiles[4]
    
    
from matplotlib.pyplot import *
from matplotlib.ticker import FormatStrFormatter

ymin=np.min(y_meas)*1E2*0.995
ymax=np.max(y_meas)*1E2*1.005
fig1, ax=subplots()
xlabel(r'$\lambda$ ($\mu$m)',fontsize=14)
ylabel(r'(R$_{p}$/R$_{*}$)$^{2} \%$',fontsize=14)
minorticks_on()


#for i in range(20): plot(wlgrid, y_binned_array[:,i]*100.,alpha=0.5,color='red')
#for i in range(20): plot(1E4/wno, y_mod_array[:,i]*100.,alpha=0.5,color='red')

fill_between(1E4/wno[::-1],y_low_2sig[::-1]*100,y_high_2sig[::-1]*100,facecolor='r',alpha=0.5,edgecolor='None')  
fill_between(1E4/wno[::-1],y_low_1sig[::-1]*100,y_high_1sig[::-1]*100,facecolor='r',alpha=1.,edgecolor='None')  


errorbar(wlgrid, y_meas*100, yerr=err*100, xerr=None, fmt='Dk')
plot(1E4/wno, y_median*1E2)
ax.set_xscale('log')
ax.set_xticks([0.3, 0.5,0.8,1,1.4, 2, 3, 4, 5])
ax.axis([0.3,5.0,ymin,ymax])
ax.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
ax.tick_params(length=5,width=1,labelsize='small',which='major')
savefig('./plots/dyn_transmission_spectrum_fits_WFC3_FREE.pdf',format='pdf')
show()
#close()
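Gaussian-equivalent percentiles follow from the error function: the central n-sigma interval spans 50(1 ± erf(n/√2)) percent, i.e. about 15.87 to 84.13 for 1 sigma and 2.275 to 97.725 for 2 sigma. The sketch below (hypothetical helper names) computes them and replaces the per-wavenumber loop with one vectorized `np.percentile` call along the draws axis, assuming `y_mod_array` has shape (number of wavenumbers, number of draws) as built above.

```python
import math
import numpy as np

def sigma_percentiles(n_sigma):
    """Percentiles bounding the central n-sigma probability of a Gaussian (68.27% for 1, 95.45% for 2)."""
    enclosed = math.erf(n_sigma / math.sqrt(2.0))
    return 50.0 * (1.0 - enclosed), 50.0 * (1.0 + enclosed)

def spectral_bands(y_mod_array):
    """Median and 1-/2-sigma envelopes over posterior draws; y_mod_array is (n_wavenumber, n_draws)."""
    lo1, hi1 = sigma_percentiles(1)
    lo2, hi2 = sigma_percentiles(2)
    q = np.percentile(y_mod_array, [lo2, lo1, 50.0, hi1, hi2], axis=1)
    return {'low_2sig': q[0], 'low_1sig': q[1], 'median': q[2], 'high_1sig': q[3], 'high_2sig': q[4]}

print([round(p, 3) for p in sigma_percentiles(1) + sigma_percentiles(2)])
```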

    

# Doing all of that again, but with PyMultiNest ------------------------------------- 
PyMultiNest (a python wrapper for MultiNest): Buchner et al. 2014 <br>
Paper: https://ui.adsabs.harvard.edu/abs/2014A%26A...564A.125B/abstract<br>
Documentation: https://johannesbuchner.github.io/PyMultiNest/<br><br>
MultiNest: Feroz & Hobson 2008; Feroz, Hobson & Bridges 2009<br>
Paper: https://ui.adsabs.harvard.edu/abs/2009MNRAS.398.1601F/abstract<br>
GitHub: https://github.com/farhanferoz/MultiNest<br><br>
It is highly advised that you read the relevant sections of these papers and the online documentation.

# First have to install MultiNest (a bunch of Fortran, C) and PyMultiNest
While dynesty is easy to install and use, it's not quite as efficient a sampler as MultiNest/PyMultiNest (at least in my experience). Here we will re-do the above retrieval, but with the PyMultiNest package. MultiNest is readily parallelizable (using MPI) on any compute cluster or multicore desktop/laptop. Be forewarned: installing PyMultiNest can be a challenge, as it is a combination of Python, C, and Fortran code. Most compute cluster admins are able to install this package (both within Slurm and Torque). Below I outline the steps that I took on my 2016 MacBook Pro with High Sierra (10.13.6), based on a combination of https://www.astrobetter.com/wiki/MultiNest+Installation+Notes and https://exoai.github.io/software/taurex/installation.

Assuming you have some version of MacPorts, execute the following in this order from a terminal command line (note this can take several hours):
1. sudo port install gcc5
2. sudo port select --set gcc mp-gcc5
3. sudo port install cmake
4. sudo port install openblas +gcc5
5. sudo port install openmpi +gcc5
6. sudo port select --set mpi openmpi-mp-fortran
7. git clone https://github.com/JohannesBuchner/MultiNest.git
8. cd MultiNest/build/
9. cmake ..
10. make
11. sudo make install
12. git clone https://github.com/JohannesBuchner/PyMultiNest.git 
13. cd PyMultiNest 
14. python setup.py install





# Time To Run It
OK, time to run. Sorry, you will have to leave the comfort and safety of this notebook, but I'll walk you through it here. We will be using "mpirun" to run on the multiple processors of your laptop, which I have not figured out how to do from a Jupyter notebook (maybe you can use spawn or whatever, if you're ambitious).

# STEP 1
 Open the routine "call_pymultinest_transmission_wfc3_FREE.py". Have a look, digest it, read the comments, etc. It should look fairly similar to the above dynesty functions (e.g., a log-likelihood, prior cube, etc.). No need to modify anything for this. However, you would modify this script if you wanted to add/remove particular parameters or change the wavelength/wavenumber range of the run.
 
# STEP 2 (really a pseudostep)
Go to the folder (in terminal) where you have downloaded the CHIMERA code (specifically, where "call_pymultinest_transmission_wfc3_FREE.py" lives). To run using one CPU, simply type the following:

python call_pymultinest_transmission_wfc3_FREE.py

Wait a minute. You should see some harmless error messages pop out, followed by "Cross-sections Loaded", followed shortly thereafter by


$*************************$<br>
MultiNest v3.10<br>
Copyright Farhan Feroz & Mike Hobson<br>
Release Jul 2015<br><br>
no. of live points =  500<br> 
dimensionality =    8<br> 
$*************************$ <br>
Starting MultiNest<br>
generating live points



$------------------$

Eventually (after a few minutes) you should see more output:

 live points generated, starting sampling
<br>
Acceptance Rate:                        0.998185<br>
Replacements:                                550<br>
Total Samples:                               551<br>
Nested Sampling ln(Z):            $**************$<br>
Acceptance Rate:                        0.977199<br>
Replacements:                                600<br>
Total Samples:                               614<br>
Nested Sampling ln(Z):            $**************$<br>
Acceptance Rate:                        0.965825<br>
Replacements:                                650<br>
Total Samples:                               673<br>
Nested Sampling ln(Z):            $**************$<br>
Acceptance Rate:                        0.939597<br>
Replacements:                                700<br>
Total Samples:                               745<br>
Nested Sampling ln(Z):            $**************$<br>



Do not be alarmed by the $*****$. It just means that the initial ln(Z) values have too many digits to print. Eventually normal-looking numbers (usually negative) print out here. OK, feel free to kill this now, because we will use MPI to make it faster.

# STEP 3
To use MPI, if everything installed correctly, you just need to type the following into the same command line: <br><br>
mpirun -np 4 python call_pymultinest_transmission_wfc3_FREE.py<br><br>
Note, you can use more than 4 CPUs/cores if your computer has them; use as many as you have. Now, just wait. You should see similar output as for the single CPU, but at a faster rate. For the default setup, with 4 CPUs, this takes $\sim$1 hr 10 min (15452 likelihood calls). When it is done it will print out this:<br>

 ln(ev)=  -25.249086782326863      +/-  0.12562433089471531    
 Total Likelihood Evaluations:        15452<br>
 Sampling finished. Exiting MultiNest<br>
  analysing data from ./pmn$_$transmission/template_.txt<br>
  analysing data from ./pmn$_$transmission/template_.txt<br>
  analysing data from ./pmn$_$transmission/template_.txt<br>
  analysing data from ./pmn$_$transmission/template_.txt<br>




# STEP 4
When this is done, it is time to plot. Open the routine "plot_PMN_transmission.py". This will make the usual obnoxious corner plots and a reconstructed "TP" profile (which is kind of meaningless), and it will generate sample spectra from random parameter vectors drawn from the posterior. This last part takes some time. When it is complete, a nice 1-,2-sigma spectral spread plot will pop up. Feel free to manipulate this to make plots to your liking. Note, the spectral draws are saved as a Python pickle so that you can twiddle with plot adjustments without having to regenerate them each time.

That's it! You are done! After having worked through this notebook you should feel comfortable playing around with the forward model to gain an understanding of how each parameter influences the spectrum, running both the dynesty and PyMultiNest samplers, and plotting up standard output/results. Good luck. Note, there is no warranty; if you get an unphysical answer, that's on you =)

Feel free to move on to the emission tutorial!





