From [single_exposure_modeling_test](single_exposure_modeling_test.ipynb) we know that the current bottleneck is resampling the spectrum. This notebook experiments with ways to do better: first by using a compiled inner loop (Rust, then Cython), and then by trying an ML/generative-model-style approach.

# Resample spectrum directly

In [1]:
import numpy as np

from astropy.io import fits

from matplotlib import pyplot as plt

In [2]:
%load_ext Cython

In [55]:
%%cython --annotate

import numpy as np
cimport cython

@cython.boundscheck(False)  # Deactivate bounds checking
@cython.wraparound(False)   # Deactivate negative indexing.
def integrate_spectrum_cy(double[::1] model, double[::1] model_wav, double[::1] target_wav_bins):
    assert len(model) == len(model_wav)
    
    cdef Py_ssize_t model_size = model.shape[0]
    
    dwav = np.diff(model_wav)
    cdef double[:] dwav_arr = dwav
    
    resampled = np.zeros(len(target_wav_bins) - 1)
    cdef Py_ssize_t resampled_size = target_wav_bins.shape[0] - 1
    cdef double[::1] resampled_arr = resampled
    
    cdef double lower_wave = 0
    cdef double upper_wave = 0
    cdef Py_ssize_t lower_mindex = 0
    cdef Py_ssize_t upper_mindex = 0
    
    cdef Py_ssize_t i, j
    
    for i in range(resampled_size):
        lower_wave = target_wav_bins[i]
        upper_wave = target_wav_bins[i+1]
    
        for j in range(model_size):
            if model_wav[j] > lower_wave:
                lower_mindex = j
                break
                
        for j in range(lower_mindex, model_size):
            if model_wav[j] > upper_wave:
                upper_mindex = j -1  # this -1 might not be right
                break
                
        # TODO: deal with the edge pixels properly
                
        for j in range(lower_mindex, upper_mindex):
            resampled_arr[i] += dwav_arr[j] * model[j]

    return resampled
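
For cross-checking the loop above, the same left-hand Riemann sum can be sketched in plain NumPy. This is a hypothetical helper (`integrate_spectrum_np` is not part of the original notebook), and it assumes that `model_wav` is sorted and that every target bin lies inside the model's wavelength range. Like the Cython loop, it ignores partial edge pixels. It swaps the linear scans for a binary search (`np.searchsorted`) and a cumulative sum, so results should agree only up to floating-point rounding.

```python
import numpy as np


def integrate_spectrum_np(model, model_wav, target_wav_bins):
    """Vectorized sketch of the left-Riemann sum used in the Cython loop.

    Assumes model_wav is sorted and all target bins lie inside its range.
    Partial (edge) pixels are ignored, as in the loop version.
    """
    model = np.asarray(model, dtype=float)
    model_wav = np.asarray(model_wav, dtype=float)
    target_wav_bins = np.asarray(target_wav_bins, dtype=float)

    # cumulative[k] = sum over j < k of model[j] * (model_wav[j+1] - model_wav[j])
    cumulative = np.concatenate(
        [[0.0], np.cumsum(model[:-1] * np.diff(model_wav))]
    )

    # Index of the first model point strictly above each bin edge.
    first_above = np.searchsorted(model_wav, target_wav_bins, side="right")
    lower = first_above[:-1]
    # Same "-1" as in the loop; bins containing no model points integrate to 0.
    upper = np.maximum(first_above[1:] - 1, lower)

    return cumulative[upper] - cumulative[lower]
```

Comparing its output with `integrate_spectrum_cy` via `np.allclose` would be one way to check the index bookkeeping before settling the `-1` question.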

In [56]:
%%time

v = 0
integrate_spectrum_cy(phoenix_model, phoenix_wave*(1+v/3e5), spec_wl_bins)

CPU times: user 776 ms, sys: 3.25 ms, total: 780 ms
Wall time: 779 ms


array([7.13169459e+13, 6.97866645e+13, 6.69200754e+13, ...,
       7.04013630e+13, 7.22418473e+13, 7.12038230e+13])

Clearly the Rust compiler/NumPy interaction needs some tweaking: the Rust version at the end of this notebook took 1 min 34 s for the same call. The 779 ms above seems more reasonable.

In [61]:
spec_wl_bins

array([11571.856     , 11574.21226904, 11576.56853809, ...,
       16392.78246191, 16395.13873096, 16397.495     ])

In [71]:
msk = (spec_wl_bins[0]*0.9 < phoenix_wave)&(phoenix_wave < spec_wl_bins[-1]*1.1)
np.sum(msk), np.sum(msk)/len(msk)

(330508, 0.21063163744449145)

In [65]:
sub_phoenix_model = phoenix_model[msk]
sub_phoenix_wave = phoenix_wave[msk]

In [66]:
%%time

v = 100
integrate_spectrum_cy(sub_phoenix_model, sub_phoenix_wave*(1+v/3e5), spec_wl_bins)

CPU times: user 133 ms, sys: 9.47 ms, total: 142 ms
Wall time: 141 ms


array([6.92313694e+13, 7.03019160e+13, 7.07506249e+13, ...,
       5.33259905e+13, 6.31527975e+13, 7.36211956e+13])

Since we might be able to get away with subsampling by a factor of ~10 after LSF convolution, try that as well:

In [72]:
sub_phoenix_model = np.ascontiguousarray(phoenix_model[msk][::10])
sub_phoenix_wave = np.ascontiguousarray(phoenix_wave[msk][::10])
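
The cell above subsamples without smoothing first. A minimal sketch of the convolve-then-subsample step follows, under loud assumptions: the LSF is taken to be a Gaussian of constant width in pixels, the model is treated as roughly uniformly sampled over the masked range, and both `smooth_and_subsample` and `sigma_pix` are hypothetical names and values, not something measured in this notebook.

```python
import numpy as np


def smooth_and_subsample(flux, sigma_pix, step=10):
    """Convolve flux with a unit-area Gaussian kernel, then keep every step-th sample.

    sigma_pix is the assumed Gaussian LSF width in model pixels (hypothetical).
    """
    flux = np.asarray(flux, dtype=float)

    # Truncate the kernel at +/- 4 sigma and normalize it to unit area.
    half_width = int(np.ceil(4 * sigma_pix))
    offsets = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-0.5 * (offsets / sigma_pix) ** 2)
    kernel /= kernel.sum()

    # mode="same" keeps the length; the first/last half_width samples are
    # affected by zero padding at the edges.
    smoothed = np.convolve(flux, kernel, mode="same")

    # Strided slices are not contiguous, which the double[::1] signature requires.
    return np.ascontiguousarray(smoothed[::step])
```

The wavelength array would be subsampled with the same `[::step]` stride so the two stay aligned.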

In [76]:
%%timeit

v = 100
integrate_spectrum_cy(sub_phoenix_model, sub_phoenix_wave*(1+v/3e5), spec_wl_bins)

13.5 ms ± 664 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [77]:
%%timeit

integrate_spectrum_cy(sub_phoenix_model, sub_phoenix_wave, spec_wl_bins)

12 ms ± 502 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


That seems doable!

# Neural Network

The goal is basically the bottom (decoder) half of an auto-encoder. Two candidate architectures:

* 5-10 fully connected layers, where the final layer has as many neurons as features.
* Deconvolutions. It is not exactly clear what the right thing to do is here: the size of the spectrum would grow from layer to layer and the channel dimension should end at 1, but it is not clear how (or whether) the channel count should grow along the way.
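
To make the two options concrete, here is an untrained NumPy sketch of the shapes involved. Everything in it is an assumption for illustration: "features" is read as the 2048 output pixels implied by `spec_wl_bins`, the three inputs are placeholders, and the layer width, kernel size, and stride are arbitrary. The `deconv_length` helper uses the standard output-length formula for a 1-D transposed convolution without dilation or output padding.

```python
import numpy as np

N_PIXELS = 2048  # len(spec_wl_bins) - 1 in this notebook


def init_decoder(n_inputs, n_hidden, n_layers, n_outputs, seed=0):
    """Random (untrained) weights for n_layers fully connected layers."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + [n_hidden] * (n_layers - 1) + [n_outputs]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def decode(params, x):
    """Forward pass: ReLU hidden layers, linear output with one neuron per pixel."""
    h = np.atleast_2d(np.asarray(x, dtype=float))
    for weights, bias in params[:-1]:
        h = np.maximum(h @ weights + bias, 0.0)
    weights, bias = params[-1]
    return h @ weights + bias


def deconv_length(length, kernel_size, stride, padding=0):
    """Output length of one 1-D transposed convolution (no dilation/output padding)."""
    return (length - 1) * stride - 2 * padding + kernel_size


# Fully connected option: 3 placeholder inputs -> 2048 pixels through 6 layers.
params = init_decoder(n_inputs=3, n_hidden=64, n_layers=6, n_outputs=N_PIXELS)
spectrum = decode(params, [0.33, 0.0, 0.1])  # hypothetical, already-scaled inputs

# Deconvolution option: kernel 4, stride 2, padding 1 doubles the length per layer.
lengths = [8]
for _ in range(8):
    lengths.append(deconv_length(lengths[-1], kernel_size=4, stride=2, padding=1))
```

With these particular choices, eight doubling layers take a length-8 seed to 2048 samples, which fixes the length schedule but leaves the channel schedule open.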

# Rust version
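This turned out to be far slower than Cython 🤷. One likely contributor: the build log below reports a `dev [unoptimized + debuginfo]` profile, so the Rust code was compiled without optimizations.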

In [2]:
%load_ext rustdef
%rustdef deps add ndarray@0.15.0 numpy@0.15.0

load rustdef


     Created library package
    Updating crates.io index
      Adding pyo3 v0.15.1 to dependencies.
             Features:
             + extension-module
             + indoc
             + macros
             + paste
             + pyo3-macros
             + unindent
             - abi3
             - abi3-py310
             - abi3-py36
             - abi3-py37
             - abi3-py38
             - abi3-py39
             - anyhow
             - auto-initialize
             - eyre
             - hashbrown
             - indexmap
             - inventory
             - multiple-pymethods
             - nightly
             - num-bigint
             - num-complex
             - serde
    Updating crates.io index
      Adding ndarray v0.15.0 to dependencies.
             Features:
             + std
             - approx
             - blas
             - cblas-sys
             - docs
             - libc
             - matrixmultiply-threading
             - rayon
             - rayon

The files below should be the ones used in [single_exposure_modeling_test](single_exposure_modeling_test.ipynb). The assertion confirms that the wavelengths are pre-sorted.

In [8]:
phoenix_model = fits.getdata('lte03300-0.00-0.0.PHOENIX-ACES-AGSS-COND-2011-HiRes.fits', 0).astype(float)
phoenix_wave = fits.getdata('WAVE_PHOENIX-ACES-AGSS-COND-2011.fits').astype(float) # angstrom

assert np.all(np.sort(phoenix_wave) == phoenix_wave)

This is an estimate based on the wavelength ranges in one of the data files.

In [9]:
spec_wl_bins = np.linspace(1.1571856, 1.6397495, 2049) * 10000 # microns->angstroms

Note that the function below does not check that the target bins lie entirely inside the model spectrum, that the various array sizes actually match, or that `resampled` starts out initialized to 0. It should be wrapped in a Python function that does.

In [13]:
%%rustdef
use pyo3::prelude::*;
use numpy::{PyReadonlyArray1, PyArray1};

#[pyfn(m, "integrate_spectrum")]
fn integrate_spectrum_rust<'py>(
    py: Python<'py>,
    model: PyReadonlyArray1<'py, f64>,
    model_wav: PyReadonlyArray1<'py, f64>,
    target_wav_bins: PyReadonlyArray1<'py, f64>,
    resampled: &'py PyArray1<f64>
) -> PyResult<()>{
    let model_arr = model.as_array();
    let model_wav_arr = model_wav.as_array();
    let target_wav_bins_arr = target_wav_bins.as_array();
    let mut resampled_arr = unsafe { resampled.as_array_mut() };
    for i in 0..resampled_arr.len() {
        let lower_wave = target_wav_bins_arr[i];
        let upper_wave = target_wav_bins_arr[i+1];
        
        let mut lower_mindex = 0;
        for j in 0..model_wav_arr.len() {
            if model_wav_arr[j] > lower_wave {
                lower_mindex = j;
                break;
            }
        }
        
        let mut upper_mindex = 0;
        for j in lower_mindex..model_wav_arr.len() {
            if model_wav_arr[j] > upper_wave {
                upper_mindex = j - 1;
                break;
            }
        }
        
        // TODO: deal with the edge pixels properly
        
        for j in lower_mindex..upper_mindex {
            let dwav = model_wav_arr[j] - model_wav_arr[j-1];
            resampled_arr[i] += model_arr[j] * dwav;
        }
    }
    
    Ok(())
}

Building..
🔗 Found pyo3 bindings
🐍 Found CPython 3.11 at /usr/local/bin/python
   Compiling rustdef_cell_03d1f08fdbd33b30996dbbd5005e029fdb76df20 v0.1.0 (/home/m31_jwst_2609er/.rustdef/rustdef_cell_03d1f08fdbd33b30996dbbd5005e029fdb76df20)
  --> rustdef_cell_03d1f08fdbd33b30996dbbd5005e029fdb76df20/src/lib.rs:11:11
   |
11 | #[pyfn(m, "integrate_spectrum")]
   |           ^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(deprecated)]` on by default

    Finished dev [unoptimized + debuginfo] target(s) in 0.48s
📦 Built wheel for CPython 3.11 to /home/m31_jwst_2609er/.rustdef/target/wheels/rustdef_cell_03d1f08fdbd33b30996dbbd5005e029fdb76df20-0.1.0-cp311-cp311-linux_x86_64.whl
Defaulting to user installation because normal site-packages is not writeable
Processing /home/m31_jwst_2609er/.rustdef/target/wheels/rustdef_cell_03d1f08fdbd33b30996dbbd5005e029fdb76df20-0.1.0-cp311-cp311-linux_x86_64.whl
Installing collected packages: ru
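
The wrapper that the note above asks for could look roughly like the sketch below. `integrate_spectrum_checked` is a hypothetical name, and the compiled function is passed in as `kernel` so that the sketch does not depend on how the extension is imported. The range check reads "inside the spectrum" as "all target bin edges lie within the model wavelength range", which is an assumption. That same check also keeps the inner loop away from index 0, where `j-1` would underflow.

```python
import numpy as np


def integrate_spectrum_checked(kernel, model, model_wav, target_wav_bins):
    """Validate inputs, allocate a zeroed output, and call the in-place kernel.

    kernel is any callable with the signature
    kernel(model, model_wav, target_wav_bins, resampled) that accumulates
    into resampled, e.g. the Rust function defined above.
    """
    model = np.ascontiguousarray(model, dtype=np.float64)
    model_wav = np.ascontiguousarray(model_wav, dtype=np.float64)
    target_wav_bins = np.ascontiguousarray(target_wav_bins, dtype=np.float64)

    if model.ndim != 1 or model.shape != model_wav.shape:
        raise ValueError("model and model_wav must be 1-D arrays of equal length")
    if target_wav_bins.ndim != 1 or target_wav_bins.size < 2:
        raise ValueError("target_wav_bins must be 1-D with at least two edges")
    if np.any(np.diff(model_wav) < 0):
        raise ValueError("model_wav must be sorted in increasing order")
    if not (model_wav[0] <= target_wav_bins[0]
            and target_wav_bins[-1] < model_wav[-1]):
        raise ValueError("target bins must lie inside the model wavelength range")

    # The kernel accumulates with +=, so the output must start at zero.
    resampled = np.zeros(target_wav_bins.size - 1)
    kernel(model, model_wav, target_wav_bins, resampled)
    return resampled
```

Usage would then be `integrate_spectrum_checked(integrate_spectrum_rust, phoenix_model, phoenix_wave, spec_wl_bins)`, with no need to allocate `resampled_model` by hand.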

In [16]:
%%time

resampled_model = np.zeros(spec_wl_bins.size-1)
v = 0
integrate_spectrum_rust(phoenix_model, phoenix_wave*(1+v/3e5), spec_wl_bins, resampled_model)
resampled_model

CPU times: user 1min 34s, sys: 72.9 ms, total: 1min 34s
Wall time: 1min 34s


array([7.13169459e+13, 6.97866645e+13, 6.69200754e+13, ...,
       7.04013630e+13, 7.22418473e+13, 7.12038230e+13])

Well, that's really slow. Try switching to Cython, since that might be easier to optimize.