### Time cost of semblance calculation

The main problem with semblance calculation is that it is a very time-consuming operation. This notebook shows a few approaches to semblance calculation and compares how much time each of them takes.

In [1]:
import sys

import numpy as np
import matplotlib.pyplot as plt

sys.path.append('..')

from seismicpro.batchflow import ImagesBatch, Dataset, V, B, Pipeline, D
from seismicpro import FieldIndex, CustomIndex, SeismicDataset, seismic_plot, SeismicBatch

In [2]:
path_raw = '/data/H2_PAL/SEMB/H2_PAL_va_supergather_11.sgy'

index_raw = FieldIndex(name='raw', extra_headers=['offset', 'CDP'], path=path_raw)
ix_raw = CustomIndex(index_raw, index_name='CDP')

### Semblance calculation

For semblance calculation we use the following formula:
$$S = \frac{\sum_{i=k-N/2}^{k+N/2}\left(\sum_{j=1}^{M} f_{ij}\right)^2}{M \sum_{i=k-N/2}^{k+N/2}\sum_{j=1}^{M} f_{ij}^2} \text{, where}$$

* $k$ - time sample at the center of the window
* $N$ - window size
* $M$ - number of traces
* $f_{ij}$ - amplitude of trace $j$ at time sample $i$
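As a minimal sketch, the formula can be evaluated for a single window with plain numpy. The function name `semblance_at_sample` and the `(n_samples, n_traces)` layout of the gather are assumptions made for illustration, not part of seismicpro, and the gather is assumed to be already corrected for the velocity being tested.

```python
import numpy as np

def semblance_at_sample(gather, k, window):
    """Semblance of a (n_samples, n_traces) gather in a window centered on sample k.

    Hypothetical helper for illustration only; not the seismicpro implementation.
    """
    half = window // 2
    # Clip the window at the edges of the gather.
    lo, hi = max(k - half, 0), min(k + half + 1, gather.shape[0])
    chunk = gather[lo:hi]
    n_traces = gather.shape[1]
    # Numerator: squared sum over traces, accumulated over the window.
    numerator = np.sum(chunk.sum(axis=1) ** 2)
    # Denominator: M times the total energy in the window.
    denominator = n_traces * np.sum(chunk ** 2)
    return numerator / denominator if denominator > 0 else 0.0

# Identical traces are perfectly coherent, so the value should be close to 1.
rng = np.random.default_rng(0)
trace = rng.normal(size=200)
coherent = np.tile(trace[:, None], (1, 12))
print(semblance_at_sample(coherent, k=100, window=51))
```

Traces that cancel each other out drive the numerator, and hence the semblance, towards 0.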

This section contains a few approaches to semblance calculation that aim to reduce the computation time.

The first approach uses numba with three nested loops. It is implemented in the function ```_calc_semb_hard```, which can be found in the utils [here](../seismicpro/src/semblance_utils.py).
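The three-loop structure could look roughly like the sketch below: one loop over window centers, one over samples in the window, and one over traces. This is not the code of `_calc_semb_hard`; the name `semblance_loops` is hypothetical, the gather is assumed to be a float array of shape `(n_samples, n_traces)` that is already corrected for one velocity, and the fallback decorator is only there so that the example still runs without numba.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback (assumption for portability): run as plain Python without numba.
    def njit(func):
        return func

@njit
def semblance_loops(gather, window):
    """Semblance for every time sample using three explicit nested loops."""
    n_samples, n_traces = gather.shape
    half = window // 2
    out = np.zeros(n_samples)
    for k in range(n_samples):              # loop 1: window center
        lo = max(k - half, 0)
        hi = min(k + half + 1, n_samples)
        num = 0.0
        den = 0.0
        for i in range(lo, hi):             # loop 2: samples inside the window
            stacked = 0.0
            for j in range(n_traces):       # loop 3: traces
                value = gather[i, j]
                stacked += value
                den += value * value
            num += stacked * stacked
        if den > 0.0:
            out[k] = num / (n_traces * den)
    return out

demo = np.random.default_rng(0).normal(size=(200, 12))
result = semblance_loops(demo, 51)
print(result.shape)
```

Explicit loops of this kind allocate no temporary arrays inside the hot path, which is the sort of code numba's nopython mode is designed to compile.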

In [3]:
dset = SeismicDataset(ix_raw, batch_class=SeismicBatch)

In [4]:
pipeline = (
    dset.p
    .load(fmt='segy', components='raw', tslice=slice(1500))
    .sort_traces(src='raw', dst='raw', sort_by='offset')
    .calculate_semblance('raw', 'semblance_hard', [1200, 6000], 
                         30, window=51, method='hard')
)
pipeline.run(1, n_iters=1, shuffle=1980, profile=True);

In [5]:
pipeline.elapsed_time

4.383702516555786

In [6]:
pipeline.show_profile_info(per_iter=True, detailed=False)

| iter | action | total_time | pipeline_time | batch_id |
|---|---|---|---|---|
| 1 | calculate_semblance #2 | 4.166964 | 4.152993 | 139937300000000.0 |
| 1 | load #0 | 0.170426 | 0.094868 | 139937300000000.0 |
| 1 | sort_traces #1 | 0.021943 | 0.014503 | 139937300000000.0 |


The second approach uses a single numba loop with matrix operations. It is implemented in the function ```_calc_semb_hard_numba_mx```.

In [7]:
pipeline_v2 = (
    dset.p
    .load(fmt='segy', components='raw', tslice=slice(1500))
    .sort_traces(src='raw', dst='raw', sort_by='offset')
    .calculate_semblance('raw', 'semblance_numba_matrix', [1200, 6000], 
                         30, window=51, method='numba_matrix')
)
pipeline_v2.run(1, n_iters=1, shuffle=1980, profile=True);

In [8]:
pipeline_v2.elapsed_time

7.747872352600098

In [9]:
pipeline_v2.show_profile_info(per_iter=True, detailed=False)

| iter | action | total_time | pipeline_time | batch_id |
|---|---|---|---|---|
| 1 | calculate_semblance #2 | 7.536926 | 7.523454 | 139937292035744 |
| 1 | load #0 | 0.159981 | 0.137634 | 139937292035744 |
| 1 | sort_traces #1 | 0.024984 | 0.016741 | 139937292035744 |


The third approach uses a single loop with pure numpy matrix operations. It is implemented in the function ```_calc_semb_hard_matrix```.
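A single-loop variant might be sketched as follows, with one Python loop over time samples and numpy reductions over each window. It is an illustration under the same assumptions as before (hypothetical name `semblance_one_loop`, gather of shape `(n_samples, n_traces)`, already corrected for one velocity), not the body of `_calc_semb_hard_matrix`.

```python
import numpy as np

def semblance_one_loop(gather, window):
    """Semblance for every time sample: one loop, numpy reductions per window."""
    n_samples, n_traces = gather.shape
    half = window // 2
    out = np.zeros(n_samples)
    for k in range(n_samples):
        lo, hi = max(k - half, 0), min(k + half + 1, n_samples)
        chunk = gather[lo:hi]                        # (window, n_traces) slice
        num = np.sum(chunk.sum(axis=1) ** 2)         # squared stack over traces
        den = n_traces * np.sum(chunk ** 2)          # M times window energy
        if den > 0:
            out[k] = num / den
    return out

demo_gather = np.random.default_rng(0).normal(size=(200, 12))
print(semblance_one_loop(demo_gather, 51).shape)
```

Each iteration here creates several temporary arrays, which may be one reason a version like this is not automatically faster than compiled explicit loops.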

In [10]:
pipeline_v3 = (
    dset.p
    .load(fmt='segy', components='raw', tslice=slice(1500))
    .sort_traces(src='raw', dst='raw', sort_by='offset')
    .calculate_semblance('raw', 'semblance_matrix', [1200, 6000], 
                         30, window=51, method='matrix')
)
pipeline_v3.run(1, n_iters=1, shuffle=1980, profile=True);

In [13]:
pipeline_v3.elapsed_time

12.361353397369385

In [14]:
pipeline_v3.show_profile_info(per_iter=True, detailed=False)

| iter | action | total_time | pipeline_time | batch_id |
|---|---|---|---|---|
| 1 | calculate_semblance #2 | 12.175185 | 12.162147 | 139937300000000.0 |
| 1 | load #0 | 0.137138 | 0.114964 | 139937300000000.0 |
| 1 | sort_traces #1 | 0.024242 | 0.015421 | 139937300000000.0 |


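To conclude, the fastest method in these runs is numba with three nested loops (about 4.4 s per iteration), followed by the single numba loop with matrix operations (about 7.7 s) and the pure numpy version (about 12.4 s).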