### Time consumption of semblance calculation

The main problem with semblance calculation is that it is a very time-consuming operation. This notebook shows a few approaches to semblance calculation and compares their running time.

In [1]:
import sys

import numpy as np
import matplotlib.pyplot as plt

sys.path.append('..')

from seismicpro.batchflow import ImagesBatch, Dataset, V, B, Pipeline, D
from seismicpro import FieldIndex, CustomIndex, SeismicDataset, seismic_plot, SeismicBatch

In [2]:
path_raw = '/data/H2_PAL/SEMB/H2_PAL_va_supergather_11.sgy'

index_raw = FieldIndex(name='raw', extra_headers=['offset', 'CDP'], path=path_raw)
ix_raw = CustomIndex(index_raw, index_name='CDP')

### Semblance calculation

For semblance calculation we use the following formula:
$$S = \frac{\sum_{i=k-N/2}^{k+N/2}\left(\sum_{j=1}^{M} f_{ij}\right)^2}{M \sum_{i=k-N/2}^{k+N/2}\sum_{j=1}^{M} f_{ij}^2} \text{, where}$$

* $k$ - central time sample of the window
* $N$ - window size
* $M$ - number of traces
* $f_{ij}$ - amplitude value at time sample $i$ on trace $j$
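The formula can be illustrated for a single window with a minimal NumPy sketch. The function name `window_semblance`, the `(n_samples, n_traces)` array layout and the clipping of the window at the gather edges are assumptions made for illustration; this is not the code from `semblance_utils.py`, and the gather is assumed to be already corrected for a trial velocity.

```python
import numpy as np


def window_semblance(gather, k, window):
    """Semblance for one window centred at time sample k.

    gather : 2D array of shape (n_samples, n_traces), assumed layout.
    k      : index of the central time sample.
    window : window size N in samples.
    """
    half = window // 2
    # Clip the window at the edges of the gather (illustrative choice).
    start = max(k - half, 0)
    stop = min(k + half + 1, gather.shape[0])
    f = gather[start:stop]
    n_traces = f.shape[1]

    # Numerator: squared sum over traces, summed over the window.
    numerator = np.sum(np.sum(f, axis=1) ** 2)
    # Denominator: M times the total energy in the window.
    denominator = n_traces * np.sum(f ** 2)
    return numerator / denominator if denominator > 0 else 0.0


coherent = np.ones((100, 24))
print(window_semblance(coherent, k=50, window=51))  # identical traces give 1.0
```

Perfectly coherent traces give a value of 1, while traces that cancel each other out give a value close to 0.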

This section contains a few approaches to semblance calculation aimed at reducing the running time.

The first approach uses numba with 3 nested loops. It is implemented in the function ```_calc_semb_hard```, which can be found in the utils [here](../seismicpro/src/semblance_utils.py).
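To make the idea of three nested loops concrete, here is a hedged sketch of how such a computation might be laid out. It is not the actual `_calc_semb_hard` code: the function name `semblance_loops`, the `(n_samples, n_traces)` layout and the edge handling are assumptions, and the input is assumed to be a gather already corrected for one trial velocity. The sketch falls back to plain Python when numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without numba.
    def njit(func):
        return func


@njit
def semblance_loops(gather, window):
    """Semblance for every time sample of a (n_samples, n_traces) float array."""
    n_samples, n_traces = gather.shape
    half = window // 2
    result = np.zeros(n_samples)
    # Loop 1: central time sample k.
    for k in range(n_samples):
        numerator = 0.0
        denominator = 0.0
        # Loop 2: time samples inside the window, clipped at the edges.
        for i in range(max(k - half, 0), min(k + half + 1, n_samples)):
            trace_sum = 0.0
            # Loop 3: traces.
            for j in range(n_traces):
                value = gather[i, j]
                trace_sum += value
                denominator += value * value
            numerator += trace_sum * trace_sum
        if denominator > 0:
            result[k] = numerator / (n_traces * denominator)
    return result


semblance = semblance_loops(np.ones((20, 5)), 5)
```

Explicit loops like these are slow in pure Python but are the kind of code that numba compiles well, since no temporary arrays are allocated inside the loops.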

In [3]:
dset = SeismicDataset(ix_raw, batch_class=SeismicBatch)

In [4]:
pipeline = (
    dset.p
    .load(fmt='segy', components='raw', tslice=slice(1500))
    .sort_traces(src='raw', dst='raw', sort_by='offset')
    .calculate_semblance('raw', 'semblance_hard', [1200, 6000], 
                         30, window=51, method='hard')
)
pipeline.run(1, n_iters=1, shuffle=1980, profile=True);

In [5]:
pipeline.elapsed_time

4.427786350250244

In [6]:
pipeline.show_profile_info(per_iter=True, detailed=False)

iter,action,total_time,pipeline_time,batch_id
1,calculate_semblance #2,4.211385,4.19837,139795398802848
1,load #0,0.169323,0.09442,139795398802848
1,sort_traces #1,0.023449,0.016205,139795398802848


In [8]:
pipeline.show_profile_info(per_iter=True, detailed=True)

iter,action,id,ncalls,tottime,cumtime
1,calculate_semblance #2,<method 'acquire' of '_thread.lock' objects>::/usr/lib/python3.6/threading.py::263::wait,4,4.196443,4.196443
1,calculate_semblance #2,<method 'acquire' of '_thread.lock' objects>::/usr/lib/python3.6/threading.py::1062::_wait_for_tstate_lock,1,0.000164,0.000164
1,calculate_semblance #2,wait::../seismicpro/batchflow/batchflow/decorators.py::239::wrap_with_threads,1,0.000119,4.196527
1,calculate_semblance #2,wrapped_method::../seismicpro/batchflow/batchflow/decorators.py::31::_action_wrapper,1,0.000101,4.198023
1,calculate_semblance #2,_call_post_fn::../seismicpro/batchflow/batchflow/decorators.py::239::wrap_with_threads,1,9.2e-05,0.000193
1,calculate_semblance #2,remove::../seismicpro/batchflow/batchflow/decorators.py::336::wrapped_method,1,8.3e-05,8.3e-05
1,calculate_semblance #2,wrap_with_threads::../seismicpro/batchflow/batchflow/decorators.py::336::wrapped_method,1,7.9e-05,4.197804
1,calculate_semblance #2,shutdown::/usr/lib/python3.6/concurrent/futures/_base.py::610::__exit__,1,7.6e-05,0.000477
1,calculate_semblance #2,calculate_semblance::../seismicpro/batchflow/batchflow/decorators.py::31::_action_wrapper,1,6.2e-05,4.198152
1,calculate_semblance #2,_action_wrapper::../seismicpro/batchflow/batchflow/pipeline.py::656::_exec_one_action,1,5.7e-05,4.198209


The second approach uses a single numba loop with matrix operations. It is implemented in the function ```_calc_semb_hard_numba_mx```.

In [9]:
pipeline_v2 = (
    dset.p
    .load(fmt='segy', components='raw', tslice=slice(1500))
    .sort_traces(src='raw', dst='raw', sort_by='offset')
    .calculate_semblance('raw', 'semblance_numba_matrix', [1200, 6000], 
                         30, window=51, method='numba_matrix')
)
pipeline_v2.run(1, n_iters=1, shuffle=1980, profile=True);

In [10]:
pipeline_v2.elapsed_time

8.129067182540894

In [11]:
pipeline_v2.show_profile_info(per_iter=True, detailed=False)

iter,action,total_time,pipeline_time,batch_id
1,calculate_semblance #2,7.912481,7.89875,139795400000000.0
1,load #0,0.162511,0.140115,139795400000000.0
1,sort_traces #1,0.025054,0.016523,139795400000000.0


In [12]:
pipeline_v2.show_profile_info(per_iter=True, detailed=True)

iter,action,id,ncalls,tottime,cumtime
1,calculate_semblance #2,<method 'acquire' of '_thread.lock' objects>::/usr/lib/python3.6/threading.py::263::wait,4,7.896239,7.896239
1,calculate_semblance #2,<built-in method _thread.start_new_thread>::/usr/lib/python3.6/threading.py::828::start,1,0.000378,0.000378
1,calculate_semblance #2,wait::../seismicpro/batchflow/batchflow/decorators.py::239::wrap_with_threads,1,0.000155,7.896402
1,calculate_semblance #2,<method 'acquire' of '_thread.lock' objects>::/usr/lib/python3.6/threading.py::1062::_wait_for_tstate_lock,1,0.000113,0.000113
1,calculate_semblance #2,wrapped_method::../seismicpro/batchflow/batchflow/decorators.py::31::_action_wrapper,1,0.000104,7.898332
1,calculate_semblance #2,wrap_with_threads::../seismicpro/batchflow/batchflow/decorators.py::336::wrapped_method,1,9.8e-05,7.898096
1,calculate_semblance #2,remove::../seismicpro/batchflow/batchflow/decorators.py::336::wrapped_method,1,8.9e-05,8.9e-05
1,calculate_semblance #2,wait::/usr/lib/python3.6/threading.py::533::wait,2,8.6e-05,7.896348
1,calculate_semblance #2,calculate_semblance::../seismicpro/batchflow/batchflow/decorators.py::31::_action_wrapper,1,8.4e-05,7.89851
1,calculate_semblance #2,_call_post_fn::../seismicpro/batchflow/batchflow/decorators.py::239::wrap_with_threads,1,7.2e-05,0.000206


The third approach uses a single loop with pure numpy matrix operations. It is implemented in the function ```_calc_semb_hard_matrix```.
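A single loop combined with NumPy array operations could look like the sketch below. It only illustrates the general idea and is not the actual `_calc_semb_hard_matrix` code: the name `semblance_matrix`, the `(n_samples, n_traces)` layout and the clipping of windows at the gather edges are assumptions, and the gather is assumed to be already corrected for one trial velocity.

```python
import numpy as np


def semblance_matrix(gather, window):
    """Semblance for every time sample using one Python loop over samples."""
    n_samples, n_traces = gather.shape
    half = window // 2

    # Per-sample quantities computed once with array operations.
    stacked_sq = gather.sum(axis=1) ** 2   # (sum over traces)^2
    energy = (gather ** 2).sum(axis=1)     # sum over traces of f^2

    result = np.zeros(n_samples)
    # The only explicit loop: slide the window over time samples.
    for k in range(n_samples):
        start = max(k - half, 0)
        stop = min(k + half + 1, n_samples)
        denominator = n_traces * energy[start:stop].sum()
        if denominator > 0:
            result[k] = stacked_sq[start:stop].sum() / denominator
    return result


semblance = semblance_matrix(np.ones((20, 5)), window=5)
```

Here the sums over traces are delegated to NumPy, so only the sliding window remains as a Python-level loop.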

In [13]:
pipeline_v3 = (
    dset.p
    .load(fmt='segy', components='raw', tslice=slice(1500))
    .sort_traces(src='raw', dst='raw', sort_by='offset')
    .calculate_semblance('raw', 'semblance_matrix', [1200, 6000], 
                         30, window=51, method='matrix')
)
pipeline_v3.run(1, n_iters=1, shuffle=1980, profile=True);

In [14]:
pipeline_v3.elapsed_time

12.415731191635132

In [17]:
pipeline_v3.show_profile_info(per_iter=True, detailed=False)

iter,action,total_time,pipeline_time,batch_id
1,calculate_semblance #2,12.192032,12.179128,139795400000000.0
1,load #0,0.176353,0.152676,139795400000000.0
1,sort_traces #1,0.018305,0.011403,139795400000000.0


In [16]:
pipeline_v3.show_profile_info(per_iter=True, detailed=True)

iter,action,id,ncalls,tottime,cumtime
1,calculate_semblance #2,<method 'acquire' of '_thread.lock' objects>::/usr/lib/python3.6/threading.py::263::wait,4,12.177078,12.177078
1,calculate_semblance #2,<method 'acquire' of '_thread.lock' objects>::/usr/lib/python3.6/threading.py::1062::_wait_for_tstate_lock,1,0.000155,0.000155
1,calculate_semblance #2,wait::../seismicpro/batchflow/batchflow/decorators.py::239::wrap_with_threads,1,0.000111,12.177099
1,calculate_semblance #2,<built-in method _thread.start_new_thread>::/usr/lib/python3.6/threading.py::828::start,1,0.000107,0.000107
1,calculate_semblance #2,wrap_with_threads::../seismicpro/batchflow/batchflow/decorators.py::336::wrapped_method,1,9.3e-05,12.178509
1,calculate_semblance #2,wrapped_method::../seismicpro/batchflow/batchflow/decorators.py::31::_action_wrapper,1,9.1e-05,12.178711
1,calculate_semblance #2,_action_wrapper::../seismicpro/batchflow/batchflow/pipeline.py::656::_exec_one_action,1,8.2e-05,12.178958
1,calculate_semblance #2,_call_post_fn::../seismicpro/batchflow/batchflow/decorators.py::239::wrap_with_threads,1,8.1e-05,0.000154
1,calculate_semblance #2,calculate_semblance::../seismicpro/batchflow/batchflow/decorators.py::31::_action_wrapper,1,7.7e-05,12.178876
1,calculate_semblance #2,remove::../seismicpro/batchflow/batchflow/decorators.py::336::wrapped_method,1,6.9e-05,6.9e-05


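To conclude, the fastest method in these runs is numba with 3 nested loops: its pipeline took about 4.4 s, compared with about 8.1 s for the single numba loop with matrix operations and about 12.4 s for the single loop with pure numpy matrix operations.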
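The relative difference between the methods can be worked out from the `elapsed_time` values reported above. The snippet below only restates those three measurements (one iteration each, with profiling enabled), so the ratios are indicative rather than a rigorous benchmark.

```python
# Elapsed times in seconds, copied from the pipeline outputs above.
elapsed = {
    "hard": 4.427786350250244,
    "numba_matrix": 8.129067182540894,
    "matrix": 12.415731191635132,
}

baseline = elapsed["hard"]
# How many times slower each method is than the 3-loop numba version.
ratios = {method: time / baseline for method, time in elapsed.items()}

for method, ratio in ratios.items():
    print(f"{method}: {elapsed[method]:.2f} s, {ratio:.2f}x the 'hard' time")
```

In these runs the `numba_matrix` method took about 1.8 times as long as `hard`, and the `matrix` method about 2.8 times as long.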