
Possible to keep things in the memory? #19

Closed
wjlei1990 opened this issue Jan 16, 2016 · 4 comments

@wjlei1990

Hi Lion,

We are now integrating the whole workflow. For example, we want to combine signal processing (for both observed and synthetic data) and window selection. During the processing we don't want any intermediate I/O. To be more specific, we want the workflow to read in the raw data (one observed and one synthetic ASDF file) at the very beginning, process both, and then select windows. After window selection, only the windows are written out.

Currently I am using the process() method in asdf_data_set.py. However, there is one argument called output_filename:

def process(self, process_function, output_filename, tag_map,
                traceback_limit=3, **kwargs):

meaning the current implementation requires writing the processed files out. If possible, though, my preferred way would be to keep things in memory. I am guessing that goes against the basic design of ASDF, right? When you open a file for reading, it does not even read the whole thing into memory, so there is no such thing as "keeping everything in memory".
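
To make the I/O concern concrete, here is a rough sketch of how I call process() at the moment (the filenames, tags, and the processing function are just placeholders; as I understand it, the processing function receives an ObsPy Stream and Inventory, and the returned stream is written into output_filename):

import pyasdf

def detrend_and_taper(stream, inventory):
    # Called once per waveform; the returned Stream is written to the
    # output file, which is exactly the disk I/O we would like to avoid.
    stream.detrend("linear")
    stream.taper(max_percentage=0.05)
    return stream

ds = pyasdf.ASDFDataSet("observed.h5")  # placeholder input file
ds.process(detrend_and_taper, "observed_processed.h5",
           tag_map={"raw_observed": "processed_observed"})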

Another option I am thinking about would be to modify the process function, so that it takes one observed and one synthetic stream and walks all the way down to window selection. I think that might be the right and feasible way.


In case my words are confusing, let me illustrate a little more:
For example, you have two files, one raw observed ASDF file and one raw synthetic ASDF file. You want to process them and select windows. There are two ways of doing so:

  1. Process the whole observed ASDF file (but keep everything in memory), process the whole synthetic ASDF file (again keeping everything in memory), and then select windows for the traces in memory. The advantage is that my code stays modular, so I can simply assemble the different parts together. The disadvantage is that this method might not fit the current ASDF implementation.
  2. Modify the process_function so it incorporates all the procedures. I think this is possible to implement, but the disadvantage is that it would make the process_function very big and not very user-friendly.

Sorry to bring this up so late. I looked through the code and found that it might involve a lot of changes.

@krischer
Member

Hi Wenjie,

you should be able to do this with the already existing process_two_files_without_parallel_output() method.

http://seismicdata.github.io/pyasdf/parallel_processing.html#process-two-files-without-parallel-output-method

No need to add anything else I think.
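
Roughly, it could look like this (just a sketch: the waveform tags and the select_windows helper are placeholders for whatever you actually use; the function is called once per station with the matching station group from each file, and everything it returns is collected in memory):

import pyasdf

def process(obsd_station_group, synt_station_group):
    # Placeholder waveform tags; use whatever tags your files contain.
    obsd = obsd_station_group.raw_observed
    synt = synt_station_group.synthetic
    # No output file: just return the windows and collect them in memory.
    return select_windows(obsd, synt)

obsd_ds = pyasdf.ASDFDataSet("observed.h5")
synt_ds = pyasdf.ASDFDataSet("synthetic.h5")
results = obsd_ds.process_two_files_without_parallel_output(synt_ds, process)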

> 2. Modify the process_function so it incorporates all the procedures. I think this is possible to implement, but the disadvantage is that it would make the process_function very big and not very user-friendly.

I guess my suggestion also applies to that fear of yours: just split it up into a couple of functions and you should be more than fine.

Let me know if there is some roadblock in that approach and we can certainly add something else as well if required.

Cheers!

Lion

@wjlei1990
Author

Hi Lion,

I assembled an example for the preprocessing workflow based on your suggestion, using the process_two_files_without_parallel_output() method.

This example includes:

  1. observed signal processing
  2. synthetic signal processing
  3. window selection
  4. adjoint source constructor

I uploaded the file here (this is just an example and it is not yet 100% complete), but it delivers what we have discussed.

One concern about this approach is that we have too many in-place function definitions. The major reason is that we need to define some variables outside the functions, since the argument list of the process function passed into process_two_files_without_parallel_output() is quite limited.

Sorry, I named it proc_combo.py.txt for uploading. Please rename it to *.py.
proc_combo.py.txt

@krischer
Member

Hi Wenjie,

Yeah, that is actually an intended design choice. Otherwise the process()/process_two_files_without_parallel_output() methods would have very awkward function signatures.

Python has reasonable support for functional programming, in this case closures and function currying. You are using closures: the function definitions you make bind the outside variables, and thus the functions can see them.
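
As a tiny illustration of the closure pattern (all names are made up):

def make_window_selector(window_params):
    # window_params is defined outside process() but is bound by the
    # closure, so process() can use it without an extra argument.
    def process(obsd_station_group, synt_station_group):
        print("selecting windows with", window_params)
    return process

process_fn = make_window_selector({"min_period": 27.0, "max_period": 60.0})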

An alternative would be to use functools.partial for function currying/partial application in Python. Your example could look like this:

from functools import partial

def combo_func(obsd_station_group, synt_station_group,
               process_obsd_function, process_synt_function,
               window_function, adjoint_source):
    ...


# Bind the four processing functions up front; pyasdf then only needs to
# call combo_func with the two station groups.
results = obsd_ds.process_two_files_without_parallel_output(
    synt_ds, partial(combo_func,
                     process_obsd_function=process_obsd_function,
                     process_synt_function=process_synt_function,
                     window_function=window_function,
                     adjoint_source=adjoint_source))

Using either approach allows you to write a well-structured program.

@wjlei1990
Author

Thanks for the suggestion. I will use functools.partial.
👍
