# MTH5 Example 07

## Make an MTH5 from ZEN data

ZEN instruments are broadband instruments made by Zonge International. They output binary Z3D files, which include enough metadata to be nearly self-describing and can include the coil calibration and board calibration. The files have a GPS stamp every second, which helps keep the timing consistent. The sample rates are 4096, 1024, and 256 samples per second. A common setup is to collect at 256 samples per second for 6-8 hours, then at 4096 samples per second for 5-15 minutes, and repeat that over the course of 1-3 days depending on logistics and the target depth. One issue is that when the ZEN switches sampling rates there is a time gap of about 20 seconds while the instrument changes settings and gets GPS lock. This is only an issue when combining time series to process a longer continuous chunk of data.
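The gap between schedule actions matters when segments are merged, so it can help to locate the gaps first. The following is a minimal sketch using only the standard library. The function name `find_gaps`, the tolerance, and the schedule times are all hypothetical and are not part of MTH5.

```python
from datetime import datetime, timedelta


def find_gaps(segments, tolerance=timedelta(seconds=1)):
    """Return (previous_end, next_start, gap) for each break between segments.

    `segments` is a list of (start, end) datetime pairs sorted by start time.
    """
    gaps = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        gap = next_start - prev_end
        if gap > tolerance:
            gaps.append((prev_end, next_start, gap))
    return gaps


# Hypothetical schedule: 6 hours at 256 samples/s, a 20 s gap,
# then 10 minutes at 4096 samples/s.
t0 = datetime(2017, 7, 1, 0, 0, 0)
segments = [
    (t0, t0 + timedelta(hours=6)),
    (
        t0 + timedelta(hours=6, seconds=20),
        t0 + timedelta(hours=6, minutes=10, seconds=20),
    ),
]

gaps = find_gaps(segments)
for prev_end, next_start, gap in gaps:
    print(f"gap of {gap.total_seconds():.0f} s between {prev_end} and {next_start}")
```

Segments separated by a gap like this would need to be filled or handled separately before being treated as one continuous series.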

This example is from Yellowstone, where the operators collected data continuously for a couple of days at a sample rate of 1024 samples per second. Therefore there is only one run per station.

In [1]:
from pathlib import Path
from mth5.mth5 import MTH5
from mth5.io.zen import Z3DCollection
from mth5 import read_file

2022-09-27 14:47:48,125 [line 135] mth5.setup_logger - INFO: Logging file can be found C:\Users\jpeacock\OneDrive - DOI\Documents\GitHub\mth5\logs\mth5_debug.log


### Path to Z3D files

Set the path to the Z3D files, which are stored locally.

In [2]:
z3d_path = Path().cwd().parent.parent.joinpath("data", "time_series", "zen")

### Z3D Collection

We will use the `Z3DCollection` to assemble the *.z3d* files into a logical order by schedule action or run. 

**Note**: `n_samples` is an estimate based on file size, not on the data. To get an accurate number you should read in the full file. The same applies to `start` and `end`. `start` is based on the schedule start time, which is usually 2 seconds earlier than the data start because of the instrument buffer while changing sampling rates. `end` is based on file size and sample rate.
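To make the relationship between these estimates concrete, here is a small sketch that uses the values of the wb280 `ex` row from the table shown later in this example. The 4-bytes-per-sample figure is an assumption made for illustration and ignores any header or GPS-stamp overhead. It is not taken from the `Z3DCollection` code.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the wb280 "ex" row of the run table in this example
start = datetime(2017, 7, 1, 2, 19, 58, tzinfo=timezone.utc)
sample_rate = 1024.0  # samples per second
n_samples = 247096735  # estimated by the collection
file_size = 988356732  # bytes

# End time implied by the estimated number of samples
estimated_end = start + timedelta(seconds=n_samples / sample_rate)
print(estimated_end.isoformat())

# Assumption: 4 bytes per sample and no header or GPS-stamp overhead
rough_n_samples = file_size // 4
print(rough_n_samples, n_samples - rough_n_samples)
```

The implied end time reproduces the `end` column of that row. The rough count from file size alone differs from the tabulated `n_samples` by a few thousand samples, so the collection presumably applies further corrections.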

`Z3DCollection.get_runs()` returns a two-level ordered dictionary (`OrderedDict`). The first level is keyed by station ID, and each value is in turn an ordered dictionary keyed by run ID. Therefore you can loop over stations and runs.
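The shape of this structure can be sketched without any Z3D files. In the stand-in below, `example_runs` is a hypothetical name, and plain lists of component names replace the pandas DataFrames that the real object holds for each run.

```python
from collections import OrderedDict

# Stand-in for the object returned by get_runs(): station ID -> run ID -> rows.
# Lists of component names are used in place of DataFrames so the sketch
# runs on its own.
example_runs = OrderedDict(
    wb280=OrderedDict(sr1024_0001=["ex", "ey", "hx", "hy", "hz"]),
    wb380=OrderedDict(sr1024_0001=["ex", "ey", "hx", "hy", "hz"]),
)

summary = []
for station_id, station_runs in example_runs.items():
    for run_id, channels in station_runs.items():
        summary.append((station_id, run_id, len(channels)))
        print(f"{station_id} / {run_id}: {len(channels)} channels")
```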

**IMPORTANT**: These data were collected continuously at 1024 samples per second, so we should see one run for each channel for the two stations, wb280 and wb380.

In [3]:
zc = Z3DCollection(z3d_path)
runs = zc.get_runs(sample_rates=[1024])
print(f"Found {len(runs)} station with {len(runs[list(runs.keys())[0]])} runs")

Found 2 station with 1 runs


In [4]:
for station in runs.keys():
    display(runs[station]["sr1024_0001"])

Unnamed: 0,survey,station,run,start,end,channel_id,component,fn,sample_rate,file_size,n_samples,sequence_number,instrument_id,calibration_fn
0,yellowstone,wb280,sr1024_0001,2017-07-01 02:19:58+00:00,2017-07-03 21:21:43.405273+00:00,0,ex,C:\Users\jpeacock\OneDrive - DOI\Documents\Git...,1024.0,988356732,247096735,1,ZEN_016,
1,yellowstone,wb280,sr1024_0001,2017-07-01 02:19:58+00:00,2017-07-03 21:21:43.405273+00:00,0,ey,C:\Users\jpeacock\OneDrive - DOI\Documents\Git...,1024.0,988356732,247096735,1,ZEN_016,
2,yellowstone,wb280,sr1024_0001,2017-07-01 02:19:58+00:00,2017-07-03 21:21:43.405273+00:00,1,hx,C:\Users\jpeacock\OneDrive - DOI\Documents\Git...,1024.0,988359292,247096735,1,ZEN_016,
3,yellowstone,wb280,sr1024_0001,2017-07-01 02:19:58+00:00,2017-07-03 21:21:43.405273+00:00,2,hy,C:\Users\jpeacock\OneDrive - DOI\Documents\Git...,1024.0,988359292,247096735,1,ZEN_016,
4,yellowstone,wb280,sr1024_0001,2017-07-01 02:19:58+00:00,2017-07-03 21:21:43.405273+00:00,3,hz,C:\Users\jpeacock\OneDrive - DOI\Documents\Git...,1024.0,988359804,247096735,1,ZEN_016,


Unnamed: 0,survey,station,run,start,end,channel_id,component,fn,sample_rate,file_size,n_samples,sequence_number,instrument_id,calibration_fn
5,yellowstone,wb380,sr1024_0001,2017-07-02 02:59:58+00:00,2017-07-05 17:43:53.625000+00:00,0,ex,C:\Users\jpeacock\OneDrive - DOI\Documents\Git...,1024.0,1278886912,319729280,1,ZEN_018,
6,yellowstone,wb380,sr1024_0001,2017-07-02 02:59:58+00:00,2017-07-05 17:43:53.625000+00:00,0,ey,C:\Users\jpeacock\OneDrive - DOI\Documents\Git...,1024.0,1278886912,319729280,1,ZEN_018,
7,yellowstone,wb380,sr1024_0001,2017-07-02 02:59:58+00:00,2017-07-05 17:43:53.625000+00:00,1,hx,C:\Users\jpeacock\OneDrive - DOI\Documents\Git...,1024.0,1278889984,319729280,1,ZEN_018,
8,yellowstone,wb380,sr1024_0001,2017-07-02 02:59:58+00:00,2017-07-05 17:43:53.625000+00:00,2,hy,C:\Users\jpeacock\OneDrive - DOI\Documents\Git...,1024.0,1278889984,319729280,1,ZEN_018,
9,yellowstone,wb380,sr1024_0001,2017-07-02 02:59:58+00:00,2017-07-05 17:43:52.609375+00:00,3,hz,C:\Users\jpeacock\OneDrive - DOI\Documents\Git...,1024.0,1278885824,319728240,1,ZEN_018,


## Build MTH5

Now that we have a logical collection of files, let's load them into an MTH5. We will simply loop over the stations, runs, and channels in the ordered dictionary.

There are a few things that we need to keep track of.  

- The station metadata pulled directly from the Z3D files can be input into the station metadata; be sure to use the `write_metadata` method to write the metadata to the MTH5.
- The Z3D files have the coil response and ZEN response embedded in the file, so we can put those into the appropriate filter container in MTH5. This is important for calibrating later.
- Since this is an MTH5 file version 0.2.0, the filters are in the `survey_group`, so add them there.
- If you want to calibrate the data, set `calibrate` to `True`.
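As a rough illustration of what calibration with simple scalar filters amounts to numerically, the sketch below converts raw counts from an electric channel to an electric field. It assumes each filter is a single gain factor. The function name, the conversion factor, and the sample values are made-up placeholders, and this is not how `remove_instrument_response` is implemented.

```python
def counts_to_electric_field(counts, counts_to_mv, dipole_length_m):
    """Scale raw counts to an electric field in mV/km using two scalar gains."""
    # Step 1: digital counts -> millivolts (hypothetical conversion factor)
    millivolts = [c * counts_to_mv for c in counts]
    # Step 2: millivolts -> mV/km by dividing by the dipole length in km
    dipole_length_km = dipole_length_m / 1000.0
    return [mv / dipole_length_km for mv in millivolts]


# Hypothetical numbers for illustration only
raw_counts = [1000, -2000, 500]
field = counts_to_electric_field(raw_counts, counts_to_mv=1e-3, dipole_length_m=50.0)
print(field)
```

Frequency-dependent responses, such as a coil response, cannot be reduced to a single factor like this and need to be applied as a function of frequency.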

In [5]:
calibrate = False
m = MTH5()
if calibrate:
    m.data_level = 2
m.open_mth5(zc.file_path.joinpath("from_z3d.h5"))

In [6]:
survey_group = m.add_survey("yellowstone")

2022-09-27 14:47:48,821 [line 285] mth5.groups.base.MasterSurvey.add_survey - INFO: survey test already exists, returning existing group.


In [7]:
%%time
for station_id in runs.keys():
    station_group = survey_group.stations_group.add_station(station_id)
    station_group.metadata.update(zc.station_metadata_dict[station_id])
    station_group.write_metadata()
    for run_id, run_df in runs[station_id].items():
        run_group = station_group.add_run(run_id)
        for row in run_df.itertuples():
            ch_ts = read_file(row.fn)
            # NOTE: this is where the calibration occurs
            if calibrate:
                ch_ts = ch_ts.remove_instrument_response()
            run_group.from_channel_ts(ch_ts)
            
    # update station metadata from all the new runs
    station_group.validate_station_metadata()

# update survey metadata from added stations 
survey_group.update_survey_metadata()
        

2022-09-27 14:47:48,853 [line 303] mth5.groups.base.MasterStation.add_station - INFO: Station wb280 already exists, returning existing group.
2022-09-27 14:47:48,893 [line 784] mth5.groups.base.Station.add_run - INFO: run sr1024_0001 already exists, returning existing group.


Wall time: 3min 17s



### MTH5 Structure

Have a look at the MTH5 structure and make sure it looks correct.

In [13]:
m

/:
    |- Group: Experiment
    --------------------
        |- Group: Reports
        -----------------
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Surveys
        -----------------
            |- Group: test
            --------------
                |- Group: Filters
                -----------------
                    |- Group: coefficient
                    ---------------------
                        |- Group: dipole_50.00m
                        -----------------------
                        |- Group: zen_counts2mv
                        -----------------------
                    |- Group: fap
                    -------------
                        |- Group: ant4_2404_response
                        ----------------------------
                            --> Dataset: fap_table
                            ........................
                        |- Group: ant4_2414

### Channel Summary

Have a look at the channel summary and make sure everything looks good.

In [14]:
m.channel_summary.summarize()
m.channel_summary.to_dataframe()

Unnamed: 0,survey,station,run,latitude,longitude,elevation,component,start,end,n_samples,sample_rate,measurement_type,azimuth,tilt,units,hdf5_reference,run_hdf5_reference,station_hdf5_reference
0,test,wb280,sr1024_0001,44.147916,-111.049752,1954.239,ex,2017-07-01 02:19:59+00:00,2017-07-03 20:19:42+00:00,243284992,1024.0,electric,0.0,0.0,digital counts,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
1,test,wb280,sr1024_0001,44.147916,-111.049752,1954.239,ey,2017-07-01 02:19:59+00:00,2017-07-03 20:19:42+00:00,243284992,1024.0,electric,90.0,0.0,digital counts,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
2,test,wb280,sr1024_0001,44.147916,-111.049752,1954.239,hx,2017-07-01 02:19:59+00:00,2017-07-03 20:19:41.997070+00:00,243284989,1024.0,magnetic,0.0,0.0,digital counts,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
3,test,wb280,sr1024_0001,44.147916,-111.049752,1954.239,hy,2017-07-01 02:19:59+00:00,2017-07-03 20:19:41.999023+00:00,243284991,1024.0,magnetic,90.0,0.0,digital counts,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
4,test,wb280,sr1024_0001,44.147916,-111.049752,1954.239,hz,2017-07-01 02:19:59+00:00,2017-07-03 20:19:42+00:00,243284992,1024.0,magnetic,90.0,0.0,digital counts,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
5,test,wb380,sr1024_0001,44.291193,-110.614549,2392.466,ex,2017-07-02 03:01:00+00:00,2017-07-05 16:24:42+00:00,314800128,1024.0,electric,0.0,0.0,digital counts,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
6,test,wb380,sr1024_0001,44.291193,-110.614549,2392.466,ey,2017-07-02 03:01:00+00:00,2017-07-05 16:24:42+00:00,314800128,1024.0,electric,90.0,0.0,digital counts,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
7,test,wb380,sr1024_0001,44.291193,-110.614549,2392.466,hx,2017-07-02 03:01:00+00:00,2017-07-05 16:24:42+00:00,314800128,1024.0,magnetic,0.0,0.0,digital counts,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
8,test,wb380,sr1024_0001,44.291193,-110.614549,2392.466,hy,2017-07-02 03:01:00+00:00,2017-07-05 16:24:42+00:00,314800128,1024.0,magnetic,90.0,0.0,digital counts,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
9,test,wb380,sr1024_0001,44.291193,-110.614549,2392.466,hz,2017-07-02 03:01:00+00:00,2017-07-05 16:24:41+00:00,314799104,1024.0,magnetic,90.0,0.0,digital counts,<HDF5 object reference>,<HDF5 object reference>,<HDF5 object reference>
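One simple consistency check on a summary like this is to compare the stored `n_samples` with the count implied by `start`, `end`, and `sample_rate`. The sketch below does this for two of the wb280 rows above using only the standard library. The helper name `expected_samples` and the one-sample tolerance are illustrative choices, not part of MTH5.

```python
from datetime import datetime


def expected_samples(start, end, sample_rate):
    """Number of samples implied by a time span at a given sample rate."""
    return round((end - start).total_seconds() * sample_rate)


# (component, start, end, n_samples) copied from the wb280 rows above
rows = [
    ("ex", "2017-07-01 02:19:59+00:00", "2017-07-03 20:19:42+00:00", 243284992),
    ("hx", "2017-07-01 02:19:59+00:00", "2017-07-03 20:19:41.997070+00:00", 243284989),
]
sample_rate = 1024.0

checks = []
for component, start, end, n_samples in rows:
    implied = expected_samples(
        datetime.fromisoformat(start), datetime.fromisoformat(end), sample_rate
    )
    # Allow a difference of one sample for rounding of the end time
    consistent = abs(implied - n_samples) <= 1
    checks.append((component, implied, consistent))
    print(component, n_samples, implied, consistent)
```

For the two rows used here the implied and stored counts agree. A large mismatch would suggest a gap or a truncated channel.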


## Add Transfer Functions

We have some processed data from these two stations that we can add to the file, so that we can compare them later with other processed versions.

In [17]:
from mt_metadata.transfer_functions.core import TF

In [19]:
for avg_fn in z3d_path.rglob("*.avg"):
    avg_tf_object = TF(avg_fn)
    avg_tf_object.read_tf_file(z_positive="up")
    avg_tf_object.survey_metadata.id = "yellowstone"
    m.add_transfer_function(avg_tf_object)




### Transfer Function Summary

Let's make sure the transfer functions went in properly.

In [20]:
m.tf_summary.summarize()
m.tf_summary.to_dataframe()

Unnamed: 0,station,survey,latitude,longitude,elevation,tf_id,units,has_impedance,has_tipper,has_covariance,period_min,period_max,hdf5_reference,station_hdf5_reference
0,wb28_rr_wb13_mtedit,yellowstone,44.147916,-111.049752,0.0,wb28_rr_wb13_mtedit,none,True,False,False,0.003906,1024.002621,<HDF5 object reference>,<HDF5 object reference>
1,wb28_rr_wb27_mtedit,yellowstone,44.147916,-111.049752,0.0,wb28_rr_wb27_mtedit,none,True,False,False,0.003906,1024.002621,<HDF5 object reference>,<HDF5 object reference>
2,wb38_rr_wb28_mtedit,yellowstone,44.291193,-110.614549,0.0,wb38_rr_wb28_mtedit,none,True,False,False,0.003906,1024.002621,<HDF5 object reference>,<HDF5 object reference>


## Close the MTH5

This is important: you should close the file after you are done using it. Otherwise bad things can happen if you try to open it with another program or Python interpreter. The file is written to the same folder as the time series, `data/time_series/zen`.

In [21]:
m.close_mth5()

2022-09-27 15:06:33,454 [line 753] mth5.mth5.MTH5.close_mth5 - INFO: Flushing and closing C:\Users\jpeacock\OneDrive - DOI\Documents\GitHub\mt_examples\data\time_series\zen\from_z3d.h5
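If an error interrupts the work before the closing call is reached, the file can be left open. A `try`/`finally` block is one general Python pattern for guarding against that. The sketch below uses a hypothetical stand-in class, `FakeMTH5`, that only mimics opening and closing, so it runs without the library or any data.

```python
class FakeMTH5:
    """Hypothetical stand-in that only mimics opening and closing a file."""

    def __init__(self):
        self.is_open = False

    def open_mth5(self, filename):
        self.is_open = True

    def close_mth5(self):
        self.is_open = False


m_demo = FakeMTH5()
m_demo.open_mth5("from_z3d.h5")
try:
    # Work with the file here; an error is raised on purpose to show the effect.
    raise RuntimeError("something went wrong while filling the file")
except RuntimeError as error:
    print(f"caught: {error}")
finally:
    # Runs whether or not an error occurred, so the file is always closed.
    m_demo.close_mth5()

print(m_demo.is_open)
```

The same structure applies with a real `MTH5` object by placing the loading loop inside the `try` block and `m.close_mth5()` inside `finally`.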
