##  LEMI Example

Received from Andy Frassetto via email, 10 May, 2022.
_______________________
Karl,

Here's one candidate. PASSCAL test set from fall 2020 in the Magdalena
mountains, so...should be fairly quiet.

Cheers, A


The data received were from a single station and sit in a folder called
DATA0110. In general, it is recommended to group the LEMI files like this, with one folder per station.

Within a station folder, there can be many files.

Every file is associated with exactly one run.
However, some runs are associated with more than one file.

Therefore it is desirable to group the files according to their runs.

We could do this with subfolders, but in this example we use a dataframe.


We can take advantage of the highly regular LEMI filename structure,
which is of the form YYYYMMDDhhmm.TXT (for example, 202009302021.TXT),
i.e. LEMI files start on the UTC minute.

Thus we can easily sort the files and get a good indication, based on filename _only_, of whether the data are contiguous or not. The run assignment further down confirms this using the first and last sample times of each file.
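
As a minimal sketch of how the filename convention can be used, the snippet below parses the start minute from a name of the form YYYYMMDDhhmm.TXT with the standard library only. The helper name `start_time_from_filename` is hypothetical and is not part of mth5; the file names are taken from this data set. Because the names are fixed-width and zero-padded, a plain lexical sort is also a chronological sort.

```python
from datetime import datetime
from pathlib import Path


def start_time_from_filename(path):
    """Parse the start minute encoded in a YYYYMMDDhhmm.TXT file name.

    Hypothetical helper for illustration; not an mth5 function.
    """
    return datetime.strptime(Path(path).stem, "%Y%m%d%H%M")


# Deliberately out of order to show that sorting by name restores time order
names = ["202010010000.TXT", "202009302029.TXT", "202009302021.TXT"]
sorted_names = sorted(names)

for name in sorted_names:
    print(name, start_time_from_filename(name).isoformat())
```

The parsed value is only the nominal start of the file; the first and last sample times still have to come from the file contents.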




In [1]:
import os
import pandas as pd
from pathlib import Path

from mth5 import read_file
from mth5 import mth5
from mth5.io.lemi424_new import LEMI424

from mt_metadata import timeseries as metadata
from mt_metadata.utils.mttime import MTime


2022-07-30 09:33:40,298 [line 135] mth5.setup_logger - INFO: Logging file can be found /home/kkappler/software/irismt/mth5/logs/mth5_debug.log


### Define path to the data

The original data dump was in a folder called DATA0110.


In [2]:
survey_dir = Path(r"/home/kkappler/software/irismt/aurora/tests/LEMI/")
cmd = f"ls {survey_dir}"
print("Survey Directory Contents")
os.system(cmd)

Survey Directory Contents
DATA0110
from_lemi424.mth5
lemi_reader_test.py
stations
test_read_multiple_lemi.py


0

Let's make a _stations_ folder to better emulate how the data would be stored in a survey directory.

In [3]:
stations_dir = survey_dir.joinpath("stations")
stations_dir.mkdir(exist_ok=True)


In [4]:
os.system(cmd)

DATA0110
from_lemi424.mth5
lemi_reader_test.py
stations
test_read_multiple_lemi.py


0

Now, in the stations folder, let's create a symlink to DATA0110
and give the station a name, such as station_53.

In [5]:
original_station_dir = survey_dir.joinpath("DATA0110")
symlink_path = stations_dir.joinpath("station_53")
cmd = f"ln -s {original_station_dir} {symlink_path}"
#cmd = f"ln -s {symlink_path} {original_station_dir}"
print(cmd)
os.system(cmd)

ln -s /home/kkappler/software/irismt/aurora/tests/LEMI/DATA0110 /home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53


ln: failed to create symbolic link '/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/DATA0110': File exists


256
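
The non-zero return value (on Unix, 256 from `os.system` corresponds to exit code 1) appears because `station_53` already existed from earlier executions of this cell. When the link name is an existing directory, `ln -s` tries to create the link inside that directory instead, and here that inner link existed as well. A minimal sketch of an idempotent alternative using `pathlib` follows. It runs against a temporary directory so it does not touch the real survey folder, and `ensure_symlink` is a hypothetical helper, not an mth5 function.

```python
import tempfile
from pathlib import Path


def ensure_symlink(target, link):
    """Create link -> target unless link already exists.

    Returns True if the link was created, False if it was already there.
    Hypothetical helper for illustration only.
    """
    link = Path(link)
    if link.is_symlink() or link.exists():
        return False
    link.symlink_to(Path(target), target_is_directory=True)
    return True


with tempfile.TemporaryDirectory() as tmp:
    survey = Path(tmp)
    data_dir = survey / "DATA0110"
    data_dir.mkdir()
    stations = survey / "stations"
    stations.mkdir()
    link = stations / "station_53"

    first = ensure_symlink(data_dir, link)   # creates the link
    second = ensure_symlink(data_dir, link)  # no-op, no error on re-run
    resolves_to_data = link.resolve() == data_dir.resolve()
    print(first, second, resolves_to_data)
```

With this pattern the cell can be re-executed without producing nested links or error messages.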

In [6]:
os.system(f"ls {symlink_path}")

202009302021.TXT
202009302029.TXT
202009302054.TXT
202009302112.TXT
202009302114.TXT
202010010000.TXT
202010020000.TXT
202010030000.TXT
202010040000.TXT
202010050000.TXT
202010060000.TXT
202010070000.TXT
DATA0110
readme


0

In [7]:
p = symlink_path.glob("*.TXT")
files_list = [x for x in p if x.is_file()]
files_list.sort()  # Important: sort the list so the files are in chronological order; we rely on this below

print("FILES:\n")
for file in files_list:
    print(file)



FILES:

/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202009302021.TXT
/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202009302029.TXT
/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202009302054.TXT
/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202009302112.TXT
/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202009302114.TXT
/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202010010000.TXT
/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202010020000.TXT
/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202010030000.TXT
/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202010040000.TXT
/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202010050000.TXT
/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202010060000.TXT
/home/kkappler/software/irismt/aurora/tests/LE

### Make a list of LEMI424 objects, one per file

In [8]:
l424_list = [LEMI424(fn=x) for x in files_list]

### Read in the data

In [9]:
for l424 in l424_list:
    l424.read()

In [10]:
L0 = l424_list[0]
print(L0.fn[0].stem)
print(L0.start)
print(L0.end)


202009302021
2020-09-30T20:21:00
2020-09-30T20:28:15


#### The data can now be accessed via the data frame

In [11]:
columns = ["year", "month", "day", "hour", "minute", "second", "bx", "by", "bz", "e1", "e2", "e3", "e4", "latitude", "longitude"]
l424_list[0]._df[columns][0:5]

Unnamed: 0,year,month,day,hour,minute,second,bx,by,bz,e1,e2,e3,e4,latitude,longitude
0,2020,9,30,20,21,0,23813.621,729.816,41802.042,131.013,-111.026,164.166,9.715,3404.83911,10712.84475
1,2020,9,30,20,21,1,23813.586,729.842,41802.03,130.917,-111.204,164.061,9.54,3404.83911,10712.84473
2,2020,9,30,20,21,2,23813.553,729.875,41802.058,130.918,-111.227,164.071,9.521,3404.8391,10712.8447
3,2020,9,30,20,21,3,23813.477,729.878,41802.042,130.918,-111.396,164.06,9.357,3404.8391,10712.84468
4,2020,9,30,20,21,4,23813.449,729.908,41802.034,131.018,-111.326,164.17,9.428,3404.83909,10712.84467


In [12]:
l424_list[0]._df.columns

Index(['year', 'month', 'day', 'hour', 'minute', 'second', 'bx', 'by', 'bz',
       'temperature_e', 'temperature_h', 'e1', 'e2', 'e3', 'e4', 'battery',
       'elevation', 'latitude', 'lat_hemisphere', 'longitude',
       'lon_hemisphere', 'n_satellites', 'gps_fix', 'tdiff'],
      dtype='object')

In [13]:
l424_list[0]._df[0:5]

Unnamed: 0,year,month,day,hour,minute,second,bx,by,bz,temperature_e,...,e4,battery,elevation,latitude,lat_hemisphere,longitude,lon_hemisphere,n_satellites,gps_fix,tdiff
0,2020,9,30,20,21,0,23813.621,729.816,41802.042,39.76,...,9.715,13.01,2204.5,3404.83911,N,10712.84475,W,12,2,0
1,2020,9,30,20,21,1,23813.586,729.842,41802.03,39.76,...,9.54,13.01,2204.5,3404.83911,N,10712.84473,W,12,2,0
2,2020,9,30,20,21,2,23813.553,729.875,41802.058,39.75,...,9.521,13.01,2204.6,3404.8391,N,10712.8447,W,12,2,0
3,2020,9,30,20,21,3,23813.477,729.878,41802.042,39.81,...,9.357,13.01,2204.7,3404.8391,N,10712.84468,W,12,2,0
4,2020,9,30,20,21,4,23813.449,729.908,41802.034,39.77,...,9.428,13.01,2204.7,3404.83909,N,10712.84467,W,12,2,0
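
The `latitude` and `longitude` values above are not decimal degrees. The sketch below assumes they follow the NMEA-style `ddmm.mmmm` convention (degrees times 100 plus decimal minutes), with the sign given by the hemisphere columns. That assumption is not stated in the source, but it gives a position of roughly 34.08 N, 107.21 W, which is consistent with the Magdalena mountains. The function name is hypothetical.

```python
def nmea_to_decimal_degrees(value, hemisphere):
    """Convert a ddmm.mmmm value plus hemisphere letter to signed decimal degrees.

    Assumes the NMEA-style convention; hypothetical helper for illustration.
    """
    degrees, minutes = divmod(value, 100.0)
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


# First row of the table above
lat = nmea_to_decimal_degrees(3404.83911, "N")
lon = nmea_to_decimal_degrees(10712.84475, "W")
print(round(lat, 5), round(lon, 5))
```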


In [14]:
l424_list[0]._df[-5:]

Unnamed: 0,year,month,day,hour,minute,second,bx,by,bz,temperature_e,...,e4,battery,elevation,latitude,lat_hemisphere,longitude,lon_hemisphere,n_satellites,gps_fix,tdiff
431,2020,9,30,20,28,11,23784.69,215.653,41830.503,42.52,...,11.65,13.01,2204.8,3404.83945,N,10712.84481,W,12,2,0
432,2020,9,30,20,28,12,23784.723,215.637,41830.472,42.54,...,12.193,13.01,2204.8,3404.83946,N,10712.84481,W,12,2,0
433,2020,9,30,20,28,13,23784.766,215.619,41830.441,42.54,...,12.816,13.01,2204.8,3404.83948,N,10712.84479,W,12,2,0
434,2020,9,30,20,28,14,23784.856,215.601,41830.441,42.58,...,13.345,13.0,2204.8,3404.83949,N,10712.84479,W,12,2,0
435,2020,9,30,20,28,15,23784.928,215.573,41830.445,42.57,...,13.817,13.0,2204.7,3404.8395,N,10712.84478,W,12,2,0


In [15]:
COLUMNS = ["file_path",
           "first_sample_time", 
           "last_sample_time", 
           "num_lines", 
           "run_id", 
           "sample_rate",
           "new_run",
           "file_base"]


In [16]:
n_files = len(files_list)
n_files
# start and end are the first and last sample times, respectively

12

In [17]:
data_dict = {}
for col in COLUMNS:
    data_dict[col] = n_files * [None]

In [18]:
for i_file in range(n_files):
    data_dict["file_path"][i_file] = files_list[i_file]
    data_dict["first_sample_time"][i_file] = pd.Timestamp(l424_list[i_file].start)
    data_dict["last_sample_time"][i_file] = pd.Timestamp(l424_list[i_file].end)
    data_dict["num_lines"][i_file] = len(l424_list[i_file]._df)
    data_dict["run_id"][i_file] = ""
    data_dict["sample_rate"][i_file] = l424_list[i_file].sample_rate
    data_dict["new_run"][i_file] = True
    data_dict["file_base"][i_file] = files_list[i_file].name
    

In [19]:
station_data_df = pd.DataFrame(data=data_dict)
station_data_df[COLUMNS[1:]]

Unnamed: 0,first_sample_time,last_sample_time,num_lines,run_id,sample_rate,new_run,file_base
0,2020-09-30 20:21:00,2020-09-30 20:28:15,436,,1.0,True,202009302021.TXT
1,2020-09-30 20:29:00,2020-09-30 20:42:16,797,,1.0,True,202009302029.TXT
2,2020-09-30 20:54:00,2020-09-30 21:11:01,1022,,1.0,True,202009302054.TXT
3,2020-09-30 21:12:00,2020-09-30 21:13:45,106,,1.0,True,202009302112.TXT
4,2020-09-30 21:14:00,2020-09-30 23:59:59,9960,,1.0,True,202009302114.TXT
5,2020-10-01 00:00:00,2020-10-01 23:59:59,86400,,1.0,True,202010010000.TXT
6,2020-10-02 00:00:00,2020-10-02 23:59:59,86400,,1.0,True,202010020000.TXT
7,2020-10-03 00:00:00,2020-10-03 23:59:59,86400,,1.0,True,202010030000.TXT
8,2020-10-04 00:00:00,2020-10-04 23:59:59,86400,,1.0,True,202010040000.TXT
9,2020-10-05 00:00:00,2020-10-05 23:59:59,86400,,1.0,True,202010050000.TXT
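
A quick sanity check on this table: for gap-free data the number of lines should equal the time span multiplied by the sample rate, plus one for the first sample. The sketch below applies that arithmetic to the first five rows using only the standard library; `expected_samples` is a hypothetical helper and the values are copied from the table above.

```python
from datetime import datetime


def expected_samples(first, last, sample_rate=1.0):
    """Number of samples between first and last (inclusive) for gap-free data."""
    fmt = "%Y-%m-%d %H:%M:%S"
    span = datetime.strptime(last, fmt) - datetime.strptime(first, fmt)
    return int(round(span.total_seconds() * sample_rate)) + 1


# (first_sample_time, last_sample_time, num_lines) from the table above
rows = [
    ("2020-09-30 20:21:00", "2020-09-30 20:28:15", 436),
    ("2020-09-30 20:29:00", "2020-09-30 20:42:16", 797),
    ("2020-09-30 20:54:00", "2020-09-30 21:11:01", 1022),
    ("2020-09-30 21:12:00", "2020-09-30 21:13:45", 106),
    ("2020-09-30 21:14:00", "2020-09-30 23:59:59", 9960),
]

for first, last, num_lines in rows:
    print(first, expected_samples(first, last), num_lines)
```

When the two counts agree, as they do for these rows, the file contains no dropped samples between its first and last timestamps.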


In [20]:
new_run = n_files * [True]  # the first file always starts a new run
for i_row, row in station_data_df.iterrows():
    if i_row == 0:
        continue

    # Check if the sample rate changed
    previous = station_data_df.loc[i_row - 1]
    if previous.sample_rate != row.sample_rate:
        print("CHANGED SAMPLE RATE")
        new_run[i_row] = True
        continue

    # Check for continuity with the previous file: the first sample should
    # fall exactly one sample interval after the previous file's last sample
    dt = pd.Timedelta(seconds=1.0 / previous.sample_rate)
    previous_next_sample = previous.last_sample_time + dt
    if row.first_sample_time == previous_next_sample:
        print("SAME RUN")
        new_run[i_row] = False

station_data_df["new_run"] = new_run
station_data_df[COLUMNS[1:]]

SAME RUN
SAME RUN
SAME RUN
SAME RUN
SAME RUN
SAME RUN
SAME RUN


Unnamed: 0,first_sample_time,last_sample_time,num_lines,run_id,sample_rate,new_run,file_base
0,2020-09-30 20:21:00,2020-09-30 20:28:15,436,,1.0,True,202009302021.TXT
1,2020-09-30 20:29:00,2020-09-30 20:42:16,797,,1.0,True,202009302029.TXT
2,2020-09-30 20:54:00,2020-09-30 21:11:01,1022,,1.0,True,202009302054.TXT
3,2020-09-30 21:12:00,2020-09-30 21:13:45,106,,1.0,True,202009302112.TXT
4,2020-09-30 21:14:00,2020-09-30 23:59:59,9960,,1.0,True,202009302114.TXT
5,2020-10-01 00:00:00,2020-10-01 23:59:59,86400,,1.0,False,202010010000.TXT
6,2020-10-02 00:00:00,2020-10-02 23:59:59,86400,,1.0,False,202010020000.TXT
7,2020-10-03 00:00:00,2020-10-03 23:59:59,86400,,1.0,False,202010030000.TXT
8,2020-10-04 00:00:00,2020-10-04 23:59:59,86400,,1.0,False,202010040000.TXT
9,2020-10-05 00:00:00,2020-10-05 23:59:59,86400,,1.0,False,202010050000.TXT


In [21]:
run_id_int = 1
run_id_str = str(run_id_int).zfill(3)
run_id_str

'001'

In [22]:
run_ids = n_files * [""]
run_ids[0] = run_id_str

In [23]:
for i_row, row in station_data_df.iterrows():
    if i_row==0:
        run_ids[i_row] = run_id_str
        continue
    if row.new_run:
        run_id_int += 1
        run_id_str = str(run_id_int).zfill(3)
    run_ids[i_row] = run_id_str
print(run_ids)

['001', '002', '003', '004', '005', '005', '005', '005', '005', '005', '005', '005']


In [24]:
station_data_df["run_id"] = run_ids
station_data_df[COLUMNS[1:]]

Unnamed: 0,first_sample_time,last_sample_time,num_lines,run_id,sample_rate,new_run,file_base
0,2020-09-30 20:21:00,2020-09-30 20:28:15,436,001,1.0,True,202009302021.TXT
1,2020-09-30 20:29:00,2020-09-30 20:42:16,797,002,1.0,True,202009302029.TXT
2,2020-09-30 20:54:00,2020-09-30 21:11:01,1022,003,1.0,True,202009302054.TXT
3,2020-09-30 21:12:00,2020-09-30 21:13:45,106,004,1.0,True,202009302112.TXT
4,2020-09-30 21:14:00,2020-09-30 23:59:59,9960,005,1.0,True,202009302114.TXT
5,2020-10-01 00:00:00,2020-10-01 23:59:59,86400,005,1.0,False,202010010000.TXT
6,2020-10-02 00:00:00,2020-10-02 23:59:59,86400,005,1.0,False,202010020000.TXT
7,2020-10-03 00:00:00,2020-10-03 23:59:59,86400,005,1.0,False,202010030000.TXT
8,2020-10-04 00:00:00,2020-10-04 23:59:59,86400,005,1.0,False,202010040000.TXT
9,2020-10-05 00:00:00,2020-10-05 23:59:59,86400,005,1.0,False,202010050000.TXT
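
The two loops above (flagging new runs, then numbering them) can also be written without explicit iteration. The following is a sketch of one possible vectorized equivalent, not the method used in this notebook: it compares each file with the previous one via `shift()` and numbers the runs with a cumulative sum of the `new_run` flag. The small frame is rebuilt here from the first six rows of the table so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "first_sample_time": pd.to_datetime(
            [
                "2020-09-30 20:21:00",
                "2020-09-30 20:29:00",
                "2020-09-30 20:54:00",
                "2020-09-30 21:12:00",
                "2020-09-30 21:14:00",
                "2020-10-01 00:00:00",
            ]
        ),
        "last_sample_time": pd.to_datetime(
            [
                "2020-09-30 20:28:15",
                "2020-09-30 20:42:16",
                "2020-09-30 21:11:01",
                "2020-09-30 21:13:45",
                "2020-09-30 23:59:59",
                "2020-10-01 23:59:59",
            ]
        ),
        "sample_rate": 6 * [1.0],
    }
)

# Values from the previous file (NaT / NaN for the first row)
previous_end = df["last_sample_time"].shift()
previous_rate = df["sample_rate"].shift()
dt = pd.to_timedelta(1.0 / previous_rate, unit="s")

# A file continues the previous run only if the sample rate is unchanged and
# its first sample falls exactly one sample interval after the previous end
contiguous = (df["sample_rate"] == previous_rate) & (
    df["first_sample_time"] == previous_end + dt
)
df["new_run"] = ~contiguous

# Each new run increments the counter; zero-pad to three digits
df["run_id"] = df["new_run"].cumsum().astype(str).str.zfill(3)
print(df[["first_sample_time", "new_run", "run_id"]])
```

Comparisons against the missing values in the first row evaluate to False, so the first file is always flagged as starting a new run, matching the loop version.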


In [25]:
grouper = station_data_df.groupby("run_id")

In [26]:
print(len(grouper))
fns = {}  # maps each run_id to the list of file paths in that run
for run, grouped_df in grouper:
    print(run)
    fns[run] = grouped_df["file_path"].to_list()

5
001
002
003
004
005


In [27]:
fns

{'001': [PosixPath('/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202009302021.TXT')],
 '002': [PosixPath('/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202009302029.TXT')],
 '003': [PosixPath('/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202009302054.TXT')],
 '004': [PosixPath('/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202009302112.TXT')],
 '005': [PosixPath('/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202009302114.TXT'),
  PosixPath('/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202010010000.TXT'),
  PosixPath('/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202010020000.TXT'),
  PosixPath('/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202010030000.TXT'),
  PosixPath('/home/kkappler/software/irismt/aurora/tests/LEMI/stations/station_53/202010040000.TXT'),
  PosixPath('/home/kkappler/software/irismt

In [28]:
lemis = {}

In [29]:
for run_id in fns.keys():
    tmp = LEMI424(fn=fns[run_id]) 
    tmp = tmp.to_run_ts()
    tmp.run_metadata.id = run_id
    lemis[run_id] = tmp

In [30]:
lemis["001"]

RunTS Summary:
	Station:     None
	Run:         001
	Start:       2020-09-30T20:21:00+00:00
	End:         2020-09-30T20:28:15+00:00
	Sample Rate: 1.0
	Components:  ['bx', 'by', 'bz', 'e1', 'e2', 'temperature_e', 'temperature_h']

In [31]:
lemis["001"].run_metadata

{
    "run": {
        "channels_recorded_auxiliary": [
            "temperature_e",
            "temperature_h"
        ],
        "channels_recorded_electric": [
            "e1",
            "e2"
        ],
        "channels_recorded_magnetic": [
            "bx",
            "by",
            "bz"
        ],
        "data_logger.firmware.author": null,
        "data_logger.firmware.name": null,
        "data_logger.firmware.version": null,
        "data_logger.id": null,
        "data_logger.manufacturer": null,
        "data_logger.timing_system.drift": 0.0,
        "data_logger.timing_system.type": "GPS",
        "data_logger.timing_system.uncertainty": 0.0,
        "data_logger.type": null,
        "data_type": "BBMT",
        "id": "001",
        "sample_rate": 1.0,
        "time_period.end": "2020-09-30T20:28:15+00:00",
        "time_period.start": "2020-09-30T20:21:00+00:00"
    }
}

In [32]:
type(lemis["001"])

mth5.timeseries.run_ts.RunTS

In [33]:
lemis["001"].dataset


In [34]:
lemis["005"].dataset


In [35]:
lemis["005"].run_metadata

{
    "run": {
        "channels_recorded_auxiliary": [
            "temperature_e",
            "temperature_h"
        ],
        "channels_recorded_electric": [
            "e1",
            "e2"
        ],
        "channels_recorded_magnetic": [
            "bx",
            "by",
            "bz"
        ],
        "data_logger.firmware.author": null,
        "data_logger.firmware.name": null,
        "data_logger.firmware.version": null,
        "data_logger.id": null,
        "data_logger.manufacturer": null,
        "data_logger.timing_system.drift": 0.0,
        "data_logger.timing_system.type": "GPS",
        "data_logger.timing_system.uncertainty": 0.0,
        "data_logger.type": null,
        "data_type": "BBMT",
        "id": "005",
        "sample_rate": 1.0,
        "time_period.end": "2020-10-07T14:19:46+00:00",
        "time_period.start": "2020-09-30T21:14:00+00:00"
    }
}


### We have the run time series; now let's pack them into an MTH5 file

In [36]:
h5_fn = "magdelena.h5"
station_id = "0110"

In [37]:
# write some simple metadata for the survey
survey = metadata.Survey()
survey.acquired_by.author = "MT Meister"
survey.archive_id = "LEMI_TEST_01"
survey.archive_network = "MT"
survey.name = "magdelena"

In [38]:
m = mth5.MTH5(file_version="0.1.0")
m.open_mth5(h5_fn, "w")


2022-07-30 09:35:14,652 [line 656] mth5.mth5.MTH5._initialize_file - INFO: Initialized MTH5 0.1.0 file magdelena.h5 in mode w


In [39]:
# add survey metadata
survey_group = m.survey_group
survey_group.metadata.update(survey)
survey_group.write_metadata()

In [40]:
# initialize a station
station_group = m.add_station(station_id)
print("Station was just initialized")
station_group.validate_station_metadata()

Station was just initialized


In [41]:
for run, run_ts in lemis.items():
    run_group = station_group.add_run(run_ts.run_metadata.id, run_metadata=run_ts.run_metadata)


In [42]:
station_group.validate_station_metadata()

In [43]:
survey_group.update_survey_metadata()

In [44]:
m.close_mth5()

2022-07-30 09:35:21,938 [line 731] mth5.mth5.MTH5.close_mth5 - INFO: Flushing and closing magdelena.h5


In [45]:
mm = mth5.MTH5(file_version="0.1.0")
mm.open_mth5(h5_fn, "a")


In [46]:
mm

/:
    |- Group: Survey
    ----------------
        |- Group: Filters
        -----------------
            |- Group: coefficient
            ---------------------
            |- Group: fap
            -------------
            |- Group: fir
            -------------
            |- Group: time_delay
            --------------------
            |- Group: zpk
            -------------
        |- Group: Reports
        -----------------
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Stations
        ------------------
            |- Group: 0110
            --------------
                |- Group: 001
                -------------
                |- Group: 002
                -------------
                |- Group: 003
                -------------
                |- Group: 004
                -------------
                |- Group: 005
                -------------
                |- Group: Tra

In [47]:
mm.close_mth5()

2022-07-30 09:35:33,832 [line 731] mth5.mth5.MTH5.close_mth5 - INFO: Flushing and closing magdelena.h5
