# Matlab to .pkl Conversion

The purpose of this notebook is to streamline the conversion of the WIM data from matlab into a useable form. This involves converting first the Matlab table to a struct and then in python converting this struct as well as combining different structs representing different years. Currently years 2011-2019 are combined. The date/times are also converted to a single datetime.


In [11]:
import scipy.io
import pandas as pd
import numpy as np

In [12]:
def load_table_from_struct(table_structure) -> pd.DataFrame():

    # get prepared data structure
    data = table_structure[0, 0]['table']['data']
    # get prepared column names
    data_cols = [name[0] for name in table_structure[0, 0]['columns'][0]]

    # create dict out of original table
    table_dict = {}
    for colidx in range(len(data_cols)):
        if data_cols[colidx] != 'HH':
            table_dict[data_cols[colidx]] = [val[0] for val in data[0, 0][0, colidx]]
    
    return pd.DataFrame(table_dict)

In [13]:
def df_cleaning(df, counting_only):
    
    #If this will be used for counting only, eliminate the axle weights, etc
    if counting_only:
        df = df[['FS', 'GW_TOT', 'CLASS', 'HHMMSS', 'JJJJMMTT', 'ZST', 'LENTH', 'CS']]
    
    df['HHMMSS'] = df['HHMMSS'].astype(str) 
    df['HH'] = df['HHMMSS'].str[:-4]
    df['MMSS'] = df['HHMMSS'].str[-4:]
    df['MM'] = df['MMSS'].str[:-2]
    df = df.replace(r'', np.nan, regex=True)
    df = df.fillna(0)
    df['HH'] = df['HH'].astype(int)
    df['MM'] = df['MM'].astype(int)
    df['Date'] = df['JJJJMMTT'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
    df = df.drop(columns=['JJJJMMTT'])
    df['Date'] += pd.to_timedelta(df.HH, unit='h')
    df['Date'] += pd.to_timedelta(df.MM, unit='m')
    df['SS'] = df['MMSS'].str[-2:]
    df['SS'] = df['SS'].astype(int)
    df['Date'] += pd.to_timedelta(df.SS, unit='s')
    df = df.drop(columns=['HH', 'MM', 'SS', 'MMSS', 'HHMMSS'])
    return df

In [22]:
#Loading matrices from struct in matlab

dfs = []

station = "Gotthard"

for year in range(2011, 2020):
    struct = scipy.io.loadmat('{}Data/{}_{}_struct.mat'.format(station, station, year))
    df = load_table_from_struct(struct['{}_struct'.format(station)])
    df = df_cleaning(df, True)
    dfs.append(df)
    
df = pd.concat(dfs, ignore_index=True)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  import sys
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if __name__ == '__main__':
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http

In [23]:
df.keys()

Index(['FS', 'GW_TOT', 'CLASS', 'ZST', 'LENTH', 'CS', 'Date'], dtype='object')

All of the columns from the classified data from matlab were decided to be kept. Below the times are converted to a functioning datetime, then the now not needed other columsn for time/date are dropped.

In [24]:
df.to_pickle('{}Data/2011_2019_datetime.pkl'.format(station))