# Time Series Data Standardization

This project harmonizes time series data from multiple sensors that report at different sampling rates and on different time grids. It demonstrates the process on data from PDI13215 (a pressure sensor) and FI12110 (a flow sensor), collected over several months. The same approach can be extended to standardize data from other sensors.

### Tasks

1. #### Adaptive Temporal Data Transformation:


 * Normalize the disparate raw data into a standardized format: Timestamp, PDI13215 Sensor Value, FI12110 Sensor Value
 * Resolve irregular timestamps and missing values through interpolation, so that every reading falls on a consistent N-minute grid (10 minutes in this example: 9:00, 9:10, 9:20, and so on). Readings that arrive more often than once every N minutes are filtered out.


2. #### Data Cleaning and Export:

- Remove rows containing empty or NULL values.
- Save the processed data to a UTF-8 encoded `.csv` file, using a comma (`,`) as the delimiter and the timestamp format `YYYY-MM-DD HH:MI:SS`.
- Format the output file like the template below, which serves as the basis for subsequent machine learning training and analysis:

1: Point name, PDI13215, FI12110  
2: Description, Value for PDI13215, Value for FI12110  
3: Extended Name,  
4: Extended Description,  
5: Units, PSI, M3/h  
6: TIMESTAMP, PDI13215_SENSOR_VALUE, FI12110_SENSOR_VALUE  
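
The cleaning and export steps above could be sketched roughly as follows. This is an illustration rather than the notebook's actual export code: the function name `export_with_template`, the sample frame, and the temporary output path are hypothetical. The empty "Extended" rows are padded to the full column count, which is an assumption, since the template leaves that detail open.

```python
import csv
import os
import tempfile

import pandas as pd


def export_with_template(df, path, units):
    """Drop incomplete rows, then write six template header rows plus the data."""
    clean = df.dropna()  # remove rows containing empty or NULL values
    names = list(clean.columns)
    blanks = [""] * len(names)  # assumption: pad empty rows to full width
    header_rows = [
        ["Point name", *names],
        ["Description", *(f"Value for {n}" for n in names)],
        ["Extended Name", *blanks],
        ["Extended Description", *blanks],
        ["Units", *(units[n] for n in names)],
        ["TIMESTAMP", *(f"{n}_SENSOR_VALUE" for n in names)],
    ]
    # Timestamp format YYYY-MM-DD HH:MI:SS from the task description
    stamps = clean.index.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=",", lineterminator="\n")
        writer.writerows(header_rows)
        for stamp, values in zip(stamps, clean.to_numpy().tolist()):
            writer.writerow([stamp, *values])


# Hypothetical sample data on a 10-minute grid, with one missing value
sample = pd.DataFrame(
    {"PDI13215": [71.0, None, 75.0], "FI12110": [500.0, 510.0, 520.0]},
    index=pd.date_range("2019-10-01 09:00", periods=3, freq="10min"),
)
out_path = os.path.join(tempfile.mkdtemp(), "standardized.csv")
export_with_template(sample, out_path, {"PDI13215": "PSI", "FI12110": "M3/h"})

with open(out_path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```

Writing every row through the same `csv.writer` keeps the delimiter and line endings identical for the header rows and the data rows.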


<div class="alert alert-block alert-info">
<b>Tip:</b> PDI13215 and FI12110 serve only as an example here; the same approach can be readily tailored to similar challenges in data collected from other sensors.
</div>

In [29]:
import pandas as pd
import numpy as np
import pyodbc
from datetime import datetime
import seaborn as sns
import os
import matplotlib.pyplot as plt

In [2]:
current_directory = os.getcwd()

<b>Table_name_1:</b> FI12110_1 <br>
<b>Columns:</b> Time, Status, Source, Value <br>
<b>Table_name_2:</b> PDI13215 <br>
<b>Columns:</b> Time, Source, Value <br>

#### Reading data from the Microsoft Access databases

In [3]:
# Build the database paths relative to the working directory
path1 = os.path.join(current_directory, 'FI12110.accdb')
path2 = os.path.join(current_directory, 'PDI13215.accdb')

In [33]:
conn = pyodbc.connect(f'Driver={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={path1};')

In [34]:
FI12110 = pd.read_sql_query('select Time, Value from FI12110_1 WHERE Time IS NOT NULL AND Value IS NOT NULL', parse_dates = {'Time':'%Y%m%d %H:%M:%S'}, con = conn)



In [8]:
conn.close()

In [37]:
conn = pyodbc.connect(f'Driver={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={path2};')

In [38]:
PDI13215 = pd.read_sql_query('select Time, Value from PDI13215', parse_dates = {'Time':'%Y%m%d %H:%M:%S'}, con = conn)



In [39]:
conn.close()

In [22]:
FI12110.shape

(13281216, 2)

In [24]:
PDI13215.shape

(12439496, 2)

In [67]:
FI12110.head()

|   | Time | Value |
|---|------|-------|
| 0 | 2019-10-01 09:00:01 | 503.131744 |
| 1 | 2019-10-01 09:00:27 | 507.307098 |
| 2 | 2019-10-01 09:00:28 | 503.131744 |
| 3 | 2019-10-01 09:00:30 | 507.307098 |
| 4 | 2019-10-01 09:00:31 | 503.131744 |


In [64]:
PDI13215.head()

|   | Time | Value |
|---|------|-------|
| 0 | 2019-10-01 00:15:25 | 70.0 |
| 1 | 2019-10-01 01:15:25 | 70.0 |
| 2 | 2019-10-01 02:15:26 | 70.0 |
| 3 | 2019-10-01 03:15:26 | 70.0 |
| 4 | 2019-10-01 04:15:27 | 70.0 |
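
The two previews show how differently the sensors report: FI12110 every few seconds, PDI13215 roughly once an hour. A minimal sketch of the 10-minute alignment described under Tasks might look like the following. It assumes time-weighted linear interpolation, which the task description does not specify. The helper name `to_regular_grid` and the small sample frames are hypothetical and stand in for the full `FI12110` and `PDI13215` frames.

```python
import pandas as pd


def to_regular_grid(df, value_name, freq="10min"):
    """Interpolate one sensor's readings onto a regular time grid."""
    series = (
        df.dropna(subset=["Time", "Value"])
        .drop_duplicates(subset="Time", keep="last")  # reindex needs unique stamps
        .set_index("Time")["Value"]
        .sort_index()
    )
    # Grid points that lie inside the observed time range
    grid = pd.date_range(
        series.index.min().ceil(freq), series.index.max().floor(freq), freq=freq
    )
    # Interpolate on raw + grid timestamps, then keep only the grid points
    dense = series.reindex(series.index.union(grid)).interpolate(method="time")
    return dense.reindex(grid).rename(value_name)


# Hypothetical sample readings with irregular timestamps
raw_flow = pd.DataFrame({
    "Time": pd.to_datetime(
        ["2019-10-01 09:00:00", "2019-10-01 09:05:00", "2019-10-01 09:20:00"]
    ),
    "Value": [500.0, 505.0, 520.0],
})
raw_pressure = pd.DataFrame({
    "Time": pd.to_datetime(["2019-10-01 08:55:00", "2019-10-01 09:25:00"]),
    "Value": [70.0, 76.0],
})

flow = to_regular_grid(raw_flow, "FI12110")
pressure = to_regular_grid(raw_pressure, "PDI13215")

# Join both sensors on the shared grid and drop incomplete rows
combined = pd.concat([pressure, flow], axis=1).dropna()
combined.index.name = "Timestamp"
```

Interpolating on the union of raw and grid timestamps before discarding the off-grid points means the dense readings still inform each grid value, and the excess points are filtered out only afterwards.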
