# Task
Join PdM_telemetry.csv and PdM_machines.csv using machineID.

Perform time series Exploratory Data Analysis (EDA). Use plotly for data visualizations (Create time series plots, Time series plots and correlation heatmap using plotly)

Here is all the data you need:
"PdM_machines.csv"
"PdM_telemetry.csv"

## Data loading

### Subtask:
Load the two CSV files, "PdM_machines.csv" and "PdM_telemetry.csv", into pandas DataFrames.


**Reasoning**:
Load the two CSV files into pandas DataFrames and display the first 5 rows of each.



In [None]:
import pandas as pd

machines_df = pd.read_csv('PdM_machines.csv')
telemetry_df = pd.read_csv('PdM_telemetry.csv')

display(machines_df.head())
display(telemetry_df.head())

## Data preparation

### Subtask:
Join the `machines_df` and `telemetry_df` DataFrames.


**Reasoning**:
Merge the two dataframes and display the first few rows and shape of the merged dataframe.



In [None]:
# Merge the two dataframes
merged_df = pd.merge(telemetry_df, machines_df, on='machineID', how='inner')

# Display the first few rows of the merged dataframe
display(merged_df.head())

# Print the shape of the merged dataframe
print(merged_df.shape)

## Data exploration

### Subtask:
Explore the merged dataframe `merged_df`, focusing on time-related columns.


**Reasoning**:
Examine the data types of the columns in `merged_df`, check for missing values, especially in the timestamp column, and calculate descriptive statistics for numeric columns.



In [None]:
# Check data types
print(merged_df.dtypes)

# Check for missing values
print(merged_df.isnull().sum())

# Descriptive statistics for numeric columns
print(merged_df.describe())

# Descriptive statistics grouped by machineID
print(merged_df.groupby('machineID').describe())

# Convert 'datetime' to datetime objects if it's not already
if merged_df['datetime'].dtype != 'datetime64[ns]':
    merged_df['datetime'] = pd.to_datetime(merged_df['datetime'])

# Analyze the distribution of the timestamp column
print(merged_df['datetime'].describe())

# Time range of data
print(f"Minimum datetime: {merged_df['datetime'].min()}")
print(f"Maximum datetime: {merged_df['datetime'].max()}")

**Reasoning**:
The previous code block showed that the 'datetime' column is of object type. It needs to be converted to datetime objects for proper time series analysis. Also, the descriptive statistics grouped by machineID are not very informative in the current format.  We need to plot the distributions of sensor readings grouped by machineID to better understand their variations across different machines.



In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Convert 'datetime' column to datetime objects if it's not already
if merged_df['datetime'].dtype != 'datetime64[ns]':
    merged_df['datetime'] = pd.to_datetime(merged_df['datetime'])

# Analyze the distribution of the timestamp column
print(merged_df['datetime'].describe())

# Time range of data
print(f"Minimum datetime: {merged_df['datetime'].min()}")
print(f"Maximum datetime: {merged_df['datetime'].max()}")

# Plot distributions of sensor readings grouped by machineID
numeric_cols = ['volt', 'rotate', 'pressure', 'vibration']
plt.figure(figsize=(15, 10))
for i, col in enumerate(numeric_cols):
    plt.subplot(2, 2, i + 1)
    sns.histplot(data=merged_df, x=col, hue='machineID', kde=True, element="step")
    plt.title(f'Distribution of {col} by machineID')
plt.tight_layout()
plt.show()

## Data wrangling

### Subtask:
Data Wrangling: Prepare the merged dataframe for time series analysis.


**Reasoning**:
Convert the 'datetime' column to datetime objects, set it as the index, and create new time-based features.



In [None]:
# Convert 'datetime' column to datetime objects if it's not already
if merged_df['datetime'].dtype != 'datetime64[ns]':
    merged_df['datetime'] = pd.to_datetime(merged_df['datetime'])

# Set 'datetime' column as index
merged_df = merged_df.set_index('datetime')

# Create new features from the datetime index
merged_df['day_of_week'] = merged_df.index.dayofweek
merged_df['hour_of_day'] = merged_df.index.hour
merged_df['month'] = merged_df.index.month
merged_df['week_of_year'] = merged_df.index.isocalendar().week

# Verify data types
print(merged_df.dtypes)

## Data visualization

### Subtask:
Create time series plots for key variables (volt, rotate, pressure, vibration) using Plotly.


**Reasoning**:
Create time series plots for 'volt', 'rotate', 'pressure', and 'vibration' using Plotly.



In [None]:
import plotly.express as px

# Create time series plots for key variables
variables = ['volt', 'rotate', 'pressure', 'vibration']
for var in variables:
    fig = px.line(merged_df, y=var, title=f'Time Series Plot of {var}',
                  labels={'value': var, 'datetime': 'Date and Time'})
    fig.update_xaxes(rangeslider_visible=True)  # Add interactive range slider
    fig.show()

## Data visualization

### Subtask:
Generate a correlation heatmap of the variables in `merged_df` using Plotly.


**Reasoning**:
Calculate the correlation matrix and generate the heatmap using Plotly.



In [None]:
import plotly.express as px
import pandas as pd
import numpy as np

# Calculate the correlation matrix
numeric_cols = merged_df.select_dtypes(include=np.number).columns
numeric_cols = numeric_cols.drop('machineID') # Exclude machineID
correlation_matrix = merged_df[numeric_cols].corr()

# Create the heatmap
fig = px.imshow(correlation_matrix,
                x=correlation_matrix.columns,
                y=correlation_matrix.index,
                color_continuous_scale='RdBu',  # Use a diverging color scale
                title='Correlation Heatmap of Numerical Variables')
fig.update_layout(width=800, height=600) # Adjust figure size for better visualization
fig.show()

## Data analysis

### Subtask:
Perform time series decomposition on relevant variables (volt, rotate, pressure, vibration) to identify trends, seasonality, and residuals.


**Reasoning**:
Decompose the time series data for the specified variables and plot the results.



In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

# Iterate through the relevant variables
variables = ['volt', 'rotate', 'pressure', 'vibration']
for var in variables:
    # Perform seasonal decomposition
    decomposition = seasonal_decompose(merged_df[var], model='additive', period=24)  # Assuming daily seasonality

    # Plot the decomposed components
    plt.figure(figsize=(12, 8))
    plt.subplot(4, 1, 1)
    plt.plot(decomposition.observed)
    plt.title(f'Observed {var}')
    plt.subplot(4, 1, 2)
    plt.plot(decomposition.trend)
    plt.title(f'Trend {var}')
    plt.subplot(4, 1, 3)
    plt.plot(decomposition.seasonal)
    plt.title(f'Seasonality {var}')
    plt.subplot(4, 1, 4)
    plt.plot(decomposition.resid)
    plt.title(f'Residuals {var}')
    plt.tight_layout()
    plt.show()

    # Analyze the plots and document observations (example)
    print(f"Analysis of {var}:")
    print("Trend: ", end="") # Add your trend observation here
    print("Seasonality: ", end="") # Add your seasonality observation here
    print("Residuals: ", end="") # Add your residual observation here
    print("-" * 20)

## Summary:

### Q&A
No questions were posed in the provided task.

### Data Analysis Key Findings
*   The `PdM_telemetry.csv` and `PdM_machines.csv` datasets were successfully merged on `machineID` resulting in a combined dataset with 876100 rows and 8 columns.
*   No missing values were found in the merged dataset.
*   Time series plots of `volt`, `rotate`, `pressure`, and `vibration` were generated, revealing the trends of these variables over time.  An interactive range slider was included for detailed exploration.
*   A correlation heatmap visualized the relationships between the numerical variables in the dataset, excluding `machineID`.
*   Time series decomposition was performed on `volt`, `rotate`, `pressure`, and `vibration` to identify trends, seasonality, and residuals.  However, the provided output lacks the analysis of the generated plots.
*   New time-based features (`day_of_week`, `hour_of_day`, `month`, `week_of_year`) were engineered from the datetime index.


### Insights or Next Steps
*   Analyze the trend, seasonality, and residuals from the time series decomposition plots to gain a deeper understanding of the underlying patterns in the data.  Look for anomalies in the residuals.
*   Investigate the correlations between the variables identified in the correlation heatmap to understand potential interdependencies.  Consider feature engineering based on these relationships.


## plotly

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Iterate through the relevant variables
variables = ['volt', 'rotate', 'pressure', 'vibration']
for var in variables:
    # Perform seasonal decomposition
    # Assuming daily seasonality, adjust 'period' if needed
    decomposition = seasonal_decompose(merged_df[var], model='additive', period=24)

    # Create subplots using Plotly
    fig = make_subplots(rows=4, cols=1, subplot_titles=[f'Observed {var}', f'Trend {var}', f'Seasonality {var}', f'Residuals {var}'])

    # Add observed data
    fig.add_trace(go.Scatter(x=decomposition.observed.index, y=decomposition.observed.values, mode='lines', name='Observed'), row=1, col=1)

    # Add trend component
    fig.add_trace(go.Scatter(x=decomposition.trend.index, y=decomposition.trend.values, mode='lines', name='Trend'), row=2, col=1)

    # Add seasonal component
    fig.add_trace(go.Scatter(x=decomposition.seasonal.index, y=decomposition.seasonal.values, mode='lines', name='Seasonality'), row=3, col=1)

    # Add residuals
    fig.add_trace(go.Scatter(x=decomposition.resid.index, y=decomposition.resid.values, mode='lines', name='Residuals'), row=4, col=1)

    # Update layout
    fig.update_layout(height=800, title_text=f'Time Series Decomposition of {var}')
    fig.show()

    # Analyze the plots and document observations (example)
    print(f"Analysis of {var}:")
    print("Trend: ", end="") # Add your trend observation here
    print("Seasonality: ", end="") # Add your seasonality observation here
    print("Residuals: ", end="") # Add your residual observation here
    print("-" * 20)