#### As we have already seen ([here](https://www.kaggle.com/arnabbiswas1/are-the-models-throwing-noise)), the features in this data have very little predictive power. That made me wonder whether any information is hidden inside the data that could be extracted to help the models.

### In this notebook, I try to figure out whether any of the features is temporal in nature.

### Spoiler Alert: None of the features are time series.

#### How do I do that?
#### For every feature, I plot the autocorrelation (ACF) and the partial autocorrelation (PACF). If there is any temporal dependency in the data, it should show up in these plots.
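To make the idea concrete, here is a minimal sketch of what an autocorrelation plot measures, computed by hand with NumPy on synthetic data. The helper `sample_acf` and the two series (`seasonal`, `noise`) are illustrative assumptions and are not part of the competition data; the plots in this notebook come from statsmodels, not from this helper.

```python
import numpy as np


def sample_acf(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    # Lag 0 is 1 by definition; for lag k, correlate the series with itself shifted by k.
    return np.array(
        [1.0 if lag == 0 else np.dot(x[:-lag], x[lag:]) / denom
         for lag in range(max_lag + 1)]
    )


rng = np.random.default_rng(0)
n = 2000
t = np.arange(n)

# Assumed temporal series: a 24-step cycle plus a little noise.
seasonal = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(n)
# Assumed non-temporal series: independent draws.
noise = rng.standard_normal(n)

acf_seasonal = sample_acf(seasonal, max_lag=50)
acf_noise = sample_acf(noise, max_lag=50)

print(f"seasonal ACF at lag 12: {acf_seasonal[12]:.2f}")
print(f"seasonal ACF at lag 24: {acf_seasonal[24]:.2f}")
print(f"largest |ACF| of noise over lags 1..50: {np.abs(acf_noise[1:]).max():.2f}")
```

The cyclic series is strongly correlated with itself one full period later and strongly anti-correlated half a period later, while the independent series stays close to zero at every lag. That contrast is what the plots are meant to reveal.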

In [None]:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf

matplotlib.style.use("dark_background")

In [None]:
def plot_acf_pacf(df, feature_name, lags=50, figsize=(10, 4)):
    """
    Plot ACF and PACF side by side
    """
    fig, ((ax1, ax2)) = plt.subplots(1, 2, figsize=figsize)
    plot_acf(df[feature_name], ax=ax1, lags=lags, title=f"ACF for {feature_name}")
    plot_pacf(df[feature_name], ax=ax2, lags=lags, title=f"PACF for {feature_name}")
    plt.show()

    
def plot_point(df, feature_name, figsize=(10, 5)):
    """
    Plot the values of a feature as points, in row order
    """
    df[feature_name].plot(
        kind="line",
        style=".",
        figsize=figsize,
        alpha=0.4,
        title=f"Plot for {feature_name} distribution",
    )
    plt.ylabel(f"Value of {feature_name}")
    plt.legend()
    plt.show()

### To see what autocorrelation and partial autocorrelation look like for real time series data, I first plot them for temperature and relative humidity from July's competition data, and then produce the same two plots for the features from August's data.

#### Load training data from July and August.

In [None]:
train_july = pd.read_csv("/kaggle/input/tabular-playground-series-jul-2021/train.csv")
train_aug = pd.read_csv("/kaggle/input/tabular-playground-series-aug-2021/train.csv")

### Plot Autocorrelation and Partial Autocorrelation for July's data

In [None]:
for name in ["deg_C", "relative_humidity"]:
    plot_acf_pacf(train_july, feature_name=name, figsize=(15, 3))

#### The repetitive pattern is very clear when the data is temporal in nature.

### Plot Autocorrelation and Partial Autocorrelation for all the 100 features from August

#### As we will see, none of the features displays the pattern seen in temperature and relative humidity from July. That indicates that none of the features is temporal in nature. (To keep things simple, I am not getting into the details of autocorrelation and partial autocorrelation here.)

In [None]:
for name in train_aug.columns.drop(["id", "loss"]):
    plot_acf_pacf(train_aug, feature_name=name, figsize=(15, 3))
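Scanning 100 pairs of plots by eye is tedious, so a numeric summary can complement them. The following is a rough sketch, not the method used in this notebook: for each column it counts how many lags have an autocorrelation outside the approximate 95% white-noise band of ±1.96/√n. The function `significant_lag_counts` and the demo columns `cyclic` and `iid` are hypothetical names on synthetic data.

```python
import numpy as np
import pandas as pd


def significant_lag_counts(df, lags=50):
    """Per column, count lags 1..lags whose autocorrelation lies outside
    the approximate 95% white-noise band (hypothetical helper)."""
    band = 1.96 / np.sqrt(len(df))
    counts = {
        col: int(sum(abs(df[col].autocorr(lag=k)) > band
                     for k in range(1, lags + 1)))
        for col in df.columns
    }
    return pd.Series(counts).sort_values(ascending=False)


# Synthetic stand-in for real data: one cyclic column, one independent column.
rng = np.random.default_rng(42)
n = 5000
demo = pd.DataFrame(
    {
        "cyclic": np.sin(2 * np.pi * np.arange(n) / 24) + 0.1 * rng.standard_normal(n),
        "iid": rng.standard_normal(n),
    }
)

summary = significant_lag_counts(demo)
print(summary)
```

On the August data this could be called as `significant_lag_counts(train_aug.drop(columns=["id", "loss"]))`. Even for independent data, roughly 5% of lags are expected to fall outside the band by chance, so only columns with counts well above that would deserve a closer look.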

#### As an additional check, I plot the first 10,000 values of every feature to see whether any cyclic behavior (which is common in time series data) shows up.

In [None]:
for name in train_aug.columns.drop(["id", "loss"]):
    plot_point(train_aug[0:10000], feature_name=name, figsize=(15, 3))

#### Conclusion: None of the features displays any temporal behavior, so time series related feature engineering may not help.