In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

# Introduction to Cyclic Features

#### The Cat in the Dat competition is one of a kind, and there is a lot to learn from it. Its data contains two cyclic features. But how were they identified, and how can cyclic features be spotted in any dataset?

Cyclic data shows some form of repetition: the values recur after a fixed number of observations, and always in the same order. Typical examples are:

* day of the week (1 - 7)
* hour of the day (1 - 24)
* month of the year (1 - 12)

Let's explore the Cat in the Dat data.

In [None]:
train_data = pd.read_csv("/kaggle/input/cat-in-the-dat/train.csv")
test_data = pd.read_csv("/kaggle/input/cat-in-the-dat/test.csv")

In [None]:
train_data.head().T

In [None]:
train_data.day.value_counts()

In [None]:
train_data.month.value_counts()

So the cyclic columns are:
* `day` (values 1 to 7)
* `month` (values 1 to 12)

# Why Transform?

Cyclic features need to be transformed because, in their raw form, a model has no way of knowing that the smallest value in the cycle sits right next to the largest one.
For example, December is encoded as 12 and January as 1. The two months are adjacent in time, yet their encoded values are 11 apart, the largest gap possible. A cyclic transformation removes this artificial gap.
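
To make the adjacency problem concrete, here is a minimal sketch comparing distances from January on the raw month numbers and on a sine/cosine pair. The helper name `encode` is only illustrative and is not used elsewhere in this notebook.

```python
import numpy as np

def encode(month, period=12):
    """Map a 1-based month number to a point on the unit circle (illustrative helper)."""
    angle = 2 * np.pi * (month - 1) / period
    return np.array([np.sin(angle), np.cos(angle)])

# Raw encoding: December looks 11 steps away from January, February only 1.
raw_dec_jan = abs(12 - 1)
raw_feb_jan = abs(2 - 1)

# Circular encoding: Euclidean distance between the encoded points.
cyc_dec_jan = np.linalg.norm(encode(12) - encode(1))
cyc_feb_jan = np.linalg.norm(encode(2) - encode(1))
cyc_jul_jan = np.linalg.norm(encode(7) - encode(1))  # opposite side of the circle

print(raw_dec_jan, raw_feb_jan)
print(round(cyc_dec_jan, 3), round(cyc_feb_jan, 3), round(cyc_jul_jan, 3))
```

On the circle, December and February are equally close to January, while July, half a year away, is the farthest point.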

# Visualising cyclic data before transformation

In [None]:
import seaborn as sns
sns.set()

In [None]:
# Pass the column as x explicitly; recent seaborn versions no longer accept it positionally.
p = sns.countplot(x=train_data.day)

In [None]:
p = sns.lineplot(x = list(range(0, train_data.shape[0])), y = sorted(train_data.day))

In [None]:
# Pass the column as x explicitly; recent seaborn versions no longer accept it positionally.
p = sns.countplot(x=train_data.month)

In [None]:
p = sns.lineplot(x = list(range(0, train_data.shape[0])), y = sorted(train_data.month))

# Fundamentals of cyclic transformation

The main idea behind cyclic transformation is to enable cyclic data to be represented on a circle.

![](http://blog.davidkaleko.com/images/unit_circle.png)

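The transformations are as follows:
* $x_{\sin} = \sin(2\pi x / \max(x))$
* $x_{\cos} = \cos(2\pi x / \max(x))$

*max(x) can be replaced by the number of unique values of the cyclic feature. The code below also subtracts 1 from `day` and `month` so that the first value maps to angle 0; this only rotates the points on the circle and does not change the distances between them.*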
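
The choice between max(x) and the number of unique values matters when the codes start at 0. A small sketch, assuming a hypothetical weekday feature coded 0 to 6 (not a column from the competition data): dividing by max(x) sends the first and last values to the same point on the circle, while dividing by the number of unique values keeps all of them distinct.

```python
import numpy as np

# Hypothetical weekday feature coded 0..6 (not taken from the competition data).
x = np.arange(7)

def to_circle(values, period):
    """Return (sin, cos) pairs, rounded so floating-point noise does not hide duplicates."""
    angle = 2 * np.pi * values / period
    # Adding 0.0 turns any -0.0 produced by rounding into +0.0.
    return np.round(np.column_stack([np.sin(angle), np.cos(angle)]), 6) + 0.0

by_max = to_circle(x, x.max())              # period = 6
by_count = to_circle(x, len(np.unique(x)))  # period = 7

# Count how many distinct points each choice produces.
n_by_max = len({tuple(row) for row in by_max})
n_by_count = len({tuple(row) for row in by_count})
print(n_by_max, n_by_count)
```

With max(x) as the period, values 0 and 6 land on the same point, so only 6 distinct points remain for 7 categories. For 1-based codes such as 1 to 7, max(x) and the number of unique values coincide and the issue does not arise.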

In [None]:
import numpy as np

train_data['day_sin'] = np.sin((train_data.day-1)*(2.*np.pi/7))
train_data['day_cos'] = np.cos((train_data.day-1)*(2.*np.pi/7))
train_data['month_sin'] = np.sin((train_data.month-1)*(2.*np.pi/12))
train_data['month_cos'] = np.cos((train_data.month-1)*(2.*np.pi/12))
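
The cell above transforms only `train_data`; the same mapping would have to be applied to `test_data` before making predictions. A minimal sketch of a reusable helper follows. The name `add_cyclic_features` and the small demo frame are hypothetical, used here so that the example runs on its own.

```python
import numpy as np
import pandas as pd

def add_cyclic_features(df, column, period):
    """Return a copy of df with <column>_sin and <column>_cos added.

    Assumes the column holds 1-based integer codes (1..period),
    which is what the day and month transformations above assume.
    """
    out = df.copy()
    angle = 2 * np.pi * (out[column] - 1) / period
    out[f"{column}_sin"] = np.sin(angle)
    out[f"{column}_cos"] = np.cos(angle)
    return out

# Tiny stand-in frame; in the notebook this would be train_data or test_data.
demo = pd.DataFrame({"day": [1, 4, 7], "month": [1, 6, 12]})
demo = add_cyclic_features(demo, "day", 7)
demo = add_cyclic_features(demo, "month", 12)
print(demo.round(3))
```

Calling the helper once per frame and per column keeps the period in one place, which makes it harder for the train and test encodings to drift apart.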

In [None]:
p = sns.lineplot(x = list(range(0,30)), y = train_data.day_cos[:30])

In [None]:
p = sns.lineplot(x = list(range(0,30)), y = train_data.day_sin[:30])

# Visualising after cyclic transformation

In [None]:
sample = train_data[:5000] # the first 5,000 rows; a subset keeps the scatter plots fast


In [None]:
ax = sample.plot.scatter('month_sin', 'month_cos')
ax.set_aspect('equal')  # set_aspect returns None, so call it separately to keep the Axes in ax

In [None]:
ax = sample.plot.scatter('day_sin', 'day_cos')
ax.set_aspect('equal')  # set_aspect returns None, so call it separately to keep the Axes in ax
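
These scatter plots show only a handful of dots even though the sample has thousands of rows, because every row with the same month (or day) lands on the same point of the unit circle. A quick check along those lines, sketched on a synthetic month column rather than the competition data:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the month column (values 1..12, repeated five times).
months = pd.Series(np.tile(np.arange(1, 13), 5), name="month")
angle = 2 * np.pi * (months - 1) / 12
points = pd.DataFrame({"month_sin": np.sin(angle), "month_cos": np.cos(angle)})

# Distance of every point from the origin; all should equal 1.
radius = np.sqrt(points["month_sin"] ** 2 + points["month_cos"] ** 2)
on_circle = np.allclose(radius, 1.0)

# Rounding removes floating-point noise before counting distinct points.
n_points = len(points.round(6).drop_duplicates())
print(on_circle, n_points)
```

Sixty rows collapse to 12 distinct points, one per month, all at distance 1 from the origin.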

### Big thanks to:

* http://blog.davidkaleko.com/feature-engineering-cyclical-features.html
* https://ianlondon.github.io/blog/encoding-cyclical-features-24hour-time/
* https://medium.com/ai%C2%B3-theory-practice-business/top-6-errors-novice-machine-learning-engineers-make-e82273d394db
* https://datascience.stackexchange.com/questions/5990/what-is-a-good-way-to-transform-cyclic-ordinal-attributes
* https://www.kaggle.com/avanwyk/encoding-cyclical-features-for-deep-learning