#### Importing required libraries

In [None]:
import os
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="whitegrid")

#### Let us look at the list of files available

In [None]:
base_dir = "/kaggle/input/liverpool-ion-switching"
print(os.listdir(base_dir))

#### Let us read the files and look at what they contain

In [None]:
train = pd.read_csv(f"{base_dir}/train.csv")
test = pd.read_csv(f"{base_dir}/test.csv")
sample_submission = pd.read_csv(f"{base_dir}/sample_submission.csv")
print("Train Dimensions: ",train.shape)
print("Test Dimensions: ",test.shape)

#### Let's have a glimpse at the train data and see what columns are present

In [None]:
train.head(10)

The competition data page states that `open_channels` is our target column. The data also comes from discrete batches, each 50 seconds long and sampled at 10 kHz (500,000 rows per batch). Let's explore a few things:
* Distribution of the target column `open_channels`
* Is there any difference between the data produced in different batches?
* How the signal varies over time
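
The batch arithmetic quoted above (50 s at 10 kHz gives 500,000 rows) can be checked directly from the `time` column. The helper below is a minimal sketch: `describe_sampling` is a hypothetical name, and the tiny synthetic frame stands in for `train` so the cell runs on its own.

```python
import numpy as np
import pandas as pd

def describe_sampling(df, rows_per_batch=500_000):
    """Return the median time step, the implied sampling rate, and the batch count."""
    step = df["time"].diff().dropna().median()
    return {
        "step_s": float(step),
        "rate_hz": float(1.0 / step),
        "n_batches": len(df) / rows_per_batch,
    }

# Synthetic stand-in (assumption): 1,000 rows sampled every 0.0001 s.
demo = pd.DataFrame({"time": np.arange(1, 1001) / 10_000})
info = describe_sampling(demo, rows_per_batch=500)
print(info)
```

In the notebook, `describe_sampling(train)` should report a rate close to 10,000 Hz if the data page description holds.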

#### Distribution of open_channels

In [None]:
channels_dist = train['open_channels'].value_counts().rename_axis('Channel').reset_index(name='count')

plt.figure(figsize=(12, 6))
sns.barplot(x = 'Channel', y = 'count', data = channels_dist,palette="Blues_d")
plt.title("Count of open channels in train data")
plt.show()

There are 11 possible values of `open_channels`, ranging from 0 to 10. The most common value is 0, which means that no channels are open most of the time.
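
Raw counts are hard to compare at a glance, so it can help to look at shares instead. This is a small sketch using `value_counts(normalize=True)`; `channel_shares` is a hypothetical helper and the short series is made-up demo data, not the competition data.

```python
import pandas as pd

def channel_shares(open_channels):
    """Fraction of rows at each open-channel count, ordered by channel count."""
    return open_channels.value_counts(normalize=True).sort_index()

# Made-up demo values (assumption); in the notebook pass train["open_channels"].
demo = pd.Series([0, 0, 0, 1, 1, 2], name="open_channels")
shares = channel_shares(demo)
print(shares.round(3))
```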

Let's look at the first 10 seconds of signal vs. time data and see if we can infer anything.

In [None]:
fig,ax = plt.subplots(ncols=1, nrows=2,figsize=(16,10))
sns.lineplot(x="time", y="signal", data=train[train['time'] <=10], ax = ax[0])
sns.lineplot(x="time", y="open_channels", data=train[train['time'] <=10], ax = ax[1])
plt.show()

In [None]:
train[train['time'] <=50]['open_channels'].value_counts()

Let's create an identifier for the batch in which each signal value was generated.

In [None]:
train['batch'] =  pd.cut(train['time'],10, labels = list(range(1,11)))
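
`pd.cut` splits the observed time range into 10 equal-width bins, so the bin edges only approximately match the 50-second boundaries. A possible alternative, sketched below, assigns batches by row position. It assumes the rows are sorted by time and that every batch holds exactly `rows_per_batch` rows; `add_batch_by_row` is a hypothetical helper, and it yields an integer column rather than a categorical one.

```python
import numpy as np
import pandas as pd

def add_batch_by_row(df, rows_per_batch=500_000):
    """Label rows 1, 2, 3, ... in consecutive blocks of rows_per_batch rows."""
    out = df.copy()
    out["batch"] = np.arange(len(out)) // rows_per_batch + 1
    return out

# Synthetic stand-in (assumption): 20 rows split into batches of 5.
demo = pd.DataFrame({"time": np.arange(1, 21) / 10_000})
demo = add_batch_by_row(demo, rows_per_batch=5)
print(demo["batch"].value_counts().sort_index())
```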

Let's look at the distribution of `open_channels` across the different batches.

In [None]:
# Count rows per (batch, open_channels) pair
grid_data = train.groupby(['batch','open_channels'])['time'].count().reset_index(name='count')
# FacetGrid creates its own figure, so no separate plt.figure() call is needed
g = sns.FacetGrid(grid_data, col="batch", col_wrap=3, height=5)
g = g.map(plt.bar, "open_channels", "count")
plt.show()

### Inference
* It looks like almost no channels are open in batches 1 and 2
* The distributions of `open_channels` in batch pairs (5,10), (3,7) and (4,8) look similar
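
The visual similarity between batch pairs can be put into numbers. One option, sketched here, is the total variation distance between the per-batch shares of `open_channels` (0 means identical distributions, 1 means no overlap). The metric and the helper name `batch_distribution_distance` are illustrative choices, and the demo frame is made up.

```python
import numpy as np
import pandas as pd

def batch_distribution_distance(df, batch_col="batch", target_col="open_channels"):
    """Pairwise total variation distance between per-batch target distributions."""
    # Rows are batches, columns are channel counts, values are within-batch shares.
    shares = pd.crosstab(df[batch_col], df[target_col], normalize="index")
    p = shares.to_numpy()
    # Half the summed absolute difference for every pair of batches.
    tvd = 0.5 * np.abs(p[:, None, :] - p[None, :, :]).sum(axis=2)
    return pd.DataFrame(tvd, index=shares.index, columns=shares.index)

# Made-up demo data (assumption): batches 1 and 2 match, batch 3 differs.
demo = pd.DataFrame({
    "batch": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "open_channels": [0, 0, 0, 1, 0, 0, 0, 1, 2, 2, 3, 3],
})
dist = batch_distribution_distance(demo)
print(dist.round(2))
```

Applied to `train`, small off-diagonal values for the pairs listed above would support the inference.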


Let's look at how `signal` and `open_channels` vary together over time.

In [None]:
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(20,10))
ax1.plot(train["time"], train["signal"], color="blue")
ax1.set_title('Signal',fontsize=20)
ax2.plot(train["time"], train["open_channels"], color="blue")
ax2.set_title('Open Channels', fontsize=20)
plt.xlabel("Time", fontsize=20)
plt.show()

The graphs above show that `signal` is negative and at most 1 channel is open during the first 150 seconds. The number of open channels increases as the signal increases between 200 s and 300 s, which suggests a strong positive correlation between the strength of the `signal` and the number of `open_channels`. It is also interesting that `signal` shows what looks like sinusoidal growth from t = 300 s onward.
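
The correlation read off the plots can be checked numerically. The sketch below computes the Pearson correlation overall and per batch; `signal_channel_correlation` is a hypothetical helper and the data is randomly generated for illustration only. Any slow trend in `signal`, such as the growth seen after t = 300 s, may lower the overall figure, which is why the per-batch values are reported as well.

```python
import numpy as np
import pandas as pd

def signal_channel_correlation(df, batch_col="batch"):
    """Pearson correlation of signal vs. open_channels, overall and per batch."""
    overall = df["signal"].corr(df["open_channels"])
    per_batch = {
        batch: group["signal"].corr(group["open_channels"])
        for batch, group in df.groupby(batch_col, observed=True)
    }
    return overall, pd.Series(per_batch, name="pearson_r")

# Randomly generated demo data (assumption), not the competition data.
rng = np.random.default_rng(0)
channels = rng.integers(0, 4, size=200)
demo = pd.DataFrame({
    "batch": np.repeat([1, 2], 100),
    "open_channels": channels,
    "signal": 1.2 * channels + rng.normal(0, 0.3, size=200),
})
overall, per_batch = signal_channel_correlation(demo)
print(round(overall, 3))
print(per_batch.round(3))
```

A batch in which `open_channels` never changes has zero variance, so its correlation comes back as NaN.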

### Stay tuned for further updates