# Introduction
This is a generated kernel with starter code demonstrating how to read in the data and begin exploring it. If you're inspired to dig deeper, click the blue "Fork Notebook" button at the top of this kernel to begin editing.

## Exploratory Analysis
To begin this exploratory analysis, first import libraries and define functions for plotting the data using `matplotlib`. Depending on the data, not all plots will be made.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

### Let's check the file: /kaggle/input/bitcoin-stock-rates-20092020/BTC-INR.csv

In [None]:
df = pd.read_csv("/kaggle/input/bitcoin-stock-rates-20092020/BTC-INR.csv")
df

Lowercase the column names so they are quicker and easier to type.

In [None]:
df.columns = df.columns.str.lower()
df.describe()
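
As a small illustration of what the lowercasing step does, here is a sketch on a made-up frame. The column names below are hypothetical and may not match the headers in `BTC-INR.csv`.

```python
import pandas as pd

# Hypothetical column names, chosen only to show the effect of .str.lower().
sample = pd.DataFrame({"Date": ["2015-01-01"], "Adj Close": [100.0], "Volume": [5]})

# .str.lower() works element-wise on the column Index and returns a new Index.
sample.columns = sample.columns.str.lower()
print(list(sample.columns))
```

Note that only the case changes; a space inside a name such as `adj close` is kept, so such a column would still need bracket access.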

Get summary information (column types and non-null counts) for the dataset.

In [None]:
df.info()

Let's add year and month columns so the data can be analyzed by year and by month.

In [None]:
df["date_year"] = pd.DatetimeIndex(df["date"]).year
df["date_month"] = pd.DatetimeIndex(df["date"]).month
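
The same step can also be sketched with `pd.to_datetime` and the `.dt` accessor, which parses the dates only once. This assumes the `date` column holds strings that `pd.to_datetime` can parse; the sample rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real CSV contents.
sample = pd.DataFrame({
    "date": ["2015-01-31", "2016-07-15", "2019-12-01"],
    "close": [100.0, 250.0, 900.0],
})

# Parse the strings once, then derive both columns from the parsed series.
parsed = pd.to_datetime(sample["date"])
sample["date_year"] = parsed.dt.year
sample["date_month"] = parsed.dt.month

print(sample[["date", "date_year", "date_month"]])
```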

In [None]:
df.info()

Check for null values in each column.

In [None]:
df.isnull().sum()

Drop the NULL rows.

In [None]:
df = df.dropna()
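
To make the effect of the two cells above concrete, here is a minimal sketch on a toy frame. The column names and values are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Toy data with one missing value in each column.
sample = pd.DataFrame({
    "open": [1.0, np.nan, 3.0, 4.0],
    "close": [1.5, 2.5, np.nan, 4.5],
})

# Count missing values per column, then drop every row that has at least one.
null_counts = sample.isnull().sum()
cleaned = sample.dropna()

print(null_counts)
print(f"rows before: {len(sample)}, rows after: {len(cleaned)}")
```

`dropna()` keeps the original index labels, so the cleaned frame may have gaps in its index.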

### Import libraries for generating graphs.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

Create one dataframe per year, excluding 2020 because the data does not cover every month of that year.

In [None]:
df_2015 = df[df["date_year"]==2015]
df_2016 = df[df["date_year"]==2016]
df_2017 = df[df["date_year"]==2017]
df_2018 = df[df["date_year"]==2018]
df_2019 = df[df["date_year"]==2019]
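
One possible alternative to separate `df_2015` ... `df_2019` variables is a dictionary keyed by year, built with `groupby`. The sketch below uses invented rows, and `frames_by_year` is a hypothetical name that the rest of this notebook does not use.

```python
import pandas as pd

# Invented rows; only the date_year and close columns matter here.
sample = pd.DataFrame({
    "date_year": [2015, 2015, 2016, 2020],
    "close": [10.0, 12.0, 20.0, 99.0],
})

# Keep complete years only, then split into one DataFrame per year.
complete = sample[sample["date_year"] < 2020]
frames_by_year = {year: group for year, group in complete.groupby("date_year")}

print(sorted(frames_by_year))
print(frames_by_year[2015])
```

A dictionary like this can be looped over directly, which avoids building variable names as strings.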

In [None]:
df_2015

### Plot the closing price of Bitcoin for all available years.

In [None]:
fig = plt.figure(figsize=(16, 9), dpi=80)
ax1 = fig.add_subplot()
# plot the closing price of each year as its own line (no eval needed)
for year in range(2015, 2020):
    yearly = df[df["date_year"] == year]
    ax1.plot(yearly.index, yearly["close"], label=str(year))
ax1.legend()

In [None]:
plt.figure(figsize=(16,9), dpi= 80)
sns.set_color_codes()
sns.distplot(df_2015['close'],color = 'slateblue')
plt.show()

In [None]:
sns.set_style("white")

x1 = df.loc[df.date_year==2015, 'close']
x2 = df.loc[df.date_year==2016, 'close']

# Plot
kwargs = dict(hist_kws={'alpha':.6}, kde_kws={'linewidth':2})

plt.figure(figsize=(16,9), dpi= 80)
sns.distplot(x1, label="2015", **kwargs)
sns.distplot(x2, label="2016", **kwargs)
plt.legend()

In [None]:
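sns.set_style("white")
plt.figure(figsize=(16, 9), dpi=80)
# shared styling for every histogram/KDE pair
kwargs = dict(hist_kws={'alpha': .6}, kde_kws={'linewidth': 2})
for i in range(2015, 2017):
    # select the closing prices for this year directly (no eval needed)
    x = df.loc[df.date_year == i, 'close']
    sns.distplot(x, label=str(i), **kwargs)
# draw the legend once, after all years are plotted
plt.legend()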

In [None]:
# instantiate the figure
fig = plt.figure(figsize = (10, 8))

# ----------------------------------------------------------------------------------------------------
# plot the data
# the idea is to iterate over each year,
# extract its closing prices and plot a separate density plot
for i in range(2017, 2020):
    # extract the data
    x = df.loc[df.date_year == i, 'close']
    # plot the data using seaborn
    sns.kdeplot(x, shade=True, label = "{} year".format(i))

# set the title of the plot and show the legend
plt.title("Density Plot of Closing Price by Year")
plt.legend()

In [None]:
# Scatter and density plots
def plotScatterMatrix(df, plotSize, textSize):
    df = df.select_dtypes(include =[np.number]) # keep only numerical columns
    # Remove rows and columns that would lead to df being singular
    df = df.dropna(axis='columns')
    df = df[[col for col in df if df[col].nunique() > 1]] # keep columns where there are more than 1 unique values
    columnNames = list(df)
    if len(columnNames) > 10: # reduce the number of columns for matrix inversion of kernel density plots
        columnNames = columnNames[:10]
    df = df[columnNames]
    ax = pd.plotting.scatter_matrix(df, alpha=0.75, figsize=[plotSize, plotSize], diagonal='kde')
    corrs = df.corr().values
    for i, j in zip(*np.triu_indices_from(ax, k = 1)):
        ax[i, j].annotate('Corr. coef = %.3f' % corrs[i, j], (0.8, 0.2), xycoords='axes fraction', ha='center', va='center', size=textSize)
    plt.suptitle('Scatter and Density Plot')
    plt.show()
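
The annotation loop in `plotScatterMatrix` visits only the cells above the diagonal. The following sketch shows which index pairs `np.triu_indices_from(..., k=1)` produces and which correlation each pair picks up. The toy columns are invented, and `pairs` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Toy data: close is an exact multiple of open, volume moves the opposite way.
sample = pd.DataFrame({
    "open": [1.0, 2.0, 3.0, 4.0],
    "close": [2.0, 4.0, 6.0, 8.0],
    "volume": [4.0, 3.0, 2.0, 1.0],
})

corrs = sample.corr().values

# k=1 skips the diagonal, so each pair of distinct columns appears exactly once.
pairs = {}
for i, j in zip(*np.triu_indices_from(corrs, k=1)):
    pairs[(sample.columns[i], sample.columns[j])] = round(float(corrs[i, j]), 3)

print(pairs)
```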

### Scatter and density plots:

In [None]:
plotScatterMatrix(df_2015, 15, 10)

In [None]:
plotScatterMatrix(df_2016, 15, 10)

In [None]:
plotScatterMatrix(df_2017, 15, 10)

In [None]:
plotScatterMatrix(df_2018, 15, 10)

In [None]:
plotScatterMatrix(df_2019, 15, 10)

# Pandas Profiling for Dataset

Install it with this command:

In [None]:
pip install pandas-profiling

## Import pandas_profiling

In [None]:
from pandas_profiling import ProfileReport

### Generate Profile

Generate a profile of your data using `ProfileReport` and save it to an .html file.

In [None]:
profile = ProfileReport(df, title="Profiling Report: bitcoin-stock-rates")
profile.to_file(output_file='profile.html')

In [None]:
!pip install art
import art

In [None]:
art.tprint("DONE.")