# Introduction

This dataset contains daily stock prices of Turkish Airlines since 2013. I dedicate this notebook to my cousin, who invested long ago and is still waiting for the stock price to climb back to its previous levels.

1. [Load The Data](#1)
2. [Variable Description](#2)
     * [Univariate Variable Analysis](#3)
         * [Categorical Variable Analysis](#4)
         * [Numerical Variable Analysis](#5)
3. [Basic Data Analysis](#6)
4. [Outlier Detection](#7)
5. [Missing Value](#8)
    * [Find Missing Value](#9)

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed.
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python

import os
from collections import Counter

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

# The 'seaborn-whitegrid' matplotlib style was renamed in newer matplotlib releases,
# so set the equivalent white-grid style through seaborn instead.
sns.set_style('whitegrid')

# Input data files are available in the "/kaggle/input/" directory.
# List every file under the input directory.
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

<a id = '1' >  </a> 
# Load The Data

In [None]:
data = pd.read_csv('/kaggle/input/turkish-airlines-daily-stock-prices-since-2013/cleanThy.csv')

In [None]:
# Inspect the original column names, then rename them to clean, consistent identifiers
print(data.columns)
data.columns = ["Date", "Last_Price", "Lowest_Price", "Highest_Price", "Volume"]

In [None]:
data.head(20)  # a quick look at the first 20 rows to get a feel for the data

In [None]:
data.describe()

Because the data covers a long time interval, these summary statistics may be misleading: prices from very different market periods are pooled into a single mean and standard deviation.
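A minimal sketch of why pooled statistics can mislead. It builds a small hypothetical price series (not the Turkish Airlines data) whose level shifts between two years, then compares `describe()` on the whole series with per-year statistics from `groupby`. It assumes the dates can be parsed with `pd.to_datetime`.

```python
import pandas as pd

# Hypothetical data (NOT the real dataset): the price level jumps between two years
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-02", "2013-06-03", "2013-12-02",
                            "2020-01-02", "2020-06-01", "2020-12-01"]),
    "Last_Price": [5.0, 5.5, 6.0, 15.0, 15.5, 16.0],
})

# Pooled summary: the mean falls between the two regimes and describes neither
pooled = prices["Last_Price"].describe()

# Per-year summary: each year has its own, much tighter distribution
per_year = prices.groupby(prices["Date"].dt.year)["Last_Price"].agg(["mean", "std", "min", "max"])

print(pooled)
print(per_year)
```

Here the pooled mean is 10.5, while the yearly means are 5.5 and 15.5, and the pooled standard deviation is far larger than either yearly one.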

<a id = '2' >  </a> 
# Variable Description
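* Date: trading day
* Last Price (`Last_Price`): last traded price of the day
* Lowest Price (`Lowest_Price`): lowest price of the day
* Highest Price (`Highest_Price`): highest price of the day
* Volume: trading volume of the day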

In [None]:
data.info()

* float64(3) : Last Price, Lowest Price, Highest Price
* int64(1)   : Volume
* object(1)  : Date

<a id = '4' >  </a> 
## Categorical Variable Analysis
'Date' is a special kind of categorical variable. With proper preprocessing (for example, parsing it into a datetime and extracting the year, month or weekday), it can yield important results in many datasets. However, we will not analyze this variable in this notebook.
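As an illustration of that preprocessing, here is a minimal sketch that parses a `Date` column stored as strings (dtype `object`) into datetimes and extracts calendar features. The sample strings and the `%d.%m.%Y` format are assumptions made for illustration; the actual format in `cleanThy.csv` may differ, so check `data['Date'].head()` before choosing a format.

```python
import pandas as pd

# Hypothetical sample: dates stored as strings, as the 'object' dtype suggests.
# The day.month.year format is an ASSUMPTION; adjust `format` to the real data.
df = pd.DataFrame({
    "Date": ["02.01.2013", "03.01.2013", "04.01.2013", "07.01.2013"],
    "Last_Price": [4.10, 4.15, 4.12, 4.20],
})

# Parse strings into datetime64 values; unparseable entries become NaT
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%Y", errors="coerce")

# Derive calendar features that can be analyzed like ordinary categorical variables
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Weekday"] = df["Date"].dt.day_name()

# Sorting by date guarantees chronological order for time-based plots
df = df.sort_values("Date").reset_index(drop=True)
print(df)
```

Using `errors="coerce"` turns malformed dates into `NaT` instead of raising, so they can be found later with `isnull()`.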

<a id = '5' >  </a> 
## Numerical Variable Analysis
'Last_Price', 'Lowest_Price', 'Highest_Price' and 'Volume' are the numerical variables in this dataset.

In [None]:
def plot_variables_byDate(variable):
    # Line plot of one numerical column against its row position.
    # Date is not parsed in this notebook, so the x-axis assumes the rows are in date order.
    sns.set_style("whitegrid")
    plt.figure(figsize=(9, 3))
    sns.lineplot(x=data.index, y=data[variable])
    plt.xlabel('Date (row order)')
    plt.ylabel(variable)
    plt.title('{} vs Date'.format(variable))
    plt.show()

In [None]:
myVariables = ['Last_Price', 'Lowest_Price', 'Highest_Price', 'Volume']
for i in myVariables:
    plot_variables_byDate(i)

<a id = '6' >  </a> 
# Basic Data Analysis

In [None]:
data[['Last_Price','Volume']].groupby(['Volume'],as_index = False).mean()

In [None]:
data[['Lowest_Price','Volume']].groupby(['Volume'],as_index = False).mean()

In [None]:
data[['Highest_Price','Volume']].groupby(['Volume'],as_index = False).mean()
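`Volume` likely takes a different value on almost every trading day, so grouping by it returns roughly one row per group and the group means mostly repeat the raw prices. A hedged alternative, sketched here on hypothetical numbers, is to bin `Volume` into equal-frequency (quantile) groups with `pd.qcut` and average the prices within each bin. The choice of four bins and the `Volume_Bin` column name are arbitrary.

```python
import pandas as pd

# Hypothetical data (NOT the real dataset)
df = pd.DataFrame({
    "Volume": [100, 200, 300, 400, 500, 600, 700, 800],
    "Last_Price": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0],
})

# Grouping by the raw Volume gives one group per distinct value
raw_groups = df.groupby("Volume").ngroups

# Bin Volume into 4 quartile bins and average the price within each bin
df["Volume_Bin"] = pd.qcut(df["Volume"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])
binned = df.groupby("Volume_Bin", observed=True)["Last_Price"].mean()

print(raw_groups)
print(binned)
```

With eight rows, each quartile bin holds two days, so the result is four readable averages instead of eight single-row groups.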

In [None]:
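def scatterPlots(variable1, variable2):
    # Scatter plot of two numerical columns; axis labels follow the arguments
    data.plot(kind='scatter', x=variable1, y=variable2, color='red', figsize=(15, 15))
    plt.xlabel(variable1)
    plt.ylabel(variable2)
    plt.title('{} vs {}'.format(variable2, variable1))
    plt.show()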

In [None]:
variable1 = ['Last_Price','Lowest_Price','Highest_Price']
for i in variable1:
    scatterPlots(i,'Volume')

In [None]:
variable1 = ['Last_Price','Lowest_Price']
for i in variable1:
    scatterPlots(i,'Highest_Price')

In [None]:
variable1 = ['Last_Price']
for i in variable1:
    scatterPlots(i,'Lowest_Price')

<a id = '7' >  </a> 
# Outlier Detection
For stock data, I think removing outliers does not make much sense. Extreme values are a natural part of stock prices, so dropping them means losing real data, including anomalies that may need to be detected (anomalies and outliers are different things). Outlier removal may still make sense in some cases, so I present it anyway.
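If losing data is the concern, one option is to flag outliers instead of dropping them. The sketch below applies Tukey's 1.5 * IQR rule (the rule the notebook's `detect_outliers` function is built on) to a single column and stores the result in a boolean column, so every row is kept. The data and the `is_outlier` column name are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical daily volumes (NOT the real dataset) with one extreme day
df = pd.DataFrame({"Volume": [10, 12, 11, 13, 12, 11, 100]})

# Tukey fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] count as outliers
q1 = np.percentile(df["Volume"], 25)
q3 = np.percentile(df["Volume"], 75)
step = 1.5 * (q3 - q1)
lower, upper = q1 - step, q3 + step

# Flag instead of drop: every row stays in the DataFrame for later anomaly analysis
df["is_outlier"] = (df["Volume"] < lower) | (df["Volume"] > upper)
print(lower, upper)
print(df)
```

Here Q1 = 11 and Q3 = 12.5, so the fences are 8.75 and 14.75 and only the day with volume 100 is flagged, while all seven rows remain available.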

In [None]:
def detect_outliers(data, features):
    """Return indices of rows flagged as outliers (Tukey's 1.5 * IQR rule)
    in more than two of the given features."""
    outlier_indices = []

    for c in features:
        # 1st quartile
        Q1 = np.percentile(data[c], 25)
        # 3rd quartile
        Q3 = np.percentile(data[c], 75)
        # Interquartile range
        IQR = Q3 - Q1
        # Outlier step
        outlier_step = IQR * 1.5
        # Indices of values outside [Q1 - step, Q3 + step]
        outlier_list_col = data[(data[c] < Q1 - outlier_step) | (data[c] > Q3 + outlier_step)].index
        outlier_indices.extend(outlier_list_col)

    # Keep only rows that are outliers in more than two features
    outlier_indices = Counter(outlier_indices)
    multiple_outliers = [i for i, v in outlier_indices.items() if v > 2]

    return multiple_outliers

In [None]:
data.loc[detect_outliers(data,['Last_Price','Highest_Price','Lowest_Price','Volume'])]

In [None]:
#Drop Outliers
data.drop(detect_outliers(data,['Last_Price','Highest_Price','Lowest_Price','Volume']),axis = 0).reset_index(drop = True)

In [None]:
myVariables = ['Last_Price', 'Lowest_Price', 'Highest_Price', 'Volume']
for i in myVariables:
    plot_variables_byDate(i)

<a id = '8' >  </a> 
# Missing Value
* Find Missing Value

<a id = '9' >  </a> 
## Find Missing Value

In [None]:
data.columns[data.isnull().any()]

No missing values were detected. Please leave a comment if you would like to contribute.