**What is the Stock Market?**

The stock market refers to the collection of markets and exchanges where the buying, selling, and issuance of shares of publicly held companies regularly take place. These activities are conducted through formal, institutionalized exchanges or over-the-counter (OTC) marketplaces that operate under a defined set of regulations. A country or region may have several stock trading venues, which allow transactions in stocks and other forms of securities.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from pandas.plotting import table
import datetime as dt
import warnings
warnings.filterwarnings("ignore")

**Dataset Loading and Processing Segment**

In [None]:
# Load the dataset
Data = pd.read_csv("../input/msft-dataset/MSFT.csv")

In [None]:
Data.head()

In [None]:
Data.tail()

In [None]:
Data.info()

In [None]:
Data.shape

In [None]:
Data.dtypes

In [None]:
Data.columns

In [None]:
Data['Date'] = pd.to_datetime(Data['Date'])

In [None]:
# Get initial descriptive statistics
Data.describe(include="all")

In [None]:
plt.plot(Data['Date'], Data['Close'])
plt.show()

In [None]:
# Before proceeding, check for NULL values. If found, perform imputation
Data.isnull().values.sum() # In this case, it is 0. So, we can proceed
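If the null count above were non-zero, one common option for daily price data is to carry the last observed value forward. The sketch below uses a small hypothetical frame (`prices` is illustrative, not the MSFT file) and pandas' `Series.ffill`. Whether forward-filling is appropriate depends on why the values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closes with two missing values (illustrative, not the MSFT data)
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]),
    "Close": [100.0, np.nan, np.nan, 103.0],
})

# Count missing values per column before imputation
print(prices.isnull().sum())

# Forward-fill: each gap takes the most recent observed close
prices["Close"] = prices["Close"].ffill()
print(prices)
```

Forward-filling assumes the price did not change across the gap. `Series.interpolate` is an alternative when a gradual change seems more plausible.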

**Variation of Stock Trade Over Time**

In [None]:
# A glimpse of how the stock prices and trading volume varied over time

# List of numerical columns to visualize
Column_List = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']

# One subplot per column, arranged in a 3 x 2 grid
Data.plot(x = "Date", y = Column_List, subplots = True, layout = (3, 2), figsize = (15, 15), sharex = False, title = "MSFT Stock Value Trend over Time", rot = 90)

In [None]:
# Visualize the spread and skewness through distribution plots

# Reuse Column_List from above: one histogram with a KDE curve per column
fig, ax = plt.subplots(len(Column_List), figsize = (15, 10))

for i, col_list in enumerate(Column_List):
    # histplot replaces the deprecated distplot (seaborn >= 0.11)
    sns.histplot(Data[col_list], kde = True, ax = ax[i])
    ax[i].set_title("Frequency Distribution of" + " " + col_list, fontsize = 10)
    ax[i].set_xlabel(col_list, fontsize = 8)
    ax[i].set_ylabel('Count', fontsize = 8)
    ax[i].grid(True) # Enabled to view and make markings

fig.tight_layout(pad = 1.1) # To provide space between plots
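The distribution plots give a visual sense of skew, and a numeric check can confirm it. The following minimal sketch uses a small synthetic frame as a stand-in for `Data[Column_List]` (the values are illustrative only) together with pandas' `DataFrame.skew()`, which returns the bias-corrected sample skewness. A value near 0 suggests a symmetric distribution. A positive value indicates a longer right tail.

```python
import pandas as pd

# Synthetic stand-in for Data[Column_List]; values are illustrative only
sample = pd.DataFrame({
    "Close": [10.0, 11.0, 12.0, 13.0, 14.0],   # evenly spread around the mean
    "Volume": [100, 110, 120, 130, 1000],      # one unusually large day
})

# Sample skewness per column: ~0 means symmetric, > 0 means a long right tail
skewness = sample.skew()
print(skewness)
```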

In [None]:
plt.figure(figsize=(12,6))
# Pearson correlation between the numerical columns, shown as an annotated heatmap
corr_matrix = Data[Column_List].corr()
sns.heatmap(corr_matrix, annot=True, fmt='.6f', linewidths=.5)
plt.show()

**Correlation Analysis**

In [None]:
# View the correlation matrix as a table to read the exact strength of each relationship
corr_matrix = Data[Column_List].corr()
corr_matrix
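By default, each cell of the matrix is a Pearson correlation coefficient, r = cov(x, y) / (σx · σy). To illustrate what `.corr()` computes, the sketch below recomputes one coefficient from that definition on two short illustrative series (not taken from the MSFT file) and compares it with pandas' `Series.corr`.

```python
import numpy as np
import pandas as pd

# Two short illustrative series (not taken from the MSFT file)
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.1, 5.9, 8.2, 9.9])

# Pearson r from its definition: co-deviation sum over the product of deviation norms
manual_r = ((x - x.mean()) * (y - y.mean())).sum() / np.sqrt(
    ((x - x.mean()) ** 2).sum() * ((y - y.mean()) ** 2).sum()
)

# pandas uses the Pearson method by default
pandas_r = x.corr(y)
print(manual_r, pandas_r)
```

A value close to 1, as here, means the two series rise together almost linearly. Price columns such as Open and Close often behave this way.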

In [None]:
# Use the dates as the index so the close price is plotted against time
Data.index = Data['Date']
plt.figure(figsize=(16,8))
plt.plot(Data['Close'], label='Close Price History', color='r')
plt.xlabel('Date', size=20)
plt.ylabel('Stock Price', size=20)
plt.title('Stock Price of Microsoft over the Years', size=15)
plt.legend()
plt.show()

**Outlier Detection and Removal**
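The notebook has no code for this step. One common approach is Tukey's IQR rule, which flags values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. It is offered here as an assumption, not as the author's method. The following minimal sketch applies the rule to an illustrative volume series (`volume` is hypothetical).

```python
import pandas as pd

# Illustrative daily volumes; the last value is deliberately extreme
volume = pd.Series([120, 130, 125, 140, 135, 128, 900], name="Volume")

# Tukey's rule: fences at 1.5 * IQR beyond the first and third quartiles
q1, q3 = volume.quantile(0.25), volume.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Boolean mask of outliers, and the series with them removed
is_outlier = (volume < lower) | (volume > upper)
cleaned = volume[~is_outlier]
print(volume[is_outlier])
```

In stock data, extreme days may reflect genuine market events rather than errors. Flagged values are therefore worth inspecting before they are removed.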

**Extensive Analysis on Historical Data to Find Patterns**

In [None]:
# Because this is time-series data, forecasting techniques could be applied later;
# first, explore the historical patterns

# Extract the year, month and weekday (0 = Monday) from each date for further analysis
Data['Year'] = Data['Date'].dt.year
Data['Month'] = Data['Date'].dt.month
Data['WeekDay'] = Data['Date'].dt.weekday

# First, plot the data grouped by year to see when values rose and fell
fig, ax = plt.subplots(len(Column_List), figsize = (10, 20))

# Group the data by year and draw one line per year on each subplot
for i, col_list in enumerate(Column_List):
    Data.groupby('Year')[col_list].plot(ax = ax[i], legend = True)
    ax[i].set_title("Movement Grouped by Year:" + " " + col_list, fontsize = 10)
    ax[i].set_ylabel(col_list, fontsize = 8)
    ax[i].yaxis.grid(True) # To enable grid only on the Y-axis

fig.tight_layout(pad = 1.1)
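The per-year plots show visually when values rose and fell. To put a number on each change, one option is to take the mean per year and its year-over-year percentage change. The sketch below uses a small illustrative frame (`df` is hypothetical, not the MSFT data).

```python
import pandas as pd

# Illustrative closes spanning three calendar years (not the MSFT file)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-06-01", "2018-12-03", "2019-06-03",
                            "2019-12-02", "2020-06-01", "2020-12-01"]),
    "Close": [100.0, 110.0, 120.0, 130.0, 105.0, 115.0],
})
df["Year"] = df["Date"].dt.year

# Mean close per year, then the percentage change from the previous year
yearly_mean = df.groupby("Year")["Close"].mean()
yoy_change = (yearly_mean / yearly_mean.shift(1) - 1) * 100
print(yoy_change.round(2))
```

The first year has no predecessor, so its change is NaN.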

**Pie Charts Showing the Influence of Time on Total Trade Volume**

**Analysis:**

**Year Information:** 1986 - 2020

**Month Information:** All 12 months (January, February, March, April, May, June, July, August, September, October, November, and December)

**Day Information:** Only 5 working days (Monday, Tuesday, Wednesday, Thursday and Friday)

In [None]:
# Analyse based on Year: total traded volume per year
# Selecting [['Volume']] keeps a one-column DataFrame for the pie chart and table below
var = Data.groupby('Year')[['Volume']].sum()

# Plot to understand the trend
plt.figure(figsize = (16, 7))
ax1 = plt.subplot(121)
var.plot(kind = "pie", y = "Volume", legend = False, fontsize = 12, sharex = False, title = "Time Series Influence on Total Volume Trade by Year", ax = ax1)

# Plot the table to identify numbers
ax2 = plt.subplot(122)
plt.axis('off') # Since we are plotting the table
tbl = table(ax2, var, loc = 'center')
tbl.auto_set_font_size(False)
tbl.set_fontsize(12)
plt.show()
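Each pie slice encodes one group's share of the total. The same shares can be tabulated directly by dividing each group's summed volume by the grand total. The following sketch uses an illustrative frame (`df` is hypothetical).

```python
import pandas as pd

# Illustrative per-day volumes tagged with their year (not the MSFT data)
df = pd.DataFrame({
    "Year": [2018, 2018, 2019, 2019, 2020],
    "Volume": [100, 150, 200, 50, 500],
})

# Total volume per year (what the pie slices represent)
yearly_volume = df.groupby("Year")["Volume"].sum()

# Each year's percentage share of the overall volume
share_pct = yearly_volume / yearly_volume.sum() * 100
print(share_pct)
```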

In [None]:
# Analyse based on Month: total traded volume per calendar month
# Selecting [['Volume']] keeps a one-column DataFrame for the pie chart and table below
var = Data.groupby('Month')[['Volume']].sum()

# Plot to understand the trend
plt.figure(figsize = (16, 7))
ax1 = plt.subplot(121)
var.plot(kind = "pie", y = "Volume", legend = False, fontsize = 12, sharex = False, title = "Time Series Influence on Total Volume Trade by Month", ax = ax1)

# Plot the table to identify numbers
ax2 = plt.subplot(122)
plt.axis('off') # Since we are plotting the table
tbl = table(ax2, var, loc = 'center')
tbl.auto_set_font_size(False)
tbl.set_fontsize(12)
plt.show()

In [None]:
plt.figure(figsize=(12,8))
# Mean daily volume for each (year, month) pair
Data.groupby([Data.Date.dt.year,Data.Date.dt.month])["Volume"].mean().plot(color="#ad4073",marker=".")
plt.xlabel("(Year, Month)",color="#913653", size=18)
plt.ylabel("Volume",color="#913653", size=18)
plt.title("All-Time Monthly Average Volume",size=22)

plt.yscale("log")
plt.grid(color="blue", linestyle="--", which="minor")
plt.xticks(fontsize=12) 
plt.yticks(fontsize=12, color="#3d8f6e") 
plt.show()
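The cell above groups by a (year, month) pair. An alternative, sketched here with a hypothetical frame, is to group by a monthly `Period` from `Series.dt.to_period`. This produces a single index that sorts chronologically and reads as "YYYY-MM".

```python
import pandas as pd

# Illustrative trading days in two months (not the MSFT data)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-02-03", "2020-02-04"]),
    "Volume": [100, 300, 50, 150],
})

# Map each trading day to its calendar month and average the volume
monthly_avg = df.groupby(df["Date"].dt.to_period("M"))["Volume"].mean()
print(monthly_avg)
```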

In [None]:
# Analyse based on WeekDay (0 = Monday ... 4 = Friday): total traded volume per weekday
# Selecting [['Volume']] keeps a one-column DataFrame for the pie chart and table below
var = Data.groupby('WeekDay')[['Volume']].sum()

# Plot to understand the trend
plt.figure(figsize = (16, 7))
ax1 = plt.subplot(121)
var.plot(kind = "pie", y = "Volume", legend = False, fontsize = 12, sharex = False, title = "Time Series Influence on Total Volume Trade by WeekDay", ax = ax1)

# Plot the table to identify numbers
ax2 = plt.subplot(122)
plt.axis('off') # Since we are plotting the table
tbl = table(ax2, var, loc = 'center')
tbl.auto_set_font_size(False)
tbl.set_fontsize(12)
plt.show()
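`dt.weekday` encodes Monday as 0 through Sunday as 6, so the weekday pie is labeled with numbers rather than names. The sketch below shows one way to get readable labels with `dt.day_name()`, which returns English names under the default locale. It uses a hypothetical week of data and reorders the result Monday to Friday.

```python
import pandas as pd

df = pd.DataFrame({
    # 2020-01-06 was a Monday, so these five dates run Monday to Friday
    "Date": pd.date_range("2020-01-06", periods=5, freq="D"),
    "Volume": [10, 20, 30, 40, 50],
})

df["WeekDay"] = df["Date"].dt.weekday      # 0 = Monday ... 4 = Friday
df["DayName"] = df["Date"].dt.day_name()   # human-readable labels

# groupby sorts names alphabetically, so reindex to calendar order
order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
weekday_volume = df.groupby("DayName")["Volume"].sum().reindex(order)
print(weekday_volume)
```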