# AMAZON STOCKS TREND

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import dates

from pylab import rcParams
import statsmodels.api as sm

import warnings
warnings.simplefilter(action='ignore', category=Warning)

I used the ‘parse_dates’ parameter together with ‘index_col’ in the read_csv function, so that the ‘Date’ column is parsed and used as a DatetimeIndex.

In [None]:
df = pd.read_csv("../input/amazon-stock-data/AMZN.csv", parse_dates=True, index_col = "Date")
df.head()
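To illustrate what `parse_dates=True` with `index_col` does, here is a minimal sketch on a tiny in-memory CSV. The two rows are made-up values and the name `sample` is only for illustration; nothing here is taken from AMZN.csv.

```python
import io
import pandas as pd

# Two made-up rows standing in for the real file
# (assumed layout: a 'Date' column followed by numeric columns).
csv_text = "Date,Close,Volume\n2020-01-02,10.0,100\n2020-01-03,11.0,150\n"

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col="Date")

# With parse_dates=True the index column is parsed into a DatetimeIndex,
# which is what makes date-string selection such as .loc['2020-01'] possible.
print(type(sample.index).__name__)
print(sample.loc["2020-01"])
```

Without `parse_dates`, the index would hold plain strings and the date-based slicing and resampling used later would not work.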

# FEATURE DESCRIPTION:

* Open = Price from the first transaction of a trading day
* High = Maximum price in a trading day
* Low = Minimum price in a trading day
* Close = Price from the last transaction of a trading day
* Adj Close = Closing price adjusted to reflect the value after accounting for any corporate actions
* Volume = Number of units traded in a day

> BASIC PLOT FOR CHECKING THE TRENDS OF VOLUME:

In [None]:
df['Volume'].plot(figsize=(10,6))

**We can see a lot of peaks and dense activity between 2000 and 2010**

Let's look at how the other features are distributed

In [None]:
df.plot(subplots=True, figsize=(10,12))


**The curves for ‘Open’, ‘Close’, ‘High’ and ‘Low’ have the same shape. Only ‘Volume’ has a different shape.**

# SEASONALITY:

Resampling to months or weeks and making bar plots is another simple and widely used way of finding seasonality. Here I am making a bar plot of the monthly mean volume from 2020 onward.

In [None]:
df_month = df.resample("M").mean()
fig, ax = plt.subplots(figsize=(12, 6))
ax.xaxis.set_major_formatter(dates.DateFormatter('%Y-%m'))
ax.bar(df_month.loc['2020':].index, df_month.loc['2020':, "Volume"], width=25, align='center')

Each bar represents a month. There is a huge spike in April 2020. Otherwise, there is some monthly seasonality after 2020 ended.
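Another quick check for seasonality is to average by calendar month across several years. Here is a minimal sketch on synthetic data; the series `volume` and the injected April bump are assumptions for illustration, not AMZN values.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (assumption: stands in for df['Volume']).
idx = pd.date_range("2018-01-01", "2020-12-31", freq="D")
rng = np.random.default_rng(0)
volume = pd.Series(1000 + rng.normal(0, 10, len(idx)), index=idx)
volume[idx.month == 4] += 500  # inject a recurring April bump so there is something to find

# Average per calendar month across all years: a pattern that repeats
# every year stands out here, while a one-off spike is diluted.
by_month = volume.groupby(volume.index.month).mean()
print(by_month.round(1))
```

If one month is consistently higher in this table, that points to seasonality rather than a single unusual event.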

# RESAMPLING AND ROLLING:

Resampling is very common in time-series data. Most of the time, resampling is done to a lower frequency. Resampling to a higher frequency is sometimes necessary too, especially for modeling, but less so for data analysis.
In the ‘Volume’ data we are working on right now, we can observe some big spikes here and there. Spikes like these are not helpful for data analysis or for modeling. Normally, resampling to a lower frequency and rolling are very helpful for smoothing them out.

In [None]:
start, end = '2017-01', '2017-06'
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df.loc[start:end, 'Volume'],
        marker='.', linestyle='-', linewidth=0.5, label='Daily')
ax.plot(df_month.loc[start:end, 'Volume'],
        marker='o', markersize=8, linestyle='-', label='Monthly Mean Resample')
ax.set_ylabel('Volume')
ax.legend();
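To make the difference between the two resampling directions concrete, here is a small sketch on synthetic data. The series `daily` is an assumption that stands in for one column of df.

```python
import pandas as pd

# Synthetic daily values (assumption: stands in for a column of df).
idx = pd.date_range("2020-01-01", periods=14, freq="D")
daily = pd.Series(range(14), index=idx, dtype="float64")

# Downsampling: many daily rows are aggregated into one row per week.
weekly = daily.resample("W").mean()

# Upsampling: new rows are created, so a fill rule is needed (forward-fill here).
half_daily = daily.resample("12h").ffill()

print(len(daily), len(weekly), len(half_daily))
```

Downsampling needs an aggregation such as `mean()`, while upsampling needs a rule for the rows that did not exist before, such as `ffill()` or interpolation.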

**WEEKLY RESAMPLE**

In [None]:
df_week = df.resample("W").mean()

In [None]:
start, end = '2020-01', '2020-08'
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df.loc[start:end, 'Volume'], marker='.', linestyle='-', linewidth = 0.5, label='Daily', color='black')
ax.plot(df_week.loc[start:end, 'Volume'], marker='o', markersize=8, linestyle='-', label='Weekly', color='coral')
ax.set_ylabel("Volume")
ax.legend()

# ROLLING:

Rolling is another very helpful way of smoothing out the curve. It takes the average over a window of a specified size: a 7-day rolling mean replaces each value with the average of 7 consecutive observations.

Here it is applied to the ‘Volume’ data, alongside the daily values and the weekly mean.

In [None]:
df_7d_rolling = df.rolling(7, center=True).mean()
start, end = '2016-06', '2017-05'
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df.loc[start:end, 'Volume'], marker='.', linestyle='-', 
        linewidth=0.5, label='Daily')
ax.plot(df_week.loc[start:end, 'Volume'], marker='o', markersize=5, 
        linestyle='-', label = 'Weekly mean volume')
ax.plot(df_7d_rolling.loc[start:end, 'Volume'], marker='.', linestyle='-', label='7d Rolling Average')
ax.set_ylabel('Stock Volume')
ax.legend()

The 7-day rolling average is a bit smoother than the weekly average.
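One detail worth keeping in mind: `rolling(7)` counts 7 rows, and for stock data those are 7 trading days rather than 7 calendar days. Here is a sketch contrasting that with a time-based window, using a synthetic business-day series as an assumed stand-in for df['Volume'].

```python
import pandas as pd

# Synthetic business-day series (assumption: stands in for df['Volume']).
idx = pd.bdate_range("2020-01-01", periods=20)
s = pd.Series(range(20), index=idx, dtype="float64")

# Row-based window: the mean of 7 consecutive rows, centered on each row.
# The first and last 3 rows are NaN because the window is incomplete there.
roll_rows = s.rolling(7, center=True).mean()

# Time-based window: the mean over the trailing 7 calendar days,
# which covers at most 5 business days.
roll_days = s.rolling("7D").mean()

print(pd.DataFrame({"rows_7": roll_rows, "days_7": roll_days}).head(8))
```

Which of the two is appropriate depends on whether the window should mean "7 observations" or "one calendar week".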

# PLOTTING THE CHANGE:

# SHIFT:

The shift function moves the data forward or backward by the specified number of periods. By default it shifts by one period, which is one trading day here, so each row receives the previous day's value. In financial data like this, it is helpful to see the previous day's data and today's data side by side.

In [None]:
df['Change'] = df.Close.div(df.Close.shift())
df['Change'].plot(figsize=(20, 8), fontsize = 16)

In the code above, .div() means division: df.div(6) divides each element in df by 6. Here I passed ‘df.Close.shift()’ instead, so each closing price is divided by the previous trading day's closing price. A value above 1 means the price rose and a value below 1 means it fell. The first row has no previous value, so its ‘Change’ is NaN (.div() only fills missing values if you pass a fill_value), and the plot simply skips it.
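A tiny worked example makes the shift-and-divide step easier to follow. The three prices below are made up for illustration.

```python
import pandas as pd

# Three made-up closing prices (assumption: not AMZN data).
close = pd.Series([100.0, 110.0, 99.0],
                  index=pd.date_range("2020-01-01", periods=3, freq="D"))

prev = close.shift()     # moves values down one row; the first row becomes NaN
ratio = close.div(prev)  # same as close / prev; the first row stays NaN

side_by_side = pd.DataFrame({"close": close, "prev_close": prev, "ratio": ratio})
print(side_by_side)
```

The second row gives 110 / 100 = 1.1 (a 10% rise) and the third gives 99 / 110 = 0.9 (a 10% fall).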

This is the plot of 2001 only.

In [None]:
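# Select rows by year with .loc; df['2001'] is no longer supported for row selection in recent pandas
df.loc['2001', 'Change'].plot(figsize=(10, 6))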

# PERCENTAGE CHANGE:

pandas has a pct_change function that returns the percentage change between consecutive entries.

I've chosen only the first 100 data entries.

In [None]:
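# Percentage change of the monthly mean close, computed on the monthly frame itself
# (assigning the daily df.Close.pct_change() would align on month-end dates and leave mostly NaN)
df_month['pct_change'] = df_month['Close'].pct_change() * 100
first_100 = df_month['pct_change'].head(100)
fig, ax = plt.subplots(figsize=(20, 8))
# ax.bar keeps a real date axis, so the date locator and formatter below apply
ax.bar(first_100.index, first_100, width=20, color='violet', label='pct_change')
ax.xaxis.set_major_locator(dates.YearLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter('%Y'))
plt.xticks(rotation=45)
ax.legend()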

We can clearly see the percentage change in the data.
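For reference, `pct_change()` is the same shift-and-divide idea from the SHIFT section, expressed as a percentage. A minimal sketch with made-up prices:

```python
import pandas as pd

# Three made-up closing prices (assumption: not AMZN data).
close = pd.Series([100.0, 110.0, 99.0])

pct = close.pct_change() * 100              # percent change from the previous row
manual = (close / close.shift() - 1) * 100  # the same computation written out

print(pd.DataFrame({"pct_change": pct, "manual": manual}))
```

Both columns agree, and the first row is NaN in each because there is no previous value to compare with.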

# DIFFERENCING:

Differencing takes the difference between values a specified number of periods apart. It is a popular method for removing the trend in the data, because a trend makes the series non-stationary, which many forecasting models handle poorly.
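Here is a minimal sketch of differencing with pandas `diff()`. The series is synthetic (a straight-line trend plus noise), so the names and numbers are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic series with a linear trend plus a little noise (assumption: not AMZN data).
rng = np.random.default_rng(0)
trend_series = pd.Series(2.0 * np.arange(100) + rng.normal(0, 1, 100))

# First difference: each value minus the previous one (periods=1 by default).
# The first row has no predecessor, so it is NaN and is dropped here.
diffed = trend_series.diff().dropna()

# The original climbs steadily, while the differenced series hovers around the slope.
print(trend_series.iloc[[0, -1]].round(1).tolist(), round(diffed.mean(), 2))
```

After differencing, the upward drift is gone and what remains fluctuates around a constant level.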

I've used an expanding window here, which is another kind of transformation. Instead of a fixed-size window, it includes every value from the start of the series up to the current row. For example, if you apply an expanding sum to the ‘High’ column, the first element remains the same, the second element becomes the sum of the first and second elements, the third becomes the sum of the first three, and so on. You can use aggregate functions like mean, median, standard deviation, etc. on it too.

In [None]:
fig, ax = plt.subplots(figsize=(20, 8))
df.High.plot(ax=ax, label='High')
df.High.expanding().mean().plot(ax=ax, label='High expanding mean')
df.High.expanding().std().plot(ax=ax, label='High expanding std')
ax.legend()
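A four-value example shows what the expanding window computes at each row. The numbers are made up for illustration.

```python
import pandas as pd

# Four made-up values (assumption: stands in for df['High']).
high = pd.Series([1.0, 2.0, 3.0, 4.0])

# expanding() grows the window from the first row up to the current row.
exp_sum = high.expanding().sum()    # 1, 3, 6, 10 (same as cumsum here)
exp_mean = high.expanding().mean()  # 1, 1.5, 2, 2.5

print(pd.DataFrame({"high": high, "expanding_sum": exp_sum, "expanding_mean": exp_mean}))
```

Because the window only ever grows, later values of an expanding mean move more and more slowly, which is why the expanding-mean curve looks so smooth.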

# DECOMPOSITION:

Decomposition shows the observations and these three components in the same figure:
* Trend: Consistent upward or downward slope of a time series.
* Seasonality: Clear periodic pattern of a time series.
* Noise (residual): The irregular variation left over after the trend and seasonality are removed.

In [None]:
rcParams['figure.figsize'] = 11, 9
decomposition = sm.tsa.seasonal_decompose(df_month['Volume'], model='additive')
fig = decomposition.plot()
plt.show()


Here the trend is the moving average. To give you a high-level idea of residuals, here is the general formula:
**Original observations = Trend + Seasonality + Residuals**
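To show how the additive formula fits together, here is a simplified hand-rolled decomposition in plain pandas. It is only a sketch of the idea and not the exact algorithm statsmodels uses (for example, statsmodels applies a slightly different centered filter for even periods). The monthly series `y` is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend + yearly pattern + noise (assumption: not AMZN data).
idx = pd.date_range("2015-01-01", periods=72, freq="MS")
rng = np.random.default_rng(0)
pattern = np.tile([5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 3, 4], 6)
y = pd.Series(0.5 * np.arange(72) + pattern + rng.normal(0, 0.3, 72), index=idx)

# Trend: centered 12-month moving average (simplified; NaN near both ends).
trend = y.rolling(12, center=True).mean()

# Seasonality: average detrended value per calendar month, centered to sum to zero.
detrended = y - trend
monthly_effect = detrended.groupby(detrended.index.month).mean()
monthly_effect -= monthly_effect.mean()
seasonal = pd.Series(monthly_effect.loc[idx.month].to_numpy(), index=idx)

# Residual: whatever is left, so the three parts add back up to the original.
resid = y - trend - seasonal

print(pd.DataFrame({"observed": y, "trend": trend,
                    "seasonal": seasonal, "resid": resid}).dropna().head())
```

Wherever the trend is defined, trend + seasonal + resid reproduces the observed value exactly, which is the additive formula above.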

# REFERENCE:
[https://towardsdatascience.com/a-complete-guide-to-time-series-data-visualization-in-python-da0ddd2cfb01](https://towardsdatascience.com/a-complete-guide-to-time-series-data-visualization-in-python-da0ddd2cfb01)