Everyone is into the **Stock Markets** these days, and I got curious as well: how can stocks be analyzed? I am pretty sure I did a good job in the process, but I would not treat the results as a 100% sure basis for investing😊. Along the way I also learned a lot of Python, which is always a good thing.

In this kernel, I take a look at the latest **Shopify**, **JD.com** and **Alibaba** stock data, beginning in 2016.

**Shopify** is a Canadian multinational e-commerce company headquartered in Ottawa, Ontario, that was founded in 2004. The company offers e-commerce platforms for online stores, including payments, marketing, shipping and customer engagement tools that simplify running an online store for small merchants. In 2019 Shopify announced that it had reached over 1,000,000 businesses in approximately 175 countries.

**JD.com** is a Chinese e-commerce company headquartered in Beijing that was founded in 1998. It is one of the two largest business-to-consumer online retailers in China and a member of the Fortune Global 500, with total revenue of $67 billion in 2018.

**Alibaba** is another top e-commerce and technology company in China and a major competitor to JD.com. It was founded in 1999 and is now headquartered in Hangzhou. The company offers sales services via web portals (business-to-business, business-to-consumer, and consumer-to-consumer), as well as electronic payment services, shopping search engines, and cloud computing services. Its revenue in 2019 was over $56 billion.

Ok. Here we go...

In [1]:
pip install pandas-datareader



In [2]:
# importing all of the necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import pandas_datareader
import datetime

import pandas_datareader.data as web

To keep the dataset up to date, I decided to use .today() for the end date instead of hard-coding an exact date. That way the notebook always pulls the latest data.

In [3]:
# creating the start and end dates
start = datetime.datetime(2016,1,1)
end = datetime.datetime.today()

The next step is downloading the data. I originally used 'yahoo' as the source in pandas-datareader, since 'google' results in "*NotImplementedError: data_source='google' is not implemented*". The cells below fetch the Yahoo Finance data with yfinance instead.
Tickers for the stocks can be found online. I found mine by just googling the company names.

In [5]:
!pip install yfinance
import yfinance as yf



In [7]:
#Shopify
# shop = web.DataReader("SHOP", "yahoo", start, end)
# shop.head()

# Fetch the data using yfinance
shop = yf.download("SHOP", start=start, end=end)
print(shop.head())

[*********************100%***********************]  1 of 1 completed

Price      Adj Close  Close    High    Low   Open   Volume
Ticker          SHOP   SHOP    SHOP   SHOP   SHOP     SHOP
Date                                                      
2016-01-04     2.572  2.572  2.5835  2.452  2.542  6018000
2016-01-05     2.531  2.531  2.6500  2.526  2.550  4182000
2016-01-06     2.530  2.530  2.5340  2.418  2.495  2566000
2016-01-07     2.467  2.467  2.5580  2.463  2.500  4160000
2016-01-08     2.493  2.493  2.5370  2.470  2.500  1374000





In [9]:
#JD.com
jd = yf.download('JD', start, end)
jd.head()

[*********************100%***********************]  1 of 1 completed


Price       Adj Close      Close       High        Low       Open    Volume
Ticker             JD         JD         JD         JD         JD        JD
Date
2016-01-04   27.58563  29.530001      30.66       29.0      30.66  18265300
2016-01-05  28.024683       30.0  30.299999       29.6  30.049999   9426400
2016-01-06  27.800488      29.76  30.030001  29.049999      29.23  12988900
2016-01-07  26.109663  27.950001      29.15      27.65      28.34  18155700
2016-01-08  25.801392  27.620001       28.9  27.469999      28.58  15164100


In [11]:
#Alibaba
alibaba = yf.download('BABA', start, end)
alibaba.head()

[*********************100%***********************]  1 of 1 completed


Price       Adj Close      Close       High        Low       Open    Volume
Ticker           BABA       BABA       BABA       BABA       BABA      BABA
Date
2016-01-04   74.06311  76.690002  78.309998      75.18      78.18  23066300
2016-01-05  75.936653  78.629997      78.68  77.260002  77.919998  14258900
2016-01-06   74.68119  77.330002  78.485001  76.970001  77.120003  11569300
2016-01-07  70.229095  72.720001       75.5  71.540001  73.290001  27288100
2016-01-08  68.374863  70.800003  74.660004  70.669998  74.330002  20814600
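
The two header rows (Price and Ticker) in the outputs above suggest that the columns came back as a two-level MultiIndex, so an expression like shop['Open'] would give a one-column DataFrame rather than a Series. If plain single-level columns are preferred, one option is to drop the Ticker level. The sketch below uses a small hand-built frame; the names demo and demo_flat are only illustrative, and the two rows are copied from the Shopify output rather than downloaded again.

```python
import pandas as pd

# Hand-built stand-in for a frame with (Price, Ticker) MultiIndex columns.
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low", "Open", "Volume"], ["SHOP"]],
    names=["Price", "Ticker"],
)
index = pd.DatetimeIndex(["2016-01-04", "2016-01-05"], name="Date")
demo = pd.DataFrame(
    [[2.572, 2.5835, 2.452, 2.542, 6018000],
     [2.531, 2.6500, 2.526, 2.550, 4182000]],
    index=index,
    columns=columns,
)

# Drop the Ticker level so that demo_flat["Open"] is a Series.
demo_flat = demo.droplevel("Ticker", axis=1)

print(type(demo["Open"]).__name__)       # one-column DataFrame
print(type(demo_flat["Open"]).__name__)  # Series
```

The same call could be applied to shop, jd and alibaba right after downloading, assuming each frame holds a single ticker.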


***Stock Market Data Visualization***

First, plot the 'Open Price' of each stock on the same graph. The 'Opening Price' for JD.com is fairly stable, with a small increase in 2017 and 2018. The 'Opening Prices' of the other two, however, show a positive trend over time. Alibaba's opening price has no sharp peaks, while Shopify's has 2 distinct peaks; the second one is the all-time high and is still in progress.

*Note: to suppress the text output that appears above a plot, type ";" at the end of the last line of code.*

In [None]:
shop['Open'].plot(label = 'Shopify', figsize = (16,10), title = 'Opening Prices')
jd['Open'].plot(label = 'JD.com')
alibaba['Open'].plot(label = 'Alibaba')
plt.legend(loc = 'best');

Now let's plot the volume of each stock that was traded every day.

In [None]:
shop['Volume'].plot(label = 'Shopify', figsize = (16,10), title = 'Volume Traded')
jd['Volume'].plot(label = 'JD.com')
alibaba['Volume'].plot(label = 'Alibaba')
plt.legend();

Shopify's stock is not traded at the same volume as the other two. For JD.com I would single out 2 high peaks, and for Alibaba there are 4 very high peaks.

Another useful visual is the total value that was traded each day, approximated as price times volume.

In [None]:
shop['Total Traded'] = shop['Open']*shop['Volume']
jd['Total Traded'] = jd['Open']*jd['Volume']
alibaba['Total Traded'] = alibaba['Open']*alibaba['Volume']

shop['Total Traded'].plot(figsize = (16,8), label = 'Shopify')
jd['Total Traded'].plot(figsize = (16,8), label = 'JD.com')
alibaba['Total Traded'].plot(figsize = (16,8), label = 'Alibaba')
plt.legend(loc = 'best');

To make it more precise, I use the average of the daily high and low prices instead of the opening price.

In [None]:
shop['Avg'] = shop[['High', 'Low']].mean(axis=1)
jd['Avg'] = jd[['High', 'Low']].mean(axis=1)
alibaba['Avg'] = alibaba[['High', 'Low']].mean(axis=1)

shop['Total Traded New'] = shop['Avg']*shop['Volume']
jd['Total Traded New'] = jd['Avg']*jd['Volume']
alibaba['Total Traded New'] = alibaba['Avg']*alibaba['Volume']

shop['Total Traded New'].plot(figsize = (16,8), label = 'Shopify')
jd['Total Traded New'].plot(figsize = (16,8), label = 'JD.com')
alibaba['Total Traded New'].plot(figsize = (16,8), label = 'Alibaba')
plt.legend(loc = 'best');
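
To make the difference between the two estimates concrete, here is a small worked example on made-up numbers (the frame demo and its values are invented for illustration and are not market data). It uses the same column names as the cells above.

```python
import pandas as pd

# Made-up prices and volumes for three days.
demo = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 12.0],
        "High": [12.0, 13.0, 13.0],
        "Low": [9.0, 10.0, 12.0],
        "Volume": [100, 200, 300],
    },
    index=pd.date_range("2020-01-01", periods=3, freq="D", name="Date"),
)

# Estimate 1: opening price times volume.
demo["Total Traded"] = demo["Open"] * demo["Volume"]

# Estimate 2: midpoint of the daily high and low times volume.
demo["Avg"] = demo[["High", "Low"]].mean(axis=1)
demo["Total Traded New"] = demo["Avg"] * demo["Volume"]

print(demo[["Total Traded", "Total Traded New"]])
```

On the first day the opening price is 10.0 while the high/low midpoint is 10.5, so the two estimates come out as 1000 and 1050.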

Now we can compare Total Traded vs Total Traded New.

In [None]:
shop['Total Traded'].plot(figsize = (16,8), label = 'Shopify')
shop['Total Traded New'].plot(figsize = (16,8), label = 'Shopify of Avg')
plt.legend(loc = 'best');

### **Moving Average**

Moving Average (MA) is widely used in technical analysis to smooth out the price by taking out the "noise" from random short-term price changes. Since it is based on historical prices, it is a lagging or trend-following indicator.

In [None]:
shop['MA50'] = shop['Open'].rolling(50).mean()
shop['MA200'] = shop['Open'].rolling(200).mean()
shop[['Open','MA50','MA200']].plot(figsize = (16,10))

jd['MA50'] = jd['Open'].rolling(50).mean()
jd['MA200'] = jd['Open'].rolling(200).mean()

alibaba['MA50'] = alibaba['Open'].rolling(50).mean()
alibaba['MA200'] = alibaba['Open'].rolling(200).mean()
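
A tiny example may help show what rolling(...).mean() does. The series prices and the window of 3 are made up for illustration; the window stands in for the 50- and 200-day windows used above.

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Each value is the mean of the current row and the two rows before it.
ma3 = prices.rolling(3).mean()

# The first window - 1 rows are NaN because a full window is not available yet.
print(ma3)
```

By the same logic, MA200 is empty for the first 199 trading days, which is why that line starts later on the plot.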

## Basic Analysis

In [None]:
from pandas.plotting import scatter_matrix

ret_comp = pd.concat([shop['Open'],jd['Open'],alibaba['Open']], axis = 1)
ret_comp.columns = ['Shopify Open', 'JD Open', 'Alibaba Open']
scatter_matrix(ret_comp, figsize =(8,8), alpha = 0.2, hist_kwds={'bins':50});

Based on the graphs above, Alibaba's and JD's opening prices are positively correlated, but their histograms have 2 distinct peaks, which indicates a bimodal distribution. That often means there are two different processes in the displayed data. None of the histograms is symmetric.

### Candlestick chart for a specific month

In [None]:
pip install mpl-finance

In [None]:
from mpl_finance import candlestick_ohlc
from matplotlib.dates import DateFormatter,date2num,WeekdayLocator, DayLocator, MONDAY

# select January 2020 and convert the dates to matplotlib's numeric format
shop_jan20 = shop.loc['2020-01'].reset_index()
shop_jan20['date_ax'] = shop_jan20['Date'].apply(date2num)

# candlestick_ohlc expects (date, open, high, low, close) tuples
list_of_col = ['date_ax', 'Open', 'High', 'Low', 'Close']
shop_values = [tuple(vals) for vals in shop_jan20[list_of_col].values]

mondays = WeekdayLocator(MONDAY)        # major ticks on the mondays
alldays = DayLocator()              # minor ticks on the days
weekFormatter = DateFormatter('%b %d')  # e.g., Jan 12
dayFormatter = DateFormatter('%d')      # e.g., 12

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)

candlestick_ohlc(ax, shop_values, width=0.6, colorup='g',colordown='r');

In January 2020, Shopify's stock price kept growing, with just a few decreases marked in red. The latest move down was around January 24th, and on January 27th the price picked back up.

In [None]:
# same steps for JD.com in January 2020
jd_jan20 = jd.loc['2020-01'].reset_index()
jd_jan20['date_ax'] = jd_jan20['Date'].apply(date2num)

list_of_col = ['date_ax', 'Open', 'High', 'Low', 'Close']
jd_values = [tuple(vals) for vals in jd_jan20[list_of_col].values]

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)

candlestick_ohlc(ax, jd_values, width=0.6, colorup='g',colordown='r');

In [None]:
# same steps for Alibaba in January 2020
alibaba_jan20 = alibaba.loc['2020-01'].reset_index()
alibaba_jan20['date_ax'] = alibaba_jan20['Date'].apply(date2num)

list_of_col = ['date_ax', 'Open', 'High', 'Low', 'Close']
alibaba_values = [tuple(vals) for vals in alibaba_jan20[list_of_col].values]

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)

candlestick_ohlc(ax, alibaba_values, width=0.6, colorup='g',colordown='r');

Alibaba's stock, by contrast, had a pretty big decrease after January 13th, 2020.

## Basic Financial Analysis

The daily percentage change shows how volatile a stock is and therefore how risky it is. The rule of thumb is simple: more volatility = more risk.

In [None]:
shop['Returns'] = shop['Close'].pct_change(1)
jd['Returns'] = jd['Close'].pct_change(1)
alibaba['Returns'] = alibaba['Close'].pct_change(1)
shop['Returns'].hist(bins = 50);

In [None]:
jd['Returns'].hist(bins = 50);

In [None]:
alibaba['Returns'].hist(bins = 50);
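
For reference, pct_change(1) computes each day's return as today's close divided by the previous close, minus 1. A minimal check on made-up closing prices (the series close is invented for illustration):

```python
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0])

# Built-in daily percentage change.
returns = close.pct_change(1)

# The same quantity written out by hand: close[t] / close[t-1] - 1.
manual = close / close.shift(1) - 1

print(pd.DataFrame({"pct_change": returns, "manual": manual}))
```

The first row is NaN because there is no previous day to compare with; the next two rows are a 10% gain followed by a 10% loss.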

Since the histograms are roughly 'bell'-shaped and symmetric around zero, the daily returns look approximately normally distributed.
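
A histogram can look bell-shaped and still have heavier tails than a normal distribution, so a quick numeric check can complement the visual one. The sketch below is only an illustration on synthetic samples (normal_like and heavy_tailed are invented names and data); the same two pandas methods could be called on each 'Returns' column.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic "returns": one normal sample and one heavy-tailed (Student's t) sample.
normal_like = pd.Series(rng.normal(0.0, 0.02, size=5000))
heavy_tailed = pd.Series(0.02 * rng.standard_t(df=3, size=5000))

for name, sample in [("normal", normal_like), ("heavy-tailed", heavy_tailed)]:
    # Series.kurt() reports excess kurtosis, so a normal sample is close to 0.
    print(name, "skew:", round(sample.skew(), 3),
          "excess kurtosis:", round(sample.kurt(), 3))
```

Skewness near 0 supports the symmetry claim, while excess kurtosis well above 0 would hint at more extreme days than a normal distribution predicts.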

Let us plot all 3 on the same graph to compare them directly.

In [None]:
shop['Returns'].hist(bins = 100, figsize=(16,8), label = 'Shopify', alpha=0.4)
jd['Returns'].hist(bins = 100, figsize=(16,8), label = 'JD.com', alpha=0.4)
alibaba['Returns'].hist(bins = 100, figsize=(16,8), label = 'Alibaba', alpha=0.4)
plt.legend();

### Kernel Density Estimate type of graph

This graph is used to visualize the probability density of a time series. The higher the peak, the less volatile the time series.

In [None]:
shop['Returns'].plot(kind='kde', label = 'Shopify', figsize=(10,8))
jd['Returns'].plot(kind='kde', label = 'JD.com',figsize=(10,8))
alibaba['Returns'].plot(kind='kde', label = 'Alibaba', figsize=(10,8))
plt.legend();

So, Shopify's returns are the most volatile of the 3.
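
The visual comparison can be backed with a number: a lower, wider density curve corresponds to a larger standard deviation of the daily returns. The sketch below uses synthetic returns (the frame demo_returns and its column names are invented for illustration); on the real data the same call could be made on the three 'Returns' columns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic daily returns with different spreads (not real market data).
demo_returns = pd.DataFrame({
    "calm": rng.normal(0.0, 0.01, size=2000),
    "volatile": rng.normal(0.0, 0.03, size=2000),
})

# Standard deviation of daily returns as a simple volatility measure.
volatility = demo_returns.std()
print(volatility.sort_values(ascending=False))
```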

### Boxplots on the returns

In [None]:
box_df = pd.concat([shop['Returns'], jd['Returns'], alibaba['Returns']], axis = 1)
box_df.columns = ['Shopify Ret', 'JD Ret', 'Alibaba Ret']
box_df.plot(kind='box', figsize = (8,10));

Most of the data lies between -0.025 and 0.025, with the median around 0. Unfortunately, there are many outliers.
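
The points drawn beyond the whiskers follow matplotlib's default rule: anything more than 1.5 times the interquartile range below the first quartile or above the third quartile. A small sketch on made-up returns (the series returns is invented for illustration) shows how such points could be counted:

```python
import pandas as pd

returns = pd.Series([-0.20, -0.02, -0.01, 0.0, 0.0, 0.01, 0.01, 0.02, 0.15])

# Quartiles and interquartile range.
q1, q3 = returns.quantile(0.25), returns.quantile(0.75)
iqr = q3 - q1

# Whisker limits under the 1.5 * IQR rule.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = returns[(returns < lower) | (returns > upper)]
print(outliers)
```

Here only the two extreme days, -0.20 and 0.15, fall outside the limits.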

### Scatterplot

To see the correlation of the daily returns, scatterplots can be used.

In [None]:
scatter_matrix(box_df, figsize=(8,8), alpha = 0.2, hist_kwds={'bins':100});

Based on the scatterplots, it can be said that the returns are approximately normally distributed and symmetric around 0. Also, as seen in the graph below, a positive correlation is still present between Alibaba and JD.com stock returns.

In [None]:
box_df.plot(kind='scatter', x='JD Ret', y='Alibaba Ret', alpha = 0.5, figsize = (10,8));
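
To put a number on what the scatterplot shows, the correlation matrix can be computed with DataFrame.corr(). The sketch below runs on synthetic returns (demo_ret and the columns a, b, c are invented for illustration); on the real data, box_df.corr() would be the equivalent call.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Synthetic returns: "a" and "b" share a common component, "c" is independent.
common = rng.normal(0.0, 0.02, size=1000)
demo_ret = pd.DataFrame({
    "a": common + rng.normal(0.0, 0.01, size=1000),
    "b": common + rng.normal(0.0, 0.01, size=1000),
    "c": rng.normal(0.0, 0.02, size=1000),
})

# Pearson correlation is the default method.
corr = demo_ret.corr()
print(corr.round(2))
```

Values near 1 indicate returns that tend to move together, and values near 0 indicate little linear relationship.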