# Cannabis Stocks and Media Hype

Cannabis stocks have for a few years now been hyped as the new "tech stocks", promising to be to today's investors what Apple and Microsoft were decades ago. With marijuana fully legalized in Canada, and either legalized or decriminalized to some degree in thirty-seven U.S. states, many investors jumped on board with major cannabis companies in 2018, some believing in the product, others believing in the profit.

As demand for the stocks increased, certain stocks saw near-exponential growth, to the wonder and admiration of investors and analysts alike. But how much of that growth was a natural reflection of the product, and how much was artificially inflated by hype?


## Step 1: Source Stock Data from NASDAQ

Using open-source Python code, I scraped stock information from NASDAQ for Tilray, Inc. (TLRY), Canopy Growth Corp. (CGC), Aurora Cannabis (ACB), and Cronos Group (CRON), four of the most commonly discussed cannabis stocks. These were then stored in JSON files.

In [1]:
from lxml import html
import requests
from time import sleep
import json
import argparse
from random import randint

import pandas as pd

In [2]:
with open('tlry-summary.json') as f:  # the context manager closes the file afterwards
    data = json.load(f)
print(type(data))
data

<class 'dict'>


{'company_name': '',
 'ticker': 'tlry',
 'url': 'http://www.nasdaq.com/symbol/tlry',
 'open price': None,
 'open_date': None,
 'close_price': None,
 'close_date': None,
 'key_stock_data': {}}

That's not very useful: apart from the ticker and URL, every field came back empty. Also, for our purposes, we need historical data. Fortunately, NASDAQ provides historical data as a CSV file.
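The emptiness of the scraped summary can also be confirmed in code rather than by eye. This is a minimal sketch, assuming the dictionary layout printed above; `has_usable_values` is a hypothetical helper and not part of the original scraper.

```python
def has_usable_values(summary, ignore=("ticker", "url")):
    """Return True if any field other than the identifiers holds real data."""
    for key, value in summary.items():
        if key in ignore:
            continue
        # None, '' and {} are all falsy, so they count as missing
        if value:
            return True
    return False


# The summary printed above for TLRY
tlry_summary = {
    'company_name': '',
    'ticker': 'tlry',
    'url': 'http://www.nasdaq.com/symbol/tlry',
    'open price': None,
    'open_date': None,
    'close_price': None,
    'close_date': None,
    'key_stock_data': {},
}

print(has_usable_values(tlry_summary))
```

Note that this treats any falsy value, including a literal `0`, as missing, which is acceptable for a quick sanity check but would need refining for numeric fields.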

In [8]:
tlry = pd.read_csv('TLRYHistoricalQuotes.csv')
tlry_asc = tlry.iloc[::-1]  # reverse the rows so the dates are in ascending order
tlry_asc.head()

|     | Date | Close/Last | Volume | Open | High | Low |
| --- | --- | --- | --- | --- | --- | --- |
| 346 | 07/19/2018 | $22.39 | 11912880 | $23.05 | $24.00 | $20.10 |
| 345 | 07/20/2018 | $29.77 | 13947110 | $24.25 | $31.80 | $23.50 |
| 344 | 07/23/2018 | $29.45 | 9984060 | $33.48 | $34.10 | $29.31 |
| 343 | 07/24/2018 | $25.36 | 5494133 | $28.80 | $29.43 | $25.25 |
| 342 | 07/25/2018 | $26.49 | 3845034 | $25.31 | $27.15 | $24.20 |
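
Reversing the rows with `iloc[::-1]` gives ascending dates only as long as the file is sorted newest-first. Below is a minimal sketch of an alternative that parses the `Date` column and sorts on it explicitly. The inline sample stands in for `TLRYHistoricalQuotes.csv` and reuses rows from the preview above; the `MM/DD/YYYY` format is an assumption inferred from that preview.

```python
import io

import pandas as pd

# Sample rows copied from the preview above, listed newest first
sample_csv = io.StringIO(
    "Date,Close/Last,Volume,Open,High,Low\n"
    "07/23/2018,$29.45,9984060,$33.48,$34.10,$29.31\n"
    "07/20/2018,$29.77,13947110,$24.25,$31.80,$23.50\n"
    "07/19/2018,$22.39,11912880,$23.05,$24.00,$20.10\n"
)

quotes = pd.read_csv(sample_csv)

# Parse the date strings so the sort is chronological, not alphabetical
quotes["Date"] = pd.to_datetime(quotes["Date"], format="%m/%d/%Y")

# Sort explicitly instead of relying on the order of the file
quotes_sorted = quotes.sort_values("Date").reset_index(drop=True)
print(quotes_sorted[["Date", "Close/Last"]])
```

Unlike the reversed frames shown here, `reset_index(drop=True)` renumbers the rows from 0, so the original row labels are not preserved.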


In [9]:
acb = pd.read_csv('ACBHistoricalQuotes.csv')
acb_asc = acb.iloc[::-1]
acb_asc.head()

|      | Date | Close/Last | Volume | Open | High | Low |
| --- | --- | --- | --- | --- | --- | --- |
| 1258 | 12/03/2014 | $0.91 |  | $0.91 | $0.91 | $0.91 |
| 1257 | 12/04/2014 | $0.91 |  | $0.91 | $0.91 | $0.91 |
| 1256 | 12/05/2014 | $0.91 |  | $0.91 | $0.91 | $0.91 |
| 1255 | 12/08/2014 | $0.91 |  | $0.91 | $0.91 | $0.91 |
| 1254 | 12/09/2014 | $0.91 |  | $0.91 | $0.91 | $0.91 |


In [10]:
cgc = pd.read_csv('CGCHistoricalQuotes.csv')
cgc_asc = cgc.iloc[::-1]
cgc_asc.head()

|      | Date | Close/Last | Volume | Open | High | Low |
| --- | --- | --- | --- | --- | --- | --- |
| 1057 | 09/22/2015 | $1.38 | 5324 | $1.36 | $1.38 | $1.36 |
| 1056 | 09/23/2015 | $1.33 | 10500 | $1.36 | $1.36 | $1.33 |
| 1055 | 09/24/2015 | $1.30 | 3734 | $1.34 | $1.35 | $1.29 |
| 1054 | 09/25/2015 | $1.32 | 9354 | $1.34 | $1.34 | $1.32 |
| 1053 | 09/28/2015 | $1.23 | 5914 | $1.30 | $1.30 | $1.22 |


In [11]:
cron = pd.read_csv('CRONHistoricalQuotes.csv')
cron_asc = cron.iloc[::-1]
cron_asc.head()

|    | Date | Close/Last | Volume | Open | High | Low |
| --- | --- | --- | --- | --- | --- | --- |
| 20 | 11/04/2019 | $8.08 | 4312321 | $8.20 | $8.30 | $8.08 |
| 19 | 11/05/2019 | $8.23 | 3596968 | $8.12 | $8.43 | $8.08 |
| 18 | 11/06/2019 | $8.33 | 3645152 | $8.31 | $8.48 | $8.17 |
| 17 | 11/07/2019 | $7.93 | 6158625 | $8.40 | $8.52 | $7.86 |
| 16 | 11/08/2019 | $8.52 | 7920348 | $7.95 | $8.60 | $7.86 |


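From the previews of the dataframes, we can see that the Aurora Cannabis (ACB) history reaches back furthest, to December 2014, followed by Canopy Growth Corporation (CGC) in September 2015, which suggests these two were the first to trade publicly. The Tilray, Inc. (TLRY) data start in July 2018. The Cronos Group (CRON) preview starts at index 20, so that file holds only 21 rows beginning in November 2019; this more likely reflects the date range of the download than the date the company went public.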
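
The start dates read off the previews can also be computed directly, which avoids eyeballing four separate tables. This is a minimal sketch: `first_trading_date` is a hypothetical helper, the inline strings stand in for the four CSV files and contain only the first two previewed rows of each, and the `MM/DD/YYYY` date format is inferred from the previews.

```python
import io

import pandas as pd

HEADER = "Date,Close/Last,Volume,Open,High,Low\n"

# Two earliest rows per ticker, copied from the previews and listed newest first
samples = {
    "TLRY": HEADER
    + "07/20/2018,$29.77,13947110,$24.25,$31.80,$23.50\n"
    + "07/19/2018,$22.39,11912880,$23.05,$24.00,$20.10\n",
    "ACB": HEADER
    + "12/04/2014,$0.91,,$0.91,$0.91,$0.91\n"
    + "12/03/2014,$0.91,,$0.91,$0.91,$0.91\n",
    "CGC": HEADER
    + "09/23/2015,$1.33,10500,$1.36,$1.36,$1.33\n"
    + "09/22/2015,$1.38,5324,$1.36,$1.38,$1.36\n",
    "CRON": HEADER
    + "11/05/2019,$8.23,3596968,$8.12,$8.43,$8.08\n"
    + "11/04/2019,$8.08,4312321,$8.20,$8.30,$8.08\n",
}


def first_trading_date(csv_source):
    """Return the earliest date in a historical quotes file or buffer."""
    df = pd.read_csv(csv_source)
    return pd.to_datetime(df["Date"], format="%m/%d/%Y").min()


# Earliest date per ticker, ordered from oldest to newest
starts = pd.Series(
    {ticker: first_trading_date(io.StringIO(text)) for ticker, text in samples.items()}
).sort_values()
print(starts)
```

With the real files, the same helper can be called with a path such as `'ACBHistoricalQuotes.csv'` in place of the `io.StringIO` buffer. Comparing `len(df)` across files in the same loop would also flag a short download like the CRON one.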

## Step 2: Data Preprocessing/Cleaning