## Introduction

The primary goal of this notebook is to import historical financial data for biotech companies whose names and tickers have already been saved to a .csv file.

## Import Packages & Data

In [14]:
import pandas as pd
from datetime import datetime
from modules.data_collection_functions import *
import os

In [15]:
companies = pd.read_csv('../data/biotech_companies.csv')

In [16]:
companies.head()

,ticker,company name
0,ABEO,Abeona Therapeutics Inc.
1,ABIO,ARCA biopharma Inc.
2,ABUS,Arbutus Biopharma Corporation
3,ACAD,ACADIA Pharmaceuticals Inc.
4,ACER,Acer Therapeutics Inc.


In [17]:
companies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 491 entries, 0 to 490
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   ticker        491 non-null    object
 1   company name  491 non-null    object
dtypes: object(2)
memory usage: 7.8+ KB


The file lists **491** companies whose historical data will be retrieved.

## Data Source 1: Yahoo Finance

In [18]:
period = '5y'
symbols = companies['ticker']

yf_data = get_historical_data_from_yf(symbols, period)

[*********************100%***********************]  491 of 491 completed

119 Failed downloads:
- ZSAN: No data found, symbol may be delisted
- TTPH: No data found, symbol may be delisted
- EDGE: No data found, symbol may be delisted
- TRPX: No data found, symbol may be delisted
- PTIE: No data found, symbol may be delisted
- PRNB: No data found, symbol may be delisted
- INSY: No data found, symbol may be delisted
- THOR: No data found, symbol may be delisted
- NVLN: No data found, symbol may be delisted
- XLRN: No data found, symbol may be delisted
- ALXN: No data found, symbol may be delisted
- REPH: No data found, symbol may be delisted
- BSTC: No data found, symbol may be delisted
- BOLD: No data found, symbol may be delisted
- AKCA: No data found, symbol may be delisted
- XON: No data found, symbol may be delisted
- OSMT: No data found, symbol may be delisted
- PTI: No data found, symbol may be delisted
- HSGX: No data found, symbol may be delisted
- VTL: No data found, symbol may

The data retrieved from Yahoo Finance is incomplete: 119 of the initial 491 symbols returned no data, which the download log attributes to possible delisting.
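One way to list the affected tickers programmatically is sketched below. It assumes that a failed download shows up as columns containing only `NaN`, which the notebook does not confirm. The toy frame and the helper name `symbols_without_data` are illustrative and not part of the notebook's modules.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the downloaded frame: columns are (field, ticker) pairs.
dates = pd.date_range("2022-08-15", periods=3, freq="D")
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["ABEO", "ZSAN"]])
toy = pd.DataFrame(
    [[4.65, np.nan, 135400, np.nan],
     [4.77, np.nan, 173000, np.nan],
     [4.60, np.nan, 79800, np.nan]],
    index=dates,
    columns=columns,
)

def symbols_without_data(frame, ticker_level=1):
    """Return the sorted tickers whose columns contain no values at all."""
    all_tickers = set(frame.columns.get_level_values(ticker_level))
    # Dropping all-NaN columns leaves only tickers with at least one value.
    populated = frame.dropna(axis=1, how="all")
    with_data = set(populated.columns.get_level_values(ticker_level))
    return sorted(all_tickers - with_data)

missing = symbols_without_data(toy)
print(missing)
```

If the assumption holds for the real frame, the length of the returned list should match the number of failed downloads reported in the log.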

In [19]:
yf_data = yf_data.swaplevel(axis=1).sort_index(axis=1)
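The effect of this reordering is easier to see on a small frame. The sketch below uses made-up column labels and values. It only assumes that the downloaded frame has two column levels with the field first and the ticker second.

```python
import pandas as pd

# Toy frame with (field, ticker) columns, in field-major order.
columns = pd.MultiIndex.from_product([["Close", "Open"], ["ZYNE", "ABEO"]])
toy = pd.DataFrame([[1.30, 4.36, 1.34, 4.61]], columns=columns)

# Swap the two column levels so the ticker comes first, then sort the
# columns so that every ticker's fields sit next to each other.
by_ticker = toy.swaplevel(axis=1).sort_index(axis=1)

print(list(by_ticker.columns))
```

After the swap, selecting a single ticker with `by_ticker["ABEO"]` returns a plain frame holding only that ticker's fields.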

In [20]:
symbols_set = set(symbols)

In [21]:
retrieved = get_retrieved_symbols(yf_data)
len(retrieved)

491

In [22]:
yf_data

,ABEO,ABEO,ABEO,ABEO,ABEO,ABEO,ABIO,ABIO,ABIO,ABIO,...,ZYME,ZYME,ZYME,ZYME,ZYNE,ZYNE,ZYNE,ZYNE,ZYNE,ZYNE
Date,Adj Close,Close,High,Low,Open,Volume,Adj Close,Close,High,Low,...,High,Low,Open,Volume,Adj Close,Close,High,Low,Open,Volume
2017-08-21,196.25,196.25,203.75,187.50,196.25,20136,20.700001,20.700001,22.500000,20.700001,...,7.405,7.100,7.405,3000,6.23,6.23,6.500,6.15,6.24,486800
2017-08-22,196.25,196.25,205.00,194.50,198.50,18108,18.900000,18.900000,20.700001,18.900000,...,7.220,7.000,7.190,11500,5.94,5.94,6.325,5.92,6.20,766700
2017-08-23,218.75,218.75,223.75,199.50,200.00,31904,19.799999,19.799999,20.700001,18.900000,...,7.220,7.046,7.220,7300,5.73,5.73,6.050,5.62,5.93,747600
2017-08-24,240.00,240.00,242.50,221.25,221.25,34228,20.700001,20.700001,21.600000,19.799999,...,7.071,6.760,7.000,14500,6.16,6.16,6.289,5.65,6.00,771100
2017-08-25,226.25,226.25,250.00,220.00,241.25,38728,21.240000,21.240000,21.600000,20.700001,...,7.290,6.920,7.000,19200,6.06,6.06,6.210,5.90,6.21,354800
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-08-15,4.65,4.65,4.73,4.26,4.50,135400,2.450000,2.450000,2.470000,2.420000,...,7.470,7.160,7.410,297700,1.36,1.36,1.370,1.28,1.30,609600
2022-08-16,4.77,4.77,4.83,4.46,4.58,173000,2.410000,2.410000,2.440000,2.390000,...,7.250,6.980,7.150,410600,1.33,1.33,1.370,1.30,1.37,276600
2022-08-17,4.60,4.60,4.81,4.50,4.79,79800,2.380000,2.380000,2.400000,2.340000,...,6.890,6.390,6.890,367100,1.32,1.32,1.370,1.28,1.36,1300300
2022-08-18,4.36,4.36,4.61,4.23,4.61,111500,2.360000,2.360000,2.370000,2.350000,...,6.520,6.110,6.270,344800,1.30,1.30,1.340,1.23,1.34,807000


In [23]:
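path = '../data/historical/'
try:
    os.mkdir(path)
except FileExistsError:
    # The directory is left over from a previous run, so reuse it.
    print('Already Exists')

# Export only after the directory is known to exist.
export_to_csv(yf_data, path, retrieved)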

Already Exists
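The implementation of `export_to_csv` lives in the project's modules and is not shown here. As a rough sketch of what a per-ticker export could look like, the hypothetical `export_per_ticker` below writes one CSV per ticker from a frame whose top column level is the ticker. The function name, the file naming scheme, and the toy data are assumptions.

```python
import os
import tempfile

import pandas as pd

def export_per_ticker(frame, directory, tickers):
    """Write one CSV per ticker (hypothetical stand-in, not the notebook's helper)."""
    os.makedirs(directory, exist_ok=True)  # no error if the folder already exists
    written = []
    for ticker in tickers:
        target = os.path.join(directory, f"{ticker}.csv")
        # Selecting the top column level yields that ticker's fields only.
        frame[ticker].to_csv(target)
        written.append(target)
    return written

columns = pd.MultiIndex.from_product([["ABEO", "ZYNE"], ["Close", "Volume"]])
toy = pd.DataFrame(
    [[4.65, 135400, 1.36, 609600], [4.77, 173000, 1.33, 276600]],
    index=pd.to_datetime(["2022-08-15", "2022-08-16"]),
    columns=columns,
)
toy.index.name = "Date"

# Write into a temporary folder so the sketch leaves the project data untouched.
out_dir = os.path.join(tempfile.mkdtemp(), "historical")
paths = export_per_ticker(toy, out_dir, ["ABEO", "ZYNE"])
restored = pd.read_csv(paths[0], index_col="Date", parse_dates=True)
```

Using `os.makedirs(..., exist_ok=True)` folds the "already exists" case into the directory creation itself, at the cost of no longer reporting it.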


## Data Source 2: IEX Cloud

This source is not used in this notebook, because retrieving the full history requires a premium IEX Cloud account. If you have access to one, uncomment the commented-out lines below to retrieve the data.

In [24]:
# Publishable key
iex_api_key = 'pk_7214effdee6e4e31b49afe88989d1f46'
historical_data_url = 'https://cloud.iexapis.com/stable/stock/{}/chart/max?token={}'
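The URL template has two placeholders, the ticker and the token, in that order. The sketch below shows how it could be filled in per symbol. The helper `build_history_url` and the environment variable name `IEX_TOKEN` are assumptions for illustration, and no request is actually sent.

```python
import os

# Same template as in the cell above: first slot is the ticker, second the token.
historical_data_url = 'https://cloud.iexapis.com/stable/stock/{}/chart/max?token={}'

def build_history_url(symbol, token, template=historical_data_url):
    """Fill the URL template for one ticker (hypothetical helper)."""
    return template.format(symbol, token)

# IEX_TOKEN is an assumed environment variable name; the fallback is a dummy value.
token = os.environ.get('IEX_TOKEN', 'YOUR_TOKEN')
urls = [build_history_url(symbol, token) for symbol in ['ABEO', 'ABIO']]
```

Reading the token from the environment keeps a private key out of the notebook file.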

In [25]:
start = yf_data.index[0]
end = yf_data.index[-1]
start = datetime(start.year, start.month, start.day)
end = datetime(end.year, end.month, end.day)
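The same conversion from index entries to plain `datetime` objects can also be written with pandas' own methods. The sketch below assumes the index holds `pandas.Timestamp` values and uses two sample dates instead of the real frame.

```python
from datetime import datetime

import pandas as pd

# Sample index standing in for the frame's date index.
index = pd.to_datetime(["2017-08-21", "2022-08-18"])

# normalize() resets the time to midnight; to_pydatetime() returns a datetime.datetime.
start = index[0].normalize().to_pydatetime()
end = index[-1].normalize().to_pydatetime()
```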

In [26]:
# Caution: retrieving all of the data requires a key that belongs to a premium account.
# After entering the key, uncomment the lines below.

# historic_data = pd.DataFrame(asyncio.run(get_symbols(symbols, iex_api_key, historical_data_url)))
# historical_data = get_historical_data_from_iexcloud(symbols, start, end, iex_api_key)