## Introduction

The goal of this notebook is to download historical financial data for biotech companies whose names and tickers have already been saved to a .csv file.

## Import Packages & Data

In [1]:
import os
from datetime import datetime

import pandas as pd

# Project helpers, e.g. get_historical_data_from_yf, get_retrieved_symbols, export_to_csv
from modules.data_collection_functions import *

In [2]:
companies = pd.read_csv('../data/biotech_companies.csv')

In [3]:
companies.head()

|   | ticker | company name |
|---|--------|--------------|
| 0 | ABEO | Abeona Therapeutics Inc. |
| 1 | ABIO | ARCA biopharma Inc. |
| 2 | ABUS | Arbutus Biopharma Corporation |
| 3 | ACAD | ACADIA Pharmaceuticals Inc. |
| 4 | ACER | Acer Therapeutics Inc. |


In [4]:
companies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 491 entries, 0 to 490
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   ticker        491 non-null    object
 1   company name  491 non-null    object
dtypes: object(2)
memory usage: 7.8+ KB


The file lists **491** companies, each with a non-null ticker and name, whose historical data will be retrieved.
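Before downloading, it can be worth confirming that the tickers are unique and free of stray whitespace, since a duplicate would simply be requested twice. A minimal sketch, using a small hypothetical frame in place of the real file:

```python
import pandas as pd

# Hypothetical stand-in for companies = pd.read_csv('../data/biotech_companies.csv')
companies = pd.DataFrame({
    'ticker': ['ABEO', 'ABIO ', 'ABEO', 'ACAD'],
    'company name': ['Abeona Therapeutics Inc.', 'ARCA biopharma Inc.',
                     'Abeona Therapeutics Inc.', 'ACADIA Pharmaceuticals Inc.'],
})

# Normalise tickers: strip surrounding whitespace and upper-case them
tickers = companies['ticker'].str.strip().str.upper()

# Report duplicates, then keep the first occurrence of each ticker
duplicates = tickers[tickers.duplicated()].unique().tolist()
symbols = tickers.drop_duplicates().tolist()

print('Duplicate tickers:', duplicates)   # Duplicate tickers: ['ABEO']
print('Unique tickers:', len(symbols))    # Unique tickers: 3
```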

## Data Source 1: Yahoo Finance

In [5]:
period = '5y'
symbols = companies['ticker']

yf_data = get_historical_data_from_yf(symbols, period)

[*********************100%***********************]  491 of 491 completed

119 Failed downloads:
- ADRO: No data found, symbol may be delisted
- CTRV: No data found, symbol may be delisted
- OHRP: No data found, symbol may be delisted
- GEMP: No data found, symbol may be delisted
- OSIR: No data found, symbol may be delisted
- DMPI: No data found, symbol may be delisted
- FOMX: No data found, symbol may be delisted
- DOVA: No data found, symbol may be delisted
- ALXN: No data found, symbol may be delisted
- PGNX: No data found, symbol may be delisted
- NTGN: No data found, symbol may be delisted
- IPCI: No data found, symbol may be delisted
- CELG: No data found, symbol may be delisted
- VTL: No data found, symbol may be delisted
- AKCA: No data found, symbol may be delisted
- SBPH: No data found, symbol may be delisted
- XON: No data found, symbol may be delisted
- DRNA: No data found, symbol may be delisted
- AZRX: No data found, symbol may be delisted
- CNAT: No data found, symbol ma

The Yahoo Finance download is incomplete: 119 of the 491 tickers returned no data, and the log reports that these symbols may have been delisted.
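The progress bar and the "Failed downloads" summary above match the output of yfinance's `download` function, so `get_historical_data_from_yf` most likely wraps it. The helper's source isn't shown here; the sketch below is a hypothetical stand-in (`get_history_sketch`), with the downloader injectable so it can be exercised offline with a fake.

```python
import pandas as pd


def get_history_sketch(symbols, period='5y', download=None):
    """Hypothetical stand-in for get_historical_data_from_yf.

    `download` defaults to yfinance.download; any callable accepting
    (tickers, period=..., group_by=...) can be passed instead, e.g. for offline tests.
    """
    if download is None:
        import yfinance as yf  # imported lazily so the sketch loads without yfinance
        download = yf.download
    tickers = list(symbols)  # accepts a pandas Series such as companies['ticker']
    # group_by='column' (yfinance's default) yields (field, ticker) column pairs
    return download(tickers, period=period, group_by='column')


# Offline check with a fake downloader that mimics the (field, ticker) layout
def fake_download(tickers, period, group_by):
    columns = pd.MultiIndex.from_product([['Close'], tickers])
    return pd.DataFrame([[1.0] * len(tickers)], columns=columns)


toy = get_history_sketch(pd.Series(['ABEO', 'ACAD']), download=fake_download)
print(toy.columns.tolist())   # [('Close', 'ABEO'), ('Close', 'ACAD')]

# Real download (needs network and yfinance):
# yf_data = get_history_sketch(companies['ticker'], '5y')
```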

In [6]:
yf_data = yf_data.swaplevel(axis=1).sort_index(axis=1)
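With its default layout, yfinance keys the columns as (field, ticker). Swapping the two column levels makes the ticker the outer key, and sorting then groups each ticker's fields side by side, so one company can be selected as a plain date-by-field table. A toy example with made-up numbers:

```python
import numpy as np
import pandas as pd

# Toy frame in yfinance's default layout: column level 0 = field, level 1 = ticker
columns = pd.MultiIndex.from_product([['Close', 'Volume'], ['ACAD', 'ABEO']])
toy = pd.DataFrame(np.arange(8).reshape(2, 4), columns=columns,
                   index=pd.to_datetime(['2021-01-04', '2021-01-05']))

# Put the ticker first, then sort so each ticker's fields sit together
by_ticker = toy.swaplevel(axis=1).sort_index(axis=1)

print(by_ticker.columns.tolist())
# [('ABEO', 'Close'), ('ABEO', 'Volume'), ('ACAD', 'Close'), ('ACAD', 'Volume')]

# One ticker can now be selected as a (date x field) frame
print(by_ticker['ABEO'])
```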

In [7]:
symbols_set = set(symbols)

In [8]:
retrieved = get_retrieved_symbols(yf_data)
len(retrieved)

491
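Note that `len(retrieved)` is 491, the full list, even though 119 downloads failed. This suggests that `get_retrieved_symbols` counts every ticker column, including the ones left entirely NaN for failed symbols. A hypothetical helper (`non_empty_symbols`) that keeps only tickers with at least one value, and uses a set difference against `symbols_set` to list the failures, might look like this:

```python
import numpy as np
import pandas as pd


def non_empty_symbols(data):
    """Hypothetical helper: tickers (column level 0) with at least one non-NaN value."""
    has_values = data.notna().any(axis=0)            # one flag per (ticker, field) column
    per_ticker = has_values.groupby(level=0).any()   # collapse the fields of each ticker
    return set(per_ticker[per_ticker].index)


# Toy (ticker, field) frame: 'DEAD' mimics a failed download (all NaN)
columns = pd.MultiIndex.from_product([['ABEO', 'DEAD'], ['Close', 'Volume']])
toy = pd.DataFrame([[1.0, 100.0, np.nan, np.nan],
                    [1.1, 120.0, np.nan, np.nan]], columns=columns)

symbols_set = {'ABEO', 'DEAD', 'GONE'}
retrieved = non_empty_symbols(toy)
failed = symbols_set - retrieved   # requested but no usable data
print(sorted(retrieved), sorted(failed))   # ['ABEO'] ['DEAD', 'GONE']
```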

In [10]:
path = '../data/historical/'

# Create the output folder; only "already exists" is tolerated, other errors surface
try:
    os.mkdir(path)
except FileExistsError:
    print('Already Exists')

export_to_csv(yf_data, path, retrieved)

Already Exists
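`export_to_csv` comes from the project's helper module and its internals aren't shown. A plausible sketch, assuming it writes one CSV per ticker under `path` (the name `export_per_ticker` is hypothetical), which also skips tickers whose data is entirely NaN so failed downloads don't produce empty files:

```python
import os
import tempfile

import numpy as np
import pandas as pd


def export_per_ticker(data, path, symbols):
    """Hypothetical stand-in for export_to_csv: one CSV per ticker.

    Assumes `data` has (ticker, field) columns, as after the swaplevel step.
    Returns the list of files written.
    """
    os.makedirs(path, exist_ok=True)
    written = []
    for symbol in sorted(symbols):
        frame = data[symbol].dropna(how='all')   # drop dates with no values at all
        if frame.empty:                           # skip failed downloads
            continue
        file_name = os.path.join(path, f'{symbol}.csv')
        frame.to_csv(file_name, index_label='Date')
        written.append(file_name)
    return written


columns = pd.MultiIndex.from_product([['ABEO', 'DEAD'], ['Close']])
toy = pd.DataFrame([[1.0, np.nan], [1.1, np.nan]], columns=columns,
                   index=pd.to_datetime(['2021-01-04', '2021-01-05']))

with tempfile.TemporaryDirectory() as tmp:
    files = export_per_ticker(toy, tmp, {'ABEO', 'DEAD'})
    names = [os.path.basename(f) for f in files]
    rows = len(pd.read_csv(files[0], index_col='Date'))

print(names, rows)   # ['ABEO.csv'] 2
```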


## Data Source 2: IEX Cloud
This source is not used in this notebook, because retrieving the full history requires a premium IEX Cloud account. If you have one, enter your key and uncomment the lines below to retrieve the data.

In [11]:
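# Publishable key; setting the (hypothetical) IEX_API_KEY environment variable
# lets you supply your own key without editing the notebook
iex_api_key = os.environ.get('IEX_API_KEY', 'pk_7214effdee6e4e31b49afe88989d1f46')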
historical_data_url = 'https://cloud.iexapis.com/stable/stock/{}/chart/max?token={}'
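The URL template has two positional `{}` placeholders, the ticker and then the API token, which `str.format` fills in order. A sketch using a hypothetical `chart_url` helper and a placeholder token `pk_TEST`; percent-encoding each value is a defensive choice, not something the original code does:

```python
from urllib.parse import quote

# Template copied from the notebook: {} placeholders for the symbol and the token
historical_data_url = 'https://cloud.iexapis.com/stable/stock/{}/chart/max?token={}'


def chart_url(symbol, token):
    """Fill the template for one ticker; quote() escapes any unsafe characters."""
    return historical_data_url.format(quote(symbol, safe=''), quote(token, safe=''))


urls = [chart_url(s, 'pk_TEST') for s in ['ABEO', 'ACAD']]
print(urls[0])
# https://cloud.iexapis.com/stable/stock/ABEO/chart/max?token=pk_TEST
```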

In [12]:
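# First and last trading dates of the Yahoo Finance data,
# converted to plain datetime objects (time of day dropped)
start = yf_data.index[0]
end = yf_data.index[-1]
start = datetime(start.year, start.month, start.day)
end = datetime(end.year, end.month, end.day)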
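Since the IEX endpoint above requests the `max` history, `start` and `end` are presumably used to align that longer series with the Yahoo Finance window. The sketch below, on a toy daily index, checks that the manual conversion equals pandas' own `normalize().to_pydatetime()` for a tz-naive index, and shows a hypothetical trim of a longer history with `.loc`:

```python
from datetime import datetime

import pandas as pd

# Toy daily index standing in for yf_data.index
index = pd.date_range('2019-03-01', periods=5, freq='D')

start = datetime(index[0].year, index[0].month, index[0].day)
end = datetime(index[-1].year, index[-1].month, index[-1].day)

# Same result for a tz-naive index, using pandas' own conversion
assert start == index[0].normalize().to_pydatetime()

# Hypothetical use: trim a longer 'max' history to the Yahoo Finance window
long_history = pd.DataFrame({'close': range(10)},
                            index=pd.date_range('2019-02-25', periods=10, freq='D'))
window = long_history.loc[start:end]   # label slicing is inclusive at both ends
print(window.index.min(), window.index.max(), len(window))
```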

In [13]:
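# Caution: to retrieve the full history, use a key that belongs to a premium IEX Cloud account.
# After entering the key, uncomment one of the two alternatives below.
# get_symbols is passed to asyncio.run, so asyncio must be imported. Inside Jupyter an event
# loop is already running and asyncio.run() raises RuntimeError; use `await get_symbols(...)` there.

# import asyncio
# historical_data = pd.DataFrame(asyncio.run(get_symbols(symbols, iex_api_key, historical_data_url)))
# historical_data = get_historical_data_from_iexcloud(symbols, start, end, iex_api_key)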
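Because `get_symbols(...)` is handed to `asyncio.run`, it is presumably a coroutine that fetches many tickers concurrently. Its internals aren't shown; the sketch below illustrates the general pattern with `asyncio.gather` and a semaphore that caps the number of in-flight requests. `fetch_json` is a hypothetical placeholder for the HTTP call (a real version would use an async HTTP client), so the sketch runs offline.

```python
import asyncio


async def fetch_json(url):
    """Hypothetical placeholder for an HTTP GET that returns parsed JSON."""
    await asyncio.sleep(0.01)   # simulate network latency
    return [{'url': url}]


async def get_symbols_sketch(symbols, token, url_template, limit=8):
    """Fetch every ticker concurrently, with at most `limit` requests at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(symbol):
        async with semaphore:
            rows = await fetch_json(url_template.format(symbol, token))
            return symbol, rows

    # return_exceptions=True keeps one failed ticker from aborting the whole batch
    results = await asyncio.gather(*(fetch_one(s) for s in symbols),
                                   return_exceptions=True)
    data = {}
    for result in results:
        if isinstance(result, Exception):
            continue   # a real version might log or retry here
        symbol, rows = result
        data[symbol] = rows
    return data


template = 'https://cloud.iexapis.com/stable/stock/{}/chart/max?token={}'
# From a script; in a notebook cell, `await get_symbols_sketch(...)` instead
data = asyncio.run(get_symbols_sketch(['ABEO', 'ACAD'], 'pk_TEST', template))
print(sorted(data))   # ['ABEO', 'ACAD']
```

The semaphore matters at this scale: launching all 491 requests at once could trip API rate limits, while a small cap keeps concurrency bounded.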