## Introduction

The primary goal of this notebook is to import financial data for biotech companies from multiple reliable sources. The company names and tickers have already been saved to a .csv file.

## Import Packages & Data

In [1]:
import pandas as pd

# IEX Cloud Library
from iexfinance.stocks import get_historical_data

# Yahoo Finance Library
import yfinance as yf

import asyncio
import aiohttp
import nest_asyncio
from datetime import datetime

nest_asyncio.apply()

In [2]:
companies = pd.read_csv('../data/biotech_companies.csv')

In [3]:
companies.head()

,ticker,company name
0,ABEO,Abeona Therapeutics Inc.
1,ABIO,ARCA biopharma Inc.
2,ABUS,Arbutus Biopharma Corporation
3,ACAD,ACADIA Pharmaceuticals Inc.
4,ACER,Acer Therapeutics Inc.


In [4]:
companies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 491 entries, 0 to 490
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   ticker        491 non-null    object
 1   company name  491 non-null    object
dtypes: object(2)
memory usage: 7.8+ KB


As can be seen, there are **491** companies whose historical data will be retrieved.

## Data Source 1: Yahoo Finance

In [5]:
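period = '5y'
symbols = companies['ticker']

def get_historical_data_from_yf(symbols, period):
    """Download price history for all symbols over the given period."""
    # yf.download takes a list of tickers and returns a single DataFrame
    return yf.download(list(symbols), period=period)

yf_data = get_historical_data_from_yf(symbols, period)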

[*********************100%***********************]  491 of 491 completed

118 Failed downloads:
- SBPH: No data found, symbol may be delisted
- REPH: No data found, symbol may be delisted
- MYOK: No data found, symbol may be delisted
- CHMA: No data found, symbol may be delisted
- CTRV: No data found, symbol may be delisted
- SNNA: No data found, symbol may be delisted
- EIDX: No data found, symbol may be delisted
- FTSV: No data found, symbol may be delisted
- THOR: No data found, symbol may be delisted
- CERC: No data found, symbol may be delisted
- PTLA: No data found, symbol may be delisted
- HSGX: No data found, symbol may be delisted
- GNMX: No data found, symbol may be delisted
- MNLO: No data found, symbol may be delisted
- RARX: No data found, symbol may be delisted
- SNSS: No data found, symbol may be delisted
- ADRO: No data found, symbol may be delisted
- CNAT: No data found, symbol may be delisted
- PGNX: No data found, symbol may be delisted
- PRNB: No data found, symbol 

The data retrieved from Yahoo Finance is incomplete: according to the download log above, 118 of the initial 491 tickers returned no data.
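
The failure count can also be checked from the downloaded frame itself rather than from the log. The sketch below uses a small hand-built frame in place of `yf_data` (its tickers and prices are invented for illustration) and assumes two things that should be verified against the installed yfinance version: that the columns are laid out as `(field, ticker)`, and that failed tickers stay in the frame as all-NaN columns. The helper name `tickers_without_data` is hypothetical.

```python
import pandas as pd

def tickers_without_data(data: pd.DataFrame) -> list:
    """Return the tickers whose columns are all entirely NaN.

    Assumes two-level columns of the form (field, ticker).
    """
    # True for each (field, ticker) column that holds no values at all.
    empty_column = data.isna().all(axis=0)
    # A ticker counts as missing only if every one of its fields is empty.
    empty_ticker = empty_column.groupby(level=1).all()
    return sorted(empty_ticker[empty_ticker].index)

nan = float("nan")
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAA", "BBB", "CCC"]])
toy = pd.DataFrame(
    [[1.0, nan, 3.0, 10, nan, 30],
     [1.1, nan, nan, 11, nan, 31]],
    index=pd.to_datetime(["2018-10-04", "2018-10-05"]),
    columns=columns,
)

missing = tickers_without_data(toy)
print(missing)  # ['BBB']
```

Under the same assumptions, `len(tickers_without_data(yf_data))` would give the number of tickers with no recorded data.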

In [6]:
yf_data.swaplevel(axis=1).sort_index(axis=1)

,ABEO,ABEO,ABEO,ABEO,ABEO,ABEO,ABIO,ABIO,ABIO,ABIO,...,ZYME,ZYME,ZYME,ZYME,ZYNE,ZYNE,ZYNE,ZYNE,ZYNE,ZYNE
,Adj Close,Close,High,Low,Open,Volume,Adj Close,Close,High,Low,...,High,Low,Open,Volume,Adj Close,Close,High,Low,Open,Volume
Date,,,,,,,,,,,,,,,,,,,,,
2018-10-04,285.75,285.75,298.25,282.50,297.00,21920,12.420000,12.420000,13.50,11.88,...,14.940000,14.10,14.77,115900,7.535,7.535,7.87,7.342,7.84,365600
2018-10-05,288.50,288.50,296.25,277.00,286.75,15192,11.880000,11.880000,13.14,11.16,...,14.510000,13.24,14.46,79600,7.700,7.700,8.03,7.434,7.56,379800
2018-10-08,277.75,277.75,292.50,271.25,288.00,25248,11.700000,11.700000,11.88,11.16,...,13.800000,13.10,13.66,49600,7.350,7.350,7.68,7.340,7.67,227500
2018-10-09,269.25,269.25,282.75,267.25,276.25,27952,11.160000,11.160000,11.52,10.80,...,13.770000,13.13,13.15,60700,7.590,7.590,7.75,7.350,7.39,264700
2018-10-10,250.75,250.75,269.25,241.50,268.50,43232,10.620000,10.620000,10.98,10.44,...,13.629000,13.00,13.48,127200,7.000,7.000,7.67,7.000,7.58,415500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2018-09-27,325.00,325.00,331.25,321.25,326.25,11152,9.720000,9.720000,10.08,9.54,...,16.014999,15.68,15.90,43800,7.840,7.840,8.34,7.420,8.34,747200
2018-09-28,320.00,320.00,326.25,318.25,325.00,14180,10.080000,10.080000,10.26,9.54,...,15.890000,15.50,15.69,57300,8.160,8.160,8.24,8.000,8.01,374500
2018-10-01,314.00,314.00,325.50,313.50,320.25,19080,18.360001,18.360001,18.90,10.44,...,15.730000,15.01,15.73,92500,7.860,7.860,8.40,7.800,8.32,394600
2018-10-02,291.50,291.50,317.50,283.50,313.50,41560,11.880000,11.880000,15.30,10.62,...,15.400000,14.90,15.37,88400,7.810,7.810,7.90,7.460,7.88,459600
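
The `swaplevel(axis=1).sort_index(axis=1)` call above moves the ticker to the outer column level and groups each ticker's fields together. A minimal sketch of the same reshaping on a hand-built frame (tickers and values are invented for illustration) shows the effect, and how one ticker's columns can then be selected on their own:

```python
import pandas as pd

# Columns start out as (field, ticker), mirroring the layout assumed for yf_data.
columns = pd.MultiIndex.from_product([["Close", "Open"], ["BBB", "AAA"]])
toy = pd.DataFrame(
    [[2.0, 1.0, 2.5, 1.5],
     [2.1, 1.1, 2.6, 1.6]],
    index=pd.to_datetime(["2018-10-04", "2018-10-05"]),
    columns=columns,
)

# Swap the two column levels, then sort so each ticker's fields sit side by side.
by_ticker = toy.swaplevel(axis=1).sort_index(axis=1)
print(list(by_ticker.columns))
# [('AAA', 'Close'), ('AAA', 'Open'), ('BBB', 'Close'), ('BBB', 'Open')]

# Selecting on the outer level returns a plain frame for that ticker.
aaa = by_ticker["AAA"]
print(list(aaa.columns))  # ['Close', 'Open']
```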


## Data Source 2: IEX Cloud

This source is not used in this notebook, because retrieving the full history requires a premium IEX Cloud account. If you have access to one, you can uncomment the lines in the last cell and retrieve the data.

In [7]:
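def get_tasks(session, symbols, api_key, url):
    """Schedule one GET request per ticker and return the pending tasks."""
    tasks = []
    for ticker in symbols:
        api_url = url.format(ticker, api_key)
        # Note: ssl=False disables certificate verification for the request
        request = session.get(api_url, ssl=False)
        # ensure_future accepts any awaitable, not only plain coroutines
        tasks.append(asyncio.ensure_future(request))
    return tasks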

In [8]:
async def get_symbols(symbols, iex_api_key, url):
    """Request every ticker concurrently and return the decoded JSON payloads."""
    results = []
    async with aiohttp.ClientSession() as session:
        tasks = get_tasks(session, symbols, iex_api_key, url)
        # gather returns the responses in the same order as the tasks
        responses = await asyncio.gather(*tasks)
        for response in responses:
            results.append(await response.json())
    return results
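
`asyncio.gather` as used above starts one request per ticker at the same time, which for several hundred symbols may run into rate limits. A possible variation, sketched here with a stand-in coroutine instead of a real HTTP call, caps the number of requests in flight with `asyncio.Semaphore`. The names `fake_fetch`, `fetch_all`, and `max_concurrency` are hypothetical and are not part of the notebook or of aiohttp.

```python
import asyncio

async def fake_fetch(ticker: str) -> dict:
    """Stand-in for an HTTP request: returns a dummy payload after a short pause."""
    await asyncio.sleep(0.01)
    return {"symbol": ticker}

async def fetch_all(tickers, max_concurrency: int = 5) -> list:
    """Fetch every ticker with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(ticker):
        # Each call waits here until one of the semaphore slots is free.
        async with semaphore:
            return await fake_fetch(ticker)

    # gather preserves the order of its arguments in the result list.
    return await asyncio.gather(*(bounded(t) for t in tickers))

results = asyncio.run(fetch_all(["ABEO", "ABIO", "ABUS"], max_concurrency=2))
print(results)
# [{'symbol': 'ABEO'}, {'symbol': 'ABIO'}, {'symbol': 'ABUS'}]
```

Inside a notebook, `asyncio.run` only works here because `nest_asyncio.apply()` was called in the first cell; in a plain script it works as written.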

In [9]:
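def get_historical_data_from_iexcloud(symbols, start, end, iex_api_key):
    """Fetch each symbol's history sequentially and return a dict keyed by ticker."""
    # Named data_by_symbol to avoid shadowing the global `companies` DataFrame
    data_by_symbol = dict()
    for symbol in symbols:
        data = get_historical_data(symbol, start, end, output_format='pandas', token=iex_api_key)
        data_by_symbol[symbol] = data
    return data_by_symbol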

In [10]:
# Publishable key
iex_api_key = 'pk_7214effdee6e4e31b49afe88989d1f46'
historical_data_url = 'https://cloud.iexapis.com/stable/stock/{}/chart/max?token={}'

In [11]:
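# The index is not guaranteed to be in chronological order (see the output of In [6]),
# so take the earliest and latest dates instead of the first and last rows
start = yf_data.index.min()
end = yf_data.index.max()
start = datetime(start.year, start.month, start.day)
end = datetime(end.year, end.month, end.day)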

In [12]:
# Caution: to retrieve all the data, make sure you are using a key that belongs to a premium account.
# After entering the key, uncomment the lines below.

# Asynchronous requests through aiohttp:
# historic_data = pd.DataFrame(asyncio.run(get_symbols(symbols, iex_api_key, historical_data_url)))
# Sequential requests through iexfinance:
# historical_data = get_historical_data_from_iexcloud(symbols, start, end, iex_api_key)