# Finding the best Open-Source Python API/library for financial data.
Data is a critical component of any financial study or project. Therefore, prior to initiating such efforts, developers should carefully select the appropriate tools to access high-quality data.

This notebook presents a systematic evaluation of various open-source financial APIs, aiming to provide clarity and guidance to developers and researchers seeking to integrate financial data into their Python projects.

Our study assesses each API across multiple metrics, including response time, error handling and data cost. By rigorously analyzing these factors, we offer an objective comparison to aid users in selecting the optimal API for their and our specific requirements.

## Performance metrics

As previously mentioned, our analysis focuses on various performance metrics tailored to different use cases. Unless otherwise specified, our evaluation will prioritize APIs offering free access to their data. Below outlines the parameters and features under consideration for discussion and assessment:

<ul> 
    <li>Number of free calls </li>
    <li>First day available or longest time serie</li>
    <li>Minimum time frequency provided </li>
    <li>Deviation from average values within a time series</li>
    <li>Request time delay </li>

</ul>

To facilitate our comparison and enable targeted filtering based on specific characteristics, we will construct a simple dataframe in this script. This dataframe will list all the APIs under consideration in the rows, while the columns will represent the rementioned parameters. This arrangement will afford us an overview of the comparison process.

In [45]:

import requests
from timeit import timeit
import sys
from api_alpha_vintage import * 
from api_tingo import *
from api_yfinance import *
from api_eodhd import *
from evaluators import *
from api_finhub import *
from api_fmp import *
from api_poligonio import *
import math
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

from api_pandas_datareader import *

sys.path.insert(0, '/api_functions/')

fmp_key = "wpy0OAHOmFPLDyGbZzZ9IQuBlPMGf6v3"
aplha_vantage_key = "P0FAXOA7EI26OK9C"
tingo_key = "ef8c1728f6f1409157ab6a4f1266dd7af5df1ad5"
iex_key = "pk_5b65eb8e38e744f0b90e312a5c638d1b"
eodhd_key = "660734df8f7450.97003170"
fin_hub_key = "co3jcjpr01qj6vn80uogco3jcjpr01qj6vn80up0"
finage_key = "API_KEY01UJ76AC66CG8HPDTMJ75CW2GWR8FY0A"
poligon_key = "yrLQ1dm3ZAujvqGds3ZBQq4zaE7X0R9a"

api_list = pd.Series(["Financial Modelling Prep", "Alpha Vantage", "Quandl", "Tingo",
                      "YFinance", "IEX Cloud", "EODHD" ,"FinHub", "Finage", "Poligon.io"
                    ])

comparison_df_columns = pd.Series(["Free Calls","First day available","Min Frequency",
                                  "Error Rate", "Price", "Time delay of the request"])

comparison_df = pd.DataFrame(index = api_list, columns=comparison_df_columns)

for column in comparison_df.columns:
  comparison_df[column] = "None"

### TIME DELAY
In this segment, we ascertain the time delay of each request. For the sake of simplicity, our analysis revolves around requests returning the latest quote for a specified stock. 
It's worth noting that analogous procedures could be applied to different endpoints. 

Should an API fail to provide the latest quote free of charge, we will denote a "none" value within the respective cell.

This evaluation entails computing the mean time response, measured in seconds, for each request. Leveraging the Python timeit module, we derive these averages. 

Subsequently, we incorporate the calculated average time for each API into the comparison_df, previously delineated for comparison purposes.

In [46]:
ticker = "AAPL"
iterations = 1

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_yfinance, ticker)
comparison_df.at["YFinance","Time delay of the request"] = sum / iterations

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_alpha_vantage, ticker)
comparison_df.at["Alpha Vantage","Time delay of the request"] = sum / iterations

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_eodhd, ticker)
comparison_df.at["EODHD","Time delay of the request"] = sum / iterations

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_finhub, ticker)
comparison_df.at["FinHub","Time delay of the request"] = sum / iterations

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_fmp, ticker)
comparison_df.at["Financial Modelling Prep","Time delay of the request"] = sum / iterations

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_tingo, ticker)
comparison_df.at["Tingo","Time delay of the request"] = sum / iterations

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_poligonio, ticker)
comparison_df.at["Poligon.io","Time delay of the request"] = sum / iterations


### COMPARISON OF VALUES, DEVIATIONS AND ERROR RATE

The primary objective of this section is to assess the data quality sourced from various APIs. Our focus lies in quantifying, within a specified timeframe and for each API, the degree of deviation from the actual value. 

This deviation is simply defined as the variance from the real value, which, in our context, pertains to the latest price retrieved from the APIs for a given ticker.

Given the absence of a definitive means to ascertain the real value, we consider it as the average of all values returned by different APIs at the same point in time.

Similar to our previous approach, should an API fail to provide the requested price free of charge, we will denote "None" within the respective cell.

In [47]:
value_comparison_df_columns = pd.Series(["Price", "Deviation""Error Rate"])
value_comparison_df = pd.DataFrame(index = api_list, columns=value_comparison_df_columns)

try:
    value_comparison_df.at["YFinance", "Price"] = int(latest_price_yfinance(ticker)["currentPrice"] )
except:
    pass
try:
    value_comparison_df.at["Financial Modelling Prep", "Price"] = latest_price_fmp(ticker)[0]["price"]
except:
    pass
try:
    value_comparison_df.at["Alpha Vantage", "Price"] = latest_price_alpha_vantage(ticker)["Global Quote"]["05. price"]
except:
    pass
try:
    value_comparison_df.at["Tingo", "Price"] = latest_price_tingo(ticker)[0]
except:
    pass
try:    
    value_comparison_df.at["EODHD", "Price"] = latest_price_eodhd(ticker)["close"]
except:
    pass    
    value_comparison_df.at["FinHub", "Price"] = latest_price_finhub(ticker)["c"]
try:    
    value_comparison_df.at["Poligon.io", "Price"] =latest_price_poligonio(ticker)["results"][0]["c"]
except:
    pass

comparison_df["Price"] = value_comparison_df["Price"]

value_comparison_df["Price"] = value_comparison_df["Price"].astype(float)

value_comparison_df["Deviation"] = value_comparison_df["Price"].mean() - value_comparison_df["Price"]

value_comparison_df["Error Rate"] = abs (value_comparison_df["Deviation"] / value_comparison_df["Price"].mean()  * 100 )

comparison_df["Error Rate"] = value_comparison_df["Error Rate"]

for index, row  in comparison_df.iterrows():
    if math.isnan(row["Error Rate"]):
        comparison_df.at[index, "Error Rate"] = "Not Found"
        comparison_df.at[index, "Price"] = "Not Found"


### LONGEST TIME SERIES AVAILABLE
For each API that provides free time series data, we find the first historical quote available for a given stock.
Note that this metric may vary depending on the selected ticker.
In case the API doesn't offert for free historical data, we will insert "None" in that cell.

In [48]:
try:
    comparison_df.at["Alpha Vantage","First day available", ] = historical_price_alpha_vantage("KO").index[1]
except:
    pass
try:    
    comparison_df.at["EODHD","First day available", ] = str(historical_price_eodhd(ticker).index[-1])[0:10]
except:
    pass
try:
    comparison_df.at["Financial Modelling Prep","First day available", ] = historical_price_fmp("AAPL")["historical"][-1]["date"]
except:
    pass
try:    
    comparison_df.at["Tingo","First day available", ] = str(historical_price_tingo("SPY").index[0])[0:10]
except:
    pass
try:
    comparison_df.at["YFinance","First day available", ] = str(historical_price_yfinance("SPY").index[0])[0:10]
except:
    pass

### MINIMUM TIME FREQUENCY
To determine the minimum time frequency available, we refer directly to the documentation provided by each API. In this context, we specify the minimum time frequency available specifically for open-source APIs. It's important to note that the minimum time frequency may differ for premium subscription users.

For those seeking the finest premium API services, we encourage exploration through the links available in the footer section of this page. These links will direct users to the documentation for each respective API.

In [49]:
comparison_df.at["Financial Modelling Prep", "Min Frequency"] = "1day"
comparison_df.at["Alpha Vantage", "Min Frequency"] = "1min"
comparison_df.at["Quandl", "Min Frequency"] = "None"
comparison_df.at["Tingo", "Min Frequency"] = "1day"
comparison_df.at["YFinance", "Min Frequency"] = "1min"
comparison_df.at["IEX Cloud", "Min Frequency"] = "None"
comparison_df.at["EODHD", "Min Frequency"] = "1day"
comparison_df.at["FinHub", "Min Frequency"] = "None"
comparison_df.at["Finage", "Min Frequency"] = "None"
comparison_df.at["Poligon.io", "Min Frequency"] = "None"



### MAXIMUM NUMBER OF FREE CALLS

Certain APIs impose a maximum limit on the number of free calls within a designated time frame. We  examined their documentation and updated the dataframe accordingly.

The data is presented in terms of "number of free calls per day."

In [50]:
comparison_df.at["Financial Modelling Prep", "Free Calls"] = 250
comparison_df.at["Alpha Vantage", "Free Calls"] = 25
comparison_df.at["Quandl", "Free Calls"] = 0
comparison_df.at["Tingo", "Free Calls"] = 1000
comparison_df.at["YFinance", "Free Calls"] = "Unlimited"
comparison_df.at["IEX Cloud", "Free Calls"] = 0
comparison_df.at["EODHD", "Free Calls"] = 5040
comparison_df.at["FinHub", "Free Calls"] = 5040
comparison_df.at["Finage", "Free Calls"] = 0
comparison_df.at["Poligon.io", "Free Calls"] = 7200

### RESULTS

We have compiled a comprehensive overview of all comparisons conducted within this dataframe, culminating in a subjective rating based on our personal experience and analysis throughout this project.

In [53]:
comparison_df.style.map(none_values_styler)


Unnamed: 0,Free Calls,First day available,Min Frequency,Error Rate,Price,Time delay of the request,Overall Rating
Financial Modelling Prep,250,2019-04-22,1day,0.000000,165,0.562379,★★★
Alpha Vantage,25,,1min,Not Found,Not Found,0.422901,★★★★
Quandl,0,,,Not Found,Not Found,,★
Tingo,1000,1993-01-29,1day,0.000000,165.000000,0.382852,★★
YFinance,Unlimited,1993-01-29,1min,0.000000,165,0.193432,★★★★★
IEX Cloud,0,,,Not Found,Not Found,,★
EODHD,5040,2024-04-19,1day,0.000000,165,0.775282,★★★
FinHub,5040,,,Not Found,Not Found,0.377253,★★
Finage,0,,,Not Found,Not Found,,★
Poligon.io,7200,,,0.000000,165,0.397879,★★
