# Finding the best Open-Source Python API/library for financial data.
Data is the foundamental brick of any financial study and project so, before getting started with one, each developer should firstly choose the right tool that enables him to find good quality data.

This paper presents a systematic evaluation of various open-source financial APIs, aiming to provide clarity and guidance to developers and researchers seeking to integrate financial data into their Python projects.

Our study assesses each API across multiple metrics, including response time, error handling, handling of null values and data cost. By rigorously analyzing these factors, we offer an objective comparison to aid users in selecting the optimal API for their and our specific requirements.

## Performance metrics
As we said before, we focused on different performance metrics for different usages.
If not clearly specified, we will consider only APIs that offer free access to their data.
This is the list of parameters and features we are going to discuss and evaluate.

<ul> 
    <li>Number of free calls </li>
    <li>First day available or longest time serie</li>
    <li>Minimum time frequency available </li>
    <li>Deviation from the average values in a timeserie</li>
    <li>Time delay of the request </li>

</ul>

In this script we will create a simple dataframe containing in the rows all the APIswe will consider and in the columns all the parameters we previously mentioned,in order to have a overview of the comparison and in order to be able to filter specific caratteristics.


In [14]:
import requests
from timeit import timeit
import sys
from api_alpha_vintage import * 
from api_tingo import *
from api_yfinance import *
from api_eodhd import *
from evaluators import *
from api_finhub import *
from api_fmp import *
from api_poligonio import *

from api_pandas_datareader import *

sys.path.insert(0, '/api_functions/')

fmp_key = "wpy0OAHOmFPLDyGbZzZ9IQuBlPMGf6v3"
aplha_vantage_key = "P0FAXOA7EI26OK9C"
tingo_key = "ef8c1728f6f1409157ab6a4f1266dd7af5df1ad5"
iex_key = "pk_5b65eb8e38e744f0b90e312a5c638d1b"
eodhd_key = "660734df8f7450.97003170"
fin_hub_key = "co3jcjpr01qj6vn80uogco3jcjpr01qj6vn80up0"
finage_key = "API_KEY01UJ76AC66CG8HPDTMJ75CW2GWR8FY0A"
poligon_key = "yrLQ1dm3ZAujvqGds3ZBQq4zaE7X0R9a"

api_list = pd.Series(["Financial Modelling Prep", "Alpha Vantage", "Quandl", "Tingo",
                      "YFinance", "IEX Cloud", "EODHD" ,"FinHub", "Finage", "Poligon.io"
                    ])

comparison_df_columns = pd.Series(["Free Calls","First day available","Min Frequency",
                                  "Error Rate", "Price", "Time delay of the request"])

comparison_df = pd.DataFrame(index = api_list, columns=comparison_df_columns)

### TIME DELAY
In this section we calculate the time delay of the request.
For simplicity, we are considering the request returning the last quote for a given stock.
The same processes could be done with different endpoints.
In case the API doesn't offert for free the last quote, a "none" value will be inserted in that cell.

This evalutaions consists in the mean of the time response (calculate with the python timeit module and expressed in seconds) of each request.
For each api, we insert the calculated average time in the comparison_df we have previously defined.

In [15]:
ticker = "AAPL"
iterations = 1

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_yfinance, ticker)
comparison_df.at["YFinance","Time delay of the request"] = sum / iterations

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_alpha_vantage, ticker)
comparison_df.at["Alpha Vantage","Time delay of the request"] = sum / iterations

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_eodhd, ticker)
comparison_df.at["EODHD","Time delay of the request"] = sum / iterations

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_finhub, ticker)
comparison_df.at["FinHub","Time delay of the request"] = sum / iterations

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_fmp, ticker)
comparison_df.at["Financial Modelling Prep","Time delay of the request"] = sum / iterations

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_tingo, ticker)
comparison_df.at["Tingo","Time delay of the request"] = sum / iterations

sum = 0
for i in range(iterations):
    sum += time_response(latest_price_poligonio, ticker)
comparison_df.at["Poligon.io","Time delay of the request"] = sum / iterations


### COMPARISON OF VALUES, DEVIATIONS AND ERROR RATE
The aim of this section is to evalute the quality of the data we obtain from different APIs.
The main focus is to calculate, in a given time and for each APIs, how much is the error in comparison to the real value.
The error simply consists in the deviation from the real value that, in our case, is the last price returned from the APIs of a given ticker.

Since there is no way to determine it, the real value is the average of all the values returned by the different APIs in the same moment.

As we did before, in case the an API doesn't offert for free the price request, we will insert "None" in that cell.

In [16]:
value_comparison_df_columns = pd.Series(["Price", "Deviation""Error Rate"])
value_comparison_df = pd.DataFrame(index = api_list, columns=value_comparison_df_columns)

value_comparison_df.at["YFinance", "Price"] = int(latest_price_yfinance(ticker)["currentPrice"] )
value_comparison_df.at["Financial Modelling Prep", "Price"] = latest_price_fmp(ticker)[0]["price"]
value_comparison_df.at["Alpha Vantage", "Price"] = latest_price_alpha_vantage(ticker)["Global Quote"]["05. price"]
value_comparison_df.at["Quandl", "Price"] = np.nan
value_comparison_df.at["Tingo", "Price"] = latest_price_tingo(ticker)[0]
value_comparison_df.at["IEX Cloud", "Price"] = np.nan
value_comparison_df.at["EODHD", "Price"] = latest_price_eodhd(ticker)["close"]
value_comparison_df.at["FinHub", "Price"] = latest_price_finhub(ticker)["c"]
value_comparison_df.at["Finage", "Price"] = np.nan
value_comparison_df.at["Poligon.io", "Price"] =latest_price_poligonio(ticker)["results"][0]["c"]

comparison_df["Price"] = value_comparison_df["Price"]

value_comparison_df["Price"] = value_comparison_df["Price"].astype(float)

value_comparison_df["Deviation"] = value_comparison_df["Price"].mean() - value_comparison_df["Price"]

value_comparison_df["Error Rate"] = abs (value_comparison_df["Deviation"] / value_comparison_df["Price"].mean()  * 100 )

comparison_df["Error Rate"] = value_comparison_df["Error Rate"]

### LONGEST TIME SERIES AVAILABLE
For each API that provides free time series data, we find the first historical quote available for a given stock.
Note that this metric may vary depending on the selected ticker.
In case the API doesn't offert for free historical data, we will insert "None" in that cell.

In [17]:
comparison_df.at["Alpha Vantage","First day available", ] = historical_price_alpha_vantage("KO").index[1]
comparison_df.at["EODHD","First day available", ] = str(historical_price_eodhd(ticker).index[-1])[0:10]
comparison_df.at["Financial Modelling Prep","First day available", ] = historical_price_fmp("AAPL")["historical"][-1]["date"]
comparison_df.at["Tingo","First day available", ] = str(historical_price_tingo("SPY").index[0])[0:10]
comparison_df.at["YFinance","First day available", ] = str(historical_price_yfinance("SPY").index[0])[0:10]

### MINIMUM TIME FREQUENCY
For the minimun time frequency available, we directly read each documentation.
In this case, we provide the minimum time frequency available for open-source APIs.
Minimum time frequency can vary when paying a premium subscription.

If you are looking for the best premium API, we invite you to follow the links you can find in the footer section of this page and visit the documentation for each API.

In [18]:
comparison_df.at["Financial Modelling Prep", "Min Frequency"] =np.nan
comparison_df.at["Alpha Vantage", "Min Frequency"] = "1min"
comparison_df.at["Quandl", "Min Frequency"] = np.nan
comparison_df.at["Tingo", "Min Frequency"] = "1day"
comparison_df.at["YFinance", "Min Frequency"] = "1min"
comparison_df.at["IEX Cloud", "Min Frequency"] = np.nan
comparison_df.at["EODHD", "Min Frequency"] = "1day"
comparison_df.at["FinHub", "Min Frequency"] = np.nan
comparison_df.at["Finage", "Min Frequency"] = np.nan
comparison_df.at["Poligon.io", "Min Frequency"] = np.nan



### MAXIMUM NUMBER OF FREE CALLS
Some APIs provide a maximum number of free calls in a given time frame.

We directly looked at their documentation and we completed the dataframe as follows.

The data is expressed in "number of free calls per day".

In [19]:
comparison_df.at["Financial Modelling Prep", "Free Calls"] = 250
comparison_df.at["Alpha Vantage", "Free Calls"] = 25
comparison_df.at["Quandl", "Free Calls"] = 0
comparison_df.at["Tingo", "Free Calls"] = 1000
comparison_df.at["YFinance", "Free Calls"] = "Unlimited"
comparison_df.at["IEX Cloud", "Free Calls"] = 0
comparison_df.at["EODHD", "Free Calls"] = 5040
comparison_df.at["FinHub", "Free Calls"] = 5040
comparison_df.at["Finage", "Free Calls"] = 0
comparison_df.at["Poligon.io", "Free Calls"] = 7200

### RESULTS

We can have a general overview of all the comparisons we did in this dataframe.

In [34]:
comparison_df.style.set_properties(**{'border': '2px solid green',
                                    'color': 'white',
                                    'font-size': '30'
                                    },
                                   )

Unnamed: 0,Free Calls,First day available,Min Frequency,Error Rate,Price,Time delay of the request
Financial Modelling Prep,250,2019-04-22,,0.486931,165.1,0.521036
Alpha Vantage,25,2024-04-17,1min,0.682393,167.04,0.478163
Quandl,0,,,,,
Tingo,1000,1993-01-29,1day,0.682393,167.04,0.371393
YFinance,Unlimited,1993-01-29,1min,0.547206,165.0,0.082351
IEX Cloud,0,,,,,
EODHD,5040,2024-04-18,1day,0.517068,165.05,0.87419
FinHub,5040,,,0.495972,165.085,0.358993
Finage,0,,,,,
Poligon.io,7200,,,0.682393,167.04,0.397693


In [None]:
√