# Financial Data Sources & APIs

A time series is a sequence of data points indexed in chronological order, where each point records a value at a specific moment. Financial examples include equity prices, commodity prices, and forex rates, which are typically observed at regular time intervals.
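As a minimal sketch of this definition, the snippet below builds a small daily series in pandas. The prices and the name `demo_prices` are hypothetical and serve only to illustrate time-ordered indexing.

```python
import pandas as pd

# Hypothetical closing prices for five consecutive days (illustrative values only)
dates = pd.date_range("2024-01-01", periods=5, freq="D")
demo_prices = pd.Series([100.0, 101.5, 99.8, 102.3, 103.1], index=dates, name="close")

# The index is a DatetimeIndex in chronological order
print(demo_prices.index.is_monotonic_increasing)

# Label-based date slicing includes both endpoints
window = demo_prices.loc["2024-01-02":"2024-01-04"]
print(window)
```

Because the index carries the timestamps, selecting a date range needs no explicit filtering on a separate column.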

## Financial Data Preprocessing

The first step in any data analysis is to parse the raw data: extract it from the source, clean it, and handle any missing values. Financial data comes in many forms, and Python packages such as pandas make it straightforward to read and manipulate.
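The parse, clean, and fill steps described above might look like the following sketch. The CSV content is made up, and forward-filling is only one of several possible policies for missing values, not necessarily the right one for every dataset.

```python
import io
import pandas as pd

# Hypothetical raw CSV: unsorted rows, one duplicate row, and one missing close
raw_csv = io.StringIO(
    "timestamp,close\n"
    "2024-01-03,101.0\n"
    "2024-01-01,100.0\n"
    "2024-01-02,\n"
    "2024-01-03,101.0\n"
)

# Parse: read the text and convert the timestamp column to datetimes
clean_df = pd.read_csv(raw_csv, parse_dates=["timestamp"])

# Clean: drop exact duplicate rows, sort chronologically, index by timestamp
clean_df = clean_df.drop_duplicates().sort_values("timestamp").set_index("timestamp")

# Missing values: carry the last known close forward (an assumed policy)
clean_df["close"] = clean_df["close"].ffill()

print(clean_df)
```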

In this session, we'll focus on how to retrieve and store time series data using some popular Python libraries. Specifically, we’ll work with end-of-day data, intraday data, and option chain data. Additionally, we’ll explore how to read time series data from traditional local sources like SQL databases.
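For the local SQL case mentioned above, a minimal sketch using an in-memory SQLite database is shown below. The table name `daily_prices`, its schema, and the values are assumptions made for illustration; a real database will differ.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for a local SQL source (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_prices (timestamp TEXT, close REAL)")
conn.executemany(
    "INSERT INTO daily_prices VALUES (?, ?)",
    [("2024-01-01", 100.0), ("2024-01-02", 101.5), ("2024-01-03", 99.8)],
)
conn.commit()

# Read the table into a DataFrame, parsing timestamps and using them as the index
sql_prices = pd.read_sql_query(
    "SELECT timestamp, close FROM daily_prices ORDER BY timestamp",
    conn,
    parse_dates=["timestamp"],
    index_col="timestamp",
)
conn.close()

print(sql_prices)
```

Ordering in the query and parsing dates on load means the resulting frame is already a time-indexed series ready for analysis.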

**Import Libraries**

In [17]:
# Import client from TradingStrategy
from tradingstrategy.client import Client

# Create client for data import
client = Client.create_jupyter_client()

# Import TimeBucket, used to choose the candle time frame (for example, d1 for daily)
from tradingstrategy.timebucket import TimeBucket

import pandas as pd

Started Trading Strategy in Jupyter notebook environment, configuration is stored in /home/kieferkushland/.tradingstrategy


## Financial Time Series Data Retrieval

In [18]:
# Fetch candles for all trading pairs at daily (d1) resolution
all_candles = client.fetch_all_candles(TimeBucket.d1)

# Represent candles as Pandas data frame
all_candles_df = all_candles.to_pandas()

# Display the all candles data frame
all_candles_df

index,pair_id,timestamp,exchange_rate,open,close,high,low,buys,sells,volume,buy_volume,sell_volume,avg,start_block,end_block
0,1,2020-05-05,1.000000,205.587587,201.486251,205.587587,201.486251,0.0,2.0,0.011000,0.000000,0.011000,203.536919,10008566,10008585
1,1,2020-05-06,1.000000,201.078391,201.358458,201.358458,201.078391,1.0,1.0,0.001689,0.000689,0.001000,201.218424,10013764,10014418
2,1,2020-05-11,1.000000,199.065411,211.270007,211.270007,199.065411,1.0,1.0,0.397513,0.192643,0.204870,205.167709,10045107,10045110
3,1,2020-05-12,1.000000,214.489434,129.456786,214.489434,119.483601,2.0,4.0,3.213554,0.831169,2.382385,154.093810,10051015,10054357
4,1,2020-05-13,1.000000,131.736448,241.912496,244.201278,26.774071,4.0,2.0,17.446875,13.658491,3.788384,175.310108,10054536,10060820
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26405947,8439629,2025-07-25,3716.323599,0.000020,0.000020,0.000020,0.000020,,,0.037163,,,0.037163,33324509,33324509
26405948,8439637,2025-07-25,3718.154512,0.000004,0.000004,0.000004,0.000004,,,684.049960,,,52.619228,33324538,33324814
26405949,8439638,2025-07-25,3716.873515,0.000020,0.000020,0.000020,0.000020,,,0.037169,,,0.037169,33324555,33324555
26405950,8439640,2025-07-25,3717.191911,0.000020,0.000020,0.000020,0.000020,,,0.037172,,,0.037172,33324576,33324576


In [35]:
# Identifier of the trading pair to extract from the candle data
pair_id = 4

# Select the candles for pair_id 4, which appears to be ETH/USDC (verify independently)
eth_usdc_pair: pd.DataFrame = all_candles_df.loc[all_candles_df['pair_id'] == pair_id]

# Display the ETH/USDC pair candle data
eth_usdc_pair

index,pair_id,timestamp,exchange_rate,open,close,high,low,buys,sells,volume,buy_volume,sell_volume,avg,start_block,end_block
27,4,2020-05-14,1.0,200.001681,211.817953,300.072562,200.001681,7.0,4.0,100.411077,48.905310,51.505767,213.483734,10060952,10065717
28,4,2020-05-15,1.0,210.070970,193.333021,378.186603,109.417778,7.0,12.0,661.884893,345.366841,316.518052,232.813905,10069814,10073671
29,4,2020-05-16,1.0,190.061997,199.401306,199.775098,177.004851,1.0,4.0,36.084748,21.052595,15.032153,189.231711,10073990,10079108
30,4,2020-05-17,1.0,186.457728,205.509878,205.509878,186.457728,1.0,1.0,29.870240,17.752717,12.117523,195.983803,10083697,10084908
31,4,2020-05-18,1.0,210.286427,213.494427,233.890133,202.564955,29.0,37.0,10177.065842,5156.156999,5020.908843,213.158586,10091500,10093029
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19878581,4,2025-07-21,1.0,3757.394392,3763.631010,3845.086684,3715.281317,161.0,137.0,690205.743199,348404.914886,341800.828313,3772.765633,22963880,22970994
19878582,4,2025-07-22,1.0,3764.865830,3740.171941,3787.646745,3629.449392,204.0,165.0,886717.336263,433302.512401,453414.823862,3701.618155,22971134,22978113
19878583,4,2025-07-23,1.0,3750.970386,3621.896977,3754.177428,3535.016376,169.0,154.0,679577.012305,285687.519935,393889.492370,3647.813050,22978274,22985237
19878584,4,2025-07-24,1.0,3621.899384,3712.163336,3761.451516,3519.821531,193.0,164.0,781184.234025,432952.354020,348231.880004,3654.103053,22985342,22992312



## Plotting an interactive chart with Plotly

Let’s try interactive charts. You can pan and zoom into interactive charts, making them easier to explore.

Plotly renders charts with its JavaScript library (plotly.js), so they stay interactive on any HTML page.

In [25]:
import plotly.graph_objects as go
from plotly.offline import iplot

fig = go.Figure(data=[go.Candlestick(x=eth_usdc_pair['timestamp'],
                open=eth_usdc_pair['open'],
                high=eth_usdc_pair['high'],
                low=eth_usdc_pair['low'],
                close=eth_usdc_pair['close'])])
iplot(fig)


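## OHLC Chart

Next is the OHLC (open-high-low-close) chart. It shows the same data as the candlestick chart but draws each period as a vertical high-low line with a left tick for the open and a right tick for the close, instead of a candle body. The difference is easiest to see when you zoom in.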

In [None]:
fig = go.Figure(data=[go.Ohlc(x=eth_usdc_pair['timestamp'],
                open=eth_usdc_pair['open'],
                high=eth_usdc_pair['high'],
                low=eth_usdc_pair['low'],
                close=eth_usdc_pair['close'])])

fig.show()

## Returns

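Daily or other periodic simple returns can be calculated from price data using the `.pct_change()` method on a pandas Series or DataFrame of prices. Each value is the current price divided by the previous price, minus one.

Log returns can be calculated using `np.log(prices / prices.shift(1))`, where `np` is NumPy.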

In [None]:
prices = eth_usdc_pair['close']

returns = prices.pct_change().dropna()

display(returns)
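The log-return formula mentioned above can be sketched as follows. The prices are hypothetical, and the names `example_prices`, `simple_returns`, and `log_returns` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices (illustrative values only)
example_prices = pd.Series([100.0, 110.0, 99.0, 108.9])

# Simple returns: P_t / P_{t-1} - 1
simple_returns = example_prices.pct_change().dropna()

# Log returns: ln(P_t / P_{t-1})
log_returns = np.log(example_prices / example_prices.shift(1)).dropna()

# The two are linked by log_return = ln(1 + simple_return)
assert np.allclose(log_returns, np.log1p(simple_returns))

# Log returns add across periods: their sum is the log of the total price ratio
total_log_return = log_returns.sum()
print(total_log_return)
```

The additive property is the usual reason to prefer log returns when aggregating over multiple periods, while simple returns are easier to interpret for a single period.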

## Standard Deviation (Volatility)

The `.std()` method calculates the sample standard deviation of a Series or DataFrame. Applied to a series of returns, it gives the historical volatility per period (daily here).

In [28]:
volatility = returns.std()

display(volatility)

0.04792540411271216
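One detail worth knowing when comparing results across libraries: pandas `.std()` defaults to the sample standard deviation (`ddof=1`), while NumPy's `np.std` defaults to the population version (`ddof=0`). The sketch below uses hypothetical returns to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns (illustrative values only)
example_returns = pd.Series([0.01, -0.02, 0.03, 0.00, -0.01])

# pandas defaults to the sample standard deviation (ddof=1)
sample_std = example_returns.std()

# NumPy defaults to the population standard deviation (ddof=0)
population_std = np.std(example_returns.to_numpy())

# Passing ddof explicitly makes the two libraries agree
matched_std = np.std(example_returns.to_numpy(), ddof=1)

print(sample_std, population_std, matched_std)
```

On long return histories the two values are nearly identical, but on short windows the gap can be noticeable.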

## Annualized Volatility

To annualize daily volatility, multiply it by the square root of the number of trading days per year (e.g., 252 for equities, 365 for cryptocurrencies, which trade every day).

In [30]:
import numpy as np

annualized_volatility = volatility * np.sqrt(365)

display(annualized_volatility)

0.9156135599524889
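The scaling above can be wrapped in a small helper, sketched below. The function name `annualize_volatility` is hypothetical, and the square-root-of-time rule it applies assumes returns are uncorrelated with constant variance, which real markets only approximate.

```python
import numpy as np

def annualize_volatility(per_period_vol: float, periods_per_year: int) -> float:
    """Scale a per-period volatility to an annual figure (square-root-of-time rule)."""
    return float(per_period_vol * np.sqrt(periods_per_year))

# Daily volatility printed earlier in this notebook
daily_vol = 0.04792540411271216

# 365 periods for markets that trade every day
crypto_annual = annualize_volatility(daily_vol, 365)

# 252 periods under the equity trading-day convention, shown for comparison
equity_style_annual = annualize_volatility(daily_vol, 252)

print(crypto_annual, equity_style_annual)
```

Using the wrong day count changes the result by a factor of roughly `sqrt(365 / 252)`, so the convention should match the market being analyzed.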