# Getting Started with Data Sources

Welcome to **PyBroker**! The best place to start is with [DataSources](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.DataSource). A ```DataSource``` is a class that fetches data from an external source, which you can then use to backtest your trading strategies.

## Yahoo Finance

One of the built-in ```DataSources``` in **PyBroker** is [Yahoo Finance](https://finance.yahoo.com). To use it, import [YFinance](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.YFinance) and call its ```query``` method:

In [1]:
from pybroker import YFinance

# Create the data source, then fetch bars for both symbols over one year.
yfinance = YFinance()
df = yfinance.query(['AAPL', 'MSFT'], start_date='3/1/2021', end_date='3/1/2022')
df

Loading bar data...
[*********************100%***********************]  2 of 2 completed
Loaded bar data: 0:00:00 



| | date | symbol | open | high | low | close | volume | adj_close |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 | AAPL | 123.750000 | 127.930000 | 122.790001 | 127.790001 | 116307900 | 126.270065 |
| 1 | 2021-03-01 | MSFT | 235.899994 | 237.470001 | 233.149994 | 236.940002 | 25324000 | 233.325272 |
| 2 | 2021-03-02 | AAPL | 128.410004 | 128.720001 | 125.010002 | 125.120003 | 102260900 | 123.631813 |
| 3 | 2021-03-02 | MSFT | 237.009995 | 237.300003 | 233.449997 | 233.869995 | 22812500 | 230.302109 |
| 4 | 2021-03-03 | AAPL | 124.809998 | 125.709999 | 121.839996 | 122.059998 | 112966300 | 120.608208 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 2022-02-24 | MSFT | 272.510010 | 295.160004 | 271.519989 | 294.589996 | 56989700 | 292.458740 |
| 502 | 2022-02-25 | AAPL | 163.839996 | 165.119995 | 160.869995 | 164.850006 | 91974200 | 163.857407 |
| 503 | 2022-02-25 | MSFT | 295.140015 | 297.630005 | 291.649994 | 297.309998 | 32546700 | 295.159058 |
| 504 | 2022-02-28 | AAPL | 163.059998 | 165.419998 | 162.429993 | 165.119995 | 95056600 | 164.125778 |


The code above queries bar data for AAPL and MSFT and returns the results as a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) with one row per symbol per date.
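
Because the result is an ordinary DataFrame, the usual pandas operations apply to it. The following is a minimal sketch, not part of PyBroker: it rebuilds the first four rows of the output above by hand (only the ```date```, ```symbol```, ```close```, and ```volume``` columns) so that it runs without a network connection, then selects one symbol and pivots the closing prices into one column per symbol. The variable names ```sample```, ```aapl```, and ```closes``` are illustrative.

```python
import pandas as pd

# First four rows of the output above, typed in by hand so the example runs offline.
sample = pd.DataFrame({
    'date': pd.to_datetime(['2021-03-01', '2021-03-01', '2021-03-02', '2021-03-02']),
    'symbol': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
    'close': [127.790001, 236.940002, 125.120003, 233.869995],
    'volume': [116307900, 25324000, 102260900, 22812500],
})

# Keep only the rows for a single ticker symbol.
aapl = sample[sample['symbol'] == 'AAPL']

# Reshape to one column of closing prices per symbol, indexed by date.
closes = sample.pivot(index='date', columns='symbol', values='close')
print(closes)
```

With a real query result, the same two operations can be applied to ```df``` directly.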

## Caching Data

To speed up data retrieval, you can cache your queries using **PyBroker**'s caching system. Enable caching by calling [pybroker.enable_data_source_cache('name')](https://www.pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.enable_data_source_cache), where ```name``` is the name of the cache you want to use:

In [2]:
import pybroker

pybroker.enable_data_source_cache('yfinance')

<diskcache.core.Cache at 0x7fc263b423d0>

The next call to [query](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.DataSource.query) will cache the returned data to disk. Each unique combination of ticker symbol and date range will be cached separately:

In [3]:
yfinance.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')

Loading bar data...
[*********************100%***********************]  2 of 2 completed
Loaded bar data: 0:00:00 



| | date | symbol | open | high | low | close | volume | adj_close |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 | IBM | 115.057358 | 116.940727 | 114.588913 | 115.430206 | 5977367 | 104.819466 |
| 1 | 2021-03-01 | TSLA | 230.036667 | 239.666672 | 228.350006 | 239.476669 | 81408600 | 239.476669 |
| 2 | 2021-03-02 | IBM | 115.430206 | 116.539200 | 114.971321 | 115.038239 | 4732418 | 104.463524 |
| 3 | 2021-03-02 | TSLA | 239.426666 | 240.369995 | 228.333328 | 228.813339 | 71196600 | 228.813339 |
| 4 | 2021-03-03 | IBM | 115.200768 | 117.237091 | 114.703636 | 116.978966 | 7744898 | 106.225845 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 2022-02-24 | TSLA | 233.463333 | 267.493347 | 233.333328 | 266.923340 | 135322200 | 266.923340 |
| 502 | 2022-02-25 | IBM | 122.050003 | 124.260002 | 121.449997 | 124.180000 | 4460900 | 118.284561 |
| 503 | 2022-02-25 | TSLA | 269.743347 | 273.166656 | 260.799988 | 269.956665 | 76067700 | 269.956665 |
| 504 | 2022-02-28 | IBM | 122.209999 | 123.389999 | 121.040001 | 122.510002 | 6757300 | 116.693848 |


Calling ```query``` again with the same ticker symbols and date range returns the cached data:

In [4]:
df = yfinance.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')
df

Loaded cached bar data.



| | date | symbol | open | high | low | close | volume | adj_close |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 | IBM | 115.057358 | 116.940727 | 114.588913 | 115.430206 | 5977367 | 104.819466 |
| 1 | 2021-03-02 | IBM | 115.430206 | 116.539200 | 114.971321 | 115.038239 | 4732418 | 104.463524 |
| 2 | 2021-03-03 | IBM | 115.200768 | 117.237091 | 114.703636 | 116.978966 | 7744898 | 106.225845 |
| 3 | 2021-03-04 | IBM | 116.634796 | 117.801147 | 113.537285 | 114.827919 | 8439651 | 104.272537 |
| 4 | 2021-03-05 | IBM | 115.334610 | 118.307838 | 114.961761 | 117.428299 | 7268968 | 106.633881 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 248 | 2022-02-22 | TSLA | 278.043335 | 285.576660 | 267.033325 | 273.843323 | 83288100 | 273.843323 |
| 249 | 2022-02-23 | TSLA | 276.809998 | 278.433319 | 253.520004 | 254.679993 | 95256900 | 254.679993 |
| 250 | 2022-02-24 | TSLA | 233.463333 | 267.493347 | 233.333328 | 266.923340 | 135322200 | 266.923340 |
| 251 | 2022-02-25 | TSLA | 269.743347 | 273.166656 | 260.799988 | 269.956665 | 76067700 | 269.956665 |


You can clear your cache using [pybroker.clear_data_source_cache](https://www.pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.clear_data_source_cache):

In [5]:
pybroker.clear_data_source_cache()

You can also disable caching altogether using [pybroker.disable_data_source_cache](https://www.pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.disable_data_source_cache):

In [6]:
pybroker.disable_data_source_cache()

Note that both of these calls should be made only after caching has been enabled with [pybroker.enable_data_source_cache](https://www.pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.enable_data_source_cache).
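
To illustrate what caching each combination of ticker symbol and date range separately means, here is a toy in-memory sketch. It is not PyBroker's implementation (which, judging by the ```diskcache``` object returned by ```enable_data_source_cache``` above, stores data on disk). The class ```KeyedQueryCache``` and the function ```fake_fetch``` are hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Tuple

# One cache entry per (symbol, start_date, end_date) combination.
CacheKey = Tuple[str, str, str]


class KeyedQueryCache:
    """Toy in-memory cache; hypothetical, not PyBroker's implementation."""

    def __init__(self, fetch: Callable[[str, str, str], List[dict]]):
        self._fetch = fetch
        self._store: Dict[CacheKey, List[dict]] = {}
        self.fetch_count = 0  # number of times the underlying source was hit

    def query(self, symbols: List[str], start_date: str, end_date: str) -> List[dict]:
        rows: List[dict] = []
        for symbol in symbols:
            key = (symbol, start_date, end_date)
            if key not in self._store:
                # Cache miss: fetch this symbol and date range once and remember it.
                self.fetch_count += 1
                self._store[key] = self._fetch(symbol, start_date, end_date)
            rows.extend(self._store[key])
        return rows


def fake_fetch(symbol: str, start_date: str, end_date: str) -> List[dict]:
    # Stand-in for a real network request.
    return [{'symbol': symbol, 'start': start_date, 'end': end_date}]


cache = KeyedQueryCache(fake_fetch)
cache.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')  # two misses
cache.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')  # two hits, nothing fetched
cache.query(['TSLA'], '3/1/2021', '4/1/2021')         # new date range: one miss
print(cache.fetch_count)  # 3
```

Repeating a query costs nothing, while changing either the symbol or the date range produces a new entry.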

## Alpaca

**PyBroker** also includes an [Alpaca](https://alpaca.markets/) ```DataSource``` for fetching stock data. To use it, import [Alpaca](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.Alpaca) and provide your API key and secret:

In [7]:
from pybroker import Alpaca
import os

alpaca = Alpaca(os.environ['ALPACA_API_KEY'], os.environ['ALPACA_API_SECRET'])
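
Indexing ```os.environ``` directly raises a bare ```KeyError``` when a variable is missing. If you prefer a more descriptive failure, one option is a small helper like the one below. The name ```require_env``` is hypothetical and not part of PyBroker; the environment variable names in the comment are the ones used in the cell above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f'Environment variable {name!r} is not set')
    return value


# Possible usage with the cell above:
# alpaca = Alpaca(require_env('ALPACA_API_KEY'), require_env('ALPACA_API_SECRET'))
```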

You can query ```Alpaca``` for stock data using the same syntax as with Yahoo Finance, but ```Alpaca``` also accepts a ```timeframe``` argument for querying bars at different intervals. For example, to query 1-minute data:

In [8]:
# timeframe='1m' requests 1-minute bars.
df = alpaca.query(
    ['AAPL', 'MSFT'],
    start_date='3/1/2021',
    end_date='4/1/2021',
    timeframe='1m'
)
df

Loading bar data...
Loaded bar data: 0:00:05 



| | date | symbol | open | high | low | close | volume | vwap |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 04:00:00-05:00 | AAPL | 124.30 | 124.56 | 124.30 | 124.50 | 12267 | 124.433365 |
| 1 | 2021-03-01 04:00:00-05:00 | MSFT | 235.87 | 236.00 | 235.87 | 236.00 | 1429 | 235.938887 |
| 2 | 2021-03-01 04:01:00-05:00 | AAPL | 124.56 | 124.60 | 124.30 | 124.30 | 9439 | 124.481323 |
| 3 | 2021-03-01 04:01:00-05:00 | MSFT | 236.17 | 236.17 | 236.17 | 236.17 | 104 | 236.161538 |
| 4 | 2021-03-01 04:02:00-05:00 | AAPL | 124.00 | 124.05 | 123.78 | 123.78 | 4834 | 123.935583 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 33859 | 2021-03-31 19:57:00-04:00 | MSFT | 237.28 | 237.28 | 237.28 | 237.28 | 507 | 237.367870 |
| 33860 | 2021-03-31 19:58:00-04:00 | AAPL | 122.36 | 122.39 | 122.33 | 122.39 | 3403 | 122.360544 |
| 33861 | 2021-03-31 19:58:00-04:00 | MSFT | 237.40 | 237.40 | 237.35 | 237.35 | 636 | 237.378066 |
| 33862 | 2021-03-31 19:59:00-04:00 | AAPL | 122.39 | 122.45 | 122.38 | 122.45 | 5560 | 122.402606 |
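
To make the idea of a timeframe concrete, the sketch below aggregates 1-minute bars into a 5-minute bar using plain pandas. It only illustrates how bars at different timeframes relate to one another; it does not describe how Alpaca or **PyBroker** compute them, and in practice you would pass a different ```timeframe``` to ```query``` instead. The three AAPL rows are copied by hand from the output above, and the names ```bars``` and ```five_min``` are illustrative.

```python
import pandas as pd

# The first three AAPL rows of the output above, typed in by hand.
bars = pd.DataFrame({
    'date': pd.to_datetime([
        '2021-03-01 04:00:00-05:00',
        '2021-03-01 04:01:00-05:00',
        '2021-03-01 04:02:00-05:00',
    ]),
    'open': [124.30, 124.56, 124.00],
    'high': [124.56, 124.60, 124.05],
    'low': [124.30, 124.30, 123.78],
    'close': [124.50, 124.30, 123.78],
    'volume': [12267, 9439, 4834],
})

# Group the rows into 5-minute buckets and combine each bucket into one bar:
# first open, highest high, lowest low, last close, and total volume.
five_min = bars.set_index('date').resample('5min').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum',
})
print(five_min)
```

All three rows fall into the bucket starting at 04:00, so the result is a single bar.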


## Alpaca Crypto

If you are interested in fetching cryptocurrency data, you can use [AlpacaCrypto](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.AlpacaCrypto). Here's an example of how to use it:

In [9]:
from pybroker import AlpacaCrypto

# Reuses the os import and the Alpaca credentials from the cells above.
crypto = AlpacaCrypto(
    os.environ['ALPACA_API_KEY'],
    os.environ['ALPACA_API_SECRET'],
    exchange='CBSE'
)
df = crypto.query('BTCUSD', start_date='1/1/2021', end_date='2/1/2021', timeframe='1h')
df

Loading bar data...
Loaded bar data: 0:00:02 



| | symbol | date | open | high | low | close | volume | vwap | trade_count |
|---|---|---|---|---|---|---|---|---|---|
| 0 | BTCUSD | 2021-01-01 00:00:00-05:00 | 29287.95 | 29305.32 | 29148.60 | 29240.79 | 860.832582 | 29225.508289 | 7067 |
| 1 | BTCUSD | 2021-01-01 01:00:00-05:00 | 29246.06 | 29320.04 | 29158.93 | 29233.94 | 977.010313 | 29231.432535 | 7130 |
| 2 | BTCUSD | 2021-01-01 02:00:00-05:00 | 29233.93 | 29234.87 | 28900.00 | 29166.41 | 1226.293530 | 29084.702280 | 6890 |
| 3 | BTCUSD | 2021-01-01 03:00:00-05:00 | 29166.41 | 29246.63 | 28928.16 | 29074.87 | 850.251282 | 29096.221572 | 11113 |
| 4 | BTCUSD | 2021-01-01 04:00:00-05:00 | 29071.63 | 29386.28 | 29039.15 | 29280.43 | 517.123045 | 29241.740654 | 8740 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 740 | BTCUSD | 2021-01-31 20:00:00-05:00 | 32587.51 | 33561.57 | 32515.49 | 33450.00 | 1086.327588 | 33110.900911 | 11618 |
| 741 | BTCUSD | 2021-01-31 21:00:00-05:00 | 33450.00 | 33850.00 | 33248.80 | 33673.25 | 1002.629634 | 33597.702245 | 10695 |
| 742 | BTCUSD | 2021-01-31 22:00:00-05:00 | 33672.72 | 33780.00 | 33481.05 | 33590.17 | 636.194746 | 33630.418590 | 8520 |
| 743 | BTCUSD | 2021-01-31 23:00:00-05:00 | 33590.70 | 33958.34 | 33318.97 | 33581.76 | 726.182833 | 33671.944503 | 9552 |


The example above queries hourly data for the BTCUSD currency pair. The ```exchange``` argument, here ```'CBSE'```, specifies that the data comes from Coinbase.
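
As a quick sanity check on the output above, the expected number of hourly bars can be worked out by hand. This sketch assumes that a bar exists for every hour of the day and that the end date acts as an exclusive bound at midnight; both assumptions are inferred from the output rather than taken from the **PyBroker** documentation.

```python
from datetime import datetime, timedelta

start = datetime(2021, 1, 1)
end = datetime(2021, 2, 1)  # treated as exclusive (assumption)

# Number of whole hours between the two dates: 31 days * 24 hours.
expected_bars = (end - start) // timedelta(hours=1)
print(expected_bars)  # 744
```

This agrees with the index of the last row shown above (743, counting from 0).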

In the [next notebook](https://www.pybroker.com/en/latest/notebooks/2.%20Backtesting%20a%20Strategy.html), we'll take a look at how to use DataSources to backtest a simple trading strategy.