# Getting Started with Data Sources

Welcome to **PyBroker**! The best place to start is with [DataSources](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.DataSource). A ```DataSource``` fetches data from an external source, and that data can then be used to backtest a trading strategy.

## Yahoo Finance

**PyBroker** includes a few ```DataSources``` by default. The first is [Yahoo Finance](https://finance.yahoo.com), imported below:

In [1]:
import pybroker
from pybroker.data import YFinance

An instance of [YFinance](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.YFinance) can be used to query data for multiple stock tickers:

In [2]:
yfinance = YFinance()
df = yfinance.query(['AAPL', 'MSFT'], start_date='3/1/2021', end_date='3/1/2022')
df

Loading bar data...
[*********************100%***********************]  2 of 2 completed
Loaded bar data: 0:00:02 



| | date | symbol | open | high | low | close | volume | adj_close |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 05:00:00 | AAPL | 123.750000 | 127.930000 | 122.790001 | 127.790001 | 116307900 | 126.462852 |
| 1 | 2021-03-01 05:00:00 | MSFT | 235.899994 | 237.470001 | 233.149994 | 236.940002 | 25324000 | 233.325302 |
| 2 | 2021-03-02 05:00:00 | AAPL | 128.410004 | 128.720001 | 125.010002 | 125.120003 | 102260900 | 123.820580 |
| 3 | 2021-03-02 05:00:00 | MSFT | 237.009995 | 237.300003 | 233.449997 | 233.869995 | 22812500 | 230.302109 |
| 4 | 2021-03-03 05:00:00 | AAPL | 124.809998 | 125.709999 | 121.839996 | 122.059998 | 112966300 | 120.792358 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 2022-02-24 05:00:00 | MSFT | 272.510010 | 295.160004 | 271.519989 | 294.589996 | 56989700 | 292.458740 |
| 502 | 2022-02-25 05:00:00 | AAPL | 163.839996 | 165.119995 | 160.869995 | 164.850006 | 91974200 | 164.107590 |
| 503 | 2022-02-25 05:00:00 | MSFT | 295.140015 | 297.630005 | 291.649994 | 297.309998 | 32546700 | 295.159058 |
| 504 | 2022-02-28 05:00:00 | AAPL | 163.059998 | 165.419998 | 162.429993 | 165.119995 | 95056600 | 164.376373 |


The data above was returned in the form of a [Pandas](https://pandas.pydata.org/) [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).
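
Because the result is a regular DataFrame, the usual Pandas operations apply to it. The sketch below does not call ```YFinance```; it rebuilds a few rows by hand from the output above (only the `date`, `symbol`, and `close` columns), then selects one ticker and reshapes the closing prices so that each ticker gets its own column. The variable names `aapl` and `closes` are illustrative only.

```python
import pandas as pd

# A few rows copied from the output above (not fetched live).
df = pd.DataFrame({
    'date': pd.to_datetime([
        '2021-03-01 05:00:00', '2021-03-01 05:00:00',
        '2021-03-02 05:00:00', '2021-03-02 05:00:00',
    ]),
    'symbol': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
    'close': [127.790001, 236.940002, 125.120003, 233.869995],
})

# Select the bars for a single ticker.
aapl = df[df['symbol'] == 'AAPL']

# Reshape to one row per date and one column per ticker.
closes = df.pivot(index='date', columns='symbol', values='close')
print(closes)
```

The same filtering and pivoting should work on the full DataFrame returned by ```query```, since it carries the same `date` and `symbol` columns.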

## Caching Data

Caching of queried data can be enabled by calling [pybroker.enable_data_source_cache](https://www.pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.enable_data_source_cache):

In [3]:
pybroker.enable_data_source_cache('yfinance')

<diskcache.core.Cache at 0x7f2d3877a950>

The argument passed in is the name to use for the cache. Passing a different name will store and retrieve data using a different cache located on disk.

The next call to [query](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.DataSource.query) will cache the returned data to disk. The data is cached for each unique combination of ticker symbol and date range:

In [4]:
yfinance.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')

Loading bar data...
[*********************100%***********************]  2 of 2 completed
Loaded bar data: 0:00:00 



| | date | symbol | open | high | low | close | volume | adj_close |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 05:00:00 | IBM | 115.057358 | 116.940727 | 114.588913 | 115.430206 | 5977367 | 106.106979 |
| 1 | 2021-03-01 05:00:00 | TSLA | 230.036667 | 239.666672 | 228.350006 | 239.476669 | 81408600 | 239.476669 |
| 2 | 2021-03-02 05:00:00 | IBM | 115.430206 | 116.539200 | 114.971321 | 115.038239 | 4732418 | 105.746674 |
| 3 | 2021-03-02 05:00:00 | TSLA | 239.426666 | 240.369995 | 228.333328 | 228.813339 | 71196600 | 228.813339 |
| 4 | 2021-03-03 05:00:00 | IBM | 115.200768 | 117.237091 | 114.703636 | 116.978966 | 7744898 | 107.530640 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 2022-02-24 05:00:00 | TSLA | 233.463333 | 267.493347 | 233.333328 | 266.923340 | 135322200 | 266.923340 |
| 502 | 2022-02-25 05:00:00 | IBM | 122.050003 | 124.260002 | 121.449997 | 124.180000 | 4460900 | 119.737473 |
| 503 | 2022-02-25 05:00:00 | TSLA | 269.743347 | 273.166656 | 260.799988 | 269.956665 | 76067700 | 269.956665 |
| 504 | 2022-02-28 05:00:00 | IBM | 122.209999 | 123.389999 | 121.040001 | 122.510002 | 6757300 | 118.127220 |


Calling ```query``` again with the same ticker symbols and date range returns the cached data:

In [5]:
df = yfinance.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')
df

Loaded cached bar data.



| | date | symbol | open | high | low | close | volume | adj_close |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 05:00:00 | TSLA | 230.036667 | 239.666672 | 228.350006 | 239.476669 | 81408600 | 239.476669 |
| 1 | 2021-03-02 05:00:00 | TSLA | 239.426666 | 240.369995 | 228.333328 | 228.813339 | 71196600 | 228.813339 |
| 2 | 2021-03-03 05:00:00 | TSLA | 229.330002 | 233.566666 | 217.236664 | 217.733337 | 90624000 | 217.733337 |
| 3 | 2021-03-04 05:00:00 | TSLA | 218.600006 | 222.816666 | 200.000000 | 207.146667 | 197758500 | 207.146667 |
| 4 | 2021-03-05 05:00:00 | TSLA | 208.686661 | 209.279999 | 179.830002 | 199.316666 | 268189500 | 199.316666 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 248 | 2022-02-22 05:00:00 | IBM | 124.199997 | 125.000000 | 122.680000 | 123.919998 | 5349700 | 119.486771 |
| 249 | 2022-02-23 05:00:00 | IBM | 124.379997 | 124.699997 | 121.870003 | 122.070000 | 4086400 | 117.702965 |
| 250 | 2022-02-24 05:00:00 | IBM | 120.000000 | 122.099998 | 118.809998 | 121.970001 | 6563200 | 117.606544 |
| 251 | 2022-02-25 05:00:00 | IBM | 122.050003 | 124.260002 | 121.449997 | 124.180000 | 4460900 | 119.737473 |


The cached data can be cleared with [clear_data_source_cache](https://www.pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.clear_data_source_cache):

In [6]:
pybroker.clear_data_source_cache()

And [disable_data_source_cache](https://www.pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.disable_data_source_cache) can be used to disable caching data retrieved from ```DataSources```:

In [7]:
pybroker.disable_data_source_cache()

Both ```clear_data_source_cache``` and ```disable_data_source_cache``` must be called only after [enable_data_source_cache](https://www.pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.enable_data_source_cache) has been called.
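
To make the caching behavior described in this section concrete, here is a minimal in-memory sketch. It is not PyBroker's implementation: the class `ToyDataSourceCache`, its method names, the `fake_fetch` helper, and the `(symbol, start_date, end_date)` key layout are all hypothetical, and a plain dictionary stands in for the on-disk cache. It only illustrates the three ideas above: caches are selected by name, entries are stored per ticker symbol and date range, and clearing or disabling requires that a cache was enabled first.

```python
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # hypothetical key: (symbol, start_date, end_date)


class ToyDataSourceCache:
    """In-memory stand-in for named on-disk caches (illustration only)."""

    def __init__(self) -> None:
        self._caches: Dict[str, Dict[Key, List[float]]] = {}
        self._active: Optional[str] = None

    def enable(self, name: str) -> None:
        # A different name selects a different, independent cache.
        self._caches.setdefault(name, {})
        self._active = name

    def _require_enabled(self) -> Dict[Key, List[float]]:
        if self._active is None:
            raise RuntimeError('enable() must be called first')
        return self._caches[self._active]

    def clear(self) -> None:
        # Drop every entry in the currently enabled cache.
        self._require_enabled().clear()

    def disable(self) -> None:
        # Stop caching; stored entries are kept until cleared.
        self._require_enabled()
        self._active = None

    def query(
        self,
        symbols: List[str],
        start: str,
        end: str,
        fetch: Callable[[str, str, str], List[float]],
    ) -> Dict[str, List[float]]:
        cache = None if self._active is None else self._caches[self._active]
        result: Dict[str, List[float]] = {}
        for symbol in symbols:
            key = (symbol, start, end)
            if cache is not None and key in cache:
                result[symbol] = cache[key]  # cache hit, no fetch
                continue
            result[symbol] = fetch(symbol, start, end)
            if cache is not None:
                cache[key] = result[symbol]
        return result


calls: List[str] = []


def fake_fetch(symbol: str, start: str, end: str) -> List[float]:
    # Placeholder for a network request; records which symbols were fetched.
    calls.append(symbol)
    return [1.0, 2.0]


cache = ToyDataSourceCache()
cache.enable('yfinance')
cache.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022', fake_fetch)  # fetches both
cache.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022', fake_fetch)  # served from cache
print(calls)  # ['TSLA', 'IBM']
```

In this sketch, changing either the ticker symbols or the date range produces a new key and therefore a new fetch, which mirrors the "unique combination" rule stated above.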

## Alpaca

**PyBroker** also includes an [Alpaca](https://alpaca.markets/) ```DataSource``` for fetching stock data. To begin using it, import [Alpaca](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.Alpaca): 

In [8]:
from pybroker import Alpaca

Then create an instance with your Alpaca API key and secret:

In [9]:
import os

alpaca = Alpaca(os.environ['ALPACA_API_KEY'], os.environ['ALPACA_API_SECRET'])

An ```Alpaca``` instance is queried the same way as ```YFinance```, and it can also query any of the timeframes that Alpaca supports. Let's query 1-minute data:

In [10]:
df = alpaca.query(
    ['AAPL', 'MSFT'], 
    start_date='3/1/2021', 
    end_date='4/1/2021', 
    timeframe='1m'
)
df

Loading bar data...
Loaded bar data: 0:00:06 



| | date | symbol | open | high | low | close | volume | vwap |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 04:00:00-05:00 | AAPL | 124.30 | 124.56 | 124.30 | 124.50 | 12267 | 124.433365 |
| 1 | 2021-03-01 04:00:00-05:00 | MSFT | 235.87 | 236.00 | 235.87 | 236.00 | 1429 | 235.938887 |
| 2 | 2021-03-01 04:01:00-05:00 | AAPL | 124.56 | 124.60 | 124.30 | 124.30 | 9439 | 124.481323 |
| 3 | 2021-03-01 04:01:00-05:00 | MSFT | 236.17 | 236.17 | 236.17 | 236.17 | 104 | 236.161538 |
| 4 | 2021-03-01 04:02:00-05:00 | AAPL | 124.00 | 124.05 | 123.78 | 123.78 | 4834 | 123.935583 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 33859 | 2021-03-31 19:57:00-04:00 | MSFT | 237.28 | 237.28 | 237.28 | 237.28 | 507 | 237.367870 |
| 33860 | 2021-03-31 19:58:00-04:00 | AAPL | 122.36 | 122.39 | 122.33 | 122.39 | 3403 | 122.360544 |
| 33861 | 2021-03-31 19:58:00-04:00 | MSFT | 237.40 | 237.40 | 237.35 | 237.35 | 636 | 237.378066 |
| 33862 | 2021-03-31 19:59:00-04:00 | AAPL | 122.39 | 122.45 | 122.38 | 122.45 | 5560 | 122.402606 |


We can see the 1-minute data above. Neat!
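
Timeframe strings such as `'1m'` and `'1h'` (the two used in this notebook) pair an amount with a unit. As a rough illustration of how such a string maps to a duration, the sketch below converts one into a `datetime.timedelta`. The function `parse_timeframe` and the unit table are hypothetical: only the `m` and `h` units appear in this document, the other units are assumptions, and PyBroker's own parsing may accept a different set of formats.

```python
import re
from datetime import timedelta

# Hypothetical unit table; only 'm' and 'h' are used in this notebook.
_UNITS = {
    's': 'seconds',
    'm': 'minutes',
    'h': 'hours',
    'd': 'days',
    'w': 'weeks',
}


def parse_timeframe(timeframe: str) -> timedelta:
    """Convert a string like '1m' or '1h' into a timedelta."""
    match = re.fullmatch(r'(\d+)([smhdw])', timeframe.strip())
    if match is None:
        raise ValueError(f'Unrecognized timeframe: {timeframe!r}')
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_timeframe('1m'))  # 0:01:00
print(parse_timeframe('1h'))  # 1:00:00
```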

## Alpaca Crypto

[AlpacaCrypto](https://www.pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.AlpacaCrypto) can be used to fetch crypto data from Alpaca:

In [11]:
from pybroker import AlpacaCrypto

crypto = AlpacaCrypto(
    os.environ['ALPACA_API_KEY'], 
    os.environ['ALPACA_API_SECRET'], 
    exchange='CBSE'
)
df = crypto.query('BTCUSD', start_date='1/1/2021', end_date='2/1/2021', timeframe='1h')
df

Loading bar data...
Loaded bar data: 0:00:02 



| | symbol | date | open | high | low | close | volume | vwap | trade_count |
|---|---|---|---|---|---|---|---|---|---|
| 0 | BTCUSD | 2021-01-01 00:00:00-05:00 | 29287.95 | 29305.32 | 29148.60 | 29240.79 | 860.832582 | 29225.508289 | 7067 |
| 1 | BTCUSD | 2021-01-01 01:00:00-05:00 | 29246.06 | 29320.04 | 29158.93 | 29233.94 | 977.010313 | 29231.432535 | 7130 |
| 2 | BTCUSD | 2021-01-01 02:00:00-05:00 | 29233.93 | 29234.87 | 28900.00 | 29166.41 | 1226.293530 | 29084.702280 | 6890 |
| 3 | BTCUSD | 2021-01-01 03:00:00-05:00 | 29166.41 | 29246.63 | 28928.16 | 29074.87 | 850.251282 | 29096.221572 | 11113 |
| 4 | BTCUSD | 2021-01-01 04:00:00-05:00 | 29071.63 | 29386.28 | 29039.15 | 29280.43 | 517.123045 | 29241.740654 | 8740 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 740 | BTCUSD | 2021-01-31 20:00:00-05:00 | 32587.51 | 33561.57 | 32515.49 | 33450.00 | 1086.327588 | 33110.900911 | 11618 |
| 741 | BTCUSD | 2021-01-31 21:00:00-05:00 | 33450.00 | 33850.00 | 33248.80 | 33673.25 | 1002.629634 | 33597.702245 | 10695 |
| 742 | BTCUSD | 2021-01-31 22:00:00-05:00 | 33672.72 | 33780.00 | 33481.05 | 33590.17 | 636.194746 | 33630.418590 | 8520 |
| 743 | BTCUSD | 2021-01-31 23:00:00-05:00 | 33590.70 | 33958.34 | 33318.97 | 33581.76 | 726.182833 | 33671.944503 | 9552 |


The ```exchange``` argument specifies which exchange the data comes from (Coinbase in this case).

[Next up, we will take a look at how to use a DataSource to backtest a simple trading strategy](https://www.pybroker.com/en/latest/notebooks/2.%20Backtesting%20a%20Strategy.html).