# Getting Started with Data Sources

Welcome to **PyBroker**! The best place to start is with [DataSources](https://pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.DataSource). A ```DataSource``` fetches data from an external source, and that data can then be used to backtest a trading strategy.

## Yahoo Finance

**PyBroker** includes a few ```DataSources``` by default. The first is [Yahoo Finance](https://finance.yahoo.com), imported below:

In [1]:
import pybroker
from pybroker.data import YFinance

An instance of [YFinance](https://pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.YFinance) can be used to query data for multiple stock tickers:

In [2]:
yfinance = YFinance()
df = yfinance.query(['AAPL', 'MSFT'], start_date='3/1/2021', end_date='3/1/2022')
df

Downloading bar data...
[*********************100%***********************]  2 of 2 completed
Finished download: 0:00:02 



| | date | symbol | open | high | low | close | volume | adj_close |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 05:00:00 | AAPL | 123.750000 | 127.930000 | 122.790001 | 127.790001 | 116307900 | 126.462860 |
| 1 | 2021-03-01 05:00:00 | MSFT | 235.899994 | 237.470001 | 233.149994 | 236.940002 | 25324000 | 233.325287 |
| 2 | 2021-03-02 05:00:00 | AAPL | 128.410004 | 128.720001 | 125.010002 | 125.120003 | 102260900 | 123.820595 |
| 3 | 2021-03-02 05:00:00 | MSFT | 237.009995 | 237.300003 | 233.449997 | 233.869995 | 22812500 | 230.302109 |
| 4 | 2021-03-03 05:00:00 | AAPL | 124.809998 | 125.709999 | 121.839996 | 122.059998 | 112966300 | 120.792358 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 2022-02-24 05:00:00 | MSFT | 272.510010 | 295.160004 | 271.519989 | 294.589996 | 56989700 | 292.458740 |
| 502 | 2022-02-25 05:00:00 | AAPL | 163.839996 | 165.119995 | 160.869995 | 164.850006 | 91974200 | 164.107590 |
| 503 | 2022-02-25 05:00:00 | MSFT | 295.140015 | 297.630005 | 291.649994 | 297.309998 | 32546700 | 295.159058 |
| 504 | 2022-02-28 05:00:00 | AAPL | 163.059998 | 165.419998 | 162.429993 | 165.119995 | 95056600 | 164.376373 |


The data above is returned as a [Pandas](https://pandas.pydata.org/) [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).
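
Because the result is an ordinary DataFrame, the usual Pandas operations apply to it. The sketch below rebuilds a few rows from the output above by hand, rather than downloading them, and then selects one symbol and aggregates by symbol. The variable names (`aapl`, `max_close`) are illustrative only, and only a subset of the columns is reproduced.

```python
import pandas as pd

# A few rows copied from the output above, built by hand so the example
# runs without a network connection. Only some columns are included.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2021-03-01 05:00:00",
                "2021-03-01 05:00:00",
                "2021-03-02 05:00:00",
                "2021-03-02 05:00:00",
            ]
        ),
        "symbol": ["AAPL", "MSFT", "AAPL", "MSFT"],
        "close": [127.790001, 236.940002, 125.120003, 233.869995],
        "volume": [116307900, 25324000, 102260900, 22812500],
    }
)

# Keep only the AAPL bars and index them by date.
aapl = df[df["symbol"] == "AAPL"].set_index("date")

# Highest close per symbol across the sample rows.
max_close = df.groupby("symbol")["close"].max()
```

The same pattern should work on the full DataFrame returned by ```query```, since it has the same ```symbol``` and ```close``` columns.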

## Caching Data

To cache the data returned by queries, call [pybroker.enable_data_source_cache](https://pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.enable_data_source_cache):

In [3]:
pybroker.enable_data_source_cache('yfinance')

<diskcache.core.Cache at 0x7f488a9a1ba0>

The argument passed in is the name to use for the cache. Passing a different name will store and retrieve data using a different cache located on disk.

The next call to [query](https://pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.DataSource.query) will cache the returned data to disk. The data is cached for each unique combination of ticker symbol and date range:

In [4]:
yfinance.query(['AAPL', 'MSFT'], '3/1/2021', '3/1/2022')

Downloading bar data...
[*********************100%***********************]  2 of 2 completed
Finished download: 0:00:00 



| | date | symbol | open | high | low | close | volume | adj_close |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 05:00:00 | AAPL | 123.750000 | 127.930000 | 122.790001 | 127.790001 | 116307900 | 126.462860 |
| 1 | 2021-03-01 05:00:00 | MSFT | 235.899994 | 237.470001 | 233.149994 | 236.940002 | 25324000 | 233.325287 |
| 2 | 2021-03-02 05:00:00 | AAPL | 128.410004 | 128.720001 | 125.010002 | 125.120003 | 102260900 | 123.820595 |
| 3 | 2021-03-02 05:00:00 | MSFT | 237.009995 | 237.300003 | 233.449997 | 233.869995 | 22812500 | 230.302109 |
| 4 | 2021-03-03 05:00:00 | AAPL | 124.809998 | 125.709999 | 121.839996 | 122.059998 | 112966300 | 120.792351 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 2022-02-24 05:00:00 | MSFT | 272.510010 | 295.160004 | 271.519989 | 294.589996 | 56989700 | 292.458710 |
| 502 | 2022-02-25 05:00:00 | AAPL | 163.839996 | 165.119995 | 160.869995 | 164.850006 | 91974200 | 164.107590 |
| 503 | 2022-02-25 05:00:00 | MSFT | 295.140015 | 297.630005 | 291.649994 | 297.309998 | 32546700 | 295.159088 |
| 504 | 2022-02-28 05:00:00 | AAPL | 163.059998 | 165.419998 | 162.429993 | 165.119995 | 95056600 | 164.376373 |


Calling ```query``` again with the same ticker symbols and date range returns the cached data:

In [5]:
df = yfinance.query(['AAPL', 'MSFT'], '3/1/2021', '3/1/2022')
df

Loaded cached bar data.



| | date | symbol | open | high | low | close | volume | adj_close |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 05:00:00 | AAPL | 123.750000 | 127.930000 | 122.790001 | 127.790001 | 116307900 | 126.462860 |
| 1 | 2021-03-02 05:00:00 | AAPL | 128.410004 | 128.720001 | 125.010002 | 125.120003 | 102260900 | 123.820595 |
| 2 | 2021-03-03 05:00:00 | AAPL | 124.809998 | 125.709999 | 121.839996 | 122.059998 | 112966300 | 120.792351 |
| 3 | 2021-03-04 05:00:00 | AAPL | 121.750000 | 123.599998 | 118.620003 | 120.129997 | 178155000 | 118.882393 |
| 4 | 2021-03-05 05:00:00 | AAPL | 120.980003 | 121.940002 | 117.570000 | 121.419998 | 153766600 | 120.159004 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 248 | 2022-02-22 05:00:00 | MSFT | 285.000000 | 291.540009 | 284.500000 | 287.720001 | 41736100 | 285.638458 |
| 249 | 2022-02-23 05:00:00 | MSFT | 290.179993 | 291.700012 | 280.100006 | 280.269989 | 37811200 | 278.242340 |
| 250 | 2022-02-24 05:00:00 | MSFT | 272.510010 | 295.160004 | 271.519989 | 294.589996 | 56989700 | 292.458710 |
| 251 | 2022-02-25 05:00:00 | MSFT | 295.140015 | 297.630005 | 291.649994 | 297.309998 | 32546700 | 295.159088 |


The cached data can be cleared with [clear_data_source_cache](https://pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.clear_data_source_cache):

In [6]:
pybroker.clear_data_source_cache()

And [disable_data_source_cache](https://pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.disable_data_source_cache) can be used to disable caching data retrieved from ```DataSources```:

In [7]:
pybroker.disable_data_source_cache()

Both of the calls above must be made after [enable_data_source_cache](https://pybroker.com/en/latest/reference/pybroker.cache.html#pybroker.cache.enable_data_source_cache) has been called.

## Alpaca

**PyBroker** also includes an [Alpaca](https://alpaca.markets/) ```DataSource```. To begin using it, import [Alpaca](https://pybroker.com/en/latest/reference/pybroker.data.html#pybroker.data.Alpaca): 

In [8]:
from pybroker import Alpaca

And then specify your Alpaca API key and secret:

In [9]:
import os

alpaca = Alpaca(os.environ['ALPACA_API_KEY'], os.environ['ALPACA_API_SECRET'])
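
Indexing `os.environ` directly raises a bare `KeyError` when a variable is unset. A small helper, sketched here under the assumption that the credentials live in the two environment variables used above, can fail with a clearer message instead. `require_env` is a hypothetical name and is not part of PyBroker.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage (commented out so the sketch runs without credentials):
# alpaca = Alpaca(require_env('ALPACA_API_KEY'), require_env('ALPACA_API_SECRET'))
```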

An ```Alpaca``` instance is queried the same way, but it can also query the different timeframes that Alpaca supports. Let's query 1-minute data:

In [10]:
df = alpaca.query(['AAPL', 'MSFT'], start_date='3/1/2021', end_date='4/1/2021', timeframe='1m')
df

Downloading bar data...
Finished download: 0:00:06 



| | date | symbol | open | high | low | close | volume | vwap |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-01 09:00:00-05:00 | AAPL | 124.30 | 124.56 | 124.30 | 124.50 | 12267 | 124.433365 |
| 1 | 2021-03-01 09:00:00-05:00 | MSFT | 235.87 | 236.00 | 235.87 | 236.00 | 1429 | 235.938887 |
| 2 | 2021-03-01 09:01:00-05:00 | AAPL | 124.56 | 124.60 | 124.30 | 124.30 | 9439 | 124.481323 |
| 3 | 2021-03-01 09:01:00-05:00 | MSFT | 236.17 | 236.17 | 236.17 | 236.17 | 104 | 236.161538 |
| 4 | 2021-03-01 09:02:00-05:00 | AAPL | 124.00 | 124.05 | 123.78 | 123.78 | 4834 | 123.935583 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 33859 | 2021-03-31 23:57:00-04:00 | MSFT | 237.28 | 237.28 | 237.28 | 237.28 | 507 | 237.367870 |
| 33860 | 2021-03-31 23:58:00-04:00 | AAPL | 122.36 | 122.39 | 122.33 | 122.39 | 3403 | 122.360544 |
| 33861 | 2021-03-31 23:58:00-04:00 | MSFT | 237.40 | 237.40 | 237.35 | 237.35 | 636 | 237.378066 |
| 33862 | 2021-03-31 23:59:00-04:00 | AAPL | 122.39 | 122.45 | 122.38 | 122.45 | 5560 | 122.402606 |


We can see the 1-minute data above. Neat! Next up, we will take a look at [how to use a DataSource to backtest a simple trading strategy](https://pybroker.com/en/latest/notebooks/2.%20Backtesting%20a%20Strategy.html).