# Data availability tutorial

#### Sections
* [Data availability](#Data-availability)
* [Base intervals](#Base-intervals)
* [PricesIntradayUnavailableError](#PricesIntradayUnavailableError)
    * [`strict`](#strict)
* [LastIndiceInaccurateError](#LastIndiceInaccurateError)
    * [`priority`](#priority)
    * [`composite`](#composite)
        * [.pt accessor methods for composite tables](#.pt-accessor-methods-for-composite-tables)
    * [Representing period end with 'greatest possible accuracy'](#Representing-period-end-with-'greatest-possible-accuracy')
    * [Relevance of anchor](#Relevance-of-anchor)  
        * [`anchor` "open"](#anchor-"open")  
        * [`anchor` "workback"](#anchor-"workback")

#### Note

The cell **outputs** shown in this tutorial are based on executing the cells at the time shown in the output of the following cell. Simply rerun the cells to bring any dynamic output up to date.

In [2]:
import pandas as pd
from zoneinfo import ZoneInfo
now = pd.Timestamp.now(tz=ZoneInfo("UTC")).floor("T")
print(f"{now!r}")
print(f"{now.astimezone(ZoneInfo('America/New_York'))!r}")

Timestamp('2022-05-13 11:42:00+0000', tz='UTC')
Timestamp('2022-05-13 07:42:00-0400', tz='America/New_York')


## Setup

Run the following cell to import tutorial dependencies.

In [3]:
from market_prices import PricesYahoo, helpers
from market_prices.support import tutorial_helpers as th

Run the following cell to instantiate prices objects and define values used in the first part of this tutorial.

In [4]:
prices = PricesYahoo("MSFT")  # prices for US stock Microsoft
xnys = prices.calendar_default
start_T1, end_T1 = th.get_sessions_range_for_bi(prices, prices.bis.T1)
start_T2 = th.get_sessions_range_for_bi(prices, prices.bis.T2)[0]
start_T5 = th.get_sessions_range_for_bi(prices, prices.bis.T5)[0]
start_H1 = th.get_sessions_range_for_bi(prices, prices.bis.H1)[0]
session = session_T1 = start_T1
session_T2 = start_T2
session_T5 = start_T5
start_T5_oob = helpers.to_tz_naive(xnys.session_offset(start_T5, -2))
start_H1_oob = helpers.to_tz_naive(xnys.session_offset(start_H1, -2))

## Data availability

`market_prices` processes price data into useful datasets. The data itself is requested from a data provider, by default from Yahoo APIs (see [disclaimers](../../README.md) section of README.md) via [yahooquery](https://github.com/dpguthrie/yahooquery).

Data providers typically limit the availability of historic data. Play around with `market_prices` for long enough and you'll run into this limit in the form of a `PricesIntradayUnavailableError`.

This tutorial explains:
* What data a Prices class requests from the provider.
* How to query what data is available.
* Options when data is only available to partially fulfil a request: `strict`, `priority`, and `composite`.

## Base intervals

Data providers offer OHLCV data at specific intervals. `yahooquery` provides for data to be requested from Yahoo! at intervals such as 1 min, 2 mins, 5 mins, 15 mins, 1 hour, 1 day, 5 days, 1 week, 1 month etc.

`market_prices` in contrast places hardly any restrictions on the `interval` that can be requested, for example...

In [5]:
prices.get("87T", session, session)

symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-04-13 09:30:00, 2022-04-13 10:57:00)",282.049988,285.26001,281.299988,283.950012,6035620.0
"[2022-04-13 10:57:00, 2022-04-13 12:24:00)",283.940002,286.339996,283.589996,286.160004,3338831.0
"[2022-04-13 12:24:00, 2022-04-13 13:51:00)",286.190002,286.859985,285.619995,285.670013,2568882.0
"[2022-04-13 13:51:00, 2022-04-13 15:18:00)",285.660004,288.089996,285.279999,288.070007,2911567.0
"[2022-04-13 15:18:00, 2022-04-13 16:45:00)",288.070007,288.579987,287.299988,287.600006,3473701.0


The data above was obviously not requested from Yahoo! at an 87 minute interval. Rather the data was requested at a 'base interval' that's a factor of the requested interval and this base data (in this case 1 minute data) was then downsampled to the requested interval.
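The downsampling step can be pictured with plain pandas. The following is a minimal sketch, not the `market_prices` implementation: it builds made-up 1 minute bars for a single session (all prices and volumes are synthetic) and resamples them to 87 minutes, anchored on the first bar.

```python
import numpy as np
import pandas as pd

# Synthetic 1 minute bars for one 09:30-16:00 session (390 minutes).
# All values are made up for illustration only.
index = pd.date_range("2022-04-13 09:30", periods=390, freq="min")
rng = np.random.default_rng(0)
close = 282 + rng.normal(0, 0.1, size=390).cumsum()
open_ = np.r_[282.0, close[:-1]]  # each bar opens at the prior close
base = pd.DataFrame(
    {
        "open": open_,
        "high": np.maximum(open_, close),
        "low": np.minimum(open_, close),
        "close": close,
        "volume": rng.integers(1_000, 50_000, size=390),
    },
    index=index,
)

# Downsample to 87 minutes, with bins anchored on the first timestamp.
downsampled = base.resample("87min", origin="start").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
)
print(downsampled.index)
```

As in the table above, the session is covered by five indices starting at 09:30, 10:57, 12:24, 13:51 and 15:18.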

The 'base intervals' at which a Prices class requests data from the provider are contained in a `BaseInterval` enum.

In [6]:
PricesYahoo.BaseInterval.__members__

mappingproxy({'T1': <BaseInterval.T1: Timedelta('0 days 00:01:00')>,
              'T2': <BaseInterval.T2: Timedelta('0 days 00:02:00')>,
              'T5': <BaseInterval.T5: Timedelta('0 days 00:05:00')>,
              'H1': <BaseInterval.H1: Timedelta('0 days 01:00:00')>,
              'D1': <BaseInterval.D1: Timedelta('1 days 00:00:00')>})

This shows that the PricesYahoo class only requests data from Yahoo! at four intraday intervals (1 min, 2 mins, 5 mins and 1 hour) and the daily interval. Requests for any other interval are evaluated by downsampling data at one of these base intervals.

Why request prices at only 5 different intervals when `yahooquery` provides for requesting data at so many more? ...

Data providers typically limit the period over which prices can be requested, with a shorter period available for smaller intervals. For example, the Yahoo! API offers up to the last 60 days of data at a 5 minute interval although only the last 30 days at a 1 minute interval. The limits for the base intervals are stored in the class attribute `BASE_LIMITS`. This holds a dictionary with keys as base intervals and values as the corresponding period over which prices are available.

In [7]:
PricesYahoo.BASE_LIMITS

{<BaseInterval.T1: Timedelta('0 days 00:01:00')>: Timedelta('30 days 00:00:00'),
 <BaseInterval.T2: Timedelta('0 days 00:02:00')>: Timedelta('43 days 00:00:00'),
 <BaseInterval.T5: Timedelta('0 days 00:05:00')>: Timedelta('60 days 00:00:00'),
 <BaseInterval.H1: Timedelta('0 days 01:00:00')>: Timedelta('730 days 00:00:00'),
 <BaseInterval.D1: Timedelta('1 days 00:00:00')>: None}

Price data can be requested via `yahooquery` at a 15 minute interval although at this interval only 60 days of price data are available, the same as for the 5 minute interval. Requesting data at 15 minutes would therefore add nothing that cannot be evaluated from data requested at 5 minutes.

**Base intervals represent the fewest intervals that collectively allow for all available data to be requested from the data provider.**

A Prices class never requests the same data twice from the source (rather subsequent requests are served from local tables that are built up with each request to the data provider). Limiting the number of base intervals has the benefit of minimising the amount of data that need be stored.

That's as far as this tutorial will delve into 'how' price data is requested and served. If you're interested in what's going on under-the-bonnet, further explanation can be found in the 'Serving Price Data' section of `prices.base.PricesBase.__doc__`.

Before moving on, the instance property `limits` is worth a mention. This returns a dictionary with keys as base intervals and values as 2-tuples that describe the left and right limits of the period over which prices are currently available.

In [8]:
prices.limits

{<BaseInterval.T1: Timedelta('0 days 00:01:00')>: (Timestamp('2022-04-13 11:44:00+0000', tz='UTC'),
  Timestamp('2022-05-13 11:43:00+0000', tz='UTC')),
 <BaseInterval.T2: Timedelta('0 days 00:02:00')>: (Timestamp('2022-03-31 11:44:00+0000', tz='UTC'),
  Timestamp('2022-05-13 11:44:00+0000', tz='UTC')),
 <BaseInterval.T5: Timedelta('0 days 00:05:00')>: (Timestamp('2022-03-14 11:44:00+0000', tz='UTC'),
  Timestamp('2022-05-13 11:47:00+0000', tz='UTC')),
 <BaseInterval.H1: Timedelta('0 days 01:00:00')>: (Timestamp('2020-05-13 11:44:00+0000', tz='UTC'),
  Timestamp('2022-05-13 12:42:00+0000', tz='UTC')),
 <BaseInterval.D1: Timedelta('1 days 00:00:00')>: (Timestamp('1986-03-13 00:00:00'),
  Timestamp('2022-05-13 00:00:00'))}

## `PricesIntradayUnavailableError`

Look what happens if 4 minute data is requested for a session for which base data shorter than 5 minutes is not available.

In [9]:
session_T5  # for reference

Timestamp('2022-03-14 00:00:00', freq='C')

In [None]:
prices.get("4T", session_T5, session_T5)

```
---------------------------------------------------------------------------
PricesIntradayUnavailableError            Traceback (most recent call last)
<ipython-input-10-a1c31ac9ba37> in <module>
----> 1 prices.get("4T", session_T5, session_T5)

PricesIntradayUnavailableError: Data is unavailable at a sufficiently low base interval to evaluate prices at interval 0 days 00:04:00 anchored 'Anchor.OPEN'.
Base intervals that are a factor of 0 days 00:04:00:
	[<BaseInterval.T1: Timedelta('0 days 00:01:00')>, <BaseInterval.T2: Timedelta('0 days 00:02:00')>].
The earliest minute from which data is available at 0 days 00:02:00 is 2022-03-31 13:30:00+00:00, although at this base interval the requested period evaluates to (Timestamp('2022-03-14 13:30:00+0000', tz='UTC'), Timestamp('2022-03-14 20:02:00+0000', tz='UTC')).
Period evaluated from parameters: {'minutes': 0, 'hours': 0, 'days': 0, 'weeks': 0, 'months': 0, 'years': 0, 'start': Timestamp('2022-03-14 00:00:00', freq='C'), 'end': Timestamp('2022-03-14 00:00:00', freq='C'), 'add_a_row': False}.
```

A `PricesIntradayUnavailableError` is raised. The message explains that data is unavailable at a sufficiently low base interval to evaluate prices for the requested period. The base intervals from which the requested interval could be evaluated are listed, in this case T1 and T2, together with the earliest date that data at one of these base intervals is available. This availability date is then compared with the period start date.
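The reasoning behind the error message can be reproduced with a few lines of pandas. This is only a sketch of the logic described above, not the library's code: `factors_of` and `viable` are hypothetical helpers, and the left limits are copied from the `prices.limits` output shown earlier.

```python
import pandas as pd

# Left limits of intraday base intervals, copied from the `prices.limits` output.
left_limits = {
    pd.Timedelta(minutes=1): pd.Timestamp("2022-04-13 11:44", tz="UTC"),
    pd.Timedelta(minutes=2): pd.Timestamp("2022-03-31 11:44", tz="UTC"),
    pd.Timedelta(minutes=5): pd.Timestamp("2022-03-14 11:44", tz="UTC"),
    pd.Timedelta(hours=1): pd.Timestamp("2020-05-13 11:44", tz="UTC"),
}


def factors_of(interval):
    """Base intervals that divide exactly into `interval` (hypothetical helper)."""
    return [bi for bi in left_limits if interval % bi == pd.Timedelta(0)]


def viable(interval, period_start):
    """Factor base intervals with data available from `period_start`."""
    return [bi for bi in factors_of(interval) if left_limits[bi] <= period_start]


start = pd.Timestamp("2022-03-14 13:30", tz="UTC")  # open of session_T5
print(factors_of(pd.Timedelta(minutes=4)))     # 1 and 2 minute base intervals
print(viable(pd.Timedelta(minutes=4), start))  # empty: no viable base data
print(viable(pd.Timedelta(minutes=5), start))  # 5 minute base interval
```

An empty result from `viable` corresponds to the circumstances in which the error above is raised.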

The options are pretty simple: either go with a higher interval that can be served from a base interval available over the required period...

In [11]:
prices.get("5T", session_T5, session_T5)[:2]  # only show first two lines

symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-03-14 09:30:00, 2022-03-14 09:35:00)",280.25,282.359985,280.01001,281.869995,1721154
"[2022-03-14 09:35:00, 2022-03-14 09:40:00)",281.890015,282.850006,281.040009,281.880005,720010


Or if you really had your mind set on that particular interval, change the period start to fall no earlier than the limit from when data at a viable base interval is available...

In [12]:
# for reference
session_T2

Timestamp('2022-03-31 00:00:00', freq='C')

In [13]:
prices.get("4T", session_T2, session_T2)[:2]  # only show first two lines

symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-03-31 09:30:00, 2022-03-31 09:34:00)",313.899994,315.140015,312.769989,312.869995,989434.0
"[2022-03-31 09:34:00, 2022-03-31 09:38:00)",312.859985,313.0,311.299988,312.321991,548154.0


In the above example prices were unavailable as a result of viable base data being unavailable over the full requested period. A more common issue is where data is unavailable over only the earlier part of the period. For example, consider a period to 'now' and starting from a session for which data is only available at base intervals of 5T or longer.

In [14]:
# for reference
session_T5

Timestamp('2022-03-14 00:00:00', freq='C')

In [None]:
prices.get("4T", session_T5)

```
---------------------------------------------------------------------------
PricesIntradayUnavailableError            Traceback (most recent call last)
<ipython-input-15-935d6ce1dd8f> in <module>
----> 1 prices.get("4T", session_T5)

PricesIntradayUnavailableError: Data is unavailable at a sufficiently low base interval to evaluate prices at interval 0 days 00:04:00 anchored 'Anchor.OPEN'.
Base intervals that are a factor of 0 days 00:04:00:
	[<BaseInterval.T1: Timedelta('0 days 00:01:00')>, <BaseInterval.T2: Timedelta('0 days 00:02:00')>].
The earliest minute from which data is available at 0 days 00:02:00 is 2022-03-31 13:30:00+00:00, although at this base interval the requested period evaluates to (Timestamp('2022-03-14 13:30:00+0000', tz='UTC'), Timestamp('2022-05-12 20:02:00+0000', tz='UTC')).
Period evaluated from parameters: {'minutes': 0, 'hours': 0, 'days': 0, 'weeks': 0, 'months': 0, 'years': 0, 'start': Timestamp('2022-03-14 00:00:00', freq='C'), 'end': None, 'add_a_row': False}.
Data is available from 2022-03-31 13:30:00+00:00 through to the end of the requested period. Consider passing `strict` as False to return prices for this part of the period.
```

Again a `PricesIntradayUnavailableError` is raised as data is not available at a sufficiently low base interval to evaluate prices over the full period. However, as advised at the end of the error message, when data is available over the later part of the period there's another option.

### `strict`

`strict` determines what to do in the event that data is only available over the later part of the requested period. When data is not available over the full period the default behaviour (`strict`=True) is to raise an error (as above). Alternatively, passing `strict` as **False** will return prices only for the part of the period over which data is available.

In [16]:
prices.get("4T", session_T5, strict=False)

symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-03-31 09:30:00, 2022-03-31 09:34:00)",313.899994,315.140015,312.769989,312.869995,989434.0
"[2022-03-31 09:34:00, 2022-03-31 09:38:00)",312.859985,313.000000,311.299988,312.321991,548154.0
"[2022-03-31 09:38:00, 2022-03-31 09:42:00)",312.325012,312.370514,311.450012,311.529999,333904.0
"[2022-03-31 09:42:00, 2022-03-31 09:46:00)",311.500000,312.515015,311.350006,312.029999,264671.0
"[2022-03-31 09:46:00, 2022-03-31 09:50:00)",312.079987,312.100006,311.290009,311.940002,305562.0
...,...,...,...,...,...
"[2022-05-12 15:42:00, 2022-05-12 15:46:00)",253.600006,254.410004,252.589996,253.119995,585341.0
"[2022-05-12 15:46:00, 2022-05-12 15:50:00)",253.100006,253.699997,252.610001,252.880005,544063.0
"[2022-05-12 15:50:00, 2022-05-12 15:54:00)",253.050003,254.660004,252.970001,254.029999,950715.0
"[2022-05-12 15:54:00, 2022-05-12 15:58:00)",254.029999,255.119995,253.259995,255.085007,1227302.0


The table starts notably later than the requested period start, although it includes all data that can be provided at a 4T interval.
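The effect of `strict` can be summarised as a small decision on the period start. The function below is a hypothetical illustration of that decision (`resolve_start` is not part of `market_prices`, and a plain `ValueError` stands in for `PricesIntradayUnavailableError`).

```python
import pandas as pd


def resolve_start(requested_start, earliest_available, strict=True):
    """Return the start from which prices would be served (illustrative only)."""
    if requested_start >= earliest_available:
        return requested_start  # data covers the full requested period
    if strict:
        # Stand-in for the library raising PricesIntradayUnavailableError.
        raise ValueError("data unavailable over the full requested period")
    return earliest_available  # serve only the later part of the period


requested = pd.Timestamp("2022-03-14 13:30", tz="UTC")  # open of session_T5
earliest = pd.Timestamp("2022-03-31 13:30", tz="UTC")   # first minute of T2 data
trimmed_start = resolve_start(requested, earliest, strict=False)
print(trimmed_start)
```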

## `LastIndiceInaccurateError`

A separate consideration to the period over which data is available is whether data is available at a sufficiently low base interval to accurately represent the requested period end. (`market_prices` gives importance to the period end to allow for price changes to be reliably calculated to a specific time.)

Consider a period end that can only be represented by T1 or T5 data...

In [17]:
end = xnys.session_close(end_T1) - pd.Timedelta(15, "T")
end = end.astimezone(prices.tz_default)
end

Timestamp('2022-05-12 15:45:00-0400', tz='America/New_York')

If the requested period starts from a session for which T5 data is available then all's well and good.

In [18]:
print(f"{start_T5=}\n")  # for reference

prices.get(start=start_T5, end=end)

start_T5=Timestamp('2022-03-14 00:00:00', freq='C')



symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-03-14 09:30:00, 2022-03-14 09:35:00)",280.250000,282.359985,280.010010,281.869995,1721154.0
"[2022-03-14 09:35:00, 2022-03-14 09:40:00)",281.890015,282.850006,281.040009,281.880005,720010.0
"[2022-03-14 09:40:00, 2022-03-14 09:45:00)",281.829987,284.769989,281.369995,284.290009,808963.0
"[2022-03-14 09:45:00, 2022-03-14 09:50:00)",284.269989,284.329987,282.480011,283.109985,593177.0
"[2022-03-14 09:50:00, 2022-03-14 09:55:00)",283.019989,283.699005,282.470001,283.109985,367264.0
...,...,...,...,...,...
"[2022-05-12 15:20:00, 2022-05-12 15:25:00)",252.850006,254.199997,252.647095,253.550003,678724.0
"[2022-05-12 15:25:00, 2022-05-12 15:30:00)",253.570007,254.175003,252.360001,252.470001,777827.0
"[2022-05-12 15:30:00, 2022-05-12 15:35:00)",252.460007,254.050003,252.410004,253.250000,487619.0
"[2022-05-12 15:35:00, 2022-05-12 15:40:00)",253.220001,253.899994,252.699997,253.750000,584798.0


The full requested period is available and the right of the final indice aligns with the requested period end.

However, if the period is extended to start from a session for which 5T data is not available...

In [19]:
start_T5_oob  # T5 out-of-bounds session, for reference

Timestamp('2022-03-10 00:00:00', freq='C')

In [None]:
prices.get(start=start_T5_oob, end=end)

```
---------------------------------------------------------------------------
LastIndiceInaccurateError                 Traceback (most recent call last)
<ipython-input-20-175366e416c2> in <module>
----> 1 prices.get(start=start_T5_oob, end=end)

LastIndiceInaccurateError: Full period available at the following intraday base intervals although these do not allow for representing the end indice with the greatest possible accuracy:
	[<BaseInterval.H1: Timedelta('0 days 01:00:00')>].
The following base intervals could represent the end indice with the greatest possible accuracy although have insufficient data available to cover the full period:
	[<BaseInterval.T1: Timedelta('0 days 00:01:00')>, <BaseInterval.T5: Timedelta('0 days 00:05:00')>].
The earliest minute from which data is available at 0 days 00:05:00 is 2022-03-14 13:30:00+00:00, although at this base interval the requested period evaluates to (Timestamp('2022-03-10 14:30:00+0000', tz='UTC'), Timestamp('2022-05-12 19:45:00+0000', tz='UTC')).
Period evaluated from parameters: {'minutes': 0, 'hours': 0, 'days': 0, 'weeks': 0, 'months': 0, 'years': 0, 'start': Timestamp('2022-03-10 00:00:00', freq='C'), 'end': Timestamp('2022-05-12 19:45:00+0000', tz='UTC'), 'add_a_row': False}.
Data that can express the period end with the greatest possible accuracy is available from 2022-03-14 13:30:00+00:00. Pass `strict` as False to return prices for this part of the period.
Alternatively, consider creating a composite table (pass `composite` as True) or passing `priority` as 'period'.
```

A `LastIndiceInaccurateError` is raised. The error message explains that although intraday data is available over the full period, it's not available at a sufficiently low base interval to accurately represent the period end. Base intervals are listed for which data is available over the full period and, separately, those that could accurately align with the requested period end.

In short, prices could be returned EITHER over the whole period OR that align with the period end, but not both.

As suggested towards the end of the error message, passing `strict` as False will return prices for the part of the period over which data is available that can express the period end with the greatest possible accuracy.

In [21]:
print(f"{start_T5_oob=}\n{end=}\n")    # for reference

prices.get(start=start_T5_oob, end=end, strict=False)

start_T5_oob=Timestamp('2022-03-10 00:00:00', freq='C')
end=Timestamp('2022-05-12 15:45:00-0400', tz='America/New_York')



symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-03-14 09:30:00, 2022-03-14 09:35:00)",280.250000,282.359985,280.010010,281.869995,1721154.0
"[2022-03-14 09:35:00, 2022-03-14 09:40:00)",281.890015,282.850006,281.040009,281.880005,720010.0
"[2022-03-14 09:40:00, 2022-03-14 09:45:00)",281.829987,284.769989,281.369995,284.290009,808963.0
"[2022-03-14 09:45:00, 2022-03-14 09:50:00)",284.269989,284.329987,282.480011,283.109985,593177.0
"[2022-03-14 09:50:00, 2022-03-14 09:55:00)",283.019989,283.699005,282.470001,283.109985,367264.0
...,...,...,...,...,...
"[2022-05-12 15:20:00, 2022-05-12 15:25:00)",252.850006,254.199997,252.647095,253.550003,678724.0
"[2022-05-12 15:25:00, 2022-05-12 15:30:00)",253.570007,254.175003,252.360001,252.470001,777827.0
"[2022-05-12 15:30:00, 2022-05-12 15:35:00)",252.460007,254.050003,252.410004,253.250000,487619.0
"[2022-05-12 15:35:00, 2022-05-12 15:40:00)",253.220001,253.899994,252.699997,253.750000,584798.0


Notice that the data starts later than the requested period start.

If the start of the data is important then, as suggested at the end of the error message, there are a couple of options that can help out, `priority` and `composite`.

### `priority`
The `priority` option determines what should be prioritised if it's only possible to return prices EITHER for the full requested period ("period") OR with a final indice that represents the period end with the greatest possible accuracy ("end").

By default the priority is "end". However, by default prices will not be returned for a lesser period than that requested in order to best-align the final indice with the period end. Rather, as in the above example, it's necessary to also pass `strict` as False to explicitly accept that prices can be returned for only part of the requested period.

If the accuracy of the period end is not of concern then `priority` can be passed as "period" to ask that prices reflect the full requested period, even if that comes at the expense of the final indice not best reflecting the requested period end.

In [22]:
print(f"{start_T5_oob=}\n{end=}\n")    # for reference

prices.get(start=start_T5_oob, end=end, priority="period")

start_T5_oob=Timestamp('2022-03-10 00:00:00', freq='C')
end=Timestamp('2022-05-12 15:45:00-0400', tz='America/New_York')



symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-03-10 09:30:00, 2022-03-10 10:30:00)",283.019989,285.089996,282.360107,283.450012,7524189.0
"[2022-03-10 10:30:00, 2022-03-10 11:30:00)",283.420013,283.950012,280.799988,281.119995,3581266.0
"[2022-03-10 11:30:00, 2022-03-10 12:30:00)",281.140015,283.000000,280.579987,282.679993,3235163.0
"[2022-03-10 12:30:00, 2022-03-10 13:30:00)",282.670013,283.915009,281.960114,283.799988,2395787.0
"[2022-03-10 13:30:00, 2022-03-10 14:30:00)",283.799988,286.579987,283.399994,285.674988,3052864.0
...,...,...,...,...,...
"[2022-05-12 10:30:00, 2022-05-12 11:30:00)",257.170013,259.880005,254.220001,259.010010,6693295.0
"[2022-05-12 11:30:00, 2022-05-12 12:30:00)",259.040009,259.320007,253.777893,254.830002,4884195.0
"[2022-05-12 12:30:00, 2022-05-12 13:30:00)",254.839996,256.410004,252.389999,253.100006,4579244.0
"[2022-05-12 13:30:00, 2022-05-12 14:30:00)",253.097900,253.550003,250.220001,250.975006,5689406.0


With `priority` as "period" the prices are now returned at a one hour interval. At this interval prices are available over the full requested period, although the final indice cannot align with the requested period end; rather, it falls short of it.
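The trade-off that `priority` resolves can be sketched by splitting the base intervals into those that cover the full period and those that align with the period end. The code below is a simplified illustration, not the library's selection logic: `pick_base_interval` is a hypothetical helper, the left limits are copied from the `prices.limits` output, and the period bounds are those reported in the error message above.

```python
import pandas as pd

# Left limits of intraday base intervals, copied from the `prices.limits` output.
left_limits = {
    pd.Timedelta(minutes=1): pd.Timestamp("2022-04-13 11:44", tz="UTC"),
    pd.Timedelta(minutes=2): pd.Timestamp("2022-03-31 11:44", tz="UTC"),
    pd.Timedelta(minutes=5): pd.Timestamp("2022-03-14 11:44", tz="UTC"),
    pd.Timedelta(hours=1): pd.Timestamp("2020-05-13 11:44", tz="UTC"),
}

start = pd.Timestamp("2022-03-10 14:30", tz="UTC")           # period start
end = pd.Timestamp("2022-05-12 19:45", tz="UTC")             # period end
end_session_open = pd.Timestamp("2022-05-12 13:30", tz="UTC")

# Base intervals with data over the full period.
covers_period = [bi for bi, left in left_limits.items() if left <= start]
# Base intervals with an indice that closes exactly on the period end.
aligns_with_end = [
    bi for bi in left_limits if (end - end_session_open) % bi == pd.Timedelta(0)
]


def pick_base_interval(priority):
    """Smallest base interval satisfying `priority`, else None (illustrative)."""
    if priority == "period":
        return min(covers_period) if covers_period else None
    both = [bi for bi in aligns_with_end if bi in covers_period]
    return min(both) if both else None


print(pick_base_interval("period"))  # hourly data covers the full period
print(pick_base_interval("end"))     # None: nothing covers the period AND aligns
```

The two lists match those reported in the `LastIndiceInaccurateError` message: only H1 covers the full period, while only T1 and T5 can align with the period end.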

### `composite`

If both the length of the period and the accuracy of the period end are important, then a composite table offers the best of both worlds, albeit at the expense of the table not having a regular interval.

To accept a composite table, just pass `composite` as True. If the request can be fulfilled from a single base interval, it will be, although if the base interval for which data is available over the full requested period differs from the base interval that can express the period end with the greatest possible accuracy then a composite table will be returned...

In [23]:
print(f"{start_T5_oob=}\n{end=}\n")    # for reference

df = prices.get(start=start_T5_oob, end=end, composite=True)
df

start_T5_oob=Timestamp('2022-03-10 00:00:00', freq='C')
end=Timestamp('2022-05-12 15:45:00-0400', tz='America/New_York')



symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-03-10 09:30:00, 2022-03-10 10:30:00)",283.019989,285.089996,282.360107,283.450012,7524189.0
"[2022-03-10 10:30:00, 2022-03-10 11:30:00)",283.420013,283.950012,280.799988,281.119995,3581266.0
"[2022-03-10 11:30:00, 2022-03-10 12:30:00)",281.140015,283.000000,280.579987,282.679993,3235163.0
"[2022-03-10 12:30:00, 2022-03-10 13:30:00)",282.670013,283.915009,281.960114,283.799988,2395787.0
"[2022-03-10 13:30:00, 2022-03-10 14:30:00)",283.799988,286.579987,283.399994,285.674988,3052864.0
...,...,...,...,...,...
"[2022-05-12 15:20:00, 2022-05-12 15:25:00)",252.850006,254.199997,252.647095,253.550003,678724.0
"[2022-05-12 15:25:00, 2022-05-12 15:30:00)",253.570007,254.175003,252.360001,252.470001,777827.0
"[2022-05-12 15:30:00, 2022-05-12 15:35:00)",252.460007,254.050003,252.410004,253.250000,487619.0
"[2022-05-12 15:35:00, 2022-05-12 15:40:00)",253.220001,253.899994,252.699997,253.750000,584798.0


A composite table has two different intervals. It combines data at a higher interval which can serve the full period with data of a lower interval that can most accurately align with the period end.

The example above combines hourly data that can cover the full period with 5 minute data which can accurately express the period end.
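As a rough picture of what the index of such a table looks like, the sketch below joins hourly and 5 minute intervals for a single session using plain pandas. It only illustrates the shape of a composite index under assumed session times. It does not use `market_prices` and says nothing about how the library builds the table.

```python
import pandas as pd

day = "2022-05-12"

# Hourly indices from the session open through to 14:30.
hourly = pd.interval_range(
    pd.Timestamp(f"{day} 09:30"),
    pd.Timestamp(f"{day} 14:30"),
    freq=pd.Timedelta(hours=1),
    closed="left",
)
# 5 minute indices from 14:30 through to the requested period end of 15:45.
five_min = pd.interval_range(
    pd.Timestamp(f"{day} 14:30"),
    pd.Timestamp(f"{day} 15:45"),
    freq=pd.Timedelta(minutes=5),
    closed="left",
)

composite_index = hourly.append(five_min)

# Count the indices by length.
lengths = pd.Series(composite_index.length).value_counts()
print(lengths)
print(composite_index[-1])
```

The final indice closes on the period end, while the hourly part keeps the table compact over the earlier part of the period.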

In [24]:
df.pt.indices_length

0 days 01:00:00    313
0 days 00:05:00     15
dtype: int64

The following cell shows the rows of the table over which the interval changes.

In [25]:
i = df.pt.indices_length[pd.Timedelta(1, "H")]
df[i-2:i+2]

symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-05-12 12:30:00, 2022-05-12 13:30:00)",254.839996,256.410004,252.389999,253.100006,4579244.0
"[2022-05-12 13:30:00, 2022-05-12 14:30:00)",253.0979,253.550003,250.220001,250.975006,5689406.0
"[2022-05-12 14:30:00, 2022-05-12 14:35:00)",250.949997,250.990005,250.110001,250.630005,440496.0
"[2022-05-12 14:35:00, 2022-05-12 14:40:00)",250.619995,250.626099,250.029999,250.207901,349733.0


If intraday data is not available over the start of a period then a composite table will happily combine daily and intraday data.

In [26]:
print(f"{start_H1_oob=}\n{end=}\n")    # for reference

df_comp = prices.get(start=start_H1_oob, end=end, composite=True)
df_comp

start_H1_oob=Timestamp('2020-05-11 00:00:00', freq='C')
end=Timestamp('2022-05-12 15:45:00-0400', tz='America/New_York')



symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2020-05-11, 2020-05-11)",183.149994,187.509995,182.850006,186.740005,30892700.0
"[2020-05-12, 2020-05-12)",186.800003,187.039993,182.300003,182.509995,32038200.0
"[2020-05-13, 2020-05-13)",182.550003,184.050003,176.539993,179.750000,44711500.0
"[2020-05-14, 2020-05-14)",177.539993,180.690002,175.679993,180.529999,41873900.0
"[2020-05-15, 2020-05-15)",179.059998,187.059998,177.000000,183.160004,46610400.0
...,...,...,...,...,...
"[2022-05-12 19:20:00, 2022-05-12 19:25:00)",252.850006,254.199997,252.647095,253.550003,678724.0
"[2022-05-12 19:25:00, 2022-05-12 19:30:00)",253.570007,254.175003,252.360001,252.470001,777827.0
"[2022-05-12 19:30:00, 2022-05-12 19:35:00)",252.460007,254.050003,252.410004,253.250000,487619.0
"[2022-05-12 19:35:00, 2022-05-12 19:40:00)",253.220001,253.899994,252.699997,253.750000,584798.0


Note that a Daily/Intraday composite table is always returned with timezone as UTC.

#### **.pt accessor** methods for composite tables

The .pt accessor has a couple of useful methods to get the respective parts of composite tables comprising daily and intraday data.

In [27]:
df_comp.pt.daily_part

symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
2020-05-11,183.149994,187.509995,182.850006,186.740005,30892700.0
2020-05-12,186.800003,187.039993,182.300003,182.509995,32038200.0
2020-05-13,182.550003,184.050003,176.539993,179.750000,44711500.0
2020-05-14,177.539993,180.690002,175.679993,180.529999,41873900.0
2020-05-15,179.059998,187.059998,177.000000,183.160004,46610400.0
...,...,...,...,...,...
2022-05-05,285.540009,286.350006,274.339996,277.350006,43260400.0
2022-05-06,274.809998,279.250000,271.269989,274.730011,37748300.0
2022-05-09,270.059998,272.359985,263.320007,264.579987,47726000.0
2022-05-10,271.690002,273.750000,265.070007,269.500000,39336400.0


In [28]:
df_comp.pt.intraday_part

symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-05-12 13:30:00, 2022-05-12 13:35:00)",257.690002,258.510010,254.481003,254.500000,3177997.0
"[2022-05-12 13:35:00, 2022-05-12 13:40:00)",254.519897,255.559998,253.009995,253.080002,1042469.0
"[2022-05-12 13:40:00, 2022-05-12 13:45:00)",253.050003,256.019989,252.979996,255.649994,1144518.0
"[2022-05-12 13:45:00, 2022-05-12 13:50:00)",255.639999,257.859985,255.240005,256.000000,739530.0
"[2022-05-12 13:50:00, 2022-05-12 13:55:00)",256.000000,256.649994,255.300003,256.154999,723985.0
...,...,...,...,...,...
"[2022-05-12 19:20:00, 2022-05-12 19:25:00)",252.850006,254.199997,252.647095,253.550003,678724.0
"[2022-05-12 19:25:00, 2022-05-12 19:30:00)",253.570007,254.175003,252.360001,252.470001,777827.0
"[2022-05-12 19:30:00, 2022-05-12 19:35:00)",252.460007,254.050003,252.410004,253.250000,487619.0
"[2022-05-12 19:35:00, 2022-05-12 19:40:00)",253.220001,253.899994,252.699997,253.750000,584798.0


### Representing period end with **'greatest possible accuracy'**

There have been various references to the final indice expressing the period end to the 'greatest possible accuracy'. It's worth clarifying that the 'greatest possible accuracy' refers to the greatest accuracy with which the *available data* can express a requested period end, NOT absolute accuracy.

Consider what happens when prices are requested for a period ending on a time that can only be expressed by T1 data, although over which data is not available at any interval shorter than T5.

In [29]:
end_alt = xnys.session_close(start_T5) - pd.Timedelta(3, "T")
end_alt = end_alt.astimezone(prices.tz_default)
end_alt

Timestamp('2022-03-14 15:57:00-0400', tz='America/New_York')

In [30]:
prices.get(start=start_T5, end=end_alt)

symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-03-14 09:30:00, 2022-03-14 09:35:00)",280.250000,282.359985,280.010010,281.869995,1721154.0
"[2022-03-14 09:35:00, 2022-03-14 09:40:00)",281.890015,282.850006,281.040009,281.880005,720010.0
"[2022-03-14 09:40:00, 2022-03-14 09:45:00)",281.829987,284.769989,281.369995,284.290009,808963.0
"[2022-03-14 09:45:00, 2022-03-14 09:50:00)",284.269989,284.329987,282.480011,283.109985,593177.0
"[2022-03-14 09:50:00, 2022-03-14 09:55:00)",283.019989,283.699005,282.470001,283.109985,367264.0
...,...,...,...,...,...
"[2022-03-14 15:30:00, 2022-03-14 15:35:00)",277.940002,277.959991,276.839996,277.100006,381329.0
"[2022-03-14 15:35:00, 2022-03-14 15:40:00)",277.119995,277.269989,276.579987,277.130005,418167.0
"[2022-03-14 15:40:00, 2022-03-14 15:45:00)",277.109985,277.619995,276.940002,277.619995,425234.0
"[2022-03-14 15:45:00, 2022-03-14 15:50:00)",277.630005,278.079987,276.959991,277.019989,581069.0


The right side of the final indice does not align with the requested period end; rather, it's 2 minutes earlier. Even so, a `LastIndiceInaccurateError` was not raised, despite the `priority` being the default "end".

An error was not raised because, although the final indice does not exactly represent the requested period end, it does reflect the period end with the 'greatest possible accuracy' given the data that's available (T1 data is not available).
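The idea of 'greatest possible accuracy' can be expressed as simple arithmetic: for each base interval, find the latest indice close that falls on or before the requested period end, then take the closest of those for which data is available. The sketch below assumes indices anchored on a 09:30 session open. `latest_right` is a hypothetical helper, not a `market_prices` function.

```python
import pandas as pd

session_open = pd.Timestamp("2022-03-14 09:30")
period_end = pd.Timestamp("2022-03-14 15:57")


def latest_right(base_interval):
    """Latest indice close on or before `period_end`, anchored on the open."""
    n_complete = (period_end - session_open) // base_interval
    return session_open + n_complete * base_interval


t1, t5, h1 = pd.Timedelta(minutes=1), pd.Timedelta(minutes=5), pd.Timedelta(hours=1)

# T1 could express the end exactly, but T1 data is not available for this session.
available = [t5, h1]
best = max(latest_right(bi) for bi in available)
print(latest_right(t1), best, period_end - best)
```

With only T5 and H1 data available, the closest the table can get to 15:57 is 15:55, which is why no error is raised.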

### Relevance of anchor
#### `anchor` "open"

None of the above examples request a specific interval and all are anchored on the default "open".

If the anchor is "open" and an interval is defined then a `LastIndiceInaccurateError` will never be raised, regardless of how well aligned the final indice is with the period end (`priority` is irrelevant). The indices will be anchored on the open and evaluated according to the interval. How well the final indice aligns with the period end will be determined by the interval and period end requested.

In the following example 5T is the smallest base interval for which data is available over the requested period.

In [31]:
print(f"{start_T5=}\n{end_alt=}\n")  # for reference

prices.get("10T", start=start_T5, end=end_alt)[-2:]  # only show last 2 rows

start_T5=Timestamp('2022-03-14 00:00:00', freq='C')
end_alt=Timestamp('2022-03-14 15:57:00-0400', tz='America/New_York')



symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-03-14 15:30:00, 2022-03-14 15:40:00)",277.940002,277.959991,276.579987,277.130005,799496.0
"[2022-03-14 15:40:00, 2022-03-14 15:50:00)",277.109985,278.079987,276.940002,277.019989,1006303.0


The indices are evaluated at 10 minute intervals from the session open such that the final indice falls 7 minutes short of the requested period end.
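
The evaluation described above can be illustrated with a minimal sketch that uses only the standard library. It is not how `market_prices` is implemented: `open_anchored_indices` is a hypothetical helper, and the 09:30 session open is taken from the tables in this tutorial.

```python
from datetime import datetime, timedelta


def open_anchored_indices(session_open, period_end, interval):
    """Return (left, right) pairs stepped from the open, none ending after period_end."""
    indices = []
    left = session_open
    while left + interval <= period_end:
        indices.append((left, left + interval))
        left += interval
    return indices


session_open = datetime(2022, 3, 14, 9, 30)  # session open, local time
period_end = datetime(2022, 3, 14, 15, 57)  # same time as end_alt
indices = open_anchored_indices(session_open, period_end, timedelta(minutes=10))

last_left, last_right = indices[-1]
shortfall = period_end - last_right
print(last_left.time(), last_right.time(), shortfall)  # 15:40:00 15:50:00 0:07:00
```

Because the indices are stepped from the open, the shortfall depends only on the interval and the requested period end, not on which base interval the data comes from.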

#### `anchor` "workback"

When prices are anchored "workback" the period end is by definition important. A `LastIndiceInaccurateError` will be raised in the same circumstances as when the anchor is "open" and no interval is defined.

Consider the previous example again, this time with `anchor` as "workback".

In [32]:
print(f"{start_T5=}\n{end_alt=}\n")  # for reference

# only show last 2 rows
prices.get("10T", start_T5, end_alt, anchor="workback")[-2:]

start_T5=Timestamp('2022-03-14 00:00:00', freq='C')
end_alt=Timestamp('2022-03-14 15:57:00-0400', tz='America/New_York')



symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-03-14 15:35:00, 2022-03-14 15:45:00)",277.119995,277.619995,276.579987,277.619995,843401.0
"[2022-03-14 15:45:00, 2022-03-14 15:55:00)",277.630005,278.079987,275.820007,276.029999,1405440.0


Recall that T5 is the shortest interval at which data is available for this period. A `LastIndiceInaccurateError` is NOT raised, even though the final indice falls two minutes short of the requested period end, because the period end _is_ being represented with the greatest possible accuracy _given the data that's available_.
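
The following sketch shows, under stated assumptions, how the final indice of the "workback" table above can be reproduced. `workback_last_indice` is a hypothetical helper, not part of `market_prices`. It assumes that the right side of the final indice is the latest base interval boundary (counted from the session open) that is no later than the requested period end.

```python
from datetime import datetime, timedelta


def workback_last_indice(period_end, interval, base_interval, session_open):
    """Return (left, right) of the final indice when working back from period_end.

    The right side is the latest time, no later than period_end, that falls on
    a base interval boundary counted from the session open (an assumption).
    """
    n_base = (period_end - session_open) // base_interval  # whole base intervals
    right = session_open + n_base * base_interval
    return right - interval, right


session_open = datetime(2022, 3, 14, 9, 30)  # local time
period_end = datetime(2022, 3, 14, 15, 57)  # same time as end_alt
interval = timedelta(minutes=10)

# With 5 minute base data the end can only be represented as 15:55.
t5 = workback_last_indice(period_end, interval, timedelta(minutes=5), session_open)
# With 1 minute base data (not available here) 15:57 could be represented exactly.
t1 = workback_last_indice(period_end, interval, timedelta(minutes=1), session_open)

print(t5[0].time(), t5[1].time())  # 15:45:00 15:55:00
print(t1[0].time(), t1[1].time())  # 15:47:00 15:57:00
```

The first result matches the final indice returned above. The second shows what finer base data would allow, which is why the availability of T1 data matters.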

The same is not true in the following example. Here the period end can be expressed exactly only with T1 data, which is available at the period end but not over the full requested period...

In [33]:
end = xnys.session_close(start_T1) - pd.Timedelta(3, "T")
end = end.astimezone(prices.tz_default)  # only for ease of reference
print(f"{start_T5=}\n{end=}")  # for reference

start_T5=Timestamp('2022-03-14 00:00:00', freq='C')
end=Timestamp('2022-04-13 15:57:00-0400', tz='America/New_York')


In [None]:
prices.get("10T", start_T5, end, anchor="workback")

```
---------------------------------------------------------------------------
LastIndiceInaccurateError                 Traceback (most recent call last)
<ipython-input-34-eba36859a212> in <module>
----> 1 prices.get("10T", start_T5, end, anchor="workback")

LastIndiceInaccurateError: Full period available at the following intraday base intervals although these do not allow for representing the end indice with the greatest possible accuracy:
	[<BaseInterval.T5: Timedelta('0 days 00:05:00')>].
The following base intervals could represent the end indice with the greatest possible accuracy although have insufficient data available to cover the full period:
	[<BaseInterval.T1: Timedelta('0 days 00:01:00')>].
The earliest minute from which data is available at 0 days 00:01:00 is 2022-04-13 13:30:00+00:00, although at this base interval the requested period evaluates to (Timestamp('2022-03-14 13:30:00+0000', tz='UTC'), Timestamp('2022-04-13 19:57:00+0000', tz='UTC')).
Period evaluated from parameters: {'minutes': 0, 'hours': 0, 'days': 0, 'weeks': 0, 'months': 0, 'years': 0, 'start': Timestamp('2022-03-14 00:00:00', freq='C'), 'end': Timestamp('2022-04-13 19:57:00+0000', tz='UTC'), 'add_a_row': False}.
Data that can express the period end with the greatest possible accuracy is available from 2022-04-13 13:30:00+00:00. Pass `strict` as False to return prices for this part of the period.
Alternatively, consider passing `priority` as 'period'.
```

The error message explains what's going on. As with the earlier examples with `anchor` "open" and an inferred interval, here it's possible to EITHER express the period end with the greatest possible accuracy OR return data over the full requested period, but not both.

As before, the options are noted at the end of the error message.
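
The trade-off can be modelled roughly as follows. This is a simplified sketch, not the library's implementation: `represented_end` and `choose` are hypothetical helpers, times are naive UTC, the T1 availability start is taken from the error message, and the T5 availability start is an assumption (it only needs to be no later than the period start).

```python
from datetime import datetime, timedelta

# Earliest minute with data at each base interval (UTC). The T1 value is taken
# from the error message; the T5 value is an assumption.
AVAILABLE_FROM = {
    timedelta(minutes=1): datetime(2022, 4, 13, 13, 30),
    timedelta(minutes=5): datetime(2022, 3, 14, 13, 30),
}


def represented_end(period_end, last_open, base):
    """Latest base interval boundary, counted from the open, not after period_end."""
    return last_open + ((period_end - last_open) // base) * base


def choose(period_start, period_end, last_open, priority="end", strict=True):
    """Return (base interval, data start, represented end) or raise ValueError."""
    ends = {b: represented_end(period_end, last_open, b) for b in AVAILABLE_FROM}
    best_end = max(ends.values())
    accurate = [b for b in ends if ends[b] == best_end]  # best end accuracy
    full = [b for b in ends if AVAILABLE_FROM[b] <= period_start]  # full coverage
    both = [b for b in accurate if b in full]
    if both:
        base = max(both)
        return base, period_start, ends[base]
    if priority == "period" and full:
        base = min(full)  # full period, less accurate end
        return base, period_start, ends[base]
    if not strict and accurate:
        base = max(accurate)  # accurate end, only part of the period
        return base, AVAILABLE_FROM[base], ends[base]
    raise ValueError("cannot cover the full period and represent the end accurately")


start = datetime(2022, 3, 14, 13, 30)
end = datetime(2022, 4, 13, 19, 57)
last_open = datetime(2022, 4, 13, 13, 30)

try:
    choose(start, end, last_open)
    raised = False
except ValueError:
    raised = True

by_period = choose(start, end, last_open, priority="period")
lenient = choose(start, end, last_open, strict=False)
print(raised, by_period[2].time(), lenient[1], lenient[2].time())
```

With the defaults the sketch raises, mirroring the error above. With `priority="period"` it falls back to the 5 minute data and an end of 19:55 UTC. With `strict=False` it keeps the exact 19:57 UTC end but can only draw on data from 2022-04-13 13:30 UTC.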

Pass `strict` as False to return prices for only the part of the period for which data is available at a base interval that can express the period end with the greatest possible accuracy.

In [35]:
print(f"{start_T5=}\n{end=}\n")  # for reference

df = prices.get("10T", start_T5, end, anchor="workback", strict=False)
df[-3:]  # only show last three indices

start_T5=Timestamp('2022-03-14 00:00:00', freq='C')
end=Timestamp('2022-04-13 15:57:00-0400', tz='America/New_York')



symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-04-13 15:27:00, 2022-04-13 15:37:00)",287.970001,288.579987,287.76001,288.159912,661201.0
"[2022-04-13 15:37:00, 2022-04-13 15:47:00)",288.140106,288.540009,288.059998,288.390015,704986.0
"[2022-04-13 15:47:00, 2022-04-13 15:57:00)",288.380005,288.390015,287.470001,287.804993,1078534.0


And the start of the table...

In [36]:
df[:3]

symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-04-13 09:37:00, 2022-04-13 09:47:00)",281.980011,283.859985,281.299988,283.369995,1275130.0
"[2022-04-13 09:47:00, 2022-04-13 09:57:00)",283.35849,283.779999,282.5,282.799988,711069.0
"[2022-04-13 09:57:00, 2022-04-13 10:07:00)",282.829987,283.829987,282.170013,283.813599,588244.0


Notice that the prices start much later than the requested period start because T1 data is unavailable over the start of the period.
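
The left side of the first indice shown above can be checked with simple arithmetic. The sketch assumes that indices are stepped back from the period end in whole intervals for as long as they remain within the available T1 data, which starts at the session open (09:30 local time, 13:30 UTC).

```python
from datetime import datetime, timedelta

interval = timedelta(minutes=10)
available_from = datetime(2022, 4, 13, 9, 30)  # T1 data start, local time
period_end = datetime(2022, 4, 13, 15, 57)

# Number of whole intervals that fit between the data start and the period end.
n_indices = (period_end - available_from) // interval
first_left = period_end - n_indices * interval
print(n_indices, first_left.time())  # 38 09:37:00
```

The 7 minutes between 09:30 and 09:37 cannot form a full 10 minute indice, so the table starts at 09:37.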

If it's more important to return prices for the full period than to most accurately represent the period end then `priority` can be passed as "period".

In [37]:
print(f"{start_T5=}\n{end=}\n")  # for reference

prices.get("10T", start_T5, end, anchor="workback", priority="period")

start_T5=Timestamp('2022-03-14 00:00:00', freq='C')
end=Timestamp('2022-04-13 15:57:00-0400', tz='America/New_York')



symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-03-14 09:35:00, 2022-03-14 09:45:00)",281.890015,284.769989,281.040009,284.290009,1528973.0
"[2022-03-14 09:45:00, 2022-03-14 09:55:00)",284.269989,284.329987,282.470001,283.109985,960441.0
"[2022-03-14 09:55:00, 2022-03-14 10:05:00)",283.089996,283.929901,282.000000,282.659912,976656.0
"[2022-03-14 10:05:00, 2022-03-14 10:15:00)",282.660004,285.399200,282.500000,285.100006,716046.0
"[2022-03-14 10:15:00, 2022-03-14 10:25:00)",285.079987,285.220001,284.279999,284.690002,620037.0
...,...,...,...,...,...
"[2022-04-13 15:05:00, 2022-04-13 15:15:00)",286.899994,287.916107,286.899994,287.820007,479574.0
"[2022-04-13 15:15:00, 2022-04-13 15:25:00)",287.809998,288.160004,287.670013,287.910004,537914.0
"[2022-04-13 15:25:00, 2022-04-13 15:35:00)",287.920013,288.579987,287.760010,288.440002,575528.0
"[2022-04-13 15:35:00, 2022-04-13 15:45:00)",288.450012,288.559906,288.059998,288.144989,697087.0


Notice that the higher base interval (T5) allows for prices to cover the full requested period, although the final indice now falls short of the requested period end.

Note that a `composite` table is not available if `anchor` is "workback".

In [None]:
prices.get("10T", start_T5, end, anchor="workback", composite=True)

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-37-9d5dcd4b474a> in <module>
----> 1 prices.get("10T", start_T5, end, anchor="workback", composite=True)

ValueError: Cannot create a composite table when anchor is 'workback'.
```

Finally, it's worth noting that, as a consequence of all the above, if `strict` is False then much less data may be returned when `anchor` is "workback" than when `anchor` is "open"...

In [39]:
prices.get("10T", start_T5, end, anchor="workback", strict=False)[:2]

symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-04-13 09:37:00, 2022-04-13 09:47:00)",281.980011,283.859985,281.299988,283.369995,1275130.0
"[2022-04-13 09:47:00, 2022-04-13 09:57:00)",283.35849,283.779999,282.5,282.799988,711069.0


In [40]:
prices.get("10T", start_T5, end, anchor="open", strict=False)[:2]

symbol,MSFT,MSFT,MSFT,MSFT,MSFT
Unnamed: 0_level_1,open,high,low,close,volume
"[2022-03-14 09:30:00, 2022-03-14 09:40:00)",280.25,282.850006,280.01001,281.880005,2441164.0
"[2022-03-14 09:40:00, 2022-03-14 09:50:00)",281.829987,284.769989,281.369995,283.109985,1402140.0
