# Prices tutorial

#### Sections
* [Prices classes](#Prices-classes)
  * [`symbols`](#Symbols)
  * [Calendars](#Calendars)
    * [`lead_symbol`](#lead_symbol)
    * [`calendars`](#calendars)
* [`PricesYahoo`](#PricesYahoo)
  * [`delays`](#delays)
  * [`adj_close`](#adj_close)
  * [`PricesYahoo` doc](#PricesYahoo-doc)
* [`PricesCsv`](#PricesCsv)
  * [`PricesCsv` doc](#PricesCsv-doc)

#### Notes
* The cell **outputs** shown in this tutorial are based on executing the cells at **2024-01-30 21:11 UTC**. Simply rerun the cells to bring any dynamic output up to date.

### Setup
Execute the following cell to import tutorial dependencies.

In [1]:
from zoneinfo import ZoneInfo

## Prices classes

`market_prices` uses Prices classes to access price data. There are currently two classes included with the library:
* `PricesYahoo` provides for getting live and historic price data 'out-the-box' from Yahoo APIs (see [disclaimers](../../README.md) section of README.md) via the `yahooquery` library.
* `PricesCsv` gets price data from locally saved .csv files containing historic price data.

In [2]:
from market_prices import PricesYahoo
from market_prices import PricesCsv

The `symbols`, `calendars` and `lead_symbol` parameters are common to both classes...

### `symbols`

The `symbols` parameter defines the instruments for which price data is required. The symbols can be defined as a list of strings or as a single comma-delimited or space-delimited string. For the `PricesYahoo` class, `symbols` is the only required argument.

In [3]:
symbols = "MSFT, GOOG"
prices = PricesYahoo(symbols)
prices.symbols

['MSFT', 'GOOG']
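
The accepted forms of `symbols` can be illustrated with a minimal sketch. `parse_symbols` is a hypothetical helper written for this tutorial. It is not part of `market_prices`, and the library's own parsing may differ in detail.

```python
import re


def parse_symbols(symbols):
    """Return `symbols` as a list of strings (hypothetical helper)."""
    if isinstance(symbols, str):
        # split on commas and/or whitespace, dropping any empty strings
        return [s for s in re.split(r"[,\s]+", symbols) if s]
    return list(symbols)


print(parse_symbols("MSFT, GOOG"))  # ['MSFT', 'GOOG']
print(parse_symbols("MSFT GOOG"))  # ['MSFT', 'GOOG']
print(parse_symbols(["MSFT", "GOOG"]))  # ['MSFT', 'GOOG']
```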

With `PricesYahoo`, prices can be requested for any symbol for which price data is available on [Yahoo Finance](https://uk.finance.yahoo.com/), for example...

In [4]:
# Note: this cell might take a little while to execute (10 seconds maybe)
# given the number of different exchange calendars that will be created.
symbols = [
    "MSFT",  # us stock
    "AZN.L",  # uk stock
    "9988.HK",  # hong kong stock
    "PETR3.SA",  # brazilian stock
    "^FTSE",  # equity index
    "ES=F",  # futures
    "CL=F",  # oil
    "GC=F",  # gold
    "GBPEUR=X",  # currency pair
    "BTC-USD",  # crypto
]
prices = PricesYahoo(symbols, lead_symbol="9988.HK")

In [5]:
# last 30 mins of data at 10min intervals
prices.get("10min", minutes=30, tzout=ZoneInfo("UTC"))

symbol,9988.HK,9988.HK,9988.HK,9988.HK,9988.HK,AZN.L,AZN.L,AZN.L,AZN.L,AZN.L,...,PETR3.SA,PETR3.SA,PETR3.SA,PETR3.SA,PETR3.SA,^FTSE,^FTSE,^FTSE,^FTSE,^FTSE
Unnamed: 0_level_1,close,high,low,open,volume,close,high,low,open,volume,...,close,high,low,open,volume,close,high,low,open,volume
"[2024-01-30 07:30:00, 2024-01-30 07:40:00)",70.949997,71.25,70.900002,71.199997,976002.0,,,,,,...,,,,,,,,,,
"[2024-01-30 07:40:00, 2024-01-30 07:50:00)",71.099998,71.150002,70.849998,70.900002,2794300.0,,,,,,...,,,,,,,,,,
"[2024-01-30 07:50:00, 2024-01-30 08:00:00)",71.150002,71.150002,70.599998,71.099998,2218000.0,,,,,,...,,,,,,,,,,


Note that prices will show as not available for any instrument not trading over an index interval.
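
To see why columns such as `AZN.L` and `^FTSE` are empty in the output above, it can help to convert the start of the first interval to local time at each exchange. The following sketch uses only the standard library. The regular trading hours mentioned after the code are quoted from general knowledge; they are not read from `market_prices` or `exchange_calendars`.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# left side of the first interval in the output above
start = datetime(2024, 1, 30, 7, 30, tzinfo=ZoneInfo("UTC"))

timezones = {
    "Hong Kong": "Asia/Hong_Kong",
    "London": "Europe/London",
    "New York": "America/New_York",
}
local_times = {
    name: start.astimezone(ZoneInfo(tz)).strftime("%H:%M")
    for name, tz in timezones.items()
}
print(local_times)
# {'Hong Kong': '15:30', 'London': '07:30', 'New York': '02:30'}
```

At 07:30 UTC Hong Kong is in the last half hour of its afternoon session (regular close 16:00 local time), whilst London has yet to open (regular open 08:00 local time) and New York is still hours from opening.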

### Calendars

In order to evaluate the period over which data has been requested, `market_prices` requires that each symbol is associated with an exchange calendar of the library `exchange_calendars`. (See the [exchange_calendars](https://github.com/gerrymanoim/exchange_calendars) library for tutorials dedicated to exchange calendars.)

`PricesYahoo` will by default attempt to match each symbol with the exchange calendar that best reflects the symbol's trading times.

In [6]:
prices.calendars

{'MSFT': <exchange_calendars.exchange_calendar_xnys.XNYSExchangeCalendar at 0x261c95f01c0>,
 'PETR3.SA': <exchange_calendars.exchange_calendar_bvmf.BVMFExchangeCalendar at 0x261c854de20>,
 'BTC-USD': <exchange_calendars.always_open.AlwaysOpenCalendar at 0x261c854dd30>,
 '^FTSE': <exchange_calendars.exchange_calendar_xlon.XLONExchangeCalendar at 0x261ca915880>,
 'GC=F': <exchange_calendars.exchange_calendar_cmes.CMESExchangeCalendar at 0x261c95fdd90>,
 '9988.HK': <exchange_calendars.exchange_calendar_xhkg.XHKGExchangeCalendar at 0x261ca749e80>,
 'ES=F': <exchange_calendars.exchange_calendar_cmes.CMESExchangeCalendar at 0x261c95fdd90>,
 'AZN.L': <exchange_calendars.exchange_calendar_xlon.XLONExchangeCalendar at 0x261ca915880>,
 'CL=F': <exchange_calendars.us_futures_calendar.QuantopianUSFuturesCalendar at 0x261c85ea040>,
 'GBPEUR=X': <exchange_calendars.weekday_calendar.WeekdayCalendar at 0x261c95f0e50>}

By default, requested periods are evaluated against the most common calendar.
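
The `lead_symbol` documentation (see the `PricesCsv` help at the end of this tutorial) adds that if there is no single most common calendar then the default is the calendar of the first symbol passed that is associated with one of the most common calendars. The following is a minimal sketch of that rule. `default_calendar` is a hypothetical helper, not the library's implementation, and the calendar labels are simply short names for the calendar classes shown in the output above.

```python
from collections import Counter


def default_calendar(calendars):
    """Return (symbol, calendar) that would define the default calendar.

    `calendars` maps each symbol to a calendar label, in the order in
    which the symbols were passed (hypothetical helper).
    """
    counts = Counter(calendars.values())
    highest = max(counts.values())
    # first symbol, in passed order, associated with a most common calendar
    for symbol, calendar in calendars.items():
        if counts[calendar] == highest:
            return symbol, calendar


calendars = {
    "MSFT": "XNYS",
    "AZN.L": "XLON",
    "9988.HK": "XHKG",
    "PETR3.SA": "BVMF",
    "^FTSE": "XLON",
    "ES=F": "CMES",
    "CL=F": "us_futures",
    "GC=F": "CMES",
    "GBPEUR=X": "24/5",
    "BTC-USD": "24/7",
}
print(default_calendar(calendars))  # ('AZN.L', 'XLON')
```

Here 'XLON' and 'CMES' are each associated with two symbols, and under the rule as sketched the tie is resolved in favour of 'XLON' as 'AZN.L' was passed before 'ES=F'.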

#### `lead_symbol`

Alternatively, the default calendar can be set by passing the `lead_symbol` option to the Prices class. This will set the default calendar to the calendar associated with the passed symbol. In the example above `lead_symbol` was passed as '9988.HK', which is the symbol for Alibaba's Hong Kong listing.

In [7]:
prices.calendar_default

<exchange_calendars.exchange_calendar_xhkg.XHKGExchangeCalendar at 0x261ca749e80>

The period over which prices were returned was therefore evaluated as the last 30 minutes over which the Hong Kong exchange was open.

Note: the lead symbol can be overridden for any particular call to `get`. The following example uses the same arguments as earlier but passes `lead_symbol` as the symbol for Bitcoin. Bitcoin trades 24/7, hence the returned prices reflect the 30 minutes to the end of the current 'live index' (times are UTC).

In [8]:
df = prices.get("10min", minutes=30, tzout=ZoneInfo("UTC"), lead_symbol='BTC-USD')
df

symbol,9988.HK,9988.HK,9988.HK,9988.HK,9988.HK,AZN.L,AZN.L,AZN.L,AZN.L,AZN.L,...,PETR3.SA,PETR3.SA,PETR3.SA,PETR3.SA,PETR3.SA,^FTSE,^FTSE,^FTSE,^FTSE,^FTSE
Unnamed: 0_level_1,close,high,low,open,volume,close,high,low,open,volume,...,close,high,low,open,volume,close,high,low,open,volume
"[2024-01-30 20:50:00, 2024-01-30 21:00:00)",,,,,,,,,,,...,,,,,,,,,,
"[2024-01-30 21:00:00, 2024-01-30 21:10:00)",,,,,,,,,,,...,,,,,,,,,,
"[2024-01-30 21:10:00, 2024-01-30 21:20:00)",,,,,,,,,,,...,,,,,,,,,,


In [9]:
df["BTC-USD"]

Unnamed: 0,close,high,low,open,volume
"[2024-01-30 20:50:00, 2024-01-30 21:00:00)",43641.152344,43651.898438,43596.34375,43617.265625,101537792.0
"[2024-01-30 21:00:00, 2024-01-30 21:10:00)",43522.578125,43604.261719,43522.578125,43571.070312,53286912.0
"[2024-01-30 21:10:00, 2024-01-30 21:20:00)",43500.671875,43500.671875,43500.671875,43500.671875,11606016.0


(See the [periods tutorial](./periods.ipynb) for an explanation of how the requested period is evaluated.)

#### `calendars`

A `CalendarError` is raised if `PricesYahoo` is unable to match a symbol with a calendar. In this case it's necessary to manually assign calendars to those symbols for which a calendar cannot be ascertained. This is done by passing the `calendars` option. The `calendars` option can also be passed to override, for any symbol(s), the default calendar(s) that would otherwise be assigned.

For the `PricesCsv` class `calendars` is a required argument in order that each symbol can be mapped to a corresponding calendar.

`calendars` can take an `ExchangeCalendar`, a `list` or a `dict`, as described by the class documentation (see final cell of the `PricesYahoo` and `PricesCsv` sections of this tutorial).
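
The three forms can be illustrated by normalising each to a mapping of symbol to calendar. This is only a sketch based on the description in the `PricesCsv` documentation (a list should have the same length as `symbols`, with each element relating to the symbol at the corresponding index). `calendars_by_symbol` is a hypothetical helper, not part of `market_prices`, and calendars are represented here by their ISO codes.

```python
def calendars_by_symbol(symbols, calendars):
    """Map each symbol to a calendar (hypothetical helper)."""
    if isinstance(calendars, dict):
        missing = [s for s in symbols if s not in calendars]
        if missing:
            raise ValueError(f"No calendar defined for symbols {missing}.")
        return {s: calendars[s] for s in symbols}
    if isinstance(calendars, list):
        if len(calendars) != len(symbols):
            raise ValueError("`calendars` and `symbols` differ in length.")
        # each element relates to the symbol at the corresponding index
        return dict(zip(symbols, calendars))
    # a single calendar represents all symbols
    return {s: calendars for s in symbols}


symbols = ["MSFT", "AZN.L"]
print(calendars_by_symbol(symbols, "XNYS"))
# {'MSFT': 'XNYS', 'AZN.L': 'XNYS'}
print(calendars_by_symbol(symbols, ["XNYS", "XLON"]))
# {'MSFT': 'XNYS', 'AZN.L': 'XLON'}
print(calendars_by_symbol(symbols, {"AZN.L": "XLON", "MSFT": "XNYS"}))
# {'MSFT': 'XNYS', 'AZN.L': 'XLON'}
```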

## `PricesYahoo`

`PricesYahoo` has a couple of further optional arguments...

### `delays`

If a prices class can request 'live' prices then `market_prices` also requires knowledge of any real-time delay in the price data (this is in order to evaluate periods to 'now').

By default, `PricesYahoo` attempts to evaluate the delay for each symbol via fields made available by the Yahoo API and some hardcoded mappings. A `ValueError` is raised if a delay cannot be ascertained for a symbol. In this case it's necessary to manually assign a delay via the `delays` kwarg. `delays` can also be passed to override the default delay that would otherwise be assigned to a specific symbol or symbols.

(NB an inaccurately evaluated delay can have the effect that the latest real-time prices 'stick' rather than update on further requests, or that data is unavailable at the most recent time for which it would be expected to be available.)

`delays` can take an `int`, a `list` or a `dict`, as described by the `PricesYahoo` class documentation.
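
The following sketch illustrates why the delay matters when evaluating a period to 'now'. It is a simplification written for this tutorial, not the library's implementation: `latest_interval_start` is a hypothetical helper and it assumes a 24/7 instrument with intervals aligned to midnight UTC.

```python
from datetime import datetime, timedelta, timezone


def latest_interval_start(now, delay_mins, interval_mins):
    """Start of the latest interval for which delayed data could exist."""
    available_to = now - timedelta(minutes=delay_mins)
    # floor to a multiple of the interval, counted from midnight
    mins = available_to.hour * 60 + available_to.minute
    floored = mins - mins % interval_mins
    midnight = available_to.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(minutes=floored)


now = datetime(2024, 1, 30, 21, 11, tzinfo=timezone.utc)
print(latest_interval_start(now, 0, 10))  # 2024-01-30 21:10:00+00:00
print(latest_interval_start(now, 15, 10))  # 2024-01-30 20:50:00+00:00
```

With no delay the latest 10 minute interval starts at 21:10, whereas with a 15 minute delay the most recent data relates to 20:56 and the latest interval therefore starts at 20:50.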

### `adj_close`

The `adj_close` argument simply provides for setting daily close prices to the 'adjusted close' price available via the Yahoo API. By default (False) close prices represent the non-adjusted close.

### `PricesYahoo` doc
See the `PricesYahoo` class doc for full documentation of all class parameters...

In [None]:
# or PricesYahoo?
help(PricesYahoo)

## `PricesCsv`

The `PricesCsv` class provides for retrieving and serving prices data stored in locally saved .csv files.

The following example uses .csv files that are copies of files used for testing. The raised warnings offer examples of the kind of advice offered when there are issues with the .csv files. (Note: this example requires that the .csv files are located in a directory named 'resources' located in the same folder as this tutorial file. The .csv files can be found [here](https://github.com/maread99/market_prices/tree/master/docs/tutorials/resources).)

In [11]:
import pathlib, os
path = pathlib.Path(os.path.abspath('')) / "resources"
assert path.is_dir()

In [None]:
calendars = {
    "MSFT": "XNYS",
    "AZN.L": "XLON",
    "9988.HK": "XHKG",
}

prices = PricesCsv(
    path,
    symbols=list(calendars.keys()),
    calendars=calendars,
    lead_symbol="AZN.L",
)

```
PricesCsvParsingConsolidatedWarning: Price data has been found for all symbols at a least one interval, however, you may find that not all the expected price data is available. See the `limits` property for available base intervals and the limits between which price data is available at each of these intervals. See the `csv_paths` property for paths to all csv files that were found for the requested symbols. See the 'path' parameter and 'Notes' sections of help(PricesCsv) for advices on how csv files should be named and formatted and for use of the `read_csv_kwargs` parameter.

The following errors and/or warnings occurred during parsing:

0) Unable to create dataframe from csv file at 'f_AZN.L_H1_fails_on_vol_dtype.csv' due to the following error:
	<class 'market_prices.prices.csv.CsvVolDtypeError'> 'volume' column will not convert to 'float64' dtype.
The source error's message was:
	<class 'ValueError'>: could not convert string to float: 'not a volume'

1) Unable to create dataframe from csv file at 'f_AZN.L_H1_fails_on_vol_dtype.csv' due to the following error:
	<class 'market_prices.prices.csv.CsvIntervalError'> Date indices do not reflect the expected interval.

2) Unable to create dataframe from csv file at 'f_MSFT_H1_fails_on_no_data.csv' due to the following error:
	<class 'market_prices.prices.csv.CsvDataframeEmptyError'> No price data parsed from csv file.

3) Unable to create dataframe from csv file at 'f_MSFT_H1_fails_on_no_data.csv' due to the following error:
	<class 'market_prices.prices.csv.CsvIntervalError'> Date indices do not reflect the expected interval.

4) Prices are not available at base interval 1:00:00 as data was not found at this interval for symbols '['AZN.L', 'MSFT']'.

5) For symbol 'MSFT' at base interval 0:05:00 the csv file included the following indices that are not aligned with the evaluated index and: have therefore been ignored:
DatetimeIndex(['2022-04-18 16:02:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
  warnings.warn(
```

In addition to the parameters passed in the example above:
- `read_csv_kwargs` provides for defining arguments to pass to `pandas.read_csv` in order to customise parsing of data from the .csv files.
- `ohlc_thres` defines a threshold that determines the degree of incongruent data that will be accepted in a .csv file.
- `pm_subsession_origin` declares how price data for exchanges with breaks is indexed.
- `verbose` provides for including underlying error messages with any warnings (default False).

Note that as `PricesCsv` does not provide for live prices, there is no `delays` argument.
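
The effect of `ohlc_thres` can be sketched as follows, based on the rules given in the `PricesCsv` documentation. `force_congruence` is a hypothetical helper, not the library's implementation, and the order in which the adjustments are applied here is an assumption.

```python
def force_congruence(rows, ohlc_thres=0.08):
    """Return copies of `rows` with congruent ohlc data (hypothetical helper).

    `rows` is a list of dicts with keys 'open', 'high', 'low' and 'close'.
    Raises ValueError if the data would be rejected.
    """
    fixed, incongruent = [], 0
    for row in rows:
        op, hi, lo, cl = (row[k] for k in ("open", "high", "low", "close"))
        if hi < lo:
            # never permitted, regardless of the threshold
            raise ValueError("high lower than low, data rejected.")
        original = (op, hi, lo, cl)
        hi = max(hi, cl)  # close higher than high: high forced to close
        lo = min(lo, cl)  # close lower than low: low forced to close
        op = min(max(op, lo), hi)  # open forced to low or high if outside
        if (op, hi, lo, cl) != original:
            incongruent += 1
        fixed.append({**row, "open": op, "high": hi, "low": lo, "close": cl})
    if incongruent > ohlc_thres * len(rows):
        raise ValueError("incongruent rows exceed threshold, data rejected.")
    return fixed


good = {"open": 10.0, "high": 10.2, "low": 9.9, "close": 10.1}
bad = {"open": 10.0, "high": 10.2, "low": 9.9, "close": 10.5}
rows = [good] * 19 + [bad]  # 1 of 20 rows (5%) is incongruent
print(force_congruence(rows)[-1]["high"])  # 10.5
```

With the default threshold of 0.08 the single incongruent row is adjusted, whereas passing `ohlc_thres=0.04` to the same sketch would reject the data.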

### `PricesCsv` doc
See the `PricesCsv` class doc for full documentation of all parameters together with **how .csv files should be arranged and named...**

In [None]:
# or PricesCsv?
help(PricesCsv)

```
Help on class PricesCsv in module market_prices.prices.csv:

class PricesCsv(market_prices.prices.base.PricesBase)
 |  PricesCsv(path: 'Annotated[Union[str, Path], Coerce(Path)]', symbols: 'Union[str, list[str]]', calendars: 'mptypes.Calendars', lead_symbol: 'Optional[str]' = None, read_csv_kwargs: 'Optional[dict[str, Any]]' = None, ohlc_thres: 'float' = 0.08, pm_subsession_origin: "Literal['open', 'break_end']" = 'open', verbose: 'bool' = False)
 |  
 |  Retrieve and serve historic price data sourced from local csv files.
 |  
 |  Parameters
 |  ----------
 |  path : str | pathlib.Path
 |      Path to directory containing .csv files and/or a hierarchy of
 |      subdirectories containing .csv files. Files and folders should
 |      conform with requirements detailed here and to the 'Notes' section.
 |  
 |      The constructor will search for .csv files in this directory and
 |      all directories under it. All files without the .csv extension will
 |      be ignored.
 |  
 |      Each csv file must contain data for a single symbol and for a
 |      single interval. The symbol and interval should be included within
 |      the filename and separated from each other and/or any other parts
 |      of the filename with a '_' separator. The following are examples
 |      of valid filenames:
 |          MSFT_5T.csv
 |          5T_MSFT.csv
 |          whatever_MSFT_5T.csv
 |          MSFT_5T_whatever.csv
 |          whatever_MSFT_5T_whatever.csv
 |          whatever_MSFT_whatever_5T_whatever.csv
 |          whatever_whatever_5T_whatever_MSFT_whatever.csv
 |  
 |      The interval part expresses the duration of the period corresponding
 |      with each row of data. The interval comprises two parts, a unit and
 |      a value. Valid units are:
 |          MIN - to describe minutes
 |          T - to describe minutes
 |          H - to describe hours
 |          D - to describe days
 |      Units are not case-sensitive, for example T, t, Min, MIN and mIN
 |      are all valid units.
 |  
 |      The interval value defines the multiple of units, for example '5T'
 |      defines the interval as 5 minutes. If the value is omitted then it
 |      will be assumed as 1, for example 'MSFT_T.csv' will be assumed to
 |      contain 1 minute data for MSFT.
 |  
 |      The value for daily data cannot be higher than 1, i.e. there is no
 |      support for weekly or monthly data. (Whilst the `get` method
 |      supports requests for weekly or monthly data, the request is
 |      fulfilled by resampling daily data.)
 |  
 |      The interval can optionally be omitted for daily data, for example
 |      all of 'MSFT.csv', 'MSFT_1D' and 'MSFT_D' will be assumed as
 |      containing daily data for MSFT.
 |  
 |      The following are all examples of valid filenames:
 |          5T_MSFT.csv
 |          other_5t_MSFT_123.csv
 |          MSFT_t_231116 MSFT.csv (assumed as minute data)
 |          MSFT_something_else.csv (assumed as daily data)
 |  
 |      Any files containing malformed intervals will be ignored.
 |  
 |      The following are examples of invalid filenames that will result in
 |      the file being ignored:
 |          MSFT_p5T_else.csv (malformed interval)
 |          MSFT_5T_15T.csv (ambiguous interval)
 |          MSFT_5T_TSLA.csv (two symbols)
 |          MSFT_2D.csv (if interval unit is day then value cannot be
 |              greater than one)
 |          MSFT.txt (not a .csv file)
 |  
 |      The `csv_paths` property shows all the csv files that have been
 |      included, by symbol by interval.
 |  
 |  symbols : str | list[str]
 |      Symbols for which price data is required. For example:
 |          'AMZN'
 |          'FB AAPL AMZN NFLX GOOG MSFT'
 |          ['FB', 'AAPL', 'AMZN']
 |  
 |  calendars :
 |      mptypes.Calendar |
 |      list[mptypes.Calendar] |
 |      dict[str, mptypes.Calendar]
 |  
 |      Calendar(s) defining trading times and timezones for `symbols`.
 |  
 |      A single calendar representing all `symbols` can be passed as
 |      an mptypes.Calendar, specifically any of:
 |          Instance of a subclass of
 |          `exchange_calendars.ExchangeCalendar`. Calendar 'side' must
 |          be "left".
 |  
 |          `str` of ISO Code of an exchange for which the
 |          `exchange_calendars` package maintains a calendar. See
 |          https://github.com/gerrymanoim/exchange_calendars#calendars
 |          or call `market_prices.get_exchange_info`. For example:
 |              calendars="XLON",
 |  
 |          `str` of any other calendar name supported by
 |          `exchange_calendars`, as returned by
 |          `exchange_calendars.get_calendar_names`
 |  
 |      Multiple calendars, each representing one or more symbols, can
 |      be passed as any of:
 |          List of mptypes.Calendar (i.e. defined as for a single
 |          calendar). List should have same length as `symbols` with each
 |          element relating to the symbol at the corresponding index.
 |  
 |          Dictionary mapping each symbol with a calendar.
 |              key: str
 |                  symbol.
 |              value: mptypes.Calendar (i.e. as for a single calendar)
 |                  Calendar corresponding with symbol.
 |  
 |              For example:
 |                  calendars = {"MSFT": "XNYS", "AZN.L": "XLON"}
 |  
 |      Each Calendar should have a first session no later than the first
 |      session from which prices are available for any symbol
 |      corresponding with that calendar.
 |  
 |  lead_symbol : str
 |      Symbol with calendar that should be used as the default calendar to
 |      evaluate period from period parameters. If not passed default
 |      calendar will be defined as the most common calendar (and if there
 |      is no single most common calendar then the calendar associated
 |      with the first symbol passed that's associated with one of the most
 |      common calendars).
 |  
 |  read_csv_kwargs : Optional[dict[str, Any]]
 |      Keyword arguments to pass to `pandas.read_csv` to parse a csv file
 |      to a pandas DataFrame. See the 'Notes' section for how a csv file
 |      can be formatted such that it parses under the default
 |      implementation.
 |  
 |      market_prices requires that the DataFrame parses with:
 |          index as a `pd.DatetimeIndex` named 'date'.
 |  
 |          columns labelled 'open', 'high', 'low', 'close' and optionally
 |          'volume', each with dtype "float64".
 |  
 |      If the following kwargs are not included to `read_csv_kwargs` then
 |      by default they will be passed to `pandas.read_csv` with the
 |      following values:
 |          "header": 0,
 |          "usecols": lambda x: x.lower() in [
 |              "date", "open", "high", "low", "close", "volume"
 |          ],
 |          "index_col": "date",
 |          "parse_dates": ["date"],
 |  
 |      See help(pandas.read_csv) for all available kwargs.
 |  
 |      Note that the following arguments will always be passed by
 |      market_prices to `pandas.read_csv` with the following values (these
 |      values cannot be overridden by `read_csv_kwargs`):
 |          "filepath_or_buffer": <csv file path>
 |          "dtype": {
 |              'open': "float64",
 |              'high': "float64",
 |              'low': "float64",
 |              'close': "float64",
 |          }
 |  
 |      EXAMPLE USAGE
 |      If in the csv files the:
 |          date column is labelled 'timestamp'
 |          close column is labelled 'price'
 |          volume column is labelled 'vol'
 |      Then the `names` kwarg can be used to override the labels that
 |      would otherwise be assigned to each column. If the columns in the
 |      csv file were ordered 'timestamp', 'price', 'low', 'high', 'open',
 |      'vol' then `read_csv_kwargs` could be passed as:
 |          read_csv_kwargs = {
 |              "names": ['date', 'close', 'low', 'high', 'open', 'volume'],
 |          }
 |      This would override the names as defined in the csv file's first
 |      row with the required values. Note that all references to column
 |      names in other kwargs, such as 'usecols' and 'dtype', will now
 |      refer to the overridden names (as required), not the names as
 |      defined in the csv files.
 |  
 |  ohlc_thres : float, default: 0.08
 |      Threshold to reject incongruent ohlc data, in terms of maximum
 |      percentage of incongruent rows to permit. For example, pass as 0.1
 |      to reject data if more than 10% of rows exhibit incongruent data.
 |  
 |      If the number of incongruent rows is below the threshold then
 |      adjustments will be made to force congruence.
 |  
 |      A row of data will be considered incongruent if any of:
 |          close is higher than high
 |              within threshold, high will be forced to close
 |          close is lower than low
 |              within threshold, low will be forced to close
 |          open is lower than low
 |              within threshold, open will be forced to low
 |          open is higher than high
 |              within threshold, open will be forced to high
 |  
 |      Note: Data will always be rejected if any row has a high value
 |      lower than the low value. No provision is made to permit this
 |      circumstance.
 |  
 |  pm_subsession_origin : Literal["open", "break_end"], default: "open"
 |      How to evaluate indices of sessions that include a break. (The
 |      'Notes' covers how, in order to offer a complete data set, prices
 |      are reindexed against an index evaluated in accordance with the
 |      corresponding calendar. This parameter determines the basis on
 |      which that index is evaluated for sessions that have a break.)
 |  
 |      'open' - evaluate all indices for a session based on the session
 |      open. If the session open and pm subsession open are not aligned
 |      then indices will be included through the break (i.e. treat as if
 |      the session did not have a break).
 |  
 |      'break_end' - evaluate indices for the am subsession based on the
 |      session open and indices for the pm subsession based on the
 |      pm subsession open (i.e. based on the break end). No indices will
 |      be included that would fall during the break.
 |  
 |  verbose : bool, default: False
 |      Within error and warning messages concerning the parsing of csv
 |      files, include the full traceback of any underlying errors.
 |  
 |  Notes
 |  -----
 |  By default csv files should have headers in the first line that include
 |  'date', 'open', 'high', 'low', 'close' and optionally 'volume'. Each
 |  further line should represent a single period starting on the value in
 |  'date' column and lasting a period corresponding with the interval
 |  declared in the filename. It is NOT necessary for every period to be
 |  represented (it's common for data sources to exclude intraday data for
 |  periods during which a symbol did not register a trade). The price data
 |  will be reindexed against expected indices as evaluated from the
 |  corresponding calendar of `calendars`.
 |  
 |  For daily price data values in the 'date' column should represent a
 |  date, for example '2023-11-16'.
 |  
 |  For intraday price data the values in the 'date' column should express
 |  either:
 |      a date and UTC time, for example '2023-11-16 15:30' or
 |      '2023-11-16 15:30:00'
 |  
 |      a date, local time and GMT offset for that local time, for example
 |      '2023-11-16 12:53:00-04:00'
 |  
 |  If the csv does not conform with the above then the `read_csv_kwargs`
 |  parameter can be passed to define parameters to pass to
 |  `pandas.read_csv` in order to parse the file as required.
 |  
 |  -- Alignment of Intraday 'date' values --
 |  Intraday 'date' values can define the time of a (sub)session open
 |  (according to the corresponding calendar) and any time thereafter which
 |  is aligned with the interval, based on the (sub)session open, and which
 |  falls before the corresponding (sub)session close. See the
 |  `pm_subsession_origin` parameter for how to determine how indices are
 |  evaluated for sessions that include a break.
 |  
 |  Examples
 |  If a session opens at 10:00 and the interval is 15T then
 |  '2023-11-16 10:00' is a valid value and so is '2023-11-16 10:15',
 |  although '2023-11-16 10:10' is not as it does not align with the
 |  declared interval. All unaligned indices will be ignored.
 |  
 |  If the same session closes at 17:00 then the latest valid value for
 |  that session will be '2023-11-16 16:45'. '2023-11-16 17:00' is not
 |  valid as it would represent a period outside of trading hours, i.e. the
 |  15 minutes following the session close. All values that lie outside of
 |  regular trading hours will be ignored.
 ```
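
The file naming rules described under the `path` parameter can be sketched as follows. `parse_filename` is a hypothetical helper written for this tutorial, not the library's parser. It covers only the 'value then unit' form described above, assumes the symbol is known in advance and does not attempt to detect malformed intervals (such as 'p5T') or filenames that include two symbols.

```python
import re
from pathlib import Path

INTERVAL_RE = re.compile(r"^(\d*)(MIN|T|H|D)$", re.IGNORECASE)
UNIT_NAMES = {"MIN": "minutes", "T": "minutes", "H": "hours", "D": "days"}


def parse_filename(filename, symbol):
    """Return interval as (value, unit), or None if file would be ignored."""
    path = Path(filename)
    if path.suffix != ".csv":
        return None  # not a .csv file
    parts = path.stem.split("_")
    if symbol not in parts:
        return None
    matches = []
    for part in parts:
        match = INTERVAL_RE.match(part)
        if part != symbol and match:
            matches.append(match)
    if not matches:
        return (1, "days")  # interval omitted, assumed as daily data
    if len(matches) > 1:
        return None  # ambiguous interval
    value = int(matches[0].group(1) or 1)  # omitted value assumed as 1
    unit = UNIT_NAMES[matches[0].group(2).upper()]
    if unit == "days" and value > 1:
        return None  # no support for multiples of daily data
    return (value, unit)


print(parse_filename("whatever_MSFT_5T_whatever.csv", "MSFT"))  # (5, 'minutes')
print(parse_filename("MSFT_T.csv", "MSFT"))  # (1, 'minutes')
print(parse_filename("MSFT.csv", "MSFT"))  # (1, 'days')
print(parse_filename("MSFT_2D.csv", "MSFT"))  # None
```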