# Example 2: Grandma Valuation

This example demonstrates how to use the Grandma Stock Valuation model to valuate a stock (instrument). It covers:
* Load Data
* Grandma Stock Valuation - the Basic Idea
* Valuate IVV (SP500) ETF
    * Over-Valued Years
* Visualize the Valuation
* Effect of the `recent_months` Argument
* Valuate a Group of Instruments and Save Outputs
* Limitations

In [1]:
from grandma_stock_valuation import FileLogger, loadPacakgeData, GrandmaStockValuation, batchValuation

# Refer to example_0_FileLogger.ipynb for details of the FileLogger.
logger = FileLogger()
logPrint = logger.logPandas

### Load Data

For this example, I will use the stored package data.

To query data from Yahoo, please refer to *example_1_yahoo_data_loader.ipynb*.

In [2]:
d_instrument_data, d_instrument = loadPacakgeData(verbose=2)

d_instrument

VPL data contains 4273 rows, 4273 dates from 2005-03-10 to 2022-02-28.
IVV data contains 5479 rows, 5479 dates from 2000-05-19 to 2022-02-28.
EEMA data contains 2530 rows, 2530 dates from 2012-02-09 to 2022-02-28.
IEV data contains 5431 rows, 5431 dates from 2000-07-28 to 2022-02-28.


{'IVV': 'SP500',
 'VPL': 'Developed Asia-Pacific',
 'IEV': 'Europe',
 'EEMA': 'Emerging Asia'}

In [3]:
logPrint("Keys of d_instrument_data:", str(d_instrument_data.keys()))

logPrint("IVV (SP500 ETF):", d_instrument_data['IVV'].head())

2022-03-02 09:05:40,326 INFO Keys of d_instrument_data: dict_keys(['VPL', 'IVV', 'EEMA', 'IEV'])
2022-03-02 09:05:40,339 INFO IVV (SP500 ETF): 
        date       open       high        low      close  close_adj   volume
0 2000-05-19  142.65625  142.65625  140.25000  140.68750  94.121216   775500
1 2000-05-22  140.59375  140.59375  136.81250  139.81250  93.535789  1850600
2 2000-05-23  140.21875  140.21875  137.68750  137.68750  92.114151   373900
3 2000-05-24  137.75000  140.06250  136.65625  139.75000  93.494003   400300
4 2000-05-25  140.03125  140.93750  137.87500  138.46875  92.636810    69600



### Grandma Stock Valuation - the Basic Idea

The Grandma Stock Valuation model is built on a simple but powerful idea: valuate an instrument by comparing its current price against its historical trend.

The model consists of the following steps:
1. Fit a trend line on historical daily prices.
2. Identify extreme prices as outliers.
3. Remove the outliers and re-fit the trend line.
4. Estimate the "fair price" based on the trend line.
5. Valuate the instrument based on the actual price, the fair price, and the historical growth rate.

All these steps have been implemented in the `GrandmaStockValuation` class.



### Valuate IVV (SP500) ETF

Let's start with SP500 as an example.

At initialization, the `GrandmaStockValuation` class takes the following arguments:
* `recent_months` (int): Number of recent months, before `date_end`, to exclude from model fitting.<br>Use it if you think the recent market is so unusual that you want to exclude that period from model fitting.<br>I will illustrate its effect in a later section.
* `train_years` (int): Years of historical data, after excluding `recent_months`, used for model fitting.<br>The default is 10 years, representing one economic cycle.
* `date_end` (date): Data after this date will not be used.<br>This is for the special use case of re-running the model as of a past date.<br>If None, the model uses the latest date in the input daily price data.
* `verbose` (int): 2 to print detailed information; 1 to print high-level information; 0 to suppress printing.
* `printfunc` (function): defaults to `print`; here I use `logPandas` (assigned to `logPrint` above).

The model first sets the last date, as specified by `date_end`.<br>
From the last date, it moves backward `recent_months`; this recent period is excluded from model fitting.<br>
From there, it moves backward `train_years`; this is the period used to fit the model.
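The date windows described above can be sketched with pandas. This is a hypothetical helper (`split_windows` is not part of the package), and the boundary convention (which endpoint of each window is inclusive) is my own assumption; the package may treat edge dates differently.

```python
import pandas as pd


def split_windows(dates, date_end=None, recent_months=0, train_years=10):
    """Hypothetical sketch: boolean masks for the training and recent windows."""
    dates = pd.Series(pd.to_datetime(dates))
    # Step 1: the last date is `date_end`, or the latest date in the data
    end = dates.max() if date_end is None else pd.Timestamp(date_end)
    # Step 2: move backward `recent_months`; dates after this form the recent window
    train_end = end - pd.DateOffset(months=recent_months)
    # Step 3: move backward `train_years` from there to get the training window
    train_start = train_end - pd.DateOffset(years=train_years)
    is_recent = (dates > train_end) & (dates <= end)
    is_train = (dates > train_start) & (dates <= train_end)
    return is_train, is_recent


# Usage on a plain weekday calendar (not real trading days)
dates = pd.Series(pd.bdate_range("2000-01-03", "2022-02-28"))
is_train, is_recent = split_windows(dates, recent_months=24, train_years=10)
print(dates[is_train].min().date(), dates[is_train].max().date())    # training window
print(dates[is_recent].min().date(), dates[is_recent].max().date())  # recent window
```

With `recent_months=0`, the recent mask is empty and the whole window is used for training.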

In [4]:
grandma = GrandmaStockValuation(recent_months=0, train_years=10, date_end=None, verbose=2, printfunc=logPrint)

Now let's pass the daily prices of an SP500 ETF to the `fitTransform()` method to fit the model and estimate the trend.

The `fitTransform()` method takes the following arguments:
* `input_data` (pandas.DataFrame): Daily price data of the instrument.<br>It should contain a `date` column and a price column named by `price_col`.
* `price_col` (str): The column name in `input_data` that holds the daily price.<br>I suggest using the adjusted price.
* `log` (bool): If True, fit a log-linear regression. If False, fit a linear regression.
* `n_std` (float): Outliers are identified as data points whose residuals fall outside `mean ± n_std * std`.

Selection of arguments:
* `log=True` fits a log-linear regression, which aligns with the common industry practice of quoting "% growth over a fixed period of time".<br>On the other hand, there is no strong mathematical basis for this choice, and a linear regression can sometimes fit better. Feel free to try it yourself.
* `n_std=1.5`: a lower `n_std` value flags more prices as extreme values, which are then excluded from model fitting.<br>The default value, 1.5, is common practice without a strong mathematical basis. Feel free to try other values.
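To make fitting steps 1 to 4 concrete, here is a minimal numpy sketch under my own assumptions: the line is fitted against a trading-day index `x` (like the `x` column of `_df_train`), outliers use the `mean ± n_std * std` rule with the population standard deviation, and `fit_trend` is a hypothetical name. It is not the package's implementation.

```python
import numpy as np


def fit_trend(prices, log=True, n_std=1.5):
    """Hypothetical sketch of steps 1-4: fit, flag outliers, re-fit, estimate trend."""
    prices = np.asarray(prices, dtype=float)
    y = np.log(prices) if log else prices
    x = np.arange(len(y))  # trading-day index

    # Step 1: fit a straight line on (log-)prices
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)

    # Step 2: outliers have residuals outside mean ± n_std * std (ddof=0 assumed)
    lower = resid.mean() - n_std * resid.std()
    upper = resid.mean() + n_std * resid.std()
    is_outlier = (resid < lower) | (resid > upper)

    # Step 3: re-fit the line without the outliers
    slope, intercept = np.polyfit(x[~is_outlier], y[~is_outlier], 1)

    # Step 4: the trend line gives the "fair price" on each date
    trend = slope * x + intercept
    if log:
        trend = np.exp(trend)
    return trend, is_outlier


# Usage: synthetic prices growing 0.05% per trading day, with one 50% spike
prices = 100 * 1.0005 ** np.arange(1000)
prices[500] *= 1.5
trend, is_outlier = fit_trend(prices)
print(np.flatnonzero(is_outlier))  # [500]: only the spike is flagged
```

Because the spike is excluded from the re-fit, the trend recovers the underlying 0.05% daily growth almost exactly.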

The `fitTransform()` method updates the `_df_train` and `_df_recent` attributes and returns the object itself with the updated attributes.

In [5]:
grandma.fitTransform(d_instrument_data['IVV'], price_col='close_adj', log=True, n_std=1.5)

# Let's take a look at _df_train, although it is designed for internal use only and is not meant to be exposed to users.
grandma._df_train.head()

2022-03-02 09:05:40,492 INFO Train data contains 2516 rows over 2516 dates from 2012-03-01 to 2022-02-28.
2022-03-02 09:05:40,495 INFO No recent data specified.
2022-03-02 09:05:40,496 INFO Fit regression...
2022-03-02 09:05:40,510 INFO 245 out of 2516 dates are outliers.
2022-03-02 09:05:40,511 INFO Re-fit wihtout outliers...
2022-03-02 09:05:40,520 INFO No recent data to estimate.
2022-03-02 09:05:40,521 INFO done!


|   | date | price | x | trend | is_outlier | is_recent |
|---|------|-------|---|-------|------------|-----------|
| 0 | 2012-03-01 | 114.176628 | 0 | 116.765871 | False | False |
| 1 | 2012-03-02 | 113.879234 | 1 | 116.824413 | False | False |
| 2 | 2012-03-05 | 113.408363 | 2 | 116.882986 | False | False |
| 3 | 2012-03-06 | 111.657005 | 3 | 116.941587 | False | False |
| 4 | 2012-03-07 | 112.491371 | 4 | 117.000218 | False | False |


Let's valuate SP500!

The `evaluateValuation()` method takes only one argument:
* `min_annual_return` (float): Minimum annual return required to calculate over-valued years. I will explain this below.

`evaluateValuation()` returns a dictionary with the following keys:
* `r2_train`: R2 of the fitted model; the higher, the better.<br>You may want to manually remove instruments with a low R2 (e.g., less than 0.5), since the model cannot "fit well" on their price data. This is up to you.
* `train_years`: number of years actually used to fit the model.<br>Although you specified `train_years=10` at initialization, the instrument may not have 10 years of data, especially if it was listed recently.
* `annualized_return`: this annualized return is derived from the fitted trend line (linear or log-linear), not the actual price data.<br>As a result, recent price fluctuations do not have a big effect on this result.
* `current_price`: the latest actual price in the input data.
* `fair_price`: the latest estimated price, based on the fitted trend line.
* `over_value_range`: `(current_price / fair_price) - 1`
* `over_value_years`: see the explanation below.
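The two price-based metrics above reduce to a couple of lines of arithmetic. Below is a hedged sketch: deriving `annualized_return` as the compound growth of the fitted trend over the actual training period is one plausible approach that I assume here, not necessarily the package's exact formula, and `price_metrics` and its arguments are hypothetical.

```python
def price_metrics(current_price, fair_price, trend_start, trend_end, train_years):
    """Hypothetical sketch of annualized_return and over_value_range."""
    # Compound growth rate implied by the fitted trend (not by actual prices),
    # so short-term price swings barely move it
    annualized_return = (trend_end / trend_start) ** (1 / train_years) - 1
    # Relative gap between the actual price and the trend-based fair price
    over_value_range = current_price / fair_price - 1
    return {"annualized_return": annualized_return, "over_value_range": over_value_range}


# Usage: IVV prices reported in this notebook, plus a made-up trend that doubles in 10 years
m = price_metrics(438.720001, 411.90779633312866, trend_start=100.0, trend_end=200.0, train_years=10)
print(round(m["over_value_range"], 4))   # 0.0651, as in the IVV output
print(round(m["annualized_return"], 4))  # 0.0718
```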

#### Over-Valued Years

"Over-valued years" is an important mechanism to **consider growth in valuation**.
* For example, suppose two instruments, X and Y, are both over-valued by 10%.
* X had annualized growth of 10%, while Y had annualized growth of only 1%.
    * If the price does not change, it will take X 1 year to stop being over-valued, but it will take Y 10 years.
* As a result, X is over-valued by 1 year, while Y is over-valued by 10 years.


For over-valued instruments (`over_value_range>0`), `over_value_years = over_value_range / annualized_return`.

For under-valued instruments (`over_value_range<0`), `over_value_years = over_value_range * annualized_return * 100`.<br>This formulation has the following considerations:
* Firstly, "division" changes to "multiplication" to handle the sign reversal of `over_value_range`.
* "Multiplication" also makes instruments with higher annualized return more under-valued.
* Let's see several examples:

```
    over_value_range = 5%,  annualized_return = 10% ==> over_value_years = 0.5
    over_value_range = -5%, annualized_return = 10% ==> over_value_years = -0.5
    over_value_range = -5%, annualized_return = 20% ==> over_value_years = -1.0
```

Note that the math behind `over_value_years` leads to several **limitations**:
* It cannot handle instruments with negative annualized growth.
* It can only handle instruments with sufficient positive annualized growth, as specified by `min_annual_return` (default 1%).

Instruments that cannot be handled are still valuated, but with `over_value_years` as `nan`.
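The `over_value_years` rules above translate directly into a small function. This is a sketch: `over_value_years` here is a hypothetical standalone function, and whether the package compares against `min_annual_return` with `<` or `<=` is my assumption.

```python
import math


def over_value_years(over_value_range, annualized_return, min_annual_return=0.01):
    """Hypothetical sketch of the over-valued-years rule described above."""
    # Growth too low (including negative growth): the metric is undefined
    if annualized_return < min_annual_return:
        return math.nan
    # Over-valued: years of trend growth needed for the fair price to catch up
    if over_value_range > 0:
        return over_value_range / annualized_return
    # Under-valued (or exactly fair): multiply, so higher growth means more under-valued
    return over_value_range * annualized_return * 100


# The three worked examples above
print(over_value_years(0.05, 0.10))    # ≈ 0.5
print(over_value_years(-0.05, 0.10))   # ≈ -0.5
print(over_value_years(-0.05, 0.20))   # ≈ -1.0
# Negative growth cannot be handled
print(over_value_years(0.05, -0.02))   # nan
```

Plugging in the IVV values reported below (`over_value_range` 0.0651, `annualized_return` 0.134) gives about 0.485 years.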

In [6]:
d_metrics = grandma.evaluateValuation(min_annual_return=0.01)

d_metrics

2022-03-02 09:05:40,622 INFO R2 train = 0.971, train years = 10.0, annualize return = 0.134.
2022-03-02 09:05:40,623 INFO current price = 4.39e+02, fair price = 4.12e+02, over-value range = 0.0651, over-value years = 0.485.


{'r2_train': 0.9710536943366812,
 'train_years': 10.002739726027396,
 'annualized_return': 0.1343143168141876,
 'current_price': 438.720001,
 'fair_price': 411.90779633312866,
 'over_value_range': 0.06509273411563954,
 'over_value_years': 0.4846299014102107}

Let's take a closer look at the result:
* `R2` is 0.97, which is very good.
* `train_years` is 10: the model was fitted with 10 years' data.
* `annualized_return` of SP500 was estimated at 13.4%.
    * Because we used the adjusted price, this result also takes dividends into account.
* As of 2022-02-28, SP500 was **over-valued by 6.5%, or 0.48 years**.

### Visualize the Valuation

Of course we want visualization! The `plotTrendline()` method makes it handy.
* It takes a `title` argument, the text to display as the chart title.
* Additional keyword arguments are passed to plotly's `update_layout` function.

The chart is very straightforward:
* One line of the actual daily prices.
* Outliers, the extreme values excluded from model fitting, highlighted in red.
* One fitted trend line.

In [None]:
fig = grandma.plotTrendline(title="Grandma's View of SP500", width=900, height=300)

# You may need to run the following to display the chart in a notebook.
import plotly.io as pio
pio.renderers.default = "notebook_connected"

fig.show()

![](./images/example_2_SP500.png)

### Effect of the `recent_months` Argument

Here I want to illustrate how the `recent_months` argument works.

In the following example, I set `recent_months=24` to exclude the past 2 years from model fitting. You may want to do so because, in this case, the past 2 years were the Covid-19 period.

On the other hand, I do not recommend this practice; it is better to rely on the model's outlier identification to handle extreme prices.
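When `recent_months > 0`, the model logs "Extend trend to recent data." A minimal sketch of what that extension means, assuming a log-linear fit on a trading-day index: the recent dates simply continue the index past the training rows, and the fair price on the last date is the final value of the extended trend. `extend_trend` and its arguments are hypothetical names.

```python
import numpy as np


def extend_trend(slope, intercept, n_train, n_recent, log=True):
    """Hypothetical sketch: evaluate a fitted line on the recent (unfitted) dates."""
    # Recent dates continue the trading-day index after the training rows
    x_recent = np.arange(n_train, n_train + n_recent)
    trend = slope * x_recent + intercept
    return np.exp(trend) if log else trend


# Usage: 2518 training rows and 504 recent rows, as in the run below;
# the slope and intercept are made up for illustration
recent_trend = extend_trend(np.log(1.0005), np.log(100.0), n_train=2518, n_recent=504)
fair_price = recent_trend[-1]  # trend value on the last date
```

Since the recent prices never enter the fit, a large rally or crash in that period changes `over_value_range` but not the trend itself.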

In [8]:
grandma = GrandmaStockValuation(recent_months=24, train_years=10, date_end=None, verbose=2, printfunc=logPrint)
grandma.fitTransform(d_instrument_data['IVV'], price_col='close_adj', log=True, n_std=1.5)
d_metrics = grandma.evaluateValuation(min_annual_return=0.01)

d_metrics

2022-03-02 09:05:58,634 INFO Train data contains 2518 rows over 2518 dates from 2010-03-01 to 2020-02-28.
2022-03-02 09:05:58,639 INFO Recent data contains 504 rows over 504 dates from 2020-03-02 to 2022-02-28.
2022-03-02 09:05:58,640 INFO Fit regression...
2022-03-02 09:05:58,648 INFO 361 out of 2518 dates are outliers.
2022-03-02 09:05:58,649 INFO Re-fit wihtout outliers...
2022-03-02 09:05:58,656 INFO Extend trend to recent data.
2022-03-02 09:05:58,660 INFO done!
2022-03-02 09:05:58,670 INFO R2 train = 0.986, train years = 10.0, annualize return = 0.133.
2022-03-02 09:05:58,672 INFO current price = 4.39e+02, fair price = 4.02e+02, over-value range = 0.0903, over-value years = 0.68.


{'r2_train': 0.9860325013120617,
 'train_years': 10.002739726027396,
 'annualized_return': 0.13290449074031763,
 'current_price': 438.720001,
 'fair_price': 402.377694833628,
 'over_value_range': 0.09031888852934178,
 'over_value_years': 0.6795774019842268}

In [None]:
fig = grandma.plotTrendline(title="Grandma's View of SP500", width=900, height=300)

fig.show()

![](./images/example_2_SP500_24m_recent.png)

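As you can see, the visualization displays the specified recent period in a dedicated color.

Compare this valuation with the previous results to see how they differ: with the recent 24 months excluded from fitting, the fair price falls from 411.9 to 402.4, so SP500 is over-valued by 9.0% (0.68 years) instead of 6.5% (0.48 years).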

### Valuate a Group of Instruments and Save Outputs

Now you know how to valuate one instrument - it is time to scale up.

The `batchValuation()` function is provided, so you don't need to write the for-loop yourself.

`batchValuation()` takes the following arguments:
* `d_instrument_data` (dict): a dictionary containing the daily price of a group of instruments. You can refer to the loaded package data at the beginning of this notebook.
* `init_parameters` (dict): parameters passed to initialize the `GrandmaStockValuation` class.
* `fit_parameters` (dict): parameters passed to `GrandmaStockValuation.fitTransform()`.
* `valuate_parameters` (dict): parameters passed to `GrandmaStockValuation.evaluateValuation()`.
* `save_result` (bool): if True, save the valuation metrics and figures to files.
* `metric_file` (str): file to store the valuation metrics.<br>If `None`, save to the default location `_output/valuation_metrics_<today>.csv`.
* `figure_folder` (str): folder to store the price chart of each instrument.<br>If `None`, save to the default folder `_output/images/`.
* `verbose` (int): 2 to print detailed information; 1 to print high-level information; 0 to suppress print.
* `printfunc` (function): function to output messages.
* Additional keyword arguments are passed to `GrandmaStockValuation.plotTrendline()`.

With `save_result=True`, the function creates the default folders if they do not exist.

`batchValuation()` will return a dataframe with the valuations, and a dictionary of the visualizations (price charts with fitted trend line).
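Conceptually, `batchValuation()` saves you a loop like the one below. This is a simplified, hypothetical sketch (`batch_valuate` and `valuate_one` are illustrative names): it skips saving files and charts, and the real function may be organized differently.

```python
import pandas as pd


def batch_valuate(d_instrument_data, valuate_one):
    """Hypothetical sketch: run one valuation per ticker and stack the metrics."""
    rows = []
    for ticker, df in d_instrument_data.items():
        metrics = valuate_one(df)  # dict of metrics for one instrument
        rows.append({"ticker": ticker, **metrics})
    return pd.DataFrame(rows)


# With this package, valuate_one could chain the calls shown earlier, e.g.:
# def valuate_one(df):
#     model = GrandmaStockValuation(recent_months=0, train_years=10, date_end=None, verbose=0)
#     model.fitTransform(df, price_col='close_adj', log=True, n_std=1.5)
#     return model.evaluateValuation(min_annual_return=0.01)

# Toy usage with a stand-in valuation function (not the real model)
toy_data = {
    "AAA": pd.DataFrame({"close_adj": [1.0, 2.0]}),
    "BBB": pd.DataFrame({"close_adj": [3.0, 6.0]}),
}
df_toy = batch_valuate(toy_data, lambda df: {"current_price": df["close_adj"].iloc[-1]})
print(df_toy)
```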


In [10]:
df_metrics, d_fig = batchValuation(
    d_instrument_data,
    init_parameters={'recent_months':0, 'train_years':10, 'date_end':None},
    fit_parameters={'price_col':'close_adj', 'log':True, 'n_std':1.5},
    valuate_parameters={'min_annual_return':0.01},
    save_result=True,
    metric_file=None,
    figure_folder=None,
    verbose=0,
    printfunc=logPrint,
    width=900, height=300
)

df_metrics

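|   | ticker | r2_train | train_years | annualized_return | current_price | fair_price | over_value_range | over_value_years |
|---|--------|----------|-------------|-------------------|---------------|------------|------------------|------------------|
| 0 | VPL | 0.890218 | 10.00274 | 0.067348 | 74.18 | 77.196246 | -0.039072 | -0.263147 |
| 1 | IVV | 0.971054 | 10.00274 | 0.134314 | 438.720001 | 411.907796 | 0.065093 | 0.48463 |
| 2 | EEMA | 0.847897 | 10.00274 | 0.065518 | 77.550003 | 79.470586 | -0.024167 | -0.158339 |
| 3 | IEV | 0.780975 | 10.00274 | 0.052644 | 50.130001 | 49.242054 | 0.018032 | 0.34253 |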


In [11]:
# Recall what are the ETF tickers
d_instrument

{'IVV': 'SP500',
 'VPL': 'Developed Asia-Pacific',
 'IEV': 'Europe',
 'EEMA': 'Emerging Asia'}

What an interesting table!
* Let's look at `annualized_return`: SP500 (IVV) had the highest annualized growth at 13.4%, followed by Asia-Pacific (VPL and EEMA) at about 6.6% to 6.7%, then Europe (IEV) at about 5.3%.
* The `over_value_range` and `over_value_years` columns tell us that, as of 2022-02-28:
    * The Asia-Pacific region (VPL and EEMA) was under-valued.
    * SP500 (IVV) was more over-valued than Europe (IEV) by range (6.5% vs. 1.8%), but given the strong historical growth of SP500, the two were over-valued at a similar level by years (0.48 vs. 0.34).
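To turn this table into a ranking, you could sort by `over_value_years`, from most under-valued to most over-valued. The sketch below re-creates the relevant columns from the output above instead of calling the package, so it runs on its own; sorting is my suggestion, not a feature of `batchValuation()`.

```python
import pandas as pd

# Values copied from the df_metrics output above
df_metrics = pd.DataFrame({
    "ticker": ["VPL", "IVV", "EEMA", "IEV"],
    "over_value_range": [-0.039072, 0.065093, -0.024167, 0.018032],
    "over_value_years": [-0.263147, 0.484630, -0.158339, 0.342530],
})

# Most under-valued first
ranked = df_metrics.sort_values("over_value_years").reset_index(drop=True)
print(ranked["ticker"].tolist())  # ['VPL', 'EEMA', 'IEV', 'IVV']
```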

<br>

Let's examine one of the price charts:


In [None]:
d_fig['EEMA']

![](./images/example_2_EEMA.png)

Now check out the valuation metrics and price charts saved in the `_output/` folder.

### Limitations

Firstly, the Grandma Stock Valuation model is most suitable for **broad ETFs** (national or regional ETFs).
* You may also try it on major industry sector ETFs.
* It is not recommended for sub-sector ETFs, individual stocks, or any product involving derivatives.

In addition, the formulation of `over_value_years` leads to the following limitations:
* It cannot handle instruments with negative annualized growth.
* It can only handle instruments with sufficient positive annualized growth, as specified by `min_annual_return` (default 1%).
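In a batch run, you may want to drop instruments the model cannot handle before using the results: those with `over_value_years` as `nan` and, optionally, those with a low `r2_train` (the 0.5 cut-off mentioned earlier is a judgment call). A hedged pandas sketch; the `XYZ` row is made up to show a case that gets filtered out.

```python
import numpy as np
import pandas as pd

df_metrics = pd.DataFrame({
    "ticker": ["IVV", "IEV", "XYZ"],  # XYZ is a made-up instrument with negative growth
    "r2_train": [0.971054, 0.780975, 0.42],
    "annualized_return": [0.134314, 0.052644, -0.03],
    "over_value_years": [0.48463, 0.34253, np.nan],
})

# Keep well-fitted instruments whose over_value_years could be computed
usable = df_metrics[(df_metrics["r2_train"] >= 0.5) & df_metrics["over_value_years"].notna()]
print(usable["ticker"].tolist())  # ['IVV', 'IEV']
```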

<br>


**WELL DONE!**

Now you can valuate a group of instruments in a quantitative manner. The next example will show you how to build a **manageable investment portfolio** based on the valuations!