# What does the `diff_cap_packages.stock_selection` module do?

Primarily, `stock_selection` takes all the low-level functions in `models_` and combines them into high-level functions that we can use for back-testing.

### Functions Explained:
- `make_and_store_linear_models`
- `load_models`
- `get_predictions`
- `get_buy_sell`
- `get_buys_from_buy_sell`
- `get_sells_from_buy_sell`
- `get_buy_sell_analysis`
- `get_predicted_overall_pct_chg`
- `get_actual_overall_pct_chg`



### Helpful Features:
- Progress bars
- Preliminary analysis
    - Was the predicted action correct?
        - How do "holds" fit into this?
    - What was the predicted percentage gain for the day?
    - What was the *actual* percentage gain for the day?

# Imports

In [1]:
from diff_cap_packages import stock_selection

In [2]:
# Debugging only: reloads the module after edits. Not needed to use stock_selection.
import importlib
importlib.reload(stock_selection)
%config Completer.use_jedi=False

# Usage

## Preparation with models

### `make_and_store_linear_models`
This function takes a significant amount of time to run (just under 5 minutes per date for me). It combines the low-level PyCaret code in the `models_` package to train models and save them into the models folder of our repository. Once the models for a date have been saved, we don't have to rerun this function; we can load them with the `load_models` function below.

Models are saved to `"models/[DATE]-linear-[NUMBER_OF_DAYS]/[STOCKID].pkl"`, but you don't need to know this to use the function.

**By default, `force_overwrite` is set to `False`, since there is little reason to overwrite models that already exist and are saved. When it is `False`, a `FileExistsError` is raised if we already have saved models for that day. You can set it to `True`, but retraining hundreds of models is usually a waste of time.**

- **ADDED FEATURE**: if the folder containing the models does not contain a file called `"completed.txt"`, the set of models is not considered complete. Running the function on an incomplete set overwrites it regardless of the `force_overwrite` value; `force_overwrite` only controls whether a completed set is overwritten. *The `completed.txt` file is added automatically after all the models are created and saved.*
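
The overwrite rules above can be sketched in a few lines. This is only an illustration of the described behavior, not the code inside `make_and_store_linear_models`: the helper names `model_dir` and `should_train` are hypothetical, and the folder layout is taken from the path pattern quoted above.

```python
from pathlib import Path
import tempfile


def model_dir(models_root, date, num_days):
    """Hypothetical helper: folder for one date, using the documented naming pattern."""
    return Path(models_root) / f"{date}-linear-{num_days}"


def should_train(folder, force_overwrite=False):
    """Hypothetical helper: return True if training should proceed."""
    folder = Path(folder)
    if not folder.exists():
        return True  # nothing saved yet
    if not (folder / "completed.txt").exists():
        return True  # incomplete set: overwritten regardless of force_overwrite
    if force_overwrite:
        return True  # complete set, but the caller asked to overwrite it
    raise FileExistsError(f"Completed models already exist in {folder}")


# Walk through the three states using a temporary folder.
with tempfile.TemporaryDirectory() as root:
    folder = model_dir(root, "2021-06-16", 200)
    print(should_train(folder))                        # folder does not exist yet
    folder.mkdir(parents=True)
    print(should_train(folder))                        # no completed.txt marker
    (folder / "completed.txt").touch()
    print(should_train(folder, force_overwrite=True))  # complete, forced overwrite
```

Calling `should_train(folder)` on the completed folder without `force_overwrite=True` raises `FileExistsError`, which matches the default behavior described above.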

In [3]:
num_days_back = 200
end_date = '2021-06-16'
start_date = stock_selection.get_start_date(end_date, num_days_back)

stock_selection.make_and_store_linear_models(start_date, end_date, force_overwrite=False)

100%|██████████| 134/134 [01:05<00:00,  2.03it/s]


### `load_models`
Use this function to load our saved, pretrained models. Ideally these are already saved in the GitHub repository under the models folder. The path to a specific model has the format `"models/[DATE]-linear-[NUMBER_OF_DAYS]/[STOCKID].pkl"`, but you don't need to know this to use the function.

If the models aren't saved, you will have to use the `make_and_store_linear_models` function above to train and save them. This takes a long time (anywhere from 5–10 minutes per date).

In [None]:
lrmodels = stock_selection.load_models("2021-06-16")

## Predictions

### `get_predictions`
This function takes the loaded models (`lrmodels`) obtained above and the date to predict from.

**This date should be the same date that was passed to `load_models`. Storing the date in a variable and reusing it avoids mismatches.**

For example, if the date given is `2021-06-29`, this function will output prediction information for every stock on `2021-06-30`.
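
To make the "next day" relationship concrete, here is a small sketch that computes the date a prediction would refer to. It assumes "next day" means the next weekday and ignores market holidays; `get_predictions` may well use a proper trading calendar, and `next_weekday` is a hypothetical helper, not part of `stock_selection`.

```python
from datetime import date, timedelta


def next_weekday(date_str):
    """Return the next Monday-Friday date after date_str (ISO format).

    Assumption: weekends are skipped; market holidays are not handled.
    """
    day = date.fromisoformat(date_str) + timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day.isoformat()


print(next_weekday("2021-06-29"))  # 2021-06-30
print(next_weekday("2021-06-18"))  # 2021-06-21 (Friday rolls over to Monday)
```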

In [None]:
predictions = stock_selection.get_predictions(lrmodels, "2021-06-16")

## Buying and short selling: which ones?

### `get_buy_sell`
This returns a `DataFrame` called `buy_sell`, which contains the aggregated information about which stocks the models are predicting to buy and which to short sell for the next day.

- The optional parameter `how_many` sets how many stocks we want to buy and how many to short sell. By default it is 5 (5 long, 5 short).
- The optional parameter `abs_vol_thresh` is set to an arbitrary 0.08 by default. Stocks predicted to move more than 8% in either direction (±8%) over the next day are filtered out of the ranking. This value can be fine-tuned, but 0.08 seems to perform well.
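
The way the two parameters interact can be sketched without the real models. The example below uses a plain dictionary of made-up predictions instead of a `DataFrame`, treats the threshold as a strict cutoff, and ranks purely by predicted change; all three are assumptions, and `pick_longs_and_shorts` is a hypothetical stand-in for the selection step inside `get_buy_sell`.

```python
def pick_longs_and_shorts(predicted_pct_chg, how_many=5, abs_vol_thresh=0.08):
    """Pick longs and shorts from {ticker: predicted fractional change}."""
    # Drop stocks whose predicted move is too large in either direction.
    kept = {t: p for t, p in predicted_pct_chg.items() if abs(p) < abs_vol_thresh}
    ranked = sorted(kept, key=kept.get, reverse=True)  # highest prediction first
    longs = ranked[:how_many]
    shorts = ranked[::-1][:how_many]                   # lowest prediction first
    return longs, shorts


# Made-up tickers and predictions, for illustration only.
preds = {"AAA": 0.031, "BBB": 0.120, "CCC": -0.015,
         "DDD": 0.007, "EEE": -0.095, "FFF": -0.022}

longs, shorts = pick_longs_and_shorts(preds, how_many=2)
print(longs)   # ['AAA', 'DDD']
print(shorts)  # ['FFF', 'CCC']
```

`BBB` and `EEE` have the most extreme predictions, but both exceed the 0.08 threshold and never reach the ranking.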

In [None]:
buy_sell = stock_selection.get_buy_sell(predictions)
buy_sell

### `get_buys_from_buy_sell`
Extracts the names of the stocks that the models say to buy.

In [None]:
stock_selection.get_buys_from_buy_sell(buy_sell)

### `get_sells_from_buy_sell`
Extracts the names of the stocks that the models say to short sell.

In [None]:
stock_selection.get_sells_from_buy_sell(buy_sell)

## Deeper analysis

### `get_buy_sell_analysis`
Returns a copy of the `buy_sell` DataFrame with additional analytical columns: `"proper action"` (what the correct move should have been for a particular stock) and `"ACCURATE"` (whether or not the predicted action was correct).

This function has a toggleable parameter, `lenient_on_holds`. When `True`, a position that did not move at all in reality (a "HOLD") is marked as a win; when `False`, it is marked as a loss. Try both settings if the difference isn't clear.
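
The effect of `lenient_on_holds` can be sketched for a single stock. The action labels `"BUY"` and `"SELL"` and the helper names are assumptions made for this example; only the "HOLD" case and the win/loss rule come from the description above.

```python
def proper_action(actual_pct_chg):
    """Hypothetical helper: the action that would have been correct in hindsight."""
    if actual_pct_chg > 0:
        return "BUY"
    if actual_pct_chg < 0:
        return "SELL"
    return "HOLD"  # the price did not move at all


def is_accurate(predicted_action, actual_pct_chg, lenient_on_holds=True):
    """Hypothetical helper: was the predicted action a win?"""
    proper = proper_action(actual_pct_chg)
    if proper == "HOLD":
        return lenient_on_holds  # a flat position counts as a win only when lenient
    return predicted_action == proper


print(is_accurate("BUY", 0.012))                        # True
print(is_accurate("SELL", 0.012))                       # False
print(is_accurate("BUY", 0.0, lenient_on_holds=True))   # True
print(is_accurate("BUY", 0.0, lenient_on_holds=False))  # False
```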

In [None]:
stock_selection.get_buy_sell_analysis(buy_sell, lenient_on_holds=True)

### `get_predicted_overall_pct_chg`
Returns the overall portfolio's predicted daily percent change.

Calculation:
(average of the predicted % gain on the long stocks + the average of the predicted % gain on the short stocks) / 2
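
As a worked example of this calculation, the sketch below applies the formula to made-up numbers. It assumes each value is already the gain of the position itself (so a profitable short is a positive number); `overall_pct_chg` is a hypothetical helper, not the library function.

```python
from statistics import mean


def overall_pct_chg(long_gains, short_gains):
    """Average the mean gain of the longs and the mean gain of the shorts."""
    return (mean(long_gains) + mean(short_gains)) / 2


# Made-up percentage gains per position, for illustration only.
long_gains = [1.0, 2.0, 3.0]    # mean = 2.0
short_gains = [0.5, 1.5, -0.5]  # mean = 0.5

print(overall_pct_chg(long_gains, short_gains))  # 1.25
```

Because the two sides are averaged separately before being combined, the long side and the short side each carry half the weight even when they hold different numbers of stocks.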

In [None]:
stock_selection.get_predicted_overall_pct_chg(buy_sell)

### `get_actual_overall_pct_chg`
Returns the overall portfolio's **actual** daily percent change.

Calculation:
(average of the actual % gain on the long stocks + the average of the actual % gain on the short stocks) / 2

In [None]:
stock_selection.get_actual_overall_pct_chg(buy_sell)