Commit e8aed66

chore(docs): typo in docs [skip ci]

HugoDelatte committed Jan 1, 2024
1 parent 23f5d5c commit e8aed66
Showing 43 changed files with 324 additions and 306 deletions.
18 changes: 10 additions & 8 deletions README.rst
@@ -90,17 +90,19 @@
 Unfortunately, it faces a number of shortcomings, including high sensitivity to
 input parameters (expected returns and covariance), weight concentration, high turnover,
 and poor out-of-sample performance.

-It is well known that naive allocation (1/N, inverse-vol, ...) tends to outperform MVO
-out-of-sample (DeMiguel, 2007).
+It is well known that naive allocation (1/N, inverse-vol, etc.) tends to outperform
+MVO out-of-sample (DeMiguel, 2007).

 Numerous approaches have been developed to alleviate these shortcomings (shrinkage,
 additional constraints, regularization, uncertainty set, higher moments, Bayesian
 approaches, coherent risk measures, left-tail risk optimization, distributionally robust
-optimization, factor model, risk-parity, hierarchical clustering, ensemble methods, ...).
+optimization, factor model, risk-parity, hierarchical clustering, ensemble methods,
+pre-selection, etc.).

 With this large number of methods, added to the fact that they can be composed together,
-there is a need for a unified framework to perform model selection, validation,
-and parameter tuning while reducing the risk of data leakage and overfitting.
+there is a need for a unified framework with a machine learning approach to perform
+model selection, validation, and parameter tuning while reducing the risk of data
+leakage and overfitting.

 This framework is built on scikit-learn's API.

@@ -128,7 +130,7 @@ Available models
     * Empirical
     * Exponentially Weighted
     * Equilibrium
-    * Shrinkage (James-Stein, Bayes-Stein, ...)
+    * Shrinkage

 * Covariance Estimator:
     * Empirical
@@ -168,12 +170,12 @@ Available models
     * Drop Highly Correlated Assets

 * Cross-Validation and Model Selection:
-    * Compatible with all `sklearn` methods (KFold, ...)
+    * Compatible with all `sklearn` methods (KFold, etc.)
     * Walk Forward
     * Combinatorial Purged Cross-Validation

 * Hyper-Parameter Tuning:
-    * Compatible with all `sklearn` methods (GridSearchCV, RandomizedSearchCV, ...)
+    * Compatible with all `sklearn` methods (GridSearchCV, RandomizedSearchCV)

 * Risk Measures:
     * Variance
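
For context on the README text above ("This framework is built on scikit-learn's API"), here is a minimal sketch of the fit/predict workflow; it assumes skfolio's `MeanRisk` optimizer and `prices_to_returns` helper with their default parameters, mirroring the project's quick-start style:

.. code-block:: python

    from sklearn.model_selection import train_test_split

    from skfolio.datasets import load_sp500_dataset
    from skfolio.optimization import MeanRisk
    from skfolio.preprocessing import prices_to_returns

    # Convert daily prices to returns, then split chronologically.
    prices = load_sp500_dataset()
    X = prices_to_returns(prices)
    X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)

    # Fit like any scikit-learn estimator; predict returns a Portfolio
    # object evaluated on the test period.
    model = MeanRisk()
    model.fit(X_train)
    portfolio = model.predict(X_test)
    print(portfolio.sharpe_ratio)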
43 changes: 23 additions & 20 deletions docs/conf.py
@@ -126,6 +126,28 @@
 myst_heading_anchors = 2
 myst_substitutions = {"rtd": "[Read the Docs](https://readthedocs.org/)"}

+# -- sphinx-favicons ------------------------------------------------------------
+favicons = [
+    {
+        "rel": "shortcut icon",
+        "type": "image/svg+xml",
+        "sizes": "any",
+        "href": "favicon.svg",
+    },
+    {
+        "rel": "icon",
+        "type": "image/svg+xml",
+        "sizes": "any",
+        "href": "favicon.svg",
+    },
+    {
+        "rel": "icon",
+        "type": "image/png",
+        "sizes": "144x144",
+        "href": "favicon.png",
+    },
+]
+
 # -- Options for HTML output -------------------------------------------------

 html_theme = "pydata_sphinx_theme"
@@ -163,26 +185,7 @@
     <a href="https://github.com/skfolio/skfolio">check out our GitHub repository.</a>
     Your contributions are welcome!</div>""",
     "secondary_sidebar_items": [],  # No secondary sidebar due to bug with plotly
-    "favicons": [
-        {
-            "rel": "shortcut icon",
-            "type": "image/svg+xml",
-            "sizes": "any",
-            "href": "favicon.svg",
-        },
-        {
-            "rel": "icon",
-            "type": "image/svg+xml",
-            "sizes": "any",
-            "href": "favicon.svg",
-        },
-        {
-            "rel": "icon",
-            "type": "image/png",
-            "sizes": "144x144",
-            "href": "favicon.png",
-        },
-    ],
 }

 html_sidebars = {
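
These two hunks move the favicon definitions from `html_theme_options` (read by the pydata theme) to a top-level `favicons` variable read by the sphinx-favicon extension. A minimal sketch of the prerequisite, assuming the extension is registered in conf.py's `extensions` list (that part of the file is not shown in this diff):

.. code-block:: python

    # Hypothetical excerpt: the top-level `favicons` list is only consumed
    # when the sphinx-favicon extension is enabled.
    extensions = [
        # ...existing extensions...
        "sphinx_favicon",
    ]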
18 changes: 10 additions & 8 deletions docs/index.rst
@@ -44,17 +44,19 @@
 Unfortunately, it faces a number of shortcomings, including high sensitivity to
 input parameters (expected returns and covariance), weight concentration, high turnover,
 and poor out-of-sample performance.

-It is well known that naive allocation (1/N, inverse-vol, ...) tends to outperform MVO
-out-of-sample (DeMiguel, 2007).
+It is well known that naive allocation (1/N, inverse-vol, etc.) tends to outperform
+MVO out-of-sample (DeMiguel, 2007).

 Numerous approaches have been developed to alleviate these shortcomings (shrinkage,
 additional constraints, regularization, uncertainty set, higher moments, Bayesian
 approaches, coherent risk measures, left-tail risk optimization, distributionally robust
-optimization, factor model, risk-parity, hierarchical clustering, ensemble methods, ...).
+optimization, factor model, risk-parity, hierarchical clustering, ensemble methods,
+pre-selection, etc.).

 With this large number of methods, added to the fact that they can be composed together,
-there is a need for a unified framework to perform model selection, validation,
-and parameter tuning while reducing the risk of data leakage and overfitting.
+there is a need for a unified framework with a machine learning approach to perform
+model selection, validation, and parameter tuning while reducing the risk of data
+leakage and overfitting.

 This framework is built on scikit-learn's API.

@@ -82,7 +84,7 @@ Available models
     * Empirical
     * Exponentially Weighted
     * Equilibrium
-    * Shrinkage (James-Stein, Bayes-Stein, ...)
+    * Shrinkage

 * Covariance Estimator:
     * Empirical
@@ -122,12 +124,12 @@ Available models
     * Drop Highly Correlated Assets

 * Cross-Validation and Model Selection:
-    * Compatible with all `sklearn` methods (KFold, ...)
+    * Compatible with all `sklearn` methods (KFold, etc.)
     * Walk Forward
     * Combinatorial Purged Cross-Validation

 * Hyper-Parameter Tuning:
-    * Compatible with all `sklearn` methods (GridSearchCV, RandomizedSearchCV, ...)
+    * Compatible with all `sklearn` methods (GridSearchCV, RandomizedSearchCV)

 * Risk Measures:
     * Variance
2 changes: 1 addition & 1 deletion docs/user_guide/cluster.rst
@@ -9,7 +9,7 @@ Clustering Estimators

 The `skfolio.cluster` module complement `sklearn.cluster` with additional clustering
 estimators including the :class:`HierarchicalClustering` that forms hierarchical
-clusters from a distance matrix. It is used in the following optimization estimators:
+clusters from a distance matrix. It is used in the following portfolio optimizations:

 * :class:`~skfolio.optimization.HierarchicalRiskParity`
 * :class:`~skfolio.optimization.HierarchicalEqualRiskContribution`
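
For context, a sketch of how the clustering estimator plugs into one of the listed optimizations; the `hierarchical_clustering_estimator` parameter name is assumed from skfolio's estimator-composition convention:

.. code-block:: python

    from skfolio.cluster import HierarchicalClustering
    from skfolio.optimization import HierarchicalRiskParity

    # The optimizer builds the asset hierarchy from a distance matrix
    # computed during fit, using the supplied clustering estimator.
    model = HierarchicalRiskParity(
        hierarchical_clustering_estimator=HierarchicalClustering()
    )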
2 changes: 1 addition & 1 deletion docs/user_guide/covariance.rst
@@ -12,7 +12,7 @@ assets.
 It follows the same API as scikit-learn's `estimator`: the `fit` method takes `X` as the
 assets returns and stores the covariance in its `covariance_` attribute.

-`X` can be any array-like structure (numpy array, pandas DataFrame, etc...)
+`X` can be any array-like structure (numpy array, pandas DataFrame, etc.)


 Available estimators are:
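
A minimal sketch of the `fit`/`covariance_` pattern described above, assuming the `EmpiricalCovariance` estimator from `skfolio.moments` and the bundled S&P 500 dataset:

.. code-block:: python

    from skfolio.datasets import load_sp500_dataset
    from skfolio.moments import EmpiricalCovariance
    from skfolio.preprocessing import prices_to_returns

    # X: assets returns (any array-like works, here a pandas DataFrame)
    X = prices_to_returns(load_sp500_dataset())

    estimator = EmpiricalCovariance()
    estimator.fit(X)
    print(estimator.covariance_.shape)  # (n_assets, n_assets)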
8 changes: 4 additions & 4 deletions docs/user_guide/data_preparation.rst
@@ -7,7 +7,7 @@
 Data Preparation
 ****************

 Most `fit` methods of `skfolio` estimators take the assets returns as input `X`.
-This means that the choice of methodology to convert prices to returns is left to the user.
+Therefore, the choice of methodology to convert prices to returns is left to the user.

 There are two different notions of return:
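
The definitions of the two notions sit in the collapsed context of this diff; as a quick illustration, they compare as follows, using a hypothetical three-day price series:

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Hypothetical daily prices (illustration only)
    prices = pd.Series([100.0, 102.0, 101.0])

    linear_returns = prices.pct_change().dropna()  # S(t+1)/S(t) - 1
    log_returns = np.log(prices).diff().dropna()   # log(S(t+1)/S(t))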

@@ -51,7 +51,7 @@ It is not uncommon to witness the following steps [1]_:

 #. Take the daily prices :math:`S_{t}, S_{t+1}, ...,` for all the n securities
 #. Transform the daily prices to daily logarithmic returns
-#. Estimate the expected retruns vector :math:`\mu` and covariance matrix :math:`\Sigma` from the daily logarithmic returns
+#. Estimate the expected returns vector :math:`\mu` and covariance matrix :math:`\Sigma` from the daily logarithmic returns
 #. Determine the investment horizon, for example k = 255 days
 #. Project the expected returns and covariance to the horizon using the square-root rule: :math:`\mu_{k} \equiv k \times \mu` and :math:`\Sigma_{k} \equiv k \times \Sigma`
 #. Compute the mean-variance efficient frontier :math:`\max_{w} \Biggl\{ w^T \mu - \lambda \times w^T \Sigma w \Biggr\}`
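
For concreteness, the projection step in the list above is a plain linear scaling of the daily estimates — a sketch with hypothetical numbers:

.. code-block:: python

    import numpy as np

    # Hypothetical daily estimates for two assets (illustration only)
    mu = np.array([0.0004, 0.0003])       # daily expected returns
    sigma = np.array([[1.0e-4, 2.0e-5],
                      [2.0e-5, 9.0e-5]])  # daily covariance

    k = 255  # investment horizon in days
    mu_k = k * mu        # square-root rule: mu_k = k * mu
    sigma_k = k * sigma  # Sigma_k = k * Sigma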
@@ -70,7 +70,7 @@ The correct approach
 ====================
 The correct general approach is the following:

-#. Find the market invariants (logarithmic return for stocks, change in yield to maturity for bonds, etc...)
+#. Find the market invariants (logarithmic return for stocks, change in yield to maturity for bonds, etc.)
 #. Estimate the joint distribution of the market invariant over the time period of estimation
 #. Project the distribution of invariants to the time period of investment
 #. Map the distribution of invariants into the distribution of security prices at the investment horizon through a pricing function
@@ -113,7 +113,7 @@ below simplified one will give very close results:

 #. Take the prices :math:`S_{t}, S_{t+1}, ...,` (for example daily) for all the n securities
 #. Transform the daily prices to daily linear returns
-#. Estimate the expected retruns vector :math:`\mu` and covariance matrix :math:`\Sigma` from the daily linear returns
+#. Estimate the expected returns vector :math:`\mu` and covariance matrix :math:`\Sigma` from the daily linear returns
 #. Compute the mean-variance efficient frontier :math:`\max_{w} \Biggl\{w^T \mu - \lambda \times w^T \Sigma w\Biggr\}`

 This simplified procedure is the default one used in all `skfolio` examples as most
20 changes: 10 additions & 10 deletions docs/user_guide/datasets.rst
@@ -18,21 +18,15 @@ directory. They are available via:
 * :func:`load_ftse100_dataset`
 * :func:`load_nasdaq_dataset`

-By default the data directory is set to a folder named 'skfolio_data' in the user home
-folder. Alternatively, it can be set by the `SKFOLIO_DATA` environment variable
+By default the data directory is set to a folder named "skfolio_data" in the user home
+folder. Alternatively, it can be set by the `SKFOLIO_DATA` environment variable.
 If the folder does not already exist, it is automatically created.


 **Example:**

-Loading the SPX 500 dataset which is composed of the daily prices of 20 assets from the
-S&P 500 composition starting from 1990-01-02 up to 2022-12-28.
-
-The data comes from the Yahoo public API.
-The price is the adjusted close which is the closing price after adjustments for
-all applicable splits and dividend distributions.
-The adjustment uses appropriate split and dividend multipliers, adhering to
-the Center for Research in Security Prices (CRSP) standards.
+Loading the SPX 500 dataset, which is composed of the daily prices of 20 assets from the
+S&P 500 composition starting from 1990-01-02 up to 2022-12-28:

 .. code-block:: python
@@ -41,3 +35,9 @@ the Center for Research in Security Prices (CRSP) standards.
     prices = load_sp500_dataset()
     print(prices.head())
+
+The data comes from the Yahoo public API.
+The price is the adjusted close which is the closing price after adjustments for
+all applicable splits and dividend distributions.
+The adjustment uses appropriate split and dividend multipliers, adhering to
+the Center for Research in Security Prices (CRSP) standards.
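
A sketch of overriding the data directory through the `SKFOLIO_DATA` environment variable mentioned above; the path is hypothetical, and the variable is assumed to take effect if set before the first download:

.. code-block:: python

    import os

    os.environ["SKFOLIO_DATA"] = "/path/to/skfolio_data"  # hypothetical path

    from skfolio.datasets import load_ftse100_dataset
    prices = load_ftse100_dataset()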
2 changes: 1 addition & 1 deletion docs/user_guide/distance.rst
@@ -13,7 +13,7 @@ It follows the same API as scikit-learn's `estimator`: the `fit` method takes `X
 assets returns and stores the codependence and distance matrix in its `codependence_`
 and `distance_` attributes.

-`X` can be any array-like structure (numpy array, pandas DataFrame, etc...)
+`X` can be any array-like structure (numpy array, pandas DataFrame, etc.)


 Available estimators are:
2 changes: 1 addition & 1 deletion docs/user_guide/expected_returns.rst
@@ -11,7 +11,7 @@ the assets.

 It follows the same API as scikit-learn's `estimator`: the `fit` method takes `X` as
 the assets returns and stores the expected returns in its `mu_` attribute.
-`X` can be any array-like structure (numpy array, pandas DataFrame, etc...)
+`X` can be any array-like structure (numpy array, pandas DataFrame, etc.)


 Available estimators are:
14 changes: 7 additions & 7 deletions docs/user_guide/hyper_parameters_tuning.rst
@@ -31,8 +31,8 @@ A search consists of:


 Two generic approaches to parameter search are provided in
-scikit-learn: for given values, `sklearn.model_selection.GridSearchCV` exhaustively considers
-all parameter combinations, while `sklearn.model_selection.RandomizedSearchCV` can sample a
+scikit-learn: for given values, `GridSearchCV` exhaustively considers
+all parameter combinations, while `RandomizedSearchCV` can sample a
 given number of candidates from a parameter space with a specified
 distribution.

@@ -42,7 +42,7 @@ After describing these tools we detail :ref:`best practices
 Exhaustive Grid Search
 **********************

-The grid search provided by `sklearn.model_selection.GridSearchCV` exhaustively generates
+The grid search provided by `GridSearchCV` exhaustively generates
 candidates from a grid of parameter values specified with the `param_grid`
 parameter. For instance, the following `param_grid`::

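The grid itself is collapsed in this view; a hypothetical illustration of the shape such a `param_grid` takes with a skfolio estimator (the parameter names are illustrative, not taken from the collapsed example):

.. code-block:: python

    # Two CVaR confidence levels crossed with two L1 penalties:
    # GridSearchCV will evaluate all four combinations.
    param_grid = {
        "cvar_beta": [0.90, 0.95],
        "l1_coef": [0.001, 0.01],
    }
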
@@ -104,7 +104,7 @@ Randomized Parameter Optimization
 While using a grid of parameter settings is currently the most widely used
 method for parameter optimization, other search methods have more
 favorable properties.
-`sklearn.model_selection.RandomizedSearchCV` implements a randomized search over parameters,
+`RandomizedSearchCV` implements a randomized search over parameters,
 where each setting is sampled from a distribution over possible parameter values.
 This has two main benefits over an exhaustive search:

@@ -182,8 +182,8 @@ Tips for Parameter Search
 Specifying an Objective Metric
 ------------------------------

-By default, all optimization estimators have the same score function which is the
-**Sharpe Ratio**. This score function can be customized with the
+By default, all portfolio optimization estimators have the same score function which is
+the **Sharpe Ratio**. This score function can be customized with
 :func:`~skfolio.metrics.make_scorer` by using another :ref:`measure <measures_ref>` or
 by writing your own score function.
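
A sketch of swapping in a different objective, assuming `make_scorer` accepts a skfolio measure such as `RatioMeasure.SORTINO_RATIO`:

.. code-block:: python

    from sklearn.model_selection import GridSearchCV

    from skfolio import RatioMeasure
    from skfolio.metrics import make_scorer
    from skfolio.optimization import MeanRisk

    # Select candidates by Sortino ratio instead of the default Sharpe ratio.
    search = GridSearchCV(
        estimator=MeanRisk(),
        param_grid={"l1_coef": [0.001, 0.01]},
        scoring=make_scorer(RatioMeasure.SORTINO_RATIO),
    )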

@@ -246,7 +246,7 @@ parameters of composite or nested estimators using a dedicated

 **Example:**

-In the below example, we search the optimal parameter `alpha` of the nested estimators
+In the below example, we search the optimal parameter `alpha` of the nested estimator
 :class:`~skfolio.moments.EWMu`:

 .. code-block:: python
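
The example's code block is collapsed in this view; a sketch of the nested syntax it describes, assuming `EWMu` is reached through `MeanRisk`'s prior estimator (the double-underscore convention is scikit-learn's):

.. code-block:: python

    from sklearn.model_selection import GridSearchCV

    from skfolio.moments import EWMu
    from skfolio.optimization import MeanRisk
    from skfolio.prior import EmpiricalPrior

    model = MeanRisk(prior_estimator=EmpiricalPrior(mu_estimator=EWMu()))
    search = GridSearchCV(
        estimator=model,
        param_grid={"prior_estimator__mu_estimator__alpha": [0.001, 0.01, 0.1]},
    )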
16 changes: 8 additions & 8 deletions docs/user_guide/model_selection.rst
@@ -6,7 +6,7 @@
 Model Selection
 ***************

-The Model Selection module extend the ``sklearn.model_selection`` by adding additional
+The Model Selection module extends `sklearn.model_selection` by adding additional
 methods tailored for portfolio selection.

.. _cross_validation:
@@ -15,7 +15,7 @@ Cross-Validation Prediction
 ***************************
 Every `skfolio` estimator is compatible with `sklearn.model_selection.cross_val_predict`.
 We also implement our own :func:`cross_val_predict` for enhanced integration
-with `Portfolio` and `Population` objects as well as compatibility with
+with `Portfolio` and `Population` objects, as well as compatibility with
 `CombinatorialPurgedCV`.

.. _data_leakage:
@@ -28,8 +28,8 @@
 training set.

 In `cross_val_predict`, the data is split according to the `cv` parameter.
-The optimization estimator is fitted on the training set and portfolios are predicted on
-the corresponding testing set.
+The portfolio optimization estimator is fitted on the training set and portfolios are
+predicted on the corresponding test set.

 For non-combinatorial cross-validation like ``Kfold``, the output is the predicted
 :class:`~skfolio.MultiPeriodPortfolio` where each
 pair (K portfolios for ``Kfold``).

 For combinatorial cross-validation like :class:`CombinatorialPurgeCV`, the output is the
 predicted :class:`~skfolio.Population` of multiple
-:class:`~skfolio.MultiPeriodPortfolio`. This is because each test outputs are a
+:class:`~skfolio.MultiPeriodPortfolio`. This is because each test output is a
 collection of multiple paths instead of one single path.

**Example:**
@@ -70,16 +70,16 @@ collection of multiple paths instead of one single path.
 Combinatorial Purged Cross-Validation
 *************************************
-Compared to ``KFold`` which split the data into k folds and generate one single testing
+Compared to `KFold`, which splits the data into k folds and generates one single testing
 path, the :class:`CombinatorialPurgedCV` uses the combination of multiple
-training/testing sets to generate multiple testing paths.
+train/test sets to generate multiple testing paths.

 To avoid data leakage, purging and embargoing can be performed.

 Purging consist of removing from the training set all observations
 whose labels overlapped in time with those labels included in the testing set.
 Embargoing consist of removing from the training set observations that immediately
-follow an observation in the testing set since financial features often incorporate
+follow an observation in the testing set, since financial features often incorporate
 series that exhibit serial correlation (like ARMA processes).

 When used with :func:`cross_val_predict`, the object returned is a
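
A sketch of the combinatorial workflow described above, assuming skfolio's `cross_val_predict` and `CombinatorialPurgedCV` with their documented defaults:

.. code-block:: python

    from skfolio.datasets import load_sp500_dataset
    from skfolio.model_selection import CombinatorialPurgedCV, cross_val_predict
    from skfolio.optimization import MeanRisk
    from skfolio.preprocessing import prices_to_returns

    X = prices_to_returns(load_sp500_dataset())

    # Each test output is a collection of paths, so the prediction is a
    # Population of MultiPeriodPortfolio objects (one per path).
    cv = CombinatorialPurgedCV(n_folds=10, n_test_folds=2)
    population = cross_val_predict(MeanRisk(), X, cv=cv)
    print(len(population))  # number of test paths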
