
Add new shrinkage estimators #20

Closed
88d52bdba0366127fffca9dfa93895 opened this issue Mar 4, 2019 · 7 comments
Labels
enhancement New feature or request

Comments

@88d52bdba0366127fffca9dfa93895
Collaborator

Hi @robertmartin8,

I'm trying to implement two of the Ledoit-Wolf estimators in Python: Sharpe's single-factor (or single-index) model and the constant-correlation model. I think they would be a good fit for your repo — is that okay, or do you have any suggestions for me?

@robertmartin8
Owner

That would be amazing! Actually, I've always wanted to write custom implementations for PyPortfolioOpt rather than just wrapping sklearn (it's an unnecessary dependency IMO), but I had a lot of trouble implementing either of the estimators you mention.

What I envisioned was a LedoitWolfShrinkage parent class which can be subclassed into SingleFactorShrinkage or ConstantCorrelationShrinkage. Please do have a go at it — I would happily accept a PR and mark you as a contributor on the documentation site as well.

To help you along, here was my attempt. I think it semi-worked, but in the end I couldn't get acceptable covariance matrices and had other things to do, so I just wrapped sklearn instead. Keep me updated!

import numpy as np


class LedoitWolfShrinkage:
    """
    Originally described in Ledoit and Wolf (2001).
    The core idea behind Ledoit Wolf Shrinkage is to find a compromise between the sample covariance
    matrix S (although it is the MLE, it has high estimation error) and a shrinkage target F.
    These are then combined with :math:`\delta F + (1-\delta)S`, where :math:`\delta` is
    the shrinkage constant which has to be estimated. The optimal shrinkage constant depends on
    F and S, which is why this module has been designed such that different shrinkage targets are
    subclasses of LedoitWolfShrinkage.

    https://epublications.bond.edu.au/cgi/viewcontent.cgi?article=1099&context=ejsie

    The default is constant correlation shrinkage, as per Ledoit and Wolf 2004.
    """

    # TODO implement a default identity shrinkage

    def __init__(self, prices):
        self.X = prices.pct_change().dropna(how="all")
        self.S = self.X.cov()  # sample cov is the MLE
        self.F = None  # default shrinkage target
        self.delta = None  # shrinkage constant

    def _shrinkage_target(self):
        """
        Calculates the constant correlation shrinkage target, as per Ledoit and Wolf 2004.
        All pairwise correlations are assumed to be equal to the mean of all correlations,
        and the diagonal is just the std of each asset's returns.
        Note that these functions particularly depend on X being a pandas DataFrame,
        because it provides methods that are robust to missing data.
        :return: constant correlation shrinkage target
        :rtype: np.array
        """
        T, N = self.X.shape
        correlation_matrix = self.X.corr()
        # r_bar is the average pairwise correlation (excluding diagonal)
        r_bar = (np.sum(correlation_matrix.values) - N) / (N * (N - 1))
        # Use numpy's broadcasting to calculate s_i * s_j for each pair of assets,
        # so that f_ij = r_bar * s_i * s_j = r_bar * sqrt(s_ii * s_jj)
        std = self.X.std().values
        F = r_bar * np.outer(std, std)
        # f_ii = s_ii (the sample variances, not the standard deviations)
        np.fill_diagonal(F, std ** 2)
        self.F = F
        return self.F, r_bar

    def _pi_matrix(self):
        """
        Estimates the asymptotic variances of the scaled entries of the sample covariance matrix.
        This is used to calculate pi and rho.
        :return: scaled asymptotic variances, in an NxN array where N is the number of assets.
        :rtype: np.array
        """

        T, N = self.X.shape
        Xc = np.array(self.X - self.X.mean())  # centre X
        Xc = np.nan_to_num(Xc)
        S = np.array(self.S)
        pi_matrix = np.zeros((N, N))
        for i in range(N):
            for j in range(i, N):
                pi_matrix[i, j] = pi_matrix[j, i] = np.sum(
                    (Xc[:, i] * Xc[:, j] - S[i, j]) ** 2
                )
        return pi_matrix / T

    def _pi(self):
        """
        Estimates the sum of asymptotic variances of the scaled entries of the
        sample covariance matrix. One of the three components of the optimal shrinkage
        constant
        :return: pi(hat)
        :rtype: float
        """
        pi_matrix = self._pi_matrix()
        return np.sum(pi_matrix)

    def _rho(self):
        """
        Estimates rho, the sum of the asymptotic covariances between the entries of
        the shrinkage target and the entries of the sample covariance matrix
        (constant-correlation case, as per Ledoit and Wolf 2004). One of the three
        components of the optimal shrinkage constant.
        :return: rho(hat)
        :rtype: float
        """
        Xc = self.X - self.X.mean()
        T, N = self.X.shape

        pi_diag = np.trace(self._pi_matrix())
        _, r_bar = self._shrinkage_target()

        asycov = 0

        for i in range(N):
            for j in range(N):
                if j == i:
                    continue

                theta_ii = np.sum(
                    (Xc.iloc[:, i] ** 2 - self.S.iloc[i, i])
                    * (Xc.iloc[:, i] * Xc.iloc[:, j] - self.S.iloc[i, j])
                )
                theta_jj = np.sum(
                    (Xc.iloc[:, j] ** 2 - self.S.iloc[j, j])
                    * (Xc.iloc[:, i] * Xc.iloc[:, j] - self.S.iloc[i, j])
                )

                first_term = np.sqrt(
                    self.S.iloc[j, j] / self.S.iloc[i, i]) * theta_ii
                second_term = np.sqrt(
                    self.S.iloc[i, i] / self.S.iloc[j, j]) * theta_jj
                asycov += first_term + second_term

        return pi_diag + r_bar / 2 * asycov / T

    def _gamma(self):
        """
        Gamma measures the misspecification of the shrinkage target: the sum of
        element-wise squared distances between the shrinkage target matrix and the
        sample covariance matrix (i.e. the squared Frobenius norm of F - S).
        :return: gamma
        :rtype: float
        """
        if self.F is None:
            self._shrinkage_target()
        S = np.array(self.S)
        return np.sum((self.F - S) ** 2)

    def _optimal_shrinkage_constant(self):
        """
        Calculates the optimal value of the shrinkage constant, as per Ledoit and Wolf. This
        is truncated to [0, 1], in the rare case that k/T is outside of this range.
        :return: optimal shrinkage constant
        :rtype: float between 0 and 1.
        """
        T = self.X.shape[0]
        pi = self._pi()
        rho = self._rho()
        gamma = self._gamma()
        kappa = (pi - rho) / gamma
        self.delta = max(0, min(kappa / T, 1))
        return self.delta

    def shrink(self):
        """
        Computes the shrunk covariance matrix, delta * F + (1 - delta) * S.
        :return: shrunk covariance matrix
        :rtype: pd.DataFrame
        """
        if self.delta is None:
            self._optimal_shrinkage_constant()
        # Return shrunk matrix
        S_hat = self.delta * self.F + (1 - self.delta) * self.S
        return S_hat
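
For clarity, the constant-correlation target that `_shrinkage_target` is aiming for can be sketched as a stand-alone function — a minimal numpy/pandas sketch of the Ledoit-Wolf 2004 formula, not the PyPortfolioOpt API:

```python
import numpy as np
import pandas as pd


def constant_correlation_target(returns: pd.DataFrame) -> np.ndarray:
    """Constant-correlation shrinkage target F (Ledoit & Wolf 2004):
    f_ii = s_ii (the sample variances), f_ij = r_bar * sqrt(s_ii * s_jj),
    where r_bar is the average pairwise correlation."""
    N = returns.shape[1]
    corr = returns.corr().values
    # average pairwise correlation, excluding the diagonal of ones
    r_bar = (corr.sum() - N) / (N * (N - 1))
    std = returns.std().values
    # off-diagonals: r_bar * s_i * s_j via an outer product
    F = r_bar * np.outer(std, std)
    # diagonal: the sample variances
    np.fill_diagonal(F, std ** 2)
    return F


rng = np.random.default_rng(0)
rets = pd.DataFrame(rng.normal(0, 0.01, size=(250, 4)))
F = constant_correlation_target(rets)
# F is symmetric, with the sample variances on the diagonal
```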

@robertmartin8 robertmartin8 added the enhancement (New feature or request) label Mar 4, 2019
@88d52bdba0366127fffca9dfa93895
Collaborator Author

IMO sklearn's shrinkage estimators are very well optimized, so we don't need to re-implement them. I will look at your attempt and go from there.

@robertmartin8
Owner

The problem is that they only implement identity shrinkage, whereas ideally I want to offer constant-correlation and single-index shrinkage as well.
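
For context, sklearn's `LedoitWolf` shrinks toward a scaled identity target mu * I, where mu is the mean of the sample eigenvalues. Conceptually it reduces to the sketch below (numpy only; note this uses a fixed shrinkage constant for illustration, whereas sklearn estimates the optimal one from the data):

```python
import numpy as np


def shrink_to_identity(X: np.ndarray, delta: float) -> np.ndarray:
    """Shrink the sample covariance of X (T x N observations) toward mu * I,
    where mu = trace(S) / N is the mean of the sample eigenvalues.
    delta is a given shrinkage constant in [0, 1]; estimating the optimal
    delta (which sklearn does) is deliberately skipped here."""
    T, N = X.shape
    S = np.cov(X, rowvar=False)  # sample covariance, N x N
    mu = np.trace(S) / N
    return delta * mu * np.eye(N) + (1 - delta) * S


rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
S_shrunk = shrink_to_identity(X, delta=0.2)
```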

@88d52bdba0366127fffca9dfa93895
Collaborator Author

Just keep them in the wrapper, that's okay. I found that this repo (https://github.com/jasonstrimpel/covshrink/blob/master/core/portfolio.py#L417) ports the authors' Matlab code to Python; I will verify it and follow your codebase to add the new shrinkage models.
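
For reference, the single-index target from Ledoit and Wolf (2003) fits the market model x_it = a_i + b_i * x_mt + e_it, then sets f_ij = b_i * b_j * var(x_m) off the diagonal and f_ii = s_ii on it. A minimal sketch, with the assumption (hypothetical, for self-containment) that the market return is the equal-weighted average of the asset returns — the paper uses an actual market index:

```python
import numpy as np
import pandas as pd


def single_index_target(returns: pd.DataFrame) -> np.ndarray:
    """Single-index shrinkage target (Ledoit & Wolf 2003):
    off-diagonals from the market model, diagonal = sample variances.
    The market proxy here is the equal-weighted mean of the assets."""
    market = returns.mean(axis=1)
    var_m = market.var()
    # OLS slope of each asset on the market: cov(x_i, x_m) / var(x_m)
    betas = returns.apply(lambda col: col.cov(market)) / var_m
    # f_ij = b_i * b_j * var(x_m) for i != j
    F = var_m * np.outer(betas, betas)
    # f_ii = s_ii (the sample variances)
    np.fill_diagonal(F, returns.var().values)
    return F


rng = np.random.default_rng(1)
rets = pd.DataFrame(rng.normal(0, 0.02, size=(300, 3)))
F = single_index_target(rets)
```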

@robertmartin8
Owner

@88d52bdba0366127fffca9dfa93895

I've just merged the PR – the code looks excellent!

Also, could you share links to all the sources you used? This will be very important when I get round to writing the documentation. For instance, where is the Matlab code from?

@robertmartin8
Owner

Brilliant. I'll revisit this in summer and add docs and examples. Thanks for all your contributions!
