## LogisticRegressionCV source code
```python
class LogisticRegressionCV(LogisticRegression, LinearClassifierMixin, BaseEstimator):
    """Logistic Regression CV (aka logit, MaxEnt) classifier.

    See glossary entry for :term:`cross-validation estimator`.

    This class implements logistic regression using liblinear, newton-cg, sag
    or lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2
    regularization with primal formulation. The liblinear solver supports both
    L1 and L2 regularization, with a dual formulation only for the L2 penalty.
    Elastic-Net penalty is only supported by the saga solver.

    For the grid of `Cs` values and `l1_ratios` values, the best hyperparameter
    is selected by the cross-validator
    :class:`~sklearn.model_selection.StratifiedKFold`, but it can be changed
    using the :term:`cv` parameter. The 'newton-cg', 'sag', 'saga' and 'lbfgs'
    solvers can warm-start the coefficients (see :term:`Glossary<warm_start>`).

    Read more in the :ref:`User Guide <logistic_regression>`.

    Parameters
    ----------
    Cs : int or list of floats, default=10
        Each of the values in Cs describes the inverse of regularization
        strength. If Cs is an int, then a grid of Cs values is chosen
        in a logarithmic scale between 1e-4 and 1e4.
        Like in support vector machines, smaller values specify stronger
        regularization.

    fit_intercept : bool, default=True
        Specifies if a constant (a.k.a. bias or intercept) should be
        added to the decision function.

    cv : int or cross-validation generator, default=None
        The default cross-validation generator used is Stratified K-Folds.
        If an integer is provided, then it is the number of folds used.
        See the module :mod:`sklearn.model_selection` module for the
        list of possible cross-validation objects.

        .. versionchanged:: 0.22
            ``cv`` default value if None changed from 3-fold to 5-fold.

    dual : bool, default=False
        Dual (constrained) or primal (regularized, see also
        :ref:`this equation <regularized-logistic-loss>`) formulation. Dual formulation
        is only implemented for l2 penalty with liblinear solver. Prefer dual=False when
        n_samples > n_features.

    penalty : {'l1', 'l2', 'elasticnet'}, default='l2'
        Specify the norm of the penalty:

        - `'l2'`: add a L2 penalty term (used by default);
        - `'l1'`: add a L1 penalty term;
        - `'elasticnet'`: both L1 and L2 penalty terms are added.

        .. warning::
           Some penalties may not work with some solvers. See the parameter
           `solver` below, to know the compatibility between the penalty and
           solver.

    scoring : str or callable, default=None
        A string (see :ref:`scoring_parameter`) or
        a scorer callable object / function with signature
        ``scorer(estimator, X, y)``. For a list of scoring functions
        that can be used, look at :mod:`sklearn.metrics`. The
        default scoring option used is 'accuracy'.

    solver : {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, \
            default='lbfgs'

        Algorithm to use in the optimization problem. Default is 'lbfgs'.
        To choose a solver, you might want to consider the following aspects:

        - For small datasets, 'liblinear' is a good choice, whereas 'sag'
          and 'saga' are faster for large ones;
        - For multiclass problems, all solvers except 'liblinear' minimize the full
          multinomial loss;
        - 'liblinear' might be slower in :class:`LogisticRegressionCV`
          because it does not handle warm-starting.
        - 'liblinear' can only handle binary classification by default. To apply a
          one-versus-rest scheme for the multiclass setting one can wrap it with the
          :class:`~sklearn.multiclass.OneVsRestClassifier`.
        - 'newton-cholesky' is a good choice for
          `n_samples` >> `n_features * n_classes`, especially with one-hot encoded
          categorical features with rare categories. Be aware that the memory usage
          of this solver has a quadratic dependency on `n_features * n_classes`
          because it explicitly computes the full Hessian matrix.

        .. warning::
           The choice of the algorithm depends on the penalty chosen and on
           (multinomial) multiclass support:

           ================= ============================== ======================
           solver            penalty                        multinomial multiclass
           ================= ============================== ======================
           'lbfgs'           'l2'                           yes
           'liblinear'       'l1', 'l2'                     no
           'newton-cg'       'l2'                           yes
           'newton-cholesky' 'l2',                          no
           'sag'             'l2',                          yes
           'saga'            'elasticnet', 'l1', 'l2'       yes
           ================= ============================== ======================

        .. note::
           'sag' and 'saga' fast convergence is only guaranteed on features
           with approximately the same scale. You can preprocess the data with
           a scaler from :mod:`sklearn.preprocessing`.

        .. versionadded:: 0.17
           Stochastic Average Gradient descent solver.
        .. versionadded:: 0.19
           SAGA solver.
        .. versionadded:: 1.2
           newton-cholesky solver.

    tol : float, default=1e-4
        Tolerance for stopping criteria.

    max_iter : int, default=100
        Maximum number of iterations of the optimization algorithm.

    class_weight : dict or 'balanced', default=None
        Weights associated with classes in the form ``{class_label: weight}``.
        If not given, all classes are supposed to have weight one.

        The "balanced" mode uses the values of y to automatically adjust
        weights inversely proportional to class frequencies in the input data
        as ``n_samples / (n_classes * np.bincount(y))``.

        Note that these weights will be multiplied with sample_weight (passed
        through the fit method) if sample_weight is specified.

        .. versionadded:: 0.17
           class_weight == 'balanced'

    n_jobs : int, default=None
        Number of CPU cores used during the cross-validation loop.
        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
        for more details.

    verbose : int, default=0
        For the 'liblinear', 'sag' and 'lbfgs' solvers set verbose to any
        positive number for verbosity.

    refit : bool, default=True
        If set to True, the scores are averaged across all folds, and the
        coefs and the C that corresponds to the best score is taken, and a
        final refit is done using these parameters.
        Otherwise the coefs, intercepts and C that correspond to the
        best scores across folds are averaged.

    intercept_scaling : float, default=1
        Useful only when the solver 'liblinear' is used
        and self.fit_intercept is set to True. In this case, x becomes
        [x, self.intercept_scaling],
        i.e. a "synthetic" feature with constant value equal to
        intercept_scaling is appended to the instance vector.
        The intercept becomes ``intercept_scaling * synthetic_feature_weight``.

        Note! the synthetic feature weight is subject to l1/l2 regularization
        as all other features.
        To lessen the effect of regularization on synthetic feature weight
        (and therefore on the intercept) intercept_scaling has to be increased.

    multi_class : {'auto', 'ovr', 'multinomial'}, default='auto'
        If the option chosen is 'ovr', then a binary problem is fit for each
        label. For 'multinomial' the loss minimised is the multinomial loss fit
        across the entire probability distribution, *even when the data is
        binary*. 'multinomial' is unavailable when solver='liblinear'.
        'auto' selects 'ovr' if the data is binary, or if solver='liblinear',
        and otherwise selects 'multinomial'.

        .. versionadded:: 0.18
           Stochastic Average Gradient descent solver for 'multinomial' case.
        .. versionchanged:: 0.22
            Default changed from 'ovr' to 'auto' in 0.22.
        .. deprecated:: 1.5
           ``multi_class`` was deprecated in version 1.5 and will be removed in 1.7.
           From then on, the recommended 'multinomial' will always be used for
           `n_classes >= 3`.
           Solvers that do not support 'multinomial' will raise an error.
           Use `sklearn.multiclass.OneVsRestClassifier(LogisticRegressionCV())` if you
           still want to use OvR.

    random_state : int, RandomState instance, default=None
        Used when `solver='sag'`, 'saga' or 'liblinear' to shuffle the data.
        Note that this only applies to the solver and not the cross-validation
        generator. See :term:`Glossary <random_state>` for details.

    l1_ratios : list of float, default=None
        The list of Elastic-Net mixing parameter, with ``0 <= l1_ratio <= 1``.
        Only used if ``penalty='elasticnet'``. A value of 0 is equivalent to
        using ``penalty='l2'``, while 1 is equivalent to using
        ``penalty='l1'``. For ``0 < l1_ratio <1``, the penalty is a combination
        of L1 and L2.

    Attributes
    ----------
    classes_ : ndarray of shape (n_classes, )
        A list of class labels known to the classifier.

    coef_ : ndarray of shape (1, n_features) or (n_classes, n_features)
        Coefficient of the features in the decision function.

        `coef_` is of shape (1, n_features) when the given problem
        is binary.

    intercept_ : ndarray of shape (1,) or (n_classes,)
        Intercept (a.k.a. bias) added to the decision function.

        If `fit_intercept` is set to False, the intercept is set to zero.
        `intercept_` is of shape(1,) when the problem is binary.

    Cs_ : ndarray of shape (n_cs)
        Array of C i.e. inverse of regularization parameter values used
        for cross-validation.

    l1_ratios_ : ndarray of shape (n_l1_ratios)
        Array of l1_ratios used for cross-validation. If no l1_ratio is used
        (i.e. penalty is not 'elasticnet'), this is set to ``[None]``

    coefs_paths_ : ndarray of shape (n_folds, n_cs, n_features) or \
                   (n_folds, n_cs, n_features + 1)
        dict with classes as the keys, and the path of coefficients obtained
        during cross-validating across each fold and then across each Cs
        after doing an OvR for the corresponding class as values.
        If the 'multi_class' option is set to 'multinomial', then
        the coefs_paths are the coefficients corresponding to each class.
        Each dict value has shape ``(n_folds, n_cs, n_features)`` or
        ``(n_folds, n_cs, n_features + 1)`` depending on whether the
        intercept is fit or not. If ``penalty='elasticnet'``, the shape is
        ``(n_folds, n_cs, n_l1_ratios_, n_features)`` or
        ``(n_folds, n_cs, n_l1_ratios_, n_features + 1)``.

    scores_ : dict
        dict with classes as the keys, and the values as the
        grid of scores obtained during cross-validating each fold, after doing
        an OvR for the corresponding class. If the 'multi_class' option
        given is 'multinomial' then the same scores are repeated across
        all classes, since this is the multinomial class. Each dict value
        has shape ``(n_folds, n_cs)`` or ``(n_folds, n_cs, n_l1_ratios)`` if
        ``penalty='elasticnet'``.

    C_ : ndarray of shape (n_classes,) or (n_classes - 1,)
        Array of C that maps to the best scores across every class. If refit is
        set to False, then for each class, the best C is the average of the
        C's that correspond to the best scores for each fold.
        `C_` is of shape(n_classes,) when the problem is binary.

    l1_ratio_ : ndarray of shape (n_classes,) or (n_classes - 1,)
        Array of l1_ratio that maps to the best scores across every class. If
        refit is set to False, then for each class, the best l1_ratio is the
        average of the l1_ratio's that correspond to the best scores for each
        fold.  `l1_ratio_` is of shape(n_classes,) when the problem is binary.

    n_iter_ : ndarray of shape (n_classes, n_folds, n_cs) or (1, n_folds, n_cs)
        Actual number of iterations for all classes, folds and Cs.
        In the binary or multinomial cases, the first dimension is equal to 1.
        If ``penalty='elasticnet'``, the shape is ``(n_classes, n_folds,
        n_cs, n_l1_ratios)`` or ``(1, n_folds, n_cs, n_l1_ratios)``.

    n_features_in_ : int
        Number of features seen during :term:`fit`.

        .. versionadded:: 0.24

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during :term:`fit`. Defined only when `X`
        has feature names that are all strings.

        .. versionadded:: 1.0

    See Also
    --------
    LogisticRegression : Logistic regression without tuning the
        hyperparameter `C`.

    Examples
    --------
    >>> from sklearn.datasets import load_iris
    >>> from sklearn.linear_model import LogisticRegressionCV
    >>> X, y = load_iris(return_X_y=True)
    >>> clf = LogisticRegressionCV(cv=5, random_state=0).fit(X, y)
    >>> clf.predict(X[:2, :])
    array([0, 0])
    >>> clf.predict_proba(X[:2, :]).shape
    (2, 3)
    >>> clf.score(X, y)
    0.98...
    """

    _parameter_constraints: dict = {**LogisticRegression._parameter_constraints}

    for param in ["C", "warm_start", "l1_ratio"]:
        _parameter_constraints.pop(param)

    _parameter_constraints.update(
        {
            "Cs": [Interval(Integral, 1, None, closed="left"), "array-like"],
            "cv": ["cv_object"],
            "scoring": [StrOptions(set(get_scorer_names())), callable, None],
            "l1_ratios": ["array-like", None],
            "refit": ["boolean"],
            "penalty": [StrOptions({"l1", "l2", "elasticnet"})],
        }
    )

    def __init__(
        self,
        *,
        Cs=10,
        fit_intercept=True,
        cv=None,
        dual=False,
        penalty="l2",
        scoring=None,
        solver="lbfgs",
        tol=1e-4,
        max_iter=100,
        class_weight=None,
        n_jobs=None,
        verbose=0,
        refit=True,
        intercept_scaling=1.0,
        multi_class="deprecated",
        random_state=None,
        l1_ratios=None,
    ):
        self.Cs = Cs
        self.fit_intercept = fit_intercept
        self.cv = cv
        self.dual = dual
        self.penalty = penalty
        self.scoring = scoring
        self.tol = tol
        self.max_iter = max_iter
        self.class_weight = class_weight
        self.n_jobs = n_jobs
        self.verbose = verbose
        self.solver = solver
        self.refit = refit
        self.intercept_scaling = intercept_scaling
        self.multi_class = multi_class
        self.random_state = random_state
        self.l1_ratios = l1_ratios

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(self, X, y, sample_weight=None, **params):
        """Fit the model according to the given training data.

        Parameters
        ----------
        X : {array-like, sparse matrix} of shape (n_samples, n_features)
            Training vector, where `n_samples` is the number of samples and
            `n_features` is the number of features.

        y : array-like of shape (n_samples,)
            Target vector relative to X.

        sample_weight : array-like of shape (n_samples,) default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.

        **params : dict
            Parameters to pass to the underlying splitter and scorer.

            .. versionadded:: 1.4

        Returns
        -------
        self : object
            Fitted LogisticRegressionCV estimator.
        """
        _raise_for_params(params, self, "fit")

        solver = _check_solver(self.solver, self.penalty, self.dual)

        if self.penalty == "elasticnet":
            if (
                self.l1_ratios is None
                or len(self.l1_ratios) == 0
                or any(
                    (
                        not isinstance(l1_ratio, numbers.Number)
                        or l1_ratio < 0
                        or l1_ratio > 1
                    )
                    for l1_ratio in self.l1_ratios
                )
            ):
                raise ValueError(
                    "l1_ratios must be a list of numbers between "
                    "0 and 1; got (l1_ratios=%r)" % self.l1_ratios
                )
            l1_ratios_ = self.l1_ratios
        else:
            if self.l1_ratios is not None:
                warnings.warn(
                    "l1_ratios parameter is only used when penalty "
                    "is 'elasticnet'. Got (penalty={})".format(self.penalty)
                )

            l1_ratios_ = [None]

        X, y = validate_data(
            self,
            X,
            y,
            accept_sparse="csr",
            dtype=np.float64,
            order="C",
            accept_large_sparse=solver not in ["liblinear", "sag", "saga"],
        )
        check_classification_targets(y)

        class_weight = self.class_weight

        # Encode for string labels
        label_encoder = LabelEncoder().fit(y)
        y = label_encoder.transform(y)
        if isinstance(class_weight, dict):
            class_weight = {
                label_encoder.transform([cls])[0]: v for cls, v in class_weight.items()
            }

        # The original class labels
        classes = self.classes_ = label_encoder.classes_
        encoded_labels = label_encoder.transform(label_encoder.classes_)

        # TODO(1.7) remove multi_class
        multi_class = self.multi_class
        if self.multi_class == "multinomial" and len(self.classes_) == 2:
            warnings.warn(
                (
                    "'multi_class' was deprecated in version 1.5 and will be removed in"
                    " 1.7. From then on, binary problems will be fit as proper binary "
                    " logistic regression models (as if multi_class='ovr' were set)."
                    " Leave it to its default value to avoid this warning."
                ),
                FutureWarning,
            )
        elif self.multi_class in ("multinomial", "auto"):
            warnings.warn(
                (
                    "'multi_class' was deprecated in version 1.5 and will be removed in"
                    " 1.7. From then on, it will always use 'multinomial'."
                    " Leave it to its default value to avoid this warning."
                ),
                FutureWarning,
            )
        elif self.multi_class == "ovr":
            warnings.warn(
                (
                    "'multi_class' was deprecated in version 1.5 and will be removed in"
                    " 1.7. Use OneVsRestClassifier(LogisticRegressionCV(..)) instead."
                    " Leave it to its default value to avoid this warning."
                ),
                FutureWarning,
            )
        else:
            # Set to old default value.
            multi_class = "auto"
        multi_class = _check_multi_class(multi_class, solver, len(classes))

        if solver in ["sag", "saga"]:
            max_squared_sum = row_norms(X, squared=True).max()
        else:
            max_squared_sum = None

        if _routing_enabled():
            routed_params = process_routing(
                self,
                "fit",
                sample_weight=sample_weight,
                **params,
            )
        else:
            routed_params = Bunch()
            routed_params.splitter = Bunch(split={})
            routed_params.scorer = Bunch(score=params)
            if sample_weight is not None:
                routed_params.scorer.score["sample_weight"] = sample_weight

        # init cross-validation generator
        cv = check_cv(self.cv, y, classifier=True)
        folds = list(cv.split(X, y, **routed_params.splitter.split))

        # Use the label encoded classes
        n_classes = len(encoded_labels)

        if n_classes < 2:
            raise ValueError(
                "This solver needs samples of at least 2 classes"
                " in the data, but the data contains only one"
                " class: %r" % classes[0]
            )

        if n_classes == 2:
            # OvR in case of binary problems is as good as fitting
            # the higher label
            n_classes = 1
            encoded_labels = encoded_labels[1:]
            classes = classes[1:]

        # We need this hack to iterate only once over labels, in the case of
        # multi_class = multinomial, without changing the value of the labels.
        if multi_class == "multinomial":
            iter_encoded_labels = iter_classes = [None]
        else:
            iter_encoded_labels = encoded_labels
            iter_classes = classes

        # compute the class weights for the entire dataset y
        if class_weight == "balanced":
            class_weight = compute_class_weight(
                class_weight, classes=np.arange(len(self.classes_)), y=y
            )
            class_weight = dict(enumerate(class_weight))

        path_func = delayed(_log_reg_scoring_path)

        # The SAG solver releases the GIL so it's more efficient to use
        # threads for this solver.
        if self.solver in ["sag", "saga"]:
            prefer = "threads"
        else:
            prefer = "processes"

        fold_coefs_ = Parallel(n_jobs=self.n_jobs, verbose=self.verbose, prefer=prefer)(
            path_func(
                X,
                y,
                train,
                test,
                pos_class=label,
                Cs=self.Cs,
                fit_intercept=self.fit_intercept,
                penalty=self.penalty,
                dual=self.dual,
                solver=solver,
                tol=self.tol,
                max_iter=self.max_iter,
                verbose=self.verbose,
                class_weight=class_weight,
                scoring=self.scoring,
                multi_class=multi_class,
                intercept_scaling=self.intercept_scaling,
                random_state=self.random_state,
                max_squared_sum=max_squared_sum,
                sample_weight=sample_weight,
                l1_ratio=l1_ratio,
                score_params=routed_params.scorer.score,
            )
            for label in iter_encoded_labels
            for train, test in folds
            for l1_ratio in l1_ratios_
        )

        # _log_reg_scoring_path will output different shapes depending on the
        # multi_class param, so we need to reshape the outputs accordingly.
        # Cs is of shape (n_classes . n_folds . n_l1_ratios, n_Cs) and all the
        # rows are equal, so we just take the first one.
        # After reshaping,
        # - scores is of shape (n_classes, n_folds, n_Cs . n_l1_ratios)
        # - coefs_paths is of shape
        #  (n_classes, n_folds, n_Cs . n_l1_ratios, n_features)
        # - n_iter is of shape
        #  (n_classes, n_folds, n_Cs . n_l1_ratios) or
        #  (1, n_folds, n_Cs . n_l1_ratios)
        coefs_paths, Cs, scores, n_iter_ = zip(*fold_coefs_)
        self.Cs_ = Cs[0]
        if multi_class == "multinomial":
            coefs_paths = np.reshape(
                coefs_paths,
                (len(folds), len(l1_ratios_) * len(self.Cs_), n_classes, -1),
            )
            # equiv to coefs_paths = np.moveaxis(coefs_paths, (0, 1, 2, 3),
            #                                                 (1, 2, 0, 3))
            coefs_paths = np.swapaxes(coefs_paths, 0, 1)
            coefs_paths = np.swapaxes(coefs_paths, 0, 2)
            self.n_iter_ = np.reshape(
                n_iter_, (1, len(folds), len(self.Cs_) * len(l1_ratios_))
            )
            # repeat same scores across all classes
            scores = np.tile(scores, (n_classes, 1, 1))
        else:
            coefs_paths = np.reshape(
                coefs_paths,
                (n_classes, len(folds), len(self.Cs_) * len(l1_ratios_), -1),
            )
            self.n_iter_ = np.reshape(
                n_iter_, (n_classes, len(folds), len(self.Cs_) * len(l1_ratios_))
            )
        scores = np.reshape(scores, (n_classes, len(folds), -1))
        self.scores_ = dict(zip(classes, scores))
        self.coefs_paths_ = dict(zip(classes, coefs_paths))

        self.C_ = list()
        self.l1_ratio_ = list()
        self.coef_ = np.empty((n_classes, X.shape[1]))
        self.intercept_ = np.zeros(n_classes)
        for index, (cls, encoded_label) in enumerate(
            zip(iter_classes, iter_encoded_labels)
        ):
            if multi_class == "ovr":
                scores = self.scores_[cls]
                coefs_paths = self.coefs_paths_[cls]
            else:
                # For multinomial, all scores are the same across classes
                scores = scores[0]
                # coefs_paths will keep its original shape because
                # logistic_regression_path expects it this way

            if self.refit:
                # best_index is between 0 and (n_Cs . n_l1_ratios - 1)
                # for example, with n_cs=2 and n_l1_ratios=3
                # the layout of scores is
                # [c1, c2, c1, c2, c1, c2]
                #   l1_1 ,  l1_2 ,  l1_3
                best_index = scores.sum(axis=0).argmax()

                best_index_C = best_index % len(self.Cs_)
                C_ = self.Cs_[best_index_C]
                self.C_.append(C_)

                best_index_l1 = best_index // len(self.Cs_)
                l1_ratio_ = l1_ratios_[best_index_l1]
                self.l1_ratio_.append(l1_ratio_)

                if multi_class == "multinomial":
                    coef_init = np.mean(coefs_paths[:, :, best_index, :], axis=1)
                else:
                    coef_init = np.mean(coefs_paths[:, best_index, :], axis=0)

                # Note that y is label encoded and hence pos_class must be
                # the encoded label / None (for 'multinomial')
                w, _, _ = _logistic_regression_path(
                    X,
                    y,
                    pos_class=encoded_label,
                    Cs=[C_],
                    solver=solver,
                    fit_intercept=self.fit_intercept,
                    coef=coef_init,
                    max_iter=self.max_iter,
                    tol=self.tol,
                    penalty=self.penalty,
                    class_weight=class_weight,
                    multi_class=multi_class,
                    verbose=max(0, self.verbose - 1),
                    random_state=self.random_state,
                    check_input=False,
                    max_squared_sum=max_squared_sum,
                    sample_weight=sample_weight,
                    l1_ratio=l1_ratio_,
                )
                w = w[0]

            else:
                # Take the best scores across every fold and the average of
                # all coefficients corresponding to the best scores.
                best_indices = np.argmax(scores, axis=1)
                if multi_class == "ovr":
                    w = np.mean(
                        [coefs_paths[i, best_indices[i], :] for i in range(len(folds))],
                        axis=0,
                    )
                else:
                    w = np.mean(
                        [
                            coefs_paths[:, i, best_indices[i], :]
                            for i in range(len(folds))
                        ],
                        axis=0,
                    )

                best_indices_C = best_indices % len(self.Cs_)
                self.C_.append(np.mean(self.Cs_[best_indices_C]))

                if self.penalty == "elasticnet":
                    best_indices_l1 = best_indices // len(self.Cs_)
                    self.l1_ratio_.append(np.mean(l1_ratios_[best_indices_l1]))
                else:
                    self.l1_ratio_.append(None)

            if multi_class == "multinomial":
                self.C_ = np.tile(self.C_, n_classes)
                self.l1_ratio_ = np.tile(self.l1_ratio_, n_classes)
                self.coef_ = w[:, : X.shape[1]]
                if self.fit_intercept:
                    self.intercept_ = w[:, -1]
            else:
                self.coef_[index] = w[: X.shape[1]]
                if self.fit_intercept:
                    self.intercept_[index] = w[-1]

        self.C_ = np.asarray(self.C_)
        self.l1_ratio_ = np.asarray(self.l1_ratio_)
        self.l1_ratios_ = np.asarray(l1_ratios_)
        # if elasticnet was used, add the l1_ratios dimension to some
        # attributes
        if self.l1_ratios is not None:
            # with n_cs=2 and n_l1_ratios=3
            # the layout of scores is
            # [c1, c2, c1, c2, c1, c2]
            #   l1_1 ,  l1_2 ,  l1_3
            # To get a 2d array with the following layout
            #      l1_1, l1_2, l1_3
            # c1 [[ .  ,  .  ,  .  ],
            # c2  [ .  ,  .  ,  .  ]]
            # We need to first reshape and then transpose.
            # The same goes for the other arrays
            for cls, coefs_path in self.coefs_paths_.items():
                self.coefs_paths_[cls] = coefs_path.reshape(
                    (len(folds), self.l1_ratios_.size, self.Cs_.size, -1)
                )
                self.coefs_paths_[cls] = np.transpose(
                    self.coefs_paths_[cls], (0, 2, 1, 3)
                )
            for cls, score in self.scores_.items():
                self.scores_[cls] = score.reshape(
                    (len(folds), self.l1_ratios_.size, self.Cs_.size)
                )
                self.scores_[cls] = np.transpose(self.scores_[cls], (0, 2, 1))

            self.n_iter_ = self.n_iter_.reshape(
                (-1, len(folds), self.l1_ratios_.size, self.Cs_.size)
            )
            self.n_iter_ = np.transpose(self.n_iter_, (0, 1, 3, 2))

        return self

    def score(self, X, y, sample_weight=None, **score_params):
        """Score using the `scoring` option on the given test data and labels.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Test samples.

        y : array-like of shape (n_samples,)
            True labels for X.

        sample_weight : array-like of shape (n_samples,), default=None
            Sample weights.

        **score_params : dict
            Parameters to pass to the `score` method of the underlying scorer.

            .. versionadded:: 1.4

        Returns
        -------
        score : float
            Score of self.predict(X) w.r.t. y.
        """
        _raise_for_params(score_params, self, "score")

        scoring = self._get_scorer()
        if _routing_enabled():
            routed_params = process_routing(
                self,
                "score",
                sample_weight=sample_weight,
                **score_params,
            )
        else:
            routed_params = Bunch()
            routed_params.scorer = Bunch(score={})
            if sample_weight is not None:
                routed_params.scorer.score["sample_weight"] = sample_weight

        return scoring(
            self,
            X,
            y,
            **routed_params.scorer.score,
        )

    def get_metadata_routing(self):
        """Get metadata routing of this object.

        Please check :ref:`User Guide <metadata_routing>` on how the routing
        mechanism works.

        .. versionadded:: 1.4

        Returns
        -------
        routing : MetadataRouter
            A :class:`~sklearn.utils.metadata_routing.MetadataRouter` encapsulating
            routing information.
        """

        router = (
            MetadataRouter(owner=self.__class__.__name__)
            .add_self_request(self)
            .add(
                splitter=self.cv,
                method_mapping=MethodMapping().add(caller="fit", callee="split"),
            )
            .add(
                scorer=self._get_scorer(),
                method_mapping=MethodMapping()
                .add(caller="score", callee="score")
                .add(caller="fit", callee="score"),
            )
        )
        return router

    def _get_scorer(self):
        """Get the scorer based on the scoring method specified.
        The default scoring method is `accuracy`.
        """
        scoring = self.scoring or "accuracy"
        return get_scorer(scoring)

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.input_tags.sparse = True
        return tags

```

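Below I walk through the `LogisticRegressionCV` source you provided **from scratch**, in a way a beginner can follow: what each piece of code does, why it is written that way, and which attributes it ends up affecting. To avoid dumping everything at once, the walkthrough is organized as **A. what the class is → B. parameter constraints → C. `__init__` → D. the `fit` main flow → E. aggregating results → F. selecting hyperparameters and refit → G. the ElasticNet grid → H/I. `score` and metadata routing → J. tags → K. the attributes you can use after training**.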

---

# A. What exactly is this class?

```python
class LogisticRegressionCV(LogisticRegression, LinearClassifierMixin, BaseEstimator):
```

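It is a logistic regression model with **built-in cross-validated hyperparameter tuning**.

* Inherits from `LogisticRegression`: reuses most of the logistic regression machinery (shared behavior, the tag system, and so on).
* `LinearClassifierMixin`: provides the common linear-classifier interface (for example `decision_function` and `predict`).
* `BaseEstimator`: makes it a well-behaved member of the sklearn ecosystem (`get_params/set_params`, usable in a Pipeline, clonable).

You can think of it as:

> **LogisticRegression + built-in CV to find the best C / l1_ratio + (optionally) one more fit on the full data with the best parameters (refit)**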
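
As a rough illustration of that summary, here is a minimal usage sketch. The synthetic data, `Cs=5`, `cv=3` and `max_iter=1000` are arbitrary choices made for this example, not values taken from the source.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Small synthetic binary problem (illustrative data only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Cs=5 asks for a log-spaced grid of 5 candidate C values; cv=3 uses 3 folds.
clf = LogisticRegressionCV(Cs=5, cv=3, max_iter=1000).fit(X, y)

print(clf.Cs_)          # the 5 candidate C values that were tried
print(clf.C_)           # the C value selected by cross-validation
print(clf.coef_.shape)  # (1, 5): one weight vector for a binary problem
```

With the default `refit=True`, the printed `coef_` comes from the final fit on all 200 samples, not from any single fold.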

---

# B. The parameter validation system `_parameter_constraints`: why remove some entries and then add new ones?

```python
_parameter_constraints: dict = {**LogisticRegression._parameter_constraints}

for param in ["C", "warm_start", "l1_ratio"]:
    _parameter_constraints.pop(param)

_parameter_constraints.update({...})
```

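## 1) First, copy the parent's parameter-validity rules

`LogisticRegression` already has a set of parameter constraints, for example that `C` must be positive.

## 2) Why remove `C / warm_start / l1_ratio`?

Because in the **CV version**:

* there is no single `C`; it takes **`Cs`** (a set of candidate C values) instead
* there is no single `l1_ratio`; it takes **`l1_ratios`** (a set of candidate l1_ratio values) instead
* `warm_start` is not a constructor parameter of this class (warm-starting along the C path is handled internally), so there is nothing to validate

## 3) Add this class's own constraints

It adds validity rules for `Cs/cv/scoring/l1_ratios/refit/penalty`.
As a beginner, just remember the purpose:

> **This code exists so that sklearn can fail earlier, and with a clearer message, when you pass a bad parameter.**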
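
The copy / pop / update pattern can be tried on a plain dict. The constraint values below are made-up placeholder strings, not sklearn's real constraint objects; only the dict mechanics are the point.

```python
# Hypothetical stand-in for the parent's constraint table (not sklearn's real one).
parent_constraints = {
    "C": ["positive float"],
    "warm_start": ["boolean"],
    "l1_ratio": ["float in [0, 1]", None],
    "fit_intercept": ["boolean"],
}

# 1) Shallow-copy so the parent's dict is never mutated.
cv_constraints = {**parent_constraints}

# 2) Drop the single-value parameters the CV variant does not expose.
for param in ["C", "warm_start", "l1_ratio"]:
    cv_constraints.pop(param)

# 3) Add the CV-specific parameters (placeholder descriptions again).
cv_constraints.update(
    {
        "Cs": ["int >= 1", "array-like"],
        "l1_ratios": ["array-like", None],
        "refit": ["boolean"],
    }
)

print(sorted(cv_constraints))
```

Because step 1 makes a copy, the parent's table still contains `C` afterwards, which is why the subclass can edit its own rules freely.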

---

# C. `__init__`: stores parameters only, no training

```python
def __init__(..., Cs=10, fit_intercept=True, cv=None, ...):
    self.Cs = Cs
    self.fit_intercept = fit_intercept
    ...
```

`__init__` does exactly one thing: **store the configuration you pass in on `self`**.
The actual training happens in `.fit()`.

---

# D. `fit()`: the core flow (step by step)

```python
@_fit_context(prefer_skip_nested_validation=True)
def fit(self, X, y, sample_weight=None, **params):
```

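`fit()` is the most important method of the class. You can read it as:

> **validate parameters → normalize the data → prepare CV splits → train/score in parallel (over every hyperparameter combination) → aggregate → pick the best → (optionally) refit on the full data → fill in the model attributes**

The sections below follow the source order.

---

## D1. Checking extra parameters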

```python
_raise_for_params(params, self, "fit")
```

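Since sklearn 1.4, `fit()` can "route" certain parameters to the splitter/scorer.
This call checks `**params` first and raises if extra parameters are passed while metadata routing is disabled, instead of silently ignoring them.

---

## D2. Checking that solver is compatible with penalty/dual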

```python
solver = _check_solver(self.solver, self.penalty, self.dual)
```

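Different logistic regression solvers support different penalties, for example:

* `lbfgs/newton-cg/sag`: L2 regularization only
* `saga`: supports l1/l2/elasticnet
* `liblinear`: supports l1/l2, but not the multinomial formulation

This step is a **fail-fast check**: an invalid combination stops right here.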
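
To make the idea concrete, here is a simplified, hand-written compatibility check. It is an illustration only: `SUPPORTED_PENALTIES` and `check_combination` are hypothetical names, and the real `_check_solver` covers more cases.

```python
# Simplified, hand-written table; NOT sklearn's actual `_check_solver`.
SUPPORTED_PENALTIES = {
    "lbfgs": {"l2", None},
    "newton-cg": {"l2", None},
    "sag": {"l2", None},
    "saga": {"l1", "l2", "elasticnet", None},
    "liblinear": {"l1", "l2"},
}


def check_combination(solver, penalty, dual=False):
    """Return `solver` if the combination is allowed, else raise ValueError."""
    if solver not in SUPPORTED_PENALTIES:
        raise ValueError(f"unknown solver {solver!r}")
    if penalty not in SUPPORTED_PENALTIES[solver]:
        raise ValueError(f"solver {solver!r} does not support penalty {penalty!r}")
    # The dual formulation exists only for liblinear with an l2 penalty.
    if dual and (solver != "liblinear" or penalty != "l2"):
        raise ValueError("dual=True requires solver='liblinear' and penalty='l2'")
    return solver


print(check_combination("saga", "elasticnet"))  # accepted
try:
    check_combination("lbfgs", "l1")
except ValueError as exc:
    print("rejected:", exc)
```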

---

## D3. With ElasticNet, `l1_ratios` is mandatory

```python
if self.penalty == "elasticnet":
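    # abbreviated: the range check is written out in full in the source
    if (
        self.l1_ratios is None
        or len(self.l1_ratios) == 0
        or any(not 0 <= l1_ratio <= 1 for l1_ratio in self.l1_ratios)
    ):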
        raise ValueError(...)
    l1_ratios_ = self.l1_ratios
else:
    if self.l1_ratios is not None:
        warnings.warn(...)
    l1_ratios_ = [None]
```

Key points:

* elasticnet needs an `l1_ratio` (the mix between L1 and L2)
* without elasticnet, `l1_ratios` has no effect → a warning is issued
* `l1_ratios_` is always made a list (set to `[None]` even without elasticnet), so the later loops need fewer branches
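
The three points above can be sketched as a standalone function. `normalize_l1_ratios` is a hypothetical helper written for this explanation, and its messages are not sklearn's.

```python
import warnings


def normalize_l1_ratios(penalty, l1_ratios):
    """Return the list of l1_ratio candidates to loop over (sketch only)."""
    if penalty == "elasticnet":
        if (
            l1_ratios is None
            or len(l1_ratios) == 0
            or any(not 0 <= r <= 1 for r in l1_ratios)
        ):
            raise ValueError("l1_ratios must be a non-empty list of numbers in [0, 1]")
        return list(l1_ratios)
    if l1_ratios is not None:
        warnings.warn("l1_ratios is ignored unless penalty='elasticnet'")
    # A one-element list keeps the downstream `for l1_ratio in ...` loop uniform.
    return [None]


print(normalize_l1_ratios("l2", None))
print(normalize_l1_ratios("elasticnet", [0.1, 0.5, 0.9]))
```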

---

## D4. Validating and normalizing the input format (very important)

```python
X, y = validate_data(
    self, X, y,
    accept_sparse="csr",
    dtype=np.float64,
    order="C",
    accept_large_sparse=solver not in ["liblinear","sag","saga"],
)
check_classification_targets(y)
```

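What this step does:

* Converts `X` into the format the solvers prefer:

  * sparse matrices are accepted and converted to CSR
  * everything becomes float64 (more stable for the optimizers)
  * memory layout is C-contiguous (faster)
* `check_classification_targets(y)`: makes sure `y` holds class labels rather than regression targets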
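
For dense input, the dtype and memory-layout part of this step can be imitated with plain NumPy. This is only a rough analogue of what `validate_data` does; the real function also checks shapes, NaNs, sparse formats and more.

```python
import numpy as np

# Integer, Fortran-ordered input: valid data, but not the layout the solvers expect.
X_raw = np.asfortranarray([[1, 2, 3], [4, 5, 6]])

# Convert to float64 with C (row-major) contiguous layout.
X = np.ascontiguousarray(X_raw, dtype=np.float64)

print(X_raw.dtype, X_raw.flags["C_CONTIGUOUS"])
print(X.dtype, X.flags["C_CONTIGUOUS"])
```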

---

## D5. Label encoding (string labels → integer labels)

```python
label_encoder = LabelEncoder().fit(y)
y = label_encoder.transform(y)
```

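For example: `["cat","dog","cat"] → [0,1,0]`
This makes the internal computation easier.

If you pass `class_weight` as a dict (keyed by the original labels), it must be remapped the same way: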

```python
if isinstance(class_weight, dict):
    class_weight = {
        label_encoder.transform([cls])[0]: v for cls, v in class_weight.items()
    }
```

The original classes are stored as well:

```python
self.classes_ = label_encoder.classes_
```

This matters: at prediction time the integer classes can be mapped back to the original labels.
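
The round trip (labels → integers → labels) and the `class_weight` remapping can be sketched with NumPy alone. `np.unique` is used here as a stand-in for `LabelEncoder`, which is an assumption made to keep the example dependency-light; the weights are invented.

```python
import numpy as np

y = np.array(["cat", "dog", "cat"])

# Sorted unique labels plus, for each sample, its index into that sorted array.
classes, y_encoded = np.unique(y, return_inverse=True)
print(classes, y_encoded)

# Remap a class_weight dict keyed by original labels to encoded integer keys.
class_weight = {"cat": 1.0, "dog": 3.0}
index_of = {label: i for i, label in enumerate(classes)}
encoded_weight = {index_of[label]: w for label, w in class_weight.items()}
print(encoded_weight)

# Keeping `classes` around lets predictions be mapped back to the labels.
decoded = classes[y_encoded]
print(decoded)
```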

---

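## D6. Handling multi_class (with a deprecation warning)

That long block of warnings in the source boils down to:

* `multi_class` is being deprecated by sklearn (since 1.5); in the future multinomial is always used
* for backward compatibility, it is finally resolved into the `multi_class` value actually used: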

```python
multi_class = _check_multi_class(multi_class, solver, len(classes))
```

The takeaway for beginners:

> This step decides whether multiclass training uses **OvR** or **multinomial**.

---

## D7. `max_squared_sum`, a speed-up term for sag/saga

```python
if solver in ["sag","saga"]:
    max_squared_sum = row_norms(X, squared=True).max()
else:
    max_squared_sum = None
```

It is used only by sag/saga, for their step-size (and hence convergence) computation; it is a performance detail.
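
For a dense array, `row_norms(X, squared=True).max()` amounts to the largest squared Euclidean norm of any row. A NumPy-only sketch with made-up numbers:

```python
import numpy as np

X = np.array([[3.0, 4.0], [1.0, 2.0]])

# Squared Euclidean norm of each row: 3^2 + 4^2 = 25, 1^2 + 2^2 = 5.
squared_row_norms = (X ** 2).sum(axis=1)

# The largest one is what the sag/saga step-size computation needs.
max_squared_sum = squared_row_norms.max()
print(squared_row_norms, max_squared_sum)
```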

---

## D8. metadata routing: handing parameters to the splitter/scorer

```python
if _routing_enabled():
    routed_params = process_routing(...)
else:
    routed_params = Bunch()
    routed_params.splitter = Bunch(split={})
    routed_params.scorer = Bunch(score=params)
    if sample_weight is not None:
        routed_params.scorer.score["sample_weight"] = sample_weight
```

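If you are not using the advanced routing yet, read it like this:

* this step gathers parameters such as `sample_weight`
* the CV `split` call and the scorer's `score` call later each receive their own share

---

## D9. Generating the cross-validation folds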

```python
cv = check_cv(self.cv, y, classifier=True)
folds = list(cv.split(X, y, **routed_params.splitter.split))
```

`folds` is a list whose elements are `(train_index, test_index)` pairs.
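
A small sketch of what `folds` looks like, using the public `check_cv` helper on toy data. The 8-sample dataset and `cv=2` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import check_cv

X = np.arange(16, dtype=float).reshape(8, 2)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# With classifier=True and class labels in y, an int becomes a stratified K-fold splitter.
cv = check_cv(2, y, classifier=True)
folds = list(cv.split(X, y))

for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each pair holds integer index arrays, so `X[train_idx]` and `X[test_idx]` select the rows for that fold.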

---

## D10. Handling the number of classes (binary in particular)

```python
n_classes = len(encoded_labels)
if n_classes < 2: raise ValueError(...)
if n_classes == 2:
    n_classes = 1
    encoded_labels = encoded_labels[1:]
    classes = classes[1:]
```

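Why does binary classification become `n_classes=1`?

* from the OvR point of view, a binary problem needs only one "positive vs negative" model
* so the later loop runs once, saving half the computation

---

## D11. The "iterate only once" hack for multinomial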

```python
if multi_class == "multinomial":
    iter_encoded_labels = iter_classes = [None]
else:
    iter_encoded_labels = encoded_labels
    iter_classes = classes
```

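* OvR: loop over classes (one binary classifier per class)
* multinomial: all classes are fitted at once → no per-class loop is needed,
  so `[None]` is used to keep the loop structure uniform.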
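
D10 and D11 together decide what the outer loop iterates over. The helper below, `iteration_targets`, is a hypothetical condensation of those two snippets, not a function from sklearn.

```python
import numpy as np


def iteration_targets(classes, multi_class):
    """Return (n_classes, labels_to_iterate) following D10 and D11 (sketch)."""
    encoded_labels = np.arange(len(classes))
    n_classes = len(classes)
    if n_classes < 2:
        raise ValueError("need at least 2 classes")
    if n_classes == 2:
        # Binary: a single "positive vs rest" model is enough.
        n_classes = 1
        encoded_labels = encoded_labels[1:]
    if multi_class == "multinomial":
        # One joint fit for all classes; [None] keeps the loop shape uniform.
        return n_classes, [None]
    return n_classes, encoded_labels.tolist()


print(iteration_targets(np.array(["neg", "pos"]), "ovr"))
print(iteration_targets(np.array(["a", "b", "c"]), "ovr"))
print(iteration_targets(np.array(["a", "b", "c"]), "multinomial"))
```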

---

## D12. Computing class_weight='balanced'

```python
if class_weight == "balanced":
    class_weight = compute_class_weight(...)
    class_weight = dict(enumerate(class_weight))
```

This is sklearn's standard weighting, inversely proportional to class frequency.
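
The "balanced" heuristic is documented as `n_samples / (n_classes * np.bincount(y))`. A NumPy sketch on an invented, already label-encoded `y`:

```python
import numpy as np

y = np.array([0, 0, 0, 1])  # three samples of class 0, one of class 1
n_classes = 2

# Rare classes get larger weights: 4 / (2 * 3) for class 0, 4 / (2 * 1) for class 1.
weights = len(y) / (n_classes * np.bincount(y))

# Same final shape as in the source: {encoded_label: weight}.
class_weight = dict(enumerate(weights))
print(class_weight)
```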

---

## D13. Running `_log_reg_scoring_path` in parallel (the core)

```python
path_func = delayed(_log_reg_scoring_path)

prefer = "threads" if self.solver in ["sag","saga"] else "processes"

fold_coefs_ = Parallel(n_jobs=self.n_jobs, verbose=self.verbose, prefer=prefer)(
    path_func(
        X, y,
        train, test,
        pos_class=label,
        Cs=self.Cs,
        ...
        l1_ratio=l1_ratio,
        score_params=routed_params.scorer.score,
    )
    for label in iter_encoded_labels
    for train, test in folds
    for l1_ratio in l1_ratios_
)
```

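You can sum this block up in one sentence:

> For every **(label, or None for multinomial)** × every **fold** × every **l1_ratio**,
> compute one "training and scoring path along several `C` values" and collect the results.

### Why is it called a path?

Because `_log_reg_scoring_path` trains and evaluates over a sequence of `Cs` in one go (usually warm-starting from one C to the next), producing:

* the coefficients for each C (coefs)
* the score for each C (scores)
* iteration counts, and so on

### Why do sag/saga use threads?

Because sag/saga release the GIL, so threads are the cheaper choice; the other solvers are run with process-based parallelism.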
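
The order in which tasks are generated determines how the flat result list is laid out later. The sketch below only enumerates the task tuples (no training); the fold placeholders and l1_ratio values are invented.

```python
from itertools import product

iter_encoded_labels = [0, 1, 2]  # OvR with 3 classes (would be [None] for multinomial)
folds = ["fold0", "fold1"]       # placeholders for (train_idx, test_idx) pairs
l1_ratios_ = [0.2, 0.8]

# Same nesting as the generator in the source: label, then fold, then l1_ratio.
tasks = list(product(iter_encoded_labels, folds, l1_ratios_))

for flat_index, task in enumerate(tasks[:4]):
    print(flat_index, task)
print("total tasks:", len(tasks))
```

Since l1_ratio varies fastest, the task for `(label_i, fold_i, l1_i)` sits at flat index `(label_i * n_folds + fold_i) * n_l1 + l1_i`.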

---

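# E. Aggregating the results: reshaping into a readable structure

The `fold_coefs_` returned by the parallel call is a flat list.
The source then **restores it into meaningful multi-dimensional arrays**: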

```python
coefs_paths, Cs, scores, n_iter_ = zip(*fold_coefs_)
self.Cs_ = Cs[0]
...
self.scores_ = dict(zip(classes, scores))
self.coefs_paths_ = dict(zip(classes, coefs_paths))
```

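* `scores_`: one array per class, holding the score for every fold × every hyperparameter combination
* `coefs_paths_`: one array per class, holding the coefficient path for every fold × every hyperparameter combination
* `n_iter_`: the iteration counts

multinomial and ovr produce differently shaped outputs, which is why there are so many `reshape` / `swapaxes` calls: they exist to **align the dimensions**.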
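
To see how a flat list turns into `(class, fold, candidate)`, here is a NumPy sketch with fake scores whose digits encode their own position. The target shape is modelled on the OvR case and is an assumption of this example; the multinomial branch needs extra axis swaps not shown here.

```python
import numpy as np

n_classes, n_folds, n_l1, n_Cs = 2, 3, 2, 4

# One fake score vector (length n_Cs) per task, in task order:
# class outermost, then fold, then l1_ratio. The value 1000*c + 100*f + 10*l + k
# records which (class, fold, l1 index, C index) it belongs to.
flat_scores = [
    np.array([1000 * c + 100 * f + 10 * l + k for k in range(n_Cs)])
    for c in range(n_classes)
    for f in range(n_folds)
    for l in range(n_l1)
]

# Fold the flat task axis back into (class, fold, candidate).
scores = np.reshape(flat_scores, (n_classes, n_folds, -1))
print(scores.shape)  # (2, 3, 8)

# Candidate index = l1_index * n_Cs + C_index.
print(scores[1, 2, 1 * n_Cs + 3])  # class 1, fold 2, l1 index 1, C index 3
```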

---

# F. Selecting the best hyperparameters and producing the final `coef_ / intercept_`

The final outputs are initialized first:

```python
self.C_ = list()
self.l1_ratio_ = list()
self.coef_ = np.empty((n_classes, X.shape[1]))
self.intercept_ = np.zeros(n_classes)
```

Then it loops over each class to fit (OvR), or runs once for multinomial:

## F1. `refit=True`: pick the best, then refit on the full data (the most common case)

```python
best_index = scores.sum(axis=0).argmax()
best_index_C = best_index % len(self.Cs_)
C_ = self.Cs_[best_index_C]
best_index_l1 = best_index // len(self.Cs_)
l1_ratio_ = l1_ratios_[best_index_l1]
```

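* `scores` has shape `(n_folds, n_candidates)`
* `scores.sum(axis=0)`: totals each candidate combination over all folds (this gives the same ranking as the mean, since every candidate is scored on the same number of folds)
* `argmax()`: picks the combination with the best overall score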
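
The index arithmetic is easiest to follow on concrete numbers. The scores below are invented; the candidate axis is ordered so that `index = l1_index * len(Cs_) + C_index`, which is what the `%` and `//` decoding assumes.

```python
import numpy as np

Cs_ = np.array([0.01, 0.1, 1.0])
l1_ratios_ = [0.2, 0.8]

# scores[fold, candidate]: 2 folds, 2 l1_ratios * 3 Cs = 6 candidates.
scores = np.array(
    [
        [0.60, 0.70, 0.65, 0.62, 0.80, 0.75],
        [0.58, 0.72, 0.66, 0.61, 0.78, 0.79],
    ]
)

best_index = scores.sum(axis=0).argmax()  # column sums peak at index 4
best_index_C = best_index % len(Cs_)      # 4 % 3 = 1
best_index_l1 = best_index // len(Cs_)    # 4 // 3 = 1

print(best_index, Cs_[best_index_C], l1_ratios_[best_index_l1])
```

Note that fold 1 on its own would have preferred candidate 5; summing over folds is what makes candidate 4 the overall winner.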

Next, the coefficient paths obtained during CV are averaged into a sensible starting point (the warm-start idea):

```python
coef_init = np.mean(...coefs_paths..., axis=...)
```

Finally it trains once more on the full data, at the single best hyperparameter point only:

```python
w, _, _ = _logistic_regression_path(
    X, y, Cs=[C_], coef=coef_init, l1_ratio=l1_ratio_, ...
)
w = w[0]
```

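and writes the final parameters:

* OvR: `self.coef_[index] = w[:n_features]`, with the intercept in `w[-1]`
* multinomial: `w` has a structure like `(n_classes, n_features(+1))`

## F2. `refit=False`: no refit, average the per-fold best parameters

For each fold it first finds the index of the best hyperparameters, then averages the corresponding coefficients to form the final `w`.
This mode is closer to an "average over per-fold winners", and is generally less intuitive than refit.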
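
A NumPy sketch of the `refit=False` averaging for a single OvR class. The arrays are invented, and treating `C_` as the mean of the per-fold best C values follows my reading of this branch of the source.

```python
import numpy as np

Cs_ = np.array([0.01, 0.1, 1.0])

# scores[fold, candidate] for 2 folds and 3 candidates (no elasticnet here).
scores = np.array([[0.6, 0.9, 0.7],
                   [0.5, 0.6, 0.8]])

# coefs_paths[fold, candidate, :] holds 2 weights + 1 intercept (fake values 0..17).
coefs_paths = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)

# Best candidate per fold, instead of one global winner.
best_indices = np.argmax(scores, axis=1)  # fold 0 -> 1, fold 1 -> 2

# Average the winning coefficient vectors across folds.
w = np.mean(
    [coefs_paths[i, best_indices[i], :] for i in range(len(scores))], axis=0
)

# The reported C is the mean of the per-fold winners.
C_ = np.mean(Cs_[best_indices % len(Cs_)])
print(best_indices, w, C_)
```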

---

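# G. The ElasticNet case: turning the flat candidate list back into a grid

When `l1_ratios` is used, to make the results easier to read, the originally flat ordering

`[c1,c2,c1,c2,c1,c2] (for l1_1,l1_2,l1_3)`
is rearranged into a two-dimensional grid (rows C, columns l1_ratio), which takes another round of reshape + transpose.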
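
The same reshape + transpose that the source applies to `scores_` can be replayed on a small array of consecutive integers, which makes it easy to see where each flat entry ends up.

```python
import numpy as np

n_folds, n_l1, n_Cs = 2, 3, 2

# Flat candidate axis, l1_ratio-major: [c1, c2 | c1, c2 | c1, c2] per fold.
flat = np.arange(n_folds * n_l1 * n_Cs).reshape(n_folds, n_l1 * n_Cs)

# Split the candidate axis into (l1_ratio, C) ...
grid = flat.reshape(n_folds, n_l1, n_Cs)
# ... then swap the last two axes so the result is indexed [fold, C, l1_ratio].
grid = np.transpose(grid, (0, 2, 1))

print(grid.shape)  # (2, 2, 3)
print(grid[0])     # rows are C values, columns are l1_ratio values
```

After this step, `grid[fold, c, l]` equals `flat[fold, l * n_Cs + c]`.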

---

# H. `score()`: scoring according to the `scoring` option

```python
scoring = self._get_scorer()
return scoring(self, X, y, **routed_params.scorer.score)
```

* the default scoring is accuracy
* you can also set `scoring="f1" / "roc_auc"` or a custom callable
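
What `_get_scorer` plus the final call amount to can be reproduced with the public `get_scorer` function. A plain `LogisticRegression` on toy data stands in for the fitted estimator here, purely to keep the sketch small.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)

scoring = None  # mimics an estimator created without a `scoring` argument
scorer = get_scorer(scoring or "accuracy")

# A scorer is called as scorer(estimator, X, y) and returns a float.
value = scorer(clf, X, y)
print(value)
```

Swapping in `scoring = "f1"` or `"roc_auc"` changes only the scorer that is looked up; the call itself stays the same.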

---

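# I. `get_metadata_routing()`: telling sklearn how parameters are routed

This is the sklearn 1.4+ mechanism:

* in fit, some parameters go to splitter.split
* in score/fit, some parameters go to scorer.score

At the beginner stage it is enough to know that this is engineering plumbing that keeps the class compatible with Pipeline and meta-estimators.

---

# J. `__sklearn_tags__()`: declaring sparse input support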

```python
tags.input_tags.sparse = True
```

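It tells sklearn that this estimator accepts sparse matrices (CSR).

---

# K. Which attributes matter most after training?

* `classes_`: the original class labels, in order
* `Cs_`: the C grid actually used
* `scores_`: per-class scores across folds and hyperparameters (this is what you plot when visualizing the tuning)
* `C_`: the selected C (with `refit=False`, the average of the per-fold best C values)
* `l1_ratio_`: the selected l1_ratio (with elasticnet)
* `coef_ / intercept_`: the final model parameters
* `coefs_paths_`: how the coefficients change across hyperparameters (useful for studying the effect of regularization)
* `n_iter_`: iteration counts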

---

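## The 2 places you are most likely to get stuck next (I can go deeper)

1. What exactly do `_log_reg_scoring_path` and `_logistic_regression_path` return? Why can they compute a whole series of Cs in one call?
2. Which dimensions are the multinomial vs ovr `reshape/swapaxes` calls actually aligning? (I can write out the shapes step by step with a small "2 folds × 3 Cs × 2 classes" example.)

If you want me to continue, I suggest starting with **what `fold_coefs_` actually contains**, and working through "parallel results → reshape → final attributes" until it is fully clear.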
