### sklearn.svm.SVC

*class* `sklearn.svm.SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)` [[source]](https://github.com/scikit-learn/scikit-learn/blob/f3f51f9b6/sklearn/svm/_classes.py#L525)

Parameters:

**C** : float, default=1.0

Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty.
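The following is a minimal sketch of the regularization effect, using the iris data as a stand-in. It fits the same RBF model once with a small `C` and once with a large `C`, then compares how many support vectors each keeps. Stronger regularization (smaller `C`) usually leaves more points as support vectors, but the exact counts depend on the data. The last lines check that a non-positive `C` is rejected.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Fit the same RBF model with strong (small C) and weak (large C) regularization
n_sv = {}
for C in (0.01, 100.0):
    clf = SVC(C=C).fit(X, y)
    n_sv[C] = int(clf.n_support_.sum())  # n_support_ counts support vectors per class
    print(f"C={C}: {n_sv[C]} support vectors")

# C must be strictly positive; C=0 is rejected with a ValueError
rejected = False
try:
    SVC(C=0.0).fit(X, y)
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```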

**kernel** : {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’} or callable, default=‘rbf’

Specifies the kernel type to be used in the algorithm. If none is given, ‘rbf’ will be used. If a callable is given, it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape `(n_samples, n_samples)`.
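This sketch illustrates the callable and `'precomputed'` options on the iris data with a hand-written linear kernel. `linear_kernel` is a hypothetical helper defined here, not a library function. A callable receives two data matrices and returns their Gram matrix. With `'precomputed'` you pass that matrix yourself: `(n_samples, n_samples)` for `fit` and `(n_test_samples, n_train_samples)` for `predict`. The callable and precomputed routes share the same Gram matrix, so they should give identical predictions. They should also largely agree with the built-in `kernel='linear'`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)


def linear_kernel(A, B):
    """Gram matrix of the rows of A against the rows of B: shape (len(A), len(B))."""
    return A @ B.T


builtin = SVC(kernel="linear").fit(X, y)
custom = SVC(kernel=linear_kernel).fit(X, y)  # callable kernel

# 'precomputed': pass the (n_samples, n_samples) training Gram matrix to fit,
# and the (n_test, n_train) matrix to predict (here the test data is the training data)
gram_train = linear_kernel(X, X)
pre = SVC(kernel="precomputed").fit(gram_train, y)

pred_custom = custom.predict(X)
pred_pre = pre.predict(linear_kernel(X, X))
agreement = np.mean(builtin.predict(X) == pred_custom)
print(np.array_equal(pred_custom, pred_pre), agreement)
```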

**degree** : int, default=3

Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.

**gamma** : {‘scale’, ‘auto’} or float, default=‘scale’

Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

-   if `gamma='scale'` (default) is passed, it uses `1 / (n_features * X.var())` as the value of gamma;
-   if ‘auto’, it uses `1 / n_features`.

Changed in version 0.22: The default value of  `gamma`  changed from ‘auto’ to ‘scale’.
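Both heuristics can be reproduced by hand. This sketch uses iris only as example data. It computes the `'scale'` value as `1 / (n_features * X.var())`, where `X.var()` is the variance over all entries of `X`. It then checks that passing that number explicitly gives the same decision function as `gamma='scale'`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

# The two documented heuristics, computed by hand
gamma_scale = 1.0 / (n_features * X.var())  # X.var() is taken over every entry of X
gamma_auto = 1.0 / n_features

print(f"scale: {gamma_scale:.4f}, auto: {gamma_auto:.4f}")

# Passing the hand-computed value should reproduce gamma='scale'
by_name = SVC(gamma="scale").fit(X, y)
by_value = SVC(gamma=gamma_scale).fit(X, y)
print(np.allclose(by_name.decision_function(X), by_value.decision_function(X)))
```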

**coef0** : float, default=0.0

Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.

**shrinking** : bool, default=True

Whether to use the shrinking heuristic. See the  [User Guide](https://scikit-learn.org/1.1/modules/svm.html#shrinking-svm).

**probability** : bool, default=False

Whether to enable probability estimates. This must be enabled before calling `fit`. It slows down that method because it internally uses 5-fold cross-validation, and `predict_proba` may be inconsistent with `predict`. Read more in the [User Guide](https://scikit-learn.org/1.1/modules/svm.html#scores-probabilities).
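This short sketch demonstrates the caveats above, using iris as example data. `probability=True` is set at construction time, before `fit`, and `random_state` fixes the internal cross-validation used for Platt scaling. The class with the highest `predict_proba` score is then compared with the `predict` output. The number of disagreements depends on the data and may well be zero.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# probability=True must be set before fit; random_state makes the internal CV reproducible
clf = SVC(probability=True, random_state=0).fit(X_train, y_train)

proba = clf.predict_proba(X_test)  # shape (n_samples, n_classes); each row sums to 1
pred = clf.predict(X_test)         # based on the decision function, not on proba

# Platt scaling is fitted separately, so argmax(proba) can disagree with predict
disagree = int(np.sum(clf.classes_[proba.argmax(axis=1)] != pred))
print(proba.shape, disagree)
```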

**tol** : float, default=1e-3

Tolerance for stopping criterion.

**cache_size** : float, default=200

Specify the size of the kernel cache (in MB).

**class_weight** : dict or ‘balanced’, default=None

Set the parameter C of class i to `class_weight[i]*C` for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as `n_samples / (n_classes * np.bincount(y))`.
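The `'balanced'` formula can be checked directly. In this sketch the tiny imbalanced dataset is made up for illustration. The weights are computed three ways: by hand, with `sklearn.utils.class_weight.compute_class_weight`, and by reading the fitted model's `class_weight_` attribute.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Made-up imbalanced toy problem: 8 samples of class 0, 2 samples of class 1
X = np.array([[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [2.0], [2.1]])
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# n_samples / (n_classes * np.bincount(y)): 10/(2*8) = 0.625 and 10/(2*2) = 2.5
manual = len(y) / (len(np.unique(y)) * np.bincount(y))

helper = compute_class_weight("balanced", classes=np.unique(y), y=y)
clf = SVC(kernel="linear", class_weight="balanced").fit(X, y)

# class_weight_ holds the per-class multipliers of C computed during fit
print(manual, np.allclose(manual, helper), np.allclose(manual, clf.class_weight_))
```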

**verbose** : bool, default=False

Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.

**max_iter** : int, default=-1

Hard limit on iterations within solver, or -1 for no limit.

**decision_function_shape** : {‘ovo’, ‘ovr’}, default=‘ovr’

Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes), like all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm, which has shape (n_samples, n_classes * (n_classes - 1) / 2). Note, however, that one-vs-one (‘ovo’) is always used internally as the multi-class training strategy; the ovr matrix is only constructed from the ovo matrix. The parameter is ignored for binary classification.

Changed in version 0.19: decision_function_shape is ‘ovr’ by default.

New in version 0.17: `decision_function_shape='ovr'` is recommended.

Changed in version 0.17: `decision_function_shape='ovo'` and `None` are deprecated.
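The shape difference is easiest to see with more than three classes. With three classes, `3 * 2 / 2 = 3`, so both shapes coincide. This sketch uses `make_blobs` with four synthetic classes, chosen for illustration. Because training is one-vs-one either way, `predict` returns the same labels for both settings.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Four synthetic classes, so ovr gives 4 columns and ovo gives 4*3/2 = 6
X, y = make_blobs(n_samples=200, centers=4, random_state=0)

ovr = SVC(decision_function_shape="ovr").fit(X, y)
ovo = SVC(decision_function_shape="ovo").fit(X, y)

print(ovr.decision_function(X).shape)  # (200, 4)
print(ovo.decision_function(X).shape)  # (200, 6)

# The parameter only changes decision_function's output, not training or predict
print(np.array_equal(ovr.predict(X), ovo.predict(X)))
```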

**break_ties** : bool, default=False

If True, `decision_function_shape='ovr'`, and the number of classes is greater than 2, [predict](https://scikit-learn.org/1.1/glossary.html#term-predict) breaks ties according to the confidence values of [decision_function](https://scikit-learn.org/1.1/glossary.html#term-decision_function); otherwise, the first class among the tied classes is returned. Note that breaking ties comes at a relatively high computational cost compared to a simple predict.

New in version 0.22.
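With tie breaking enabled, `predict` amounts to taking the argmax of the one-vs-rest decision function. This sketch shows that on three synthetic `make_blobs` classes, chosen for illustration. It also shows that combining `break_ties=True` with `'ovo'` is rejected when predicting.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=3, random_state=0)

# break_ties=True needs decision_function_shape='ovr' and more than two classes
clf = SVC(decision_function_shape="ovr", break_ties=True).fit(X, y)

# Argmax of the ovr decision function, mapped back to class labels
manual = clf.classes_[np.argmax(clf.decision_function(X), axis=1)]
print(np.array_equal(manual, clf.predict(X)))

# break_ties=True together with 'ovo' raises a ValueError at predict time
rejected = False
try:
    SVC(decision_function_shape="ovo", break_ties=True).fit(X, y).predict(X)
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```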

**random_state** : int, RandomState instance or None, default=None

Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when  `probability`  is False. Pass an int for reproducible output across multiple function calls. See  [Glossary](https://scikit-learn.org/1.1/glossary.html#term-random_state).

Attributes:

**class_weight_** : ndarray of shape (n_classes,)

Multipliers of parameter C for each class. Computed based on the  `class_weight`  parameter.

**classes_** : ndarray of shape (n_classes,)

The class labels.

**coef_** : ndarray of shape (n_classes * (n_classes - 1) / 2, n_features)

Weights assigned to the features when  `kernel="linear"`.

**dual_coef_** : ndarray of shape (n_classes - 1, n_SV)

Dual coefficients of the support vectors in the decision function (see [Mathematical formulation](https://scikit-learn.org/1.1/modules/sgd.html#sgd-mathematical-formulation)), multiplied by their targets. For multiclass, these are the coefficients for all one-vs-one classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the [multi-class section of the User Guide](https://scikit-learn.org/1.1/modules/svm.html#svm-multi-class) for details.
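For a binary problem, the decision function can be rebuilt from these attributes. It is the sum over support vectors of `dual_coef_` times the kernel value between each support vector and `x`, plus `intercept_`. This sketch restricts iris to its first two classes and uses a linear kernel; both are assumptions made for illustration. It rebuilds the decision function by hand and also checks that, for a linear kernel, `dual_coef_ @ support_vectors_` equals `coef_`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
mask = y < 2                  # keep two classes -> a single binary classifier
X2, y2 = X[mask], y[mask]

clf = SVC(kernel="linear").fit(X2, y2)

# sum_i dual_coef_[0, i] * K(sv_i, x) + intercept_, with K the linear kernel
K = clf.support_vectors_ @ X2.T                  # shape (n_SV, n_samples)
manual = (clf.dual_coef_ @ K + clf.intercept_).ravel()
print(np.allclose(manual, clf.decision_function(X2)))

# For a linear kernel, the same product of dual coefficients and support vectors gives coef_
print(np.allclose(clf.dual_coef_ @ clf.support_vectors_, clf.coef_))
print(clf.dual_coef_.shape)   # (n_classes - 1, n_SV), i.e. one row here
```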

**fit_status_** : int

0 if correctly fitted, 1 otherwise (will raise warning)

**intercept_** : ndarray of shape (n_classes * (n_classes - 1) / 2,)

Constants in decision function.

**n_features_in_** : int

Number of features seen during  [fit](https://scikit-learn.org/1.1/glossary.html#term-fit).

New in version 0.24.

**feature_names_in_** : ndarray of shape (`n_features_in_`,)

Names of features seen during  [fit](https://scikit-learn.org/1.1/glossary.html#term-fit). Defined only when  `X`  has feature names that are all strings.

New in version 1.0.

**n_iter_** : ndarray of shape (n_classes * (n_classes - 1) // 2,)

Number of iterations run by the optimization routine to fit the model. The shape of this attribute depends on the number of models optimized which in turn depends on the number of classes.

New in version 1.1.

**support_** : ndarray of shape (n_SV,)

Indices of support vectors.

**support_vectors_** : ndarray of shape (n_SV, n_features)

Support vectors.

**n_support_** : ndarray of shape (n_classes,), dtype=int32

Number of support vectors for each class.

**probA_** : ndarray of shape (n_classes * (n_classes - 1) / 2,)

Parameter learned in Platt scaling when  `probability=True`.

**probB_** : ndarray of shape (n_classes * (n_classes - 1) / 2,)

Parameter learned in Platt scaling when  `probability=True`.

**shape_fit_** : tuple of int of shape (n_dimensions_of_X,)

Array dimensions of training vector  `X`.
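The shapes listed above can be checked on a fitted multi-class model. This sketch fits a default `SVC` on the 3-class iris data, an arbitrary example. It then verifies how `support_`, `support_vectors_`, `n_support_`, `dual_coef_`, `intercept_` and `shape_fit_` relate to each other.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC().fit(X, y)

n_classes = len(clf.classes_)
n_pairs = n_classes * (n_classes - 1) // 2  # number of one-vs-one classifiers
n_sv = len(clf.support_)

checks = {
    "dual_coef_ shape": clf.dual_coef_.shape == (n_classes - 1, n_sv),
    "intercept_ shape": clf.intercept_.shape == (n_pairs,),
    "n_support_ sums to n_SV": int(clf.n_support_.sum()) == n_sv,
    "support_ indexes X": np.array_equal(clf.support_vectors_, X[clf.support_]),
    "shape_fit_ is X.shape": clf.shape_fit_ == X.shape,
}
print(checks)
```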

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

iris = load_iris()
feature = iris.data
label = iris.target

# 50/50 train/test split with a fixed seed so the result is reproducible
x_train, x_test, y_train, y_test = train_test_split(feature, label, test_size=0.5, random_state=156)

# RBF-kernel SVC with a small gamma and weak regularization (large C)
svc_clf = svm.SVC(gamma=0.001, C=100.)
svc_clf.fit(x_train, y_train)
svc_pred = svc_clf.predict(x_test)
svc_accuracy = accuracy_score(y_test, svc_pred)
svc_accuracy
```

```
0.96
```
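A single 50/50 split gives only one accuracy number. As a rough check of its stability, the same hyperparameters can be scored with 5-fold cross-validation. This is a sketch added for illustration, and the fold scores will not necessarily match the 0.96 above.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

iris = load_iris()

# Same SVC settings as above, evaluated on 5 different train/test partitions
scores = cross_val_score(svm.SVC(gamma=0.001, C=100.), iris.data, iris.target,
                         cv=5, scoring="accuracy")
print(scores, scores.mean())
```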

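```python
import pandas as pd

# Each line of features.txt holds a column index and a feature name, separated by whitespace
feature_name_df = pd.read_csv("./datasets/human_activity/features.txt", sep=r"\s+", header=None,
                              names=["column_index", "column_name"])
```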

```python
# Count feature names that occur more than once in features.txt
feature_dup_df = feature_name_df.groupby("column_name").count()
print(feature_dup_df[feature_dup_df["column_index"] > 1].count())
feature_dup_df[feature_dup_df["column_index"] > 1].head()


def get_new_feature_name_df(old_feature_name_df):
    # Number repeated column names within each name group with cumcount():
    # the first occurrence gets 0, the second 1, and so on (e.g. col1, col1 -> 0, 1)
    feature_dup_df = old_feature_name_df.groupby('column_name').cumcount().to_frame('dup_cnt')
    # Turn the row index into a regular 'index' column so it can serve as the join key
    feature_dup_df = feature_dup_df.reset_index()
    # Join the features.txt name DataFrame with feature_dup_df on 'index'
    new_feature_name_df = pd.merge(old_feature_name_df.reset_index(), feature_dup_df, how='outer')
    # Append the duplicate counter as a suffix to every repeated name (second col1 -> col1_1)
    new_feature_name_df['column_name'] = new_feature_name_df[['column_name', 'dup_cnt']].apply(
        lambda x: x.iloc[0] + '_' + str(x.iloc[1]) if x.iloc[1] > 0 else x.iloc[0], axis=1)
    new_feature_name_df = new_feature_name_df.drop(['index'], axis=1)
    return new_feature_name_df


def get_human_dataset():
    # Every data file is whitespace-separated, so pass sep=r'\s+' to read_csv
    feature_name_df = pd.read_csv('./datasets/human_activity/features.txt', sep=r'\s+',
                                  header=None, names=['column_index', 'column_name'])

    # Build a feature-name DataFrame with duplicates renamed by get_new_feature_name_df()
    new_feature_name_df = get_new_feature_name_df(feature_name_df)

    # Convert the names to a list so they can be assigned as DataFrame column names
    feature_name = new_feature_name_df.iloc[:, 1].values.tolist()

    # Load the training and test feature sets as DataFrames, using feature_name as the columns
    X_train = pd.read_csv('./datasets/human_activity/train/X_train.txt', sep=r'\s+', names=feature_name)
    X_test = pd.read_csv('./datasets/human_activity/test/X_test.txt', sep=r'\s+', names=feature_name)

    # Load the training and test labels as DataFrames with a single column named 'action'
    y_train = pd.read_csv('./datasets/human_activity/train/y_train.txt', sep=r'\s+', header=None, names=['action'])
    y_test = pd.read_csv('./datasets/human_activity/test/y_test.txt', sep=r'\s+', header=None, names=['action'])

    # Return all loaded train/test DataFrames
    return X_train, X_test, y_train, y_test
```


```
column_index    42
dtype: int64
```
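The renaming rule used by `get_new_feature_name_df()` can be checked on a toy frame, without the dataset files. This sketch is a vectorized alternative that uses `np.where` in place of the merge and row-wise `apply`; it is not the original function. The toy column names are made up.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for features.txt: the name 'a' appears three times
toy = pd.DataFrame({"column_index": [1, 2, 3, 4],
                    "column_name": ["a", "b", "a", "a"]})

# cumcount() numbers repeated names 0, 1, 2, ... within each name group
dup_cnt = toy.groupby("column_name").cumcount()

# Keep the first occurrence unchanged and suffix later ones with _1, _2, ...
toy["column_name"] = np.where(dup_cnt > 0,
                              toy["column_name"] + "_" + dup_cnt.astype(str),
                              toy["column_name"])
print(toy["column_name"].tolist())
```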


```python
XX_train, XX_test, yy_train, yy_test = get_human_dataset()
```

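```python
# Refit the same SVC on the human activity data. Pass y as a 1-D Series:
# a one-column DataFrame triggers sklearn's column-vector DataConversionWarning
svc_clf.fit(XX_train, yy_train['action'])
h_svc_pred = svc_clf.predict(XX_test)
h_svc_accuracy = accuracy_score(yy_test['action'], h_svc_pred)
h_svc_accuracy
```

```
0.9616559212758737
```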

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Candidate tree depths, compared with 5-fold cross-validation
params = {"max_depth": [6, 8, 10, 12, 16, 20, 24]}

dt_clf = DecisionTreeClassifier(random_state=156)
grid_cv = GridSearchCV(dt_clf, param_grid=params, scoring="accuracy", cv=5, verbose=2)
grid_cv.fit(XX_train, yy_train['action'])
grid_cv_pred = grid_cv.predict(XX_test)
```

```
Fitting 5 folds for each of 7 candidates, totalling 35 fits
[CV] END ........................................max_depth=6; total time=   1.6s
[CV] END ........................................max_depth=6; total time=   1.6s
[CV] END ........................................max_depth=6; total time=   1.5s
[CV] END ........................................max_depth=6; total time=   1.5s
[CV] END ........................................max_depth=6; total time=   1.5s
[CV] END ........................................max_depth=8; total time=   2.0s
[CV] END ........................................max_depth=8; total time=   1.9s
[CV] END ........................................max_depth=8; total time=   1.9s
[CV] END ........................................max_depth=8; total time=   1.9s
[CV] END ........................................max_depth=8; total time=   2.1s
[CV] END .......................................max_depth=10; total time=   2.4s
...
```
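The verbose log only shows timings. The per-candidate scores are stored in `cv_results_`, and `best_params_` and `best_score_` summarize the winning candidate. The human activity files are not bundled here, so this sketch runs the same kind of search on the iris data with a smaller depth grid chosen for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

params = {"max_depth": [1, 2, 3, 4, 6]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=156), param_grid=params,
                    scoring="accuracy", cv=5)
grid.fit(X, y)

# One row per candidate: mean and standard deviation of the 5 fold scores
results = pd.DataFrame(grid.cv_results_)[["param_max_depth", "mean_test_score", "std_test_score"]]
print(results)
print(grid.best_params_, grid.best_score_)
```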

```python
best_dt = grid_cv.best_estimator_
best_dt
```

```python
# Rank the features by the tuned tree's importance scores and keep the top 10
importance_feature = best_dt.feature_importances_
importance_feature = pd.Series(importance_feature, index=XX_train.columns)
impo_feature_top10 = importance_feature.sort_values(ascending=False)[:10]
impo_feature_top10
```

```
tGravityAcc-min()-X                0.240128
fBodyAccJerk-bandsEnergy()-1,16    0.201486
angle(Y,gravityMean)               0.133057
fBodyAccMag-energy()               0.109450
tGravityAcc-arCoeff()-Z,2          0.096247
fBodyGyro-maxInds-X                0.022719
tGravityAcc-energy()-Y             0.016840
tBodyGyro-correlation()-Y,Z        0.015651
tBodyAccMag-arCoeff()1             0.015083
tBodyGyro-max()-X                  0.008671
dtype: float64
```
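Decision-tree `feature_importances_` are normalized to sum to 1. `Series.nlargest` is a shorter way to take the top-k entries than `sort_values(ascending=False)[:k]`. This small sketch uses iris, with the depth and k chosen arbitrarily.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
tree = DecisionTreeClassifier(max_depth=3, random_state=156).fit(X, iris.target)

importances = pd.Series(tree.feature_importances_, index=X.columns)
print(importances.sum())         # normalized importances sum to 1

top2 = importances.nlargest(2)   # same entries as sort_values(ascending=False)[:2]
print(top2)
```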

```python
# Keep only the top-10 columns in both the training and test sets
index = impo_feature_top10.index
data = XX_train.loc[:, index]
test = XX_test.loc[:, index]
```

```python
# Refit the SVC on the 10 selected features only (y passed as a 1-D Series)
svc_clf.fit(data, yy_train['action'])
svc_grid_pred = svc_clf.predict(test)
svc_grid_accuracy = accuracy_score(yy_test['action'], svc_grid_pred)
svc_grid_accuracy
```

```
0.8642687478791992
```
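The manual steps above fit a tree, take its top-10 features, and refit the SVC on those columns. With only 10 features, accuracy dropped from about 0.962 to about 0.864. These steps can be packaged as a `Pipeline`, so the selection is learned only from the data passed to `fit()`. This sketch uses synthetic `make_classification` data instead of the human activity set. It swaps in `SelectFromModel` for the hand-written sort, and the tree depth is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=30, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

pipe = Pipeline([
    # threshold=-np.inf plus max_features=10 keeps exactly the 10 most important features
    ("select", SelectFromModel(DecisionTreeClassifier(max_depth=8, random_state=156),
                               max_features=10, threshold=-np.inf)),
    ("svc", SVC()),
])
pipe.fit(X_train, y_train)

n_selected = int(pipe.named_steps["select"].get_support().sum())
acc = pipe.score(X_test, y_test)  # accuracy on the held-out split
print(n_selected, acc)
```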