# CatBoost
> Using the CatBoost model

- toc: true 
- badges: true
- comments: true
- categories: [Python, Analysis, Model]
- image: images/chart-preview.png

# CatBoost Features

## Great quality without parameter tuning
- Reduce time spent on parameter tuning: CatBoost often gives strong results with its default parameters.

## Categorical features support
- Improve your training results by passing non-numeric (categorical) features to CatBoost directly, instead of spending time and effort pre-processing them into numbers.

## Fast and scalable GPU version
- Train your model with a fast GPU implementation of the gradient-boosting algorithm. Use a multi-card configuration for large datasets.

## Improved accuracy
- Reduce overfitting when building your models with CatBoost's novel gradient-boosting scheme (ordered boosting).

## Fast prediction
- Apply your trained model quickly and efficiently, even in latency-critical tasks, using CatBoost's model applier.

# Exploring Model Feature Importance

In short, you can build a sorted table of feature importances like this:

```python
import pandas as pd

# x_val must share the training data's column order
pd.DataFrame({
    'feature_importance': model.get_feature_importance(train_pool),
    'feature_names': x_val.columns,
}).sort_values(by='feature_importance', ascending=False)
```

You can also wrap the table and a bar chart in a reusable function:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_feature_importance(importance, names, model_type):
    """Draw a horizontal bar chart of feature importances, largest first."""
    # Create arrays from feature importance and feature names
    feature_importance = np.array(importance)
    feature_names = np.array(names)

    # Create a DataFrame using a dictionary
    data = {'feature_names': feature_names, 'feature_importance': feature_importance}
    fi_df = pd.DataFrame(data)

    # Sort the DataFrame in order of decreasing feature importance
    fi_df.sort_values(by=['feature_importance'], ascending=False, inplace=True)

    # Define the size of the bar plot
    plt.figure(figsize=(10, 8))
    # Plot a Seaborn bar chart
    sns.barplot(x=fi_df['feature_importance'], y=fi_df['feature_names'])
    # Add chart labels
    plt.title(model_type + ' FEATURE IMPORTANCE')
    plt.xlabel('FEATURE IMPORTANCE')
    plt.ylabel('FEATURE NAMES')
```

Then plot the feature importance from different boosting algorithms:

```python
# train is assumed to hold only the feature columns both models were fit on

# Plot the XGBoost result
plot_feature_importance(xgb_model.feature_importances_, train.columns, 'XG BOOST')

# Plot the CatBoost result
plot_feature_importance(cb_model.get_feature_importance(), train.columns, 'CATBOOST')
```