## MODELING  
1. Regression model as a baseline.  
2. Stacking a (good) classification model and a regression model.

### Regression Model


<img src="Final_Presentation_Images/reg_only.png">

### Stacking Model: classifier + regressor

A default classification model
<img src="Final_Presentation_Images/clf_only.png">

#### Custom Estimator 1: Stacking Regressor (the best)
Step 1: fit a classifier on X_train, with label 1 where the target is > 0 and 0 otherwise.  
Step 2: append the classifier's predictions to X_train as a __new feature__, giving X_train_new.  
Step 3: fit a regressor on __X_train_new__.  
Prediction: take the regressor's prediction and set all negative values to zero.  

#### Custom Estimator 2: TrustClassfierRegressor 
Step 1: fit a classifier on X_train.  
Step 2: fit a regressor on X_train.  
Prediction: classifier prediction * regressor prediction.
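
The steps of Custom Estimator 2 can be sketched as below. This is a minimal illustration, not the class used in the project: the name `TrustClassifierRegressorSketch`, the toy data, and the choice of `LogisticRegression` and `LinearRegression` are all assumptions made for the example. It also assumes the classifier is trained on a binary label (1 where the target is > 0), as in the stacking regressor.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import LinearRegression, LogisticRegression


class TrustClassifierRegressorSketch(BaseEstimator, RegressorMixin):
    """Hypothetical sketch: multiply a 0/1 class prediction by a regression prediction."""

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, X, y):
        y = np.asarray(y)
        # Step 1: the classifier learns whether the target is positive.
        self.classifier.fit(X, (y > 0).astype(int))
        # Step 2: the regressor learns the target on the same rows and features.
        self.regressor.fit(X, y)
        return self

    def predict(self, X):
        # Rows the classifier labels 0 are forced to a prediction of zero.
        return self.classifier.predict(X) * self.regressor.predict(X)


# Toy data (made up): the target is zero unless the first feature is positive.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = np.where(X_demo[:, 0] > 0, 5.0 + X_demo[:, 1], 0.0)

model = TrustClassifierRegressorSketch(LogisticRegression(), LinearRegression())
pred = model.fit(X_demo, y_demo).predict(X_demo)
```

Because the class prediction is 0 or 1, the classifier acts as a hard gate: the regressor's output is kept only where the classifier predicts a positive target.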

#### Custom Estimator 3: TrustClassfierRegressor_v2
Step 1: fit a classifier on X_train.  
Step 2: fit a regressor on only the rows of X_train where transaction_revenue > 0.  
Prediction: classifier prediction * regressor prediction.
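
The difference in Custom Estimator 3 is which rows the regressor sees. A minimal sketch follows, under the same caveats: `TrustClassifierRegressorV2Sketch`, the toy data, and the base models are hypothetical, and the target `y` is assumed to play the role of transaction_revenue.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import LinearRegression, LogisticRegression


class TrustClassifierRegressorV2Sketch(BaseEstimator, RegressorMixin):
    """Hypothetical sketch: the regressor is fit only on rows with a positive target."""

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        positive = y > 0
        # Step 1: the classifier still sees every row.
        self.classifier.fit(X, positive.astype(int))
        # Step 2: the regressor sees only the rows with a positive target.
        self.regressor.fit(X[positive], y[positive])
        return self

    def predict(self, X):
        X = np.asarray(X)
        # The 0/1 class prediction gates the regressor's output.
        return self.classifier.predict(X) * self.regressor.predict(X)


# Toy data (made up): the target is zero unless the first feature is positive.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = np.where(X_demo[:, 0] > 0, 5.0 + X_demo[:, 1], 0.0)

model = TrustClassifierRegressorV2Sketch(LogisticRegression(), LinearRegression())
pred = model.fit(X_demo, y_demo).predict(X_demo)
```

Here the regressor never has to explain the zero rows, so it models only the size of a positive target and leaves the zero-versus-positive decision to the classifier.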

#### Search Space:
- __stacking estimator__: Custom Estimator 1 vs. 2 vs. 3
- __classification algorithm__:
    - logistic regression
        - penalty score
    - SVC
        - kernels
    - random forest classification
        - n_estimators
- __regression algorithm__:
    - linear regression
    - random forest regressor
- __resampling training set__:
    - no resampling    
    - downsample the majority class (majority : minority = 1:1)  
    - upsample the minority class (minority : majority = 1:1)  
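
The two resampling options can be sketched with NumPy alone. This is an illustrative sketch, not the project's code: `resample_to_balance` is a hypothetical helper, the toy data is made up, and the two classes are assumed to be "target > 0" versus "target is zero".

```python
import numpy as np


def resample_to_balance(X, y, mode="down", seed=0):
    """Return (X, y) with a 1:1 ratio of zero to positive targets.

    mode="down" samples the majority class without replacement;
    mode="up" samples the minority class with replacement.
    """
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y > 0)
    zero_idx = np.flatnonzero(y <= 0)
    # Order the two index arrays by size: smaller first.
    minority, majority = sorted([pos_idx, zero_idx], key=len)
    if mode == "down":
        majority = rng.choice(majority, size=len(minority), replace=False)
    elif mode == "up":
        minority = rng.choice(minority, size=len(majority), replace=True)
    else:
        raise ValueError("mode must be 'down' or 'up'")
    # Shuffle so the classes are not grouped together in the output.
    keep = rng.permutation(np.concatenate([minority, majority]))
    return X[keep], y[keep]


# Toy data (made up): 10 positive targets out of 100 rows.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.zeros(100)
y_demo[:10] = 50.0

X_down, y_down = resample_to_balance(X_demo, y_demo, mode="down")
X_up, y_up = resample_to_balance(X_demo, y_demo, mode="up")
```

Resampling should be applied to the training set only, so that validation scores are still measured on the original class ratio.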
    

#### Some Good Stacking Models

<img src="Final_Presentation_Images/stacking.png">

__Best Parameters:__   
- No resampling  
- Stacking Regressor with   
    - BaggingClassifier (of SVC)  
    - RandomForestRegressor


In [None]:
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import precision_recall_fscore_support


class StackedRegressor(BaseEstimator, RegressorMixin):
    """Regressor that uses a classifier's predicted label as an extra feature."""

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def _add_class_label(self, X):
        # Append the classifier's prediction to X as the 'pred_class_label' column.
        X = X.reset_index(drop=True)
        return X.assign(pred_class_label=self.classifier.predict(X))

    def fit(self, X, y):
        # Step 1: fit the classifier on a binary label (1 if y > 0, else 0).
        class_labels = pd.Series(np.where(y > 0, 1, 0))
        self.classifier.fit(X, class_labels)

        # Steps 2-3: fit the regressor on X plus the predicted class label.
        self.regressor.fit(self._add_class_label(X), y)

        print(self.classifier.__class__.__name__, ",",
              self.regressor.__class__.__name__)
        return self  # scikit-learn convention; allows fit(...).predict(...)

    def predict(self, X):
        regressor_predict = self.regressor.predict(self._add_class_label(X))
        # Revenue cannot be negative, so clip negative predictions to zero.
        return np.where(regressor_predict < 0, 0, regressor_predict)

    def score(self, X, y):
        # Root mean squared error (lower is better), unlike scikit-learn's
        # default score, where higher is better.
        return np.sqrt(np.mean((y - self.predict(X))**2))

    def clf_score(self, X, y):
        # Macro-averaged precision, recall and F-score of the classifier stage.
        y_true = pd.Series(np.where(y > 0, 1, 0))
        y_pred = self.classifier.predict(X)
        return precision_recall_fscore_support(y_true, y_pred,
                                               average='macro')

In [None]:
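from sklearn.ensemble import BaggingClassifier, RandomForestRegressor
from sklearn.svm import SVC

# classifier: bagging of 100 polynomial-kernel SVCs, each fit on 1% of the rows
# (scikit-learn >= 1.2 names the first parameter `estimator` instead of `base_estimator`)
best_classifier = BaggingClassifier(
    base_estimator=SVC(tol=0.01, kernel='poly', verbose=False),
    bootstrap=True, bootstrap_features=False, max_features=1.0,
    max_samples=0.01, n_estimators=100)
# regressor
best_regressor = RandomForestRegressor(
    n_estimators=100,
    min_samples_leaf=15
)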

#### Feature Importance: classification label works!
<img src="Final_Presentation_Images/feature_importance_stacking.png">

### Boosting
(TODO: add something)

### All Models
<img src="Final_Presentation_Images/all_models.png">

## Summary