# Modeling

This section collects sample code and links for the modeling stage.


## Modeling Bigger Datasets 

1. [FTRL Implementation](https://www.kaggle.com/jiweiliu/ftrl-starter-code/code)
2. [LibFFM](https://github.com/guestwalk/libffm)
3. [Vowpal Wabbit](https://github.com/JohnLangford/vowpal_wabbit/wiki)
4. [Incremental Learning](http://scikit-learn.org/stable/modules/scaling_strategies.html#incremental-learning)
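
The tools above share one idea: update the model one example at a time so the full dataset never has to fit in memory. The following is a minimal pure-Python sketch of FTRL-Proximal logistic regression over sparse binary features. It is not the code from the linked kernel; the class name, the default hyperparameters, and the toy `color=...` stream are assumptions made for illustration.

```python
from math import exp, sqrt


class FTRLProximal:
    """Online logistic regression with per-coordinate FTRL-Proximal updates."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-feature adjusted gradient sums
        self.n = {}  # per-feature sums of squared gradients

    def _weight(self, f):
        # Weights are derived lazily from z and n; L1 keeps small ones at exactly 0.
        z = self.z.get(f, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        step = (self.beta + sqrt(self.n.get(f, 0.0))) / self.alpha + self.l2
        return -(z - sign * self.l1) / step

    def predict(self, features):
        # features is a list of distinct active (value 1) feature names.
        score = sum(self._weight(f) for f in features)
        score = max(min(score, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + exp(-score))

    def update(self, features, y):
        p = self.predict(features)
        g = p - y  # gradient of the log loss w.r.t. each active weight
        for f in features:
            n = self.n.get(f, 0.0)
            sigma = (sqrt(n + g * g) - sqrt(n)) / self.alpha
            self.z[f] = self.z.get(f, 0.0) + g - sigma * self._weight(f)
            self.n[f] = n + g * g
        return p


# Hypothetical stream: in practice each row would be read from disk one at a time.
stream = [(["color=red"], 1), (["color=blue"], 0)] * 200
model = FTRLProximal()
for features, label in stream:
    model.update(features, label)

print(model.predict(["color=red"]), model.predict(["color=blue"]))
```

For very high-cardinality features, the dictionaries can be swapped for fixed-size arrays indexed by a hash of the feature name, which bounds memory at the cost of occasional collisions.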

## Time Series Forecasting

1. [R Tutorial](https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/)

2. [Python Tutorial](https://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/)
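
Before working through the tutorials above, it can help to have a simple baseline to compare against. The sketch below implements simple exponential smoothing in plain Python; the function name, the `alpha` default, and the sample series are assumptions for illustration, and the method ignores trend and seasonality.

```python
def simple_exp_smoothing(series, alpha=0.5):
    """Return the one-step-ahead forecast from simple exponential smoothing."""
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]  # initialise the level with the first observation
    for value in series[1:]:
        # New level = weighted mix of the latest observation and the old level.
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical monthly values.
history = [112, 118, 132, 129, 121, 135]
forecast = simple_exp_smoothing(history, alpha=0.5)
print("Next-period forecast:", forecast)
```

A larger `alpha` reacts faster to recent changes, while a smaller one smooths more heavily.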


## Bayesian Optimization

Some Python libraries for Bayesian optimization:

1. [Hyperopt](http://hyperopt.github.io/hyperopt/)

2. [Spearmint](https://github.com/JasperSnoek/spearmint)

3. [Bayesian Optimization](https://github.com/fmfn/BayesianOptimization) 

Example code is available in this [Kaggle Kernel](https://www.kaggle.com/dreeux/hyperparameter-tuning-using-hyperopt).

### Random Forest

```python
from sklearn import ensemble, metrics


def runRF(train_X, train_y, test_X, test_y=None, test_X2=None, depth=20, leaf=10, feat=0.2):
    # Fit a random forest; return validation predictions, validation log loss,
    # and predictions for an optional second test set.
    model = ensemble.RandomForestClassifier(
        n_estimators=1000,
        max_depth=depth,
        min_samples_split=2,
        min_samples_leaf=leaf,
        max_features=feat,
        n_jobs=4,
        random_state=0)
    model.fit(train_X, train_y)
    train_preds = model.predict_proba(train_X)[:, 1]
    test_preds = model.predict_proba(test_X)[:, 1]
    # test_X2 is optional; skip it when it is not supplied.
    test_preds2 = model.predict_proba(test_X2)[:, 1] if test_X2 is not None else None

    train_loss = metrics.log_loss(train_y, train_preds)
    # The validation loss can only be computed when labels are given.
    test_loss = metrics.log_loss(test_y, test_preds) if test_y is not None else 0
    print("Train and Test loss : ", train_loss, test_loss)
    return test_preds, test_loss, test_preds2
```

### XGBoost / LightGBM

```python
import xgboost as xgb
from sklearn import metrics


def runXGB(train_X, train_y, test_X, test_y=None, test_X2=None, seed_val=0, rounds=500, dep=8, eta=0.05):
    # test_y is required in practice: early stopping and the AUC need validation labels.
    params = {}
    params["objective"] = "binary:logistic"
    params["eval_metric"] = "auc"
    params["eta"] = eta
    params["subsample"] = 0.7
    params["min_child_weight"] = 1
    params["colsample_bytree"] = 0.7
    params["max_depth"] = dep
    params["verbosity"] = 0  # replaces the deprecated "silent" flag
    params["seed"] = seed_val
    #params["max_delta_step"] = 2
    #params["gamma"] = 0.5
    num_rounds = rounds

    xgtrain = xgb.DMatrix(train_X, label=train_y)
    xgtest = xgb.DMatrix(test_X, label=test_y)
    # Early stopping monitors the last entry of the watchlist (the validation set).
    watchlist = [(xgtrain, 'train'), (xgtest, 'test')]
    model = xgb.train(params, xgtrain, num_rounds, evals=watchlist, early_stopping_rounds=100, verbose_eval=20)

    # iteration_range (XGBoost 1.4+) replaces the older ntree_limit argument.
    best_range = (0, model.best_iteration + 1)
    pred_test_y = model.predict(xgtest, iteration_range=best_range)
    pred_test_y2 = model.predict(xgb.DMatrix(test_X2), iteration_range=best_range) if test_X2 is not None else None

    # Despite the name, this is the validation AUC (higher is better).
    loss = metrics.roc_auc_score(test_y, pred_test_y)
    return pred_test_y, loss, pred_test_y2
```

### Neural Networks / Deep Learning

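```python
import random

import numpy as np
import pandas as pd
from keras.layers import Activation, Dense, Dropout
from keras.models import Sequential
from sklearn import metrics, preprocessing


def runNN(train_X, train_y, test_X, test_y=None, test_X2=None, epochs=100, scale=False):
    # Uses Keras 2+ argument names (kernel_initializer, epochs);
    # the Keras 1 equivalents were init and nb_epoch.
    # test_y is required in practice: it is used for validation and the final loss.
    if scale:
        # Fit the scaler on all available rows (pd.concat expects DataFrames).
        sc = preprocessing.StandardScaler()
        all_X = pd.concat([train_X, test_X, test_X2], axis=0)
        sc.fit(all_X)
        train_X = sc.transform(train_X)
        test_X = sc.transform(test_X)
        if test_X2 is not None:
            test_X2 = sc.transform(test_X2)

    random.seed(12345)
    np.random.seed(12345)
    model = Sequential()
    model.add(Dense(200, input_shape=(train_X.shape[1],), kernel_initializer='he_uniform'))  # , kernel_regularizer=regularizers.l1(0.002)))
    model.add(Activation('relu'))
    model.add(Dropout(0.3))

    #model.add(Dense(50, kernel_initializer='he_uniform'))
    #model.add(Activation('relu'))
    #model.add(Dropout(0.3))

    #model.add(Dense(100, kernel_initializer='he_uniform'))
    #model.add(Activation('relu'))
    #model.add(Dropout(0.3))

    model.add(Dense(1, kernel_initializer='he_uniform'))
    model.add(Activation('sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adagrad')

    ### Model fitting takes place ###
    model.fit(train_X, train_y, batch_size=512, epochs=epochs, validation_data=(test_X, test_y), verbose=2, shuffle=True)

    preds = model.predict(test_X, verbose=0)
    # test_X2 is optional; skip it when it is not supplied.
    preds_test2 = model.predict(test_X2, verbose=0).ravel() if test_X2 is not None else None
    loss = metrics.log_loss(test_y, preds)
    return preds.ravel(), loss, preds_test2
```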
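
The three `run*` functions above share one signature and return `(validation predictions, score, second-test-set predictions)`, which makes them easy to drive from a K-fold loop. The sketch below shows one way to do that. `run_lr` is a hypothetical lightweight stand-in with the same signature, `run_kfold` is an illustrative helper name, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold


def run_lr(train_X, train_y, test_X, test_y=None, test_X2=None):
    # Hypothetical stand-in that follows the run* signature used above.
    model = LogisticRegression(max_iter=1000)
    model.fit(train_X, train_y)
    test_preds = model.predict_proba(test_X)[:, 1]
    loss = log_loss(test_y, test_preds, labels=[0, 1]) if test_y is not None else 0.0
    test_preds2 = model.predict_proba(test_X2)[:, 1] if test_X2 is not None else None
    return test_preds, loss, test_preds2


def run_kfold(run_fn, X, y, X_test, n_splits=5, seed=0):
    # Collect out-of-fold predictions for X and fold-averaged predictions for X_test.
    oof = np.zeros(len(X))
    test_preds = np.zeros(len(X_test))
    losses = []
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for dev_idx, val_idx in kf.split(X):
        val_preds, loss, fold_test = run_fn(X[dev_idx], y[dev_idx], X[val_idx], y[val_idx], X_test)
        oof[val_idx] = val_preds
        test_preds += fold_test / n_splits
        losses.append(loss)
    return oof, test_preds, float(np.mean(losses))


# Synthetic data for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_test = rng.normal(size=(50, 5))

oof_preds, test_preds, cv_loss = run_kfold(run_lr, X, y, X_test)
print("Mean CV log loss:", cv_loss)
```

The out-of-fold predictions can serve as input features for a second-level model, since each one comes from a model that never saw that row during training.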

## Ensembling

Code for basic ensembling methods is available in the [Kaggle Ensemble Guide repository by MLWave](https://github.com/MLWave/Kaggle-Ensemble-Guide).
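
One of the simplest ensembling methods is rank averaging, which blends models whose scores sit on different scales by averaging normalized ranks rather than raw predictions. The sketch below is a plain-Python illustration, not code from the linked repository; the function names and the sample predictions are assumptions.

```python
def rank_normalize(preds):
    """Map scores to ranks scaled into [0, 1]; tied scores share their average rank."""
    order = sorted(range(len(preds)), key=lambda i: preds[i])
    ranks = [0.0] * len(preds)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and preds[order[j + 1]] == preds[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    denom = max(len(preds) - 1, 1)
    return [r / denom for r in ranks]


def rank_average(prediction_lists, weights=None):
    """Weighted average of rank-normalized predictions from several models."""
    weights = weights or [1.0] * len(prediction_lists)
    total = float(sum(weights))
    normalized = [rank_normalize(p) for p in prediction_lists]
    n_rows = len(prediction_lists[0])
    return [
        sum(w * p[i] for w, p in zip(weights, normalized)) / total
        for i in range(n_rows)
    ]


# Hypothetical predictions from two models on the same four rows.
model_a = [0.10, 0.80, 0.35, 0.60]
model_b = [0.02, 0.03, 0.01, 0.90]
blended = rank_average([model_a, model_b])
print(blended)
```

Rank averaging only preserves ordering, so it suits ranking metrics such as AUC; for log loss, a plain or weighted mean of calibrated probabilities is usually the better starting point.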

## Stacking 

1. [StackNet](https://github.com/kaz-Anova/StackNet) by Marios KazAnova
2. [Stacked Ensembles](https://h2o-release.s3.amazonaws.com/h2o/rel-ueno/2/docs-website/h2o-docs/data-science/stacked-ensembles.html) by H2O
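
For a quick experiment without either tool above, stacking can also be sketched with scikit-learn's `StackingClassifier` (available from scikit-learn 0.22). This is a swapped-in alternative, not StackNet or H2O; the choice of base models, the meta-learner, and the synthetic dataset are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data for illustration only.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Level-0 models produce out-of-fold probabilities (cv=5);
# the level-1 logistic regression learns how to combine them.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

val_accuracy = stack.score(X_val, y_val)
print("Validation accuracy:", val_accuracy)
```

Training the meta-learner on out-of-fold predictions matters because in-sample predictions from the base models would leak the training labels into the second level.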