# Documentation for important classes and methods in `drfsc`

The main user-facing methods of the `DRFSC` class are `set_rfsc_params`,
`load_data`, `fit`, `predict`, `predict_proba`, `score`, `feature_importance`, `pos_neg_prediction`, and `single_prediction`.

In [1]:
from src.drfsc import *

In [2]:
help(DRFSC.set_rfsc_params)

Help on function set_rfsc_params in module src.drfsc:

set_rfsc_params(self, params: dict)
    Setter for RFSC parameters. Updates the RFSC parameters with the given dictionary. Dictionary must be in the form of {parameter_name: parameter_value}.
    
    Parameters
    ----------
    n_models : int 
        Number of models generated per iteration. Default=300.
    n_iters : int 
        Number of iterations. Default=150.
    tuning : float 
        Learning rate that dictates the speed of regressor inclusion probability (rip) convergence. Smaller values -> slower convergence. Default=50.
    tol : float 
        Tolerance condition. Default=0.002.
    alpha : float 
        Significance level for model pruning. Default=0.99.
    rip_cutoff : float 
        Determines rip threshold for feature inclusion in final model. Default=1.
    metric : str
        Optimization metric. Default='roc_auc'. Options: 'acc', 'roc_auc', 'weighted', 'avg_prec', 'f1', 'auprc'.
    verbose : bool 
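
The parameter dictionary described above can be sketched as follows. This is a minimal, dependency-free illustration, not the library's implementation: `merge_rfsc_params` is a hypothetical helper, the defaults are copied from the docstring, and the key and metric validation is an assumption (the docstring does not say whether the real setter validates its input).

```python
# Defaults copied from the set_rfsc_params docstring. 'verbose' is left out
# because its default is not shown in the help output.
RFSC_DEFAULTS = {
    "n_models": 300,
    "n_iters": 150,
    "tuning": 50,
    "tol": 0.002,
    "alpha": 0.99,
    "rip_cutoff": 1,
    "metric": "roc_auc",
}
ALLOWED_METRICS = {"acc", "roc_auc", "weighted", "avg_prec", "f1", "auprc"}


def merge_rfsc_params(overrides: dict) -> dict:
    """Hypothetical helper: overlay user overrides on the documented defaults."""
    unknown = set(overrides) - set(RFSC_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown RFSC parameter(s): {sorted(unknown)}")
    merged = {**RFSC_DEFAULTS, **overrides}
    if merged["metric"] not in ALLOWED_METRICS:
        raise ValueError(f"unsupported metric: {merged['metric']!r}")
    return merged


# Override two values and keep the remaining defaults.
params = merge_rfsc_params({"n_models": 100, "metric": "f1"})
```

The result has the `{parameter_name: parameter_value}` shape that `set_rfsc_params` expects, so it could be passed to that method on a `DRFSC` instance.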
      

In [3]:
help(DRFSC.load_data)

Help on function load_data in module src.drfsc:

load_data(self, X_train: numpy.ndarray, X_val: numpy.ndarray, Y_train: numpy.ndarray, Y_val: numpy.ndarray, X_test: numpy.ndarray = None, Y_test: numpy.ndarray = None, polynomial: int = 1, preprocess: bool = True)
    Preprocesses the data in the required way for the DRFSC model. Can be used to load data into the model if it has not been loaded yet. Scales the data to [0,1] and creates polynomial expansion based on the passed 'polynomial' parameter. 
    
    Parameters
    ----------
    X_train : np.ndarray or pd.DataFrame 
        Train set data
    X_val : np.ndarray or pd.DataFrame
        Validation set data
    Y_train : np.ndarray or pd.DataFrame
        Train set labels
    Y_val : np.ndarray or pd.DataFrame
        Validation set labels
    X_test : np.ndarray or pd.DataFrame, optional
        Test set data. Only required if postprocessing is required. Defaults to None.
    Y_test : np.ndarray or pd.DataFrame, optional
        T
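
The two preprocessing steps named in the docstring, scaling to [0,1] and polynomial expansion, can be illustrated with a short sketch. It uses plain Python lists to stay dependency-free, and it rests on assumptions the docstring does not confirm: that scaling bounds are computed per column on the training data, and that the expansion includes interaction terms. The helper names `fit_min_max`, `scale`, and `polynomial_expand` are hypothetical and are not part of `drfsc`.

```python
from itertools import combinations_with_replacement
from math import prod


def fit_min_max(rows):
    """Return per-column (min, max) pairs computed from the given rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def scale(rows, bounds):
    """Map each column to [0, 1] using the bounds; constant columns become 0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


def polynomial_expand(rows, degree):
    """Return all monomials of total degree 1..degree (no bias column)."""
    n_features = len(rows[0])
    index_sets = [
        combo
        for d in range(1, degree + 1)
        for combo in combinations_with_replacement(range(n_features), d)
    ]
    return [[prod(row[i] for i in combo) for combo in index_sets] for row in rows]


X_train = [[1.0, 10.0], [3.0, 20.0], [5.0, 40.0]]
bounds = fit_min_max(X_train)            # learned from the training rows only
X_scaled = scale(X_train, bounds)        # every column now lies in [0, 1]
X_poly = polynomial_expand(X_scaled, degree=2)  # x0, x1, x0^2, x0*x1, x1^2
```

With `degree=1` the expansion returns the scaled features unchanged, which matches the documented default `polynomial=1`.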

In [4]:
help(DRFSC.fit)

Help on function fit in module src.drfsc:

fit(self, X_train: numpy.ndarray, X_val: numpy.ndarray, Y_train: numpy.ndarray, Y_val: numpy.ndarray)
    The main function for fitting the model. Returns a single final model if output == 'single', else returns a model ensemble based on the number of horizontal partitions (n_hbins).
    
    Parameters
    ----------
    X_train : np.ndarray or pd.DataFrame 
        Train set data
    X_val : np.ndarray or pd.DataFrame
        Validation set data
    Y_train : np.ndarray or pd.DataFrame
        Train set labels
    Y_val : np.ndarray or pd.DataFrame
        Validation set labels



In [5]:
help(DRFSC.predict)

Help on function predict in module src.drfsc:

predict(self, X_test)
    Uses the best model to predict on the test set
    
    Parameters
    ----------
    X_test : np.ndarray 
        Test set data
    
    Returns
    -------
    np.ndarray containing the predicted labels



In [6]:
help(DRFSC.predict_proba)

Help on function predict_proba in module src.drfsc:

predict_proba(self, X_test: numpy.ndarray)
    Uses the best model to predict on the test set, returns predicted probabilities
    
    Parameters
    ----------
    X_test : np.ndarray 
        Test set data
        
    Returns
    -------
    np.ndarray containing the predicted probabilities



In [7]:
help(DRFSC.score)

Help on function score in module src.drfsc:

score(self, X_test: numpy.ndarray, Y_test: numpy.ndarray, metric: str = None)
    Used to evaluate the final model on the test set.
    
    Parameters
    ----------
    X_test : np.ndarray or pd.DataFrame
        Test set data
    Y_test : np.ndarray or pd.DataFrame or pd.Series
        Test set labels
    metric : str, optional
        Metric to use for evaluation. By default uses the metric specified in the constructor. Other options: ('acc', 'roc_auc', 'weighted', 'avg_prec', 'f1', 'auprc').
    
    Returns
    -------
    evaluation : dict
        returns the score of the model based on the metric specified.
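
To make the metric names concrete, here is a small sketch of how two of the listed options, `'acc'` and `'f1'`, can be computed for binary labels. It is illustrative only: `evaluate` and the `METRICS` table are hypothetical, and returning a dictionary keyed by the metric name is an assumption based on the documented `dict` return type.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred):
    """F1 score for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical dispatch table covering two of the documented metric names.
METRICS = {"acc": accuracy, "f1": f1}


def evaluate(y_true, y_pred, metric="acc"):
    """Return the requested score in a dict keyed by the metric name."""
    return {metric: METRICS[metric](y_true, y_pred)}


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
acc_result = evaluate(y_true, y_pred, metric="acc")
f1_result = evaluate(y_true, y_pred, metric="f1")
```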



In [8]:
help(DRFSC.feature_importance)

Help on function feature_importance in module src.drfsc:

feature_importance(self)
    Creates a bar plot of the features of the model and their contribution to the final prediction.
    
    Returns
    -------
    figure : matplotlib figure
        Bar plot of the coefficients of the features in the final model.



In [9]:
help(DRFSC.pos_neg_prediction)

Help on function pos_neg_prediction in module src.drfsc:

pos_neg_prediction(self, data_index: int = 0, X_test: numpy.ndarray = None)
    Creates a plot of the positive and negative parts of the prediction.
    
    Parameters
    ----------
    data_index : int
        Index of the data observation to be plotted. If X_test is not provided, then the index is relative to the provided training/validation data. If X_test is provided, then the index is relative to the provided test data. Default is 0.
    X_test : np.array or pd.DataFrame
        Test data to be used for prediction. If provided, then the index is relative to the provided test data. Default is None.
    
    Returns
    -------
    figure : matplotlib figure
        figure shows, for a given sample (indexed by data_index), the positive and negative parts of the prediction. That is, it takes the value of the coefficients and multiplies them by the feature values. The positive and negative parts of the prediction are then plo
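
The decomposition described above (coefficients multiplied by feature values, then split by sign) can be sketched without plotting. The coefficients and sample below are made up for illustration, `split_contributions` is a hypothetical helper, and the sketch ignores any intercept term because the docstring does not say how one is handled.

```python
def split_contributions(coefficients, features):
    """Return (positive_part, negative_part) of the linear predictor for one sample."""
    contributions = [c * x for c, x in zip(coefficients, features)]
    positive = sum(v for v in contributions if v > 0)
    negative = sum(v for v in contributions if v < 0)
    return positive, negative


coefficients = [1.5, -2.0, 0.5]  # made-up model coefficients
sample = [1.0, 0.25, 0.0]        # made-up scaled feature values for one observation
positive, negative = split_contributions(coefficients, sample)
linear_predictor = positive + negative  # the two parts sum to the full weighted sum
```

Here the first feature contributes 1.5, the second contributes -0.5, and the third contributes nothing, so the positive part is 1.5 and the negative part is -0.5.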

In [10]:
help(DRFSC.single_prediction)

Help on function single_prediction in module src.drfsc:

single_prediction(self, data_index: int = 0, X_test: numpy.ndarray = None)
    Creates a plot of the single prediction of the final model
    
    Parameters
    ----------
    data_index : int
        Index of the data observation to be plotted. If X_test is not provided, then the index is relative to the provided training/validation data. If X_test is provided, then the index is relative to the provided test data. Default is 0.
    X_test : np.array or pd.DataFrame
        Test data to be used for prediction. If provided, then the index is relative to the provided test data. Default is None.
        
    Returns
    -------
    figure : matplotlib figure
        Shows for a given sample (indexed by data_index) the coefficients of the final model, weighted by the feature values for the indexed observation.

