# Bayesian Regularized Iterative Soft Thresholding Algorithm

### How to Run

The BARISTA algorithm is a class-specific, attribute-weighted Naive Bayes framework designed to mitigate overfitting and relax the conditional independence assumption of Naive Bayes. This brief tutorial shows how to run the model.

**Data Importing**

Load your data from a CSV file into a pandas DataFrame.

In [29]:
import pandas as pd
breast_w = pd.read_csv('/filepath/breast_w.csv')

**Pre-processing**

Missing values are imputed with the mean value (numerical attributes) or the most frequent value (categorical attributes). Numerical attributes are then discretized using the MDL discretization technique (see our paper for details). Pass in the dataset, the target attribute column name as a string, and a list of the column names of the numerical attributes (in this case there are none, so the list is empty). Next, use the `get_data` function to get a design matrix, $X$, and a vector of labels, $y$. Finally, we split the data into a training set and a testing set to evaluate generalization performance.

In [30]:
import preprocess
breast_w = preprocess.Preprocess(breast_w, "Class", [])
X, y = breast_w.get_data()

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
X_train = X_train.reset_index(drop = True)
y_train = y_train.reset_index(drop = True)
X_test = X_test.reset_index(drop = True)
y_test = y_test.reset_index(drop = True)
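
The imputation rule described above can be illustrated with a minimal pandas sketch. This is not the code inside `preprocess.Preprocess`; the helper name `impute_missing` and the toy columns are hypothetical and only show the mean / most-frequent-value rule on a small frame.

```python
import pandas as pd

def impute_missing(df, numerical_columns):
    """Hypothetical helper: fill gaps with the column mean for numerical
    attributes and with the most frequent value for all other attributes."""
    filled = df.copy()
    for column in filled.columns:
        if column in numerical_columns:
            filled[column] = filled[column].fillna(filled[column].mean())
        else:
            # mode() lists the most frequent value(s); take the first on ties
            filled[column] = filled[column].fillna(filled[column].mode().iloc[0])
    return filled

# Toy data, not taken from breast_w
toy = pd.DataFrame({
    "age": [30.0, None, 50.0],
    "shape": ["round", "round", None],
})
imputed = impute_missing(toy, numerical_columns=["age"])
print(imputed)
```

The sketch assumes every column has at least one observed value; a column that is entirely missing has no mean or mode and would need separate handling.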

**Fitting the Data**

We can now fit the model to the training data using the BARISTA algorithm. Calling the `fit` method runs the optimization procedure. For parameter details, see the README file.

In [None]:
import BARISTA

barista = BARISTA.BARISTA()
barista.fit(training_samples=X_train, training_labels=y_train, scheme='FISTA',
            learning_rate=0.1, convergence_constant=1e-6, max_iterations=5000,
            l1_penalty=0.01, l2_penalty=0.001)
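
For readers unfamiliar with iterative soft thresholding, the following is a minimal NumPy sketch of the generic soft-thresholding operator and a single ISTA-style update. It is not the BARISTA implementation: the function names `soft_threshold` and `proximal_step` are hypothetical, and the way `learning_rate` and `l1_penalty` combine here is the textbook form, assumed rather than taken from the library. FISTA follows the same pattern but adds a momentum term between updates.

```python
import numpy as np

def soft_threshold(weights, threshold):
    """Shrink each entry toward zero by `threshold`; entries whose
    magnitude is below the threshold become exactly zero."""
    return np.sign(weights) * np.maximum(np.abs(weights) - threshold, 0.0)

def proximal_step(weights, gradient, learning_rate, l1_penalty):
    """One generic ISTA update (assumed form): a gradient step on the
    smooth part of the objective, followed by soft thresholding."""
    stepped = weights - learning_rate * gradient
    return soft_threshold(stepped, learning_rate * l1_penalty)

# Toy weights, for illustration only
weights = np.array([0.5, -0.05, 0.0, -1.0])
shrunk = soft_threshold(weights, 0.1)
print(shrunk)
```

In this textbook form, a larger L1 penalty raises the threshold and drives more weights to exactly zero, which is how sparsity arises in soft-thresholding methods.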

**Classification Performance**

Now that the model has been fit, we can classify the testing instances. Passing the true labels to `predict` also makes an accuracy score available as `barista.accuracy`.

In [33]:
barista.predict(X_test, y_test)
print("Testing Accuracy:", barista.accuracy)

Testing Accuracy: 0.9657142857142857
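
As a point of reference, classification accuracy is conventionally the fraction of instances whose predicted label matches the true label. The sketch below assumes that definition; the helper `fraction_correct` is hypothetical and is not part of the BARISTA module, and the labels are toy values.

```python
def fraction_correct(predicted, actual):
    """Hypothetical helper: share of positions where the two label
    sequences agree."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Toy labels, for illustration only
score = fraction_correct([1, 0, 1, 1], [1, 0, 0, 1])
print("Accuracy:", score)
```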
