Interactive Feature Selection (IFS)

Interactive Feature Selection (IFS) is a web application that involves the user in feature selection, a preprocessing step in the machine learning process. Common automated feature selection algorithms search the space of feature subsets by optimizing a criterion function. These algorithms lack transparency and may output feature sets that predict the training data well but do not reflect the underlying data-generating mechanisms.
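To make the contrast with automated selection concrete, here is a minimal sketch of one common automated strategy, greedy forward selection over a criterion function. It is not part of IFS; the names forward_select and toy_criterion and the feature names are hypothetical, and the criterion is a made-up stand-in for something like cross-validated accuracy.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_select(features: Sequence[str],
                   criterion: Callable[[FrozenSet[str]], float],
                   max_features: int) -> List[str]:
    """Greedily add the feature that most improves the criterion."""
    selected: List[str] = []
    best_score = criterion(frozenset())
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Score every one-feature extension of the current subset.
        scored = [(criterion(frozenset(selected + [f])), f) for f in candidates]
        score, feature = max(scored)
        if score <= best_score:
            break  # no candidate improves the criterion
        selected.append(feature)
        best_score = score
    return selected


def toy_criterion(subset: FrozenSet[str]) -> float:
    """Hypothetical criterion: reward two useful features, penalize subset size."""
    useful = {"age": 0.5, "dose": 0.3}
    return sum(useful.get(f, 0.0) for f in subset) - 0.05 * len(subset)


result = forward_select(["age", "dose", "id", "zip"], toy_criterion, max_features=3)
```

The search returns a subset but gives no account of why those features matter, which is the transparency gap IFS aims to close by involving a domain expert.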

IFS brings in a human expert who understands the classification problem domain and the semantics of the input data. The IFS workflow has four steps:

The user visually ranks the features by importance and expresses causal relationships among the features.
The application visualizes the user's inputs, provides analytics, and visualizes the dataset using parallel coordinates.
The user interactively selects a feature subset to build a model.
The user uses the visualization of the training results to iteratively select feature subsets and generate higher-performing, more transparent models.

Accessing

To access the web application with a pre-loaded dataset, go to http://35.192.120.176:8889/demo

Tutorials

Components

Feature Importance

Rank features by importance by creating groups of features that have roughly the same level of importance. Inner groups are more important than outer groups. Features left outside all groups are treated as not relevant to the classification problem.
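One way to think about the grouping is as an ordered list of groups, innermost first. The sketch below is an assumption about how such a ranking could be represented, not the application's actual data model; rank_from_groups and the feature names are hypothetical.

```python
from typing import Dict, List, Optional


def rank_from_groups(groups: List[List[str]],
                     all_features: List[str]) -> Dict[str, Optional[int]]:
    """Map each feature to its importance level (1 = innermost group).

    Features that appear in no group get None, meaning "not relevant".
    """
    ranks: Dict[str, Optional[int]] = {f: None for f in all_features}
    for level, group in enumerate(groups, start=1):
        for feature in group:
            ranks[feature] = level
    return ranks


# Innermost (most important) group first; "zip" is left outside all groups.
groups = [["age", "dose"], ["weight"]]
all_features = ["age", "dose", "weight", "zip"]

ranks = rank_from_groups(groups, all_features)
relevant = [f for f in all_features if ranks[f] is not None]
```

Features in the same group share a level, which matches the idea that the user only needs to judge relative importance, not a strict total order.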

Causal Graph

The Pycausal library is used to initialize the causal graph from the input data. The user then adds, removes, or reverses edges to reflect their knowledge of the causal relationships among features.
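The three edit operations can be sketched on a plain set of directed edges. This is an illustration only: it does not call Pycausal, the CausalGraph class is hypothetical, and the initial edges stand in for whatever the automated search would produce.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


class CausalGraph:
    """Directed graph of causal edges that a user can edit."""

    def __init__(self, edges: Iterable[Edge] = ()) -> None:
        self.edges: Set[Edge] = set(edges)

    def add_edge(self, cause: str, effect: str) -> None:
        self.edges.add((cause, effect))

    def remove_edge(self, cause: str, effect: str) -> None:
        # discard() is a no-op if the edge is absent
        self.edges.discard((cause, effect))

    def reverse_edge(self, cause: str, effect: str) -> None:
        # Only reverse an edge that actually exists.
        if (cause, effect) in self.edges:
            self.edges.remove((cause, effect))
            self.edges.add((effect, cause))


# Hypothetical initial graph, as if proposed by an automated search.
graph = CausalGraph([("smoking", "cancer"), ("cancer", "age"), ("zip", "cancer")])

graph.reverse_edge("cancer", "age")   # the expert knows age is a cause, not an effect
graph.remove_edge("zip", "cancer")    # spurious association
graph.add_edge("smoking", "cough")    # missing relationship
```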

Data Visualization and Feature Selection

Visualize the high-dimensional dataset using parallel coordinates, where each axis represents one feature. Axes can be moved, and the visualization updates dynamically. A boundary axis marks the selected feature set: features to the left of the boundary are selected and those to the right are not. After features are selected, a classifier is generated.
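The boundary rule can be sketched as a split of an ordered axis list. This is a minimal illustration under assumptions: the "BOUNDARY" marker, split_at_boundary, and move_axis are hypothetical names, not identifiers from the application.

```python
from typing import List, Tuple

BOUNDARY = "BOUNDARY"  # hypothetical marker for the boundary axis


def split_at_boundary(axes: List[str]) -> Tuple[List[str], List[str]]:
    """Return (selected, unselected): axes left and right of the boundary."""
    idx = axes.index(BOUNDARY)
    return axes[:idx], axes[idx + 1:]


def move_axis(axes: List[str], name: str, new_index: int) -> List[str]:
    """Return a new ordering with `name` moved to position `new_index`."""
    reordered = [a for a in axes if a != name]
    reordered.insert(new_index, name)
    return reordered


axes = ["age", "dose", BOUNDARY, "weight", "zip"]
selected, unselected = split_at_boundary(axes)

# Dragging "weight" to the far left moves it into the selected set.
axes_after_drag = move_axis(axes, "weight", 0)
selected_after_drag, _ = split_at_boundary(axes_after_drag)
```

The selected list is what would be passed on as the feature columns when the classifier is trained.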

Performance Analysis

A confusion matrix, ROC curve, and statistical metrics are shown to help the user analyze the classifier and decide how to improve it in the next iteration. The user can return to the feature selection step to select a new set of features and create another model. The user can then compare the performance of the models.
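As a rough illustration of comparing two models, the sketch below computes confusion-matrix counts and a few standard metrics for a binary problem. The labels and predictions are made up, and the helper names are hypothetical; the application's own metric set may differ.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Dict[str, int]:
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, precision, and recall from confusion counts."""
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Made-up labels and predictions from two models built on different feature subsets.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
pred_a = [1, 1, 0, 0, 0, 1, 0, 1]
pred_b = [1, 0, 0, 0, 0, 0, 0, 1]

metrics_a = metrics(confusion_counts(y_true, pred_a))
metrics_b = metrics(confusion_counts(y_true, pred_b))
```

Here both models have the same accuracy, but the second trades recall for precision, which is the kind of difference a side-by-side comparison makes visible.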

Dependencies

Pycausal


helenzhao093/interactive-feature-selection
