nicoloverardo/matrix_regression

Matrix Regression


Table of contents:

  1. Description
  2. Installation
  3. Usage

Description

Implementation of the MatrixRegression (MR) algorithm for multi-label text classification that can be used in an online learning context. It is presented in the following paper:

Popa, I., Zeitouni, K., Gardarin, G., Nakache, D., & Métais, E. (2007). Text Categorization for Multi-label Documents and Many Categories. pp. 421-426. doi:10.1109/CBMS.2007.108.

Abstract:

In this paper, we propose a new classification method that addresses classification in multiple categories of textual documents. We call it Matrix Regression (MR) due to its resemblance to regression in a high dimensional space. Experiences on a medical corpus of hospital records to be classified by ICD (International Classification of Diseases) code demonstrate the validity of the MR approach. We compared MR with three frequently used algorithms in text categorization that are k-Nearest Neighbors, Centroide and Support Vector Machine. The experimental results show that our method outperforms them in both precision and time of classification.
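The general idea can be illustrated with a small pure-Python sketch: keep a term-by-category weight matrix, add each training document's term weights to the columns of its categories, and classify a new document by scoring every category and keeping those above a threshold. This is a simplified illustration only, not the code of this package and not a faithful reproduction of the paper. The helper names (`term_weights`, `update`, `predict`), the relative-frequency weighting, and the scaling of scores by the top score are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def term_weights(text):
    """Hypothetical term weighting: relative term frequency in one document."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def update(W, docs, labels):
    """Add each document's term weights to the columns of its categories."""
    for text, cats in zip(docs, labels):
        for term, w in term_weights(text).items():
            for cat in cats:
                W[term][cat] += w
    return W


def predict(W, text, threshold=0.5):
    """Score each category, scale by the top score (an assumption), then threshold."""
    scores = defaultdict(float)
    for term, w in term_weights(text).items():
        for cat, weight in W.get(term, {}).items():
            scores[cat] += w * weight
    if not scores:
        return set()
    top = max(scores.values())
    return {cat for cat, s in scores.items() if s / top >= threshold}


# Term-by-category matrix stored as nested dictionaries.
W = defaultdict(lambda: defaultdict(float))
update(W, ["chest pain and cough", "broken arm after fall"],
       [{"cardio", "respiratory"}, {"trauma"}])

# Online step: a later batch only adds to the existing matrix.
update(W, ["persistent cough and fever"], [{"respiratory"}])

print(predict(W, "cough with fever"))
```

Because training only accumulates sums, new documents can be folded in without revisiting earlier ones, which is what makes this kind of model suitable for online learning.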

Installation

Install from PyPI with pip:

pip install matrixreg

Usage

from matrixreg.matrixregression import MatrixRegression

mr = MatrixRegression()

# Fit
mr.fit(X_train, y_train)

# Predict
mr.predict(X_test)

# Partial fit
mr.partial_fit(new_X, new_y)

Parameters optimization

The estimator follows the scikit-learn estimator interface, so it can be tuned with GridSearchCV:

from sklearn.model_selection import GridSearchCV

# Parameter grid to search
param_grid = [{"threshold": [0.3, 0.6, 0.9]}]

# Initialization
mr = MatrixRegression()
clf = GridSearchCV(mr, param_grid, cv=5, verbose=10, n_jobs=-1, scoring="f1_micro")

# Fit
clf.fit(X_train, y_train)

# Results
print(clf.best_params_, clf.best_score_)
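The `f1_micro` scorer used above ranks each threshold by micro-averaged F1, which pools true positives, false positives, and false negatives over all documents and labels before computing a single score. The sketch below shows that calculation on plain Python label sets. The function name and the toy data are made up for illustration, and scikit-learn itself works on label indicator matrices rather than sets.

```python
def f1_micro(y_true, y_pred):
    """Micro-averaged F1 for label sets: pool TP, FP, FN over all documents."""
    tp = fp = fn = 0
    for true, pred in zip(y_true, y_pred):
        tp += len(true & pred)   # labels predicted and correct
        fp += len(pred - true)   # labels predicted but wrong
        fn += len(true - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: three documents with their true and predicted label sets.
y_true = [{"cardio", "respiratory"}, {"trauma"}, {"respiratory"}]
y_pred = [{"cardio"}, {"trauma", "cardio"}, {"respiratory"}]

print(f1_micro(y_true, y_pred))  # 0.75
```

A lower threshold tends to add predicted labels (fewer misses, more false positives) and a higher one tends to remove them, so the grid search is effectively picking the point on that trade-off with the best pooled F1.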
