Merged
60 commits
f8eab9a
Add support for empty levels
mirand863 Jun 4, 2022
8d30921
Update pypi download stats
mirand863 Jun 5, 2022
353c743
Add test for empty levels to local classifier per node
mirand863 Jun 6, 2022
8c1c776
Update README.md
mirand863 Jun 6, 2022
ebc5e06
Replace for loops with list comprehension
mirand863 Jun 7, 2022
3ec6c4c
Merge branch 'main' into empty
mirand863 Jun 15, 2022
433ef86
Add docstring to _make_leveled function
mirand863 Jun 15, 2022
0eac5cb
Filter empty leaves
mirand863 Jun 15, 2022
3924453
Fix some tests
mirand863 Jun 15, 2022
9d962ae
Fix last failing test
mirand863 Jun 15, 2022
90686c6
Update README.md
mirand863 Jun 16, 2022
7ec06af
Update README.md
mirand863 Jun 16, 2022
dd5556b
Update README.md
mirand863 Jun 16, 2022
2873861
Update README.md
mirand863 Jun 16, 2022
f829e79
Update README.md
mirand863 Jun 16, 2022
3a53410
Update README.md
mirand863 Jun 16, 2022
20afce3
Update README.md
mirand863 Jun 16, 2022
c8cdc8f
Merge branch 'main' into empty
mirand863 Jun 17, 2022
5dafba0
Refactor create digraph method
mirand863 Jun 24, 2022
42a7607
Remove comments
mirand863 Jun 24, 2022
203aece
Merge branch 'main' into empty
mirand863 Jun 24, 2022
a153cfe
Fix bug
mirand863 Jun 24, 2022
5fd0253
Simplify example
mirand863 Jun 24, 2022
b5d1cd0
Add n_jobs to introduction
mirand863 Jun 24, 2022
13c11e9
Remove comma
mirand863 Jun 24, 2022
4affeb7
Expand table of contents
mirand863 Jun 24, 2022
31abb68
Add example for empty levels
mirand863 Jun 24, 2022
b060683
Update tests to cover 1 column only
mirand863 Jun 24, 2022
1757869
Update tests to cover 1 column only
mirand863 Jun 24, 2022
2344831
Fix lcpn and lcppn
mirand863 Jun 24, 2022
11f515b
Update example
mirand863 Jun 24, 2022
df8f451
Update example
mirand863 Jun 24, 2022
7763993
Remove addopts
mirand863 Jun 28, 2022
c9621b7
Add pytest options
mirand863 Jun 28, 2022
fe3f059
Refactor _fit_digraph
mirand863 Jun 28, 2022
840b21e
Refactor predict method for local classifier per parent node
mirand863 Jun 28, 2022
4f2e695
Enforce finding successors only on previous level
mirand863 Jun 28, 2022
67d03c0
Remove redundant return statement
mirand863 Jun 28, 2022
223c139
Update comment
mirand863 Jun 28, 2022
4de0dbd
Refactor predict() method for local classifier per level
mirand863 Jun 28, 2022
0daedf9
Refactor variable
mirand863 Jun 28, 2022
e5e49c1
Refactor repeated code
mirand863 Jun 28, 2022
468a775
Remove useless variables
mirand863 Jun 28, 2022
58babf3
Add examples
mirand863 Jun 29, 2022
aa294dc
Add comments
mirand863 Jun 29, 2022
8ef74d1
Remove useless variables
mirand863 Jun 29, 2022
342f69d
Apply black
mirand863 Jun 29, 2022
ee39612
More refactoring
mirand863 Jun 29, 2022
d960ef6
Add test for _fit_classifier
mirand863 Jun 29, 2022
e3f9ed6
Add black linting
mirand863 Jun 29, 2022
5d3f0fd
Move black linting
mirand863 Jun 29, 2022
c4d1732
Apply black
mirand863 Jun 29, 2022
da220fc
Add black badges
mirand863 Jun 29, 2022
e4d0f45
Merge branch 'main' into empty
mirand863 Jun 29, 2022
825009d
Add support for empty levels to hierarchical metrics
mirand863 Jun 29, 2022
7f0b877
Apply black style
mirand863 Jun 29, 2022
80d516d
Refactor method make_leveled to make it public
mirand863 Jun 29, 2022
793df1e
Update README.md
mirand863 Jun 29, 2022
63aac24
Update metrics.py
mirand863 Jun 30, 2022
df86c21
Attempt to fix make_levelled function
mirand863 Jun 30, 2022
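Commit 825009d extends the hierarchical metrics to samples with empty levels. The metric code is not shown in this diff, so the following is only a minimal sketch. It assumes that empty levels are encoded as trailing empty strings and that each node is identified by its full path from the root. `path_nodes` and `hierarchical_scores` are hypothetical names, not HiClass functions. The sketch computes micro-averaged hierarchical precision, recall and F-score over the sets of predicted and true nodes (each label together with its ancestors), and it skips padded levels so they are never counted as hits or misses.

```python
from typing import List, Sequence, Set, Tuple


def path_nodes(labels: Sequence[str]) -> Set[Tuple[str, ...]]:
    """Return every node on a label path, each identified by its full prefix.

    Assumption: empty strings only mark padded (empty) levels and are skipped.
    """
    path = [label for label in labels if label != ""]
    return {tuple(path[: i + 1]) for i in range(len(path))}


def hierarchical_scores(
    y_true: List[Sequence[str]], y_pred: List[Sequence[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged hierarchical precision, recall and F-score (sketch)."""
    overlap = predicted = actual = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        true_nodes = path_nodes(true_labels)
        pred_nodes = path_nodes(pred_labels)
        # Nodes shared by the true and predicted paths, ancestors included.
        overlap += len(true_nodes & pred_nodes)
        predicted += len(pred_nodes)
        actual += len(true_nodes)
    precision = overlap / predicted if predicted else 0.0
    recall = overlap / actual if actual else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_score


y_true = [["Bird", ""], ["Mammal", "Cat"]]
y_pred = [["Bird", ""], ["Mammal", "Dog"]]
precision, recall, f_score = hierarchical_scores(y_true, y_pred)
```

With this encoding, the padded second level of `"Bird"` contributes nothing, and predicting `"Dog"` instead of `"Cat"` still earns credit for the correct parent `"Mammal"`.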
2 changes: 1 addition & 1 deletion .github/workflows/deploy-pypi.yml
@@ -30,7 +30,7 @@ jobs:
python -m pip install .
- name: Test with pytest
run: |
pytest -v
pytest -v --flake8 --pydocstyle --cov=hiclass --cov-fail-under=90 --cov-report html
coverage xml
- name: Upload Coverage to Codecov
if: matrix.os == 'ubuntu-latest'
9 changes: 7 additions & 2 deletions .github/workflows/test-pr.yml
@@ -6,7 +6,12 @@ on:
- main

jobs:
build:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: psf/black@stable
test:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
@@ -29,4 +34,4 @@ jobs:
python -m pip install .
- name: Test with pytest
run: |
pytest -v
pytest -v --flake8 --pydocstyle --cov=hiclass --cov-fail-under=90 --cov-report html
16 changes: 9 additions & 7 deletions README.md
@@ -2,7 +2,7 @@

HiClass is an open-source Python library for hierarchical classification compatible with scikit-learn.

[![Deploy PyPI](https://github.com/mirand863/hiclass/actions/workflows/deploy-pypi.yml/badge.svg?event=push)](https://github.com/mirand863/hiclass/actions/workflows/deploy-pypi.yml) [![Documentation Status](https://readthedocs.org/projects/hiclass/badge/?version=latest)](https://hiclass.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/mirand863/hiclass/branch/main/graph/badge.svg?token=PR8VLBMMNR)](https://codecov.io/gh/mirand863/hiclass) [![Downloads PyPI](https://static.pepy.tech/personalized-badge/hiclass?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=pypi)](https://pypi.org/project/hiclass/) [![Downloads Conda](https://img.shields.io/conda/dn/conda-forge/hiclass?label=conda)](https://anaconda.org/conda-forge/hiclass) [![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
[![Deploy PyPI](https://github.com/mirand863/hiclass/actions/workflows/deploy-pypi.yml/badge.svg?event=push)](https://github.com/mirand863/hiclass/actions/workflows/deploy-pypi.yml) [![Documentation Status](https://readthedocs.org/projects/hiclass/badge/?version=latest)](https://hiclass.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/mirand863/hiclass/branch/main/graph/badge.svg?token=PR8VLBMMNR)](https://codecov.io/gh/mirand863/hiclass) [![Downloads PyPI](https://static.pepy.tech/personalized-badge/hiclass?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=pypi)](https://pypi.org/project/hiclass/) [![Downloads Conda](https://img.shields.io/conda/dn/conda-forge/hiclass?label=conda)](https://anaconda.org/conda-forge/hiclass) [![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

✨ Here is a **demo** that shows HiClass in action on hierarchical data:

@@ -16,7 +16,7 @@ HiClass is an open-source Python library for hierarchical classification compati
- [Who is using HiClass?](#who-is-using-hiclass)
- [Install](#install)
- [Quick start](#quick-start)
- [Step-by-step- walk-through](#step-by-step-walk-through)
- [Step-by-step walk-through](#step-by-step-walk-through)
- [API documentation](#api-documentation)
- [FAQ](#faq)
- [Support](#support)
@@ -34,7 +34,7 @@ HiClass is an open-source Python library for hierarchical classification compati
- **Hierarchical metrics:** HiClass supports the computation of hierarchical precision, recall and f-score, which are more appropriate for hierarchical data than traditional metrics.
- **Compatible with pickle:** Easily store trained models on disk for future use.

**Don't see a feature on this list?** Search our [issue tracker](https://github.com/mirand863/hiclass/issues) if someone has already requested it and add a comment to it explaining your use-case, or open a new issue if not. We prioritize our roadmap based on user feedback, so we'd love to hear from you.
**Any feature missing on this list?** Search our [issue tracker](https://github.com/mirand863/hiclass/issues) to see if someone has already requested it and add a comment to it explaining your use-case. Otherwise, please open a new issue describing the requested feature and possible use-case scenario. We prioritize our roadmap based on user feedback, so we would love to hear from you.

## Benchmarks

@@ -85,7 +85,7 @@ We would love to benchmark with larger datasets, if we can find them in the publ

Here is our public roadmap: https://github.com/mirand863/hiclass/projects/1.

We do Just-In-Time planning, and we tend to reprioritize based on your feedback. Hence, items you see on this roadmap are subject to change. We prioritize features based on the number of people asking for it, features/fixes that are small enough and can be addressed while we work on other related features, features/fixes that help improve stability & relevance and features that address interesting use cases that excite us! If you'd like to have a request prioritized, we ask that you add a detailed use-case for it, either as a comment on an existing issue (besides a thumbs-up) or in a new issue. The detailed context helps.
We do Just-In-Time planning, and we tend to reprioritize based on your feedback. Hence, items you see on this roadmap are subject to change. We prioritize features based on the number of people asking for it, features/fixes that are small enough and can be addressed while we work on other related features, features/fixes that help improve stability & relevance and features that address interesting use cases that excite us! If you would like to have a request prioritized, we ask that you add a detailed use-case for it, either as a comment on an existing issue (besides a thumbs-up) or in a new issue. The detailed context helps.


## Who is using HiClass?
@@ -123,7 +123,7 @@ Here's a quick example showcasing how you can train and predict using a local cl
from hiclass import LocalClassifierPerNode
from sklearn.ensemble import RandomForestClassifier

# define data
# Define data
X_train = [[1], [2], [3], [4]]
X_test = [[4], [3], [2], [1]]
Y_train = [
@@ -152,7 +152,7 @@ from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# define data
# Define data
X_train = [
'Struggling to repay loan',
'Unable to get annual report',
@@ -220,7 +220,9 @@ Please reach out to fabio.malchermiranda@hpi.de.

## Contributing

We are a small team on a mission to democratize hierarchical classification, and we'll take all the help we can get! If you'd like to get involved, here's information on [contribution guidelines and how to test the code locally](https://github.com/mirand863/hiclass/blob/main/CONTRIBUTING.md).
We are a small team on a mission to democratize hierarchical classification, and we will take all the help we can get! If you would like to get involved, here is information on [contribution guidelines and how to test the code locally](https://github.com/mirand863/hiclass/blob/main/CONTRIBUTING.md).

You can contribute in multiple ways, e.g., reporting bugs, writing or translating documentation, reviewing or refactoring code, requesting or implementing new features, etc.

## Getting the latest updates

5 changes: 4 additions & 1 deletion docs/examples/README.rst
@@ -1,4 +1,7 @@
Gallery of Examples
===================

These examples illustrate the main features of HiClass.
These examples illustrate the main features of HiClass.

.. toctree::
:hidden:
38 changes: 38 additions & 0 deletions docs/examples/plot_empty_levels.py
@@ -0,0 +1,38 @@
# -*- coding: utf-8 -*-
"""
==========================
Different Number of Levels
==========================

HiClass supports hierarchies in which samples have different numbers of levels.
For this example, we will train a local classifier per node
with a hierarchy similar to the following image:

.. figure:: ../algorithms/local_classifier_per_node.svg
:align: center
"""
from sklearn.linear_model import LogisticRegression

from hiclass import LocalClassifierPerNode

# Define data
X_train = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
X_test = [[9, 10], [7, 8], [5, 6], [3, 4], [1, 2]]
Y_train = [
["Bird"],
["Reptile", "Snake"],
["Reptile", "Lizard"],
["Mammal", "Cat"],
["Mammal", "Wolf", "Dog"],
]

# Use logistic regression classifiers for every node
lr = LogisticRegression()
classifier = LocalClassifierPerNode(local_classifier=lr)

# Train local classifier per node
classifier.fit(X_train, Y_train)

# Predict
predictions = classifier.predict(X_test)
print(predictions)
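In the example above, `Y_train` is ragged: `"Bird"` has a single level, while `["Mammal", "Wolf", "Dog"]` has three. Several commits in this PR mention a `make_leveled` helper that brings such input into a regular shape, but its body is not part of this diff. The sketch below is a hypothetical stand-in that assumes right-padding with empty strings; `pad_levels` and `strip_levels` are illustrative names, not HiClass API.

```python
from typing import List, Sequence


def pad_levels(y: Sequence[Sequence[str]], fill: str = "") -> List[List[str]]:
    """Right-pad ragged label paths so that every row has the same depth."""
    depth = max((len(row) for row in y), default=0)
    return [list(row) + [fill] * (depth - len(row)) for row in y]


def strip_levels(y: Sequence[Sequence[str]], fill: str = "") -> List[List[str]]:
    """Drop trailing padding to recover the original ragged label paths."""
    stripped = []
    for row in y:
        labels = list(row)
        # Only trailing fill values are padding; stop at the first real label.
        while labels and labels[-1] == fill:
            labels.pop()
        stripped.append(labels)
    return stripped


Y_train = [
    ["Bird"],
    ["Reptile", "Snake"],
    ["Reptile", "Lizard"],
    ["Mammal", "Cat"],
    ["Mammal", "Wolf", "Dog"],
]
padded = pad_levels(Y_train)  # padded[0] == ["Bird", "", ""]
```

A padded list like this converts into a 2-D array with `numpy.array(padded)`, which is one reason a rectangular layout is convenient for a scikit-learn style estimator. The padding assumes a missing level means the path simply ends early; it is not meant to represent gaps in the middle of a path.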
31 changes: 9 additions & 22 deletions docs/examples/plot_parallel_training.py
@@ -7,8 +7,9 @@
Larger datasets require more time for training.
While by default the models in HiClass are trained using a single core,
it is possible to train each local classifier in parallel by leveraging the library Ray [1]_.
In this example, we demonstrate how to train a hierarchical classifier in parallel,
using all the cores available, on a mock dataset from Kaggle [2]_.
In this example, we demonstrate how to train a hierarchical classifier in parallel by
setting the parameter :literal:`n_jobs` to use all the cores available. Training
is performed on a mock dataset from Kaggle [2]_.

.. [1] https://www.ray.io/
.. [2] https://www.kaggle.com/datasets/kashnitsky/hierarchical-text-classification
@@ -25,29 +26,15 @@
from hiclass import LocalClassifierPerParentNode


def download(url: str, path: str) -> None:
"""
Download a file from the internet.

Parameters
----------
url : str
The address of the file to be downloaded.
path : str
The path to store the downloaded file.
"""
response = requests.get(url)
with open(path, "wb") as file:
file.write(response.content)


# Download training data
training_data_url = "https://zenodo.org/record/6657410/files/train_40k.csv?download=1"
training_data_path = "train_40k.csv"
download(training_data_url, training_data_path)
url = "https://zenodo.org/record/6657410/files/train_40k.csv?download=1"
path = "train_40k.csv"
response = requests.get(url)
with open(path, "wb") as file:
file.write(response.content)

# Load training data into pandas dataframe
training_data = pd.read_csv(training_data_path).fillna(" ")
training_data = pd.read_csv(path).fillna(" ")

# We will use logistic regression classifiers for every parent node
lr = LogisticRegression(max_iter=1000)
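The docstring of this example explains that each local classifier can be trained in parallel by setting `n_jobs`, with Ray distributing the work. This is possible because the classifier at each parent node is fit only on the samples that reach that node, so the fits do not depend on one another. The sketch below illustrates that independence with the standard library instead of Ray, and with a hypothetical majority-vote stand-in instead of `LogisticRegression`. `children_per_parent`, `fit_majority`, `fit_all` and the `"<root>"` node are illustrative names, and the sketch assumes node names are unique across the hierarchy.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

ROOT = "<root>"  # hypothetical virtual root that predicts the first level


def children_per_parent(y: Sequence[Sequence[str]]) -> Dict[str, List[str]]:
    """Build one independent training set per parent node.

    Each entry lists the child label observed for every sample reaching
    that parent. Empty strings (padded levels) are ignored.
    """
    training_sets: Dict[str, List[str]] = {}
    for path in y:
        labels = [ROOT] + [label for label in path if label != ""]
        for parent, child in zip(labels, labels[1:]):
            training_sets.setdefault(parent, []).append(child)
    return training_sets


def fit_majority(item: Tuple[str, List[str]]) -> Tuple[str, str]:
    """Stand-in for fitting one local classifier: keep the most common child."""
    parent, children = item
    return parent, Counter(children).most_common(1)[0][0]


def fit_all(y: Sequence[Sequence[str]], n_jobs: int = 1) -> Dict[str, str]:
    """Fit every parent node's stand-in classifier using n_jobs workers (>= 1)."""
    training_sets = children_per_parent(y)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # The fits share no state, so they can run in any order or concurrently.
        return dict(pool.map(fit_majority, training_sets.items()))


models = fit_all([["A", "B"], ["A", "B"], ["A", "C"], ["D"]], n_jobs=2)
```

Threads keep the sketch portable, but for CPU-bound fits in pure Python they would not give a real speed-up; a process pool or Ray, as this example uses, is what actually spreads the work over multiple cores.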
32 changes: 17 additions & 15 deletions docs/source/conf.py
@@ -13,17 +13,18 @@
#
import os
import sys
sys.path.insert(0, os.path.abspath('./../..'))
sys.path.insert(0, os.path.abspath('./../../hiclass'))

sys.path.insert(0, os.path.abspath("./../.."))
sys.path.insert(0, os.path.abspath("./../../hiclass"))
print(sys.path)

import sphinx_code_tabs

# -- Project information -----------------------------------------------------

project = 'hiclass'
copyright = '2022, Fabio Malcher Miranda, Niklas Köhnecke'
author = 'Fabio Malcher Miranda, Niklas Köhnecke'
project = "hiclass"
copyright = "2022, Fabio Malcher Miranda, Niklas Köhnecke"
author = "Fabio Malcher Miranda, Niklas Köhnecke"


# -- General configuration ---------------------------------------------------
@@ -32,15 +33,15 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.napoleon',
'sphinx.ext.autosectionlabel',
'sphinx_code_tabs',
'sphinx_gallery.gen_gallery',
"sphinx.ext.autodoc",
"sphinx.ext.napoleon",
"sphinx.ext.autosectionlabel",
"sphinx_code_tabs",
"sphinx_gallery.gen_gallery",
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
@@ -55,12 +56,13 @@
use_rtd_scheme = False
try:
import sphinx_rtd_theme

extensions.extend(["sphinx_rtd_theme"])
use_rtd_scheme = True
except ImportError:
print("sphinx_rtd_theme was not installed, using alabaster as fallback!")

html_theme = 'sphinx_rtd_theme' if use_rtd_scheme else 'alabaster'
html_theme = "sphinx_rtd_theme" if use_rtd_scheme else "alabaster"


# Add any paths that contain custom static files (such as style sheets) here,
@@ -76,6 +78,6 @@
html_theme_options["sidebar_width"] = "230px"

sphinx_gallery_conf = {
'examples_dirs': '../examples',
'gallery_dirs': 'auto_examples',
}
"examples_dirs": "../examples",
"gallery_dirs": "auto_examples",
}
12 changes: 6 additions & 6 deletions docs/source/index.rst
@@ -30,15 +30,15 @@ Welcome to hiclass' documentation!
:target: https://opensource.org/licenses/BSD-3-Clause
:alt: License

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
:target: https://github.com/psf/black

.. toctree::
:titlesonly:
:includehidden:
:maxdepth: 3

introduction/index
get_started/index
auto_examples/index
algorithms/index

.. toctree::
:maxdepth: 3

api/index
api/index