Multilevel Panels

This package provides a class with methods to perform special hierarchy-preserving set operations on hierarchically organized data.

Examples of such hierarchically organized data include:

genus, species
countries, regions, cities
treatment/control group, subject, observation date

Such data are sometimes referred to as multilevel or nested and are commonly encountered in social sciences.

By 'hierarchy-preserving set operations', we mean the following:

when a row is intersected with another row that contains partially matching but less specific data, the row with more specific data is the result
when a row is unioned with another row that contains partially matching but less specific data, the row with less specific data is the result

For example, assume the following two datasets, where the first column specifies genus, and the second column species. None is used as a sentinel value to indicate "any".

A = [
    ('Felis', 'silvestris'),
    ('Canis', 'lupus'),
    ('Homo', None),
    ('Ovis', None),
]

B = [
    ('Felis', 'silvestris'),
    ('Homo', 'sapiens'),
    ('Homo', 'neanderthalensis'),
    ('Ovis', None),
]

Then we define A ∩ B to be:

{
    ('Felis', 'silvestris'),
    ('Homo', 'sapiens'),
    ('Homo', 'neanderthalensis'),
    ('Ovis', None),
}

And A ∪ B to be:

{
    ('Felis', 'silvestris'),
    ('Canis', 'lupus'),
    ('Homo', None),
    ('Ovis', None),
}

As expected, the order of operands does not matter: A ∩ B = B ∩ A and A ∪ B = B ∪ A. However, the order of the rows in the result is not guaranteed.

See the included example for more information.

Assumptions

Input data must be 2-dimenisonal integer or floating point numpy.ndarray arrays.
- It is trivial to convert non-numeric data to this format; see the included example.
The special value NaN is a sentinel indicating a wildcard; it will match any other value in the same hierarchical path.
Each input array's rows must be unique.
The hierarchy is assumed to flow from left to right; i.e., the 0-th column must contain the highest level of the hierarchy.
If a value in a row of the hierarchy is NaN, all remaining values in the row must be NaN.
- I.e.(1., NaN, NaN) is permitted, but (1., NaN, 1.) is not.
Operations must be performed on arrays with equal numbers of columns.

Usage

See the included example for usage examples.

Installation

Running make install will pip install the package into the current environment.

Running Tests

Run make develop followed by make test.

Algorithm

Decompose input arrays of width c into c arrays of widths [1, c].
- If n ∈ [1, c], then the n-th array contains input rows of width n whose original values in columns > n contained only NaN.
- Some of these c arrays may be empty.
Intersect or union the arrays of width c, preserving hierarchy by examining values in the arrays of width c - 1.
Concatenate and deduplicate these results as needed.
Recurse to step 2 with arrays of width c - 1. When the base case is reached, return.

Performance

This implementation can perform intersections and unions on datasets of shape (100,000, 3) in approximately 1 second on a MacBook Pro (Retina, 13-inch, Early 2015).

Known Limitations

Level Skipping

Input data containing rows that "skip" an intervening level of the hierarchy is not correctly handled. Such cases are detected and raise an exception. See the test suite for an example.

This limitation only affects panels with > 2 columns.

License

This software is distributed under the MIT License. See the license file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
examples		examples
src/multilevel_panels		src/multilevel_panels
tests		tests
LICENSE.txt		LICENSE.txt
Makefile		Makefile
README.md		README.md
dev-requirements.txt		dev-requirements.txt
pytest.ini		pytest.ini
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multilevel Panels

Assumptions

Usage

Installation

Running Tests

Algorithm

Performance

Known Limitations

Level Skipping

License

About

Releases

Packages

Languages

License

langelgjm/multilevel-panels

Folders and files

Latest commit

History

Repository files navigation

Multilevel Panels

Assumptions

Usage

Installation

Running Tests

Algorithm

Performance

Known Limitations

Level Skipping

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages