(feat): add setting to retain categories #1340

Merged
merged 83 commits into from Feb 8, 2024
83 commits
a62bfe2
(feat): add options features.
ilan-gold Dec 18, 2023
76c4073
(feat): tests, doc strings
ilan-gold Dec 18, 2023
a51729c
(feat): add settings to docs
ilan-gold Dec 18, 2023
47dbb4e
(fix): add `describe_option` to exports, try to fix docs errors
ilan-gold Dec 18, 2023
80c20ae
(chore): add reset test
ilan-gold Dec 18, 2023
04f8495
(fix): no multi-inheritance in py3.9 for NamedTuple
ilan-gold Dec 18, 2023
2117762
(refactor): use decorator
ilan-gold Dec 29, 2023
4d6e5fd
(chore): move options section
ilan-gold Dec 29, 2023
c175192
Merge branch 'main' into ig/settings
ilan-gold Dec 29, 2023
6780df9
(chore): add release note
ilan-gold Dec 29, 2023
bed4df6
(refactor): class based implementation
ilan-gold Jan 4, 2024
36e61d5
(feat): add deprecation
ilan-gold Jan 4, 2024
47c2a76
(chore): clean up docstrings and variables
ilan-gold Jan 4, 2024
bfb91ad
(chore): redo release note
ilan-gold Jan 4, 2024
8d30f03
(bug): fix `api.md`
ilan-gold Jan 4, 2024
f710c1c
(style): fix grammar
ilan-gold Jan 4, 2024
722a918
finish up typing
flying-sheep Jan 5, 2024
21b087e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 5, 2024
f4999fe
style
flying-sheep Jan 5, 2024
46d6fcb
(feat): use attributes instead of items
ilan-gold Jan 5, 2024
0fa637a
(chore): no boolean without *
ilan-gold Jan 5, 2024
12fa336
(feat): support multi-option functionality
ilan-gold Jan 5, 2024
374c107
(feat): add `__dir__` method
ilan-gold Jan 5, 2024
1a021f2
(fix): `default_value` typing
ilan-gold Jan 5, 2024
4f16310
(feat): dynamic docstring as class method
ilan-gold Jan 9, 2024
4a1852d
(feat): tab completion in jupyter notebook for override
ilan-gold Jan 9, 2024
24a94cc
(feat): tab completion in jupyter notebook for override
ilan-gold Jan 9, 2024
3d55426
Merge branch 'main' into ig/settings
ilan-gold Jan 9, 2024
cb2767f
(feat): docstring for `override`
ilan-gold Jan 9, 2024
9ed8923
Merge branch 'ig/settings' of github.com:scverse/anndata into ig/sett…
ilan-gold Jan 9, 2024
494faa3
(refactor): do docstring update in wrapped function
ilan-gold Jan 9, 2024
dcc547f
Merge branch 'main' into ig/settings
ilan-gold Jan 10, 2024
dbc77ea
Merge branch 'main' into ig/settings
ilan-gold Jan 23, 2024
2334431
Merge branch 'main' into ig/settings
ilan-gold Jan 24, 2024
8a917f6
(chore): remove docstring types.
ilan-gold Jan 25, 2024
ffc8dc1
Merge branch 'ig/settings' of github.com:scverse/anndata into ig/sett…
ilan-gold Jan 25, 2024
79c13a3
(fix): `KeyError` -> `AttributeError`
ilan-gold Jan 25, 2024
f8bbc62
(refactor): `setattr` -> direct setting
ilan-gold Jan 25, 2024
9915805
(refactor): no more decorator for updating `override`
ilan-gold Jan 25, 2024
b9fa23e
(refactor): relabel options docstring variable
ilan-gold Jan 25, 2024
14a7d88
(fix): docstring tab
ilan-gold Jan 25, 2024
cca4ccd
(chore): add `override` to docs
ilan-gold Jan 25, 2024
0217443
(fix): clean up docstring methods
ilan-gold Jan 25, 2024
64ef6ab
(chore): clean up unused methods/objects
ilan-gold Jan 25, 2024
7cadcbe
(chore): add extra test
ilan-gold Jan 25, 2024
017e341
Merge branch 'main' into ig/settings
ilan-gold Jan 25, 2024
d824b0d
Merge branch 'main' into ig/settings
ilan-gold Jan 26, 2024
c4b37f6
(fix): remove evironment variables
ilan-gold Jan 26, 2024
ad09df7
Merge branch 'ig/settings' of github.com:scverse/anndata into ig/sett…
ilan-gold Jan 26, 2024
c5452db
(feat): add setting to retain categories
ilan-gold Jan 26, 2024
15fa08a
(chore): add extra test for `allowed_values=None`
ilan-gold Jan 26, 2024
25671cb
(chore): docs
ilan-gold Jan 26, 2024
aaf917f
(chore): clarify `override` usage
ilan-gold Jan 26, 2024
84d688a
Merge branch 'ig/settings' into categories_setting
ilan-gold Jan 26, 2024
d72b14d
Merge branch 'main' into ig/settings
ilan-gold Jan 29, 2024
b7ab7cf
Apply suggestions from code review
ilan-gold Jan 29, 2024
5c006c6
(chore): add `dir` test
ilan-gold Jan 29, 2024
76571fd
(chore): use mocking
ilan-gold Jan 29, 2024
5ece4e8
Merge branch 'categories_setting' of github.com:scverse/anndata into …
ilan-gold Jan 29, 2024
c770ebb
Merge branch 'ig/settings' into categories_setting
ilan-gold Jan 29, 2024
0a6e2d1
(fix): small docstring fix
ilan-gold Jan 30, 2024
6ddf6d2
(fix): validator api + tests with nice warnings
ilan-gold Jan 30, 2024
62641c1
(chore): remove leading space from note
ilan-gold Jan 30, 2024
33e2470
Merge branch 'ig/settings' into categories_setting
ilan-gold Jan 30, 2024
330ffe8
(chore): update from validation change
ilan-gold Jan 30, 2024
fcd22c2
(chore): make docstring clearer
ilan-gold Jan 30, 2024
d678408
Merge branch 'ig/settings' into categories_setting
ilan-gold Jan 30, 2024
0981bad
(fix): use `add_note`
ilan-gold Jan 30, 2024
8ad25fb
Merge branch 'ig/settings' into categories_setting
ilan-gold Jan 30, 2024
0c8f449
Merge branch 'main' into ig/settings
ilan-gold Jan 30, 2024
6bfb1e5
Merge branch 'ig/settings' into categories_setting
ilan-gold Jan 30, 2024
7476713
(refactor): unnecessary `else` in guard clause
ilan-gold Jan 30, 2024
24e7df0
(fix): do not raise DeprecationWarning
ilan-gold Jan 30, 2024
a0f54a4
Merge branch 'ig/settings' of github.com:scverse/anndata into ig/sett…
ilan-gold Jan 30, 2024
4f34f4c
Merge branch 'ig/settings' into categories_setting
ilan-gold Jan 30, 2024
af8878f
Merge branch 'categories_setting' of github.com:scverse/anndata into …
ilan-gold Jan 30, 2024
0822201
ugh
ilan-gold Jan 30, 2024
663c632
(chore): add 0/1 to docs for boolean env variables
ilan-gold Jan 30, 2024
da797d1
(chore): move env variable fetching into `register`
ilan-gold Jan 30, 2024
826dd66
(fix): typing from `cast`
ilan-gold Jan 30, 2024
0993718
(chore): docs link
ilan-gold Jan 30, 2024
f0acbb1
(fix): use `Enum` properly
ilan-gold Feb 2, 2024
7082232
Merge branch 'main' into categories_setting
ilan-gold Feb 8, 2024
85 changes: 82 additions & 3 deletions anndata/_config.py
@@ -1,16 +1,18 @@
from __future__ import annotations

import os
import textwrap
import warnings
from collections.abc import Iterable
from contextlib import contextmanager
from enum import Enum
from inspect import Parameter, signature
-from typing import TYPE_CHECKING, NamedTuple, TypeVar
+from typing import TYPE_CHECKING, Any, NamedTuple, TypeVar

from anndata.compat.exceptiongroups import add_note

if TYPE_CHECKING:
-    from collections.abc import Callable
+    from collections.abc import Callable, Sequence

T = TypeVar("T")

@@ -30,6 +32,55 @@ class RegisteredOption(NamedTuple):
type: object


def check_and_get_environ_var(
    key: str,
    default_value: str,
    allowed_values: Sequence[str] | None = None,
    cast: Callable[[Any], T] | type[Enum] = lambda x: x,
) -> T:
    """Get the environment variable and return it as a (potentially) non-string, usable value.

    Parameters
    ----------
    key
        The environment variable name.
    default_value
        The default value for `os.environ.get`.
    allowed_values
        Allowable string values, by default None
    cast
        Casting from the string to a (potentially different) python object, by default `lambda x: x`

    Returns
    -------
    The cast value.
    """
    environ_value_or_default_value = os.environ.get(key, default_value)
    if (
        allowed_values is not None
        and environ_value_or_default_value not in allowed_values
    ):
        # Two adjacent f-strings avoid embedding the continuation line's indentation in the message.
        warnings.warn(
            f'Value "{environ_value_or_default_value}" is not in allowed {allowed_values} for environment variable {key}. '
            f"Default {default_value} will be used."
        )
        environ_value_or_default_value = default_value
    # Enum classes are looked up by member name; any other `cast` is called on the string.
    return (
        cast(environ_value_or_default_value)
        if not isinstance(cast, type(Enum))
        else cast[environ_value_or_default_value]
    )


def check_and_get_bool(option, default_value):
    return check_and_get_environ_var(
        "ANNDATA_" + option.upper(),
        str(int(default_value)),
        ["0", "1"],
        lambda x: bool(int(x)),
    )


_docstring = """
This manager allows users to customize settings for the anndata package.
Settings here will generally be for advanced use-cases and should be used with caution.
@@ -39,6 +90,8 @@ class RegisteredOption(NamedTuple):
{options_description}

For setting an option please use :func:`~anndata.settings.override` (local) or set the above attributes directly (global) i.e., `anndata.settings.my_setting = foo`.
For assignment by environment variable, use the variable name in all caps with `ANNDATA_` as the prefix before import of :mod:`anndata`.
For boolean environment variable setting, use 1 for `True` and 0 for `False`.
"""


@@ -111,6 +164,7 @@ def register(
description: str,
validate: Callable[[T], bool],
option_type: object | None = None,
get_from_env: Callable[[str, T], T] = lambda x, y: y,
) -> None:
"""Register an option so it can be set/described etc. by end-users

@@ -126,6 +180,9 @@ def register(
A function which returns True if the option's value is valid and otherwise should raise a `ValueError` or `TypeError`.
option_type
Optional override for the option type to be displayed. Otherwise `type(default_value)`.
get_from_env
An optional function which takes as arguments the name of the option and a default value and returns the value from the environment variable `ANNDATA_CAPS_OPTION` (or default if not present).
Default behavior is to return `default_value` without checking the environment.
"""
try:
validate(default_value)
@@ -144,7 +201,7 @@ def register(
self._registered_options[option] = RegisteredOption(
option, default_value, doc, validate, option_type
)
-self._config[option] = default_value
+self._config[option] = get_from_env(option, default_value)
self._update_override_function_for_new_option(option)

def _update_override_function_for_new_option(
@@ -294,5 +351,27 @@ def __doc__(self):
# PLACE REGISTERED SETTINGS HERE SO THEY CAN BE PICKED UP FOR DOCSTRING CREATION #
##################################################################################


categories_option = "remove_unused_categories"
categories_default_value = True
categories_description = (
    "Whether or not to remove unused categories with :class:`~pandas.Categorical`."
)


def validate_bool(val) -> bool:
    if not isinstance(val, bool):
        raise TypeError(f"{val} not valid boolean")
    return True


settings.register(
    categories_option,
    categories_default_value,
    categories_description,
    validate_bool,
    get_from_env=check_and_get_bool,
)

##################################################################################
##################################################################################
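
The environment-variable handling added in this file can be illustrated outside of anndata. The following is a minimal sketch, not the code from the diff: `read_env_option` and `read_env_bool` are hypothetical names that mirror the behavior described above (read the variable, fall back to the default with a warning when the value is not allowed, then cast), and the `ANNDATA_` prefix follows the docstring's stated convention.

```python
import os
import warnings


def read_env_option(key, default, allowed=None, cast=lambda x: x):
    """Hypothetical helper: read `key`, validate against `allowed`, then cast."""
    raw = os.environ.get(key, default)
    if allowed is not None and raw not in allowed:
        warnings.warn(
            f'Value "{raw}" is not in allowed {allowed} for {key}. '
            f"Default {default} will be used."
        )
        raw = default
    return cast(raw)


def read_env_bool(option, default):
    """Map ANNDATA_<OPTION> set to "0" or "1" onto a bool."""
    return read_env_option(
        "ANNDATA_" + option.upper(),
        str(int(default)),
        ["0", "1"],
        lambda x: bool(int(x)),
    )


# "0" switches the flag off.
os.environ["ANNDATA_REMOVE_UNUSED_CATEGORIES"] = "0"
flag_off = read_env_bool("remove_unused_categories", True)

# An invalid value triggers a warning and falls back to the default.
os.environ["ANNDATA_REMOVE_UNUSED_CATEGORIES"] = "yes"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flag_fallback = read_env_bool("remove_unused_categories", True)

del os.environ["ANNDATA_REMOVE_UNUSED_CATEGORIES"]
```

Because the value is read once when the option is registered, the variable has to be set before `anndata` is imported, as the docstring in this diff notes.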
6 changes: 4 additions & 2 deletions anndata/_core/anndata.py
@@ -30,6 +30,7 @@
from anndata._warnings import ImplicitModificationWarning

from .. import utils
from .._config import settings
from ..compat import (
CupyArray,
CupySparseMatrix,
@@ -413,8 +414,9 @@ def _init_as_view(self, adata_ref: AnnData, oidx: Index, vidx: Index):
self._varp = adata_ref.varp._view(self, vidx)
# fix categories
uns = copy(adata_ref._uns)
-self._remove_unused_categories(adata_ref.obs, obs_sub, uns)
-self._remove_unused_categories(adata_ref.var, var_sub, uns)
+if settings.remove_unused_categories:
+    self._remove_unused_categories(adata_ref.obs, obs_sub, uns)
+    self._remove_unused_categories(adata_ref.var, var_sub, uns)
# set attributes
self._obs = DataFrameView(obs_sub, view_args=(self, "obs"))
self._var = DataFrameView(var_sub, view_args=(self, "var"))
10 changes: 10 additions & 0 deletions anndata/tests/test_base.py
@@ -12,6 +12,7 @@
from scipy.sparse import csr_matrix, issparse

from anndata import AnnData
from anndata._config import settings
from anndata.tests.helpers import assert_equal, gen_adata

# some test objects that we use below
@@ -399,6 +400,15 @@ def test_slicing_remove_unused_categories():
assert adata[2:4].obs["k"].cat.categories.tolist() == ["b"]


def test_slicing_dont_remove_unused_categories():
    with settings.override(remove_unused_categories=False):
        adata = AnnData(
            np.array([[1, 2], [3, 4], [5, 6], [7, 8]]), dict(k=["a", "a", "b", "b"])
        )
        adata._sanitize()
        assert adata[2:4].obs["k"].cat.categories.tolist() == ["a", "b"]


def test_get_subset_annotation():
adata = AnnData(
np.array([[1, 2, 3], [4, 5, 6]]),
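
The `test_slicing_dont_remove_unused_categories` test in this hunk relies on `settings.override` changing an option only for the duration of a `with` block. As a rough sketch of that pattern, assuming nothing about anndata's actual `SettingsManager` internals, the class below (`TinySettings`, a hypothetical stand-in) shows a registration hook for the initial value and a context manager that restores the previous value on exit.

```python
from contextlib import contextmanager


class TinySettings:
    """Hypothetical stand-in for a settings manager; not anndata's class."""

    def __init__(self):
        self._config = {}

    def register(self, option, default_value, get_from_env=lambda name, default: default):
        # The hook picks the initial value; by default it ignores the environment.
        self._config[option] = get_from_env(option, default_value)

    def __getattr__(self, option):
        # Only reached when normal attribute lookup fails.
        if option.startswith("_"):
            raise AttributeError(option)
        try:
            return self._config[option]
        except KeyError:
            raise AttributeError(option) from None

    @contextmanager
    def override(self, **overrides):
        # Remember current values, apply the overrides, always restore on exit.
        previous = {name: self._config[name] for name in overrides}
        self._config.update(overrides)
        try:
            yield
        finally:
            self._config.update(previous)


settings = TinySettings()
settings.register("remove_unused_categories", True)

with settings.override(remove_unused_categories=False):
    inside = settings.remove_unused_categories

after = settings.remove_unused_categories
```

Restoring in a `finally` clause means the option returns to its prior value even if the body of the `with` block raises.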
110 changes: 103 additions & 7 deletions anndata/tests/test_config.py
@@ -1,8 +1,16 @@
from __future__ import annotations

import os
from enum import Enum

import pytest

-from anndata._config import SettingsManager
+from anndata._config import (
+    SettingsManager,
+    check_and_get_bool,
+    check_and_get_environ_var,
+    validate_bool,
+)

option = "test_var"
default_val = False
@@ -18,12 +26,6 @@
type_3 = list[int]


-def validate_bool(val) -> bool:
-    if not isinstance(val, bool):
-        raise TypeError(f"{val} not valid boolean")
-    return True


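def validate_int_list(val) -> bool:
    # Check every element; a non-empty list of booleans is always truthy, so the list must be reduced with all().
    if not isinstance(val, list) or not all(isinstance(e, int) for e in val):
        raise TypeError(f"{repr(val)} is not a valid int list")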
@@ -49,6 +51,53 @@ def test_register_option_default():
assert description in settings.describe(option)


def test_register_with_env(monkeypatch):
    with monkeypatch.context() as mp:
        option_env = "test_var_env"
        default_val_env = False
        description_env = "My doc string env!"
        option_env_var = "ANNDATA_" + option_env.upper()
        mp.setenv(option_env_var, "1")

        settings.register(
            option_env,
            default_val_env,
            description_env,
            validate_bool,
            get_from_env=check_and_get_bool,
        )

        assert settings.test_var_env


def test_register_with_env_enum(monkeypatch):
    with monkeypatch.context() as mp:
        option_env = "test_var_env"
        default_val_env = False
        description_env = "My doc string env!"
        option_env_var = "ANNDATA_" + option_env.upper()
        mp.setenv(option_env_var, "b")

        class TestEnum(Enum):
            a = False
            b = True

        def check_and_get_bool_enum(option, default_value):
            return check_and_get_environ_var(
                "ANNDATA_" + option.upper(), "a", cast=TestEnum
            ).value

        settings.register(
            option_env,
            default_val_env,
            description_env,
            validate_bool,
            get_from_env=check_and_get_bool_enum,
        )

        assert settings.test_var_env


def test_register_bad_option():
with pytest.raises(TypeError, match="'foo' is not a valid int list"):
settings.register(
@@ -129,3 +178,50 @@ def test_deprecation_no_message():
def test_option_typing():
assert settings._registered_options[option_3].type == type_3
assert str(type_3) in settings.describe(option_3, print_description=False)


def test_check_and_get_environ_var(monkeypatch):
    with monkeypatch.context() as mp:
        option_env_var = "ANNDATA_OPTION"
        assert hash("foo") == check_and_get_environ_var(
            option_env_var, "foo", ["foo", "bar"], lambda x: hash(x)
        )
        mp.setenv(option_env_var, "bar")
        assert hash("bar") == check_and_get_environ_var(
            option_env_var, "foo", ["foo", "bar"], lambda x: hash(x)
        )
        mp.setenv(option_env_var, "Not foo or bar")
        with pytest.warns(
            match=f'Value "{os.environ[option_env_var]}" is not in allowed'
        ):
            check_and_get_environ_var(
                option_env_var, "foo", ["foo", "bar"], lambda x: hash(x)
            )
        assert hash("Not foo or bar") == check_and_get_environ_var(
            option_env_var, "foo", cast=lambda x: hash(x)
        )


def test_check_and_get_bool(monkeypatch):
    with monkeypatch.context() as mp:
        option_env_var = "ANNDATA_" + option.upper()
        assert not check_and_get_bool(option, default_val)
        mp.setenv(option_env_var, "1")
        assert check_and_get_bool(option, default_val)
        mp.setenv(option_env_var, "Not 0 or 1")
        with pytest.warns(
            match=f'Value "{os.environ[option_env_var]}" is not in allowed'
        ):
            check_and_get_bool(option, default_val)


def test_check_and_get_bool_enum(monkeypatch):
    with monkeypatch.context() as mp:
        option_env_var = "ANNDATA_" + option.upper()
        mp.setenv(option_env_var, "b")

        class TestEnum(Enum):
            a = False
            b = True

        assert check_and_get_environ_var(option_env_var, "a", cast=TestEnum).value
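
The enum tests above pass the string `"b"` and expect the member `TestEnum.b`, which works because an `Enum` class indexed with `[...]` looks members up by name, while calling the class looks them up by value. A small sketch of that distinction follows; `Mode` and `cast_value` are hypothetical names used only for illustration.

```python
from enum import Enum


class Mode(Enum):
    a = False
    b = True


def cast_value(raw, cast):
    """Index Enum classes by member name; call anything else on the raw string."""
    if isinstance(cast, type) and issubclass(cast, Enum):
        return cast[raw]
    return cast(raw)


by_name = cast_value("b", Mode)  # name lookup: Mode["b"]
by_call = cast_value("1", lambda x: bool(int(x)))  # plain callable

# Calling the Enum class looks up by value, so the string "b" is rejected.
try:
    Mode("b")
    value_lookup_failed = False
except ValueError:
    value_lookup_failed = True
```

This is why an environment variable holding a member name can be mapped to an enum without an extra translation table.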
1 change: 1 addition & 0 deletions docs/release-notes/0.11.0.md
@@ -3,6 +3,7 @@
```{rubric} Features
```
* Add `settings` object with methods for altering internally-used options, like checking for uniqueness on `obs`' index {pr}`1270` {user}`ilan-gold`
* Add `remove_unused_categories` option to `anndata.settings` to override current behavior. Default is `True` (i.e., previous behavior). Please refer to the [documentation](https://anndata.readthedocs.io/en/latest/generated/anndata.settings.html) for usage. {pr}`1340` {user}`ilan-gold`

```{rubric} Bugfix
```