Commit
Merge branch 'ci/testing-env' of https://github.com/Zeroto521/pyjanitor into ci/testing-env
Zeroto521 committed Nov 7, 2022
2 parents f6bf739 + 8079eca commit 0249ce8
Showing 23 changed files with 1,283 additions and 883 deletions.
44 changes: 0 additions & 44 deletions .github/workflows/codecov.yml

This file was deleted.

8 changes: 7 additions & 1 deletion .github/workflows/tests.yml
@@ -1,6 +1,12 @@
 name: tests
 
-on: [pull_request]
+on:
+  push:
+    branches:
+      - dev
+  pull_request:
+    branches:
+      - dev
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
11 changes: 7 additions & 4 deletions CHANGELOG.md
@@ -6,21 +6,20 @@
- [DOC] Updated developer guide docs.
- [ENH] Allow column selection/renaming within `conditional_join`. Issue #1102. Also allow first or last match. Issue #1020 @samukweku
- [ENH] New decorator `deprecated_kwargs` for breaking API. #1103 @Zeroto521
- [ENH] Extend select_columns to support non-string columns. Also allow selection on MultiIndex columns via level parameter. Issue #1105 @samukweku
- [ENH] Extend select_columns to support non-string columns. Issue #1105 @samukweku
- [ENH] Performance improvement for groupby_topk. Issue #1093 @samukweku
- [ENH] `min_max_scale` drops `old_min` and `old_max` to fit sklearn's method API. Issue #1068 @Zeroto521
- [ENH] Add a `jointly` option to `min_max_scale`, supporting scaling each column separately or all values jointly. The default scales each column, matching the behavior of `sklearn.preprocessing.MinMaxScaler`. (Issue #1067, PR #1112, PR #1123) @Zeroto521
- [INF] Require a minimum pyspark version of v3.2.0 to cut duplicated code. Issue #1110 @Zeroto521
- [ENH] Added support for extension arrays in `expand_grid`. Issue #1121 @samukweku
- [ENH] Add support for extension arrays in `expand_grid`. Issue #1121 @samukweku
- [ENH] Add `names_expand` and `index_expand` parameters to `pivot_wider` for exposing missing categoricals. Issue #1108 @samukweku
- [ENH] Fix a slicing error when selecting columns in `pivot_wider`. Issue #1134 @samukweku
- [ENH] `dropna` parameter added to `pivot_longer`. Issue #1132 @samukweku
- [INF] Update the `mkdocstrings` version and adapt to its upcoming features. PR #1138 @Zeroto521
- [BUG] Force `math.softmax` to return a `Series`. PR #1139 @Zeroto521
- [INF] Set up an independent environment for building documentation. PR #1141 @Zeroto521
- [DOC] Add a local documentation preview via a GitHub Actions artifact. PR #1149 @Zeroto521
- [ENH] Enable `encode_categorical` to handle arrays of 2 (or more) dimensions. PR #1153 @Zeroto521
- [ENH] Faster computation for a single non-equi join, with a numba engine. Issue #1102 @samukweku
- [TST] Fix test cases failing on Windows. Issue #1160 @Zeroto521 and @samukweku
- [INF] Cancel outdated workflow runs via the GitHub Actions `concurrency` setting. PR #1161 @Zeroto521
- [ENH] Faster computation for non-equi join, with a numba engine. Speed improvement for left/right joins when `sort_by_appearance` is False. Issue #1102 @samukweku
@@ -29,8 +28,12 @@
- [ENH] Fix error when `sort_by_appearance=True` is combined with `dropna=True`. Issue #1168 @samukweku
- [ENH] Add explicit default parameter to `case_when` function. Issue #1159 @samukweku
- [BUG] pandas 1.5.x `_MergeOperation` doesn't have `copy` keyword anymore. Issue #1174 @Zeroto521
- [ENH] `select_rows` function added for flexible row selection. Add support for MultiIndex selection via dictionary. Issue #1124 @samukweku
- [TST] Ensure compatibility with macOS and Windows, to fix `FailedHealthCheck`. Issue #1181 @Zeroto521
- [INF] Merge the two docs CI workflows (`docs-preview.yml` and `docs.yml`) into one, and add a `documentation` pytest mark. PR #1183 @Zeroto521
- [INF] Merge `codecov.yml` (which ran only on pushes to the dev branch) into `tests.yml` (which ran only on PR events). PR #1185 @Zeroto521
- [TST] Fix failure for test/timeseries/test_fill_missing_timestamp. Issue #1184 @samukweku
- [BUG] Import `DataDescription` to fix: `AttributeError: 'DataFrame' object has no attribute 'data_description'`. PR #1191 @Zeroto521
- [INF] Set up a series of complete testing environments. Issue #1127 @Zeroto521
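
Several of the entries above concern API-compatibility shims, such as the `deprecated_kwargs` decorator (#1103) used while `min_max_scale` dropped `old_min`/`old_max`. As a rough, hypothetical sketch of how such a decorator can work (names and behavior here are illustrative, not pyjanitor's actual implementation):

```python
import functools
import warnings


def deprecated_kwargs_sketch(*names):
    """Illustrative decorator: warn when a deprecated keyword is passed.

    Hypothetical stand-in for pyjanitor's `deprecated_kwargs`; the real
    decorator's signature and behavior may differ.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for name in names:
                if name in kwargs:
                    warnings.warn(
                        f"{name} is deprecated.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated_kwargs_sketch("old_min", "old_max")
def min_max_scale_sketch(values, feature_range=(0, 1), **kwargs):
    """Rescale a list of numbers to feature_range (column-wise analogue)."""
    lo, hi = min(values), max(values)
    a, b = feature_range
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]
```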

## [v0.23.1] - 2022-05-03
2 changes: 1 addition & 1 deletion examples/notebooks/select_columns.ipynb
@@ -433,7 +433,7 @@
     "name": "python",
     "nbconvert_exporter": "python",
     "pygments_lexer": "ipython3",
-    "version": "3.9.10"
+    "version": "3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21) \n[GCC 10.3.0]"
    },
    "orig_nbformat": 4
   },
18 changes: 2 additions & 16 deletions janitor/accessors/__init__.py
@@ -1,17 +1,3 @@
-"""Miscellaneous mathematical operators.
+"""Miscellaneous mathematical operators."""
 
-Lazy loading used here to speed up imports.
-"""
-
-import warnings
-from typing import Tuple
-
-
-import lazy_loader as lazy
-
-scipy_special = lazy.load("scipy.special")
-ss = lazy.load("scipy.stats")
-pf = lazy.load("pandas_flavor")
-pd = lazy.load("pandas")
-np = lazy.load("numpy")
-pdtypes = lazy.load("pandas.api.types")
+from janitor.accessors.data_description import DataDescription  # noqa: F401
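
The deleted block relied on `lazy_loader` to defer heavy imports until first use. A minimal stdlib-only sketch of the same idea (illustrative only; the real `lazy_loader` package offers more, such as submodule handling):

```python
import importlib


class LazyModule:
    """Defer importing a module until the first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only reached for attributes not set in __init__.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


math_lazy = LazyModule("math")  # nothing imported yet
```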
2 changes: 1 addition & 1 deletion janitor/functions/__init__.py
@@ -64,7 +64,7 @@
 from .reorder_columns import reorder_columns
 from .round_to_fraction import round_to_fraction
 from .row_to_names import row_to_names
-from .select_columns import select_columns
+from .select import select_columns, select_rows
 from .shuffle import shuffle
 from .sort_column_value_order import sort_column_value_order
 from .sort_naturally import sort_naturally
7 changes: 4 additions & 3 deletions janitor/functions/coalesce.py
@@ -4,7 +4,7 @@
 import pandas_flavor as pf
 
 from janitor.utils import check, deprecated_alias
-from janitor.functions.utils import _select_column_names
+from janitor.functions.utils import _select_index
 
 
 @pf.register_dataframe_method
@@ -95,7 +95,8 @@ def coalesce(
             "The number of columns to coalesce should be a minimum of 2."
         )
 
-    column_names = _select_column_names([*column_names], df)
+    indices = _select_index([*column_names], df, axis="columns")
+    column_names = df.columns[indices]
 
     if target_column_name:
         check("target_column_name", target_column_name, [str])
@@ -106,7 +107,7 @@ def coalesce(
     if target_column_name is None:
         target_column_name = column_names[0]
 
-    outcome = df.filter(column_names).bfill(axis="columns").iloc[:, 0]
+    outcome = df.loc(axis=1)[column_names].bfill(axis="columns").iloc[:, 0]
     if outcome.hasnans and (default_value is not None):
        outcome = outcome.fillna(default_value)
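
For context, the updated `coalesce` body above still implements "first non-null value across the selected columns, row by row" (a backfill across columns, then taking the first column). A stdlib-only sketch of that reduction, with hypothetical names:

```python
def coalesce_rows_sketch(rows, default=None):
    """Return the first non-None value in each row, else the default.

    Rough analogue of bfill(axis="columns") followed by iloc[:, 0];
    each row stands in for one DataFrame row across the selected columns.
    """
    return [
        next((value for value in row if value is not None), default)
        for row in rows
    ]
```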

9 changes: 5 additions & 4 deletions janitor/functions/conditional_join.py
@@ -47,7 +47,7 @@ def conditional_join(
     especially if the intervals do not overlap.
 
     Column selection in `df_columns` and `right_columns` is possible using the
-    [`select_columns`][janitor.functions.select_columns.select_columns] syntax.
+    [`select_columns`][janitor.functions.select.select_columns] syntax.
 
     For strictly non-equi joins,
     involving either `>`, `<`, `>=`, `<=` operators,
@@ -143,7 +143,7 @@ def conditional_join(
     :param keep: Choose whether to return the first match,
         last match or all matches. Default is `all`.
     :param use_numba: Use numba, if installed, to accelerate the computation.
-        Default is `False`.
+        Applicable only to strictly non-equi joins. Default is `False`.
     :returns: A pandas DataFrame of the two merged Pandas objects.
     """
@@ -1214,10 +1214,11 @@ def _cond_join_select_columns(columns: Any, df: pd.DataFrame):
     Returns a Pandas DataFrame.
     """
 
-    df = df.select_columns(columns)
-
     if isinstance(columns, dict):
+        df = df.select_columns([*columns])
         df.columns = [columns.get(name, name) for name in df]
+    else:
+        df = df.select_columns(columns)
 
     return df
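
The reordered branch above first selects the dictionary's keys and only then renames via its values. A stdlib-only sketch of that select-then-rename step (a plain list of names stands in for `df.columns`; the helper name is hypothetical, and ordering subtleties of `select_columns` are ignored):

```python
def select_then_rename_sketch(columns, selector):
    """If selector is a dict, select its keys, then rename via its values;
    otherwise select the listed names unchanged."""
    if isinstance(selector, dict):
        selected = [name for name in columns if name in selector]
        return [selector.get(name, name) for name in selected]
    wanted = set(selector)
    return [name for name in columns if name in wanted]
```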

130 changes: 107 additions & 23 deletions janitor/functions/pivot.py
@@ -15,7 +15,7 @@
 from pandas.core.dtypes.concat import concat_compat
 
 from janitor.functions.utils import (
-    _select_column_names,
+    _select_index,
     _computations_expand_grid,
 )
 from janitor.utils import check
@@ -52,7 +52,7 @@ def pivot_longer(
     row axis.
 
     Column selection in `index` and `column_names` is possible using the
-    [`select_columns`][janitor.functions.select_columns.select_columns] syntax.
+    [`select_columns`][janitor.functions.select.select_columns] syntax.
 
     Example:
@@ -382,17 +382,35 @@ def _data_checks_pivot_longer(
             "when the columns are a MultiIndex."
         )
 
+    is_multi_index = isinstance(df.columns, pd.MultiIndex)
+    indices = None
     if column_names is not None:
-        if is_list_like(column_names):
-            column_names = list(column_names)
-        column_names = _select_column_names(column_names, df)
-        column_names = list(column_names)
+        if is_multi_index:
+            column_names = _check_tuples_multiindex(
+                df.columns, column_names, "column_names"
+            )
+        else:
+            if is_list_like(column_names):
+                column_names = list(column_names)
+            indices = _select_index(column_names, df, axis="columns")
+            column_names = df.columns[indices]
+            if not is_list_like(column_names):
+                column_names = [column_names]
+            else:
+                column_names = list(column_names)
 
     if index is not None:
-        if is_list_like(index):
-            index = list(index)
-        index = _select_column_names(index, df)
-        index = list(index)
+        if is_multi_index:
+            index = _check_tuples_multiindex(df.columns, index, "index")
+        else:
+            if is_list_like(index):
+                index = list(index)
+            indices = _select_index(index, df, axis="columns")
+            index = df.columns[indices]
+            if not is_list_like(index):
+                index = [index]
+            else:
+                index = list(index)
 
     if index is None:
         if column_names is None:
@@ -1181,7 +1199,7 @@ def pivot_wider(
     Column selection in `index`, `names_from` and `values_from`
     is possible using the
-    [`select_columns`][janitor.functions.select_columns.select_columns] syntax.
+    [`select_columns`][janitor.functions.select.select_columns] syntax.
 
     A ValueError is raised if the combination
     of the `index` and `names_from` is not unique.
@@ -1455,27 +1473,69 @@ def _data_checks_pivot_wider(
     checking happens.
     """
 
+    is_multi_index = isinstance(df.columns, pd.MultiIndex)
+    indices = None
     if index is not None:
-        if is_list_like(index):
-            index = list(index)
-        index = _select_column_names(index, df)
-        index = list(index)
+        if is_multi_index:
+            if not isinstance(index, list):
+                raise TypeError(
+                    "For a MultiIndex column, pass a list of tuples "
+                    "to the index argument."
+                )
+            index = _check_tuples_multiindex(df.columns, index, "index")
+        else:
+            if is_list_like(index):
+                index = list(index)
+            indices = _select_index(index, df, axis="columns")
+            index = df.columns[indices]
+            if not is_list_like(index):
+                index = [index]
+            else:
+                index = list(index)
 
     if names_from is None:
         raise ValueError(
             "pivot_wider() is missing 1 required argument: 'names_from'"
         )
 
-    if is_list_like(names_from):
-        names_from = list(names_from)
-    names_from = _select_column_names(names_from, df)
-    names_from = list(names_from)
+    if is_multi_index:
+        if not isinstance(names_from, list):
+            raise TypeError(
+                "For a MultiIndex column, pass a list of tuples "
+                "to the names_from argument."
+            )
+        names_from = _check_tuples_multiindex(
+            df.columns, names_from, "names_from"
+        )
+    else:
+        if is_list_like(names_from):
+            names_from = list(names_from)
+        indices = _select_index(names_from, df, axis="columns")
+        names_from = df.columns[indices]
+        if not is_list_like(names_from):
+            names_from = [names_from]
+        else:
+            names_from = list(names_from)
 
     if values_from is not None:
-        if is_list_like(values_from):
-            values_from = list(values_from)
-        out = _select_column_names(values_from, df)
-        out = list(out)
+        if is_multi_index:
+            if not isinstance(values_from, list):
+                raise TypeError(
+                    "For a MultiIndex column, pass a list of tuples "
+                    "to the values_from argument."
+                )
+            out = _check_tuples_multiindex(
+                df.columns, values_from, "values_from"
+            )
+        else:
+            if is_list_like(values_from):
+                values_from = list(values_from)
+            indices = _select_index(values_from, df, axis="columns")
+            out = df.columns[indices]
+            if not is_list_like(out):
+                out = [out]
+            else:
+                out = list(out)
         # hack to align with pd.pivot
         if values_from == out[0]:
             values_from = out[0]
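
The repeated `if not is_list_like(...)` branches above normalize a scalar selection into a list. A rough stdlib analogue of that normalization (treating strings as scalars; this only approximates `pandas.api.types.is_list_like`, which also special-cases other types):

```python
def ensure_list_sketch(obj):
    """Wrap a scalar in a list; copy any other iterable into a list."""
    if isinstance(obj, str) or not hasattr(obj, "__iter__"):
        return [obj]
    return list(obj)
```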
@@ -1550,3 +1610,27 @@ def _expand(indexer, retain_categories):
             ordered=indexer.ordered,
         )
     return indexer
+
+
+def _check_tuples_multiindex(indexer, args, param):
+    """
+    Check entries for tuples,
+    if indexer is a MultiIndex.
+    Returns a list of tuples.
+    """
+    all_tuples = (isinstance(arg, tuple) for arg in args)
+    if not all(all_tuples):
+        raise TypeError(
+            f"{param} must be a list of tuples "
+            "when the columns are a MultiIndex."
+        )
+
+    not_found = set(args).difference(indexer)
+    if any(not_found):
+        raise KeyError(
+            f"Tuples {*not_found,} in the {param} "
+            "argument do not exist in the dataframe's columns."
+        )
+
+    return args
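
The validation in `_check_tuples_multiindex` can be exercised with plain tuples standing in for MultiIndex entries. A condensed stdlib sketch of the same checks (hypothetical names; the real function operates on a `pd.MultiIndex`):

```python
def check_tuples_sketch(column_tuples, args, param):
    """Condensed version of the validation above, for plain lists of tuples."""
    if not all(isinstance(arg, tuple) for arg in args):
        raise TypeError(
            f"{param} must be a list of tuples "
            "when the columns are a MultiIndex."
        )
    not_found = set(args).difference(column_tuples)
    if not_found:
        raise KeyError(
            f"Tuples {sorted(not_found)} in the {param} "
            "argument do not exist in the dataframe's columns."
        )
    return args


cols = [("a", "x"), ("a", "y"), ("b", "x")]
ok = check_tuples_sketch(cols, [("a", "x"), ("b", "x")], "index")
```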