[ENH] nth function #3346

Closed
wants to merge 83 commits into from
83 commits
6b80fb7
skeleton for nth integers
Sep 2, 2022
9359413
seketon logic for nth, using rowindex
Sep 2, 2022
08d04ac
add tests
Sep 2, 2022
784c3f7
add method and tests for method
Sep 2, 2022
41ac89d
update
Sep 3, 2022
081c462
add newline
Sep 3, 2022
b6a4f79
add newline
Sep 3, 2022
0923a80
use VirtualColumn implementation instead of RowIndex
Sep 15, 2022
135b4c6
FExpr/Expr adjustments in the docs (#3347)
oleksiyskononenko Sep 9, 2022
a4d84f3
Minor docs adjustments to `cumcount()` and `ngroup()` (#3349)
oleksiyskononenko Sep 9, 2022
382528e
Fix cumulative functions when invoked on the latent columns (#3348)
oleksiyskononenko Sep 9, 2022
7a0e192
Fix status badge in documentation (#3351)
oleksiyskononenko Sep 14, 2022
5118a0d
skeleton for skipna logic
Sep 15, 2022
36ec769
skipna added
Sep 15, 2022
e4c2c91
Merge branch 'main' into samukweku/nth
oleksiyskononenko Sep 19, 2022
8ac6a99
[ENH] Column aliasing (#3333)
samukweku Sep 20, 2022
565ba32
add nth method
Sep 20, 2022
e71103f
add nth method
Sep 20, 2022
9baaa9e
move SKIPNA to single location
Sep 21, 2022
9b14189
Merge branch 'main' into samukweku/nth
oleksiyskononenko Sep 21, 2022
6ab25d5
fix tests for tests_f.py::test_nth
Sep 22, 2022
895fa14
add whitespace to test-nth.py
Sep 22, 2022
092bdb6
add tests for strings for nth
Sep 22, 2022
99e0550
remove irrelevant comments
Sep 22, 2022
251ab0e
update method args for nth
Sep 22, 2022
9a61331
update parameter from nth to n
Sep 22, 2022
160629f
keep column name
Sep 22, 2022
6efa708
updates based on feedback
Sep 24, 2022
fba70de
whitespace between brackets
Sep 24, 2022
08339b3
name update
Sep 24, 2022
aad773f
gby padding
Sep 25, 2022
cd0a37e
updates based on feedback
Sep 25, 2022
be522c4
fix whitespace
Sep 25, 2022
3550669
Re-factoring and cosmetics
oleksiyskononenko Sep 27, 2022
5a37997
skeleton for dropna logic
Sep 29, 2022
b1b95f0
skeleton
Sep 29, 2022
0d56e96
rowall impl flaw
Sep 30, 2022
1f140b9
skipna is any/all/none
samukweku Oct 23, 2022
01b9f48
no idea how to use rowall/rowany;
samukweku Oct 23, 2022
6a3ab3c
no idea
samukweku Oct 23, 2022
6e7f983
remove skipna argument - keep it simple
samukweku Jan 2, 2023
fef7ab9
add first/last to init
samukweku Jan 3, 2023
47d5f1c
add whitespace to test-nth.py
Sep 22, 2022
392cb46
updates based on feedback
Sep 24, 2022
d7459a1
whitespace between brackets
Sep 24, 2022
6ca609c
gby padding
Sep 25, 2022
3565c1e
updates based on feedback
Sep 25, 2022
fa0783c
Re-factoring and cosmetics
oleksiyskononenko Sep 27, 2022
dbce97b
skeleton for dropna logic
Sep 29, 2022
d3a27f8
skeleton
Sep 29, 2022
28a2b90
rowall impl flaw
Sep 30, 2022
d1bdebe
no idea how to use rowall/rowany;
samukweku Oct 23, 2022
c4ca0e2
remove skipna argument - keep it simple
samukweku Jan 2, 2023
d6435a8
implement first/last
samukweku Jan 2, 2023
ee5a854
implement first/last - rebase
samukweku Jan 2, 2023
78e615e
implement first/last - rebase
samukweku Jan 2, 2023
c21ca74
Build and test Python 3.7 wheels on Windows (#3357)
oleksiyskononenko Sep 25, 2022
4f6a02a
Adjustments to fread documenation (#3361)
oleksiyskononenko Sep 26, 2022
6eb48a2
Fix casting void columns to categoricals (#3362)
oleksiyskononenko Sep 27, 2022
bae12ea
Improve header detection heuristics in `fread()` (#3364)
oleksiyskononenko Oct 1, 2022
93c2244
Implement casting of the most column types to categoricals (#3365)
oleksiyskononenko Oct 4, 2022
fb0df58
Use col/cols and convert shift docs to standard format (#3366)
oleksiyskononenko Oct 7, 2022
098c410
Implement `dt.categories()` (#3367)
oleksiyskononenko Oct 11, 2022
fd310d0
Fix "See also" sections for `cat*` types and `cbind()`/`rbind()` docs…
oleksiyskononenko Oct 11, 2022
a836fd6
Add basic support for `Grouping::GtoFEW` (#3370)
oleksiyskononenko Oct 15, 2022
ae6e9c6
Implement `dt.codes()` (#3371)
oleksiyskononenko Oct 18, 2022
690e094
Implement casts from categorical types, add `to_csv()` support for ca…
oleksiyskononenko Oct 21, 2022
ad728aa
Adjust copyright years in `types/type_*.cc`
oleksiyskononenko Oct 21, 2022
a1dd57c
Implement statistics for categorical columns (#3373)
oleksiyskononenko Oct 26, 2022
20e3da1
[ENH] Enhance `dt.fillna()` to support filling with a particular valu…
samukweku Oct 26, 2022
c11a122
Update documentation regarding removal of python 3.6 (#3377)
oleksiyskononenko Oct 28, 2022
15d225f
[ENH] Add `reverse` parameter to `cumsum()`, `cumprod()`, `cummin()` …
samukweku Nov 20, 2022
020961e
Add parameter `na_position` to `dt.sort()` documentation (#3389)
oleksiyskononenko Nov 29, 2022
ae2045a
Fix groupby behaviour on columns with missing values (#3391)
oleksiyskononenko Nov 29, 2022
dd264de
Add support for Python `3.11` (#3387)
oleksiyskononenko Dec 1, 2022
dbad9c5
Refactor `sum()` and `prod()` reducers to use `FExpr` (#3388)
samukweku Dec 12, 2022
f220005
Enable macOS on AppVeyor for py311 (#3395)
oleksiyskononenko Dec 12, 2022
997bd88
Add `dt.prod()` test for a grouped column, `by()` docs adjustment (#3…
samukweku Dec 14, 2022
56317c6
Refactor `.extend()` and `.remove()` to use `FExpr` (#3393)
samukweku Dec 15, 2022
da2c467
add method and tests for method
Sep 2, 2022
e3f5fbd
FExpr/Expr adjustments in the docs (#3347)
oleksiyskononenko Sep 9, 2022
33e66b9
Fix cumulative functions when invoked on the latent columns (#3348)
oleksiyskononenko Sep 9, 2022
0a6722a
[ENH] Column aliasing (#3333)
samukweku Sep 20, 2022
39 changes: 38 additions & 1 deletion ci/Jenkinsfile.groovy
@@ -1,6 +1,6 @@
#!/usr/bin/groovy
//------------------------------------------------------------------------------
// Copyright 2018-2020 H2O.ai
// Copyright 2018-2022 H2O.ai
//
// Permission is hereby granted, free of charge, to any person obtaining a
// copy of this software and associated documentation files (the "Software"),
@@ -256,12 +256,14 @@ ansiColor('xterm') {
/opt/python/cp38-cp38/bin/python3.8 ci/ext.py wheel --audit && \
/opt/python/cp39-cp39/bin/python3.9 ci/ext.py wheel --audit && \
/opt/python/cp310-cp310/bin/python3.10 ci/ext.py wheel --audit && \
/opt/python/cp311-cp311/bin/python3.11 ci/ext.py wheel --audit && \
echo '===== Py3.8 Debug =====' && unzip -p dist/*debug*.whl datatable/_build_info.py && \
mv dist/*debug*.whl . && \
echo '===== Py3.7 =====' && unzip -p dist/*cp37*.whl datatable/_build_info.py && \
echo '===== Py3.8 =====' && unzip -p dist/*cp38*.whl datatable/_build_info.py && \
echo '===== Py3.9 =====' && unzip -p dist/*cp39*.whl datatable/_build_info.py && \
echo '===== Py3.10 =====' && unzip -p dist/*cp310*.whl datatable/_build_info.py && \
echo '===== Py3.11 =====' && unzip -p dist/*cp311*.whl datatable/_build_info.py && \
mv *debug*.whl dist/ && \
ls -la dist"
"""
@@ -296,10 +298,14 @@ ansiColor('xterm') {
source /Users/jenkins/datatable_envs/py310/bin/activate
python ci/ext.py wheel
deactivate
source /Users/jenkins/datatable_envs/py311/bin/activate
python ci/ext.py wheel
deactivate
echo '===== Py3.7 =====' && unzip -p dist/*cp37*.whl datatable/_build_info.py
echo '===== Py3.8 =====' && unzip -p dist/*cp38*.whl datatable/_build_info.py
echo '===== Py3.9 =====' && unzip -p dist/*cp39*.whl datatable/_build_info.py
echo '===== Py3.10 =====' && unzip -p dist/*cp310*.whl datatable/_build_info.py
echo '===== Py3.11 =====' && unzip -p dist/*cp311*.whl datatable/_build_info.py
ls dist
"""
stash name: 'x86_64-macos-wheels', includes: "dist/*.whl"
@@ -340,12 +346,14 @@ ansiColor('xterm') {
/opt/python/cp38-cp38/bin/python3.8 ci/ext.py wheel --audit && \
/opt/python/cp39-cp39/bin/python3.9 ci/ext.py wheel --audit && \
/opt/python/cp310-cp310/bin/python3.10 ci/ext.py wheel --audit && \
/opt/python/cp311-cp311/bin/python3.11 ci/ext.py wheel --audit && \
echo '===== Py3.8 Debug =====' && unzip -p dist/*debug*.whl datatable/_build_info.py && \
mv dist/*debug*.whl . && \
echo '===== Py3.7 =====' && unzip -p dist/*cp37*.whl datatable/_build_info.py && \
echo '===== Py3.8 =====' && unzip -p dist/*cp38*.whl datatable/_build_info.py && \
echo '===== Py3.9 =====' && unzip -p dist/*cp39*.whl datatable/_build_info.py && \
echo '===== Py3.10 =====' && unzip -p dist/*cp310*.whl datatable/_build_info.py && \
echo '===== Py3.11 =====' && unzip -p dist/*cp311*.whl datatable/_build_info.py && \
mv *debug*.whl dist/ && \
ls -la dist"
"""
@@ -435,6 +443,20 @@ ansiColor('xterm') {
}
}
}) <<
namedStage('Test x86_64-manylinux-py311', { stageName, stageDir ->
node(NODE_LINUX) {
buildSummary.stageWithSummary(stageName, stageDir) {
cleanWs()
dumpInfo()
dir(stageDir) {
unstash 'datatable-sources'
unstash 'x86_64-manylinux-wheels'
test_in_docker("x86_64-manylinux-py311", "311",
DOCKER_IMAGE_X86_64_MANYLINUX)
}
}
}
}) <<
namedStage('Test ppc64le-manylinux-py37', doPpcTests, { stageName, stageDir ->
node(NODE_PPC) {
buildSummary.stageWithSummary(stageName, stageDir) {
@@ -505,6 +527,20 @@ ansiColor('xterm') {
}
}
}) <<
namedStage('Test ppc64le-manylinux-py311', doPpcTests && doPy38Tests, { stageName, stageDir ->
node(NODE_PPC) {
buildSummary.stageWithSummary(stageName, stageDir) {
cleanWs()
dumpInfo()
dir(stageDir) {
unstash 'datatable-sources'
unstash 'ppc64le-manylinux-wheels'
test_in_docker("ppc64le-manylinux-py311", "311",
DOCKER_IMAGE_PPC64LE_MANYLINUX)
}
}
}
}) <<
namedStage('Test x86_64-macos-py37', { stageName, stageDir ->
node(NODE_MACOS) {
buildSummary.stageWithSummary(stageName, stageDir) {
@@ -792,6 +828,7 @@ def get_python_for_docker(String pyver, String image) {
if (pyver == "38") return "/opt/python/cp38-cp38/bin/python3.8"
if (pyver == "39") return "/opt/python/cp39-cp39/bin/python3.9"
if (pyver == "310") return "/opt/python/cp310-cp310/bin/python3.10"
if (pyver == "311") return "/opt/python/cp311-cp311/bin/python3.11"
}
throw new Exception("Unknown python ${pyver} for docker ${image}")
}
101 changes: 65 additions & 36 deletions ci/appveyor.yml
@@ -1,7 +1,7 @@
image:
- Visual Studio 2019
- macos-monterey
- Ubuntu
- macos-monterey

init:
# Uncomment the line below to enable RDP access on AppVeyor
@@ -82,18 +82,18 @@ build_script:


# =======================================================================
# Build and test wheel for Python 3.10
# Build and test wheel for Python 3.11
# =======================================================================

source $HOME/venv3.10/bin/activate
source $HOME/venv3.11/bin/activate

python -V

python ci/ext.py wheel

DT_WHEEL=`ls dist/*-cp310-*.whl`
DT_WHEEL=`ls dist/*-cp311-*.whl`

echo "----- _build_info.py for Python 3.10 ------------------------------"
echo "----- _build_info.py for Python 3.11 ------------------------------"

cat src/datatable/_build_info.py

@@ -183,42 +183,37 @@ build_script:



if ($env:DT_RELEASE -eq "True") {


# =======================================================================
# Build and test wheel for Python 3.7
# =======================================================================

$env:PATH = "C:/Python37-x64;C:/Python37-x64/Scripts;$DEFAULT_PATH"
# =======================================================================
# Build and test wheel for Python 3.7
# =======================================================================

python -V
$env:PATH = "C:/Python37-x64;C:/Python37-x64/Scripts;$DEFAULT_PATH"

python ci/ext.py wheel
python -V

$DT_WHEEL = ls dist/*-cp37-*.whl
python ci/ext.py wheel

echo "DT_WHEEL = $DT_WHEEL"
$DT_WHEEL = ls dist/*-cp37-*.whl

echo "----- _build_info.py for Python 3.7 ------------------------------"
echo "DT_WHEEL = $DT_WHEEL"

cat src/datatable/_build_info.py
echo "----- _build_info.py for Python 3.7 ------------------------------"

echo "------------------------------------------------------------------"
cat src/datatable/_build_info.py

python -m pip install --upgrade pip
echo "------------------------------------------------------------------"

python -m pip install $DT_WHEEL
python -m pip install --upgrade pip

python -m pip install pytest docutils pandas pyarrow
python -m pip install $DT_WHEEL

python -m pytest -ra --maxfail=10 -Werror -vv --showlocals ./tests/
python -m pip install pytest docutils pandas pyarrow

if(!$?) { Exit $LASTEXITCODE }
python -m pytest -ra --maxfail=10 -Werror -vv --showlocals ./tests/

python -m pip uninstall -y $DT_WHEEL
if(!$?) { Exit $LASTEXITCODE }

}
python -m pip uninstall -y $DT_WHEEL



@@ -257,20 +252,20 @@ build_script:


# =======================================================================
# Build and test wheel for Python 3.9
# Build and test debug wheel for Python 3.8
# =======================================================================

$env:PATH = "C:/Python39-x64;C:/Python39-x64/Scripts;$DEFAULT_PATH"
$env:PATH = "C:/Python38-x64;C:/Python38-x64/Scripts;$DEFAULT_PATH"

python -V

python ci/ext.py wheel
python ci/ext.py debugwheel

$DT_WHEEL = ls dist/*-cp39-*.whl
$DT_WHEEL = ls dist/*debug-cp38-*.whl

echo "DT_WHEEL = $DT_WHEEL"

echo "----- _build_info.py for Python 3.9 ------------------------------"
echo "----- _build_info.py for Python 3.8 debug wheel ------------------"

cat src/datatable/_build_info.py

@@ -291,20 +286,20 @@ build_script:


# =======================================================================
# Build and test debug wheel for Python 3.9
# Build and test wheel for Python 3.9
# =======================================================================

$env:PATH = "C:/Python39-x64;C:/Python39-x64/Scripts;$DEFAULT_PATH"

python -V

python ci/ext.py debugwheel
python ci/ext.py wheel

$DT_WHEEL = ls dist/*debug-cp39-*.whl
$DT_WHEEL = ls dist/*-cp39-*.whl

echo "DT_WHEEL = $DT_WHEEL"

echo "----- _build_info.py for Python 3.9 debug wheel ------------------"
echo "----- _build_info.py for Python 3.9 ------------------------------"

cat src/datatable/_build_info.py

@@ -356,3 +351,37 @@ build_script:

python -m pip uninstall -y $DT_WHEEL



# =======================================================================
# Build and test wheel for Python 3.11
# =======================================================================

$env:PATH = "C:/Python311-x64;C:/Python311-x64/Scripts;$DEFAULT_PATH"

python -V

python ci/ext.py wheel

$DT_WHEEL = ls dist/*-cp311-*.whl

echo "DT_WHEEL = $DT_WHEEL"

echo "----- _build_info.py for Python 3.11 ------------------------------"

cat src/datatable/_build_info.py

echo "------------------------------------------------------------------"

python -m pip install --upgrade pip

python -m pip install $DT_WHEEL

python -m pip install pytest docutils pandas pyarrow

python -m pytest -ra --maxfail=10 -Werror -vv --showlocals ./tests/

if(!$?) { Exit $LASTEXITCODE }

python -m pip uninstall -y $DT_WHEEL

1 change: 1 addition & 0 deletions ci/ext.py
@@ -337,6 +337,7 @@ def build_extension(cmd, verbosity=3):
"-Weverything",
"-Wno-c++98-compat-pedantic",
"-Wno-c99-extensions",
"-Wno-disabled-macro-expansion",
"-Wno-exit-time-destructors",
"-Wno-float-equal",
"-Wno-global-constructors",
11 changes: 6 additions & 5 deletions docs/api/dt/by.rst
@@ -26,11 +26,12 @@
will select those rows where column A reaches its peak value within
each group (there could be multiple such rows within each group).

- Before ``j`` is evaluated, the ``by()`` clause adds all its columns
at the start of ``j`` (unless ``add_columns`` argument is ``False``). If
``j`` is a "select-all" slice (i.e. ``:``), then those columns will
also be excluded from the list of all columns so that they will be
present in the output only once.
- Before ``j`` is evaluated, the ``by()`` clause adds all the groupby
columns at the start of ``j`` (unless ``add_columns`` argument is
``False``). If ``j`` is a "select-all" slice (i.e. ``:`` or
``f[:]``), then the groupby columns will be excluded
from the list of all columns, so that they will be present in the output
only once.

- During evaluation of ``j``, the reducer functions, such as
:func:`min`, :func:`sum`, etc, will be evaluated by-group, that is
23 changes: 23 additions & 0 deletions docs/api/dt/categories.rst
@@ -0,0 +1,23 @@

.. xfunction:: datatable.categories
:src: src/core/expr/fexpr_categories.cc pyfn_categories
:tests: tests/types/test-categorical.py
:cvar: doc_dt_categories
:signature: categories(cols)

.. x-version-added:: 1.1.0

Get categories for categorical data.

Parameters
----------
cols: FExpr
Input categorical data.

return: FExpr
f-expression that returns categories for each column
from `cols`.

except: TypeError
The exception is raised when one of the columns from `cols`
has a non-categorical type.
12 changes: 11 additions & 1 deletion docs/api/dt/cbind.rst
@@ -13,16 +13,26 @@
Parameters
----------
frames: Frame | List[Frame] | None
The list/tuple/sequence/generator expression of Frames to append.
It may also contain `None` values, which will be simply
skipped.

force: bool
If `True`, allows Frames to be appended even if they have unequal
number of rows. The resulting Frame will have number of rows equal
to the largest among all Frames. Those Frames which have less
than the largest number of rows, will be padded with NAs (with the
exception of Frames having just 1 row, which will be replicated
instead of filling with NAs).

return: Frame
A new frame that is created by appending columns from `frames`.
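
The row-count rule described for ``force=True`` can be illustrated with a small plain-Python model. This is only a sketch of the documented behaviour, not datatable's implementation: frames are represented as dicts of lists, ``None`` stands in for NA, and the name ``cbind_force_model`` is hypothetical.

```python
def cbind_force_model(frames):
    """Plain-Python model of the documented ``force=True`` row-count rule.

    Each frame is a non-empty dict mapping column name -> list of values.
    This is an illustrative stand-in, not datatable's Frame type, and
    column names are assumed to be distinct across frames.
    """
    frames = [f for f in frames if f is not None]      # None entries are skipped
    nrows = [len(next(iter(f.values()))) for f in frames]
    target = max(nrows, default=0)                     # result uses the largest row count
    result = {}
    for frame, n in zip(frames, nrows):
        for name, values in frame.items():
            if n == 1:
                # Frames with exactly one row are replicated, not NA-padded.
                result[name] = values * target
            else:
                # Shorter frames are padded with NA (modelled here as None).
                result[name] = values + [None] * (target - n)
    return result


a = {"A": [1, 2, 3]}
b = {"B": [10, 20]}
c = {"C": ["x"]}
combined = cbind_force_model([a, None, b, c])
print(combined)
```

In this model the two-row frame gains one NA, while the one-row frame is repeated to fill all three rows.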


See also
--------
- :func:`rbind()` -- function for row-binding several frames.
- :meth:`dt.Frame.cbind()` -- Frame method for cbinding some frames to
- :meth:`dt.Frame.cbind()` -- Frame method for cbinding several frames to
another.


23 changes: 23 additions & 0 deletions docs/api/dt/codes.rst
@@ -0,0 +1,23 @@

.. xfunction:: datatable.codes
:src: src/core/expr/fexpr_codes.cc pyfn_codes
:tests: tests/types/test-categorical.py
:cvar: doc_dt_codes
:signature: codes(cols)

.. x-version-added:: 1.1.0

Get integer codes for categorical data.

Parameters
----------
cols: FExpr
Input categorical data.

return: FExpr
f-expression that returns integer codes for each column
from `cols`.

except: TypeError
The exception is raised when one of the columns from `cols`
has a non-categorical type.
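
As a rough mental model of how integer codes relate to categories, the plain-Python sketch below encodes a list of values. It does not call datatable: the helper ``encode_categorical`` is hypothetical, and both the sorted ordering of categories and the mapping of missing values to ``None`` are assumptions made for illustration, since the text above does not specify them.

```python
def encode_categorical(values):
    """Return (categories, codes) for a list of values.

    Assumptions for illustration only: categories are the distinct
    non-missing values in sorted order, and a missing value (None)
    keeps a missing code.
    """
    categories = sorted({v for v in values if v is not None})
    index = {cat: i for i, cat in enumerate(categories)}
    codes = [index[v] if v is not None else None for v in values]
    return categories, codes


cats, codes = encode_categorical(["b", "a", "b", None, "c"])
print(cats)
print(codes)

# Each non-missing code indexes back into the category list.
decoded = [cats[c] if c is not None else None for c in codes]
print(decoded)
```

The round trip at the end shows the relationship between the two: each code is a position in the category list.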