[ENH] Column aliasing #3333

Merged
merged 11 commits into from
Sep 20, 2022
22 changes: 13 additions & 9 deletions docs/api/fexpr.rst
@@ -4,29 +4,29 @@
:src: src/core/expr/fexpr.h FExpr
:cvar: doc_FExpr

FExpr is an object that encapsulates computations to be done on a frame.
``FExpr`` is a class that encapsulates computations to be done on a frame.

FExpr objects are rarely constructed directly (though it is possible too),
``FExpr`` objects are rarely constructed directly (though it is possible too),
instead they are more commonly created as inputs/outputs from various
functions in :mod:`datatable`.

Consider the following example::

math.sin(2 * f.Angle)

Here accessing column "Angle" in namespace ``f`` creates an ``FExpr``.
Here accessing column "Angle" in namespace ``f`` creates an ``FExpr`` object.
Multiplying this ``FExpr`` by a python scalar ``2`` creates a new ``FExpr``.
And finally, applying the sine function creates yet another ``FExpr``. The
resulting expression can be applied to a frame via the
:meth:`DT[i,j] <dt.Frame.__getitem__>` method, which will compute that expression
:meth:`DT[i, j] <dt.Frame.__getitem__>` method, which will compute that expression
using the data of that particular frame.

Thus, an ``FExpr`` is a stored computation, which can later be applied to a
Frame, or to multiple frames.

Because of its delayed nature, an ``FExpr`` checks its correctness at the time
when it is applied to a frame, not sooner. In particular, it is possible for
the same expression to work with one frame, but fail with another. In the
the same expression to work on one frame, but fail on another. In the
example above, the expression may raise an error if there is no column named
"Angle" in the frame, or if the column exists but has non-numeric type.

@@ -36,7 +36,7 @@

Also, all functions that accept ``FExpr``s as arguments, will also accept
certain other python types as an input, essentially converting them into
``FExpr``s. Thus, we will sometimes say that a function accepts **FExpr-like**
``FExpr``s. Hence, we will sometimes say that a function accepts **FExpr-like**
objects as arguments.

All binary operators ``op(x, y)`` listed below work when either ``x``
@@ -53,11 +53,14 @@
* - :meth:`.__init__(e)`
- Create an ``FExpr``.

* - :meth:`.alias()`
- Assign new names to the columns from the ``FExpr``.

* - :meth:`.extend()`
- Append another FExpr.
- Append another ``FExpr``.

* - :meth:`.remove()`
- Remove columns from the FExpr.
- Remove columns from the ``FExpr``.


Arithmetic operators
@@ -255,7 +258,7 @@
:class: api-table

* - :meth:`.__bool__()`
- Implicitly convert FExpr into a boolean value.
- Implicitly convert ``FExpr`` into a boolean value.

* - :meth:`.__getitem__()`
- Apply slice to a string column.
@@ -298,6 +301,7 @@
.__sub__() <fexpr/__sub__>
.__truediv__() <fexpr/__truediv__>
.__xor__() <fexpr/__xor__>
.alias() <fexpr/alias>
.as_type() <fexpr/as_type>
.count() <fexpr/count>
.countna() <fexpr/countna>
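
The delayed-evaluation behaviour described in the `fexpr.rst` text can be illustrated with a toy model. The sketch below is plain Python and does not use datatable: the `LazyExpr` class, the `col` and `sin` helpers, and the dict-based "frames" are hypothetical stand-ins, chosen only to show why the same stored expression can succeed on one frame and fail on another.

```python
import math


class LazyExpr:
    """Toy stand-in for a stored computation (not datatable's FExpr)."""

    def __init__(self, fn, description):
        self._fn = fn                    # callable: frame (dict) -> list of values
        self.description = description   # human-readable form of the expression

    def __rmul__(self, scalar):
        # `scalar * expr` builds a new expression; nothing is computed yet.
        return LazyExpr(
            lambda frame: [scalar * v for v in self._fn(frame)],
            f"({scalar} * {self.description})",
        )

    def apply(self, frame):
        # Problems such as a missing column surface only at this point.
        return self._fn(frame)


def col(name):
    """Hypothetical column reference, loosely analogous to `f.<name>`."""
    return LazyExpr(lambda frame: frame[name], f"f.{name}")


def sin(expr):
    """Wrap an expression in a sine, again without evaluating anything."""
    return LazyExpr(
        lambda frame: [math.sin(v) for v in expr.apply(frame)],
        f"sin({expr.description})",
    )


expr = sin(2 * col("Angle"))             # building the expression touches no data

frame_ok = {"Angle": [0.0, math.pi / 4]}
frame_bad = {"Radius": [1.0, 2.0]}       # no "Angle" column

result = expr.apply(frame_ok)

try:
    expr.apply(frame_bad)
    failed = False
except KeyError:
    failed = True                        # the error appears only when applied
```

In this model the missing column is reported as a `KeyError`; that choice belongs to the toy and says nothing about which exception datatable raises.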
71 changes: 71 additions & 0 deletions docs/api/fexpr/alias.rst
@@ -0,0 +1,71 @@

.. xmethod:: datatable.FExpr.alias
    :src: src/core/expr/fexpr.cc PyFExpr::alias
    :cvar: doc_FExpr_alias
    :signature: alias(self, *names)

    Assign new names to the columns from the current ``FExpr``.


    Parameters
    ----------
    names: str | List[str] | Tuple[str]
        New names that should be assigned to the columns from
        the current ``FExpr``.

    return: FExpr
        New ``FExpr`` which sets new `names` on the current one.


    Examples
    --------

    Create a frame::

        >>> from datatable import dt, f, by
        >>>
        >>> DT = dt.Frame([[1, 2, 3], ["one", "two", "three"]])
        >>> DT
           |    C0  C1
           | int32  str32
        -- + -----  -----
         0 |     1  one
         1 |     2  two
         2 |     3  three
        [3 rows x 2 columns]


    Assign new names when selecting data from the frame::

        >>> DT[:, f[:].alias("numbers", "strings")]
           | numbers  strings
           |   int32  str32
        -- + -------  -------
         0 |       1  one
         1 |       2  two
         2 |       3  three
        [3 rows x 2 columns]


    Assign a new name to the newly computed column only::

        >>> DT[:, [f[:], (f[0] * f[0]).alias("numbers_squared")]]
           |    C0  C1     numbers_squared
           | int32  str32            int32
        -- + -----  -----  ---------------
         0 |     1  one                  1
         1 |     2  two                  4
         2 |     3  three                9
        [3 rows x 3 columns]


    Assign a new name to the group-by column::

        >>> DT[:, f[1], by(f[0].alias("numbers"))]
           | numbers  C1
           |   int32  str32
        -- + -------  -----
         0 |       1  one
         1 |       2  two
         2 |       3  three
        [3 rows x 2 columns]
5 changes: 4 additions & 1 deletion docs/releases/v1.1.0.rst
@@ -90,7 +90,7 @@
-[new] Added function :func:`dt.cumcount()` and :func:`dt.ngroup()`,
to return the row number and group number respectively. [#3279]

-[enh] Added reducer functions :func:`dt.countna()` and :func:`dt.nunique()`. [#2999]
-[new] Added reducer functions :func:`dt.countna()` and :func:`dt.nunique()`. [#2999]

-[new] Class :class:`dt.FExpr` now has method :meth:`.nunique()`,
which behaves exactly as the equivalent base level function :func:`dt.nunique()`.
@@ -101,6 +101,9 @@
-[new] Added function :func:`dt.fillna()`, as well as :meth:`.fillna()` method,
to impute missing values. [#3279]

-[new] Class :class:`dt.FExpr` now has method :meth:`.alias()`,
to assign new names to the selected columns. [#2684]

-[enh] Function :func:`dt.re.match()` now supports case insensitive matching. [#3216]

-[enh] Function :func:`dt.qcut()` can now be used in a groupby context. [#3165]
1 change: 1 addition & 0 deletions src/core/documentation.h
@@ -282,6 +282,7 @@ extern const char* doc_Frame_types;
extern const char* doc_Frame_view;

extern const char* doc_FExpr;
extern const char* doc_FExpr_alias;
extern const char* doc_FExpr_as_type;
extern const char* doc_FExpr_count;
extern const char* doc_FExpr_countna;
45 changes: 45 additions & 0 deletions src/core/expr/fexpr.cc
@@ -23,6 +23,7 @@
#include "documentation.h"
#include "expr/expr.h" // OldExpr
#include "expr/fexpr.h"
#include "expr/fexpr_alias.h"
#include "expr/fexpr_column.h"
#include "expr/fexpr_dict.h"
#include "expr/fexpr_frame.h"
@@ -293,6 +294,50 @@ DECLARE_METHOD(&PyFExpr::re_match)
// Miscellaneous
//------------------------------------------------------------------------------

oobj PyFExpr::alias(const XArgs& args) {
  strvec names_vec;
  size_t argi = 0;

  for (auto arg : args.varargs()) {
    if (arg.is_string()) {
      names_vec.push_back(arg.to_string());
    } else if (arg.is_list_or_tuple()) {
      py::oiter names_iter = arg.to_oiter();
      names_vec.reserve(names_iter.size());
      size_t namei = 0;

      for (auto name : names_iter) {
        if (name.is_string()) {
          names_vec.emplace_back(name.to_string());
        } else {
          throw TypeError()
            << "`datatable.FExpr.alias()` expects all elements of lists/tuples "
            << "of names to be strings, instead for name `" << argi << "` "
            << "element `" << namei << "` is "
            << name.typeobj();
        }
        namei++;
      }

    } else {
      throw TypeError()
        << "`datatable.FExpr.alias()` expects all names to be strings, or "
        << "lists/tuples of strings, instead name `" << argi << "` is "
        << arg.typeobj();
    }
    argi++;
  }

  return PyFExpr::make(new FExpr_Alias(ptrExpr(expr_), std::move(names_vec)));

}


DECLARE_METHOD(&PyFExpr::alias)
    ->name("alias")
    ->docs(dt::doc_FExpr_alias)
    ->allow_varargs();


oobj PyFExpr::as_type(const XArgs& args) {
auto as_typeFn = oobj::import("datatable", "as_type");
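
The argument handling in `PyFExpr::alias` above can be read as a flattening step: each positional argument is either a string or a list/tuple of strings, and everything ends up in one flat vector of names. A minimal pure-Python sketch of that logic follows. The function name `flatten_alias_names` is hypothetical and the error messages are paraphrased rather than copied from the C++ code.

```python
def flatten_alias_names(*names):
    """Pure-Python model of how `alias(*names)` collects its arguments."""
    flat = []
    for argi, arg in enumerate(names):
        if isinstance(arg, str):
            # A bare string contributes exactly one name.
            flat.append(arg)
        elif isinstance(arg, (list, tuple)):
            # A list or tuple contributes all of its elements, in order,
            # provided every element is a string.
            for namei, name in enumerate(arg):
                if not isinstance(name, str):
                    raise TypeError(
                        f"expected a string, but element {namei} of "
                        f"argument {argi} is {type(name).__name__}"
                    )
                flat.append(name)
        else:
            raise TypeError(
                "expected a string or a list/tuple of strings, but "
                f"argument {argi} is {type(arg).__name__}"
            )
    return flat


flat = flatten_alias_names("a", ["b", "c"], ("d",))
```

Under this model, strings and sequences can be mixed freely in one call, and nesting deeper than one level is rejected because the inner elements must themselves be strings.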
1 change: 1 addition & 0 deletions src/core/expr/fexpr.h
@@ -179,6 +179,7 @@ class PyFExpr : public py::XObject<PyFExpr> {
py::oobj len(); // [DEPRECATED]
py::oobj re_match(const py::XArgs&); // [DEPRECATED]

py::oobj alias(const py::XArgs&);
py::oobj as_type(const py::XArgs&);
py::oobj count(const py::XArgs&);
py::oobj countna(const py::XArgs&);
71 changes: 71 additions & 0 deletions src/core/expr/fexpr_alias.cc
@@ -0,0 +1,71 @@
//------------------------------------------------------------------------------
// Copyright 2022 H2O.ai
//
// Permission is hereby granted, free of charge, to any person obtaining a
// copy of this software and associated documentation files (the "Software"),
// to deal in the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
// IN THE SOFTWARE.
//------------------------------------------------------------------------------
#include "expr/eval_context.h"
#include "expr/fexpr_alias.h"
namespace dt {
namespace expr {


FExpr_Alias::FExpr_Alias(ptrExpr&& arg, strvec&& names) :
  arg_(std::move(arg)),
  names_(std::move(names))
  {}


std::string FExpr_Alias::repr() const {
  std::string out = "alias";
  out += '(';
  out += arg_->repr();
  out += ", [";
  for (auto name : names_) {
    out += name;
    out += ",";
  }
  out += "]";
  out += ')';
  return out;
}


Workframe FExpr_Alias::evaluate_n(EvalContext& ctx) const {
  Workframe wf = arg_->evaluate_n(ctx);
  if (wf.ncols() != names_.size()) {
    throw ValueError()
      << "The number of columns does not match the number of names: "
      << wf.ncols() << " vs " << names_.size();
  }

  Workframe wf_out(ctx);
  auto gmode = wf.get_grouping_mode();

  for (size_t i = 0; i < wf.ncols(); ++i) {
    Workframe arg_out(ctx);
    Column col = wf.retrieve_column(i);
    arg_out.add_column(std::move(col), std::string(names_[i]), gmode);
    wf_out.cbind( std::move(arg_out) );
  }

  return wf_out;
}


}} // dt::expr
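
As a reading aid for `FExpr_Alias` above, here is a small pure-Python model of its two methods. It is a sketch under stated assumptions, not datatable code: `apply_alias` and `alias_repr` are hypothetical names, and a "frame" is represented as a list of `(name, values)` pairs. The `ValueError` text and the `repr` layout follow the C++ shown in this diff.

```python
def apply_alias(columns, names):
    """Model of evaluate_n: rename columns positionally, keeping the data."""
    if len(columns) != len(names):
        raise ValueError(
            "The number of columns does not match the number of names: "
            f"{len(columns)} vs {len(names)}"
        )
    # Only the names change; column order and values are preserved.
    return [(new_name, values) for (_, values), new_name in zip(columns, names)]


def alias_repr(arg_repr, names):
    """Model of repr(): note that every name is followed by a comma."""
    out = "alias(" + arg_repr + ", ["
    for name in names:
        out += name + ","
    return out + "])"


frame = [("C0", [1, 2, 3]), ("C1", ["one", "two", "three"])]
renamed = apply_alias(frame, ["numbers", "strings"])
text = alias_repr("f[:]", ["numbers", "strings"])
```

Two details are easy to miss in the C++ version and become visible here: the name count is checked against the number of evaluated columns, so a mismatch is detected only at evaluation time, and `repr()` leaves a trailing comma after the last name. The string `"f[:]"` passed to `alias_repr` is a placeholder for the inner expression's representation, not necessarily what datatable prints.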
42 changes: 42 additions & 0 deletions src/core/expr/fexpr_alias.h
@@ -0,0 +1,42 @@
//------------------------------------------------------------------------------
// Copyright 2022 H2O.ai
//
// Permission is hereby granted, free of charge, to any person obtaining a
// copy of this software and associated documentation files (the "Software"),
// to deal in the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
// IN THE SOFTWARE.
//------------------------------------------------------------------------------
#ifndef dt_EXPR_FEXPR_ALIAS_h
#define dt_EXPR_FEXPR_ALIAS_h
#include "expr/fexpr_func.h"
namespace dt {
namespace expr {


class FExpr_Alias : public FExpr_Func {
  private:
    ptrExpr arg_;
    strvec names_;

  public:
    FExpr_Alias(ptrExpr&& arg, strvec&& names);
    std::string repr() const override;
    Workframe evaluate_n(EvalContext& ctx) const override;
};


}} // dt::expr
#endif