New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow each Column in a DataTable have its own RowIndex #1188
Labels
High priority
high-priority tasks
improve
Improvement of an existing functionality
join
refactor
Internal code changes, clean-ups or reorganizations that are not externally visible
Milestone
Comments
Merged
st-pasha
added a commit
that referenced
this issue
Aug 2, 2018
Issue #1188 appears to be much harder than it seemed at the first sight, so I'll attempt to implement it in several batches. * Internal method `columns_from_array()` removed -- same functionality can be achieved via `columns_from_columns()`; * `ColumnSet` objects can now contain columns with different `RowIndex`es (however `DataTable`s still can't); * Fixed a bug in DataTable's constructor where the last column was not checked properly during construction (the bug never manifested itself though); * Python-facing `_Column` object acquired new property `rowindex`, and 2 new methods: `replace_rowindex()` and `topython()` (the last one mostly useful for debug purposes); * `DataTable.isview` semantics has changed: now it returns True if *any* of the columns has a RowIndex (this change is backward-compatible); * Python-facing `_RowIndex` object acquired property `ptr`, which can be used in python to compare whether to`RowIndex`es are equal; * Methods `make_*()` in `EvaluationEngine` were removed, since they served as a useless indirection level; * `SliceCSNode` no longer uses `columns_from_slice()` method (preparing to remove the latter); * Added methods to `EvaluationEngine` to distinguish between "source"/"final" RowIndexes -- these are still work in progress, and currently coexist with the old-style `.rowindex` property; * Attempt to disentangle final/source RowIndex calculation in RowsNode and GroupbyNode -- still WIP; * Several places where column's "meta" was mentioned were cleaned up -- there is no such concept anymore.
abal5
pushed a commit
that referenced
this issue
Aug 27, 2018
Issue #1188 appears to be much harder than it seemed at the first sight, so I'll attempt to implement it in several batches. * Internal method `columns_from_array()` removed -- same functionality can be achieved via `columns_from_columns()`; * `ColumnSet` objects can now contain columns with different `RowIndex`es (however `DataTable`s still can't); * Fixed a bug in DataTable's constructor where the last column was not checked properly during construction (the bug never manifested itself though); * Python-facing `_Column` object acquired new property `rowindex`, and 2 new methods: `replace_rowindex()` and `topython()` (the last one mostly useful for debug purposes); * `DataTable.isview` semantics has changed: now it returns True if *any* of the columns has a RowIndex (this change is backward-compatible); * Python-facing `_RowIndex` object acquired property `ptr`, which can be used in python to compare whether to`RowIndex`es are equal; * Methods `make_*()` in `EvaluationEngine` were removed, since they served as a useless indirection level; * `SliceCSNode` no longer uses `columns_from_slice()` method (preparing to remove the latter); * Added methods to `EvaluationEngine` to distinguish between "source"/"final" RowIndexes -- these are still work in progress, and currently coexist with the old-style `.rowindex` property; * Attempt to disentangle final/source RowIndex calculation in RowsNode and GroupbyNode -- still WIP; * Several places where column's "meta" was mentioned were cleaned up -- there is no such concept anymore.
abal5
pushed a commit
that referenced
this issue
Sep 13, 2018
Issue #1188 appears to be much harder than it seemed at the first sight, so I'll attempt to implement it in several batches. * Internal method `columns_from_array()` removed -- same functionality can be achieved via `columns_from_columns()`; * `ColumnSet` objects can now contain columns with different `RowIndex`es (however `DataTable`s still can't); * Fixed a bug in DataTable's constructor where the last column was not checked properly during construction (the bug never manifested itself though); * Python-facing `_Column` object acquired new property `rowindex`, and 2 new methods: `replace_rowindex()` and `topython()` (the last one mostly useful for debug purposes); * `DataTable.isview` semantics has changed: now it returns True if *any* of the columns has a RowIndex (this change is backward-compatible); * Python-facing `_RowIndex` object acquired property `ptr`, which can be used in python to compare whether to`RowIndex`es are equal; * Methods `make_*()` in `EvaluationEngine` were removed, since they served as a useless indirection level; * `SliceCSNode` no longer uses `columns_from_slice()` method (preparing to remove the latter); * Added methods to `EvaluationEngine` to distinguish between "source"/"final" RowIndexes -- these are still work in progress, and currently coexist with the old-style `.rowindex` property; * Attempt to disentangle final/source RowIndex calculation in RowsNode and GroupbyNode -- still WIP; * Several places where column's "meta" was mentioned were cleaned up -- there is no such concept anymore.
Merged
st-pasha
added a commit
that referenced
this issue
Dec 1, 2018
- `DataTable.resize_rows()` now works correctly for a DataTable where its columns have different row-indices; - This method is now more efficient: for multi-column Frames a view is created, without having to modify the data in each column; - Python `Frame.nrows` setter now always pads the Frame with NAs, even when it had only 1 row (previously 1-row frames were tiled); - Method `dt.repeat(frame, n)` added to create a frame by repeating an existing frame multiple times. This is a WIP for #1188
st-pasha
added
the
refactor
Internal code changes, clean-ups or reorganizations that are not externally visible
label
Jan 12, 2019
Prerequisite: #1477 |
st-pasha
added a commit
that referenced
this issue
Jan 29, 2019
st-pasha
added a commit
that referenced
this issue
Jan 29, 2019
This PR is the final step in implementing #1188: to allow each Column in a Frame to have its own distinct RowIndex. This allows us to: - cbind frames that have different rowindices without materializing individual columns; - perform joins without materializing the resulting frame; - Select filtered columns alongside the computed ones in `DT[i, j, ...]` expression. Additional changes: - Removed deprecated C API functions `datatable_get_column_data`, `datatable_unpack_slicerowindex`, `datatable_unpack_arrayrowindex` which are no longer used; - `DataTable::rowindex` property was removed; - Removed obsolete function `_datatable.get_internal_function_ptrs()`; - Added function `datatable.internal.get_rowindex(frame, i)`; - Removed `Frame.internal.rowindex` and `Frame.internal.rowindex_type`; - Fixed sort function when applied to a 1-row view Frame. Closes #1188
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
High priority
high-priority tasks
improve
Improvement of an existing functionality
join
refactor
Internal code changes, clean-ups or reorganizations that are not externally visible
This is needed to implement efficient joins: the columns from the Left frame and the columns from the Right frame will have different RowIndices applied to them.
This feature may also be handy when selecting regular columns alongside with the computed ones.
This change will remove the
rowindex
property inDataTable
object, possibly replacing it withnrowindices
field counting the number of different rowindices for all columns in the frame.The text was updated successfully, but these errors were encountered: