Reimplement Frame in pure C #1066

Closed
14 tasks done
st-pasha opened this issue May 30, 2018 · 0 comments · Fixed by #1714
Labels: EPIC ⭐ (big task that may encompass many smaller ones); performance (issues focused on the speed of execution of various datatable functions); refactor (internal code changes, clean-ups or reorganizations that are not externally visible)

st-pasha commented May 30, 2018

To achieve better micro-performance, we should make Frame a pure-C class rather than a Python one. This is especially important for handling the __getitem__ / __setitem__ / __call__ methods.
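
The per-call overhead in question can be illustrated with a rough micro-benchmark. This is only a sketch: `PyFrame` and `time_getitem` are hypothetical names, and a built-in `list` stands in for a type whose `__getitem__` is implemented in C. It is not a measurement of datatable itself.

```python
import timeit


class PyFrame:
    """Hypothetical stand-in for a Frame whose __getitem__ is written in Python."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, item):
        # Each call creates a Python-level function frame before any real work.
        return self._data[item]


def time_getitem(obj, number=200_000):
    """Return the total seconds spent on `number` evaluations of obj[0]."""
    return timeit.timeit(lambda: obj[0], number=number)


data = list(range(10))
t_python = time_getitem(PyFrame(data))  # __getitem__ implemented in Python
t_native = time_getitem(data)           # list.__getitem__ implemented in C
print(f"python-level: {t_python:.4f}s, C-level: {t_native:.4f}s")
```

The absolute numbers depend on the machine and interpreter; the point is only to compare the two timings for an otherwise identical lookup.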

Methods that remain to be converted to C++:

  • __repr__() ... __repr_pretty__()
  • view() (widget viewer)
  • rbind()
  • save() (rename into to_jay())
  • to_csv()
  • sort()
  • min(), max(), ..., nmodal()
  • min1(), max1(), ..., nmodal1()
  • to_pandas()
  • to_numpy()
  • materialize()
  • __sizeof__()
  • internal.check()
  • internal.isview()
@st-pasha st-pasha added refactor Internal code changes, clean-ups or reorganizations that are not externally visible performance Issues focused on the speed of execution of various datatable functions. labels May 30, 2018
@st-pasha st-pasha self-assigned this May 30, 2018
@st-pasha st-pasha mentioned this issue Jul 11, 2018
This was referenced Aug 14, 2018
st-pasha added a commit that referenced this issue Aug 22, 2018
* Added `varargs` and `varkwds` iterators to `PKArgs` class;
* Implement getters/setters for core `Frame` class properties `.nrows`, `.ncols` and `.shape`;
* Python `Frame` class now inherits from core `Frame` class, allowing incremental transition for #1066;
* Removed from python `Frame` properties `.nrows`, `.ncols` and `.shape` since they are now inherited;
* Removed `Frame.resize()` method;
* Small improvements to `py::obj` class;
* The `Args` class now better integrates with `py::obj`;
* Added `py::otuple` class for easier tuple creation.
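
The incremental-transition idea in this commit (the Python `Frame` inheriting from the core `Frame`) can be sketched roughly as follows. This is an assumption-laden illustration: `CoreFrame` is a hypothetical pure-Python stand-in for the extension type, and `to_list()` is an invented example of a method that has not been migrated yet.

```python
class CoreFrame:
    """Hypothetical stand-in for the core Frame (an extension type in the real project)."""

    def __init__(self, columns):
        self._columns = [list(c) for c in columns]

    @property
    def ncols(self):
        return len(self._columns)

    @property
    def nrows(self):
        return len(self._columns[0]) if self._columns else 0

    @property
    def shape(self):
        return (self.nrows, self.ncols)


class Frame(CoreFrame):
    """Python-level subclass that keeps only the methods not yet migrated."""

    def to_list(self):
        # Invented example of a method still living on the Python side.
        return [list(c) for c in self._columns]


f = Frame([[1, 2, 3], [4, 5, 6]])
print(f.shape, f.to_list())
```

Because the properties are inherited, each method can be moved from the subclass into the base class one at a time without changing how callers use `Frame`.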
abal5 pushed a commit that referenced this issue Aug 27, 2018
st-pasha added a commit that referenced this issue Aug 31, 2018
* Added method `frame.copy()`;
* Several modes of Frame initialization are now done in C++ instead of in Python;
* Creating a Frame from `core._DataTable` object is no longer supported (the use of these objects will be slowly phased out).

WIP for #1066
abal5 pushed a commit that referenced this issue Sep 13, 2018
abal5 pushed a commit that referenced this issue Sep 13, 2018
@st-pasha st-pasha added this to the Release 0.9.0 milestone Feb 8, 2019
st-pasha added a commit that referenced this issue Feb 8, 2019
This PR removes many internal Frame methods that we no longer use (after switching to pure C++ computation). This brings us closer to the ultimate goal of getting rid of `Frame.internal` entirely.

* Method `Frame.scalar()` is now deprecated: you can use `Frame[0, 0]` instead;
* Removed `Frame.internal.sort()`
* Removed `Frame.internal.to_scalar()`
* Removed `Frame.internal.delete_columns()`
* Removed `Frame.internal.replace_rowindex()`
* Removed `Frame.internal.replace_columns_slice()`
* Removed `Frame.internal.replace_columns_array()`
* Removed `Frame.internal.join()`
* Removed `py_Groupby` class
* Removed `DataTable::sortby()`

WIP for #1066
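
A deprecation such as the one for `Frame.scalar()` is commonly handled with a thin shim that warns and forwards to the replacement. The sketch below uses a hypothetical `ToyFrame` class; the warning category and message are assumptions, since the commit message does not show how datatable reports the deprecation.

```python
import warnings


class ToyFrame:
    """Hypothetical row-based frame, used only to illustrate the shim."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def __getitem__(self, item):
        i, j = item
        return self._rows[i][j]

    def scalar(self):
        # Deprecated path: warn, then forward to the replacement syntax.
        warnings.warn(
            "scalar() is deprecated; use frame[0, 0] instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self[0, 0]


frame = ToyFrame([[42]])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_value = frame.scalar()
new_value = frame[0, 0]
```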
st-pasha added a commit that referenced this issue Feb 11, 2019
- `dt.internal.get_rowindex()` renamed into `dt.internal.frame_column_rowindex()`;
- added function `dt.internal.frame_column_data_r()`;
- `RowIndex` object now has `__repr__()`;
- removed classes `MeanReducer`, `MinMaxReducer`, `StdevReducer` -- instead use `ReduceExpr` class everywhere;
- in `BaseExpr` and derived classes, removed code that was used for LLVM functionality.

WIP for #1066
st-pasha added a commit that referenced this issue Feb 20, 2019
Conversion was made slightly more efficient: now we don't need to select an individual column via `self[:, i]`, but instead the operation is controlled via a global parameter `pybuffers::single_col`. The numpy arrays returned from this are already 1D, so no need to apply `.ravel()` afterwards either.

WIP for #1066
st-pasha added a commit that referenced this issue Feb 23, 2019
* Eliminated python CsvWriter class, and moved all corresponding code to C++;
* Modernized logging functionality during csv writing: a new class `LogMessage` was added, which implements C++ stream-based logging;
* The code is now set up in such a way that it would be possible to provide a custom logger to the csv writer;
* Added several new methods in `py::Arg` and `py::oobj`.

WIP for #1066
st-pasha added a commit that referenced this issue Feb 25, 2019
* `Frame.rbind()` method moved to C++;
* `Frame.append()` method marked as deprecated;
* `dt.cbind()` and `dt.rbind()` functions re-implemented in C++;
* added several new tests for rbind, to ensure better coverage;
* `rbind()` can now accept either a sequence or a list of `Frame`s, similar to `cbind()`.

WIP for #1066
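
Accepting frames either as separate arguments or as a single list can be sketched like this. `ToyFrame` and this `rbind` are hypothetical stand-ins written in plain Python; they only show the argument handling, not datatable's actual C++ implementation.

```python
class ToyFrame:
    """Hypothetical frame holding a list of rows."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]


def rbind(*args):
    """Stack toy frames by rows; accepts rbind(a, b) as well as rbind([a, b])."""
    frames = []
    for arg in args:
        if isinstance(arg, ToyFrame):
            frames.append(arg)
        elif isinstance(arg, (list, tuple)):
            # A list or tuple argument is expanded into its member frames.
            for item in arg:
                if not isinstance(item, ToyFrame):
                    raise TypeError(f"expected a ToyFrame, got {type(item).__name__}")
                frames.append(item)
        else:
            raise TypeError(f"expected a ToyFrame or a list, got {type(arg).__name__}")
    return ToyFrame([row for f in frames for row in f.rows])


a = ToyFrame([[1, 2]])
b = ToyFrame([[3, 4], [5, 6]])
from_varargs = rbind(a, b)
from_list = rbind([a, b])
```

Both call styles produce the same stacked result, which is the behavior the commit describes as being shared with `cbind()`.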
@st-pasha st-pasha mentioned this issue Feb 25, 2019
st-pasha added a commit that referenced this issue Feb 25, 2019
* Method `Frame.save()` renamed into `Frame.to_jay()`;
* The old method name is deprecated, together with the NFF format (unless we find a compelling reason to keep NFF around);
* The C++ code for `.to_jay()` refactored.

WIP for #1066
st-pasha added a commit that referenced this issue Mar 6, 2019
Added documentation for this method, and a test.

WIP for #1066
st-pasha added a commit that referenced this issue Mar 6, 2019
* Rename .reify() -> .materialize()

* Frame.materialize() method moved to C++

Added documentation for this method, and a test.

WIP for #1066
st-pasha added a commit that referenced this issue Mar 8, 2019
This PR does final cleanup of the code that remained from the old `pydatatable::obj` structure, and completes the transition of `Frame` into C++. It also removes a lot of old supporting code, which is no longer used after the conversion.

- Property `Frame.internal` is removed completely;
- Small refactoring in `Column::from_buffer()` function;
- Removed files "py_datatable.h|cc", "py_column.h|cc", "py_utils.h|cc", "py_types.h|cc";
- Files "utils.h|cc" moved into "utils/misc.h|cc";
- Removed macros that MSVC had difficulty with;
- Internal function `obj.to_frame()` renamed into `obj.to_datatable()`;
- Minor code cleanup.

Closes #1066
@st-pasha st-pasha added the EPIC ⭐ Big task that may encompass many smaller ones label Mar 22, 2019