Add support for python in operator in filtering functions #699

Open
johnygomez opened this issue Jan 24, 2018 · 14 comments
Labels
improve Improvement of an existing functionality

Comments

@johnygomez
Member

I'd like to filter rows according to functions like

lambda x: x[0] in my_list

which use Pythonic syntax (syntactic sugar). Currently I need to rewrite this as a primitive formula that tests each element of the list separately.
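
As a rough sketch of the workaround described above, the membership test can be expanded into a chain of equality tests joined with `|`. The helper name `any_equal` is hypothetical and not part of datatable; it only assumes that the left operand supports `==` and that the comparison results support `|`.

```python
from functools import reduce
import operator


def any_equal(lhs, values):
    """Expand `lhs in values` into (lhs == v1) | (lhs == v2) | ..."""
    values = list(values)
    if not values:
        # An empty chain has nothing to combine; fail loudly instead of guessing.
        raise ValueError("values must not be empty")
    return reduce(operator.or_, (lhs == v for v in values))


# Plain Python values work too, because bool supports `|`:
print(any_equal(3, [1, 2, 3]))  # True
print(any_equal(4, [1, 2, 3]))  # False
```

Since datatable column expressions support `==` and `|`, the same helper could presumably be used as a row filter, for example `DT[any_equal(f.A, my_list), :]`, at the cost of one comparison per list element.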

@johnygomez added the "improve" (Improvement of an existing functionality) label Jan 24, 2018
@st-pasha changed the title from "Add support for python syntax in filtering functions" to "Add support for python in operator in filtering functions" Jan 24, 2018
@st-pasha
Contributor

Update https://stackoverflow.com/questions/61494957 when this is implemented

@st-pasha removed their assignment Sep 24, 2020
@ghost

ghost commented Apr 2, 2021

Any updates when this might be implemented?

@samukweku
Collaborator

I guess the core maintainers are currently focused on building up the time series functionality in datatable; however, since it is open source, contributions are very much welcome.

@ghost

ghost commented Apr 3, 2021

I doubt I have the skills and depth of understanding to contribute such a feature. The fact that it is still missing suggests to me that it takes some time and sophistication to develop, which is why the maintainers haven't been able to include it so far.
That aside, what resources would you recommend for starting to understand how datatable works under the hood?

@samukweku
Collaborator

@Peter-Pasta I am still finding my way around the source code. The core maintainers can explain this better.

@st-pasha
Contributor

st-pasha commented Apr 5, 2021

We have a tutorial on creating a new datatable function: https://datatable.readthedocs.io/en/latest/develop/create-fexpr.html

Now, since "in" is an operator and not a regular function, the process will be slightly more complicated: you'd need to fill the tp_as_sequence slot and implement the sq_contains method.
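
For context, sq_contains is the C-level slot behind Python's `__contains__` method. The pure-Python sketch below only illustrates how the `in` operator dispatches; the `Expr` and `Values` classes are hypothetical and unrelated to datatable's source. Two language rules are worth keeping in mind: Python looks up `__contains__` on the right-hand operand, and it converts whatever the method returns to `bool`.

```python
class Expr:
    """Hypothetical stand-in for a lazy expression object."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"Expr({self.text!r})"


class Values:
    """Hypothetical container whose __contains__ tries to return an Expr."""

    def __init__(self, items):
        self.items = list(items)

    def __contains__(self, item):
        # Python coerces this return value to bool before handing it back.
        return Expr(f"{item!r} in {self.items!r}")

    def contains(self, item):
        # An ordinary method keeps its return value intact.
        return Expr(f"{item!r} in {self.items!r}")


vals = Values([1, 2, 3])
print(type(5 in vals))    # <class 'bool'>
print(vals.contains(5))   # Expr('5 in [1, 2, 3]')
```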

As for the "core" of the function, there are two existing examples that are quite similar: the replace() function, which compares each value with a list (or map) of values, and the join() function, which compares each value with a sorted column via binary search.
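
The two strategies mentioned above can be sketched in plain Python as follows. This is not how datatable implements replace() or join(); the function names are hypothetical and the code only illustrates the difference between a lookup against a set of values and a binary search in a sorted sequence.

```python
from bisect import bisect_left


def isin_by_lookup(column, values):
    """Test each element against a hash set built from the values."""
    lookup = set(values)
    return [x in lookup for x in column]


def isin_by_binary_search(column, sorted_values):
    """Test each element by binary search; sorted_values must be sorted."""
    result = []
    for x in column:
        i = bisect_left(sorted_values, x)
        result.append(i < len(sorted_values) and sorted_values[i] == x)
    return result


column = [5, 1, 7, 3]
print(isin_by_lookup(column, [1, 3]))         # [False, True, False, True]
print(isin_by_binary_search(column, [1, 3]))  # [False, True, False, True]
```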

Overall, on a difficulty scale from 1 (easy) to 5 (hard), I would rate this task as 2 or 3.

@samukweku
Collaborator

samukweku commented Apr 11, 2021

I think it might be easier to write a function instead of an operator for "in", maybe dt.in. I would like to give it a shot.

@samukweku
Collaborator

I also need some guidance, @st-pasha @oleksiyskononenko: when building datatable in editable mode, I don't have an easy-install.pth file in my site-packages folder, only an easy-install.py file. As a result, I can't run this command: echo "`pwd`/src" >> ${VIRTUAL_ENV}/lib/python*/site-packages/easy-install.pth

@samukweku
Collaborator

@oleksiyskononenko @st-pasha Any ideas on how I can fix the issue above?

@st-pasha
Contributor

@samukweku Sorry, I was on vacation last week and didn't see your message.

The main challenge with "editable mode" installations in Python is that there is no official PEP standard for them, which makes it hard to provide reliable instructions here. You can try one of the following approaches:

  1. Create the easy-install.pth file using the command above. It should work as-is, or if you have an older version of shell, try echo "`pwd`/src" >> `ls ${VIRTUAL_ENV}/lib/python*/site-packages/easy-install.pth` .
  2. Create a virtual environment specifically for datatable development, using the virtualenv command.
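
If the shell glob in the first approach does not resolve, the same effect can presumably be achieved from Python. The sketch below is an assumption-laden alternative, not part of datatable's documented workflow: the helper name `add_pth_entry` is hypothetical, and it relies on `sysconfig.get_paths()["purelib"]` pointing at the active environment's site-packages directory.

```python
import sysconfig
from pathlib import Path


def add_pth_entry(src_dir, site_packages=None, pth_name="easy-install.pth"):
    """Append src_dir to a .pth file in site-packages, creating it if needed."""
    if site_packages is None:
        # Assumption: this is the site-packages of the active virtualenv.
        site_packages = sysconfig.get_paths()["purelib"]
    pth_file = Path(site_packages) / pth_name
    entry = str(Path(src_dir).resolve())
    text = pth_file.read_text() if pth_file.exists() else ""
    if entry not in text.splitlines():
        with pth_file.open("a") as fh:
            # Keep entries on separate lines even if the file lacks a final newline.
            if text and not text.endswith("\n"):
                fh.write("\n")
            fh.write(entry + "\n")
    return pth_file


# Example (run from the repository root, inside the activated virtualenv):
#     add_pth_entry("src")
```

Calling the helper twice with the same directory leaves a single entry in the file, so it is safe to re-run.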

@samukweku
Collaborator

@st-pasha, I'm still having issues with the installation. I successfully installed datatable in editable mode; however, the reported version is 0.11.1. I uninstalled it (pip uninstall datatable), thinking that would take care of the problem (as suggested here), but I get the error message below when I try to run make test:

$ make test
python -m pytest -ra --maxfail=10 -Werror tests
ImportError while loading conftest '/home/sam/github/datatable/tests/conftest.py'.
tests/__init__.py:14: in <module>
    from datatable.lib import core
E   ModuleNotFoundError: No module named 'datatable'
make: *** [Makefile:59: test] Error 4

Could you kindly suggest how I can fix this?

@st-pasha
Contributor

On my computer I have the following configuration: the repository is checked out into

$ pwd
/Users/pasha/github/datatable

The content of the "easy-install.pth" is

$ ls ${VIRTUAL_ENV}/lib/python*/site-packages/easy-install.pth
/Users/pasha/py36/lib/python3.6/site-packages/easy-install.pth
$ cat `ls ${VIRTUAL_ENV}/lib/python*/site-packages/easy-install.pth`
/Users/pasha/github/datatable/src

And I can verify that this works by checking

$ python
Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 26 2018, 19:50:54) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import datatable
>>> datatable.__file__
'/Users/pasha/github/datatable/src/datatable/__init__.py'

The import command may fail like this if the core wasn't compiled yet with either make debug or make build:

>>> import datatable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pasha/github/datatable/src/datatable/__init__.py", line 23, in <module>
    from .frame import Frame
  File "/Users/pasha/github/datatable/src/datatable/frame.py", line 23, in <module>
    from datatable.lib._datatable import Frame
  File "/Users/pasha/github/datatable/src/datatable/lib/__init__.py", line 31, in <module>
    from . import _datatable as core
ImportError: cannot import name '_datatable'

However, if the import reports that datatable was not found, that would indicate that the editable-mode installation failed somehow.
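
The two failure modes above can also be told apart without importing the package. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `locate_package` is hypothetical. A result of `None` corresponds to the "not found" case, while a path under the repository's src directory suggests the editable installation is in place, even if the compiled core is still missing.

```python
import importlib.util


def locate_package(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints the path to datatable/__init__.py, or None if it is not on sys.path.
print(locate_package("datatable"))
```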

@samukweku
Collaborator

samukweku commented Apr 22, 2021

@st-pasha thanks; I found the error on my end and fixed it. The echo command wasn't writing the right path to my easy-install.pth file. All good now.

Another question: if changes are made to the C++ code, make build is required. How do I test code changes on the Python side? Say, for instance, I want f.string_column.len() to return 2. It's a silly example, but I hope you get my point. This does not involve any C++, so how do I do that?

@st-pasha
Contributor

If you make changes to the C++ code, you need to run make build (or make debug) and then restart the Python console (or reload the kernel in Jupyter). If you make changes to the Python code only, then you just need to restart the Python console.
