Added implementation of nunique function #29

chraberturas · 2024-01-16T14:22:08Z

Feature

Solves: Feature: Implement nunique from Pandas API #16

What does this change introduce?

An implementation of the nunique function: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nunique.html

This is intended to be a 1:1 implementation of the nunique function from pandas.

It differs from the pandas behavior in two ways:

This implementation returns a dictionary instead of a Series.
Consistent with the correspondence between q nulls and Python nulls established in the .pd conversion, the empty string is treated as a null value for columns of type symbol.
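The two differences can be illustrated with a minimal pure-Python sketch. This is not the code from this PR: `is_null` and `nunique_sketch` are hypothetical names, the table is modeled as a plain dict of lists, and the empty string stands in for the null symbol. Only the `dropna` parameter name is taken from the pandas signature.

```python
import math


def is_null(value):
    """Assumption for this sketch: None, float NaN and "" all count as null."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def nunique_sketch(columns, dropna=True):
    """Return {column name: number of distinct values} as a plain dict."""
    result = {}
    for name, values in columns.items():
        distinct = set()
        saw_null = False
        for value in values:
            if is_null(value):
                saw_null = True
            else:
                distinct.add(value)
        # With dropna=False, all nulls in a column collapse into one extra value.
        extra = 1 if (saw_null and not dropna) else 0
        result[name] = len(distinct) + extra
    return result


table = {
    "sym": ["a", "", "b", "a"],                # "" plays the role of the null symbol
    "price": [1.0, float("nan"), 1.0, 2.0],
}
counts = nunique_sketch(table)
counts_with_nulls = nunique_sketch(table, dropna=False)
```

Here `counts` is a dictionary keyed by column name rather than a Series, and the empty string in `sym` is dropped together with the NaN in `price` unless `dropna=False` is passed.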

General

Code

Has all temporary code used during development been removed?
Has all commented out (unused) code been removed?
Where reasonable have you ensured there is no duplication of existing code?
If applicable for your use-case have you ensured that the code is performant?

Testing

Have unit tests been created or existing ones updated to test this new functionality?

Documentation

Has documentation been added for all public code?
Has a release note been included for the new feature?
Has any documentation which would benefit from this feature been updated to use the most up to date functionality?
If a new class has been added has a documentation stub .md file associated with it been created?
If any documentation page has been created has it been added to mkdocs.yml?
Have you checked your changes with a spell checker? (US English)

nipsn

Overall OK, but there are some adjustments that I think would need to be made before merging.

Some interesting points were also raised about mixed-value columns; these will need addressing at some point.

src/pykx/pandas_api/pandas_meta.py

tests/test_pandas_api.py

docs/user-guide/advanced/Pandas_API.ipynb

nipsn

Looks good. Your solution to type checking on columns looks more robust than what I proposed originally.

nipsn · 2024-01-18T10:34:51Z

Just spoke with the PyKX team. They told me that for this specific case they would expect the behavior to be closer to what count distinct does in q.

I think the current implementation already follows this closely. However, for a mixed-type column we should no longer raise a NotImplementedError, so that the behavior matches q: if it fails in q, it should fail in this version as well. We should also change the unit test that checked for this error to expect a QError instead.

TLDR: If we have several nulls of different types on a single column, they should all count as distinct values.
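The rule in the TLDR can be sketched in plain Python. This is only an illustration under stated assumptions, not PyKX or q code: `Atom` is a hypothetical stand-in for a typed q atom (a type name plus a value, with `None` marking that type's null), and `count_distinct_sketch` is a hypothetical helper name.

```python
from collections import namedtuple

# Hypothetical model of a typed atom: value=None marks the null of that type.
Atom = namedtuple("Atom", ["qtype", "value"])


def count_distinct_sketch(column):
    """Count distinct atoms; nulls of different types remain distinct values."""
    return len(set(column))


mixed_column = [
    Atom("long", 1),
    Atom("long", None),    # long null
    Atom("float", None),   # float null, distinct from the long null
    Atom("symbol", None),  # symbol null, distinct from both
    Atom("long", None),    # repeated long null, not counted again
]
distinct_count = count_distinct_sketch(mixed_column)
```

Because the type is part of each atom's identity, the three differently typed nulls are counted separately and only the repeated long null collapses.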

src/pykx/pandas_api/pandas_meta.py

* Added implementation of nunique function
* Added test for handling strings nulls (" "), differentiating behavior between Python and kdb+
* Suggested changes. Error with mixed lists and tests for this case.
* QError for mixed lists (suggested by Kx)
* minor: rename filternan (suggested)

---------

Co-authored-by: chraberturas <christian.aberturas@hablapps.com>

chraberturas and others added 5 commits December 12, 2023 12:03

Added implementation of nunique function

34ee237

Added test for handling strings nulls (" "), differentiating behavior between Python and kdb+

217cedf

Added test for handling strings nulls (" "), differentiating behavior between Python and kdb+

1c5dbda

Merge branch 'main' into chraberturas/pandas-api-nunique

a507b0d

Merge remote-tracking branch 'origin/chraberturas/pandas-api-nunique' into chraberturas/pandas-api-nunique

257bece

chraberturas added documentation python tests work in progress labels Jan 16, 2024

chraberturas added this to the Pandas API 2nd Block milestone Jan 16, 2024

chraberturas requested review from marcosvm13 and cperezln January 16, 2024 14:22

chraberturas self-assigned this Jan 16, 2024

chraberturas added Ready to review and removed work in progress labels Jan 16, 2024

chraberturas requested review from nipsn, tortolavivo23 and MiguelGomezC January 16, 2024 14:26

chraberturas marked this pull request as ready for review January 16, 2024 14:26

chraberturas linked an issue Jan 16, 2024 that may be closed by this pull request

Feature: Implement nunique from Pandas API #16

Open

marcosvm13 approved these changes Jan 16, 2024

cperezln approved these changes Jan 16, 2024

nipsn requested changes Jan 17, 2024

chraberturas added 3 commits January 18, 2024 08:39

Suggested changes. Error with mixed lists and tests for this case.

9fe428c

Suggested changes. Error with mixed lists and tests for this case.

0f04d8e

Merge remote-tracking branch 'origin/feature/pandas-api-nunique' into feature/pandas-api-nunique # Conflicts: # src/pykx/pandas_api/pandas_meta.py

60ea6bb

chraberturas requested a review from nipsn January 18, 2024 08:07

nipsn approved these changes Jan 18, 2024

QError for mixed lists (suggested by Kx)

2aa3a6e

MiguelGomezC reviewed Jan 18, 2024

src/pykx/pandas_api/pandas_meta.py Outdated

tortolavivo23 approved these changes Jan 19, 2024

View reviewed changes

minor: rename filternan (suggested)

4aff510

chraberturas merged commit abcc1b4 into feature/pandas-api-2nd-block Jan 22, 2024
1 check passed
