

FIX-#4652: Support categorical data in from_dataframe #4737

Merged
merged 2 commits into from
Aug 1, 2022


pyrito
Collaborator

@pyrito pyrito commented Jul 29, 2022

Signed-off-by: Karthik Velayutham <vkarthik@ponder.io>

What do these changes do?

Currently, calling from_dataframe on a DataFrame that contains categorical columns fails. This is because _get_validity_buffer did not account for the fact that categorical columns represent nulls with a sentinel value (in pandas, a category code of -1) rather than with a bit mask. This PR fixes the issue and includes some related clean-up.
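A minimal sketch of why the null representation matters, assuming only the null kinds defined by the Python DataFrame interchange protocol (`ColumnNullType`). The `missing_mask` helper is hypothetical and is not modin's `_get_validity_buffer`; it illustrates that a sentinel-based column (such as categorical codes) carries its nulls inside the data buffer, so there is no separate validity buffer to decode.

```python
from enum import IntEnum
from typing import List, Optional, Sequence


class ColumnNullType(IntEnum):
    # Mirrors the null kinds of the DataFrame interchange protocol.
    NON_NULLABLE = 0
    USE_NAN = 1
    USE_SENTINEL = 2
    USE_BITMASK = 3
    USE_BYTEMASK = 4


def missing_mask(
    data: Sequence,
    null_kind: ColumnNullType,
    null_value=None,
    validity: Optional[bytes] = None,
) -> List[bool]:
    """Hypothetical helper: return True for every row that is null."""
    n = len(data)
    if null_kind == ColumnNullType.NON_NULLABLE:
        return [False] * n
    if null_kind == ColumnNullType.USE_NAN:
        # NaN is the only float value that is not equal to itself.
        return [x != x for x in data]
    if null_kind == ColumnNullType.USE_SENTINEL:
        # Categorical codes: nulls live in the data itself (e.g. -1),
        # so no validity buffer is read here.
        return [x == null_value for x in data]
    if validity is None:
        raise ValueError(f"{null_kind.name} requires a validity buffer")
    if null_kind == ColumnNullType.USE_BITMASK:
        # One bit per row, least-significant bit first (Arrow-style packing);
        # a bit equal to null_value marks the row as null.
        return [((validity[i // 8] >> (i % 8)) & 1) == null_value for i in range(n)]
    if null_kind == ColumnNullType.USE_BYTEMASK:
        # One byte per row; a byte equal to null_value marks the row as null.
        return [validity[i] == null_value for i in range(n)]
    raise ValueError(f"unsupported null kind: {null_kind!r}")


# Categorical codes with -1 as the sentinel: no validity buffer involved.
print(missing_mask([0, 1, -1, 0], ColumnNullType.USE_SENTINEL, -1))
# Bit-masked column where a 0 bit means null: 0b101 marks row 1 as null.
print(missing_mask([10, 20, 30], ColumnNullType.USE_BITMASK, 0, bytes([0b101])))
```

Treating every nullable column as if it had a bit mask is the kind of assumption that breaks for categoricals: the sentinel branch never consults `validity` at all.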

  • commit message follows format outlined here
  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves #4652: Dataframes with categorical columns cannot be interchanged
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date
  • added (Issue Number: PR title (PR Number)) and github username to release notes for next major release

@pyrito pyrito requested a review from a team as a code owner July 29, 2022 20:58
@codecov

codecov bot commented Jul 29, 2022

Codecov Report

Merging #4737 (6034bd4) into master (05bf659) will increase coverage by 4.48%.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##           master    #4737      +/-   ##
==========================================
+ Coverage   85.26%   89.74%   +4.48%     
==========================================
  Files         259      260       +1     
  Lines       19218    19501     +283     
==========================================
+ Hits        16386    17501    +1115     
+ Misses       2832     2000     -832     
Impacted Files Coverage Δ
...frame/pandas/exchange/dataframe_protocol/column.py 76.36% <75.00%> (+3.63%) ⬆️
modin/logging/config.py 94.59% <0.00%> (-1.30%) ⬇️
modin/experimental/batch/test/test_pipeline.py 91.66% <0.00%> (ø)
modin/pandas/series.py 94.33% <0.00%> (+0.24%) ⬆️
modin/pandas/series_utils.py 99.43% <0.00%> (+0.56%) ⬆️
...ndas/exchange/dataframe_protocol/from_dataframe.py 90.53% <0.00%> (+0.59%) ⬆️
modin/core/io/text/excel_dispatcher.py 94.01% <0.00%> (+0.85%) ⬆️
...ns/pandas_on_ray/partitioning/partition_manager.py 82.19% <0.00%> (+1.36%) ⬆️
...tations/pandas_on_python/partitioning/partition.py 93.75% <0.00%> (+2.08%) ⬆️
... and 33 more


Collaborator

@vnlitvinov vnlitvinov left a comment


LGTM!

@YarShev YarShev merged commit 8521bbe into modin-project:master Aug 1, 2022
YarShev pushed a commit to YarShev/modin that referenced this pull request Aug 2, 2022

Successfully merging this pull request may close these issues.

Dataframes with categorical columns cannot be interchanged
4 participants