Permit any Mapping as source of DataFrame #1

Closed
wants to merge 5 commits into from


@mrkn mrkn commented May 22, 2024

This is a proof-of-concept patch that lets the DataFrame constructor accept any Mapping, not only a dict, as its data source.
With this change, pd.DataFrame(Dict("a" => [1, 2, 3], "b" => [4, 5, 6])) works when called from Julia.

julia> df = pd.DataFrame(Dict("a" => [1, 2, 3], "b" => [4, 5, 6]))
Python:
   b  a
0  4  1
1  5  2
2  6  3
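
For readers who want the idea in plain Python, here is a minimal sketch of the difference between checking for `dict` and checking for `collections.abc.Mapping`. It is not the code in this patch: `ColumnStore` and `as_column_dict` are hypothetical names made up for illustration, standing in for a foreign mapping type (such as a wrapped Julia Dict) and for the normalization step a constructor could apply.

```python
from collections.abc import Mapping


class ColumnStore(Mapping):
    """Hypothetical read-only mapping that is NOT a dict subclass."""

    def __init__(self, **columns):
        self._columns = columns

    def __getitem__(self, key):
        return self._columns[key]

    def __iter__(self):
        return iter(self._columns)

    def __len__(self):
        return len(self._columns)


def as_column_dict(data):
    """Hypothetical helper: copy any Mapping into a plain dict.

    Non-mapping inputs are returned unchanged.
    """
    if isinstance(data, Mapping):
        return dict(data)
    return data


store = ColumnStore(a=[1, 2, 3], b=[4, 5, 6])

# An isinstance(data, dict) check misses this object ...
is_dict = isinstance(store, dict)
# ... while a Mapping check accepts it.
is_mapping = isinstance(store, Mapping)

# Normalizing to a plain dict keeps the column names and values.
columns = as_column_dict(store)
```

On an unpatched pandas, passing `as_column_dict(store)` to `pd.DataFrame` should behave like passing an ordinary dict, since the helper returns one. Whether the patch converts the mapping or dispatches on `Mapping` directly is not shown here.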

The frame_ctor.FromDict benchmarks show no performance degradation relative to origin/main:

$ asv continuous -E virtualenv -b ^frame_ctor.FromDict origin/main HEAD
· Creating environments
· Discovering benchmarks
·· Uninstalling from virtualenv-py3.10-Cython3.0-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter
·· Installing 684a22fb <support_mapping_in_dataframe> into virtualenv-py3.10-Cython3.0-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter..
· Running 16 total benchmarks (2 commits * 1 environments * 8 benchmarks)
[ 0.00%] · For pandas commit 2aa155ae <main> (round 1/2):
[ 0.00%] ·· Building for virtualenv-py3.10-Cython3.0-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter..
[ 0.00%] ·· Benchmarking virtualenv-py3.10-Cython3.0-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter
[ 3.12%] ··· Running (frame_ctor.FromDicts.time_dict_of_categoricals--)........
[25.00%] · For pandas commit 684a22fb <support_mapping_in_dataframe> (round 1/2):
[25.00%] ·· Building for virtualenv-py3.10-Cython3.0-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter..
[25.00%] ·· Benchmarking virtualenv-py3.10-Cython3.0-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter
[28.12%] ··· Running (frame_ctor.FromDicts.time_dict_of_categoricals--)........
[50.00%] · For pandas commit 684a22fb <support_mapping_in_dataframe> (round 2/2):
[50.00%] ·· Benchmarking virtualenv-py3.10-Cython3.0-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter
[53.12%] ··· frame_ctor.FromDicts.time_dict_of_categoricals                                327±8μs
[56.25%] ··· frame_ctor.FromDicts.time_list_of_dict                                    16.8±0.09ms
[59.38%] ··· frame_ctor.FromDicts.time_nested_dict                                      16.1±0.2ms
[62.50%] ··· frame_ctor.FromDicts.time_nested_dict_columns                              16.3±0.3ms
[65.62%] ··· frame_ctor.FromDicts.time_nested_dict_index                                13.2±0.1ms
[68.75%] ··· frame_ctor.FromDicts.time_nested_dict_index_columns                       12.9±0.08ms
[71.88%] ··· frame_ctor.FromDicts.time_nested_dict_int64                                29.1±0.1ms
[75.00%] ··· frame_ctor.FromDictwithTimestamp.time_dict_with_timestamp_offsets                  ok
[75.00%] ··· ======== ============
              offset
             -------- ------------
              <Nano>   7.97±0.1ms
              <Hour>   10.9±0.2ms
             ======== ============

[75.00%] · For pandas commit 2aa155ae <main> (round 2/2):
[75.00%] ·· Building for virtualenv-py3.10-Cython3.0-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter..
[75.00%] ·· Benchmarking virtualenv-py3.10-Cython3.0-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter
[78.12%] ··· frame_ctor.FromDicts.time_dict_of_categoricals                                330±2μs
[81.25%] ··· frame_ctor.FromDicts.time_list_of_dict                                     17.2±0.3ms
[84.38%] ··· frame_ctor.FromDicts.time_nested_dict                                     16.2±0.08ms
[87.50%] ··· frame_ctor.FromDicts.time_nested_dict_columns                              16.2±0.1ms
[90.62%] ··· frame_ctor.FromDicts.time_nested_dict_index                                13.2±0.2ms
[93.75%] ··· frame_ctor.FromDicts.time_nested_dict_index_columns                        13.5±0.2ms
[96.88%] ··· frame_ctor.FromDicts.time_nested_dict_int64                                29.7±0.3ms
[100.00%] ··· frame_ctor.FromDictwithTimestamp.time_dict_with_timestamp_offsets                  ok
[100.00%] ··· ======== =============
               offset
              -------- -------------
               <Nano>   8.16±0.07ms
               <Hour>   11.2±0.06ms
              ======== =============


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

@mrkn mrkn force-pushed the support_mapping_in_dataframe branch from 10e3985 to d71995a Compare May 23, 2024 06:03
@mrkn mrkn closed this May 23, 2024