Add improved type parsing capabilities for `st.data_editor` #6551

LukasMasuch · 2023-04-23T19:36:40Z

📚 Context

This PR introduces logic to determine the data type of the values in a DataFrame column or index. I originally tried to avoid this, but some upcoming features of the column configuration project require it, and it is also necessary for keeping the editing logic performant. Unfortunately, to get the correct column data kind in every possible situation, we need to combine information from the DataFrame column dtype, the dtype inferred via pd.api.types.infer_dtype, and the field type from the Arrow schema :(
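To make the layering concrete, here is a minimal sketch of the idea, not the code from this PR. The function name `sketch_data_kind` and the string labels it returns are assumptions made for illustration, and the Arrow schema step is left out to keep the example dependent on pandas only. It checks the declared dtype first and falls back to `pd.api.types.infer_dtype` when the dtype is too generic (e.g. `object`).

```python
import pandas as pd
from pandas.api.types import (
    infer_dtype,
    is_bool_dtype,
    is_datetime64_any_dtype,
    is_float_dtype,
    is_integer_dtype,
)

# Hypothetical mapping from infer_dtype results to coarse data kinds.
_INFERRED_TO_KIND = {
    "string": "string",
    "boolean": "boolean",
    "integer": "integer",
    "floating": "float",
    "mixed-integer-float": "float",
    "empty": "empty",
}


def sketch_data_kind(column: pd.Series) -> str:
    """Return a coarse data kind for a column (illustrative only)."""
    dtype = column.dtype

    # Step 1: trust the declared dtype when it is specific enough.
    if is_bool_dtype(dtype):
        return "boolean"
    if is_integer_dtype(dtype):
        return "integer"
    if is_float_dtype(dtype):
        return "float"
    if is_datetime64_any_dtype(dtype):
        return "datetime"
    if isinstance(dtype, pd.StringDtype):
        return "string"

    # Step 2: a generic dtype such as `object` says nothing about the
    # values, so inspect the values themselves.
    inferred = infer_dtype(column, skipna=True)
    return _INFERRED_TO_KIND.get(inferred, "unknown")
```

A third step that consults the Arrow schema field type would slot in after the `infer_dtype` fallback for the cases that are still ambiguous.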

These are all the changes done in this PR:

Create a new column_config_utils.py module and move some parts of data_editor into this module without applying any code changes: _INDEX_IDENTIFIER, ColumnConfig, ColumnConfigMapping, _marshall_column_config
Implement a way to determine the correct underlying data type (the column data kind) for any DataFrame column.
Use column schema (data kind) in all methods that apply edits: _apply_cell_edits, _apply_row_additions, _apply_dataframe_edits, _parse_value
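As a rough illustration of why the edit methods benefit from knowing the data kind, the sketch below converts a raw edited value based on a kind. It is not the `_parse_value` implementation from this PR: the function name `parse_edited_value` is hypothetical, and only `ColumnDataKind.STRING` is confirmed by the diff; the other enum members are assumed from the kinds named in the description.

```python
from enum import Enum
from typing import Any


class ColumnDataKind(str, Enum):
    # Only STRING appears in the PR diff; the rest are assumptions.
    STRING = "string"
    INTEGER = "integer"
    FLOAT = "float"
    BOOLEAN = "boolean"


def parse_edited_value(value: Any, kind: ColumnDataKind) -> Any:
    """Convert a raw edited value to the column's data kind.

    Returns None when the value is missing or cannot be converted.
    """
    if value is None:
        return None
    try:
        if kind is ColumnDataKind.STRING:
            return str(value)
        if kind is ColumnDataKind.INTEGER:
            return int(value)
        if kind is ColumnDataKind.FLOAT:
            return float(value)
        if kind is ColumnDataKind.BOOLEAN:
            # bool("false") would be True, so handle strings explicitly.
            if isinstance(value, bool):
                return value
            if isinstance(value, str) and value.lower() in ("true", "false"):
                return value.lower() == "true"
            return None
    except (TypeError, ValueError):
        return None
    return value
```

With the kind known up front, each edit needs a single cheap conversion instead of re-inspecting the column on every change.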

The st.dataframe and st.data_editor components will have three different notions of data types, so here is an overview to make this a bit less confusing:

Column data kind (e.g. integer, float, string, bool): This is the data type of the values in the column.
Column type (e.g. text, number, selectbox): The column type is used in the frontend to provide certain display & editing capabilities. A column type can be compatible with multiple data kinds, and a data kind can be edited by different column types.
Data format (e.g. pd.DataFrame, List of values, Snowpark Table): This is the data structure type of the input data and - in most cases - also the structure that is returned by the data_editor.
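The many-to-many relation between column types and data kinds can be pictured with a small lookup table. The sketch below is purely illustrative: the specific pairings in `COMPATIBLE_KINDS` and the helper names are assumptions, not the compatibility rules Streamlit uses, and the enum members other than `STRING` are assumed as well.

```python
from enum import Enum
from typing import Dict, List, Set


class ColumnDataKind(str, Enum):
    # Only STRING appears in the PR diff; the rest are assumptions.
    STRING = "string"
    INTEGER = "integer"
    FLOAT = "float"
    BOOLEAN = "boolean"


# Hypothetical compatibility table: column type -> data kinds it can edit.
COMPATIBLE_KINDS: Dict[str, Set[ColumnDataKind]] = {
    "text": {ColumnDataKind.STRING},
    "number": {ColumnDataKind.INTEGER, ColumnDataKind.FLOAT},
    "selectbox": {
        ColumnDataKind.STRING,
        ColumnDataKind.INTEGER,
        ColumnDataKind.FLOAT,
        ColumnDataKind.BOOLEAN,
    },
}


def is_compatible(column_type: str, kind: ColumnDataKind) -> bool:
    """Check whether a column type can edit values of the given kind."""
    return kind in COMPATIBLE_KINDS.get(column_type, set())


def column_types_for(kind: ColumnDataKind) -> List[str]:
    """List every column type that can edit the given data kind."""
    return sorted(t for t, kinds in COMPATIBLE_KINDS.items() if kind in kinds)
```

In this toy table, `number` covers two data kinds, while the `string` kind can be edited through either `text` or `selectbox`.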

🧪 Testing Done

Screenshots included
Added/Updated unit tests
Added/Updated e2e tests

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

lib/streamlit/elements/lib/column_config_utils.py

lib/streamlit/elements/data_editor.py

lib/tests/streamlit/elements/lib/column_config_utils_test.py

willhuang1997 · 2023-04-25T18:24:13Z

lib/tests/streamlit/elements/lib/column_config_utils_test.py

+
+
+SHARED_DATA_KIND_TEST_CASES = [
+    (pd.Series(["a", "b", "c"], dtype=pd.StringDtype()), ColumnDataKind.STRING),


I see ["a", "b", "c"] and other such values duplicated. Should we create a variable for them and reuse it, so that if these tests need to change for some reason, you only have to change one spot instead of many? Same for [1, 2, 3], [1.1, 2.2, 3.3], and [1, 2.2, 3]?

Unfortunately, the cases in SHARED_DATA_KIND_TEST_CASES are already the ones that work across all methods. The other cases are specific to the individual determination methods, so it gets a lot harder to share more of them. E.g. some are only supported by one method and others by several :(

For example, the string case ["a", "b", "c"] only works for all methods if the series is explicitly set to dtype=pd.StringDtype(). But the Arrow and inferred-type methods can also handle it without the dtype being set.
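The difference can be reproduced with plain pandas. This is a small sketch with assumed variable names; `dtype=object` is passed explicitly because the default dtype for string data may differ between pandas versions. The Arrow-based check is left out to avoid a pyarrow dependency.

```python
import pandas as pd
from pandas.api.types import infer_dtype

values = ["a", "b", "c"]

# Same values, two declared dtypes.
as_object = pd.Series(values, dtype=object)
as_string = pd.Series(values, dtype=pd.StringDtype())

# A dtype-based check only recognizes the explicit string dtype ...
object_dtype_is_string = isinstance(as_object.dtype, pd.StringDtype)
string_dtype_is_string = isinstance(as_string.dtype, pd.StringDtype)

# ... while value-based inference also recognizes the object series.
inferred_from_object = infer_dtype(as_object)

print(object_dtype_is_string, string_dtype_is_string, inferred_from_object)
```

This is why a test case built on the object-dtype series can only be shared between the methods that look at the values rather than the declared dtype.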

willhuang1997

LGTM

LukasMasuch added 7 commits April 23, 2023 21:35

Add functionality to check underlying types

434e174

Remove not-implemented types

f437a9d

Add comment

6395828

Some cleanup

52a4c8f

Add unit test

4e77966

Fix unit tests

cb927e9

Finish unit test

88d3568

LukasMasuch added the security-assessment-completed label Apr 24, 2023

github-advanced-security bot found potential problems Apr 24, 2023

lib/streamlit/elements/lib/column_config_utils.py (dismissed)

LukasMasuch added 3 commits April 24, 2023 17:12

Add tests for index columns

2b5146c

Remove type compatibility checks

b3da87e

Remove refactoring

2bd2094

github-advanced-security bot found potential problems Apr 24, 2023

lib/streamlit/elements/lib/column_config_utils.py (fixed)

LukasMasuch added 2 commits April 24, 2023 17:52

Remove changes to column config object

1a3aa19

Remove final import

9d2d17a

LukasMasuch marked this pull request as ready for review April 24, 2023 15:55

LukasMasuch added 2 commits April 24, 2023 18:22

Fix test issue

ce696cf

Add dtype object to empty series for compatibility

fdffc21

willhuang1997 reviewed Apr 24, 2023

lib/streamlit/elements/data_editor.py (outdated, resolved)

willhuang1997 reviewed Apr 25, 2023

lib/tests/streamlit/elements/lib/column_config_utils_test.py (outdated, resolved)

willhuang1997 reviewed Apr 25, 2023

willhuang1997 approved these changes Apr 25, 2023

LukasMasuch added 3 commits April 25, 2023 22:25

Merge remote-tracking branch 'remote/develop' into feature/better-typ…

6f868c6

…e-parsing

Add negative int and float to test

8dcef4d

Add a couple of comments about column data kind

1166b90

LukasMasuch merged commit c84f17b into develop Apr 25, 2023
76 checks passed

tconkling added a commit to tconkling/streamlit that referenced this pull request Apr 25, 2023

Merge branch 'develop' into tim/FrontendAppLibSplit

45aaaf3

* develop: Add improved type parsing capabilities for `st.data_editor` (streamlit#6551)

sfc-gh-kmcgrady deleted the feature/better-type-parsing branch October 5, 2023 19:30
