New caching methods cannot detect the change of df column name #6236
Comments
Hey @PierXuY, thank you so much for flagging this behavior! Our team is investigating further.
Thanks for reporting this. I was able to reproduce it here. The reason is that we are hashing the dataframe based on its ID. This probably means that changes applied in place do not produce a new ID and, consequently, do not cause the cached function to be recomputed. A potential workaround would be to create a new copy of the dataframe and apply the renaming to the copy.
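To make the explanation above concrete, here is a minimal sketch of why an in-place change goes unnoticed when a cache is keyed by object identity. It is a toy cache, not Streamlit's implementation: make_cached, column_names, and the plain dict standing in for the dataframe are all hypothetical names chosen for illustration.

```python
# Toy illustration only: this is NOT Streamlit's implementation.
def make_cached(func, key_func):
    """Wrap func with a dict cache whose key is computed by key_func."""
    store = {}

    def wrapper(obj):
        key = key_func(obj)
        if key not in store:
            store[key] = func(obj)
        return store[key]

    return wrapper


def column_names(table):
    """Stand-in for an expensive function: return the table's column names."""
    return list(table.keys())


# A plain dict plays the role of the dataframe here.
table = {"a": [1, 2, 3], "b": [1, 2, 3]}

by_id = make_cached(column_names, key_func=id)
by_content = make_cached(column_names, key_func=lambda t: repr(sorted(t.items())))

first_by_id = by_id(table)
first_by_content = by_content(table)

# In-place rename: the object (and therefore its id) stays the same.
table["A"] = table.pop("a")

stale = by_id(table)        # cache hit on the unchanged id, old names returned
fresh = by_content(table)   # key changed, so the function is recomputed

# Workaround from the comment above: rename on a copy, which has a new id.
# The original is still referenced, so the copy cannot reuse its id.
renamed = dict(table)
renamed["B"] = renamed.pop("b")
from_copy = by_id(renamed)
```

The identity-keyed cache keeps returning the old names after the in-place rename, while the content-keyed cache and the copy both trigger a recompute. Keying by content costs more per call, which is presumably the trade-off behind hashing by ID.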
Hey @PierXuY 👋 Good news! In the next Streamlit release (1.24.0), we're bringing back hash_funcs. That allows you to override Streamlit's hashing of dataframes based on its id. Here's an example you can use to verify it works:
import pandas as pd
import streamlit as st
df = pd.DataFrame({'a':[1,2,3],'b':[1,2,3],'c':[1,2,3]})
@st.cache_data(hash_funcs={pd.core.frame.DataFrame: lambda x: str(x)})
def show(df):
    return df
columns_name = st.text_input("New column name")
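if columns_name:
    try:
        df.columns = columns_name.split(",")
    except ValueError:
        # pandas raises ValueError when the number of names does not match the number of columns
        df.columns = ['A','B','C']
        st.error('Invalid, please enter three column names separated by commas, such as "q, w, e".')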
if st.button('add a column'):
    df['new'] = 4
st.write("st.dataframe(df)")
st.dataframe(df)
st.write("st.dataframe(show(df))")
st.dataframe(show(df))
Note: the hash func used above can be replaced with a custom one, for example:
def hash_dataframe_custom(df):
    h1 = pd.util.hash_pandas_object(df)
    column_names = list(df.columns)
    column_names.append(h1)
    return column_names
@st.cache_data(hash_funcs={pd.core.frame.DataFrame: hash_dataframe_custom})
def show(df):
    return df
and it still works as expected.
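As a variation on the custom hash function above, the sketch below folds the column names and the cell contents into a single string key. It relies on the fact that pd.util.hash_pandas_object hashes the index and the values but not the column labels, so the names have to be added explicitly. The helper name dataframe_cache_key is hypothetical, and the snippet deliberately avoids Streamlit so it can be run on its own.

```python
import hashlib

import pandas as pd


def dataframe_cache_key(df: pd.DataFrame) -> str:
    """Hypothetical helper: build a key from column names and cell values."""
    # One uint64 per row, derived from the index and the values in that row.
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    content_part = hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()
    # Column labels are not part of row_hashes, so include them separately.
    column_part = "|".join(map(str, df.columns))
    return f"{column_part}#{content_part}"


df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})
values_before = pd.util.hash_pandas_object(df)
key_before = dataframe_cache_key(df)

# Rename the columns in place, as in the issue.
df.columns = ["A", "B"]
values_after = pd.util.hash_pandas_object(df)
key_after = dataframe_cache_key(df)

# The value-only hash is unchanged by the rename; the combined key is not.
rename_detected = key_before != key_after
values_hash_unchanged = values_before.equals(values_after)
```

Assuming hash_funcs accepts any callable that takes the dataframe, as in the example above, a helper like this could be passed as hash_funcs={pd.core.frame.DataFrame: dataframe_cache_key}. Hashing every row on each call can be slow for large dataframes, so this is a correctness-first sketch rather than a tuned solution.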
Fixed in #7331
Checklist
Summary
When @st.cache_data is used on a function whose input parameter is a pd.DataFrame, a change of column names is not detected. After the column names of df are changed, the cached function still returns the old (wrong) df!
Reproducible Code Example
Steps To Reproduce
1. Run the above code using streamlit.
2. Enter three column names in the text box, separated by commas.
3. You will see that the df returned by the show function has not changed; that is, the df with the new column names has not been cached again.
4. Click "add a column". You will see that st.dataframe(show(df)) has changed, but it still does not show the correct data.
Expected Behavior
The change of df column name can also be detected.
Current Behavior
No response
Is this a regression?
Debug info
Additional Information
No response
Are you willing to submit a PR?