Crash "Invalid input data. Must be a Pandas or Polars dataframe" on "row" question #465
Based on code from \pandasai\smart_datalake\__init__.py (line 325):
And method _add_result_to_memory
I checked smart_df.datalake._memory.get_conversation() and see the correct message: User: give me all names Bot: Here is the data you requested. This means that result['type'] is dataframe. I also get no "Invalid input data. We cannot convert it to a dataframe." exception from \pandasai\smart_dataframe\__init__.py
I think the problem is in \pandasai\smart_dataframe\__init__.py
Let's check it:
Output: True False It means
Or maybe even this one, to check for pandas and polars without importing polars here:
But in this case I'm not sure what to do with pd.Series. Maybe it is better to import polars here and check it in the normal way. Or do not raise an exception at all: just return the original df if it's not a string and we can't load it as a DataFrame.
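One way to check for pandas and polars without importing polars is to look at the module and class name of the object's type. The sketch below is only an illustration of that idea, not code from pandasai; `is_dataframe_like` is a hypothetical helper name.

```python
def is_dataframe_like(obj) -> bool:
    """Return True if obj is a pandas or polars DataFrame (or a subclass).

    Hypothetical helper: it inspects type names only, so neither
    pandas nor polars has to be imported here.
    """
    for cls in type(obj).__mro__:
        top_level_module = cls.__module__.split(".")[0]
        if cls.__name__ == "DataFrame" and top_level_module in {"pandas", "polars"}:
            return True
    return False
```

Note that a pd.Series fails this check by design, so it would still need separate handling.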
@PavelAgurov I believe it has something to do with the prompt, so I tried some other prompts to test.
prompt:
I will check on this in depth later. Hope it helps for now!
Please find a short test: https://colab.research.google.com/drive/15WniinCDUd_tL_z6APwEqQTXcbD9nVq2?usp=sharing (skip the second part of this test, because it's already about #470)
My assumption: you loaded the data directly from CSV, but I loaded it from a DataFrame.
@PavelAgurov I looked deeper into it now. The issue is with the prompt, where it was returning the first row
By changing the prompt to Also, you do not need to initialize sns.load_dataset (
Maybe it's better to fix the bug in the code instead of changing the prompt? :) And yes, there is no need to wrap it as a DataFrame, it was just a stupid test from my side :)
@gventuri we need to have a workaround to handle these cases.
Do you need the exception "Invalid input data. We cannot convert it to a dataframe."? We can try to convert the data into a DataFrame and, if that fails, just return it "as is" (after validating that it's not a string).
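The fallback proposed here could look roughly like the sketch below. It is not the pandasai implementation: `to_dataframe_or_passthrough` is a hypothetical helper name, and turning a Series into a one-row frame with `to_frame().T` is an assumption about the desired behavior.

```python
import pandas as pd


def to_dataframe_or_passthrough(data):
    """Try to return a DataFrame; otherwise hand the input back unchanged."""
    if isinstance(data, pd.DataFrame):
        return data
    if isinstance(data, pd.Series):
        # A single row such as df.iloc[0] is a Series; make it a one-row frame.
        return data.to_frame().T
    if isinstance(data, (list, dict)):
        try:
            return pd.DataFrame(data)
        except (ValueError, TypeError):
            # Not convertible (e.g. a dict of scalars): return it as is.
            return data
    # Strings, numbers and anything else are returned untouched.
    return data


titanic_like = pd.DataFrame({"Name": ["Braund", "Cumings"], "Age": [22, 38]})
first_row = to_dataframe_or_passthrough(titanic_like.iloc[0])
print(type(first_row).__name__, first_row.shape)
```

With this shape no exception is raised for a "first row" style result; whether returning non-tabular values unchanged is acceptable downstream is a separate design question.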
@sandiemann @PavelAgurov thanks a lot for looking into it, will try to figure out how to handle it. Maybe we could make it so a |
Maybe just like this?
I think it's the most critical of my findings, because most of my questions to the data come back with this error.
No ideas?
I made a fork and will test my solution.
Tested with the fix: it works well. No error.
@PavelAgurov thanks a lot for reporting. Glad the fix works, closing the issue :)
Will you merge it?
@PavelAgurov from what I understand, the fix is the following, right:
So basically it also handles Series.
Not sure, but I see a problem here:
pd.DataFrame is not an instance of (list, dict), so it will not work if we have a DataFrame here. Let's check it:
Output: True False False. A solution can be to remove this check or add a direct check:
On the other hand, I can't find an example that reproduces it with a DataFrame. Maybe it's not a real case.
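The tuple check discussed above can be reproduced in isolation. This is a small sketch with made-up sample values, not the code from the fix; the `accepted` tuple at the end is only one hypothetical way to spell a direct check.

```python
import pandas as pd

# Made-up sample values, one per type under discussion.
candidates = {
    "list": [1, 2, 3],
    "series": pd.Series([1, 2, 3]),
    "dataframe": pd.DataFrame({"a": [1, 2, 3]}),
}

# Neither Series nor DataFrame derives from list or dict, so both fail this check.
matches = {name: isinstance(value, (list, dict)) for name, value in candidates.items()}
print(matches)

# Hypothetical direct check that names the pandas types explicitly.
accepted = (list, dict, pd.Series, pd.DataFrame)
direct = {name: isinstance(value, accepted) for name, value in candidates.items()}
print(direct)
```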
🐛 Describe the bug
I use the Titanic data (attached).
Model is gpt-3.5-turbo.
Load data:
df = pd.read_csv('./data_examples/titanic.csv')
Run:
Question is "what is first row?"
Result - crash:
Error: Invalid input data. Must be a Pandas or Polars dataframe.. Track:
Traceback (most recent call last):
  File "C:\DiskD\GptPOCs\AskYourDataPOC\main.py", line 142, in <module>
    result = smart_df.chat(question)
  File "d:\Anaconda3\lib\site-packages\pandasai\smart_dataframe\__init__.py", line 167, in chat
    return self.dl.chat(query)
  File "d:\Anaconda3\lib\site-packages\pandasai\smart_datalake\__init__.py", line 329, in chat
    return self.format_results(result)
  File "d:\Anaconda3\lib\site-packages\pandasai\smart_datalake\__init__.py", line 356, in format_results
    return SmartDataframe(
  File "d:\Anaconda3\lib\site-packages\pandasai\smart_dataframe\__init__.py", line 68, in __init__
    self._load_engine()
  File "d:\Anaconda3\lib\site-packages\pandasai\smart_dataframe\__init__.py", line 119, in _load_engine
    raise ValueError(
ValueError: Invalid input data. Must be a Pandas or Polars dataframe.
Trace log:
conversational can be True or False: I get the crash in both cases.