You clearly have to pass the whole data frame to the OpenAI API. Even for small data frames (hundreds of rows, dozens of columns) this could easily fill a 4096-token context, or make users spend a lot of money. You should compute the number of tokens before making the API call, and if it's over some threshold, warn the user.
Also, this clearly will not scale to datasets of the size used in industry. Try a random dataset with 10,000 rows and 100 columns, for example. If it doesn't work (as I expect), consider testing a fix, such as splitting the df into chunks, summarizing them, and using the summaries to answer the research question. Summaries will most likely mess up the floating-point numbers, though. All in all, I don't see how this can work even for medium-sized data frames.
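The token-count check suggested above could be sketched like this. This is a minimal sketch, not the project's actual code: the `TOKEN_LIMIT` value and the ~4-characters-per-token heuristic are assumptions (an exact count would require the model's tokenizer).

```python
# Hypothetical pre-flight check before an API call (a sketch).
# The 4-chars-per-token figure is a rough heuristic for English text,
# not the exact tokenizer count.
TOKEN_LIMIT = 4096  # assumed threshold, matching the context size above

def estimate_tokens(text: str) -> int:
    """Roughly estimate token count as characters / 4."""
    return len(text) // 4

def check_prompt(prompt: str, limit: int = TOKEN_LIMIT) -> bool:
    """Return True if the prompt fits under the limit; warn otherwise."""
    n = estimate_tokens(prompt)
    if n > limit:
        print(f"Warning: prompt is ~{n} tokens, over the {limit}-token limit.")
        return False
    return True
```

A library could run such a check and refuse (or warn) before spending the user's API budget.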
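The chunk-and-summarize idea above could look something like the following sketch. The `chunk_size` value and the use of `describe()` as the summary are assumptions for illustration; as noted, `describe()` discards the exact floating-point values.

```python
import pandas as pd

def chunk_df(df: pd.DataFrame, chunk_size: int = 1000) -> list:
    """Split a DataFrame into row-wise chunks of at most chunk_size rows."""
    return [df.iloc[i:i + chunk_size] for i in range(0, len(df), chunk_size)]

def summarize_chunks(chunks: list) -> list:
    """Summarize each chunk with describe(); this keeps only aggregate
    statistics, so exact cell values are lost."""
    return [chunk.describe() for chunk in chunks]
```

The summaries, being much smaller than the raw rows, could then be concatenated into a prompt that fits the context window.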
Hi @BellmannRichard
Thanks for raising the concern.
We are not passing the whole df to OpenAI. It's a small subset of it (df.head()).
I would suggest giving it a try on large datasets, and if it breaks, feel free to create an issue.
Hey @BellmannRichard, as @yzaparto said, we only send the first 5 records of the table. The only scalability issue comes with tables that have many columns. I'll turn this into a discussion; let's see if we manage to find a solution!
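The column-count concern can be made concrete: even the first 5 records get large when the table is wide. A quick sketch (the column names and the ~4-chars-per-token heuristic are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# A hypothetical wide table: 10,000 rows x 100 float columns.
wide = pd.DataFrame(np.random.rand(10_000, 100),
                    columns=[f"col_{i}" for i in range(100)])

# Only the head is sent, but serializing 5 rows x 100 full-precision
# floats already produces a sizeable prompt.
head_text = wide.head().to_csv(index=False)
approx_tokens = len(head_text) // 4  # rough heuristic, not exact
print(f"df.head() serializes to ~{approx_tokens} tokens")
```

So the row count is handled by sending only the head, while the column count still scales the prompt linearly.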