This issue was moved to a discussion.

Scalability issues #78

Closed
BellmannRichard opened this issue May 7, 2023 · 2 comments

Comments

@BellmannRichard

You clearly have to pass the whole data frame to the OpenAI API. Even for small data frames (hundreds of rows, dozens of columns) this could easily fill up a 4096-token context, or make users spend a lot of money. You should compute the number of tokens before you make the API call and, if that is over some threshold, warn the user.
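A minimal sketch of the pre-call check suggested above. It is not taken from the project: `estimate_tokens`, `check_prompt_budget`, and the 80% warning ratio are hypothetical names and values, and the token count is only a rough heuristic of about 4 characters per token. A real tokenizer for the target model could be swapped in for the estimate.

```python
import warnings

# Assumption: roughly 4 characters per token, a rule of thumb for English text.
# This is an estimate, not the model's real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate how many tokens `text` would occupy."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def check_prompt_budget(prompt: str, context_limit: int = 4096,
                        warn_ratio: float = 0.8) -> int:
    """Return the estimated token count; warn or fail before any API call.

    `context_limit` and `warn_ratio` are illustrative defaults.
    """
    tokens = estimate_tokens(prompt)
    if tokens > context_limit:
        raise ValueError(
            f"Prompt needs about {tokens} tokens, over the {context_limit} limit."
        )
    if tokens > warn_ratio * context_limit:
        warnings.warn(
            f"Prompt uses about {tokens} of {context_limit} tokens; "
            "the call may be costly or leave little room for the answer."
        )
    return tokens
```

Calling `check_prompt_budget(prompt)` right before the request would surface an oversized prompt to the user instead of letting the API call fail or run up cost silently.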

Also, this will clearly not scale to the size of the datasets used in industry. Try a random dataset with 10000 rows and 100 columns, for example. If it doesn't work (as I expect), consider testing some fix, such as splitting the df into chunks, summarizing them, and using the summaries to answer the research question. Summaries will most likely mess up the floating-point numbers, though. All in all, I don't see how this can work even for medium-sized dataframes.
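One way to picture the chunk-and-summarize idea is sketched below. Note the swap: instead of model-written summaries, each chunk is reduced locally to numeric aggregates, which sidesteps the floating-point concern because no number passes through the model. This is an assumption for illustration, not something the project does, and `chunks`, `summarize_chunk`, and `combine` are hypothetical helpers working on plain lists of dicts.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, float]


def chunks(rows: List[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def summarize_chunk(chunk: List[Row], column: str) -> Dict[str, float]:
    """Reduce one chunk to aggregates that can be merged exactly later."""
    values = [row[column] for row in chunk]
    return {
        "count": len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
    }


def combine(summaries: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Merge per-chunk aggregates into one summary for the whole column."""
    items = list(summaries)
    total = sum(s["count"] for s in items)
    return {
        "count": total,
        "mean": sum(s["sum"] for s in items) / total,
        "min": min(s["min"] for s in items),
        "max": max(s["max"] for s in items),
    }


rows = [{"x": float(i)} for i in range(10)]
overall = combine(summarize_chunk(c, "x") for c in chunks(rows, 3))
```

Only the small combined summary would then need to go into a prompt, so its size does not depend on the number of rows.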

@yzaparto
Contributor

yzaparto commented May 7, 2023

Hi @BellmannRichard
Thanks for raising the concern.
We are not passing the whole df to OpenAI. It's a small subset of it (df.head).
I would say give it a try on large datasets, and if it breaks, feel free to create an issue.

@gventuri
Collaborator

gventuri commented May 7, 2023

Hey @BellmannRichard, as @yzaparto said, we only send the first 5 records of the table. The only scalability issue comes with tables that have many columns. I'll turn this into a discussion, let's see if we manage to find a solution!
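To illustrate why only the column count matters when just the first 5 records are sent, here is a rough sketch. It is not the project's code: `head_preview` is a hypothetical helper that works on CSV text with the standard library, standing in for whatever the library builds from `df.head`.

```python
import csv
import io


def head_preview(csv_text: str, n_rows: int = 5) -> str:
    """Return the header plus the first `n_rows` data rows as CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for index, row in enumerate(reader):
        if index > n_rows:  # index 0 is the header row
            break
        writer.writerow(row)
    return out.getvalue()


def make_table(n_rows: int, n_cols: int) -> str:
    """Build a synthetic CSV table for the comparison below."""
    header = ",".join(f"col{c}" for c in range(n_cols))
    body = "\n".join(",".join(str(r) for _ in range(n_cols)) for r in range(n_rows))
    return header + "\n" + body + "\n"


small = head_preview(make_table(100, 3))
tall = head_preview(make_table(10_000, 3))
wide = head_preview(make_table(100, 100))
```

Here `small` and `tall` come out identical, since extra rows never reach the preview, while `wide` is much longer, which matches the point that many columns are the remaining scalability concern.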

@Sinaptik-AI Sinaptik-AI locked and limited conversation to collaborators May 7, 2023
@gventuri gventuri converted this issue into discussion #82 May 7, 2023


3 participants