Slow dataframe operations - use Dask instead of Pandas, dtype optimizations #210

Closed
perfectly-preserved-pie opened this issue Mar 22, 2024 · 1 comment

perfectly-preserved-pie commented Mar 22, 2024

So Pandas isn't necessarily slow, but there has always been a noticeable lag when playing with the checkboxes, radio buttons, and sliders. Especially the Pets radio button and the rental/list price slider. I think it's getting even worse now that I've reached 4000+ rows in my dataframes.

I'm wondering if Dask might be a better option here. Especially because I have a fat ass 20c/40t CPU server that is literally doing nothing 99% of the time. I'm in a position where I could throw CPU horsepower at a problem until it's fixed.
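If I went down that road, the swap might look something like this minimal sketch (a hypothetical stand-in dataframe and an arbitrary partition count, nothing from the actual codebase):

```python
import pandas as pd
import dask.dataframe as dd

# Hypothetical stand-in for the real listings dataframe
df = pd.DataFrame({
    "bedrooms": [2, 3, 4],
    "list_price": [1500, 2200, 3100],
})

# Spread the work across the idle cores; npartitions is just a guess here
ddf = dd.from_pandas(df, npartitions=8)

# Filtering looks the same as in pandas, but is lazy until .compute()
filtered = ddf[ddf["list_price"] <= 2500]
result = filtered.compute()  # runs the filter in parallel
print(result)
```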

I could also optimize the dtypes I'm using. For example, Bedrooms and Bathrooms don't need a full Int64 dtype - they could just as well use int8, which maxes out at 127; it's unlikely a house is going to have more than 127 bedrooms or bathrooms.
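A rough sketch of that downcast (the column names are guesses at what's in the dataframe, not the real schema):

```python
import pandas as pd

# Hypothetical columns standing in for the real listing data
df = pd.DataFrame({"Bedrooms": [2, 3, 4], "Bathrooms": [1, 2, 3]})

# int8 covers -128..127, more than enough for room counts;
# use the nullable "Int8" instead if the columns can contain NA
df["Bedrooms"] = df["Bedrooms"].astype("int8")
df["Bathrooms"] = df["Bathrooms"].astype("int8")

# Or let pandas pick the smallest integer type automatically
df["Bathrooms"] = pd.to_numeric(df["Bathrooms"], downcast="integer")

print(df.dtypes)
print(df.memory_usage(deep=True))
```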

perfectly-preserved-pie added the enhancement (New feature or request) and fix (Fixing an issue or problem) labels on Mar 22, 2024
perfectly-preserved-pie added the wontfix (This will not be worked on) label and removed the enhancement and fix labels on Mar 27, 2024
perfectly-preserved-pie (Owner, Author) commented

Ugh, honestly with just 4k rows I don't think this could even be considered a "big" dataset where optimizations like this would actually matter.

The filters work almost instantly on localhost. The delay is probably coming from the fact that I have to send a ~20 MB JSON payload across the internet from the production website.

My efforts would probably be better spent reducing that JSON payload size, not switching to Dask or whatever.
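For reference, a quick way to check how big the serialized payload actually is and how much trimming columns would help (column names here are hypothetical, not the real schema):

```python
import pandas as pd

# Hypothetical listings dataframe; the real one has ~4000 rows
df = pd.DataFrame({
    "bedrooms": [2, 3, 4],
    "bathrooms": [1, 2, 3],
    "list_price": [1500, 2200, 3100],
    "description": ["long free-text blob"] * 3,
})

# Measure what currently goes over the wire
full = df.to_json(orient="records")
print(f"full payload: {len(full) / 1e6:.2f} MB")

# Keep only the columns the filters and map actually need
slim_cols = ["bedrooms", "bathrooms", "list_price"]  # guesses, not real names
slim = df[slim_cols].to_json(orient="values")  # 'values' also drops repeated keys
print(f"slim payload: {len(slim) / 1e6:.2f} MB")
```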
