Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_rel_size() and get_path_keys() questions #43

Closed
johnmudd opened this issue Dec 6, 2013 · 0 comments
Closed

get_rel_size() and get_path_keys() questions #43

johnmudd opened this issue Dec 6, 2013 · 0 comments

Comments

@johnmudd
Copy link

johnmudd commented Dec 6, 2013

I just want to verify some details before I implement these methods. My underlying data is ISAM files with indexes.

get_rel_size():
Do I return the number of rows that I expect my execute() method to yield or the numbers of rows that Postgres will return after applying the full search criteria? In get_rel_size() I plan to look at the quals (if any), determine which indexes I can use based on the quals, calculate the number of rows I'll yield for each index and return the lowest number, the number associated with the “best” index. It's similar to the steps I'll follow in execute().

My assumption is that I should use an index whenever possible to help reduce the number of records I’ll yield and to minimize the number of reads I’ll perform to find these candidate records. I also assume there’s not much benefit in applying every qual to the records. Postgres can do that filtering about as efficiently as I can. And I assume in some cases the full search criteria is not represented in the quals so in those cases I have no choice but to let Postgres perform the final filtering.

Will execute() always receive the exact same quals as get_rel_size()? If so then I'll save my index selection from get_rel_size() for later use in execute(). In fact, I probably should cache my decisions since the app will be repeating a limited set of queries.

get_path_keys():
I plan to return one row count estimate for each index in my ISAM file. In the case of compound indexes I'll return one estimate for each combination of fields. So in the case of an index of three fields such as [a,b,c] I’ll return row counts for [a], [b], [a,b], [c], [a,c], [b,c] and [a,b,c]. I’m wondering if these estimates on the combinations will be misleading though. I think I can return accurate estimates but the numbers won’t reflect the amount of reads required to locate the records. For example, if only field “a” appears in the search criteria then I can use the index to go directly to the target records. But it will be more complicated for searches by “c” alone. The index will help me skip some records but in the worst case I will have to read all records to find matches on “c” alone. Is that a valid concern?

@johnmudd johnmudd closed this as completed Dec 9, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant