-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cudf.options
#11193
Add cudf.options
#11193
Conversation
At a glance, this design looks solid. Before we move forward too far with this, though, given how many different times something like this has been requested it would be good to see how well this works in practice for the use cases that we've consider (offhand, things like requiring stable sorts or forcing exact pandas compatibility for certain APIs come to mind). |
@vyasr certainly. The direct motivation of this PR is #10558. It's currently blocked by #11182 to fully implement what's needed. The stable sort idea sounds useful but I recall the way pandas config those is by choosing the sorting method in the API, not through config options. Is there a more specific stable sort use case you are referring to that I missed? |
The stable sort was just an arbitrary example. I was referring to the fact that APIs like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments. I'm confused about why this deviates so strongly from the design of Pandas' options. I would try to align with that if at all possible, rather than coming up with an entirely different structure and naming scheme.
Part of me agrees with this, but just to play devil's advocate: do we anticipate having any overlap in the actual options that we offer? If not, it might be rather confusing to have something that looks like the same API but actually has no shared behavior since it's a completely different set of configuration options. |
Potentially? For example, we currently piggyback off of Pandas' options context to control display behavior. There's no reason the user couldn't control the beahvior of Pandas and cuDF separately. |
Summarizing offline sync: We want to model cudf configuration interface close to pandas options interface to provide a more coherent experience for people who transfer from pandas. As of today, the options that pandas support for compute and I/O engines are incompatible with the nature of a GPU dataframe library. Thus we don't model |
Codecov Report
@@ Coverage Diff @@
## branch-22.08 #11193 +/- ##
===============================================
Coverage ? 86.43%
===============================================
Files ? 144
Lines ? 22808
Branches ? 0
===============================================
Hits ? 19714
Misses ? 3094
Partials ? 0 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feedback attached. I am glad the decision was made to stick close to what Pandas provides here - I have a few suggestions for improved alignment.
def _build_option_description(name, opt): | ||
return ( | ||
f"{name}:\n" | ||
f"\t{opt.description}\n" | ||
f"\t[Default: {opt.default}] [Current: {opt.value}]" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make more sense to implement a custom __str__
on Option
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be alright to do that, but the Option
doesn't know its own name
, so it's missing a crucial piece of information. I'm happy with the current state here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One feature of pandas options functions that I don't think it's publicized well is that you can pass a string that can regex match an option name e.g.
pd.get_option('my')
would be equivalent to pd.get_option('my_option')
if 'my'
could only match 'my_option'
Personally, I am not fond of this feature as it's not explicit and not sure if there's a benefit for increased code complexity, but noting here if you want to include that functionality.
I agree that's not necessary here, for simplicity's sake. We can align with the pandas design in lots of ways but regex matching feels like overkill unless/until we need it (e.g. if we have dozens of options and option namespaces with dot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor wording, but looks good, thanks!
Co-authored-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com> Co-authored-by: Lawrence Mitchell <wence@gmx.li>
rerun tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggested refactors and typos/grammar fixes.
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
rerun tests |
@gpucibot merge |
Thanks all! 🙏 |
This PR introduces a cudf option to allow user to control the default bitwidth for integer and floating types. The first iteration only plans to provide three options: `None`, 32bit and 64bit. When set as `None`, that means the result dtype will align with what pandas constructs. Otherwise, default to what user specifies. "Default" implies that it should only affects places that requires type inference, that includes: - CSV/JSON readers when dtypes are not specified - cuDF constructors - Materializing a range index. This PR is the first demonstration use of `cudf.option`, depending on #11193. Diff will reduce once it's merged. closes #11182 #10318 Authors: - Michael Wang (https://github.com/isVoid) Approvers: - Ashwin Srinath (https://github.com/shwina) - Vyas Ramasubramani (https://github.com/vyasr) URL: #11272
This PR adds
cudf.options
, a global dictionary to store configurations. A set of helper functions to manage the registries are also included. See documentation included in the PR for detail.See demonstration use in: #11272
Closes #5311