Finalize naming conventions for estimators and package #6
Regarding estimator names, what do you think about using imputer terminology, along the lines of scikit-learn's imputers?
@aazuspan, honestly I hadn't come across that terminology before. The only reason I've shied away from using it for our estimators is that scikit-learn seems to treat imputation as a way to fill missing values on the way to transformation/classification/regression. That is, it's a necessary pre-filling step so that other estimators down the line don't choke. The expectation for our estimators is different. But I recognize that I haven't come up with a better term yet, so I'm willing to go with it.
Perfect, I'll just close this issue as resolved 😆
I completely agree. It's a challenging distinction to make because they function almost identically, but have very different purposes.
No, I think your instincts were right. The association between imputation and filling missing data is just too strong for us to adopt that naming. Back to the drawing board!
Based on discussions in #2 and #13, I'm starting to lean toward using an explicit Regressor suffix for the estimator names.
I might be tempted to give a nod to being explicit that these are k-nearest neighbor regressors, so we could consider including KNN in the names as well.
It gets a bit awkward/redundant with MSN (most similar neighbor) and GNN (gradient nearest neighbor) because they already have NN implied. So my current thinking is to keep each method's acronym and add an explicit Regressor suffix, along the lines of the sketch below.
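Purely as an illustration of the shape those names could take (the class names and base class here are placeholders, not a settled API):

```python
# Illustrative sketch only: regressor-style names, with KNN spelled out where
# the method acronym doesn't already imply nearest neighbors.
from sklearn.neighbors import KNeighborsRegressor


class EuclideanKNNRegressor(KNeighborsRegressor):
    """k-NN regression on raw Euclidean distances (explicit KNN in the name)."""


class MSNRegressor(KNeighborsRegressor):
    """Most similar neighbor regression (NN already implied by the acronym)."""


class GNNRegressor(KNeighborsRegressor):
    """Gradient nearest neighbor regression (NN already implied by the acronym)."""
```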
Does that make sense?
I'm happy with these names, and I'm also in favor of being more explicit. In the interest of thinking through every option, would there be any advantage to combining similar estimators into a single estimator whose method could be selected with a parameter?
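For concreteness, a hypothetical sketch of that combined-estimator option (the class and parameter names are made up for illustration):

```python
# Hypothetical sketch of a single estimator selected by a `method` parameter;
# not a proposed API, just to frame the trade-off.
from sklearn.base import BaseEstimator, RegressorMixin


class NeighborRegressor(BaseEstimator, RegressorMixin):
    def __init__(self, method="euclidean", n_neighbors=5):
        # scikit-learn convention: store all configuration in __init__
        self.method = method
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # a string option forces validation and a dispatch table like this
        if self.method not in {"euclidean", "mahalanobis", "msn", "gnn"}:
            raise ValueError(f"Unknown method: {self.method!r}")
        # ...method-specific transformation and neighbor search would go here...
        return self
```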
The package and module names are a little wordy, though... I think the package name would be okay since it's only used once or twice, but the module name is definitely a mouthful to import every time. I don't want to criticize without tossing out some alternatives, so I'll put a few shorter ideas on the board for consideration.
I definitely agree that there is very little difference between the estimators we've introduced and that they likely could be combined into a single estimator. My aversion to this is mainly based on how hard it can be to decipher that kind of do-everything estimator. One goal I have in mind is inter-comparison between the methods, along with their hyperparameters, through scikit-learn's model selection tools.
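Something like this rough sketch is what I'm picturing (the swappable-pipeline-step pattern is stock scikit-learn; the estimators themselves are placeholders):

```python
# Rough sketch: compare methods and hyperparameters side by side by swapping
# estimators into a Pipeline step via GridSearchCV. Estimators are placeholders.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline

pipe = Pipeline([("est", KNeighborsRegressor())])
param_grid = [
    # one dict per candidate estimator, each with its own hyperparameter sweep
    {"est": [KNeighborsRegressor()], "est__n_neighbors": [3, 5, 7]},
    # {"est": [MSNRegressor()], "est__n_neighbors": [3, 5, 7]},  # hypothetical
]
search = GridSearchCV(pipe, param_grid, scoring="r2", cv=5)
# search.fit(X, y); search.best_estimator_ then identifies the best method/k combination
```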
I agree we'd probably want to avoid an import name that has been used before (unfortunately "nearest-neighbors" and "neural-networks" share the same abbreviation). All of your suggestions are better than mine, but I thought I'd throw one more out - sknnr (aka "skinner" - a nod to the Simpsons principal). It looks like there is a package on PyPI called "skinner", but it's not under active development. I like "skneighbors" as well.
Yes, I think that kind of parameter spaghetti is definitely something we should try to avoid. The sklearn framework gives us an advantage there since it forces separation of the model configuration from the fitting, predicting, and scoring parameters, but I still think you're right to err on the side of atomic over configurable.
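Just to spell out the separation I mean, using a stock scikit-learn estimator as a stand-in:

```python
# Configuration lives in the constructor; fit/predict/score only ever see data.
from sklearn.neighbors import KNeighborsRegressor

est = KNeighborsRegressor(n_neighbors=5, weights="distance")  # configuration
# est.fit(X_train, y_train)     # data in
# y_hat = est.predict(X_test)   # predictions out
# est.get_params()              # config stays introspectable for tuning tools
```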
I think either approach is reasonable, but I'm leaning towards the way you have it set up now. I agree that simpler functions are better, and in particular I'm not a fan of string params with predefined options because they're harder for users to remember and force us to handle invalid inputs. In any case, I don't think we're locking ourselves into anything and can always reconsider later. I imagine if we wanted to switch to a combined estimator down the road, it wouldn't be too disruptive.
I don't have a ton of experience with those, but the conclusion I came to was that the simpler, separate-estimator approach was the better fit here.
I like sknnr!
All sounds good to me, @aazuspan. I agree that we can revisit the decision about combining estimators down the road if it comes to that.
I think so! It will be a little painful to merge that change in with any branches we have ongoing since I think git will consider all the old src files deleted, but we'll have to deal with it at some point, so I'm not sure we gain anything by waiting.
I think locally it should be as simple as a find-replace in VS Code and renaming the module folder. There could be some other snags I'm not remembering, but I've done this a few times and can't remember any major issues. It's also possible we'll have to rebuild our hatch environments afterwards.
Yeah, that should be pretty painless. You can continue to push to the old repo name and GitHub will redirect it, although it might raise a warning. I think the long-term fix is to run `git remote set-url` with the new repository URL.
I may be showing my ignorance here, but considering our flow to this point, it seems like we only have one or maybe two active feature branches going, and right now they are branching off main.
I propose we go ahead with renaming the repo and module.
The problem I'm thinking of is with unpushed local branches. For example, if you're working on a branch that only exists locally, it won't get any of the automatic handling and would need to be updated by hand after the rename.
I've never worked with a setup like that before.
I like the name and the added flexibility with the implementation. My only concern would be how that fits in with #15. If it's a quick fix to make it dataframe compatible, maybe we could tackle two issues with one commit, so to speak. I was also wondering if that should be moved out of the estimators themselves.
Ah, yep, I totally agree with this. At this time, I don't have any unpushed local branches (I stashed the work I had in progress).
Again, we can always introduce it down the line if we find that it becomes necessary.
Good thoughts for both of these. I'm leaning toward a separate PR once we do the renaming. I'll do the renaming now.
Sounds good to me!
Resolved all but one outstanding item.
Currently, our estimators are named to follow method names introduced in R's `yaImpute` package. We likely want to conform to scikit-learn's model of being more explicit about whether estimators are classifiers or regressors. kNN methods fit into a funny space here - most often they are thought of as classifiers, as they use the `k` nearest neighbors in prediction. However, some attributes are continuous and are calculated as (weighted) means from the `k` neighbors' attributes, which is more similar to `KNeighborsRegressor`.
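A minimal sketch of that regressor-like behavior, using scikit-learn's `KNeighborsRegressor` on toy data (illustration only):

```python
# Continuous attributes predicted as (distance-weighted) means of the k nearest
# neighbors' values -- the behavior KNeighborsRegressor already provides.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([10.0, 12.0, 20.0, 22.0])

reg = KNeighborsRegressor(n_neighbors=2, weights="distance")
reg.fit(X, y)
print(reg.predict([[1.4]]))  # weighted mean of the two nearest targets, 12.0 and 20.0
```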
Also need to decide whether it makes sense to stick with `scikit-learn-knn` as a package name, as `knn` is already a meaningful term in `scikit-learn`. We could also decide to "pay homage" to `yaImpute`, which is serving as the inspiration for this package.