Create feature vectorizer class / API. #49
Comments
|
Yes, I was explicitly thinking about sklearn as a model. |
We should also think about how to balance distance metric versus feature representations. |
Tagging this issue. Kyle, is your vision for this a kind of map function, where you can efficiently apply a function that maps each member of a set of snapshots to a vector? Maybe you could write a little more about the intended use cases, so I (and maybe others) can get an idea of what you're thinking. |
I think it's basically just re-imagining metric.prepare_trajectory as a separate class. |
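A minimal sketch of what "prepare_trajectory as a separate class" could look like, in the sklearn spirit mentioned above. The names `Featurizer`, `featurize_frame`, `transform`, and `PairDistanceFeaturizer` are hypothetical and not an existing API; a frame is assumed here to be a plain list of (x, y, z) tuples.

```python
import math


class Featurizer:
    """Hypothetical base class: maps each snapshot (frame) to a feature vector."""

    def featurize_frame(self, frame):
        raise NotImplementedError

    def transform(self, trajectory):
        # Apply the per-frame function to every snapshot, like a map.
        return [self.featurize_frame(frame) for frame in trajectory]


class PairDistanceFeaturizer(Featurizer):
    """Illustrative featurizer: distances between chosen atom pairs."""

    def __init__(self, pairs):
        self.pairs = pairs

    def featurize_frame(self, frame):
        # Assumption: frame is a list of (x, y, z) coordinates, one per atom.
        return [math.dist(frame[i], frame[j]) for i, j in self.pairs]


# Made-up two-frame, two-atom trajectory.
trajectory = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)],
]
features = PairDistanceFeaturizer([(0, 1)]).transform(trajectory)
```

The point of the split is that the vectors come out independent of whichever distance metric is applied to them afterwards.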
As an example of how we might use pandas here, think about the case of calculating dihedral angles. Right now, the output is this:
The way pandas could be useful is by providing a natural way to merge the metadata (rid) and the values (angles). We could also have a way to switch between a string index and a "multiindex", where the multi-index would contain the following: type of calculation (chi torsion). I'm not trying to claim that this is the best way to do this, but it is one way to help streamline this stuff... |
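A rough sketch of the pandas idea, assuming the metadata is a (calculation type, rid) pair per angle column. The level names, the `chi_10` style string labels, and the angle values are all made up for illustration.

```python
import pandas as pd

# Hypothetical metadata: one (calculation type, residue id) pair per column.
labels = [("chi", 10), ("chi", 11), ("chi", 12)]
# Made-up angle values in radians: two frames, three torsions each.
angles = [
    [0.1, 1.2, -2.0],
    [0.2, 1.1, -1.9],
]

# Merge metadata and values: the MultiIndex carries the metadata as column labels.
columns = pd.MultiIndex.from_tuples(labels, names=["calculation", "rid"])
frame = pd.DataFrame(angles, columns=columns)

# Switch to a flat string index such as "chi_10".
flat = frame.copy()
flat.columns = ["{}_{}".format(kind, rid) for kind, rid in frame.columns]
```

With the multi-index, `frame["chi"]` selects every chi torsion at once, while the flat form is easier to write out to a file.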
Then the "job" of the vectorizer would be to do two things:
|
I was just talking to @msultan about this yesterday afternoon. A useful part of the (feature/vector)izer API would be a minimal operator-type logic. For algorithms like ktICA, where you're in some sense throwing the kitchen sink at the problem with very large feature spaces (sure, they might be implicit, but it's still in the spirit), you might want to use a sort of operator logic on featurizers to build up a complex "compound" feature space. i.e.
I'm not sure what operations really make sense. There's adding two featurizers, or multiplying by a scalar. Maybe I'm overthinking this. I'm not really sure what the use cases are except for some kind of very exhaustive enumeration. cc: @schwancr |
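One way the operator logic could be sketched, assuming "adding" two featurizers means concatenating their feature vectors and a scalar multiple rescales every feature. Both interpretations are assumptions, and the `Featurizer` class here is hypothetical.

```python
class Featurizer:
    """Hypothetical featurizer wrapping a function frame -> list of floats."""

    def __init__(self, func):
        self.func = func

    def __call__(self, frame):
        return self.func(frame)

    def __add__(self, other):
        # Assumption: adding two featurizers concatenates their features.
        return Featurizer(lambda frame: self(frame) + other(frame))

    def __rmul__(self, scalar):
        # Scalar multiple: rescale every feature this featurizer produces.
        return Featurizer(lambda frame: [scalar * v for v in self(frame)])

    # Allow both `2 * f` and `f * 2`.
    __mul__ = __rmul__


# Toy featurizers acting on a frame that is just a list of numbers.
first_two = Featurizer(lambda frame: frame[:2])
total = Featurizer(lambda frame: [sum(frame)])

compound = first_two + 0.5 * total
result = compound([1.0, 2.0, 3.0])
```

Nothing is computed until `compound` is called on a frame, so the operators only build up a description of the compound feature space.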
I think Christian should give his thoughts on the desired properties.
|
It sounds like a decent idea to provide some operators for the featurizer objects, but I'm worried it could be confusing, and I bet someone will end up adding the result of two featurizer.call's as opposed to adding two featurizers. For instance, you could add operators for building a Hybrid metric:
Those two methods would be the same, but the first one to me could be confusing. I think I'd prefer to have a hybrid featurizer just like we have a hybrid metric. In fact, even in the |
I agree that some form of addition operation is critical, as we don't want to manually keep track of calculating each feature. |
I'm not as fond of the outer product. This does bring up the related point of how to design the MSMB3 hooks etc. |
The outer product is a lot like using the kernel trick with second degree polynomials. So we could have a Polynomial featurizer if we wanted to. But again I think it's clearer to have the featurizers be initialized by calling |
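A quick numerical check of the kernel-trick analogy, using the homogeneous second-degree polynomial kernel (x·y)². The helper names and the vectors are made up for illustration.

```python
def outer_features(x):
    # Explicit feature map: all pairwise products x_i * x_j.
    return [a * b for a in x for b in x]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]

# Inner product in the explicit outer-product feature space...
explicit = dot(outer_features(x), outer_features(y))
# ...matches the kernel evaluated on the original vectors.
kernel = dot(x, y) ** 2
```

The explicit map grows as the square of the number of input features, which is why the implicit (kernel) form is attractive for large feature spaces.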
OK, I'm fine with creating features from lists, rather than explicitly adding them. |
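For comparison, a sketch of the list-based alternative: one hybrid featurizer constructed from a list of featurizers and optional weights, with no operator overloading. `HybridFeaturizer` and its `weights` argument are hypothetical names.

```python
class HybridFeaturizer:
    """Hypothetical featurizer built from a list of featurizers and weights."""

    def __init__(self, featurizers, weights=None):
        self.featurizers = list(featurizers)
        if weights is None:
            weights = [1.0] * len(self.featurizers)
        self.weights = list(weights)
        if len(self.weights) != len(self.featurizers):
            raise ValueError("need exactly one weight per featurizer")

    def __call__(self, frame):
        # Concatenate the weighted output of each component featurizer.
        features = []
        for weight, featurizer in zip(self.weights, self.featurizers):
            features.extend(weight * v for v in featurizer(frame))
        return features


# Toy components acting on a frame that is just a list of numbers.
hybrid = HybridFeaturizer(
    [lambda frame: frame[:2], lambda frame: [sum(frame)]],
    weights=[1.0, 0.5],
)
result = hybrid([1.0, 2.0, 3.0])
```

Everything is stated once in the constructor, so there is no way to confuse adding featurizers with adding their outputs.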
By the way, are we set on the "featurizer" name? |
I don't think it was quite clear to me last night that this operator stuff is just an alternative interface to the constructors (init methods) of classes like SumFeaturizer and ScalarMultipleFeaturizer, etc. I'm not set on the name featurizer though. -Robert On Tue, Aug 6, 2013 at 10:08 AM, Christian Schwantes
|
The idea is that we want a robust framework for featurizing trajectories. Key abilities: