Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft for subset access of features #34

Merged
merged 7 commits into from
Apr 14, 2011
Merged

Draft for subset access of features #34

merged 7 commits into from
Apr 14, 2011

Conversation

karlnapf
Copy link
Member

Attached my first draft for subset access for features, which is needed for cross-validation, which is my proposed gsoc project.
Features stay in memory as before, but it is possible to tell the class to use a subset for all feature access methods.
If this is ok for you, I will implement the same thing for all feature types.
Feedback and suggestions are welcome!
Heiko

@@ -48,6 +48,9 @@ CFeatures::CFeatures(CFile* loader)

CFeatures::~CFeatures()
{
if (subset_inds)
delete subset_inds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be delete[]

@sonney2k
Copy link
Member

I am actually very happy about this initial draft. I have tried to give you as much feedback as possible. Hopefully it will simplify the code a bit and I am all for doing this for the other feature types (when that one is finished and tested - I would suggest to write a python_modular function to test it, where you set a subset_idx that is exactly 0...(num_vec-1) - i.e. identical and then run some of the svm examples).

I think this will enable us to only have to specify the data set once, setting different indices for training/validation/testing etc. Should be one of the most difficult parts already. The other difficult part is how to register the model parameters. When these two are resolved you are on track.

-some changes to names
-some errors
-some design changes
@karlnapf
Copy link
Member Author

Hi there,
thanks for the feedback.
I changed all the discussed points.
Hope, it is ok now
Heiko

*/
virtual int32_t get_num_vectors()=0 ;
//virtual int32_t get_num_vectors_all()=0;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to have

virtual int32_t get_num_vectors_all()
{

return get_num_vectors();

}

here - don't you agree?

…modular

appending from another string features instances now only appends a subset, if one is defined
* @param m_subset_idx index matrix
* @param m_subset_len number of subset indices
*/
virtual void set_feature_subset(int32_t* subset_idx, int32_t subset_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh sigh. same stupid issue with swig interfaces. we need to copy the data here. or have two functions copy_feature_subset and set_feature_subset. Don't change it just now - I seriously consider fixing up the swig interfaces.

@@ -533,7 +569,7 @@ template <class ST> class CStringFeatures : public CFeatures
*/
virtual inline int32_t get_max_vector_length()
{
return max_string_length;
return m_subset_idx ? max_string_length_total : max_string_length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should always return max_string_length here and only adjust max_string_length to have to correct value.

small changed to std getter
method subset_idx_conversion() is now not pure virtual anymore
added comments
copy_features() now works woth subsets
code cleanups
@karlnapf
Copy link
Member Author

I left the setter for the subset matrix as it is for the moment.
The rest works, I already tested it with libshogun
To wirte a python example, first the setter has to work with swig right?
Shouldnt I change it for now or should I wait for the update of the swig-stuff?

@sonney2k sonney2k merged commit 601b8ab into shogun-toolbox:master Apr 14, 2011
spothound pushed a commit to spothound/shogun that referenced this pull request Feb 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants