Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft for subset access of features #34

Merged
merged 7 commits into from Apr 14, 2011
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
39 changes: 39 additions & 0 deletions src/libshogun/features/Features.cpp
Expand Up @@ -48,6 +48,9 @@ CFeatures::CFeatures(CFile* loader)

CFeatures::~CFeatures()
{
if (m_subset_idx)
delete[] m_subset_idx;

clean_preprocs();
}

Expand All @@ -66,11 +69,16 @@ CFeatures::init(void)
&num_preproc, "preprocessed",
"Feature[i] is already preprocessed.");

m_parameters->add_vector(&m_subset_idx, &m_subset_len, "subset_idx", "Subset indices.");


properties = FP_NONE;
cache_size = 0;
preproc = NULL;
num_preproc = 0;
preprocessed = NULL;
m_subset_idx = NULL;
m_subset_len = 0;
}

/// set preprocessor
Expand Down Expand Up @@ -275,3 +283,34 @@ bool CFeatures::check_feature_compatibility(CFeatures* f)
(this->get_feature_type() == f->get_feature_type()));
return result;
}

void CFeatures::set_feature_subset(int32_t* subset_idx, int32_t subset_len)
{
if (m_subset_idx)
delete[] m_subset_idx;

m_subset_idx=subset_idx;
m_subset_len=subset_len;
}

void CFeatures::remove_feature_subset()
{
if (!m_subset_idx)
{
delete[] m_subset_idx;
m_subset_idx=NULL;
m_subset_len=0;
}
}

void CFeatures::get_feature_subset(int32_t** subset_idx, int32_t* subset_len)
{
ASSERT(m_subset_idx);
int64_t length = sizeof(int32_t)*m_subset_len;

*subset_len=m_subset_len;
*subset_idx=(int32_t*)malloc(length);
if (!*subset_idx)
SG_ERROR("Allocating %ld bytes failes\n", length);
memcpy(*subset_idx, m_subset_idx, length);
}
66 changes: 63 additions & 3 deletions src/libshogun/features/Features.h
Expand Up @@ -49,6 +49,9 @@ namespace shogun
* (dense matrices), CSparseFeatures (sparse matrices), CStringFeatures (a
* set of strings) from which all the specific features like CSimpleFeatures<float64_t>
* (dense real valued feature matrices) are derived.
*
* It is possible to set/get/reset subset matrices.
* These may be used by subclasses for access to subsets of features.
*/
class CFeatures : public CSGObject
{
Expand Down Expand Up @@ -149,13 +152,21 @@ class CFeatures : public CSGObject
*/
inline int32_t get_cache_size() { return cache_size; };

/** get number of examples/vectors
/** get number of examples/vectors, possibly corresponding to the current subset
*
* abstract base method
*
* @return number of examples/vectors
* @return number of examples/vectors (possibly of subset, if implemented)
*/
virtual int32_t get_num_vectors()=0 ;
virtual int32_t get_num_vectors()=0;

/** get number of ALL examples/vectors (no subset here)
*
* abstract base method
*
* @ return number of examples/vectors (ALL, nosubset)
*/
//virtual int32_t get_num_vectors_all()=0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right we indeed need a function like this


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to have

virtual int32_t get_num_vectors_all()
{

return get_num_vectors();

}

here - don't you agree?

/** in case there is a feature matrix allow for reshaping
*
Expand Down Expand Up @@ -232,6 +243,45 @@ class CFeatures : public CSGObject
properties &= (properties | p) ^ p;
}

/** removes (and deletes) the current subset indices matrix */
virtual void remove_feature_subset();

/** getter for the subset indices
*
* @param m_subset_idx (copy of) subset indices matrix (returned)
* @param m_subset_len (copy of) number ofsubset indices (returned)
*/
void get_feature_subset(int32_t** subset_idx, int32_t* subset_len);

/** getter for the subset indices
*
* @param m_subset_len reference to number of subset indices (returned)
* @return subset indices array
*/
int32_t* get_feature_subset(int32_t& subset_len)
{
subset_len=m_subset_len;
return m_subset_idx;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you cannot know this - and I guess we will change this at at some later point. but for now: The get_feature_subset() function above would be callable by swig wrapped interfaces (one line %apply line in modular/Features.i is required for that). But unfortunately these interfaces require an allocated pointer and a copy of the data. Not nice I know but this is the way it is for now. So please introduce two such get_feature_subset functions - one like you have now but using references to set the data sizes that returns the subset index and one that allocates memory for the index (malloc this time!) / so a copy.

Look at features/SimpleFeatures.h line 361 and below - it will be clear from that.


/** sets the subset indices matrix which is afterwards used for feature access
*
* @param m_subset_idx index matrix
* @param m_subset_len number of subset indices
*/
virtual void set_feature_subset(int32_t* subset_idx, int32_t subset_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh sigh. same stupid issue with swig interfaces. we need to copy the data here. or have two functions copy_feature_subset and set_feature_subset. Don't change it just now - I seriously consider fixing up the swig interfaces.


protected:
/** returns the corresponding real index (in array) of a subset index
* (if there is a subset)
*
* should/has to be be overwritten when subset support is
* implemented in a subclass. for now, it just returns identity
*
* @ return array index of the provided subset index
*/
inline int32_t subset_idx_conversion(int32_t idx) { return idx; }

private:
/** feature properties */
uint64_t properties;
Expand All @@ -247,6 +297,16 @@ class CFeatures : public CSGObject

/// i'th entry is true if features were already preprocessed with preproc i
bool* preprocessed;

protected:
/** (possibly) contains a matrix of indices that form a subset of the stored
* features. If set, all vector access methods work on the specified subset.
* If it is NULL, it is ignored.
*/
int32_t* m_subset_idx;

/** length of the subset */
int32_t m_subset_len;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add one line of documentation for both varialbes.

};
}
#endif