
float type matrices? #4

Closed
cdeterman opened this issue Apr 13, 2015 · 14 comments

Comments

@cdeterman
Contributor

Given that the focus of this package is to use the minimal amount of memory as efficiently as possible, I believe it should include big.matrix objects of type float. There may be situations in which single precision is sufficient, and a float matrix would need only ~half the space of a double matrix. I know R does not have a single precision data type, but seeing how all the 'heavy lifting' is done in C++, this seems like an approachable thing. However, I would like additional opinions on the following points before I begin writing a bunch of code.

  1. Naturally, do you agree that float type matrices should be a part of this package?
  2. The current structure has the matrix_type representing the byte size of each type. Continuing this with float types would lead to a conflict in various case statements (see the sketch after this list). The only solution I could come up with off the top of my head was to add another field to the big.matrix object that is specific to the data type rather than the byte size, e.g. pMat->matrix_data_type() would return a string (i.e. 'int', 'float', 'double', etc.). This would lead to other code requiring updating, such as the Rcpp Gallery posts, unless a more elegant solution can be conceived.
  3. Approaches would likely involve the use of typeid from <typeinfo>, unless we also want to begin moving towards the C++11 standard, where we could use the newer decltype. That is possibly a moot point here, but it is worth starting to think about C++11.
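
To make point 2 concrete, here is an illustrative sketch (not actual bigmemory code; the helper name is hypothetical) of the kind of dispatch that would break: with matrix_type() doubling as the element size in bytes, int and float would both report 4.

    #include <string>
    #include <bigmemory/BigMatrix.h>

    // Illustrative only: matrix_type() currently encodes the element size in
    // bytes, so a float matrix would report 4, the same value as int, and a
    // duplicate case label would not even compile.
    std::string describe_type(BigMatrix *pMat) {
        switch (pMat->matrix_type()) {
            case 1: return "char";
            case 2: return "short";
            case 4: return "int";     // float would also need case 4 here
            case 8: return "double";
            default: return "unknown";
        }
    }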

Any thoughts are appreciated :)

@cdeterman
Contributor Author

As I am experimenting with this (I think I can avoid requiring the additional field mentioned above), do you have any previous code to check the size of the actual matrix? I have tried a few simple queries that I thought would work, but neither returns the correct size. I want to get this working correctly with double type matrices first.

With a 1000x1000 matrix:

Calling sizeof directly on the matrix the object points to returns 8, but that appears to be the data type size:

    Rcpp::XPtr<BigMatrix> pMat(bigMatAddr);
    return Rcpp::wrap(sizeof(pMat->matrix()));

Trying to use the matrix accessor (returns 40?):

    Rcpp::XPtr<BigMatrix> pMat(bigMatAddr);
    MatrixAccessor<double> accMat(*pMat);
    return Rcpp::wrap(sizeof(accMat));

@kaneplusplus
Owner

Supporting floats would be nice and I support any movement toward C++11. It feels like there is still a lot of code we could eliminate by adding modern C++ features.

I think biganalytics uses stable versions of calculations for things like variance so it seems like almost all of the work would be managing the element types.

Would the linear algebra operations then be handled by armadillo?

WRT sizeof: your sizeof(accMat) of 40 is the size of the accessor object itself, i.e. the 4 index_types plus one pointer type (all 8 bytes each). We could keep track of the size of the memory mapping in a big.matrix when it is allocated. That way a user would not need to resort to finding the backing file and checking its size manually.
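
In the meantime, a rough manual check is possible (a sketch, assuming matrix_type() reports the element size in bytes as discussed above; the exported name is just for illustration): compute the expected footprint from the dimensions rather than from sizeof on the accessor.

    #include <Rcpp.h>
    #include <bigmemory/BigMatrix.h>

    // Sketch: expected allocation in bytes = rows * cols * element size.
    // sizeof(pMat->matrix()) and sizeof(accMat) only measure the pointer and
    // the accessor object, not the mapped data.
    // [[Rcpp::export]]
    double ExpectedBytes(SEXP bigMatAddr) {
        Rcpp::XPtr<BigMatrix> pMat(bigMatAddr);
        return static_cast<double>(pMat->nrow()) *
               static_cast<double>(pMat->ncol()) *
               static_cast<double>(pMat->matrix_type());
    }

For the 1000x1000 double matrix above that works out to 8,000,000 bytes; a float matrix of the same shape should report roughly 4,000,000.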

@cdeterman
Contributor Author

There are float types within armadillo (e.g. fmat, fvec), so it would work very easily. It would be nice to have the size tracked. Do you have a method in mind to accomplish that? Otherwise, do you know how to check the size manually from these objects? I want to confirm that the float type is being properly applied.
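
For illustration, a no-copy wrap of a float big.matrix into an fmat might look like the sketch below (assuming matrix() returns the raw data pointer as it does for the existing types; the exported name and the column-sum operation are just examples):

    #include <RcppArmadillo.h>
    #include <bigmemory/BigMatrix.h>
    // [[Rcpp::depends(RcppArmadillo, BH, bigmemory)]]

    // Sketch: view the mapped float data as an arma::fmat without copying,
    // then use any Armadillo routine on it.
    // [[Rcpp::export]]
    double FirstColSum(SEXP bigMatAddr) {
        Rcpp::XPtr<BigMatrix> pMat(bigMatAddr);
        arma::fmat M(reinterpret_cast<float*>(pMat->matrix()),
                     pMat->nrow(), pMat->ncol(),
                     /*copy_aux_mem=*/ false);
        return arma::accu(M.col(0));
    }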

@kaneplusplus
Owner

It'll need to be added to the Create* functions in BigMatrix.cpp. The calls to truncate and ftruncate calculate the size. One thing to note is that the size may not be the same as the amount of physical space being used. On Linux and Windows, "sparse files" are created by default (the Mac file system doesn't have this capability).
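
Roughly, the idea is just rows * cols * element size (a sketch of the calculation, not the actual Create* code; the function name is hypothetical and this is the POSIX path only):

    #include <cstddef>
    #include <sys/types.h>
    #include <unistd.h>

    // Sketch: grow the backing file to hold the matrix; with 4-byte floats
    // the requested size is about half of the 8-byte double case.
    bool reserve_backing(int fd, long long nrow, long long ncol,
                         std::size_t elem_size) {
        off_t bytes = static_cast<off_t>(nrow) *
                      static_cast<off_t>(ncol) *
                      static_cast<off_t>(elem_size);
        return ftruncate(fd, bytes) == 0;
    }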

I can probably get to this on the weekend unless you want to take a look.

@cdeterman
Contributor Author

It would likely be better if you modify the Create* functions; I am less familiar with the boost methods for shared memory objects. I think I have some working code to implement the float type matrices, but I don't have a validation method. That note regarding the size not being the same as the physical space is a good point. In the simplest scenario I want to at least confirm that the size is smaller for the float type matrices. In theory they should be approximately half the size of their double counterparts, but if they aren't, that note will at least provide some sort of explanation.

@kaneplusplus
Owner

That's fine. I'll send you a note when it's done. It seems like a nice feature and it provides the sanity check you're looking for to validate float types.

@kaneplusplus
Owner

A new member of BigMatrix has been added, called _allocationSize, which keeps track of the total number of bytes allocated to a BigMatrix object. The value of this member can be retrieved using the allocation_size method.
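
Reading it back from C++ might look like this sketch (the return type of allocation_size is assumed here to fit in a double; the exported function name is just for illustration):

    #include <Rcpp.h>
    #include <bigmemory/BigMatrix.h>

    // Sketch: expose the new allocation_size() accessor to R for the sanity
    // check discussed above.
    // [[Rcpp::export]]
    double GetAllocationSize(SEXP bigMatAddr) {
        Rcpp::XPtr<BigMatrix> pMat(bigMatAddr);
        return static_cast<double>(pMat->allocation_size());
    }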

@kaneplusplus
Owner

OK, I need to be passing a pointer-pointer to the Create* functions. I'll fix it now.

@kaneplusplus
Owner

Yeah, that was it. Sorry and thanks for pointing it out. The fixed version is checked in.

@sritchie73

Nice! Any news on when this version will be available on CRAN?

@cdeterman
Contributor Author

@sritchie73 We are currently working on some additional small fixes (see issues #15 and #16). I imagine that once those are resolved and R CMD check passes without error, we will update it on CRAN. For now you can use the dev version from this repo. Did you have any additional thoughts @kaneplusplus?

@kaneplusplus
Owner

Looking at #15, it doesn't seem like there was a consensus. I pointed out that it's nice when the behavior is similar across all platforms, but I don't like Windows, I think development on Windows is miserable, and I think if you want to do any serious computing you should be on Linux. As a result, we've traditionally done the minimum needed to get Windows building, and I don't mind continuing with that policy.

For #16, I vote for \dontrun. The check environment is mysterious, and beyond checking packages it isn't used.

Are there objections or concerns for either?

@cdeterman
Contributor Author

If CRAN doesn't have an issue with the Windows limitations, I have no issue with it moving forward. Windows definitely is a pain for things like this. I think within bigmemory we will need \dontrun at the start of most of the examples, but ultimately that really isn't an issue.

@phaverty
Contributor

phaverty commented Jun 2, 2015

I'm OK with doing \dontrun on the troublesome examples. I'm also not super concerned about losing Windows support. People have asked me about it from time to time, but not so much that it is critical to have support.

Pete

kaneplusplus pushed a commit that referenced this issue Mar 26, 2017