Use Cassandra-CLI safe time stamps #16

Closed
redsolar opened this Issue · 8 comments

2 participants

@redsolar

Cassandra CLI uses 13-digit timestamps: the 10-digit Unix epoch in seconds plus 3 decimal places (milliseconds), represented as a single 13-digit integer.

If one is using Cassandra CLI and PHP/Thrift at the same time and makes changes in the CLI, then those fields, err, columns :) will have a 13-digit timestamp, and all subsequent PHP modification requests to them will be ignored, as they will be deemed "older".
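
The effect can be sketched with a toy "newest timestamp wins" rule. This is a simplified illustration, not Cassandra's actual code: applyWrite() is a hypothetical helper, and the timestamps are sample values.

```php
<?php
// Hypothetical helper for illustration only: keeps whichever write carries
// the larger timestamp, which is the comparison that makes PHP writes look "older".
function applyWrite(array $column, $value, $timestamp)
{
    if ($timestamp > $column['timestamp']) {
        return array('value' => $value, 'timestamp' => $timestamp);
    }
    return $column; // incoming write is not newer, so it is dropped
}

// Column last modified through the CLI (13-digit, millisecond timestamp).
$column = array('value' => 'set via CLI', 'timestamp' => 1266635621757);

// A later PHP write stamped with a 10-digit, seconds-based value loses...
$afterSeconds = applyWrite($column, 'set via PHP', 1266635700);

// ...while the same moment expressed in milliseconds wins.
$afterMillis = applyWrite($column, 'set via PHP', 1266635700 * 1000);
```

Any 10-digit value is numerically smaller than any 13-digit one, so the seconds-based write can never win, no matter how much later it is issued.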

A solution would be to standardize on a special 13-digit time function, since PHP doesn't have one built in.

In my case, this is what I used, and ensured that all $this->setTime() requests converge to that method.

/**
 * Returns a Cassandra-safe 13-digit timestamp: a traditional 10-digit Unix
 * timestamp plus 3 decimal places (milliseconds), expressed as a whole number.
 *
 * @param int $timestamp (optional) traditional Unix epoch timestamp of 10 digits or fewer
 * @return int|float whole number of milliseconds (round() returns a float)
 */
public static function time13($timestamp = NULL)
{
    $timestamp = intval($timestamp);
    if(strlen((string)$timestamp) > 10)
    {
        //assume already properly sized, can make strict == 13 if desired
        return $timestamp;
    }
    //round to whole milliseconds; a precision of 3 would keep fractional milliseconds
    return($timestamp ? $timestamp*1000 : round(microtime(true)*1000));
}
@mjpearson
Owner

Thanks a lot, it's a quick fix and a perfect fit with Column::bindTime() - will adjust unit tests and get it committed in the next few days. The 64-bit integer type for timestamp in thrift makes more sense now.

-michael

@mjpearson
Owner

It looks like thrift_protocol.so (thrift_protocol_write_binary) isn't honouring the 64-bit integer type and is sending a signed int32 timestamp.

I am seeing: => (column=column1, value=TEST DATA, timestamp=-379667704) via the CLI. TBinaryProtocol doesn't appear to have this problem. I have pulled the most recent Thrift source and recompiled (there were a few i64 diffs), with similar results.

Are you seeing this behaviour?
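
A negative value like that is what one would expect if only the low 32 bits of a 13-digit timestamp survive and are read back as a signed integer. A minimal sketch of that arithmetic follows; truncateToSignedInt32() is a hypothetical helper, not part of Thrift, and the second input is an assumed value picked only because it reproduces the number reported above.

```php
<?php
// Hypothetical helper: keeps the low 32 bits of a non-negative whole number
// and reinterprets them as a signed 32-bit integer (two's complement).
function truncateToSignedInt32($value)
{
    // Float arithmetic keeps this exact up to 2^53, even on 32-bit PHP.
    $low = fmod((float) $value, 4294967296.0); // value mod 2^32
    if ($low >= 2147483648.0) {                // top bit set => negative
        $low -= 4294967296.0;
    }
    return (int) $low;
}

// Timestamp quoted in this thread as the correct CLI output.
$fromThread = truncateToSignedInt32(1266635621757); // -379730563

// Assumed input (about a minute later) that maps to the reported value.
$reported = truncateToSignedInt32(1266635684616);   // -379667704
```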

@redsolar

No, I am seeing correct timestamps.

Have you checked to see if you are using 64-bit Java runtime? Does CLI store timestamps as 13-digit value?
If you do say
cassandra> set Keyspace1.Standard1["testkey"]["testcolumn"] = "testvalue"
and then
cassandra> get Keyspace1.Standard1["testkey"]
What result do you get?

Check what your timestamp value is immediately before you insert() it, also add a var_dump($args) to send_insert() before $bin_accel in cassandra.php to see what's happening there.

Also check how your php was compiled (php -i | grep 64)
One of the lines should say something similar to
Host => x86_64-redhat-linux-gnu
Although I suspect your PHP is good, since you mentioned that things work with TBinaryProtocol.

Check which modules it's using. On an x86_64 system, the modules path will look something like this
extension_dir => /usr/lib64/php/modules => /usr/lib64/php/modules
(for RedHat EL/CentOS)

I do all my testing in PHP CLI for now, although I doubt it makes any real difference vs using an http daemon of some sort, but it's worth trying that route too in case you are using *httpd to insert/retrieve.

If you are using some redhat-like flavor of linux, I can send you my thrift_protocol.so to try as well, to see if maybe that helps.
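
The same checks can be made from inside PHP. This is only a sketch: describePhpBuild() is a hypothetical helper, and the extension name 'thrift_protocol' is an assumption based on the module file name.

```php
<?php
// Hypothetical helper: gathers the build facts the checks above look for.
function describePhpBuild()
{
    return array(
        // Typically 4 on 32-bit builds and 8 on 64-bit builds.
        'int_size_bytes'  => PHP_INT_SIZE,
        'int_max'         => PHP_INT_MAX,
        'extension_dir'   => ini_get('extension_dir'),
        // Assumed extension name; adjust if the module registers differently.
        'thrift_protocol' => extension_loaded('thrift_protocol'),
    );
}

print_r(describePhpBuild());
```

Running it under both the CLI and the web server SAPI would show whether the two are using the same build and modules.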

@mjpearson
Owner

Hi, from the CLI and when using TBinaryProtocol it's writing correctly,

ie : => (column=column1, value=TEST DATA, timestamp=1266635621757)

...so looks to be a thrift_protocol.so issue (I'm running Ubuntu/JVM etc + package install of php with everything 32-bit).

TBinaryProtocol packs 64-bit microtimes via writeI64, which php_thrift_protocol.cpp doesn't look to be doing - it just casts to i64. I'll keep hacking away at it to find something lower level, until then I'll run time or microtime based on maxint size for php I think.
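
For reference, here is one way to write a millisecond timestamp as an 8-byte big-endian integer without relying on a 64-bit PHP int. It is a sketch of the general idea only, not the code in TBinaryProtocol or php_thrift_protocol.cpp; packMillisAsI64() is a hypothetical name, and it only handles non-negative whole numbers below 2^53.

```php
<?php
// Hypothetical helper: splits the value into four 16-bit words using float
// arithmetic, so no intermediate ever exceeds a 32-bit signed int.
function packMillisAsI64($millis)
{
    $value = (float) $millis; // floats hold whole numbers exactly up to 2^53
    $words = array();
    for ($i = 0; $i < 4; $i++) {
        // Take the lowest 16 bits, then shift the rest down by 16 bits.
        array_unshift($words, (int) fmod($value, 65536.0));
        $value = floor($value / 65536.0);
    }
    // 'n' is an unsigned 16-bit big-endian word; four of them make 8 bytes.
    return pack('n4', $words[0], $words[1], $words[2], $words[3]);
}

echo bin2hex(packMillisAsI64(1266635621757)), "\n"; // 00000126e95dc57d
```

The low four bytes, e95dc57d, have the top bit set, which is why a writer that keeps only 32 bits ends up sending a negative number.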

@redsolar

Ah, that might explain it. I am guessing thrift_protocol.so has to be 64-bit to properly write it. I bet if we looked at packed data transmitted by thrift_protocol we'd see that that's where we lose precision.

When you run ./configure on thrift_protocol, do you see
checking for int64_t... yes
checking for uint64_t... yes
?

I am pretty sure that if those two are "no", then int64 binary writes won't work.

So I guess what should be done is a check for 32/64 bit architecture, and 32-bit systems will have to be forced to use TBinaryProtocol instead of TBinaryProtocolAccelerated. What a shame, really, but I suppose the int64 issues aren't that new to PHP on 32-bit platforms :(
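
A minimal sketch of such a check, assuming that PHP_INT_SIZE is a good enough proxy for the architecture and that the accelerated module exposes thrift_protocol_write_binary as mentioned earlier. chooseProtocolClass() is a hypothetical helper; its parameters exist only so both branches can be exercised on one machine.

```php
<?php
// Hypothetical helper: returns the name of the protocol class to instantiate.
function chooseProtocolClass($intSizeBytes = PHP_INT_SIZE, $hasAccelerator = null)
{
    if ($hasAccelerator === null) {
        // The accelerated path needs the compiled extension to be loaded.
        $hasAccelerator = function_exists('thrift_protocol_write_binary');
    }
    // Only trust the accelerated writer where a PHP int is 64 bits wide.
    if ($intSizeBytes >= 8 && $hasAccelerator) {
        return 'TBinaryProtocolAccelerated';
    }
    return 'TBinaryProtocol';
}

$class = chooseProtocolClass(); // decided for the current machine
```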

@redsolar

I'll see if I can set up a 32-bit CentOS box here and test whether this is isolated to specific packages, since we compile our own PHP for performance reasons.

@mjpearson
Owner

Yeah, building the .so reveals some precision warnings caused by my box's 32-bit-ness. The original I64 cast in the module looks to be among the most recent patches, but it hasn't fixed the issue, since casting a 32-bit int to 64 bits preserves its sign. I wouldn't think it's high on the priority list for the Thrift crew, so I will look at patching it myself, as the value can likely just be type-checked and normalised via the Zend API.

@redsolar

Indeed, confirmed it on CentOS 5.4 x86 as well. I am seeing the same behavior.

I'll see if the Thrift folks are open to committing a patch, but I agree, it probably won't be high on the priority list. Or maybe I can poke around the Thrift extension code and see if I can make a packing patch there for 32-bit systems.

Still, I would recommend branching on the architecture in PHP and using the .so on 64-bit systems, provided the other requirements are met. The difference in performance over thousands of consecutive writes is very sizeable, and 64-bit hardware is ubiquitous these days. On 32-bit systems, I would simply use 10-digit timestamps, warn that the results are not CLI-safe, and still push data through the .so.
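
That recommendation could look roughly like the sketch below. pickTimestamp() is a hypothetical helper, the 'cli_safe' flag is an illustrative way to let the caller decide how to warn, and PHP_INT_SIZE is assumed to reflect the architecture.

```php
<?php
// Hypothetical helper: 13-digit milliseconds where a PHP int is 64 bits wide,
// otherwise a 10-digit seconds value flagged as not CLI-safe.
function pickTimestamp($now, $intSizeBytes = PHP_INT_SIZE)
{
    if ($intSizeBytes >= 8) {
        return array('timestamp' => (int) round($now * 1000), 'cli_safe' => true);
    }
    // A 32-bit int cannot hold 13 digits, so fall back to whole seconds.
    return array('timestamp' => (int) floor($now), 'cli_safe' => false);
}

$stamp = pickTimestamp(microtime(true));
if (!$stamp['cli_safe']) {
    // Columns last written from the CLI will reject writes stamped this way.
    trigger_error('Using 10-digit timestamps; results are not CLI-safe', E_USER_NOTICE);
}
```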

Funny that that's how the original Thrift Cassandra.php handles it anyway. I only discovered it by accident after switching between the CLI and PHP a few times and not being able to write to CLI-changed columns. It took me a while to notice the extra 3 digits in the timestamp :)

This issue was closed.