Skip to content

ColumnFamily.batch_insert doesnt handle column-specific validators correctly. #107

Closed
Anvil opened this Issue Jan 5, 2012 · 3 comments

2 participants

@Anvil
Anvil commented Jan 5, 2012

We've encountered some exception while using the ColumnFamily.batch_insert method.

And after some investigation, it appears there's some inconsistency surrounding the col_name parameter of the ColumnFamily._pack_value method.

Reading the columnfamily.py file, i've managed to get figure out that :
1. ColumnValidatorDict keys are actually packed,
2. the ColumnFamily._pack_value method actually doesn't pack the col_name parameter
3. ColumnFamily.get_indexed_slices and ColumnFamily.insert do pack column names before invoking _pack_value.
4. ColumnFamily._make_mutation_list which is called by batch_insert, and which expects a non-packed column name, calls _pack_value with an unpacked column name (twice).

The _make_mutation_list method does raise a TypeError exception if the column validator is not the same same as the default column validator.

Fixing this issue would be to either pack the column name in _make_mutation_list.

 def _make_mutation_list(self, columns, timestamp, ttl):
     _pack_name = self._pack_name
     _pack_value = self._pack_value
     _validate = lambda _: not isinstance(_[1], types.CassandraType)
     if not self.super:
-            return map(lambda (c, v): Mutation(self._make_cosc(_pack_name(c), _pack_value(v, _pack_name(c)), timestamp, ttl)),
+            return map(lambda (c, v): Mutation(self._make_cosc(_pack_name(c), _pack_value(v, c), timestamp, ttl)),
                    columns.iteritems())
     else:
         mut_list = []
         for super_col, subcs in columns.items():
-                subcols = map(lambda (c, v): self._make_column(_pack_name(c), _pack_value(v, _pack_name(c)), timestamp, ttl),
+                subcols = map(lambda (c, v): self._make_column(_pack_name(c), _pack_value(v, c), timestamp, ttl),
                           subcs.iteritems())
             mut_list.append(Mutation(self._make_cosc(_pack_name(super_col, True), subcols)))
         return mut_list

OR to make _pack_value do the column name packing job and never bother about that in the calling methods. (imho, this could make the code a bit simpler).

@thobbs thobbs closed this in 3deb3d9 Jan 5, 2012
@thobbs
pycassa member
thobbs commented Jan 5, 2012

Thanks for digging in to this!

I liked your suggestion of packing the column name in _pack_value better.

@Anvil
Anvil commented Jan 6, 2012

Another possibility would be to use unpacked column names as keys in ColumnValidatorDict. This could make the code a bit faster, since nobody would have to do column name packing anymore. But this could be a bit more complicated to implement.

@thobbs
pycassa member
thobbs commented Jan 6, 2012

Another possibility would be to use unpacked column names as keys in ColumnValidatorDict. This could make the code a bit faster, since nobody would have to do column name packing anymore. But this could be a bit more complicated to implement.

Originally I think I avoided doing this so that the user could easily set column_validators themself, but now that I think about it, you could unpack whatever they supply before storing it internally. You would need to do a bit of extra translation when setting (and maybe reading) the column validators, but of course that's much less common than the normal insert/fetch case. I'll definitely look into that, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.