Skip to content


Subversion checkout URL

You can clone with
Download ZIP
Fetching contributors…

Cannot retrieve contributors at this time

615 lines (473 sloc) 26.4 KB
Changes in Version 1.7.0
This release has a few relatively large changes in it: a new connection
pool stats collector, compatibility with Cassandra 0.7 through 1.1, and a
change in timezone behavior for datetimes.
Before upgrading, take special care to make sure datetimes that you pass to
pycassa (for TimeUUIDType or DateType data) are in UTC, and make sure your code
expects to get UTC datetimes back in return.
Likewise, the SystemManager changes should be backwards compatible, but there
may be minor differences, mostly in create_column_family() and
alter_column_family(). Be sure to test any code that works programmatically
with these.
* Added StatsLogger for tracking ConnectionPool metrics
* Full Cassandra 1.1 compatibility in SystemManager. To support this, all
column family or keyspace attributes that have existed since Cassandra 0.7 may
be used as keyword arguments for create_column_family() and
alter_column_family(). It is up to the user to know which attributes are
available and valid for their version of Cassandra. As part of this change, the
version-specific thrift-generated cassandra modules (pycassa.cassandra.c07,
pycassa.cassandra.c08, and pycassa.cassandra.c10) have been replaced by
pycassa.cassandra. A minor related change is that individual connections now
now longer ask for the node’s API version, and that information is no longer
stored as an attribute of the ConnectionWrapper.
Bug Fixes
* Fix xget() paging for non-string comparators
* Add batch_insert() to ColumnFamilyMap
* Use setattr instead of directly updating the object’s __dict__ in
* ColumnFamilyMap to avoid breaking descriptors
* Fix single-column counter increments with ColumnFamily.insert()
* Include AuthenticationException and AuthorizationException in the pycassa module
* Support counters in xget()
* Sort column families in pycassaShell for display
* Raise TypeError when bad keyword arguments are used when creating a ColumnFamily object
All datetime objects create by pycassa now use UTC as their timezone
rather than the local timezone. Likewise, naive datetime objects that
are passed to pycassa are now assumed to be in UTC time, but tz_info is
respected if set.
Specifically, the types of data that you may need to make adjustments
for when upgrading are TimeUUIDType and DateType (including OldPycassaDateType
and IntermediateDateType).
Changes in Version 1.6.0
This release adds a few minor features and several important bug fixes.
The most important change to take note of if you are using composite
comparators is the change to the default inclusive/exclusive behavior for slice
Other than that, this should be a smooth upgrade from 1.5.x.
* New script for easily building RPM packages
* Add request and parameter information to PoolListener callback
* Add ColumnFamily.xget(), a generator version of get() that automatically
pages over columns in reasonably sized chunks
* Add support for Int32Type, a 4-byte signed integer format
* Add constants for the highest and lowest possible TimeUUID values to
Bug Fixes
* Various 2.4 syntax errors
* Raise AllServersUnavailable if server_list is empty
* Handle custom types inside of composites
* Don’t erase comment when updating column families
* Match Cassandra’s sorting of TimeUUIDType values when the timestamps
* This could result in some columns being erroneously left off of the end
of column slices when datetime objects or timestamps were used for
column_start or column_finish.
* Use gevent’s queue in place of the stdlib version when gevent
monkeypatching has been applied.
* Avoid sub-microsecond loss of precision with TimeUUID timestamps when
using pycassa.util.convert_time_to_uuid()
* Make default slice ends inclusive when using CompositeType comparator
* Previously, the end of the slice was exclusive by default (as was the
start of the slice when column_reversed was True)
Changes in Version 1.5.1
This release only affects those of you using DateType data, which has been
supported since pycassa 1.2.0. If you are using DateType, it is very
important that you read this closely.
DateType data is internally stored as an 8 byte integer timestamp. Since
version 1.2.0 of pycassa, the timestamp stored has counted the number of
microseconds since the unix epoch. The actual format that Cassandra
standardizes on is milliseconds since the epoch.
If you are only using pycassa, you probably won’t have noticed any problems
with this. However, if you try to use cassandra-cli, sstable2json, Hector,
or any other client that supports DateType, DateType data written by pycassa
will appear to be far in the future. Similarly, DateType data written by
other clients will appear to be in the past when loaded by pycassa.
This release changes the default DateType behavior to comply with the
standard, millisecond-based format. If you use DateType, and you upgrade to
this release without making any modifications, you will have problems.
Unfortunately, this is a bit of a tricky situation to resolve, but the
appropriate actions to take are detailed below.
To temporarily continue using the old behavior, a new class has been
created: pycassa.types.OldPycassaDateType. This will read and write DateType
data exactly the same as pycassa 1.2.0 to 1.5.0 did.
If you want to convert your data to the new format, the other new class,
pycassa.types.IntermediateDateType, may be useful. It can read either the
new or old format correctly (unless you have used dates close to 1970 with
the new format) and will write only the new format. The best case for using
this is if you have DateType validated columns that don’t have a secondary
index on them.
To tell pycassa to use OldPycassaDateType or IntermediateDateType, use the
ColumnFamily attributes that control types: column_name_class,
key_validation_class, column_validators, and so on. Here’s an example:
from pycassa.types import OldPycassaDateType, IntermediateDateType
from pycassa.column_family import ColumnFamily
from pycassa.pool import ConnectionPool
pool = ConnectionPool('MyKeyspace', [''])
# Our tweet timeline has a comparator_type of DateType
tweet_timeline_cf = ColumnFamily(pool, 'tweets')
tweet_timeline_cf.column_name_class = OldPycassaDateType()
# Our tweet timeline has a comparator_type of DateType
users_cf = ColumnFamily(pool, 'users')
users_cf.column_validators['join_date'] = IntermediateDateType()
If you’re using DateType for the key_validation_class, column names, column
values with a secondary index on them, or are using the DateType validated
column as a non-indexed part of an index clause with get_indexed_slices()
(eg. “where state = ‘TX’ and join_date > 2012”), you need to be more careful
about the conversion process, and IntermediateDateType probably isn’t a good
In most of cases, if you want to switch to the new date format, a manual
migration script to convert all existing DateType data to the new format
will be needed. In particular, if you convert keys, column names, or indexed
columns on a live data set, be very careful how you go about it. If you need
any assistance or suggestions at all with migrating your data, please feel
free to send an email to; I would be glad to help.
Changes in Version 1.5.0
The main change to be aware of for this release is the new no-retry behavior
for counter operations. If you have been maintaining a separate connection
pool with retries disabled for usage with counters, you may discontinue that
practice after upgrading.
By default, counter operations will not be retried automatically. This
makes it easier to use a single connection pool without worrying about
Bug Fixes
* Don’t remove entire row when an empty list is supplied for the columns
parameter of remove() or the batch remove methods.
* Add python-setuptools to debian build dependencies
* Batch remove() was not removing subcolumns when the specified supercolumn
was 0 or other “falsey” values
* Don’t request an extra row when reading fewer than buffer_size rows with
get_range() or get_indexed_slices().
* Remove pool_type from logs, which showed up as None in recent versions
* Logs were erroneously showing the same server for retries of failed
operations even when the actual server being queried had changed
Changes in Version 1.4.0
This release is primarily a bugfix release with a couple
of minor features and removed deprecated items.
* Accept column_validation_classes when creating or altering
column families with SystemManager
* Ignore UNREACHABLE nodes when waiting for schema version
Bug Fixes
* Remove accidental print statement in SystemManager
* Raise TypeError when unexpected types are used for
comparator or validator types when creating or altering
a Column Family
* Fix packing of column values using column-specific validators
during batch inserts when the column name is changed by packing
* Always return timestamps from inserts
* Fix NameError when timestamps are used where a DateType is
* Fix NameError in python 2.4 when unpacking DateType objects
* Handle reading composites with trailing components
* Upgrade to fix broken setuptools link
Removed Deprecated Items
The following items have been removed:
* pycassa.connect()
* pycassa.connect_thread_local()
* ConnectionPool.status()
* ConnectionPool.recreate()
Changes in Version 1.3.0
This release adds full compatibility with Cassandra 1.0 and removes support
for schema manipulation in Cassandra 0.7.
In this release, schema manipulation should work with Cassandra 0.8 and 1.0,
but not 0.7. The data API should continue to work with all three versions.
Bug Fixes
* Don’t ignore columns parameter in ColumnFamilyMap.insert()
* Handle empty instance fields in ColumnFamilyMap.insert()
* Use the same default for timeout in pycassa.connect() as ConnectionPool
* Fix typo which caused a different exception to be thrown when an
AllServersUnavailable exception was raised
* IPython 0.11 compatibility in pycassaShell
* Correct dependency declaration in
* Add UUIDType to supported types
The filter_empty parameter was added to get_range() with a default of True;
this allows empty rows to be kept if desired
Changes in Version 1.2.1
This is strictly a bug-fix release addressing a few issues created in 1.2.0.
Bug Fixes
* Correctly check for Counters in ColumnFamily when setting
* Pass kwargs in ColumnFamilyMap to ColumnFamily
* Avoid potential UnboundLocal in ConnectionPool.execute() when get() fails
* Fix ez_setup dependency/bundling so that package installations using
easy_install or pip don’t fail without ez_setup installed
Changes in Version 1.2.0
This should be a fairly smooth upgrade from pycassa 1.1. The
primary changes that may introduce minor incompatibilities are
the changes to ColumnFamilyMap and the automatic skipping of
"ghost ranges" in .ColumnFamily.get_range().
* Add ConnectionPool.fill()
* Add FloatType, DoubleType, DateType, and BooleanType support.
* Add CompositeType support for static composites. See "Composite Types"
for more details.
* Add timestamp, ttl to ColumnFamilyMap.insert() params
* Support variable-length integers with IntegerType. This allows more
space-efficient small integers as well as integers that exceed the size
of a long.
* Make ColumnFamilyMap a subclass of ColumnFamily instead of using one
as a component. This allows all of the normal adjustments normally done
to a ColumnFamily to be done to a ColumnFamilyMap instead. See "Class
Mapping with Column Family Map" for examples of using the new version.
* Expose the following ConnectionPool attributes, allowing them to be
altered after creation: max_overflow, pool_timeout, recycle,
max_retries, and logging_name. Previously, these were all supplied as
constructor arguments. Now, the preferred way to set them is to alter
the attributes after creation. (However, they may still be set in the
constructor by using keyword arguments.)
* Automatically skip "ghost ranges" in ColumnFamily.get_range().
Rows without any columns will not be returned by the generator,
and these rows will not count towards the supplied row_count.
Bug Fixes
* Add connections to ConnectionPool more readily when prefill is False.
Before this change, if the ConnectionPool was created with prefill=False,
connections would only be added to the pool when there was concurrent
demand for connections. After this change, if prefill=False and
pool_size=N, the first N operations will each result in a new connection
being added to the pool.
* Close connection and adjust the ConnectionPool‘s connection count after a
TApplicationException. This exception generally indicates programmer error,
so it’s not extremely common.
* Handle typed keys that evaluate to False
* ConnectionPool.recreate()
* ConnectionPool.status()
* Better failure messages for ConnectionPool failures
* More efficient packing and unpacking
* More efficient multi-column inserts in ColumnFamily.insert() and
* Prefer Python 2.7’s collections.OrderedDict over the bundled version when
Changes in Version 1.1.1
* Add max_count and column_reversed params to get_count()
* Add max_count and column_reversed params to multiget_count()
Bug Fixes
* Don’t retry operations after a TApplicationException. This exception is
reserved for programmatic errors (such as a bad API parameters), so
retries are not needed.
* If the read_consistency_level kwarg was used in a ColumnFamily
constructor, it would be ignored, resulting in a default read
consistency level of ONE. This did not affect the read consistency
level if it was specified in any other way, including per-method or by
setting the read_consistency_level attribute.
Changes in Version 1.1.0
This release adds compatibility with Cassandra 0.8, including support for
counters and key_validation_class. This release is backwards-compatible with
Cassandra 0.7, and can support running against a mixed cluster of both
Cassandra 0.7 and 0.8.
Changes related to Cassandra 0.8
* Addition of COUNTER_COLUMN_TYPE to system_manager.
* Several new column family attributes, including key_validation_class,
replicate_on_write, merge_shards_chance, row_cache_provider, and key_alias.
* The new ColumnFamily.add() and ColumnFamily.remove_counter() methods.
* Support for counters in pycassa.batch and ColumnFamily.batch_insert().
* Autopacking of keys based on key_validation_class.
Other Features
* ColumnFamily.multiget() now has a buffer_size parameter
* ColumnFamily.multiget_count() now returns rows in the order that the
keys were passed in, similar to how multiget() behaves. It also uses
the dict_class attribute for the containing class instead of always
using a dict.
* Autpacking behavior is now more transparent and configurable, allowing
the user to get functionality similar to the CLI’s assume command, whereby
items are packed and unpacked as though they were a certain data type,
even if Cassandra does not use a matching comparator type or validation
class. This behavior can be controlled through the following attributes:
- ColumnFamily.column_name_class
- ColumnFamily.super_column_name_class
- ColumnFamily.key_validation_class
- ColumnFamily.default_validation_class
- ColumnFamily.column_validators
* A ColumnFamily may reload its schema to handle changes in validation
classes with ColumnFamily.load_schema().
Bug Fixes
There were several related issues with overlow in ConnectionPool:
* Connection failures when a ConnectionPool was in a state of overflow
would not result in adjustment of the overflow counter, eventually
leading the ConnectionPool to refuse to create new connections.
* Settings of -1 for ConnectionPool.overflow erroneously caused overflow
to be disabled.
* If overflow was enabled in conjunction with prefill being disabled,
the effective overflow limit was raised to max_overflow + pool_size.
* Overflow is now disabled by default in ConnectionPool.
* ColumnFamilyMap now sets the underlying ColumnFamily‘s
autopack_names and autopack_values attributes to False upon construction.
* Documentation and tests will no longer be included in the
packaged tarballs.
Removed Deprecated Items
The following deprecated items have been removed:
* ColumnFamilyMap.get_count()
* The instance parameter from ColumnFamilyMap.get_indexed_slices()
* The Int64 Column type.
* SystemManager.get_keyspace_description()
Athough not technically deprecated, most ColumnFamily constructor
arguments should instead be set by setting the corresponding
attribute on the ColumnFamily after construction. However, all
previous constructor arguments will continue to be supported if
passed as keyword arguments.
Changes in Version 1.0.8
* Pack IndexExpression values in get_indexed_slices() that are supplied
through the IndexClause instead of just the instance parameter.
* Column names and values which use Cassandra’s IntegerType are unpacked
as though they are in a BigInteger-like format. This is (backwards)
compatible with the format that pycassa uses to pack IntegerType data.
This fixes an incompatibility with the format that cassandra-cli and
other clients use to pack IntegerType data.
* Restore Python 2.5 compatibility that was broken through out of order
keyword arguments in ConnectionWrapper.
* Pack column_start and column_finish arguments in ColumnFamily *get*()
methods when the super_column parameter is used.
* Issue a DeprecationWarning when a method, parameter, or class that has
been deprecated is used. Most of these have been deprecated for several
releases, but no warnings were issued until now.
* Deprecations are now split into separate sections for each release in the changelog.
Deprecated in Version 1.0.8
* The instance parameter of ColumnFamilyMap.get_indexed_slices()
Changes in Version 1.0.7
* Catch KeyError in pycassa.columnfamily.ColumnFamily.multiget() empty row
removal. If the same non-existent key was passed multiple times, a
KeyError was raised when trying to remove it from the OrderedDictionary
after the first removal. The KeyError is caught and ignored now.
* Handle connection failures during retries. When a connection fails, it
tries to create a new connection to replace itself. Exceptions during
this process were not properly handled; they are now handled and count
towards the retry count for the current operation.
* Close connection when a MaximumRetryException is raised. Normally a
connection is closed when an operation it is performing fails, but this
was not happening for the final failure that triggers the
Changes in Version 1.0.6
* Add EOFError to the list of exceptions that cause a connection swap and
* Improved autopacking efficiency for AsciiType, UTF8Type, and BytesType
* Preserve sub-second timestamp precision in datetime arguments for
insertion or slice bounds where a TimeUUID is expected. Previously,
precision below a second was lost.
* In a MaximumRetryException‘s message, include details about the last
Exception that caused the MaximumRetryException to be raised
* pycassa.pool.ConnectionPool.status() now always reports a non-negative
overflow; 0 is now used when there is not currently any overflow
* Created pycassa.types.Long as a replacement for pycassa.types.Int64.
Long uses big-endian encoding, which is compatible with Cassandra’s LongType,
while Int64 used little-endian encoding.
Deprecated in Version 1.0.6
* pycassa.types.Int64 has been deprecated in favor of pycassa.types.Long
Changes in Version 1.0.5
* Assume port 9160 if only a hostname is given
* Remove super_column param from pycassa.columnfamily.ColumnFamily.get_indexed_slices()
* Enable failover on functions that previously lacked it
* Increase base backoff time to 0.01 seconds
* Add a timeout parameter to pycassa.system_manager.SystemManger
* Return timestamp on single-column inserts
Changes in Version 1.0.4
* Fixed threadlocal issues that broke multithreading
* Fix bug in pycassa.columnfamily.ColumnFamily.remove() when a super_column
argument is supplied
* Fix minor PoolLogger logging bugs
* Added pycassa.system_manager.SystemManager.describe_partitioner()
* Added pycassa.system_manager.SystemManager.describe_snitch()
* Added pycassa.system_manager.SystemManager.get_keyspace_properties()
* Moved pycassa.system_manager.SystemManager.describe_keyspace() and
pycassa.system_manager.SystemManager.describe_column_family() to
pycassaShell describe_keyspace() and describe_column_family()
Deprecated in Version 1.0.4
* Renamed pycassa.system_manager.SystemManager.get_keyspace_description()
to pycassa.system_manager.SystemManager.get_keyspace_column_families()
and deprecated the previous name
Changes in Version 1.0.3
* Fixed supercolumn slice bug in get()
* pycassaShell now runs scripts with execfile to allow for multiline statements
* 2.4 compatability fixes
Changes in Version 1.0.2
* Failover handles a greater set of potential failures
* pycassaShell now loads/reloads pycassa.columnfamily.ColumnFamily instances when the underlying column family is created or updated
* Added an option to pycassaShell to run a script after startup
* Added pycassa.system_manager.SystemManager.list_keyspaces()
Changes in Version 1.0.1
* Allow pycassaShell to be run without specifying a keyspace
* Added pycassa.system_manager.SystemManager.describe_schema_versions()
Changes in Version 1.0.0
* Created the SystemManager class to allow for keyspace, column family, and
index creation, modification, and deletion. These operations are no longer
provided by a Connection class.
* Updated pycassaShell to use the SystemManager class
* Improved retry behavior, including exponential backoff and proper resetting
of the retry attempt counter
* Condensed connection pooling classes into only pycassa.pool.ConnectionPool
to provide a simpler API
* Changed pycassa.connection.connect() to return a connection pool
* Use more performant Thrift API methods for insert() and get() where possible
* Bundled OrderedDict and set it as the default dictionary class for column families
* Provide better TypeError feedback when columns are the wrong type
* Use Thrift API 19.4.0
Deprecated in Version 1.0.0
* ColumnFamilyMap.get_count() has been deprecated. Use
ColumnFamily.get_count() instead.
Changes in Version 0.5.4
* Allow for more backward and forward compatibility
* Mark a server as being down more quickly in Connection
Changes in Version 0.5.3
* Added PooledColumnFamily, which makes it easy to use connection pooling
automatically with a ColumnFamily.
Changes in Version 0.5.2
* Support for adding/updating/dropping Keyspaces and CFs in pycassa.connection.Connection
* get_range() optimization and more configurable batch size
* batch get_indexed_slices() similar to get_range()
* Reorganized pycassa logging
* More efficient packing of data types
* Fix error condition that results in infinite recursion
* Limit pooling retries to only appropriate exceptions
* Use Thrift API 19.3.0
Changes in Version 0.5.1
* Automatically detect if a column family is a standard column family or a super column family
* multiget_count() support
* Allow preservation of key order in multiget() if an ordered dictionary is used
* Convert timestamps to v1 UUIDs where appropriate
* pycassaShell documentation
* Use Thrift API 17.1.0
Changes in Version 0.5.0
* Connection Pooling support: pycassa.pool
* Started moving logging to pycassa.logger
* Use Thrift API 14.0.0
Changes in Version 0.4.3
* Autopack on CF’s default_validation_class
* Use Thrift API 13.0.0
Changes in Version 0.4.2
* Added batch mutations interface: pycassa.batch
* Made bundled thrift-gen code a subpackage of pycassa
* Don’t attempt to reencode already encoded UTF8 strings
Changes in Version 0.4.1
* Added batch_insert()
* Redifined insert() in terms of batch_insert()
* Fixed UTF8 autopacking
* Convert datetime slice args to uuids when appropriate
* Changed how thrift-gen code is bundled
* Assert that the major version of the thrift API is the same on the client
and on the server
* Use Thrift API 12.0.0
Changes in Version 0.4.0
* Added pycassaShell, a simple interactive shell
* Converted the test config from xml to yaml
* fixed overflow error on get_count()
* Only insert columns which exist in the model object
* Make ColumnFamilyMap not ignore the ColumnFamily’s dict_class
* Specify keyspace as argument to connect()
* Add support for framed transport and default to using it
* Added autopacking for column names and values
* Added support for secondary indexes with get_indexed_slices() and
* Added truncate()
* Use Thrift API 11.0.0
Jump to Line
Something went wrong with that request. Please try again.