Clickhouse backend support django builtin fields and clickhouse specific fields.
Note: You should always use clickhouse specific fields in new projects. Support for django built-in fields is only for compatibility with existing third-party apps.
Note: ForeignKey, ManyToManyField or even OneToOneField could be used with clickhouse backend. But no database level constraints will be added, so there could be some consistency problems.
The following django fields are supported:
Class | DB Type | Pythonic Type | Comments |
---|---|---|---|
django.db.models.SmallAutoField | Int64 | int | clickhouse.idworker.snowflake.SnowflakeIDWorker will generate value automatically. |
django.db.models.AutoField | Int64 | int | clickhouse.idworker.snowflake.SnowflakeIDWorker will generate value automatically. |
django.db.models.BigAutoField | Int64 | int | clickhouse.idworker.snowflake.SnowflakeIDWorker will generate value automatically. |
django.db.models.CharField | FixedString | str | Encoded as UTF-8 when written to ClickHouse |
django.db.models.TextField | String | str | Encoded as UTF-8 when written to ClickHouse |
django.db.models.BinaryField | String | bytes or str | |
django.db.models.SlugField | String | ||
django.db.models.FileField | String | ||
django.db.models.FilePathField | String | ||
django.db.models.DateField | Date32 | datetime.date | Range 1900-01-01 to 2299-12-31; Date exceed this range will be stored as min value or max value. |
django.db.models.DateTimeField | DateTime64 | datetime.datetime | Range 1900-01-01 00:00:00, 2299-12-31 23:59:59.999999; Timezone aware; Datetime exceed this range will be stored as min value or max value. |
django.db.models.SmallIntegerField | Int16 | int | Range -32768 to 32767 |
django.db.models.IntegerField | Int32 | int | Range -2147483648 to 2147483647 |
django.db.models.BigIntegerField | Int64 | int | Range -9223372036854775808 to 9223372036854775807 |
django.db.models.SmallIntegerField | UInt16 | int | Range 0 to 32767 |
django.db.models.IntegerField | UInt32 | int | Range 0 to 2147483647 |
django.db.models.BigIntegerField | UInt64 | int | Range 0 to 9223372036854775807 |
django.db.models.FloatField | Float64 | float | |
django.db.models.DecimalField | Decimal | Decimal | Pythonic values are rounded to fit the scale of the database field |
django.db.models.UUIDField | UUID | uuid.UUID | |
django.db.models.IPAddressField | IPv4 | ipaddress.IPv4Address | |
django.db.models.BooleanField | Bool | ||
NullableField | Nullable | When null=True |
If a clickhouse data type supports LowCardinality, there will be a low_cardinality
parameter in the corresponding model field.
If a clickhouse data type does not support Nullable, an error will occur when performing django checks if you set null=True
in model field.
Passing null=True
will make a field Nullable.
Passing low_cardinality=True
will make a field LowCardinality.
Name of field class is always concat clickhouse data type name to Field
.
For example, DateTime64 field is named as DateTime64Field
.
All clickhouse fields are imported from clickhouse_backend.models
Supported data types are:
- Float32/Float64
- Int8/Int16/Int32/Int64/Int128/Int256
- UInt8/UInt16/UInt32/UInt64/UInt128/UInt256
- Date/Date32
- DateTime/DateTime64
- String/FixedString(N)
- Enum/Enum8/Enum16
- Array(T)
- Bool
- UUID
- Decimal
- IPv4/IPv6
- LowCardinality(T)
- Tuple(T1, T2, ...)
- Map(key, value)
- JSON
Fields importing path: clickhouse_backend.models.[U]Int(8|16|32|64|128|256)Field
For example, UInt8 type is imported from clickhouse_backend.models.UInt8Field
Both Nullable and LowCardinality are supported for all integer types.
All UInt types will have correct range validators.
For example, clickhouse_backend.models.UInt16Field
have a range from 0 to 65535.
As a contrast, django.db.models.PositiveSmallIntegerField
have a range from 0 to 32767, causing half range waisted.
Fields importing path: clickhouse_backend.models.Float(32|64)Field
For example, Float32 type is imported from clickhouse_backend.models.Float32Field
Both Nullable and LowCardinality are supported for Float32Field and Float64Field.
Field importing path: clickhouse_backend.models.DecimalField
Nullable is supported but LowCardinality is not supported.
Field importing path: clickhouse_backend.models.BoolField
Both Nullable and LowCardinality are supported.
Field importing path: clickhouse_backend.models.StringField
Both Nullable and LowCardinality are supported.
Clickhouse String type is more like bytes type in python. StringField support bytes or str type when assign value. Python string will be UTF-8 encoded when stored in clickhouse.
Note: max_length
will have odd behavior when you provide bytes type.
For example, if max_length=4
, '世界'
is a valid value,
but the corresponding encoded bytes b'\xe4\xb8\x96\xe7\x95\x8c'
is an invalid value because it's length is 6.
Field importing path: clickhouse_backend.models.FixedStringField
A max_bytes
parameter is required as the bytes length of FixedString.
Both Nullable and LowCardinality are supported.
Clickhouse FixedString type is more like bytes type in python. FixedStringField support bytes or str type when assign value. Python string will be UTF-8 encoded when stored in clickhouse.
Note: max_length
will have odd behavior when you provide bytes type.
For example, if max_length=4
, '世界'
is a valid value,
but the corresponding encoded bytes b'\xe4\xb8\x96\xe7\x95\x8c'
is an invalid value because it's length is 6.
Field importing path: clickhouse_backend.models.UUIDField
Both Nullable and LowCardinality are supported.
But creating columns of type LowCardinality(Nullable(UUID)) is prohibited by default due to expected negative impact on performance. It can be enabled with the "allow_suspicious_low_cardinality_types" setting.
DATABASES = {
'default': {
'ENGINE': 'clickhouse_backend.backend',
'OPTIONS': {
'settings': {
'allow_suspicious_low_cardinality_types': 1,
}
}
}
}
Fields importing path: clickhouse_backend.models.Date[32]Field
Dates query is supported by DateField and Date32Field.
All date lookup are supported by DateField and Date32Field.
Both Nullable and LowCardinality are supported.
But creating columns of type LowCardinality(DateTime) or LowCardinality(DateTime32) is prohibited by default due to expected negative impact on performance. It can be enabled with the "allow_suspicious_low_cardinality_types" setting.
DATABASES = {
'default': {
'ENGINE': 'clickhouse_backend.backend',
'OPTIONS': {
'settings': {
'allow_suspicious_low_cardinality_types': 1,
}
}
}
}
Clickhouse support integer or float when assign a Date or Date32 column. But supporting integer or float may cause strange behavior, as integer and float are treated as unix timestamps, clickhouse store them as date in the server timezone, which may not be what you want. So this feature is not implemented currently.
Fields importing path: clickhouse_backend.models.DateTime[64]Field
DateTime64Field have a precision
parameter which default to 6.
Dates query and datetimes query are supported by DateTimeField and DateTime64Field.
All date lookup are supported by DateTimeField and DateTime64Field.
Both Nullable and LowCardinality are supported by DateTime. But LowCardinality is not supported by DateTime64.
Clickhouse support integer or float when assign a DateTime or DateTime64 column. This feature is also implemented by clickhouse backend.
Float value is treated as unix timestamp in DateTime and DateTime64.
Int value is treated as unix timestamp in DateTime.
But meaning of int value depending on the precision of DateTime64.
For example, DateTime64(3)
treats int vale as milliseconds from unix epoch,
DateTime64(6)
treats int vale as microseconds from unix epoch.
Fields importing path: clickhouse_backend.models.Enum[8|16]Field
Nullable is supported but LowCardinality is not supported.
The choices
parameter is required when define an Enum field,
and choices
must be an iterable containing only (int, str) tuples.
and int in choices
must not exceed range.
EnumField and Enum16Field range from -32768 to 32767, Enum8Field range from -128 to 127.
Use return_int
(default True
) to control whether to get an int or str value when querying from the database.
If an invalid choices is provided, error is raised when execute django checks.
When assign value to an EnumField, both choice value and choice label are supported.
When query from database, str value is returned.
Usage example:
from clickhouse_backend import models
from django.db.models import IntegerChoices
class EnumTest(models.ClickhouseModel):
class Fruit(IntegerChoices):
banana = 1, 'banana'
pear = 2, 'pear'
apple = 3, 'apple'
enum = models.EnumField(choices=Fruit.choices, return_int=False)
def __str__(self):
return str(self.enum)
EnumTest.objects.bulk_create([
EnumTest(enum=1),
EnumTest(enum='pear'),
EnumTest(enum=b'apple')
])
# [<EnumTest: 1>, <EnumTest: pear>, <EnumTest: b'apple'>]
EnumTest.objects.filter(enum=1)
# <QuerySet [<EnumTest: banana>]>
EnumTest.objects.filter(enum='pear')
# <QuerySet [<EnumTest: pear>]>
EnumTest.objects.filter(enum=b'apple')
# <QuerySet [<EnumTest: apple>]>
EnumTest.objects.filter(enum__gt=1)
# <QuerySet [<EnumTest: pear>, <EnumTest: apple>]>
EnumTest.objects.filter(enum__gte='pear')
# <QuerySet [<EnumTest: pear>, <EnumTest: apple>]>
EnumTest.objects.filter(enum__contains='ana')
# <QuerySet [<EnumTest: banana>]>
Fields importing path: clickhouse_backend.models.IPv(4|6)Field
Both Nullable and LowCardinality are supported.
IPv4Field supported ipaddress.IPv4Address
or str when assign value.
IPv6Field supported ipaddress.IPv4Address
, ipaddress.IPv6Address
or str when assign value.
When query from the database, str is returned.
The GenericIPAddressField exists to provide same behavior as django builtin GenericIPAddressField.
In the database, GenericIPAddressField is stored as IPv6. So it behave mostly like IPv6Field
except that GenericIPAddressField will try to return ipv4 mapped address when unpack_ipv4=True
.
Usage Example:
from clickhouse_backend import models
class IPTest(models.ClickhouseModel):
ipv4 = models.IPv4Field()
ipv6 = models.IPv6Field()
ip = models.GenericIPAddressField(unpack_ipv4=True)
def __str__(self):
return str(self.ip)
IPTest.objects.create([
IPTest(ipv4='1.2.3.4',ipv6='1.2.3.4',ip='1.2.3.4'),
IPTest(ipv4='2.3.4.5',ipv6='2.3.4.5',ip='2.3.4.5'),
])
# [<IPTest: 1.2.3.4>, <IPTest: 2.3.4.5>]
IPTest.objects.filter(ipv4='1.2.3.4')
# <QuerySet [<IPTest: 1.2.3.4>]>
IPTest.objects.filter(ipv6='1.2.3.4')
# <QuerySet [<IPTest: 1.2.3.4>]>
IPTest.objects.filter(ip='1.2.3.4')
# <QuerySet [<IPTest: 1.2.3.4>]>
IPTest.objects.filter(ipv6__gt='1.2.3.4')
# <QuerySet [<IPTest: 2.3.4.5>]>
ip = IPTest.objects.get(ipv6__contains='4.5')
ip.ipv4, ip.ipv6, ip.ip
# ('2.3.4.5', '::ffff:203:405', '2.3.4.5')
Field importing path: clickhouse_backend.models.ArrayField
Neither Nullable nor LowCardinality is supported.
A position base_field
parameter is required when using ArrayField. The base field can be any model field instance
including ArrayField, TupleField and MapField.
An optional size
parameter can be provided to limit length of array value.
from clickhouse_backend import models
class NestedArrayModel(models.ClickhouseModel):
array = models.ArrayField(
base_field=models.ArrayField(
base_field=models.ArrayField(
models.UInt32Field()
)
)
)
NestedArrayModel.objects.create(
array=[
[[12, 13, 0, 1], [12]],
[[12, 13, 0, 1], [12], [13, 14]]
]
)
Contains checks whether one array is a subset of another.
NestedArrayModel.objects.filter(array__contains=[[[12, 13, 0, 1], [12]]]).exists()
# True
Contained by is the reverse lookup of contains
NestedArrayModel.objects.filter(array__contained_by=[
[[12, 13, 0, 1], [12]],
[[12, 13, 0, 1], [12], [13, 14]],
[[1]]
]).exists()
# True
Exact lookup is also supported.
NestedArrayModel.objects.filter(
array=[
[[12, 13, 0, 1], [12]],
[[12, 13, 0, 1], [12], [13, 14]]
]
).exists()
# True
Overlap checks whether two arrays have intersection by some elements.
NestedArrayModel.objects.filter(array__overlap=[
[[12, 13, 0, 1], [12]],
[[1]]
]).exists()
# True
Any checks whether one array has the specific element.
NestedArrayModel.objects.filter(array__any=[[12, 13, 0, 1], [12]]).exists()
# True
Returns the number of items in the array.
NestedArrayModel.objects.filter(array__len=2).exists()
# True
Returns element by index.
NestedArrayModel.objects.filter(array__1=[[12, 13, 0, 1], [12], [13, 14]]).exists()
# True
NestedArrayModel.objects.filter(array__1__2=[13, 14]).exists()
# True
NestedArrayModel.objects.filter(array__1__2__0=13).exists()
# True
Returns elements by range.
NestedArrayModel.objects.filter(array__1_2=[[[12, 13, 0, 1], [12], [13, 14]]]).exists()
# True
NestedArrayModel.objects.filter(array__1__0__0_2=[12, 13]).exists()
# True
Field importing path: clickhouse_backend.models.TupleField
Neither Nullable nor LowCardinality is supported.
A position base_fields
parameter is required when using TupleField.
base_fields
must be an iterable containing only(not both) field instances or (field name, field instance) tuples, and field name must be valid python identifier.
The base field can be any model field instance including ArrayField, TupleField and MapField.
When query from the database, TupleFile get tuple or named tuple.
Usage example:
from clickhouse_backend import models
class TupleModel(models.ClickhouseModel):
tuple = models.TupleField(
base_fields=[
models.Int8Field(),
models.StringField(),
models.GenericIPAddressField(unpack_ipv4=True)
]
)
class NamedTupleModel(models.ClickhouseModel):
tuple = models.TupleField(base_fields=[
("int", models.Int8Field()),
("str", models.StringField()),
("ip", models.GenericIPAddressField(unpack_ipv4=True))
])
v = [100, "test", "::ffff:3.4.5.6"]
TupleModel.objects.create(tuple=v)
NamedTupleModel.objects.create(tuple=v)
TupleModel.objects.get().tuple
# (100, 'test', '3.4.5.6')
NamedTupleModel.objects.get().tuple
# Tuple(int=100, str='test', ip='3.4.5.6')
# index lookup
TupleModel.objects.filter(tuple__1="test").exists()
# True
NamedTupleModel.objects.filter(tuple__str="test").exists()
# True
NamedTupleModel.objects.filter(tuple__ip__startswith="3.4").exists()
# True
Field importing path: clickhouse_backend.models.MapField
Neither Nullable nor LowCardinality is supported.
Position key_fields
and value_field
parameter are required when using MapField.
The value field can be any model field instance including ArrayField, TupleField and MapField.
Valid key fields are:
- Int8Field
- Int16Field
- Int32Field
- Int64Field
- Int128Field
- Int256Field
- UInt8Field
- UInt16Field
- UInt32Field
- UInt64Field
- UInt128Field
- UInt256Field
- BooleanField
- StringField
- FixedStringField
- UUIDField
- DateField
- Date32Field
- DateTimeField
- DateTime64Field
- EnumField
- Enum8Field
- Enum16Field
- IPv4Field
- IPv6Field
- GenericIPAddressField
When query from the database, MapField get dict.
from clickhouse_backend import models
class MapModel(models.ClickhouseModel):
map = models.MapField(
models.StringField(low_cardinality=True),
models.GenericIPAddressField(unpack_ipv4=True)
)
MapModel.objects.create(
map={
"baidu": "39.156.66.10",
"bing.com": "13.107.21.200",
"google.com": "172.217.163.46"
}
)
Test whether a map value contains the key.
MapModel.objects.filter(map__has_key="baidu").exists()
# True
Return elements number in the map.
MapModel.objects.values('map__len')
# <QuerySet [{'map__len': 3}]>
Return map keys.
MapModel.objects.values('map__keys')
# <QuerySet [{'map__keys': ['baidu', 'bing.com', 'google.com']}]>
Return map values.
MapModel.objects.values('map__values')
# <QuerySet [{'map__values': ['39.156.66.10', '13.107.21.200', '172.217.163.46']}]>
Get the value of specific key.
MapModel.objects.values('map__baidu')
# <QuerySet [{'map__baidu': '39.156.66.10'}]>
# If the map key is conflict with other lookup names
# or if the map key contains spaces or punctuations.
# An explict transform can be used.
from clickhouse_backend import models
from clickhouse_backend.models.fields.map import KeyTransform
MapModel.objects.annotate(
value=KeyTransform("len", models.StringField(), models.GenericIPAddressField(), "map")
).values('value')
# here return a default empty value of IPv6
# <QuerySet [{'value': '::'}]>
MapModel.objects.annotate(
value=KeyTransform("bing.com", models.StringField(), models.GenericIPAddressField(), "map")
).values('value')
# <QuerySet [{'value': '::ffff:d6b:15c8'}]>
Field importing path: clickhouse_backend.models.JSONField
.
Neither Nullable nor LowCardinality is supported.
When query from the database, JSONField get dict or list.
The JSON data type is an experimental feature. To use it, set allow_experimental_object_type = 1
in the database settings.
For example:
DATABASES = {
'default': {
'ENGINE': 'clickhouse_backend.backend',
'OPTIONS': {
'settings': {
'allow_experimental_object_type': 1,
}
}
}
}
Currently only key lookup is supported.
from clickhouse_backend import models
class JSONModel(models.ClickhouseModel):
json = models.JSONField()
v = {'a': [1, 2, 3], 'b': [{'c': 1}, {'d': 2}], 'c': {'d': 'e'}}
instance = JSONModel.objects.create(json=v)
instance.refresh_from_db()
instance.json
# {'a': [1, 2, 3], 'b': [{'c': 1, 'd': 0}, {'c': 0, 'd': 2}], 'c': {'d': 'e'}}
Note JSONField value may change after saved to database. This is because clickhouse internally store JSON as Tuple and Array. Clickhouse try best to store JSON in a uniform schema. Sometimes when you insert a JSON value that is not compatible with existing schema, clickhouse will fail.
Get the value of specific key.
JSONModel.objects.values('json__a')
# <QuerySet [{'json__a': [1, 2, 3]}]>
JSONModel.objects.values('json__b__0__c')
# <QuerySet [{'json__b__0__c': 1}]>
JSONModel.objects.values('json__c__d')
# <QuerySet [{'json__c__d': 'e'}]>
JSONModel.objects.values('json__c')
# <QuerySet [{'json__c': {'d': 'e'}}]>
JSONModel.objects.filter(json__c={'any_key': 'e'}).exists()
# True
JSONModel.objects.filter(json__c=('e',)).exists()
# True
Note the strange behaviors of the last two examples. That's because clickhouse store python dict as named tuple,
named tuples are just normal tuples when compared, the name
is not taken into account.