# Overview

In this tutorial, we cover basic data structures and data types in Koda and show conversions from/to corresponding Python primitives/objects.

Koda provides convenient APIs to create and work with Koda primitives, Lists, Dicts and Objects as an alternative to Python primitives, lists, dicts and dataclass objects. It allows easy conversion between Koda primitives/items and Python primitives/objects. Koda primitives, Lists, Dicts and Objects are all represented by a unified interface, **DataItem**.

In [2]:
from koladata import kd

# Data Types

The most basic data structure in Koda is **Koda Item** which represents a single item. An item can be primitive or non-primitive. The Python class for Koda Item is **DataItem**.

## Primitive Types

In Koda, there are 8 primitive types which are also referred to as **dtypes**.

-   `INT32`
-   `INT64`
-   `FLOAT32`
-   `FLOAT64`
-   `TEXT`
-   `BYTES`
-   `BOOLEAN`: can have three states: `True`, `False` and `None`
-   `MASK`: represents **presence** and can have two states: `present` and
    `missing`

To create a primitive DataItem, we use `kd.item(py_primitive)`. The exception is the `MASK` dtype, which has no native representation in Python. For example:

In [None]:
kd.item(1)

DataItem(1, schema: INT32)

In [None]:
kd.item(2.0)

DataItem(2.0, schema: FLOAT32)

In [None]:
kd.item('string')

DataItem('string', schema: TEXT)

In [None]:
kd.item(b'bytes')

DataItem(b'bytes', schema: BYTES)

In [None]:
kd.item(True)

DataItem(True, schema: BOOLEAN)

The two values of `MASK` (`present` and `missing`) can be accessed directly as `kd.present` and `kd.missing`.

In [None]:
kd.present

DataItem(present, schema: MASK)

In [None]:
kd.missing

DataItem(None, schema: MASK)

To distinguish `INT32` from `INT64` and `FLOAT32` from `FLOAT64`, we can specify a `dtype` argument in `kd.item(py_primitive, dtype=)`. The primitive types are accessible from the `kd` module, for example `kd.INT32` and `kd.FLOAT64`.

In [None]:
kd.item(1, dtype=kd.INT64)

DataItem(1, schema: INT64)

A Koda item can be missing while still having a `dtype`. To create such an item, we can use `kd.item(None, dtype=)`. Note that `kd.missing` is equivalent to `kd.item(None, dtype=kd.MASK)`, which is a missing item of `MASK` type.

In [None]:
kd.item(None, dtype=kd.INT32)

DataItem(None, schema: INT32)

In [None]:
kd.item(None, dtype=kd.BOOLEAN)

DataItem(None, schema: BOOLEAN)

In [None]:
kd.item(None, dtype=kd.MASK)

DataItem(None, schema: MASK)

### Converting from Koda Primitives to Python Primitives



There are two ways to convert a Koda primitive DataItem to a Python primitive.

The first way is to use `kd_item.to_py()`. As `MASK` type is not natively supported in Python, we normally don't convert a mask to Python.

In [None]:
kd.item(1).to_py()

1

In [None]:
kd.item(2.0).to_py()

2.0

In [None]:
kd.item('string').to_py()

'string'

In [None]:
kd.item(b'bytes').to_py()

b'bytes'

In [None]:
kd.item(True).to_py()

True

As Python does not distinguish `INT32` vs `INT64` and `FLOAT32` vs `FLOAT64`, `INT32`/`INT64` and `FLOAT32`/`FLOAT64` are converted to `int` and `float` respectively.

In [None]:
kd.item(1, dtype=kd.INT64).to_py()

1

In [None]:
kd.item(1, dtype=kd.INT64).to_py() == kd.item(1).to_py()

True

The second way is to use Python built-in functions such as `int()`, `float()`, `str()` and `bool()`.

In [None]:
int(kd.item(1))

1

In [5]:
# raises ValueError: Only INT32/INT64 DataItem can be passed to built-in int
# int(kd.item(1.0))

In [None]:
float(kd.item(2.0))

2.0

In [None]:
# TODO(b/364041909)
str(kd.item('string'))

"'string'"

In [None]:
# str() supports all primitives
str(kd.item(2.0))

'2.0'

It is **important** to note that `bool()` only supports the `MASK` dtype, not the `BOOLEAN` dtype. See the MASK vs BOOLEAN section to learn more.

In [None]:
bool(kd.present)

True

In [None]:
bool(kd.missing)

False

In [None]:
# raises ValueError: Cannot cast a non-MASK DataItem to bool
# bool(kd.item(True))

### MASK vs BOOLEAN

Koda supports sparsity natively. The type representing
presence/sparsity is `MASK`, which can only have two values: `present` and
`missing`. `BOOLEAN` is not used for presence because of the confusion
around
[three-valued boolean logic](https://en.wikipedia.org/wiki/Three-valued_logic), which is not intuitive to everyone.
Consider the following code:

In [None]:
# Should it be kd.item(True) or kd.item(None)?
# ~kd.item(None, dtype=kd.BOOLEAN)

# Should it be kd.item(None) because any operations with None should be None?
# Should it be kd.item(False) if we follow the three-valued boolean logic?
# kd.item(None, dtype=kd.BOOLEAN) & kd.item(False)

# Should it be kd.item(None) because any operations with None should be None?
# Should it be kd.item(True) if we follow the three-valued boolean logic?
# kd.item(None, dtype=kd.BOOLEAN) | kd.item(True)
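
For reference, the strong Kleene reading of the three-valued logic linked above can be sketched in plain Python, with `None` standing in for the unknown state. This is only an illustration of that logic: the names `Tri`, `k_not`, `k_and` and `k_or` are hypothetical and are not part of Koda.

```python
from typing import Optional

Tri = Optional[bool]  # True, False, or None for "unknown"


def k_not(a: Tri) -> Tri:
  """Negation: unknown stays unknown."""
  return None if a is None else not a


def k_and(a: Tri, b: Tri) -> Tri:
  """Conjunction: a single False decides the result."""
  if a is False or b is False:
    return False
  if a is None or b is None:
    return None
  return True


def k_or(a: Tri, b: Tri) -> Tri:
  """Disjunction: a single True decides the result."""
  if a is True or b is True:
    return True
  if a is None or b is None:
    return None
  return False


print(k_not(None))         # None
print(k_and(None, False))  # False
print(k_or(None, True))    # True
```

Under this reading, `None & False` is `False` rather than `None`, which a reader who expects `None` to propagate through every operation would not predict.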

Another example of the negation operation:

In [None]:
# 'a' is a DataSlice which will be introduced later.
# You can treat it as a "vector of primitives" for now.
a = kd.slice([1, None, 3])
a

DataSlice([1, None, 3], schema: INT32, shape: JaggedShape(3))

Let's say we want to select items which are less than or equal to 2. It is clear that we should only get `1`.

In [None]:
kd.select(a, a <= 2)

DataSlice([1], schema: INT32, shape: JaggedShape(1))

What about the following code? Should it be `[1, None]` or just `[1]`?

In [None]:
kd.select(a, ~(a > 2))

DataSlice([1, None], schema: INT32, shape: JaggedShape(2))
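
One reading consistent with this output is that a comparison yields a two-state presence mask, and a missing input compares as missing. The plain-Python model below follows that reading, with `True` standing in for `present` and `False` for `missing`. The helper names are hypothetical, and this is a sketch of the semantics rather than of how Koda is implemented.

```python
values = [1, None, 3]


def greater_than(xs, threshold):
  """Mask that is present only where a value exists and exceeds threshold."""
  return [x is not None and x > threshold for x in xs]


def less_equal(xs, threshold):
  """Mask that is present only where a value exists and is <= threshold."""
  return [x is not None and x <= threshold for x in xs]


def invert(mask):
  """Flips present and missing; there is no third state to preserve."""
  return [not m for m in mask]


def select(xs, mask):
  """Keeps the items whose mask entry is present."""
  return [x for x, m in zip(xs, mask) if m]


print(select(values, less_equal(values, 2)))            # [1]
print(select(values, invert(greater_than(values, 2))))  # [1, None]
```

Because the mask has only two states, `a <= 2` and `~(a > 2)` are not interchangeable once missing values are involved.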

To avoid this issue, Koda uses `MASK` to represent presence, which works exactly
the same as normal Python `True`/`False`. For example:

In [None]:
kd.present | kd.missing # kd.present
kd.missing | kd.present # kd.present

kd.missing & kd.present # kd.missing
kd.present & kd.missing # kd.missing

~kd.missing # kd.present
~kd.present # kd.missing

DataItem(None, schema: MASK)

It is important to note that `MASK` is normally used to represent two-valued booleans unless three-valued booleans are needed explicitly.

When a `MASK` DataItem is used where a Python bool is expected, it is implicitly
converted to a Python bool. For example:

In [None]:
assert kd.present

print(repr(kd.item(1) == 1))
if kd.item(1) == 1:
  print('condition passes')

print(repr(kd.item(2) == 1))
if not kd.item(2) == 1:
  print('condition passes')

DataItem(present, schema: MASK)
condition passes
DataItem(None, schema: MASK)
condition passes


To avoid using `BOOLEAN` DataItems by mistake when `MASK` DataItems should be
used, implicit conversion from a `BOOLEAN` DataItem to a Python bool is disallowed.
However, we can explicitly convert a `BOOLEAN` DataItem to a `MASK` DataItem by
comparing it with `True`. For example:

In [None]:
# raises ValueError: Cannot cast a non-MASK DataItem to bool.
# if kd.item(True):
#   pass

In [None]:
# We can explicitly convert it to MASK by comparing it with True
if kd.item(True) == True:
  print('condition passes')

condition passes


To explicitly convert a `MASK` to a `BOOLEAN`, we can use `kd.cond()`. For
example,

In [None]:
# Treat kd.missing as False
kd.cond(kd.present, True, False)

DataItem(True, schema: BOOLEAN)

In [None]:
# Treat kd.missing as False
kd.cond(kd.missing, True, False)

DataItem(False, schema: BOOLEAN)

In [None]:
# Treat kd.missing as None
# DIFF
kd.cond(kd.present, True, False)

DataItem(True, schema: BOOLEAN)

## List

**List** is a special built-in type in Koda. Similar to Python `list`, a Koda list contains a group of **ordered** items.

### Creating Lists

Koda Lists can be created directly from Python lists using `kd.list(py_list)`. As Koda does not support `tuple` natively, Python tuples are treated as lists too.

In [None]:
# Creates an empty List
kd.list()

DataItem(List[], schema: LIST[OBJECT], bag_id: $a61a)

In [None]:
# Creates a List from a Python list
kd.list([1, 2, 3, 4])

DataItem(List[1, 2, 3, 4], schema: LIST[INT32], bag_id: $0e2b)

In [None]:
# Creates a List from a nested Python list
kd.list([[1, 2], [3], [4, 5]])

DataItem(List[List[1, 2], List[3], List[4, 5]], schema: LIST[LIST[INT32]], bag_id: $43e9)

In [None]:
# Creates a List from a tuple
kd.list((1, 2, 3, 4))

DataItem(List[1, 2, 3, 4], schema: LIST[INT32], bag_id: $9a59)

In [None]:
# Creates a List from a list of tuples
kd.list([(1, 2), (3,), (4, 5)])

DataItem(List[List[1, 2], List[3], List[4, 5]], schema: LIST[LIST[INT32]], bag_id: $f891)

In [None]:
kd.list([1, 2, 3])

DataItem(List[1, 2, 3], schema: LIST[INT32], bag_id: $07ad)

In [None]:
# TODO(b/364164215)
kd.from_py([1, 2, 3])

DataItem(080baec3ec4d8ef30000000000000006:0, schema: OBJECT, bag_id: $7d64)

In [None]:
kd.from_py({1: 2})

DataItem(Dict{1=2}, schema: OBJECT, bag_id: $ed53)

If the Python list contains dicts, we can also use `kd.from_py()`.

In [None]:
# TODO(b/323305977)
# kd.from_py([{1: 2}, {3: 4}])

The Python list can also contain Koda primitives, Lists, Dicts, or Objects.

In [None]:
kd.list([kd.item(1), kd.item(2)])
# which is equivalent to
kd.list([1, 2])

DataItem(List[1, 2], schema: LIST[INT32], bag_id: $6ea7)

In [None]:
kd.list([kd.list([1, 2]), kd.list([3, 4])])
# which is equivalent to
kd.list([[1, 2], [3, 4]])

DataItem(List[List[1, 2], List[3, 4]], schema: LIST[LIST[INT32]], bag_id: $0ec1)

In [None]:
# See the Dict section below to learn about Koda Dict
kd.list([kd.dict({1: 2}), kd.dict({3: 4})])

DataItem(List[Dict{1=2}, Dict{3=4}], schema: LIST[DICT{INT32, INT32}], bag_id: $c740)

In [None]:
# See the Object section below to learn about Koda Object
kd.list([kd.obj(a=1), kd.obj(a=2)])

DataItem(List[Obj(a=1), Obj(a=2)], schema: LIST[OBJECT], bag_id: $e4c2)

It is possible to get a missing List DataItem. While we cannot create one directly using `kd.list()`, we can **mask** any List with `kd.missing`. Note that the missing List DataItem keeps its List schema.

In [None]:
kd.list([1, 2, 3]) & kd.missing

DataItem(None, schema: LIST[INT32], bag_id: $3ee9)

### Indexing and Slicing a List

To index a Koda List, we use `[index]` syntax similar to indexing a Python list.

It is **important** to note that using an **out-of-bounds** index returns a `None` item rather than raising an exception. This is because Koda is designed to support indexing into **vectorized** Lists that have different sizes. We will cover vectorization in detail in the following tutorials.

In [None]:
l1 = kd.list([1, 2, 3])
l1[1]

DataItem(2, schema: INT32, bag_id: $002c)

In [None]:
# Use an out-of-bound index
l1[3]

DataItem(None, schema: INT32, bag_id: $002c)
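
To see why returning `None` is convenient for vectorized access, consider taking the same index from several lists of different sizes. The plain-Python sketch below models that behavior for non-negative indices only; `get_or_none` is a hypothetical helper, not a Koda API.

```python
def get_or_none(items, index):
  """Returns items[index], or None when index is out of bounds."""
  if 0 <= index < len(items):
    return items[index]
  return None


lists = [[1, 2, 3], [4], [5, 6]]

# Take index 1 from every list; lists that are too short contribute None.
second_items = [get_or_none(l, 1) for l in lists]
print(second_items)  # [2, None, 6]
```

If an out-of-bounds index raised an exception instead, a single short list would make the whole batch lookup fail.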

In [None]:
# Get list size
kd.list_size(l1)

DataItem(3, schema: INT64)

To slice a Koda List, we use `[slice]` syntax similar to slicing a Python list.

It is **important** to note that, unlike in Python, slicing a List returns a **DataSlice** rather than a List. A DataSlice is a slice of Koda items and is used for vectorization of DataItems. A Koda List is like a reference to the list itself, while a DataSlice is a vector of the list's items. We will cover DataSlice in detail in the following tutorials.

In [None]:
l1[:2]

DataSlice([1, 2], schema: INT32, shape: JaggedShape(2), bag_id: $002c)

In [None]:
# To get all items, use [:]
# Later, we will explain this is called list explosion
l1[:]

DataSlice([1, 2, 3], schema: INT32, shape: JaggedShape(3), bag_id: $002c)

In [None]:
# Slicing using an out-of-bound index does not raise exception
l1[:4]

DataSlice([1, 2, 3], schema: INT32, shape: JaggedShape(3), bag_id: $002c)

To convert a DataSlice back to a list, we can use `kd.list(slice)`.

In [None]:
kd.list(l1[:2])

DataItem(List[1, 2], schema: LIST[INT32], bag_id: $d12b)

### Modifying a List

To add items, we can use `append` to add one item, or multiple items represented as a DataSlice. We will cover DataSlice in a later tutorial.

In [None]:
l2 = kd.list([1, 2, 3])
l2

DataItem(List[1, 2, 3], schema: LIST[INT32], bag_id: $1409)

In [None]:
# Append one item to the list
l2.append(4)
l2

DataItem(List[1, 2, 3, 4], schema: LIST[INT32], bag_id: $1409)

In [None]:
# Append multiple items
l2.append(kd.slice([5, 6, 7]))
l2

DataItem(List[1, 2, 3, 4, 5, 6, 7], schema: LIST[INT32], bag_id: $1409)

In [None]:
# Modify the first item
l2[0] = 8
l2

DataItem(List[8, 2, 3, 4, 5, 6, 7], schema: LIST[INT32], bag_id: $1409)

In [None]:
# Replace the items from index 4 to the end with [7, 8]
# DIFF
l2[4:] = kd.slice([7, 8])
l2

DataItem(List[8, 2, 3, 4, 7, 8], schema: LIST[INT32], bag_id: $1409)

To delete items, we can use `del`. We can also use `pop()` to remove the last item and `clear()` to remove all items.

In [None]:
# Delete items
del l2[2]
l2

DataItem(List[8, 2, 4, 7, 8], schema: LIST[INT32], bag_id: $1409)

In [None]:
del l2[4:]
l2

DataItem(List[8, 2, 4, 7], schema: LIST[INT32], bag_id: $1409)

In [None]:
# Delete all items
del l2[:]
l2

DataItem(List[], schema: LIST[INT32], bag_id: $1409)

Setting an item or a slice of items to Python `None` is equivalent to setting the corresponding items to **missing** in the List.

In [None]:
l3 = kd.list([1, 2, 3, 4])
l3[0] = None
l3

DataItem(List[None, 2, 3, 4], schema: LIST[INT32], bag_id: $01b5)

In [None]:
# TODO: l3[:2] = None
l3[:2] = kd.slice([None, None])
l3

DataItem(List[None, None, 3, 4], schema: LIST[INT32], bag_id: $01b5)

In [None]:
# TODO(b/364167311)
# Setting all items to None
# l3[:] = None

Let's create a list of four items where the third item is missing. Note that the missing item is rendered as `None` in the `repr` format.

In [None]:
l4 = kd.list([1, 2, None, 4])
l4

DataItem(List[1, 2, None, 4], schema: LIST[INT32], bag_id: $46bd)

Now, what if we try to delete the third item (index 2) by setting it to `None`?

In [None]:
l4[2] = None
l4

DataItem(List[1, 2, None, 4], schema: LIST[INT32], bag_id: $46bd)

To correctly understand this behavior, it is important to distinguish the Python `None` vs Koda `missing` rendered as `None`.

When converting from a Python list, a Python `None` is converted to a Koda `missing`. Likewise, as the cells above show, assigning Python `None` to an index stores a `missing` item at that position and does not shorten the List; to remove an item, use `del`. We can also explicitly assign a missing item with the right dtype.

In [None]:
l4[1] = kd.item(None, dtype=kd.INT32)
l4

DataItem(List[1, None, None, 4], schema: LIST[INT32], bag_id: $46bd)

In [None]:
# Or
l4[2] = (l4[2] & kd.missing)
l4

DataItem(List[1, None, None, 4], schema: LIST[INT32], bag_id: $46bd)

### Python-like APIs

Koda List also supports APIs similar to Python list.

It is **important** to note that these APIs only work for a Koda List DataItem, not for a List DataSlice. We will cover DataSlice in detail in a following tutorial.

In [None]:
l4 = kd.list([1, 2, 3])
len(l4)

3

In [None]:
l4

DataItem(List[1, 2, 3], schema: LIST[INT32], bag_id: $8d49)

In [None]:
2 in l4

True

In [None]:
4 in l4

False

In [None]:
for i in l4:
  print(i)

1
2
3


In [None]:
# Remove the last item
l4.pop()
l4

DataItem(List[1, 2], schema: LIST[INT32], bag_id: $8d49)

In [None]:
# Delete all items
l4.clear()
l4

DataItem(List[], schema: LIST[INT32], bag_id: $8d49)

### Converting to Python list

To convert a Koda List to a Python list, we can use `to_py()`.

In [None]:
l = kd.list([[1, 2], [3], [4, 5]])

In [None]:
l.to_py()

DataItem(List[List[1, 2], List[3], List[4, 5]], schema: LIST[LIST[INT32]], bag_id: $9650)

## Dict

**Dict** is a special built-in type in Koda. Similar to Python `dict`, a Koda Dict contains a set of key/value pairs and the cost of key lookup is `O(1)`.

### Creating Dicts

Koda Dicts can be created directly from Python dicts using `kd.dict(py_dict)`.

In [None]:
# Creates an empty Dict
kd.dict()

DataItem(Dict{}, schema: DICT{OBJECT, OBJECT}, bag_id: $b84f)

In [None]:
# Creates a Dict from a Python dict
kd.dict({'a': 1, 'b': 2})

DataItem(Dict{'b'=2, 'a'=1}, schema: DICT{TEXT, INT32}, bag_id: $17f8)

In [None]:
# Dict values are automatically converted to Koda Lists/Dicts
kd.dict({1: [2, 3], 4: [5, 6]})

DataItem(Dict{1=List[2, 3], 4=List[5, 6]}, schema: DICT{INT32, LIST[INT32]}, bag_id: $662b)

In [None]:
# Dict values are automatically converted to Koda Lists/Dicts
kd.dict({1: {2: 3}, 4: {5: 6}})

DataItem(Dict{1=Dict{2=3}, 4=Dict{5=6}}, schema: DICT{INT32, DICT{INT32, INT32}}, bag_id: $f7eb)

The Python dict can also contain Koda primitives, Lists, Dicts, or Objects.

In [None]:
kd.dict({kd.item(1): kd.item(2)})
# which is equivalent to
kd.dict({1: 2})

DataItem(Dict{1=2}, schema: DICT{INT32, INT32}, bag_id: $19ea)

In [None]:
# The key is a Koda List.
# Actually, it is an ItemId representing a Koda List. We will cover ItemId later.
kd.dict({kd.list([1, 2]): kd.list([3, 4])})

DataItem(Dict{List[1, 2]=List[3, 4]}, schema: DICT{LIST[INT32], LIST[INT32]}, bag_id: $cebc)

In [None]:
# The key is a Koda Dict.
kd.dict({kd.dict({1: 2}): kd.dict({3: 4})})

DataItem(Dict{Dict{1=2}=Dict{3=4}}, schema: DICT{DICT{INT32, INT32}, DICT{INT32, INT32}}, bag_id: $baaf)

In [None]:
# The key is a Koda Object.
kd.dict({kd.obj(a=1): kd.obj(b=2)})

DataItem(Dict{Obj(a=1)=Obj(b=2)}, schema: DICT{OBJECT, OBJECT}, bag_id: $d0b2)

It is possible to get a missing Dict DataItem. While we cannot create one directly using `kd.dict()`, we can mask any Dict with `kd.missing`. Note that the missing Dict DataItem keeps its Dict schema.

In [None]:
kd.dict({'a': 1, 'b': 2}) & kd.missing

DataItem(None, schema: DICT{TEXT, INT32}, bag_id: $fd8d)

### Looking up in a Koda Dict

To look up a key in a Koda Dict, we use `[key]` syntax, similar to looking up a key in a Python dict.

It is **important** to note that using a key which is not in the Dict returns a `None` item rather than raising an exception. This is because Koda is designed to support lookup on **vectorized** Dicts, which may or may not contain the key. We will cover vectorization in detail in the following tutorials.

In [None]:
d1 = kd.dict({'a': 1, 'b': 2})
d1['a']

DataItem(1, schema: INT32, bag_id: $940c)

In [None]:
# Look up by a key which is not in the Dict
d1['c']

DataItem(None, schema: INT32, bag_id: $940c)
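
Python's own `dict.get` behaves the same way, and applying one key to several dicts shows why this is useful for vectorized lookups. The snippet below is a plain-Python analogy with made-up data, not Koda code.

```python
dicts = [{'a': 1, 'b': 2}, {'b': 3}, {'a': 4}]

# dict.get returns None for an absent key instead of raising KeyError.
values_for_a = [d.get('a') for d in dicts]
print(values_for_a)  # [1, None, 4]
```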

In [None]:
# Get dict size
kd.dict_size(d1)

DataItem(2, schema: INT64)

In [None]:
# Get all keys
d1.get_keys()

DataSlice(['b', 'a'], schema: TEXT, shape: JaggedShape(2), bag_id: $940c)

In [None]:
# Get all values
keys = d1.get_keys()
d1[keys]

DataSlice([2, 1], schema: INT32, shape: JaggedShape(2), bag_id: $940c)

### Modifying a Dict

To add a new key/value pair or modify an existing one, we can just do `dict[key] = value`. To remove a key/value pair, we can do `del dict[key]` or `dict[key] = None`.

In [None]:
d2 = kd.dict({'a': 1, 'b': 2})
d2

DataItem(Dict{'a'=1, 'b'=2}, schema: DICT{TEXT, INT32}, bag_id: $73cf)

In [None]:
d2['a'] = 3
d2

DataItem(Dict{'a'=3, 'b'=2}, schema: DICT{TEXT, INT32}, bag_id: $73cf)

In [None]:
# raises KodaError: the schema for Dict value is incompatible.
# d2['b'] = 4.0

In [None]:
d2['c'] = 3
d2

DataItem(Dict{'a'=3, 'b'=2, 'c'=3}, schema: DICT{TEXT, INT32}, bag_id: $73cf)

In [None]:
del d2['a']
d2

DataItem(Dict{'b'=2, 'c'=3}, schema: DICT{TEXT, INT32}, bag_id: $73cf)

Koda Dict does not distinguish between a missing value and a missing key: setting the value to a missing value or to Python `None` deletes the key/value pair from the Dict.

In [None]:
d2 = kd.dict({'a': 1, 'b': 2})

d2['b'] = None
d2

DataItem(Dict{'a'=1}, schema: DICT{TEXT, INT32}, bag_id: $d5f4)

In [None]:
d2['a'] = kd.item(None, dtype=kd.INT32)
d2

DataItem(Dict{}, schema: DICT{TEXT, INT32}, bag_id: $d5f4)
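
The same rule can be modeled in plain Python with a small `dict` subclass in which assigning `None` removes the key. `NoneDeletesDict` is a hypothetical class written for this illustration; it mimics the behavior described above and says nothing about how Koda stores Dicts.

```python
class NoneDeletesDict(dict):
  """Dict in which assigning None removes the key (illustrative only)."""

  def __setitem__(self, key, value):
    if value is None:
      # A missing value is treated the same as a missing key.
      self.pop(key, None)
    else:
      super().__setitem__(key, value)


d = NoneDeletesDict()
d['a'] = 1
d['b'] = 2
d['b'] = None

print('b' in d)    # False
print(d.get('b'))  # None
print(len(d))      # 1
```

After the assignment, looking up `'b'` gives the same result as looking up a key that was never set.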

### Python-like APIs

Koda Dict also supports APIs similar to Python dict.

It is **important** to note that these APIs only work for a Koda Dict DataItem, not for a Dict DataSlice. We will cover DataSlice in detail in a following tutorial.

In [None]:
d3 = kd.dict({'a': 1, 'b': 2})
len(d3)

2

In [None]:
d3

DataItem(Dict{'a'=1, 'b'=2}, schema: DICT{TEXT, INT32}, bag_id: $0a19)

In [None]:
'a' in d3

True

In [None]:
'c' in d3

False

In [None]:
for k in d3:
  if k == 'a':
    print(d3[k])

1


In [None]:
# Remove key/value pair item by key
d3.pop('a')
d3

DataItem(Dict{'b'=2}, schema: DICT{TEXT, INT32}, bag_id: $0a19)

In [None]:
# Remove all key/value pairs
d3.clear()
d3

DataItem(Dict{}, schema: DICT{TEXT, INT32}, bag_id: $0a19)

### Converting to Python dict

To convert a Koda Dict to a Python dict, we can use `to_py()`.

In [None]:
d = kd.dict({1: [2, 3], 4: [5, 6]})
d

DataItem(Dict{1=List[2, 3], 4=List[5, 6]}, schema: DICT{INT32, LIST[INT32]}, bag_id: $5353)

In [None]:
d.to_py()

DataItem(Dict{1=List[2, 3], 4=List[5, 6]}, schema: DICT{INT32, LIST[INT32]}, bag_id: $5353)

## Entity

A Koda Entity consists of a set of attribute/value pairs. It is similar to a class without member methods (e.g. a Python dataclass).

### Creating Entities

Koda Entities can be created from Python keyword arguments using `kd.new(**kwargs)`.

In [None]:
# Creates an empty Entity
kd.new()

DataItem(Entity():$000baec3ec4d8ef30000000000000038:0, schema: SCHEMA(), bag_id: $c16c)

In [None]:
# Creates an Entity
kd.new(a=1, b=2)

DataItem(Entity(a=1, b=2), schema: SCHEMA(a=INT32, b=INT32), bag_id: $7b91)

In [None]:
# 'a' is a List and 'b' is a Dict
kd.new(a=[1, 2], b={3: 4})

DataItem(Entity(a=List[1, 2], b=Dict{3=4}), schema: SCHEMA(a=LIST[INT32], b=DICT{INT32, INT32}), bag_id: $d561)

It is possible to get a missing Entity DataItem. While we cannot create one directly using `kd.new()`, we can mask any Entity with `kd.missing`. Note that the schema does not change.

In [None]:
kd.new(a=1, b=2) & kd.missing

DataItem(None, schema: SCHEMA(a=INT32, b=INT32), bag_id: $9a1a)

### `kd.new` as the Universal Converter

Conceptually, Koda Lists and Dicts can be considered Entities, as they can also be thought of as sets of attributes/values. Therefore, `kd.new` can be used as a universal converter that turns a Python object (including a list or dict) into the corresponding Koda List/Dict/Entity.

In [None]:
kd.new([1, 2, 3])

DataItem(List[1, 2, 3], schema: LIST[INT32], bag_id: $3974)

In [None]:
kd.new({1: 2})

DataItem(Dict{1=2}, schema: DICT{INT32, INT32}, bag_id: $e525)

In [None]:
kd.new(a=1, b=2)

DataItem(Entity(a=1, b=2), schema: SCHEMA(a=INT32, b=INT32), bag_id: $8557)

### Accessing Entity Attributes

To access an entity attribute, we use `.attr_name` syntax similar to accessing a Python object attribute.

It is **important** to note that using an attribute which is not in the Entity does raise an exception.

In [None]:
o1 = kd.new(a=1, b=2)
o1.a

DataItem(1, schema: INT32, bag_id: $566d)

In [None]:
# raises ValueError: the attribute 'c' is missing;
# o1.c

In [None]:
dir(o1)

['a', 'b']

### Modifying an Entity

To modify an existing attribute, we can just do `entity.attr = value`. Modifying the attribute to a different type raises an exception unless the schema is updated at the same time (i.e. set `update_schema=True`).

Adding a new attribute raises an exception unless the schema is updated at the same time.

To delete an attribute, we can do `del entity.attr` or `entity.attr = None`.

In [None]:
o2 = kd.new(a=1, b=2)
o2

DataItem(Entity(a=1, b=2), schema: SCHEMA(a=INT32, b=INT32), bag_id: $36dd)

In [None]:
# raises ValueError: the attribute 'c' is missing on the schema.;
# o2.c = 3

In [None]:
o2.set_attrs(c=3, update_schema=True)

In [None]:
# raises KodaError: the schema for attribute 'a' is incompatible.
# o2.a = 2.0

In [None]:
o2.set_attrs(a=2.0, update_schema=True)

Note that deleting an attribute does not delete the corresponding schema. Thus, the `repr` shows the attribute as `None`. E.g. `a=None` in the following example.

In [None]:
del o2.a
o2

DataItem(Entity(a=None, b=2, c=3), schema: SCHEMA(a=FLOAT32, b=INT32, c=INT32), bag_id: $36dd)

In [None]:
o2.c = None
o2

DataItem(Entity(a=None, b=2, c=None), schema: SCHEMA(a=FLOAT32, b=INT32, c=INT32), bag_id: $36dd)

### Converting to Python Object

As Python does not support Entity, converting a Koda Entity creates the same Python object as converted from a Koda Object.

In [None]:
o3 = kd.new(a=1, b='2')
py_obj = o3.to_py()
py_obj

DataItem(Entity(a=1, b='2'), schema: SCHEMA(a=INT32, b=TEXT), bag_id: $4f70)

In [None]:
# internal_as_py returns an ItemId
o3.internal_as_py()

DataItem(Entity(a=1, b='2'), schema: SCHEMA(a=INT32, b=TEXT), bag_id: $4f70)

In [None]:
# TODO(b/330119958)
# o3.to_py(obj_as_dict=True)

## Object

There is a special type of Entity called **Koda Object**. The main difference between Objects and Entities is where the schema is stored. An Object stores its own schema directly as the `__schema__` attribute, similar to how a Python object stores its class as the `__class__` attribute. An Entity's schema is stored at the DataItem/DataSlice level, similar to how the class of C++ objects is stored at the `vector` level (e.g. `vector<MyClass>`). We will cover this in detail in the [Koda Schema tutorial](go/koda-tutorial-schema).
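
The analogy can be made concrete with the Python standard library alone. In the sketch below, a plain `list` plays the role of Objects (each element carries its own type), and `array.array` plays the role of Entities (the element type is declared once, on the container). This is an analogy only and does not use or describe Koda's schema machinery.

```python
from array import array

# Object-like: every element carries its own type, so a list can mix them.
mixed = [1, 2.0, 'three']
print([type(x).__name__ for x in mixed])  # ['int', 'float', 'str']

# Entity-like: the element type is stored once, at the container level.
ints = array('i', [1, 2, 3])
print(ints.typecode)  # i

rejected = False
try:
  ints.append(4.0)  # Does not match the container-level type.
except TypeError:
  rejected = True
print(rejected)  # True
```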

### Creating Objects

Koda Objects can be created from Python keyword arguments using `kd.obj(**kwargs)`.

In [None]:
# Creates an empty Object
kd.obj()

DataItem(Obj():$000baec3ec4d8ef3000000000000004b:0, schema: OBJECT, bag_id: $c88f)

In [None]:
# Creates an Object
kd.obj(a=1, b=2)

DataItem(Obj(a=1, b=2), schema: OBJECT, bag_id: $a253)

In [None]:
# Creates an Object.
# 'a' is a List and 'b' is a Dict
kd.obj(a=[1, 2], b={3: 4})

DataItem(Obj(a=List[1, 2], b=Dict{3=4}), schema: OBJECT, bag_id: $cbe8)

The Python keyword arguments can also contain Koda primitives, Lists, Dicts, or Objects.

In [None]:
kd.obj(a=kd.item(1), b=kd.item(2.0))
# which is equivalent to
kd.obj(a=1, b=2.0)

DataItem(Obj(a=1, b=2.0), schema: OBJECT, bag_id: $552e)

In [None]:
kd.obj(a=kd.list([1, 2]), b=kd.dict({3: 4}))
# which is equivalent to
kd.obj(a=[1, 2], b={3: 4})

DataItem(Obj(a=List[1, 2], b=Dict{3=4}), schema: OBJECT, bag_id: $e431)

In [None]:
kd.obj(a=kd.obj(x=1), b=kd.obj(x=2))

DataItem(Obj(a=Obj(x=1), b=Obj(x=2)), schema: OBJECT, bag_id: $6df4)

It is possible to get a missing Object DataItem. While we cannot create one directly using `kd.obj()`, we can mask any Object with `kd.missing`. Note that the missing Object DataItem keeps the `OBJECT` schema.

In [None]:
kd.obj(a=1, b=2) & kd.missing

DataItem(None, schema: OBJECT, bag_id: $7759)

### Accessing Object Attributes

To access an object attribute, we use `.attr_name` syntax similar to accessing a Python object attribute.

It is **important** to note that using an attribute which is not in the Object does raise an exception.

In [None]:
o1 = kd.obj(a=1, b=2)
o1.a

DataItem(1, schema: INT32, bag_id: $80fd)

In [None]:
# raises ValueError: the attribute 'c' is missing;
# o1.c

In [None]:
# To find all attributes
dir(o1)

['a', 'b']

### Modifying an Object

To add a new attribute or modify an existing attribute, we can just do `obj.attr = value`. If `value` has a different schema than the existing value, it changes the underlying schema automatically rather than raising an exception. This is the main difference between Entity and Object. To delete an attribute, we can do `del obj.attr` or `obj.attr = None`.

In [None]:
o2 = kd.obj(a=1, b=2)
o2

DataItem(Obj(a=1, b=2), schema: OBJECT, bag_id: $05e4)

In [None]:
o2.c = 3
o2

DataItem(Obj(a=1, b=2, c=3), schema: OBJECT, bag_id: $05e4)

In [None]:
# Modify to a different dtype
o2.c = 4.0
o2

DataItem(Obj(a=1, b=2, c=4.0), schema: OBJECT, bag_id: $05e4)

In [None]:
# Delete an attribute
del o2.a
o2

DataItem(Obj(b=2, c=4.0), schema: OBJECT, bag_id: $05e4)

In [None]:
# Set attribute to None

o2 = kd.obj(a=1, b=2)

o2.b = None
o2

DataItem(Obj(a=1, b=None), schema: OBJECT, bag_id: $5d7e)

### Converting to Python Object

To convert a Koda Object to a Python object, we can use `to_py()`.

By default, it implicitly creates a new Python `dataclass` and an object of such class. When `obj_as_dict` is set to True, Python dict is used to represent object attribute/value pairs.

In [None]:
o3 = kd.obj(a=1, b='2')
py_obj = o3.to_py()
py_obj

DataItem(Obj(a=1, b='2'), schema: OBJECT, bag_id: $194b)

In [None]:
import dataclasses

dataclasses.is_dataclass(py_obj)

False

In [None]:
# TODO(b/330119958)
# o3.to_py(obj_as_dict=True)
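
As a point of reference for what implicitly creating a Python `dataclass` from attribute/value pairs could look like, the standard library can build a dataclass at runtime. This is a plain-Python sketch with hypothetical names (`attrs`, `Obj`); it is not Koda's implementation of `to_py()`.

```python
import dataclasses

attrs = {'a': 1, 'b': '2'}

# Build a dataclass whose fields mirror the attribute names and value types.
Obj = dataclasses.make_dataclass(
    'Obj', [(name, type(value)) for name, value in attrs.items()]
)
py_obj = Obj(**attrs)

print(py_obj)                            # Obj(a=1, b='2')
print(dataclasses.is_dataclass(py_obj))  # True

# The dict form corresponds to the attribute/value view of the same data.
print(dataclasses.asdict(py_obj))        # {'a': 1, 'b': '2'}
```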

# Demo: Game of Life

In this section, we demonstrate how to implement the [Game of Life](https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) using both pure Python and the Koda library. We also compare the performance of pure Python and Koda.

The universe of the Game of Life is an infinite, two-dimensional orthogonal grid of square cells, each of which is in one of two possible states, live or dead (or populated and unpopulated, respectively). Every cell interacts with its eight neighbours, which are the cells that are horizontally, vertically, or diagonally adjacent. At each step in time, the following transitions occur:

-  Any live cell with fewer than two live neighbours dies, as if by underpopulation.
-  Any live cell with two or three live neighbours lives on to the next generation.
-  Any live cell with more than three live neighbours dies, as if by overpopulation.
-  Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.
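
The four rules above reduce to a small function of a cell's current state and its number of live neighbours. The sketch below is one way to write it; `next_cell_state` is a name chosen for this illustration.

```python
def next_cell_state(alive: bool, live_neighbours: int) -> bool:
  """Applies the four transition rules to a single cell."""
  if alive:
    # Survives with two or three live neighbours, otherwise dies.
    return live_neighbours in (2, 3)
  # A dead cell comes alive with exactly three live neighbours.
  return live_neighbours == 3


for n in range(9):
  print(n, next_cell_state(True, n), next_cell_state(False, n))
```

This is equivalent to the single condition `n == 3 or (n == 2 and alive)`.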

In [None]:
from IPython.display import clear_output
import time

## Python Version

In [None]:
def CreateState(M, N, init_alives):
  """Creates the initial state.

  Args:
    M: number of rows
    N: number of columns
    init_alives: list of (row, column) coordinate tuples

  Returns:
    initial state
  """
  state = [['.'] * N for i in range(M)]
  for (i, j) in init_alives:
    state[i][j] = '$'
  return state

def NextState(state):
  """Computes the live cells for the next iteration and returns the new state.

  Args:
    state: current state

  Returns:
    next state
  """
  M = len(state)
  N = len(state[0])
  new_state = [['.'] * N for i in range(M)]

  # Loop through all cells in the grid
  for i in range(M):
    for j in range(N):
      # Calculate the count of alive neighbors
      alive_count = 0
      for x in range(i-1, i+2):
        for y in range(j-1, j+2):
          if x>=0 and x<M and y>=0 and y<N and (x!=i or y!=j) and state[x][y]=='$':
            alive_count += 1
      # Determine the alive state of current cell
      if alive_count==3 or (alive_count==2 and state[i][j]=='$'):
        new_state[i][j] = '$'
  return new_state

def PrintState(state):
  "Prints out the current state as grid."
  for i in range(len(state)):
    print(''.join(state[i]))

def PrintStateLoop(state, n, wait):
  """Prints out the state for each iteration.

  Args:
    state: initial state
    n: number of iterations
    wait: time to wait between iterations
  """
  PrintState(state)
  for i in range(n):
    clear_output(wait=True)
    state = NextState(state)
    PrintState(state)
    time.sleep(wait)

Let's initialize the state and play the game on a 20x20 grid for 50 steps.

In [None]:
state = CreateState(20, 20, [(6, 1), (6, 2), (6, 3), (8, 2), (5, 3)])
PrintStateLoop(state, 50, 0.1)

..$$................
.$..$...............
.$$$................
.$$.........$$..$$..
............$$......
....$$$....$..$$$...
...$...$.......$....
...$...$.......$....
...$...$............
....$$$.............
....................
....................
....................
....................
....................
....................
....................
....................
....................
....................


## Koda Version using Python-like APIs

In [None]:
def CreateState(M, N, init_alives):
  """Creates the initial state.

  Args:
    M: number of rows
    N: number of columns
    init_alives: list of (row, column) coordinate tuples

  Returns:
    initial state
  """
  state = kd.list([['.'] * N for i in range(M)])   # Modified to use kd.list
  for (i, j) in init_alives:
    state[i][j] = '$'
  return state

def NextState(state):
  """Computes the live cells for the next iteration and returns the new state.

  Args:
    state: current state

  Returns:
    next state
  """
  M = len(state)
  N = len(state[0])
  new_state = kd.list([['.'] * N for i in range(M)])   # Modified to use kd.list

  # Loop through all cells in the grid
  for i in range(M):
    for j in range(N):
      # Calculate the count of alive neighbors
      alive_count = 0
      for x in range(i-1, i+2):
        for y in range(j-1, j+2):
          if x>=0 and x<M and y>=0 and y<N and (x!=i or y!=j) and state[x][y]=='$':
            alive_count += 1
      # Determine the alive state of current cell
      if alive_count==3 or (alive_count==2 and state[i][j]=='$'):
        new_state[i][j] = '$'
  return new_state

def PrintState(state):
  "Prints out the current state as grid."
  for i in range(len(state)):
    print(''.join(state[i][:].to_py()))  # Modified to use kd.list

def PrintStateLoop(state, n, wait):
  """Prints out the state for each iteration.

  Args:
    state: initial state
    n: number of iterations
    wait: time to wait between iterations
  """
  PrintState(state)
  for i in range(n):
    clear_output(wait=True)
    state = NextState(state)
    PrintState(state)
    time.sleep(wait)

As we can see, the Koda version is almost identical to the Python version. The only differences are the three lines marked with `# Modified to use kd.list`.

Let's run it with the same setup.

In [None]:
state = CreateState(20, 20, [(6, 1), (6, 2), (6, 3), (8, 2), (5, 3)])
PrintStateLoop(state, 50, 0.1)

..$$................
.$..$...............
.$$$................
.$$.........$$..$$..
............$$......
....$$$....$..$$$...
...$...$.......$....
...$...$.......$....
...$...$............
....$$$.............
....................
....................
....................
....................
....................
....................
....................
....................
....................
....................


As we can see, the Koda version runs extremely slowly. This is because it performs a large number of conversions between native Python data structures and Koda data structures.

## Native Koda Implementation

The code below shows how to implement it using native Koda APIs. That is, there is no back-and-forth conversion between Python and Koda, and we leverage Koda's vectorized computation.

As it uses APIs and concepts we haven't covered yet, you may ignore the actual code for now.
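
For readers who want the shape of the algorithm before the Koda specifics, here is a plain-Python sketch of the two-phase structure the code below follows: build each cell's neighbor list once, then on every step count the alive neighbors and update all cells. It is only an analogy. The names `BuildNeighbors` and `Step` are made up for this sketch, and it uses Python dicts and booleans rather than Koda DataSlices and `MASK` values.

```python
def BuildNeighbors(M, N):
  """Returns a dict mapping (i, j) to the list of in-bounds neighbor coordinates."""
  neighbors = {}
  for i in range(M):
    for j in range(N):
      neighbors[(i, j)] = [
          (i + di, j + dj)
          for di in (-1, 0, 1)
          for dj in (-1, 0, 1)
          if (di != 0 or dj != 0) and 0 <= i + di < M and 0 <= j + dj < N
      ]
  return neighbors

def Step(alive, neighbors):
  """Returns the next generation, computed from a snapshot of the current one."""
  new_alive = {}
  for cell, ncells in neighbors.items():
    count = sum(alive[n] for n in ncells)
    new_alive[cell] = count == 3 or (count == 2 and alive[cell])
  return new_alive

# A vertical "blinker" on a 5x5 grid flips to horizontal and back.
neighbors = BuildNeighbors(5, 5)  # Built once, reused for every step
alive = {cell: False for cell in neighbors}
for cell in [(1, 2), (2, 2), (3, 2)]:
  alive[cell] = True
after_one = Step(alive, neighbors)
after_two = Step(after_one, neighbors)
print(sorted(c for c, a in after_one.items() if a))  # [(2, 1), (2, 2), (2, 3)]
```

The Koda code replaces the per-cell Python loop in `Step` with a single aggregation over the precomputed neighbors, which is where the vectorization comes from.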

In [None]:
def CreateState(M, N, init_alives):
  """Creates the initial state.

  Args:
    M: number of rows
    N: number of columns
    init_alives: list of (x, y) coordinate tuple

  Returns:
    initial state as a DataSlice
  """
  state = kd.obj()
  # Creates a grid with M rows and M*N cells.
  # Each cell has an `alive` attribute with MASK dtype initialized to kd.missing
  state.rows = kd.list(kd.new(cells=kd.list(kd.new(alive=kd.slice(kd.missing).repeat(M).repeat(N)))))
  # Set initial alive cells.
  for (i, j) in init_alives:
    state.rows[i].cells[j].alive = kd.present
  return state

# Add neighbor cells for all cells.
def AddNeighbors(state):
  """Adds neighbor cells for all cells.

  Args:
    state: initial state as a DataSlice

  Returns:
    state with neighbor cells added
  """
  # Set row 'x' and column 'y' indices.
  row = state.rows[:]  # One-dimensional DataSlice containing row ItemIds
  row.set_attr('x', kd.index(row), update_schema=True) # One-dimensional DataSlice containing row indices
  row_cell = row.cells[:] # Two-dimensional DataSlice containing cell ItemIds
  row_cell.set_attr('y', kd.index(row_cell), update_schema=True) # Two-dimensional DataSlice containing cell indices

  cell = row_cell
  cell.set_attr('neighbors', kd.list_like(cell), update_schema=True)
  # Loop through neighbors.
  for i in range(-1, 2):
    for j in range(-1, 2):
      # Exclude the current cells themselves
      if i != 0 or j != 0:
        # nx is a one-dimensional DataSlice containing row indices + i
        # ny is a two-dimensional DataSlice containing cell indices + j
        nx, ny = row.x + i, cell.y + j
        # ncell is a two-dimensional DataSlice containing cell ItemIds at (curr_x + i, curr_y + j)
        # Note that we need to exclude out-of-bound nx < 0 and ny < 0.
        # We don't need to check nx < M and ny < N because the list explosion operator [] returns None for out-of-bound indices.
        # However, the list explosion operator [] accepts negative indices (e.g. -1, -2), so we need to exclude nx/ny < 0.
        ncell = state.rows[nx & (nx >= 0)].cells[ny & (ny >= 0)]
        # Add neighbors.
        cell.neighbors.append(ncell)

  # In the end, cell.neighbors[:] is a three-dimensional DataSlice containing ItemIds for all cells' neighbor cells

def NextState(cells, neighbors):
  "Finds out alive cells for the next iteration and updates the state."
  # Count how many neighbors are alive
  # Note that alive has MASK type. agg_count returns the number of items with kd.present
  neighbor_count = kd.agg_count(neighbors.alive)
  # Update alive
  cells.alive = (neighbor_count == 3) | ((neighbor_count == 2) & cells.alive)

def PrintState(state):
  "Prints out the current state as grid."
  # Set '$' or '.' depending on the 'alive' value and join them together
  alive_tokens = kd.cond(state.rows[:].cells[:].alive, '$', '.')
  print(kd.strings.agg_join(kd.strings.agg_join(alive_tokens), '\n').to_py())

def PrintStateLoop(state, n, wait):
  """Prints out the state for each iteration.

  Args:
    state: initial state
    n: number of iterations
    wait: time to wait between iterations
  """
  AddNeighbors(state)
  cells = state.rows[:].cells[:].flatten()
  # Create the neighbor DataSlice only once
  neighbors = cells.neighbors[:]

  for i in range(n):
    clear_output(wait=True)
    NextState(cells, neighbors)
    PrintState(state)
    time.sleep(wait)

state = CreateState(20, 20, [(6, 1), (6, 2), (6, 3), (8, 2), (5, 3)])
PrintStateLoop(state, 50, 0.1)

..$$................
.$..$...............
.$$$................
.$$.........$$..$$..
............$$......
....$$$....$..$$$...
...$...$.......$....
...$...$.......$....
...$...$............
....$$$.............
....................
....................
....................
....................
....................
....................
....................
....................
....................
....................
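
The comments in `AddNeighbors` explain why `nx` and `ny` are masked with `>= 0`: negative indices are valid and count from the end, so without the mask, cells on one edge would pick up cells on the opposite edge as neighbors. Plain Python lists wrap around in the same way, which makes the pitfall easy to see in isolation. In the sketch below, `GetOrNone` is a hypothetical helper, not a Koda API. Unlike the Koda behavior described in those comments, a plain Python list raises `IndexError` for an index past the end, so the helper guards both directions explicitly.

```python
def GetOrNone(items, idx):
  """Returns items[idx], or None when idx is negative or past the end."""
  if 0 <= idx < len(items):
    return items[idx]
  return None

row = ['a', 'b', 'c']
# Plain indexing wraps around: the "left neighbor" of column 0 is the last column.
wrapped = row[0 - 1]
# The guarded lookup treats both out-of-range directions as "no neighbor".
left_of_first = GetOrNone(row, 0 - 1)
right_of_last = GetOrNone(row, 2 + 1)
print(wrapped, left_of_first, right_of_last)  # c None None
```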


This native version runs at a speed similar to native Python. However, as we increase the grid size from 20x20 to 100x100, the Koda version can be 10x faster than native Python because it benefits more from vectorized computation.
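
Relative speeds depend on the machine and the Koda version, so it can be worth measuring them directly. The following is a minimal timing sketch, not part of Koda: `TimeCalls` and `MakeWorkload` are hypothetical names, and the workload is a stand-in that only loops over a grid so the block can run on its own.

```python
import time

def TimeCalls(step_fn, n):
  """Calls step_fn() n times and returns the total elapsed seconds."""
  start = time.perf_counter()
  for _ in range(n):
    step_fn()
  return time.perf_counter() - start

def MakeWorkload(M, N):
  """Returns (grid, step) where step does one pass over an M x N grid."""
  grid = [[0] * N for _ in range(M)]
  def Step():
    for i in range(M):
      for j in range(N):
        grid[i][j] += 1
  return grid, Step

grid_small, step_small = MakeWorkload(20, 20)
grid_large, step_large = MakeWorkload(100, 100)
t_small = TimeCalls(step_small, 50)
t_large = TimeCalls(step_large, 50)
print(f'20x20: {t_small:.4f}s  100x100: {t_large:.4f}s')
```

To time the implementations from this tutorial, replace the stand-in with a zero-argument callable that wraps one step, for example `lambda: NextState(cells, neighbors)` for the native Koda version, and leave printing and `time.sleep` out of the measured code.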