# Array type specifications

Rather than asking you to specify a schema and build data to match it, awkward-array infers type specifications from the structure of nested awkward arrays. These types are presented to the user as `awkward.type.Type` objects, which may be thought of as a generalization of Numpy's `shape`, `dtype`, and `masked` parameters.

Not every awkward array class changes the type: a `ChunkedArray` of `X`, for instance, simulates a plain array of `X` and so has the same type. There are five structures that should be distinguishable to a high-level user:

   * **jaggedness:** some arrays contain arbitrary-length subarrays
   * **tables:** some arrays are indexed by an enumerated set of strings, rather than integers ("product types")
   * **union:** some arrays represent tagged unions ("sum types")
   * **optional:** some arrays are masked, representing unions with the N/A singleton
   * **self-references:** some subarray components refer to cousins or ancestors on the tree of nested arrays

In [1]:
import numpy
from awkward.type import *

We can get the `awkward.type.Type` of any Numpy or awkward array with `from_array` (for an awkward array, this calls the array's `.type` property).

In [2]:
from_array(numpy.arange(15))

ArrayType(15, dtype('int64'))

The `__repr__` string provides a constructor you could use to make the type yourself. However, it's not the easiest way to read complex types. Instead, use the `__str__` string.

In [3]:
print(from_array(numpy.arange(15)))

[0, 15) -> int64


Numpy arrays can be represented by the simplest `Type`: a linear composition of functions from finite-domain integers to either a `numpy.dtype` or another such function.

The function we're referring to is the square brackets (`__getitem__`). The Numpy array below has shape `(3, 5)` and dtype `float64`, so

   * you can pass an integer from 0 (inclusive) to 3 (exclusive) to get...
   * something that you can pass an integer from 0 (inclusive) to 5 (exclusive) to get...
   * a float64.

Although you might normally pass several indexes to a Numpy array at a time (separated by commas, as an argument list), you _can_ pass one integer at a time to get a new object with one less dimension. Each of these objects may be thought of as a function of one argument, returning a function or a `numpy.dtype`. This is known as [currying](https://en.wikipedia.org/wiki/Currying).

In [4]:
print(from_array(numpy.arange(15).reshape(3, 5).view(float)))

[0, 3) -> [0, 5) -> float64
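
The one-index-at-a-time reading of this type can be sketched without any dependencies. In the block below, nested Python lists stand in for the Numpy array (an assumption made only to keep the sketch self-contained), the values are arbitrary, and `index_one_at_a_time` is a hypothetical helper, not part of Numpy or awkward-array.

```python
from functools import reduce
from operator import getitem

# Nested lists standing in for an array of shape (3, 5) with arbitrary floats.
array = [[float(5 * i + j) for j in range(5)] for i in range(3)]

def index_one_at_a_time(obj, *indexes):
    """Apply square brackets once per index, left to right (hypothetical helper)."""
    return reduce(getitem, indexes, obj)

# One index: an object with one less dimension, like [0, 5) -> float64.
row = index_one_at_a_time(array, 1)
# Two indexes applied in sequence: a single float.
value = index_one_at_a_time(array, 1, 2)

print(row)    # [5.0, 6.0, 7.0, 8.0, 9.0]
print(value)  # 7.0
```

With a real Numpy array `a`, `a[1][2]` and `a[1, 2]` select the same element, which is why the type can be written as a chain of single-argument functions.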


Jagged arrays go beyond Numpy arrays in that the size of subarrays is not fixed. Some subarrays may be empty, some may have size 1, some may have size 2, and some may have size 1 million. We represent that as a type with an infinite integer domain.

Below is a jagged array of 10 subarrays, where the subarrays may have any sizes. (The type specification encodes the fixed size 10 but uses a single placeholder for all the unspecified sizes. These arrays can be very large!)

In [5]:
print(ArrayType(10, numpy.inf, float))

[0, 10) -> [0, inf) -> float64
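
To see why the inner domain has to be `[0, inf)`, it may help to picture how variable-length subarrays could be stored. The sketch below assumes a flat `content` list plus per-subarray `starts` and `stops` offsets. Those two names appear later in this document, but the exact layout and the helper `get_subarray` are illustrative assumptions.

```python
# Flat storage for three subarrays of different lengths (assumed layout).
content = [1.1, 2.2, 3.3, 4.4, 5.5]
starts = [0, 3, 3]  # where each subarray begins in content
stops = [3, 3, 5]   # where each subarray ends (exclusive)

def get_subarray(i):
    """Return subarray i as a slice of the flat content (hypothetical helper)."""
    return content[starts[i]:stops[i]]

# The outer length is fixed by the number of offsets; inner lengths vary freely.
lengths = [stop - start for start, stop in zip(starts, stops)]

print(len(starts))      # 3
print(lengths)          # [3, 0, 2]
print(get_subarray(2))  # [4.4, 5.5]
```

The outer size (3 here) is a single number that a type can record, whereas the inner sizes depend on the data, so the type uses one placeholder for all of them.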


Below is a jagged array whose contents are a Numpy array of fixed shape `(3, 5)`.

In [6]:
print(ArrayType(10, numpy.inf, 3, 5, float))

[0, 10) -> [0, inf) -> [0, 3) -> [0, 5) -> float64


Below is a jagged array of jagged arrays of jagged arrays.

In [7]:
print(ArrayType(10, numpy.inf, numpy.inf, numpy.inf, float))

[0, 10) -> [0, inf) -> [0, inf) -> [0, inf) -> float64


Below is a jagged array whose index (`starts` and `stops`) has shape `(3, 5)`.

In [8]:
print(ArrayType(3, 5, numpy.inf, float))

[0, 3) -> [0, 5) -> [0, inf) -> float64


Numpy has structured arrays, which can be indexed by enumerated strings. This becomes a `Table` in awkward-array, which has an `awkward.type.TableType`.

In [9]:
print(from_array(numpy.array([(0, 0.0), (1, 1.1), (2, 2.2), (3, 3.3), (4, 4.4), (5, 5.5), (6, 6.6), (7, 7.7), (8, 8.8), (9, 9.9)], dtype=[("one", int), ("two", float)])))

[0, 10) -> 'one' -> int64
           'two' -> float64


We can construct the same thing by hand using the `&` operator (or the `awkward.type.TableType` constructor directly).

In [11]:
print(ArrayType(10, ArrayType("one", int) & ArrayType("two", float)))

[0, 10) -> 'one' -> int64
           'two' -> float64
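
A table type can be read as a function from field names to types, composed with the usual function from integers. The sketch below uses a plain dictionary of columns as a stand-in for a table. The names `table` and `row` are hypothetical and the data are arbitrary.

```python
# Columnar stand-in for [0, 10) -> 'one' -> int64, 'two' -> float64.
table = {
    "one": list(range(10)),
    "two": [i * 1.1 for i in range(10)],
}

def row(table, i):
    """Gather entry i from every column into one record (hypothetical helper)."""
    return {name: column[i] for name, column in table.items()}

# String first, then integer ...
by_column = table["one"][3]
# ... or integer first, then string: both reach the same value.
by_row = row(table, 3)["one"]

print(by_column, by_row)  # 3 3
```

Numpy structured arrays behave the same way: `a["one"][3]` and `a[3]["one"]` select the same value.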


Unlike the fields of a Numpy structured array, the fields of a table can hold variable-length (jagged) data.

In [12]:
print(ArrayType(10, ArrayType("one", numpy.inf, int) & ArrayType("two", float)))

[0, 10) -> 'one' -> [0, inf) -> int64
           'two' -> float64


In [13]:
tree = ArrayType("node_value", int)
tree["children"] = ArrayType(numpy.inf, tree)
print(tree)

T0 := 'node_value' -> int64
      'children'   -> [0, inf) -> T0
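
The label `T0` marks a self-reference: the type of `'children'` is a variable-length list of the very type being defined. A rough Python analogue, assuming a dataclass is an acceptable stand-in for a table row (the class `Node` and function `total` are hypothetical), could look like this:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    node_value: int                                        # 'node_value' -> int64
    children: List["Node"] = field(default_factory=list)   # 'children' -> [0, inf) -> T0

def total(node):
    """Sum node_value over a node and all of its descendants."""
    return node.node_value + sum(total(child) for child in node.children)

tree_data = Node(1, [Node(2), Node(3, [Node(4)])])
print(total(tree_data))  # 10
```

The recursion ends at nodes whose `children` list is empty, which the `[0, inf)` domain permits.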


In [14]:
tree = ArrayType("node_value", int)
tree["left"] = OptionType(tree)
tree["right"] = OptionType(tree)
print(tree)

T0 := 'node_value' -> int64
      'left'       -> ?(T0)
      'right'      -> ?(T0)
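
Here `?(T0)` denotes an optional `T0`: either a node or the N/A singleton. In this binary tree it is the option, rather than an empty list, that lets the recursion stop. A minimal Python analogue, with `None` standing in for N/A (the class `BinaryNode` and function `depth` are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BinaryNode:
    node_value: int                       # 'node_value' -> int64
    left: Optional["BinaryNode"] = None   # 'left'  -> ?(T0)
    right: Optional["BinaryNode"] = None  # 'right' -> ?(T0)

def depth(node):
    """Number of nodes on the longest path from this node down to a leaf."""
    if node is None:  # the N/A case ends the recursion
        return 0
    return 1 + max(depth(node.left), depth(node.right))

root = BinaryNode(1, left=BinaryNode(2, right=BinaryNode(3)))
print(depth(root))  # 3
```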


In [15]:
print(ArrayType(3, int) | ArrayType(5, float))

([0, 3) -> int64   |
 [0, 5) -> float64 )
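
The `|` operator builds a sum type: a value is one alternative or the other. The list at the top of this document calls these tagged unions, and the sketch below shows one way "tagged" could work for an array whose elements are either integers or floats. The `tags`/`index`/`contents` layout and the helper `get` are assumptions for illustration and are not described in this document.

```python
# One content list per alternative of the union (assumed layout).
contents = [
    [1, 2, 3],    # alternative 0: integers
    [1.1, 2.2],   # alternative 1: floats
]
tags = [0, 1, 0, 0, 1]   # which alternative each element belongs to
index = [0, 0, 1, 2, 1]  # position within that alternative's content

def get(i):
    """Look up element i by following its tag, then its index (hypothetical helper)."""
    return contents[tags[i]][index[i]]

values = [get(i) for i in range(len(tags))]
print(values)  # [1, 1.1, 2, 3, 2.2]
```

The tag is what makes the union recoverable: given any element, it is always known which alternative's type applies.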


In [16]:
print(ArrayType("one", int) & ArrayType("two", float) | ArrayType("uno", bool) & ArrayType("dos", int) & ArrayType("tres", float))

('one' -> int64
 'two' -> float64  |
 'uno'  -> bool
 'dos'  -> int64
 'tres' -> float64 )


In [17]:
print(ArrayType(10, ArrayType("one", int) & ArrayType("two", float)) | ArrayType("uno", bool) & ArrayType("dos", int) & ArrayType("tres", float) | ArrayType(5, 3, numpy.inf, float))

([0, 10) -> 'one' -> int64
            'two' -> float64            |
 'uno'  -> bool
 'dos'  -> int64
 'tres' -> float64                      |
 [0, 5) -> [0, 3) -> [0, inf) -> float64 )
