### 7.1 Types

#### 7.1.1 Basic DataTypes

A type can be specified in three equivalent ways: a char, a short and a symbol.
https://code.kx.com/q4m3/7_Transforming_Data/

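| Type | Name symbol | Char | Short |
|---|---|---|---|
| boolean | `boolean | B | 1h |
| int | `int | I | 6h |
| long | `long | J | 7h |
| real | `real | E | 8h |
| float | `float | F | 9h |
| char | `char | C | 10h |
| symbol | ` | S | 11h |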


#### 7.1.2 The type Operator

The non-atomic monadic function (type) can be applied to any entity in q to return its data type expressed as a short. It is a "feature" of q that the data type of an atom is negative whereas the type of a simple list is positive.

In [1]:
type 2
type 2 3
type `a
type `a`n`c
type `anc
type "z"
type "zxc"

-7h


7h


-11h


11h


-11h


-10h


10h
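
The sign convention gives a quick way to tell atoms from lists. A minimal sketch, assuming a helper named isAtom (a hypothetical name, not a q built-in) and only the basic data shown in this section:

```q
/ isAtom is a hypothetical helper: atoms have a negative type short
isAtom:{0>type x}
isAtom 42          / 1b
isAtom 42 43       / 0b
isAtom `a          / 1b
isAtom "zxc"       / 0b
```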


Infinities and nulls have their respective types.

In [2]:
type 0w
type 0W
type -0w
type -0W
type 0N
type `

-9h


-7h


-9h


-7h


-7h


-11h


The type of any general list is 0h.

In [3]:
type 1 2 3
type (1; 2; 3; `a)

7h


0h


The type of any dictionary, including a keyed table, is 99h

In [4]:
type (`a`b`c!10 20 30)
type ([k:`a`b`c] v:10 20 30)

99h


99h


The type of any table is 98h

In [5]:
type ([] c1:`a`b`c; c2:10 20 30)

98h


#### 7.1.3 Type of a Variable

The type of a variable is the type of the value associated with the variable's name.

In [6]:
a:42
type a
a:`afff
type a

-7h


-11h


**Global variables are stored in ordinary q dictionaries with special names. For example, the symbol `. is the name of the default global dictionary. Start a fresh q session and observe the life of this dictionary using value.**

In [7]:
value `.

help | {[gh;h;x]if[10=type u:gh[h]x;-2 u]}[{[h;x]$[i.isf x;h x;i.isw x;h x`.;..
print| {x y;}[{[f;x]embedPy[f;x]}[foreign]enlist]
a    | `afff


### 7.2 Cast

Since q is dynamically typed, casting occurs at run-time using the dyadic operator $, which is atomic in both operands. The right operand is the source value and the left operand specifies the target type. There are three ways to specify the target type, indicated by the first three columns of the type table at the beginning of this chapter.

- A (positive) numeric short type value
- A char type value
- A type name symbol


#### 7.2.1 Casts that Widen

In these examples no information is lost in the cast: either the target type is wider than the source type or, as in the long-to-int casts, the value fits in the narrower target.

Here are examples **using the short type** specification in the target.

In [8]:
7h$42i / int to long
6h$42 / long to int
9h$42 / long to float

42


42i


42f


It is arguably more readable to **use the type char.** 

In [9]:
"j"$42i
"i"$42
"f"$42

42


42i


42f


It is arguably most readable to use the symbolic type name.

In [10]:
`int$42
`long$42i
`float$42

42i


42


42f


#### 7.2.2 Casts across Disparate Types

- **char <-> integers**


The underlying value of a char is its position in the ASCII collation sequence, so **we can cast char to and from integers, provided the integer is less than 256.**

In [11]:
`char$42
`int$"*"
`long$"\n"

"*"


42i


10
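
Because cast is applied item by item, the same conversion works on a whole string. A small sketch of the round trip between a string and its ASCII codes:

```q
`int$"abc"                 / 97 98 99i
`char$97 98 99             / "abc"
"abc"~`char$`int$"abc"     / 1b
```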


- **date <-> int**

The underlying value of a date is its count of days from the millennium, so we can cast to and from an int.

In [12]:
`date$0

`int$2001.01.01

2000.01.01


366i
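
The same day count can be used in the other direction, or obtained by subtracting dates. A brief sketch:

```q
`date$366                  / 2001.01.01
2001.01.01-2000.01.01      / 366 days apart
(`date$366)~2001.01.01     / 1b
```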


- **timespan <-> long**

The underlying value of a timespan is its count of nanoseconds from midnight, so we can cast it to and from long.

In [13]:
`long$12:00:00.000000000
`timespan$0

43200000000000


0D00:00:00.000000000
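
As a further illustration of the nanosecond count, one second corresponds to one billion:

```q
`timespan$1000000000       / 0D00:00:01.000000000
`long$0D00:00:01           / 1000000000
```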


#### 7.2.3 Casts that Narrow

Some casts lose information. This includes the usual suspects of float to integer and wider integers to narrower ones.

In [14]:
`long$12.345
`short$123456789

12


32767h


In [15]:
`boolean$123
`boolean$0.00

1b


0b
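
Judging from the two results above, a cast to boolean maps zero to 0b and any other number to 1b. A short sketch over lists:

```q
`boolean$0 1 2 -3          / 0111b
`boolean$0.0 0.5           / 01b
```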


We can also **extract constituents from complex types.**

This is to be preferred over dot notation since the latter does not work inside functions.

In [16]:
`date$2015.01.02D10:20:30.123456789
`year$2015.01.02
`month$2015.01.02
`mm$2015.01.02
`dd$2015.01.02
`hh$10:20:30.123456789
`minute$10:20:30.123456789
`uu$10:20:30.123456789
`second$10:20:30.123456789
`ss$10:20:30.123456789

2015.01.02


2015i


2015.01m


1i


2i


10i


10:20


20i


10:20:30


30i
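
To illustrate the preference for casting, here is a sketch in which the extraction is wrapped in a function. The names d and yearOf are hypothetical, chosen only for this example:

```q
d:2015.01.02
d.year                     / dot notation on a global variable
yearOf:{`year$x}           / cast works on a function argument
yearOf 2015.01.02          / 2015i
d.year~yearOf d            / 1b
```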


#### 7.2.4 Casting Integral Infinities

When integral infinities are cast to integers of wider type, their underlying bit patterns are reinterpreted. Since these bit patterns are legitimate values for the wider type, the cast results in a finite value.

In [17]:
`int$0Wh

`int$-0Wh

`long$0Wi

`long$-0Wi


32767i


-32767i


2147483647


-2147483647


#### 7.2.5 Coercing Types

Casting can be used to coerce type-safe assignment. Recall that assignment into a simple list must strictly match the type.

This situation can arise when the list and the assignment value are created dynamically. Coerce the type by casting it to that of the target, provided of course that the cast is legitimate.

In [18]:
L:10 20 30 40
type L
/ L[1]:42h   would fail with a type error
/ L,:43h     would fail with a type error

L[1]:(type L)$42h  / coercing
L
L,:(type L)$43h    / coercing
L

7h


10 42 30 40


10 42 30 40 43
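
The coercion can be packaged for reuse. A minimal sketch, assuming x is a simple list and the cast is legitimate; appendAs is a hypothetical name, not a built-in:

```q
/ appendAs casts y to the type of the simple list x, then joins
appendAs:{x,(type x)$y}
appendAs[10 20 30;42h]         / 10 20 30 42
type appendAs[10 20 30;42h]    / 7h
appendAs[1.5 2.5;3]            / 1.5 2.5 3
```

Unlike the in-place forms above, this version returns a new list and leaves its argument unchanged.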


#### 7.2.6 Cast is Atomic

In [19]:
`float$(42j; 42i; 42j)    / Cast is atomic in the right operand.
`short`int`long$42        / Cast is atomic in the left operand.
`short`int`long$10 20 30  / Cast is atomic in both operands simultaneously.

42 42 42f


42h
42i
42


10h
20i
30


### 7.3 Data to and from Text

#### 7.3.1 Data to Strings

The function **string** can be applied to any q entity to produce a text representation suitable for console display or storage in a file. Here are the key features of string. (Note: no dollar sign $ is involved.)

- The result is always a list of char, never a single char. Thus you will see singleton char lists from single digits.
- The result contains no q type indicators or other decorations. In general, the result is the most compact representation of the input, which may not actually be convertible (i.e., parsed) back to the original value.
- Applying string to an actual string (i.e., list of char) probably will not give you what you want.
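
One consequence of the second point can be sketched as follows: values of different types can produce the same string, so the type cannot be recovered from the text alone.

```q
(string 42)~string 42i     / 1b
(string 2.0)~string 2      / 1b
```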

In [20]:
string 42
string 4
string 42i
string "42"
string `42
a:2.0
string a
f:{x*x}
string f

string (1 2 3; 10 20 30)

"42"


,"4"


"42"


,"4"
,"2"


"42"


,"2"


"{x*x}"


,"1" ,"2" ,"3"
"10" "20" "30"


Use string to convert a list (or column) of symbols to strings.

In [21]:
string `Life`the`Universe`and`Everything

"Life"
"the"
"Universe"
"and"
"Everything"


#### 7.3.2 Creating Symbols from Strings

Casting from a string (i.e., a list of char) to a symbol is a foolproof way to create symbols. It is the only way to create symbols with embedded blanks or other special characters that cannot be entered into a literal symbol. To cast a char or a string to a symbol, use **`$**

In [22]:
`$"abc"
`$"abc def"

`abc


`abc def


Do not use `symbol$ for this as it generates an error. This is a common qbie mistake.

The source string is left- and right-trimmed during the cast. The author knows no workaround to force leading or trailing blanks into a symbol.

In [23]:
string `$" abc "

"abc"


The monadic `$ is atomic and will thus convert an entire list (or column) of strings to symbols.

In [24]:
`$("Life";"the";"Universe";"and";"Everything")

`Life`the`Universe`and`Everything


#### 7.3.3 Parsing Data from Strings

- **number from string**

The $ operator is overloaded to parse strings into data of any type exactly as the q interpreter does. This overload is invoked by using an uppercase type char as the target left operand and a string in the right operand. If the specified parse cannot be performed, a null of the target type is returned – i.e., missing or bad data – instead of an exception.

In [25]:
"J"$"42"
"F"$"42"

42


42f
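
The null-on-failure behaviour described above can be seen with a string that is not a number. A short sketch:

```q
"J"$"abc"                  / 0N
null "J"$"abc"             / 1b
"J"$("42";"x";"7")         / 42 0N 7
```

Testing the result with null is a simple way to flag bad input after parsing a whole list or column.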


- **date from string**

Date parsing is flexible with respect to the format of the date.

In [26]:
"D"$"12.31.2014"
"D"$"12-31-2014"
"D"$"12/31/2014"

2014.12.31


2014.12.31


2014.12.31


- **function from string**

To create a function from a string, use the built-in value, which is the q interpreter, or parse, which is the parse step of the interpreter.

In [27]:
f1: value "{x*x}"
f2: parse "{x*x}"

f1 3
f2 4

9


16
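
The difference between the two shows up with an ordinary expression rather than a function definition. In this sketch, value evaluates the string, while parse stops at the parse tree, which eval can then evaluate:

```q
value "6*7"                / 42
parse "6*7"                / parse tree (*;6;7), not evaluated
eval parse "6*7"           / 42
```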


### 7.4 Creating Typed Empty Lists

In [28]:
L:()
type L / empty list
L
L,:42  / list changed type to long
type L
/ L,:3.14  / error as 3.14 is float

0h




7h


To avoid having the first appended item decide the type of the list, cast the empty list using the name of the desired type, which makes it an empty simple list of that type. Now only atoms of the specified type can be appended in place.

In [29]:
c1:`float$()
/ c1,:42 / error as 42 is long
c1,:42.0
c1,:3.14
c1

42 3.14
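
A common place for typed empty lists is an empty table whose columns should have fixed types. A minimal sketch; the table name t and its columns are assumptions made for this example:

```q
t:([] c1:`symbol$(); c2:`float$())
type t`c2                  / 9h
`t insert (`aapl;101.5)    / row types match the column types
count t                    / 1
```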


Notice that an operation that yields a simple list retains the type on an empty result. This yields a succinct idiom to create typed empty lists.

In [30]:
0#0
0#0.0
0#`

`long$()


`float$()


`symbol$()


There is no way in q to type nested empty lists.

### 7.5 Enumerations

**An enumeration is a mapping of symbols to integers.**

One of its purposes (along with the traditional use of enumerated types) is **data normalization.**

Broadly speaking, data normalization seeks to eliminate duplication, retaining only the minimum required data. In the archetypal example, suppose you know that you will have a list of text entries taken from a fixed and reasonably short set of values – e.g., stock exchange ticker symbols. Storing a long list of such strings verbatim presents two problems.

- Values of variable length complicate storage management and make retrieval inefficient.
- There is potentially much duplication of data arising from repeated values. This is hard to keep in sync when values change.

The key ingredients are a (presumably repetitive) list v of symbols drawn from a unique list of symbols u. As in the case of ticker symbols, it may be that we know the list u in advance.

In [31]:
u:`g`aapl`msft`ibm
v:1000000?u
v

`g`g`msft`aapl`msft`aapl`msft`ibm`msft`aapl`g`ibm`aapl`msft`msft`aapl`g`aapl`..


In [32]:
k:u?v  / index in u of each item of v
k

0 0 2 1 2 1 2 3 2 1 0 3 1 2 2 1 0 1 2 0 1 1 2 3 0 1 2 1 2 1 0 0 1 2 1 2 3 3 0..


In [33]:
u[k]
v~u[k] / v is composition of u and k

`g`g`msft`aapl`msft`aapl`msft`ibm`msft`aapl`g`ibm`aapl`msft`msft`aapl`g`aapl`..


1b
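
When the unique list is not known in advance, it can be derived from the data with distinct. A small sketch using separate names (u2, v2, k2, all hypothetical) so the variables above are left untouched:

```q
v2:`g`g`msft`aapl`msft`ibm
u2:distinct v2             / `g`msft`aapl`ibm
k2:u2?v2                   / 0 0 1 2 1 3
v2~u2 k2                   / 1b
```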


#### 7.5.3 Enumerating Symbols

**The process of converting a list of symbols to the equivalent list of indices described in the previous section is called enumeration in q.** It uses (yet another overload of) $ with the name of the variable holding the unique symbols as the left operand and a list of symbols drawn from that domain on the right.

Under the covers, $ does the indexing operation in the previous section and then replaces each symbol with its index.

In [34]:
`u$v

`u$`g`g`msft`aapl`msft`aapl`msft`ibm`msft`aapl`g`ibm`aapl`msft`msft`aapl`g`aa..


You can recover the underlying integer values (i.e., k above) by casting to an integer.

In [35]:
`int$(`u$v)

0 0 2 1 2 1 2 3 2 1 0 3 1 2 2 1 0 1 2 0 1 1 2 3 0 1 2 1 2 1 0 0 1 2 1 2 3 3 0..


Let's summarize. The basic form of an enumerated symbol is,

**`u$v**

where u is a simple list of unique symbols and v is either an atom appearing in u or a (possibly nested) list of such. We call u the domain of the enumeration and the projection `u$ is enumeration over u. Under the covers, applying the enumeration `u$ to a vector v produces the index list k as above.

For this style of enumeration, all potential values must be in the list u; otherwise you will get a 'cast error when trying to enumerate.
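
The failure can be caught with a protected evaluation. This is only a sketch: dom is a hypothetical domain name, and the handler simply returns the trapped message as a string.

```q
dom:`a`b`c
@[{`dom$x};`b;{"error: ",x}]    / `dom$`b
@[{`dom$x};`z;{"error: ",x}]    / returns the trapped message instead of failing
```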

**sym**

When working with tables in kdb+, by convention all symbol columns in all tables are enumerated over a common domain sym. You will hear this referred to as the **sym list or the sym file**, depending on where it resides.

In [35]:
sym

sym: sym

#### 7.5.4 Using Enumerated Symbols

In [36]:
sym:`g`aapl`msft`ibm
v:1000000?sym
v
ev:`sym$v
ev

`msft`g`msft`g`aapl`ibm`msft`g`g`g`msft`msft`aapl`g`g`ibm`g`aapl`ibm`g`aapl`i..


`sym$`msft`g`msft`g`aapl`ibm`msft`g`g`g`msft`msft`aapl`g`g`ibm`g`aapl`ibm`g`a..


The enumerated ev can be substituted for the original v in nearly all situations.

In [37]:
v[3]
ev[3]

`g


`sym$`g


While the enumerated version is item-wise equal to the original, the entities are not identical.

In [38]:
all v=ev
v~ev    / type matters

1b


0b


#### 7.5.5 Type of Enumerations

Each enumeration is assigned a new numeric data type, beginning with 20h. Starting with q version 3.2, the type 20h is reserved for the conventional enumeration domain sym, whether you use it or not (you should). The types of other enumerations you create will begin with 21h and proceed sequentially. The convention of negative type for atoms and positive type for simple lists still holds. Note that in the session below an enumeration over sym1 also reports 20h, so this numbering evidently does not hold for the q version used to run these examples.

In [39]:
sym:`b`c`a
type `sym$100?sym

20h


In [40]:
sym1:`g`aapl`msft`ibm
type `sym1$1000000?sym1

20h


Enumerations with different domains are distinct, even when all the constituents are the same.
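
The sign convention can be checked on an enumerated atom. A brief sketch over the conventional domain sym, redefined here so the block is self-contained:

```q
sym:`b`c`a
type `sym$`b               / -20h
type `sym$`b`c             / 20h
```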

#### 7.5.6 Updating an Enumerated List

The normalization provided by an enumeration reduces updating all occurrences of a given value to a single operation. This can have significant performance implications for large lists with many repetitions. Continuing with our example above, suppose the list sym contains the items in a stock index and we wish to change one of the constituents. A **single update** to sym suffices.

In [41]:
sym:`g`aapl`msft`ibm
ev:`sym$`g`g`msft`ibm`aapl`aapl`msft`ibm`msft`g`ibm`g
sym[0]:`bbq
ev   / `g is replaced by `bbq in every slot of ev


`sym$`bbq`bbq`msft`ibm`aapl`aapl`msft`ibm`msft`bbq`ibm`bbq


In contrast, to make the equivalent update to v requires changing **every** occurrence.
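
For comparison, one way to make that change on a plain symbol list is to locate every occurrence and assign over all of them. A sketch using a hypothetical list w:

```q
w:`g`g`msft`ibm`aapl`g
w[where w=`g]:`bbq         / touches every matching slot
w                          / `bbq`bbq`msft`ibm`aapl`bbq
```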

**The sym list should not be updated manually!**

#### 7.5.7 Dynamically Appending to an Enumeration Domain

One situation in which an enumeration is more complicated than working with the denormalized data is when you want to add a new value.

For a plain list it is a single operation, but for an enumeration the value must first be added to the domain.

In [42]:
sym:`a`b`c`d
v:10000?sym
v
ev:`sym$v
ev

`d`d`b`d`d`c`a`b`b`b`a`b`d`b`b`c`c`a`c`a`a`a`b`a`a`d`c`a`c`a`c`d`d`b`d`c`d`b`..


`sym$`d`d`b`d`d`c`a`b`b`b`a`b`d`b`b`c`c`a`c`a`a`a`b`a`a`d`c`a`c`a`c`d`d`b`d`c..


In [47]:
v,:`h / list

/ enumeration
sym,:`h
ev,:`h 
ev

`sym$`d`d`b`d`d`c`a`b`b`b`a`b`d`b`b`c`c`a`c`a`a`a`b`a`a`d`c`a`c`a`c`d`d`b`d`c..


Fortunately, q has anticipated this situation.

If you cannot know the full extent of the enumeration domain in advance, you can use (yet another overload of) ? to create the domain on the fly. The syntax of ? is the same as the enumeration overload of $ – i.e., the name of a (unique) list of symbols as left operand and a source symbol or list of symbols as right operand.

This application of ? first checks whether the source symbols are in the domain named by the left operand and, as a side effect, appends any that are not. In either case it returns the enumerated version of the source, just like $.

In [48]:
sym:()
/ `sym$`aaa  / error as aaa is not in sym
`sym?`aaa
sym


`sym$`aaa


,`aaa
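
The same overload accepts a list, in which case each new symbol is added to the domain once. A short sketch with a hypothetical domain dom2:

```q
dom2:`symbol$()
`dom2?`a`b`a               / `dom2$`a`b`a
dom2                       / `a`b
```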


The idiom for appending a new value to an enumeration:

In [53]:
sym:`a`b`c`d
v:`d`b`d`a`c`d
v
ev:`sym$v
ev
`sym?`twtr
ev,:`sym?`twtr
ev
sym

`d`b`d`a`c`d


`sym$`d`b`d`a`c`d


`sym$`twtr


`sym$`d`b`d`a`c`d`twtr


`a`b`c`d`twtr


#### 7.5.8 Resolving an Enumeration

An enumerated symbol can be substituted for its equivalent symbol value in most expressions. However, there are some situations in which you need non-enumerated values. One case is converting from one enumeration domain to another, which happens when copying from one kdb+ database to another or in merging two databases.

Given an enumerated symbol, or a list of such, you can recover the un-enumerated value(s) by applying the built-in value.

In [55]:
sym:`g`aapl`msft`ibm
v:1000000?sym
ev:`sym$v
value ev
v~value ev

`msft`aapl`ibm`msft`g`msft`aapl`ibm`ibm`g`ibm`g`g`ibm`aapl`msft`msft`g`g`ibm`..


1b
