Roughly speaking, kdb+ is what happens when q tables are persisted and then mapped back into memory for operations. 

### 14.1 Tables in Memory and Serialization

It is possible to maintain a table entirely in memory, provided you have enough physical memory to hold it. There is one problem with this from a database perspective:

An in-memory table is ephemeral – meaning that all modifications are lost if the q process dies.
One solution is to serialize the table to persistent storage using set or similar mechanisms. In this section we recapitulate material from previous chapters from this perspective.

#### 14.1.1 Tables and Keyed Tables

**A table is the flip of a column dictionary, in which address slots are reversed but no data is moved during the transpose.** For example, here is a table with two simple list columns.

In [1]:
flip `s`v!(`a`b`c;100 200 300)

s v  
-----
a 100
b 200
c 300


Table definition syntax permits tables to be defined in more readable format.

In [2]:
([] s:`a`b`c; v:100 200 300)

s v  
-----
a 100
b 200
c 300


The schema of a table has the same form but with empty columns – i.e., no rows.

In [3]:
([] s:`symbol$(); v:`int$())

s v
---


It is good practice to prototype the empty lists in a schema; unfortunately, this is not possible if the corresponding columns are not simple lists.

The type of any table is **98h** and the function **meta** summarizes the column names, types and attributes in a result keyed table.

In [4]:
meta ([] s:`symbol$(); v:`int$())

c| t f a
-| -----
s| s    
v| i    


**A keyed table** is a dictionary that establishes positional correspondence between a table of (presumably unique) keys and a table of values.

In [5]:
([] id:1001 1002 1003)!([] s:`a`b`c; v:100 200 300)

id  | s v  
----| -----
1001| a 100
1002| b 200
1003| c 300


Table definition syntax is more compact.

In [6]:
([id:1001 1002 1003] s:`a`b`c; v:100 200 300)

id  | s v  
----| -----
1001| a 100
1002| b 200
1003| c 300


The type of any keyed table is **99h**, since it is a dictionary, and meta applies exactly as with tables.
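
As a quick check of the dictionary view, here is a minimal sketch that takes a keyed table apart with key and value, looks up a single key, and round-trips through the unkeyed form. The names unkeyed and rekeyed are chosen here for illustration only.

```q
kt:([id:1001 1002 1003] s:`a`b`c; v:100 200 300)
type kt                     / 99h: a keyed table is a dictionary
key kt                      / the table of keys
value kt                    / the table of values
kt 1002                     / look up one key; the result is that row as a dictionary
unkeyed:0!kt                / unkey: an ordinary table of type 98h
rekeyed:`id xkey unkeyed    / key it again on id
```

The round trip through 0! and xkey should give back a table that matches the original kt.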

#### 14.1.2 Foreign Keys and Link Columns

**A foreign key** is one or more table columns that are enumerated over the key component of a keyed table. For example, the column id in the table below is a foreign key over kt. Note that the foreign-key column is identified by the name of its target table in the result of meta.

In [17]:
kt:([id:1001 1002 1003] s:`a`b`c; v:100 200 300) / keyed
t:([]; id:`kt$1002 1001 1003 1001; q:100 101 102 103)
t

id   q  
--------
1002 100
1001 101
1003 102
1001 103


In [18]:
meta t

c | t f  a
--| ------
id| j kt  
q | j     


A query on a table having a foreign key can access columns in the keyed table via dot notation.

In [9]:
select id.v, q from t

v   q  
-------
200 100
100 101
300 102
100 103


**A link column** is similar to a foreign key, in that its entries are indices of rows in a table, but you must perform the lookup manually. The advantages of link columns are:

- The target can be a table or keyed table.
- The target can be the table containing the link column.
- Link columns can be splayed or partitioned, whereas foreign keys cannot.


Here is the previous foreign-key example redone with a link column against a table.

In [11]:
tk:([] id:1001 1002 1003; s:`a`b`c; v:100 200 300) / not keyed

In [19]:
t:([]; id:`tk!(exec id from tk)?1002 1001 1003 1001; q:100 101 102 103)
t

id q  
------
1  100
0  101
2  102
0  103


In [20]:
meta t

c | t f  a
--| ------
id| j tk  
q | j     
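
The manual step with a link column is building the row indices; once the link exists, dot notation can follow it much as with a foreign key. The sketch below assumes the target ids are 1001 1002 1003 and uses the hypothetical table name lnk so that it does not overwrite t.

```q
tk:([] id:1001 1002 1003; s:`a`b`c; v:100 200 300)    / target table, not keyed
/ build the link by looking up each id's row index in tk
lnk:([] id:`tk!(exec id from tk)?1002 1001 1003 1001; q:100 101 102 103)
select id.s, id.v, q from lnk                          / follow the link with dot notation
```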


Here is an example that uses a link column to implement a **hierarchical structure in a table**. The column pid is a link column that relates a row to its parent row.

In [22]:
tree:([] id:0 1 2 3 4; pid:`tree!0N 0 0 1 1; v:100 200 300 400 500)
tree

id pid v  
----------
0      100
1  0   200
2  0   300
3  1   400
4  1   500


In [23]:
select from tree where pid=0 / find children of root

id pid v  
----------
1  0   200
2  0   300
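
Because pid is a link back into tree itself, dot notation can also pull values from the parent row. This is a small sketch; the result column name parentv is an assumption made for illustration, and the root row has no parent to resolve.

```q
tree:([] id:0 1 2 3 4; pid:`tree!0N 0 0 1 1; v:100 200 300 400 500)
/ for each row, fetch v from the row that pid points at
select id, v, parentv:pid.v from tree
```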


#### 14.1.3 Serializing Tables

It is possible to persist any table (or keyed table) using the general q serialization/deserialization capability of **set and get**. There is no restriction on table or column types.

In [14]:
`:ex_tables/table1 set ([] s:`a`b`c; v:100 200 300)

`:ex_tables/table1


In [15]:
show t2:get `:ex_tables/table1 / C:/Dev/kdb_q/notebooks/q_for_mortals/t1


s v  
-----
a 100
b 200
c 300


You can serialize foreign keys and link columns and bring them back into memory.

#### 14.1.4 Operating on Serialized Tables

You operate on a serialized table by loading it into memory with get or \l.

In [16]:
t1:0
\l ex_tables/table1
table1

`table1


s v  
-----
a 100
b 200
c 300


In [17]:
t2: get `:ex_tables/table1
t2

s v  
-----
a 100
b 200
c 300


Alternatively, you can perform a query on a serialized table by specifying its file handle as the table name.

In [18]:
select from `:ex_tables/table1 where s in `a`c

s v  
-----
a 100
c 300


In [19]:
`:ex_tables/table1 upsert (`x;42)

`:ex_tables/table1


In [20]:
select from `:ex_tables/table1

s v  
-----
a 100
b 200
c 300
x 42 


Similar operations are available on keyed tables.

The limitation to using a serialized table or keyed table is that, behind the scenes, the operations load it into memory and write it back out. Amongst other things, this means that anyone wanting to work with it must be able to fit it into memory in its entirety.

### 14.2 Splayed Tables

In the previous section, we saw that it is possible to persist tables using serialization. From a database perspective there are (at least) two issues with operating on serialized tables due to the fact that the entire table is loaded into memory.

- The entire table must fit into memory on each user’s machine.
- Operations against the persisted table will be slow due to reloading the entire table each time.

When a table is too large to fit into memory as a single entity, we can persist its components into a directory. This is called **splaying** the table because the table is pulled apart into its constituent columns.

Splaying solves the memory/reload issue because a splayed table is mapped into memory; columns are loaded on demand then memory is released when no longer needed. Tables with many columns especially benefit from splaying since most queries refer to only a handful of columns and only those columns will actually be loaded.

A splayed table corresponds to a directory whose name is the table name. Each column list of the table is serialized into a file whose name is the column name.

A list of the symbolic column names is serialized to the hidden file .d in the directory to record column order. This is the only metadata stored by kdb+; all other metadata is read from directory and file names.

Customarily the splayed directory is created one level down from a directory that serves as the root of the database.


- /root
    - /tablename <- splayed table directory
        - .d <- file with column names
        - column1name <- column data file
        - column2name <- column data file
        - …

#### 14.2.1 Creating Splayed Tables

To splay a table, use set with a file handle for the table directory. **Make sure you include the trailing / in the file handle; otherwise, you will serialize the table into a single file.**

In [1]:
`:ex_splayed/table2/ set ([] v1:10 20 30; v2:1.1 2.2 3.3)

`:ex_splayed/table2/


It is also possible to create a splayed table with upsert, or with the equivalent generalized application, using the file handle as the table name. When the file does not exist, these act like set.

In [3]:
`:ex_splayed/table3/ upsert ([] v1:10 20 30; v2:1.1 2.2 3.3)

`:ex_splayed/table3/


Reading the constituents of the splayed directory with get demonstrates that they are simply serialized q entities.

In [5]:
get `:ex_splayed/table3/v1
get `:ex_splayed/table3/v2
get `:ex_splayed/table3/.d

10 20 30


1.1 2.2 3.3


`v1`v2


**Restrictions:**

1) Tables can be splayed. Keyed tables cannot.
   Hence foreign keys are not available between splayed tables, but link columns can be splayed.
   
2) Only columns that are simple lists or compound lists can be splayed. By compound list we mean a list of simple lists of uniform type.

3) All symbol columns must be enumerated.

#### 14.2.2 Splayed Tables with Symbol Columns

The convention for symbol columns in splayed (and partitioned) tables is that all symbol columns in all tables are enumerated over the list **sym**, which is serialized into the root directory.

The **.Q.en** utility enumerates the symbol columns of a table over sym and writes the sym file into the root directory you pass to it.

In [17]:
`:ex_splayed/table4/ set .Q.en[`:ex_splayed;] ([] s1:`a`b`c; v:10 20 30; s2:`x`y`z)

`:ex_splayed/table4/


In [4]:
\l ex_splayed / loading entire db: sym file + tables

In [5]:
sym

`a`b`c`x`y`z


In [9]:
select from table3

v1 v2 
------
10 1.1
20 2.2
30 3.3


In [10]:
select from table4

s1 v  s2
--------
a  10 x 
b  20 y 
c  30 z 


#### 14.2.3 Splayed Tables with Nested Columns

The only nested columns that can be splayed are what we call compound lists – i.e., **lists of simple lists of uniform type**: every item of the column is a simple list, and all of those lists have the same type. **The most common example is a list of strings, which is a list of lists of char**. A compound column is indicated by an upper case letter in the result of meta.

In [4]:
show cc:([] ci:(1 2 3; enlist 4; 5 6); cstr:("abc";enlist"d";"ef"))
meta cc

ci    cstr 
-----------
1 2 3 "abc"
,4    ,"d" 
5 6   "ef" 


c   | t f a
----| -----
ci  | J    
cstr| C    

By contrast, a column whose items are not of uniform type is a general list. meta shows a blank type for it, and it cannot be splayed.


In [5]:
cc1:([] c:(1;1,1;`1))
meta cc1

c| t f a
-| -----
c|      


- One question that always arises when designing a kdb+ database is **whether to store text data as symbols or strings**. The advantage of symbols is that they have atomic semantics and, since they are integers under the covers once they are enumerated, processing is quite fast. The main issue with symbols is that if you make all text into symbols, your sym list gets enormous and the advantages of enumeration disappear.


- In contrast, strings do not pollute the sym list with one-off instances and are reasonably fast. The disadvantage is that they are not first class and you must revert to teenage years by using like to match them.


- Only make text columns into symbols when the fields will be drawn from a small, reasonably stable domain and there is significant repetition in their use. When in doubt, start with a string column. It is much easier to convert a string column to symbols than it is to remove symbols from the sym list.


- A text column that is drawn from a fixed list or a lookup table is an ideal candidate. So are invariant keys, provided the key domain is small and will not grow unreasonably. On the other hand, fields such as comments or notes should always be strings.
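
To make the trade-off concrete, here is a small in-memory sketch. The table notes, its columns and its data are hypothetical; tag plays the role of a small, stable domain and memo the role of free text.

```q
notes:([] tag:`buy`sell`buy; memo:("first fill";"partial";"first of day"))
select from notes where tag=`buy              / symbols compare atomically with =
select from notes where memo like "first*"    / strings are matched with like
/ converting a string column to symbols later is a one-line cast
promoted:update memo:`$memo from notes
```

Going the other way on a persisted database is the hard part, since the symbols would already be in the sym list.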

#### 14.2.4 Basic Operations on Splayed Tables

In [27]:
`:ex_splayed14_2_4/table5/ set .Q.en[`:ex_splayed14_2_4;] ([] s1:`a`b`cc; v:10 20 30; s2:`x`y`zz)

`:ex_splayed14_2_4/table5/


To operate on a splayed table you can map it into memory in one of two ways:

- You can specify a splayed table on the q startup command immediately after the q executable.
- Alternatively, you can use \l to map the table.

**After that step the table is not loaded, but mapped into memory.**

The illusion that the table is actually in memory after it is mapped is convincing. Many fundamental table operations work on splayed tables.

In [28]:
\l ex_splayed14_2_4

In [29]:
sym
table5

`a`b`c`x`y`z`cc`zz


s1 v  s2
--------
a  10 x 
b  20 y 
cc 30 zz


You cannot use dot notation to extract columns from a splayed table but you can extract a column with symbol indexing.

In [30]:
table5 `s1

`sym$`a`b`cc


In [30]:
table5.s1

s1: s1

- You can use both select and exec templates on a splayed table.
- This contrasts with partitioned tables where you can only use select.

#### 14.2.5 Operations on a Splayed Directory

As of this writing (Sep 2015), the table operations available against the file handle of a splayed table are:
- select
- exec
- upsert
- xasc
- `attr# (apply an attribute).

In [30]:
select from `:ex_splayed14_2_4/table5

ex_splayed14_2_4/table5. OS reports: The system cannot find the path specified.: ex_splayed14_2_4/table5. OS reports: The system cannot find the path specified.

In [3]:
exec v from `:ex_splayed14_2_4/table5

10 20 30


In [10]:
`v xdesc `:ex_splayed14_2_4/table5
select from `:ex_splayed14_2_4/table5

`:ex_splayed14_2_4/table5


s1 v  s2
--------
cc 30 zz
b  20 y 
a  10 x 


We point out a source of confusion for qbies: the behavior of **update on a splayed table that has been mapped into memory.**

In [31]:
`:C:/Dev/kdb_q/notebooks/q_for_mortals/ex_splayed14_2_5/table6/ set .Q.en[`:C:/Dev/kdb_q/notebooks/q_for_mortals/ex_splayed14_2_5;] ([] s1:`a`b`cc; v:10 20 30; s2:`x`y`zz)

`:C:/Dev/kdb_q/notebooks/q_for_mortals/ex_splayed14_2_5/table6/


In [32]:
\l C:\Dev\kdb_q\notebooks\q_for_mortals\ex_splayed14_2_5
table6

s1 v  s2
--------
a  10 x 
b  20 y 
cc 30 zz


In [33]:
update v:300 from `table6 where s1=`cc
table6

`table6


s1 v   s2
---------
a  10  x 
b  20  y 
cc 300 zz


Update was not persisted!

Updates applied to a mapped table are only visible in the workspace and are not reflected on disk. 

In [34]:
\l C:\Dev\kdb_q\notebooks\q_for_mortals\ex_splayed14_2_5
table6

s1 v  s2
--------
a  10 x 
b  20 y 
cc 30 zz


###### Fundamental limitation of kdb+

It is not possible to use built-in operations to update data in persisted splayed tables.

**Kdb+ is intended to store data that is not updated or deleted once it has been written.** We shall see in the next section how to append to a splayed table, which makes it possible to process updates and deletes in a bitemporal fashion, but this capability is not available out of the box.



#### 14.2.6 Appending to a Splayed Table

Since upsert acts as insert on regular (non-keyed) tables and only non-keyed tables can be splayed, **we use upsert with the splayed directory name in order to append records to a splayed table on disk**. This is a good thing, since insert doesn’t work on splayed tables. Also, because symbol columns must be enumerated for splayed tables, it is best to make rows into tables.

In [1]:
`:C:/Dev/kdb_q/notebooks/q_for_mortals/ex_splayed14_2_6/table7/ set .Q.en[`:C:/Dev/kdb_q/notebooks/q_for_mortals/ex_splayed14_2_6;] ([] s1:`a`b`c; v:10 20 30; s2:`x`y`z)
select from `:C:/Dev/kdb_q/notebooks/q_for_mortals/ex_splayed14_2_6/table7/

`:C:/Dev/kdb_q/notebooks/q_for_mortals/ex_splayed14_2_6/table7/


s1 v  s2
--------
a  10 x 
b  20 y 
c  30 z 


In [2]:
`:C:/Dev/kdb_q/notebooks/q_for_mortals/ex_splayed14_2_6/table7 upsert .Q.en[`:C:/Dev/kdb_q/notebooks/q_for_mortals/ex_splayed14_2_6;] ([] s1:`d`e; v:40 50; s2:`u`v)

`:C:/Dev/kdb_q/notebooks/q_for_mortals/ex_splayed14_2_6/table7


In [3]:
select from `:C:/Dev/kdb_q/notebooks/q_for_mortals/ex_splayed14_2_6/table7

s1 v  s2
--------
a  10 x 
b  20 y 
c  30 z 
d  40 u 
e  50 v 


#### 14.2.7 Manual Operations on a Splayed Directory

Although there are no built-in operations to update splayed tables on disk, you can perform such operations by manipulating the serialized files.



The examples shown here should be used with caution, as none of the operations are atomic; they are simply file-system manipulation. Even read-only users could see inconsistent data, so things are best done when no other users are accessing the database.
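
As a minimal sketch of the kind of file manipulation meant here, the following replaces, adds and deletes a column by writing the column files and the .d file directly. The directory ex_manual, the table t and the column s are assumptions made for illustration; none of the steps is atomic.

```q
/ assumption: a scratch splayed table under the relative directory ex_manual
`:ex_manual/t/ set ([] ti:09:30:00 09:31:00; p:101.5 33.5)

/ replace a column: overwrite its file with a list of the same length
`:ex_manual/t/p set 101.5 42f

/ add a column: write the new column file, then append its name to .d
`:ex_manual/t/s set 100 200
`:ex_manual/t/.d set (get `:ex_manual/t/.d),`s

/ delete a column: remove its file, then drop its name from .d
hdel `:ex_manual/t/s
`:ex_manual/t/.d set (get `:ex_manual/t/.d) except `s

select from `:ex_manual/t
```

Every column file must keep the same length as the others, and .d must list exactly the columns present, or the table will no longer map correctly.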

#### 14.2.9 Splayed Tables with Link Columns

We previously pointed out that you cannot splay a keyed table, and therefore cannot have a foreign-key relation between splayed tables. However, you can splay a link column and then use dot notation just as you would with a foreign key. You must do the work of creating the index yourself, just as with link columns with tables in memory.

In our first example, we create the link at the same time as we splay the tables. This is the same as creating the link on in-memory tables and then splaying them.

In [41]:
t1:([] c1:`c`b`a; c2: 10 20 30)
t2:([] c3:`a`b`a`c; c4: 1. 2. 3. 4.)
update t1lnk:`t1!t1[`c1]?t2[`c3] from `t2
`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9/t1/ set .Q.en[`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9;] t1
`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9/t2/ set .Q.en[`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9;] t2


`t2


`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9/t1/


`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9/t2/


In [42]:
\l C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9

In [43]:
t1
t2
meta t2

c1 c2
-----
c  10
b  20
a  30


c3 c4 t1lnk
-----------
a  1  2    
b  2  1    
a  3  2    
c  4  0    


c    | t f  a
-----| ------
c3   | s     
c4   | f     
t1lnk| j t1  


In [44]:
select c3,t1lnk.c2,c4 from t2

c3 c2 c4
--------
a  30 1 
b  20 2 
a  30 3 
c  10 4 


**Example 2.**
Now we redo this example, assuming that the tables have already been splayed (here as t3 and t4). You could map the database into memory but let’s work directly with the files. We have the additional step of appending the link column to the .d file for t4.

In [46]:
`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0/t3/ set .Q.en[`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0;] ([] c1:`c`b`a; c2: 10 20 30)
`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0/t4/ set .Q.en[`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0;] ([] c3:`a`b`a`c; c4: 1. 2. 3. 4.)

`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0/t3/


`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0/t4/


In [47]:
`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0/t4/t3link set `t3!(get `:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0/t3/c1)?get `:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0/t4/c3

`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0/t4/t3link


In [49]:
.[`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0/t4/.d;();,;`t3link]  / append link column to .d

`:C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0/t4/.d


In [50]:
\l C:/Dev/kdb_q/notebooks/q_for_mortals/14_2_9_0

In [51]:
meta t4

c     | t f  a
------| ------
c3    | s     
c4    | f     
t3link| j t3  


### 14.3 Partitioned Tables

Some timeseries data is so large that even the individual columns may not fit into memory – for example, daily trades and quotes for an entire exchange. In this case, we can further decompose the table by slicing horizontally – called **partitioning in kdb+**. For example, the solution for trades and quotes is to slice into daily portions. The result is a collection of daily splayed directories, one for each day for which data exists.

_Tip_ <br>
**All partitioned tables are splayed but not all splayed tables are partitioned.**

#### 14.3.1 Partitions

**A partitioned table is a splayed table that is further decomposed by grouping records having common values along a column of special type.** The allowable special column types have the property that the underlying value is an **integer**: date, month, year and long.

The slice of records having a given value is splayed into a directory, called a partition, whose name is that common value. In the canonical finance example, historical trades (or quotes) are stored in daily partition directories – remember a q date is an integer under the covers.

/root <br>
    /partitionvalue1<br>
        /tablename<br>
            .d<br>
            column1name<br>
            column2name<br>
            …<br>
    /partitionvalue2<br>
        /tablename<br>
            .d<br>
            column1name<br>
            column2name<br>
            …<br>
        …<br>

#### 14.3.2 Partition Domain

We call the type of the virtual column for the partition the **partition domain**. As noted previously, the partition domain must have an underlying integral value.

You cannot use a symbol column as a partition domain, even if the symbols are enumerated. Date, month and year are fine, and so is any integer (bin).

A kdb+ database can only have a single partition domain, which is reflected in its directory structure. This means that you must create separate databases if you need partitions of different granularity. For example, you cannot have daily and monthly partitions in one database.

#### 14.3.3 Creating Partitioned Tables

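The virtual column gets its name automatically from the partition domain; for daily partitions it is date.

We create the table one partition at a time, splaying each day's slice into its own partition directory.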


In [9]:
`:C:/Dev/kdb_q/notebooks/q_for_mortals/db/2015.01.01/t/ set ([] ti:09:30:00 09:31:00; p:101 102f)

`:C:/Dev/kdb_q/notebooks/q_for_mortals/db/2015.01.01/t/


In [11]:
`:C:/Dev/kdb_q/notebooks/q_for_mortals/db/2015.01.02/t/ set ([] ti:09:30:00 09:31:00; p:101.5 102.5)

`:C:/Dev/kdb_q/notebooks/q_for_mortals/db/2015.01.02/t/


In [14]:
\l C:/Dev/kdb_q/notebooks/q_for_mortals/db

In [15]:
show t

date       ti       p    
-------------------------
2015.01.01 09:30:00 101  
2015.01.01 09:31:00 102  
2015.01.02 09:30:00 101.5
2015.01.02 09:31:00 102.5


The table appears to be in the workspace, along with the virtual date column, but this is an illusion. It is actually mapped into memory. **The request to display t forces all columns for all days to be loaded into memory.**

**Always qualify the partition column in the first where sub-phrase in any query against a partitioned table. If you do not, you will cause all partitions to be loaded into memory and will probably live-lock the server.**

In [16]:
select from t where date within 2015.01.01 2015.01.02

date       ti       p    
-------------------------
2015.01.01 09:30:00 101  
2015.01.01 09:31:00 102  
2015.01.02 09:30:00 101.5
2015.01.02 09:31:00 102.5


#### 14.3.4 Working with Partitioned Tables

The exec template does not work on partitioned tables.

The workaround is to apply exec to the result of a select: exec … from select … from …
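
Here is a minimal sketch of that workaround. It assumes a scratch database under the relative directory ex_part_db with the same two daily slices used above; note that \l changes the working directory to the database root.

```q
/ assumption: scratch partitioned database under the relative directory ex_part_db
`:ex_part_db/2015.01.01/t/ set ([] ti:09:30:00 09:31:00; p:101 102f)
`:ex_part_db/2015.01.02/t/ set ([] ti:09:30:00 09:31:00; p:101.5 102.5)
\l ex_part_db

/ select pulls the slice into memory as an ordinary table; exec then extracts the column
exec p from select p from t where date=2015.01.01
```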

**The select template, or the equivalent functional form, is the way to access data for a partitioned table.** We have already seen how to retrieve the records for consecutive days. Here is the query to retrieve a day’s slice.

In [18]:
select from t where date=2015.01.01
select from t where date=first date
select from t where date=max date

date       ti       p  
-----------------------
2015.01.01 09:30:00 101
2015.01.01 09:31:00 102


date       ti       p  
-----------------------
2015.01.01 09:30:00 101
2015.01.01 09:31:00 102


date       ti       p    
-------------------------
2015.01.02 09:30:00 101.5
2015.01.02 09:31:00 102.5


**Always place the partition column constraint first.**

In [19]:
select from t where date=2015.01.01, ti<09:30:30

date       ti       p  
-----------------------
2015.01.01 09:30:00 101


You can group by the partition column.

In [20]:
select hi:max p, lo:min p by date from t where date within 2015.01.01 2015.01.02

date      | hi    lo   
----------| -----------
2015.01.01| 102   101  
2015.01.02| 102.5 101.5


In [21]:
meta t / shows virtual column

c   | t f a
----| -----
date| d    
ti  | v    
p   | f    


#### 14.3.5 The Virtual Column i in Partitioned Tables

In a partitioned table, the **virtual column i** does not refer to absolute row number as it does with in-memory and splayed tables. 

Instead, **it refers to the relative row number within a partition**. Thus, a constraint on i alone would apply across all partitions and the result will contain that row in each partition slice – probably not what you want and almost certainly a bad idea.

In [22]:
select from t where date in 2015.01.01 2015.01.02, i=0

date       ti       p    
-------------------------
2015.01.01 09:30:00 101  
2015.01.02 09:30:00 101.5


The following queries retrieve the first and last records in the table, respectively.

In [23]:
select from t where date=first date, i=0 / first record in table
select from t where date=max date, i=max i / last record in table

date       ti       p  
-----------------------
2015.01.01 09:30:00 101


date       ti       p    
-------------------------
2015.01.02 09:31:00 102.5


#### 14.3.6 Query Execution on Partitioned Tables

Recall that the motivation for partitions was to avoid loading entire columns into memory. Behind the scenes, kdb+ achieves this as follows.

- Analyze the where phrase to determine which partition slices are targeted by the query.
- Process the remaining where sub-phrases to determine the column sub-domains that must be loaded.
- Process the query separately against the requisite partition slices to obtain partial results. If q was started with slaves, the partitions are processed concurrently; otherwise they are processed sequentially.
- Combine the partial results to obtain the final result.
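
The steps above can be imitated in memory to see the shape of the computation. This is only an illustrative sketch, not how kdb+ implements it: the dictionary parts stands in for the partition directories, and the names d1, d2, targets, run, partials and result are all hypothetical.

```q
/ stand-in for two daily partitions: partition value -> that day's slice
d1:([] ti:09:30:00 09:31:00; p:101 102f)
d2:([] ti:09:30:00 09:31:00; p:101.5 102.5)
parts:2015.01.01 2015.01.02!(d1;d2)

/ step 1: use the partition constraint to pick the targeted slices
targets:(key parts) where (key parts) within 2015.01.01 2015.01.02

/ steps 2-3: apply the remaining constraint to each targeted slice separately
run:{[d] s:parts d; `date xcols update date:d from select from s where ti<09:30:30}
partials:run each targets

/ step 4: combine the partial results in partition order
result:raze partials
```

Each call to run touches only one slice, which is what allows the real engine to process partitions independently.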

#### 14.3.7 Map-Reduce

It is easy to see how map-reduce applies to a query against a partitioned table, since the table is a list of records sliced into sub-lists by the partitioning. The challenge is to decompose the query into a map step and a reduce step. The solution depends on whether the query involves **aggregation**.

**If there is no aggregation,** the result of the query on each partition is simply the computed columns for the list of the records in the partition slice matching the constraint. In other words, produce a partial result table by computing the columns of the query across partitions. Because all the partial result tables conform, union the partial result tables in order of their virtual partition column values. In summary: fan the query across the partitions and union the ordered results.

Things are more interesting **when the query contains aggregation**. For aggregates that kdb+ recognizes as map-reducible, it applies the map operation across partition slices to obtain partial results. It then applies the reduce operation across the partial result tables to obtain the final result table.


At the time of this writing (Sep 2015), the aggregates that kdb+ can decompose with map-reduce are: **avg, cor, count, cov, dev, distinct, first, last, max, med, min, prd, sum, var, wavg, wsum.**
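
To see why an aggregate needs its own map and reduce steps, consider avg. The sketch below uses two hypothetical in-memory price lists of different lengths as stand-ins for partition slices; it illustrates the idea and is not the kdb+ implementation. The map step records a sum and a count per slice, and the reduce step divides the total sum by the total count.

```q
/ two slices of unequal length standing in for daily partitions
p1:101 102f
p2:101.5 102.5 103.5
slices:(p1;p2)

/ map: one partial result per slice (a sum and a count)
partial:{`s`n!(sum x;count x)} each slices

/ reduce: combine the partials into the overall average
mravg:(sum partial`s)%sum partial`n

/ averaging the per-slice averages is not the same thing when slices differ in size
naive:avg avg each slices

mravg=avg raze slices
```

The final comparison checks the map-reduce result against avg applied to all the data at once.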

#### 14.3.8 Multiple Partitioned Tables

Recall that there can be only one partition domain in a given kdb+ root – i.e., daily, monthly, yearly or long. However, multiple tables can share this partitioning.

Although not all potential partition values need be populated, any value that is populated must contain slices for all tables.

Creation of 2 partitioned tables:

In [24]:
`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2/2015.01.01/t/ set .Q.en[`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2;] ([] ti:09:30:00 09:31:00; sym:`ibm`msft;p:101 33f)
`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2/2015.01.02/t/ set .Q.en[`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2;] ([] ti:09:30:00 09:31:00; sym:`ibm`msft;p:101.5 33.5)
`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2/2015.01.01/q/ set .Q.en[`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2;] ([] ti:09:30:00 09:31:00; sym:`ibm`msft;b:100.75 32.75; a:101.25 33.25f)
`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2/2015.01.02/q/ set .Q.en[`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2;] ([] ti:09:30:00 09:30:00; sym:`ibm`msft;b:101.25 33.25; a:101.75 33.75)

`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2/2015.01.01/t/


`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2/2015.01.02/t/


`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2/2015.01.01/q/


`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2/2015.01.02/q/


In [25]:
\l C:/Dev/kdb_q/notebooks/q_for_mortals/db2

In [26]:
select from t where date within 2015.01.01 2015.01.02
select from q where date within 2015.01.01 2015.01.02

date       ti       sym  p    
------------------------------
2015.01.01 09:30:00 ibm  101  
2015.01.01 09:31:00 msft 33   
2015.01.02 09:30:00 ibm  101.5
2015.01.02 09:31:00 msft 33.5 


date       ti       sym  b      a     
--------------------------------------
2015.01.01 09:30:00 ibm  100.75 101.25
2015.01.01 09:31:00 msft 32.75  33.25 
2015.01.02 09:30:00 ibm  101.25 101.75
2015.01.02 09:30:00 msft 33.25  33.75 


Next we add a historical slice for q on 2014.12.31 but neglect to add the corresponding slice for t. Things seem fine when we map the root and query q on the newly added date.

In [27]:
`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2/2014.12.31/q/ set .Q.en[`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2;] ([] ti:09:30:00 09:31:00; sym:`ibm`msft; b:101. 33.; a:101.5 33.5f)
\l C:/Dev/kdb_q/notebooks/q_for_mortals/db2
select from q where date=2014.12.31

`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2/2014.12.31/q/


date       ti       sym  b   a    
----------------------------------
2014.12.31 09:30:00 ibm  101 101.5
2014.12.31 09:31:00 msft 33  33.5 


In [27]:
select from t where date=2014.12.31 / no partition for t

./2014.12.31/t/ti. OS reports: The system cannot find the path specified.: ./2014.12.31/t/ti. OS reports: The system cannot find the path specified.

To fix this, we use the utility .Q.chk, which fills all missing slices with empty tables from the most recent partition. We remap and find things are fine.

In [28]:
.Q.chk `:C:/Dev/kdb_q/notebooks/q_for_mortals/db2
\l C:/Dev/kdb_q/notebooks/q_for_mortals/db2
select from t where date=2014.12.31

()
()
,`:C:/Dev/kdb_q/notebooks/q_for_mortals/db2/2014.12.31


date ti sym p
-------------


If you neglect to place a table slice in the most recent partition, the table will effectively disappear from your database since kdb+ inspects only that partition to determine which tables are present.

### 14.4 Segmented Tables

**Segmentation** is an additional level of structure on top of partitioning. Segmentation spreads a partitioned table’s records across multiple directories that have the same structure as the root directory in a partitioned database. Each pseudo-root, called a segment, is thus a directory that contains a collection of partition directories. The segment directories are presumably on independent I/O channels so that data retrieval can occur in parallel.

You can use any criteria to decompose partition slices, as long as the results are conforming record subsets that are disjoint and complete – i.e., they reconstitute the original table with no omissions or duplication. The decomposition can be along rows, along partitions or by some combination thereof, but it cannot occur only along columns since all records must conform across the decomposition.


Big Picture (3): We view a segmented table as a three-dimensional persisted form: the table is cut vertically by splaying, sliced horizontally by partitions and is additionally segmented across physical locations. **The primary purpose of the third dimension is to allow operations against the tables to take advantage of parallel I/O and concurrent processing.**

In contrast to the partitioned table layout in which partitions reside under the root, the segment directories must not reside under the root. The only portion of a segmented table (other than the sym file for enumerated symbol columns) that lives in the root is a file called par.txt containing the paths of the physical locations of the segments, one segment path per line.
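
For illustration, here is a sketch that writes a file in the par.txt format and reads it back. The two segment paths are hypothetical placeholders for locations on separate devices, and the file is written as par_example.txt in the current directory so that it cannot disturb a real database; in an actual segmented database the file would be named par.txt and live in the root.

```q
/ hypothetical segment locations, one per line
segs:("/1/db";"/2/db")

/ write the lines as plain text, then read them back
`:par_example.txt 0: segs
read0 `:par_example.txt
```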