### 11.1 Binary Data

In q, files come in two flavors: text and binary. Routines to process text data have ‘0’ in their names, whereas routines to process binary data have ‘1’. A text file is considered to be a list of strings – i.e., a list of char lists – and a binary file is a list of byte lists. While all text files can also be processed as binary data, not all binary data represents text. As mentioned above, file operations use handles.

#### 11.1.1 File Handles

**A file handle** is a **symbol** that represents the name of a directory or file on persistent storage. A symbolic file handle starts with a colon : and has the form,

**`:[path]name**

It is generally easier to work with paths and names as strings so that blanks and other special characters can be handled easily. While `$ converts a string to a symbol, it can be awkward to include the leading : required in the symbolic handle. The operator **hsym**, which inserts a leading colon into a symbol, serves this purpose.

In [None]:
hsym `$"data/file name.csv"

#### 11.1.2 hcount and hdel

**hcount** returns a long representing **the size of the file in bytes** as reported by the OS.

In [None]:
hcount `:data/aa.txt

The next one-and-done is **hdel**, which instructs the OS **to remove the file** specified by its symbolic handle operand.
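
A minimal sketch of the two operations together, assuming the data directory is writable. The file name scratch.txt is hypothetical, chosen so that nothing of value is removed.

```q
`:data/scratch.txt 0: enlist "temporary"   / create a small text file to experiment on
hcount `:data/scratch.txt                   / its size in bytes
hdel `:data/scratch.txt                     / remove it; the result is the handle of the deleted file
key `:data/scratch.txt                      / an empty list once the file is gone
```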

#### 11.1.3 Serializing and Deserializing q Entities

**Every q entity can be serialized and persisted to storage.**

The magic is done by (an overload of) the dyadic **set**, whose left operand is a file handle and right operand is the entity to be written. The result is the symbolic handle of the written file. The file is automatically closed once the write is complete.

In [None]:
`:data/a set 42
`:data/b set 1 2 3
`:data/c set (1 2 3)!`a`b`c

The behavior of set is to create the file if it does not exist and overwrite it if it does. It will also create the directory path if it does not exist.

A serialized q data file can be read using (an overload of) the monadic **get**, whose argument is a symbolic file handle and whose result is the q entity contained in the data file.

In [None]:
get `:data/a
get `:data/b
get `:data/c

An equivalent way to read a data file is with (an overload of) **value**.

In [None]:
value `:data/c

Alternatively, you can use the command **\l** to load a data file into memory and assign it to a variable with the same name as the file. 
**Here you do not use a file handle; rather, specify the path to the file without any decoration.**
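
As a small illustration, assuming the session's working directory contains the data directory used in the cells above:

```q
/ serialize an atom to the file data/a
`:data/a set 42
/ load it by plain path (no backtick, no leading colon); this creates or overwrites the global a
\l data/a
/ the variable now holds the file contents, 42
a
```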

#### 11.1.4 Binary Data Files

Open a data file handle with **hopen**, whose result is an int called the open handle. Store it in a variable, traditionally h, and apply it like a function to data to write that data to the file. We will explain the result of applying the open handle shortly. We begin with a file containing serialized q data and show how to append to it.

In [5]:
`:data/L set 10 20 30
get `:data/L

`:data/L


10 20 30


##### Append to existing file

In [6]:
h:hopen `:data/L
h[42]      / append 42
h 100 200  / append 100 200
h

252i


252i


252i


In [7]:
hclose h
get `:data/L

10 20 30 42 100 200


Always apply **hclose** to the open handle to close it and flush any data that might be buffered. Failure to do so may cause your program to run out of file handles unnecessarily.

##### Append to new file (doesn't work)

In [19]:
h:hopen `:data/L2 

In [20]:
h  / returns int which is open handle itself

1224i


Append can be applied to this int :)

In [21]:
h[100 200 300]

1224i


In [22]:
hclose h

In [22]:
get `:data/L2 / doesn't work: get only works when the file was first created by set

data/L2: data/L2

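The failure is presumably because hopen creates an empty file with no serialization header, so get cannot deserialize the items appended afterwards. A possible workaround, sketched under that assumption, is to create the file with set on an empty typed list before opening it. The file name L3 is hypothetical.

```q
`:data/L3 set `long$()     / create the file as a serialized empty long list
h:hopen `:data/L3
h 100 200 300              / append to the serialized list
hclose h
get `:data/L3              / the file starts with a header, so get can deserialize it
```
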
#### 11.1.5 Writing and Reading Binary (read1 / 1:)

Apply **read1** on a file handle to read any file into q as a list of bytes. For example, we can read the previously serialized value L as bytes.

In [23]:
read1 `:data/L set 10 20 30 / shows the internal representation of the serialized q entity

0xfe2007000000000003000000000000000a0000000000000014000000000000001e000000000..


If you want to write raw binary data, as opposed to the internal representation of a q entity containing the data, use the infelicitously named **1:**. It takes a symbolic file handle as its left argument and a simple byte list as its right argument. Bytes in the right operand are essentially streamed to the file.



In [24]:
`:data/answer.bin 1: 0x06072a
read1 `:data/answer.bin

`:data/answer.bin


0x06072a


#### 11.1.6 Using Dot Amend

Fundamentalists can use dot amend in place of set to serialize q entities to files. **To write the file**, or overwrite an existing file, use **assign :**

In [27]:
.[`:data/raw; (); :; 1001 1002 1003]

`:data/raw


In [28]:
get `:data/raw

1001 1002 1003


**To append** to an existing file use **,**

In [30]:
.[`:data/raw; (); ,; 42]

`:data/raw


In [31]:
get `:data/raw

1001 1002 1003 42


### 11.2 Save and Load on Tables

In its simplest form, **save** serializes a table in a global variable to a binary file having the same name as the variable. It overwrites an existing file.

In [32]:
t9:([] c1:`a`b`c; c2:10 20 30; c3:1.1 2.2 3.3)
save `:data/t9
get `:data/t9


`:data/t9


c1 c2 c3 
---------
a  10 1.1
b  20 2.2
c  30 3.3


This is equivalent to using **set** above with the table name as file name.

**load** is the inverse of save meaning that it reads a serialized table from a file into a variable with the same name as the file. It creates the variable in the workspace or overwrites it if it already exists.

In [33]:
load `:data/t9
t9

`t9


c1 c2 c3 
---------
a  10 1.1
b  20 2.2
c  30 3.3


#### Saving a table as text

**You can also use save to write a table to a text file.** You determine the format of the text with the file extension in the file handle.

Each of the following uses of save can also be performed with the more general operator 0:, described in §11.4.

###### Save the table with .txt extension to obtain tab-delimited records.

In [36]:
save `:data/t9.txt

`:data/t9.txt


###### Save the table with .csv extension to obtain comma-separated values

In [35]:
save `:data/t9.csv

`:data/t9.csv


###### Save the table with .xml extension to obtain XML records. 

There is no direct way to read XML into q although libraries have been contributed – see code.kx.com.

In [37]:
save `:data/t9.xml

`:data/t9.xml


###### Save the table with .xls extension to obtain an Excel spreadsheet.

In [38]:
save `:data/t9.xls

`:data/t9.xls


### 11.3 Splayed Tables

To splay a table, use **set** and **specify a directory as the target location indicated by a trailing slash / in the left operand**.

In [39]:
`:data/tsplay/ set ([] c1:10 20 30; c2:1.1 2.2 3.3)

`:data/tsplay/


Nearly all the metadata regarding the splayed table can be read from the file system, i.e., the name of the table from the directory name and the names of the columns from the file names. The one missing bit is the order of the columns, which is stored as a serialized list in the hidden .d file.

In [40]:
get hsym `$"data/tsplay/.d"

`c1`c2
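
To see this layout directly, the sketch below lists the splayed directory and reads the column files back one at a time. It assumes the table was written to data/tsplay as above and repeats the write so that it runs on its own.

```q
`:data/tsplay/ set ([] c1:10 20 30; c2:1.1 2.2 3.3)
key `:data/tsplay        / names of the files in the directory, including .d
get `:data/tsplay/c1     / a column file is an ordinary serialized list: 10 20 30
get `:data/tsplay/c2     / 1.1 2.2 3.3
```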


**There are restrictions on tables that can be splayed.**

- All columns must be simple or compound lists. The latter means a list of simple lists of uniform type. An arbitrary general list column cannot be splayed.
- Symbol columns must be enumerated.

The convention for enumerating symbols in splayed tables is to enumerate all symbol columns in all tables over the domain **sym** and store the resulting sym list in the root directory – i.e., one level above the directory holding the splayed table. You can do this manually but practically no one does.
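
For reference, a sketch of what the manual route could look like. The directory data/manual and the table name tsplay3 are hypothetical, and a separate root is used deliberately because writing the sym file with set overwrites any sym list already stored there.

```q
t:([] c1:`a`b`c; c2:10 20 30)
sym:`symbol$()                        / start with an empty enumeration domain
te:update c1:`sym?c1 from t           / enumerate c1 over sym, appending new symbols to sym
`:data/manual/sym set sym             / store the domain one level above the table directory
`:data/manual/tsplay3/ set te         / the enumerated table can now be splayed
```

Every later table written under the same root would have to extend and rewrite that sym file consistently, which is the bookkeeping that makes the manual route unattractive.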



Normally folks use one of the .Q utilities, in spite of the official Kx admonition not to use them. For example, here we use **.Q.en**.

In [44]:
`:data/tsplay2/ set .Q.en[`:data; ([] c1:`a`b`c`f`ggg; c2:10 20 30 40 50)]

`:data/tsplay2/


### 11.4 Text Data

We have seen that q views a record in a binary data file as a list of bytes. Similarly, a record in a text file is viewed as a list of char – i.e., a string. Thus reading a text file results in a list of strings and you pass a list of strings to write to a text file.

#### 11.4.1 Reading and Writing Text Files

Read a text file with the monadic **read0** that takes a symbolic file handle argument. The result is a list of strings, one for each line in the file. 

In [46]:
read0 `:data/solong.txt / strings
read1 `:data/solong.txt / binaries

"So long"
"and thanks"
"for all the fish"


0x536f206c6f6e670d0a616e64207468616e6b730d0a666f7220616c6c207468652066697368


Or you can read the data as binary and cast the result to char. Observe that the data is a simple list of char so the newline character does not cause line breaks in the console display.

In [47]:
"c"$read1 `:data/solong.txt

"So long\r\nand thanks\r\nfor all the fish"


To write strings as text, use the (infelicitously named) **dyadic 0:**, which takes a file handle in the left operand and a list of strings in the right operand. It creates the directory path if necessary and overwrites the file if it already exists.

In [48]:
`:data/solong1.txt 0: ("Life"; "The Universe"; "And Everything")

`:data/solong1.txt


In [49]:
read0 `:data/solong1.txt

"Life"
"The Universe"
"And Everything"


#### 11.4.2 Using hopen and hclose

Just as with a binary data file, a symbolic text file handle can be opened with hopen. The result is again an int that is conventionally stored in the variable h and is used with function application syntax to write data. The difference is that instead of using plain h to write binary data, you use **neg[h]** to write strings as text. 

In [52]:
h:hopen `:data/new7.txt
neg[h] enlist "This"
neg[h] ("and"; "that")
hclose h       / apply hclose to h, not to neg[h]
read0 `:data/new7.txt

-1196i


-1196i


"This"
"and"
"that"


If the file already exists, opening with hopen and applying the open handle will append rather than overwrite.

In [54]:
h:hopen `:data/new7.txt
neg[h] ("and"; "more")
hclose h
read0 `:data/new7.txt

-1196i


"This"
"and"
"that"
"and"
"more"


#### 11.4.3 Preparing Text

We saw in §11.2 that **save and load** are built-in functions that can store tables as text files. With save, the table is stored in a file with the same name as the table variable.

When you need to control the filename, **you can write the table yourself with 0:, but then you must prepare the table columns as formatted text.** A separate overload of 0: is available for this purpose. A confusing naming convention, to say the least.

In this use, **0: has as left operand a char delimiter and as right operand a table or list of columns.** Observe the use of the pre-defined constant csv, which is simply ",".

In [55]:
t:([] c1:`a`b`c; c2:1 2 3)

In [56]:
"\t" 0: t

"c1\tc2"
"a\t1"
"b\t2"
"c\t3"


In [57]:
"|" 0: t

"c1|c2"
"a|1"
"b|2"
"c|3"


In [58]:
csv

","


In [59]:
csv 0: t

"c1,c2"
"a,1"
"b,2"
"c,3"


In [60]:
`:data/t6.csv 0: csv 0: t

`:data/t6.csv


In the last snippet we applied 0: with two different meanings: to prepare and then write text. There is one more, for parsing.

### 11.5 Parsing Records

Dyadic forms of 0: and 1: parse individual fields according to data type from text or binary records. 

#### 11.5.1 Fixed-Width Records

The dyadic form of 0: and 1: for reading fixed length files is,

**(Lt;Lw) 0:f**

(Lt;Lw) 1:f

The left operand is a nested list containing two items: 
- Lt is a simple list of char containing one letter per field; 
- Lw is a simple list of int containing one integer width per field. 

The sum of the field widths in Lw should equal the width of the record. The result of the function is a list of lists, one list arising from each field.
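
Since the contents of the data file are not shown here, a self-contained sketch using two in-memory records may help. The record values are made up for illustration, and each record is 39 characters wide to match the widths 4 8 10 7 10.

```q
r1:"100198.00000ABCDEF1234-------2015.01.01"
r2:"100242.00100GHUJKL0123-------2015.01.02"
count each (r1;r2)                     / 39 39, the sum of the field widths
("JFS D";4 8 10 7 10) 0: (r1;r2)       / four lists; the 7-char filler is skipped by the blank type
```

The blank in the type string skips a field, which is why five widths yield only four result lists.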

In [63]:
("JFS D";4 8 10 7 10) 0: `:data/fw1.txt  / long 4;float 8;symbol 10;skip 7 chars;date 10

1001       1002       1003      
98         42.001     44.123    
ABCDEF1234 GHUJKL0123 nopqrs9876
2015.01.01 2015.01.02 2015.01.03


In [64]:
flip `c1`c2`c3`c4!("JFS D";4 8 10 7 10) 0: `:data/fw1.txt

c1   c2     c3         c4        
---------------------------------
1001 98     ABCDEF1234 2015.01.01
1002 42.001 GHUJKL0123 2015.01.02
1003 44.123 nopqrs9876 2015.01.03


Also note that it is possible to **parse a list of strings using the same format**, since they represent text records in memory.

In [65]:
fixed: read0 `:data/fw1.txt
("JFS D";4 8 10 7 10) 0: fixed

1001       1002       1003      
98         42.001     44.123    
ABCDEF1234 GHUJKL0123 nopqrs9876
2015.01.01 2015.01.02 2015.01.03


##### Offset

The more general form for the right operand f for 0: is,

**(hfile;i;n)**

where

- hfile is a symbolic file handle,
- i is the offset into the file to begin reading and
- n is the number of bytes to read.

This is useful for sampling a file or for large files that cannot be read into memory in a single gulp.
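
A sketch of reading part of a fixed-width file this way. The file fw2.txt and its contents are hypothetical, and the offsets assume each line on disk ends in a single newline byte, so one record occupies 7 bytes.

```q
`:data/fw2.txt 0: ("1001AB"; "1002CD"; "1003EF")   / three records of 6 chars, one per line
hcount `:data/fw2.txt                               / 21 when each line ends in a single newline byte
("JS";4 2) 0: (`:data/fw2.txt; 7; 14)               / start after the first record and read the next two
```

Under those assumptions the offset and length should be chosen to fall on record boundaries, otherwise fields will be split in the middle.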

#### 11.5.2 Variable Length Records ( Delimited)

The dyadic form of 0: and 1: for reading variable length, delimited files is

(Lt;D) 0:f

(Lt;D) 1:f

The left operand is a list comprising two items:
- Lt is a simple list of char containing one type letter per corresponding field;
- D is either a char representing the delimiting character or an enlisted char.

##### No header

**Parsing with a delimiter char "," results in a list of column lists.** As with parsing fixed format records, it is easy to make the result into a table.

In [66]:
flip `c1`c2`c3!("JSF"; ",") 0: read0 `:data/Simple.csv

c1   c2          c3    
-----------------------
1001 DBT12345678 98.6  
1002 EQT98765432 24.75 
1004 CCR00000001 121.23


Observe that it is possible to retrieve the second field as a **string instead of a symbol** using "*" as the data type specifier.

In [67]:
("J*F"; ",") 0: read0 `:data/Simple.csv

1001          1002          1004         
"DBT12345678" "EQT98765432" "CCR00000001"
98.6          24.75         121.23       


##### With header

**Reading with an enlisted "," delimiter results in a table.**

In [68]:
("JSF"; enlist ",") 0: `:data/Titles.csv

id   ticker      price 
-----------------------
1001 DBT12345678 98.6  
1002 EQT98765432 24.7  
1004 CCR00000001 121.23
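
The two delimiter forms can be compared side by side on the same in-memory records. This is a sketch with made-up lines rather than the contents of the files above.

```q
lines:("id,ticker,price"; "1001,DBT12345678,98.6"; "1002,EQT98765432,24.75")
("JSF"; enlist ",") 0: lines      / first record supplies the column names; the result is a table
("JSF"; ",") 0: 1_lines           / header dropped; the result is a list of three column lists
```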


#### 11.5.3 Key-Value Records

The **operator 0:** can also be used to process text representing key-value pairs. 

In this situation, the left operand is a three-character string Pf that specifies the pair format. 

- The first char of Pf can be "S" to indicate the keys are parsed as symbols or "I" to indicate the keys are parsed as integers.
- The second char indicates the key-value separator. 
- The third char indicates the pair delimiter.

In [69]:
"S=;" 0: "one=1;two=2;three=3"

one  two  three
,"1" ,"2" ,"3" 


In [70]:
"S:/" 0: "one:1/two:2/three:3"

one  two  three
,"1" ,"2" ,"3" 


In [71]:
"I=;" 0: "1=one;2=two;3=three"

1     2     3      
"one" "two" "three"


To make a table:

In [72]:
flip `k`v!"I=;" 0: "1=one;2=two;3=three"

k v      
---------
1 "one"  
2 "two"  
3 "three"
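
If a lookup structure is more convenient than a table, the two result lists can be joined into a dictionary instead. A small sketch, in which the names kv and d are only illustrative:

```q
kv:"S=;" 0: "one=1;two=2;three=3"
d:(!) . kv       / apply ! to the two items: keys on the left, values on the right
d `two           / the value is still text, a one-item char list
"J"$d `two       / parse it when a number is needed: 2
```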


### 11.6 Interprocess Communication

The examples in this section assume a q server process listening on port 5042 on the same machine, started with

q -p 5042

#### 11.6.1 Communication Handle

Symbolic **communication handles** look similar to file handles but they specify resources on the network. A communication handle has the form,

**`:[server]:port**

Here server can be a host name, an IP address or a URL. If it is omitted, the handle refers to the local machine.

In [73]:
`::5042 / refers to the same machine
`:localhost:5042

`::5042


`:localhost:5042


#### 11.6.2 Opening a Connection Handle

As with a file handle, apply hopen to a communication handle to obtain an open connection handle that is used as a function. As before, the value is an int that is traditionally stored in the variable h. Also as with file I/O, the behavior differs between using the original positive handle and its negation.

In [74]:
h:hopen `::5042
h "a:6*7"
h "a"
hclose h

42


On the server you can run 

q) a

42

#### 11.6.3 Remote Execution

When you open a connection to a q process, you have the full capability of that process available remotely. Apply the connection handle to any q expression in a string and it will be evaluated on the server.

Allowing quoted q strings to be executed on a server makes the server susceptible to all manner of breaches. Good practice does not permit this on a production server. You can mitigate this by having your server process accept only requests whose first item is a symbol (see below), which you should verify is the name of a function you have decided to expose.
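
One way such a check could look on the server is sketched below, by overriding the synchronous message handler .z.pg. The names exposed and g are hypothetical, and a real deployment would need more than this (asynchronous messages arrive through .z.ps, and authentication is not addressed here).

```q
g:{x*y}                 / a function the server chooses to expose (hypothetical)
exposed:enlist `g       / hypothetical list of names that clients may call

.z.pg:{[req]
  if[10h=abs type req; '"string requests are not accepted"];
  fn:first req;
  if[not -11h=type fn; '"first item must be a symbol"];
  if[not fn in exposed; '"function not exposed"];
  value req }
```

With this handler in place, a client call such as h (`g; 6; 7) is evaluated as before, whereas h "a:6*7" is rejected with an error on the client.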

**An alternative format for remote execution is to apply the connection handle to a list of the form**

**(f;arg1;arg2;...)**

Here **f is a client-side expression** that evaluates to a map that will be applied on the server. It can be:

- The value of, or variable associated to, a map on the client
- The symbolic name of a map on the server.

We use the term map here to be any q expression that can be evaluated as function application, e.g.,
- a list on an index,
- a dictionary on a key or
- a function on an argument.

Most commonly f is a function.

The remaining items **arg1, arg2, …** are optional values sent along to the server for the evaluation. These are arguments when f is a function, indices when it is a list, or keys when it is a dictionary.

Application of the connection handle to such a list sends the list to the server where it is evaluated. Any result is sent back to the client, where it is presented as the result of the connection handle application. By simply applying the naked handle, this sequence of steps is **synchronous**, meaning that execution of the q session on the client blocks until the result of the server evaluation is returned.

##### Sending function definition and arguments to server for execution

We first consider the case when **f is a map on the client side**. In this situation the function (list, dictionary, etc.) is actually transported to the server along with the supplied arguments, where it is applied.

In [75]:
h:hopen`::5042 / client

In [76]:
h ({x*y};4;5)

20


In [77]:
f:{x*y}
h (f; 6; 7)

42


**Note 1.** Global variables referred to in the transported function will need to be present remotely in the exact contexts in effect when the function was defined. This can be avoided by restricting f to be a pure function that does not refer to any global entities.

**Note 2.** Allowing a function to be sent to the server for remote execution is as dangerous as sending quoted q strings. The function can access resources on the server and instigate an attack. Good practice does not permit this in production environments.

##### Executing function existing on the server

The function to be executed remotely must already be defined on the server and you pass its name and arguments via the connection handle.

On the server,


q)g:{x*y} / server



In [78]:
h (`g; 6; 7) / client

42


Now consider the case when the remote function performs an operation on a table and returns the result. This is the q analogue of a **remote stored procedure**.

Server:

q)t:([] c1:`a`b`c; c2:1 2 3) / server

q)f:{[x] select c2 from t where c1=x}

In [79]:
h (`f; `b) / client

c2
--
2 


The difference from SQL stored procedures is that the remote procedure can be any q function on the server, making the full power of q available remotely.



#### 11.6.4 Synchronous and Asynchronous Messages

Use the **negation of the open connection handle** to send an asynchronous message to the server. 

Server:

q)sq:{0N!x*x} / server

In [80]:
neg[h] (`sq; 5) / client. 25 displayed on the server console

**Note.** When sending asynchronous messages, always send an empty “chaser” message immediately before applying hclose to the open handle. If you do not do this, buffered messages may not be sent when the connection is closed.
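
A sketch of that closing sequence, assuming h is the open connection handle from the cells above and sq is defined on the server as shown:

```q
neg[h] (`sq; 5)     / asynchronous message: returns immediately
neg[h][]            / block until pending outgoing messages on h have been written
h ""                / empty synchronous chaser: returns once the server has processed what came before
hclose h
```

Because the chaser is synchronous, it only returns after the server has worked through the earlier messages on this connection, so closing the handle afterwards does not drop them.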

#### 11.6.5 Processing Messages

Assuming that you have passed the server either a function from the client side or the name of a function on the server side, the appropriate function is evaluated on the server. During evaluation, **the communication handle of the remote process is available in the system variable .z.w (“who” called). For an asynchronous call, this can be used to send messages back to the client during the function application on the server.**

Both the client and the server have connection handles when a connection between them is opened. However, these handles are assigned independently and their int values are not equal in general.

Here is a simple example showing how to use **.z.w** to send a message back to the client. On the server, we define a function that displays its received parameter and then asynchronously calls mycallback with the passed argument incremented.

Server

f:{show "Received ",string x; neg[.z.w] (`mycallback; x+1)}

In [81]:
mycallback:{show "Returned ",string x;}
neg[h] (`f; 42)