<h1>Requirements for today</h1>
1. Thinking cap
2. Downloading the data files

# Introduction to kdb+ - Day One

## Contents

1.	Data types 
2.	Lists
3.	Tables 
4.	Reading from a csv 
5.	qSQL 
6.	embedPy (optional)

## 1 Data types

q provides data types to cover almost every use case, although a few are used far more frequently than the rest. The data type of any object in kdb+ can be determined using the type function, which returns a number corresponding to a data type in the reference table linked below.
https://code.kx.com/q/ref/datatypes/

In [1]:
/show datatypes:flip{(x;.Q.t x;key'[x$\:()])}5h$where" "<>20#.Q.t

type 42         / -7h: a long atom
type 42 42      / 7h: a list of longs
type 42.0       / -9h: a float atom
type 42.0 42.0  / 9h: a list of floats

-7h


7h


-9h


9h


### Long

The most common data type used for integer values in q is long. The long data type is a <b>64-bit signed integer</b> and is represented by the letter j. When we type a number in q without specifically indicating its type, it will automatically be created as a long. 

In [2]:
42

type 42

42


-7h


The minimum and maximum numbers that can be represented in a long data type are -9223372036854775806 and 9223372036854775806 respectively. Null long types are represented by 0Nj or 0N, and positive or negative infinity are represented by 0Wj and -0Wj or 0W and -0W.
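
The special long values can be checked directly in a session. The following is a minimal sketch; the comments show what a standard q session is expected to print.

```q
null 0N                      / 1b: 0N is the null long
type 0W                      / -7h: infinity is itself a long
0W>9223372036854775806       / 1b: infinity exceeds the largest ordinary long
-0W<-9223372036854775806     / 1b: negative infinity is below the smallest ordinary long
```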

### Float

The most common data type for representing floating-point numbers is the float. This is an 8-byte value conforming to the IEEE floating-point specification. Floats in q are represented by the letter f.

In [3]:
42.0

type 42.0

type 40+2.5

type 40%2

42f


-9h


-9h


-9h


Null float types are represented by 0Nf or 0n, and positive or negative infinity are represented by 0Wf and -0Wf or 0w and -0w.
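
As a short illustration, the null and infinite floats also arise naturally from division by zero, which does not raise an error in q:

```q
null 0n      / 1b: 0n is the null float
1%0          / 0w: a positive number divided by zero gives positive infinity
-1%0         / -0w: a negative number divided by zero gives negative infinity
0%0          / 0n: zero divided by zero gives the null float
```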

### Boolean

Boolean values in q are stored in a single byte and are represented by the letter b. One way to generate a boolean value is to test for equality.

In [19]:
42=40
/Right to left evaluation
42=40+2

0b


1b
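
A small sketch of how comparisons produce booleans, both for atoms and item by item for lists:

```q
type 42=40            / -1h: a boolean atom
10 20 30=10 25 30     / 101b: item-by-item comparison gives a boolean list
type 101b             / 1h: a simple list of booleans
```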


### Date

The date data type is regularly used in q and is represented by the letter d. Under the covers, a date is an integer counting the number of days since the millennium (2000.01.01): positive for later dates and negative for earlier ones. This means that we can add days to a date using simple arithmetic.



In [4]:
type 2000.01.01

"i"$2000.01.01

"i"$1999.12.31

"i"$2000.01.02

2000.01.01+5

-14h


0i


-1i


1i


2000.01.06
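
Because dates are day counts, a few more arithmetic operations follow directly. A minimal sketch:

```q
"d"$0                    / 2000.01.01: day zero
2000.01.06-2000.01.01    / 5i: the difference between two dates is a number of days
2000.01.01+til 3         / 2000.01.01 2000.01.02 2000.01.03
```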


### Timespan

Similarly to date data types, timespans are also represented by integers under the covers, showing the number of nanoseconds since midnight. They are represented by the letter n.

In [6]:
type 12:00:00.000000000

"j"$12:00:00.000000000
"j"$00:00:00.000000000

-16h


43200000000000


0
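
The cast also works in the other direction, and timespans can be added together. A brief sketch, using the 0D prefix form that q uses when it displays timespans:

```q
"n"$43200000000000        / 0D12:00:00.000000000: nanoseconds cast back to a timespan
0D01:00:00+0D00:30:00     / 0D01:30:00.000000000
type 0D01:00:00           / -16h
```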


### Timestamp

Timestamps are the concatenation of dates and timespans and can be created by entering the value into the console literally, or by adding a timespan to a date. They are represented by the letter p.

In [5]:
type 2018.01.01D12:00:00.000000000

2018.01.01+12:00:00.000000000

/ Upper-case variants return local (device) time; lower-case variants return UTC (GMT)
.z.P
.z.p
.z.T
.z.t
.z.D
.z.d

-12h


2018.01.01D12:00:00.000000000


2019.04.29D11:21:16.044805000


2019.04.29D03:21:16.044860000


11:21:16.044


03:21:16.044


2019.04.29


2019.04.29
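
Since a timestamp combines a date and a timespan, either part can be recovered by casting. A minimal sketch, where the variable name ts is chosen only for illustration:

```q
ts:2018.01.01D12:00:00.000000000
`date$ts         / 2018.01.01
`timespan$ts     / 0D12:00:00.000000000
ts+0D01:00:00    / 2018.01.01D13:00:00.000000000: add one hour
```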


### Symbol

There are two data types in q for handling textual data. Here we will look at the symbol data type, represented by the letter s. Symbols are akin to VARCHAR or string data types in other languages; strings, however, are very different in q. While a symbol represents a series of letters, q treats it as a single item, and symbols are highly optimised for performance. They also behave differently from strings when they are saved into kdb+. Symbols are defined by placing a back quote (back tick in kdb speak) before a series of letters.

In [34]:
type `ABC

type `A`B`C
/character
type "A"
/string=list of characters
type "ABC"

-11h


11h


-10h


10h


In [41]:
//casting datatypes using $
`$string `ABC`DEF  //"string" converts the symbols to strings; `$ casts them back to symbols
string `ABC`DEF
("ABC";"DEF")
(1 2 3 4 5;til 5)

`ABC`DEF


"ABC"
"DEF"


"ABC"
"DEF"


1 2 3 4 5
0 1 2 3 4
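
The practical difference between a symbol (one atom) and a string (a list of characters) shows up in counting and comparison. A short sketch:

```q
count `ABC       / 1: a symbol is a single atom
count "ABC"      / 3: a string is a list of three characters
`ABC=`ABC        / 1b: a single comparison
"ABC"="ABC"      / 111b: strings are compared character by character
"ABC"~"ABC"      / 1b: match (~) compares the two strings as a whole
```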


### Practice questions

1)	Get the type of each of the following: (Done!)<br>
> a.	234 <br>
b.	234f <br>
c.	2010.05.10 <br>
d.	12342342332i – what is wrong? How can we fix this? <br>
e.	2010.05 – how can we force q to interpret this as a month? <br>
f.	type 500.05 – what is the type of the output of the type command? <br>
g.	3 1 2 <br>
h.	3 1 2f <br>
> i.	(3f;1;2j) <br>

2)	Cast 16:30:00 to int/float/minute/second <br>
3)	Cast the string “12:30:00” to time/minute/second <br>
4)	Cast 123.23 to real <br>
5)	Extract the month, week and year from the date 2015.06.03 <br>
6)	Get the current time/date/minute/second/hour <br>
7)	Cast 1234 to a string <br>

In [45]:
2 xexp 32

12342342332

4.294967e+009


12342342332


In [47]:
"m"$2010.05.01
2010.05m

2010.05m


2010.05m


## 2 Lists

There are two different types of lists in q. Firstly we will look at simple lists or vectors which consist of a series of atoms of the same type. E.g.

In [1]:
//use  :  to assign to a variable

L:10 20 30            // simple list of longs
F:3.4 5.0 3.1         // simple list of floats
I:3 4 5 6i            // simple list of ints
H:100 200 300h        // simple list of shorts
str:"hello there"     // simple list of chars (also called a string)
s:`JPM`GE`IBM         // simple list of symbols

When we take the type of a simple list, we get a positive number, which represents the type of all the atoms in that list. This differs from taking the type of an atom which returns a negative number:

In [2]:
type L

type 10

type s

type `JPM

7h


-7h


11h


-11h


In q we also have general lists, whose items may be of different types or may themselves be lists:

In [3]:
person:(`John;32;`USA)
matrix:(1 2 3;4 5 6)

A general list has type 0h

In [54]:
type person

type matrix

0h


0h
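
Whether a list is simple or general depends only on its items, not on how it was written. A minimal sketch:

```q
type (1;2;3)       / 7h: the items are all longs, so q stores a simple list
type (1;2f;3)      / 0h: mixed long and float items
type (1 2;3 4)     / 0h: a list of lists
```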


In the case of person, each item is an atom, whereas each item in matrix is a simple list.

In [4]:
person
person[0]
person 0
type person 0                   // a symbol

type person 1                   // a long

type matrix 0   // a simple list of longs
matrix
matrix[0 1]
type matrix 1                   // a simple list of longs

a:13
type enlist 13

`John
32
`USA


`John


`John


-11h


-7h


7h


1 2 3
4 5 6


1 2 3
4 5 6


7h


7h


In [5]:
matrix1:(1 2 3;4 5 6;til 5;5+til 3)
matrix1
matrix1[0]

1 2 3
4 5 6
0 1 2 3 4
5 6 7


1 2 3


One or more items may be extracted from a one-dimensional list by supplying the index or indices you are looking for.

For a list with N items, the indices range from 0 up to N-1.

In [8]:
L:10 20 30 40 50

count L

L 0           // obtain first item

L 4           // obtain last item

L 0 1 2       // first 3 items

L 0 0 0       // you can repeat an index if you wish

L 4 2         // or specify any indices you like

5


10


50


10 20 30


10 10 10


50 30


You may also use square brackets when indexing the list

In [9]:
L[0]

L[3 2 1]

10


40 30 20
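
Two related behaviours are worth seeing: indexing outside the valid range returns a null rather than an error, and the same bracket notation can be used to assign. A short sketch:

```q
L:10 20 30 40 50
L 5           / 0N: out of range, so the null long comes back
L[1]:99       / assign to index 1
L             / 10 99 30 40 50
L[0 4]:1 2    / assign to several indices at once
L             / 1 99 30 40 2
```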


A simple list is deemed to have a depth of 1 since its items are atoms. A general list that itself contains simple lists is said to have a depth of 2.

In [64]:
G:(10 20 30;100 200 300)

G[0;0]            // returns the first item of the first item

G[0;0 1]          // returns the first two items of the first item

G[0 1;0 1]        // returns the first two items of the first two items
H:(10 20 30;100 200 300;1000 2000 3000)
H[0 2;1 2]

10


10 20


10  20 
100 200


20   30  
2000 3000


When indexing at depth, there is a way of saying you want all items at that level without having to know how many items are in the list

In [6]:
M:(1 2 3 4;10 20 30 40;100 200 300 400)       // create a 3 x 4 matrix (a general list of depth 2)
M[0 1 2;0]                                    // give me the first item in each item

M[;0]                                         // same as above: leaving the first index empty means 'give me everything at that level'
M

1 10 100


1 10 100


1   2   3   4  
10  20  30  40 
100 200 300 400
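
Either index can be elided. The sketch below shows the row and column cases side by side on the same matrix:

```q
M:(1 2 3 4;10 20 30 40;100 200 300 400)
M[1;]        / 10 20 30 40: the whole of row 1
M[;2]        / 3 30 300: column 2 from every row
M[;1 2]      / (2 3;20 30;200 300): columns 1 and 2 from every row
```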


The q language has some useful built-in functions that can be used to glean information about a list. Below is a list of some of those functions.

### count 

In [71]:
count 10 20 30

count 011b

count "hello"

3


3


5


### first 

In [13]:
first 10 20 30

first 011b

first "hello"

10


0b


"h"


### last 

In [28]:
last 10 20 30

last 011b

last "hello"

30


1b


"o"


### max

In [72]:
max 4 6 2 3

max 3.2 3.7 3.1 3.4

last asc 3.2 3.7 3.1 3.4  //asc sorts the list in ascending order, last takes the last item (let's try to time it!)

6


3.7


3.7


In [9]:
-10?10

7 1 3 9 5 0 4 6 8 2


In [78]:
//with a negative left argument, ? deals that many distinct random numbers (here a shuffle of til 1000)
\t:10000 max -1000?1000

147


In [79]:
\t:10000 last asc -1000?1000

207


### min 

In [28]:
min 4 6 2 3

min 3.2 3.7 3.1 3.4

2


3.1


### Practice questions

1)	Get “f” from “Vodafone” <i>(Hint: indexing)</i><br>
2)	Create the list d defined as (”abcd”;10 5 0f;(2;33;\`x\`y\`z);”hello”;1 3e) <br> 
3)	Get the 1st, 3rd and 5th elements from d <br>
4)	Get the 1st and 2nd element of each element of d by eliding an index <br>
5)	Replace the 3rd item of d with (“hi”;3.2) <br>
6)	Create a list e containing two strings, "hello" and "world" <br>
7)	Create a list called f containing the following elements: <br>
>a.	The symbols \`ab and \`bc <br>
b.	The number 12 <br>
c.	The list e <br>

8)	Extract the symbol \`bc from f <br>
9)	Extract the string “hello” from f <br>
10)	Extract the character “r” from f <br>


## 3 Tables

Tables form the core of Kx technology. Here we will go through how to create a table and insert data into a table.

### Creating a table

Below is a simple table definition, followed by the definition of an empty table called trade.

In [92]:
([]abc:til 5;bcd:5+til 5;ppp:50,til 4)

//comma is a join operator

abc bcd ppp
-----------
0   5   50 
1   6   0  
2   7   1  
3   8   2  
4   9   3  


In [95]:
trade:([]sym:`$();size:`long$();price:`float$())
trade   // horizontal lines are the signature in the display of a table
meta trade

sym size price
--------------


c    | t f a
-----| -----
sym  | s    
size | j    
price| f    


As you can see, the definition comprises ( ) brackets with <b>empty square brackets</b> at the start. These are important to <b>differentiate between a table and a list</b>.
Instead of creating an empty table, let's create the trade table with initial values in it:

In [96]:
trade:([]sym:`JPM`IBM`BP;size:100 25 54;price:3.45 5.21 6.33)
trade

sym size price
--------------
JPM 100  3.45 
IBM 25   5.21 
BP  54   6.33 
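
A few built-in functions are handy for inspecting a table once it has been created. A minimal sketch on the same trade table:

```q
trade:([]sym:`JPM`IBM`BP;size:100 25 54;price:3.45 5.21 6.33)
type trade     / 98h: the type of a table
cols trade     / `sym`size`price
count trade    / 3: the number of rows
```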


### Inserting into a table

The __insert__ function may be used to append data to the table. The syntax is: <br>
`tableName insert data <br>
It is important to use the table name as opposed to the table itself.

In [97]:
trade:([]sym:`$();size:`long$();price:`float$())
trade

`trade insert(`BP;100;2.44)  // insert returns the index / indices that the data has been inserted into
trade

sym size price
--------------


,0


sym size price
--------------
BP  100  2.44 


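Data can be bulk inserted. Note that the output below comes from a session in which this cell was run twice, so the two new rows appear twice and the returned indices are 3 4 rather than 1 2.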

In [99]:
`trade insert(`IBM`AAPL;200 300;5.53 4.39)  // 2 indices returned      
trade

3 4


sym  size price
---------------
BP   100  2.44 
IBM  200  5.53 
AAPL 300  4.39 
IBM  200  5.53 
AAPL 300  4.39 
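
The following sketch uses a hypothetical two-column table t (not part of the original notebook) to show the indices insert returns when each statement runs exactly once against a fresh, empty table:

```q
t:([]sym:`$();size:`long$())      / hypothetical empty table
`t insert(`BP;100)                / ,0: the single row lands at index 0
`t insert(`IBM`AAPL;200 300)      / 1 2: the bulk insert lands at indices 1 and 2
count t                           / 3
```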


A table is a collection of lists of equal length (called columns).

In [100]:
trade `size           // I can extract a column list with lookup notation similar to dictionaries

trade.price           // or dot notation similar to namespaces

trade`sym`size        // I can extract multiple columns

100 200 300 200 300


2.44 5.53 4.39 5.53 4.39


BP  IBM AAPL IBM AAPL
100 200 300  200 300 
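
The idea that a table is a collection of column lists can be made concrete: a table matches a flipped dictionary that maps column names to lists. A minimal sketch with illustrative names t and d:

```q
t:([]sym:`JPM`IBM`BP;size:100 25 54)
d:`sym`size!(`JPM`IBM`BP;100 25 54)   / dictionary from column names to column lists
t~flip d                              / 1b: the table matches the flipped dictionary
sum t`size                            / 179: an extracted column is an ordinary list
```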


### Practice questions

1)	Create the table t1:([] sym:\`a\`b\`c\`d; price:1 2 3 4f) <br>
>a.	Use the insert syntax to insert the symbols \`e->\`g and the prices 5->7. Do it as a bulk insert. <br>
b.	Extract the 3rd row from t1. What is its type? <br>

## 4 Reading from a csv

There are several ways we can read data from a csv using q. Here we will look at 2 methods, read0 and 0:

### read0

read0 returns the lines of the file as a list of strings. Lines are assumed delimited by either LF or CRLF, and the delimiters are removed.

In [10]:
read0 `:data/exercise1.csv // you need to point to the directory where exercise1.csv resides

"sym,price,qty"
"A,1,100"
"B,2,200"
"C,3,300"
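
To try read0 without the course data files, a small file can be written first. This sketch assumes the current directory is writable; demo.csv is a hypothetical file name, not one of the course files.

```q
`:demo.csv 0: ("sym,price,qty";"A,1,100";"B,2,200")   / write the lines as text
lines:read0 `:demo.csv
count lines      / 3
first lines      / "sym,price,qty"
```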


### 0:

0: returns the data from a file in table format. We supply a list of types of each of the columns, along with the delimiter. The file is then read into kdb+ with the types we have supplied. 

If we supply the types as “*” the columns will be read in as strings:

In [12]:
("***";csv)0:`:data/exercise1.csv

"sym"   ,"A"  ,"B"  ,"C" 
"price" ,"1"  ,"2"  ,"3" 
"qty"   "100" "200" "300"


Enlisting the delimiter will read the first row in as headers and the subsequent rows as the columns of the table:

In [11]:
("***";enlist csv)0:`:data/exercise1.csv

sym  price qty  
----------------
,"A" ,"1"  "100"
,"B" ,"2"  "200"
,"C" ,"3"  "300"


Finally, we can specify the types of each column when reading in:

In [13]:
("SFJ";enlist csv)0:`:data/exercise1.csv

sym price qty
-------------
A   1     100
B   2     200
C   3     300
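
The right-hand argument of 0: can also be a list of strings rather than a file, which makes it easy to experiment. The sketch below parses in-memory lines and also uses a blank in the type string to skip a column:

```q
lines:("sym,price,qty";"A,1,100";"B,2,200")
t:("SFJ";enlist csv)0:lines     / symbol, float and long columns
meta t                          / the t column should read s, f, j
u:("S J";enlist csv)0:lines     / the blank skips the price column
cols u                          / `sym`qty
```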


### Practice questions

1)	The file “trade.csv” is a comma-separated text file with the following fields <br>
>• date (list of dates) <br>
• sym (list of symbols) <br>
• size (list of integers) <br>
• price (list of floating-point values) <br>
• cond (list of characters) <br>

Import this file as a table into a q session, including all columns from the source file. <br>


In [35]:
("DSIFC";enlist csv)0:`:Training/trade.csv

date       sym size price    cond
---------------------------------
2018.01.20 hle 28   256.3122 A   
2018.01.26 nma 17   272.9862 A   
2018.01.27 hbe 11   432.4631 B   
2018.03.03 plb 55   155.7182 I   
2018.02.12 hpj 51   406.6239 B   
2018.01.13 ggm 81   64.59691 I   
2018.02.23 cce 68   73.87733 B   
2018.02.13 jcd 96   137.1135 B   
2018.02.12 cpd 61   281.7527 A   
2018.04.04 pjb 70   441.9115 I   
2018.03.06 epj 70   121.9597 I   
2018.01.11 eka 39   335.9063 A   
2018.02.22 mpn 76   431.9796 A   
2018.01.04 nog 26   421.9904 A   
2018.04.05 hol 91   271.3185 A   
2018.01.09 kkf 83   38.78666 I   
2018.01.01 peg 76   318.7318 I   
2018.02.12 jcg 88   488.0623 B   
2018.02.16 dlh 4    269.8408 B   
2018.02.28 cme 56   358.1429 I   
..


## 5 qSQL

For those of you who have SQL experience, the syntax for writing queries in q will look very familiar. First, let's define a table we can query:

In [25]:
trade:([]sym:`JPM`GE`JPM`GE`MSFT;size:100 300 200 500 200;price:3.50 4.21 5.44 6.22 5.44;exchange:`N`T`N`N`T)
trade

sym  size price exchange
------------------------
JPM  100  3.5   N       
GE   300  4.21  T       
JPM  200  5.44  N       
GE   500  6.22  N       
MSFT 200  5.44  T       


### Selecting columns

To select columns from a table, simply add the columns in between the select and from words:

In [26]:
select sym from trade                     // select one column

sym 
----
JPM 
GE  
JPM 
GE  
MSFT


In [27]:
select sym,size from trade                // select multiple columns, separated by ,

sym  size
---------
JPM  100 
GE   300 
JPM  200 
GE   500 
MSFT 200 


In [28]:
select sym,sz:size from trade             // give column a different name in result set

sym  sz 
--------
JPM  100
GE   300
JPM  200
GE   500
MSFT 200


### Filtering the dataset

To filter the result set on one or more conditions, we can use the where clause:

In [29]:
select from trade where sym=`JPM

select from trade where sym in`JPM`MSFT

select from trade where sym in`JPM`MSFT,size=200

sym size price exchange
-----------------------
JPM 100  3.5   N       
JPM 200  5.44  N       


sym  size price exchange
------------------------
JPM  100  3.5   N       
JPM  200  5.44  N       
MSFT 200  5.44  T       


sym  size price exchange
------------------------
JPM  200  5.44  N       
MSFT 200  5.44  T       
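
Comma-separated conditions are applied from left to right, each one filtering only the rows that passed the previous condition. The result is the same as combining the conditions with &, as this sketch checks:

```q
trade:([]sym:`JPM`GE`JPM`GE`MSFT;size:100 300 200 500 200;price:3.50 4.21 5.44 6.22 5.44;exchange:`N`T`N`N`T)
a:select from trade where sym in`JPM`MSFT,size=200        / conditions applied in sequence
b:select from trade where (sym in`JPM`MSFT)&size=200      / both conditions tested on every row
a~b                                                       / 1b: identical results
```

The comma form is usually preferred because later conditions are evaluated on fewer rows.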


The columns of a table are lists, and we can perform operations on them as we can on any list:

In [30]:
select sum size from trade

select max size from trade

select min price from trade

select total:size*price from trade

size
----
1300


size
----
500 


price
-----
3.5  


total
-----
350  
1263 
1088 
3110 
1088 
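
Column operations and aggregations can be combined in a single expression. A short sketch; the column names notional and vwap are chosen here for illustration:

```q
trade:([]sym:`JPM`GE`JPM`GE`MSFT;size:100 300 200 500 200;price:3.50 4.21 5.44 6.22 5.44;exchange:`N`T`N`N`T)
select notional:sum size*price from trade     / one row: 6899
select vwap:size wavg price from trade        / size-weighted average price
```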


### The by clause

Instead of operating over the full column, it may be desirable to perform an operation for each unique item in another column or columns. 

In [31]:
select max size by sym from trade                         // get maximum size per sym

select high:max size,low:min size by sym from trade       // multiple columns returned, giving each column a custom name

select avg price by sym,exchange from trade               // use multiple columns in by clause

sym | size
----| ----
GE  | 500 
JPM | 200 
MSFT| 200 


sym | high low
----| --------
GE  | 500  300
JPM | 200  100
MSFT| 200  200


sym  exchange| price
-------------| -----
GE   N       | 6.22 
GE   T       | 4.21 
JPM  N       | 4.47 
MSFT T       | 5.44 
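
The vertical bar in these results marks the key columns: a query with a by clause returns a keyed table. The sketch below shows how to inspect and remove the key; the names r, maxsize and n are illustrative.

```q
trade:([]sym:`JPM`GE`JPM`GE`MSFT;size:100 300 200 500 200;price:3.50 4.21 5.44 6.22 5.44;exchange:`N`T`N`N`T)
r:select maxsize:max size by sym from trade
type r                                / 99h: a keyed table is a dictionary
keys r                                / ,`sym
type 0!r                              / 98h: 0! removes the key, leaving a plain table
select n:count i by sym from trade    / number of rows per sym
```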


### Practice questions

1)	In a q session, create a table trade containing date, time, sym, side, price fields. Fill the table with 
10000 random records with the following constraints: <br>
>- 3 different dates (e.g. today, yesterday, day before yesterday) 
- random times (e.g. between 00:00:00.000 and now) 
- 5 different syms: \`VOD.L\`BMW.DE\`AAA.L\`FDP.L\`GOOG.NY 
- 2 sides:\`B\`S 
- price between 0 and 200 
<br>

a. Select from trade all buy side trades (\`B) made by \`AAA.L on one particular date. <br>
b. Repeat (a), applying the where constraints in the order side, sym, date. How long does it take to do this 1000 times? Repeat in the order date, sym, side. How long does this take? <br>
c. Select all trades made today with a price between 100 and 110 <br>
d. Generate a count of the number of trades each sym made with the same conditions as (c) <br>
e. On the third day, \`AAA.L was renamed to \`BBB.L. Update trade to reflect this change <br>
f. Delete all trades where the price was greater than 190 (keyword ‘delete’) <br>
g. Get a list of syms, that traded on the last 2 days, that end in .L (keyword ‘like’) <br>
h. By sym, calculate the maximum, minimum and average price as well as the spread (max-min) of trades made in the first minute of the second day on the buy side only. <br>

## 6 embedPy

Once you have installed embedPy, it is very simple to run Python commands from within your q terminal. For installation instructions and more information on the use of embedPy see: https://code.kx.com/q/ml/embedpy/

### Running Python commands

The interface allows execution of Python code directly in a q console or from a script. In both console and scripts, prefix Python code with *p)*

In [1]:
p)print(1+2)

3


Q scripts (but not the console) can load and execute multiline Python code. Prefix the first line of the code with *p)* and indent subsequent lines of Python code according to the usual Python indentation rules.

In [3]:
\l Training/embedPytest.q
p)print(add1(12))

13
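
The contents of embedPytest.q are not shown here. As a rough sketch only, a q script mixing q and multiline Python might look like the following; it assumes embedPy is installed and loaded through p.q, and the body of add1 is a guess based on the output above.

```q
/ hypothetical script contents; requires embedPy
\l p.q
p)def add1(x):
  return x+1
p)print(add1(12))
```

The indented line is read as a continuation of the p) line before it, which is how the Python function body spans more than one line of the script.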


Full scripts of Python code can be executed in q, using the *.p* file extension (not .py). The script is loaded as usual.

In [8]:
\l Training/helloq.p

Hello q!


In [30]:
/%python
print(1+1)

2


<center><h1> Q&A </h1></center>