# Row vs Column Storage

In InterSystems IRIS®, a relational table, such as the one shown here, is a logical abstraction. It does not reflect the underlying physical storage layout of the data.

![image info](https://raw.githubusercontent.com/grongierisc/iris-devslam/master/misc/img/table_abstraction.jpg)

## How data can actually be stored

There are several ways to store data as a table.
We will look at four of them.
These four types fall into two categories: row storage and column storage.
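
To make the two categories concrete, here is a minimal sketch in plain Python (not IRIS code) of the same three transactions laid out by row and by column. The sample values and the names `row_store` and `column_store` are illustrative assumptions; they do not describe how IRIS actually encodes its globals.

```python
# Illustrative only: plain Python structures, not the IRIS on-disk format.
column_names = ["AccountNumber", "TransactionDate", "Description", "Amount", "Type"]
rows = [
    (1001, "2023-01-05", "Salary", 2500.00, "credit"),
    (1002, "2023-01-06", "Rent", -900.00, "debit"),
    (1001, "2023-01-07", "Rent", -50.00, "debit"),
]

# Row layout: all fields of one record are kept together.
row_store = {row_id: row for row_id, row in enumerate(rows, start=1)}

# Column layout: all values of one field are kept together.
column_store = {
    name: [row[i] for row in rows] for i, name in enumerate(column_names)
}

# Fetching a whole record touches a single entry of the row store...
print(row_store[2])

# ...while aggregating one field touches a single entry of the column store.
amounts = column_store["Amount"]
print(sum(abs(a) for a in amounts) / len(amounts))
```

The trade-off this hints at is the one the rest of the demo measures: row layouts suit reading or writing whole records, column layouts suit scanning one field across many records.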

### Row storage

### Column storage

## Import the utility functions

In [1]:
from utilsrowcolumn import * 

# Initialize the list of tables used in this demo

We will be using:

* Demo.BankTransactionRow
  * A table that stores data by row
* Demo.BankTransactionColumn
  * A table that stores data by column
* Demo.BankTransactionIndex
  * A table that stores data by row, with a columnar index
* Demo.BankTransactionMix
  * A table that stores some columns by row and others by column

In [2]:
list_tables = ["Demo.BankTransactionRow", "Demo.BankTransactionColumn","Demo.BankTransactionIndex","Demo.BankTransactionMix" ]


## Let's start

We start by dropping the tables if they exist.

In [3]:
# init drop table if exists
print("init drop table if exists")
for table in list_tables:
    benchmark_sql_query("DROP TABLE IF EXISTS %s" % table)

# drop table description
print("drop table description")
benchmark_sql_query("DROP TABLE IF EXISTS Demo.BankTransactionDescription")

init drop table if exists
number of rows : 0
DROP TABLE IF EXISTS Demo.BankTransactionRow in 2.738652467727661
number of rows : 0
DROP TABLE IF EXISTS Demo.BankTransactionColumn in 1.2384445667266846
number of rows : 0
DROP TABLE IF EXISTS Demo.BankTransactionIndex in 1.2114002704620361
number of rows : 0
DROP TABLE IF EXISTS Demo.BankTransactionMix in 1.3700847625732422
drop table description
number of rows : 0
DROP TABLE IF EXISTS Demo.BankTransactionDescription in 0.03394770622253418
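
The helper `benchmark_sql_query` comes from `utilsrowcolumn` and its source is not shown here. Judging only from the output above, it prints a row count followed by the statement and the elapsed time. A minimal sketch of such a helper might look like the following; the name `benchmark_query_sketch` is hypothetical, and `sqlite3` is used purely as a stand-in so the example runs without an IRIS instance.

```python
import sqlite3
import time


def benchmark_query_sketch(cursor, query):
    """Run a statement, then print the fetched row count and elapsed seconds."""
    start = time.time()
    cursor.execute(query)
    # Statements without a result set (DDL, INSERT) leave description as None.
    rows = cursor.fetchall() if cursor.description else []
    elapsed = time.time() - start
    print(f"number of rows : {len(rows)}")
    print(f"{query} in {elapsed}")
    return rows, elapsed


# Stand-in database (assumption): the demo itself runs against IRIS.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
benchmark_query_sketch(cursor, "CREATE TABLE BankTransaction (Amount NUMERIC, Type TEXT)")
benchmark_query_sketch(cursor, "INSERT INTO BankTransaction VALUES (10.5, 'credit')")
rows, elapsed = benchmark_query_sketch(cursor, "SELECT * FROM BankTransaction")
```

Note that a timer placed like this includes the time to fetch the rows to the client, which matters when comparing queries that return many rows.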


## Create the row storage

Not much to see here: a basic DDL statement.

In [4]:
sql_row = """
CREATE TABLE Demo.BankTransactionRow (
  AccountNumber INTEGER,
  TransactionDate DATE,
  Description VARCHAR(100),
  Amount NUMERIC(10,2),
  Type VARCHAR(10)
)
"""
benchmark_sql_query(sql_row)

number of rows : 0

CREATE TABLE Demo.BankTransactionRow (
  AccountNumber INTEGER,
  TransactionDate DATE,
  Description VARCHAR(100),
  Amount NUMERIC(10,2),
  Type VARCHAR(10)
)
 in 0.21885895729064941


## Create the indexed row table

Same SQL statement as above, followed by a columnar index on the Amount column.

In [5]:
# index column storage
sql_index = """
CREATE TABLE Demo.BankTransactionIndex (
  AccountNumber INTEGER,
  TransactionDate DATE,
  Description VARCHAR(100),
  Amount NUMERIC(10,2),
  Type VARCHAR(10)
)
"""
benchmark_sql_query(sql_index)
# Create the index
benchmark_sql_query("""CREATE COLUMNAR INDEX AmountIndex
ON Demo.BankTransactionIndex(Amount)""")

number of rows : 0

CREATE TABLE Demo.BankTransactionIndex (
  AccountNumber INTEGER,
  TransactionDate DATE,
  Description VARCHAR(100),
  Amount NUMERIC(10,2),
  Type VARCHAR(10)
)
 in 0.1506037712097168
number of rows : 0
CREATE COLUMNAR INDEX AmountIndex
ON Demo.BankTransactionIndex(Amount) in 0.14767193794250488


## Create the column storage

Pay attention to the clause `WITH STORAGETYPE = COLUMNAR` at the end of the statement.

In [6]:
# column storage
sql_column = """
CREATE TABLE Demo.BankTransactionColumn (
  AccountNumber INTEGER,
  TransactionDate DATE,
  Description VARCHAR(100),
  Amount NUMERIC(10,2),
  Type VARCHAR(10)
)
WITH STORAGETYPE = COLUMNAR
"""
benchmark_sql_query(sql_column)

number of rows : 0

CREATE TABLE Demo.BankTransactionColumn (
  AccountNumber INTEGER,
  TransactionDate DATE,
  Description VARCHAR(100),
  Amount NUMERIC(10,2),
  Type VARCHAR(10)
)
WITH STORAGETYPE = COLUMNAR
 in 0.15253281593322754


## Finally, the mixed storage

Pay attention to the Amount column: it is the only one declared with `WITH STORAGETYPE = COLUMNAR`.

In [7]:
# mix storage
sql_mix = """
CREATE TABLE Demo.BankTransactionMix (
  AccountNumber INTEGER,
  TransactionDate DATE,
  Description VARCHAR(100),
  Amount NUMERIC(10,2) WITH STORAGETYPE = COLUMNAR,
  Type VARCHAR(10)
)
"""
benchmark_sql_query(sql_mix)

number of rows : 0

CREATE TABLE Demo.BankTransactionMix (
  AccountNumber INTEGER,
  TransactionDate DATE,
  Description VARCHAR(100),
  Amount NUMERIC(10,2) WITH STORAGETYPE = COLUMNAR,
  Type VARCHAR(10)
)
 in 0.1300206184387207
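
The four tables share the same columns and differ only in where the columnar clause appears. As a recap, the sketch below regenerates the `CREATE TABLE` statements from a single column list; the function `build_ddl` and its parameters are hypothetical helpers for illustration and are not part of `utilsrowcolumn`.

```python
COLUMNS = [
    ("AccountNumber", "INTEGER"),
    ("TransactionDate", "DATE"),
    ("Description", "VARCHAR(100)"),
    ("Amount", "NUMERIC(10,2)"),
    ("Type", "VARCHAR(10)"),
]


def build_ddl(table, columnar_table=False, columnar_columns=()):
    """Return a CREATE TABLE statement for one storage variant."""
    lines = []
    for name, sql_type in COLUMNS:
        # Per-column clause, as used for the mixed table.
        suffix = " WITH STORAGETYPE = COLUMNAR" if name in columnar_columns else ""
        lines.append(f"  {name} {sql_type}{suffix}")
    ddl = f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n)"
    if columnar_table:
        # Table-level clause, as used for the fully columnar table.
        ddl += "\nWITH STORAGETYPE = COLUMNAR"
    return ddl


ddl_by_table = {
    "Demo.BankTransactionRow": build_ddl("Demo.BankTransactionRow"),
    # Same DDL as the row table; the columnar index is created separately.
    "Demo.BankTransactionIndex": build_ddl("Demo.BankTransactionIndex"),
    "Demo.BankTransactionColumn": build_ddl(
        "Demo.BankTransactionColumn", columnar_table=True
    ),
    "Demo.BankTransactionMix": build_ddl(
        "Demo.BankTransactionMix", columnar_columns=("Amount",)
    ),
}

for ddl in ddl_by_table.values():
    print(ddl, end="\n\n")
```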


## Now we have to insert data into those tables

We will do it in two steps.
First, generate 100,000 synthetic rows for each table.
Second, duplicate this synthetic data at the speed of light.

In [8]:
print("create data")
data = create_n_fake_data(100000,list_tables)


create data
99.9%
create 400000 fake data in 61.52186441421509 seconds, number of rows per second : 6501.7535441851305
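
The implementation of `create_n_fake_data` is not shown in this notebook. As a rough idea of what generating one synthetic transaction could involve, here is a sketch using only the standard library. Everything in it is an assumption: the function name `make_fake_transaction`, the value ranges, and the choice of negative amounts for debits (suggested only by the later use of `ABS(Amount)`). The two descriptions are borrowed from the description table created further down.

```python
import random
from datetime import date, timedelta

# Description -> type, mirroring the two rows of the description table.
DESCRIPTIONS = {"Salary": "credit", "Rent": "debit"}


def make_fake_transaction(rng):
    """Build one (AccountNumber, TransactionDate, Description, Amount, Type) tuple."""
    description = rng.choice(sorted(DESCRIPTIONS))
    tx_type = DESCRIPTIONS[description]
    amount = round(rng.uniform(1, 5000), 2)
    if tx_type == "debit":
        amount = -amount  # assumption: debits are stored as negative amounts
    return (
        rng.randint(1, 9999),
        date(2023, 1, 1) + timedelta(days=rng.randrange(365)),
        description,
        amount,
        tx_type,
    )


rng = random.Random(42)  # fixed seed so the batch is reproducible
batch = [make_fake_transaction(rng) for _ in range(1000)]
print(batch[0])
```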


# Let's reach the speed of light

Here we take advantage of the underlying nature of IRIS: globals.
Using low-level code, we copy the first 64,000 rows of each table 100 times.

In [10]:
import time  # harmless if utilsrowcolumn already exposes it

x = 100
print(f"\ninsert {x}*64 000 rows")
start = time.time()
for i in range(x):
    print(f"\r{i/x*100:.1f}%", end='')
    for table in list_tables:
        add_64000_rows(table)
end = time.time()
print(f"\ninsert {x*64000} rows in {end - start} row per second : {64000*x/(end - start)}")



insert 100*64 000 rows
99.0%insert 6400000 rows in 41.20470690727234 row per second : 155322.06100634695


## Build the indexes

In [11]:
# build index
for table in list_tables:
    benchmark_sql_query(f"build index for table {table}")
# tune table
print("tune table")
for table in list_tables:
    benchmark_sql_query("TUNE TABLE %s" % table)


number of rows : 0
build index for table Demo.BankTransactionRow in 4.890666246414185
number of rows : 0
build index for table Demo.BankTransactionColumn in 5.194094896316528
number of rows : 0
build index for table Demo.BankTransactionIndex in 6.571071147918701
number of rows : 0
build index for table Demo.BankTransactionMix in 4.4333226680755615
tune table
number of rows : 0
TUNE TABLE Demo.BankTransactionRow in 1.000699520111084
number of rows : 0
TUNE TABLE Demo.BankTransactionColumn in 1.4356026649475098
number of rows : 0
TUNE TABLE Demo.BankTransactionIndex in 0.8562057018280029
number of rows : 0
TUNE TABLE Demo.BankTransactionMix in 0.9843282699584961


## Create a description table to showcase SQL joins

In [13]:
# create a description table of debit and credit
print("create a description table of debit and credit")
benchmark_sql_query("""
CREATE TABLE Demo.BankTransactionDescription (
    Description VARCHAR(100),
    Type VARCHAR(10)
)
"""
)

# insert data in description table
print("insert data in description table")
benchmark_sql_query("""
INSERT INTO Demo.BankTransactionDescription
values
('Salary','credit')
""")
benchmark_sql_query("""
INSERT INTO Demo.BankTransactionDescription
values
('Rent','debit')
"""
)

create a description table of debit and credit
number of rows : 0

CREATE TABLE Demo.BankTransactionDescription (
    Description VARCHAR(100),
    Type VARCHAR(10)
)
 in 0.13489317893981934
insert data in description table
number of rows : 0

INSERT INTO Demo.BankTransactionDescription
values
('Salary','credit')
 in 0.010725259780883789
number of rows : 0

INSERT INTO Demo.BankTransactionDescription
values
('Rent','debit')
 in 0.0003070831298828125


## Summary

In less than 3 minutes we have built a data set of **12.9 million rows per table** (51.6 million rows across the four tables, as the counts below confirm) :)

# Now start the demo

## Let's count the rows in each table

In [14]:
# query count data
print("query count data")
for table in list_tables:
    print_sql_query(f"SELECT COUNT(*) FROM {table}")


query count data
SELECT COUNT(*) FROM Demo.BankTransactionRow :
[12900000]
SELECT COUNT(*) FROM Demo.BankTransactionColumn :
[12900000]
SELECT COUNT(*) FROM Demo.BankTransactionIndex :
[12900000]
SELECT COUNT(*) FROM Demo.BankTransactionMix :
[12900000]
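
These counts can be cross-checked against the loading steps. The arithmetic below uses only numbers that appear in this notebook. One reading consistent with the result, offered as an assumption since the notebook does not show it, is that the copy loop was executed twice (the cell numbering jumps from `In [8]` to `In [10]`).

```python
initial_rows = 100_000   # rows generated per table by create_n_fake_data
block_size = 64_000      # rows copied per add_64000_rows call
copies_per_run = 100     # value of x in the copy loop
observed = 12_900_000    # COUNT(*) reported for each table
table_count = 4

rows_added_per_run = block_size * copies_per_run
runs = (observed - initial_rows) / rows_added_per_run

print(f"copy loop runs needed to reach the observed count: {runs}")
print(f"total rows across all tables: {observed * table_count}")
```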


## Query the top 100,000 rows of each table

In [15]:
# query data
print("query data")
for table in list_tables:
    benchmark_sql_query("SELECT TOP 100000 * FROM %s " % table)


query data
number of rows : 100000
SELECT TOP 100000 * FROM Demo.BankTransactionRow  in 1.9589028358459473
number of rows : 100000
SELECT TOP 100000 * FROM Demo.BankTransactionColumn  in 1.878265142440796
number of rows : 100000
SELECT TOP 100000 * FROM Demo.BankTransactionIndex  in 1.820725917816162
number of rows : 100000
SELECT TOP 100000 * FROM Demo.BankTransactionMix  in 1.7693462371826172


## Let's try aggregation

In [16]:
# benchmark aggregation
print("benchmark aggregation")
for table in list_tables:
    benchmark_sql_query("SELECT AVG(ABS(Amount)) FROM %s " % table)


benchmark aggregation
number of rows : 1
SELECT AVG(ABS(Amount)) FROM Demo.BankTransactionRow  in 1.196108102798462
number of rows : 1
SELECT AVG(ABS(Amount)) FROM Demo.BankTransactionColumn  in 1.7184019088745117
number of rows : 1
SELECT AVG(ABS(Amount)) FROM Demo.BankTransactionIndex  in 0.5890133380889893
number of rows : 1
SELECT AVG(ABS(Amount)) FROM Demo.BankTransactionMix  in 0.6251823902130127


## Showcase SQL joins

In [17]:
# benchmark join
print("benchmark join")
for table in list_tables:
    benchmark_sql_query("""SELECT TOP 100000 * FROM %s t1 
        JOIN Demo.BankTransactionDescription t2 ON t1.Type = t2.Type""" % table)


benchmark join
number of rows : 100000
SELECT TOP 100000 * FROM Demo.BankTransactionRow t1 
        JOIN Demo.BankTransactionDescription t2 ON t1.Type = t2.Type in 2.2476725578308105
number of rows : 100000
SELECT TOP 100000 * FROM Demo.BankTransactionColumn t1 
        JOIN Demo.BankTransactionDescription t2 ON t1.Type = t2.Type in 1.9205644130706787
number of rows : 100000
SELECT TOP 100000 * FROM Demo.BankTransactionIndex t1 
        JOIN Demo.BankTransactionDescription t2 ON t1.Type = t2.Type in 1.942474603652954
number of rows : 100000
SELECT TOP 100000 * FROM Demo.BankTransactionMix t1 
        JOIN Demo.BankTransactionDescription t2 ON t1.Type = t2.Type in 2.1324243545532227


## Benchmark inserts

In [18]:
# benchmark insert
print("benchmark insert")
for table in list_tables:
    start = time.time()
    print(f"for table {table}")
    create_n_fake_data(10000,[table])
    end = time.time()

benchmark insert
for table Demo.BankTransactionRow
99.9%
create 10000 fake data in 1.3060083389282227 seconds, number of rows per second : 7656.918950614444
for table Demo.BankTransactionColumn
99.9%
create 10000 fake data in 2.790039300918579 seconds, number of rows per second : 3584.178902679847
for table Demo.BankTransactionIndex
99.9%
create 10000 fake data in 1.7931938171386719 seconds, number of rows per second : 5576.642025208744
for table Demo.BankTransactionMix
99.9%
create 10000 fake data in 1.782226800918579 seconds, number of rows per second : 5610.958153499819


## Check table size

In [19]:
# table size
print("table size")
for table in list_tables:
    print_sql_query("SELECT * FROM bdb_sql.TableSize('%s')" % table)


table size
SELECT * FROM bdb_sql.TableSize('Demo.BankTransactionRow') :
['total', '518.07', '463.06']
SELECT * FROM bdb_sql.TableSize('Demo.BankTransactionColumn') :
['total', '294.07', '280.06']
SELECT * FROM bdb_sql.TableSize('Demo.BankTransactionIndex') :
['total', '570.07', '514.06']
SELECT * FROM bdb_sql.TableSize('Demo.BankTransactionMix') :
['total', '485.07', '438.06']
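
To compare the four layouts at a glance, the sketch below normalizes three of the measurements reported above (aggregation time, insert throughput, and the first size figure from `bdb_sql.TableSize`) against the row table. The values are copied, rounded, from the outputs of this single run, so the ratios are indicative only; the size figures are kept in whatever unit `bdb_sql.TableSize` reports, which this notebook does not state.

```python
# Values copied (rounded) from the outputs above; one run, so treat as indicative.
results = {
    "Row":    {"avg_seconds": 1.196, "insert_rows_per_s": 7657, "size": 518.07},
    "Column": {"avg_seconds": 1.718, "insert_rows_per_s": 3584, "size": 294.07},
    "Index":  {"avg_seconds": 0.589, "insert_rows_per_s": 5577, "size": 570.07},
    "Mix":    {"avg_seconds": 0.625, "insert_rows_per_s": 5611, "size": 485.07},
}

baseline = results["Row"]

# Express every measurement as a ratio to the row table (Row is 1.00 everywhere).
relative = {
    layout: {metric: value / baseline[metric] for metric, value in metrics.items()}
    for layout, metrics in results.items()
}

print(f"{'layout':<8}{'avg time':>10}{'insert rate':>13}{'size':>8}")
for layout, ratios in relative.items():
    print(
        f"{layout:<8}"
        f"{ratios['avg_seconds']:>10.2f}"
        f"{ratios['insert_rows_per_s']:>13.2f}"
        f"{ratios['size']:>8.2f}"
    )
```

For the time and size columns lower is better; for the insert rate higher is better.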
