In [16]:
%%sql
drop database if exists time_series;

Working with time series data

In [19]:
%%sql
create database time_series;

use time_series;

In [23]:
%%sql
CREATE TABLE tick(
 ts datetime(6) ,
 symbol varchar(5),
 price numeric(18,4));

Inserting time series data into the table

In [27]:
%%sql
INSERT INTO tick VALUES
('2020-02-18 10:55:36.179760', 'ABC', 100.00),
('2020-02-18 10:57:26.179761', 'ABC', 101.00),
('2020-02-18 10:59:16.178763', 'ABC', 102.50),
('2020-02-18 11:00:56.179769', 'ABC', 102.00),
('2020-02-18 11:01:37.179769', 'ABC', 103.00),
('2020-02-18 11:02:46.179769', 'ABC', 103.00),
('2020-02-18 11:02:59.179769', 'ABC', 102.60),
('2020-02-18 11:02:46.179769', 'XYZ', 103.00),
('2020-02-18 11:02:59.179769', 'XYZ', 102.60),
('2020-02-18 11:03:59.179769', 'XYZ', 102.50);

Query to fetch the low (min), high (max), first, and last price for each symbol in each one-hour bucket, using window functions only and none of the dedicated time series functions

In [30]:
%%sql
WITH ranked AS
(SELECT symbol,
   RANK() OVER w as r,
   MIN(price) OVER w as min_pr,
   MAX(price) OVER w as max_pr,
   FIRST_VALUE(price) OVER w as first,
   LAST_VALUE(price) OVER w as last,
   from_unixtime(unix_timestamp(ts) div (60*60) * (60*60)) as ts
   FROM tick
   WINDOW w AS (PARTITION BY symbol,
              from_unixtime(unix_timestamp(ts) div (60*60) * (60*60))
              ORDER BY ts
              ROWS BETWEEN UNBOUNDED PRECEDING
              AND UNBOUNDED FOLLOWING))


SELECT ts, symbol, min_pr, max_pr, first, last
FROM ranked
WHERE r = 1
ORDER BY symbol, ts;


ts,symbol,min_pr,max_pr,first,last
2020-02-18 10:00:00,ABC,100.0,102.5,100.0,102.5
2020-02-18 11:00:00,ABC,102.0,103.0,102.0,102.6
2020-02-18 11:00:00,XYZ,102.5,103.0,103.0,102.5
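
To make the bucketing step easier to follow, here is a minimal Python sketch of what the query above computes. It is not how SingleStore evaluates the query. The helper names hour_bucket and hourly_summary are made up for this illustration, and the timestamps are treated as UTC, which is an assumption.

```python
from datetime import datetime, timezone

# Rows copied from the INSERT statement above: (ts, symbol, price).
TICKS = [
    ("2020-02-18 10:55:36.179760", "ABC", 100.00),
    ("2020-02-18 10:57:26.179761", "ABC", 101.00),
    ("2020-02-18 10:59:16.178763", "ABC", 102.50),
    ("2020-02-18 11:00:56.179769", "ABC", 102.00),
    ("2020-02-18 11:01:37.179769", "ABC", 103.00),
    ("2020-02-18 11:02:46.179769", "ABC", 103.00),
    ("2020-02-18 11:02:59.179769", "ABC", 102.60),
    ("2020-02-18 11:02:46.179769", "XYZ", 103.00),
    ("2020-02-18 11:02:59.179769", "XYZ", 102.60),
    ("2020-02-18 11:03:59.179769", "XYZ", 102.50),
]


def hour_bucket(ts_text):
    """Floor a timestamp to the start of its hour, mirroring
    from_unixtime(unix_timestamp(ts) div (60*60) * (60*60))."""
    ts = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S.%f")
    ts = ts.replace(tzinfo=timezone.utc)  # assumption: treat values as UTC
    floored = int(ts.timestamp()) // 3600 * 3600
    return datetime.fromtimestamp(floored, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S"
    )


def hourly_summary(ticks):
    """Group by (symbol, hour) and return (ts, symbol, min, max, first, last)."""
    groups = {}
    for ts_text, symbol, price in ticks:
        key = (symbol, hour_bucket(ts_text))
        groups.setdefault(key, []).append((ts_text, price))
    rows = []
    for (symbol, bucket), items in sorted(groups.items()):
        items.sort()  # fixed-width timestamp strings sort chronologically
        prices = [price for _, price in items]
        rows.append(
            (bucket, symbol, min(prices), max(prices), prices[0], prices[-1])
        )
    return rows


for row in hourly_summary(TICKS):
    print(row)
```

The sketch yields the same three rows as the query output above: one bucket for ABC at 10:00, one for ABC at 11:00, and one for XYZ at 11:00.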


Using the FIRST function

FIRST is an aggregate function that returns the value associated with the earliest time.

Syntax: FIRST(value [, time]);

In [33]:
%%sql
SELECT first(price,ts) FROM tick;

"first(price,ts)"
100.0


Using the LAST function

LAST is an aggregate function that returns the value associated with the latest time.

Syntax: LAST(value [, time]);

In [34]:
%%sql
SELECT last(price,ts) from tick;


"last(price,ts)"
102.5
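
As a rough illustration of the two aggregates above, the following Python sketch picks the price paired with the smallest and the largest timestamp. The functions first and last are stand-ins written for this example, not SingleStore APIs, and the rows are the (ts, price) pairs from the INSERT statement.

```python
# (ts, price) pairs copied from the INSERT statement above, all symbols included.
ROWS = [
    ("2020-02-18 10:55:36.179760", 100.00),
    ("2020-02-18 10:57:26.179761", 101.00),
    ("2020-02-18 10:59:16.178763", 102.50),
    ("2020-02-18 11:00:56.179769", 102.00),
    ("2020-02-18 11:01:37.179769", 103.00),
    ("2020-02-18 11:02:46.179769", 103.00),
    ("2020-02-18 11:02:59.179769", 102.60),
    ("2020-02-18 11:02:46.179769", 103.00),
    ("2020-02-18 11:02:59.179769", 102.60),
    ("2020-02-18 11:03:59.179769", 102.50),
]


def first(rows):
    """Return the value paired with the earliest timestamp."""
    return min(rows, key=lambda row: row[0])[1]


def last(rows):
    """Return the value paired with the latest timestamp."""
    return max(rows, key=lambda row: row[0])[1]


print(first(ROWS), last(ROWS))
```

With this data the earliest row is the ABC tick at 10:55:36 and the latest is the XYZ tick at 11:03:59, which matches the 100.0 and 102.5 returned by the two queries.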


The TIME_BUCKET function

Syntax: TIME_BUCKET(bucket_width [, time [, origin]]);

The TIME_BUCKET function maps each value of a time series column to the start of the fixed-width interval (bucket) that contains it. Grouping by the result lets you aggregate within each interval. The query below uses two-minute buckets.

In [36]:
%%sql
SELECT time_bucket('2m',ts) as ts, symbol, min(price) as min_pr,
   max(price) as max_pr, first(price,ts) as first, last(price,ts) as last
FROM tick
group by 2, 1
order by 2, 1;

ts,symbol,min_pr,max_pr,first,last
2020-02-18 10:54:00,ABC,100.0,100.0,100.0,100.0
2020-02-18 10:56:00,ABC,101.0,101.0,101.0,101.0
2020-02-18 10:58:00,ABC,102.5,102.5,102.5,102.5
2020-02-18 11:00:00,ABC,102.0,103.0,102.0,103.0
2020-02-18 11:02:00,ABC,102.6,103.0,103.0,102.6
2020-02-18 11:02:00,XYZ,102.5,103.0,103.0,102.5
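
The bucketing arithmetic can be sketched in a few lines of Python. This is only an illustration of the idea, not SingleStore's implementation. The function below is a hypothetical stand-in, and its default origin of 1970-01-01 is an assumption chosen for the example.

```python
from datetime import datetime, timedelta


def time_bucket(ts, width, origin=datetime(1970, 1, 1)):
    """Return the start of the width-sized bucket that contains ts.

    Buckets are counted from origin; the default origin is an assumption
    made for this sketch.
    """
    whole_buckets = (ts - origin) // width  # timedelta // timedelta -> int
    return origin + whole_buckets * width


two_minutes = timedelta(minutes=2)
tick_time = datetime(2020, 2, 18, 10, 55, 36, 179760)

# Falls into the bucket that starts at 10:54:00.
print(time_bucket(tick_time, two_minutes))

# Shifting the origin by one minute moves the bucket boundaries to odd minutes.
print(time_bucket(tick_time, two_minutes, origin=datetime(1970, 1, 1, 0, 1)))
```

Changing the origin only shifts where the bucket boundaries fall; the bucket width stays the same.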


In [None]:
%%sql
drop table tick;

Interpolation in SingleStore

In [38]:
%%sql
CREATE TABLE tick (ts datetime(6), symbol varchar(5),
  price numeric(18,4));


We have a table named tick. Let's assume that one row is expected in it every second.

In [None]:
%%sql
INSERT INTO tick VALUES
 ('2019-02-18 10:55:36.000000', 'ABC', 100.00),
 ('2019-02-18 10:55:37.000000', 'ABC', 102.00),
 ('2019-02-18 10:55:40.000000', 'ABC', 103.00),
 ('2019-02-18 10:55:42.000000', 'ABC', 104.00);

select * from tick;


As the tick table shows, the values for the 38th, 39th, and 41st seconds are missing.
Gaps like these are a common issue in time series data.

Below are two stored procedures that together perform linear interpolation.

The first one, driver(), retrieves the data from the tick table and then calls the second procedure, interpolate_ts(), to interpolate the time series it fetched.

The interpolate_ts() procedure takes a query whose result is sorted by ts, collects the rows into an array, and walks through them in order. Wherever two consecutive rows are more than one second apart, it inserts a row for each missing second, with a price interpolated linearly between the two neighboring prices.

It also verifies that the time series is sorted. If the timestamps are out of order or duplicated, it raises an exception.
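
Before reading the stored procedures, it may help to see the same idea as a short Python sketch. This is an illustrative re-implementation, not the procedure itself. The name interpolate_seconds is made up for this example, and it uses floats, whereas the procedure works with numeric(18,4), so the last decimal digit may differ.

```python
from datetime import datetime, timedelta

# Rows copied from the INSERT statement above, already sorted by ts.
TICKS = [
    (datetime(2019, 2, 18, 10, 55, 36), "ABC", 100.00),
    (datetime(2019, 2, 18, 10, 55, 37), "ABC", 102.00),
    (datetime(2019, 2, 18, 10, 55, 40), "ABC", 103.00),
    (datetime(2019, 2, 18, 10, 55, 42), "ABC", 104.00),
]


def interpolate_seconds(rows):
    """Fill whole-second gaps between consecutive rows by linear interpolation.

    rows must be sorted by timestamp with no duplicates.
    """
    out = []
    for (ts, symbol, price), (next_ts, _, next_price) in zip(rows, rows[1:]):
        gap = int((next_ts - ts).total_seconds())
        if gap <= 0:
            raise ValueError("time series not sorted or has duplicate timestamps")
        out.append((ts, symbol, price))
        # Price change per second between the two known points.
        delta = (next_price - price) / gap
        for step in range(1, gap):
            out.append((ts + timedelta(seconds=step), symbol, price + step * delta))
    if rows:
        out.append(rows[-1])  # the last row has no successor; emit it as-is
    return out


for ts, symbol, price in interpolate_seconds(TICKS):
    print(ts, symbol, round(price, 4))
```

For the four input rows this produces seven rows, one for each second from 10:55:36 to 10:55:42, with the prices at seconds 38, 39, and 41 filled in along the straight line between their neighbors.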

In [None]:
%%sql
DELIMITER //
CREATE OR REPLACE PROCEDURE driver() AS
DECLARE
 q query(ts datetime(6), symbol varchar(5), price numeric(18,4));
BEGIN
 q = SELECT ts, symbol, price FROM tick ORDER BY ts;

 ECHO SELECT 'Input time series' AS message;
 ECHO SELECT * FROM q ORDER BY ts;
 ECHO SELECT 'Interpolated time series' AS message;
 CALL interpolate_ts(q);
END //
DELIMITER ;


DELIMITER //
CREATE OR REPLACE PROCEDURE interpolate_ts(
 q query(ts datetime(6), symbol varchar(5), price numeric(18,4)))
   -- Important: q must produce sorted output by ts
AS
DECLARE
 c array(record(ts datetime(6), symbol varchar(5), price numeric(18,4)));
 r record(ts datetime(6), symbol varchar(5), price numeric(18,4));
 r_next record(ts datetime(6), symbol varchar(5), price numeric(18,4));
 n int;
 i int;
 _ts datetime(6); _symbol varchar(5); _price numeric(18,4);
 time_diff int;
 delta numeric(18,4);
BEGIN
 DROP TABLE IF EXISTS tmp;
 CREATE TEMPORARY TABLE tmp LIKE tick;
 c = collect(q);
 n = length(c);
 IF n < 2 THEN
   ECHO SELECT * FROM q ORDER BY ts;
   return;
 END IF;


 i = 0;
 r = c[i];
 r_next = c[i + 1];


 WHILE (i < n) LOOP
   -- IF at last row THEN output it and exit
   IF i = n - 1 THEN
     _ts = r.ts; _symbol = r.symbol; _price = r.price;
     INSERT INTO tmp VALUES(_ts, _symbol, _price);
     i += 1;
     CONTINUE;
   END IF;


   time_diff = unix_timestamp(r_next.ts) - unix_timestamp(r.ts);


   IF time_diff <= 0 THEN
     RAISE user_exception("time series not sorted or has duplicate timestamps");
   END IF;


   -- output r
   _ts = r.ts; _symbol = r.symbol; _price = r.price;
   INSERT INTO tmp VALUES(_ts, _symbol, _price);


   IF time_diff = 1 THEN
     r = r_next; -- advance to next row
   ELSIF time_diff > 1 THEN
     -- output time_diff-1 rows by extending current row and interpolating price
     delta = (r_next.price - r.price) / time_diff;
     FOR j in 1..time_diff-1 LOOP
       _ts += 1; _price += delta;
       INSERT INTO tmp VALUES(_ts, _symbol, _price);
     END LOOP;
     r = r_next; -- advance to next row
   ELSE
     RAISE user_exception("time series not sorted");
   END IF;


   i += 1;
   IF i < n - 1 THEN r_next = c[i + 1]; END IF;
 END LOOP;
 ECHO SELECT * FROM tmp ORDER BY ts;
 DROP TABLE tmp;
END //
DELIMITER ;


In [None]:
%%sql
call driver();

select * from tick;

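You can see the interpolated values for the 38th, 39th, and 41st seconds in the output of driver() above. The tick table itself is unchanged, because the procedure writes the interpolated rows to a temporary table.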