mcai4gl2/tss

tss

tss is a simple time series store built on top of MongoDB. It lets users store pandas DataFrames directly in MongoDB under a simple schema. Data is stored in MongoDB's native format, so that programs written in other languages can read or modify it directly.

tss also supports DynamoDB as a backend in addition to MongoDB. More information can be found here.

tss uses two collections to store the data in MongoDB:

  • series - which stores the time series metadata and the chunks' metadata. The document is defined as follows:

| Attribute | Data Type | Notes |
|-----------|-----------|-------|
| _id | ObjectId | id for the time series |
| frequency | String | the frequency of the data: 1d, 1m, or 1s |
| name | String | the name of the time series |
| columns | String[] | the names of the columns |
| slices | array of slice | the slices' metadata |

where the slice sub-document is defined as:

| Attribute | Data Type | Notes |
|-----------|-----------|-------|
| id | ObjectId | the object id of the slice's data in the data collection |
| start | DateTime | start time of the slice |
| num_of_samples | int | the number of data points stored in this slice |
| is_sparse | boolean | indicates whether the slice's data is stored sparsely or not |
  • data - which stores the actual data. A slice's data can be stored either sparsely or not (currently, only sparse storage is supported).

| Attribute | Data Type | Notes |
|-----------|-----------|-------|
| _id | ObjectId | id for the slice |
| data | array | the actual data |

For a sparse slice, each element of the data array is stored as a sub-document as follows:

| Attribute | Data Type | Notes |
|-----------|-----------|-------|
| timestamp | DateTime | the timestamp of the data point |
| data | array | the array of values representing one row of data |
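
To make the schema above concrete, here is a minimal sketch that builds one series document and its sparse data documents as plain Python dictionaries. It is not tss's own code: `build_documents` and its `samples_per_slice` parameter are hypothetical names chosen for illustration, and small integers stand in for the ObjectId values MongoDB would assign.

```python
from datetime import datetime
from itertools import count


def build_documents(name, frequency, columns, rows, samples_per_slice=1):
    """Build a series document and its sparse data documents.

    rows is a list of (timestamp, values) pairs sorted by timestamp.
    Integer ids are placeholders for MongoDB ObjectId values (assumption).
    """
    ids = count(1)
    series = {
        "_id": next(ids),
        "frequency": frequency,
        "name": name,
        "columns": list(columns),
        "slices": [],
    }
    data_docs = []
    for i in range(0, len(rows), samples_per_slice):
        chunk = rows[i:i + samples_per_slice]
        slice_id = next(ids)
        # Slice metadata lives in the series document ...
        series["slices"].append({
            "id": slice_id,
            "start": chunk[0][0],
            "num_of_samples": len(chunk),
            "is_sparse": True,
        })
        # ... while the rows themselves live in the data collection.
        data_docs.append({
            "_id": slice_id,
            "data": [{"timestamp": ts, "data": list(values)}
                     for ts, values in chunk],
        })
    return series, data_docs


rows = [
    (datetime(2017, 3, 8), [1, 2, 3]),
    (datetime(2017, 3, 9), [4, 5, 6]),
    (datetime(2017, 3, 10), [7, 8, 9]),
]
series, data_docs = build_documents("test1", "1d", ["col1", "col2", "col3"], rows)
print(len(series["slices"]), len(data_docs))
```

Each slice's `id` matches the `_id` of exactly one data document, which is what links the two collections together.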

Examples:

  • Creating a new time series from pandas DataFrame:
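from datetime import datetime
from io import StringIO  # on Python 2, use: from StringIO import StringIO

import numpy as np
import pandas as pd

from tss import utils

# Connect using the defaults (or the MONGO_* environment variables).
db = utils.get_mongo_db()

# Build a small three-row DataFrame from CSV text.
input_data = StringIO("""col1,col2,col3
1,2,3
4,5,6
7,8,9
""")
df = pd.read_csv(input_data, sep=",")

# Index the rows by timestamp, one row per day.
df['time'] = pd.Series([np.datetime64(datetime(2017, 3, 8)),
                        np.datetime64(datetime(2017, 3, 9)),
                        np.datetime64(datetime(2017, 3, 10))])
df.set_index(['time'], inplace=True)

# Store the DataFrame as series 'test1' with daily ('1d') frequency.
result = utils.create_with_sparse_slices_from_df(df, 'test1', '1d', 3, db)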

By default, tss connects to MongoDB at localhost:27017 and uses the database name tss. These settings can be overridden with the environment variables MONGO_SERVER, MONGO_PORT, and MONGO_DB_NAME.
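
As an illustration of how such environment overrides typically work, the sketch below resolves the three settings with the defaults stated above. It is an assumption about the behaviour, not tss's actual implementation; `mongo_settings` is a hypothetical helper name.

```python
import os


def mongo_settings(environ=None):
    """Return (server, port, db_name), falling back to the documented defaults.

    environ defaults to os.environ; passing a dict makes the function
    easy to try out without touching the real environment.
    """
    env = os.environ if environ is None else environ
    server = env.get("MONGO_SERVER", "localhost")
    port = int(env.get("MONGO_PORT", "27017"))  # environment values are strings
    db_name = env.get("MONGO_DB_NAME", "tss")
    return server, port, db_name


# No overrides set: the defaults apply.
print(mongo_settings({}))
# Overriding only some variables leaves the others at their defaults.
print(mongo_settings({"MONGO_SERVER": "db.example.com", "MONGO_PORT": "27018"}))
```

In a shell, the same overrides would be set by exporting the variables before starting Python, for example `export MONGO_DB_NAME=my_tss`.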

In MongoDB, the data is stored as:

> db.series.findOne()
{ 
	"_id" : ObjectId("58c45d4af4a6b0054cecdad6"), 
	"frequency" : "1d", 
	"name" : "test1", 
	"columns" : [ "col1", "col2", "col3" ], 
	"slices" : [ 
		{ 
			"start" : ISODate("2017-03-08T00:00:00Z"), 
			"num_of_samples" : 1, 
			"id" : ObjectId("58c45d4af4a6b0054cecdad7"), 
			"is_sparse" : true 
		}, 
		{ 
			"start" : ISODate("2017-03-09T00:00:00Z"), 
			"num_of_samples" : 1, 
			"id" : ObjectId("58c45d4af4a6b0054cecdad8"), 
			"is_sparse" : true 
		}, 
		{ 
			"start" : ISODate("2017-03-10T00:00:00Z"), 
			"num_of_samples" : 1, 
			"id" : ObjectId("58c45d4af4a6b0054cecdad9"), 
			"is_sparse" : true 
		} 
	] 
}
> db.data.find({})
{ "_id" : ObjectId("58c45d4af4a6b0054cecdad7"), "data" : [ { "timestamp" : ISODate("2017-03-08T00:00:00Z"), "data" : [ 1, 2, 3 ] } ] }
{ "_id" : ObjectId("58c45d4af4a6b0054cecdad8"), "data" : [ { "timestamp" : ISODate("2017-03-09T00:00:00Z"), "data" : [ 4, 5, 6 ] } ] }
{ "_id" : ObjectId("58c45d4af4a6b0054cecdad9"), "data" : [ { "timestamp" : ISODate("2017-03-10T00:00:00Z"), "data" : [ 7, 8, 9 ] } ] }
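
Because the documents are stored in native format, any client can reassemble the rows without tss. The sketch below shows one way to do that over plain dictionaries shaped like the output above. `rows_from_documents` is a hypothetical helper, and short strings stand in for the ObjectId values.

```python
from datetime import datetime


def rows_from_documents(series, data_docs):
    """Return (columns, rows); rows are (timestamp, values) in time order."""
    by_id = {doc["_id"]: doc for doc in data_docs}
    rows = []
    # Walk the slices in time order and follow each slice id to its data.
    for slice_meta in sorted(series["slices"], key=lambda s: s["start"]):
        if not slice_meta["is_sparse"]:
            raise NotImplementedError("only sparse slices are handled here")
        for point in by_id[slice_meta["id"]]["data"]:
            rows.append((point["timestamp"], point["data"]))
    return series["columns"], rows


# Sample documents mirroring the shell output; ids are placeholders.
series = {
    "_id": "s0", "frequency": "1d", "name": "test1",
    "columns": ["col1", "col2", "col3"],
    "slices": [
        {"start": datetime(2017, 3, 8), "num_of_samples": 1, "id": "d1", "is_sparse": True},
        {"start": datetime(2017, 3, 9), "num_of_samples": 1, "id": "d2", "is_sparse": True},
        {"start": datetime(2017, 3, 10), "num_of_samples": 1, "id": "d3", "is_sparse": True},
    ],
}
data_docs = [
    {"_id": "d1", "data": [{"timestamp": datetime(2017, 3, 8), "data": [1, 2, 3]}]},
    {"_id": "d2", "data": [{"timestamp": datetime(2017, 3, 9), "data": [4, 5, 6]}]},
    {"_id": "d3", "data": [{"timestamp": datetime(2017, 3, 10), "data": [7, 8, 9]}]},
]

columns, rows = rows_from_documents(series, data_docs)
for timestamp, values in rows:
    print(timestamp.date(), dict(zip(columns, values)))
```

With pymongo, the two inputs would presumably come from `db.series.find_one({"name": "test1"})` and a `db.data.find(...)` query over the slice ids. The resulting rows can then be loaded into a pandas DataFrame or any other structure.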
