
Clustering time series data with SQL


Description

This repository demonstrates clustering time series data with SQL. The purpose is to show that doing data science doesn't always require fancy tools.

The examples in this repository might be helpful if you must use SQL instead of dedicated data science tools such as Python. The repository is not, however, a comprehensive guide to clustering time series data.

Although clustering is often associated with machine learning, this showcase relies only on logical decision rules.

I have focused on IoT-related data in the field of predictive maintenance.
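As a minimal sketch of what clustering by logical decision rules can look like in SQL, the example below summarizes each device's time series with aggregate functions and then assigns a cluster with a `CASE` expression. Everything here is hypothetical and not taken from the notebook: the `readings` table, its columns, the device names, the vibration values and the thresholds are illustrative assumptions.

```python
import sqlite3

# Hypothetical sample data: (device_id, timestamp, vibration) readings
READINGS = [
    ("pump_a", "2020-01-01T00:00", 0.9),
    ("pump_a", "2020-01-01T01:00", 1.1),
    ("pump_b", "2020-01-01T00:00", 2.8),
    ("pump_b", "2020-01-01T01:00", 3.4),
    ("pump_c", "2020-01-01T00:00", 1.0),
    ("pump_c", "2020-01-01T01:00", 4.0),
]

# Hypothetical thresholds; real values would come from domain knowledge
CLUSTER_SQL = """
WITH stats AS (
    -- Reduce each device's time series to a few summary features
    SELECT device_id,
           AVG(vibration) AS mean_vib,
           MAX(vibration) - MIN(vibration) AS range_vib
    FROM readings
    GROUP BY device_id
)
SELECT device_id,
       -- Rule-based cluster assignment, no machine learning involved
       CASE
           WHEN mean_vib < 1.5 AND range_vib < 1.0 THEN 'normal'
           WHEN range_vib >= 1.0 THEN 'unstable'
           ELSE 'high_load'
       END AS cluster
FROM stats
ORDER BY device_id
"""


def cluster_devices(rows):
    """Load readings into a temporary database and return (device_id, cluster) pairs."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE readings (device_id TEXT, ts TEXT, vibration REAL)")
        con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
        return con.execute(CLUSTER_SQL).fetchall()
    finally:
        con.close()


print(cluster_devices(READINGS))
```

With this sample data, `pump_a` (low, steady vibration) lands in `normal`, `pump_b` (high but steady) in `high_load`, and `pump_c` (large swing) in `unstable`. The order of the `WHEN` branches matters, since `CASE` returns the first matching rule.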

Files

sql-clustering.ipynb The analysis notebook. You can find the notebook fully rendered here.

module.py Contains the functions to generate and plot the sample data.

config.py Settings and variables for the calculations.

Creating a temporary in-memory database

The experiments required a temporary database. The initial options were either a local SQLite database or a managed database service in the cloud.

I chose Python's sqlite3 library and created an SQLite database on my local disk. After reading the documentation further, I found that SQLite can also create the database in memory.

That approach was perfect for a temporary database:

```python
import sqlite3
# ":memory:" creates the database in RAM; it is discarded when the connection closes
con = sqlite3.connect(":memory:")
```
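One detail worth knowing with this approach: every call to `sqlite3.connect(":memory:")` opens its own private, empty database, and its contents are lost when that connection is closed. The sketch below demonstrates this; the `readings` table and its values are hypothetical and only serve the illustration. In practice this suggests keeping a single connection open for the whole notebook session.

```python
import sqlite3


def demo_in_memory_isolation():
    """Show that two ':memory:' connections do not share data."""
    con_a = sqlite3.connect(":memory:")
    con_b = sqlite3.connect(":memory:")

    # Create and fill a hypothetical table through the first connection only
    con_a.execute("CREATE TABLE readings (ts TEXT, value REAL)")
    con_a.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [("2020-01-01T00:00", 1.0), ("2020-01-01T01:00", 2.0)],
    )

    # List the tables visible to each connection
    list_tables = "SELECT name FROM sqlite_master WHERE type = 'table'"
    tables_a = con_a.execute(list_tables).fetchall()
    tables_b = con_b.execute(list_tables).fetchall()
    row_count = con_a.execute("SELECT COUNT(*) FROM readings").fetchone()[0]

    # Closing a connection discards its in-memory database
    con_a.close()
    con_b.close()
    return tables_a, tables_b, row_count


print(demo_in_memory_isolation())
```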

Choice of visualization library

matplotlib, the standard Python data visualization library, is well suited to static plots.

For this kind of tutorial, interactive plots are more instructive in my opinion: the reader can hover the mouse over a chart to see additional information about the data points.

After some investigation, I chose plotly for its interactive capabilities. It is easy to use and has great documentation.

An interesting finding was dash, a dashboarding framework for advanced graphing built on top of plotly. However, dash was not used in this experiment.

Rendering notebooks online

GitHub can render Jupyter notebooks in the web browser. Unfortunately, some libraries in my project broke the rendering.

Next, I tried Jupyter's nbviewer, which didn't display the plotly charts properly. Finally, saving the notebook as an HTML file and viewing it through nbviewer solved the issue.
