Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A proposed SQLite data access layer (DAL) #2

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
6 changes: 3 additions & 3 deletions espymetrics/app.py
Expand Up @@ -9,7 +9,7 @@
from clastic.errors import BadRequest
from clastic.middleware import GetParamMiddleware

from dal import LineDAL
from dal import LineDAL, SQLiteDAL
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good change.

NB: This is an affirmative comment. All other comments inline are critiques, unless otherwise specified. Only the sqlite_dal.py file is of material interest. Because there are so many comments, it may be useful to uncheck the "Show notes" option above to do an initial read through. Alternatively, use the "View" button in the top right of the sqlite_dal.py file below.



PORT = 8888
Expand All @@ -24,7 +24,7 @@ def main():
opts, _ = prs.parse_known_args()
debug = opts.debug

v1_app = create_v1_app()
v1_app = create_v1_app('sql')
app = Application([('/v1', v1_app)])
meta_app = MetaApplication()
if debug:
Expand All @@ -39,7 +39,7 @@ def create_v1_app(dal_name=DEFAULT_DAL, file_path=None):
if dal_name == 'line':
dal_type = LineDAL
elif dal_name == 'sql':
pass
dal_type = SQLiteDAL
else:
raise ValueError('unrecognized DAL name: %r' % dal_name)

Expand Down
10 changes: 7 additions & 3 deletions espymetrics/client/collect.py
Expand Up @@ -29,16 +29,20 @@
HAVE_UCS4 = getattr(sys, 'maxunicode', 0) > 65536


TIME_INFO = {'utc': str(datetime.datetime.utcnow()),
TIME_INFO = {'utc_epoch': str(datetime.datetime.utcnow()),
'std_utc_offset': -time.timezone / 3600.0}


def get_python_info():
ret = {}
ret['argv'] = sys.argv
ret['argv'] = ' \t '.join(sys.argv)
ret['bin'] = sys.executable
ret['is_64bit'] = IS_64BIT
ret['version'] = sys.version
try:
ret['version'] = sys.version.split()[0]
except:
ret['version'] = ''
ret['version_full'] = sys.version
ret['compiler'] = platform.python_compiler()
ret['build_date'] = platform.python_build()[1]
ret['version_info'] = list(sys.version_info)
Expand Down
1 change: 1 addition & 0 deletions espymetrics/dal/__init__.py
@@ -1,2 +1,3 @@

from line_dal import LineDAL
from sqlite_dal import SQLiteDAL
101 changes: 95 additions & 6 deletions espymetrics/dal/sqlite_dal.py
Expand Up @@ -13,25 +13,114 @@

* Not human readable - Binary format makes it harder to read/grep
* More complex - More logic = more work = more time/resources?

"""

import os
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be an unused import, and while built-in, can set the wrong expectation for the following code.

import sqlite3

import boltons
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another unused import, but this one is more concerning. Because it's a third-party package, this can really harm the reader's impression of the following code's reusability.

You should add PyLint to your editor. PyLint will tell you about unused imports.


_MISSING = object()
SEP = '$' # '$' is valid in sqlite column names, no escaping required

TABLE_NAME = 'on_import_data'
CREATE_TABLE = ('CREATE TABLE IF NOT EXISTS ' + TABLE_NAME +
'(id integer primary key asc, ' +
'hostfqdn, hostname, python$argv, python$bin, ' +
'python$build_date, python$compiler, python$have_readline, ' +
'python$have_ucs4, python$is_64bit, python$version, python$version_full, ' +
'time$utc_epoch real, time$std_utc_offset real, ' +
'uname$0, uname$1, uname$2, uname$3, uname$4, uname$5, ' +
'username, uuid)')

total_count = 0

sqlite3.enable_callback_tracebacks(True)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of code at the module-level results in the infamous "import-time side effect". While the intentions may be good, this unexpected behavior can have serious implications.

It's best to leave functions like this to the modules closer to the main code entry point. In this case, because there may be other future sqlite usages in the project, I would centralize this sort of behavior in app.py.



class SQLiteDAL(object):
_extension = '.db'

def __init__(self, file_path):
self.file_path = file_path
self.__file_path = file_path
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use name mangling unless absolutely necessary (I have only used it once in 7 years). Name mangling is not private variables in Python, and is really just a workaround for a very specific inheritance corner case.

If you want to communicate to other developers that a variable is for internal use, a single underscore is preferred, e.g., self._file_path. (I wouldn't even say that is a high priority in this case. Mustn't spend all day navelgazing though.)


conn = sqlite3.connect(file_path)
conn.execute(CREATE_TABLE)

def add_record(self, in_dict):
pass
query = ('INSERT INTO ' + TABLE_NAME +
' (hostfqdn, hostname, python$argv, python$bin, ' +
'python$build_date, python$compiler, python$have_readline, ' +
'python$have_ucs4, python$is_64bit, python$version, python$version_full, ' +
'time$utc_epoch, time$std_utc_offset, ' +
'uname$0, uname$1, uname$2, uname$3, uname$4, uname$5, ' +
'username, uuid) VALUES (')
query += '"' + in_dict['hostfqdn'] + '", '
query += '"' + in_dict['hostname'] + '", '
query += '"' + in_dict['python']['argv'] + '", '
query += '"' + in_dict['python']['bin'] + '", '
query += '"' + in_dict['python']['build_date'] + '", '
query += '"' + in_dict['python']['compiler'] + '", '
query += '"' + str(in_dict['python']['have_readline'])[0] + '", '
query += '"' + str(in_dict['python']['have_ucs4'])[0] + '", '
query += '"' + str(in_dict['python']['is_64bit'])[0] + '", '
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very unidiomatic way to convert a bool into a string. Just use an inline 'T' if var else 'F' or a semantically meaningful converter function.

query += '"' + in_dict['python']['version'] + '", '
query += '"' + in_dict['python']['version_full'] + '", '
query += '"' + str(in_dict['time']['utc_epoch']) + '", '
query += '"' + str(in_dict['time']['std_utc_offset']) + '", '
query += '"' + in_dict['uname'][0] + '", '
query += '"' + in_dict['uname'][1] + '", '
query += '"' + in_dict['uname'][2] + '", '
query += '"' + in_dict['uname'][3] + '", '
query += '"' + in_dict['uname'][4] + '", '
query += '"' + in_dict['uname'][5] + '", '
query += '"' + in_dict['username'] + '", '
query += '"' + in_dict['uuid']
query += '")'
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block of string construction of SQL has many problems:

  • It is incorrect. If one of the values includes a quote, the escaping will break, causing a SQL error at best.
  • It is insecure. String construction of SQL queries also opens you up to SQL injection.
  • It is brittle. One missing key and the whole function goes up in KeyError smoke.
  • It is hard to expand. If the schema or message is updated, keeping the orders consistent throughout will be a major issue.

At the very least use the parameters keyword argument of the execute function. Never allow inputs to modify the SQL query strings.


try:
conn = sqlite3.connect(self.__file_path)
conn.isolation_level = None # autocommit
conn.execute(query)
conn.close()
except:
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This catch-all except clause is a major antipattern. Ignoring all exceptions is unpythonic. It's ok to pause a second and be specific about which exceptions you want to handle.

This block of code is not best-effort. This method should allow any exceptions raised here to propagate. The calling function can catch exceptions if it wants.

Furthermore, even though it's only four lines, so many different things happen in this try block, that I would also call this try block too large.

pass

global total_count
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parallel to the comment above, the global keyword is a big yellow flag. Uses are rare enough in production code that a code comment is warranted. In this case, total_count should be an attribute on self.

total_count += 1
return
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At over forty lines, this function is too long, especially for the level of other problems it has. Factor out a less brittle approach. Expect more power out of a line of Python.


def raw_query(self, query):
pass
try:
conn = sqlite3.connect(self.__file_path)
rows = conn.execute(query).fetchall()
return rows
except:
return -1
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from problems described in other comments, this simple function commits a crime against usability.

In the success case, this function returns a list. In the failure case, it returns an integer.

  • Use exceptions for errors.
  • Functions should have one return type (or one type + None). An exception to this rule is when the function allows the caller to pass in its own default or processing function.

In the except case, this function should allow the exception to propagate or return an empty list.


def select_records(self, limit=None, group_by=None):
pass
ret = {}
if not group_by:
return ret
group_by = group_by.replace('.', '$')
query = 'SELECT ROWID, ' + group_by + ', COUNT(*) FROM ' + TABLE_NAME
query += ' GROUP BY ' + str(group_by)
if limit:
query += ' ORDER BY ROWID DESC LIMIT ' + str(limit)

try:
conn = sqlite3.connect(self.__file_path)
rows = conn.execute(query).fetchall()
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connecting and executing are common among all the methods of this class. Factor out logic like this to a central method.

except:
return
ret['counts'] = {}
for _, group_key, group_count in rows:
ret['counts'][group_key] = group_count

ret['grouped_by'] = group_by
ret['grouped_key_count'] = len(rows)
ret['record_count'] = sum(ret['counts'].values())
return ret


if __name__ == '__main__':
SQLiteDAL('test.db')