Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Session - Hybrid DB #3252

Merged
merged 7 commits into from Jul 24, 2018
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
59 changes: 59 additions & 0 deletions mitmproxy/addons/session.py
@@ -0,0 +1,59 @@
import os
import sqlite3

from mitmproxy.exceptions import SessionLoadException
from mitmproxy.utils.data import pkg_data


# Could be implemented using async libraries
class SessionDB:
"""
This class wraps connection to DB
for Sessions and handles creation,
retrieving and insertion in tables.
"""

def __init__(self, db_path=None):
"""
Connect to an already existing database,
or create a new one with optional path.
:param db_path:
"""
if db_path is not None and os.path.isfile(db_path):
self._load_session(db_path)
else:
path = db_path or 'tmp.sqlite'
# in case tmp.sqlite already exists in FS
if os.path.isfile(path):
os.remove(path)
self.con = sqlite3.connect(path)
script_path = pkg_data.path("io/sql/session_create.sql")
qry = open(script_path, 'r').read()
with self.con:
self.con.executescript(qry)

def _load_session(self, path):
if not self.is_session_db(path):
raise SessionLoadException('Given path does not point to a valid Session')
self.con = sqlite3.connect(path)

@staticmethod
def is_session_db(path):
"""
Check if database entered from user
is a valid Session SQLite DB.
:return: True if valid, False if invalid.
"""
try:
c = sqlite3.connect(f'file:{path}?mode=rw', uri=True)
cursor = c.cursor()
cursor.execute("SELECT NAME FROM sqlite_master WHERE type='table';")
rows = cursor.fetchall()
tables = [('FLOW',), ('BODY',), ('META',), ('ANNOTATION',)]
if all(elem in rows for elem in tables):
c.close()
return True
except:
if c:
c.close()
return False
4 changes: 4 additions & 0 deletions mitmproxy/exceptions.py
Expand Up @@ -129,6 +129,10 @@ def __init__(self, message=None):
super().__init__(message)


class SessionLoadException(MitmproxyException):
pass


class Disconnect:
"""Immediate EOF"""

Expand Down
26 changes: 26 additions & 0 deletions mitmproxy/io/sql/session_create.sql
@@ -0,0 +1,26 @@
CREATE TABLE FLOW (
FID INTEGER PRIMARY KEY,
MID INTEGER,
BID INTEGER,
CONTENT BLOB
);

CREATE TABLE META (
MID INTEGER PRIMARY KEY,
INTERCEPTED INTEGER,
MARKED INTEGER,
MODE VARCHAR(20)
);

CREATE TABLE BODY (
BID INTEGER,
BREQ BLOB,
BRES BLOB
);

CREATE TABLE ANNOTATION (
AID INTEGER PRIMARY KEY,
FID INTEGER,
TYPE VARCHAR(20),
CONTENT BLOB
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some comments on the schema design here, some trivial/aesthetic, some substantive. I'm going to put them all down here for discussion, and let's chat about this on Slack when we're both on, or here if you prefer.

  • Could we please more explanatory names than AID, FID, etc. There are two schools of thought here, and I'm OK with both. The first is to just use "id" for the primary key in all tables. In practice this is fine, because they will be referred to along with the table name in almost all circumstances. The second is to prefix the ID with the table name, so flow_id or flowid. Pick whichever is more aesthetic to you. :)
  • Let's not un-necessarily capitalise the table and column names. All of those are better off with lower-case names.
  • You have a number of foreign keys in there without explicit constraints (e.g. FID in ANNOTATION). Please annotate those unless there's a good reason not to.
  • I am not sure that the META table is needed at all. Why not make each of the different META classes an annotation in the annotations table? Things like the order accounting will be maintained in memory anyway (as discussed) so I don't feel that performance is a bottleneck here. I also think that this is exactly what the annotations table is intended for.
  • I would do the association between the BODY and FLOW tables slightly differently. In most cases we're only going to be interested in either the request or the response body, not both. I think this leads us to a simpler schema, the BODY table has a flowid, a typeid, and a blob. This lets us both forward and reverse associate between the body and flows tables, and opens things to easily be extended when we need large blob stores for other things (e.g. websocket messages, etc.).
  • At the moment, our flow ID is a UUID, which means that there's a lot of stuff we don't have to worry about in terms of sequencing and uniqueness guarantees. I think maintaining that is fine, and will have to be reflected in the database. There are many approaches we can take here, but the simplest is to make the flow primary key a varchar(36). An alternative is to still key on an integer (this will be faster) and have a separate index over the UUID varchar in the flow table.

52 changes: 52 additions & 0 deletions test/mitmproxy/addons/test_session.py
@@ -0,0 +1,52 @@
import sqlite3
import os
import pytest

from mitmproxy.exceptions import SessionLoadException
from mitmproxy.addons import session
from mitmproxy.utils.data import pkg_data


class TestSession:
def test_session_temporary(self, tdata):
open('tmp.sqlite', 'w')
s = session.SessionDB()
assert session.SessionDB.is_session_db('tmp.sqlite')
s.con.close()
os.remove('tmp.sqlite')

def test_session_not_valid(self, tdata):
path = tdata.path('mitmproxy/data/') + '/test.sqlite'
if os.path.isfile(path):
os.remove(path)
with open(path, 'w') as handle:
handle.write("Not valid data")
with pytest.raises(SessionLoadException):
session.SessionDB(path)

def test_session_new_persistent(self, tdata):
path = tdata.path('mitmproxy/data/') + '/test.sqlite'
if os.path.isfile(path):
os.remove(path)
session.SessionDB(path)
assert session.SessionDB.is_session_db(path)

def test_session_load_existing(self, tdata):
path = tdata.path('mitmproxy/data/') + '/test.sqlite'
if os.path.isfile(path):
os.remove(path)
con = sqlite3.connect(path)
script_path = pkg_data.path("io/sql/session_create.sql")
qry = open(script_path, 'r').read()
with con:
con.executescript(qry)
blob = b'blob_of_data'
con.execute(f'INSERT INTO FLOW VALUES(1, 1, 1, "{blob}");')
con.close()
session.SessionDB(path)
con = sqlite3.connect(path)
with con:
cur = con.cursor()
cur.execute('SELECT * FROM FLOW;')
rows = cur.fetchall()
assert len(rows) == 1