New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Session - Hybrid DB #3252

Merged
merged 7 commits into from Jul 24, 2018

Conversation

Projects
None yet
2 participants
@madt1m
Copy link
Member

madt1m commented Jul 20, 2018

First section of Session addon.

Right now, the code implements the logic to create/load a new session DB from file. Store/Retrieve methods will be implemented later, together with View code, depending on the needs.
Schema which will be used by View has been defined:

  • FLOW table: Flows are dumped here. Will be used for full storage/retrievals.
  • META table: contains informations that can be quickly retrieved for View modifications -- intercepted, marked, mode, all related to Flow ID.
  • BODY table: Every time a flow is stored to DB, its request/response body will be extracted and saved here. This way, we will parse bodies only when needed, and FLOW table can be quickly parsed.
  • ANNOTATION table: This can be used to store View Settings, so to save them in Session, and also to store custom user content related to Flows. A "type" column is there to quicken selections.

Clearly, this schema will probably be subject to updates, while Session interface shapes more clearly.

FID INTEGER,
TYPE VARCHAR(20),
CONTENT BLOB
);

This comment has been minimized.

@cortesi

cortesi Jul 23, 2018

Member

I have some comments on the schema design here, some trivial/aesthetic, some substantive. I'm going to put them all down here for discussion, and let's chat about this on Slack when we're both on, or here if you prefer.

  • Could we please more explanatory names than AID, FID, etc. There are two schools of thought here, and I'm OK with both. The first is to just use "id" for the primary key in all tables. In practice this is fine, because they will be referred to along with the table name in almost all circumstances. The second is to prefix the ID with the table name, so flow_id or flowid. Pick whichever is more aesthetic to you. :)
  • Let's not un-necessarily capitalise the table and column names. All of those are better off with lower-case names.
  • You have a number of foreign keys in there without explicit constraints (e.g. FID in ANNOTATION). Please annotate those unless there's a good reason not to.
  • I am not sure that the META table is needed at all. Why not make each of the different META classes an annotation in the annotations table? Things like the order accounting will be maintained in memory anyway (as discussed) so I don't feel that performance is a bottleneck here. I also think that this is exactly what the annotations table is intended for.
  • I would do the association between the BODY and FLOW tables slightly differently. In most cases we're only going to be interested in either the request or the response body, not both. I think this leads us to a simpler schema, the BODY table has a flowid, a typeid, and a blob. This lets us both forward and reverse associate between the body and flows tables, and opens things to easily be extended when we need large blob stores for other things (e.g. websocket messages, etc.).
  • At the moment, our flow ID is a UUID, which means that there's a lot of stuff we don't have to worry about in terms of sequencing and uniqueness guarantees. I think maintaining that is fine, and will have to be reflected in the database. There are many approaches we can take here, but the simplest is to make the flow primary key a varchar(36). An alternative is to still key on an integer (this will be faster) and have a separate index over the UUID varchar in the flow table.
@madt1m

This comment has been minimized.

Copy link
Member Author

madt1m commented Jul 24, 2018

PR has been updated with changes of schema and temp DB logic!

@madt1m madt1m force-pushed the madt1m:session-db branch 2 times, most recently from a122c4a to 8a19ec5 Jul 24, 2018

@madt1m madt1m force-pushed the madt1m:session-db branch from 8a19ec5 to 68eb07b Jul 24, 2018

path = db_path
else:
# We use tempfile only to generate a path, since we demand file creation to sqlite, and removal to os.
self.temp = tempfile.NamedTemporaryFile()

This comment has been minimized.

@cortesi

cortesi Jul 24, 2018

Member

This creates a file on the file system and returns you an opened file. It's probably not what we want when we're creating a database de nuovo. I suggest that you create a temporary directory instead, and then join the tmpdir path with some filename to get the database file path.

@cortesi

This comment has been minimized.

Copy link
Member

cortesi commented Jul 24, 2018

I'm off to bed, but once the new file creation issue is handled, feel free to merge.

@madt1m madt1m merged commit 9c949bd into mitmproxy:master Jul 24, 2018

4 checks passed

codecov/patch 100% of diff hit (target 88.8%)
Details
codecov/project 88.83% (+0.03%) compared to 16e0799
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment