[compilationLog][LogParser]: A tool to filter out all transformations related to a given node. #3208

@ZchiPitt
Contributor

commented Jul 4, 2019

Summary:
Implement a tool that

  • filters the compilation log for information about a specific node, and
  • returns all transformations related to the given node.

Test Plan:
Tested with a real network, googlenet_v1_slim, filtering for the node 'InceptionV1_InceptionV1_Mixed_5b_Branch_0_Conv2d_0a_1x1_BatchNorm_batchnorm_mul21'. The result is shown in a dot graph containing all the related transformations.

python3 ../glow/utils/log_parser.py -f googlenet_v1_slim/googlenet_v1_slim.onnx_compile.log

python3 ../glow/utils/compilation_filter.py --db-file compilation_log_db.sqlite --filter-target InceptionV1_InceptionV1_Mixed_5b_Branch_0_Conv2d_0a_1x1_BatchNorm_batchnorm_mul21

dot -Tpdf transformations.dot > trans.pdf

[Screenshot, 2019-07-04: the rendered transformation graph]

The yellow rectangle is the direct transformation that created/replaced the given node; the blue rectangles are all related transformations.

@opti-mix

Contributor

commented Jul 5, 2019

@ZchiPitt Could you elaborate a bit on the use of SQLite in your PR?

  • What is the reason?
  • What is the general flow?
  • Do you create a new DB file each time you parse/query the log as opposed to creating it once and then running multiple queries against it? What are the pros and cons?
  • Do you insert all the information from the log into this DB or just some parts of it?
  • How do you handle queries for finding all ancestors of a given node? Do you issue multiple queries? Or maybe use some clever tricks, e.g. the transitive_closure SQLite extension or WITH RECURSIVE?
  • Do you remove the DB file at the end of processing?
@ZchiPitt

Contributor Author

commented Jul 5, 2019

@opti-mix Thanks for asking the questions.

What is the reason?

The reason for using SQLite is to take advantage of a well-optimized database, so that queries remain efficient even as the log gets bigger in the future.

What is the general flow?

Run log_parser.py to process the raw compilation log, create a database file, and store processed data items into it (node transformations right now, but there can be more in the future). In detail, I create a table Log_Transformation (trans_id INTEGER, operation_type VARCHAR, node_name VARCHAR, node_kind VARCHAR, scope_name VARCHAR). The script stores all node transformations in this table.

For example, for a transformation with id 100 that is 'A lowered into B, C', the inserted rows would be (100, REMOVE, A, A_kind, lower), (100, ADD, B, B_kind, lower), and (100, ADD, C, C_kind, lower).
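
A minimal sketch of this step using Python's sqlite3 module; the schema matches the table described above, but the connection handling and insertion details are illustrative, not the PR's exact code:

    import sqlite3

    # Create the database file and the Log_Transformation table.
    conn = sqlite3.connect("compilation_log_db.sqlite")
    cur = conn.cursor()
    cur.execute("""CREATE TABLE IF NOT EXISTS Log_Transformation (
        trans_id INTEGER,
        operation_type VARCHAR,
        node_name VARCHAR,
        node_kind VARCHAR,
        scope_name VARCHAR)""")

    # The 'A lowered into B, C' example: three rows sharing trans_id 100.
    rows = [
        (100, "REMOVE", "A", "A_kind", "lower"),
        (100, "ADD", "B", "B_kind", "lower"),
        (100, "ADD", "C", "C_kind", "lower"),
    ]
    cur.executemany("INSERT INTO Log_Transformation VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()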

Run compilation_filter.py to query all transformations related to a given node name. Given a node, it queries the database to find all transformations that are directly or indirectly related to it.

Do you create a new DB file each time you parse/query the log as opposed to creating it once and then running multiple queries against it? What are the pros and cons?

I create a new DB file once, when parsing a new compilation log file with log_parser.py. I then execute queries on the created database file with compilation_filter.py. The pro is that I don't need to parse the raw log file every time I execute queries. The con is that whenever the log file is updated, we need to recreate the database file.

Do you insert all the information from the log into this DB or just some parts of it?

Right now, I only store node transformation info in the database (an example is given above), but we can add more tables storing other valuable information in the future.

How do you handle queries for finding all ancestors of a given node? Do you issue multiple queries? Or maybe use some clever tricks, e.g. the transitive_closure SQLite extension or WITH RECURSIVE?

I issue multiple queries. The steps are as follows (see the sketch after the list):

  1. Find all trans_ids that have the provided node_name and store them in a list (trans_ids_list).
    SELECT trans_id FROM Log_Transformation WHERE node_name = '{nodeName}' GROUP BY trans_id

  2. Find all node_names that appear in trans_ids_list.
    SELECT node_name FROM Log_Transformation WHERE trans_id in {trans_ids_list} GROUP BY node_name

  3. Find all trans_ids that have a node_name in the node_name_list from step 2.
    SELECT trans_id FROM Log_Transformation WHERE node_name in {node_name_list} GROUP BY trans_id

  4. Repeat steps 2 and 3 until trans_ids_list no longer changes.

  5. Return trans_ids_list.
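
Putting the steps together, a minimal sketch of this fixed-point loop (the function and variable names are illustrative, not the PR's actual code):

    import sqlite3

    def related_trans_ids(db_file, node_name):
        """Iterate steps 2 and 3 until the set of trans_ids stops growing."""
        conn = sqlite3.connect(db_file)
        cur = conn.cursor()

        # Step 1: all trans_ids that mention the given node.
        cur.execute("SELECT trans_id FROM Log_Transformation "
                    "WHERE node_name = ? GROUP BY trans_id", (node_name,))
        trans_ids = {row[0] for row in cur.fetchall()}

        while trans_ids:
            # Step 2: all node_names appearing in those transformations.
            marks = ",".join("?" * len(trans_ids))
            cur.execute("SELECT node_name FROM Log_Transformation "
                        "WHERE trans_id IN (%s) GROUP BY node_name" % marks,
                        tuple(trans_ids))
            node_names = {row[0] for row in cur.fetchall()}

            # Step 3: all trans_ids touching any of those node_names.
            marks = ",".join("?" * len(node_names))
            cur.execute("SELECT trans_id FROM Log_Transformation "
                        "WHERE node_name IN (%s) GROUP BY trans_id" % marks,
                        tuple(node_names))
            new_ids = {row[0] for row in cur.fetchall()}

            # Step 4: stop once the set no longer changes.
            if new_ids == trans_ids:
                break
            trans_ids = new_ids

        conn.close()
        return trans_ids  # Step 5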

Well, I did try a CTE, i.e. WITH RECURSIVE, but I found it really hard to express the same logic. Also, I'm not sure whether such a recursive query is actually translated into multiple SELECT clauses inside the DBMS, which might end up with the same efficiency as executing multiple queries.
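
For comparison, one possible (untested) WITH RECURSIVE formulation of the same closure might look like the sketch below; this is only an illustration of the idea, not code from the PR:

    # Hypothetical recursive-CTE formulation: grow the set of related
    # node names, then collect every trans_id touching any of them.
    QUERY = """
    WITH RECURSIVE related(node_name) AS (
        SELECT :target
        UNION
        SELECT t2.node_name
        FROM Log_Transformation AS t1
        JOIN Log_Transformation AS t2 ON t1.trans_id = t2.trans_id
        JOIN related ON t1.node_name = related.node_name
    )
    SELECT DISTINCT trans_id FROM Log_Transformation
    WHERE node_name IN (SELECT node_name FROM related);
    """
    # Usage: cur.execute(QUERY, {"target": node_name})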

Do you remove the DB file at the end of processing?

No, it stays there. A DB file only gets removed when log_parser.py tries to create a new DB with the same name.


@ZchiPitt ZchiPitt force-pushed the ZchiPitt:logParser branch from 54cfd1b to 5e58020 Jul 8, 2019

@nickgg

Contributor

commented Jul 8, 2019

@ZchiPitt Why is the output a graph? Can the result graph ever branch?


@ZchiPitt ZchiPitt force-pushed the ZchiPitt:logParser branch 3 times, most recently from 985da4e to e74adcb Jul 8, 2019

@ZchiPitt

Contributor Author

commented Jul 8, 2019

A graph seems to me a more descriptive representation than plain text for showing the ancestors and descendants of transformations. Theoretically, there can be branches, since the inputs and outputs of a transformation are lists of nodes, which may come from different transformations.
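
To illustrate, a minimal sketch of emitting such a branching graph in Graphviz dot format, matching the coloring described in the summary (all names here are hypothetical, not the PR's actual code):

    def write_dot(labels, edges, direct_tid, path="transformations.dot"):
        """labels: {trans_id: description}; edges: (producer, consumer)
        trans_id pairs; direct_tid: the transformation that directly
        created/replaced the queried node (drawn in yellow)."""
        with open(path, "w") as f:
            f.write("digraph transformations {\n")
            f.write("  node [shape=box, style=filled];\n")
            for tid, desc in labels.items():
                color = "yellow" if tid == direct_tid else "lightblue"
                f.write('  t%d [label="%s", fillcolor="%s"];\n' % (tid, desc, color))
            for src, dst in edges:
                # A branch appears whenever one transformation feeds several others.
                f.write("  t%d -> t%d;\n" % (src, dst))
            f.write("}\n")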

@ZchiPitt ZchiPitt force-pushed the ZchiPitt:logParser branch from e74adcb to 4956971 Jul 9, 2019

@nickgg approved these changes Jul 9, 2019

Contributor

left a comment

nice job

@facebook-github-bot

left a comment

@ZchiPitt has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ZchiPitt ZchiPitt force-pushed the ZchiPitt:logParser branch from 4956971 to 6a17998 Jul 9, 2019

@ZchiPitt ZchiPitt force-pushed the ZchiPitt:logParser branch from 6a17998 to 182c133 Jul 10, 2019

@facebook-github-bot


commented Jul 10, 2019

@ZchiPitt merged this pull request in 67b2454.

@ZchiPitt ZchiPitt deleted the ZchiPitt:logParser branch Jul 13, 2019
