Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse NF instance IDs #92

Merged
merged 2 commits into from
Apr 27, 2019
Merged

Reuse NF instance IDs #92

merged 2 commits into from
Apr 27, 2019

Conversation

koolzz
Copy link
Member

@koolzz koolzz commented Apr 2, 2019

Instead of stopping when we reach MAX_NFS, wrap back to the initial instance ID starting value.

Summary:

Reuse instance IDs of old NFs that have terminated.
I've initially implemented an inline function for the while loop so we don't use the code twice, but I revised this as I think the 2 loops with comments just look cleaner.

Usage:
I tested with decreasing MAX_NFS number to 4, seems to work didn't test all the small details yet.

This PR includes
Resolves issues
Breaking API changes
Internal API changes
Usability improvements 馃帀
Bug fixes
New functionality 馃憤
New NF/onvm_mgr args
Changes to starting NFs
Dependency updates
Web stats updates

Merging notes:

  • Dependencies: None

TODO before merging :

  • PR is ready for review

Test Plan:

Try to break this.

Review:

TBA

@onvm
Copy link

onvm commented Apr 2, 2019

In response to PR creation

CI Message

Your results will arrive shortly

@onvm
Copy link

onvm commented Apr 2, 2019

In response to PR creation

CI Message

Run successful see results:
[Results from nimbnode30]
Median TX pps for Speed Tester: 35202395

Linter Passed

Copy link
Member

@dennisafa dennisafa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested by setting MAX_NFs to 4 and running multiple nf's + terminating. NF instance ID's wrap around as expected.
-- Edit: Bug occurs when attempting to terminate first NF initialized

  • Sanity checks
    NF and onvm compiles and runs properly.

  • Code design
    Code is clean and concise. Functionality of assigning previous instance ID's is easily understandable from the code.

  • Documentation
    Documentation within the code is concise and understandable.

@kevindweb
Copy link
Contributor

Sorry for my late response, but I found a pretty big issue. Replicated by setting MAX_NFS to any number greater than 1. Then try to run more nfs than that number. Tested with MAX_NFS = 2 and MAX_NFS = 4. When I get this error (correct):
Screen Shot 2019-04-12 at 11 32 22 AM

I assume nothing more bad to happen, it should have done that. But, when I try to terminate the first NF initialized (speed_tester: ./go.sh 1 -d 1 -c 16000), it does not respond.
Screen Shot 2019-04-12 at 11 32 34 AM

Screen Shot 2019-04-12 at 11 33 23 AM
I can explain more about the process I did later if needed.

@koolzz
Copy link
Member Author

koolzz commented Apr 12, 2019

This is indeed a weird issue, I'll check later. Any ideas why this happens?

@dennisafa
Copy link
Member

dennisafa commented Apr 12, 2019

This is indeed a weird issue, I'll check later. Any ideas why this happens?

I am looking into the code - I will keep you updated if I find something funky

Edit: I've been trying to replicate this bug again; but I'm having trouble. I follow @kevindweb steps, and assure that the ID we are trying to kill is proper by inserting a print statement:
Screen Shot 2019-04-13 at 8 42 03 AM

However I can easily kill the first NF running, and I am assuring that the new reuse ID function is being used. Additionally, I am making sure to test with setting MAX_NFs to 4 right away, instead of setting to 2 first then attempting to replicate.

Exact steps:

  1. Checkout reuse instance ids
  2. Limit MAX_NFs to 4, make
  3. Run manager
  4. Run 4 speed_tester NF's, getting an error code on running the 4th
  5. Attempting to kill first NF initialized (working properly, in my cases)

@koolzz
Copy link
Member Author

koolzz commented Apr 14, 2019

Agree with @dennisafa I can't seem to reproduce the behavior, @kevindweb are there any key steps we're missing here?

@kevindweb
Copy link
Contributor

Dennis told me he was able to reproduce it the same way I explained. First, I changed MAX_NFS to 2, and started 2 speed_testers, this created an error with the first initialized speed_tester. Then I did the same thing with MAX_NFS=4. I don't know exactly why this happens though.

@koolzz
Copy link
Member Author

koolzz commented Apr 14, 2019

Did you make clean && make the NFs?

@kevindweb
Copy link
Contributor

I just tested it the same way with cleaning and making /examples and /onvm again

@koolzz
Copy link
Member Author

koolzz commented Apr 14, 2019

Not sure why I can't reproduce, I'll take a look at the lab

@dennisafa
Copy link
Member

Dennis told me he was able to reproduce it the same way I explained. First, I changed MAX_NFS to 2, and started 2 speed_testers, this created an error with the first initialized speed_tester. Then I did the same thing with MAX_NFS=4. I don't know exactly why this happens though.

Strangely, this issue pops up only every once in a while. In a few cases, it'll occur on the second or third NF initialized. Could this be an issue unrelated to this specific PR? I would understand if the instance_id variable for the first NF was somehow corrupted when getting the error code (because then we couldn't clean up properly) but it's valid every time I'm testing it..

Copy link
Contributor

@kevindweb kevindweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, my first successful test of this PR!! I stress tested this a lot because I was being cautious. For a few minutes I consistently added and dropped new speed_testers, was not allowed to start more than MAX_NFS, correctly! The instance IDs wrapped around perfectly, even after multiple iterations of stopping and starting NFs. Success A#1!

@koolzz
Copy link
Member Author

koolzz commented Apr 21, 2019

@dennisafa @kevindweb maybe you guys where onto something. I found a nasty bug regarding this PR. When running different NFs and reusing the old instance_id they would segfault on pthread_join. After some gdb debugging and being very confused I have found the reason. We never cleaned up old function pointers in the onvm_nf struct.
For example: We run speed tester on instance id 1. Then we run a few other NFs, stop speed tester, and start a new simple_forward NF which after a wrap around gets instance id 1. This would segfault because the simple forward would try to call the nf_setup function from the old speed_tester setup. This would not segfault, but it would corrupt our memory badly. I'm fixing this and bundling it into a memory cleanup pr and I'll assign you guys for review as it might have been related to issues you were seeing.

@koolzz
Copy link
Member Author

koolzz commented Apr 27, 2019

@onvm go for it

@onvm
Copy link

onvm commented Apr 27, 2019

@onvm go for it

CI Message

Your results will arrive shortly

@onvm
Copy link

onvm commented Apr 27, 2019

@onvm go for it

CI Message

Run successful see results:
[Results from nimbnode30]
Median TX pps for Speed Tester: 35204617

Linter Failed

examples/aes_decrypt/aes.h:176: #endif line should be "#endif // AES_H" [build/header_guard] [5]
Total errors found: 1
examples/aes_encrypt/aes.h:185: #endif line should be "#endif // AES_H" [build/header_guard] [5]
Total errors found: 1
examples/flow_table/flow_table.h:63: #endif line should be "#endif // FLOW_TABLE_H" [build/header_guard] [5]
Total errors found: 1
examples/flow_table/msgbuf.h:71: #endif line should be "#endif // MSGBUF_H" [build/header_guard] [5]
Total errors found: 1
examples/flow_table/openflow.h:969: #endif line should be "#endif // OPENFLOW_H" [build/header_guard] [5]
examples/flow_table/openflow.h:50: Using deprecated casting style. Use static_cast(...) instead [readability/casting] [4]
examples/flow_table/openflow.h:569: Extra space before ( in function call [whitespace/parens] [4]
examples/flow_table/openflow.h:634: Extra space before ( in function call [whitespace/parens] [4]
examples/flow_table/openflow.h:771: Extra space before ( in function call [whitespace/parens] [4]
examples/flow_table/openflow.h:804: Extra space before ( in function call [whitespace/parens] [4]
examples/flow_table/openflow.h:865: Extra space before ( in function call [whitespace/parens] [4]
examples/flow_table/openflow.h:926: Extra space before ( in function call [whitespace/parens] [4]
Total errors found: 8
examples/flow_table/sdn.c:334: { should almost always be at the end of the previous line [whitespace/braces] [4]
examples/flow_table/sdn.c:365: Should have a space between // and comment [whitespace/comments] [4]
Total errors found: 2
examples/flow_table/sdn.h:104: #endif line should be "#endif // SDN_H" [build/header_guard] [5]
Total errors found: 1
examples/flow_table/sdn_pkt_list.h:122: #endif line should be "#endif // SDN_PKT_LIST_H" [build/header_guard] [5]
Total errors found: 1
examples/flow_table/setupconn.h:53: #endif line should be "#endif // SETUPCONN_H" [build/header_guard] [5]
Total errors found: 1
examples/speed_tester/speed_tester.c:364: Lines should be <= 120 characters long [whitespace/line_length] [5]
Total errors found: 1
onvm/onvm_mgr/main.c:109: Lines should be <= 120 characters long [whitespace/line_length] [5]
Total errors found: 1
onvm/onvm_mgr/onvm_init.c:107: { should almost always be at the end of the previous line [whitespace/braces] [4]
onvm/onvm_mgr/onvm_init.c:114: { should almost always be at the end of the previous line [whitespace/braces] [4]
onvm/onvm_mgr/onvm_init.c:116: { should almost always be at the end of the previous line [whitespace/braces] [4]
Total errors found: 3
onvm/onvm_mgr/onvm_pkt.c:68: Are you taking an address of a cast? This is dangerous: could be a temp var. Take the address before doing the cast, rather than after [runtime/casting] [4]
Total errors found: 1
onvm/onvm_nflib/onvm_common.h:366: #endif line should be "#endif // ONVM_COMMON_H" [build/header_guard] [5]
Total errors found: 1
onvm/onvm_nflib/onvm_config_common.h:208: #endif line should be "#endif // ONVM_CONFIG_COMMON_H" [build/header_guard] [5]
Total errors found: 1
onvm/onvm_nflib/onvm_msg_common.h:61: #endif line should be "#endif // ONVM_MSG_COMMON_H" [build/header_guard] [5]
Total errors found: 1
onvm/onvm_nflib/onvm_nflib.c:523: Line ends in whitespace. Consider deleting these extra spaces. [whitespace/end_of_line] [4]
Total errors found: 1
onvm/onvm_nflib/onvm_pkt_common.c:98: Are you taking an address of a cast? This is dangerous: could be a temp var. Take the address before doing the cast, rather than after [runtime/casting] [4]
Total errors found: 1
onvm/onvm_nflib/onvm_sc_common.h:70: #endif line should be "#endif // ONVM_SC_COMMON_H" [build/header_guard] [5]
onvm/onvm_nflib/onvm_sc_common.h:70: Should have a space between // and comment [whitespace/comments] [4]
Total errors found: 2
onvm/onvm_nflib/onvm_sc_mgr.h:79: #endif line should be "#endif // ONVM_SC_MGR_H" [build/header_guard] [5]
Total errors found: 1

@koolzz koolzz merged commit c2eb910 into sdnfv:develop Apr 27, 2019
@koolzz koolzz added this to the ONVM 19.05 Release milestone May 26, 2019
@koolzz koolzz deleted the reuse_instance_ids branch June 7, 2019 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants