Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes an init race condition #93

Merged
merged 5 commits into from
May 21, 2019
Merged

Fixes an init race condition #93

merged 5 commits into from
May 21, 2019

Conversation

sriramster
Copy link
Contributor

This could probably be a race condition,

For ex: When we've create a subscriber in the API, and the subscriber has the data already available in the callback (Cause of existing publishers) the db entry for the particular topic would not be availalble, which in turn returns an SqliteException. This is cause write_->create_topic() call is where we add the db entry for a particular topic. And, this leads to crashing before any recording. Locally I solved it by adding the db entry first, and if create_subscription fails, remove the topic entry from the db and also erase the subscription.

Copy link
Collaborator

@Karsten1987 Karsten1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. I could see that this is a potential race condition.

I am requesting changes though because I would like to have some tests for it. It would be great if you could write a set of unit tests to cover the new functionality.

@@ -77,6 +77,16 @@ class ROSBAG2_PUBLIC Writer
*/
virtual void create_topic(const TopicMetadata & topic_with_type);

/**
* Remove a new topic in the underlying storage. If creation of subscription fails remove the,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please format for one sentence per line.

Suggested change
* Remove a new topic in the underlying storage. If creation of subscription fails remove the,
* Remove a new topic in the underlying storage.
* If creation of subscription fails remove, the

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Karsten1987 most of the files have similar comments. Would you want me to edit them for all?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's fine if you only tackle the comments of the functions you're modifying. Otherwise the PR is getting too big.

* \param topic_with_type name and type identifier of topic to be created
* \throws runtime_error if the Writer is not open.
*/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@@ -106,13 +106,22 @@ void Recorder::subscribe_topics(

void Recorder::subscribe_topic(const rosbag2::TopicMetadata & topic)
{
if(writer_) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check shouldn't be necessary because writer_ stays a initialized shared pointer.

@@ -106,13 +106,22 @@ void Recorder::subscribe_topics(

void Recorder::subscribe_topic(const rosbag2::TopicMetadata & topic)
{
if(writer_) {
writer_->create_topic(topic);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be sufficient to call writer_->create_topic(topic) before we call subscriptions_.push_back(subscriptions) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. But, don't you feel we may need a fallback when subscription creation fails and the db still has an entry for the same. At the same time, when the subscription is started again, the race would still exist. So may be we may have to clean up the db, so added the remove api.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your last point.

sriramster pushed a commit to sriramster/rosbag2 that referenced this pull request Mar 26, 2019
Signed-off-by: Sriram Raghunathan <rsriram7@visteon.com>
@sriramster
Copy link
Contributor Author

Hi @Karsten1987 I've updated the patch. Please review. Thanks

Copy link
Collaborator

@Karsten1987 Karsten1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch looks better now. I would still like to have it tested if possible.

@sriramster
Copy link
Contributor Author

Hi @Karsten1987 I've updated the patch with a test case for remove_topic at the lower most layer of the architecture.

Once this API starts being used extensively we may have to add more test cases into other layers of the code. Please let me know if this approach is ok. I'm trying to keep the patchset as minimal as possible.

So that I could build on this in future.

@sriramster
Copy link
Contributor Author

Hello @Karsten just reminding you about this pr

@Karsten1987
Copy link
Collaborator

thanks for the patch. It looks good to me. However, in order to run CI on it, you have to rebase your branch to the latest master.

@sriramster
Copy link
Contributor Author

Hi Karsten,

I've rebased to master. Let me know if anyting else.

@Karsten1987
Copy link
Collaborator

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

looks like the tests are not completely modified for this:

16:37:49 /home/jenkins-agent/workspace/ci_linux/ws/install/class_loader/include/class_loader/meta_object.hpp:193:12: error: invalid new-expression of abstract class type ‘TestPlugin’
16:37:49      return new C;
16:37:49             ^~~~~
16:37:49 In file included from /home/jenkins-agent/workspace/ci_linux/ws/src/ros2/rosbag2/rosbag2_storage/test/rosbag2_storage/test_plugin.cpp:25:0:
16:37:49 /home/jenkins-agent/workspace/ci_linux/ws/src/ros2/rosbag2/rosbag2_storage/test/rosbag2_storage/test_plugin.hpp:27:7: note:   because the following virtual functions are pure within ‘TestPlugin’:
16:37:49  class TestPlugin : public rosbag2_storage::storage_interfaces::ReadWriteInterface
16:37:49        ^~~~~~~~~~
16:37:49 In file included from /home/jenkins-agent/workspace/ci_linux/ws/src/ros2/rosbag2/rosbag2_storage/include/rosbag2_storage/storage_interfaces/read_write_interface.hpp:21:0,
16:37:49                  from /home/jenkins-agent/workspace/ci_linux/ws/src/ros2/rosbag2/rosbag2_storage/test/rosbag2_storage/test_plugin.hpp:25,
16:37:49                  from /home/jenkins-agent/workspace/ci_linux/ws/src/ros2/rosbag2/rosbag2_storage/test/rosbag2_storage/test_plugin.cpp:25:
16:37:49 /home/jenkins-agent/workspace/ci_linux/ws/src/ros2/rosbag2/rosbag2_storage/include/rosbag2_storage/storage_interfaces/base_write_interface.hpp:40:16: note: 	virtual void rosbag2_storage::storage_interfaces::BaseWriteInterface::remove_topic(const rosbag2_storage::TopicMetadata&)
16:37:49    virtual void remove_topic(const TopicMetadata & topic) = 0;

@sriramster
Copy link
Contributor Author

Hi Karsten,

Had add the new remove API to other dependent test cases. Fixed them now. Please have a look.

@Karsten1987
Copy link
Collaborator

Karsten1987 commented Apr 9, 2019

new round of CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

looks like there is some linter errors you might want to address.

@sriramster
Copy link
Contributor Author

@Karsten1987
Fixed the linter errors. Sorry for the delay held up with things.

@Karsten1987
Copy link
Collaborator

@sriramster Sorry for being silent on this. Do you mind rebasing this branch to the latest master?

@sriramster
Copy link
Contributor Author

@Karsten1987 Hi Karsten, done. Merged the changes. Please let me know if anything else needs to be done. Sorry for the delay. I missed your comment somehow.

@Karsten1987
Copy link
Collaborator

@sriramster I am sorry, I might have to ask you one more time to rebase. We've had to fix the master build first. Once rebased, I am happy to run CI on your branch.

sriramster pushed a commit to sriramster/rosbag2 that referenced this pull request May 14, 2019
Signed-off-by: Sriram Raghunathan <rsriram7@visteon.com>
@Karsten1987
Copy link
Collaborator

Looks like something went wrong here while rebasing.

@sriramster
Copy link
Contributor Author

Hi Karsten, Do you've the log? of what went wrong? I'm unable to follow it through the tree. Could you help?

@Karsten1987
Copy link
Collaborator

Your branch has now 24 commits where 19 of them are already on master. So you have to rebase correctly to only have your 6 commits on this branch. Does this make sense?

sriramster pushed a commit to sriramster/rosbag2 that referenced this pull request May 15, 2019
Signed-off-by: Sriram Raghunathan <rsriram7@visteon.com>
sriramster pushed a commit to sriramster/rosbag2 that referenced this pull request May 15, 2019
Signed-off-by: Sriram Raghunathan <rsriram7@visteon.com>
@sriramster
Copy link
Contributor Author

@Karsten1987 Could you review this now? Hope there are no issues. Gradually the branch is becoming unclean for me. :-/

@Karsten1987
Copy link
Collaborator

I think something didn't quite work out during your rebasing. Make sure you end up with only the commits you've actually authored. So for example, right now there are 33 commits on this branch - whereas I believe only 6 commits are actually yours.
So just to be clear, you have to run

git checkout master
git fetch --all
git rebase origin/master

There might be some conflicts you have to resolve locally on your side, but this should help you end up with a branch which only has your 6 commits on top of the upstream master branch.

Sriram Raghunathan added 5 commits May 20, 2019 13:19
…subscriber in the API, and the subscriber has the data already available in the callback (Cause of existing publishers) the db entry for the particular topic would not be availalble, which in turn returns an SqliteException. This is cause write_->create_topic() call is where we add the db entry for a particular topic. And, this leads to crashing before any recording.

Locally I solved it by adding the db entry first, and if
create_subscription fails, remove the topic entry from the db and also
erase the subscription.

Signed-off-by: Sriram Raghunathan <rsriram7@visteon.com>
Signed-off-by: Sriram Raghunathan <rsriram7@visteon.com>
Signed-off-by: Sriram Raghunathan <rsriram7@visteon.com>
Signed-off-by: Sriram Raghunathan <rsriram7@visteon.com>
sriramster pushed a commit to sriramster/rosbag2 that referenced this pull request May 20, 2019
Signed-off-by: Sriram Raghunathan <rsriram7@visteon.com>
sriramster pushed a commit to sriramster/rosbag2 that referenced this pull request May 20, 2019
Signed-off-by: Sriram Raghunathan <rsriram7@visteon.com>
sriramster pushed a commit to sriramster/rosbag2 that referenced this pull request May 20, 2019
Signed-off-by: Sriram Raghunathan <rsriram7@visteon.com>
@sriramster
Copy link
Contributor Author

Well. I used the same technique to rebase. Let me know if this works. I've cleaned up and pushed again.
Thanks for the patience.

@Karsten1987
Copy link
Collaborator

So I think we have to start from a clean sheet here. There are now 49 commits on this branch and most of them are duplicated. Here is what I propose:

cd <your/path/to/rosbag2>
git fetch --all
git checkout master
git reset --hard origin/master
git cherry-pick 3bf3313befbddaf2ff6a982fac501c81c65bb976..4841d7caf57a98d3b6094ab9bf4e88c2a3ab5fae

This will reset your branch to the latest upstream branch. Then with this cherry-pick command you can only apply the commit range you've contributed. You might have to resolve conflicts along the way of cherry-picking.

Alternatively, you can close this PR, check-out a new branch (I recommend renaming it to something else than master, e.g. fix_init_race_condition) and put your commits on top of that new branch. As far as I can recall, the changes are not too big.

If you then look at your git log or gitk or similar visualization tool for git branches, you should only see a couple of commits on top of origin/master. In any case, once you've done either one I would ask you to compile and run tests on your workspace to verify your branch.

@sriramster
Copy link
Contributor Author

I fixed my repo by force pushing it. Somehow not used to this model, and taking some time to understand it. Please close the pull request if this does'nt work. I'll re-do this work and raise a new request.

@Karsten1987
Copy link
Collaborator

Karsten1987 commented May 20, 2019

Happy to see you could fix it. The branch looks good now. Just fyi, whenever you rebase, there is no way around force pushing. So that's to be expected.

Running CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Copy link
Collaborator

@Karsten1987 Karsten1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Thanks for iterating over it.

@Karsten1987 Karsten1987 merged commit dcd65a6 into ros2:master May 21, 2019
@sriramster
Copy link
Contributor Author

Thanks @Karsten1987 I'm not used to rebase workflow, I follow fetch, review, merge and then push.
I think that's what messed the branch up.

ruffsl added a commit to dledr/bbr_ros2 that referenced this pull request Jul 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants