Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Survey manager execute goals script #103

Merged

Conversation

marinagmoreira
Copy link
Member

@marinagmoreira marinagmoreira commented Nov 9, 2023

Creating a proof of concept executor that would fill the gap between the survey manager goals and the robot commands.
This is still very archaic, I just tested it in simulation that I was able to interact with the inspection_tool normally and that the recursive -resume worked.
There are still many things to add like the -ns flags to the commands, the /env_wrapper for robot commands, proper error handling, etc, but I figured the sooner I share this and get feedback the better such that we're all working towards a common goal. (The interface will also be changed as the plan_interpreter.py starts taking shape.. we could even join both scripts)

@trey0 @Bckempa

Copy link
Contributor

@trey0 trey0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Marina- I bet this PR is pretty out of date by now but just wanted to provide some feedback. Happy to talk more about next steps!

astrobee/survey_manager/CMakeLists.txt Outdated Show resolved Hide resolved
astrobee/survey_manager/scripts/command_astrobee Outdated Show resolved Hide resolved
astrobee/survey_manager/scripts/command_astrobee Outdated Show resolved Hide resolved
astrobee/survey_manager/scripts/command_astrobee Outdated Show resolved Hide resolved
send_command("python gds_helper_batch.py -i cmd -- bagger -start stereo_" + bay + "_" + run)
send_command("python gds_helper_batch.py -i cmd -- plan -load plans/ISAAC/" + bay + "_stereo_mapping.fplan")
send_command("python gds_helper_batch.py -i cmd -- plan -run")
send_command("python gds_helper_batch.py -i cmd -- bagger -stop")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is a work in progress, just wanted to re-raise some issues we've discussed that need resolution;

  • Propagating errors up from these subcommands to represent action failure (may be a good place to use Python exceptions)
  • But probably good to have a default interactive retry loop around most commands (continues on success, waits for operator feedback about retry on failure, and if you choose retry the plansys2 executor is never told about the failure). For the multi-subcommand actions, it's an interesting question whether the retry should wrap around the whole sequence or the individual subcommands.
  • Some subcommands (bagger -stop) sort of want to be in a Python finally: clause so they always get run even if an earlier command fails.
  • Multi-robot will affect this somehow. Probably either wrapping other-robot commands in ssh or somehow namespacing the ROS topics.

Copy link
Contributor

@trey0 trey0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Marina- This is a nice start! I think it has a ways to go, and I'm a bit concerned about getting it to be really robust (like, can we test it enough in the relevant edge cases).

astrobee/survey_manager/scripts/command_astrobee Outdated Show resolved Hide resolved
astrobee/survey_manager/scripts/command_astrobee Outdated Show resolved Hide resolved
astrobee/survey_manager/scripts/command_astrobee Outdated Show resolved Hide resolved
astrobee/survey_manager/scripts/command_astrobee Outdated Show resolved Hide resolved
try:
while True:
# Get output from process
output = process.stdout.readline()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a busy loop. You can avoid that using select.select().

# Check if the user wants to exit
if user_input.lower().strip() == 'exit':
break
sock_input.send(user_input.encode("utf-8")[:1024])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not great to silently discard user input past 1024 characters. Possibly replace with an assertion that user_input is not too big, or at least print a warning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user input is always just two characters, anything over that is not valid. So I actually added way more than needed (not sure if that's something to avoid)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying that 1024 is too short. I'm saying that in general it's bad to accept an invalid input and silently try to sort-of-but-not-really fix it and keep going.

So the recommendation is to switch from silently truncating the string to doing something more sane. It could be reporting a warning and dropping the invalid input, or it could be raising an AssertionError when an assert fails, whatever behavior works out best for the user.

astrobee/survey_manager/scripts/command_astrobee Outdated Show resolved Hide resolved
# Check if the user wants to exit
if user_input.lower().strip() == 'exit':
break
sock_input.send(user_input.encode("utf-8")[:1024])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are trying to write line-oriented messages to a byte-oriented stream but without boundaries between lines (like \n). This is not robust. There is no guarantee that the whole line will be sent in one send() call, nor that the receiver will receive it with one recv() call.

You should check the docs for socket.send(). Unfortunately, you need to pay attention to the number of bytes sent and keep trying (with a truncated buffer) if not all were sent.

And if you want to make sure you're sending complete lines, it would be way better to use a message protocol where finding message boundaries doesn't rely on UTF-8 decoding. Like each message could start with a binary representation of the number of encoded bytes in the line, then the rest of the message is exactly that number of encoded bytes. The reason to do it this way is the mismatch between UTF-8 symbols and bytes. If you try to decode the first part of a byte string that encodes a UTF-8 string, the byte string could be truncated in the middle of a multi-character UTF-8 symbol, so the decoding will fail. But if you make sure you get all the bytes corresponding to a complete UTF-8 string, you won't have that problem.

(Note that the decoding problem I'm describing is one of those things that will probably work most of the time but occasionally fail for obscure reasons, so a nasty bug that would be hard to replicate.)

while not self._stop_event.is_set():
print("waiting to receive")
try:
request = client_socket.recv(1024).decode("utf-8")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a robust way to receive a line. There is no guarantee that your recv() call will return the complete message sent by a send() call. Your decode() call could fail if the byte sequence is truncated in the middle of a UTF-8 symbol. In another comment I describe a protocol you could use to ensure you get a complete message and avoid the decode() problem.

astrobee/survey_manager/scripts/command_astrobee Outdated Show resolved Hide resolved
marinagmoreira and others added 6 commits December 7, 2023 14:59
* Add `survey_manager` subdirectory without layout for survey nodes

Includes placeholder directories with hyperlinked `readme.md` files in
the documentation hierarchy for the manger, planner, and bridge.

Per discussion with @marinagmoreira `survey_manager` lives under
`astrobee` to signify the code is intended for cross-compilation.

* Improve linking to survey manager docs

* Add files via upload

Adding planning domain, problem, and xml of the atomic actions of the robotic agents

* added  files to the correct directory

* adding soft constraints problem files

* Attempt at a complete MVP domain model

* Oops, remove redundant effect

* Simplified collision checking predicate

* In a working state

* Add sample_output_plan.txt

* Oops, remove obsolete commented-out code

* Fixed stereo survey part of the model based on analysis of old stereo surveys by Marina - unfortunately causes additional planner flakiness

* Update sample_output_plan.txt to reflect latest domain/problem

* Now generate PDDL problem from higher-level problem specification

* Tune panorama estimated duration

* Add plan_interpreter.py. Minor cleanup in problem_generator.py

* Simplify dynamic config just a bit

* Add `survey_manager` subdirectory without layout for survey nodes

Includes placeholder directories with hyperlinked `readme.md` files in
the documentation hierarchy for the manger, planner, and bridge.

Per discussion with @marinagmoreira `survey_manager` lives under
`astrobee` to signify the code is intended for cross-compilation.

* Improve linking to survey manager docs

* Remove `survey_bridge` capability will be added to `astrobee`

* Add traclabs plansys2 backport via submodule, thanks @ana-GT

* Move sub-modules to `survey_manager` path

* Upgrade behaviortree to V4

NOTE: If already installed, remove V3 before installing V4

* Cleanup unused, misplaced sub-modules, again.

* Remove `survey_manager` package and organize `survey_planner`

* Deprecate Ubuntu-16.04 (xenial) builds of Isaac to support Plansys2

- Remove Ubuntu-16.04 (xenial) CI builds
- Update dockerfile ubuntu version defaults to Ubuntu-20.04 (focal)
- Update apk build environment to Ubuntu-20.04 (focal)

* Fix python formatting

* Update to forks of traclabs backports

* Add readline development files to setup for plansys2 terminal interface

If you have already built your VM, run `sudo apt install libreadline-dev`

* Removed outdated PDDL files per @trey0

nasa#107 (comment)

* Revert CI upgrades for APK builds

---------

Co-authored-by: Abiola Akanni <Aoakanni@smcm.edu>
Co-authored-by: Trey Smith <trey.smith@nasa.gov>
@marinagmoreira marinagmoreira changed the base branch from develop to survey_manager December 14, 2023 21:30
@marinagmoreira marinagmoreira changed the base branch from survey_manager to develop January 3, 2024 03:16
@marinagmoreira marinagmoreira marked this pull request as ready for review January 17, 2024 17:10
Copy link
Member

@bcoltin bcoltin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main thing is I would elaborate in the documentation on command_astrobee, it's unclear what this is for if you aren't already aware.


# This class starts a new process and lets you monitor the input and output
# Mostly used for actions where user inteference might be required
class ProcessExecutor:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very unclear to me from the name and comment what this class does--- can you be more specific?

# If broken pipe connect again
if not request:
break
print("got: " + request)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

were these to debug, should they stay here?

# Get the return code of the process
process.kill()
finally:
# Forcefully stop the thread (not recommended)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this not recommended?

)
args = parser.parse_args()

exit_code = survey_manager_executor(args.robot_name, args.command_name, args.to_bay, args.from_bay, args.run, args.config_static)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sys.exit(...)?

while True:
# Get user input dynamically
user_input = input()
print("user input: " + user_input)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still want this?

Copy link
Contributor

@trey0 trey0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much progress here! Thank you!

I have lots of comments, but I would like to go ahead and merge this quickly and then work to iterate.

berth:
berth1: "1"
berth2: "2"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would note this whole section seems intended for managing inter-module moves using the survey manager, which is outside our agreed scope. Let's not get too wrapped up in working on this yet. Good to note it's isolated into fields of the YAML file that are not used elsewhere yet, so it shouldn't break anything.

usl_bay3: "-pos '2.6 0 4.8'"
usl_bay4: " -pos '1.55 0 4.8'"
usl_bay5: "-pos '0.5 0 4.8'"
usl_bay6: "-pos '-0.5 0 4.8'"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole section is implementing features that are out of scope, so I don't want to encourage more work on this right now.

But here are some thoughts for improvements if we are able to add inter-module motion as a stretch goal:

It's too bad many of the positions declared here are copy/paste of position data declared in bays above. Probably a cleaner way overall would be to:

  • Separate the complete list of known location pose declarations from the (subset) list of locations that should be included in the PDDL problem instance. This is potentially important because including irrelevant locations can hurt planner performance.
  • The location pose declarations can be split into position and (optional) attitude fields.
  • The correct move arguments for a location can be generated from the pose declaration.
  • The current problem generator has a dumb way to infer which locations are connected, relying on everything being sorted by bay number, that would need to be revisited if there are multiple modules involved.
  • The current PDDL domain model can't readily represent different attitudes required for moving in different directions. But that's ok. The action executor receives both from and to position for each move action and could use this info to implement an attitude constraint without exposing this level of detail to the planner.

@@ -2,7 +2,7 @@
;; Command was: ./tools/problem_generator.py
;; Working directory was: /home/vagrant/isaac/astrobee/survey_manager/survey_planner
;; Problem template: pddl/jem_survey_template.pddl
;; Config 1: data/jem_survey_static.yaml
;; Config 1: data/survey_static.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remember it's not currently in scope to manage inter-module moves with the survey manager. It would be viable to use different static config files for the JEM and USL surveys, and might be simpler. At least we should keep that in mind as an open option.

self.ack_msg = msg
self.ack_needed = False

def publish_and_wait_response(self, cmd):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intended return value semantics are not documented. If the only possible return values are currently 0 or 1, maybe the return type should be changed to bool?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per exit codes standards, anything other than zero is failure, so I'm passing a non-zero code back if it fails. I want it to be a number instead of a bool because we're adding the exit_codes to evaluate overall success

@marinagmoreira marinagmoreira merged commit f3e5df1 into nasa:develop Jan 20, 2024
4 checks passed
@marinagmoreira marinagmoreira deleted the survey_manager_execute_goals branch January 20, 2024 09:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants