which options #5

Closed
Tsunehiko opened this issue Jul 2, 2019 · 24 comments

@Tsunehiko

Hello. I am not very familiar with reinforcement learning.

python3 main.py --log-dir trained_models/acktr --algo acktr --model squeeze --num-process 24 --map-width 27 --render

When I run the above, main.py does not work. Could you tell me which options are needed to run main.py?

@smearle (Owner) commented Jul 2, 2019

Hi Tsunehiko, thanks for your interest.
Apologies, the README is outdated. Try this:

python3 main.py --experiment test000 --model FullyConv --num-process 24 --map-width 16 --render --overwrite --algo a2c

Please have a look at arguments.py, or call python3 main.py --help, to see the full list of available arguments.

@Tsunehiko (Author)

Thank you for your quick response.
I want to try other algorithms on Micropolis. Is it possible to connect gym_micropolis/envs/env.py to other algorithms (e.g., stable_baselines) as a gym environment?

@smearle (Owner) commented Jul 2, 2019

Funny, this repository is actually an adaptation of the following: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail, which is itself adapted from OpenAI baselines.

I suppose one could use this environment with a different learning algorithm. Two roadblocks spring to mind that you'd need to overcome:

  1. The function setMapSize() passes a bunch of Micropolis-specific options to the environment, so you'd need to edit env.py to call it in the init() function with predetermined options (see the sketch after this list).
  2. The action space in Micropolis is very large. It corresponds to a flattened 2D image the size of the map, with one channel per tile-specific action, so you'd need to make sure the model you use is compatible with this. I'm exploring fully convolutional NNs that retain a spatial correspondence between the observation and action spaces (you can find them in model.py); I don't think you'd want a net that bottlenecks too drastically. Atari, for example, has a much smaller action space.
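
As a rough illustration of point 1, here is a minimal sketch of such a wrapper. The class name MicropolisEnv and the setMapSize() signature are assumptions for illustration; only the need to call setMapSize() with predetermined options comes from this repo. Note that for a 16x16 map with 19 tools, the flattened action space has 16 * 16 * 19 = 4864 discrete actions.

import gym
from gym_micropolis.envs.env import MicropolisEnv  # class name is an assumption

class MicropolisForBaselines(gym.Env):
    # Hypothetical wrapper: bake the Micropolis-specific options into
    # __init__, so that third-party code (e.g., stable_baselines) can
    # construct the env without knowing about setMapSize().
    def __init__(self, map_width=16):
        self.env = MicropolisEnv()
        # setMapSize() is the call main.py normally makes; the exact
        # signature used here is assumed.
        self.env.setMapSize(map_width)
        self.observation_space = self.env.observation_space
        self.action_space = self.env.action_space  # e.g., 4864 actions for a 16x16 map

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)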

Indeed, it would be nice for this to be incorporated into the standard family of gym environments. I'm working on it here for now because I'm still experimenting with the environment itself, e.g., playing with different reward functions, designing mini-games, and trying different map sizes.

Which algorithm were you thinking of using in particular (stable_baselines is a collection of algorithms, not one single algorithm)?

@smearle (Owner) commented Jul 2, 2019

Feel free to shoot me an email if you have any more questions, ideas, or want to chat about RL :)

@Tsunehiko (Author)

Thank you for the kind and thorough explanation.
The reason I brought up stable_baselines is that I'm not used to RL yet, and it seemed like the easiest way to work with different algorithms. (I have no particular preference for any one algorithm.)

In model.py, I get errors because the densenet_pytorch and ConvLSTMCell modules can't be found. Which packages should I install?

From now on, I'll ask further questions by email.

@smearle (Owner) commented Jul 2, 2019

Aha, densenet_pytorch is an old dependency, so I got rid of it, and I've added ConvLSTMCell.py to the repo. Do a git pull from inside the repo and try again!
Thanks for bringing these problems to my attention.

@smearle (Owner) commented Jul 2, 2019

And indeed, you'd probably be wise to start learning RL by playing with stable_baselines and some Atari games, etc. But, selfishly, I'd rather you play with this repo, since you're helping me troubleshoot it :)

@Tsunehiko (Author)

Thank you for your quick response. I'm glad I can be useful, too.
Can I use the --no-cuda option? I don't have access to a GPU machine right now, so I can only run on CPU :(

By the way, I'm planning to work with this repo at least through July.

@smearle (Owner) commented Jul 2, 2019

Not ideal, but no problem. I've just patched up something that was getting in the way of --no-cuda, and I now have the above command working with it. Do another pull and let me know if you have any luck.

smearle self-assigned this Jul 2, 2019
@Tsunehiko (Author)

python3 main.py --experiment test000 --model FullyConv --num-process 24 --map-width 16 --render --overwrite --algo a2c --no-cuda

I ran the above command. On the displayed screen, only the in-game time advances and no actions are taken. How can I watch the agents playing? Should I use enjoy.py?

@smearle (Owner) commented Jul 3, 2019

Strange, I can't replicate this. What output are you getting on the command line?

@Tsunehiko (Author)

The command-line output is very long, so I'm not sure which part to share; I've copied the tail end of it below. The screen at that point looked like the attached image.

PLAYCITY
('PAUSED', False, 'running', True)
PLAYMODE
('PAUSED', False, 'running', True)
len of intsToActions: 4864
 num tools: 19

len of intsToActions: 4864
 num tools: 19
('WINDOW SIZE', 800, 608)
len of intsToActions: 4864
 num tools: 19
('WINDOW SIZE', 800, 608)
{'Road': ['Road', 'Wire', 'Rail', 'Water'], 'Wire': ['Road', 'Wire', 'Rail', 'Water'], 'Rail': ['Rail', 'Wire', 'Road', 'Water'], 'Water': ['Road', 'Water', 'Wire', 'Rail'], 'Net': ['Net', 'Airport'], 'Airport': ['Net', 'Airport']}
PLAYCITY
('PAUSED', False, 'running', True)
PLAYMODE
('PAUSED', False, 'running', True)
len of intsToActions: 4864
 num tools: 19
/home/tsunehiko/gym-micropolis/micropolis/MicropolisCore/src/images/tileEngine/tiles.png
/home/tsunehiko/gym-micropolis/micropolis/MicropolisCore/src/images/tileEngine/tiles.png

Screenshot 2019-07-03 7 57 23

@smearle (Owner) commented Jul 3, 2019

This is the most recent command-line output? If time is passing on the map, then training must be underway, so you should be getting printouts indicating average reward, number of frames, and the like. Try the same command with --log-interval 1 so that this printout occurs as frequently as possible. Depending on your system, training might be very slow. You might also set --num-proc 2, so that only two games run during training (there's a bug with running just one at the moment); this will speed up gameplay in those individual games. An example invocation follows.
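
For reference, the full invocation with these suggestions folded in might look like the following (the exact flag spellings are assumptions; the thread uses both --num-process and --num-proc, so check arguments.py for the canonical names):

python3 main.py --experiment test000 --model FullyConv --num-proc 2 --map-width 16 --log-interval 1 --overwrite --algo a2c --no-cuda --render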

@Tsunehiko (Author)

I tried both --log-interval 1 and --num-proc 2, but the displayed map still does not change; only the time advances. The output from the end of the command line is shown below.

==== STARTGAME                                                                                       
GENERATECITY                                                                                         
STARTMODE                                                                                            
('PAUSED', True, 'running', False)                                                                   
('SWITCHPAGE', <micropolisnotebook.MicropolisNotebook object at 0x7fd3f24d3e10 (pyMicropolis+micropolisEngine+micropolisnotebook+MicropolisNotebook at 0x563695516510)>, <micropolisnotebook.MicropolisNotebook object at 0x7fd3f24d3e10 (pyMicropolis+micropolisEngine+micropolisnotebook+MicropolisNotebook at 0x563695516510)>, <micropolisnoticepanel.MicropolisNoticePanel object at 0x7fd3f24d5630 (pyMicropolis+micropolisEngine+micropolisnoticepanel+MicropolisNoticePanel at 0x563695518340)>, 0)
('WINDOW SIZE', 800, 608)                                                                            
('WINDOW SIZE', 800, 608)                                                                            
('WINDOW SIZE', 800, 608)                                                                            
('WINDOW SIZE', 800, 608)                                                                            
{'Road': ['Road', 'Wire', 'Rail', 'Water'], 'Wire': ['Road', 'Wire', 'Rail', 'Water'], 'Rail': ['Rail', 'Wire', 'Road', 'Water'], 'Water': ['Road', 'Water', 'Wire', 'Rail'], 'Net': ['Net', 'Airport'], 'Airport': ['Net', 'Airport']}                                                                        
PLAYCITY                                                                                             
('PAUSED', False, 'running', True)                                                                   
PLAYMODE                                                                                             
('PAUSED', False, 'running', True)                                                                   
{'Road': ['Road', 'Wire', 'Rail', 'Water'], 'Wire': ['Road', 'Wire', 'Rail', 'Water'], 'Rail': ['Rail', 'Wire', 'Road', 'Water'], 'Water': ['Road', 'Water', 'Wire', 'Rail'], 'Net': ['Net', 'Airport'], 'Airport': ['Net', 'Airport']}                                                                        
PLAYCITY                                                                                             
('PAUSED', False, 'running', True)                                                                   
PLAYMODE                                                                                             
('PAUSED', False, 'running', True)                                                                   
len of intsToActions: 4864                                                                           
 num tools: 19                                                                                       
len of intsToActions: 4864                                                                           
 num tools: 19                                                                                       
BASE NETWORK:                                                                                        
 MicropolisBase_FullyConv(                                                                           
  (embed): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1))                                         
  (k5): Conv2d(32, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))                            
  (k3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))                            
  (val_shrink): Conv2d(32, 32, kernel_size=(2, 2), stride=(2, 2))                                    
  (val): Conv2d(32, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))                            
  (act): Conv2d(32, 19, kernel_size=(1, 1), stride=(1, 1))                                           
)                                                                                                    
/home/tsunehiko/gym-micropolis/micropolis/MicropolisCore/src/images/tileEngine/tiles.png             
/home/tsunehiko/gym-micropolis/micropolis/MicropolisCore/src/images/tileEngine/tiles.png 

@smearle (Owner) commented Jul 3, 2019

How fast is time moving? For me, an update only occurs in the late 1900s (in-game). You could try --map-size 4 and --max-step 16; that should make updates occur by 1950 at the latest.

@Tsunehiko (Author)

I tried --map-size 4 and --max-step 16, but time passed extremely fast and kept climbing indefinitely, well past the 1950s.

@smearle (Owner) commented Jul 3, 2019

I'm at a loss. I can only recommend looking at line 265 in main.py, which should be producing these printouts, and working backward from there with print statements to figure out why line 265 is never reached.

@Tsunehiko (Author)

Thank you for looking into this so carefully.
I'll examine the code and try it myself. I'll contact you again if I have any questions.

@smearle (Owner) commented Jul 4, 2019

Also, try without --render, and see if you get any printouts.

@Tsunehiko (Author)

I tried it without --render. Results are now displayed! However, no matter how long it trains, the reward does not increase ...
Screenshot 2019-07-06 23 01 18

@Tsunehiko (Author)

Also, an error like the one in the attached image has appeared. How do I fix this error?
Screenshot 2019-07-06 23 43 03

@smearle (Owner) commented Jul 6, 2019

That's some good news! I suspected that the problem stemmed from the GUI. In particular, there must be a call to gtk.main_iteration(), or something to that effect, hidden somewhere. That function hands control to the GUI, waiting for user input, so it would stop our training code dead in its tracks (and simply let the game run very fast). Strange that I don't experience the same issue on my end and can't find a call to the function in the code; it might be an operating-system-specific issue.
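
For what it's worth, the usual way to keep a GTK GUI responsive without blocking a training loop is to drain only the events that are already pending on each step. This is a generic PyGTK pattern, not code from this repo:

import gtk  # PyGTK 2.x; the Micropolis GUI appears to be GTK-based

def pump_gui_events():
    # Handle every event already queued, then return immediately.
    # gtk.main_iteration(False) is non-blocking, unlike gtk.main(),
    # which hands control to the GUI loop indefinitely.
    while gtk.events_pending():
        gtk.main_iteration(False)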

As for the "Unable to init server" error, I think it might be the result of too many dead Python processes hanging around on the CPU (interrupting training is not yet handled gracefully by the code). Try again after pkill python, or simply after restarting your machine.

To see whether the bot is doing anything, git pull the repo, then try again with the option --print-map, which prints an array representing (the 0th-ranked environment's) game map, with different tile types corresponding to different integers. If this map is not changing, then the bot is not building on the map. An example invocation follows.
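
For example, the earlier command with --print-map added (the exact flag set is assumed from the commands used so far in this thread):

python3 main.py --experiment test000 --model FullyConv --num-proc 2 --map-width 16 --log-interval 1 --overwrite --algo a2c --no-cuda --print-map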

@Tsunehiko (Author) commented Jul 7, 2019

The array displayed by --print-map is changing, so it seems learning is happening. How do I watch the bot I trained? Should I use enjoy.py?

Also, the "Unable to init server" error appeared when I tried using a GPU server that I have access to, not my own machine. That may be the cause.

@smearle (Owner) commented Jul 7, 2019

Yes, you can try something like python enjoy.py --render --map-size 16 --load-dir a2c_FullyConv_w16/test000 (I wonder if we'll get stuck in the same GUI loop, though!).

smearle closed this as completed Jan 9, 2020