
The Speed of Life & Learning

gnitr edited this page Dec 11, 2017 · 16 revisions

Initial reflections about time in Unscripted (14 Nov 2017)

How is time managed and shared among bots?

We could use a round-based system to keep everything fair among all the remotely controlled bots: a bot can do at most one thing within a round. But we don't want a laggard to slow the whole population down. A variant is to use rounds of fixed duration: if a bot is too slow, it loses its chance to act during that round but can act in the next one.

If we do that, then which duration should we choose for a round?

Another option is to have no rounds at all: any bot can act at any time, with the risk of request flooding from very fast bots. Then perhaps we could impose a rate limit.

At the moment (Nov 2017) we don't have fixed-duration rounds; the system lets bots act at any time, as often as they like.
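The rate limit mentioned above could be sketched roughly as below. The `RateLimiter` class and the `min_interval` value are illustrative assumptions, not part of the current engine:

```python
import time

class RateLimiter:
    """Reject a bot's action if it comes too soon after its previous one."""

    def __init__(self, min_interval=0.1):
        self.min_interval = min_interval  # seconds between actions (illustrative value)
        self.last_action = {}             # bot_id -> timestamp of last accepted action

    def allow(self, bot_id, now=None):
        now = time.monotonic() if now is None else now
        last = self.last_action.get(bot_id)
        if last is not None and now - last < self.min_interval:
            return False  # too fast: drop or delay the request
        self.last_action[bot_id] = now
        return True
```

This keeps the "no rounds" model while capping how much a very fast bot can flood the server; the limit is per bot, so slow bots are never penalised.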

How long should a bot's life be?

(assuming they don't die prematurely due to an accident or careless behavior)

We could fix a general duration, e.g. a fixed number of real days. But what is a day? What is a time unit or a time step in Unscripted?

We decided to go for another approach where a bot dies after N actions, which makes things fairer for all bots. Faster bots have a shorter life but, all in all, they can achieve as much. You can compare that with, say, the fast reactions of a fly versus the slow pace but long life of a turtle. This approach was chosen to avoid making hard decisions about time units, but it creates issues: for example an 'eternal' bot that acts only every 100 years, or one that cheats by hibernating and 'waking up' only during eras where the world is more welcoming and rich. So we'll need to revisit that choice.
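The action-count lifespan could be sketched as follows; the `Bot` class and the `MAX_ACTIONS` value are illustrative names, not the engine's actual API:

```python
MAX_ACTIONS = 10_000  # illustrative value of N

class Bot:
    def __init__(self):
        self.actions_left = MAX_ACTIONS
        self.alive = True

    def act(self, action):
        """Spend one unit of the bot's action budget; die when it runs out."""
        if not self.alive:
            return False
        self.actions_left -= 1
        if self.actions_left <= 0:
            self.alive = False  # natural death after N actions
        return True
```

Note that wall-clock time never enters the budget, which is exactly what makes the hibernation cheat possible.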

How long is an action?

We can't completely avoid the definition of a time step. Bots can perform discrete actions in the world, and some actions will take longer than others. Currently we have a 'walk' and a 'drink' action. Walk moves the bot forward in a straight line by 0.8 metres, which is approximately the average length of a human step. A time step is thus ~0.6 s (assuming an average walking speed of 5 km/h) compared to real life. Although the focus of Unscripted is not on low-level physical details or hyper-realism, one of its key principles is to place the bots in conditions similar to humans' so we can read and compare the evolution of their behaviour more intuitively.
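The ~0.6 s figure follows directly from the step length and walking speed:

```python
step_m = 0.8                  # length of one 'walk' action, in metres
speed_mps = 5 * 1000 / 3600   # 5 km/h expressed in metres per second
time_step_s = step_m / speed_mps
print(round(time_step_s, 3))  # 0.576, rounded to ~0.6 s in the text
```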

Some actions in the future could take longer, for instance a bot chopping a tree. So should we split actions into smaller pieces that match our 0.6 s time step? A simple approach would be to give each action a completion duration; if much more than ~0.6 s, it behaves like a progress bar: the first action starts the progress and the bot can cancel at any time. If it cancels and restarts, progress starts again from 0. Or we could save the progress in the state of the object (e.g. tree.chopped = 0.4 for 40% chopped), which would be more realistic and allow more interesting scenarios (chop a bit every day, or multiple bots chopping the same tree to finish it faster).
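The second option, storing progress on the object, might look like this minimal sketch; the `Tree` class and the `CHOP_STEP` increment are illustrative assumptions:

```python
CHOP_STEP = 0.1  # illustrative: fraction of the tree felled per 'chop' action

class Tree:
    def __init__(self):
        self.chopped = 0.0  # 0.0 = intact, 1.0 = felled; persists between actions

    def chop(self):
        """One discrete chop action; any bot can contribute to the same tree."""
        if self.chopped >= 1.0:
            return False  # already felled
        self.chopped = min(1.0, self.chopped + CHOP_STEP)
        return True

tree = Tree()
for _ in range(4):
    tree.chop()
print(round(tree.chopped, 1))  # 0.4, i.e. 40% chopped as in the example above
```

Because the progress lives on the tree rather than in the acting bot, the "chop a bit every day" and "multiple bots on one tree" scenarios fall out for free.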

How fast should things go to let bots learn quickly?

I haven't implemented any adaptive bot mind yet; they just act randomly. Probably the hardest challenge in Unscripted is the ability for bots to learn in the almost complete absence of feedback. One of the core principles of Unscripted is to see how adaptive life can emerge from just a few nature-inspired ingredients and without any artificial feedback. There is not even a concept of pain or pleasure to guide their minds. They can learn to interpret external states, actions and their results as pain and pleasure if they decide to, but the world engine won't give them anything like that. The key element that triggers the adaptive dynamic is reproduction, that is, the ability for part of a bot's mind to survive through its children.

The consequence of this is that learning will be terribly slow. And since we start from zero knowledge about the world, the first generations of bots will be completely dumb. It will take many generations and many deaths to witness the faintest sign of adaptation. We don't want to wait millions of real-time years!

So here are some possible ways to speed things up:

  • shorter lifespan (at least at the beginning)
  • exaggerate the effects of shortage of primary needs (e.g. an 'hour' without a drink and the bot dies of dehydration)
  • increase the population size
  • keep the world simpler and smaller (fewer things to learn, smaller search space)
  • bot minds that learn more quickly
  • faster-acting bots and a faster world engine

The last two items are the most crucial for a proof of concept. We'll need bots with very fast minds (i.e. decision-making code running on the client side) and a server-side framework that can cope with a deluge of actions. Things get perhaps even more unrealistic if we consider that I can only afford to work with an i7 laptop at the moment. This leads to a few other questions.

  • How many actions per second?
  • How can we make the bot minds learn more rapidly in general? Extract the most from their experience?
  • How can we optimise the framework architecture and code?

This all seems completely impossible, which makes things even more exciting! For a very first proof of concept I'd be ready to simplify the world to the absolute maximum to show that any kind of adaptation is possible. Then we can start developing things from there, increasing the complexity of the world and working with a faster framework and more educated generations.

Optimisations

Possible optimisations:

  • cheat and run the bots on the server, bypassing the web API
  • run the database in memory (with regular and async dumps to disk)
  • custom coding of the bot minds (still without prior knowledge or rules, but with smaller models and tightly written code)
  • use GPU optimisation for the bot minds
  • parallel processing
  • distributed processing on a few machines
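The second bullet, an in-memory database with regular asynchronous dumps, could be sketched roughly as below. The `MemDB` class, the dump path and the interval are illustrative assumptions, not necessarily how the project's actual in-memory backend works:

```python
import json
import threading

class MemDB:
    """Keep all Things in a plain dict; dump snapshots to disk on a background timer."""

    def __init__(self, path="world.json", interval_s=30.0):
        self.path = path              # illustrative dump location
        self.interval_s = interval_s  # illustrative dump frequency
        self.things = {}

    def put(self, thing_id, thing):
        self.things[thing_id] = thing

    def get(self, thing_id):
        return self.things.get(thing_id)

    def dump(self):
        # Copy first so the file write works on a stable snapshot
        snapshot = dict(self.things)
        with open(self.path, "w") as f:
            json.dump(snapshot, f)

    def start_dumping(self):
        # Re-arm a daemon timer after each dump; a real engine might use asyncio instead
        def _tick():
            self.dump()
            timer = threading.Timer(self.interval_s, _tick)
            timer.daemon = True
            timer.start()
        _tick()
```

Reads and writes stay at dict speed on the hot path, while persistence happens off to the side; the cost is exactly the limitation noted later in this page, i.e. a single process owning all world data.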

Targets and Measurements

Target performance level for the server side framework (API+World Engine): we want to support a population of 100 bots living 100 times faster than humans.

All tests done with bot mind and framework on the same laptop (i7 6500U / 2 cores, 12GB RAM, cpython) with no other significant CPU load. The test is done by running

python manage.py utest any simulate --cycles 100

  • 20 Nov 2017: ~20000 times below target (~2^14) (~50000 x on Raspberry Pi 3, using pypy, mongodb 2.6.10, ~2^16)
  • 23 Nov 2017: ~10000 times below target (~2^13) (~20000 x on Pi)
  • 24 Nov 2017: ~3300 times below target (~2^12) (~8500 x on Pi). Optimisation: django-cached the actions of all the types of Things, also cached the module keys for each type of Thing. Note that the small population of 10 bots now acts faster than real time (i.e. a virtual walking step takes less time than a real step). This is still far too slow to start working on reinforcement learning. Noticed that the speed slowly increases when the simulation runs over 300+ cycles; I've seen ~9000 x on Pi 3 after 1000 cycles. MongoDB 3.2 on the host makes things run ~8% faster than 3.4 on Docker.
  • 26 Nov 2017: profiling the code shows that 80% of the time is spent in server-side python. More caching and smarter algorithms could be applied but there's a limit to what we can do and there's a danger of optimising too early. Ideally the Thing modules and the Engine in the framework should be easy to write and edit to keep experiments spontaneous and open. So I've decided to replace the Mongodb backend with a simpler in-memory collection of Thing objects in python. This should considerably speed things up but will create some severe limitations: no server-side parallelism (at least for now); the whole world must hold in memory; all data modifications can only be done by a single process on the server side. That's absolutely fine for a small prototype. But I'll keep the framework compatible with mongodb backend via an abstraction layer so we can support larger worlds in the (distant) future.
  • 28 Nov 2017: ~1400 times below target (~2^11) after replacing requests with PyCurl on the client side (reuse the connection settings and keep the connection alive) and using bjoern as a WSGI server on the server side rather than django runserver. Note that I couldn't install bjoern on PyPy, probably because it's written in C.
  • Also worked on the communication performance on its own as it was a bottleneck at some point. Can be tested with python manage.py ucase any conn --cycles 10000 (an empty request/response, i.e. no logic, all execution is related to communication). Progressed from 300 req/s to 1700 req/s.
  • 3 Dec 2017: ~271 x below target (i.e. 620 rps, ~2^8) after applying several optimisations, using python 3.6, websockets. Note that PyCurl causes seg faults with pypy 2 and pypy 3.
  • 10 Dec 2017: ~180 x below target (i.e. 1130 rps or ~100 cycles/s, ~2^7) after further optimisations of pymemdb, buffering stdout, and implementing an asyncio server (server side is aiohttp, client side uses the websockets package). Note that we now bypass the Django router and middlewares completely, so we won't gain anything from dropping Django. At that speed, we can compress the life (say an expectancy of 50 years) of a population of 10 into 6 months.

A day is compressed to 15 minutes. Unless I have omitted something obvious, I doubt I'll get much further in terms of speed. So I'll have to use subterfuges to unrealistically reduce the lifespans, otherwise the learning rate will be appalling: shorten the max lifespan to 20 years, reduce the age of puberty to 8, ... Another option is to reduce the duration of a virtual day to 3 hours (with a time compression of 100 that's around a minute and a half in real time). 20 years / 8 / 100 ~= 9 real-time days. 9 days in our reality for a bot to live until 20 'years' and die a natural death. We can also remove the need for sleep, so 20 years of their life is more like 26 years of our real active life.
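As a quick check, the lifespan arithmetic above works out as follows:

```python
YEAR_DAYS = 365.25
lifespan_years = 20
day_ratio = 24 / 3   # a virtual day shortened from 24 h to 3 h
compression = 100    # the world runs 100x faster than real time
real_days = lifespan_years * YEAR_DAYS / day_ratio / compression
print(round(real_days, 1))  # 9.1, i.e. the ~9 real-time days in the text
```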

The next step now is to set up a client machine (world; 6 cores + GPU) connected via cable to a server machine (bot minds; i7, 2 cores) and start working on the bot mind. I expect this to be even more challenging, as the mind's model will be difficult to run in real time.

  • 11 Dec 2017: with two machines connected with cable, the speed drops from 1130 rps (local) to 663 rps (i.e. ~60 cycles/s). CPU utilisation is low at both ends (30% client, 60% server).

Optimisation leftovers: multi-process service with world data shared in memory; prepare the next cycle while waiting for a response; use websockets; further code optimisation.