
Anatomy of a Dota 2 Replay File

Sam edited this page May 22, 2020 · 2 revisions

Where are these replay files?

If you're here, you already know that a .dem file is a Dota 2 game replay. You can find these files in different places, depending on your platform.

Mac OS X: ~/Library/Application Support/Steam/SteamApps/common/dota 2 beta/dota/replays

Windows: in a similar directory structure in C:\Program Files (x86)\Steam.

Linux: Usually can be found in ~/.local/share/Steam/steamapps/common/dota\ 2\ beta/game/dota.

In any case, when you download a replay in-game, this is where it ends up. You will see these files referred to interchangeably as "demo" and "replay."

Google Protocol Buffers (aka "protobuf")

Google Protocol Buffers is a lightweight system for defining communications across a network. Protobuf message definitions are written in a special mini-language. To get code for handling these messages, you "compile" the definitions for a specific programming language, like Python or Java.

The good news is: all of this is done for you in smoke. You can see the protobuf definitions in dota2.proto, and the compiled code in dota2_palm.py. When a new version of the Dota 2 client comes out and breaks smoke, the protobufs might just need updating. If you notice this, open an issue!

SteamRE is an excellent reverse engineering project which maintains up-to-date Dota 2 protobufs. This project uses its files.

Replays are simple...

Right!

Replays are very simple in format. The data are complex, but the format is simple. A replay starts with a twelve-byte header.

The first 8 bytes are a string literal PBUFDEM\0.

The next 4 bytes are a little-endian unsigned integer indicating a byte offset in the file to a game summary protobuf message.
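As a rough illustration of the header layout just described, here is a minimal Python sketch. The function name parse_header and the sample bytes are made up for this example and are not part of smoke.

```python
import struct

MAGIC = b"PBUFDEM\x00"  # the 8-byte string literal at the start of the file


def parse_header(data: bytes) -> int:
    """Return the byte offset of the game summary message.

    Hypothetical helper for illustration; not smoke's API.
    """
    if len(data) < 12:
        raise ValueError("replay is shorter than the 12-byte header")
    # "<" = little-endian, no padding; "8s" = 8 raw bytes; "I" = unsigned 32-bit
    magic, summary_offset = struct.unpack_from("<8sI", data, 0)
    if magic != MAGIC:
        raise ValueError(f"unexpected magic: {magic!r}")
    return summary_offset


# Synthetic header built for this example, not taken from a real replay.
sample = MAGIC + struct.pack("<I", 4096) + b"rest of file"
print(parse_header(sample))  # 4096
```

With a real replay you would pass in the first 12 bytes of the .dem file instead of the synthetic sample.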

The remainder of the file consists of entries of the following pattern (smoke refers to one set of these data as a Peek):

kind
tick
size
message

...where kind, tick, and size are protobuf-encoded variable-length integers (aka "varints").

message is protobuf-encoded binary data of length size. If kind has a certain bit set, message must be decompressed with Google's snappy library before deserializing it with protobuf.

tick is mostly irrelevant, but for completeness: it's the time elapsed in replay time, analogous to the position of the progress bar when watching a replay. It has no relation to game time.
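The entry layout above can be sketched in a few lines of Python. This is an illustration under assumptions, not smoke's implementation: the Peek tuple, read_varint, and read_peek are hypothetical names, and the compression flag is assumed to be the DEM_IsCompressed value (112) from the protobuf definitions. The payload is returned as-is; if it is flagged as compressed, it would still need to go through a snappy decompressor before protobuf deserialization.

```python
from typing import NamedTuple, Tuple

DEM_IS_COMPRESSED = 112  # assumed flag value, taken from DEM_IsCompressed


class Peek(NamedTuple):
    kind: int         # DEM kind with the compression flag masked off
    tick: int
    compressed: bool  # True if payload still needs snappy decompression
    payload: bytes


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one protobuf varint at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:              # high bit clear: last byte
            return value, pos
        shift += 7


def read_peek(buf: bytes, pos: int) -> Tuple[Peek, int]:
    """Read one kind/tick/size/message entry; return (peek, next_pos)."""
    kind, pos = read_varint(buf, pos)
    tick, pos = read_varint(buf, pos)
    size, pos = read_varint(buf, pos)
    payload = buf[pos:pos + size]
    if len(payload) != size:
        raise ValueError("truncated message")
    compressed = bool(kind & DEM_IS_COMPRESSED)
    kind &= ~DEM_IS_COMPRESSED
    return Peek(kind, tick, compressed, payload), pos + size


# Synthetic entry: kind 7 with the flag set, tick 300, 3-byte payload.
entry = bytes([7 | DEM_IS_COMPRESSED, 0xAC, 0x02, 3]) + b"abc"
peek, end = read_peek(entry, 0)
print(peek.kind, peek.tick, peek.compressed, peek.payload, end)
```

In a real file the first entry starts at byte 12, right after the header, and each call's next_pos is where the following entry begins.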

Other than that, the code generated from dota2.proto handles deserialization of message data. And once you have a protobuf object, it's simple property access.

Embedded Data

Of course, it's not that simple. All the messages at the top level of replay files (as described above) are referred to as "DEM" messages in the protobuf definitions. There is a small, finite set of them. Here are the actual values of kind (with the compression bit masked off; the compression flag itself is listed last):

DEM_Stop = 0
DEM_FileHeader = 1
DEM_FileInfo = 2
DEM_SyncTick = 3
✝DEM_SendTables = 4
DEM_ClassInfo = 5
DEM_StringTables = 6
✝DEM_Packet = 7
✝DEM_SignonPacket = 8
DEM_ConsoleCmd = 9
DEM_CustomData = 10
DEM_CustomDataCallbacks = 11
DEM_UserCmd = 12
✝✝DEM_FullPacket = 13
✝✝✝DEM_SaveGame = 14
DEM_IsCompressed = 112

✝ has embedded data

✝✝ has an embedded DEM_Packet, which itself has embedded data

✝✝✝ no one really knows wtf this is yet, but it's related to "replay takeover" and probably has the entire world state embedded in it

DEM_SignonPacket and DEM_Packet are both parsed as DEM_Packet, but the "signon" variety has vital pre-game information used to set up state required for processing.
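To make the footnotes concrete, here is a small Python sketch of how a parser might record which kinds carry an embedded stream and which message type to parse them as. The constant and function names are invented for this example; only the numeric values come from the list above.

```python
# Values copied from the DEM list above; names are illustrative only.
DEM_SEND_TABLES = 4
DEM_PACKET = 7
DEM_SIGNON_PACKET = 8
DEM_FULL_PACKET = 13

# Kinds marked with a single dagger: they carry embedded data directly.
HAS_EMBEDDED_DATA = {DEM_SEND_TABLES, DEM_PACKET, DEM_SIGNON_PACKET}


def parse_as(kind: int) -> int:
    """Return the kind whose protobuf message should deserialize this entry.

    A signon packet is parsed as a regular packet; everything else
    is parsed as itself.
    """
    return DEM_PACKET if kind == DEM_SIGNON_PACKET else kind


print(parse_as(DEM_SIGNON_PACKET))           # 7
print(DEM_FULL_PACKET in HAS_EMBEDDED_DATA)  # False: its data sits one level down
```

DEM_FullPacket is left out of the set on purpose, since its embedded data lives inside the DEM_Packet it wraps.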

Within some of these DEM protobuf messages (marked above) there is yet another, embedded stream of two entirely different classes of messages. These classes are "svc" and "net", and they share a numeric namespace:

net_Tick = 4
net_SetConVar = 6
net_SignonState = 7
svc_ServerInfo = 8
svc_SendTable = 9
svc_ClassInfo = 10
svc_CreateStringTable = 12
svc_UpdateStringTable = 13
svc_VoiceInit = 14
svc_VoiceData = 15
svc_Sounds = 17
svc_SetView = 18
svc_UserMessage = 23
svc_EntityMessage = 24
svc_GameEvent = 25
svc_PacketEntities = 26
svc_TempEntities = 27
svc_GameEventList = 30

Unlike "DEM" class messages, these embedded streams have no tick in the header. Otherwise, they are very similar:

kind
size
message

Embedded messages are never compressed.
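Walking an embedded stream might look something like the sketch below. It assumes the embedded data is a plain byte-aligned buffer of kind/size/message entries, as the description suggests. iter_embedded is a hypothetical name, and the small varint helper is included only so the block runs on its own.

```python
from typing import Iterator, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one protobuf varint at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def iter_embedded(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (kind, message) pairs from an embedded svc/net stream.

    There is no tick field and no compression flag to handle here.
    """
    pos = 0
    while pos < len(data):
        kind, pos = read_varint(data, pos)
        size, pos = read_varint(data, pos)
        message = data[pos:pos + size]
        if len(message) != size:
            raise ValueError("truncated embedded message")
        pos += size
        yield kind, message


# Synthetic stream: a net_Tick (4) with a 2-byte body, then an
# svc_PacketEntities (26) with an empty body. Bodies are placeholders.
stream = bytes([4, 2]) + b"\x08\x01" + bytes([26, 0])
for kind, message in iter_embedded(stream):
    print(kind, message)
```

Each yielded message would then be handed to the protobuf class matching its kind in the svc/net list above.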

The Takeaway

To understand replays--and, of course, to parse them--you have to understand all of the "DEM," "svc," and "net" messages and what they mean. First, some concepts.