SAM firmware reverse engineering #64

Open
quo opened this issue Oct 31, 2022 · 25 comments

quo commented Oct 31, 2022

First of all, @qzed thanks for all the work you've already done reverse engineering and documenting a lot of the SAM stuff. It's been quite helpful!

I've been reverse engineering the SP7 SAM firmware (specifically SurfaceSAM_14.312.139.bin) in an attempt to debug something. Haven't found anything particularly useful so far, but figured I would share regardless. Let me know if you'd like me to look at anything in particular.

Click here for info dump

Firmware structure

As can be seen on SP7 teardown photos, the SAM microcontroller is an NXP LPC54S001J (Cortex-M4, 360KB RAM), with a separate Winbond 16MB flash chip.

The SVD is for a slightly different part number, but it does the job. There's a script to import SVDs into Ghidra, but it fails due to some overlapping address ranges. You can either remove these from the SVD or hack the script a bit.

The bin file consists of a signature and two firmware images. The images are encoded as arrays of { u32 offset, u8 len = 16, u8[16] data }, so can be extracted fairly easily. The two images are identical, except one is meant to be flashed at 0x10004000 and the other at 0x10084000 (standard A/B update handling), so some addresses differ by one bit. The images have a header and end with a CRC16, which are used by the SAM when flashing. The actual raw firmware image starts at 0x66C.
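A minimal sketch of the record decoding described above. It assumes the fields are little-endian and packed without padding, and that `blob` has already been cut down to just the record array (where the array starts and ends inside the bin file is not covered here). The `base` parameter is a hypothetical convenience in case the stored offsets turn out to be absolute addresses rather than image-relative ones.

```python
import struct

# { u32 offset, u8 len, u8[16] data }, assumed little-endian and unpadded
RECORD = struct.Struct("<IB16s")


def decode_records(blob: bytes, base: int = 0) -> bytearray:
    """Rebuild a flat image from a packed array of offset/len/data records."""
    image = bytearray()
    for pos in range(0, len(blob) - RECORD.size + 1, RECORD.size):
        offset, length, data = RECORD.unpack_from(blob, pos)
        if length > 16:
            raise ValueError(f"record at {pos:#x}: len {length} > 16")
        start = offset - base
        if start < 0:
            raise ValueError(f"record at {pos:#x}: offset below base")
        end = start + length
        if end > len(image):
            # Fill any gap with 0xff, like erased flash (an assumption)
            image.extend(b"\xff" * (end - len(image)))
        image[start:end] = data[:length]
    return image
```

Each record is 21 bytes under these assumptions, so a quick sanity check is whether the record area length is a multiple of 21 and whether every `len` byte is 16.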

The raw firmware images start with a standard ARM vector table, and contain an NXP image header which tells the NXP bootloader to load the first 0x29484 bytes into SRAM at address 0. For reverse engineering, you can just split the raw image and load the first 0x30000 bytes at 0, and the remaining 0x50000 bytes at 0x10034000 or 0x100B4000.
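The split for loading into a disassembler can be sketched like this. The addresses are taken directly from the paragraph above; the function name and return shape are just for illustration.

```python
SRAM_PART_SIZE = 0x30000

# Flash slot base -> load address of the remaining (flash-resident) part
FLASH_PART_ADDR = {
    0x10004000: 0x10034000,
    0x10084000: 0x100B4000,
}


def split_raw_image(raw: bytes, slot_base: int):
    """Return [(load_address, data), ...] for the two parts of a raw image."""
    if slot_base not in FLASH_PART_ADDR:
        raise ValueError(f"unknown slot base {slot_base:#x}")
    return [
        (0x0, raw[:SRAM_PART_SIZE]),
        (FLASH_PART_ADDR[slot_base], raw[SRAM_PART_SIZE:]),
    ]
```

Each returned pair can then be written to its own file and imported as a separate memory block at the given address.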

The firmware contains an RTOS which is internally referred to as Kaos. I can't find anything about it on Google, so I assume it was created by MS. Kaos appears somewhat inspired by FreeRTOS and offers the same basic primitives: tasks, timers, events, semaphores, and message queues. There are dozens of tasks and timers, which communicate through dozens of message queues, which leads to a ton of indirection (and memory overhead) so it can be very difficult to follow what's going on. Some parts of the firmware use vtable-like constructions for some additional indirection. Lots of fun.

SAM protocol

(I'll try to use the terminology from https://github.com/linux-surface/surface-aggregator-module/tree/master/doc.)

The following target(/source) IDs are implemented:

  • 0 = Host
  • 1 = SAM
  • 2 = KIP
  • 3 = Debug
  • 4 = Surflink

I believe the SAM just forwards any messages with TID != 1 to different serial connections. (But I have not yet traced the entire message path.)

For TID == 1, only the following TCs are handled by the SAM on the SP7:

  • 01 = SAM: SAM
  • 02 = BAT: Battery
  • 03 = TMP: Thermal
  • 04 = PMC: Power
  • 05 = FAN: Fan
  • 07 = DBG: Debug
  • 09 = FWU: Firmware update
  • 0c = TCL: (Trace/crash logs?)
  • 0d = SFL: Surflink
  • 10 = BLD: Surface Blades
  • 12 = SEN: Sensors
  • 13 = SRQ: (?)
  • 15 = HID: HID
  • 17 = BKL: Backlight
  • 1b = USC: USB-C

The firmware does not explicitly name the TCs in any way (not even using the abbreviated names), so the above names are based on names of related tasks, message queues, etc. Of course, most of these names were already known.

FWU/TCL/SRQ commands are handled together via the NVM message queue, so presumably they all use the flash storage in some way.

Debug mode

The SAM has a debug mode variable (default 0) and a "safe mode" flag (default 1).

The safe mode flag is read with TC 1, CID 0x27, and written by TC 7, CID 0x5F.
When the safe mode flag is true, all CIDs >= 0x80 are disabled (for all TCs), and various other functionality is disabled.

The debug mode is read with TC 1, CID 0x29, and written by TC 7, CID 0x4E.
The debug mode values are:

  • 0 = disabled
  • 1 = also disabled?
  • 2 = basic
  • 3 = possibly related to firmware update?
  • 4 = full

The debug mode can only be set to 0 or 2 normally. It is currently not known how to set it to other values.
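To make the gating rules concrete, here is a small model of them. This is only a restatement of the observations above, not the firmware's actual logic, and the function names are made up.

```python
# Debug mode values as observed so far (question marks are open guesses)
DEBUG_MODES = {
    0: "disabled",
    1: "also disabled?",
    2: "basic",
    3: "possibly related to firmware update?",
    4: "full",
}


def cid_allowed(cid: int, safe_mode: bool) -> bool:
    """With the safe mode flag set, all CIDs >= 0x80 are disabled (any TC)."""
    return not (safe_mode and cid >= 0x80)


def normally_settable(debug_mode: int) -> bool:
    """Only modes 0 and 2 are known to be settable through normal commands."""
    return debug_mode in (0, 2)
```

So with the default state (safe mode 1), a command like TC 7 CID 0x80 would be rejected, while CID 0x4E is still reachable.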

There is a command to read arbitrary RAM in debug mode 2, which is very useful. However, there appears to be no command to write RAM, even in the higher debug modes. Everything seems to be locked down pretty well (range checks on all command arguments), so I've been unable to find a way to write arbitrary memory so far.

Commands

Here's a complete list of command IDs I've found (for TID 1), and descriptions for some of them. This list will probably contain mistakes! Will try to update this as I figure out more.

Format: CID { command data } => { response data } description
"Handled separately" means the switch handling these commands is in a separate function, so presumably the commands have related functionality.
(Please excuse the poor formatting.)

TC 01: SAM

01  (see existing doc)
02  (see existing doc)
03  (see existing doc)
04  {} disable safe mode and set debug mode 4, if device is in a certain power state?
05  {} enable safe mode and set debug mode 0, if device is in a certain power state?
06  {} nop
07  {} nop
08  {} nop
0b  (see existing doc)
0c  (see existing doc)
0d
0e
0f  (see existing doc)
10  (see existing doc)
13  (see existing doc)  {} => { u32 } get SAM firmware version
14  (see existing doc)
15  (see existing doc)
16  (see existing doc)
17  (see existing doc)
18
19
1a  (see existing doc)
1b  (see existing doc)
1c  (see existing doc)
1d  (see existing doc)
1e  (see existing doc)
1f  (see existing doc)
20  (see existing doc)
21
22  (see existing doc)
23  (see existing doc)
24
25
26  { u8 } set debug mode to 0 if zero, or to 2 if nonzero
27  {} => { u8 } get safe mode flag
29  {} => { u8 } get debug mode
2a
2b
2c
2e  {} => { u8 x, u8 0, u16 0x2e, u32 0 } get active firmware image location (x = 0x11 for 0x10004000, or 0x12 for 0x10084000)
2f
33  (see existing doc)
34  (see existing doc)
35
36
37
38  { u16 } => { u16 }
39
3a
3b
81

TC 02: Battery

01  (see existing doc)
02  (see existing doc)
03  (see existing doc)
04  (see existing doc)
0b  (see existing doc)
0c  (see existing doc)
0d  (see existing doc)
0f  (see existing doc)
18
2d
2e
2f
30
31
32
33
34
3c
3d
3e
3f
42
50
53
51

Handled separately:

00
07
08
1e,20
1f,21
29
2a
2b
2c
35
36
37
38
39
3a
3b
43
44
45
46
47
48
4d
4e
52
54
55
56
57
58
59
5a
5d
5e
61
80
81
86
87
8c
8d
90
94
95
96
97
98
99
9a
9b

Handled separately:

8b

TC 03: Thermal

01  (see existing doc)
03  (see existing doc) (handled twice)
04  (see existing doc)
0c
0d
0e
17

Handled separately:

02  (see existing doc)
03  (see existing doc) (handled twice)
0f
10
11
14
15
16
83
90
91
92
93
94
95

Handled separately:

09  (see existing doc)
0a  (see existing doc)
12
13

TC 04: Power

01
02
04,8b
05
06
07
09
0a
81
83
8a
8c
8d
8e
8f
90  { u8 } set RTOS idle task enabled
91  {} log and reset idle stats

TC 05: Fan

01
02
03
04
05
80
81
83

TC 07: Debug

3f  { u8 } set debug pins connection mode? 0=SAM_Flash, 2=PCH_Logging, 3=SAM_Debug, 4=Touch_JTAG, 5=Power_Monitor, 6=?, 7=PCH_JTAG, 9=Blade_UART
4b  { u8 } set debug log target (0=Debug, 1=Host, 2=KIP, 3=Surflink)
4e  { u8 } set debug mode
53  { u32 } => { u32 } clear a gpio, sleep N microseconds in ram function, then set gpio again
5f  { u8 } set safe mode flag
80  { u8, u8 } => { u8 } flush logs(?) and optionally reset SAM (first byte & 0xe0 must be zero, second byte must be 2 for reset)

Requires debug mode 2 or 4:

03  {} log full OS state
11  {} log fw version and flash location
18  { u8 cmd, u8 module } cmd 0 = log enabled module bits, 1 = enable logging for module, 2 = disable logging for module; module = 0..127
19  { u8 cmd, u8 level } cmd 0 = log loglevel, cmd 1 = set loglevel
30  {} log safe mode state
41  {} same as TCL CID 86?
42  { u8 cmd, u8 module } cmd 0 = log verbose module bits, 1 = enable verbose logging for module, 2 = disable verbose logging for module; module = 0..127
4d  { u32 addr, u16 len } => { u32 addr, u16 len, u8[] data } read memory; only the region from 0x20001000 to 0x200258f0 can be read, max len 94
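Since CID 4d is limited to 94 bytes per request, dumping the whole readable region takes a lot of requests. A rough sketch of generating the request payloads and parsing the responses, assuming little-endian fields and treating 0x200258f0 as an exclusive end (both are assumptions). Actually sending the payloads with TC 7, CID 0x4d is left to whatever tool you use.

```python
import struct

RAM_START = 0x20001000
RAM_END = 0x200258F0  # treated as exclusive here (assumption)
MAX_READ_LEN = 94


def read_requests(start: int = RAM_START, end: int = RAM_END):
    """Yield { u32 addr, u16 len } payloads covering [start, end)."""
    addr = start
    while addr < end:
        length = min(MAX_READ_LEN, end - addr)
        yield struct.pack("<IH", addr, length)
        addr += length


def parse_read_response(payload: bytes):
    """Split a { u32 addr, u16 len, u8[] data } response into (addr, data)."""
    addr, length = struct.unpack_from("<IH", payload, 0)
    data = payload[6:6 + length]
    if len(data) != length:
        raise ValueError("response shorter than its len field")
    return addr, data
```

Writing each `data` chunk at `addr - RAM_START` in an output file gives a dump that can be loaded at 0x20001000.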

Requires debug mode 4:

0e  { u8 len, u8[] } => { u8 len, u8[] } ping?
0f  { u8 len, u8[] } => { u8 len, u8[] } ping? (different response type?)
20  {} set power related flag and feed watchdog
32  {} toggle debug LED on
4f  { u8 module, u8 addr, u8 register, u8 len } => { u8 error, u8 len, u8[16] data } i2c read; module = 0..4
50  { u8 module, u8 addr, u8 register, u8 len, u8[] data } i2c write
51  { u8 module } i2c bus scan (module 0xff == all)
54  { u8 module?, u8 }
55  { u8, u8 len?, u8[16], u16 resplen } => { u8[] }
5a  { u8 port, u8 pin } log GPIO pin value
5b  { u8 port, u8 pin } set GPIO high
5c  { u8 port, u8 pin } set GPIO low
63  {} crash (call null pointer)
64  {} read invalid(?) peripheral addr 0x402055aa

When log target is set to Host, SAM will send log messages with TID=3. The request ID for these messages (except CID 49) is set to a hash of internal timers, so is effectively random.
When the amount of data in a log record exceeds 40 bytes, it is split over multiple messages with the same request ID.
The first 8 bytes of the log data are always { u32 timestamp_millis, u32 event_code }. For split records, only the first message will have this header.
There are four types of log record: u32 array, string array, error, and buffer. These use the following CIDs:

43  u32 array (start of split record)
44  u32 array (middle of split record)
45  u32 array (end of split record, or non-split record)
46  null-terminated string array (start of split record)
47  null-terminated string array (middle of split record)
48  null-terminated string array (end of split record, or non-split record)
49  error (request ID is 0, data is { u32 timestamp_millis, u32 event_code, u32 value })
4a  buffer (i.e. raw byte array) (same CID is used for all messages if split)
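A sketch of how split u32 and string records could be reassembled on the host side, keyed by request ID as described above. Little-endian fields, ASCII strings, and plain concatenation of the message payloads are assumptions; error (49) and buffer (4a) records are left out because 4a gives no end marker.

```python
import struct

# CID -> (record kind, position within a possibly split record)
CID_ROLE = {
    0x43: ("u32", "start"), 0x44: ("u32", "middle"), 0x45: ("u32", "end"),
    0x46: ("str", "start"), 0x47: ("str", "middle"), 0x48: ("str", "end"),
}


class LogReassembler:
    def __init__(self):
        self._partial = {}  # request ID -> bytes collected so far

    def feed(self, rqid: int, cid: int, payload: bytes):
        """Return (timestamp_ms, event_code, values) once a record is complete."""
        kind, role = CID_ROLE[cid]
        if role == "start":
            self._partial[rqid] = bytearray(payload)
            return None
        if role == "middle":
            self._partial.setdefault(rqid, bytearray()).extend(payload)
            return None
        # "end" also covers non-split records, which have no earlier parts
        buf = bytes(self._partial.pop(rqid, bytearray())) + payload
        timestamp_ms, event_code = struct.unpack_from("<II", buf, 0)
        body = buf[8:]
        if kind == "u32":
            count = len(body) // 4
            values = list(struct.unpack_from(f"<{count}I", body, 0))
        else:
            values = [s.decode("ascii", "replace") for s in body.split(b"\0") if s]
        return timestamp_ms, event_code, values
```

For example, feeding a single 0x45 message yields one record immediately, while a 0x46 message followed by a 0x48 message with the same request ID yields one record after the second call.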

TC 09: Firmware update

02  {} => { u8 numarrayentries, u8 0, u8 0, u8 4, { u32 version, u8 location+flags, u8 dest_id, u16 0x2e }[7] } get flash status
03  { u8 ?, u8 ?, u8 dest, u8 cookie?, u32 fwversion, u8 1, u8 ?, u8 ?, u8 ?, u8 ?, u8 ?, u16 0x2e } => { u8 0, u8 0, u8 0, u8 cookie, u8 0, u8 0, u8 0, u8 0, u8 ?, u8 0, u8 0, u8 0, u8 ?, u8 0, u8 0, u8 0, u8 0 } firmware upload setup
04  { u8 flags, u8 len, u16 cookie?, u32 offset, u8[] data } => { u16 cookie, u8 error, u8[13] 0 } firmware upload, flags: 0x80 = start, 0x40 = finish
80  {} switch active firmware location
a0,a1,a2

Requires debug mode 3 or 4:

09
0a
0b
0c

Firmware destinations:

  • 0 = SAM firmware
  • 0x12 = USB-C PD firmware?
  • 0xfe = two 32-byte buffers?

TC 0C: TCL

0b  (see existing doc) { u16 bufid } => { u16 bufid, u8 instanceid?, u8 flags? } erase?
0c  (see existing doc) { u16 bufid, u32 offset, u16 readlen, u8 0 } => { u16 bufid, u32 offset, u16 len, u8 status, u8[] } read, max len = 560, status: 0 = more, 1 = end, 0xfd/0xfe/0xff = error
0d  { u16 bufid } => { u16, u8 error } erase things? disabled in safe mode
0e  (see existing doc) {} => { u16 0xffff, { u16, u8 }[8] }
85  {} => { u16 0xffff, u8 0 } call CID 0D for all buffers
86  {} => { u16 0xffff, u8 0 } does something with buffers 2 and 3

Valid buffer/instance combinations:

  • Buffer 1, instance 1/2 = Crash dump
  • Buffer 2, instance 1/2 = ?
  • Buffer 3, instance 1/2 = ?
  • Buffer 4, instance 1 = Battery?
  • Buffer 5, instance 1 = Blades?
  • Buffer 6, instance 1 = Thermal?

Handled separately (data seems to be all zeroes?):

0f  {} => { u16 error, u32 } get something
10  {} => { u16 error, u32 } get something
11  {} => { u16 error, u32 } get something
12  {} => { u16 error, u32 } get something
13  {} => { u16 error, u32 } get something
14  {} => { u16 error, u32[16] } get something
90  {} => { u16 error, u16 size } get total buffer size for command 91/92
92  { u16 pos, u16 len } => { u16 pos, u16 len, u16 error, u8[] data } read some buffer

Handled separately (setters for above):

80  { u32 } => { u16 error } set something
81  { u32 } => { u16 error } set something
82  { u32 } => { u16 error } set something
8a  { u32 } => { u16 error } set something
8c  { u32 } => { u16 error } set something
8e  { u32[16] } => { u16 error } set something
91  { u16 pos, u16 len, u16 unused, u8[] data } => { u16 pos, u16 len, u16 error, u8[] data } write some buffer

TC 0D: Surflink

(TODO The command decoding seems different here. Maybe not CIDs?)

02
03
06
0c

TC 10: Surface Blades

01
02
03
04
05
06
07
08
0a
0c

Handled separately:
(TODO The command decoding seems different here?)

00
0e
10
15
23
2e
33
34
5a
5b

TC 12: Sensors

Instance ids:

  • 5 = BMA223 accelerometer on I2C

03 {} => { ... } read calib
80 { ... } => { ... } read registers
81 { ... } => { ... } write registers
82 {} => { ... } read sensor values
83 {} => { ... } reset default calib
84 { ... } => { ... } set calib

TC 13: SRQ

02  {} => { u8 1, u8 cid, u8 datalen = 32, u32 status, u32 garbage?, u8[32] data }
03  {} => { u8 1, u8 cid, u8 datalen = 1, u32 status, u32 garbage?, u8 safemodeflag } get safe mode flag?
04  {} => { u8 1, u8 cid, u8 datalen = 4, u32 status, u32 garbage?, u16 ?, u16 ? }
05  {} => { u8 1, u8 cid, u8 datalen = 1, u32 status, u32 garbage?, u8 safemodeflag } set safe mode flag?

TC 15: HID

00
01  (see existing doc)
02  (see existing doc)
03  (see existing doc)
04  (see existing doc)

TC 17: Backlight

02
03
04,87
05,88

TC 1B: USB-C

00
06,80
81
82
83

Handled separately:

04
05

qzed commented Oct 31, 2022

Oh, very nice work!

Regarding debug logs: How are they sent to the host? Standard events? I think it would be nice if we could provide some way to enable them via sysfs and dump them to the kernel logs.

Member

qzed commented Oct 31, 2022

Regarding terminology: I think SurfLink is the charging/dock connector. Blade (at least on the SPX) is the keyboard connector, which also uses a UART. I don't know the specifics of that on earlier generations like the SP7.

Also there's an older talk from Alex Ionescu about the SP4 firmware (including KaOS, IIRC). Might interest you: https://recon.cx/media-archive/2017/mtl/recon2017-mtl-04-alex-ionescu-Fun-with-Sam-Inside-the-Surface-Aggregator-Module.mp4. (Note that the SP4 uses the old HID interface instead of the "new" UART one).

Author

quo commented Nov 1, 2022

Regarding debug logs: How are they sent to the host? Standard events?

Not events I think, just regular messages. If you do:

./ctrl.py request 7 1 0x4b 0 0 1 # set log target = host
./ctrl.py request 7 1 0x4e 0 0 2 # set debug mode = 2

Then you'll start seeing "dropping unexpected command message" errors in dmesg.
I haven't actually looked at the log data yet, but I'm pretty sure it's almost entirely numbers, with very few strings, so fairly opaque.

Blade (at least on the SPX) is the keyboard connector, which also uses a UART.

"Blades" was MS's term for Surface accessories, but all the news articles about it are from 2013, so I figured it was dead technology. Makes sense that the pins are still there on the keyboard connector, but is the Blade stuff actually used for the keyboard? I thought the keyboard used the KIP or HID messages?

Also there's an older talk from Alex Ionescu about the SP4 firmware (including KaOS, IIRC). Might interest you.

Definitely very interesting! Skimmed it quickly. His description of Kaos seems very similar to what I've seen, but most of the other stuff he mentions sounds different. He says he spent months reverse engineering almost everything, I wonder why he never published anything beyond this talk.

Member

qzed commented Nov 1, 2022

Then you'll start seeing "dropping unexpected command message" errors in dmesg.

Ah, I think that's what I meant by "events" (messages sent by the EC that are not a response to a direct previous command). Might have a look at that later today.

I haven't actually looked at the log data yet, but I'm pretty sure it's almost entirely numbers, with very few strings, so fairly opaque.

Ah, I kind of feared that. But it makes sense if they have some tracing infrastructure set up. Might be worth checking how the Windows kernel driver tracing works to see if there are similarities.

Makes sense that the pins are still there on the keyboard connector, but is the Blade stuff actually used for the keyboard? I thought the keyboard used the KIP or HID messages?

So I'm not sure how the SP4 to SP7+ handle the keyboard stuff since the kernel sees that as USB, but on the SPX, SP8, and SP9, the connector thing for the keyboard is some custom serial/UART thing, which I think is what they call the blade interface. That then gets handled/translated into HID messages by SAM, which it presents via the HID interface. I think the KIP subsystem (likely keyboard and integrated peripherals or something like that) is also involved (also that might be some separate processor due to extra firmware I think), so maybe that translates it instead of SAM.

So in essence: Type cover <--(blade interface)--> SAM <--(HID interface)--> kernel, at least for SPX and everything with the new typecover.

I'm wondering whether the blade thing is relevant for the gens before that at all... maybe SAM somehow has a USB interface on those gens and the blade interface is still used? Or maybe in some reduced capacity? Otherwise I'd have expected it only on the newer Pros and maybe on the Surface Books.

He says he spent months reverse engineering almost everything, I wonder why he never published anything beyond this talk.

Yeah, he has the presentation somewhere as PDF but that's unfortunately all I could find. Also a lot has probably changed since the SP4 days. Especially due to the new interface and added components.

Member

qzed commented Nov 1, 2022

Then you'll start seeing "dropping unexpected command message" errors in dmesg.

Ah, I think that's what I meant by "events" (messages sent by the EC that are not a response to a direct previous command). Might have a look at that later today.

Yeah, normal events. I also had to enable them via

./events.py enable 0x01 0x01 0x0b 0x0c 0x07 0x00 0x01

Member

qzed commented Nov 1, 2022

So CID=0x48 seems to send some null-terminated strings, but the majority of data seems to be in some binary format. Also my guess is that there's some timestamp and other header stuff before each record.

Author

quo commented Nov 1, 2022

I'm wondering whether the blade thing is relevant for the gens before that at all... maybe SAM somehow has a USB interface on those gens and the blade interface is still used? Or maybe in some reduced capacity? Otherwise I'd have expected it only on the newer Pros and maybe on the Surface Books.

Well, Alex also discusses the Blade stuff he found in the old firmware, which includes communication and authentication. And the SP7 firmware also has various Blade related tasks/etc.

There's some old info here which suggests the old type cover also uses a serial protocol: http://edwardsh.in/keyboard%20cover/2015/08/13/applying-logic-to-the-surface-touch-cover

So my guess would be that all the type covers use the "Blade" protocol, and they've just changed where/how the translation to HID takes place.

Yeah, normal events. I also had to enable them

Ah, ok. From a quick look at your code I figured events had small request ids whereas the ids in the errors looked more random. I need to have a closer look at the event stuff.

Also my guess is that there's some timestamp and other header stuff before each record.

Yeah, it should be similar/identical to the TCL data. I'll see if I can figure out the format.

Member

qzed commented Nov 1, 2022

So my guess would be that all the type covers use the "Blade" protocol, and they've just changed where/how the translation to HID takes place.

I think that makes sense.

Ah, ok. From a quick look at your code I figured events had small request ids whereas the ids in the errors looked more random. I need to have a closer look at the event stuff.

On my SB2, the request ID seems to always be 0x0007 (normally request ID should match the target category, so that checks out here). But other values (command ID and especially instance ID) seem all over the place. I assume instance ID is some subsystem ID.

Haven't checked on the SPX yet though, so maybe things are a bit different on newer devices.

Member

qzed commented Nov 1, 2022

And you are absolutely correct with the request IDs on the newer devices... they're all over the place. With the new format it looks like

[header] 80 07 03 01 [...]

and I don't have to enable any events (in fact trying to do that will return some error code).

So, there's the 03 that is normally only set for host-to-EC messages (also with those it's then either 01 or 02, but I guess 03 means debug).

Author

quo commented Nov 5, 2022

I've added some info about the debug log data format. And I was wrong about it being related to TCL, it doesn't look like the log data ends up in the TCL buffers. There's a fault handler function that seems to fill TCL buffer 1. It's not yet clear to me how the other TCL buffers are filled.

So, there's the 03 that is normally only set for host-to-EC messages

Yeah, AFAICT that just means the message was intended to go to TID 3 = Debug. When you override the log target, it doesn't bother to set the "correct" TID.

Member

qzed commented Nov 5, 2022

So, there's the 03 that is normally only set for host-to-EC messages

Yeah, AFAICT that just means the message was intended to go to TID 3 = Debug. When you override the log target, it doesn't bother to set the "correct" TID.

If I haven't messed anything up it's actually a bit weirder than that: For "normal" messages you have e.g.:

  • [header] 80 TC 00 01 [...]: SAM to host
  • [header] 80 TC 01 00 [...]: host to SAM
  • [header] 80 TC 00 02 [...]: KIP to host
  • [header] 80 TC 02 00 [...]: host to KIP

But we have [header] 80 07 03 01 [...] which would be SAM to host (so far pretty standard) but also host to debug? Point is, there's a byte set that I normally would have expected to be zero.

Member

qzed commented Nov 5, 2022

I think it's possible this could also be interpreted as debug to SAM and SAM to host, but I'm not sure if that would fit into the whole KIP perspective and I'd also have thought that debug messages originate from SAM itself.

Any chance you can find out more about the two target ID bytes, especially on how they seem to be used?

Author

quo commented Nov 5, 2022

The bytes are just target ID and source ID.
So 03 01 means from SAM to Debug (except the override causes the message to be sent to the Host instead).
And like I said, I think the SAM just forwards everything with TID!=1.
So if you send something with 02 00, the SAM forwards it to the KIP. The KIP then sends a reply to the SAM with 00 02, and since the TID of that reply is 0, the SAM forwards it to the Host.
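As a sketch of that interpretation, here is how the four bytes starting at 0x80 could be decoded. The forwarding behaviour is my current understanding rather than something fully traced, and the function name is made up.

```python
TID_NAMES = {0: "Host", 1: "SAM", 2: "KIP", 3: "Debug", 4: "Surflink"}


def describe(cmd: bytes) -> str:
    """Describe a command from the bytes 80 TC TID SID as seen by the SAM."""
    if len(cmd) < 4 or cmd[0] != 0x80:
        raise ValueError("expected at least 4 bytes starting with 0x80")
    tc, target, source = cmd[1], cmd[2], cmd[3]
    src = TID_NAMES.get(source, f"TID {source}")
    dst = TID_NAMES.get(target, f"TID {target}")
    # Only TID 1 is handled by the SAM itself; everything else is passed on
    action = "handled by SAM" if target == 1 else f"forwarded to {dst}"
    return f"TC {tc:02x}: {src} -> {dst} ({action})"
```

With this, `80 07 03 01` reads as a TC 07 message from SAM addressed to Debug, which only reaches the host because of the log target override.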

Member

qzed commented Nov 5, 2022

Ah, got it. That actually makes much more sense, thanks. I'll update the docs accordingly.

Member

qzed commented Nov 5, 2022

Alright, I've improved the handling for unknown/unsupported TIDs a bit: linux-surface/kernel@32815a5...351805f. Mostly just linux-surface/kernel@f1b2c93, which means that instead of trying to match up the request ID to something that in the best case doesn't exist and in the worst case is a wrong match, it ignores and drops anything that isn't addressed to the host directly.

I guess if we want to properly handle the debug messages, we'll have to handle them separately from regular messages anyways, meaning we'll need a command->tid == SSAM_SSH_TID_DEBUG there.

Author

quo commented Nov 13, 2022

Looks good!

Meanwhile I've managed to flash modified firmware to the SAM. Interestingly there seems to be some code in the firmware update logic to do something with hashes and (maybe) signatures, but it's not actually used for the firmware images MS provides. So all you need to do is extract the firmware image, modify it, update the CRC16 at the end of the file, and then upload it using TC 9 CID 3 and CID 4.

So now I've added a command to the firmware to write arbitrary memory. And it works. :)

I've published some scripts here: https://github.com/quo/sam-fw-tools

Member

qzed commented Nov 13, 2022

Nice work!

I'm kind of surprised that it's not signed. Is there any other protection against that or could just any random user with admin permissions on Windows upload some firmware?

Author

quo commented Nov 13, 2022

My guess is that anyone who can communicate with the SAM can flash new firmware. You might even be able to do it via the Surflink connection (i.e. without being admin, and even when the device is turned off).

There are some conditions that seem to relate to whether firmware updates are allowed that I haven't really figured out yet, but that's bypassed when the safe mode is disabled.

Member

qzed commented Nov 13, 2022

My guess is that anyone who can communicate with the SAM can flash new firmware. You might even be able to do it via the Surflink connection (i.e. without being admin, and even when the device is turned off).

If you manage to do that without any sort of authentication (I'd hope there is some), I think you should ask MS for a sizeable bug bounty xD

I honestly have no idea how locked down the driver interface is on Windows, so chances are that that's limited to some kernel stuff and user-space might be blocked somehow... might be worth checking if the Windows API allows arbitrary commands. Some SAM stuff should definitely be available like DTX (on the SB2 and SB3) or I think the TCL stuff. But if an attacker has admin rights, you've probably lost anyways. Anyways... awesome work!

Author

quo commented Nov 13, 2022

I actually wonder if the Surflink UART connection from the SAM is directly exposed on the Surflink connector, or if there is another controller in between. Might be fun to try to probe the connector, send some commands to TID 4, and see if anything shows up.

I'm not really sure what the worst thing is that you could actually do with modified SAM firmware though, in terms of security. Keylogging and then somehow exfiltrating via synthesized HID events maybe?

Member

qzed commented Nov 13, 2022

Yeah, HID interface is probably the most dangerous thing. Keylogging, full keyboard and touchpad control... I'd guess you could find ways to use that as a sort of basic rootkit to pull more advanced stuff into the OS (like having it write commands into a terminal or something, use that to disable some protection stuff and download OS-level malware). Doesn't really have to be that dangerous by itself, the problem is you can't really detect it from the OS until it starts to act.

Member

qzed commented Nov 13, 2022

I actually wonder if the Surflink UART connection from the SAM is directly exposed on the Surflink connector, or if there is another controller in between. Might be fun to try to probe the connector, send some commands to TID 4, and see if anything shows up.

I'm not sure if the information that I have is correct or how reliable it is, but there is a SAM_DEBUG_RX and a SAM_DEBUG_TX line connected directly from SAM to SurfLink. Now I'm not sure if that is only for pre-production / debug models or if that is also on the final ones. In fact, there even seems to be a debug mux that can multiplex those lines to lines of one of the USB-C ports on the SPX (again, no idea if that's still present on the final models). The USB-C mux apparently also allows access to the blade UART and the SAM-to-SoC/Host TX/RX lines.

I would somewhat assume that the USB-C mux has been stripped from the final models but the SurfLink pins might still be present (since I don't think they'd need any additional hardware).

Member

qzed commented Jan 5, 2023

Finally got around to unpacking the SPX firmware and loading it into Ghidra. Had to specify 0x67c as offset (not 0x66c), but with that, everything seems to work.

Author

quo commented Jan 6, 2023

Nice! The offset is 0x67c because the unpack script prepends the 0x10 byte setup data, since this is needed by the upload script. So the file consists of the 0x10 byte setup header + 0x66c bytes of headers (see parse_fw()) + the actual ARM binary.
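In other words, 0x10 + 0x66c = 0x67c. A small sketch of splitting the unpacked file along those lines, with a rough plausibility check on the vector table (on Cortex-M the reset vector has bit 0 set for Thumb). The function names are hypothetical and not part of the published scripts.

```python
import struct

SETUP_LEN = 0x10     # setup data prepended by the unpack script
HEADERS_LEN = 0x66C  # headers in front of the raw ARM binary


def split_unpacked(data: bytes):
    """Return (setup, headers, arm_binary) from an unpacked firmware file."""
    arm_start = SETUP_LEN + HEADERS_LEN  # 0x67c
    if len(data) < arm_start + 8:
        raise ValueError("file too short to contain a vector table")
    return data[:SETUP_LEN], data[SETUP_LEN:arm_start], data[arm_start:]


def plausible_vector_table(arm: bytes) -> bool:
    """Check word alignment of the initial SP and the Thumb bit of the reset vector."""
    initial_sp, reset = struct.unpack_from("<II", arm, 0)
    return initial_sp % 4 == 0 and reset & 1 == 1
```

If the check fails at 0x67c for some other firmware file, the header sizes are probably different there.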

A couple pointers to get you started (assuming the firmware is very similar to the SP7 firmware):

  • If you follow the SVCall vector, you'll get to a function that contains a switch (or if/else chain) that dispatches various Kaos system calls based on syscall number. These numbers are:

    0x11 Sleep
    0x12 TaskDestroy
    0x13 TaskSetPriority?
    0x20 MessageQueueCreate
    0x21 MessageQueueSend
    0x22 MessageQueueRead
    0x23 MessageQueueDestroy
    0x24 MessageQueuePeek
    0x30 SemaphoreCreate
    0x31 SemaphoreClaim
    0x32 SemaphoreRelease
    0x33 SemaphoreReleaseAll?
    0x34 SemaphoreDestroy
    0x40 EventCreate
    0x41 EventWait
    0x42 EventSignal
    0x43 EventDestroy
    

    Look for svc instructions to find the wrapper functions which are used to invoke these syscalls from "user" mode. The creation functions are especially helpful, because the first arg is a string naming the thing being created.

  • I've also found it to be extremely helpful to create a RAM dump using the 0x4d debug command, and loading that into Ghidra as well. Trying to follow some of the indirection and figuring out the data structures using just the ROM data is almost impossible.

  • To get to the SAM command handling, look for the function that calls MessageQueueCreate("SamHostTx",...) and MessageQueueCreate("SamHostRx",...). At the end of this function, a struct is filled and passed to another function.
    The layout of the struct is:

    struct cmd_queue {
    	struct cmd_queue *next;
    	u32 *msg_queue_handle_ptr;
    	u16 msg_queue_block_size;
    	u16 _padding0;
    	u32 *event_handle_ptr;
    	u32 event_flags;
    	u8 target_category;
    	u8 _padding1[3];
    };

    The function that is called registers the TC command queue by adding it to the linked list of all queues. You can search for invocations of this function, or inspect the linked list in RAM to find the message queue for each TC. And if you look for other references to the linked list, you'll find the SAM message dispatching function. This function checks if the target ID == 1, then loops over the list to dispatch the message to the correct message queue based on the TC (or if the target ID != 1, it forwards the message to a different target).

    Then, you can look for calls to MessageQueueRead with the message queues from the list. These will usually be found in loops, inside top level task functions. The usage is a little ad-hoc, so each call site is structured differently, but usually the read is followed by a call to a function which parses the SAM message buffer into a struct, which is then passed to a TC specific function containing a big switch on the command ID.
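If you have a RAM dump from the 0x4d debug command, walking the linked list can be sketched like this. It assumes the dump starts at 0x20001000, that the struct is laid out exactly as shown above (24 bytes, little-endian), and that the list ends with a null `next` pointer. `head_addr` is something you have to find yourself in the disassembly.

```python
import struct

# next, msg_queue_handle_ptr, block_size, pad, event_handle_ptr,
# event_flags, target_category, 3 bytes of padding
CMD_QUEUE = struct.Struct("<IIHHIIB3x")
RAM_DUMP_BASE = 0x20001000


def walk_cmd_queues(dump: bytes, head_addr: int, base: int = RAM_DUMP_BASE):
    """Yield one dict per cmd_queue entry, following the next pointers."""
    seen = set()
    addr = head_addr
    while addr and addr not in seen:  # stop on NULL or on a cycle
        seen.add(addr)
        off = addr - base
        if off < 0 or off + CMD_QUEUE.size > len(dump):
            raise ValueError(f"entry at {addr:#x} is outside the dump")
        nxt, mq_ptr, block_size, _pad, ev_ptr, ev_flags, tc = \
            CMD_QUEUE.unpack_from(dump, off)
        yield {
            "addr": addr,
            "target_category": tc,
            "msg_queue_handle_ptr": mq_ptr,
            "msg_queue_block_size": block_size,
            "event_handle_ptr": ev_ptr,
            "event_flags": ev_flags,
        }
        addr = nxt
```

Printing `target_category` and `msg_queue_handle_ptr` for each entry gives a quick TC-to-queue map to cross-reference against the MessageQueueRead call sites.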

Member

qzed commented Jan 6, 2023

Thanks! That is quite helpful!
