forked from j616/DynamoRIO-for-ARM
/
september2011ProgressReport.tex
510 lines (476 loc) · 19.3 KB
/
september2011ProgressReport.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
\documentclass[a4paper]{article}
\title{A Progress Report of the Manchester University ARM Port of
Dynamo-RIO}
\author{James Sandford\\
School of Computer Science,\\
The University of Manchester,\\
Oxford Road,\\
Manchester,\\
M13 9PL\\
\texttt{sandfoj9@cs.man.ac.uk}}
\date{\today}
\usepackage{url}
\begin{document}
\maketitle
\newpage
\begin{abstract}
This report has been put together following several weeks of trying to
understand the Dynamo-RIO software project and the current progress made
in it's
port to the ARM architecture from the original x86 code carried out
originally
by Stephen Barton as a final year computer science project in the year
prior to
this report. I aim to provide a means to get to my current position as
quickly
as possible by showing how to get the code cross compiling on an x86
Linux
system, pitfalls I fell down when going through this process and others,
an
overview of the code structure and an overview of what has and has not
been
completed in the port at this point in time. In addition to this, I will
provide
any advice I can on working with the current code base and I will
provide
recommendations on how to progress from here.
\end{abstract}
\newpage
\tableofcontents
\newpage
\section{The code base}
\subsection{Git and GitHub}
The code for the ARM port is currently available on GitHub and can be
found at
\url{http://github.com/j616/DynamoRIO-for-ARM/}. I shalln't go into any
detail
on how to use Git or GitHub here but you can find all you need to know
at
\url{http://help.github.com/}. But Git is basically a version control
system
that's a bit nicer when it comes to things such as merging than systems
like
SVN. GitHub is just a hosting website for Git repositories that provides
a nice
web interface with some occasionally useful features such as graphing
of
contribution and programming languages along with the usual browser
based
viewing and editing of files.
\subsection{The Folder Structure}
The folder structure consists of the following:
\begin{description}
\item[Clients] Several example clients.
\item[CurrentVersion] The copy of the source being worked on. This is
our main
focus.
\begin{description}
\item[api] Strangely named as it contains the standard
Dynamo-RIO sample clients
such as bbcount.
\item[bin] The shell scripts to run Dynamo-RIO. For information
on these, look
at the README in the CurrentVersion folder.
\item[clients] A couple of clients. These are Windows specific
so I haven't
looked at them. I should note here that the ARM port is
currently Linux specific
as Windows doesn't currently support ARM. Though I believe this
will change with
Windows 8.
\item[cmake] CMake configuration files. Don't really need
touching but hold
things like the version number to give your build.
\item[core] The place where things get interesting. This is
Dynamo-RIO itself in
many ways. This is the code that manages everything else.
Importantly, this is
where we maintain control of the software we are working on
along with the
decoding, modification and encoding of instructions. Most (all?)
of the ARM
specific code will be in here.
\begin{description}
\item[Arm] The (mainly) ARM specific files. More on this
later.
\item[lib] The header files and generation scripts for
the client libraries.
\item[linux] The Linux specific files.
\item[win32] The Windows specific files (not just
32bit).
\item[x86] The (mainly) x86 specific files. More on this
later.
\end{description}
\item[ext] Extensions. I believe this is needed for the actual
use of clients in
Dynamo-RIO.
\item[include] The basic header files you will use when
constructing clients.
There is a major problem here that will be discussed later.
\item[libutil] General libraries. I haven't needed to touch
these.
\item[make] Auto-generated make files.
\item[suite] A suite of development tools for the main
Dynamo-RIO devs. This
hasn't and probably shouldn't be used in the ARM build at least
while it remains
a separate project.
\item[tools] Again, a large number of development tools. These
again look mainly
related to the main version of Dynamo-RIO and to x86. Some may
be adaptable
though for debugging and testing.
\end{description}
\item[Decoder] A standalone ARM machine code decoder. This will be
explained more later.
\item[Encoder] A standalone ARM machine code encoder. This too will be explained
a little more later.
\item[MiscCode] Samples of ARM assembly, x86 assembly, very simple test
C
programs for running Dynamo-RIO on and CMake script examples.
\begin{description}
\item[ArmAssembly] Some small test programs in C and ARM
assembly. More on
this later.
\item[Assembly] Some small test programs in x86 assembly.
\item[cMakeExamples] Some small example scripts for CMake.
\end{description}
\item[RunningVersion] A copy of the source containing mainly x86
specific code.
This has remained largely un-used by myself and thus I won't go into any
detail.
Best sticking with CurrentVersion.
\end{description}
\subsection{Notation}
My predecessor employed the following notation to keep track of his work:
\begin{description}
\item[COMPLETEDD] This function has been finished and fully tested. Note the
capitals and the extra 'D'.
\item[INPROCESSS] This function is currently being worked on and has not been
fully tested. Note the capitals and the extra 'S'.
\item[ADDME] Major functionality needs adding or adapting here. Note the
capitals.
\end{description}
In addition to this, there is a healthy spread of 'fixme' comments. At least
some of these are from the main Dynamo-RIO project though some may be from the
ARM port.
\subsubsection{INPROCESSS Tree}
Below is a tree I have constructed of the INPROCESSS functions and their
connected usage. This is based on a simple grep of the INPROCESSS functions and
is by no means a complete list of what needs doing.
\begin{verbatim}
core/linux/preload.c:_init(
core/dynamo.c:Dynamorio_app_init(
core/dynamo.c:dynamo_thread_init(
core/dynamo.c:is_thread_initialized(
)
core/fcache.c:fcache_reset_all_caches_proactivly(
core/synch.c:synch_with_all_threads(
core/synch.c:synch_with_thread(
core/dynamo.c:dynamo_other_thread_exit(
core/dynamo.c:dynamo_thread_exit_common(
core/fragment.c:enter_threadexit(
core/fragment.c:check_flush_queue(
core/vmareas.c:vm_area_flush_fragments(
core/fragment.c:fragment_delete(
core/fragment.c:fragment_output(
)
core/monitor.c:monitor_remove_fragment(
core/monitor.c:trace_abort(
core/monitor.c:internal_restore_last(
core/link.c:link_fragment_outgoing(
core/link.c:is_linkable(
core/monitor.c:monitor_is_linkable(
core/monitor.c:should_be_trace_head(
core/monitor.c:should_be_trace_head_internal(
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
core/link.c:unlink_fragment_incoming(
core/link.c:unlink_branch(
)
)
core/linux/os.c:dr_syscall_invoke_another(
)
core/linux/signal.c:thread_set_self_context(
)
core/utils.h:FATAL_USAGE_ERROR(
)
core/configc.c:config_reread(
)
core/Arm/emit_utils.c:emit_fcache_enter_shared(
core/Arm/emit_utils.c:emit_fcache_enter_common(
)
)
core/Arm/emit_utils.c:coarse_exit_prefix_size(
)
core/Arm/emit_utils.c:patch_branch(
)
core/Arm/emit_utils.c:coarse_is_entrance_stub(
)
core/Arm/emit_utils.c:emit_fcache_enter(
core/Arm/emit_utils.c:emit_fcache_enter_common(
)
)
core/Arm/instrument.c:instrument_init(
)
core/Arm/decode.c:decode_operand(
)
core/Arm/mangle.c:remangle_short_rewrite(
)
core/Arm/mangle.c:sandbox_top_of_bb(
)
core/Arm/arch.c:arch_thread_init(
core/Arm/mangle.c:set_selfmod_sandbox_offsets(
)
)
core/Arm/instr.c:instr_get_branch_target_pc(
)
core/Arm/instr.c:instr_is_syscall(
)
core/Arm/instr.c:instr_is_cti_short_rewrite(
)
core/Arm/arch.c:shared_gencode_init(
core/Arm/emit_utils.c:emit_fcache_enter_shared(
****SEE ABOVE TRACE****
)
)
core/Arm/arch.c:recreate_app_state(
core/Arm/arch.c:recreate_app_state_internal(
)
)
core/Arm/disassemble.c:instr_disassemble(
)
core/Arm/disassemble.c:dump_dr_callstack(
)
core/fragment.c:update_lookable_tls(
)
core/fragment.c:fragment_recreate_with_linkstubs(
)
core/options.c:synchronize_dynamic_options(
)
\end{verbatim}
\section{Working With the Code}
\subsection{Platform Specific Settings and Tools}
\subsubsection{Native ARM Compilation}
While it should be theoretically possible to compile natively on an ARM device,
problems are present in that assembly takes quite a large amount of memory. So
when trying to compile on the Tegra 250, I found memory usage goes up and up
until it gives up an the process dies. This would probably be solved by having
swap space. If you want to try, run CMake with the ARM option on and the
cross-compile option off. The difference with cross-compile being the build
commands used. The advantage of this is you can build on the machine under test
and thus use specific Linux kernel modules rather than generic ones. As stated
earlier, ARM support is only for Linux.
\subsubsection{Cross-Compiling from x86}
The main tool you'll need to cross-compile is the CodeSourcery Lite toolchain.
You can get this from
\url{http://www.codesourcery.com/sgpp/lite/arm/portal/subscription?@template=lite}.
You want the GNU/Linux version as this contains all the generic cross-compiled
headers required by Dynamo-RIO. Once you've got this, install it with the nice
GUI provided and all should be blue skies and roses with any luck. You want to
run CMake with both the ARM and cross-compile options on.
\subsection{Compilation}
A couple of more notes on CMake settings. To get to the options, use the
\texttt{cmake -i} command. This gives you an interactive prompt. Running
\texttt{cmake} on it's own will create make scripts using the last settings
used. You shouldn't need the advanced options. One big note is that you won't be
able to build the extensions library or samples. These require the api to build
but this is not yet possible due to missing functions.
Once you have changed any CMake settings you need, just run \texttt{make} to
build Dynamo-RIO.
\subsection{Running Dynamo-RIO}
See README.
\section{Current Project State}
\subsection{Visual Inspection of Code}
In addition to the tree of INPROCESSS functions I created above, I have also
performed visual checks of the code to try and see which source files are
finished, in process, and yet to be started. This took into account any comments
on the state of functions, whether code contained \texttt{ifdef ARM} which would
indicate the code had at least been started to be adapted, if the code contained
empty or dummy functions and the general look of the code. This resulted in the
following:
\begin{description}
\item[core]
\begin{description}
\item[Arm] Along with those specifically stated are several empty files.
\begin{description}
\item[arch.c] needs converting
\item[arch.h] needs converting
\item[arch\_exports.h] Some dummy functions. Some x86.asm references?
\item[arm.s] doesn't look finished
\item[decode.c] contains both ARM and x86 in the same file. looks unfinished.
\item[decode.h] as above
\item[decode\_fast.c] as above
\item[decode\_fast.h] as above
\item[decode\_table.c] done?
\item[disassemble.c] not started
\item[emit\_utils.c] not finished. not started?
\item[encode.c] not started
\item[instr.c] done?
\item[instr\_create.h] not started
\item[instrument.c] not finished?
\item[instrument.h] not finished - Needs relevent DRAPI comments along with
some code
\item[interp.c] not finished
\item[mangle.c] not finished
\item[proc.c] not finished
\item[proc.h] not started
\end{description}
\item[lib]
\begin{description}
\item[dr\_config.h] may need some editing
\item[genapi.pl] needs ARM integration
\item[globals\_shared.h] needs ARM integration
\item[hotpatch\_interface.h] needs ARM integration
\end{description}
\item[linux]
\begin{description}
\item[os.c] may need ARM integration
\item[os\_exports.h] may need ARM integration
\item[signal.c] needs ARM integration
\item[syscall.h] not finished?
\end{description}
\end{description}
\end{description}
On top of this, I also have the following list in my notebook under the title
"changed files". A list of files that show signs of having been altered. I can't
quite remember why I didn't give them the same treatment as the files above but
the list will probably still be of some use.
\begin{description}
\item[CMakeLists.txt] Note: I have modified this myself somewhat to allow for
both native and cross compilation.
\item[configure\_defines.h]
\item[configure.h]
\item[configure\_temp.h]
\item[core]
\begin{description}
\item[ldscript]
\item[CMakeLists.txt]
\item[globals.h]
\item[hotpatch.c]
\item[module\_list.c]
\item[utils.c]
\item[vmareas.c]
\end{description}
\item[lib]
\begin{description}
\item[libdrpreload.so]
\end{description}
\item[make]
\begin{description}
\item[DynamoRIOConfig.cmake.in]
\end{description}
\end{description}
\subsection{Stephen Barton's Project Report}
From reading Stephen Barton's project report Dynamo-RIO: Process Virtualisation
for ARM, I managed to gather the following information about the current state
of the project.
\begin{itemize}
\item An ARM instruction decoder has been written and incorporated, in part,
into Dynamo-RIO.
\item An ARM instruction encoder was also written but was only produced as a
standalone program. A partialy completed version of this is available in the
codebase that I have been told will handle "40 of the most common instructions".
Apparently, adding new instructions is as simple as adding the relevant mask to
the table.
\item Not all ARM instructions have been supported. For example, those that
update the CPSR with the format \texttt{<instr>S}.
\item The re-implementation of the x86 functions has been partially done.
\item It would also seem that an approach was taken to remove all the x86
functions and re-instate them as they were re-implemented for ARM to provide a
side-by-side comparison.
\end{itemize}
\subsection{Code Modified by James Sandford}
With a fairly small number of dummy functions (\texttt{opnd\_ceate\_far\_instr},
\texttt{opnd\_create\_instr}, \texttt{opnd\_get\_instr},
\texttt{opnd\_get\_segment\_selector}, \texttt{opnd\_is\_far\_instr} and
\texttt{opnd\_is\_instr}) in texttt{core/Arm/instr.c}, I was able to
build the software and get some debugging output of the software traversing it's
intialisation routines. With a little more effort, I discovered a bug in the
interface between the C code and the ARM assembly code and got more output. A
copy of this output is available in the \texttt{logs/runLog1\_9\_11} file.
I originally interpreted some of this output as decoding due to
it's apperance at a glance being similar to that of the output from the
standalone decoder. This is not the case though. No decoding of instructions has
been carried out by myself within the confines of Dynamo-RIO. I traced the
reason for this to the clients I had assumed were compiled for ARM being
compiled for x86. This meant I had to re-visit the build scripts for clients
and, by extention, for the includes files. I added a little code to genapi.pl to
add ARM support. It had previously just read from the x86 directory, not the ARM
directory. Neither did it check for ARM includes and ifdefs. Although I don't
believe I touched anything to do with it, a bug in this script that resulted in
replication of code now seems to have dissapeared.
The next problem was one in wich CMake would notice a change in the compiler
being used and would decide the best action would be to delete the CMake
settings file and start with the standard ones... I eventually tracked this down
to the building of clients being a seperate CMake project. The project()
function would modify the relevent variables causing the problem. I came up with
several possible solutions in accordance with the recomendations in the CMake
documentation but none of these would have worked because of the ordering of
option selection and the such. The solution I went with is the only one that
I found to work. If we are not already within a CMake project, a new one will be
created. Otherwise we just build everything as one project. This is not the
right way of doing it so if anyone can find a way to keep the projects seperate,
please implement it.
The current state of the project is that it will try to build the sample
clients. The problem right now being that some of the source files for the
"include" header files are not finished yet. Some functions are missing along
with some of the DRAPI comments required for the genapi.pl script to copy things
to the right place. There are also some errors popping up about miss-matched
IFDEF statements. This may be a problem with the source files or the genapi.pl
script. I am not certain which but I'm assuming the source files. I would
recomend completing these as the next task. This would then allow you to build
clients and thus test Dynamo-RIO properly.
Once Dynamo-RIO may be run nativly with an ARM client, I believe output may
finally reach the decoder. It will then almost certainly fail as several of the
functions in regards to the hashtable and Dynamo-RIO gaining full control of the
code under test appear to be missing.
\subsection{Scripts Created by James Sandford}
\subsubsection{DRTest}
This is a small shellscript the runs make before copying the required to an ARM
device (a machine called rightarm in this case) and then runs Dynamo-RIO. It
also redirects the output to \texttt{logs/runLog} on the local machine. I'm
using it with a trusted machine aliased in my ssh config but you can always
modify the ssh lines to your machine and with the relevant username. Just set up
your CMake config with \texttt{cmake -i} (if your config isn't changing, you
only need to do this once) and then run \texttt{./DRTest} each time you want to
test.
\section{Other Notes}
Other notes worth mentioning follow.
\begin{itemize}
\item \texttt{read\_instruction} in core/Arm/decode.c is listed as done but seems to have
a large amount of place-holders.
\item The entry point at which Dynamo-RIO latches onto a process isn't immediately
obvious. Initialization begins with the \texttt{\_init} function in core/dynamo.c
from where specific initialization routines are called. The \texttt{\_init}
function is called from the dynamic library which contains Dynamo-RIO just
before the code under test is started. It is because of this that only dynamic
code can be controlled by Dynamo-RIO.
\item There are some basic programs for testing in the top level MiscCode
folder. It is importand to note that you should NOT use \texttt{as} to assemble
the ARM assembly code as this will result in non-dynamic code that Dynamo-RIO
cannot latch on to. Use \texttt{arm-none-linux-gnueabi-gcc -nostartfiles
<sourcefile>}.
\item The jobs that need doing are not just those listed as INPROCESSS or ADDME.
But those that are listed are a good starting place and will lead you to most,
if not all, the other jobs needed to complete the project.
\end{itemize}
\end{document}