Reorg directory structure.

Mcpat sources and all archive builds now live in third_party.
Plus, includes from subdirectories use full names wrt the workspace.
Deleted ancient irrelevant files.
Updated run script.

TODO: integration test target; flags.
TESTED= bazel test --cpu=k8 -c opt xiosim/...:all; ./run.sh

Change-Id: I622696152a3cb51efcc57ba9ca633c58e7a7bb63
s-kanev committed Oct 31, 2015
1 parent 66bd5d5 commit edc68b67c247f918a685aaabb26269792ca8b43d
Showing 372 changed files with 161 additions and 1,695 deletions.
@@ -4,4 +4,5 @@
*.swp
*.list
*/obj-ia32
-*cscope*
+*cscope*
+bazel-*
@@ -1,115 +0,0 @@
-Zesto has only been tested on Red Hat Enterprise Linux versions 4 and 5 on
-Intel P4-based 32-bit and Core 2-based 64-bit systems. Even though the
-simulator runs on 64-bit systems, it's still a 32-bit application. If you find
-that it works on other distros, please let us know. If you modify Zesto to get
-it to run on other distros, send us the patches and we'll put these on the
-website and give you full credit.
-
-By default, the Makefile is set up for compiling on Red Hat Enterprise Linux
-version 5 (RHEL5). If this matches your configuration, you should be able to
-just type "make" and that's it. The only minor tweaks you might make are to
-the OFLAGS options in the Makefile, which are currently set to optimize for the
-Intel Core 2 microarchitecture (but again, in 32-bit mode as specified by the
--m32 flag).
-
-The main simulator is called sim-zesto. To discourage researchers and students
-from blindly using the default parameter/knob settings, they have all been set
-to silly values. Instead, we have included example configuration files for the
-pipeline and memory. To test the simulator, try:
-
- $ ./sim-zesto -config config/merom.cfg \
- -config dram-config/DDR2-800-5-5-5.cfg tests/fib
-
-This will run the fib program and print out a lot of stats. Fib is a toy
-program that just computes the first couple of number in the Fibonacci series
-using a dumb recursive implementation. It is a regular binary (i.e., you
-should be able to run it directly from your command prompt) written in regular
-C and compiled with gcc. You can create your own binaries by compiling on an
-x86 machine with "-m32 -static -march=pentiumpro". (NOTE: binaries created on
-RHEL4 seem to run just fine. Binaries created on RHEL5 seem to be generating
-some odd/unsupported behavior that causes the %gs register to get zeroed out
-when it seems that it shouldn't. Ignoring the resulting null-pointer loads
-seems to not cause any problems and our simple test programs appear to execute
-without any other noticeable problems.) The pentiumpro constraint is to prevent
-the compiler from emitting MMX or SSE instructions which are currently not
-supported. Obviously for "real" benchmarks, you need to add in other relevant
-optimization flags (e.g., you can set -mtune to be different from pentiumpro to
-have the code generator target a more modern microarchitecture).
-
-To run the simulator in multi-core mode, you can only use .eio files. You can
-generate these in the same way that you would have for previous versions of
-SimpleScalar (the sim-eio that comes with Zesto generated .eio files that are
-completely compatible with sim-zesto). Once you have your .eio files:
-
- $ ./sim-zesto -config config/merom.cfg \
- -config dram-config/DDR2-800-5-5-5.cfg -cores 2 -max:inst 100000 \
- -tracelimit 1000000 tests/app1.eio.gz tests/app2.eio.gz
-
-The simulator will then simulate 100,000 instructions (x86 macro-ops, *not*
-internal RISC micro-ops) as specified by the "-max:inst" knob. Note that in a
-multi-core simulation, one program may reach the instruction limit before the
-other. In this case, the statistics for the "finished" program are frozen
-(prevented from any more updates) so that they directly correspond to the
-behaviors observed for the specified number of instruction (i.e., "max:inst").
-The program, however, is allowed to continue executing so that it continues to
-contend with the other core(s) for shared resources. So what we typically do
-is collect a .eio trace file corresponding to more instructions than we want to
-collect stats for. In this example, say app1 finishes first, in which case it
-will continue executing until a limit of one million instructions have been
-committed. At this point, if app2 has not yet reached its instruction limit of
-100,000, the execution of app1 is restarted from the beginning (but again,
-performance statistics are not allowed to be updated beyond the original
-100,000 instructions). Eventually when app2 reaches 100,000 committed
-instructions, the simulation will terminate and print out even more stats for
-both cores. Note that the simulator assumes that all .eio files have the same
-tracelimit. If you use .eio files capturing different numbers of instructions,
-you should set -tracelimit to the minimum of the .eio files.
-
-Like the original SimpleScalar, you can also specify a fast-forward amount with
--fastfwd. By default, all caches and branch predictors will be warmed during
-the fast-forwarding interval (this can be disabled). In a multi-core
-simulation, the fast-forward simply functionally executes one instruction at a
-time from each program in a round-robin fashion, updating the cache hierarchy
-as it goes. This obviously will not result in a truly-correct warmed cache
-state as the cache contents depend on the exact timing of the arrival of the
-memory requests (and we don't know the exact timing since we're not simulating
-the pipeline during the fast functional simulation). So overall, when
-collecting .eio files for multi-core execution, you should start collecting the
-trace F instructions prior to the sample you want to collect (where F is the
-number of instructions you want to warm the processors with), N instructions
-corresponding to the actual sample, and then another E instructions (where E
-corresponds to extra instructions after the sample so that the simulator does
-not keep looping the same F+N instructions over and over again while waiting
-for the other cores to finish up). We don't have any particularly good
-guidance on how to choose E, and if F is large enough, it may be fine to just
-set E to zero. (One note, sim-eio's -fastfwd option specified instructions in
-*millions*.)
-
-So far in one form or another, we've been able to get a decent number of
-benchmarks running in our infrastructure.
-
-SPEC2000fp: applu, apsi, art, equake, galgel, mesa, mgrid, swim, wupwise
-SPEC2000int: all 12
-SPEC2006fp: bwaves, cactusADM, dealII, gromacs, lbm, milc, namd, soplex,
- zeusmp
-SPEC2006int: astar, bzip2, go, h264ref, hmmer, libquantum, mcf, omnetpp,
- perl, sjeng
-Mediabench: adpcm, epic, g721, gs, gsm, jpeg, mesa, mpeg2, pegwit
-Mediabench-II/video: h263, h264, jpeg, jpg2000, mpeg2, mpeg4
-MiBench: adpcm, basicmath, bitcount, blowfish, crc32, dijkstra, fft, gsm
- ispell, lame, patricia, pgp, qsort, rijndael, sha, susan
-BioBench: blastp, clustalw, fasta, hmmer, mummer, phylip, tigr
-BioPerf: clustalw, hmmpfam, phylip, predator
-PhysicsBench: breakable, continuous, deformable, everything, explosions,
- highspeed, periodic, ragdoll
-MineBench: bayes, eclat, semphy
-FacePerf: ebgm, pca
-PtrDist: anagram, bc, ft, ks, yacr2
-Stream, Stream2: all
-
-Most of the SPEC benchmarks that aren't running are Fortran. There's something
-not correctly and/or completely implemented in the simulator for properly
-handling what seems to be some of the Fortran I/O routines. There are a few
-other SPEC2006 benchmarks that we have running in a "partial state", in that
-they will run for many tens of billions of instructions before running into
-some sort of problem. These include gcc (int), leslie3d (fp) and sphinx3 (fp).
@@ -1,20 +0,0 @@
- NO WARRANTY
-
- THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
-APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
-HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT
-WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT
-LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND
-PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE
-DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
-CORRECTION.
-
- IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
-WRITING WILL ANY COPYRIGHT HOLDER BE LIABLE TO YOU FOR DAMAGES,
-INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES
-ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT
-NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR
-LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM
-TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER
-PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
@@ -2,26 +2,26 @@ new_http_archive(
name = "boost",
url = "http://downloads.sourceforge.net/project/boost/boost/1.54.0/boost_1_54_0.tar.gz?r=&ts=1445844477&use_mirror=superb-dca2",
sha256 = "412d003299e72555e1e1f62f51d3b07eca2a1911e27c442ee1c08167826ef9e2",
- build_file = "boost.BUILD",
+ build_file = "third_party/boost.BUILD",
)
new_http_archive(
name = "pin",
url = "https://software.intel.com/sites/landingpage/pintool/downloads/pin-2.14-67254-gcc.4.4.7-linux.tar.gz",
sha256 = "4499cfed383f362a0c74560a3ee66a5f117bea95f40067224ddf3c2606c77006",
- build_file = "pin.BUILD",
+ build_file = "third_party/pin.BUILD",
)
new_http_archive(
name = "confuse",
url = "https://github.com/martinh/libconfuse/archive/v2.8.zip",
sha256 = "34543ccff48b853241bac57dce8353bfc2c9b01d49b51f9cd6619d4a946fa5ef",
- build_file = "confuse.BUILD",
+ build_file = "third_party/confuse.BUILD",
)
new_http_archive(
name = "catch",
url = "https://github.com/philsquared/Catch/archive/v1.2.1.tar.gz",
sha256 = "24da0b6a6680256607da5ceb28004cb399009eae9f591614d7d22e3532f6980c",
- build_file = "catch.BUILD",
+ build_file = "third_party/catch.BUILD",
)
@@ -14,7 +14,7 @@
*/
program {
- exe = "../tests/fib"
+ exe = "tests/fib"
args = "> fib.out 2> fib.err"
instances = 1
}
@@ -1,90 +0,0 @@
-Please see zesto.cc.gatech.edu for more up-to-date and complete
-information and documentation.
-
-1.0 The x86 ISA
-
-The x86 instruction set required several changes to be made to the
-baseline architecture. Support had to be added to handle repeating
-instructions, condition code dependencies and the unique aspects of
-the x86 register file. In addition instructions could no longer be
-represented with fixed-byte lengths due to x86's variable length
-instructions. What follows is a overview on how these issues were
-dealt with in SimpleScalar. This is by no means an exhaustive list of
-changes made, as there were numerous small changes, but it will cover
-the major changes.
-
-1.1 Repeat Instructions [Original]
-
-Certain instructions in the x86 ISA contain fields that tell the
-processor to repeat that instruction. These repeats could be a fixed
-number of iterations or variable based on meeting a specified
-condition. To deal with this we introduced a series of macros
-(REP_COUNT, REP_AGAIN, REP_FIRST) to the main/dispatch stage. The
-REP_COUNT macro simply marks off each iteration of an instruction and
-REP_AGAIN macro checks whether or not another iteration is required.
-These functions would be enough except that it is possible for x86 to
-have instructions with repeat counts of 0. The REP_FIRST macro checks
-repeat instruction counts prior to their first execution and blocks
-execution if it has a count of 0. For the performance simulator,
-sim-outorder-x86, we also had to allow the dispatch stage to deal with
-speculative repeats. Since repeat counts cannot truly be resolved
-until after execution, we allow the dispatch stage to blast repeats
-into the ROB until execution of the last repeat occurs at which time
-the speculative repeats are blown away.
-
-1.1 Repeat Instructions [Zesto]
-
-We simulate repeat instructions more closely to the hardware. We assume
-a microcode sequencer that injects the additional necessary uops to
-perform the "microjumps" involved in executing the REP instructions.
-In particular, prior to the first iteration, uops are injected to
-test for a zero-iteration instruction. After each instruction, the
-ucode sequencer injects the uops to decrement the REP counter register
-and test for exit.
-
-1.2 Condition Code Handling
-
-The condition code flags are commonly used in x86. Many instructions
-read and/or write multiple flags and are therefor dependent on
-previous instructions through those flags. In SimpleScalar this was
-handled with a method similar to the generation of register
-dependencies. A flag create vector (effectively a flag-renaming
-table) is kept allowing flag consumers to install dependency links in
-the respective flag creator. After the creator is executed, the flags
-are broadcast to the consumer instructions in parallel with the
-broadcast of its destination register. To allow flags in x86
-instructions we also introduced two new fields (OFLAGS, IFLAGS) to the
-machine.def files which contain bit fields of the flags set/read. We also
-included two functions for set flag output dependencies and reading
-flag input dependencies in the flag create vector.
-
-1.3 Partial Forwarding [Original]
-
-The evolution of x86 registers has produces some interesting problems
-for dependence simulation. The x86 architectural registers can be
-accessed in multiple ways. For instance the 32-bit EAX register can
-be referenced as 32-bits (EAX), the lower 16-bits (AX) or one of the
-two lowest-order bytes (AH,AL). In terms of dependence generation,
-there is no real difference between EAX and AX. The interesting
-problem is that a write to AH does not produce a dependency for a read
-to AL (and visa versa). Since x86 uses a lot of byte operand
-instructions the problem of correct dependence generation had to be
-addressed. To solve this issue, we introduced a set of virtual
-registers onto the end of simulated register space. All byte-sized
-register references where transformed from their original value
-(corresponding to EAX) to this virtual register space. Then code was
-added to allow instructions to specify multiple output dependencies.
-A write to EAX would set the instruction as the creator of EAX, AH and
-AL, while a write to AH would set the instruction as the creator of
-just EAX and AH leaving AL independent. Reading dependencies from
-create vector was unchanged.
-
-1.3 Partial Forwarding [Zesto]
-
-To support partial-register reads/writes, we force each instruction
-to always update the entire 32-bit register (e.g., EAX). A write to
-AL causes the entire EAX to be updated. Where necessary, additional
-partial-write merging uops have been inserted into the uop flows to
-ensure that all 32 bits always get updated together. This means that
-an update to AL followed by an update to AH will in fact be serialized
-by the intermediate merging uop. Same goes for writes to AX.
@@ -1,9 +1,9 @@
#!/bin/bash
-PIN=${PIN_ROOT}/pin.sh
-BIN_PATH=../bazel-bin/pintool
+PIN=bazel-xiosim/external/pin/pin-2.14-67254-gcc.4.4.7-linux/pin.sh
+BIN_PATH=bazel-bin/xiosim/pintool
PINTOOL=${BIN_PATH}/feeder_zesto.so
-ZESTOCFG=../config/none.cfg
+ZESTOCFG=xiosim/config/N.cfg
BENCHMARK_CFG_FILE=benchmarks.cfg
CMD_LINE="setarch x86_64 -R ${BIN_PATH}/harness \