Commits

Branch: mttcg/tcg-next
Commits on May 3, 2016

  1. translate-all: add tb hash bucket info to 'info jit' dump

    Examples:
    
    - Good hashing, i.e. tb_hash_func5(phys_pc, pc, flags):
    TB count            715135/2684354
    [...]
    TB hash buckets     388775/524288 (74.15% head buckets used)
    TB hash occupancy   33.04% avg chain occ. Histogram: [0,10)%|▆ █  ▅▁▃▁▁|[90,100]%
    TB hash avg chain   1.017 buckets. Histogram: 1|█▁▁|3
    
    - Not-so-good hashing, i.e. tb_hash_func5(phys_pc, pc, 0):
    TB count            712636/2684354
    [...]
    TB hash buckets     344924/524288 (65.79% head buckets used)
    TB hash occupancy   31.64% avg chain occ. Histogram: [0,10)%|█ ▆  ▅▁▃▁▂|[90,100]%
    TB hash avg chain   1.047 buckets. Histogram: 1|█▁▁▁|4
    
    - Bad hashing, i.e. tb_hash_func5(phys_pc, 0, 0):
    TB count            702818/2684354
    [...]
    TB hash buckets     112741/524288 (21.50% head buckets used)
    TB hash occupancy   10.15% avg chain occ. Histogram: [0,10)%|█ ▁  ▁▁▁▁▁|[90,100]%
    TB hash avg chain   2.107 buckets. Histogram: [1.0,10.2)|█▁▁▁▁▁▁▁▁▁|[83.8,93.0]
    
    - Good hashing, but no auto-resize:
    TB count            715634/2684354
    TB hash buckets     8192/8192 (100.00% head buckets used)
    TB hash occupancy   98.30% avg chain occ. Histogram: [95.3,95.8)%|▁▁▃▄▃▄▁▇▁█|[99.5,100.0]%
    TB hash avg chain   22.070 buckets. Histogram: [15.0,16.7)|▁▂▅▄█▅▁▁▁▁|[30.3,32.0]
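
    The stats above can be produced with the qdist module added later in this
    series. A hedged sketch (chain_len[] is a hypothetical array of per-bucket
    chain lengths; the actual 'info jit' code may gather the data differently):

    #include "qemu/osdep.h"
    #include "qemu/qdist.h"

    static void dump_chain_histogram(const size_t *chain_len, size_t n_head_buckets)
    {
        struct qdist chain;
        char *hist;
        size_t i;

        qdist_init(&chain);
        for (i = 0; i < n_head_buckets; i++) {
            qdist_inc(&chain, chain_len[i]);        /* one sample per head bucket */
        }
        hist = qdist_pr(&chain, 10, QDIST_PR_LABELS);   /* 10 histogram bins */
        printf("TB hash avg chain   Histogram: %s\n", hist);
        g_free(hist);
    }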
    
    Suggested-by: Richard Henderson <rth@twiddle.net>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: 0fcb7d5
  2. tb hash: track translated blocks with qht

    Having a fixed-size hash table for keeping track of all translation blocks
    is suboptimal: some workloads are just too big or too small to get maximum
    performance from the hash table. The MRU promotion policy helps improve
    performance when the hash table is a little undersized, but it cannot
    make up for severely undersized hash tables.
    
    Furthermore, frequent MRU promotions result in writes that are a scalability
    bottleneck. For scalability, lookups should only perform reads, not writes.
    This is not a big deal for now, but it will become one once MTTCG matures.
    
    The appended fixes these issues by using qht as the implementation of
    the TB hash table. This solution is superior to other alternatives considered,
    namely:
    
    - master: implementation in QEMU before this patchset
    - xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
    - xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
                  MRU is implemented here by adding an intermediate struct
                  that contains the u32 hash and a pointer to the TB; this
                  allows us, on an MRU promotion, to copy said struct (that is not
                  at the head), and put this new copy at the head. After a grace
                  period, the original non-head struct can be eliminated, and
                  after another grace period, freed.
    - qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
                       no MRU for lookups; MRU for inserts.
    The appended solution is the following:
    - qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
                     no MRU for lookups; MRU for inserts.
    
    The plots below compare the considered solutions. The Y axis shows the
    boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis
    sweeps the number of buckets (or initial number of buckets for qht-autoresize).
    The plots in PNG format (and with errorbars) can be seen here:
      http://imgur.com/a/Awgnq
    
    Each test runs 5 times, and the entire QEMU process is pinned to a
    single core for repeatability of results.
    
                                Host: Intel Xeon E5-2690
    
      28 ++------------+-------------+-------------+-------------+------------++
         A*****        +             +             +             master **A*** +
      27 ++    *                                                 xxhash ##B###++
         |      A******A******                               xxhash-rcu $$C$$$ |
      26 C$$                  A******A******            qht-fixed-nomru*%%D%%%++
         D%%$$                              A******A******A*qht-dyn-mru A*E****A
      25 ++ %%$$                                          qht-dyn-nomru &&F&&&++
         B#####%                                                               |
      24 ++    #C$$$$$                                                        ++
         |      B###  $                                                        |
         |          ## C$$$$$$                                                 |
      23 ++           #       C$$$$$$                                         ++
         |             B######       C$$$$$$                                %%%D
      22 ++                  %B######       C$$$$$$C$$$$$$C$$$$$$C$$$$$$C$$$$$$C
         |                    D%%%%%%B######      @e@@@@@@    %%%D%%%@@@e@@@@@@e
      21 E@@@@@@e@@@@@@f&&&@@@e@@@&&&D%%%%%%B######B######B######B######B######B
         +             E@@@   F&&&   +      E@     +      F&&&   +             +
      20 ++------------+-------------+-------------+-------------+------------++
         14            16            18            20            22            24
                                 log2 number of buckets
    
                                     Host: Intel i7-4790K
    
      14.5 ++------------+------------+-------------+------------+------------++
           A**           +            +             +            master **A*** +
        14 ++ **                                                 xxhash ##B###++
      13.5 ++   **                                           xxhash-rcu $$C$$$++
           |                                            qht-fixed-nomru %%D%%% |
        13 ++     A******                                   qht-dyn-mru @@e@@@++
           |             A*****A******A******             qht-dyn-nomru &&F&&& |
      12.5 C$$                               A******A******A*****A******    ***A
        12 ++ $$                                                        A***  ++
           D%%% $$                                                             |
      11.5 ++  %%                                                             ++
           B###  %C$$$$$$                                                      |
        11 ++  ## D%%%%% C$$$$$                                               ++
           |     #      %      C$$$$$$                                         |
      10.5 F&&&&&&B######D%%%%%       C$$$$$$C$$$$$$C$$$$$$C$$$$$C$$$$$$    $$$C
        10 E@@@@@@e@@@@@@b#####B######B######E@@@@@@e@@@%%%D%%%%%D%%%###B######B
           +             F&&          D%%%%%%B######B######B#####B###@@@d%%%   +
       9.5 ++------------+------------+-------------+------------+------------++
           14            16           18            20           22            24
                                  log2 number of buckets
    
    Note that the original point before this patch series is X=15 for "master";
    its low sensitivity to the increased number of buckets is due to the
    poor hashing function in master.
    
    xxhash-rcu has significant overhead due to the constant churn of allocating
    and deallocating intermediate structs for implementing MRU. An alternative
    would be to consider failed lookups as "maybe not there", and then
    acquire the external lock (tb_lock in this case) to really confirm that
    there was indeed a failed lookup. This, however, would not be enough
    to implement dynamic resizing--this is more complex: see
    "Resizable, Scalable, Concurrent Hash Tables via Relativistic
    Programming" by Triplett, McKenney and Walpole. This solution was
    discarded due to the very coarse RCU read critical sections that we have
    in MTTCG; resizing requires waiting for readers after every pointer update,
    and resizes require many pointer updates, so this would quickly become
    prohibitive.
    
    qht-fixed-nomru shows that MRU promotion is advisable for undersized
    hash tables.
    
    However, qht-dyn-mru shows that MRU promotion is not important if the
    hash table is properly sized: there is virtually no difference in
    performance between qht-dyn-nomru and qht-dyn-mru.
    
    Before this patch, we're at X=15 on "xxhash"; after this patch, we're at
    X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we
    can achieve with optimum sizing of the hash table, while keeping the hash
    table scalable for readers.
    
    The improvement we get before and after this patch for booting debian jessie
    with arm-softmmu is:
    
    - Intel Xeon E5-2690: 10.5% less time
    - Intel i7-4790K: 5.2% less time
    
    We could get this same improvement _for this particular workload_ by
    statically increasing the size of the hash table. But this would hurt
    workloads that do not need a large hash table. The dynamic (upward)
    resizing allows us to start small and enlarge the hash table as needed.
    
    A quick note on downsizing: the table is resized back to 2**15 buckets
    on every tb_flush; this makes sense because it is not guaranteed that the
    table will reach the same number of TBs later on (e.g. most bootup code is
    thrown away after boot); it is better to grow the hash table again as
    more code blocks are translated. This also avoids the complication of
    having to build downsizing hysteresis logic into qht.
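
    For illustration only, the flush-time downsizing could look roughly like the
    sketch below, assuming the qht_reset_size() helper from the qht patch in this
    series (tb_htable is a placeholder for wherever the global TB hash table
    lives; the actual tb_flush() hunk may differ):

    /* in tb_flush(): the translated-code working set starts over, so shrink
     * the TB hash table back to its initial size and let auto-resize regrow it */
    qht_reset_size(&tb_htable, 1 << 15);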
    
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: 4222691
  3. qht: add test program

    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: 009674f
  4. qht: QEMU's fast, resizable and scalable Hash Table

    This is a hash table with optional auto-resizing and MRU promotion for
    reads and writes. Its implementation goal is to stay fast while
    scaling for read-mostly workloads.
    
    A hash table with these features will be necessary for the scalability
    of the ongoing MTTCG work; before those changes arrive we can already
    benefit from the single-threaded speedup that qht also provides.
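
    A hypothetical usage sketch, assuming an API along the lines of what this
    patch adds (qht_init/qht_insert/qht_lookup plus QHT_MODE_AUTO_RESIZE, with a
    caller-supplied comparison callback; exact prototypes may differ -- see
    include/qemu/qht.h):

    #include "qemu/osdep.h"
    #include "qemu/qht.h"

    struct entry {
        uint32_t key;
        /* ... payload ... */
    };

    /* called on hash matches to confirm that this is really the wanted entry */
    static bool entry_cmp(const void *obj, const void *userp)
    {
        const struct entry *e = obj;
        const uint32_t *key = userp;

        return e->key == *key;
    }

    static void qht_example(void)
    {
        struct qht ht;
        struct entry *e = g_new0(struct entry, 1);
        struct entry *found;
        uint32_t hash;

        e->key = 42;
        hash = e->key;    /* a real user would run the key through a hash function */

        /* start small; QHT_MODE_AUTO_RESIZE lets the table grow as needed */
        qht_init(&ht, 1 << 10, QHT_MODE_AUTO_RESIZE);
        qht_insert(&ht, e, hash);                            /* writers serialize externally */
        found = qht_lookup(&ht, entry_cmp, &e->key, hash);   /* lookups are read-only */
        g_assert(found == e);
    }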
    
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: a5a5b13
  5. qdist: add test program

    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: c410979
  6. qdist: add module to represent frequency distributions of data

    Sometimes it is useful to have a quick histogram to represent a certain
    distribution -- for example, when investigating a performance regression
    in a hash table due to inadequate hashing.
    
    The appended allows us to easily represent a distribution using Unicode
    characters. Further, the data structure keeping track of the distribution
    is so simple that obtaining its values for off-line processing is trivial.
    
    Example, taking the last 10 commits to QEMU:
    
     Characters in commit title  Count
    -----------------------------------
                             39      1
                             48      1
                             53      1
                             54      2
                             57      1
                             61      1
                             67      1
                             78      1
                             80      1
    struct qdist dist;
    
    qdist_init(&dist);
    qdist_inc(&dist, 39);
    [...]
    qdist_inc(&dist, 80);
    
    char *str = qdist_pr(&dist, 9, QDIST_PR_LABELS);
    // -> [39.0,43.6)▂▂ █▂ ▂ ▄[75.4,80.0]
    g_free(str);
    
    str = qdist_pr(&dist, 4, QDIST_PR_LABELS);
    // -> [39.0,49.2)▁█▁▁[69.8,80.0]
    g_free(str);
    
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: 9bf2b43
  7. tb hash: hash phys_pc, pc, and flags with xxhash

    For some workloads such as arm bootup, tb_phys_hash is performance-critical.
    This is due to the high frequency of accesses to the hash table, caused
    by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's.
    More info:
      https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html
    
    To dig further into this I modified an arm image booting debian jessie to
    immediately shut down after boot. Analysis revealed that quite a bit of time
    is unnecessarily spent in tb_phys_hash: the cause is poor hashing that
    results in very uneven loading of chains in the hash table's buckets;
    the longest observed chain had ~550 elements.
    
    The appended addresses this with two changes:
    
    1) Use xxhash as the hash table's hash function. xxhash is a fast,
       high-quality hashing function.
    
    2) Feed the hashing function with not just tb_phys, but also pc and flags.
    
    This improves performance over using just tb_phys for hashing, since that
    resulted in some hash buckets having many TBs while others got very few;
    with these changes, the longest observed chain on a single hash bucket is
    brought down from ~550 to ~40.
    
    Tests show that the other element checked for in tb_find_physical,
    cs_base, is always a match when tb_phys+pc+flags are a match,
    so hashing cs_base is wasteful. It could be that this is an ARM-only
    thing, though.
    
    BTW, after this change the hash table should not be called "tb_phys_hash"
    anymore; this is addressed later in this series.
    
    This change gives consistent bootup time improvements. I tested two
    host machines:
    - Intel Xeon E5-2690: 11.6% less time
    - Intel i7-4790K: 19.2% less time
    
    Increasing the number of hash buckets yields further improvements. However,
    using a larger, fixed number of buckets can degrade performance for other
    workloads that do not translate as many blocks (600K+ for debian-jessie arm
    bootup). This is dealt with later in this series.
    
    Reviewed-by: Richard Henderson <rth@twiddle.net>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: 54e9c09
  8. exec: add tb_hash_func5, derived from xxhash

    This will be used by upcoming changes for computing the TB hash.
    
    Add this into a separate file to include the copyright notice from
    xxhash.
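
    For flavor, a sketch of xxhash-style mixing of five 32-bit words (the two
    64-bit inputs split in half, plus flags). It uses the public XXH32 prime
    constants and a hypothetical helper name; the in-tree tb_hash_func5() may
    differ in detail:

    #include <stdint.h>

    #define PRIME32_2 2246822519U
    #define PRIME32_3 3266489917U
    #define PRIME32_4  668265263U
    #define PRIME32_5  374761393U

    static inline uint32_t rol32(uint32_t x, unsigned r)
    {
        return (x << r) | (x >> (32 - r));
    }

    static inline uint32_t mix_word(uint32_t h, uint32_t w)
    {
        h += w * PRIME32_3;
        return rol32(h, 17) * PRIME32_4;
    }

    static inline uint32_t example_tb_hash(uint64_t phys_pc, uint64_t pc, uint32_t flags)
    {
        uint32_t h = PRIME32_5 + 20;    /* seed 0 plus 20 bytes of input */

        h = mix_word(h, (uint32_t)phys_pc);
        h = mix_word(h, (uint32_t)(phys_pc >> 32));
        h = mix_word(h, (uint32_t)pc);
        h = mix_word(h, (uint32_t)(pc >> 32));
        h = mix_word(h, flags);

        /* final avalanche: spread entropy into the low bits used for bucketing */
        h ^= h >> 15;
        h *= PRIME32_2;
        h ^= h >> 13;
        h *= PRIME32_3;
        h ^= h >> 16;
        return h;
    }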
    
    Reviewed-by: Richard Henderson <rth@twiddle.net>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: 84ddbad
  9. qemu-thread: add simple test-and-set spinlock

    Signed-off-by: Guillaume Delbergue <guillaume.delbergue@greensocs.com>
    [Rewritten. - Paolo]
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    [Emilio's additions: use atomic_test_and_set instead of atomic_xchg;
     call cpu_relax() while spinning; optimize for uncontended locks by
     acquiring the lock with TAS instead of TATAS.]
    Signed-off-by: Emilio G. Cota <cota@braap.org>
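
    A minimal sketch of the TAS-with-TATAS-fallback idea described in the note
    above, using GCC/Clang builtins (the helpers actually added by this patch
    may differ in detail):

    #include <stdbool.h>

    typedef struct {
        unsigned char locked;   /* 0 = unlocked, nonzero = locked */
    } spin_t;

    static inline void spin_lock(spin_t *s)
    {
        /* fast path: a single test-and-set wins immediately when uncontended */
        while (__atomic_test_and_set(&s->locked, __ATOMIC_ACQUIRE)) {
            /* contended: spin on plain reads (test-and-test-and-set) so the
             * cache line is not bounced around, relaxing the CPU meanwhile */
            while (__atomic_load_n(&s->locked, __ATOMIC_RELAXED)) {
    #if defined(__i386__) || defined(__x86_64__)
                __builtin_ia32_pause();   /* what cpu_relax() boils down to on x86 */
    #endif
            }
        }
    }

    static inline void spin_unlock(spin_t *s)
    {
        __atomic_clear(&s->locked, __ATOMIC_RELEASE);
    }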
    Guillaume Delbergue authored and stsquad committed May 3, 2016
    Commit: 97b9f7e
  10. atomics: add atomic_test_and_set

    This new helper expands to __atomic_test_and_set where available;
    otherwise it expands to atomic_xchg.
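
    A hedged sketch of such a helper (the real macro lives in
    include/qemu/atomic.h and may differ; the memory-order choice here is
    illustrative):

    #ifdef __ATOMIC_RELAXED   /* compiler provides the __atomic builtins */
    #define atomic_test_and_set(ptr) __atomic_test_and_set(ptr, __ATOMIC_SEQ_CST)
    #else
    #define atomic_test_and_set(ptr) atomic_xchg(ptr, true)
    #endif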
    
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: da0d61c
  11. include/processor.h: define cpu_relax()

    Taken from the linux kernel.
    
    Reviewed-by: Richard Henderson <rth@twiddle.net>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: 17d2555
  12. seqlock: rename write_lock/unlock to write_begin/end

    It is a more appropriate name, now that the mutex embedded
    in the seqlock is gone.
    
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Richard Henderson <rth@twiddle.net>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: 9f71dd7
  13. seqlock: remove optional mutex

    This option is unused; besides, it bloats the struct when not needed.
    Let's just let writers define their own locks elsewhere.
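
    The resulting usage pattern, as a hedged sketch (API names as used elsewhere
    in this series; the writer now brings its own lock, readers retry lock-free):

    #include "qemu/osdep.h"
    #include "qemu/seqlock.h"
    #include "qemu/thread.h"

    typedef struct {
        QemuSeqLock sequence;
        QemuMutex lock;        /* writer-side lock, owned by the user */
        int64_t value;
    } Counter;

    static void counter_init(Counter *c)
    {
        seqlock_init(&c->sequence);
        qemu_mutex_init(&c->lock);
        c->value = 0;
    }

    static void counter_set(Counter *c, int64_t v)
    {
        qemu_mutex_lock(&c->lock);             /* serialize writers ourselves */
        seqlock_write_begin(&c->sequence);
        c->value = v;
        seqlock_write_end(&c->sequence);
        qemu_mutex_unlock(&c->lock);
    }

    static int64_t counter_get(Counter *c)
    {
        unsigned start;
        int64_t v;

        do {                                   /* lock-free read with retry */
            start = seqlock_read_begin(&c->sequence);
            v = c->value;
        } while (seqlock_read_retry(&c->sequence, start));
        return v;
    }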
    
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Richard Henderson <rth@twiddle.net>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: 21ecb89
  14. compiler.h: add QEMU_ALIGNED() to enforce struct alignment
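
    Presumably the macro wraps the GCC/Clang aligned attribute; a hedged sketch
    with a typical use:

    #define QEMU_ALIGNED(X) __attribute__((aligned(X)))

    struct per_cpu_counter {
        unsigned long val;
    } QEMU_ALIGNED(64);   /* e.g. keep each instance on its own cache line */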

    Reviewed-by: Richard Henderson <rth@twiddle.net>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    cota authored and stsquad committed May 3, 2016
    Commit: 108ed51

Commits on Apr 29, 2016

  1. cpu-exec: Move TB chaining into tb_find_fast()

    Move the tb_add_jump() call and surrounding code from cpu_exec() into
    tb_find_fast(). That simplifies cpu_exec() a little by hiding the direct
    chaining optimization details inside tb_find_fast(). It also allows moving
    the tb_lock()/tb_unlock() pair into tb_find_fast(), putting it closer
    to tb_find_slow(), which also manipulates the lock.
    
    Suggested-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    [rth: Fixed rebase typo in nochain test.]
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: e601ccb
  2. tcg: Rework tb_invalidated_flag

    'tb_invalidated_flag' was meant to catch two events:
     * some TB has been invalidated by tb_phys_invalidate();
     * the whole translation buffer has been flushed by tb_flush().
    
    Then it was checked:
     * in cpu_exec() to ensure that the last executed TB can be safely
       linked to directly call the next one;
     * in cpu_exec_nocache() to decide if the original TB should be provided
       for further possible invalidation along with the temporarily
       generated TB.
    
    It is always safe to patch an invalidated TB since it is not going to be
    used anyway. It is also safe to call tb_phys_invalidate() for an already
    invalidated TB. Thus, setting this flag in tb_phys_invalidate() is
    simply unnecessary. Moreover, it can prevent perfectly proper linking of
    TBs whenever any arbitrary TB has been invalidated. So just don't touch it
    in tb_phys_invalidate().
    
    Since this flag is now only used to catch whether tb_flush() has been
    called, rename it to 'tb_flushed'. Declare it as 'bool' and stick to using
    only 'true' and 'false' to set its value. Also, instead of setting it in
    tb_gen_code(), just after tb_flush() has been called, do it right inside
    of tb_flush().
    
    In cpu_exec(), this flag is used to track if tb_flush() has been called
    and have made 'next_tb' (a reference to the last executed TB) invalid
    for linking it to directly call the next TB. tb_flush() can be called
    during the CPU execution loop from tb_gen_code(), during TB execution or
    by another thread while 'tb_lock' is released. Catch translation buffer
    flushes reliably by resetting this flag once before the first TB lookup
    and each time we find it set before trying to add a direct jump. Don't
    touch it in tb_find_physical().
    
    Each vCPU has its own execution loop in multithreaded mode and thus
    should have its own copy of the flag to be able to reset it with its own
    'next_tb' without affecting any other vCPU's execution thread. So make this
    flag per-vCPU and move it to CPUState.
    
    In cpu_exec_nocache(), we only need to check if tb_flush() has been
    called from tb_gen_code() called by cpu_exec_nocache() itself. To do
    this reliably, preserve the old value of the flag, reset it before
    calling tb_gen_code(), check it afterwards, and combine the saved value
    back into the flag.
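
    In code, that save/reset/check/combine pattern looks roughly like the sketch
    below (simplified argument list; not the literal hunk):

    static TranslationBlock *gen_nocache_tb(CPUState *cpu, TranslationBlock *orig_tb,
                                            int cflags)
    {
        bool was_flushed = cpu->tb_flushed;               /* preserve the old value */
        TranslationBlock *tb;

        cpu->tb_flushed = false;                          /* reset before generating */
        tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags, cflags);
        tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;   /* check afterwards */
        cpu->tb_flushed |= was_flushed;                   /* combine the saved value back */
        return tb;
    }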
    
    This patch is based on the patch "tcg: move tb_invalidated_flag to
    CPUState" from Paolo Bonzini <pbonzini@redhat.com>.
    
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: 2b9d138
  3. tcg: Clean up from 'next_tb'

    The value returned from tcg_qemu_tb_exec() is the value passed to the
    corresponding tcg_gen_exit_tb() at translation time of the last TB
    attempted to execute. It is a little confusing to store it in a variable
    named 'next_tb'. In fact, it is a combination of a 4-byte-aligned pointer
    and additional information in its two least significant bits. Break it
    down right away into two variables named 'last_tb' and 'tb_exit', which
    are a pointer to the last TB attempted to execute and the TB exit
    reason, respectively. This simplifies the code and improves its
    readability.
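
    A hedged sketch of the decomposition (TB_EXIT_MASK covers the two least
    significant bits; the surrounding cpu_exec() code is elided):

    static inline TranslationBlock *split_tb_exec_ret(uintptr_t ret, int *tb_exit)
    {
        *tb_exit = ret & TB_EXIT_MASK;
        return (TranslationBlock *)(ret & ~(uintptr_t)TB_EXIT_MASK);
    }

    /* usage:  ret = tcg_qemu_tb_exec(env, tb_ptr);
     *         last_tb = split_tb_exec_ret(ret, &tb_exit);  */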
    
    Correct a misleading documentation comment for tcg_qemu_tb_exec() and
    fix logging in cpu_tb_exec(). Also rename a misleading 'next_tb' in
    another couple of places.
    
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: c903ce8
  4. cpu-exec: elide more icount code if CONFIG_USER_ONLY

    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    [Alex Bennée: #ifndef replay code to match elided functions]
    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    bonzini authored and rth7680 committed Apr 29, 2016
    Commit: ef4a928
  5. tcg: reorganize tb_find_physical loop

    Add some comments and improve the code structure. This should make the
    code easier to read.
    
    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    [Sergey Fedorov: provide commit message; bring back resetting of
    tb_invalidated_flag]
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Reviewed-by: Richard Henderson  <rth@twiddle.net>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    stsquad authored and rth7680 committed Apr 29, 2016
    Commit: adb7427
  6. tcg: code_bitmap is not used by user-mode emulation

    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    [Sergey Fedorov: eliminate the field entirely in user-mode]
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Reviewed-by: Richard Henderson  <rth@twiddle.net>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    bonzini authored and rth7680 committed Apr 29, 2016
    Commit: 2926283
  7. tcg: Allow goto_tb to any target PC in user mode

    In user mode, there is only static address translation, TBs are always
    invalidated properly, and direct jumps are reset when the mapping changes.
    Thus the destination address is always valid for direct jumps, and
    there's no need to restrict it to the pages the TB resides in.
    
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Cc: Riku Voipio <riku.voipio@iki.fi>
    Cc: Blue Swirl <blauwirbel@gmail.com>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: 0ddc3f4
  8. tcg: Clean up direct block chaining safety checks

    We don't take care of direct jumps when the address mapping changes. Thus
    we must be sure to generate direct jumps so that they remain valid even if
    the address mapping changes. Luckily, we only allow a TB to be executed if
    it was generated from pages that match the current mapping.
    
    Document tcg_gen_goto_tb() declaration and note the reason for
    destination PC limitations.
    
    Some targets with variable-length instructions allow a TB to straddle a
    page boundary. However, we make sure that both of a TB's pages match the
    current address mapping when looking up TBs, so it is safe to make direct
    jumps into either page. Correct the checks for some of those targets.
    
    Given that, we can safely patch a TB which spans two pages. Remove the
    unnecessary check in cpu_exec() and allow such TBs to be patched.
    
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: 301d27e
  9. tcg: Clean up tb_jmp_unlink()

    Unify the code of this function with tb_jmp_remove_from_list(). Making
    these functions similar improves their readability. Also this could be a
    step towards making this function thread-safe.
    
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: 75edaad
  10. tcg: Extract removing of jumps to TB from tb_phys_invalidate()

    Move the code for removing jumps to a TB out of tb_phys_invalidate() to
    a separate static inline function tb_jmp_unlink(). This simplifies
    tb_phys_invalidate() and improves code structure.
    
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: b08165d
  11. tcg: Rename tb_jmp_remove() to tb_remove_from_jmp_list()

    tb_jmp_remove() was only used to remove the TB from a list of all TBs
    jumping to the same TB, which is the n-th jump destination of the given TB.
    Add a comment briefly describing the function's behavior and rename it to
    better reflect its purpose.
    
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: 77d4722
  12. tcg: Clarify thread safety check in tb_add_jump()

    The check is to make sure that another thread hasn't already done the
    same while we were outside of tb_lock. Mention this in a comment.
    
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: 61665ce
  13. tcg: Init TB's direct jumps before making it visible

    Initialize the TB's direct jump list data fields and reset the jumps before
    tb_link_page() puts it into the physical hash table and the physical
    page list, so the TB is completely initialized before it becomes visible.
    
    This is a pure rearrangement of code to a more suitable place, though it
    could be a preparation for relaxing the locking scheme in the future.
    
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: fcc4269
  14. tcg: Rearrange tb_link_page() to avoid forward declaration

    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: fe0f7fb
  15. tcg: Use uintptr_t type for jmp_list_{next|first} fields of TB

    These fields do not contain pure pointers to a TranslationBlock
    structure. So uintptr_t is the most appropriate type for them.
    Also add some asserts to ensure that the two least significant bits of
    the pointer are always zero before assigning it to jmp_list_first.
    
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: 2a85b04
  16. tcg: Clean up direct block chaining data fields

    Briefly describe in a comment how direct block chaining is done. It
    should help in understanding the following data fields.
    
    Rename some fields in TranslationBlock and TCGContext structures to
    better reflect their purpose (dropping excessive 'tb_' prefix in
    TranslationBlock but keeping it in TCGContext):
       tb_next_offset  =>  jmp_reset_offset
       tb_jmp_offset   =>  jmp_insn_offset
       tb_next         =>  jmp_target_addr
       jmp_next        =>  jmp_list_next
       jmp_first       =>  jmp_list_first
    
    Avoid using a magic constant as an invalid offset which is used to
    indicate that there's no n-th jump generated.
    
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: 705b39e
  17. translate-all: Adjust 256mb testing for mips64

    Make sure we preserve the high 32-bits when masking for mips64.
    
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    rth7680 committed Apr 29, 2016
    Commit: bd5f21c
  18. translate-all: add missing munmap of the code_gen guard page for MIPS

    Signed-off-by: Emilio G. Cota <cota@braap.org>
    Message-Id: <1461283314-2353-2-git-send-email-cota@braap.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    cota authored and rth7680 committed Apr 29, 2016
    Commit: 697cb3a
  19. translate-all: remove redundant setting of tcg_ctx.code_gen_buffer_size

    The setting of tcg_ctx.code_gen_buffer_size is done by the only caller of
    size_code_gen_buffer(), which is code_gen_alloc():
    
      $ git grep size_code_gen_buffer
      translate-all.c:static inline size_t size_code_gen_buffer(size_t tb_size)
      translate-all.c:    tcg_ctx.code_gen_buffer_size = size_code_gen_buffer(tb_size);
    
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    Message-Id: <1461283314-2353-1-git-send-email-cota@braap.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    cota authored and rth7680 committed Apr 29, 2016
    Commit: 69e61e2
  20. tcg: Note requirement on atomic direct jump patching

    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Message-Id: <1461341333-19646-12-git-send-email-sergey.fedorov@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: ad7d542
  21. tcg/mips: Make direct jump patching thread-safe

    Ensure direct jump patching in MIPS is atomic by using
    atomic_read()/atomic_set() for code patching.
    
    Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
    Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
    Message-Id: <1461341333-19646-11-git-send-email-sergey.fedorov@linaro.org>
    Signed-off-by: Richard Henderson <rth@twiddle.net>
    [rth: Merged the deposit32 followup.]
    sergefdrv authored and rth7680 committed Apr 29, 2016
    Commit: 9e5b1f0