zstd #16

michael-schwarz · 2022-01-27T10:23:35Z

https://github.com/facebook/zstd/ at 7543085013db1a20a848d166e5931edc49e3cc2f
- compiledb succeeds
- Fails as Goblint lacks support for some Intel SIMD things defined in xmmintrin.h:
  /usr/lib/gcc/x86_64-linux-gnu/7/include/xmmintrin.h:120: Unimplemented: doInit: unexpected NEXT_INIT for float __attribute__((__vector_size__(16),__may_alias__))
- Ignore intrinsics and build with make zstd

diff --git a/programs/Makefile b/programs/Makefile
index 16763e49..8ff545b8 100644
--- a/programs/Makefile
+++ b/programs/Makefile
@@ -138,7 +138,7 @@ allVariants: zstd zstd-compress zstd-decompress zstd-small zstd-frugal zstd-nole
 zstd : CPPFLAGS += $(THREAD_CPP) $(ZLIBCPP) $(LZMACPP) $(LZ4CPP)
 zstd : LDFLAGS += $(THREAD_LD) $(DEBUGFLAGS_LD)
 zstd : LDLIBS += $(ZLIBLD) $(LZMALD) $(LZ4LD)
-zstd : CPPFLAGS += -DZSTD_LEGACY_SUPPORT=$(ZSTD_LEGACY_SUPPORT)
+zstd : CPPFLAGS += -DZSTD_LEGACY_SUPPORT=$(ZSTD_LEGACY_SUPPORT) -DZSTD_NO_INTRINSICS
 ifneq (,$(filter Windows%,$(OS)))
 zstd : $(RES_FILE)
 endif

Need to comment the contents of includes/assert.h of Goblint for the parsing to succeed?

time ./goblint ../../bench-repos/zstd --disable ana.base.context.non-ptr --disable sem.unknown_function.spawn --set ana.thread.domain plain --enable exp.earlyglobs --set ana.base.privatization none --enable dbg.print_dead_code --enable justcil  --disable include_stdlib --set cppflags[+] -DZSTD_NO_INTRINSICS -v &> zstd.cil.c

real    0m5,836s
user    0m5,495s
sys     0m0,308s

So parsing and everything is also non-horrible here. The program does contain threads.

With ./goblint zstd.cil.c --disable ana.base.context.non-ptr --disable sem.unknown_function.spawn --set ana.thread.domain plain --enable exp.earlyglobs --set ana.base.privatization none --enable dbg.print_dead_code &> zstd.cil.out -v

Summary for all memory locations:
	safe:         2388
	vulnerable:     14
	unsafe:         16
	-------------------
	total:        2418

vars = 170735    evals = 432968  

Timings:
TOTAL                          224.473 s
  parse                           0.544 s
  convert to CIL                  0.694 s
  analysis                       223.237 s
    global_inits                    1.085 s
    solving                        221.102 s
      S.Dom.equal                     0.712 s
      postsolver                     52.939 s
    warn_global                     0.131 s
Timing used
Memory statistics: total=296341.32MB, max=934.02MB, minor=296304.66MB, major=2087.03MB, promoted=2050.37MB
    minor collections=141306  major collections=28 compactions=0

Found dead code on 3427 lines (including 3087 in uncalled functions)!
Total lines (logical LoC): 21415

The text was updated successfully, but these errors were encountered:

michael-schwarz · 2022-01-27T14:23:02Z

Ah, and it requires goblint/analyzer#557 again

michael-schwarz · 2022-01-27T16:58:35Z

It works non-incrementally, but when I try to run it (on master) incrementally (without any changes to the code) I get

Fatal error: exception Invalid_argument("List.iter2")
Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45
Called from UpdateCil.update_ids.reset_fun in file "src/incremental/updateCil.ml", line 46, characters 4-70
Called from Stdlib__list.iter in file "list.ml", line 110, characters 12-15
Called from UpdateCil.update_ids in file "src/incremental/updateCil.ml", line 126, characters 2-43
Called from Maingoblint.diff_and_rename in file "src/maingoblint.ml", line 457, characters 22-92
Called from Maingoblint.main in file "src/maingoblint.ml", line 509, characters 110-130
Called from Stdlib.at_exit.new_exit in file "stdlib.ml", line 554, characters 59-63
Called from Stdlib.do_at_exit in file "stdlib.ml" (inlined), line 560, characters 20-61
Called from Std_exit in file "std_exit.ml", line 18, characters 8-20

@stilscher said she'd take a look. May be related to goblint/analyzer#574

sim642 · 2022-01-27T17:03:00Z

The stacktrace looks very similar to the issue I accidentally happened upon indeed: goblint/analyzer#542 (comment). Although actually it doesn't stem from there being a different number of statements, but rather locals:

List.iter2 (fun l o_l -> l.vid <- o_l.vid) f.slocals old_f.slocals;

stilscher · 2022-02-01T17:28:50Z

We managed to get the incremental analysis to work with the additional option --set cppflags[+] -D_FORTIFY_SOURCE=0.

The problem were multiple functions with the name realpath in the CIL file. The comparison and updating of ids for the incremental analysis assume that the functions in the CIL file have unique names. The realpath definition can be found in the /usr/include/x86_64-linux-gnu/bits/stdlib.h header. The mentioned header is only included when the fortify level is greater than 0. We have not looked into, why multiple realpath functions exist in the CIL file. It might be related to the merging errors I found in #8. However, analyzing with the above mentioned flag could be a reasonable approach for now.

But there are also duplicate global variables in the CIL file at which one needs to look.

stilscher · 2022-02-01T17:44:43Z

A not too rigorous configuration that worked well for an analysis (took 5-6min for the non-incremental run) is:

./goblint ../test-repos/zstd/ --disable sem.unknown_function.spawn --enable exp.earlyglobs --set ana.base.privatization none --enable incremental.save --enable dbg.print_dead_code --set cppflags[+] -DZSTD_NO_INTRINSICS --set cppflags[+] -D_FORTIFY_SOURCE=0 &> zstd.cil.out -v

Additionally adding intervals led to a steadily increasing number of contexts and I aborted after ~30min.
In the incremental run (without any changes) a big amount of time was needed for the postsolving. So it would be interesting to see how long the incremental analysis (without and with changes) takes when including the incremental postsolving from @sim642 that is currently on the interactive branch.

michael-schwarz · 2022-02-02T07:26:04Z

You might also want to use -DGOBLINT_NO_BSEARCH to remove some merging conflicts.

If we want to get the runtime a bit higher, we could also try:

enabling more int domains (e.g. congruence or enums)
enabling intervals and not them into contexts

michael-schwarz · 2022-02-02T14:13:02Z

Also, instead of deleting the contents of assert.h, it suffices to set -DGOBLINT_NO_ASSERT as an additional preprocessor flag.

michael-schwarz · 2022-02-16T09:08:43Z

After bumping goblint-cil, we now have zero CIL warnings when working with this repo 💪

jerhard · 2022-02-24T09:51:06Z

For the incremental run on zstd even on a program without any changes, the solving was reported to take around 3 seconds, out of ~8-12 seconds total. Which is noteworthy because ideally the solving should take ~0 seconds for unchanged code.

Yesterday we, @stilscher @michael-schwarz and I, looked further into this. It turns out that most of this incremental solving time, around >2,7 seconds, was spent on the hascons-relifting. If one disables hashconsing, the overall runtime for a non-incremental run is somewhat degraded, but not too much (roughly from 131 seconds to 138 seconds). There was some significant on memory usage though, IIRC. @stilscher has the precise numbers.

There is no easy fix to get rid of the relifting in the non-server incremental mode, as weak hashtables used for hashconsing cannot be marshaled (at least not directly). For the server mode, one could probably get away with skipping the relifting in the solver, but having to run experiments in server mode might make things more complicated.

One might just prefer to use the incremental analysis without hash-consing.

stilscher · 2022-03-03T17:56:42Z

Here are the numbers that we collected for a non-incremental run with hashconsing:

TOTAL                          131.208 s
  parse                           0.497 s
  convert to CIL                  0.683 s
  analysis                       130.029 s
    createCFG                       0.502 s
    global_inits                    0.671 s
    solving                        126.195 s
      vs                              0.001 s
      Sol'.solve                     125.932 s
        relift                          0.000 s
        S.Dom.equal                     0.414 s
        postsolver                     29.456 s
      split_solution                  0.263 s
    warn_global                     0.063 s
      access                          0.050 s
Timing used
Memory statistics: total=497409.75MB, max=1878.66MB, minor=497216.06MB, major=3929.75MB, promoted=3736.05MB
    minor collections=237121  major collections=28 compactions=0

and without hashconsing

TOTAL                          138.896 s
  parse                           0.535 s
  convert to CIL                  0.698 s
  analysis                       137.664 s
    createCFG                       0.499 s
    global_inits                    0.076 s
    solving                        129.479 s
      vs                              0.001 s
      Sol'.solve                     128.823 s
        relift                          0.000 s
        S.Dom.equal                     4.364 s
        postsolver                     41.818 s
      split_solution                  0.657 s
    warn_global                     0.086 s
      access                          0.068 s
Timing used
Memory statistics: total=486846.38MB, max=2484.53MB, minor=486616.93MB, major=4642.06MB, promoted=4412.62MB
    minor collections=232065  major collections=28 compactions=0

michael-schwarz · 2022-03-11T09:31:20Z

I tried adding "ZSTD_customMalloc","ZSTD_customCalloc","POOL_create" as malloc wrappers in a hope to get the number of race warnings down, but that did not help.

sim642 · 2022-03-15T14:05:53Z

If inlines merging is enabled again, aren't all the "is used for two distinct globals" warnings just due to the silly way zstd Makefile uses relative paths? If you read the warnings, they're all about mem.h, but referenced in different ways:

../lib/common/mem.h
../lib/legacy/../common/mem.h
../lib/dictBuilder/../common/mem.h
../lib/decompress/../common/mem.h
../lib/compress/../common/mem.h

sim642 · 2022-04-15T12:09:46Z

Yesterday I looked into the data races Goblint finds in zstd. Besides the valid data race on threadLimit as pointed out here, there's two kinds of spurious races we still find if we additionally activate symb_locks (but not var_eq!):

In POOL_create_advanced there are accesses to a freshly allocated struct before the threads are created in a loop. A freshness/thread-locality/escape analysis should hopefully be able to rule these out.
Accesses in functions like POOL_add_internal, which are called from functions that always call them with a lock held (POOL_add). Ideally we'd have a symbolic lock held there, but var_eq is required to preserve the symbolic lock from the outer function as the pointers get passed in via arguments to the inner function.

Unfortunately, also enabling var_eq causes the analysis to not terminate in at least 40 minutes:

runtime: 00:40:23.881
vars: 2497599, evals: 3116757
max updates: 32 for var node 4469 "ret = (FIO_prefs_t *)((FIO_prefs_t *)tmp);" on programs/fileio.c:229:5-229:71

|rho|=2497599
|called|=325
|stable|=2465598
|infl|=2497070
|wpoint|=232
|side_dep|=0
|side_infl|=0
Found 260457 contexts for 1193 functions. Top 5 functions:
24901	contexts for entry state of MEM_readST___17 (172569) on lib/common/mem.h:212:1-212:115
18684	contexts for entry state of MEM_isLittleEndian___18 (172565) on lib/common/mem.h:158:1-176:1
18583	contexts for entry state of MEM_read32___18 (172567) on lib/common/mem.h:210:1-210:110
17048	contexts for entry state of MEM_64bits___18 (172564) on lib/common/mem.h:156:1-156:95
11366	contexts for entry state of ZSTD_NbCommonBytes___4 (172783) on lib/compress/zstd_compress_internal.h:698:1-792:1

Memory statistics: total=5314342.41MB, max=7600.24MB, minor=5313863.33MB, major=40634.39MB, promoted=40155.32MB
    minor collections=2533967  major collections=70 compactions=0

With just symb_locks, it terminates in just 5 minutes:

runtime: 00:04:56.073
vars: 258037, evals: 797082
max updates: 57 for var node 2417 "flNb ++;" on programs/zstdcli.c:1249:9-1256:9

|rho|=258037
|called|=124
|stable|=236877
|infl|=257600
|wpoint|=1858
|side_dep|=0
|side_infl|=0
Found 12771 contexts for 1709 functions. Top 5 functions:
64	contexts for entry state of longCommandWArg (169760) on programs/zstdcli.c:428:1-434:1
58	contexts for entry state of FSE_readNCount_bmi2 (24917) on lib/common/entropy_common.c:234:1-245:1
58	contexts for entry state of FSE_readNCount_body (170219) on lib/common/entropy_common.c:69:1-215:1
58	contexts for entry state of FSE_readNCount_body_default (170220) on lib/common/entropy_common.c:218:1-223:1
57	contexts for entry state of BITv05_reloadDStream (175604) on lib/legacy/zstd_v05.c:857:1-884:1

Memory statistics: total=838726.42MB, max=1074.12MB, minor=838637.35MB, major=4296.13MB, promoted=4207.05MB
    minor collections=399930  major collections=37 compactions=0

Notably, var_eq causes an extreme number of contexts to appear compared to without it, so that's probably the source of non-termination. I haven't yet tried looking into what variable equalities end up collecting into those contexts.

sim642 · 2022-04-18T13:22:46Z

Notably, var_eq causes an extreme number of contexts to appear compared to without it, so that's probably the source of non-termination. I haven't yet tried looking into what variable equalities end up collecting into those contexts.

The issue is avoided using --set ana.ctx_insens[+] var_eq. Then the analysis terminates reasonably and the previously problematic inner accesses get the required symbolic lock.

See goblint/bench#16 (comment).

sim642 · 2022-04-19T09:38:56Z

By the way, the bigger-benchmarks confs don't declare ZSTD_customMalloc and ZSTD_customCalloc as malloc wrappers, so all the allocations are joined into a single constraint unknown with a supertop value.
So any pointers written to allocated structs are just forgotten and wouldn't get called later when dereferenced. This could have big implications for soundness...

sim642 · 2022-04-19T12:02:07Z

Using goblint/analyzer#690, it's actually possible to get rid of all the spurious races now.

sim642 · 2022-04-28T09:10:10Z

After all the unsoundness saga and subsequent precision improvement attempt, it's revealed that zstd has some intricate race-freedom reasons, which Goblint is far from being able to handle: goblint/analyzer#707 (comment).

TimOrtel · 2022-07-31T09:55:28Z

I am getting this error when trying to run with efficiency.py: Fatal error: exception Failure("No suitable function to start from.")
This is the config file i am using: https://github.com/TimOrtel/analyzer/blob/175cdf05c2f4209503bf02e81ae7a06259e89e73/conf/min_incr_zstd.json

Am I missing some kind of config option?

michael-schwarz · 2022-07-31T18:03:25Z

The most likely culprit would be that the compilation database is not generated correctly. Have you checked the contents of the compile_commands.json?

TimOrtel · 2022-07-31T22:42:42Z

Compile commands is empty: []
prepare.log has the following lines:
==> building with threading support ==> building zstd with .gz compression support ==> no liblzma, building zstd without .xz/.lzma support ==> no liblz4, building zstd without .lz4 support LINK obj/conf_....../zstd zstd build completed

sim642 · 2022-08-01T09:07:38Z

You probably need to do make clean and recreate the compilation database. If make has everything cached and does nothing, then the compilation database will not contain anything.

TimOrtel · 2022-08-06T21:37:10Z

This cannot be the case, because the script efficency.py clones zstd from git on every execution. Therefore make is executed every time on a fresh zstd folder.

michael-schwarz · 2022-08-08T11:19:37Z

I think @stilscher is the one best equipped to help you here, she did the benchmarking for zstd for our last submission.

stilscher · 2022-08-09T07:33:57Z

I was not able to reconstruct the problem. I ran python3 efficiency.py /absolute/path/to/the/analyzer 1 on the branch on your fork that contains the scripts (I fixed the make error by substituting Seq.exists in detectRenamedFunctionsRecursive.ml because it does not exist in the included module). prepare.log does not contain any error messages, a compilation database is created and a main method found and analyzed.
Which compiledb version are you using?

TimOrtel · 2022-08-09T08:01:29Z

Perhaps I am using the wrong compiledb. I installed mine through pip. The version is 0.10.1. Is there another compiledb tool?

stilscher · 2022-08-09T08:20:40Z

Not that I know of. I am using the same version, also installed through pip.

TimOrtel · 2022-08-09T08:44:30Z

Are you sure you are on the correct branch? I dont get this make error. Perhaps we also have differen opam switch configurations?

michael-schwarz added project Project to analyze parsing-succeeds labels Jan 27, 2022

michael-schwarz mentioned this issue Jan 27, 2022

Add option include_stdlib to toggle inclusion of Goblint's stdlib.c goblint/analyzer#576

Closed

sim642 mentioned this issue Jan 27, 2022

Handle pseudo return node in CFG comparison goblint/analyzer#574

Closed

michael-schwarz mentioned this issue Feb 1, 2022

Preprocessor directives to disable Goblint includes goblint/analyzer#584

Merged

michael-schwarz mentioned this issue Feb 2, 2022

Proposal: Do not attempt to merge inline functions by default goblint/cil#72

Merged

michael-schwarz mentioned this issue Feb 15, 2022

Make __attribute__((__fallthrough__)); no longer parse as declaration goblint/cil#75

Merged

sim642 added the goblint Goblint-specific problem label Feb 22, 2022

michael-schwarz mentioned this issue Feb 22, 2022

Add more invalidate actions for library functions goblint/analyzer#608

Merged

stilscher mentioned this issue Feb 28, 2022

Do not allow to collect globals with duplicate identifiers in map goblint/analyzer#615

Merged

stilscher mentioned this issue Mar 8, 2022

__LINE__ in macro can cause several functions to be changed goblint/analyzer#630

Open

sim642 added a commit to goblint/analyzer that referenced this issue Apr 18, 2022

Add skipped test where var_eq gets exponentially many contexts

58825b9

See goblint/bench#16 (comment).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zstd #16

zstd #16

michael-schwarz commented Jan 27, 2022 •

edited by stilscher

Loading

michael-schwarz commented Jan 27, 2022

michael-schwarz commented Jan 27, 2022

sim642 commented Jan 27, 2022

stilscher commented Feb 1, 2022 •

edited

Loading

stilscher commented Feb 1, 2022 •

edited

Loading

michael-schwarz commented Feb 2, 2022

michael-schwarz commented Feb 2, 2022

michael-schwarz commented Feb 16, 2022

jerhard commented Feb 24, 2022 •

edited

Loading

stilscher commented Mar 3, 2022

michael-schwarz commented Mar 11, 2022

sim642 commented Mar 15, 2022

sim642 commented Apr 15, 2022

sim642 commented Apr 18, 2022

sim642 commented Apr 19, 2022

sim642 commented Apr 19, 2022

sim642 commented Apr 28, 2022

TimOrtel commented Jul 31, 2022

michael-schwarz commented Jul 31, 2022

TimOrtel commented Jul 31, 2022

sim642 commented Aug 1, 2022

TimOrtel commented Aug 6, 2022

michael-schwarz commented Aug 8, 2022

stilscher commented Aug 9, 2022 •

edited

Loading

TimOrtel commented Aug 9, 2022

stilscher commented Aug 9, 2022

TimOrtel commented Aug 9, 2022

zstd #16

zstd #16

Comments

michael-schwarz commented Jan 27, 2022 • edited by stilscher Loading

michael-schwarz commented Jan 27, 2022

michael-schwarz commented Jan 27, 2022

sim642 commented Jan 27, 2022

stilscher commented Feb 1, 2022 • edited Loading

stilscher commented Feb 1, 2022 • edited Loading

michael-schwarz commented Feb 2, 2022

michael-schwarz commented Feb 2, 2022

michael-schwarz commented Feb 16, 2022

jerhard commented Feb 24, 2022 • edited Loading

stilscher commented Mar 3, 2022

michael-schwarz commented Mar 11, 2022

sim642 commented Mar 15, 2022

sim642 commented Apr 15, 2022

sim642 commented Apr 18, 2022

sim642 commented Apr 19, 2022

sim642 commented Apr 19, 2022

sim642 commented Apr 28, 2022

TimOrtel commented Jul 31, 2022

michael-schwarz commented Jul 31, 2022

TimOrtel commented Jul 31, 2022

sim642 commented Aug 1, 2022

TimOrtel commented Aug 6, 2022

michael-schwarz commented Aug 8, 2022

stilscher commented Aug 9, 2022 • edited Loading

TimOrtel commented Aug 9, 2022

stilscher commented Aug 9, 2022

TimOrtel commented Aug 9, 2022

michael-schwarz commented Jan 27, 2022 •

edited by stilscher

Loading

stilscher commented Feb 1, 2022 •

edited

Loading

stilscher commented Feb 1, 2022 •

edited

Loading

jerhard commented Feb 24, 2022 •

edited

Loading

stilscher commented Aug 9, 2022 •

edited

Loading