Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug with parallel jobs? #7

Open
ghost opened this issue Aug 3, 2022 · 50 comments
Open

Bug with parallel jobs? #7

ghost opened this issue Aug 3, 2022 · 50 comments

Comments

@ghost
Copy link

ghost commented Aug 3, 2022

When I run readpst -M -b file.pst it extracts less mail, and a slightly different amount each time, than when it's run with the -j 0 option. An older libpst 0.71 extracts all mail in parallel without -j 0 so maybe there's some concurrency bug that got introduced since then? I can send you pst files if you have trouble reproducing, but for me this happens on basically every .pst file.

@pabs3
Copy link
Member

pabs3 commented Aug 4, 2022 via email

@ghost
Copy link
Author

ghost commented Aug 4, 2022

It seems this bug doesn't depend on the version of libpst, but the version of debian. I built the most recent and version 0.6.71 on a debian 11 server and both versions had this behavior while building the same source on debian 10 worked fine. I can reproduce it with the minimal debian 11 container image from https://images.linuxcontainers.org/ and lxc. After:

apt update && apt install wget readpst
wget https://files.enlacehacktivista.org/Pronico%20mail/aBrezhnev.pst
readpst  -D -M -b aBrezhnev.pst
readpst  -D -M -b aBrezhnev.pst
readpst  -D -M -b aBrezhnev.pst
readpst  -j 0 -D -M -b aBrezhnev.pst
du -h --max-depth=1

I get:

419M    ./aBrezhnev
408M    ./aBrezhnev1
417M    ./aBrezhnev2
436M    ./aBrezhnev3

The last one with -j 0 is the largest, while the ones with parallel jobs miss some random amount of mail

@pabs3
Copy link
Member

pabs3 commented Aug 4, 2022

I can reproduce it with the file you linked now.

Thanks for the debugging, that is a very strange bug.

Perhaps it is a compiler or optimsation issue.

I will try to figure out what is going on.

@pabs3
Copy link
Member

pabs3 commented Aug 4, 2022

It definitely isn't affected by the optimisation level. Seems to happen with GCC 9-12 but not GCC 8 on Debian buster.

@pabs3
Copy link
Member

pabs3 commented Aug 4, 2022

I'll continue looking at this tomorrow, but if you have any more ideas for debugging, that would be great.

@ghost
Copy link
Author

ghost commented Aug 4, 2022

Yea looks like this could be hard to debug, I "fixed" by just running with -j 0 and xargs -P to parallelize. I tried building with clang on both servers and again it works fine on debian 10, not on 11. Of course it's different versions, llvm-7 vs llvm-11 but it doesn't seem likely that both newer clang and gcc would cause compiler issue. readpst also links a lot of libraries, maybe it's caused by a different version of some library between debian 10 and 11?

@pabs3
Copy link
Member

pabs3 commented Aug 4, 2022 via email

@pabs3
Copy link
Member

pabs3 commented Aug 5, 2022

Looks like you are onto something, building the binary on buster and then copying it to bullseye breaks the parallel extraction even though it still runs fine.

@pabs3
Copy link
Member

pabs3 commented Aug 5, 2022

Upgrading libgsf-1-114 didn't change anything, but upgrading libc6 libgd3 libglib2.0-0 libstdc++6 (and their deps) did. Their dependency versions prevented upgrading them individually.

@pabs3
Copy link
Member

pabs3 commented Aug 6, 2022 via email

@pabs3
Copy link
Member

pabs3 commented Aug 6, 2022

According to the debbisect process, upgrading libc6 from 2.28-10 to 2.29-1 broken the jobs feature.

@pabs3 pabs3 closed this as completed Aug 6, 2022
@pabs3 pabs3 reopened this Aug 6, 2022
@pabs3
Copy link
Member

pabs3 commented Aug 8, 2022

I checked this issue on a Fedora 36 live image with glibc 2.35 and found that it still occurs there. I'll discuss it further with the Debian and upstream glibc maintainers.

@ghost
Copy link
Author

ghost commented Aug 8, 2022 via email

@pabs3
Copy link
Member

pabs3 commented Aug 9, 2022 via email

@pabs3
Copy link
Member

pabs3 commented Aug 10, 2022

After a false start I managed to redo the bisect and got this bad commit:

0b727ed4d605d9318cb0d323c88abb0d5a441a9b is the first bad commit
commit 0b727ed4d605d9318cb0d323c88abb0d5a441a9b
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Jul 13 16:43:24 2017 -0300

    libio: Flush stream at freopen (BZ#21037)
    
    As POSIX states [1] a freopen call should first flush the stream as if by a
    call fflush.  C99 (n1256) and C11 (n1570) only states the function should
    first close any file associated with the specific stream.  Although current
    implementation only follow C specification, current BSD and other libc
    implementation (musl) are in sync with POSIX and fflush the stream.
    
    This patch change freopen{64} to fflush the stream before actually reopening
    it (or returning if the stream does not support reopen).  It also changes the
    Linux implementation to avoid a dynamic allocation on 'fd_to_filename'.
    
    Checked on x86_64-linux-gnu.
    
            [BZ #21037]
            * libio/Makefile (tests): Add tst-memstream4 and tst-wmemstream4.
            * libio/freopen.c (freopen): Sync stream before reopen and adjust to
            new fd_to_filename interface.
            * libio/freopen64.c (freopen64): Likewise.
            * libio/tst-memstream.h: New file.
            * libio/tst-memstream4.c: Likewise.
            * libio/tst-wmemstream4.c: Likewise.
            * sysdeps/generic/fd_to_filename.h (fd_to_filename): Change signature.
            * sysdeps/unix/sysv/linux/fd_to_filename.h (fd_to_filename): Likewise
            and remove internal dynamic allocation.
    
    [1] http://pubs.opengroup.org/onlinepubs/9699919799/

:100644 100644 1fdd50a24c2629f064f8cba8d1e45814c12600d9 d0f7f5192973e5f4923f393695fd7ae88ad314ca M	ChangeLog
:040000 040000 3ea0511dc410e70a011c68e8b5e6c06b298486d0 6e6c5b0412c6493cb1b437e9a2a8807f4b9aa346 M	libio
:040000 040000 f510da053f81e8ac2aedff729918f0f4a8727b89 921bde8e25684e7d8a2dc8c8af1c9228451863a3 M	sysdeps
bisect run success

This is the git bisect log, which you can replay to copy/visualise the results locally:

$ git bisect log
git bisect start
# good: [3c03baca37fdcb52c3881e653ca392bba7a99c2b] Update NEWS, version.h, and features.h for glibc 2.28.
git bisect good 3c03baca37fdcb52c3881e653ca392bba7a99c2b
# bad: [56c86f5dd516284558e106d04b92875d5b623b7a] Tag 2.29 release
git bisect bad 56c86f5dd516284558e106d04b92875d5b623b7a
# bad: [f6b3331bbae638d1bb50813fceb429d3b3dc0eb9] mktime: fix bug with Y2038 DST transition
git bisect bad f6b3331bbae638d1bb50813fceb429d3b3dc0eb9
# good: [d90c9b1a121295a5e31810b899ab637f68898857] Invert sense of list of i686-class processors in sysdeps/x86/cpu-features.h.
git bisect good d90c9b1a121295a5e31810b899ab637f68898857
# bad: [5b784e8834ae92961eb6be89863ffb041e9c6e75] Add new ELF note types from Linux 4.19 to elf.h.
git bisect bad 5b784e8834ae92961eb6be89863ffb041e9c6e75
# bad: [2a4b25fad858ccae361afca6d4710e0e6b01b2b5] Use common bits/shm.h for more architectures.
git bisect bad 2a4b25fad858ccae361afca6d4710e0e6b01b2b5
# good: [7b1f9406761331cf35fe521fbdb592beecf68a2c] i386: Use _dl_runtime_[resolve|profile]_shstk for SHSTK [BZ #23716]
git bisect good 7b1f9406761331cf35fe521fbdb592beecf68a2c
# bad: [e7624d708d12c608daf179d90c0d6de74c24dd2c] Adjust name of ld.so in test-container.c.
git bisect bad e7624d708d12c608daf179d90c0d6de74c24dd2c
# bad: [a1ccc0654b4e858022bed10f4fb5d5ab6df58b62] x86: Use RTM intrinsics in pthread mutex lock elision
git bisect bad a1ccc0654b4e858022bed10f4fb5d5ab6df58b62
# good: [d0e91e8c4ef77997c940538eaa752799c9dda2d3] Move MREMAP_* to bits/mman-shared.h.
git bisect good d0e91e8c4ef77997c940538eaa752799c9dda2d3
# bad: [0b727ed4d605d9318cb0d323c88abb0d5a441a9b] libio: Flush stream at freopen (BZ#21037)
git bisect bad 0b727ed4d605d9318cb0d323c88abb0d5a441a9b
# first bad commit: [0b727ed4d605d9318cb0d323c88abb0d5a441a9b] libio: Flush stream at freopen (BZ#21037)

@pabs3
Copy link
Member

pabs3 commented Aug 10, 2022

I note that freopen is used by pst_reopen, which is used by try_fork in readpst. So I'm thinking that maybe freopen is being used incorrectly somehow.

@pabs3
Copy link
Member

pabs3 commented Aug 14, 2022

I used a simple program to check that freopen always has reset the file position, so that isn't the problem and the issue seems more complicated than I thought.

@pabs3
Copy link
Member

pabs3 commented Aug 15, 2022

I am now running the command with the debug log enabled, with and without -j0 in both Debian buster and bullseye, then I'll compare the logs and see if I can see what happened.

@pabs3
Copy link
Member

pabs3 commented Aug 15, 2022 via email

@pabs3
Copy link
Member

pabs3 commented Aug 15, 2022

Confirmed, dropping the debug log introduces the error again...

@pabs3
Copy link
Member

pabs3 commented Aug 15, 2022

After enabling some commented out printf calls, I noticed that one of the readpst child processes crashed with SIGSEGV, investigating.

@pabs3
Copy link
Member

pabs3 commented Aug 15, 2022

Looks like maybe my printf patch caused the SIGSEGV.

@pabs3
Copy link
Member

pabs3 commented Aug 15, 2022

Noticed another thing, only one folder seems affected and it seems the missing items have been skipped. @amulepeweichan can you confirm that you only ever get the issue with one folder and its the same folder as me?

$ readpst -j 0 -D -M -b aBrezhnev.pst
...
Processing Folder "Входящие"
	"Входящие" - 926 items done, 18 items skipped.
...
$ readpst -D -M -b aBrezhnev.pst
...
	"Входящие" - 896 items done, 48 items skipped.
...
$ diff -Naur <(cd aBrezhnev ; find | sort) <(cd aBrezhnev1 ; find | sort)
--- /dev/fd/63	2022-08-15 01:40:32.111623788 +0000
+++ /dev/fd/62	2022-08-15 01:40:32.111623788 +0000
@@ -904,40 +904,10 @@
 ./Входящие/894
 ./Входящие/895
 ./Входящие/896
-./Входящие/897
-./Входящие/898
-./Входящие/899
 ./Входящие/9
 ./Входящие/90
-./Входящие/900
-./Входящие/901
-./Входящие/902
-./Входящие/903
-./Входящие/904
-./Входящие/905
-./Входящие/906
-./Входящие/907
-./Входящие/908
-./Входящие/909
 ./Входящие/91
-./Входящие/910
-./Входящие/911
-./Входящие/912
-./Входящие/913
-./Входящие/914
-./Входящие/915
-./Входящие/916
-./Входящие/917
-./Входящие/918
-./Входящие/919
 ./Входящие/92
-./Входящие/920
-./Входящие/921
-./Входящие/922
-./Входящие/923
-./Входящие/924
-./Входящие/925
-./Входящие/926
 ./Входящие/93
 ./Входящие/94
 ./Входящие/95

@pabs3
Copy link
Member

pabs3 commented Aug 15, 2022

OK, a second run got that folder affected and another one too, nevermind. The skip count increase is exactly the same as the missing item though.

@pabs3
Copy link
Member

pabs3 commented Aug 15, 2022

These debug messages are printed for each extra skipped item:

libpst.c:1283: pst_parse_block() returned an error for d_ptr->desc->i_id [0x3bc4]
readpst.c:262: A NULL item was seen

@pabs3
Copy link
Member

pabs3 commented Aug 15, 2022

The reasons for each unparsed item seem different. I am getting the feeling that the code is reading data from an incorrect randomised location.

libpst.c(1675) ERROR: Unknown block constant - 0x2df4 for id 0x2ae10
libpst.c(1579) Unknown second block constant - 0x3404 0x3004 for id 0x2d54c
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x2d54c]
readpst.c(262) A NULL item was seen

libpst.c(1675) ERROR: Unknown block constant - 0x5f0e for id 0x2d84c
libpst.c(1675) ERROR: Unknown block constant - 0x6f for id 0xc400
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0xc400]
readpst.c(262) A NULL item was seen

libpst.c(1630) internal error (7c.b5 offset 0x20) in reading block id 0xfe06
libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x11922
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x11922]
readpst.c(262) A NULL item was seen

libpst.c(1675) ERROR: Unknown block constant - 0x3a for id 0x19bf0
libpst.c(1675) ERROR: Unknown block constant - 0x64 for id 0x19bf4
libpst.c(1675) ERROR: Unknown block constant - 0x41 for id 0x21ef0
libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x2206a
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x2206a]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x220be
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x220be]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x220f2
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x220f2]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x2219a
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x2219a]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x238b8
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x238b8]
readpst.c(262) A NULL item was seen

libpst.c(1675) ERROR: Unknown block constant - 0x1e41 for id 0x238d6
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x238d6]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x23db6
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x23db6]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x23e5e
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x23e5e]
readpst.c(262) A NULL item was seen

libpst.c(1675) ERROR: Unknown block constant - 0x30 for id 0x23f3a
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x23f3a]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x23fa6
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x23fa6]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x2429a
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x2429a]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x242ee
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x242ee]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x2576a
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x2576a]
readpst.c(262) A NULL item was seen

libpst.c(1675) ERROR: Unknown block constant - 0x64 for id 0x257c8
libpst.c(1675) ERROR: Unknown block constant - 0x6d for id 0x257cc
libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x257f6
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x257f6]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x25866
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x25866]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x258d6
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x258d6]
readpst.c(262) A NULL item was seen

libpst.c(1579) Unknown second block constant - 0x71 0x75 for id 0x25c62
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x25c62]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x25c9a
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x25c9a]
readpst.c(262) A NULL item was seen

libpst.c(1675) ERROR: Unknown block constant - 0x413a for id 0x25fa2
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x25fa2]
readpst.c(262) A NULL item was seen

libpst.c(1586) internal error (bc.b5.desc offset 0x9e0) in reading block id 0x2607c
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x2607c]
readpst.c(262) A NULL item was seen

libpst.c(1567) internal error (bc.b5 offset 0x20) in reading block id 0x26120
libpst.c(1283) pst_parse_block() returned an error for d_ptr->desc->i_id [0x26120]
readpst.c(262) A NULL item was seen

@pabs3
Copy link
Member

pabs3 commented Aug 15, 2022

Aha, managed to reproduce it with a full debug log, so enabling that doesn't fix the issue. Good, that would have been weird :)

@pabs3
Copy link
Member

pabs3 commented Aug 15, 2022

Looking at the full debug log, it seems to get completely different data and if I use hexed to search for the bad data, it is at 0x6AF321C, while the good data is at 0x6600, so my hunch about the read location being wrong seems correct.

@pabs3
Copy link
Member

pabs3 commented Aug 16, 2022

With both -j 1 and -j 2, the file position reported by ftell is 0 after the call to freopen, with both the good and bad versions of glibc.

@pabs3
Copy link
Member

pabs3 commented Aug 16, 2022

@amulepeweichan are you able to reproduce this with any other PST files? The size of the logs with this one makes it challenging to analyse and debug. I tried with the smallest ones on that site and didn't get the error.

@ghost
Copy link
Author

ghost commented Aug 18, 2022 via email

@pabs3
Copy link
Member

pabs3 commented Aug 19, 2022 via email

@pabs3
Copy link
Member

pabs3 commented Aug 21, 2022

Edited the code to delete the output_mutex but that did not help.

@pabs3
Copy link
Member

pabs3 commented Aug 21, 2022

Same for the global_children semaphore.

@pabs3
Copy link
Member

pabs3 commented Oct 11, 2022 via email

@pabs3
Copy link
Member

pabs3 commented Nov 1, 2022

I can still reproduce this in Debian bookworm (GCC 12 glibc 2.35) and sid (GCC 12 glibc 2.36).

@pabs3
Copy link
Member

pabs3 commented Nov 12, 2022

I discovered that switching from freopen() to fclose() and fopen() fixes the issue.

@pabs3
Copy link
Member

pabs3 commented Nov 12, 2022 via email

@pabs3
Copy link
Member

pabs3 commented Nov 23, 2022

I have sent a mail to the libc-help mailing list asking for help figuring this issue out.

@pabs3

This comment was marked as outdated.

@pabs3
Copy link
Member

pabs3 commented Nov 23, 2022

I discovered that switching from freopen() to fseek(0) does not fix the issue and can cause segfaults.

@pabs3
Copy link
Member

pabs3 commented Nov 23, 2022

I discovered that switching from freopen() to fflush()+fclose()+fopen() also does fix the issue.

@fhanzlik
Copy link

I now tried to extract mails from .pst by readpst (from Fedora 37 libpst-0.6.76-9.fc37.x86_64 package), and it seem behave strangely, because number of from .PST extracted messages varies, both when using different -e / -m / -M switches and even when using the same switch repeatedly(!).
An example where I run the readpst 3 times in cycle with different switches and list the number of extracted files:

$ for i in 1 2 3; do for s in e m M; do
  A="outlook.pst"; OD="${A}-$s-$i";
  [ -d "$OD" ] && rm -rf "$OD"; mkdir "$OD";
  echo -e "\n$A -> $OD/ :";
  /usr/bin/time readpst -D -o "$OD" -$s -t e "$A" >"$OD.lst";
  echo -n "No. of files under $OD/ : "; find $OD -type f -print|wc -l;
done; done

I got results:

outlook.pst -> outlook.pst-e-1/ :
8.91user 5.95system 0:01.49elapsed 992%CPU (0avgtext+0avgdata 20600maxresident)k
0inputs+5104768outputs (6major+712881minor)pagefaults 0swaps
No. of files under outlook.pst-e-1/ : 3293

outlook.pst -> outlook.pst-m-1/ :
12.75user 18.02system 0:57.17elapsed 53%CPU (0avgtext+0avgdata 21760maxresident)k
8inputs+15047672outputs (12major+1017676minor)pagefaults 0swaps
No. of files under outlook.pst-m-1/ : 6623

outlook.pst -> outlook.pst-M-1/ :
8.73user 5.92system 0:01.49elapsed 978%CPU (0avgtext+0avgdata 20444maxresident)k
0inputs+5142264outputs (3major+712103minor)pagefaults 0swaps
No. of files under outlook.pst-M-1/ : 3290

outlook.pst -> outlook.pst-e-2/ :
8.38user 5.16system 0:01.40elapsed 964%CPU (0avgtext+0avgdata 20548maxresident)k
0inputs+5133096outputs (6major+618609minor)pagefaults 0swaps
No. of files under outlook.pst-e-2/ : 3301

outlook.pst -> outlook.pst-m-2/ :
12.57user 17.95system 1:53.00elapsed 27%CPU (0avgtext+0avgdata 21832maxresident)k
80inputs+15450920outputs (25major+1001603minor)pagefaults 0swaps
No. of files under outlook.pst-m-2/ : 6645

outlook.pst -> outlook.pst-M-2/ :
8.74user 5.89system 0:01.57elapsed 928%CPU (0avgtext+0avgdata 20508maxresident)k
0inputs+5149504outputs (2major+681726minor)pagefaults 0swaps
No. of files under outlook.pst-M-2/ : 3295

outlook.pst -> outlook.pst-e-3/ :
8.62user 6.34system 0:02.18elapsed 686%CPU (0avgtext+0avgdata 20300maxresident)k
80inputs+5082768outputs (3major+693470minor)pagefaults 0swaps
No. of files under outlook.pst-e-3/ : 3291

outlook.pst -> outlook.pst-m-3/ :
13.13user 19.92system 2:15.04elapsed 24%CPU (0avgtext+0avgdata 21932maxresident)k
64inputs+15884072outputs (15major+1052849minor)pagefaults 0swaps
No. of files under outlook.pst-m-3/ : 6633

outlook.pst -> outlook.pst-M-3/ :
8.38user 6.01system 0:02.30elapsed 625%CPU (0avgtext+0avgdata 20332maxresident)k
88inputs+5162104outputs (3major+672984minor)pagefaults 0swaps
No. of files under outlook.pst-M-3/ : 3295

$ ls -l outlook.pst
-rw-r--r--. 1 jpraisova jpraisova 1950368768 Dec 16  2009 outlook.pst

Number of extracted messages digestedly in the switch/run_number matrix:
-e : 3293	3301	3291
-m : 6623	6645	6633
-M : 3290	3295	3295

Also, isn't it weird that the run time is around two seconds with -e and -M switch, but around two minutes with -m switch?
Again the run time at a glance:

-e : 1.49	1.40		2.18
-m : 57.17	1:53.00	2:15.04
-M : 1.49	1.57		2.30

(it seems as readpst uses multiple CPU threads with -e/-M switch and only one with -m switch, but even then the differences are very big)

Also, according to readpst man page it appears that number of extracted files with '-m' should be double (.EML + .MSG) the number with -e/-M - but is bigger - maybe because there are some (maybe undocumented) files with 'temp_file_attachment' name.
Is there any explanation for this?

@pabs3
Copy link
Member

pabs3 commented Apr 21, 2023 via email

@fhanzlik
Copy link

Hi Paul, thanks for your help!
The previously posted above listing of the weird readpst run results for switches -e, -m, and -M (and 3x each) now looks fine (switch-round, No.of exported files, 'time' outputs) :

e-1 : 3304 : 11.22user 10.32system 9:27.41elapsed 3%CPU (0avgtext+0avgdata 23196maxresident)k	3811000inputs+5162824outputs (41major+6067minor)pagefaults 0swaps
m-1 : 6681 :  6.06user 4.38system 0:24.76elapsed 42%CPU (0avgtext+0avgdata 24036maxresident)k	0inputs     +12711808outputs (2major+7489minor)pagefaults 0swaps
M-1 : 3304 :  3.22user 1.28system 0:04.60elapsed 97%CPU (0avgtext+0avgdata 23676maxresident)k	0inputs      +5162768outputs (0major+6101minor)pagefaults 0swaps

e-2 : 3304 :  3.57user 1.41system 0:10.42elapsed 47%CPU (0avgtext+0avgdata 23676maxresident)k	0inputs      +5162744outputs (0major+6102minor)pagefaults 0swaps
m-2 : 6681 :  6.36user 4.56system 1:44.24elapsed 10%CPU (0avgtext+0avgdata 23968maxresident)k	0inputs     +12711816outputs (0major+7532minor)pagefaults 0swaps
M-2 : 3304 :  3.13user 1.27system 0:04.42elapsed 99%CPU (0avgtext+0avgdata 23652maxresident)k	0inputs      +5162768outputs (0major+6100minor)pagefaults 0swaps

e-3 : 3304 :  3.40user 1.33system 0:07.11elapsed 66%CPU (0avgtext+0avgdata 23676maxresident)k	0inputs      +5162768outputs (0major+6099minor)pagefaults 0swaps
m-3 : 6681 :  5.46user 4.30system 1:39.38elapsed  9%CPU (0avgtext+0avgdata 24228maxresident)k	8inputs     +12711824outputs (0major+7535minor)pagefaults 0swaps
M-3 : 3304 :  3.01user 1.28system 0:06.45elapsed 66%CPU (0avgtext+0avgdata 23676maxresident)k	0inputs      +5162776outputs (0major+6102minor)pagefaults 0swaps

6681 extracted files (-m switch) == 2 * 3304 (-e or -M switch) + 73 'temp_file_attachment' files

There are still minor uncertainties:

  • what are these 'temp_file_attachment' files (accordig to content they are original (raw, not base64) attachment) ? Can be ignored? 'man outlook.pst' say ...cases where the descoffset in the b5 block is zero, and the index2Offset in the 7c block is zero... - does this apply to this?
  • why does processing with -m switch take 15 - 23x longer than processing with -e/-M switches?

Thanks, Franta

@pabs3
Copy link
Member

pabs3 commented Apr 22, 2023 via email

@pabs3
Copy link
Member

pabs3 commented Aug 6, 2023 via email

pabs3 added a commit that referenced this issue Aug 14, 2023
This fixes a data loss problem in the readpst parallel jobs support.

glibc since 2.29 and musl both seek before closing the file descriptor.
Since the file descriptor is still linked to the file description created
and still used by the parent process, the seek changes the file position
of the file description of the file descriptor of the parent process,
which means the parent process reads data from the wrong location and
soon encounters errors, stops processing the data and misses out on
processing the messages that occur later in the PST file.

Since sharing a file description between parent and child processes is
inherently unsafe and there may be other C libraries that seek even before
fclose(), this is only a workaround and a different design is required.

Workaround-for: #7
@pabs3
Copy link
Member

pabs3 commented Aug 14, 2023

I've pushed a workaround for this bug to git. After some other fixes I'll make a new release and also backport it to other releases. At some point the readpst parallel jobs feature will need to be redesigned, probably the new design will need to use mmap.

pabs3 added a commit to pabs3/libpst that referenced this issue Aug 14, 2023
…wards

glibc since 2.29 and musl both change the file position indicator before
reopening the file in the child process.

Workaround-for: pst-format#7
pabs3 added a commit that referenced this issue Aug 14, 2023
This avoids leaving the temporary file behind in the directory.

Suggested-in: #7 (comment)
@pabs3
Copy link
Member

pabs3 commented Aug 14, 2023

@fhanzlik FYI, added a change to delete those temporary files. Not writing them in the first place should be possible, but will require an incremental API for handling attachments, so I have deferred that.

pabs3 added a commit to pabs3/libpst that referenced this issue Aug 15, 2023
…wards

glibc since 2.29 and musl both change the file position indicator before
reopening the file in the child process.

Workaround-for: pst-format#7
pabs3 added a commit to pabs3/libpst that referenced this issue Aug 20, 2023
…wards

glibc since 2.29 and musl both change the file position indicator before
reopening the file in the child process.

Workaround-for: pst-format#7
pabs3 added a commit to pabs3/libpst that referenced this issue Sep 17, 2023
…wards

glibc since 2.29 and musl both change the file position indicator before
reopening the file in the child process.

Workaround-for: pst-format#7
pabs3 added a commit to pabs3/libpst that referenced this issue Feb 1, 2024
…wards

glibc since 2.29 and musl both change the file position indicator before
reopening the file in the child process.

Workaround-for: pst-format#7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants