[#220] avoid resource leaking in scanners #236

Draft
wants to merge 2 commits into base: master

Conversation

YuriRomanowski
Contributor

Description

[#220] Avoid resource leaking in scanners

Problem: Currently we use lazy readFile, and xrefcheck opens all
the files for reading immediately, while file descriptors are a
finite resource.

Solution: Replace lazy readFile with the strict version.

To be done: understand the issues with parallelism.
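For context, a minimal sketch of the distinction (an illustrative helper, not the actual xrefcheck code):

import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as T

-- With the lazy Prelude `readFile`, the handle stays open until the returned
-- String is fully forced, so scanning many files at once can exhaust the
-- process's file descriptor limit. A strict read consumes the whole file and
-- closes the descriptor before returning (handling of invalid UTF-8 omitted):
readFileStrict :: FilePath -> IO T.Text
readFileStrict = fmap T.decodeUtf8 . BS.readFile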

Related issue(s)

Fixes #220

✅ Checklist for your Pull Request

Ideally a PR has all of the checkmarks set.

If something in this list is irrelevant to your PR, you should still set this
checkmark indicating that you are sure it is dealt with (be that by irrelevance).

Related changes (conditional)

  • Tests

    • If I added new functionality, I added tests covering it.
    • If I fixed a bug, I added a regression test to prevent the bug from
      silently reappearing again.
  • Documentation

    • I checked whether I should update the docs and did so if necessary:
  • Public contracts

    • Any modifications of public contracts comply with the Evolution
      of Public Contracts
      policy.
    • I added an entry to the changelog if my changes are visible to the users
      and
    • provided a migration guide for breaking changes if possible

Stylistic guide (mandatory)

✓ Release Checklist

  • I updated the version number in package.yaml.
  • I updated the changelog and moved everything
    under the "Unreleased" section to a new section for this release version.
  • (After merging) I edited the auto-release.
    • Change the tag and title using the format vX.Y.Z.
    • Write a summary of all user-facing changes.
    • Deselect the "This is a pre-release" checkbox at the bottom.
  • (After merging) I updated xrefcheck-action.
  • (After merging) I uploaded the package to hackage.

Problem: Currently we use lazy `readFile`, and xrefcheck opens all
the files for reading immediately, while file descriptors are a
finite resource.

Solution: Replace lazy `readFile` with the strict version.
@YuriRomanowski
Contributor Author

When I tried to parallelize file parsing, I got some strange issues. I ran the program on a repository with ~100 md files, with 8 cores available. Here is the timing info:

start scan repo; time: 0.00s
end scan repo; time: 0.47s
start verify repo; time: 0.47s
end verify repo; time: 1.88s

And here is the threadscope profile:
[screenshot]

We see that CPU activity is not very high, although there is work to do. Something prevents the threads from working normally... This is what can be found in the eventlog for that time:
[screenshot]

There are a lot of foreign calls, although there seems to be nothing in my code that could result in such behaviour. I tried to run our program under the strace utility, and it showed this:
[screenshot]
(The Russian caption means "Resource temporarily unavailable".)

Googling this turns up one Stack Overflow entry, but I didn't understand much there 😄
click
Any ideas why this can happen? Maybe my code is written in the wrong way?

@YuriRomanowski changed the title from "Yuri romanowski/#220 avoid resource leaking in scanners" to "[#220] avoid resource leaking in scanners" on Dec 12, 2022
@Martoon-00
Member

Woow, elaborate investigation 👍

However, what exactly do you think is wrong?

I'll lay out my thoughts defending the idea that everything is fine, and you can show where I'm wrong.

In the threadscope report, this looks like what I would expect: low CPU consumption during the scanning stage, because it is computation interleaved with I/O (and I/O is not shown on the chart).

It may be reasonable that I/O takes more time than parsing, since the former is reading from the disk, and the latter is reading from memory plus some logic written in C that cmark-gfm uses internally and which thus can be fast. (Btw, are you on an SSD?)

On what you see in the eventlog: by default, foreign calls cannot be interrupted; at that moment the runtime system gives up on trying to manage anything until the foreign call is finished. Maybe that's why we see those messages, but I think they are fine; they may just mean that control is passed to the C logic while everything is fine with the threads (I'm not sure here, though; it may be worth googling the exact meaning of that eventlog message).

On the strace output: as far as I understand, EAGAIN in our case likely means "the pipe is still empty, the read syscall cannot return anything, please try again later", and this matches the explanation above that the CPU is ready to read input at a higher rate than the device can provide.

If my understanding is correct, most of the witnessed things will still be there even if we try parsing with no parallelization, as before.


If you had some intuition that told you "something is wrong here" or "things are too slow" that pushed you into this investigation, could you share it too?

@YuriRomanowski
Contributor Author

YuriRomanowski commented Dec 12, 2022

In the threadscope report, this looks like what I would expect: low CPU consumption during the scanning stage, because it is computation interleaved with I/O (and I/O is not shown on the chart).

Well, I also tried to write the code so that all of the files were read at once and then only pure computations were performed in parallel, but the results were the same. I can show the same graphs for this case later.

If you had some intuition that told you "something is wrong here" or "things are too slow" that pushed you into this investigation, could you share it too?

In my mind, when we have a big list of pure tasks, they can be parallelized very effectively, using ~100% load on all available CPUs. But here, although there are several CPUs, the total activity is not higher (on average) than a single CPU at 100% load (I'm looking at the dashed lines on the activity graph), and this is not really satisfying. When I tried to read all the files first and then parse them in parallel, the reading was performed by one thread that took 100% of a CPU, but the total activity was lower for the subsequent parsing stage, even with 8 CPUs. You can even see how the activity grows after 0.47 seconds (the start of repo verification). Also, I don't like the long run of errors in the system call log; it looks like busy waiting, something that good programs usually (in my mind) don't want to do. But the GHC runtime is written by highly skilled people, so this may indicate that we are using their creation in the wrong way. So I believe there is a problem we could solve, although I don't quite understand what it actually is.

Well, I also forgot that our raw md parsing is performed by a C library... Maybe that can influence this, too.
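To make the setup concrete, here is a minimal sketch (illustrative names, not the actual xrefcheck code) of the "read all files strictly first, then parse in parallel with sparks" variant I described above:

import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parList, rdeepseq, using)
import Data.Text (Text)
import qualified Data.Text.IO as T

-- Read every file strictly up front, then spark the pure parsing step for
-- each file. `parse` and the result type are placeholders.
scanAll :: NFData a => (Text -> a) -> [FilePath] -> IO [a]
scanAll parse paths = do
  contents <- traverse T.readFile paths               -- strict, sequential I/O
  pure (map parse contents `using` parList rdeepseq)  -- one spark per file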

@Martoon-00
Member

Also, I don't like the long run of errors in the system call log; it looks like busy waiting

Yeah, that's true. I'm also not sure why it is done this way.

Well, I also tried to write the code so that all of the files were read at once and then only pure computations were performed in parallel

Oh, that's a very good setup for testing; then we have to investigate what happens there. I'll try to think about it, and if you have more info to share, it is welcome.

@aeqz
Contributor

aeqz commented Dec 13, 2022

I'm not sure if this will reveal anything, but could you try the following in order to check if the parallel performance looks more like what you expected?

Derive NFData for ScanResult and then, in the scanRepo function, replace

return . ScanResult errs $ RepoInfo
  { riFiles = M.fromList $ processedFiles <> notProcessedFiles
  , riDirectories = M.fromList
      $ map (, TrackedDirectory) trackedDirs
      <> map (, UntrackedDirectory) untrackedDirs
  , riRoot
  }

by

let result = ScanResult errs $ RepoInfo
      { riFiles = M.fromList $ processedFiles <> notProcessedFiles
      , riDirectories = M.fromList
                        $ map (, TrackedDirectory) trackedDirs
                        <> map (, UntrackedDirectory) untrackedDirs
      , riRoot
      }

evaluateNF result
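For reference, evaluateNF forces the value to normal form inside IO before returning it, so the deferred parsing work completes within scanRepo instead of at later use sites. Roughly (a sketch, assuming a relude-style definition):

import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)

-- Approximation of evaluateNF: fully force the value before returning it.
evaluateNF' :: NFData a => a -> IO a
evaluateNF' = evaluate . force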

@@ -118,7 +119,8 @@ data ScanErrorDescription
   | FileErr
   | ParagraphErr Text
   | UnrecognisedErr Text
-  deriving stock (Show, Eq)
+  deriving stock (Show, Eq, Generic)
+  deriving anyclass (NFData)
Contributor

@aeqz Dec 13, 2022


You can derive NFData as in the Xrefcheck.Core module if you prefer to follow that style:

instance NFData Position
instance NFData Reference
instance NFData AnchorType
instance NFData Anchor
instance NFData FileInfo
instance NFData RepoInfo
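These bare instances pick up deepseq's Generic-based default for rnf, so each of those types also needs a derived Generic instance. A minimal sketch with a made-up type:

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingStrategies #-}

import Control.DeepSeq (NFData)
import GHC.Generics (Generic)

-- Hypothetical type for illustration: the bare `instance NFData SomePos`
-- compiles because deepseq's default rnf implementation goes through Generic.
data SomePos = SomePos { spLine :: Int, spColumn :: Int }
  deriving stock (Show, Eq, Generic)

instance NFData SomePos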

@YuriRomanowski
Contributor Author

I tried to force evaluation of the CMarkGFM part first, and it looks like this. The highlighted area corresponds to parsing with CMarkGFM, and then sparks are created to do the work in pure Haskell. In the CMarkGFM area there are a lot of foreign calls; it seems these are just calls to the C library :) It's a good question whether we can parallelize foreign calls; in some circumstances we should be able to (see the sketch below). Or we can stop here and just run the parallel computations interleaved with foreign calls. Anyway, we do get some speedup, but not a big one.
[screenshot]
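For reference, the relevant GHC distinction is between safe and unsafe foreign imports: only safe calls release the capability, so other Haskell threads can keep running while the C code works. A minimal sketch with a made-up C symbol (not the actual cmark-gfm binding):

{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.String (CString)
import Foreign.C.Types (CSize (..))
import Foreign.Ptr (Ptr)

-- Hypothetical C parser symbol, purely for illustration: the `safe` import
-- releases the capability for the duration of the call, while the `unsafe`
-- import blocks the capability until the C function returns.
foreign import ccall safe "my_parse"
  cParseSafe :: CString -> CSize -> IO (Ptr ())

foreign import ccall unsafe "my_parse"
  cParseUnsafe :: CString -> CSize -> IO (Ptr ())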

@Martoon-00
Member

Interesting. I guess foreign calls should be parallelizable...

So from the recent picture we can tell that if you don't use sparks for commonmarkToNode but use them for all the following parsing, then we see the expected high usage of CPU cores for that "following parsing".

I'd really like to see the threadscope report from this case:

Well, I also tried to write the code so that all of the files were read at once and then only pure computations were performed in parallel, but the results were the same. I can show the same graphs for this case later.

@Martoon-00
Member

Although the 0.07 seconds that this unparallelized call to cmark-gfm currently takes is really not that much, I agree we could stop optimizing here if we get stuck on it.

@YuriRomanowski
Contributor Author

Although the 0.07 seconds that this unparallelized call to cmark-gfm currently takes is really not that much, I agree we could stop optimizing here if we get stuck on it.

I'm sorry for confusing you, but the highlighted period is not the full period of CMark parsing, just an example.

Well, I also tried to write the code so that all of the files were read at once and then only pure computations were performed in parallel, but the results were the same. I can show the same graphs for this case later.

Yep, later I'll show more accurate examples for different cases.

@Martoon-00
Member

but the highlighted period is not the full period of CMark parsing, just an example

Oh I see, okay. Then we should try hard I suppose 😸


That's just a wild guess, but as one of the measures I would try replacing sparks with IO threading; maybe sparks really do have problems with running foreign calls in parallel.

Mm, to keep this organized, I'd also suggest preserving all your attempts in separate branches, maybe called yuri/#220-experiment-xxx, and uploading your reports there as separate commits too. This way someone else can play with the approaches you tried.
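A minimal sketch of what the IO-threading variant could look like (assuming the async package; names are illustrative, not a concrete proposal for this PR): each file is read and parsed on its own IO thread, with the result forced to normal form inside that thread.

import Control.Concurrent.Async (mapConcurrently)
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Data.Text (Text)
import qualified Data.Text.IO as T

-- Read and parse each file on its own IO thread instead of relying on sparks;
-- forcing to normal form keeps the work inside the worker thread. Note the
-- concurrency here is unbounded, so a real implementation would likely want
-- to limit the number of simultaneously open files.
scanAllThreaded :: NFData a => (Text -> a) -> [FilePath] -> IO [a]
scanAllThreaded parse paths =
  mapConcurrently (\path -> evaluate . force . parse =<< T.readFile path) paths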

@Martoon-00
Member

On the current state of this PR:

  • We still need to understand whether we can get higher CPU consumption.
  • In Avoid resource leaking in scanners #220 I'm asking to fully document the behaviour of ScanAction. If I write my own scanner, what should I do to make it strict on reading and then run its parsing in parallel with parsing other files? What should I do to read + parse on the fly?

@YuriRomanowski I'm not sure whether you will manage to work on this PR during the remaining time, but that's fine; we can transfer this PR to someone else.

In case you manage to allocate time for this, I suggest asking in the Haskell channel in Slack; maybe someone will manage to explain what we observe. If not, let's accept that we did quite a lot to make it performant and stop with what we have gained.
