Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async/await multi curl stuck #7485

Closed
wake-up-neo opened this issue Nov 10, 2016 · 5 comments
Closed

async/await multi curl stuck #7485

wake-up-neo opened this issue Nov 10, 2016 · 5 comments

Comments

@wake-up-neo
Copy link
Contributor

wake-up-neo commented Nov 10, 2016

Here we go again:)

HHVM Version

HipHop VM 3.15.0 (rel)
Compiler: tags/HHVM-3.15.0-0-g92a682ebaa3c85b84857852d8621f528607fe27d
Repo schema: 225d4323575bbc8a498dc809a1c41354f6bca830
OS: Debian 8.3

Standalone code, or other way to reproduce the problem

<?hh

$gate = new TestGate_Example();
$iterations = 0;

while (true) {

    $urls = Vector{};
    $iterations++;
    echo 'starting iteration '.$iterations.'...';
    for ($i= 0; $i<100; $i++) {
        $urls->add('http://hhvm.com/?test='.$i);
    }
    $gate->call($urls);
    echo 'complete!' . PHP_EOL;

}

class TestGate_Example {

    const CRITICAL_TIMEOUT = 60;

	public function call(Vector<string> $urls):void {
		$awaitable_tasks = Vector{};
		foreach ($urls as $url) {
			$awaitable_tasks->add($url);
		}
		$call_data = $this->callAsync($awaitable_tasks)->getWaitHandle()->join();
	}

	private async function callAsync(Vector<string> $urls):Awaitable<Vector<string>> {
		$awaitable_calls = Vector{};
		foreach ($urls as $url) {
			$awaitable_calls->add($this->createCall($url));
		}
		return await \HH\Asio\v($awaitable_calls);
	}

	private async function createCall(string $url):Awaitable<string> {
		return await $this->curl_multi_exec($url);;
	}

	private async function curl_multi_exec(string $url):Awaitable<string> {
		echo '>';
		$active = -1;
		$raw_data = null;
		$master = curl_multi_init();
		$ch = curl_init($url);
		curl_setopt_array($ch, $this->createOptions());
		curl_multi_add_handle($master, $ch);
		$t = microtime(true);
		do $status = curl_multi_exec($master, $active);
		while ($status === CURLM_CALL_MULTI_PERFORM);
		while ($active && $status === CURLM_OK) {
			$sel = await curl_multi_await($master);
			if ($sel === -1) await \HH\Asio\usleep(256);
			do {
				$status = curl_multi_exec($master, $active);
				$read = curl_multi_info_read($master);
				if ($read) $raw_data = curl_multi_getcontent($read['handle']);
				if ($this->timedOut($t)) break;
			} while ($status === CURLM_CALL_MULTI_PERFORM);
			if ($this->timedOut($t)) break;
		}
		curl_multi_remove_handle($master, $ch);
		curl_close($ch);
		curl_multi_close($master);
		echo '<';
		return $raw_data ?? '';
	}

	private function timedOut(num $t):bool {
		return (self::CRITICAL_TIMEOUT < (microtime(true) - $t));
	}

	private function createOptions():array<int,mixed> {
       		return [
			\CURLOPT_RETURNTRANSFER => true,
			\CURLOPT_CONNECTTIMEOUT => 3,
			\CURLOPT_TIMEOUT => 60,
			\CURLOPT_MAXREDIRS => 3,
			\CURLOPT_FOLLOWLOCATION => true
		];
	}

}

Expected result

hhvm test_core_dump.hh
starting iteration 1...>>>>>>>>>>>>>>>>>>>>...<<<<<<<<<<<<<<<<<<<<complete!
starting iteration 2...>>>>>>>>>>>>>>>>>>>>...<<<<<<<<<<<<<<<<<<<<complete!
starting iteration 3...>>>>>>>>>>>>>>>>>>>>...<<<<<<<<<<<<<<<<<<<<complete!

...infinite...

Actual result

hhvm test_core_dump.hh
starting iteration 1...>>>>>>>>>>>>>>>>>>>>...<<<<<<<<<<<<<<<<<<<<complete!
starting iteration 2...>>>>>>>>>>>>>>>>>>>>...<<<<<<<<<<<<<<<<<<<<complete!
starting iteration 3...>>>>>>>>>>>>>>>>>>>>...<<<<<<

Then just stuck.

strace
futex(0x7ffd8689ac74, FUTEX_WAIT_PRIVATE, 1, NULL

Also I did tests on another (old) hhvm version on MacOSX:
HipHop VM 3.11.0 (rel)
Compiler: 1455186033_N
Repo schema: 4d3d19cb46fb2651317d6269a2f601f2d8e29601

And there I have core dumped when it stuck:

[err] evmap.c:641: Assertion change->fd == fd failed in event_changelist_get_or_construct
Core dumped: Abort trap: 6
Stack trace in /tmp/stacktrace.62238.log
Abort trap: 6

cat /tmp/stacktrace.62238.log

PHP Stacktrace:

#0 HH\WaitHandle->join() called at [/Users/.../test_core_dump.hh:28]
#1 TestGate_Example->call(Vector) called at [/Users/.../test_core_dump.hh:14]

On a different servers with a different hardware scope it needs a different number of multi urls to catch this ($i in for cycle). The moment when it happens is very unstable, it could be iteration 2, or iteration 80, but you will definitely get it.

On the very slow urls (when response goes after 10-50 seconds, what is very common for some external APIs) this "stuck" will happens earlier, after several iterations.

Now, the questions is - what is wrong with this code?

p.s. And why this implementation is so strange you may ask - because in the original code there are also lambda callbacks in createCall function, some cache levels and additional logic. But they are not the reason of this issue because it happens without all this stuff.

I would greatly appreciate any suggestions.

@wake-up-neo
Copy link
Contributor Author

wake-up-neo commented Nov 10, 2016

Why public function call is doing nothing then just cycling the same urls you may ask - because I'm trying to publish the code structure very similar to the original one. Here is the original function:

    public function call(Vector<GateTask> $tasks):Map<string,mixed> {
        $awaitable_tasks = Vector{};
        foreach ($tasks as $task) {
            $raw_data = $this->getCachedContent($task->hash());
            if ($raw_data !== null) {
                $this->callback_cache->set($task->contentAssocKey(), $this->applyCallback($task, $raw_data));
            } else {
                $awaitable_tasks->add($task);
            }
        }
        $call_data = $this->callAsync($awaitable_tasks)->getWaitHandle()->join();
        return $this->callback_cache;
    }

@wake-up-neo
Copy link
Contributor Author

Well, I'm trying to use the original code from the docs :D https://docs.hhvm.com/hack/reference/function/curl_multi_await/

just modified run function

<?hh

async function get_curl_content(Set<string> $urls): Awaitable<Vector<string>> {

  $chs = Vector {};
  foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $chs[] = $ch;
  }

  $mh = curl_multi_init();
  foreach ($chs as $ch) {
    curl_multi_add_handle($mh, $ch);
  }

  $active = -1;
  do {
    $ret = curl_multi_exec($mh, $active);
  } while ($ret == CURLM_CALL_MULTI_PERFORM);

  while ($active && $ret == CURLM_OK) {
    $select = await curl_multi_await($mh);
    if ($select === -1) {
      // https://bugs.php.net/bug.php?id=61141
      await \HH\Asio\usleep(100);
    }
    do {
      $ret = curl_multi_exec($mh, $active);
    } while ($ret == CURLM_CALL_MULTI_PERFORM);
  }

  $content = Vector {};

  foreach ($chs as $ch) {
    $str = (string) curl_multi_getcontent($ch);
    $content[] = substr($str, 0, 10);
    curl_multi_remove_handle($mh, $ch);
  }

  curl_multi_close($mh);

  return $content;
}

function run(): void {
  $urls = Set {};
  for($i=1;$i<10;$i++) $urls->add('http://example.com/?test='.$i);
  $content = \HH\Asio\join(get_curl_content($urls));
  var_dump($content);
}


while (true) {
    run();
}

The result on Debian with HHVM 3.15 after about 10 cycles

futex(0x7ffca34ef644, FUTEX_WAIT_PRIVATE, 1, NULL

The result on MacOSX with HHVM 3.11

hhvm sandbox/test_core_dump2.hh
object(HH\Vector)#7 (9) {
[0]=>
string(10) "<!doctype "
[1]=>
string(10) "<!doctype "
[2]=>
string(10) "<!doctype "
[3]=>
string(10) "<!doctype "
[4]=>
string(10) "<!doctype "
[5]=>
string(10) "<!doctype "
[6]=>
string(10) "<!doctype "
[7]=>
string(10) "<!doctype "
[8]=>
string(10) "<!doctype "
}
...
object(HH\Vector)#61 (9) {
[0]=>
string(10) "<!doctype "
[1]=>
string(10) "<!doctype "
[2]=>
string(10) "<!doctype "
[3]=>
string(10) "<!doctype "
[4]=>
string(10) "<!doctype "
[5]=>
string(10) "<!doctype "
[6]=>
string(10) "<!doctype "
[7]=>
string(10) "<!doctype "
[8]=>
string(10) "<!doctype "
}
[err] event_queue_remove: 0x219d05e0(fd 8) not on queue 2
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1110 20:33:14.972592 2954469376 Singleton-inl.h:134] Singleton of type N4HPHP13AsioEventBaseE has a living reference at destroyInstances time; beware! Raw pointer is 0x21589300. It is very likely that some other singleton is holding a shared_ptr to it. Make sure dependencies between these singletons are properly defined.
E1110 20:33:14.972760 2954469376 Singleton.cpp:41] Singletons of the following types had living references after destroyInstances was finished:
N4HPHP13AsioEventBaseE
beware! It is very likely that those singleton instances are leaked.

@wake-up-neo
Copy link
Contributor Author

However, if I change

$select = await curl_multi_await($mh);

to

$select = curl_multi_select($mh);

it works infinitely without any issues :
Looks like a race condition in the events scheduler?

@wake-up-neo
Copy link
Contributor Author

Hi again!

just took the code from the post above, which is originally a code from the docs, with

$ hhvm --version
HipHop VM 3.18.0-dev (rel)
Compiler: heads/master-0-g80d3365522af2039e7f2f6cd2a8f9950bc339e70
Repo schema: 7629af4d30aa1b6a1f31922256a9b40a4f6953c4

The results after 25 loops:

[err] evmap.c:641: Assertion change->fd == fd failed in event_changelist_get_or_construct
Core dumped: Abort trap: 6
Stack trace in /tmp/stacktrace.79667.log
Abort trap: 6

$ cat /tmp/stacktrace.79667.log
Host: ....local
ProcessID: 79667
ThreadID: 140735272305408
ThreadPID: -1
Name: /usr/local/bin/hhvm
Type: Abort trap: 6
Runtime: hhvm
Version: heads/master-0-g80d3365522af2039e7f2f6cd2a8f9950bc339e70
DebuggerCount: 0

Arguments: test.hh
ThreadType: CLI

PHP Stacktrace:

#0 get_curl_content()
#1 HH\Asio\join() called at [...test.hh:49]
#2 run() called at [...test.hh:55]

@wake-up-neo
Copy link
Contributor Author

I continue to investigate.
When I comment the lines with curl_multi_getcontent part it cause a new error, which was logically predictive.
Will try to debug directly in hhvm and libevent source next days. Any ideas? Looks like we gonna catch a big fish.

...
< 
* Closing connection 0
< HTTP/1.1 200 OK
< Cache-Control: max-age=604800
< Content-Type: text/html
< Date: Mon, 13 Mar 2017 00:56:02 GMT
< Etag: "359670651+ident"
< Expires: Mon, 20 Mar 2017 00:56:02 GMT
< Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
* Server ECS (iad/182A) is not blacklisted
< Server: ECS (iad/182A)
< Vary: Accept-Encoding
< X-Cache: HIT
< Content-Length: 1270
< 
* Closing connection 1

[err] event_queue_remove: 0x1fa1ad60(fd 12) not on queue 2

fredemmott added a commit to fredemmott/hhvm that referenced this issue Sep 4, 2018
…emlib one"

fixes facebook#8156
refs facebook#7485
refs facebook#8043

The previous stability issues with the native impl were likely not
actually an issue with async curl, but with async more generally with
libevent2.

facebook@93c31ec#diff-e9e6b8e9c00d40c3ec104ff79c2037a7
appears to have fixed the original stability issues (and the issues with
stream_await()); it just looked like `curl_multi_await()` and
`stream_await()` were buggy as:
- FB rarely uses these functions (in particular, for our use case, async
curl is generaly less performant than sync curl - but that depends on
what other load you have, and how long requests take)
- FB uses libevent1
- `curl_multi_await()` and `stream_await()` are among the most useful
external features
... so, an HHVM issue when using libevent2 looks like a curl + stream
async issue

Test Plan: ran test programs from previous curl issues

Crashes or gets stuck (doesn't on master or with this diff):

```Hack
<?hh

async function get_curl_content(Set<string> $urls): Awaitable<Vector<string>> {

  $chs = Vector {};
  foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $chs[] = $ch;
  }

  $mh = curl_multi_init();
  foreach ($chs as $ch) {
    curl_multi_add_handle($mh, $ch);
  }

  $active = -1;
  do {
    $ret = curl_multi_exec($mh, &$active);
  } while ($ret == CURLM_CALL_MULTI_PERFORM);

  while ($active && $ret == CURLM_OK) {
    $select = await curl_multi_await($mh);
    if ($select === -1) {
      // https://bugs.php.net/bug.php?id=61141
      await \HH\Asio\usleep(100);
    }
    do {
      $ret = curl_multi_exec($mh, &$active);
    } while ($ret == CURLM_CALL_MULTI_PERFORM);
  }

  $content = Vector {};

  foreach ($chs as $ch) {
    $str = (string) curl_multi_getcontent($ch);
    $content[] = substr($str, 0, 10);
    curl_multi_remove_handle($mh, $ch);
  }

  curl_multi_close($mh);

  return $content;
}

function run(): void {
  $urls = Set {};
  for($i=1;$i<10;$i++) $urls->add('https://hhvm.com/?test='.$i);
  $content = \HH\Asio\join(get_curl_content($urls));
  var_dump($content);
}

while (true) {
    run();
}
```

Spins 100% CPU (does on master, doesn't with this diff):

```Hack
<?hh

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://httpbin.org/delay/10");

// this line takes 100% cpu for 10 seconds
\HH\Asio\join(\HH\Asio\curl_exec($ch));

// this line *does not* takes 100% cpu for 10 seconds
// curl_exec($ch);

curl_close($ch);
```
hhvm-bot pushed a commit that referenced this issue Sep 12, 2018
…emlib one" (#8313)

Summary:
fixes #8156
refs #7485
refs #8043

The previous stability issues with the native impl were likely not
actually an issue with async curl, but with async more generally with
libevent2.

93c31ec appears to have fixed the original stability issues (and the issues with
stream_await()); it just looked like `curl_multi_await()` and
`stream_await()` were buggy as:
- FB rarely uses these functions (in particular, for our use case, async
curl is generaly less performant than sync curl - but that depends on
what other load you have, and how long requests take)
- FB uses libevent1
- `curl_multi_await()` and `stream_await()` are among the most useful
external features

... so, an HHVM issue when using libevent2 looks like a curl + stream
async issue

Pull Request resolved: #8313

Reviewed By: jano

Differential Revision: D9648538

Pulled By: fredemmott

fbshipit-source-id: 4c1b7bb9a77fbcdff9c28211c2f22f1b4ead5930
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant