Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

failed tests: php_linux_dbg_native:ChannelTest.php #12886

Closed
ZhouyihaiDing opened this issue Oct 7, 2017 · 20 comments · Fixed by #13274
Closed

failed tests: php_linux_dbg_native:ChannelTest.php #12886

ZhouyihaiDing opened this issue Oct 7, 2017 · 20 comments · Fixed by #13274
Assignees
Labels
disposition/BUILDNURSE For all buildnurse related build/test failures and flakes lang/php priority/P0

Comments

@ZhouyihaiDing
Copy link
Contributor

ZhouyihaiDing commented Oct 7, 2017

From PR #12760

rc/core/lib/iomgr/load_file.cc","file_line":69,"filename":"/usr/local/share/grpc/roots.pem","referenced_errors":[{"created":"@1507332216.575520140","description":"OS Error","errno":2,"file":"src/core/lib/iomgr/load_file.cc","file_line":43,"os_error":"No such file or directory","syscall":"fopen"}]}
E1006 23:23:36.575568738    8814 security_connector.cc:830]  Could not get default pem root certs.
E1006 23:23:36.575573834    8814 secure_channel_create.cc:113] Failed to create secure subchannel for secure name 'localhost:1'
E1006 23:23:36.575578636    8814 secure_channel_create.cc:145] Failed to create subchannel arguments during subchannel creation.
E1006 23:23:36.575585617    8814 security_connector.cc:830]  Could not get default pem root certs.
E1006 23:23:36.575588945    8814 secure_channel_create.cc:113] Failed to create secure subchannel for secure name 'localhost:1'
E1006 23:23:36.575592630    8814 secure_channel_create.cc:145] Failed to create subchannel arguments during subchannel creation.
.
Starting test 'ChannelTest::testPersistentChannelDifferentChannelCredentials'.
E1006 23:23:36.576412586    8814 security_connector.cc:830]  Could not get default pem root certs.
E1006 23:23:36.576427696    8814 secure_channel_create.cc:113] Failed to create secure subchannel for secure name 'localhost:1'
E1006 23:23:36.576432043    8814 secure_channel_create.cc:145] Failed to create subchannel arguments during subchannel creation.
E1006 23:23:36.576438316    8814 security_connector.cc:830]  Could not get default pem root certs.
E1006 23:23:36.576441855    8814 secure_channel_create.cc:113] Failed to create secure subchannel for secure name 'localhost:1'
E1006 23:23:36.576445341    8814 secure_channel_create.cc:145] Failed to create subchannel arguments during subchannel creation.
.
Starting test 'ChannelTest::testPersistentChannelSameChannelCredentialsRootCerts'.
.
Starting test 'ChannelTest::testPersistentChannelDifferentSecureChannelCredentials'.
E1006 23:23:36.578347309    8814 security_connector.cc:830]  Could not get default pem root certs.
E1006 23:23:36.578362018    8814 secure_channel_create.cc:113] Failed to create secure subchannel for secure name 'localhost:1'
E1006 23:23:36.578366693    8814 secure_channel_create.cc:145] Failed to create subchannel arguments during subchannel creation.
E1006 23:23:36.578372529    8814 security_connector.cc:830]  Could not get default pem root certs.
E1006 23:23:36.578375853    8814 secure_channel_create.cc:113] Failed to create secure subchannel for secure name 'localhost:1'
E1006 23:23:36.578379330    8814 secure_channel_create.cc:145] Failed to create subchannel arguments during subchannel creation.
.
Starting test 'ChannelTest::testPersistentChannelSharedChannelClose'.
.
Starting test 'ChannelTest::testPersistentChannelCreateAfterClose'.
.
Starting test 'ChannelTest::testPersistentChannelSharedMoreThanTwo'.
.
Starting test 'ChannelTest::testPersistentChannelWithCallCredentials'.
E1006 23:23:36.580197566    8814 security_connector.cc:830]  Could not get default pem root certs.
E1006 23:23:36.580247050    8814 secure_channel_create.cc:113] Failed to create secure subchannel for secure name 'localhost:1'
E1006 23:23:36.580251591    8814 secure_channel_create.cc:145] Failed to create subchannel arguments during subchannel creation.
E1006 23:23:36.580257610    8814 security_connector.cc:830]  Could not get default pem root certs.
E1006 23:23:36.580261141    8814 secure_channel_create.cc:113] Failed to create secure subchannel for secure name 'localhost:1'
E1006 23:23:36.580264582    8814 secure_channel_create.cc:145] Failed to create subchannel arguments during subchannel creation.
.
Starting test 'ChannelTest::testPersistentChannelWithDifferentCallCredentials'.
E1006 23:23:36.580971747    8814 security_connector.cc:830]  Could not get default pem root certs.
E1006 23:23:36.580987494    8814 secure_channel_create.cc:113] Failed to create secure subchannel for secure name 'localhost:1'
E1006 23:23:36.580992300    8814 secure_channel_create.cc:145] Failed to create subchannel arguments during subchannel creation.
E1006 23:23:36.580998614    8814 security_connector.cc:830]  Could not get default pem root certs.
E1006 23:23:36.581002114    8814 secure_channel_create.cc:113] Failed to create secure subchannel for secure name 'localhost:1'
E1006 23:23:36.581005655    8814 secure_channel_create.cc:145] Failed to create subchannel arguments during subchannel creation.
...
There was 1 failure:

1) ChannelTest::testWatchConnectivityState
Failed asserting that false is true.

/var/local/git/grpc/src/php/tests/unit_tests/ChannelTest.php:86

FAILURES!
Tests: 113, Assertions: 272, Failures: 1.
@jtattermusch
Copy link
Contributor

This seems to be broken 100% of cases.
@ZhouyihaiDing, can you please help finding the PR that caused the failure?

@jtattermusch
Copy link
Contributor

@ctiller Looks like this was caused by #12677 (which also caused a few other major breakages)

(started at build https://kokoro.corp.google.com/job/grpc/job/ubuntu/job/master/job/grpc_basictests_multilang/408/)

@jtattermusch
Copy link
Contributor

I see a similar issue for Node on mac also introduced by #12677:
https://sponge.corp.google.com/target?id=076c79a7-7e54-4fb6-91a0-a40deac3019d&target=github/grpc/node_macos_opt_native&searchFor=&show=FAILED&sortBy=STATUS

  channel constructor
       should require a string for the first argument: 0ms
       should require a credential for the second argument: 1ms
       should accept an object for the third argument: 0ms
       should only accept objects with string or int values: 0ms
       should succeed without the new keyword: 0ms

  Suite duration: 0.001 s, Tests: 5

  channel close
       should succeed silently: 0ms
       should be idempotent: 0ms

  Suite duration: 0 s, Tests: 2

  channel getTarget
       should return a string: 0ms

  Suite duration: 0 s, Tests: 1

  channel getConnectivityState
       should return IDLE for a new channel: 0ms

  Suite duration: 0 s, Tests: 1

  channel watchConnectivityState
       should time out if called alone: 1002ms
       should complete if a connection attempt is forced: 4ms
E1009 17:38:23.004352000 140735293353984 resource_quota.cc:779]        assertion failed: old >= amount
tools/run_tests/helper_scripts/run_node.sh: line 48: 46165 Abort trap: 6           JUNIT_REPORT_PATH=src/node/report.xml JUNIT_REPORT_STACK=1 ./node_modules/.bin/mocha --timeout $timeout --reporter mocha-jenkins-reporter $test_directory
real         2.25

@ZhouyihaiDing
Copy link
Contributor Author

ZhouyihaiDing commented Oct 25, 2017

I get log like this:
Starting test 'ChannelTest::testWatchConnectivityState'.

I1025 09:30:15.876514065  121933 channel_create.cc:88]       grpc_insecure_channel_create(target=localhost:0, args=0x7ffd4d948170, reserved=(nil))
D1025 09:30:15.876552319  121933 combiner.cc:87]             C:0x3cd7330 create
I1025 09:30:15.876592417  121933 channel_connectivity.cc:217] grpc_channel_watch_connectivity_state(channel=0x3cd7130, last_observed_state=1, deadline=gpr_timespec { tv_sec: 0, tv_nsec: 1000000, clock_type: 3 }, cq=0x3077ff0, tag=(nil))
D1025 09:30:15.876600768  121933 combiner.cc:165]            C:0x3cd7330 grpc_combiner_execute c=0x3cd65c0 last=1
D1025 09:30:15.876606045  121933 combiner.cc:228]            C:0x3cd7330 grpc_combiner_continue_exec_ctx contended=0 exec_ctx_ready_to_finish=1 time_to_execute_final_list=0
D1025 09:30:15.876610043  121933 combiner.cc:246]            C:0x3cd7330 maybe_finish_one n=0x3cd65c0
D1025 09:30:15.876615311  121933 timer_generic.cc:335]       TIMER 0x3cd7968: SET 1 now 206 call 0x3cd7918[0x7fc563fc0070]
D1025 09:30:15.876621223  121933 connectivity_state.cc:121]  CONWATCH: 0x3cd7280 client_channel: from CONNECTING [cur=IDLE] notify=0x3cd65c0
D1025 09:30:15.876626710  121933 combiner.cc:165]            C:0x3cd7330 grpc_combiner_execute c=0x3cd65c0 last=3
D1025 09:30:15.876648750  121933 combiner.cc:291]            C:0x3cd7330 finish old_state=5
D1025 09:30:15.876652719  121933 combiner.cc:165]            C:0x3cd7330 grpc_combiner_execute c=0x3cd7080 last=3
D1025 09:30:15.876658216  121933 combiner.cc:228]            C:0x3cd7330 grpc_combiner_continue_exec_ctx contended=0 exec_ctx_ready_to_finish=1 time_to_execute_final_list=0
D1025 09:30:15.876662064  121933 combiner.cc:246]            C:0x3cd7330 maybe_finish_one n=0x3cd65c0
D1025 09:30:15.876667304  121933 timer_generic.cc:422]       TIMER 0x3cd7968: CANCEL pending=false
I1025 09:30:15.876680132  121933 completion_queue.cc:698]    cq_end_op_for_pluck(exec_ctx=0x7ffd4d948220, cq=0x3077ff0, tag=(nil), error={"created":"@1508949015.876656907","description":"Timed out waiting for connection state change","file":"src/core/ext/filters/client_channel/channel_connectivity.cc","file_line":133}, done=0x7fc563fc00d0, done_arg=0x3cd78c0, storage=0x3cd79a0)
E1025 09:30:15.876685852  121933 completion_queue.cc:701]    Operation failed: tag=(nil), error={"created":"@1508949015.876656907","description":"Timed out waiting for connection state change","file":"src/core/ext/filters/client_channel/channel_connectivity.cc","file_line":133}
E1025 09:30:15.876691655  121933 ev_epoll1_linux.cc:1018]    PS:0x30780f0 KICK:(nil) curps=(nil) curworker=(nil) root=(nil)
E1025 09:30:15.876695466  121933 ev_epoll1_linux.cc:1029]     .. kicked_without_poller
D1025 09:30:15.876699677  121933 combiner.cc:291]            C:0x3cd7330 finish old_state=5
D1025 09:30:15.876703459  121933 combiner.cc:228]            C:0x3cd7330 grpc_combiner_continue_exec_ctx contended=0 exec_ctx_ready_to_finish=1 time_to_execute_final_list=0
D1025 09:30:15.876707227  121933 combiner.cc:246]            C:0x3cd7330 maybe_finish_one n=0x3cd7080
D1025 09:30:15.876711206  121933 combiner.cc:291]            C:0x3cd7330 finish old_state=3
I1025 09:30:15.876715831  121933 completion_queue.cc:1065]   grpc_completion_queue_pluck(cq=0x3077ff0, tag=(nil), deadline=gpr_timespec { tv_sec: 9223372036854775807, tv_nsec: 0, clock_type: 1 }, reserved=(nil))
I1025 09:30:15.876723073  121933 completion_queue.cc:1157]   RETURN_EVENT[0x3077ff0]: OP_COMPLETE: tag:(nil) ERROR

For watch connectivity state test,
grpc_channel_watch_connectivity_state find that CONWATCH: 0x3cd7280 client_channel: from CONNECTING [cur=IDLE] notify=0x3cd65c0.
The event get from cq_end_op_for_pluck is event.type=OP_COMPLETE while event.success=false.
Is event.success=false supposed to be the correct output here? If so,this php test Assert(event.success, true) need to be changed.

@jtattermusch
Copy link
Contributor

  • This has been failing 100% of runs on both linux and mac.
  • The issue is 3 weeks old and we haven't made much progress, what can we do to address this as soon as possible?

Increasing priority to P0.
CC @dgquintas, @ctiller who can take a look into this? @ZhouyihaiDing would probably use some help.

@ZhouyihaiDing
Copy link
Contributor Author

ZhouyihaiDing commented Oct 30, 2017

I think this is caused by #12677 .
Before this PR is 12769 and this test works.

@ctiller
Copy link
Member

ctiller commented Oct 30, 2017

Can I get some steps on how to setup a VM and build PHP and run the tests in such a way that I can get gdb to break within C core?

@ZhouyihaiDing
Copy link
Contributor Author

ZhouyihaiDing commented Oct 30, 2017

To install php, sudo apt-get install php7.0 php7.0-dev php-pear phpunit
To run the test,

git reset xxxxx
make -j8
sudo make install
cd src/php/ext/grpc
phpize
./configure
make
sudo make install
cat "extension = grpc.so \n" >> /etc/php/7.0/cli/php.ini
cat "extension = grpc.so \n" >> /etc/php/7.0/apache/php.ini
php -d extension=grpc.so -d max_execution_time=300 /usr/local/bin/phpunit -v --debug src/php/tests/unit_tests --filter ChannelTest::testWatchConnectivityState

@jtattermusch
Copy link
Contributor

it was posted before, but a similar issue is happening for Node too:
https://sponge.corp.google.com/target?id=d60fbd71-e36d-4b01-a855-95cab2197550&target=github/grpc/grpc-node_macos_dbg_native&searchFor=&show=FAILED&sortBy=STATUS

 channel watchConnectivityState
    [32m  [0m[90m should time out if called alone: [0m[31m1001ms[0m
    [32m  [0m[90m should complete if a connection attempt is forced: [0m[90m3ms[0m

    stderr: E1030 09:30:12.111835000 140735186743296 resource_quota.cc:779]        assertion failed: old >= amount

    failed: true
    signal: SIGABRT

@ZhouyihaiDing
Copy link
Contributor Author

The php test is:

$this->channel = new Grpc\Channel('localhost:0',
             ['credentials' => Grpc\ChannelCredentials::createInsecure()]);
        $time = new Grpc\Timeval(1000);
        $state = $this->channel->watchConnectivityState(1, $time);
        $this->assertTrue($state);

where inside it, the code related to c

completion_queue = grpc_completion_queue_create_for_pluck(NULL);
grpc_channel_watch_connectivity_state(channel->wrapper->wrapped,
                                        (grpc_connectivity_state)last_state,
                                        deadline->wrapped, completion_queue,
                                        NULL);
  grpc_event event =
      grpc_completion_queue_pluck(completion_queue, NULL,
                                  gpr_inf_future(GPR_CLOCK_REALTIME), NULL);
  RETURN_BOOL(event.success);

Log is printed in previews comment.

@ctiller
Copy link
Member

ctiller commented Nov 1, 2017 via email

@ctiller
Copy link
Member

ctiller commented Nov 1, 2017

It's probably better to post comments inline in the comment thread so they can be seen.

I'm still having problems:

ctiller@ctiller-debug ~/grpc> sudo apt-get install php7.0 php7.0-dev php-pear phpunit
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Note, selecting 'php7.0-remctl' for regex 'php7.0'
Note, selecting 'php7.0-common' for regex 'php7.0'
Note, selecting 'php-remctl' instead of 'php7.0-remctl'
E: Unable to locate package php7.0-dev
E: Couldn't find any package by glob 'php7.0-dev'
E: Couldn't find any package by regex 'php7.0-dev'
ctiller@ctiller-debug ~/grpc> 

@ZhouyihaiDing
Copy link
Contributor Author

Sorry, I'll update the guides for php in grpc.io. It install php7 by default now.
You can try
sudo apt-get install -y php php-dev phpunit php-pear unzip zlib1g-dev
php command only build php extension using grpc. I think you need to build grpc C core first.

@jtattermusch
Copy link
Contributor

Btw, I took a look at the PHP test and the source codes and it looks like in this particular case it should be pretty easy to come up with a C-only repro:

This is the PHP test:

public function testWatchConnectivityState()

This is the implementation of Channel.watchConnectivityState (written in C actually)
PHP_METHOD(Channel, watchConnectivityState) {

@ZhouyihaiDing, to speed up the fixing process, can you come up with a C-only repro?

@jtattermusch
Copy link
Contributor

@ZhouyihaiDing actually, looking at what testWatchConnectivityState() is doing, it looks like it's not correct (I might be missing something):

  1. it creates a channel (which will start with GRPC_CONNECTIVITY_IDLE
  2. it wait a second for state change from GRPC_CONNECTIVITY_CONNECTING to something else - which should never happen as the channel is lazy and won't start connecting until you start a call on it. Also, why would last_state be "GRPC_CONNECTIVITY_CONNECTING" (we never observed that state before).
    So overall, is the test testing something meaningful?

@ZhouyihaiDing
Copy link
Contributor Author

ZhouyihaiDing commented Nov 6, 2017

@jtattermusch To my understanding, it just want to test whether it works or not.
The test pretends the former states changes from 1(GRPC_CONNECTIVITY_CONNECTING) to the current states 0(GRPC_CONNECTIVITY_IDLE).

grpc_channel_watch_connectivity_state(channel->wrapper->wrapped,
                                        (grpc_connectivity_state)last_state,
                                        deadline->wrapped, completion_queue,
                                        NULL);

When last_state = 1 but current states = 0,

grpc_event event =
      grpc_completion_queue_pluck(completion_queue, NULL,
                                  gpr_inf_future(GPR_CLOCK_REALTIME), NULL);

event.success = true before but now event.success = false.
I am not sure about this part. When the channel states changes from 1 to 0, is it supposed to be event false now? If the php use C-core API in a wrong way, I think it happens to the 2 functions above. Please give me some suggestions where it use wrong.

From the log I pastes beofre, the C-core API does know that the status changed from Connecting to Idle:

I1025 09:30:15.876592417  121933 channel_connectivity.cc:217] grpc_channel_watch_connectivity_state(channel=0x3cd7130, last_observed_state=1, deadline=gpr_timespec { tv_sec: 0, tv_nsec: 1000000, clock_type: 3 }, cq=0x3077ff0, tag=(nil))
D1025 09:30:15.876600768  121933 combiner.cc:165]            C:0x3cd7330 grpc_combiner_execute c=0x3cd65c0 last=1
D1025 09:30:15.876606045  121933 combiner.cc:228]            C:0x3cd7330 grpc_combiner_continue_exec_ctx contended=0 exec_ctx_ready_to_finish=1 time_to_execute_final_list=0
D1025 09:30:15.876610043  121933 combiner.cc:246]            C:0x3cd7330 maybe_finish_one n=0x3cd65c0
D1025 09:30:15.876615311  121933 timer_generic.cc:335]       TIMER 0x3cd7968: SET 1 now 206 call 0x3cd7918[0x7fc563fc0070]
D1025 09:30:15.876621223  121933 connectivity_state.cc:121]  CONWATCH: 0x3cd7280 client_channel: from CONNECTING [cur=IDLE] notify=0x3cd65c0

@jtattermusch
Copy link
Contributor

jtattermusch commented Nov 6, 2017

@ZhouyihaiDing I think I get what's happening - looks like the test sets a very short timeout of 1ms and that doesn't play well with recent C core's timing changes (C core timing code now has resolution of 1ms).

@ZhouyihaiDing
Copy link
Contributor Author

Set it to around 250ms can the test can pass in my PC without flaky.
$microseconds = 250000;
gpr_timespec time = gpr_time_from_micros(microseconds, GPR_TIMESPAN);

@jtattermusch
Copy link
Contributor

I think this magic number could have to do with the fact that passing GPR_TIMESPAN as a "timeout" to C-core might be currently broken (according to @ctiller).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
disposition/BUILDNURSE For all buildnurse related build/test failures and flakes lang/php priority/P0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants