When searching for co-workers in the origin server cluster, occasional inability to find the origin server nodes occurs. #1520

Closed
limjoe opened this issue Dec 17, 2019 · 2 comments
Labels: Bug (It might be a bug.), TransByAI (Translated by AI/GPT.)


limjoe commented Dec 17, 2019

Description
After compiling and installing SRS 3.0 alpha4 (3.0.71), the log occasionally reports a connect error with code=3090. When this error occurs repeatedly in succession, streaming fails.

Environment

  1. Operating System: Ubuntu 16.04
  2. SRS Version: 3.0 alpha4 (3.0.71)
  3. Origin server A (192.100.20.20) configuration file:
listen              1935;
max_connections     1000;
daemon              off;
srs_log_tank        console;

http_server {
    enabled         on;
    listen          18080;
    dir             ./objs/nginx/html;
}
http_api {
    enabled         on;
    listen          1985;
    crossdomain     on;
}

vhost push-pek-test.xxx.com {
    min_latency     on;
    tcp_nodelay     on;

    publish {
        mr off;
    }

    cluster {
        mode            local;
        origin_cluster  on;
        coworkers       192.100.20.9:1985;
    }

    hls {
        enabled         on;
        hls_fragment    6;
        hls_window      30;
        hls_path        ./objs/nginx/html;
        hls_m3u8_file   [app]/[stream].m3u8;
        hls_ts_file     [app]/[stream]/[timestamp].ts;
        hls_cleanup     on;
        hls_nb_notify   64;
        hls_wait_keyframe       on;

    }

    http_hooks {
        enabled         on;       
        on_hls          http://127.0.0.1:8086/v1/hls;
    }
}
  4. Origin server B (192.100.20.9) configuration file:
listen              1935;
max_connections     1000;
daemon              off;
srs_log_tank        console;

http_server {
    enabled         on;
    listen          18080;
    dir             ./objs/nginx/html;
}

http_api {    
    enabled         on;
    listen          1985;
    crossdomain     on;
}
vhost push-pek-test.xxx.com {

    min_latency     on;
    tcp_nodelay     on;

    publish {
        mr off;
    }

    cluster {
        mode            local;
        origin_cluster  on;
        coworkers       192.100.20.20:1985;
    }

    hls {
        enabled         on;
        hls_fragment    6;
        hls_window      30;
        hls_path        ./objs/nginx/html;
        hls_m3u8_file   [app]/[stream].m3u8;
        hls_ts_file     [app]/[stream]/[timestamp].ts;
        hls_cleanup     on;
        hls_nb_notify   64;
        hls_wait_keyframe       on;

    }
    http_hooks {
        enabled         on;       
        on_hls          http://127.0.0.1:8086/v1/hls;
    }
}

  5. SRS log:
[2019-12-17 17:56:19.580][Error][29866][3627][11] connect error code=3090 : service cycle : rtmp: stream service 
: discover coworkers, url=http://192.100.20.9:1985/api/v1/clusters?vhost=push-pek-test.xxx.com&ip=push-pek-test.xxx.com&app=live&stream=IlYJs0kpFw7B&coworker=192.100.20.9:1985 
: parse data {"code":0,"data":{"query":{"ip":"push-pek-test.xxx.com","vhost":"push-pek-test.xxx.com","app":"live","stream":"IlYJs0kpFw7B"},"origin":null}}
thread [3627]: do_cycle() [src/app/srs_app_rtmp_conn.cpp:210][errno=11]
thread [3627]: service_cycle() [src/app/srs_app_rtmp_conn.cpp:400][errno=11]
thread [3627]: playing() [src/app/srs_app_rtmp_conn.cpp:616][errno=11]
thread [3627]: discover_co_workers() [src/app/srs_app_http_hooks.cpp:453][errno=11](Resource temporarily unavailable)
  6. Edge configuration file:
listen              1936;
pid                 ./objs/srs.1936.pid;
max_connections     1000;
daemon              off;
srs_log_tank        console;

http_server {
    enabled         on;
    listen          18081;
    dir             ./objs/nginx/html;
}

http_api {
    enabled         on;
    listen          1986;
    crossdomain     on;
}

vhost play-pek-test.xxx.com {
    cluster {
        mode        remote;
        origin      192.100.20.9:1935 192.100.20.20:1935;
    }

    tcp_nodelay     on;
    min_latency     on;

    play {
        gop_cache       off;
        queue_length    10;
        mw_latency      100;
    }

    http_remux {
        enabled     on;
        mount       [vhost]/[app]/[stream].flv;
        hstrs       on;
    }
    vhost      push-pek-test.sensoro.com;
}

Reproduction
The steps to reproduce the bug are as follows:

  1. Start SRS with each configuration file:
./objs/srs -c conf/A.conf
./objs/srs -c conf/B.conf
./objs/srs -c conf/edge.conf
  2. The bug reproduces. The key information is as follows:
The following error occurs frequently: [Error][29866][3918][11] connect error code=3090 : service cycle : rtmp: stream service : discover coworkers

Sometimes the lookup succeeds and logs the origin that was found:

http: cluster redirect 192.100.20.9:1935 ok, url=http://192.100.20.9:1985/api/v1/clusters?vhost=push-pek-test.xxx.com&ip=push-pek-test.xxx.com&app=live&stream=hJgeKBMPazyp&coworker=192.100.20.9:1985, response={"code":0,"data":{"query":{"ip":"push-pek-test.xxx.com","vhost":"push-pek-test.xxx.com","app":"live","stream":"hJgeKBMPazyp"},"origin":{"ip":"192.100.20.9","port":1935,"vhost":"push-pek-test.xxx.com","api":"192.100.20.9:1985","routers":["192.100.20.9:1985"]}}}
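
The two responses logged in this report differ only in the "origin" field: it is null in the failing case and an object with "ip" and "port" in the successful one. As a rough illustration (Python, not SRS code), the sketch below tells the two cases apart. The function name extract_origin is hypothetical, and the response bodies are copied from the logs in this report.

```python
import json
from typing import Optional, Tuple


def extract_origin(body: str) -> Optional[Tuple[str, int]]:
    """Return (ip, port) of the origin that has the stream, or None.

    Hypothetical helper for illustration; it only assumes the JSON shape
    seen in the log excerpts of this report.
    """
    reply = json.loads(body)
    if reply.get("code") != 0:
        raise ValueError(f"cluster API returned code={reply.get('code')}")
    origin = (reply.get("data") or {}).get("origin")
    if origin is None:
        # "origin": null means this coworker does not have the stream.
        return None
    return origin["ip"], int(origin["port"])


# Body from the failing log line (code=3090).
MISS = (
    '{"code":0,"data":{"query":{"ip":"push-pek-test.xxx.com",'
    '"vhost":"push-pek-test.xxx.com","app":"live","stream":"IlYJs0kpFw7B"},'
    '"origin":null}}'
)

# Body from the successful "cluster redirect ... ok" log line.
HIT = (
    '{"code":0,"data":{"query":{"ip":"push-pek-test.xxx.com",'
    '"vhost":"push-pek-test.xxx.com","app":"live","stream":"hJgeKBMPazyp"},'
    '"origin":{"ip":"192.100.20.9","port":1935,"vhost":"push-pek-test.xxx.com",'
    '"api":"192.100.20.9:1985","routers":["192.100.20.9:1985"]}}}'
)

if __name__ == "__main__":
    print(extract_origin(MISS))
    print(extract_origin(HIT))
```

Note that both bodies carry "code":0, so the API call itself succeeds in the failing case; only the stream lookup comes back empty.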

Expected Behavior
Origin nodes should be able to find each other and stream normally, and latency should not be affected by the number of origin nodes that are polled.


limjoe (Author) commented Dec 17, 2019

This error is now being reported continuously. I have deployed two environments, and one of the origin cluster environments does not report this error.


winlinvip (Member) commented Dec 19, 2019

This problem may occur only when the origin cluster has more than 2 servers, for example:

  • origin serverA: 19350/9090, configure coworker as serverB/9091 and serverC/9092.
  • origin serverB: 19351/9091, configure coworker as serverA/9090 and serverC/9092.
  • origin serverC: 19352/9092, configure coworker as serverA/9090 and serverB/9091.

A third origin server configuration file, origin.cluster.serverC.conf, has been added and can be used to reproduce this issue.

Start an edge server:

  • Edge server: 1935, origin server serverB/19351/9091.

Reproduction steps:

  1. Push the stream to serverC/19352 and play the stream on the edge. The edge will fetch the stream from serverB/9091 as the origin.
  2. serverB first asks serverA/9090 whether it has the stream. serverA replies with origin: null, and serverB treats that reply as an error.
  3. Debugging serverB at this point shows the issue.

When serverB is in SrsRtmpConn::playing, that is, when the edge is fetching the stream from it as the origin, serverB does not have the stream itself, so it first asks serverA whether it has the stream:

http://127.0.0.1:9090/api/v1/clusters?vhost=__defaultVhost__&ip=127.0.0.1&app=live&stream=livestream&coworker=127.0.0.1:9090

When serverA reports that it does not have the stream, serverB returns an error immediately. However, origin: null only indicates that this particular origin does not have the stream, so serverB should continue and ask the next origin server.
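
The lookup order described here can be sketched as follows. This is a minimal Python illustration, not the SRS implementation (that logic is in the C++ discover_co_workers() named in the stack trace). The names cluster_url, discover_origin and fake_fetch are hypothetical, and the HTTP call is injected as a plain function so the loop can be followed without a running cluster. The fake replies model the three-server example: serverA/9090 has no stream and serverC/9092 does.

```python
import json
from typing import Callable, Iterable, Optional, Tuple
from urllib.parse import urlencode, urlsplit


def cluster_url(coworker: str, vhost: str, ip: str, app: str, stream: str) -> str:
    """Build the /api/v1/clusters query in the form shown in this issue."""
    query = urlencode(
        {"vhost": vhost, "ip": ip, "app": app, "stream": stream, "coworker": coworker},
        safe=":",  # keep host:port readable, as in the logged URLs
    )
    return f"http://{coworker}/api/v1/clusters?{query}"


def discover_origin(
    coworkers: Iterable[str],
    fetch: Callable[[str], str],
    *,
    vhost: str,
    ip: str,
    app: str,
    stream: str,
) -> Optional[Tuple[str, int]]:
    """Ask each coworker in turn; return the first origin that has the stream."""
    for coworker in coworkers:
        body = fetch(cluster_url(coworker, vhost, ip, app, stream))
        reply = json.loads(body)
        if reply.get("code") != 0:
            raise RuntimeError(f"{coworker} returned code={reply.get('code')}")
        origin = (reply.get("data") or {}).get("origin")
        if origin is None:
            # This coworker has no such stream: try the next one
            # instead of failing the whole lookup.
            continue
        return origin["ip"], int(origin["port"])
    return None  # no coworker has the stream


def fake_fetch(url: str) -> str:
    """Stand-in for the HTTP GET (assumption: serverA misses, serverC hits)."""
    replies = {
        "127.0.0.1:9090": {"code": 0, "data": {"origin": None}},
        "127.0.0.1:9092": {
            "code": 0,
            "data": {"origin": {"ip": "127.0.0.1", "port": 19352}},
        },
    }
    return json.dumps(replies[urlsplit(url).netloc])


if __name__ == "__main__":
    key = dict(vhost="__defaultVhost__", ip="127.0.0.1", app="live", stream="livestream")
    print(discover_origin(["127.0.0.1:9090", "127.0.0.1:9092"], fake_fetch, **key))
```

With only serverA in the coworker list the function returns None, which is the case where reporting "stream not found" is appropriate; with serverC listed after serverA the lookup proceeds past the origin: null reply and finds the stream.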


winlinvip self-assigned this Sep 6, 2021
winlinvip changed the title from "源站集群查找 co-workers 时偶发性的查找不到源站节点" to "When searching for co-workers in the origin server cluster, occasional inability to find the origin server nodes occurs." Jul 29, 2023
winlinvip added the TransByAI (Translated by AI/GPT.) label Jul 29, 2023