Talk crashes the entire instance when doing public meetings. #2010

Closed
Windyo opened this issue Jul 17, 2019 · 40 comments

@Windyo

Windyo commented Jul 17, 2019

Steps to reproduce

  1. install Talk
  2. start a new Public Meeting
  3. Share Link
  4. Entire instance slows down then crashes

Expected behaviour

Calls happen

Actual behaviour

Entire instance becomes ultra-slow on any operation.
Restarting App container gets the performance back to normal.
Netdata does not report any high CPU usage, iowait, or network issue. The instance just borks and ends up throwing a 502 error.

Browser

All

Microphone available: yes
Camera available: yes
Operating system: Windows
Browser name: All

Spreed app

Spreed app version: 6.0.2

Custom TURN server configured: no
Custom STUN server configured: no

Server configuration

Operating system: Ubuntu
Web server: Apache-fpm
Database: MySQL
PHP version: 7.3
Nextcloud Version: 16.0.3

working in a docker-container setup

List of activated apps:

Enabled:

  • accessibility: 1.2.0
  • activity: 2.9.1
  • admin_audit: 1.6.0
  • apporder: 0.7.1
  • audioplayer: 2.7.2
  • bruteforcesettings: 1.4.0
  • cloud_federation_api: 0.2.0
  • comments: 1.6.0
  • dav: 1.9.2
  • federatedfilesharing: 1.6.0
  • federation: 1.6.0
  • files: 1.11.0
  • files_downloadactivity: 1.5.0
  • files_pdfviewer: 1.5.0
  • files_rightclick: 0.13.0
  • files_sharing: 1.8.0
  • files_texteditor: 2.8.0
  • files_trashbin: 1.6.0
  • files_versions: 1.9.0
  • files_videoplayer: 1.5.0
  • firstrunwizard: 2.5.0
  • gallery: 18.3.0
  • groupfolders: 4.0.4
  • impersonate: 1.3.0
  • logreader: 2.1.0
  • lookup_server_connector: 1.4.0
  • mail: 0.15.1
  • nextcloud_announcements: 1.5.0
  • notes: 3.0.0
  • notifications: 2.4.1
  • oauth2: 1.4.2
  • password_policy: 1.6.0
  • privacy: 1.0.0
  • provisioning_api: 1.6.0
  • quota_warning: 1.5.0
  • radio: 0.6.5
  • recommendations: 0.4.0
  • serverinfo: 1.6.0
  • sharebymail: 1.6.0
  • sharerenamer: 2.6.0
  • socialsharing_email: 1.0.5
  • socialsharing_facebook: 1.0.4
  • spreed: 1.0.6
  • survey_client: 1.4.0
  • systemtags: 1.6.0
  • theming: 1.7.0
  • twofactor_backupcodes: 1.5.0
  • twofactor_nextcloud_notification: 1.1.2
  • twofactor_totp: 2.1.2
  • unsplash: 1.1.3
  • updatenotification: 1.6.0
  • user_ldap: 1.6.0
  • viewer: 1.0.0
  • workflowengine: 1.6.0

Disabled:

  • dropit
  • encryption
  • extract
  • files_ebookreader
  • files_external
  • files_external_gdrive
  • support

server config:

{
  "system": {
    "memcache.local": "\\OC\\Memcache\\APCu",
    "apps_paths": [
      {
        "path": "\/var\/www\/html\/apps",
        "url": "\/apps",
        "writable": false
      },
      {
        "path": "\/var\/www\/html\/custom_apps",
        "url": "\/custom_apps",
        "writable": true
      }
    ],
    "memcache.distributed": "\\OC\\Memcache\\Redis",
    "memcache.locking": "\\OC\\Memcache\\Redis",
    "redis": {
      "host": "***REMOVED SENSITIVE VALUE***",
      "port": 6379
    },
    "mail_smtpmode": "smtp",
    "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
    "mail_smtpport": "465",
    "mail_smtpsecure": "ssl",
    "mail_smtpauth": true,
    "mail_smtpauthtype": "LOGIN",
    "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
    "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
    "mail_from_address": "***REMOVED SENSITIVE VALUE***",
    "mail_domain": "***REMOVED SENSITIVE VALUE***",
    "passwordsalt": "***REMOVED SENSITIVE VALUE***",
    "secret": "***REMOVED SENSITIVE VALUE***",
    "trusted_domains": [
      "cloud.bessereau.eu"
    ],
    "datadirectory": "***REMOVED SENSITIVE VALUE***",
    "dbtype": "mysql",
    "version": "16.0.3.0",
    "overwrite.cli.url": "https:\/\/cloud.bessereau.eu",
    "overwriteprotocol": "https",
    "dbname": "***REMOVED SENSITIVE VALUE***",
    "dbhost": "***REMOVED SENSITIVE VALUE***",
    "dbport": "",
    "dbtableprefix": "oc_",
    "mysql.utf8mb4": true,
    "dbuser": "***REMOVED SENSITIVE VALUE***",
    "dbpassword": "***REMOVED SENSITIVE VALUE***",
    "instanceid": "***REMOVED SENSITIVE VALUE***",
    "installed": true,
    "maintenance": false,
    "loglevel": 2
  }
}

Server log (data/nextcloud.log)

```
empty
```

@nickvergessen
Member

Would you mind sending a link to a public conversation to <my github name>@nextcloud.com?

It works fine here.

@Windyo
Author

Windyo commented Jul 17, 2019 via email

@Windyo
Author

Windyo commented Jul 17, 2019

Ok, just sent the invite now.

As with previous cases, the moment that Talk is up, the entire instance slows down to a crawl. Loading any function is slow as hell. This does not affect other containers, and netdata reports no load:

[image: netdata screenshot]

@nickvergessen
Member

I receive a "502 Bad Gateway". Can you check your apache2/nginx logs? There should be something somewhere.

@Windyo
Author

Windyo commented Jul 17, 2019

Yeah that's the web container crashing...
I'm checking logs but they're empty. Wondering if I'm looking in the wrong place

@nickvergessen
Member

When exactly does it crash: when you make the conversation public, when you join the chat as a guest, or when you start the call?

@Windyo
Author

Windyo commented Jul 17, 2019

As soon as a user is in the call the web container starts getting sluggish which ends up with 502 errors.

@nickvergessen
Member

We do long-polling requests (2 in parallel: one for chat messages and one for call-related WebRTC signaling messages). Maybe that is the problem?

@Windyo
Author

Windyo commented Jul 17, 2019

I'm not intelligent enough to understand that answer unfortunately.

I'm still investigating the nginx logs, but /var/log/nginx/error.log seems to be completely empty in both the web and app containers. Which sounds rather weird.

@nickvergessen
Member

Basically, every user has 2 constant connections open to your server. Maybe something in your configuration limits the number of possible open connections and therefore causes this problem.
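
A rough way to see why held-open connections matter: if each long-polling request occupies one PHP worker for as long as it stays open (an assumption that fits a process-per-request setup such as php-fpm), a small pool can run out with very few participants. The sketch below is only illustrative; `free_workers` and the pool size of 5 are hypothetical, not values measured on the reporter's server.

```python
def free_workers(max_workers: int, users_in_call: int, long_polls_per_user: int = 2) -> int:
    """Workers left for ordinary requests while long polls are held open.

    Assumes every long poll pins exactly one worker (hypothetical model).
    A result of zero or less means normal page loads have to queue.
    """
    held = users_in_call * long_polls_per_user
    return max_workers - held


if __name__ == "__main__":
    # Hypothetical pool of 5 workers; 2 long polls per user as described above.
    for users in (1, 2, 3):
        left = free_workers(max_workers=5, users_in_call=users)
        state = "ok" if left > 0 else "saturated"
        print(f"{users} user(s) in call: {left} worker(s) free -> {state}")
```

Under this model, every other request to the instance waits once the result reaches zero, which would look like a general slowdown rather than high CPU load.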

@Windyo
Author

Windyo commented Jul 17, 2019

I don't recall doing any specific setup, apart from setting php-fpm to static and limiting it to 2 instances.

re: nginx:
lrwxrwxrwx 1 root root 11 Jun 4 22:30 /var/log/nginx/access.log -> /dev/stdout

That would explain why I'm not getting any logs; but I didn't set that up at all - is that standard in the docker-nextcloud conf?

update

If someone's reading this later for some reason: yes, it's normal. This allows you to watch errors directly using docker, or portainer if you use that, without going to the file.

@Windyo
Author

Windyo commented Jul 17, 2019

I just checked my nginx conf, and the basic setup looks OK to me; I double-checked it against the official docker one:


error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    set_real_ip_from  10.0.0.0/8;
    set_real_ip_from  172.16.0.0/12;
    set_real_ip_from  192.168.0.0/16;
    real_ip_header    X-Real-IP;

    #gzip  on;

    upstream php-handler {
        server app:9000;
    }

    server {
        listen 80;

        # Add headers to serve security related headers
        # Before enabling Strict-Transport-Security headers please read into this
        # topic first.
        # add_header Strict-Transport-Security "max-age=15768000;
        # includeSubDomains; preload;";
        #
        # WARNING: Only add the preload option once you read about
        # the consequences in https://hstspreload.org/. This option
        # will add the domain to a hardcoded list that is shipped
        # in all major browsers and getting removed from this list
        # could take several months.
        add_header X-Content-Type-Options nosniff;
        add_header X-XSS-Protection "1; mode=block";
        add_header X-Robots-Tag none;
        add_header X-Download-Options noopen;
        add_header X-Permitted-Cross-Domain-Policies none;
        add_header Referrer-Policy no-referrer;

        root /var/www/html;

        location = /robots.txt {
            allow all;
            log_not_found off;
            access_log off;
        }

        # The following 2 rules are only needed for the user_webfinger app.
        # Uncomment it if you're planning to use this app.
        rewrite ^/.well-known/host-meta /public.php?service=host-meta last;
        rewrite ^/.well-known/host-meta.json /public.php?service=host-meta-json last;
        rewrite ^/.well-known/webfinger /public.php?service=webfinger last;
        location = /.well-known/carddav {
            return 301 $scheme://$host/remote.php/dav;
        }
        location = /.well-known/caldav {
            return 301 $scheme://$host/remote.php/dav;
        }

        # set max upload size
        client_max_body_size 10G;
        fastcgi_buffers 64 4K;

        # Enable gzip but do not remove ETag headers
        gzip on;
        gzip_vary on;
        gzip_comp_level 4;
        gzip_min_length 256;
        gzip_proxied expired no-cache no-store private no_last_modified no_etag auth;
        gzip_types application/atom+xml application/javascript application/json application/ld+json application/manifest+json application/rss+xml application/vnd.geo+json application/vnd.ms-fontobject application/x-font-ttf application/x-web-app-manifest+json application/xhtml+xml application/xml font/opentype image/bmp image/svg+xml image/x-icon text/cache-manifest text/css text/plain text/vcard text/vnd.rim.location.xloc text/vtt text/x-component text/x-cross-domain-policy;

        # Uncomment if your server is build with the ngx_pagespeed module
        # This module is currently not supported.
        #pagespeed off;

        location / {
            rewrite ^ /index.php$request_uri;
        }

        location ~ ^/(?:build|tests|config|lib|3rdparty|templates|data)/ {
            deny all;
        }
        location ~ ^/(?:\.|autotest|occ|issue|indie|db_|console) {
            deny all;
        }

        location ~ ^/(?:index|remote|public|cron|core/ajax/update|status|ocs/v[12]|updater/.+|ocs-provider/.+)\.php(?:$|/) {
            fastcgi_split_path_info ^(.+\.php)(/.*)$;
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_param PATH_INFO $fastcgi_path_info;
            # fastcgi_param HTTPS on;
            #Avoid sending the security headers twice
            fastcgi_param modHeadersAvailable true;
            fastcgi_param front_controller_active true;
            fastcgi_pass php-handler;
            fastcgi_intercept_errors on;
            fastcgi_request_buffering off;
        }

        location ~ ^/(?:updater|ocs-provider)(?:$|/) {
            try_files $uri/ =404;
            index index.php;
        }

        # Adding the cache control header for js and css files
        # Make sure it is BELOW the PHP block
        location ~ \.(?:css|js|woff2?|svg|gif)$ {
            try_files $uri /index.php$request_uri;
            add_header Cache-Control "public, max-age=15778463";
            # Add headers to serve security related headers (It is intended to
            # have those duplicated to the ones above)
            # Before enabling Strict-Transport-Security headers please read into
            # this topic first.
            # add_header Strict-Transport-Security "max-age=15768000;
            #  includeSubDomains; preload;";
            #
            # WARNING: Only add the preload option once you read about
            # the consequences in https://hstspreload.org/. This option
            # will add the domain to a hardcoded list that is shipped
            # in all major browsers and getting removed from this list
            # could take several months.
            add_header X-Content-Type-Options nosniff;
            add_header X-XSS-Protection "1; mode=block";
            add_header X-Robots-Tag none;
            add_header X-Download-Options noopen;
            add_header X-Permitted-Cross-Domain-Policies none;
            add_header Referrer-Policy no-referrer;

            # Optional: Don't log access to assets
            access_log off;
        }

        location ~ \.(?:png|html|ttf|ico|jpg|jpeg)$ {
            try_files $uri /index.php$request_uri;
            # Optional: Don't log access to other assets
            access_log off;
        }
    }

}

@Windyo
Author

Windyo commented Jul 17, 2019

I also checked the main proxy log to see if there was anything there, but according to it all requests get forwarded normally with no issue.

@Windyo
Author

Windyo commented Jul 17, 2019

OK, I now have logs for the proxy, the web container, and the app container in front of me. ALL requests end up with a 200 status code. I'm at a loss as to where to look from here.

@Windyo
Author

Windyo commented Jul 17, 2019

So, running the test again with all logs active, I have some insight. It seems the client is closing the connection before everything crashes, but I still don't see why.

.0.4.20 - - [17/Jul/2019:22:43:31 +0000] "GET /ocs/v2.php/apps/spreed/api/v1/room/ainx9ydy HTTP/1.1" 404 79 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36" "10.0.4.20"

10.0.4.20 - - [17/Jul/2019:22:43:31 +0000] "DELETE /ocs/v2.php/apps/spreed/api/v1/room/ainx9ydy/participants/active HTTP/1.1" 200 138 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36" "10.0.4.20"

10.0.4.20 - - [17/Jul/2019:22:43:31 +0000] "GET /ocs/v2.php/apps/spreed/api/v1/room/ainx9ydy HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36" "10.0.4.20"

10.0.4.20 - - [17/Jul/2019:22:43:31 +0000] "GET /login HTTP/1.1" 302 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0" "10.0.4.20"

10.0.4.20 - - [17/Jul/2019:22:43:31 +0000] "GET /apps/apporder/getOrder HTTP/1.1" 200 182 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0" "10.0.4.20"

10.0.4.20 - - [17/Jul/2019:22:43:31 +0000] "GET /login HTTP/1.1" 302 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0" "10.0.4.20"

10.0.4.20 - - [17/Jul/2019:22:43:31 +0000] "GET /avatar/Ombi/64 HTTP/1.1" 201 951 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0" "10.0.4.20"

10.0.4.20 - - [17/Jul/2019:22:43:31 +0000] "GET /login HTTP/1.1" 302 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0" "10.0.4.20"

10.0.4.20 - - [17/Jul/2019:22:43:31 +0000] "GET /login HTTP/1.1" 302 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0" "10.0.4.20"
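
For anyone sifting through a longer log of this shape, a small tally by status code makes the odd entries stand out. This is a minimal sketch, assuming lines in the `main` log format from the nginx config above; `status_counts` is a hypothetical helper, and the sample lines are abbreviated from the excerpt above (user agent shortened to `UA`). In nginx, 499 is the non-standard code logged when the client closes the connection before a response is sent.

```python
import re
from collections import Counter

# Matches the quoted request and the status field that follows it.
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def status_counts(lines):
    """Count access-log lines per HTTP status code; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group("status")] += 1
    return counts


sample = [
    '10.0.4.20 - - [17/Jul/2019:22:43:31 +0000] "DELETE /ocs/v2.php/apps/spreed/api/v1/room/ainx9ydy/participants/active HTTP/1.1" 200 138 "-" "UA" "10.0.4.20"',
    '10.0.4.20 - - [17/Jul/2019:22:43:31 +0000] "GET /ocs/v2.php/apps/spreed/api/v1/room/ainx9ydy HTTP/1.1" 499 0 "-" "UA" "10.0.4.20"',
    '10.0.4.20 - - [17/Jul/2019:22:43:31 +0000] "GET /login HTTP/1.1" 302 0 "-" "UA" "10.0.4.20"',
]

if __name__ == "__main__":
    for status, count in sorted(status_counts(sample).items()):
        print(status, count)
```

Since the access log goes to stdout in this setup, the lines could be piped in from `docker logs` instead of the hard-coded sample.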

@Windyo
Author

Windyo commented Jul 18, 2019

OK, I've even tried putting just the nextcloud container online with no other load balancer or anything running, and it still fails. I'm 99% sure this has to do with docker, and I think what @nickvergessen wrote is logical: as it happens as soon as a call is started, it could be the long polls.

As is, I don't know how to investigate any further and I am kinda stuck. I'll keep everything as-is for further testing and debugging if someone can help.

@tanguy-opendsi

Hi everyone. I got the same error. When I use PHP 7.0 the problem disappears.

@nickvergessen
Member

Which version were you using before?

@tanguy-opendsi

tanguy-opendsi commented Jul 26, 2019

@nickvergessen I used PHP 7.3.
Our Nextcloud instances are 15.0.10 and 15.0.5, and the problem is the same on both.
When we use PHP 7.0 with fpm the problem disappears.

@Windyo
Author

Windyo commented Jul 30, 2019

OK, I'm back. I'll try to do this next week. Just changing the PHP version in the container was annoying, so I'll clone https://github.com/nextcloud/docker/tree/060cf0883ff12241081778714507e2823d84e629/16.0/fpm, change the reference to PHP 7.0, then build the image from that. If anyone has concerns about this way of testing the issue, let me know. Probably will do this tomorrow or smth.

@tanguy-opendsi

@Windyo can you share your full compose file after that? ;)

@tdm4

tdm4 commented Aug 1, 2019

Is there an issue with using PHP 7.3 with Nextcloud talk? I'm on PHP 7.3 and I sometimes can't get reliable joins from people in a video call. Considering dropping back to 7.2 as a test.

@tdm4

tdm4 commented Aug 1, 2019

@Windyo - PHP 5.6 and 7.0 are both end of life and no longer supported. I think you need 7.1 at a minimum, 7.2 probably better. (7.1 goes end of life in December 2019)

@nickvergessen
Member

Did it work for you too, @tdm4?

@tdm4

tdm4 commented Aug 19, 2019

@nickvergessen Yes, downgrading to PHP 7.2 fixed my issues. I think there are some problems with PHP 7.3.

@tanguy-opendsi

@tdm4 For us, performance is clearly bad with 7.1, 7.2, and 7.3, but we use fpm.
Do you use fpm too?

@tdm4

tdm4 commented Aug 20, 2019

@tanguy-opendsi Yes, I use fpm too. It doesn't matter whether you use php-fpm or some kind of Apache prefork; it's PHP itself here. I just use php-fpm with ondemand and max 10 children.

@tanguy-opendsi

@tdm4 Thanks for your reply, but when I switch to fpm I get the same trouble.
Do you use apache2 with fpm?

@tdm4

tdm4 commented Aug 20, 2019

@tanguy-opendsi I don't use the Apache web server.

@tanguy-opendsi

@tdm4,
Ok best regards.

@Windyo
Author

Windyo commented Aug 20, 2019

Quick update: I'm still trying to get the 7.2 container running with no errors. Building the image from https://github.com/nextcloud/docker/tree/060cf0883ff12241081778714507e2823d84e629/16.0/fpm with the reference changed to 7.2 works, but then I get a slew of server errors complaining about wrong permissions for some reason.

Still haven't figured that out, so ATM I can't test if this fixes Talk.

nickvergessen added this to the 💔 Backlog milestone Sep 27, 2019

@EliterScripts

Hi, I am getting similar issues with public chats being very buggy and slow. I use Mail-in-a-Box ( https://mailinabox.email/ ), and modified it to install Nextcloud Talk (spreed).

# php -v
PHP 7.2.24-0ubuntu0.18.04.2 (cli) (built: Jan 13 2020 18:39:59) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.2.0, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.2.24-0ubuntu0.18.04.2, Copyright (c) 1999-2018, by Zend Technologies
# sudo -u www-data php /usr/local/lib/owncloud/occ -V
Nextcloud 15.0.8
# nginx -v
nginx version: nginx/1.14.0 (Ubuntu)

You can see the basic nginx configuration on their Github: https://github.com/mail-in-a-box/mailinabox/tree/master/conf but I think you are mostly looking at https://github.com/mail-in-a-box/mailinabox/blob/master/conf/nginx-primaryonly.conf

@nickvergessen
Member

Well, Nextcloud 15 is 3 major versions behind. Have you considered doing an update?

@EliterScripts

It looks like Mail-in-a-Box hard-coded Nextcloud version 17.0.2 with a hash of 8095fb46e9e0c536163708aee3d17fab8b498ad6. I would like to propose a change; are there any concerns I might want to address with the project?

@EliterScripts

I ran this command, and it reports that it is up to date.

# sudo -u www-data php /usr/local/lib/owncloud/occ upgrade
Nextcloud is already latest version
# sudo -u www-data php /usr/local/lib/owncloud/occ -V
Nextcloud 15.0.8

@nickvergessen
Member

Well, that updates the database based on the current files.
Try:

sudo -u www-data php /usr/local/lib/owncloud/updater/updater.phar

@Windyo
Author

Windyo commented Feb 12, 2020 via email

@nickvergessen
Member

I guess so, but we can't fix a misconfigured php/webserver setup in our software.
Very sorry about this, but at the same time there is still #2211 open and I guess that is what you are facing as well.

@Windyo
Author

Windyo commented Feb 12, 2020

On one side I'm 100% with you that you can't fix a misconfigured server; on the other side it's a docker instance, so config shouldn't really be an issue, esp. as I've tested with a brand new config...

I'll check the HTTP2 thing out, but I don't think that's it as in this case the entire container crashes and restarts.

@dbeniamine

Hi,

I was having a similar issue and I solved it.

Server configuration

  • Debian 10
  • Nextcloud 18.0.1.4
  • talk 8.0.3
  • php-fpm : 7.3.14-1~deb10u1
  • apache2 2.4.38

Description

Starting a call makes PHP extremely slow. Once the conversation is closed and PHP restarted, everything is back to normal.

Solution

After looking at php logs I saw that max_children was reached. After raising it (from 5 to 20, 10 was not enough) and restarting php7.3-fpm, everything is working again.
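
To check whether a setup is hitting the same limit, the php-fpm log can be scanned for the pool-exhaustion warning. This is a minimal sketch, not part of the original fix: `max_children_hits` is a hypothetical helper, the warning wording in the regex is my understanding of what php-fpm writes and should be checked against your own log, and the sample lines are made up for illustration.

```python
import re

# Assumed wording of php-fpm's warning; verify against a real log before relying on it.
WARN_RE = re.compile(
    r"\[pool (?P<pool>[^\]]+)\] server reached pm\.max_children setting \((?P<limit>\d+)\)"
)


def max_children_hits(lines):
    """Return {(pool, limit): count} for every max_children warning found."""
    hits = {}
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            key = (match.group("pool"), int(match.group("limit")))
            hits[key] = hits.get(key, 0) + 1
    return hits


# Illustrative lines only, not taken from any log in this thread.
sample = [
    "[01-Jan-2020 00:00:00] NOTICE: ready to handle connections",
    "[01-Jan-2020 00:00:05] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it",
]

if __name__ == "__main__":
    for (pool, limit), count in max_children_hits(sample).items():
        print(f"pool {pool}: limit {limit} reached {count} time(s)")
```

If the warning shows up while a call is running, raising `pm.max_children` in the pool configuration and restarting php-fpm, as described above, is the setting to look at.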
