Bittorent download does not always work #83

Open
poeml opened this Issue Jun 5, 2015 · 0 comments

1 participant

@poeml
Owner
                                                                                  [          ]

Issue migrated (2015-06-05) from old issue tracker http://mirrorbrain.org/issues/issue83

Title    Bittorent download does not always work
 Priority   bug       Status     chatting
Superseder           Nosy List   Kelson, poeml
Assigned To poeml    Keywords

msg315 (view) Author: Kelson Date: 2011-02-21.17:40:16

I use packaged version of mirrorbrain (Version: 2.15.0-1) on Ubuntu 10.04

It seems that downloading a .torrent file does not always work properly.

If I try many times to download a .torrent file with wget, there will be always
a try where wget will start to download the real file instead of getting the
bittorrent content.

Hier a typical example (first try OK, second try NOK):

$ rm torrent ; wget --server-response
http://download.kiwix.org/archive/kiwix/kiwix-0.5.iso.torrent
--2011-02-21 18:33:43--
http://download.kiwix.org/archive/kiwix/kiwix-0.5.iso.torrent
Résolution de download.kiwix.org... 88.190.13.50
Connexion vers download.kiwix.org|88.190.13.50|:80... connecté.
requête HTTP transmise, en attente de la réponse...
HTTP/1.1 200 OK
Date: Mon, 21 Feb 2011 17:33:43 GMT
Server: Apache/2.2.14 (Ubuntu)
Connection: close
Content-Type: application/x-bittorrent
Longueur: non spécifié [application/x-bittorrent]
Sauvegarde en : «kiwix-0.5.iso.torrent»

  [ <=>
                          ] 34 488      --.-K/s   ds 0s

2011-02-21 18:33:43 (328 MB/s) - «kiwix-0.5.iso.torrent» sauvegardé [34488]

$ rm torrent ; wget --server-response
http://download.kiwix.org/archive/kiwix/kiwix-0.5.iso.torrent
--2011-02-21 18:33:44--
http://download.kiwix.org/archive/kiwix/kiwix-0.5.iso.torrent
Résolution de download.kiwix.org... 88.190.13.50
Connexion vers download.kiwix.org|88.190.13.50|:80... connecté.
requête HTTP transmise, en attente de la réponse...
HTTP/1.1 200 OK
Date: Mon, 21 Feb 2011 17:33:44 GMT
Server: Apache/2.2.14 (Ubuntu)
Last-Modified: Tue, 28 Dec 2010 11:13:09 GMT
ETag: "57e24f0-1a5d2000-4987689a215f7"
Accept-Ranges: bytes
Content-Length: 442310656
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/x-bittorrent
Longueur: 442310656 (422M) [application/x-bittorrent]
Sauvegarde en : «kiwix-0.5.iso.torrent»

19% [==================>
] 84 684 800 64,9M/s ^C
$file kiwix-0.5.iso.torrent
kiwix-0.5.iso.torrent: # ISO 9660 CD-ROM filesystem data 'Kiwix-0.5'

msg316 (view) Author: poeml Date: 2011-02-21.18:12:58

Hi! That's interesting behaviour - I would not expect it. Maybe a bug. The only
rational explanation that I would have for the moment would be either that
sometimes no mirror is marked available in the database, or that connection to the
database fails intermittently. Delivering the whole file is the fallback behaviour
in some conditions, although it shouldn't happen for a failure to generate a
torrent (and I believe that that would rather result into a 404).

To find out more, could you please enable debug logging, by setting
"MirrorBrainDebug on" in Apache config for the directory in question? And could
you then reproduce the bug and send me (or attach) the log?

The log will be long, but should give details why a torrent wasn't generated.

msg317 (view) Author: poeml Date: 2011-02-21.18:13:52

The bug seems to be easy to reproduce. I just tried your URL a few times, and I
already see the two different responses:

poeml@doozer ~ % curl -sI 'http://download.kiwix.org/archive/kiwix/kiwix-
0.5.iso.torrent'
HTTP/1.1 200 OK
Date: Mon, 21 Feb 2011 18:10:25 GMT
Server: Apache/2.2.14 (Ubuntu)
Content-Type: application/x-bittorrent

poeml@doozer ~ % curl -sI 'http://download.kiwix.org/archive/kiwix/kiwix-
0.5.iso.torrent'
HTTP/1.1 200 OK
Date: Mon, 21 Feb 2011 18:10:27 GMT
Server: Apache/2.2.14 (Ubuntu)
Last-Modified: Tue, 28 Dec 2010 11:13:09 GMT
ETag: "57e24f0-1a5d2000-4987689a215f7"
Accept-Ranges: bytes
Content-Length: 442310656
Content-Type: application/x-bittorrent

poeml@doozer ~ % curl -sI 'http://download.kiwix.org/archive/kiwix/kiwix-
0.5.iso.torrent'
HTTP/1.1 200 OK
Date: Mon, 21 Feb 2011 18:10:32 GMT
Server: Apache/2.2.14 (Ubuntu)
Last-Modified: Tue, 28 Dec 2010 11:13:09 GMT
ETag: "57e24f0-1a5d2000-4987689a215f7"
Accept-Ranges: bytes
Content-Length: 442310656
Content-Type: application/x-bittorrent

poeml@doozer ~ % curl -sI 'http://download.kiwix.org/archive/kiwix/kiwix-
0.5.iso.torrent'
HTTP/1.1 200 OK
Date: Mon, 21 Feb 2011 18:10:41 GMT
Server: Apache/2.2.14 (Ubuntu)
Last-Modified: Tue, 28 Dec 2010 11:13:09 GMT
ETag: "57e24f0-1a5d2000-4987689a215f7"
Accept-Ranges: bytes
Content-Length: 442310656
Content-Type: application/x-bittorrent

poeml@doozer ~ % curl -sI 'http://download.kiwix.org/archive/kiwix/kiwix-
0.5.iso.torrent'
HTTP/1.1 200 OK
Date: Mon, 21 Feb 2011 18:10:48 GMT
Server: Apache/2.2.14 (Ubuntu)
Content-Type: application/x-bittorrent

msg318 (view) Author: poeml Date: 2011-02-21.18:17:16

Maybe some Apache children fail to connect to the database, and some succeed.
Incoming requests are handled by the children taking turns, which could result
into this picture. The output of "ps afuxw | grep apache2" and "netstat -tupan |
grep apache2 | grep 5432" could be interesting.

msg319 (view) Author: Kelson Date: 2011-02-21.19:52:29

So the problem is that not connection was possible to the DB and it seems to
fall back in this case to the ISO content. Unfortunately, after a postgresql
restart, I'm not able to reproduce the error anymore. Unfortunately, I do not
have the log anymore, but this was a DB connection error and I post it to the
#mirrorbrain IRC channel.

Now I have sporadic 404 errors, which is still NOK... but better. Here is the
Apache error log

Mon Feb 21 20:44:38 2011] [warn] [client 88.190.13.50] [mod_mirrorbrain]
Successfully acquired database connection.
[Mon Feb 21 20:44:38 2011] [warn] client 88.190.13.50Unknown error
18446744073709551615: [mod_mirrorbrain] Could not retrieve row from database for
archive/kiwix/kiwix-0.5.iso (size: 442310656, mtime 1293534789): ERREUR:
l'instruction pr\xc3\xa9par\xc3\xa9e \xc2\xab mirrorbrain_dbd_hash_18 \xc2\xbb
n'existe pas\n Likely cause: there are no hashes in the database (yet).
[Mon Feb 21 20:44:38 2011] [warn] [client 88.190.13.50] [mod_mirrorbrain] no
hashes found in database, but needed for torrent representation

msg320 (view) Author: poeml Date: 2011-02-21.21:11:22

[Mon Feb 21 19:33:18 2011] [warn] [client 88.190.13.50] [mod_mirrorbrain] SQL
file to look up: archive/kiwix/kiwix-0.5.iso [Mon Feb 21 19:33:18
2011] errorInternal error: DBD: failed to initialise
[Mon Feb 21 19:33:18 2011] [error] [client 88.190.13.50] [mod_mirrorbrain] Error
acquiring database connection

I saw the log you pasted into the irc channel, thanks. The database connection
was indeed the problem there. One or more Apache children could not connect. A
clue for the reason might exist in the PostgreSQL log. That would probably
/var/log/postgresql/postgresql-8.4-main.log on Ubuntu.

That this problem disappears after a restart doesn't necessarily mean that it
might not appear again after a while -- maybe depending on the number of Apache
children or number of open connections. How did you configure the connection
pool? It might look like this, when using the worker MPM:

  DBDMin  0
  DBDMax  12
  DBDKeep 3
  DBDExptime 10

For the prefork MPM, no such config is needed, and then every child opens a
connection to the database. This means that PostgreSQL needs to allow as many
connections as are needed. The limit is set via max_connections in
/etc/postgresql/8.4/main/postgresql.conf.

Now to the new messages you get. The message "Could not retrieve row" is where
mod_mirrorbrain runs into an error. The reason provided by the database adapter
(" l'instruction ... mirrorbrain_dbd_hash_18 n'existe pas") is the key -- the
precompiled database query does not exist, as it should be. Normally, the
queries are compiled once when a new Apache child start. (If that fails, it
should log an error in Apache's error log by the way.)

The other message, "Likely cause: there are no hashes", is wrong. That's not
true, and misleading. Since the precompiled query doesn't exist, the query
cannot even executed, which would find or not find hashes in the database.
mod_mirrorbrain should not log this wrong interpretation in this case.

All in all, the usage of a high number of connections, more than PostgreSQL
allows, could be the underlying issue maybe!

msg321 (view) Author: Kelson Date: 2011-02-22.22:27:57

About the mirrorbrain_dbd_hash_18, I have nothing additional in the apache2
error.log file. Nothing additional is tracked in the pgsql log file (only
oneline with the error concerning the prepared instruction
mirrorbrain_dbd_hash_18). But I do not have the error since yesterday!

$ sudo cat /var/log/postgresql/postgresql-8.4-main.log | grep
mirrorbrain_dbd_hash | tail
2011-02-21 20:42:49 CET ERREUR: l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:42:50 CET ERREUR: l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:42:50 CET ERREUR: l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:42:52 CET ERREUR: l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:42:52 CET ERREUR: l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:42:53 CET ERREUR: l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:44:18 CET ERREUR: l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:44:22 CET ERREUR: l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:44:23 CET ERREUR: l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:44:38 CET ERREUR: l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas

I have upgraded max_connection to 200.
It seems that Ubuntu (in fact the kernel) is not really able to deal with more
than that.

I use the Apache2 prefork MPM.

I have currently difficulties to get errors, but I still see pretty much erros
in the pgsql log file, for example:

2011-02-22 09:34:03 CET FATAL: limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:04 CET FATAL: limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:04 CET FATAL: limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:04 CET FATAL: limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:04 CET FATAL: limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:04 CET FATAL: limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:08 CET FATAL: limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:09 CET FATAL: limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:17 CET FATAL: limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:20 CET FATAL: limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:26 CET FATAL: limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:35:01 CET ERREUR: la relation « pfx2asn » n'existe pas au
caractère 131
2011-02-22 09:35:01 CET INSTRUCTION :
SELECT pg_catalog.pg_get_constraintdef(oid) as condef
FROM pg_catalog.pg_constraint r
WHERE r.conrelid = 'pfx2asn'::regclass AND r.contype = 'f'
2011-02-22 09:36:02 CET ERREUR: la relation « pfx2asn » n'existe pas au
caractère 131
2011-02-22 09:36:02 CET INSTRUCTION :
SELECT pg_catalog.pg_get_constraintdef(oid) as condef
FROM pg_catalog.pg_constraint r
WHERE r.conrelid = 'pfx2asn'::regclass AND r.contype = 'f'
2011-02-22 09:37:01 CET ERREUR: la relation « pfx2asn » n'existe pas au
caractère 131
2011-02-22 09:37:01 CET INSTRUCTION :
SELECT pg_catalog.pg_get_constraintdef(oid) as condef
FROM pg_catalog.pg_constraint r
WHERE r.conrelid = 'pfx2asn'::regclass AND r.contype = 'f'
2011-02-22 09:38:02 CET ERREUR: la relation « pfx2asn » n'existe pas au
caractère 131
2011-02-22 09:38:02 CET INSTRUCTION :
SELECT pg_catalog.pg_get_constraintdef(oid) as condef
FROM pg_catalog.pg_constraint r
WHERE r.conrelid = 'pfx2asn'::regclass AND r.contype = 'f'

So it seems to that:

  • I have a serious problem with the number of psql connections with psql with reach often the limit. Does it should be better with worker MPM?
  • Something seems to be wrong with pfx2asn
msg322 (view) Author: poeml Date: 2011-02-23.05:09:29

It fits to our other findings that the pgsql logfile says that the prepared
statement "mirrorbrain_dbd_hash_18" does not exist.

A first question should be, why do you have so many Apache children? I would
suggest now to make sure that KeepAliveTimeout is not set to a high value like
15, but to 2. You could even switch it off. You can check the server status
(/server-status via mod_status) to see how many processes are in keepalive
state.

If a large number of Apache processes is unavoidable, e.g. because there are
really many long-running requests in parallel, then the worker MPM would make
sense. If mod_php is used, you'd need to switch to mod_fcgid (which will save
memory from unused PHP interpreters at the same time).

It should not be needed, but if really required one can also configure
PostgreSQL to open more connections (kernel tuning needed as well), but all
those connections aren't really used, and the fact that Apache spawns "too" many
children should be fixed first.

(You can ignore the pfx2asn message in the postgresql log.)

msg325 (view) Author: Kelson Date: 2011-02-26.12:46:36

So, there is like you have written a direct relation between the number of
parallel Apache processes and the pgsql connection count problem.

I didn't switch to MPM Worker because I don't know it.

But I increased massively the max_connections config to 500 (and also modify the
kernel config).

It seems to work now and I keep monitoring this problem.

It would be maybe interesting to warn user in the mirrorbrain documentation
about this max_connection "issue". 100 or 200 apache processes are not so
difficult to get on the prod. server.

Thank Poeml for you help.

msg380 (view) Author: poeml Date: 2012-04-15.20:36:22

Is the situation here still okay for you? Then I would like to close the bug.

I can only recommend the worker MPM. If PHP is used, it is required however to
switch from the in-process mod_php to mod_fcgid (or a similar model where PHP
interpreters are started separately from Apache processes). mod_php is not
compatible with threaded MPMs like worker. If you have hundreds of processes, it
sounds as if the server could save quite some resources by switching to such a
model. Most PHP apps work flawlessly with such a setup.

The thing is that with 500 Apache processes, mod_php will mean there are 500 PHP
processes too, and since database connections can be shared in a pool only with
the worker MPM, but not with the prefork MPM, there'll be 500 database
connections, too. There's no way around that.

But even with the prefork MPM, having hundreds of processes sounds wrong. Maybe
it must be that way - I don't know your setup - but usually, most of the
hundreds of processes are just idling and mainly keeping connections open for
far too long. Reducing the KeepaliveTimeout from 15 seconds to one or two, or
setting keepalive off, can often work wonders.

msg381 (view) Author: poeml Date: 2012-04-15.20:40:20

for cross reference: https://issues.jenkins-ci.org/browse/JENKINS-12297

History
         Date          User  Action            Args
2012-04-15 20:40:20 poeml  set    messages: + msg381
2012-04-15 20:36:22 poeml  set    priority: urgent -> bug
                                    messages: + msg380
2011-02-26 12:46:37 Kelson set    messages: + msg325
2011-02-23 05:09:29 poeml  set    messages: + msg322
2011-02-22 22:27:57 Kelson set    messages: + msg321
2011-02-21 21:11:22 poeml  set    messages: + msg320
2011-02-21 19:52:29 Kelson set    messages: + msg319
2011-02-21 18:17:17 poeml  set    messages: + msg318
2011-02-21 18:13:52 poeml  set    messages: + msg317
                                    status: unread -> chatting
2011-02-21 18:12:59 poeml  set    assignedto: poeml
                                    messages: + msg316
                                    nosy: + poeml
2011-02-21 17:40:17 Kelson create

(end of migrated issue)
@poeml poeml added the bug label Jun 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment