Drop ftp:// urls from metalinks #99

Closed
nirik opened this Issue Jun 22, 2015 · 11 comments

@nirik
Member
nirik commented Jun 22, 2015

FTP causes issues with many firewalls and is in general a horrible protocol. We should stop offering ftp:// URLs in metalinks.

We might want to check/contact any mirrors that have only FTP URLs and ask them to fix that by adding an http{s} URL.

@adrianreber
Contributor

I had a look at the URLs in the database. Across all mirrors, 37 categories (out of 2009 in total) have FTP-only links:

  1 ftp-redirect
  1 http-forbidden (403)
  4 private
  5 no-dns
  7 http-works
  8 http-not-found (404)
 11 no-http

So it looks like we could disable FTP in MirrorManager. I will provide a patch that prevents FTP URLs from being added in the future. Then we can remove the existing FTP URLs from the database.
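The "FTP only" check described above can be sketched roughly as follows. This is a minimal illustration, not MirrorManager's actual query: the `urls_by_category` mapping and the `ftp_only_categories` helper are hypothetical names, and the example hosts are made up.

```python
from urllib.parse import urlsplit


def ftp_only_categories(urls_by_category):
    """Return the categories whose URLs all use the ftp scheme.

    urls_by_category is a hypothetical mapping of a category label
    to the list of URLs registered for it.
    """
    result = []
    for category, urls in urls_by_category.items():
        schemes = {urlsplit(url).scheme.lower() for url in urls}
        # A category with no URLs at all is not counted as FTP-only.
        if schemes == {"ftp"}:
            result.append(category)
    return result


# Made-up example data.
example = {
    "mirror-a/Fedora Linux": ["ftp://mirror-a.example.org/fedora/"],
    "mirror-b/Fedora Linux": [
        "ftp://mirror-b.example.org/fedora/",
        "http://mirror-b.example.org/fedora/",
    ],
    "mirror-c/Fedora EPEL": ["rsync://mirror-c.example.org/epel/"],
}
print(ftp_only_categories(example))
```

Only categories found this way would need a follow-up with the mirror admin before their FTP URLs are removed.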

@pypingou
Member
pypingou commented Nov 9, 2015

@adrianreber do we want to make this configurable in case other systems want to support FTP?

@adrianreber
Contributor

@pypingou good point. Other MM users might want to use FTP.

@adrianreber adrianreber added a commit to adrianreber/mirrormanager2 that referenced this issue Nov 10, 2015
@adrianreber adrianreber Optionally exclude certain protocols from MM
This change adds the possibility to exclude certain
protocols to be entered in MirrorManager. The default
value in the configuration file of

 MM_PROTOCOL_REGEX = '^(?!ftp)(.*)$'

will prohibit that FTP based URLs are entered at all.

The goal (as discussed in #99) is to remove FTP URLs from Fedora's
MirrorManager setup.

Signed-off-by: Adrian Reber <adrian@lisas.de>
b414de8
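To illustrate what the default `MM_PROTOCOL_REGEX` from the commit message accepts and rejects, here is a small sketch using Python's `re` module. How MirrorManager applies the setting internally is not shown in this thread, so the `url_allowed` helper and the use of `re.match` are assumptions made for illustration only.

```python
import re

# Default value quoted in the commit message above.
MM_PROTOCOL_REGEX = r'^(?!ftp)(.*)$'


def url_allowed(url, pattern=MM_PROTOCOL_REGEX):
    """Hypothetical helper: True if the URL passes the protocol regex."""
    return re.match(pattern, url) is not None


for url in (
    "ftp://mirror.example.org/fedora/",
    "ftps://mirror.example.org/fedora/",
    "http://mirror.example.org/fedora/",
    "rsync://mirror.example.org/fedora/",
    "FTP://mirror.example.org/fedora/",
):
    print(url, url_allowed(url))
```

Two properties of the pattern are worth noting. The negative lookahead rejects anything that starts with `ftp`, so `ftps://` URLs are excluded as well. The pattern is also case-sensitive, so an upper-case `FTP://` prefix would pass unless the input is normalized first or the regex is applied with `re.IGNORECASE`.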
@henrysher henrysher pushed a commit to henrysher/fedora-infra-ansible that referenced this issue Dec 17, 2015
@adrianreber adrianreber First step to disable FTP in MirrorManager
As discussed in

fedora-infra/mirrormanager2#99

This is the first step to remove FTP from MirrorManager. With this
change it is no longer possible to enter FTP URLs into MM.

Signed-off-by: Adrian Reber <adrian@lisas.de>
59c954e
@ralphbean ralphbean added the medium label Jan 11, 2016
@nirik
Member
nirik commented Mar 7, 2016

What's the status here? Can we drop these yet?

@adrianreber
Contributor

We probably could. New mirrors can only be added without FTP URLs, and there has not been any negative feedback so far. When I look at mirrors with problems, I manually delete FTP URLs when I see them. It would be nice to remove those URLs all at once, but right now I see no problem with removing them gradually. If anybody wants to remove all the FTP URLs right now, there are no objections from me.

@nirik
Member
nirik commented Mar 8, 2016

So, we just need to go into the db and remove all the entries containing ftp URLs?

I'm for doing this sooner rather than later. I get a pretty constant stream of people asking me why we have ftp:// urls and when they are going to go away.

@pypingou any thoughts?

@mdomsch
Member
mdomsch commented Mar 8, 2016

FYI, the crawler crawls via rsync if available, falling back to FTP if
available, and finally http, priority being the fastest and least intrusive
way of getting the list of files. Removing FTP will slow down the crawler
for any mirror that doesn't offer rsync but has offered FTP.

@adrianreber
Contributor

Unfortunately, that order (RSYNC > FTP > HTTP) has not been true for quite some time now (I think). Even MirrorManager1 preferred HTTP over FTP according to

https://git.fedorahosted.org/cgit/mirrormanager.git/tree/server/crawler_perhost#n393

But then there is also the logic to first crawl per category (RSYNC), then per directory (FTP), and then per file (HTTP). So it is still possible that mirrors are crawled using FTP, but I haven't seen it very often. I think that in cases where crawling takes too much time, RSYNC is the only sane option. Especially as the crawlers seem to be behind some kind of NAT, using FTP to crawl might become (or already is) problematic.
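The protocol preference discussed here (RSYNC first, then HTTP, with FTP last) can be sketched as a simple selection over a mirror's URLs. This is not the crawler's actual code: `CRAWL_PREFERENCE`, `pick_crawl_url`, and the example URLs are hypothetical, and the real crawler additionally distinguishes per-category, per-directory, and per-file crawling.

```python
from urllib.parse import urlsplit

# Assumed preference order, following the comment above.
CRAWL_PREFERENCE = ("rsync", "http", "ftp")


def pick_crawl_url(urls, preference=CRAWL_PREFERENCE):
    """Return the first URL whose scheme ranks highest, or None."""
    by_scheme = {}
    for url in urls:
        # Keep only the first URL seen for each scheme.
        by_scheme.setdefault(urlsplit(url).scheme.lower(), url)
    for scheme in preference:
        if scheme in by_scheme:
            return by_scheme[scheme]
    return None


mirror_urls = [
    "ftp://mirror.example.org/fedora/",
    "http://mirror.example.org/fedora/",
]
print(pick_crawl_url(mirror_urls))
```

With FTP URLs removed from the database, a mirror without rsync simply falls through to HTTP in a scheme like this.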

@adrianreber
Contributor

Debian is also planning to remove FTP mirrors:

https://lists.debian.org/debian-mirrors/2016/04/msg00000.html

@adrianreber
Contributor

All FTP URLs have been removed from Fedora's MirrorManager DB. Adding new FTP URLs to the DB is no longer possible. See commit above.
