
How is a HTTP connection supposed to work? #36

Closed
wdlkmpx opened this issue Jan 25, 2020 · 7 comments
Comments

@wdlkmpx
Collaborator

wdlkmpx commented Jan 25, 2020

Select HTTP.

Shouldn't these two sites be compatible with gFTP in HTTP mode?
http://distro.ibiblio.org
http://ftp.gnu.org

I'm seeing this anomaly in all gFTP versions

gFTP 1.0.8 GTK1 [hitting enter in the password field]
[screenshot]

All gFTP GTK2 versions: apparently the only way to trigger an HTTP connection is through Remote [menu] -> Open Location.
[screenshot]

Bonus: http://ftp.redhat.com/

Trying ftp.redhat.com:80
Connected to ftp.redhat.com:80
Loading directory listing / from server (LC_TIME=en_US.UTF-8)
GET http://ftp.redhat.com/ HTTP/1.1
User-Agent: gFTP 2.0.19
Host: ftp.redhat.com
HTTP/1.1 400 Bad Request
Date: Sat, 25 Jan 2020 17:49:18 GMT
Server: Apache
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1
Disconnecting from site ftp.redhat.com
@wdlkmpx
Collaborator Author

wdlkmpx commented Jan 25, 2020

If you run wget ftp.redhat.com, it downloads index.html.

FileZilla does not support HTTP:
https://forum.filezilla-project.org/viewtopic.php?t=24557

https://stackoverflow.com/questions/4496182/getting-directory-listing-over-http

Most HTTP servers do not allow access to directory listings, and those that do are doing so as a feature of the server, not the HTTP protocol. For those HTTP servers, they are deciding to generate and send an HTML page for human consumption, not machine consumption. You have no control over that, and would have no choice but to parse the HTML.

@wdlkmpx
Collaborator Author

wdlkmpx commented Jan 25, 2020

I edited rfc2068.c to make gFTP send a proper HTTP request, something like this:

GET / HTTP/1.1
User-Agent: gFTP 2.0.19
Host: ftp.gnu.org
Accept: */*
Accept-Encoding: identity

And I finally get something... that gFTP cannot process; the program becomes unresponsive trying to parse the HTML:
[screenshot]

That's it.

@masneyb
Owner

masneyb commented Jan 26, 2020

If I recall, this was written to parse apache2 directory listings. That format has likely changed since then and may no longer work. I bet there is now a nice, clean, small library that can be used instead these days. Maybe something like libcurl could be used? https://curl.haxx.se/libcurl/

@wdlkmpx
Collaborator Author

wdlkmpx commented Jan 26, 2020

It certainly no longer works. But I see different syntax on different sites. And I don't think libcurl can handle HTML directory listings and provide all the info gFTP needs.

This site doesn't even provide file sizes:
http://ftp.redhat.com/pub/redhat/

HTML is a nightmare to work with unless you use libxml2 or something, and then the requirements start to pile up.

https://curl.haxx.se/mail/archive-2010-05/0047.html

But libcurl could indeed replace many things.

@wdlkmpx
Collaborator Author

wdlkmpx commented Feb 6, 2020

Google Chrome 80 Released With WebVR 1.1, Dropping FTP Support
https://www.phoronix.com/scan.php?page=news_item&px=Google-Chrome-80-Released

Another fact is that FTP provides a structured directory listing while HTTP does not have a standard way to list directories (you can format directory listings with HTTP but everyone invents their own way to do it rather than use a standardized means to do so). This makes things really difficult if you need to write scripts to query directory listings.

Support for HTTP is probably not feasible anymore. I never used gFTP to download files from HTTP servers, so I don't know when it got broken. But the whole codebase was really old, so I assume it happened between 2009 and 2011.

So I never got to see how it works, or how it dealt with sites that didn't provide certain info. Was it able to resume downloads by querying the file size from the server?

It's quite easy to download files with wget or curl, so you use a browser or an FTP client to browse directories, I think. And I guess the main reason to use an FTP client is to upload files, connect to one of your own servers, or transfer files between PCs.

@masneyb
Owner

masneyb commented Feb 7, 2020

If I recall, the HTTP support was written to parse the standard Apache directory listing format. Looking at an example from kernel.org (https://cdn.kernel.org/pub/linux/kernel/v5.x/) shows that it's using nginx and the format is nothing like what I remember.

Since the HTTP support is broken, let's take that out too. :) Unless there's a nice library somewhere that can parse the various HTTP directory listing formats.

Regarding FTP, it does provide a structured directory listing format, however there's no standard there either. The gftp code base has support for a large number of different directory listing formats. In hindsight, I wish I had created a test suite for the various types of directory listing formats that are supported.

@wdlkmpx
Collaborator Author

wdlkmpx commented Feb 7, 2020

Here is a directory listing that seems to be "unique"; the HTML is actually quite simple:
https://downloads.haskell.org/~ghc/

gFTP does not know how to retrieve ~ghc/; I guess it's easy to make the relevant change.

HTTP support is not essential for an FTP client... some servers even send compressed HTML pages or something. It's a different world with so many possible complexities, and potential segfaults, so I'll open a pull request to remove it, but it could be reimplemented in some other way in the [very] distant future.

@masneyb masneyb closed this as completed in 9a5b095 Feb 8, 2020
wdlkmpx added a commit that referenced this issue Dec 2, 2021
It was so broken that even HTTP requests didn't use CRLF (only LF)

I fixed a few things; file transfers can potentially work,
but gftp requires dir listings, something that is broken (beyond repair)

Tests reveal that HTTP/HTTPS connections are made and communication happens

an alternate method should be devised to support protocols without dir listings
... in CLI mode (gftp-text)

ref #131
ref #137

Revert "remove HTTP support (closes #36)"

This reverts commit 9a5b095.