gumblex/htmllisting-parser
htmllisting-parser

Python parser for Apache/nginx-style HTML directory listings

import htmllistparse
cwd, listing = htmllistparse.fetch_listing(some_url, timeout=30)

# Alternatively, fetch the page yourself, build a BeautifulSoup object from it,
# and call:
# cwd, listing = htmllistparse.parse(soup)

Here cwd is the current directory and listing is a list of FileEntry named tuples with these fields:

  • name: File name, str. Has a trailing / if it is a directory.
  • modified: Last modification time, time.struct_time or None. The timezone is not known.
  • size: File size, int or None. May be an estimate when the listing shows a rounded size with a unit prefix such as "K" or "M".
  • description: File description, file type, or anything else found in the entry. str containing HTML, or None.
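
As a rough sketch of how these fields might be consumed, the snippet below formats entries for display while tolerating None values. The FileEntry defined here is only a local stand-in with the same four fields, so the example runs without network access; describe_entry and the sample values are made up for illustration and are not part of htmllistparse.

```python
import time
from collections import namedtuple

# Local stand-in mirroring the fields listed above; in real use the entries
# come from htmllistparse.fetch_listing() or htmllistparse.parse().
FileEntry = namedtuple("FileEntry", "name modified size description")


def describe_entry(entry):
    """Return a one-line summary of an entry, tolerating None fields."""
    # Directories are recognised by the trailing slash in the name.
    kind = "dir " if entry.name.endswith("/") else "file"
    if entry.modified is None:
        when = "unknown date"
    else:
        # The struct_time has no known timezone, so it is printed as-is.
        when = time.strftime("%Y-%m-%d %H:%M", entry.modified)
    size = "?" if entry.size is None else str(entry.size)
    return f"{kind} {entry.name} {size} {when}"


# Hypothetical sample data, not taken from a real server.
sample = [
    FileEntry("docs/", None, None, None),
    FileEntry("a.txt", time.strptime("2020-01-02 03:04", "%Y-%m-%d %H:%M"), 1024, None),
]
lines = [describe_entry(e) for e in sample]
```

Because size may be an estimate, it is probably safer to treat it as a display hint than as an exact byte count.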

Supports:

  • Vanilla Apache/nginx/lighttpd/darkhttpd autoindex
  • Most <pre>-style indexes
  • Many other <table>-style indexes
  • <ul>-style indexes

Note

Please wrap calls to these functions in a general try ... except block. They may raise exceptions unexpectedly.
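
One possible shape for such a wrapper is sketched below. safe_listing is a hypothetical helper, not part of htmllistparse; it takes the fetch function as an argument so the same wrapper could be used with fetch_listing or with your own code around parse. Returning (None, []) on failure is just one choice among several.

```python
import logging

log = logging.getLogger(__name__)


def safe_listing(fetch, url, **kwargs):
    """Call fetch(url, **kwargs); return (None, []) if anything goes wrong."""
    try:
        return fetch(url, **kwargs)
    except Exception as exc:  # deliberately broad, as the note above suggests
        log.warning("could not list %s: %s", url, exc)
        return None, []


# Intended use (assumes htmllistparse is installed and some_url is defined):
# cwd, listing = safe_listing(htmllistparse.fetch_listing, some_url, timeout=30)
```

Callers that need to tell a failed fetch from an empty directory can check whether cwd is None.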

ReHTTPFS

Reinvented HTTP Filesystem.

  • Mounts most HTTP file listings with FUSE.
  • Gets directory tree and file stats with less overhead.
  • Supports Range requests.
  • Supports Keep-Alive.
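
To illustrate what Range support buys a filesystem layer, here is a minimal sketch of how a read of size bytes at a given offset could map onto an HTTP Range header. range_header is a hypothetical helper written for this example; it is not taken from rehttpfs.py, whose actual implementation may differ.

```python
def range_header(offset, size):
    """Build the HTTP Range header for reading `size` bytes at `offset`."""
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + size - 1}"}
```

A server that honours the header answers with status 206 and only the requested bytes, so a partial read does not have to download the whole file.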

usage: rehttpfs.py [-h] [-o OPTIONS] [-t TIMEOUT] [-u USER_AGENT] [-v] [-d]
                   url mountpoint

Mount HTML directory listings.

positional arguments:
  url                   URL to mount
  mountpoint            filesystem mount point

optional arguments:
  -h, --help            show this help message and exit
  -o OPTIONS            comma separated FUSE options
  -t TIMEOUT, --timeout TIMEOUT
                        HTTP request timeout
  -u USER_AGENT, --user-agent USER_AGENT
                        HTTP User-Agent
  -v, --verbose         enable debug logging
  -d, --daemon          run in background
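
If the mount is started from another Python program, the options above could be assembled along these lines. This is only a sketch: build_command is a hypothetical helper, and it assumes rehttpfs.py sits in the current directory and is run with the same interpreter as the calling script.

```python
import sys


def build_command(url, mountpoint, timeout=None, user_agent=None,
                  fuse_options=None, verbose=False, daemon=False):
    """Assemble an argv list for rehttpfs.py from the options shown above."""
    # Assumption: rehttpfs.py is in the current working directory.
    cmd = [sys.executable, "rehttpfs.py"]
    if fuse_options:
        # -o takes a single comma-separated string of FUSE options.
        cmd += ["-o", ",".join(fuse_options)]
    if timeout is not None:
        cmd += ["-t", str(timeout)]
    if user_agent:
        cmd += ["-u", user_agent]
    if verbose:
        cmd.append("-v")
    if daemon:
        cmd.append("-d")
    # The two positional arguments come last: url, then mountpoint.
    return cmd + [url, mountpoint]


# Intended use (the URL and mount point are placeholders):
# import subprocess
# subprocess.run(build_command("http://example.com/pub/", "/mnt/http", timeout=30), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with URLs and User-Agent values that contain spaces.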
