findtb is a Python 3 package which finds tarball URLs within an HTML (HTTP) page or FTP directory.
findtb finds tarball URLs directly into the URL’s resource as well as in minor/path version directories, if any.
Install findtb from PyPI:
$ sudo pip3 install findtb
Use findtb.find_tarball_urls()
to get a set of all the tarball URLs
found from a given URL:
import findtb
import pprint
# find tarball URLs
urls = findtb.find_tarball_urls('ansible',
'https://releases.ansible.com/ansible/')
# print URLs
pprint.pprint(urls)
{'https://releases.ansible.com/ansible/ansible-1.1.tar.gz', 'https://releases.ansible.com/ansible/ansible-1.2.1.tar.gz', 'https://releases.ansible.com/ansible/ansible-1.2.2.tar.gz', 'https://releases.ansible.com/ansible/ansible-1.2.3.tar.gz', 'https://releases.ansible.com/ansible/ansible-1.2.tar.gz', 'https://releases.ansible.com/ansible/ansible-1.3.0.tar.gz', ... 'https://releases.ansible.com/ansible/ansible-2.9.0rc5.tar.gz', 'https://releases.ansible.com/ansible/ansible-2.9.1.tar.gz', 'https://releases.ansible.com/ansible/ansible-2.9.2.tar.gz', 'https://releases.ansible.com/ansible/ansible-2.9.3.tar.gz', 'https://releases.ansible.com/ansible/ansible-2.9.4.tar.gz', 'https://releases.ansible.com/ansible/ansible-2.9.5.tar.gz', 'https://releases.ansible.com/ansible/ansible-2.9.6.tar.gz', 'https://releases.ansible.com/ansible/ansible-2.9.7.tar.gz'}
The first parameter is the project’s name as expected in the tarball names. The second parameter is the URL from which to start the search.
You can pass a logging.Logger
object to findtb.find_tarball_urls()
.
This can be useful as the search process can be long:
import findtb
import pprint
import logging
# configure logging
logging.basicConfig(style='{',
format='{asctime} [{name}] {{{levelname}}}: {message}')
# create logger
logger = logging.getLogger('findtb')
logger.setLevel(logging.INFO)
# find tarball URLs
urls = findtb.find_tarball_urls('glib',
'ftp://ftp.gnome.org/pub/gnome/sources/glib/',
logger)
# print URLs
pprint.pprint(urls)
2020-04-20 12:08:22,200 [findtb] {INFO}: FTP: connecting to `ftp.gnome.org`. 2020-04-20 12:08:27,908 [findtb] {INFO}: FTP: changing working directory to `/pub/gnome/sources/glib/`. 2020-04-20 12:08:28,061 [findtb] {INFO}: FTP: listing files in `/pub/gnome/sources/glib/`. 2020-04-20 12:08:28,824 [findtb] {INFO}: FTP: changing working directory to `/pub/gnome/sources/glib/2.39`. 2020-04-20 12:08:28,983 [findtb] {INFO}: FTP: listing files in `/pub/gnome/sources/glib/2.39`. 2020-04-20 12:08:29,770 [findtb] {INFO}: FTP: changing working directory to `..`. 2020-04-20 12:08:29,922 [findtb] {INFO}: FTP: changing working directory to `/pub/gnome/sources/glib/2.19`. 2020-04-20 12:08:30,079 [findtb] {INFO}: FTP: listing files in `/pub/gnome/sources/glib/2.19`. 2020-04-20 12:08:30,871 [findtb] {INFO}: FTP: changing working directory to `..`. 2020-04-20 12:08:31,027 [findtb] {INFO}: FTP: changing working directory to `/pub/gnome/sources/glib/2.60`. 2020-04-20 12:08:31,177 [findtb] {INFO}: FTP: listing files in `/pub/gnome/sources/glib/2.60`. ... {'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.1/glib-1.1.12.tar.gz', 'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.1/glib-1.1.15.tar.gz', 'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.0.tar.gz', 'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.10.tar.gz', 'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.2.tar.gz', 'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.5.tar.gz', 'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.6.tar.gz', ... 'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.4.tar.bz2', 'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.4.tar.gz', 'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.5.tar.bz2', 'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.5.tar.gz', 'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.6.tar.bz2', 'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.6.tar.gz'}
findtb.find_tarball_urls()
is quite low-level: it returns a set of
strings. Use findtb.find_tarballs()
to get a set of findtb.Tarball
objects instead.
A findtb.Tarball
object contains useful properties such as name
(tarball file name) and version
(a packaging.version.Version
object):
import findtb
# find tarballs
tarballs = findtb.find_tarballs('python',
'https://www.python.org/ftp/python/')
# print names and versions
for tarball in tarballs:
print(tarball.name, tarball.version.release)
Python-3.7.0b2.tgz (3, 7, 0) Python-3.4.0a1.tgz (3, 4, 0) Python-2.4.6.tgz (2, 4, 6) Python-3.3.4rc1.tar.xz (3, 3, 4) Python-3.1.3rc1.tar.bz2 (3, 1, 3) python-3.6.0b4-embed-win32.zip None Python-3.7.0a1.tgz (3, 7, 0) python-3.5.0rc3-embed-amd64.zip None ... Python-2.2.1.tgz (2, 2, 1) python-3.9.0a4-embed-win32.zip None Python-2.7.1rc1.tgz (2, 7, 1) Python-3.4.0a3.tgz (3, 4, 0) Python-3.5.2.tar.xz (3, 5, 2) Python-3.0a2.tgz (3, 0) Python-3.6.6rc1.tgz (3, 6, 6)
Any networking/parsing error raises findtb.Error
.