Skip to content

eepp/findtb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

findtb

Table of Contents

findtb

findtb is a Python 3 package which finds tarball URLs within an HTML (HTTP) page or FTP directory.

findtb finds tarball URLs directly into the URL’s resource as well as in minor/path version directories, if any.

Installation

Install findtb from PyPI:

$ sudo pip3 install findtb

Usage

Use findtb.find_tarball_urls() to get a set of all the tarball URLs found from a given URL:

import findtb
import pprint

# find tarball URLs
urls = findtb.find_tarball_urls('ansible',
                                'https://releases.ansible.com/ansible/')

# print URLs
pprint.pprint(urls)
{'https://releases.ansible.com/ansible/ansible-1.1.tar.gz',
 'https://releases.ansible.com/ansible/ansible-1.2.1.tar.gz',
 'https://releases.ansible.com/ansible/ansible-1.2.2.tar.gz',
 'https://releases.ansible.com/ansible/ansible-1.2.3.tar.gz',
 'https://releases.ansible.com/ansible/ansible-1.2.tar.gz',
 'https://releases.ansible.com/ansible/ansible-1.3.0.tar.gz',
 ...
 'https://releases.ansible.com/ansible/ansible-2.9.0rc5.tar.gz',
 'https://releases.ansible.com/ansible/ansible-2.9.1.tar.gz',
 'https://releases.ansible.com/ansible/ansible-2.9.2.tar.gz',
 'https://releases.ansible.com/ansible/ansible-2.9.3.tar.gz',
 'https://releases.ansible.com/ansible/ansible-2.9.4.tar.gz',
 'https://releases.ansible.com/ansible/ansible-2.9.5.tar.gz',
 'https://releases.ansible.com/ansible/ansible-2.9.6.tar.gz',
 'https://releases.ansible.com/ansible/ansible-2.9.7.tar.gz'}

The first parameter is the project’s name as expected in the tarball names. The second parameter is the URL from which to start the search.

You can pass a logging.Logger object to findtb.find_tarball_urls(). This can be useful as the search process can be long:

import findtb
import pprint
import logging

# configure logging
logging.basicConfig(style='{',
                    format='{asctime} [{name}] {{{levelname}}}: {message}')

# create logger
logger = logging.getLogger('findtb')
logger.setLevel(logging.INFO)

# find tarball URLs
urls = findtb.find_tarball_urls('glib',
                                'ftp://ftp.gnome.org/pub/gnome/sources/glib/',
                                logger)

# print URLs
pprint.pprint(urls)
2020-04-20 12:08:22,200 [findtb] {INFO}: FTP: connecting to `ftp.gnome.org`.
2020-04-20 12:08:27,908 [findtb] {INFO}: FTP: changing working directory to `/pub/gnome/sources/glib/`.
2020-04-20 12:08:28,061 [findtb] {INFO}: FTP: listing files in `/pub/gnome/sources/glib/`.
2020-04-20 12:08:28,824 [findtb] {INFO}: FTP: changing working directory to `/pub/gnome/sources/glib/2.39`.
2020-04-20 12:08:28,983 [findtb] {INFO}: FTP: listing files in `/pub/gnome/sources/glib/2.39`.
2020-04-20 12:08:29,770 [findtb] {INFO}: FTP: changing working directory to `..`.
2020-04-20 12:08:29,922 [findtb] {INFO}: FTP: changing working directory to `/pub/gnome/sources/glib/2.19`.
2020-04-20 12:08:30,079 [findtb] {INFO}: FTP: listing files in `/pub/gnome/sources/glib/2.19`.
2020-04-20 12:08:30,871 [findtb] {INFO}: FTP: changing working directory to `..`.
2020-04-20 12:08:31,027 [findtb] {INFO}: FTP: changing working directory to `/pub/gnome/sources/glib/2.60`.
2020-04-20 12:08:31,177 [findtb] {INFO}: FTP: listing files in `/pub/gnome/sources/glib/2.60`.
...
{'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.1/glib-1.1.12.tar.gz',
 'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.1/glib-1.1.15.tar.gz',
 'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.0.tar.gz',
 'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.10.tar.gz',
 'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.2.tar.gz',
 'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.5.tar.gz',
 'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.6.tar.gz',
 ...
 'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.4.tar.bz2',
 'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.4.tar.gz',
 'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.5.tar.bz2',
 'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.5.tar.gz',
 'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.6.tar.bz2',
 'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.6.tar.gz'}

findtb.find_tarball_urls() is quite low-level: it returns a set of strings. Use findtb.find_tarballs() to get a set of findtb.Tarball objects instead.

A findtb.Tarball object contains useful properties such as name (tarball file name) and version (a packaging.version.Version object):

import findtb

# find tarballs
tarballs = findtb.find_tarballs('python',
                                'https://www.python.org/ftp/python/')

# print names and versions
for tarball in tarballs:
    print(tarball.name, tarball.version.release)
Python-3.7.0b2.tgz (3, 7, 0)
Python-3.4.0a1.tgz (3, 4, 0)
Python-2.4.6.tgz (2, 4, 6)
Python-3.3.4rc1.tar.xz (3, 3, 4)
Python-3.1.3rc1.tar.bz2 (3, 1, 3)
python-3.6.0b4-embed-win32.zip None
Python-3.7.0a1.tgz (3, 7, 0)
python-3.5.0rc3-embed-amd64.zip None
...
Python-2.2.1.tgz (2, 2, 1)
python-3.9.0a4-embed-win32.zip None
Python-2.7.1rc1.tgz (2, 7, 1)
Python-3.4.0a3.tgz (3, 4, 0)
Python-3.5.2.tar.xz (3, 5, 2)
Python-3.0a2.tgz (3, 0)
Python-3.6.6rc1.tgz (3, 6, 6)

Any networking/parsing error raises findtb.Error.

Limitations

findtb is not guaranteed to work for all projects. Its very sophisticated algorithms rely on nasty regular expressions and there are dozens of software versioning schemes in the wild.

Feel free to contribute if findtb does not work for you.