If you are like most people today, you have searched for torrents to download stuff. And if you are like me, you have used torrentz.com to conduct those searches, because it aggregates results from various sites.

In this post we will see how to do those searches programmatically from the terminal. This in itself will reduce the effort and time required to search for something, but using scripts also allows us to build upon this basic search functionality. Here we will build up to a system with a pre-set list of items to search for and a way to exclude results that we have already seen.

## The Address

If you go to your browser and search for 'Ubuntu 16.04' as you normally would, you will see something like this.

<img src="/images/ubuntusearch1.png">

This means that if you had instead typed the whole address, https://torrentz.eu/search?f=Ubuntu%2016.04, into the address bar, you would have got the same results. So if we can construct the search URL from the items we want to search for and send a request to the website, we can get back the data we want.

In [1]:
base_url = 'https://torrentz.eu/search?f='
search_term = 'ubuntu 16.04'
# In the query string a space can be written as '+' instead of '%20',
# so split the search term into words and join them with '+'.
url = base_url + '+'.join(search_term.split(' '))
print(url)

https://torrentz.eu/search?f=ubuntu+16.04
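
Joining the words with '+' works for simple terms, but a term that itself contains characters such as '+' or '&' would produce a broken query. As a minimal sketch, assuming we are free to use the standard library's `urllib.parse.quote_plus` instead of the manual join, the URL could be built like this. The helper name `build_search_url` is made up for this example and is not part of the original script.

```python
from urllib.parse import quote_plus

BASE_URL = 'https://torrentz.eu/search?f='


def build_search_url(term, base_url=BASE_URL):
    """Return the search URL for `term` (hypothetical helper for illustration)."""
    # quote_plus turns spaces into '+' and percent-encodes reserved characters
    return base_url + quote_plus(term)


print(build_search_url('ubuntu 16.04'))
print(build_search_url('c++ primer'))
```

For plain terms like 'ubuntu 16.04' this gives the same URL as the manual join, while the '+' signs in 'c++ primer' are escaped as '%2B' so they are not mistaken for spaces.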


## The Request

Sending requests over the internet is a common need, and Python has the 'requests' library to help us out. It is a third-party package, so install it with pip if you don't have it already.

In [2]:
import requests
response = requests.get(url)
print(response)

<Response [200]>


So we got a Response object with status 200. This means that there were no problems and we have a webpage. Any code in the 400s or 500s, like the familiar 404, would tell you that the request failed. You can also check whether the response was ok through its 'ok' attribute.

In [3]:
response.ok

True
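
To get a feel for what the different status codes mean, here is a small sketch that uses the standard library's `http.HTTPStatus` to look up the official phrase for a code. The helper `describe_status` is invented for this example, and the cut-off at 400 mirrors how the 'ok' attribute treats 4xx and 5xx codes as failures.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description such as '404 Not Found (failed)'."""
    # HTTPStatus raises ValueError for codes it does not know about
    phrase = HTTPStatus(code).phrase
    # Codes below 400 count as ok; 4xx and 5xx are client and server errors
    outcome = 'ok' if code < 400 else 'failed'
    return '{} {} ({})'.format(code, phrase, outcome)


for code in (200, 404, 503):
    print(describe_status(code))
```

In a script, calling `response.raise_for_status()` right after the request is another option: it raises an exception for 4xx and 5xx responses instead of letting the script carry on with an error page.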

The page content itself is stored in the 'text' attribute. Let's see how it looks.

In [4]:
len(response.text)

23455

In [5]:
response.text[500:1000]

'="/secureopensearch.xml" type="application/opensearchdescription+xml" title="Torrentz Search" />\n<meta name="viewport" content="width=820">\n</head>\n<body>\n<div class="top">\n<h1><a href="/" title="Search Engine">Torrentz</a></h1>\n<ul>\n<li><a href="/search" title="Torrentz Search">Search</a></li>\n<li><a href="/my" title="Personal Search">myTorrentz</a></li>\n<li><a href="/profile" title="My Profile">Profile</a></li>\n<li><a href="/help" title="Get Help">Help</a></li>\n</ul>\n</div>\n\n<form action="/sea'

It looks like we got a lot of HTML, most of which we don't need. We need some way to extract the names of the torrents from this noise.

## The Extraction

If you print out the whole response.text and stare at it for long enough, you will eventually figure out that the results you want are enclosed in < dt > tags, which themselves are enclosed in < dl > tags.

But there is an easier way to find that if you are using Chrome. Right click on the page and choose 'Inspect', or just press Ctrl+Shift+I, to reveal a pane with the HTML of the current page laid out in a much more organized fashion. The best part is that when you move your mouse over the HTML tags, the corresponding area of the page gets highlighted, so you can quickly find out which tags hold the data you are interested in.

<img src="/images/ubuntusearch2.png">

So all we need now is a way to pull out only the text that is in < dt > tags inside < dl > tags.

Without getting into a discussion about regular expressions or HTML tag trees, let's use the BeautifulSoup module (imported as bs4), which was built just for this purpose.

In [6]:
import bs4
soup = bs4.BeautifulSoup(response.text, 'lxml')
elements = soup.select('dl > dt') # note how we specify that we want <dt>s inside <dl>s
print('The raw element: \n', elements[1])
print('\nThe relevant text: \n', elements[1].getText())

The raw element: 
 <dt><a href="//ads.ad-center.com/offer?prod=7&amp;ref=5052214" rel="nofollow"><strong>ubuntu 16.04</strong> [Verified]</a></dt>

The relevant text: 
 ubuntu 16.04 [Verified]
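
To see exactly what the 'dl > dt' selector picks up, it helps to try it on a tiny page where the answer is obvious. The HTML below is made up for illustration and only imitates the structure described above; it is not copied from the real results page. The sketch also uses Python's built-in 'html.parser' instead of 'lxml', purely so that it runs without the extra lxml dependency.

```python
import bs4

# Made-up HTML for illustration; it only imitates the structure of a results page
SAMPLE_HTML = """
<div class="results">
  <dl><dt><a href="/abc"><strong>ubuntu</strong> 16.04 desktop iso</a> » linux</dt><dd>1 GB</dd></dl>
  <dl><dt><a href="/def">Ubuntu Studio 16.04</a> » linux</dt><dd>2 GB</dd></dl>
</div>
<ul><li><dt>not a result</dt></li></ul>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, 'html.parser')

# 'dl > dt' matches only <dt> tags that are direct children of a <dl>
titles = [dt.getText() for dt in soup.select('dl > dt')]
print(titles)

# A bare 'dt' selector would also pick up the stray <dt> inside the <li>
print(len(soup.select('dt')))
```

Note that getText() flattens the nested link and bold tags into plain text, which is why the markup around the name disappears.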


And we are done!

Let's put this all together into a function, and while we are at it let's also do a few more useful things:
- Keep only the top N results (using 5 here), because we don't really want all of them
- Add one more base URL with 'searchA?f' instead of 'search?f' to get the date-ordered results too
- Keep a list of items to search for automatically every time the script is run

In [7]:
BASE_URLS = ['https://torrentz.eu/search?f=',
             'https://torrentz.eu/searchA?f=']

NUMBER = 5

SEARCH_LIST = ['ubuntu 16.04',
               'elementary os']


import requests, bs4

def search(name):
    """Return the top NUMBER result names for `name` from each base URL."""
    results = []
    for base_url in BASE_URLS:
        # Join the words of the search term with '+' to build the query
        url = base_url + '+'.join(name.split(' '))
        response = requests.get(url)
        soup = bs4.BeautifulSoup(response.text, 'lxml')
        elems = soup.select('dl > dt')
        # Keep only the part of each entry before the ' » ' separator
        results.extend(x.getText().split(' » ')[0] for x in elems[:NUMBER])
    return results
            
for name in SEARCH_LIST:
    results = search(name)
    print("-----------------------------------------")
    print('Results for ' + name.upper())
    print("-----------------------------------------")
    print('\n'.join(results) + '\n')

-----------------------------------------
Results for UBUNTU 16.04
-----------------------------------------
Full ubuntu 16.04 Download
ubuntu 16.04 [Verified]
Direct ubuntu 16.04 Download
ubuntu 16 04 desktop 64 bit lts iso
Ubuntu 16 04 Desktop x86 iso
Full ubuntu 16.04 Download
ubuntu 16.04 [Verified]
Direct ubuntu 16.04 Download
Ubuntu Studio 16 04 DVD i386 ISO DISTRO Netchup
Ubuntu Studio 16 04 DVD x86 64 ISO DISTRO Netchup

-----------------------------------------
Results for ELEMENTARY OS
-----------------------------------------
Full elementary os Download
elementary os [Verified]
Direct elementary os Download
Windows 7 Elementary 2016 by axeswy & Tomecar =TEAM OS=
Elementary OS Freya 64 Bit
Full elementary os Download
elementary os [Verified]
Direct elementary os Download
Windows 7 Elementary 2016 by axeswy & Tomecar =TEAM OS=
Elementary OS 0 3 1 64 bit
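
Because the two base URLs often return the same top entries, several names show up twice in the output above. One possible refinement, sketched here as an assumption rather than something the original script does, is to drop repeats while keeping the order in which results first appeared. The helper name `unique` is invented for this example.

```python
def unique(items):
    """Drop repeated items, keeping each one at its first position."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(items))


# A few names taken from the output above, with repeats
results = ['Full ubuntu 16.04 Download',
           'ubuntu 16.04 [Verified]',
           'Ubuntu 16 04 Desktop x86 iso',
           'Full ubuntu 16.04 Download',
           'ubuntu 16.04 [Verified]',
           'Ubuntu Studio 16 04 DVD i386 ISO DISTRO Netchup']

print('\n'.join(unique(results)))
```

Using a plain set would also remove the repeats, but it would lose the ranking order of the results.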



If you save the code from the cell above in a Python file, you can run it in your terminal with a simple "python file.py" to see the results. Alternatively, if you are using Linux, you can put "#! /path/to/python" on the first line, make the script executable by running "chmod +x file.py" in the terminal, and then run it more simply as "./file.py".

Now there are a couple of features that we would want from this script to make it really useful. We don't want to go into the file and change the SEARCH_LIST whenever we want to search for something new, so we could take the search term from the terminal instead. Also, every time we repeat a search, we only want to see the results that weren't shown the last time. To solve this, all we need to do is write the results out to a text file and, on each run, remove the results that are already present in the file.
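
Here is a rough sketch of how those two features could look. It is not the script from the repository, just one way to do it under simple assumptions: results are stored one per line in a text file, and all the names used here (`term_from_args`, `load_seen`, `filter_new`, 'seen.txt') are made up for this example.

```python
import os
import sys
import tempfile


def term_from_args(argv):
    """Join the command-line words into one search term."""
    # argv[0] is the script name, the rest are the words typed after it
    return ' '.join(argv[1:])


def load_seen(path):
    """Return the set of names stored in `path` (empty if there is no file yet)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding='utf-8') as f:
        return {line.rstrip('\n') for line in f if line.strip()}


def filter_new(results, path):
    """Return the results not seen before and append them to `path`."""
    seen = load_seen(path)
    new = []
    for name in results:
        if name not in seen:
            seen.add(name)
            new.append(name)
    with open(path, 'a', encoding='utf-8') as f:
        for name in new:
            f.write(name + '\n')
    return new


# Demo in a temporary directory so that nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    seen_file = os.path.join(tmp, 'seen.txt')  # hypothetical file name
    print(filter_new(['a', 'b'], seen_file))   # first run: both are new
    print(filter_new(['b', 'c'], seen_file))   # second run: only 'c' is new

print(term_from_args(['file.py', 'ubuntu', '16.04']))
```

In a real script you would pass `sys.argv` to `term_from_args`, run the search for that term, and print only what `filter_new` returns.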

You can find the complete script with these added features, along with the Jupyter notebook for this post, on [GitHub](https://github.com/rithwik/torrent-tracker).