# 6. How To Download Multiple Images In Python

## Learning Outcomes

- To learn how to download multiple images in Python using synchronous and asynchronous code.

------------------------------------------------------

Automatically downloading images from a number of your HTML pages is an essential skill. In this guide you'll learn four methods for downloading images using Python!

---------------------------------------------------------------

Let's begin with the easiest method. If you already have a list of image URLs, we can follow this process:

1. Change into a directory where we would like to store all of the images.
2. Make a request to download all of the images, one by one.
3. We will also include error handling so that if a URL no longer exists the code will still work.

------------------------------------------------------------------------------------------------

## Python Imports

In [1]:
!pip install tldextract



In [2]:
import requests
import os
import subprocess
import urllib.request
from bs4 import BeautifulSoup
import tldextract

----------------

In [3]:
!mkdir all_images

In [4]:
!ls

all_images                            how-to-download-multiple-images.ipynb
asyncio-aiofiles.py


Next, change into the folder called all_images. This can be done with either of the following:

~~~

cd all_images
os.chdir('path')

~~~

In [5]:
os.chdir('all_images')

In [6]:
!pwd

/Users/jamesaphoenix/Desktop/Imran_And_James/Python_For_SEO/6_downloading_multiple_images/all_images


---------------------

## Method One: How To Download Multiple Images From A Python List

To download multiple images, we can use the [requests library](https://requests.readthedocs.io/en/master/). We'll also create a Python list to store any image URLs that didn't return a 200 status code:
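
The code cell for this step is missing from this copy, so here is a minimal sketch of the loop described above rather than the original cell. The names `download_images`, `filename_from_url` and `broken_images`, the `folder` parameter and the 10 second timeout are assumptions made for illustration.

```python
import os

import requests


def filename_from_url(url):
    """Use the last path segment of the URL (without any query string) as the file name."""
    return url.split("?")[0].rstrip("/").split("/")[-1]


def download_images(image_urls, folder="."):
    """Download each URL in turn and return the URLs that could not be saved."""
    broken_images = []
    for url in image_urls:
        try:
            response = requests.get(url, timeout=10)
        except requests.exceptions.RequestException:
            # Connection errors, timeouts and malformed URLs all end up here.
            broken_images.append(url)
            continue
        if response.status_code == 200:
            with open(os.path.join(folder, filename_from_url(url)), "wb") as file:
                file.write(response.content)
        else:
            # The server answered, but not with a 200, so record the URL.
            broken_images.append(url)
    return broken_images
```

Calling `broken_images = download_images(image_urls)` with your own list of URLs saves every image that responds with a 200 into the current directory and returns the rest.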

☝️ See how simple that is! ☝️

If you check your folder, you will now have downloaded all of the images that returned a status code of 200!

------------------------------------------------

![downloading images correctly with python](https://sempioneer.com/wp-content/uploads/2020/06/how-to-download-images-with-python.png)

----------------

## Method Two: How To Download Multiple Images From Many HTML Web Pages

If we don't yet have the exact image URLs, we will need to do the following:

1. Download the HTML content of every web page.
2. Extract all of the image URLs for every page.
3. Create the file names.
4. Check to see if the image status code is 200.
5. Write all of the images to your local computer.

This website [internetingishard.com](https://www.internetingishard.com/html-and-css/links-and-images/) has some relative image URLs. Therefore we will need to ensure that our code can handle the following two types of image source URLs:

---

- <strong> Exact Filepath: https://www.internetingishard.com/html-and-css/links-and-images/html-attributes-6f5690.png </strong>
- <strong> Relative Filepath: /html-and-css/links-and-images/html-attributes-6f5690.png </strong>
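
To make the difference concrete, here is a small sketch using `urljoin` from the standard library. It is one possible way to resolve both kinds of source against the page they were found on, and is not necessarily the approach this tutorial takes later.

```python
from urllib.parse import urljoin

page_url = "https://www.internetingishard.com/html-and-css/links-and-images/"

# An exact (absolute) source is returned unchanged.
exact = urljoin(
    page_url,
    "https://www.internetingishard.com/html-and-css/links-and-images/html-attributes-6f5690.png",
)

# A relative source is resolved against the page URL.
relative = urljoin(page_url, "/html-and-css/links-and-images/html-attributes-6f5690.png")

print(exact == relative)  # True
```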

---------------------------------------------------------------

In [None]:
web_pages = ['https://understandingdata.com/', 
             'https://understandingdata.com/data-engineering-services/',
             'https://www.internetingishard.com/html-and-css/links-and-images/']

We will also extract the domain of every URL whilst we loop over the webpages like so:
    
~~~

for page in web_pages:
    domain_name = tldextract.extract(page).registered_domain

~~~

In [14]:
url_dictionary = {}
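
The cell that fills `url_dictionary` is missing from this copy. The sketch below shows one way it could be done, assuming the dictionary is keyed by page URL and stores the raw `src` value of every image tag. The function names and the timeout are hypothetical.

```python
import requests
from bs4 import BeautifulSoup


def extract_image_sources(html):
    """Return the src attribute of every <img> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.find_all("img") if img.get("src")]


def build_url_dictionary(web_pages):
    """Map every page URL to the list of image sources found on that page."""
    url_dictionary = {}
    for page in web_pages:
        try:
            response = requests.get(page, timeout=10)
        except requests.exceptions.RequestException:
            # The page could not be fetched, so it contributes no images.
            url_dictionary[page] = []
            continue
        if response.status_code == 200:
            url_dictionary[page] = extract_image_sources(response.text)
        else:
            url_dictionary[page] = []
    return url_dictionary
```

With the list above, `url_dictionary = build_url_dictionary(web_pages)` would give one entry per page, including pages where no images were found.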

--------------------------------------------------------

Now let's double check and filter our dictionary so that we only look at web pages where there was at least 1 image tag:
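
A minimal sketch of that filter, assuming `url_dictionary` maps each page URL to a list of image sources. The function name and the example data are made up for illustration.

```python
def pages_with_images(url_dictionary):
    """Keep only the pages whose list of image sources is not empty."""
    filtered = {}
    for page, images in url_dictionary.items():
        if len(images) >= 1:
            filtered[page] = images
    return filtered


# Hypothetical data: one page with an image, one without.
example = {
    "https://example.com/": ["/logo.png"],
    "https://example.com/empty/": [],
}
print(pages_with_images(example))  # {'https://example.com/': ['/logo.png']}
```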

--------------------------------------------------------

An easier way to write the above code would be via a dictionary comprehension:
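
A sketch of that comprehension, again assuming the dictionary maps each page URL to a list of image sources. The example data is hypothetical.

```python
# Hypothetical data: one page with an image, one without.
url_dictionary = {
    "https://example.com/": ["/logo.png"],
    "https://example.com/empty/": [],
}

# An empty list is falsy, so `if images` keeps only pages with at least one image.
url_dictionary = {page: images for page, images in url_dictionary.items() if images}
print(list(url_dictionary))  # ['https://example.com/']
```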

We can now clean all of the image URLs inside of every dictionary key and change all of the relative URL paths to exact URL paths.

Let's start by printing out all of the different image sources to see how we might need to clean up the data below:

----------------------------------------------------------------------

For the scope of this tutorial, I have decided to:
    
- Remove the logo links that start with //
- Add on the domain to the relative URLs
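
These two rules can be sketched as follows. This is an illustration under assumptions: the dictionary is keyed by page URL, and the site root is taken from the page URL with `urlparse` (swapped in here for `tldextract` so that a `www.` subdomain is preserved). The function name is hypothetical.

```python
from urllib.parse import urlparse


def clean_image_urls(url_dictionary):
    """Drop sources starting with // and prefix root-relative sources with the site root."""
    cleaned = {}
    for page, images in url_dictionary.items():
        parsed = urlparse(page)
        site_root = f"{parsed.scheme}://{parsed.netloc}"
        page_images = []
        for src in images:
            if src.startswith("//"):
                # Protocol-relative link (the logo links): skip it.
                continue
            if src.startswith("/"):
                # Root-relative path: add the scheme and domain in front.
                src = site_root + src
            page_images.append(src)
        cleaned[page] = page_images
    return cleaned
```

Sources that are relative without a leading slash (for example `images/a.png`) are left unchanged by this sketch and would need extra handling.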

-------------------------------------------------------------------------------------

After cleaning the image URLs, we can now refer to method one for downloading the images to our computer! 

This time let's convert it into a function:
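
The original function is missing from this copy. The sketch below is one way it might look, assuming the cleaned dictionary maps each page URL to a list of absolute image URLs. Both function names, the `folder` parameter and the timeout are hypothetical.

```python
import os

import requests


def unique_image_urls(url_dictionary):
    """Flatten the dictionary into one list, keeping the first occurrence of each URL."""
    seen = []
    for images in url_dictionary.values():
        for url in images:
            if url not in seen:
                seen.append(url)
    return seen


def download_from_dictionary(url_dictionary, folder="."):
    """Download every image in the dictionary and return the URLs that failed."""
    failed = []
    for url in unique_image_urls(url_dictionary):
        file_name = url.split("/")[-1]
        try:
            response = requests.get(url, timeout=10)
        except requests.exceptions.RequestException:
            failed.append(url)
            continue
        if response.status_code != 200:
            failed.append(url)
            continue
        with open(os.path.join(folder, file_name), "wb") as file:
            file.write(response.content)
    return failed
```

Removing duplicates first avoids downloading the same image twice when it appears on several pages.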

Fantastic! 

There are some cases that we didn't cover, which include:

- Image URLs that only use http://.
- Image URLs that only use http://www.

But for the most part, you'll be able to download images in bulk!

---------------------------------------------

![how to download multiple images within python](https://sempioneer.com/wp-content/uploads/2020/06/all_images.png)

------------------------------------------------------------------------

## How To Speed Up Your Image Downloads

It's important when working with hundreds or thousands of URLs to avoid a synchronous approach to downloading images. An asynchronous approach means that we can download multiple web pages or multiple images concurrently, instead of waiting for each response before starting the next request.

<strong> This means that the overall execution time will be much quicker! </strong>

--------------------

### ThreadPoolExecutor()

The ThreadPoolExecutor is part of Python's built-in concurrent.futures module and runs a function concurrently across multiple threads, which suits I/O-bound work such as downloading files. In order to utilise it, we will make sure that the function only works on a single URL.

Then we will pass the image URL list into multiple workers ;) 
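
Here is a hedged sketch of that pattern, not the original cell. `download_image` handles exactly one URL and `download_all` hands the list to a pool of threads. The function names, the `worker` parameter and the default of 8 workers are assumptions for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import requests


def download_image(url, folder="."):
    """Download a single image; return the URL and whether it was saved."""
    file_name = url.split("/")[-1]
    try:
        response = requests.get(url, timeout=10)
    except requests.exceptions.RequestException:
        return url, False
    if response.status_code != 200:
        return url, False
    with open(os.path.join(folder, file_name), "wb") as file:
        file.write(response.content)
    return url, True


def download_all(image_urls, worker=download_image, max_workers=8):
    """Run the single-URL worker across a pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as image_urls.
        return list(executor.map(worker, image_urls))
```

Because each thread spends most of its time waiting on the network, several downloads can be in flight at once. The right number of workers depends on your connection and on the server you are downloading from.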

------------------------------------------------------------------------------------------

The below code will create a new directory and then make it the current active working directory:
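
The cell itself is missing here. A minimal sketch of what it describes could look like this, with a hypothetical helper name and a folder name of your choice.

```python
import os


def make_and_enter(folder_name):
    """Create the folder if it does not exist yet, then make it the working directory."""
    os.makedirs(folder_name, exist_ok=True)
    os.chdir(folder_name)
    return os.getcwd()
```

For example, `make_and_enter("threaded_images")` would create a folder called threaded_images (an assumed name) inside the current directory and switch into it.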

You should've downloaded the images but at a much faster rate! 

-------------------------------------------------------------------------------------

### Async Programming! 

Just like JavaScript, Python has native support for co-routines through the async/await syntax (Python 3.5+) and the built-in [asyncio](https://docs.python.org/3/library/asyncio.html) library. Similar to NodeJS, asyncio provides an event loop that schedules and runs your async code. Note that asyncio.run(), which is used later in this guide, requires Python 3.7+.

We will also need to download an asynchronous HTTP client library called [aiohttp](https://docs.aiohttp.org/en/stable/):

In [30]:
!pip install aiohttp



We will also download aiofiles, which allows us to write multiple image files asynchronously:

In [31]:
!pip install aiofiles



In [32]:
import aiohttp
import aiofiles
import asyncio

------------------------------------------------------


## How To Download 1 File Asynchronously

![Downloading one image with aiofiles](https://sempioneer.com/wp-content/uploads/2020/06/image_files.png)

---------------------------------------------------------------------------------------------------------


We will need to structure our code slightly differently for the async version to work across multiple files:

1. We will have a fetch function to query every image URL.
2. We will have a main function that creates, then executes a series of co-routines.
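
The shape of those two functions can be sketched with the standard library alone. In this illustration `fetch` is only a stand-in that sleeps briefly instead of downloading anything, so that the structure is easy to see. A real version would use aiohttp and aiofiles inside `fetch`.

```python
import asyncio


async def fetch(url):
    """Stand-in for a real download: pause briefly, then report the URL as done."""
    await asyncio.sleep(0.01)  # placeholder for the network and file I/O
    return url


async def main(urls):
    """Create one co-routine per URL and run them all concurrently."""
    coroutines = [fetch(url) for url in urls]
    # gather returns the results in the same order as the input URLs.
    return await asyncio.gather(*coroutines)
```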



☝️☝️☝️ Notice how calling this function doesn't actually run it; it returns a [co-routine](https://docs.python.org/3/library/asyncio-task.html) instead! ☝️☝️☝️

We can then use asyncio to execute all of the fetch co-routines that need to be completed:

![Error with asyncio.run](https://sempioneer.com/wp-content/uploads/2020/06/error-downloading-python-files.png)

If you receive this type of error when running the following command:

~~~

asyncio.run(main(all_images))

~~~


---

<strong> It is likely because you're trying to call asyncio.run() while an event loop is already running, which is not allowed. (Jupyter Notebook runs inside an event loop!) </strong>
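
One possible workaround, shown here as a sketch with a hypothetical helper name, is to check whether a loop is already running and only call `asyncio.run()` when it is not.

```python
import asyncio


def run_or_schedule(coroutine):
    """Run a co-routine with asyncio.run, or schedule it if a loop is already running."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running (a plain .py script), so asyncio.run is safe here.
        return asyncio.run(coroutine)
    # A loop is already running (for example in Jupyter): hand the co-routine to it.
    return loop.create_task(coroutine)
```

In a script this returns the co-routine's result. Inside a notebook it returns a task that you can `await` in a later cell. Recent versions of IPython and Jupyter also allow `await main(all_images)` directly in a cell.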

-----------------------------------------------------------------------------

---------------------------------------------------------------

## How To Download Multiple Images Inside Of A Python File (.py)

Let's save the variable containing our URLs to a .txt file:
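
The original cell is missing, so the exact file format is unknown. The sketch below assumes one URL per line in a file called images.txt. The helper names are hypothetical.

```python
def save_urls(urls, path="images.txt"):
    """Write one URL per line."""
    with open(path, "w", encoding="utf-8") as file:
        file.write("\n".join(urls))


def load_urls(path="images.txt"):
    """Read the URLs back, skipping any blank lines."""
    with open(path, encoding="utf-8") as file:
        return [line.strip() for line in file if line.strip()]
```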

------------------------------------------

### Create A Python File

Then you will need to create a python file and add the following code to it:

------------------------------------------

Then run the Python script in <strong> your terminal / command line </strong> with:
    
    
~~~

python3 python_file_name.py


~~~

---------------------------------------------------------------

Let's break down what's happening in the above code snippet:
    
1. We are importing all of the relevant packages for async programming with files.
2. Then we create a new directory.
3. After creating the new folder we change that folder to be the active working directory.
4. We then read the variable data which was previously saved from the file called images.txt
5. Then we create a series of co-routines and execute them within a main() function with asyncio.
6. As these co-routines are executed every file is asynchronously saved to your computer.


![downloading multiple files with asyncio-aiohttp](https://sempioneer.com/wp-content/uploads/2020/06/asyncio-with-aiofiles.png)

------------------------------------------------------

Finally let's clear up and delete all of the folders to clean up our environment:

In [None]:
import shutil
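
The rest of this cell is missing. A cautious sketch of the clean-up, using a hypothetical helper and folder names that you supply yourself, could be:

```python
import os
import shutil


def remove_folders(folder_paths):
    """Delete each folder that exists, including its contents, and return what was removed."""
    removed = []
    for path in folder_paths:
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

Since earlier steps changed the working directory into the image folders, you would first need to move back out (for example with `os.chdir("..")`) before calling something like `remove_folders(["all_images"])`. Note that `shutil.rmtree` deletes permanently, so double check the paths.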

---------------------------------------------------------------------------------------------------------

Whether you decide to download images synchronously or asynchronously, it's important to realise that, although you can do this in tools such as ScreamingFrog or with Google Chrome extensions, being able to download images with Python lets you extend your automation capabilities and combine that image data with other programs, APIs and so on!