# Download files from Bozner Zeitung

## Install specifically required packages

    conda install beautifulsoup4
    conda install urllib3

This is an example of how to download files from a complex webpage. The code relies heavily on an analysis of the structure of the given page, in this case the Tessmann digital archive, so it only works within this structure. To download files from another webpage or archive, you first need to analyze the structure of that page, insert its URL string patterns and follow its specific link structure. The tools and Python packages used here are, however, very versatile and offer many more options, so they can be used for a wide range of webpages.

In [1]:
import shutil

In [2]:
import re, os

In [3]:
import urllib3
from bs4 import BeautifulSoup

## First we look at the journal start page and get a list of available years

The start page contains entries for all available years.

In [4]:
# this url needs to be provided
start_url = 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Jahresuebersicht/Zeitung/2'

In [5]:
http = urllib3.PoolManager()

In [6]:
response = http.request('GET', start_url)



In [7]:
soup = BeautifulSoup(response.data, 'html.parser')

In [8]:
# find lists in HTML
ylist = soup.find_all('ul')
ylist2 = ylist[2]
#print(ylist2.text)

Now we generate a list with the links to the month overview (the starting page) of each year.
For this we need to know the URL structure of the month overview, which is stored in the 'adr_string' variable.

In [9]:
adr_string = 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/'
year_link = []
splitlist = ylist2.text.split('\n')
for item in splitlist:
    if item.startswith('1'):
        adr = adr_string+item
        year_link.append(adr)

In [10]:
# generate starting page for each year
year_list = []
for num in range(len(year_link)):
    lsplit = year_link[num].split('/')
    year = lsplit[len(lsplit)-1]
    print (year, year_link[num])
    year_list.append(year)

1842 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/1842
1843 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/1843
1844 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/1844
1845 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/1845
1846 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/1846
1847 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/1847
1848 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/1848
1849 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/1849
1850 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/1850
1851 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/1851
1852 https://digital.tessmann.it/tessman
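The `startswith('1')` filter works here because every year in this archive begins with a 1, but it would also accept any other line starting with 1 and would skip years from 2000 onwards. A slightly stricter variant could look like the sketch below. It assumes the list text has one entry per line; `year_links` is a hypothetical helper and the sample text is made up to stand in for `ylist2.text`.

```python
import re

ADR_STRING = 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/'

def year_links(list_text, base=ADR_STRING):
    """Return (year, url) pairs for every line that is exactly a four-digit year."""
    pairs = []
    for line in list_text.split('\n'):
        entry = line.strip()
        if re.fullmatch(r'\d{4}', entry):
            pairs.append((entry, base + entry))
    return pairs

# made-up sample standing in for ylist2.text
sample_text = '\nJahr\n1842\n1843\n 1844 \n12 Ausgaben\n'
pairs = year_links(sample_text)
for year, url in pairs:
    print(year, url)
```

Keeping year and link together in one pair also avoids having to split the link again to recover the year.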

## Now we work with the list for a given year and download the pictures

We could also loop over all the years, but to keep things simple we download just one year at a time. The year is selected by the year index below.

In [11]:
# set year index
year_index = 2
print("year = ", year_list[year_index])
month_url =  year_link[year_index]
month_url

year =  1844


'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Monatsuebersicht/Zeitung/2/1844'

In [12]:
# get page data
http = urllib3.PoolManager()
response = http.request('GET', month_url)



In [13]:
# analyze with beautiful soup
soup = BeautifulSoup(response.data, 'html.parser')

In [14]:
# you can search for specific tags in page
l1 = soup.find_all('a')

In [15]:
# from the tag list we can get the links to different issues
example = str(l1[20])
example.split('=')

['<a href',
 '"/tessmannDigital/Zeitungsarchiv/Tagesausgabe/Zeitung/2/26.01.1844">26</a>']

Now we generate a list with the addresses of all issues.
To build the complete links, the first part of the address needs to be provided as adr_string2.

In [16]:
adr_string2 = 'https://digital.tessmann.it'
datelist = []
linklist = []
for item in l1:
    splititem = str(item).split('=')
    if 'Tagesausgabe' in splititem[1]:
        split2 = splititem[1].split('>')
        split3 = split2[0].split('/')
        split4 = split3[6].split('"')
        datelist.append(split4[0])
        lsplit = split2[0].split(('"'))
        linklist.append(adr_string2+str(lsplit[1]))

In [17]:
# now we have a list of all available issues
for num in range(len(datelist)):
    print(datelist[num], linklist[num])

05.01.1844 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Tagesausgabe/Zeitung/2/05.01.1844
12.01.1844 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Tagesausgabe/Zeitung/2/12.01.1844
19.01.1844 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Tagesausgabe/Zeitung/2/19.01.1844
26.01.1844 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Tagesausgabe/Zeitung/2/26.01.1844
02.02.1844 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Tagesausgabe/Zeitung/2/02.02.1844
09.02.1844 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Tagesausgabe/Zeitung/2/09.02.1844
16.02.1844 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Tagesausgabe/Zeitung/2/16.02.1844
23.02.1844 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Tagesausgabe/Zeitung/2/23.02.1844
01.03.1844 https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Tagesausgabe/Zeitung/2/01.03.1844
08.03.1844 https://digital.tessmann.it/tessmannDigital/Zeitungsa
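Splitting the string form of a tag on '=' and '"' depends on the exact way the tag is printed. As an alternative sketch, we can work on the href values directly, which with BeautifulSoup could be collected as `[a.get('href') for a in soup.find_all('a')]`. The helper name `issue_links` and the first two sample entries are made up for illustration; the third href is the one shown in the example above.

```python
from urllib.parse import urljoin

BASE = 'https://digital.tessmann.it'

def issue_links(hrefs, base=BASE, marker='Tagesausgabe'):
    """Return (date, absolute_url) pairs for hrefs that point to a daily issue."""
    issues = []
    for href in hrefs:
        if not href or marker not in href:
            continue  # skip anchors without href and unrelated links
        # the date is the last path segment, e.g. '26.01.1844'
        date = href.rstrip('/').rsplit('/', 1)[-1]
        issues.append((date, urljoin(base, href)))
    return issues

hrefs = [
    None,                     # anchor without href (made-up case)
    '/tessmannDigital/Home',  # unrelated link (made-up case)
    '/tessmannDigital/Zeitungsarchiv/Tagesausgabe/Zeitung/2/26.01.1844',
]
issues = issue_links(hrefs)
print(issues)
```

`urljoin` takes care of combining the site address with the relative link, so the base address does not have to be concatenated by hand.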

### Access issue list

We can now get information from the issue list.

In [18]:
# test loop: find the required information, in this case the index number for the download link
num = 0
issue_url = linklist[num]
http = urllib3.PoolManager()
response = http.request('GET', issue_url)
soup = BeautifulSoup(response.data, 'html.parser')
for item in soup.find_all('a'):
    adress = item.get('href', '')  # an anchor may come without an href
    if 'Seite' in adress:
        split = adress.split('/')
        # the last segment is the page number, the one before it is the index number
        if int(split[-1]) > 3:
            print(split[-2])

19631
19631




In [19]:
# data loop: collect the index number found above for all available dates of this year
http = urllib3.PoolManager()  # one pool manager is reused for all requests
ind_list = ["" for x in range(len(datelist))]
for num in range(len(datelist)):
    issue_url = linklist[num]
    response = http.request('GET', issue_url)
    soup = BeautifulSoup(response.data, 'html.parser')
    for item in soup.find_all('a'):
        adress = item.get('href', '')  # an anchor may come without an href
        if 'Seite' in adress:
            split = adress.split('/')
            if int(split[-1]) > 3:
                ind_list[num] = split[-2]
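The index extraction can also be written as a small function, which makes it easy to try out on a single link without requesting a page. The following is only a sketch: `issue_index` is a hypothetical helper, and the `min_page=4` default mirrors the "page number greater than 3" check used in the loops above.

```python
def issue_index(href, min_page=4):
    """Return the index number from a '.../Seite/.../<date>/<index>/<page>' link, else None."""
    if not href or 'Seite' not in href:
        return None
    parts = href.rstrip('/').split('/')
    # the last segment must be a page number, the one before it is the index
    if len(parts) < 2 or not parts[-1].isdigit():
        return None
    if int(parts[-1]) < min_page:
        return None
    return parts[-2]

link = 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/05.01.1844/19631/4'
print(issue_index(link))
```

Returning `None` for links that do not match lets the calling loop simply skip them instead of failing on an unexpected link format.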





### Sort datelist

We sort the list by date. This is not strictly required, but it makes the download structure easier to understand.

In [20]:
# sort datelist
new_list = []
for num, item in enumerate(datelist):
    #print(item.split('.'))
    strl = item.split('.')
    new_list.append([])
    new_list[num].append(str(strl[1])+'-'+str(strl[0])+'-'+str(strl[2]))
    new_list[num].append(item)
    new_list[num].append(ind_list[num])
new_list.sort()
new_list

[['01-05-1844', '05.01.1844', '19631'],
 ['01-12-1844', '12.01.1844', '19632'],
 ['01-19-1844', '19.01.1844', '19633'],
 ['01-26-1844', '26.01.1844', '19634'],
 ['02-02-1844', '02.02.1844', '19635'],
 ['02-09-1844', '09.02.1844', '19636'],
 ['02-16-1844', '16.02.1844', '19637'],
 ['02-23-1844', '23.02.1844', '19638'],
 ['03-01-1844', '01.03.1844', '19639'],
 ['03-08-1844', '08.03.1844', '19640'],
 ['03-15-1844', '15.03.1844', '19641'],
 ['03-22-1844', '22.03.1844', '19642'],
 ['03-29-1844', '29.03.1844', '19643'],
 ['04-05-1844', '05.04.1844', '19644'],
 ['04-12-1844', '12.04.1844', '19645'],
 ['04-19-1844', '19.04.1844', '19646'],
 ['04-26-1844', '26.04.1844', '19647'],
 ['05-03-1844', '03.05.1844', '19648'],
 ['05-10-1844', '10.05.1844', '19649'],
 ['05-17-1844', '17.05.1844', '19650'],
 ['05-24-1844', '24.05.1844', '19651'],
 ['05-31-1844', '31.05.1844', '19652'],
 ['06-07-1844', '07.06.1844', '19653'],
 ['06-14-1844', '14.06.1844', '19654'],
 ['06-21-1844', '21.06.1844', '19655'],
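Sorting on a 'month-day-year' string gives the right order as long as all dates belong to the same year. If dates from several years were mixed, a January date of a later year would be placed before most dates of the earlier year. A sketch that sorts on real date objects avoids this; `sort_issues` is a hypothetical helper, and the two index values 'idx-a' and 'idx-b' are placeholders.

```python
from datetime import datetime

def sort_issues(dates, indices):
    """Return (date_object, original_string, index) tuples in chronological order."""
    records = [
        (datetime.strptime(d, '%d.%m.%Y').date(), d, idx)
        for d, idx in zip(dates, indices)
    ]
    records.sort(key=lambda rec: rec[0])
    return records

dates = ['12.01.1844', '05.01.1844', '03.01.1845', '27.12.1844']
indices = ['19632', '19631', 'idx-b', 'idx-a']
records = sort_issues(dates, indices)
for day, original, idx in records:
    print(day.isoformat(), original, idx)
```

The ISO form of the date (year-month-day) also sorts correctly as plain text, which makes it a convenient choice for file names.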


## Go to page 4 of specific issues 

The data of interest is on page 4, so we only need to download this page for each issue.
Again, adstr1 and adstr2 are obtained by analyzing the page structure. Together with the date and the index number, they give us the link to the page.

In [21]:
num = 0
adstr1 = 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/'
# take the date and the index number from the same sorted entry
adstr2 = '/'+str(new_list[num][2])+'/4'
page_url = adstr1+new_list[num][1]+adstr2
page_url

'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/05.01.1844/19631/4'

In [22]:
# loop over issues for this year
page_url_list = []
for num in range(len(new_list)):
    # date and index number come from the same sorted entry, so they cannot get out of step
    adstr2 = '/'+str(new_list[num][2])+'/4'
    page_url_list.append(adstr1+new_list[num][1]+adstr2)

In [23]:
# list of links with pages 4 for all issues in this year
page_url_list

['https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/05.01.1844/19631/4',
 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/12.01.1844/19632/4',
 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/19.01.1844/19633/4',
 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/26.01.1844/19634/4',
 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/02.02.1844/19635/4',
 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/09.02.1844/19636/4',
 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/16.02.1844/19637/4',
 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/23.02.1844/19638/4',
 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/01.03.1844/19639/4',
 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/08.03.1844/19640/4',
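The URL pattern for a page can be captured in a single function, so that the date and the index number always come from the same record. This is a sketch under the assumption that the pattern shown above (base address, date, index number, page number) holds for all issues; `page_url` is a hypothetical helper and the two records are copied from the sorted list.

```python
ADSTR1 = 'https://digital.tessmann.it/tessmannDigital/Zeitungsarchiv/Seite/Zeitung/2/1/'

def page_url(date, index, page=4, base=ADSTR1):
    """Build the link to one page of one issue."""
    return f'{base}{date}/{index}/{page}'

# two entries in the layout of new_list: [sort key, date, index number]
records = [
    ['01-05-1844', '05.01.1844', '19631'],
    ['01-12-1844', '12.01.1844', '19632'],
]
urls = [page_url(date, index) for _, date, index in records]
for url in urls:
    print(url)
```

With the page number as a parameter, the same function could be reused if another page of the issue became interesting.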


In [24]:
# from these links, we get a list of the picture links
http = urllib3.PoolManager()  # one pool manager is reused for all requests
pictures = []
for num in range(len(page_url_list)):
    response = http.request('GET', page_url_list[num])
    soup = BeautifulSoup(response.data, 'html.parser')
    for item in soup.find_all('a'):
        adress = item.get('href', '')  # an anchor may come without an href
        if 'media/image' in adress:
            print(num, adress)
            pictures.append(adress)



0 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/05_01_1844/BZZ_1844_01_05_4_object_460267.png?auth=be15d03b6659c46595ffb798047b2d8a
1 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/12_01_1844/BZZ_1844_01_12_4_object_460274.png?auth=94283663b542fc3ac3c866c5712db754
2 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/19_01_1844/BZZ_1844_01_19_4_object_460280.png?auth=39ddfe4d41a886e98c8a7154b32fa7fb
3 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/26_01_1844/BZZ_1844_01_26_4_object_460290.png?auth=7d6d99c576bc04fad59c1b233c47111a
4 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/02_02_1844/BZZ_1844_02_02_4_object_460304.png?auth=8ee297adf951ea8ea1ea7a2eb593962d
5 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/09_02_1844/BZZ_1844_02_09_4_object_460062.png?auth=ff3d73e2c1c73a86a5a041ca3b8c4316
6 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/16_02_1844/BZZ_1844_02_16_4_object_460068.png?auth=7c7ba7c44c538e64eec15bd48fd70956
7 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/23_02_1844/BZZ_1844_02_23_4_object_460075.png?auth=a5334aa516e3160f95ac61ff3d22bb6e
8 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/01_03_1844/BZZ_1844_03_01_4_object_460082.png?auth=3a64c0f84f97f155905131577d1c2000
9 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/08_03_1844/BZZ_1844_03_08_4_object_460092.png?auth=db9abaf42d6a5c0823dc5b02559cf3a0
10 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/15_03_1844/BZZ_1844_03_15_4_object_460105.png?auth=b4dcacea589a1f93c65c70922baeb597
11 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/22_03_1844/BZZ_1844_03_22_4_object_460121.png?auth=faa93d2945265d2c8db45c901c0cc90d
12 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/29_03_1844/BZZ_1844_03_29_4_object_460138.png?auth=fa1ba26ff147e4879769d1055a74ee98
13 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/05_04_1844/BZZ_1844_04_05_4_object_460151.png?auth=fa78b6aebbd48d14960fb668870a9177
14 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/12_04_1844/BZZ_1844_04_12_4_object_460161.png?auth=207611cbad9c3fe96c70741eba41a8d7
15 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/19_04_1844/BZZ_1844_04_19_4_object_460171.png?auth=abfb6b24145750ee2396f0c51b6cbd68
16 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/26_04_1844/BZZ_1844_04_26_4_object_459880.png?auth=6d9c8ac92d7f789be8b8d14df9151d47
17 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/03_05_1844/BZZ_1844_05_03_4_object_459904.png?auth=74b745432f5a75d71d5befcfaea4c1fb
18 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/10_05_1844/BZZ_1844_05_10_4_object_459927.png?auth=7869e296bb7622fc9171876958b45511
19 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/17_05_1844/BZZ_1844_05_17_4_object_459947.png?auth=9297ab3dee72a9e6576d81101fe89722
20 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/24_05_1844/BZZ_1844_05_24_4_object_459960.png?auth=b24405da67cf0fb6f396cd92c02ff27d
21 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/31_05_1844/BZZ_1844_05_31_4_object_459984.png?auth=fc4833469cbfaabaa3c4b3e7d4d9ca31
22 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/07_06_1844/BZZ_1844_06_07_4_object_460002.png?auth=b2119fd5210185574a04fb20f5ea270e
23 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/14_06_1844/BZZ_1844_06_14_4_object_460021.png?auth=eb5bc3acc0d7d67a6dd2c284702077d4
24 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/21_06_1844/BZZ_1844_06_21_4_object_460035.png?auth=b6f7f3e32b8f368cb80eb95a46de8e96
25 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/28_06_1844/BZZ_1844_06_28_4_object_460049.png?auth=b52d956d5c93a69263d08b67787c53c5
26 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/05_07_1844/BZZ_1844_07_05_4_object_459710.png?auth=156d2c31f2529e946a729860093a6e38
27 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/12_07_1844/BZZ_1844_07_12_4_object_459723.png?auth=0c2c5e9a8c2656df8cdf21bd409bd4bf
28 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/19_07_1844/BZZ_1844_07_19_4_object_459737.png?auth=cca58b5d38fab7bb6ce1e4fd0a62918a
29 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/26_07_1844/BZZ_1844_07_26_4_object_459751.png?auth=1b4f56e6ae9b739c9ee90f39a2c074ad
30 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/02_08_1844/BZZ_1844_08_02_4_object_459770.png?auth=e0f7e8cf97861c0fecafb959fc37e6ac
31 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/09_08_1844/BZZ_1844_08_09_4_object_459789.png?auth=43db956975a4d0a33235e7f08256d36c
32 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/16_08_1844/BZZ_1844_08_16_4_object_459798.png?auth=0f0675c0711eb700d1980134a5738541
33 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/23_08_1844/BZZ_1844_08_23_4_object_459807.png?auth=b86b9c259e38f4fa400b818206e8d565
34 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/30_08_1844/BZZ_1844_08_30_4_object_459821.png?auth=4aeb56b61561766e966aa874b3bb72a3
35 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/06_09_1844/BZZ_1844_09_06_4_object_459845.png?auth=2bd5cd0e14cc4660f49b2d2b3818d77d
36 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/13_09_1844/BZZ_1844_09_13_4_object_459873.png?auth=210766091aa5cf322ee8c2220fbb5e73
37 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/20_09_1844/BZZ_1844_09_20_4_object_459544.png?auth=3adf153f05e560eaa0e0dae2946f7387
38 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/27_09_1844/BZZ_1844_09_27_4_object_459563.png?auth=458c628950edf0bce14d884743bbc389
39 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/04_10_1844/BZZ_1844_10_04_4_object_459582.png?auth=43dc63853193b2eba2483c5a17aa019f
40 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/11_10_1844/BZZ_1844_10_11_4_object_459595.png?auth=74efd3c2ff14af97d4960f871f0006e8
41 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/18_10_1844/BZZ_1844_10_18_4_object_459609.png?auth=523cae8c17d5e91eb0a18708391d8308
42 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/25_10_1844/BZZ_1844_10_25_4_object_459623.png?auth=d85d17e2668750d03273177388febe13
43 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/01_11_1844/BZZ_1844_11_01_4_object_2617685.png?auth=15e7fc66f36011a4b6f2d39e5687f48d
44 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/08_11_1844/BZZ_1844_11_08_4_object_2617701.png?auth=8379a1dd679a691ba2ce6d796d364cbc
45 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/15_11_1844/BZZ_1844_11_15_4_object_2617716.png?auth=546a1b636a78581dc5b71a8c6f9f6410
46 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/22_11_1844/BZZ_1844_11_22_4_object_2617731.png?auth=bb7f24a51c052e6304db856102f9395e
47 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/29_11_1844/BZZ_1844_11_29_4_object_2617742.png?auth=8ef05bffe83e795c12f5214a004d0179
48 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/06_12_1844/BZZ_1844_12_06_4_object_459637.png?auth=ab4aea6dca3d9ee9fabb3de0dc7cabfb
49 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/13_12_1844/BZZ_1844_12_13_4_object_459651.png?auth=314412b4e2c95fcf1fed710bc5341259
50 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/20_12_1844/BZZ_1844_12_20_4_object_459665.png?auth=a21f19554a779561cc2d9d8923133d8f
51 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/27_12_1844/BZZ_1844_12_27_4_object_459674.png?auth=34c73e81197ce40865d3b01483000625

Now we have all the download links. We create a folder for the given year and download page 4 of all issues.

In [25]:
# make folder
folder = 'bozner_zeitung_'+str(year_list[year_index])
if not os.path.exists(folder):
    os.mkdir(folder)

In [26]:
# show list
for num in range(len(datelist)):
    print (new_list[num][1], pictures[num])

05.01.1844 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/05_01_1844/BZZ_1844_01_05_4_object_460267.png?auth=be15d03b6659c46595ffb798047b2d8a
12.01.1844 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/12_01_1844/BZZ_1844_01_12_4_object_460274.png?auth=94283663b542fc3ac3c866c5712db754
19.01.1844 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/19_01_1844/BZZ_1844_01_19_4_object_460280.png?auth=39ddfe4d41a886e98c8a7154b32fa7fb
26.01.1844 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/26_01_1844/BZZ_1844_01_26_4_object_460290.png?auth=7d6d99c576bc04fad59c1b233c47111a
02.02.1844 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/02_02_1844/BZZ_1844_02_02_4_object_460304.png?auth=8ee297adf951ea8ea1ea7a2eb593962d
09.02.1844 https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/09_02_1844/BZZ_1844_02_09_4_object_460062.png?auth=ff3d73e2c1c73a86a5a041ca3b8c4316
16.02.1844 https://digital.t
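The list above pairs dates and picture links purely by position, which only holds if every page request returned exactly one picture link. Since the picture file name itself carries the date, it can serve as a cross-check. The sketch below assumes the file name pattern seen in the output ('BZZ_year_month_day_page_object_number.png'); `picture_filename` and `picture_date` are hypothetical helpers.

```python
import posixpath
from urllib.parse import urlsplit

def picture_filename(url):
    """Return the file name of a picture link, without the '?auth=...' query part."""
    return posixpath.basename(urlsplit(url).path)

def picture_date(url):
    """Return the issue date as 'DD.MM.YYYY', read from the picture file name."""
    parts = picture_filename(url).split('_')
    year, month, day = parts[1], parts[2], parts[3]
    return f'{day}.{month}.{year}'

url = ('https://digital.tessmann.it/mediaArchive/media/image/Page/BZZ/1844/05_01_1844/'
       'BZZ_1844_01_05_4_object_460267.png?auth=be15d03b6659c46595ffb798047b2d8a')
print(picture_filename(url))
print(picture_date(url))
```

Comparing `picture_date(pictures[num])` with `new_list[num][1]` before downloading would reveal any shift between the two lists.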

In [27]:
# test block: test downloading a single file here
num = 0
filename = os.path.join(folder, str(new_list[num][0])+'.png')
http = urllib3.PoolManager()

# stream the response body directly into the file
with http.request('GET', pictures[num], preload_content=False) as resp, open(filename, 'wb') as out_file:
    shutil.copyfileobj(resp, out_file)

resp.release_conn()



In [28]:
# download loop: here we download all the files for the given year
http = urllib3.PoolManager()  # one pool manager is reused for all downloads
for num in range(len(new_list)):
    filename = os.path.join(folder, str(new_list[num][0])+'.png')

    # stream the response body directly into the file
    with http.request('GET', pictures[num], preload_content=False) as resp, open(filename, 'wb') as out_file:
        shutil.copyfileobj(resp, out_file)

    resp.release_conn()
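For longer runs it can help if the download loop skips files that already exist, pauses briefly between requests and keeps going when a single download fails. The following sketch shows one possible structure. It is not tied to urllib3: the `fetch` argument is any function that returns the bytes for a URL, for example `lambda url: http.request('GET', url).data` with the pool manager used above. `download_all`, `fake_fetch` and the example.org addresses are made up for the demonstration, which runs offline.

```python
import os
import tempfile
import time

def download_all(jobs, folder, fetch, delay=1.0):
    """Save each (name, url) job into folder as '<name>.png'; skip existing files."""
    os.makedirs(folder, exist_ok=True)
    saved, failed = [], []
    for name, url in jobs:
        target = os.path.join(folder, name + '.png')
        if os.path.exists(target):
            continue  # already downloaded in an earlier run
        try:
            data = fetch(url)
        except Exception as err:
            failed.append((url, str(err)))  # remember the failure and keep going
            continue
        with open(target, 'wb') as out_file:
            out_file.write(data)
        saved.append(target)
        time.sleep(delay)  # short pause between requests
    return saved, failed

# offline demonstration with a stand-in fetch function
def fake_fetch(url):
    if 'broken' in url:
        raise IOError('simulated failure')
    return b'bytes for ' + url.encode()

demo_folder = tempfile.mkdtemp()
jobs = [
    ('01-05-1844', 'https://example.org/a.png'),
    ('01-12-1844', 'https://example.org/broken.png'),
]
saved, failed = download_all(jobs, demo_folder, fake_fetch, delay=0)
print(len(saved), 'saved,', len(failed), 'failed')
```

Because existing files are skipped, the loop can simply be run again after an interruption and will only fetch what is still missing.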



