# Retrieving images with Bing

Material for the ENSAE / BRGM / 2018 hackathon. The images are extracted from tweets.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

In [2]:
from jyquickhelper import add_notebook_menu
add_notebook_menu()

## Retrieving images

### Downloading images from ImageNet

Possible sources: [ImageNet](http://www.image-net.org/)

In [3]:
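from ensae_projects.hackathon.image_helper import stream_download_images
# Download the images whose URLs are listed in the synset file into dest_folder,
# printing progress and errors through fLOG=print.
dest_folder = "c:/temp/suricatenat_images/imagenet7"
res = list(stream_download_images("c:/temp/suricatenat_images/imagenet.synset7.txt",
                                  dest_folder, fLOG=print, skip=100))
len(res)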
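The first argument above is presumably a text file listing the image URLs of one ImageNet synset. Here is a minimal sketch of how such a file could be inspected before downloading, assuming one URL per line, optionally preceded by an identifier and a tab. `read_url_list` and its `skip` argument (drop the first URLs) are hypothetical helpers, not part of `ensae_projects`, and are not a description of what `skip=100` does in the library.

```python
from typing import Iterator


def read_url_list(path: str, skip: int = 0) -> Iterator[str]:
    """Yield image URLs from a text file (hypothetical format).

    Each line is either a bare URL or ``identifier<TAB>url``. Blank lines,
    lines starting with '#' and lines without an http(s) URL are ignored;
    the first ``skip`` valid URLs are dropped.
    """
    seen = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # The URL is assumed to be the last tab-separated field.
            url = line.split("\t")[-1].strip()
            if not url.startswith(("http://", "https://")):
                continue
            seen += 1
            if seen <= skip:
                continue
            yield url
```

For instance, `urls = list(read_url_list("c:/temp/suricatenat_images/imagenet.synset7.txt"))` would give a plain list that can be checked or filtered before being passed to `stream_download_images`.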

### Downloading images from Bing

You can sign up for the [Bing API](https://azure.microsoft.com/fr-fr/services/cognitive-services/bing-image-search-api/) or download a few images by hand. The examples were obtained by saving the results page for [inondations 2016](https://www.bing.com/images/search?q=inondations%202016&qs=n&form=QBIR&sp=-1&pq=inondations%20201&sc=8-15&sk=&cvid=4C8CC7DB41C84CF2824790956E443756) ("floods 2016") after scrolling through the results so that many of them are displayed. The next cell reads a page saved the same way for the query *small bridge france*.

In [4]:
from ensae_projects.hackathon.web_search_helper import extract_bing_result
# Parse the saved Bing results page and collect the image URLs it contains.
urls = extract_bing_result("c:/temp/suricatenat_clean/urls/small bridge france - Bing images.html")
print(len(urls))
urls[:5]

735


['https://d2v9y0dukr6mq2.cloudfront.net/video/thumbnail/NyjYWBnugijqylcxa/videoblocks-eiffel-tower-sunrise-timelapse-with-boats-on-seine-river-and-in-paris-france-view-from-grenelle-bridge_sohfmyy2w_thumbnail-small01.jpg',
 'https://frank.itlab.us/france_2005/small_france/dsc_1739.jpg',
 'http://www.nickbooth.id.au/Europe11/images/Honfleur/H-reflects.jpg',
 'http://www.tauck.com/~/media/Images/Brand+Landing/Bridges/Bridges+Home+page+Banner/BridgesHome_ZS.ashx',
 'https://frank.itlab.us/france_2005/small/jul_23_paris_eurosat.jpg']
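`extract_bing_result` does the parsing for us. To give a rough idea of what such a parser involves, the sketch below uses only the standard library. It assumes that every result in a saved Bing image page is an `<a class="iusc">` element whose `m` attribute contains JSON with the original image URL under the key `murl`; this is an assumption about Bing's markup at the time, which may have changed, and `BingImageParser` / `extract_image_urls` are hypothetical names, not the library's implementation.

```python
import json
from html.parser import HTMLParser
from typing import List, Set


class BingImageParser(HTMLParser):
    """Collect original image URLs from a saved Bing image results page.

    Assumption: every result is an <a class="iusc" m='{...}'> element whose
    m attribute is JSON holding the original image URL under "murl".
    """

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []
        self._seen: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "iusc" not in (attributes.get("class") or "").split():
            return
        try:
            # HTMLParser has already unescaped entities such as &quot;.
            meta = json.loads(attributes.get("m") or "")
        except ValueError:
            return  # missing or malformed metadata: skip this result
        url = meta.get("murl") if isinstance(meta, dict) else None
        if url and url not in self._seen:  # keep the first occurrence only
            self._seen.add(url)
            self.urls.append(url)


def extract_image_urls(page: str) -> List[str]:
    """Return the de-duplicated image URLs found in the HTML text ``page``."""
    parser = BingImageParser()
    parser.feed(page)
    parser.close()
    return parser.urls
```

Usage would be `extract_image_urls(open(path, encoding="utf-8").read())` on the saved `.html` file. Relying on `html.parser` rather than regular expressions keeps attribute quoting and entity unescaping correct.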

### Downloading images from a list of URLs

In [5]:
from ensae_projects.hackathon.image_helper import stream_download_images
got = list(stream_download_images(urls, dest_folder="c:/temp/suricatenat_clean/bing/", fLOG=print))
got[:5]

[stream_download_images] ... 1/735: 'https://d2v9y0dukr6mq2.cloudfront.net/video/thumbnail/NyjYWBnugijqylcxa/videoblocks-eiffel-tower-sunrise-timelapse-with-boats-on-seine-river-and-in-paris-france-view-from-grenelle-bridge_sohfmyy2w_thumbnail-small01.jpg'
[stream_download_images] error 403 for url 'http://www.nickbooth.id.au/Europe11/images/Honfleur/H-reflects.jpg'.
[stream_download_images] error 404 for url 'http://www.tauck.com/~/media/Images/Brand+Landing/Bridges/Bridges+Home+page+Banner/BridgesHome_ZS.ashx'.
[stream_download_images] wrong filename for url 'https://a0.muscache.com/im/pictures/0d274072-bb89-45d2-9188-20b9535a36c4.jpg?aki_policy=x_large'.
[stream_download_images] wrong filename for url 'https://laughingsquid.com/wp-content/uploads/2017/09/a-logging-truck-turns-and-impressively-crosses-a-small-bridge-in-france-with-a-full-load.png?w=750'.
[stream_download_images] error 503 for url 'http://stuffpoint.com/bridges-architectural-wonders-around-the-world/image/406101-bridg

[stream_download_images] cannot load image for url 'http://www.mountain-lifestyle.com/images/graphics/small/ski-area-planards-pistes-winter-chamonix-01.jpg' due to cannot identify image file <_io.BytesIO object at 0x000001F812D22990>
[stream_download_images] wrong filename for url 'http://il4.picdn.net/shutterstock/videos/4559468/thumb/1.jpg?i10c=img.resize(height:160)'.
[stream_download_images] wrong filename for url 'https://hikeminded.files.wordpress.com/2018/11/dsc_0240.jpg?w=1200'.
[stream_download_images] error 404 for url 'http://www.gagnac-sur-cere.com/photos/Millau+Viaduct.jpg'.
[stream_download_images] ... 501/735: 'https://i.pinimg.com/736x/95/7f/d3/957fd3418f8bf8bfc09dd17eaac063c5--garden-ponds-koi-ponds.jpg'
[stream_download_images] wrong filename for url 'https://render.fineartamerica.com/images/rendered/small/flat/weekender-tote-bag/images-medium-5/2-valentre-bridge-in-cahors-france-elena-elisseeva.jpg?transparent=0&targetx=0&targety=-328&imagewidth=779&imageheight=1163&

['videoblocks-eiffel-tower-sunrise-timelapse-with-boats-on-seine-river-and-in-paris-france-view-from-grenelle-bridge_sohfmyy2w_thumbnail-small01.jpg',
 'dsc_1739.jpg',
 'jul_23_paris_eurosat.jpg',
 'Bridge_Stowe_Landscape_Gardens_BY_ROBERT_KILPIN.jpg',
 'falais_03.jpg']
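The log shows three kinds of failures: HTTP errors (403, 404, 503), URLs rejected with "wrong filename" (in this run, all of them carry a query string such as `?w=750`), and downloaded content that cannot be decoded as an image. The sketch below reproduces the first two checks with the standard library under a different, hypothetical rule: the query string is ignored and a URL is accepted only if its path ends with a known image extension. `IMAGE_EXTENSIONS`, `filename_from_url` and `download_image` are illustrative names, not the API of `ensae_projects`.

```python
import os
import posixpath
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import unquote, urlparse

# Hypothetical whitelist of accepted extensions.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def filename_from_url(url: str) -> Optional[str]:
    """Return a local file name for url, or None when its path does not end
    with a known image extension. The query string is ignored."""
    name = unquote(posixpath.basename(urlparse(url).path))
    if not name or "/" in name or "\\" in name:
        return None
    ext = posixpath.splitext(name)[1].lower()
    return name if ext in IMAGE_EXTENSIONS else None


def download_image(url: str, dest_folder: str, timeout: float = 10.0) -> Optional[str]:
    """Download url into dest_folder; return the file name or None on failure."""
    name = filename_from_url(url)
    if name is None:
        print(f"wrong filename for url {url!r}")
        return None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            content = response.read()
    except urllib.error.HTTPError as e:  # 403, 404, 503...
        print(f"error {e.code} for url {url!r}")
        return None
    except OSError as e:  # URLError, timeouts, connection resets
        print(f"cannot download {url!r}: {e}")
        return None
    os.makedirs(dest_folder, exist_ok=True)
    with open(os.path.join(dest_folder, name), "wb") as f:
        f.write(content)
    return name
```

Ignoring the query string recovers URLs such as `.../dsc_0240.jpg?w=1200`, at the cost of possibly saving a resized variant of the image; the library instead skips them.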

### Random sample

In [6]:
from ensae_projects.hackathon.image_helper import last_element, stream_random_sample
rnd = last_element(stream_random_sample("c:/temp/suricatenat_images/"))
rnd[:5]

In [7]:
len(rnd)
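Judging by their names, `stream_random_sample` produces a stream and `last_element` keeps its final value. A standard way to draw a uniform sample from a stream of unknown length is reservoir sampling (Algorithm R). The sketch below yields the current sample after each file, so that its last value is the final sample. This is an assumption about the design, not a description of how `ensae_projects` implements it; `iter_image_files`, `reservoir_sample`, `last` and the sample size `k` are hypothetical.

```python
import os
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")


def iter_image_files(folder: str) -> Iterator[str]:
    """Yield the paths of all image files below folder (recursively)."""
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            if name.lower().endswith(IMAGE_EXTENSIONS):
                yield os.path.join(root, name)


def reservoir_sample(items: Iterable[T], k: int,
                     seed: Optional[int] = None) -> Iterator[List[T]]:
    """Algorithm R: after each item, yield a uniform random sample of at
    most k of the items seen so far."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # uniform over 0..i, both ends included
            if j < k:
                sample[j] = item
        yield list(sample)  # copy so later replacements do not alter it


def last(iterable: Iterable[T]) -> Optional[T]:
    """Return the last element of iterable, or None if it is empty."""
    result = None
    for result in iterable:
        pass
    return result
```

For example, `last(reservoir_sample(iter_image_files("c:/temp/suricatenat_images/"), k=20))` would return 20 random paths while reading the folder only once. Copying the sample at every step costs O(k) per item, which is acceptable for a few thousand images.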

In [8]:
from ensae_projects.hackathon.image_helper import plot_gallery_random_images
ax, rnd = plot_gallery_random_images("c:/temp/suricatenat_images/")
ax;

In [9]:
rnd

In [10]:
ax, rnd = plot_gallery_random_images("c:/temp/suricatenat_images/imagenet4/")
ax;

In [11]:
rnd

### Renaming the sample images

In [12]:
from ensae_projects.hackathon.image_helper import enumerate_image_class
ech = list(enumerate_image_class("c:/temp/sample_labelled/"))

In [13]:
len(ech)

309
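`enumerate_image_class` yields pairs: the loop `for img, sub in images` in cell 16 unpacks them as an image path and, apparently, the subfolder (class) it belongs to. A minimal sketch of such an enumeration, assuming a layout `folder/<label>/.../image.jpg` in which the label is the first-level subfolder; `enumerate_labelled_images` is a hypothetical name, not the library function.

```python
import os
from typing import Iterator, Tuple

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")


def enumerate_labelled_images(folder: str) -> Iterator[Tuple[str, str]]:
    """Yield (image path, label) pairs, the label being the name of the
    first-level subfolder that contains the image."""
    folder = os.path.normpath(folder)
    for root, _dirs, files in os.walk(folder):
        rel = os.path.relpath(root, folder)
        if rel == os.curdir:
            continue  # images directly under folder have no label
        label = rel.split(os.sep)[0]
        for name in sorted(files):
            if name.lower().endswith(IMAGE_EXTENSIONS):
                yield os.path.join(root, name), label
```

Counting the labels with `collections.Counter(label for _, label in pairs)` is a quick way to check class balance before splitting.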

In [14]:
import os
# File names (without directories) of the labelled sample images.
echnames = set(os.path.split(n[0])[-1] for n in ech)
sorted(echnames)[:5]

['1051720569702555648_Dph2SGiXUAAImpp.jpg',
 '1051787907109974016_Dpizg3kW4AAdpaM.jpg',
 '1051812613062098946_DpjJ5NWXUAAtfeR.jpg',
 '1051914579549282309_DpkmtpOX4AAunhI.jpg',
 '106994_5349_big_200907_voyager11.jpg']

In [15]:
images = list(enumerate_image_class("c:/temp/suricatenat_clean/"))

In [16]:
# Keep the images of suricatenat_clean whose file name also appears in the labelled sample.
rename = []
for img, sub in images:
    name = os.path.split(img)[-1]
    if name in echnames:
        rename.append(img)
rename[:5]

['c:/temp/suricatenat_clean/bing\\1200px-Chateau_de_Chenonceau_2008E.jpg',
 'c:/temp/suricatenat_clean/bing\\1970699-presse-papiers-1.jpg',
 'c:/temp/suricatenat_clean/bing\\2243060331_6496d792a2_o_d.jpg',
 'c:/temp/suricatenat_clean/bing\\28080031282_6d00c773d6_b.jpg',
 'c:/temp/suricatenat_clean/bing\\28183949335_04578a6bdd_b.jpg']

In [17]:
len(rename)

306

In [18]:
# Mark each matched image by appending the ".ech" suffix to its file name.
for img in rename:
    os.rename(img, img + ".ech")
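A more defensive version of the matching and renaming steps, as a sketch: it matches files by base name, refuses to overwrite an existing target and defaults to a dry run so that the list of renames can be checked first. `mark_sample_images` and its `dry_run` parameter are hypothetical.

```python
import os
from typing import Iterable, List


def mark_sample_images(sample_paths: Iterable[str], candidate_paths: Iterable[str],
                       suffix: str = ".ech", dry_run: bool = True) -> List[str]:
    """Append suffix to every candidate whose base name also appears among
    sample_paths. Returns the target paths (renamed, or to be renamed when
    dry_run is True)."""
    sample_names = {os.path.basename(p) for p in sample_paths}
    renamed = []
    for path in candidate_paths:
        if os.path.basename(path) not in sample_names:
            continue
        target = path + suffix
        if os.path.exists(target):
            continue  # never overwrite an existing file
        if not dry_run:
            os.rename(path, target)
        renamed.append(target)
    return renamed
```

Calling it once with the default `dry_run=True` and comparing the length of the result with `len(rename)` before running it with `dry_run=False` avoids renaming the wrong files.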

### Train/test split

In [19]:
from ensae_projects.hackathon.image_helper import folder_split_train_test
tr, te = folder_split_train_test("c:/temp/sample_labelled/",
                                 "c:/temp/sample_labelled_train/",
                                 "c:/temp/sample_labelled_test/",
                                 test_size=0.66)
len(tr), len(te)

(105, 204)
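With `test_size=0.66`, about two thirds of the 309 labelled images go to the test set (204 against 105 for training). Below is a sketch of a per-label (stratified) split that copies files into `<train>/<label>` and `<test>/<label>`, assuming one subfolder per label in the source folder. `split_folder` is hypothetical and may round the per-label counts differently from `folder_split_train_test`.

```python
import os
import random
import shutil
from typing import Dict, List, Tuple


def split_folder(src: str, train_dst: str, test_dst: str,
                 test_size: float = 0.25, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Copy the files of each first-level subfolder (label) of src into
    train_dst/<label> or test_dst/<label>; test_size is the fraction of
    each label sent to the test set."""
    if not 0.0 <= test_size <= 1.0:
        raise ValueError("test_size must be in [0, 1]")
    rng = random.Random(seed)
    by_label: Dict[str, List[str]] = {}
    for label in sorted(os.listdir(src)):
        sub = os.path.join(src, label)
        if os.path.isdir(sub):
            by_label[label] = sorted(
                f for f in os.listdir(sub) if os.path.isfile(os.path.join(sub, f)))
    train: List[str] = []
    test: List[str] = []
    for label, names in by_label.items():
        rng.shuffle(names)
        n_test = round(len(names) * test_size)  # per-label test count
        for i, name in enumerate(names):
            dst_root, bucket = (test_dst, test) if i < n_test else (train_dst, train)
            os.makedirs(os.path.join(dst_root, label), exist_ok=True)
            dst = os.path.join(dst_root, label, name)
            shutil.copy2(os.path.join(src, label, name), dst)
            bucket.append(dst)
    return train, test
```

Splitting each label separately keeps the class proportions similar in both sets, which matters when some classes have only a handful of images.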