[View in Colaboratory](https://colab.research.google.com/github/sungchun12/image-labeling-and-translation-data-analysis-Google-Cloud/blob/master/LabelsAcrossBorders.ipynb)


# **Labels Across Borders**

**By:** Sung Won Chung

**LinkedIn:** https://www.linkedin.com/in/sungwonchung1/

**Vision:** I made this because demystifying programming, machine learning, and every other buzzword under the sun is the name of the game.

Whoever's reading this, I hope you feel more liberated to explore all the cutting-edge tech freely available and accessible literally at your fingertips.

**Use Case:** Label images and share end results in both English and a foreign language

**Technologies:** Colaboratory, Cloud Vision API, Cloud Translate API, BigQuery

**Languages:** Python, SQL

**Order of Operations:**


*  Import packages and API in Colaboratory
*  Get images from the BigQuery public dataset `bigquery-public-data:open_images.images`
*  Label images
*  Translate results into another language (Spanish)
*  Export into BigQuery for SQL analysis
*  Share end results with someone who speaks another language (most important part!)


**Reference:** https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/CPB100/lab4c/mlapis.ipynb





# **Import Packages**



*   Enable FREE access to Colaboratory: https://colab.research.google.com
*   Enable Google Cloud APIs: Cloud Vision API, Cloud Translate API, BigQuery
*   Set up the access credentials and import the necessary packages. The Google API client handles the API calls, which keeps the code lightweight
*   Extra packages are added for other analysis



In [0]:
#@title Credentials { run: "auto", display-mode: "both" }

# Replace with your API key
APIKEY = "REPLACE" #@param {type:"string"} 

#Replace with your project ID
project_id = "demos-sung" #@param {type:"string"}

#setup authentication for GCP project and APIs
from google.colab import auth #authorize colaboratory Google Cloud access
auth.authenticate_user()

In [9]:
#import custom packages for analysis

import pandas as pd #package for dataframes
import numpy as np
import argparse
import io
import base64 #encodes images to pass through Cloud Vision API
from IPython.display import Image #package for displaying images
from IPython.core.display import HTML #package for displaying url images
!pip install --upgrade google-api-python-client #install necessary api functionality
from googleapiclient.discovery import build #import api function


Looking in indexes: https://pypi.org/simple, https://legacy.pypi.org/simple
Requirement already up-to-date: google-api-python-client in /usr/local/lib/python2.7/dist-packages (1.6.6)
Requirement not upgraded as not directly required: six<2dev,>=1.6.1 in /usr/local/lib/python2.7/dist-packages (from google-api-python-client) (1.11.0)
Requirement not upgraded as not directly required: httplib2<1dev,>=0.9.2 in /usr/local/lib/python2.7/dist-packages (from google-api-python-client) (0.11.3)
Requirement not upgraded as not directly required: oauth2client<5.0.0dev,>=1.5.0 in /usr/local/lib/python2.7/dist-packages (from google-api-python-client) (4.1.2)
Requirement not upgraded as not directly required: uritemplate<4dev,>=3.0.0 in /usr/local/lib/python2.7/dist-packages (from google-api-python-client) (3.0.0)
Requirement not upgraded as not directly required: rsa>=3.1.4 in /usr/local/lib/python2.7/dist-packages (from oauth2client<5.0.0dev,>=1.5.0->google-api-python-client) (3.4.2)
Requirement no

# Import Data


*   Extract sample data from BigQuery into pandas dataframe



In [10]:
#count the rows in the BigQuery public image dataset, then pull a random sample of about sample_count rows
#both queries are written in legacy SQL and the results land in a pandas dataframe

sample_count = 10
row_count = pd.io.gbq.read_gbq('''
  SELECT 
    COUNT(*) as total
  FROM [bigquery-public-data:open_images.images]''', project_id=project_id, verbose=False).total[0]

df = pd.io.gbq.read_gbq('''
  SELECT
    *
  FROM
    [bigquery-public-data:open_images.images]
  WHERE RAND() < %d/%d
''' % (sample_count, row_count), project_id=project_id, verbose=False)

print('Full dataset has %d rows' % row_count)

Full dataset has 9178275 rows
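
The `WHERE RAND() < sample_count/row_count` filter keeps each row independently with a small probability, so the sample holds roughly `sample_count` rows rather than exactly that many. A minimal local simulation of that idea, assuming a made-up table size of 10000 rows and a hypothetical helper name `sampled_row_count`, might look like this:

```python
import random

def sampled_row_count(n_rows, target, seed):
    """Count how many of n_rows pass a RAND() < target/n_rows style filter."""
    rng = random.Random(seed)      # seeded so each run is repeatable
    p = float(target) / n_rows     # same probability the SQL filter uses
    return sum(1 for _ in range(n_rows) if rng.random() < p)

# run the same filter with different seeds; 10000 is an illustrative size,
# not the real table size
counts = [sampled_row_count(10000, 10, seed) for seed in range(200)]
average = sum(counts) / float(len(counts))
print(min(counts), max(counts), average)
```

The counts cluster around 10 but differ between runs, which is why the dataframe can come back with a few more or fewer rows than `sample_count`.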


In [11]:
#show sample data

df

Unnamed: 0,image_id,subset,original_url,original_landing_url,license,author_profile_url,author,title,original_size,original_md5,thumbnail_300k_url
0,8e352b4e13bb97cc,train,https://c5.staticflickr.com/8/7005/6843277839_...,https://www.flickr.com/photos/jodastephen/6843...,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/jodastephen/,Stephen Colebourne,Venice,2549790,rg6jFPICH0lzwJa+TBb7Jw==,https://c2.staticflickr.com/8/7005/6843277839_...
1,342cea1b9508e2e7,train,https://farm8.staticflickr.com/2079/2513092437...,https://www.flickr.com/photos/jrover/2513092437,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/jrover/,Jeremy Rover,Diving the Canyon,312222,fp9BMt4meO6XXJCunh6yeA==,https://c3.staticflickr.com/3/2079/2513092437_...
2,fd78e8da277df7f6,train,https://farm7.staticflickr.com/6225/6374789629...,https://www.flickr.com/photos/franciscojgonzal...,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/franciscojgonzalez/,Francisco Gonzalez,Tour Eiffel et Sacré Coeur au coucher du soleil,1977486,mc1AT1hGbd+WuYoOjPP30w==,https://c2.staticflickr.com/7/6225/6374789629_...
3,fb4e7c677293a9ad,train,https://c6.staticflickr.com/1/44/149484582_c93...,https://www.flickr.com/photos/pallo/149484582,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/pallo/,PålLøberg,Big logo in the lobby,1357592,C5/m7r1MSAQvKKf+Oic+uw==,https://c3.staticflickr.com/1/44/149484582_c93...
4,c17a47bc70c6a5de,train,https://c7.staticflickr.com/2/1008/542976933_6...,https://www.flickr.com/photos/jmctee/542976933,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/jmctee/,John McTarnaghan,Skyline Drive Va,918713,7Brm7waAdbwlp6QtxlTw1A==,https://c7.staticflickr.com/2/1008/542976933_4...
5,4d90078b55ed5353,train,https://farm4.staticflickr.com/7649/1684106346...,https://www.flickr.com/photos/shehal/16841063460,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/shehal/,Shehal Joseph,weekend cooking project with @gayeshaw,65790,kky6flnktVCVKrR5qOVGlw==,https://c3.staticflickr.com/8/7649/16841063460...
6,b6926a434a945555,train,https://farm6.staticflickr.com/3830/9680643729...,https://www.flickr.com/photos/dumbledad/968064...,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/dumbledad/,Tim Regan,Fiona Kitchman & Sarah Morehead's altitude jac...,4465171,+VhDNUmh8WSas1X13dDi2g==,https://c5.staticflickr.com/4/3830/9680643729_...
7,93db5501a66e3abc,train,https://c5.staticflickr.com/9/8029/8044767268_...,https://www.flickr.com/photos/snowpeak/8044767268,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/snowpeak/,John Fowler,The Ladigue twins,562585,4YvSKAs4D490DZQHeRlNgQ==,https://c5.staticflickr.com/9/8029/8044767268_...
8,670b7e0ab8321d3a,train,https://c3.staticflickr.com/9/8500/8373552555_...,https://www.flickr.com/photos/squirmy21/837355...,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/squirmy21/,SquirmyBeluga,crystal ball,4515184,TBoHKXv25zgJM+/Cf15tYw==,https://c1.staticflickr.com/9/8500/8373552555_...
9,454df3b1588a8f3d,train,https://c6.staticflickr.com/7/6169/6174709881_...,https://www.flickr.com/photos/circulaseguro/61...,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/circulaseguro/,Circula Seguro,Renault 4 en rotonda,296112,6GtZccSsKdUrkQX7i6RkPg==,https://c6.staticflickr.com/7/6169/6174709881_...


# Invoke Cloud Vision API

In [12]:
#demo image-this one's for all the hypebeasts out there!
#helpful link: https://stackoverflow.com/questions/32370281/how-to-include-image-or-picture-in-jupyter-notebook
Image(url= "https://www.purseblog.com/images/2017/01/Louis-Vuitton-Supreme-Bags-Fall-2017-5.jpg", width=500, height=350)

In [0]:
#Define function to run label detection based on image url
#Passes the url directly to the Cloud Vision API, so no base64 encoding is needed
#returns the raw API response with up to 5 labels
def label_detection_example(url): 
  vservice = build('vision', 'v1', developerKey=APIKEY)
  request = vservice.images().annotate(body={
          'requests': [{
                  'image': {
                      'source': {
                          'image_uri': url
                      }
                  },
                  'features': [{
                      'type': 'LABEL_DETECTION', #replace with LOGO_DETECTION,FACE_DETECTION, etc.
                      'maxResults': 5,
                  }]
              }],
          })
  responses = request.execute(num_retries=3)
  return responses
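
The comment in the function above notes that the feature type can be swapped for `LOGO_DETECTION`, `FACE_DETECTION`, and so on. One way to make that swap easier is to build the request body in a small helper. This is only a sketch: `build_annotate_body` is a hypothetical name, and the example.com url is a placeholder, not a real image.

```python
def build_annotate_body(url, feature_type='LABEL_DETECTION', max_results=5):
    """Return a request body for images().annotate covering one image url."""
    return {
        'requests': [{
            # same field layout as the function above
            'image': {'source': {'image_uri': url}},
            'features': [{'type': feature_type, 'maxResults': max_results}],
        }],
    }

# placeholder url, used only to show the shape of the body
body = build_annotate_body('https://example.com/photo.jpg', feature_type='LOGO_DETECTION')
print(body['requests'][0]['features'])
```

The resulting dictionary could then be passed as `body=` to `vservice.images().annotate(...)` in place of the inline literal.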

In [14]:
#display list of image labels with newly built function
#term definitions: https://cloud.google.com/vision/docs/reference/rest/v1/images/annotate

IMAGE="https://www.purseblog.com/images/2017/01/Louis-Vuitton-Supreme-Bags-Fall-2017-5.jpg"

label_detection_example(IMAGE)

{u'responses': [{u'labelAnnotations': [{u'description': u'red',
     u'mid': u'/m/06fvc',
     u'score': 0.974761,
     u'topicality': 0.974761},
    {u'description': u'handbag',
     u'mid': u'/m/080hkjn',
     u'score': 0.96870184,
     u'topicality': 0.96870184},
    {u'description': u'bag',
     u'mid': u'/m/0n5v01m',
     u'score': 0.95867664,
     u'topicality': 0.95867664},
    {u'description': u'product',
     u'mid': u'/m/02n3pb',
     u'score': 0.9035368,
     u'topicality': 0.9035368},
    {u'description': u'fashion accessory',
     u'mid': u'/m/0463sg',
     u'score': 0.8855615,
     u'topicality': 0.8855615}]}]}
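
The response is a nested dictionary, so pulling out just the label names and scores takes a couple of lookups. A minimal sketch, using the first two labels from the output above as sample data and a hypothetical helper name `extract_labels`:

```python
# trimmed copy of the response shown above (first two labels only)
sample_response = {'responses': [{'labelAnnotations': [
    {'description': 'red', 'mid': '/m/06fvc',
     'score': 0.974761, 'topicality': 0.974761},
    {'description': 'handbag', 'mid': '/m/080hkjn',
     'score': 0.96870184, 'topicality': 0.96870184},
]}]}

def extract_labels(response):
    """Return (description, score) pairs from a label detection response."""
    # .get() guards against responses that carry no labelAnnotations key
    annotations = response['responses'][0].get('labelAnnotations', [])
    return [(a['description'], a['score']) for a in annotations]

print(extract_labels(sample_response))
```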

# Image Labeling Function

In [0]:
#Define function to run label detection on each image url in a table column
#Passes the image url directly to the Cloud Vision API (no base64 encoding needed)
#returns the plain English labels with confidence scores as one '|'-separated string
#helpful link: https://stackoverflow.com/questions/25261434/how-to-print-multiple-items-from-dictionary

def label_detection(url): 
  vservice = build('vision', 'v1', developerKey=APIKEY)
  request = vservice.images().annotate(body={
          'requests': [{
                  'image': {
                      'source': {
                          'image_uri': url
                      }
                  },
                  'features': [{
                      'type': 'LABEL_DETECTION',
                      'maxResults': 5,
                  }]
              }],
          })
  responses = request.execute(num_retries=3)
  #labelAnnotations is absent when the response carries no labels (for example, an error for that image)
  annotations = responses['responses'][0].get('labelAnnotations', [])
  #format each label as description(score) and join them with '|'
  return '|'.join('%s(%s)' % (a['description'], a['score']) for a in annotations)

In [16]:
#test label detection function
label_detection('https://www.purseblog.com/images/2017/01/Louis-Vuitton-Supreme-Bags-Fall-2017-5.jpg')

u'red(0.974761)|handbag(0.968702)|bag(0.9586767)|product(0.9035368)|fashion accessory(0.88556147)'
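
Packing the labels into one string is handy for display, but later analysis may need the names and scores separately. A minimal sketch of reversing the format, assuming the `description(score)|description(score)` layout shown above and a hypothetical helper name `parse_label_string`:

```python
def parse_label_string(labels):
    """Split 'desc(score)|desc(score)' into a list of (desc, score) pairs."""
    pairs = []
    for item in labels.split('|'):
        if not item:
            continue  # skip empty pieces, e.g. from an empty input string
        # split on the last '(' so descriptions containing spaces stay intact
        description, _, score = item.rpartition('(')
        pairs.append((description, float(score.rstrip(')'))))
    return pairs

print(parse_label_string('red(0.974761)|fashion accessory(0.88556147)'))
```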

In [0]:
#apply to go through the sample images and put into dataframe
df['ImageLabel'] = df['original_url'].apply(label_detection)

In [18]:
#check the dataframe to ensure the labels are in the new column
df['ImageLabel']

0     waterway(0.9653903)|property(0.9045235)|neighb...
1     scuba diving(0.9854413)|underwater diving(0.98...
2     sky(0.9858208)|afterglow(0.9636433)|dawn(0.952...
3                           interior design(0.65215605)
4     sky(0.962839)|grassland(0.92956024)|mountainou...
5     dessert(0.763286)|biscuit(0.70349556)|baked go...
6     fur(0.79217327)|shoe(0.71800923)|plush(0.53875...
7     rock(0.8242284)|adventure(0.69788873)|geology(...
8                                  organism(0.64027643)
9     car(0.97778934)|road(0.9350879)|residential ar...
10    white(0.9623204)|photograph(0.9575152)|black(0...
Name: ImageLabel, dtype: object

# Invoke Cloud Translate API

In [0]:
# running Translate API on one string of English labels
def label_detection_foreign(x):
  service = build('translate', 'v2', developerKey=APIKEY)

  #language parameters
  original_lang='en'
  foreign_lang='es'

  # use the service
  # send the whole label string as a single query
  # return the translated text of the first (and only) result
  outputs = service.translations().list(source=original_lang, target=foreign_lang, q=x).execute()
  return outputs['translations'][0]['translatedText']

In [20]:
#display an example; accented characters show as escape codes here (e.g. \xed for í) but display normally once exported to BigQuery
label_detection_foreign(df['ImageLabel'][0])

u'v\xeda fluvial (0.9653903) | propiedad (0.9045235) | vecindario (0.8928297) | ciudad (0.890791) | canal (0.8344499)'

In [21]:
#apply function to all values in English-labeled column
df['ImageLabel_Translated'] = df['ImageLabel'].apply(label_detection_foreign)

df['ImageLabel_Translated'] 

0     vía fluvial (0.9653903) | propiedad (0.9045235...
1     submarinismo (0.9854413) buceo submarino (0.98...
2     sky (0.9858208) | afterglow (0.9636433) | aman...
3                     diseño de interiores (0.65215605)
4     cielo (0.962839) | pradera (0.92956024) | acci...
5     postre (0.763286) | galleta (0.70349556) | pro...
6     piel (0.79217327) | zapato (0.71800923) | pelu...
7     rock (0.8242284) | aventura (0.69788873) | geo...
8                                organismo (0.64027643)
9     automóvil (0.97778934) | camino (0.9350879) | ...
10    blanco (0.9623204) | fotografía (0.9575152) | ...
Name: ImageLabel_Translated, dtype: object
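
The translated strings come back with altered spacing around the scores and separators, and row 1 appears to have lost a `|`. One possible way to keep the format stable is to translate only the label text and reattach the scores afterwards. The sketch below assumes a hypothetical `translate_fn` callable that takes and returns a single string. Here it is a small lookup built from translations shown in the output above; a real run would call the Translate API instead.

```python
def translate_labels(labels, translate_fn):
    """Translate only the label text in a 'desc(score)|desc(score)' string."""
    translated = []
    for item in labels.split('|'):
        description, paren, score = item.rpartition('(')
        if not paren:
            # no '(' found: translate the item unchanged
            translated.append(translate_fn(item))
        else:
            # keep the score untouched and reattach it
            translated.append(translate_fn(description) + '(' + score)
    return '|'.join(translated)

# stand-in translator for illustration only
fake_translations = {'interior design': u'diseño de interiores',
                     'organism': u'organismo'}

def fake_translate(text):
    return fake_translations.get(text, text)

result = translate_labels('interior design(0.65215605)|organism(0.64027643)', fake_translate)
print(result)
```

The trade-off is one translation call per label instead of one per row, so this costs more requests.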

# Final Results



*   Preview final table with English and Spanish image labels with confidence scores
*   Export to BigQuery



In [22]:
#show final dataframe
df

Unnamed: 0,image_id,subset,original_url,original_landing_url,license,author_profile_url,author,title,original_size,original_md5,thumbnail_300k_url,ImageLabel,ImageLabel_Translated
0,8e352b4e13bb97cc,train,https://c5.staticflickr.com/8/7005/6843277839_...,https://www.flickr.com/photos/jodastephen/6843...,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/jodastephen/,Stephen Colebourne,Venice,2549790,rg6jFPICH0lzwJa+TBb7Jw==,https://c2.staticflickr.com/8/7005/6843277839_...,waterway(0.9653903)|property(0.9045235)|neighb...,vía fluvial (0.9653903) | propiedad (0.9045235...
1,342cea1b9508e2e7,train,https://farm8.staticflickr.com/2079/2513092437...,https://www.flickr.com/photos/jrover/2513092437,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/jrover/,Jeremy Rover,Diving the Canyon,312222,fp9BMt4meO6XXJCunh6yeA==,https://c3.staticflickr.com/3/2079/2513092437_...,scuba diving(0.9854413)|underwater diving(0.98...,submarinismo (0.9854413) buceo submarino (0.98...
2,fd78e8da277df7f6,train,https://farm7.staticflickr.com/6225/6374789629...,https://www.flickr.com/photos/franciscojgonzal...,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/franciscojgonzalez/,Francisco Gonzalez,Tour Eiffel et Sacré Coeur au coucher du soleil,1977486,mc1AT1hGbd+WuYoOjPP30w==,https://c2.staticflickr.com/7/6225/6374789629_...,sky(0.9858208)|afterglow(0.9636433)|dawn(0.952...,sky (0.9858208) | afterglow (0.9636433) | aman...
3,fb4e7c677293a9ad,train,https://c6.staticflickr.com/1/44/149484582_c93...,https://www.flickr.com/photos/pallo/149484582,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/pallo/,PålLøberg,Big logo in the lobby,1357592,C5/m7r1MSAQvKKf+Oic+uw==,https://c3.staticflickr.com/1/44/149484582_c93...,interior design(0.65215605),diseño de interiores (0.65215605)
4,c17a47bc70c6a5de,train,https://c7.staticflickr.com/2/1008/542976933_6...,https://www.flickr.com/photos/jmctee/542976933,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/jmctee/,John McTarnaghan,Skyline Drive Va,918713,7Brm7waAdbwlp6QtxlTw1A==,https://c7.staticflickr.com/2/1008/542976933_4...,sky(0.962839)|grassland(0.92956024)|mountainou...,cielo (0.962839) | pradera (0.92956024) | acci...
5,4d90078b55ed5353,train,https://farm4.staticflickr.com/7649/1684106346...,https://www.flickr.com/photos/shehal/16841063460,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/shehal/,Shehal Joseph,weekend cooking project with @gayeshaw,65790,kky6flnktVCVKrR5qOVGlw==,https://c3.staticflickr.com/8/7649/16841063460...,dessert(0.763286)|biscuit(0.70349556)|baked go...,postre (0.763286) | galleta (0.70349556) | pro...
6,b6926a434a945555,train,https://farm6.staticflickr.com/3830/9680643729...,https://www.flickr.com/photos/dumbledad/968064...,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/dumbledad/,Tim Regan,Fiona Kitchman & Sarah Morehead's altitude jac...,4465171,+VhDNUmh8WSas1X13dDi2g==,https://c5.staticflickr.com/4/3830/9680643729_...,fur(0.79217327)|shoe(0.71800923)|plush(0.53875...,piel (0.79217327) | zapato (0.71800923) | pelu...
7,93db5501a66e3abc,train,https://c5.staticflickr.com/9/8029/8044767268_...,https://www.flickr.com/photos/snowpeak/8044767268,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/snowpeak/,John Fowler,The Ladigue twins,562585,4YvSKAs4D490DZQHeRlNgQ==,https://c5.staticflickr.com/9/8029/8044767268_...,rock(0.8242284)|adventure(0.69788873)|geology(...,rock (0.8242284) | aventura (0.69788873) | geo...
8,670b7e0ab8321d3a,train,https://c3.staticflickr.com/9/8500/8373552555_...,https://www.flickr.com/photos/squirmy21/837355...,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/squirmy21/,SquirmyBeluga,crystal ball,4515184,TBoHKXv25zgJM+/Cf15tYw==,https://c1.staticflickr.com/9/8500/8373552555_...,organism(0.64027643),organismo (0.64027643)
9,454df3b1588a8f3d,train,https://c6.staticflickr.com/7/6169/6174709881_...,https://www.flickr.com/photos/circulaseguro/61...,https://creativecommons.org/licenses/by/2.0/,https://www.flickr.com/people/circulaseguro/,Circula Seguro,Renault 4 en rotonda,296112,6GtZccSsKdUrkQX7i6RkPg==,https://c6.staticflickr.com/7/6169/6174709881_...,car(0.97778934)|road(0.9350879)|residential ar...,automóvil (0.97778934) | camino (0.9350879) | ...


In [0]:
#export to BigQuery for data analysis using SQL
#ensure you create a dataset in BigQuery first before exporting
#uses the project_id set in the credentials cell instead of a hard-coded project
df.to_gbq('colaboratory_demo.demo_data', project_id, chunksize=2000, verbose=True, if_exists='append')
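
Once the table is in BigQuery, a first analysis could count how often each English label appears. The sketch below only builds the query text. It assumes standard SQL, the `ImageLabel` column and table name used above, and a hypothetical helper name `top_labels_query`; `my-project` is a placeholder project ID.

```python
def top_labels_query(project, dataset, table):
    """Build a standard SQL query counting how often each English label appears."""
    # raw string so the regex backslash reaches BigQuery unchanged
    return r"""
SELECT
  REGEXP_EXTRACT(label, r'^(.*)\(') AS description,
  COUNT(*) AS image_count
FROM `{project}.{dataset}.{table}`,
  UNNEST(SPLIT(ImageLabel, '|')) AS label
GROUP BY description
ORDER BY image_count DESC
""".format(project=project, dataset=dataset, table=table)

# placeholder project ID; swap in your own
query = top_labels_query('my-project', 'colaboratory_demo', 'demo_data')
print(query)
```

The printed query can be pasted into the BigQuery console, or, assuming a pandas version that still ships `read_gbq`, run with `pd.read_gbq(query, project_id=project_id, dialect='standard')`.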