# GoogleCloud_API

Using Google Cloud Vision API and based on the images provided by the Yelp API we are able structure new data that could enrich our analysis, in particular:
- Labels: Google Cloud recognition uses image recognition algorithms to extract labels based on elements of the picture.
- Colors: Fundamental colors that appear on the picture.

Here fundamentally:
1. The connection to the google API is made with the google.cloud library.
2. Using the API methods we extract labels and color properties.
3. Cleaning and structuring of the data response obtained

## Google API Conection

For the use of the google API library It is needed to download Google Cloud Software, create an account, start a google project, hire the google vision services, initialize the software on the shell and conect to your own account and credentials. 

In [174]:
import pandas as pd
import numpy as np
import io
import os
# Imports the Google Cloud client library
from google.cloud import vision

In [2]:
photos = pd.read_csv('Ad_image.csv')

In [3]:
photos.head()

Unnamed: 0,image_url
0,https://s3-media1.fl.yelpcdn.com/bphoto/SKAdDh...
1,https://s3-media2.fl.yelpcdn.com/bphoto/DLmidv...
2,https://s3-media4.fl.yelpcdn.com/bphoto/weedog...
3,https://s3-media1.fl.yelpcdn.com/bphoto/yzyTg5...
4,https://s3-media2.fl.yelpcdn.com/bphoto/tbGiSu...


In [4]:
photos.iloc[0]

image_url    https://s3-media1.fl.yelpcdn.com/bphoto/SKAdDh...
Name: 0, dtype: object

In [175]:
# Instantiates a client
vision_client = vision.Client()

## Label Extraction

Funtion for the label extraction of each picture.

In [6]:
photos.iloc[3].values[0]

'https://s3-media1.fl.yelpcdn.com/bphoto/yzyTg5QidNVxlEqH1BXOXg/o.jpg'

In [7]:
def detect_labels(pic):
    try:
        pic_open = pic
        #Transform the picture to a google image object
        googleimg = vision_client.image(source_uri=pic_open)
        #Get the labels from the object
        labels = googleimg.detect_labels()
        #put the labels into a list
        labellist = []
        for label in labels:
            labellist.append(label.description)
        return labellist
    except:
        return "Error in detect_labels"

Get the labels of all the pictures

In [8]:
#With lambda function:
#photos['Labels'] = photos['image_url'][0:].apply(lambda x: detect_labels(x))

In [9]:
photos ['Labels'] = range(len(photos))

In [10]:
#with a loop
for i in range(2999):
    try:
        photos ['Labels'] [i] = detect_labels(photos['image_url'][i])
    except:
        print "Error"

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)


In [11]:
photos.head()

Unnamed: 0,image_url,Labels
0,https://s3-media1.fl.yelpcdn.com/bphoto/SKAdDh...,"[table, drink, lighting, shape, material, drum]"
1,https://s3-media2.fl.yelpcdn.com/bphoto/DLmidv...,"[dish, food, meal, dinner, restaurant, cuisine..."
2,https://s3-media4.fl.yelpcdn.com/bphoto/weedog...,"[dish, food, meal, produce, cuisine, vegetable..."
3,https://s3-media1.fl.yelpcdn.com/bphoto/yzyTg5...,"[modern art, art gallery, exhibition, picture ..."
4,https://s3-media2.fl.yelpcdn.com/bphoto/tbGiSu...,"[transport, road, city, town, urban area, neig..."


## Color Extraction

Function for the color extraction of each picture

Query the color properties of google

In [12]:
def piture_properties(image):
    properties = image.detect_properties()
    frame = []
    #the result of the query is a dictionary that we query and join to get a frame
    for color in properties.colors:
        #Adding together all colors in one list
        c = [format(color.pixel_fraction), format(color.color.red), format(color.color.green), format(color.color.blue), format(color.color.alpha)]
        frame.append(c)
    #Making the colors list a dataframe, transforming it to float and sorting it by the mos common one.
    color_frame = pd.DataFrame(frame, columns= ['pixel_fraction', 'red', 'green', 'blue', 'alpha'])
    color_frame = color_frame.applymap(lambda x: float(x))
    color_frame.sort_values('pixel_fraction', ascending=False, inplace=True)
    color_frame.reset_index(drop=True, inplace=True)
    return color_frame

This function will return a frame with the most comon colors in the picture we are analysing.

A frame like the one we are seeing underneath which bring the top ten colors by the fraction of pixels that the represent.

In [97]:
color_frame = piture_properties(vision_client.image(source_uri=photos.iloc[0].values[0]))

In [98]:
color_frame

Unnamed: 0,pixel_fraction,red,green,blue,alpha
0,0.291338,66.0,60.0,58.0,0.0
1,0.080447,109.0,85.0,67.0,0.0
2,0.04198,182.0,141.0,100.0,0.0
3,0.035269,132.0,106.0,84.0,0.0
4,0.033062,179.0,144.0,102.0,0.0
5,0.023197,181.0,141.0,107.0,0.0
6,0.020269,250.0,223.0,197.0,0.0
7,0.019684,144.0,109.0,85.0,0.0
8,0.006306,164.0,121.0,80.0,0.0
9,0.000586,185.0,96.0,37.0,0.0


As a last step for the most color extraction and with this dataframe we have now obtained, we make a fuction that summerizes the most important color information. Bringing two summary statistics:

1. The mean color adding up 40% of the pixels.
2. The most common color.

In [146]:
def colorsum(df):
    i = 0
    redsum = 0
    greensum = 0
    bluesum = 0
    pix_sum = 0
    while pix_sum < 0.4 and i < 7:
        pix_sum += df.pixel_fraction[i]
        redsum = redsum + df.red[i]*df.pixel_fraction[i]
        greensum = greensum + df.green[i]*df.pixel_fraction[i]
        bluesum = bluesum + df.blue[i]*df.pixel_fraction[i]
        i += 1
    return pd.DataFrame([[redsum, greensum, bluesum,df.pixel_fraction[0],df.red[0],df.green[0],df.blue[0]]],
                        columns= ['Red', 'Green', 'Blue','TOP Pixel_prop','TOP Red','TOP Green','TOP Blue'])

In [183]:
color_carac = pd.DataFrame()

In [184]:
for i in range(len(photos)):
    try:
        nr = pd.DataFrame(colorsum(piture_properties(vision_client.image(source_uri=photos.iloc[i].values[0]))))
        color_carac = pd.concat([color_carac, nr], axis=0)
    except:
        nr = pd.DataFrame([['no picture','no picture','no picture','no picture','no picture','no picture','no picture']],
                          columns= ['Red', 'Green', 'Blue','TOP Pixel_prop','TOP Red','TOP Green','TOP Blue'])
        color_carac = pd.concat([color_carac, nr], axis=0)

In [185]:
color_carac.head()

Unnamed: 0,Red,Green,Blue,TOP Pixel_prop,TOP Red,TOP Green,TOP Blue
0,35.6374,30.2375,26.4856,0.291338,66,60,58
0,58.7311,42.5261,24.0796,0.203647,146,114,68
0,102.41,97.7666,89.2599,0.305369,200,190,168
0,72.3724,64.6825,57.8979,0.276127,225,204,180
0,25.271,24.5523,25.3966,0.262404,83,81,82


In [193]:
photos.shape

(3000, 2)

In [194]:
color_carac.shape

(3000, 7)

In [195]:
color_carac.reset_index(drop=True, inplace= True)

In [196]:
image_properties = pd.concat([photos, color_carac], axis = 1)

In [197]:
image_properties.tail()

Unnamed: 0,image_url,Labels,Red,Green,Blue,TOP Pixel_prop,TOP Red,TOP Green,TOP Blue
2995,https://s3-media4.fl.yelpcdn.com/bphoto/JIGZTo...,"[dish, food, cuisine, meal, breakfast, produce...",66.6043,56.6967,48.8742,0.132371,216,190,165
2996,https://s3-media3.fl.yelpcdn.com/bphoto/n6TRCA...,"[drink, alcoholic beverage, beer, food, produc...",54.6979,34.0199,20.9528,0.205717,149,69,14
2997,https://s3-media4.fl.yelpcdn.com/bphoto/PeiZV8...,"[sign, signage, advertising, street sign, rest...",56.6479,30.8364,30.5406,0.0781999,148,64,65
2998,https://s3-media1.fl.yelpcdn.com/bphoto/4HQ1p1...,"[text, font, brand, logo, number]",27.593,27.593,27.593,0.788372,35,35,35
2999,https://s3-media1.fl.yelpcdn.com/bphoto/7tFObd...,2999,68.6406,69.5822,70.204,0.284357,187,195,200


In [199]:
image_properties.to_csv('./image_properties.csv', index=False)