# Welcome to the DFW Python Meetup - Dec 5, 2019

# My talk today focuses on how we, as Python developers, can use the ML tools available to us to build more complex and insightful models.
---------

## Quick recap of what exactly is ML:
## "Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task."
- https://en.wikipedia.org/wiki/Machine_learning


# In other words, we use math and statistics on a computer to make predictions from data.
<img src="ml.gif" width="500"/>

# I will be using the tools developed by Google.
# This talk was inspired by a presentation I saw at PyTexas 2019 in Austin, TX.
<img src="pytexas_2019.png" width="400"/>
<img src="google-cloud-platform-logo.jpg" width="400"/>

In [None]:
import sys
sys.version

# Import Dependencies/Packages

In [None]:
import os
import io
import base64
import json
import pandas as pd
import seaborn as sns
from bs4 import BeautifulSoup
import requests
from google.cloud import pubsub_v1
from google.cloud import translate_v2 as translate
from google.cloud import storage
from google.cloud import vision
from google.cloud import language
from google.cloud.vision import types as v_types
from google.cloud.language import enums
from google.cloud.language import types
## Google Packages Pip Installs:
# pip install google-cloud
# pip install google-cloud-pubsub
# pip install google-cloud-translate
# pip install google-cloud-storage
# pip install google-cloud-vision
# pip install google-cloud-language
## Other packages (bs4 is installed as beautifulsoup4):
# pip install pandas seaborn beautifulsoup4 requests

In [None]:
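# Python Google Cloud package versions used for this presentation.
# Pin these to run the notebook as-is: the 2.x releases of google-cloud-vision
# and google-cloud-language removed the `types`/`enums` modules imported above.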
# google-cloud==0.34.0
# google-cloud-core==1.0.3
# google-cloud-language==1.3.0
# google-cloud-pubsub==1.0.2
# google-cloud-storage==1.22.0
# google-cloud-translate==1.7.0
# google-cloud-vision==0.39.0

In [None]:
%%HTML
<h1 style="color:green;">To use Google's ML APIs, you will need a Google Cloud account and credentials.</h1>

https://cloud.google.com/products/ai/

In [None]:
CURRENT_DIR = os.getcwd()
GOOGLE_CRED_FILE = "image-to-sentiment-9462fc808e0f.json"
GOOGLE_CRED_FILE_PATH = os.path.join(CURRENT_DIR, GOOGLE_CRED_FILE)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = GOOGLE_CRED_FILE_PATH
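If `GOOGLE_APPLICATION_CREDENTIALS` points at a file that doesn't exist, nothing fails until the first client is created, and the error there is less obvious. A minimal fail-fast sketch, assuming the service-account key sits next to the notebook; `set_google_credentials` is a hypothetical helper, not part of the Google client libraries.

```python
import os


def set_google_credentials(cred_path):
    """Point Google client libraries at a service-account key file.

    Raises FileNotFoundError immediately if the file is missing, instead of
    letting the first client constructor fail later.
    """
    abs_path = os.path.abspath(cred_path)
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(f"Credentials file not found: {abs_path}")
    # Google's client libraries look up this variable (Application Default Credentials)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = abs_path
    return abs_path
```

In the cell above, `set_google_credentials(GOOGLE_CRED_FILE_PATH)` could take the place of the direct `os.environ` assignment.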

In [None]:
## Instantiate Necessary Google Vision and other Google Cloud Objects
vision_client = vision.ImageAnnotatorClient()
translate_client = translate.Client()
publisher = pubsub_v1.PublisherClient()
storage_client = storage.Client()
language_client = language.LanguageServiceClient()

with open('config.json') as f:
    config = json.load(f)

## Let's start with a basic image. I won't tell you or Google what it is. Let's call it "something".

In [None]:
some_image = os.path.join(CURRENT_DIR, "something.jpg")

In [None]:
# The name of the image file to annotate
file_name = some_image

# Load the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = v_types.Image(content=content)

# Performs label detection on the image file
response = vision_client.label_detection(image=image)
labels = response.label_annotations

print('Labels:')
for label in labels:
    print(label.description)
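Each label annotation also carries a `score` between 0 and 1, so printing only the description hides how sure the model is. A minimal sketch that keeps labels above a cut-off and sorts them best first; the `top_labels` name and the 0.7 threshold are illustrative assumptions, and the stand-in objects below are made up and only mimic the `description`/`score` attributes of the real annotations so the example runs offline.

```python
from types import SimpleNamespace


def top_labels(labels, min_score=0.7):
    """Return (description, score) pairs at or above min_score, highest score first.

    `labels` is any iterable of objects with `description` and `score`
    attributes, such as response.label_annotations.
    """
    kept = [(label.description, label.score) for label in labels
            if label.score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up stand-ins; in the notebook, pass response.label_annotations instead
fake_labels = [
    SimpleNamespace(description="alpha", score=0.62),
    SimpleNamespace(description="beta", score=0.99),
    SimpleNamespace(description="gamma", score=0.95),
]
for description, score in top_labels(fake_labels):
    print(f"{description}: {score:.2f}")
```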

## Google Vision is telling us that the image is a cat. Let's see how it did!

In [None]:
%%HTML
<img src="something.jpg" width="400"/>

## Great job, Google! Let's try another picture...

In [None]:
some_image2 = os.path.join(CURRENT_DIR, "some_image_2.jpg")

file_name = some_image2

with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = v_types.Image(content=content)

response = vision_client.label_detection(image=image)
labels = response.label_annotations

print('Labels:')
for label in labels:
    print(label.description)

# Google Vision is telling us that the image is a dog. Is it?

In [None]:
%%HTML
<img src="some_image_2.jpg" width="600" style="transform:rotate(90deg);"/>

# Yep, that's my pet dog, Duke.
# Google was right again!
# Notice how I tried to trick the algorithm by adding noise (other animals in the background), but it was still able to identify the main subject.
# Hmm, how good is Google Vision at reading text in images...?

In [None]:
text_image = os.path.join(CURRENT_DIR, "text_image_1.jpg")


def detect_document(path):
    """Detects document features in an image."""
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.types.Image(content=content)

    response = client.document_text_detection(image=image)

    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print(f'\nBlock confidence: {block.confidence}\n')

            for paragraph in block.paragraphs:
                print(f'Paragraph confidence: {paragraph.confidence}')
                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print(f'Word text: {word_text} (confidence: {word.confidence})')

                    for symbol in word.symbols:
                        print(f'\tSymbol: {symbol.text} (confidence: {symbol.confidence})')

# Call function to detect document
detect_document(text_image)

## I saw a lot of 99%s, so Google Vision seems pretty confident in its abilities! Let's run this again but print just the text.


In [None]:
def detect_document_text(path):
    """Prints the text of each word detected in an image."""
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.types.Image(content=content)

    response = client.document_text_detection(image=image)

    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print(f'Word text: {word_text}')
                    
detect_document_text(text_image)
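The same four nested loops (pages, blocks, paragraphs, words) appear in every text-detection cell of this notebook. A minimal sketch of pulling them into one generator; `iter_words` is a hypothetical helper, and the `SimpleNamespace` objects are offline stand-ins that only mimic the attribute names the loops above already use.

```python
from types import SimpleNamespace


def iter_words(annotation):
    """Yield (word_text, confidence) for every word in a full_text_annotation.

    Walks the same hierarchy as the loops above:
    pages -> blocks -> paragraphs -> words -> symbols.
    """
    for page in annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    text = ''.join(symbol.text for symbol in word.symbols)
                    yield text, word.confidence


def _word(text, confidence):
    # Stand-in word whose symbols are its individual characters
    return SimpleNamespace(
        symbols=[SimpleNamespace(text=ch) for ch in text],
        confidence=confidence,
    )


# Offline stand-in: one page, one block, one paragraph, two words
fake_annotation = SimpleNamespace(pages=[SimpleNamespace(blocks=[SimpleNamespace(
    paragraphs=[SimpleNamespace(words=[_word("Hello", 0.98), _word("world", 0.91)])]
)])])

print(' '.join(text for text, _ in iter_words(fake_annotation)))
```

With a real response, `for word_text, confidence in iter_words(response.full_text_annotation):` would stand in for the nested loops.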

## Can anyone guess what the image was?

In [None]:
%%HTML
<img src="text_image_1.jpg" width="600" style="transform:rotate(90deg);"/>

## Wow, Google Vision got it right again!

# Okay, Google is too smart for this simple ML stuff. Let's really try to trick it now!
# Let's give it an image of text in a foreign language that is not widely spoken.

In [None]:
text_image_2 = os.path.join(CURRENT_DIR, "IMG_0155.png")

In [None]:
%%HTML
<img src="IMG_0155.png" width="600"/>

In [None]:
client = vision.ImageAnnotatorClient()

with io.open(text_image_2, 'rb') as image_file:
    content = image_file.read()

image = vision.types.Image(content=content)

response = client.document_text_detection(image=image)
document = response.full_text_annotation

for page in response.full_text_annotation.pages:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                word_text = ''.join([
                    symbol.text for symbol in word.symbols
                ])
                en_word = translate_client.translate(word_text, target_language="en")
                en_word = en_word['translatedText']
                print('Word text: {} = {}'.format(
                    word_text, en_word))

## Let's see this annotation of text in a more readable format

In [None]:
list_of_words = []
for page in response.full_text_annotation.pages:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                word_text = ''.join([
                    symbol.text for symbol in word.symbols
                ])
                en_word = translate_client.translate(word_text, target_language="en")
                en_word = en_word['translatedText']
                list_of_words.append(en_word)
translated_text = ' '.join(list_of_words)
print(translated_text)
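The two cells above call the Translation API once per word, and the second cell repeats every one of those calls. The `translate_v2` client's `translate` method also accepts a list of strings and returns a list of result dictionaries, so the words can be translated in a few requests and reused. A minimal sketch; `translate_words`, `FakeTranslateClient` and the chunk size of 100 are assumptions for illustration, and the fake client only mimics the `translatedText` key so the example runs offline. Translating `response.full_text_annotation.text` in one call is another option that gives the service whole sentences as context.

```python
import html


def translate_words(words, client, target_language="en", chunk_size=100):
    """Translate a list of words with one request per chunk instead of one per word.

    `client` is expected to behave like google.cloud.translate_v2.Client:
    client.translate(list_of_str, target_language=...) returns a list of dicts
    with a 'translatedText' key. chunk_size=100 is an assumed request size,
    not a documented limit.
    """
    translated = []
    for start in range(0, len(words), chunk_size):
        chunk = words[start:start + chunk_size]
        results = client.translate(chunk, target_language=target_language)
        # Results in the default HTML format can contain entities such as &#39;
        translated.extend(html.unescape(r["translatedText"]) for r in results)
    return translated


class FakeTranslateClient:
    """Offline stand-in: 'translates' by upper-casing and counts requests."""

    def __init__(self):
        self.calls = 0

    def translate(self, values, target_language="en"):
        self.calls += 1
        return [{"translatedText": value.upper(), "input": value} for value in values]


fake_client = FakeTranslateClient()
print(' '.join(translate_words(["a", "b", "c"], fake_client, chunk_size=2)))
print("requests made:", fake_client.calls)
```

With the real objects, collect the `word_text` values into a list once and call `translate_words(words, translate_client)`.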

## Okay, so Google can read the text in a picture and then translate it from Icelandic into English!
## Now that is cool!

https://www.frettabladid.is/sport/liverpool-vann-i-fjorugum-grannaslag/

## But wait, can Google tell us the sentiment of this article?

In [None]:
document = types.Document(content=translated_text, type=enums.Document.Type.PLAIN_TEXT)
annotations = language_client.analyze_sentiment(document=document)

def print_result(annotations):
    score = annotations.document_sentiment.score
    magnitude = annotations.document_sentiment.magnitude

    for index, sentence in enumerate(annotations.sentences):
        sentence_sentiment = sentence.sentiment.score
        print('Sentence {} has a sentiment score of {}'.format(
            index, sentence_sentiment))

    print('Overall Sentiment: score of {} with magnitude of {}'.format(
        score, magnitude))
    
print_result(annotations=annotations)
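The Natural Language API reports `score` from -1.0 (negative) to 1.0 (positive) and `magnitude` from 0 upward, which reflects the overall strength of emotion and tends to grow with longer text. A score near 0 can mean either neutral text or a mix of positive and negative sentences, and magnitude helps tell those apart. A minimal sketch; `describe_sentiment` and its 0.25 and 2.0 cut-offs are illustrative assumptions, not values defined by the API.

```python
def describe_sentiment(score, magnitude, score_cutoff=0.25, mixed_magnitude=2.0):
    """Map a (score, magnitude) pair to a rough human-readable label.

    The cut-offs are assumptions; tune them on your own data.
    """
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # Near-zero score: strong magnitude suggests mixed feelings, weak suggests neutral
    return "mixed" if magnitude >= mixed_magnitude else "neutral"


for pair in [(0.8, 3.0), (-0.6, 4.0), (0.1, 0.0), (0.0, 4.0)]:
    print(pair, describe_sentiment(*pair))
```

For the article above: `describe_sentiment(annotations.document_sentiment.score, annotations.document_sentiment.magnitude)`.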

# Yes, the Google Natural Language API can!

# Now imagine we are a hundred years in the future and find some ancient texts lying around. We take a picture with our future device; what can it tell us?

In [None]:
ancient_text_image = os.path.join(CURRENT_DIR, "ancient_text2.jpg")

In [None]:
%%HTML
<img src="ancient_text2.jpg" width="600" style="transform:rotate(90deg);"/>

In [None]:
client = vision.ImageAnnotatorClient()

with io.open(ancient_text_image, 'rb') as image_file:
    content = image_file.read()

image = vision.types.Image(content=content)

response = client.document_text_detection(image=image)
document = response.full_text_annotation

for page in response.full_text_annotation.pages:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                word_text = ''.join([
                    symbol.text for symbol in word.symbols
                ])
                en_word = translate_client.translate(word_text, target_language="en")
                en_word = en_word['translatedText']
                print('Word text: {} = {} | Confidence: {}'.format(
                    word_text, en_word, word.confidence))

## In a more readable format...

In [None]:
list_of_words = []
for page in response.full_text_annotation.pages:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                word_text = ''.join([
                    symbol.text for symbol in word.symbols
                ])
                en_word = translate_client.translate(word_text, target_language="en")
                en_word = en_word['translatedText']
                list_of_words.append(en_word)
translated_text = ' '.join(list_of_words)
print(translated_text)
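With a hard-to-read source like this, the per-word confidence printed earlier is as informative as the text itself. A minimal sketch that marks low-confidence words so a reader knows which ones to double-check before trusting the translation; `mark_uncertain`, the `[?]` marker and the 0.6 threshold are hypothetical choices, and the sample pairs are made up for the offline example.

```python
def mark_uncertain(words, threshold=0.6, marker="[?]"):
    """Join (text, confidence) pairs, appending `marker` to low-confidence words."""
    return ' '.join(
        text if confidence >= threshold else f"{text}{marker}"
        for text, confidence in words
    )


# Made-up pairs; in the notebook, build them from word_text and word.confidence
sample = [("ancient", 0.95), ("glyph", 0.41), ("text", 0.88)]
print(mark_uncertain(sample))
```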

## Wow, so Google ML could potentially help us with archaeology research...?!

# Okay, that is all cool and fine, but my boss wants me to deliver business products with results. Show me the numbers! How can Python & Google ML help me?
<img src="show_me_numbers.gif" width="600"/>

## Say that right before lunch, we are asked to recommend a good place to eat nearby for some of our visitors.

## A simple Google search won't do - we are data scientists, right?

## Let's 1) scrape a website, 2) clean the HTML data, 3) convert it to a DataFrame, 4) perform some ML sentiment analysis, and 5) display the results as a histogram. We can then email this off to our boss and really impress him.

In [None]:
url_to_scrape = "https://www.tripadvisor.com/Restaurant_Review-g56032-d2039635-Reviews-Our_Place_Indian_Cuisine-Irving_Texas.html#REVIEWS"

In [None]:
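# Some sites reject the default requests User-Agent, so send a browser-like one;
# the timeout keeps the cell from hanging if the site never responds
r = requests.get(url_to_scrape, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
r.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page

data = r.text

# Name the parser explicitly; BeautifulSoup warns when none is given
soup = BeautifulSoup(data, "html.parser")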

## Well here is our first view of the html data:

In [None]:
print(data)

## That is not usable, so let's use a quick for loop to pull out something human-readable...

In [None]:
review_list = []
for counter, review in enumerate(soup.find_all('p'), start=1):
    review_list.append(review.text)
    print(counter, review.text)
    
review_text = ' '.join(review_list)
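Step 2 of the plan, cleaning the HTML data, is mostly skipped here: `soup.find_all('p')` returns every paragraph on the page, which can include empty tags and site text that is not a review, and all of it goes into the sentiment request. A minimal cleanup sketch to run before joining; `clean_paragraphs`, the 20-character minimum and exact-duplicate removal are assumptions about what counts as noise, not rules specific to TripAdvisor, and the sample input is made up.

```python
def clean_paragraphs(texts, min_chars=20):
    """Normalize whitespace, then drop short and duplicate paragraphs.

    Order is preserved so the output still lines up with the page.
    """
    seen = set()
    cleaned = []
    for text in texts:
        normalized = ' '.join(text.split())  # collapse newlines, tabs and repeated spaces
        if len(normalized) < min_chars or normalized in seen:
            continue
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned


# Made-up input: a messy paragraph, an empty tag, a short label and a duplicate
raw = ["  A made-up\n example   review text. ", "", "Menu",
       "A made-up example review text."]
print(clean_paragraphs(raw))
```

In the notebook, `review_text = ' '.join(clean_paragraphs(review_list))` would send only the cleaned paragraphs for analysis.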

## So we can scrape a website, but what does that really tell us? Let's get something useful, like the sentiment of these reviews.

## Should we take our visiting guest there?

In [None]:
reviews_sentiment_dict = {}
def print_result(annotations):
    score = annotations.document_sentiment.score
    magnitude = annotations.document_sentiment.magnitude

    for index, sentence in enumerate(annotations.sentences):
        sentence_sentiment = sentence.sentiment.score
        print('Sentence {} has a sentiment score of {}'.format(
            index, sentence_sentiment))
        reviews_sentiment_dict[index] = sentence_sentiment

    print('Overall Sentiment: score of {} with magnitude of {}'.format(
        score, magnitude))
    
document = types.Document(content=review_text, type=enums.Document.Type.PLAIN_TEXT)
annotations = language_client.analyze_sentiment(document=document)
print_result(annotations=annotations)
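`print_result` above fills the global `reviews_sentiment_dict` as a side effect and keeps only each sentence's score, so the sentence text is lost. A minimal sketch that returns one record per sentence instead, reading the `sentence.text.content` and `sentence.sentiment.score` fields of the response; `sentence_records` is a hypothetical helper, and the stand-in objects are made up and only mimic those attributes so the example runs offline.

```python
from types import SimpleNamespace


def sentence_records(annotations):
    """Return one {'sentence', 'sentiment_score'} dict per analyzed sentence."""
    return [
        {"sentence": sentence.text.content,
         "sentiment_score": sentence.sentiment.score}
        for sentence in annotations.sentences
    ]


def _sentence(content, score):
    # Stand-in mimicking a Natural Language API Sentence object
    return SimpleNamespace(text=SimpleNamespace(content=content),
                           sentiment=SimpleNamespace(score=score))


fake_annotations = SimpleNamespace(sentences=[
    _sentence("First example sentence.", 0.9),
    _sentence("Second example sentence.", -0.2),
])
for record in sentence_records(fake_annotations):
    print(record)
```

With the real response, `pd.DataFrame(sentence_records(annotations))` gives a `sentiment_score` column alongside the sentence it came from.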

# This can't really be a data science talk without a DataFrame and a chart, can it?
## What does this data really mean?

In [None]:
sentiment_df = pd.DataFrame.from_dict(reviews_sentiment_dict, orient='index', columns=['sentiment_score'])
sentiment_df
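The chart in the next cell uses `bins=5`; with an integer bin count, the bins span the observed minimum and maximum scores, so the bin edges shift from one restaurant to the next. Fixing the bins to the API's full score range of -1.0 to 1.0 keeps charts comparable (seaborn 0.11+ `histplot` accepts `binrange=(-1, 1)` for this). A minimal pure-Python sketch of that fixed-range binning; `bin_scores` is a hypothetical helper.

```python
def bin_scores(scores, bins=5, low=-1.0, high=1.0):
    """Count scores into `bins` equal-width bins spanning [low, high].

    The last bin is closed on the right so a score of exactly `high` is counted.
    """
    width = (high - low) / bins
    counts = [0] * bins
    for score in scores:
        if not low <= score <= high:
            raise ValueError(f"score {score} outside [{low}, {high}]")
        index = min(int((score - low) / width), bins - 1)
        counts[index] += 1
    return counts


print(bin_scores([-0.9, -0.1, 0.0, 0.3, 0.8, 1.0]))
```

`bin_scores(sentiment_df['sentiment_score'])` works on the DataFrame column directly, since iterating a pandas Series yields its values.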

In [None]:
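# sns.distplot is deprecated since seaborn 0.11; histplot + rugplot draw the same chart
ax = sns.histplot(data=sentiment_df, x='sentiment_score', bins=5)
sns.rugplot(data=sentiment_df, x='sentiment_score', ax=ax);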

# Well it looks like overall people like this restaurant, so I think we can recommend it.
<img src="emoji_thumbs_up.jpg" width="300"/>

--------
# Hopefully this gave you some ideas for working with ML

# Here is the GitHub link for this talk:
https://github.com/jcamier/dfw_google_vision_talk

# Thank you!