# Elegir el modelo

Existen diversos modelos que podemos elegir de acuerdo al objetivo que tengamos: utilizaremos algoritmos de clasificación, predicción, regresión lineal, clustering (ejemplo k-means ó k-nearest neighbor),  Deep Learning (ej: red neuronal), bayesiano, etc y podrá haber variantes si lo que vamos a procesar son imágenes, sonido, texto, valores numéricos. 

En la siguiente tabla veremos algunos modelos y sus aplicaciones

| Modelo | Aplicaciones (Ejemplo de uso) |
| --------- | --------- |
| Logistic Regression | Predicción de precios de inmuebles |
| Fully connected networks | Clasificación |
| Convolutional Neural Networks | Procesamiento de imágenes para poder encontrar gatitos en las fotos |
| Recurrent Neural Networks | Reconocimiento de Voz |
| Random Forest | Detección de Fraude |
| Reinforcement Learning | Enseñarle a la máquina a jugar videojuegos y vencer! |
| Generative Models | Creación de imágenes |
| K-means Crear | Clusters a partir de datos sin etiquetar. Segmentar audiencias o Inventarios |
| k-Nearest Neighbors | motores de recomendación (por similitud/cercanía) |
| Bayesian Clasifiers | Clasificación de emails: Spam o no |

## Analisis de Sentimiento

Enable: "Google developers console API activation"<br>
  
Cloud Natural Language API<br>
Google Enterprise API<br>
Provides natural language understanding technologies, such as sentiment analysis, entity…<br>

Descripción general<br>
Provides natural language understanding technologies, such as sentiment analysis, entity recognition, entity sentiment analysis, and other text annotations, to developers.

### Se importan las librerias

In [5]:
import pandas as pd
import numpy as np

from google.cloud import language_v2
from google.cloud import bigquery

### Se obtienen los datos para analizar.

In [39]:
client = bigquery.Client()
sql = """
      SELECT yr.review_id, yr.stars, yr.useful, yr.cool, yr.text, yb.categories 
        FROM `eternal-empire-399016.gmy_bq.df_yelp_reviews` AS yr
        INNER JOIN `eternal-empire-399016.gmy_bq.df_yelp_business` AS yb 
          ON yr.business_id = yb.business_id
        LIMIT 1000
      """
df_yelp_reviews_business = client.query(sql).to_dataframe()
df_yelp_reviews_business.head(5)

Unnamed: 0,review_id,stars,useful,cool,text,categories
0,-APjtj_8EjttWsB9R-AXGg,1,20,0,I am so so disappointed with the results I had...,"Medical Spas, Professional Services, Health & ..."
1,lQ2R0SfHo82OYL0UYYJV9Q,1,14,0,3 Crow used to be one of my favorite spots in ...,"Bars, Sandwiches, Dive Bars, Nightlife, Restau..."
2,yOOIOo-cOkNbyWHbMgi3hA,1,19,11,**Update**\n\nI take back everything I said in...,"Shopping, Mobile Phones, Telecommunications, L..."
3,4xunj8khtmWPKM2aX4PyfA,1,8,1,"I have loved Old Town Coffee in the past, but ...","Coffee & Tea, Food, Bakeries, Coffee Roasteries"
4,4CgLDd2VMtdUCmYTbrJcVA,1,7,2,While I completely understand that the Post Of...,"Post Offices, Shipping Centers, Public Service..."


### Información importante

El análisis de opiniones intenta determinar la actitud general (positiva o negativa) y se representa mediante 
valores numéricos de score y magnitude

La siguiente tabla muestra algunos valores de muestra y cómo interpretarlos:

| Opinión | Valores de muestra | Rango usado |
| --------- | --------- | --------- |
| Claramente positiva | "score": 0.8, "magnitude": 3.0 | score > 0.7 |
| Neutral | "score": 0.1, "magnitude": 0.0 | 0.1 < score < 0.7 |
| Mixto | "score": 0.0, "magnitude": 4.0 | -0.5 < score < 0.0 |
| Claramente negativa | "score": -0.6, "magnitude": 4.0 | -0.6 < score |  

* Las opiniones "claramente positivas" y "claramente negativas" varían según los clientes y los casos, Por ejemplo, puedes definir un umbral de cualquier puntuación superior a 0.25 como claramente positivo, y luego modificar el umbral de puntuación a 0.15 después de revisar tus datos y resultados y descubrir que las puntuaciones de 0.15 a 0.25 también deben considerarse positivas.

### Funcion para el analisis de sentimiento

Entrada: text_to_analyze

In [7]:
text_to_analyze = ''

def sample_analyze_sentiment(texto_analizar: str = text_to_analyze) -> None:
    """
    Analyzes Sentiment in a string.

    Args:
      text_content: The text content to analyze.
    """
    print(texto_analizar)
    client = language_v2.LanguageServiceClient()

    # Available types: PLAIN_TEXT, HTML
    document_type_in_plain_text = language_v2.Document.Type.PLAIN_TEXT

    # Optional. If not specified, the language is automatically detected.
    # For list of supported languages:
    # https://cloud.google.com/natural-language/docs/languages
    language_code = "en"
    document = {
        "content": texto_analizar,
        "type_": document_type_in_plain_text,
        "language_code": language_code,
    }
    ##print(document)
    # Available values: NONE, UTF8, UTF16, UTF32
    # See https://cloud.google.com/natural-language/docs/reference/rest/v2/EncodingType.
    encoding_type = language_v2.EncodingType.UTF8

    response = client.analyze_sentiment(
        request={"document": document, "encoding_type": encoding_type}
    )
    
    ##print(response)
    # Get overall sentiment of the input document
    print(f"Document sentiment score: {response.document_sentiment.score}")
    print(f"Document sentiment magnitude: {response.document_sentiment.magnitude}")
    # Get sentiment for all sentences in the document
    for sentence in response.sentences:
        print(f"Sentence text: {sentence.text.content}")
        print(f"Sentence sentiment score: {sentence.sentiment.score}")
        print(f"Sentence sentiment magnitude: {sentence.sentiment.magnitude}")

    # Get the language of the text, which will be the same as
    # the language specified in the request or, if not specified,
    # the automatically-detected language.
    print(f"Language of the text: {response.language_code}")

### Ejemplo sencillo con un solo dato.
Se imprime la informacion y el sentimiento general del texto completo y se analiza cada frase del texto.

In [6]:
sample_analyze_sentiment(df_yelp_reviews_business.iloc[0,4])
##sample_analyze_sentiment("I am so happy and joyful.")

Over the past ten years grocery stores have evolved to include an organic section and it's popularity has sky rocketed. Enter Publix Greenwise market, a new concept from our friends in Lakeland. 
Much like whole foods, sprouts, and others Publix enters the full organic scene with a bang. 
I love the layout, decor and size of this place, it's a great look. It has all you'd expect from a Publix and more. I loved the cheese selection and olive bar. The craft beer selection is awesome and they have a florida section to support local breweries. 
There is a coffee shop up front that looked kinda busy, i should grab a java next time.
I found everything i wanted to craft a great meal and it was market price. 
Will I keep coming back? Probably not, my normal Publix has all the greenwise I need. Is it a cool, trendy new Publix? Yes.
A nice addition to the Lutz/Odessa area.
Document sentiment score: 0.7089999914169312
Document sentiment magnitude: 11.394000053405762
Sentence text: Over the past t

### Funcion para el analisis de sentimiento.
- Entrada:<br> 
    - text_to_analyze: Contiene el texto a analizar.<br>
- Salida:
    - score
    - magnitude
    
| Opinión | Valores de muestra | Rango usado |
| --------- | --------- | --------- |
| Claramente positiva | "score": 0.8, "magnitude": 3.0 | score > 0.7 |
| Neutral | "score": 0.1, "magnitude": 0.0 | 0.1 < score < 0.7 |
| Mixto | "score": 0.0, "magnitude": 4.0 | -0.5 < score < 0.0 |
| Claramente negativa | "score": -0.6, "magnitude": 4.0 | -0.6 < score |    

In [58]:
text_to_analyze = ''

def analisis_sentimento(texto_analizar: str = text_to_analyze) -> None:
    ##print(texto_analizar)
    client = language_v2.LanguageServiceClient()

    # Available types: PLAIN_TEXT, HTML
    document_type_in_plain_text = language_v2.Document.Type.PLAIN_TEXT

    # Optional. If not specified, the language is automatically detected.
    # For list of supported languages:
    # https://cloud.google.com/natural-language/docs/languages
    language_code = "en"
    document = {
        "content": texto_analizar,
        "type_": document_type_in_plain_text,
        "language_code": language_code,
    }
    ##print(document)
    # Available values: NONE, UTF8, UTF16, UTF32
    # See https://cloud.google.com/natural-language/docs/reference/rest/v2/EncodingType.
    encoding_type = language_v2.EncodingType.UTF8

    response = client.analyze_sentiment(
        request={"document": document, "encoding_type": encoding_type}
    )
    score = response.document_sentiment.score
    magnitude = response.document_sentiment.magnitude
    ##print(f"Document sentiment score: {score}")
    ##print(f"Document sentiment magnitude: {magnitude}")
    
#79  ->  0.06700000166893005 0.13300000131130219 
#343  ->  -0.008999999612569809 2.8550000190734863 
#389  ->  0.08100000023841858 1.9700000286102295 
#677  ->  0.05000000074505806 2.1570000648498535 
#810  ->  0.008999999612569809 2.0290000438690186 
#932  ->  -0.004999999888241291 2.871000051498413 
#946  ->  0.08900000154972076 0.9539999961853027 
#969  ->  -0.035999998450279236 1.9010000228881836 
    
    sentimiento = ""
    if ((score > 0.7) & (magnitude > 3.0)):
        sentimiento = "Claramente positiva"
    elif ((score < -0.1) & (magnitude > 0.5)):
        sentimiento = "Claramente negativa"
    elif ((score > -0.1) & (magnitude < 3)):
        sentimiento = "Neutral"
    elif ((score > -0.15) & (magnitude > 3.0)):
        sentimiento = "Mixto"     
    
    return score, magnitude, sentimiento
    #return sentimiento

In [34]:
print(df_yelp_reviews_business.iloc[1,4])
print(analisis_sentimento(df_yelp_reviews_business.iloc[1,4]))

Can I make a suggestion? Close and never open again. Slow service and the servers were rude. They can't split checks because they need that mandatory gratuity. We ordered drinks and waited 15 minutes for them to arrive, and the philly cheese steak is nothing more than bread, cheese, and ground beef. A toddler could cook this dish and have it come out better. Also, the crab cakes are cold and buns are soggy. And when our party got there we were treated as an inconvenience as we were sat in the back corner of the restaurant and barely waited on. 

Just don't go here. Unless you want a headache and shitty 90s music.
Document sentiment score: -0.8069999814033508
Document sentiment magnitude: 8.833000183105469
(-0.8069999814033508, 8.833000183105469, 'Claramente negativa')


In [69]:
##df_yelp_reviews_business['score'] = 0.0
##df_yelp_reviews_business['magnitude'] = 0.0
##df_yelp_reviews_business['sentimiento'] = ''
df = pd.DataFrame()
for i,fila in df_yelp_reviews_business.iterrows():
    score, magnitud, sentimiento = analisis_sentimento(fila['text'])
    ##if(sentimiento == ''):
    ##    print(i, ' -> ', score, magnitud, fila['text'])
    
    # Agregar la columna sentimiento
    ##df_yelp_reviews_business['score'] = score
    ##df_yelp_reviews_business['magnitude'] = magnitud
    df_yelp_reviews_business['sentimiento'] = sentimiento
        
    # Agregar los datos al DataFrame
    df = pd.concat([df, df_yelp_reviews_business], ignore_index=True) 


KeyError: ('score', 'magnitude', 'sentimiento')

In [70]:
print(len(df['sentimiento'].unique()), 'subniveles')
print(df['sentimiento'].unique())

1 subniveles
['Claramente negativa']


In [68]:
df.tail()

Unnamed: 0,review_id,stars,useful,cool,text,categories,score,magnitude,sentimiento
999995,TWrGEh6w4NoBcQK2E1lHew,1,6,0,I have been a patient for quite a while. What ...,"Skin Care, Health & Medical, Beauty & Spas, La...",-0.373,12.548,Claramente negativa
999996,D2Y0bbNphX_BZyMZoQlbgA,1,6,0,I like the daily grind but consistently when w...,"Restaurants, Sandwiches, Breakfast & Brunch, C...",-0.373,12.548,Claramente negativa
999997,oH2UlKxLZmtD59dOtW50_Q,1,6,0,"I ordered something online from these guys, an...","Active Life, Sporting Goods, Local Services, B...",-0.373,12.548,Claramente negativa
999998,k2bHBt3Z_caBdBZRFeisNQ,1,6,0,"You were not forthright from the begining, you...","Real Estate, Apartments, Home Services, Proper...",-0.373,12.548,Claramente negativa
999999,Ju8NoIPKUWIyNo0JKHR8Tg,1,6,0,"Very disappointed in the food, wait, and servi...","Breakfast & Brunch, Burgers, Sandwiches, Resta...",-0.373,12.548,Claramente negativa
