<a href="https://colab.research.google.com/github/mchoirul/genai-code/blob/main/notebook/sentimentanalysis_palm_langchain_vector_github.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis with Vertex AI PaLM API & LangChain
Reference & credit:

- https://www.kaggle.com/code/derrickmwiti/large-language-model-applications-palm-api-and-lan
- https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/orchestration/langchain/intro_langchain_palm_api.ipynb

Analyze the sentiment of Indonesian online news titles.

News dataset: [news-sample-with-sentiment](https://github.com/mchoirul/llm-code/blob/853c54825871a93ff6bff9f96e68d558c413a35e/idnews-curated-dataset.csv )

## Preparation

In [None]:
# install the required packages
!pip -q install langchain tiktoken
!pip install chromadb
!pip install google-cloud-aiplatform


In [None]:
# restart the runtime after installing the packages

In [None]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader
from langchain.document_loaders import CSVLoader
from langchain.llms import VertexAI
from pydantic import BaseModel
from typing import List
from langchain.embeddings import VertexAIEmbeddings
import time

## Authenticate to GCP Project

In [None]:
# authenticate to google cloud
from google.colab import auth as google_auth
google_auth.authenticate_user()

In [None]:
# specify the GCP project ID and location
import vertexai

PROJECT_ID = "your-gcp-project"  # @param {type:"string"}
vertexai.init(project=PROJECT_ID, location="us-central1")

In [None]:
#mount to google drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Load the sample news dataset and split it into chunks

In [None]:
# dataset: idnews-curated-dataset.csv
# curated news titles from Indonesian online media
# news CSV file: https://github.com/mchoirul/genai-code/blob/main/notebook/idnews-curated-dataset.csv

# put the CSV file in Google Drive for easy access
dataset_path = '/content/drive/MyDrive/mydataset/idnews-curated-dataset.csv'

# load the CSV file; each row becomes one Document
loader = CSVLoader(dataset_path, csv_args={"delimiter": ";"})
documents = loader.load()

# split the documents into chunks on newlines
text_splitter = CharacterTextSplitter(separator="\n")
textdocs = text_splitter.split_documents(documents)

# check the content
print(textdocs)

[Document(page_content='input: Komodifikasi Khalayak: Pengertian dan Contohnya\noutput: Neutral', metadata={'source': '/content/drive/MyDrive/mydataset/idnews-curated-dataset.csv', 'row': 0}), Document(page_content='input: Persepsi: Pengertian dan Contohnya\noutput: Neutral', metadata={'source': '/content/drive/MyDrive/mydataset/idnews-curated-dataset.csv', 'row': 1}), Document(page_content='input: Soal dan Jawaban Bilangan Positif dan Negatif\noutput: Neutral', metadata={'source': '/content/drive/MyDrive/mydataset/idnews-curated-dataset.csv', 'row': 2}), Document(page_content='input: Cara Menyebutkan Tanggal dan Tahun dalam Bahasa Inggris\noutput: Neutral', metadata={'source': '/content/drive/MyDrive/mydataset/idnews-curated-dataset.csv', 'row': 3}), Document(page_content='input: Bagaimana Arus Konveksi Terjadi?\noutput: Neutral', metadata={'source': '/content/drive/MyDrive/mydataset/idnews-curated-dataset.csv', 'row': 4}), Document(page_content='input: [HOAKS] Abaikan Permohonan Anwa
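`CSVLoader` roughly turns each CSV row into one Document whose `page_content` is the row's `column: value` pairs joined by newlines. That is why every chunk above reads `input: ...` followed by `output: ...`, and, because each row is far shorter than the splitter's default chunk size, each row appears to stay a single chunk. The sketch below reproduces that transformation with the standard library only. It assumes the file has `input` and `output` columns separated by `;` (inferred from the printed output); the two sample rows are copied from that output, and `rows_to_page_contents` is a hypothetical helper, not part of LangChain.

```python
import csv
import io
from typing import List

def rows_to_page_contents(csv_text: str, delimiter: str = ";") -> List[str]:
    """Turn each CSV row into newline-joined 'column: value' lines, one string per row."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    contents = []
    for row in reader:
        # one line per column, e.g. "input: <title>" then "output: <label>"
        contents.append("\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items()))
    return contents

# sample rows copied from the printed output above (column names are assumed)
sample_csv = (
    "input;output\n"
    "Komodifikasi Khalayak: Pengertian dan Contohnya;Neutral\n"
    "Persepsi: Pengertian dan Contohnya;Neutral\n"
)

for content in rows_to_page_contents(sample_csv):
    print(repr(content))
```

Note that the colon inside the first title is harmless because only `;` is treated as a delimiter.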

## Create an embedding function with Vertex AI Embeddings

In [None]:
# embedding using Vertex AI embeddings
# the Vertex AI embedding API accepts at most 5 input texts per request
# https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings#get_text_embeddings_for_a_snippet_of_text

# create a custom class that sends at most 5 texts per request and respects the requests-per-minute quota
vertex_embeddings = VertexAIEmbeddings(model_name='textembedding-gecko@001')

# utility generator: each next() call sleeps as needed so that calls happen at most max_per_minute times per minute
def rate_limit(max_per_minute):
    period = 60 / max_per_minute
    print("Waiting")
    while True:
        before = time.time()
        yield
        after = time.time()
        elapsed = after - before
        sleep_time = max(0, period - elapsed)
        if sleep_time > 0:
            print(".", end="")
            time.sleep(sleep_time)

# extend VertexAIEmbeddings with batching and rate limiting
class CustomVertexAIEmbeddings(VertexAIEmbeddings, BaseModel):
    req_per_minute: int
    num_items_per_batch: int

    # Override embed_documents method
    def embed_documents(self, texts: List[str]):
        limiter = rate_limit(self.req_per_minute)
        results = []
        docs = list(texts)

        while docs:
            # take at most num_items_per_batch texts (the API allows 5) per request
            head, docs = (
                docs[: self.num_items_per_batch],
                docs[self.num_items_per_batch :],
            )
            chunk = self.client.get_embeddings(head)
            results.extend(chunk)
            next(limiter)

        return [r.values for r in results]
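The custom class above combines two separate concerns: batching (at most 5 texts per request, per the API limit noted earlier) and pacing requests to stay within a requests-per-minute quota. The sketch below isolates that logic in plain Python with a fake embedding function, so it can be tried without calling Vertex AI. `embed_in_batches` and `fake_embed` are hypothetical names. One design difference from the generator above: this version waits *before* each request after the first, so no time is spent sleeping once the last batch has finished.

```python
import time
from typing import Callable, List, Optional, Sequence

def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 5,
    req_per_minute: int = 90,
) -> List[List[float]]:
    """Embed texts in batches of batch_size, starting at most req_per_minute requests per minute."""
    min_interval = 60.0 / req_per_minute  # minimum seconds between request starts
    results: List[List[float]] = []
    last_start: Optional[float] = None
    for start in range(0, len(texts), batch_size):
        if last_start is not None:
            # wait only for whatever part of the interval the previous request did not use up
            wait = min_interval - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)
        last_start = time.monotonic()
        results.extend(embed_batch(list(texts[start:start + batch_size])))
    return results

# hypothetical stand-in for a real embedding call: records batch sizes, returns 1-d "vectors"
batch_sizes: List[int] = []

def fake_embed(batch: List[str]) -> List[List[float]]:
    batch_sizes.append(len(batch))
    return [[float(len(t))] for t in batch]

vectors = embed_in_batches([f"title {i}" for i in range(12)], fake_embed,
                           batch_size=5, req_per_minute=6000)
print(batch_sizes)   # [5, 5, 2]
print(len(vectors))  # 12
```

In the notebook's class, `embed_batch` corresponds to the `self.client.get_embeddings(head)` call followed by reading `.values` from each result.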

In [None]:
# declare embedding function
EMBEDDING_REQ_PER_MIN = 90  #adjust this number according to your quota
EMBEDDING_BATCH_SIZE = 5
vertex_embeddings = CustomVertexAIEmbeddings(
    req_per_minute=EMBEDDING_REQ_PER_MIN,
    num_items_per_batch=EMBEDDING_BATCH_SIZE,
)
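With these settings each request carries at most `EMBEDDING_BATCH_SIZE` texts and requests are spaced at least `60 / EMBEDDING_REQ_PER_MIN` seconds apart, so a rough lower bound on indexing time is `ceil(N / 5) * 60 / 90` seconds for `N` chunks. A small helper to estimate this before running the next cell; `estimate_embedding_seconds` is hypothetical, and real runs also include API latency.

```python
import math

def estimate_embedding_seconds(num_docs: int, batch_size: int = 5, req_per_minute: int = 90) -> float:
    """Lower-bound wall time: number of requests times the minimum spacing between them."""
    num_requests = math.ceil(num_docs / batch_size)
    return num_requests * 60.0 / req_per_minute

# e.g. 900 chunks -> 180 requests -> 180 * 60 / 90 = 120 seconds
print(estimate_embedding_seconds(900))  # 120.0
```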

In [None]:
# create the embeddings and store them in ChromaDB (first run only)
# persist to a directory for later use

db = Chroma.from_documents(textdocs, vertex_embeddings,
                           persist_directory='/content/drive/MyDrive/chromadb')
db.persist()

Waiting
........................................................................................................................................

In [None]:
# load the vector DB from disk if it has already been created and persisted
#db = Chroma(persist_directory='/content/drive/MyDrive/chromadb', embedding_function=vertex_embeddings)
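The two cells above are toggled by hand: create the store on the first run, load it afterwards. Rerunning the creation cell against the same directory would embed the whole dataset again and spend the embedding quota again. A minimal sketch that chooses automatically based on whether the persist directory already has content; `get_or_create` is a hypothetical helper, and treating any non-empty directory as a valid store is a simplifying assumption. The commented-out wiring uses only the `Chroma` arguments already used in this notebook.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")

def get_or_create(persist_dir: str,
                  load_fn: Callable[[str], T],
                  create_fn: Callable[[str], T]) -> T:
    """Load the persisted store if persist_dir exists and is non-empty, otherwise create it."""
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load_fn(persist_dir)
    return create_fn(persist_dir)

# possible wiring in the notebook (not executed here):
# db = get_or_create(
#     '/content/drive/MyDrive/chromadb',
#     load_fn=lambda d: Chroma(persist_directory=d, embedding_function=vertex_embeddings),
#     create_fn=lambda d: Chroma.from_documents(textdocs, vertex_embeddings, persist_directory=d),
# )

# local check with a temporary directory instead of Chroma
with tempfile.TemporaryDirectory() as d:
    print(get_or_create(d, lambda p: "loaded", lambda p: "created"))  # created (empty dir)
    open(os.path.join(d, "marker"), "w").close()
    print(get_or_create(d, lambda p: "loaded", lambda p: "created"))  # loaded
```

After a fresh creation, `db.persist()` can still be called as in the cell above.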

## Similarity search for news titles

In [None]:
# query the vector DB for the 3 most similar titles
query = "Golkar calonkan airlangga untuk cawapres"
docs = db.similarity_search(query, k=3)

print("search query : ")
print(query)
print("\n")
print("Result")
# print results
for items in docs:
    print(items.page_content)


search query : 
Golkar calonkan airlangga untuk cawapres


Result
input: Airlangga soal Pergantian Ketum Golkar: Silakan Kalau Minat di 2024
output: Neutral
input: Airlangga soal Pergantian Ketum Golkar: Silakan Kalau Minat di 2024
output: Neutral
input: Golkar-PAN Merapat ke KKIR, PKS: Bravo Pak Jokowi, Eh Bravo Pak Prabowo
output: Neutral
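Conceptually, `similarity_search` embeds the query with the same embedding function and returns the `k` stored chunks whose vectors lie closest to it. The brute-force sketch below illustrates that idea with cosine similarity over made-up 3-dimensional vectors. It is not how Chroma works internally: Chroma uses an approximate nearest-neighbour index and its own configured distance metric, which may not be cosine. `cosine`, `top_k_similar`, and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_similar(query_vec: Sequence[float],
                  store: List[Tuple[str, Sequence[float]]],
                  k: int = 3) -> List[str]:
    """Return the texts of the k stored vectors most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# toy vectors, not real embeddings
store = [
    ("politics A", [0.9, 0.1, 0.0]),
    ("politics B", [0.8, 0.2, 0.1]),
    ("food review", [0.0, 0.1, 0.9]),
]
print(top_k_similar([1.0, 0.0, 0.0], store, k=2))  # ['politics A', 'politics B']
```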


In [None]:
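# wrap the vector similarity search in a function:
# print the k most similar entries and return them as one string,
# so the result can be passed to the LLM as few-shot examples
def return_vectorsimilary(query, k=2):
    docs = db.similarity_search(query, k=k)
    examples = "\n".join(item.page_content for item in docs)
    print(examples)
    return examples

# test the function
query = "resto ini menunya terbatas, rasa juga so so aja. enaknya ruangan gede muat banyak teman"
example = return_vectorsimilary(query)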

input: tadinya pengin mengantar teman dari kalau ke resto yang lagi hit di bandung  pada rekomendasi ke sini  ah tetapi menyesal sekalilah ini  makanan nya biasa sekali cenderung agak gagal  sepertinya tidak bakal balik ke sana kalau begini   terlalu tinggi expectation kali ya
output: Negative
input: dibandingkan dengan restoran yang ada di sekitar nya  restoran ini ratarata saja  makanan nya juga soso dengan harga yang memang relatif lebih murah dibanding yang lain  lokasi cukup strategis  tapi interior restoran juga standar  pelayanan cukup cepat  jadi inti nya restoran ini tidak spesial
output: Negative


## Sentiment analysis with the PaLM API

In [None]:
# declare the LLM using the Vertex AI PaLM text model (text-bison)
llm = VertexAI(model_name='text-bison@001',
               top_k=40,
               top_p=0.8,
               temperature=0.1,
               max_output_tokens=100,
               max_retries=3)
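`top_k`, `top_p`, and `temperature` control decoding. As the Vertex AI documentation describes it, at each step the candidates are first limited to the `top_k` most probable tokens, then further filtered by cumulative probability up to `top_p`, and the final token is sampled with `temperature`. A low temperature such as 0.1 makes the choice close to deterministic, which suits a classification task. The toy sketch below shows one common formulation of the top-k/top-p filtering on a made-up distribution; `filter_candidates` is hypothetical and is not the service's actual implementation.

```python
from typing import Dict, List

def filter_candidates(probs: Dict[str, float], top_k: int, top_p: float) -> List[str]:
    """Keep the top_k most probable tokens, then the shortest prefix whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

# made-up next-token distribution, for illustration only
probs = {"Negative": 0.55, "Neutral": 0.30, "Positive": 0.10, "Mixed": 0.05}
print(filter_candidates(probs, top_k=40, top_p=0.8))  # ['Negative', 'Neutral']
```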

### Method 1: Analyze sentiment with a one-shot question (no examples)
Analyze sentiment by sending the instruction and the text directly to the model in a single prompt. No examples are provided, so this is effectively zero-shot prompting.

In [None]:
# first method: a single question without examples
# create a function to analyze sentiment; it returns the sentiment label only

def label_sentiment_oneshot(text):
    response_oneshot = llm(
        f'Analyze the sentiment of the following text. Classify it as one of these: "Positive", "Negative", or "Neutral". What is the sentiment label of the following text?\n\n"{text}"\n\nOutput:')
    return response_oneshot.strip()

#test function
label_sentiment_oneshot('jokowi tegaskan tidak hadapi wto dalam kasus sawit')

'Positive'
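The model replies with free text, so before counting or storing results it helps to map each reply onto exactly one of the three labels. A minimal sketch; `normalize_label` is a hypothetical helper, and returning `None` when zero or several labels are mentioned is a design choice so that malformed answers can be inspected rather than silently mislabeled. The substring check is deliberately simple and could be fooled by replies such as "not positive".

```python
from typing import Optional

LABELS = ("Positive", "Negative", "Neutral")

def normalize_label(response: str) -> Optional[str]:
    """Map a raw LLM reply such as ' negative.' or 'Output: Neutral' to one canonical label."""
    text = response.strip().lower()
    matches = [label for label in LABELS if label.lower() in text]
    # accept the reply only if exactly one label is mentioned
    return matches[0] if len(matches) == 1 else None

print(normalize_label(" negative."))            # Negative
print(normalize_label("Output: Neutral"))       # Neutral
print(normalize_label("positive or negative"))  # None
```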

### Method 2: Analyze sentiment with few-shot examples
Analyze sentiment by supplying examples to the LLM as part of the prompt:

1. Search the vector DB to find similar news titles.
2. Use the similar news titles, together with their labels, as examples in the LLM prompt.
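The two steps above amount to pasting the retrieved `input: ... / output: ...` pairs in front of the new title, so that the model can imitate the labelling. A sketch of a prompt builder that does this explicitly and ends with the same `input:/output:` pattern, leaving the label for the model to fill in. `build_fewshot_prompt` is hypothetical and its wording is illustrative; it is not the prompt used in the cells below. The example row is copied from an output further down.

```python
from typing import List

def build_fewshot_prompt(examples: List[str], text: str) -> str:
    """Assemble a few-shot sentiment prompt from retrieved input/output example pairs."""
    header = ('Classify the sentiment of the text as "Positive", "Negative", or "Neutral". '
              'Answer with the label only.')
    shots = "\n\n".join(examples)
    # end with the same "input:/output:" pattern as the examples so the model fills in the label
    return f"{header}\n\n{shots}\n\ninput: {text}\noutput:"

examples = [
    "input: Liga 2 Dihentikan, Persib Kecewa dengan Keputusan PSSI\noutput: Negative",
]
print(build_fewshot_prompt(examples, "Erick yakin STY punya strategi hadapi Thailand"))
```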

In [None]:
# second method: few-shot prompting. Analyze sentiment by supplying examples.
# returns the sentiment label only
# the news examples are provided as part of the prompt

# supply the examples and the question
def label_sentiment_fewshot(example, text):
    response_fewshot = llm(
        f'Analyze the sentiment of the following text. Classify it as "Positive", "Negative", or "Neutral". Use the following examples:\n\n{example}\n\nWhat is the sentiment category of the following text?\n\n"{text}"\n\nOutput:')
    return response_fewshot.strip()

In [None]:
# analyze sentiment using examples retrieved from the vector DB

query = 'Coach Indra ungkapkan pemain PSSI masih lemah dalam teknik dan mental'

print('example:')
example=return_vectorsimilary(query)

print('\nInput query:',query)
print('sentiment output:', label_sentiment_fewshot(example, query))

example:
input: Liga 2 Dihentikan, Persib Kecewa dengan Keputusan PSSI
output: Negative
input: Tokoh Perempuan Minang Nilai Sepak Bola Tidak Hanya Olahraga Tetapi Juga Pembelajaran Soal Sportifitas dan Persatuan
output: Positive

Input query: Coach Indra ungkapkan pemain PSSI masih lemah dalam teknik dan mental
sentiment output: Negative


In [None]:
query = 'Erick yakin STY punya strategi hadapi Thailand'

#return examples from similarity search
print('example:')
example=return_vectorsimilary(query)

#show sentiment prediction
print('\nInput query:',query)
print('sentiment output:', label_sentiment_fewshot(example, query))

example:
input: Personil Polsek Banjar Kawal Kegiatan Jalan Sehat Bersama Erick Tohir dalam rangka mrriahkan HUT RI ke 78
output: Positive
input: Personil Polsek Banjar Kawal Kegiatan Jalan Sehat Bersama Erick Tohir dalam rangka mrriahkan HUT RI ke 78
output: Positive

Input query: Erick yakin STY punya strategi hadapi Thailand
sentiment output: Positive
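Both retrieved examples here are the same row, so the prompt effectively contains a single example (the Golkar search earlier also returned a duplicate). One option, an assumption rather than what this notebook does, is to over-fetch from the vector DB and keep the first `n` distinct entries in similarity order. `unique_examples` is a hypothetical helper; the sample strings are copied from the Golkar search output above.

```python
from typing import Iterable, List

def unique_examples(contents: Iterable[str], n: int) -> List[str]:
    """Keep the first n distinct page contents, preserving similarity order."""
    if n <= 0:
        return []
    seen = set()
    unique: List[str] = []
    for content in contents:
        if content not in seen:
            seen.add(content)
            unique.append(content)
            if len(unique) == n:
                break
    return unique

# possible use in the notebook (not executed here):
# docs = db.similarity_search(query, k=4)
# example = "\n".join(unique_examples((d.page_content for d in docs), 2))

retrieved = [
    "input: Airlangga soal Pergantian Ketum Golkar: Silakan Kalau Minat di 2024\noutput: Neutral",
    "input: Airlangga soal Pergantian Ketum Golkar: Silakan Kalau Minat di 2024\noutput: Neutral",
    "input: Golkar-PAN Merapat ke KKIR, PKS: Bravo Pak Jokowi, Eh Bravo Pak Prabowo\noutput: Neutral",
]
print(len(unique_examples(retrieved, 2)))  # 2
```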


In [None]:
#compare oneshot (no example) with fewshot (with examples)
query = 'menu resto lumayan mahal, makanan so so. tempanya aja luas enak buat ngumpul '
#return examples from similarity search
print('example:')
example=return_vectorsimilary(query)

#show sentiment prediction
print('\n Input query:',query)
print('sentiment output with example:', label_sentiment_fewshot(example, query))

#sentiment with oneshot, no example
print('sentiment output without example:',label_sentiment_oneshot(query))


example:
input: tempat ini menjual sate ayam  tapi harga nya gila  lebih baik makanan di restoran lain yang lebih nyaman pakai ac bisa dapat harga lebih murah  tempat ini cuma makan di bawah tenda dan kursi plastik tidak nyaman saja harga kayak restoran bintang 5  tempat ini cocok dibilang makanan kaki lima tapi harga bintang lima
output: Negative
input: dibandingkan dengan restoran yang ada di sekitar nya  restoran ini ratarata saja  makanan nya juga soso dengan harga yang memang relatif lebih murah dibanding yang lain  lokasi cukup strategis  tapi interior restoran juga standar  pelayanan cukup cepat  jadi inti nya restoran ini tidak spesial
output: Negative

 Input query: menu resto lumayan mahal, makanan so so. tempanya aja luas enak buat ngumpul 
sentiment output with example: Negative
sentiment output without example: Negative
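On this query both methods agree, and a few individual queries cannot show which method is more accurate. A fairer comparison runs both labelling functions over a set of labelled titles and compares accuracy. Those titles should not also be stored in the vector DB, otherwise the few-shot method may retrieve the answer itself. A sketch of the scoring step; `accuracy` is hypothetical and the label lists below are placeholder values for illustration, not measured results from this notebook.

```python
from typing import Sequence

def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions matching the reference labels (case- and whitespace-insensitive)."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and of the same length")
    correct = sum(p.strip().lower() == g.strip().lower() for p, g in zip(predictions, gold))
    return correct / len(gold)

# placeholder values for illustration only, not results from this notebook
gold = ["Negative", "Positive", "Neutral", "Negative"]
zero_shot = ["Negative", "Positive", "Positive", "Negative"]
few_shot = ["Negative", "Positive", "Neutral", "Negative"]
print(accuracy(zero_shot, gold), accuracy(few_shot, gold))  # 0.75 1.0
```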


In [None]:
# delete the whole collection from the vector DB, if required
# clean up
db.delete_collection()