In [13]:
# imports 
%load_ext autoreload
%autoreload 2

import google.cloud.aiplatform as aiplatform
import kfp
from kfp.v2 import compiler
from kfp.v2 import dsl
import os 
import datetime



### Set GCP Credentials 

In [14]:
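# Point the Google client libraries at a service-account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "your_gcp_credentials.json"
pid = 'your_gcp_project'
# Initialize the Vertex AI SDK for this project
aiplatform.init(project=pid)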

## Step 1. Define Arguments for the Pipeline

Note: the input data should already be stored in a folder in a Google Cloud Storage (GCS) bucket.


The pipeline component takes the following arguments:

>> gcs_main_bucket (str): The main Google Cloud Storage bucket.

>> gcs_folder (str): The specific folder in the GCS bucket.

>> project_id (str): The Google Cloud Project ID.

>> api_type (str): Name of the api type to use.

>> labels (list): List of labels to use.

>> llm_query (str): Prompt for the LLM.

>> file_suffix (str): Suffix to add to files.

>> rate_limit (int): Rate limit for API calls, in requests per second. For example, a quota of 1200 requests/minute gives 1200 req/min × 1 min/60 s = 20 requests/second.
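
The quota arithmetic for `rate_limit` can be sketched as a small helper. This is only an illustration: the function name `per_second_rate` is hypothetical and not part of the pipeline, and rounding down is an assumed choice that keeps the result at or below the quota.

```python
def per_second_rate(quota_per_minute: int) -> int:
    """Convert a requests-per-minute quota into a whole requests-per-second limit."""
    if quota_per_minute < 60:
        # Below 60 requests/minute the per-second rate would round down to 0,
        # so this sketch refuses rather than silently exceeding the quota.
        raise ValueError("quota_per_minute must be at least 60")
    # Integer division rounds down, so the limit never exceeds the quota.
    return quota_per_minute // 60


rate_limit = per_second_rate(1200)
print(rate_limit)  # 20
```

Rounding down leaves a little headroom when the quota is not a multiple of 60, for example 1250 requests/minute also yields 20 requests/second.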




In [29]:
# Arguments 
llm_query = """ Label the sentiment of the social posts by following the sentiment guidelines below.

Negative Guidelines:

1. **Focus on Sentiment Words:** Look for words that express a negative opinion or dissatisfaction, such as "overrated," "not worth it," "wish was different," "annoying," or "too predictable."

2. **Check for Comparisons:** Pay attention to comparisons that place the subject in a negative light compared to something else. For example, "Pikachu's popularity has overshadowed other great Pokémon" implies that Pikachu doesn't deserve its popularity compared to other Pokémon.

3. **Look for Wishes or Desires for Change:** If the post expresses a wish or desire for something to be different, it may indicate dissatisfaction with the current state. For example, "I wish Ash had a different starter Pokémon" shows a desire for change and potential negativity towards the current situation.

4. **Consider Hashtags:** Hashtags can often provide context about the post's sentiment. For example, #Boycott is a clear indication of a negative sentiment.

5. **Pay Attention to Emoji Usage:** Emojis can often convey the tone of the post. A frustrated or angry emoji can indicate a negative sentiment.

6. **Take into Account the Overall Tone:** Look at the post as a whole and consider the overall tone. If the post seems to be expressing dissatisfaction, disappointment, or frustration, it is likely negative.

7. **Note any Call to Action:** If the post includes a call to action that is based on a negative sentiment, such as boycotting, it is likely negative.

Positive Guidelines:

1. **Look for Positive Words or Phrases:** Focus on words or phrases that express a positive opinion, satisfaction, or praise, such as "love," "pleasure," "cute," "well-made," "charming," "can't get enough of," or "exciting things are coming."

2. **Check for Positive Experiences or Memories:** Pay attention to positive experiences or fond memories mentioned in the post. For example, "I've had a Pikachu plushie since I was a kid" or "I've been a fan of Pikachu since I was a kid" show a positive emotional connection.

3. **Consider Hashtags:** Hashtags can often provide context about the post's sentiment. For example, #Pikachu combined with positive words or phrases is a clear indication of a positive sentiment.

4. **Pay Attention to Emoji Usage:** Emojis can often convey the tone of the post. Positive emojis such as hearts or smiley faces can indicate a positive sentiment.

5. **Take into Account the Overall Tone:** Look at the post as a whole and consider the overall tone. If the post seems to be expressing happiness, satisfaction, or excitement, it is likely positive.

6. **Notice Expressions of Pride or Accomplishment:** If the post expresses pride or accomplishment, such as winning a battle or having a productive meeting, it is likely positive.

7. **Check for Expressions of Affection or Nostalgia:** Posts that express affection for or nostalgia about the subject, such as mentioning a cherished childhood toy or a special place in one's heart, are likely positive.

Neutral Guidelines:

1. **Focus on Factual Statements:** Look for statements that are factual or informative, without expressing a clear positive or negative opinion. For example, "Pikachu's height is 0.4 meters" or "Pikachu can use the move Quick Attack."

2. **Check for Lack of Emotional Language:** Neutral posts often lack emotional language or sentiments, such as "love," "hate," "exciting," or "disappointing."

3. **Consider Hashtags:** Hashtags can provide context about the post's content, but in neutral posts, they often relate to the topic without adding sentiment. For example, #Pikachu combined with a factual statement is likely neutral.

4. **Pay Attention to Emoji Usage:** Emojis can often convey the tone of the post. Neutral posts may use emojis that are related to the content without expressing a strong positive or negative emotion, such as a lightning bolt emoji for Pikachu's tail.

5. **Look for Descriptions or Explanations:** Neutral posts may include descriptions or explanations that add context to the topic without expressing a clear sentiment.

6. **Check for Research or Study References:** Posts that reference research, reports, case studies, or other forms of investigation are likely to be neutral, as they often focus on factual information.

7. **Consider Conversational Posts:** Conversations that simply share information or preferences, without expressing a strong sentiment, are likely neutral. For example, "Had a conversation with a friend about our favorite Pokémon."



"""
pipeline_parameters = {'gcs_main_bucket': 'mm-gpe-data',
                       'gcs_folder': 'pikachu_llm_sentiment_labeling',
                       'project_id': pid,
                       'api_type': 'vertex-api',
                       'labels': ['negative', 'neutral', 'positive'],
                       'llm_query': llm_query,
                       'file_suffix': datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"),
                       'rate_limit': 20}
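
Before compiling, it can help to check that the parameter dictionary matches the argument list from Step 1. The following is a minimal sketch, not part of the original pipeline: `EXPECTED_TYPES`, `check_parameters`, and the placeholder values in `example_parameters` are all hypothetical names introduced for illustration.

```python
# Expected argument names and Python types, taken from the argument list in Step 1.
EXPECTED_TYPES = {
    "gcs_main_bucket": str,
    "gcs_folder": str,
    "project_id": str,
    "api_type": str,
    "labels": list,
    "llm_query": str,
    "file_suffix": str,
    "rate_limit": int,
}


def check_parameters(params: dict) -> list:
    """Return a list of problems found in params; an empty list means it looks fine."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in params:
            problems.append(f"missing parameter: {name}")
        elif not isinstance(params[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    for name in params:
        if name not in EXPECTED_TYPES:
            problems.append(f"unexpected parameter: {name}")
    return problems


# Placeholder values for illustration only.
example_parameters = {
    "gcs_main_bucket": "example-bucket",
    "gcs_folder": "example_folder",
    "project_id": "example-project",
    "api_type": "vertex-api",
    "labels": ["negative", "neutral", "positive"],
    "llm_query": "Label the sentiment of the post.",
    "file_suffix": "2024-01-01-00-00-00",
    "rate_limit": 20,
}
print(check_parameters(example_parameters))  # []
```

In the notebook, the same check could be run as `check_parameters(pipeline_parameters)` before moving on to Step 2.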



## Step 2. Build out Kubeflow Pipeline 

In [22]:
@dsl.pipeline(name='llm-pikachu-labeling')
def pikachu_llm_pipeline(gcs_main_bucket: str, gcs_folder: str, project_id: str, api_type: str,
                         labels: list, llm_query: str, file_suffix: str, rate_limit: int):
    
    # LLM Labeling Component 
    llm_component = kfp.components.load_component_from_text('''
    name: llm_component
    description: Labels text data with an LLM
    inputs:
    - {name: gcs_main_bucket, type: String}
    - {name: gcs_folder, type: String}
    - {name: project_id, type: String}
    - {name: api_type, type: String}
    - {name: labels, type: JsonArray}
    - {name: llm_query, type: String}
    - {name: file_suffix, type: String}
    - {name: rate_limit, type: Integer}
   

     
    implementation:
      container:
        image: gcr.io/<your-project-id>/<your-pipeline-name>/llm_component:latest
        args: [
        --gcs-main-bucket, {inputValue: gcs_main_bucket},
        --gcs-folder, {inputValue: gcs_folder},
        --project-id, {inputValue: project_id},
        --api-type, {inputValue: api_type},
        --labels, {inputValue: labels},
        --llm-query, {inputValue: llm_query},
        --file-suffix, {inputValue: file_suffix},
        --rate-limit, {inputValue: rate_limit},

       
        
        ]
       

    ''')

    # Request 8 vCPUs and 30 GB of memory; the memory limit needs a unit suffix such as 'G'
    llm_step = llm_component(gcs_main_bucket, gcs_folder, project_id, api_type, labels, llm_query, file_suffix, rate_limit).set_cpu_limit('8').set_memory_limit('30G')
    

    

# Compile the pipeline function into a JSON spec that Vertex AI Pipelines can run
compiler.Compiler().compile(pipeline_func=pikachu_llm_pipeline, package_path='pipeline_spec.json', type_check=True)
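
The component spec above passes each input to the container as a command-line flag. The container code itself is not shown in this notebook, so the following is only a hypothetical sketch of how an entrypoint might read those flags with `argparse`. It assumes that the `labels` list arrives as a JSON-encoded string; the function name `parse_args` and the example values are illustrative.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the flags listed under `args` in the component spec (hypothetical entrypoint)."""
    parser = argparse.ArgumentParser(description="Sketch of an llm_component entrypoint")
    parser.add_argument("--gcs-main-bucket", required=True)
    parser.add_argument("--gcs-folder", required=True)
    parser.add_argument("--project-id", required=True)
    parser.add_argument("--api-type", required=True)
    # Assumption: the JsonArray input is passed as a JSON string, e.g. '["a", "b"]'.
    parser.add_argument("--labels", type=json.loads, required=True)
    parser.add_argument("--llm-query", required=True)
    parser.add_argument("--file-suffix", required=True)
    parser.add_argument("--rate-limit", type=int, required=True)
    return parser.parse_args(argv)


# Example invocation with placeholder values.
args = parse_args([
    "--gcs-main-bucket", "example-bucket",
    "--gcs-folder", "example_folder",
    "--project-id", "example-project",
    "--api-type", "vertex-api",
    "--labels", '["negative", "neutral", "positive"]',
    "--llm-query", "Label the sentiment of the post.",
    "--file-suffix", "2024-01-01-00-00-00",
    "--rate-limit", "20",
])
print(args.labels, args.rate_limit)
```

`argparse` converts the dashes in each flag to underscores, so `--gcs-main-bucket` becomes `args.gcs_main_bucket`, matching the input names in the spec.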


## Step 3. Run Pipeline 

In [23]:
DISPLAY_NAME = f"{datetime.datetime.now().strftime('%Y%m%d')}_pikachu_llm_sentiment_run"

PIPELINE_ROOT = f"gs://{pipeline_parameters['gcs_main_bucket']}/{pipeline_parameters['gcs_folder']}/pipeline_root"



In [1]:
# Run the Pipeline 
job = aiplatform.PipelineJob(
    display_name=DISPLAY_NAME,
    template_path='pipeline_spec.json',
    pipeline_root=PIPELINE_ROOT,
    parameter_values=pipeline_parameters,
    labels={'dss_demo': 'pikachu-sentiment'},
)

job.submit()