# IBM Build Lab | Watson Brothers

> **Participant**: `YOUR_NAME`
> 
> **Delivery Date**: `dd/mm/yy`

> **Note:** this notebook was given to you as a reference to help you order your work during the challenge. You are free to edit it, add/modify/remove as many cells as you like.

## 0. Goal

Engineer a prompt using one of the [supported foundation models available with watsonx.ai](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-models.html?context=wx) to classify the [movie reviews](assets/data/movie-reviews-train.csv) given to you, so that for any new review the system responds with `Positive`, `Negative`, or `Unknown` when the sentiment cannot be determined.

Evaluate its results (based on the [test dataset](assets/data/movie-reviews-test.csv)) on the following metrics:
- Accuracy
- Precision & Recall
- F1-Score

To do this, you have been given train and test datasets in `csv` format containing the columns `row`, `text` and `label`, where `label` is `1` for Positive reviews and `0` for Negative ones.

### 0.1. Relevant considerations

- You are expected to write clear code and to comment on your decisions.
- You are expected to follow programming best practices. Use environment variables and do not share credentials with others.
- Your analysis and writing will be evaluated as well as your programming skills. Provide observations, graphics, images and any other resource you wish in order to help readers understand your process and decisions.
- Provide a conclusion that summarizes what the challenge was, how you solved it, and what your results are. Support your arguments with data. 

## 1. Environment setup

See the [`requirements.txt`](requirements.txt) file for more information on dependencies.

In [11]:
import pandas as pd

from os import getenv

### 1.1. Environment variables

In [52]:
# Load environment variables using dotenv
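
As a minimal sketch, the cell below shows one way to fail early when configuration is missing. It assumes the variable names used in the credentials cell of this notebook (`WATSONX_API_KEY`, `WX_PROJECT_ID`) and that a `.env` file, if you use one, has already been loaded (for example with `load_dotenv()` from the `python-dotenv` package). The helper name `missing_env_vars` is hypothetical.

```python
import os

# Names taken from the credentials cell of this notebook.
REQUIRED_VARS = ("WATSONX_API_KEY", "WX_PROJECT_ID")


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Report which variables still need to be defined, without printing any values.
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Reporting only the names keeps secret values out of the notebook output.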

## 2. Load Data

Load data from the CSV files. You can use the `train` examples to create your prompt and/or tune your model. The `test` examples are meant to be used for validation.

In [2]:
train_filename = "../assets/data/movie-reviews-train.csv"
test_filename = "../assets/data/movie-reviews-test.csv"

In [6]:
df_train = pd.read_csv(train_filename)

df_train.head()

Unnamed: 0,row,text,label
0,0,I rented I AM CURIOUS-YELLOW from my video sto...,0
1,1,"""I Am Curious: Yellow"" is a risible and preten...",0
2,2,If only to avoid making this type of film in t...,0
3,3,This film was probably inspired by Godard's Ma...,0
4,4,"Oh, brother...after hearing about this ridicul...",0


In [5]:
df_test = pd.read_csv(test_filename)

df_test.head()

Unnamed: 0,row,text,label
0,0,I love sci-fi and am willing to put up with a ...,0
1,1,"Worth the entertainment value of a rental, esp...",0
2,2,its a totally average film with a few semi-alr...,0
3,3,STAR RATING: ***** Saturday Night **** Friday ...,0
4,4,"First off let me say, If you haven't enjoyed a...",0


## 3. Data Exploration & Manipulation

In [51]:
# Add your explorations, changes and modifications here

## 4. Prompt Engineering & LLM Invoke

Now we will define the LLM to invoke and the prompt to send to it.

In [22]:
from ibm_watsonx_ai.foundation_models import Model
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes, DecodingMethods

from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

### 4.1. WatsonX Setup

Define watsonx credentials from environment variables:

In [None]:
wx_creds = {
    "apikey": getenv("WATSONX_API_KEY"),
    "url": getenv("WX_URL", 'https://us-south.ml.cloud.ibm.com'),
    "project_id" : getenv("WX_PROJECT_ID")
}

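# Do not print the API key itself: show only whether each credential is set
print({key: bool(value) for key, value in wx_creds.items()})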

In [25]:
model_id = "MODEL_ID"
model_params = "MODEL PARAMS"

In [26]:
# Instantiate a model proxy object to send requests
wx_llm = Model(
        model_id=model_id,
        params=model_params,
        credentials=wx_creds,
        project_id=wx_creds["project_id"],
    )

> **Note:** it is highly recommended you experiment with different parameters. Visit the [SDK documentation](https://ibm.github.io/watsonx-ai-python-sdk/) for more information.

### 4.2. Prompt Generation

In [27]:
prompt = "YOUR PROMPT GOES HERE"
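
One possible starting point is a few-shot prompt that shows the model a handful of labeled reviews before the review to classify. The sketch below is an assumption, not a required approach: the helper name `build_prompt`, the instruction wording and the two example reviews are all illustrative.

```python
def build_prompt(review, examples):
    """Build a few-shot classification prompt.

    `examples` is a list of (review_text, label) pairs, where label is one of
    "Positive", "Negative" or "Unknown".
    """
    lines = [
        "Classify the sentiment of the movie review as Positive, Negative or Unknown.",
        "Answer with a single word.",
        "",
    ]
    for text, label in examples:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    # End with an open "Sentiment:" slot for the model to complete.
    lines += [f"Review: {review}", "Sentiment:"]
    return "\n".join(lines)


# Illustrative examples only; in practice, draw them from the train dataset.
examples = [
    ("A moving story with superb acting.", "Positive"),
    ("Two hours of my life I will never get back.", "Negative"),
]
prompt = build_prompt("The plot was dull and the jokes fell flat.", examples)
print(prompt)
```

Keeping the prompt construction in a function makes it easy to compare variants (zero-shot, more examples, different instructions) on the same test reviews.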

### 4.3. LLM Invoke

Here you will invoke the watsonx LLM to classify the movie reviews.

In [None]:
result = wx_llm.generate_text(
    prompt=prompt
)

print(result)
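
The generated text is free-form, so it usually needs to be normalized before evaluation. The sketch below shows one way to do that; the helper name `parse_label`, the prefix-matching rule and the `-1` encoding for `Unknown` are assumptions for illustration.

```python
def parse_label(raw_response):
    """Map free-form model output to "Positive", "Negative" or "Unknown"."""
    text = raw_response.strip().lower()
    if text.startswith("positive"):
        return "Positive"
    if text.startswith("negative"):
        return "Negative"
    # Anything else (including an explicit "Unknown") is treated as undetermined.
    return "Unknown"


# Assumed encoding: the dataset uses 1/0; -1 is a hypothetical choice for Unknown.
LABEL_TO_INT = {"Positive": 1, "Negative": 0, "Unknown": -1}

print(parse_label("  Positive\n"))             # Positive
print(LABEL_TO_INT[parse_label("negative.")])  # 0
print(parse_label("I cannot tell"))            # Unknown
```

Note that once a third value such as `-1` appears in the predictions, scikit-learn's `precision_score`, `recall_score` and `f1_score` no longer treat the problem as binary and need an explicit `average` argument, so decide up front how `Unknown` answers should count.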

## 5. Prompt Evaluation

In [32]:
from sklearn import metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import classification_report

In [39]:
# Load ground truth and predicted values from the classifier output and store them as variables
ground_truth = [0, 0, 1, 0, 1] # EXAMPLE VALUES

predictions = [0, 0, 0, 1, 1] # EXAMPLE VALUES

Create the confusion matrix:

In [40]:
confusion_matrix = metrics.confusion_matrix(ground_truth, predictions)

print(confusion_matrix)

[[2 1]
 [1 1]]


In [41]:
# define normalized confusion matrix
confusion_matrix_norm = metrics.confusion_matrix(ground_truth, predictions, normalize="all")

print(confusion_matrix_norm)

[[0.4 0.2]
 [0.2 0.2]]


### 5.1. Accuracy

Accuracy is the proportion of all samples that the model classifies correctly. It is one of the most basic evaluation metrics, but not the most informative: on an imbalanced dataset, a model can score high accuracy by always predicting the majority class. Use other evaluation metrics alongside it for a fuller picture of a classifier's performance.
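
To make the limitation concrete, the sketch below computes accuracy by hand on the example values from this notebook and compares it with a trivial baseline that always predicts `0`. The variable names are illustrative.

```python
ground_truth = [0, 0, 1, 0, 1]  # example values from this notebook
predictions = [0, 0, 0, 1, 1]   # example values from this notebook

# Accuracy = correct predictions / total predictions
correct = sum(t == p for t, p in zip(ground_truth, predictions))
accuracy_manual = correct / len(ground_truth)
print("Accuracy:", accuracy_manual)  # Accuracy: 0.6

# A baseline that always predicts the majority class (0) never finds a positive review
baseline = [0] * len(ground_truth)
baseline_correct = sum(t == p for t, p in zip(ground_truth, baseline))
baseline_accuracy = baseline_correct / len(ground_truth)
print("Always-negative baseline accuracy:", baseline_accuracy)  # 0.6
```

Both reach 0.6 here, even though the baseline never identifies a positive review, which is why accuracy alone can be misleading.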

Compute accuracy using `sklearn.metrics`:

In [42]:
accuracy = accuracy_score(ground_truth, predictions)

print("Accuracy:", accuracy)

Accuracy: 0.6


```
TODO: PROVIDE AN OBSERVATION FOR THIS RESULT
```

### 5.2. Precision & Recall

**Precision (PPV)** is the proportion of samples predicted as positive that actually belong to the positive class.

**Recall (TPR)** is the proportion of actual positive samples that the model correctly classifies as positive.
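
Both definitions can be checked by hand from the counts of true positives (TP), false positives (FP) and false negatives (FN). The sketch below does so on the example values from this notebook; the variable names are illustrative, and it assumes at least one predicted and one actual positive so the denominators are non-zero.

```python
ground_truth = [0, 0, 1, 0, 1]  # example values from this notebook
predictions = [0, 0, 0, 1, 1]   # example values from this notebook

pairs = list(zip(ground_truth, predictions))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted positive, truly positive
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted positive, truly negative
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted negative, truly positive

precision_manual = tp / (tp + fp)  # share of predicted positives that are correct
recall_manual = tp / (tp + fn)     # share of actual positives that were found

print(tp, fp, fn)                       # 1 1 1
print(precision_manual, recall_manual)  # 0.5 0.5
```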

Compute precision using `sklearn.metrics`:

In [43]:
precision = precision_score(ground_truth, predictions)

print("Precision:", precision)

Precision: 0.5


Compute recall using `sklearn.metrics`:

In [44]:
recall = recall_score(ground_truth, predictions)

print("Recall:", recall)

Recall: 0.5


```
TODO: PROVIDE AN OBSERVATION FOR THESE RESULTS
```

### 5.3. F1-Score

The **F1 score** is the harmonic mean of precision and recall. It is high only when both precision and recall are high, which makes it a useful single summary when the two differ.
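
As a small worked example, the harmonic mean can be written directly as `2 * P * R / (P + R)`. The helper name `f1_from` below is hypothetical, and returning `0.0` when both inputs are zero is a convention chosen for this sketch.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1_from(0.5, 0.5))            # 0.5
print(round(f1_from(1.0, 0.1), 3))  # 0.182
```

The second call shows how the harmonic mean stays close to the lower of the two values: perfect precision does not compensate for poor recall.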

Compute F1 Score using `sklearn.metrics`:

In [45]:
F1 = f1_score(ground_truth, predictions)

print("F1 Score:", F1)


F1 Score: 0.5


```
TODO: PROVIDE AN OBSERVATION FOR THIS RESULT
```

## 6. Conclusions

### 6.1. Classification Report

Final report using `sklearn.metrics.classification_report`:

In [47]:
report = classification_report(ground_truth, predictions)

print(report)


              precision    recall  f1-score   support

           0       0.67      0.67      0.67         3
           1       0.50      0.50      0.50         2

    accuracy                           0.60         5
   macro avg       0.58      0.58      0.58         5
weighted avg       0.60      0.60      0.60         5



```

TODO: PROVIDE CONCLUSIONS AND COMMENTS.

```

### 6.2. Lessons Learned

```
TODO: PROVIDE A BRIEF CONCLUSION OF YOUR EXPERIENCE DURING THIS CHALLENGE.
```
