<h1 align="center"><b>Financial Sentiment Prediction</b></h1>

## **Import Libraries**

In [1]:
# Imports for loading the model, labels, and text vectorizer
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model
import os
import pandas as pd
import json
from tensorflow.keras.layers import TextVectorization

## **Modelling**

In [2]:
EXP = "impr2"
MODEL = f"model_{EXP}.keras"
VOCAB = f"vocab_{EXP}.txt"
CFG   = f"tv_cfg_{EXP}.json"
LABEL = f"class_names_{EXP}.npy" 

- Defines the file names used to load the model, vocabulary, vectorizer configuration, and labels.
- For consistency, only EXP (the experiment name) needs to change; every related file name follows from it.
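
The naming convention above can also be wrapped in a small function, which makes it harder for one of the four names to drift from the others. This is only a sketch: `artifact_paths` and its dictionary keys are hypothetical names, not part of the notebook.

```python
def artifact_paths(exp: str) -> dict:
    """Build the artifact file names for one experiment tag (hypothetical helper)."""
    return {
        "model": f"model_{exp}.keras",
        "vocab": f"vocab_{exp}.txt",
        "cfg": f"tv_cfg_{exp}.json",
        "label": f"class_names_{exp}.npy",
    }


paths = artifact_paths("impr2")
print(paths["model"])  # model_impr2.keras
```

Switching experiments then means changing a single argument, which is the same idea as editing EXP in the cell above.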

In [3]:
model = load_model(MODEL)
idx2label = np.load(LABEL, allow_pickle=True).tolist() \
    if os.path.exists(LABEL) else ['negative','neutral','positive']

- load_model(MODEL) → loads the saved .keras model.
- np.load(LABEL...) → loads the label file (class_names_impr2.npy) so that the class order is known (for example 0=negative, 1=neutral, 2=positive).
- If the label file does not exist → falls back to the default list ['negative','neutral','positive'].
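
Because the fallback list silently assumes three classes in a fixed order, a quick sanity check can catch a mismatch before predictions are decoded. The sketch below is an assumption-laden illustration: `check_labels` is a hypothetical helper, and in the notebook `n_outputs` would presumably come from the model's final output dimension.

```python
def check_labels(labels, n_outputs):
    """Raise if the label list cannot index every model output (hypothetical helper)."""
    labels = [str(x) for x in labels]
    if len(labels) != n_outputs:
        raise ValueError(f"{len(labels)} labels for {n_outputs} model outputs")
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate class names")
    return labels


idx2label = check_labels(["negative", "neutral", "positive"], n_outputs=3)
print(idx2label)
```

The check only confirms the number of labels. It cannot confirm that their order matches the order used during training, which is why loading the saved label file is preferable to relying on the default.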

In [4]:
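# True when the model's first input takes raw strings, i.e. text preprocessing is built into the model
e2e = (model.inputs[0].dtype == tf.string)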
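
The e2e flag is computed here but not used later in the notebook. One way it could steer input preparation is sketched below, with stand-ins so the snippet runs without TensorFlow: `prepare_inputs` and `fake_vectorize` are hypothetical, and in the notebook the `vectorize` argument would be the rebuilt TextVectorization layer.

```python
def prepare_inputs(texts, e2e, vectorize):
    """Return raw strings for an end-to-end model, token ids otherwise."""
    if e2e:
        return list(texts)   # the model applies its own text preprocessing
    return vectorize(texts)  # an external vectorizer supplies the token ids


def fake_vectorize(batch):
    # Stand-in for illustration only: maps each text to its word count.
    return [[len(t.split())] for t in batch]


print(prepare_inputs(["profit rose"], e2e=True, vectorize=fake_vectorize))   # ['profit rose']
print(prepare_inputs(["profit rose"], e2e=False, vectorize=fake_vectorize))  # [[2]]
```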

In [None]:
s = pd.read_csv(
    VOCAB,
    header=None,
    names=["tok"],
    dtype=str,           
    keep_default_na=False, 
    na_filter=False        
)

vocab = s["tok"].astype(str).str.strip()
vocab = vocab[vocab != ""].tolist()

oov_set = {"[unk]", "<unk>", "unk"}
if vocab and vocab[0].lower() in oov_set:
    vocab = vocab[1:]

# Use a context manager so the config file handle is closed after reading
with open(CFG, "r") as f:
    cfg = json.load(f)

text_vectorization = TextVectorization(
    max_tokens=None,                             
    standardize=cfg["standardize"],
    split=cfg["split"],
    output_mode=cfg["output_mode"],
    output_sequence_length=cfg["max_len"],
)
text_vectorization.set_vocabulary(vocab)

print("Vectorizer OK | vocab_size:", len(vocab), "| max_len:", cfg["max_len"])

Vectorizer OK | vocab_size: 9548 | max_len: 48


- Reads the vocabulary file (vocab_impr2.txt) produced during training; each word/token is stored in the "tok" column.
- Cleans the vocabulary (strips surrounding whitespace, drops empty tokens), then converts it to a Python list.
- Checks whether the first token is an OOV token ([unk], <unk>, or unk) and, if so, removes it.
- Reads the preprocessing configuration (tv_cfg_impr2.json), which holds parameters such as max_len, standardize, and split.
- Rebuilds the TextVectorization layer with exactly the same configuration and vocabulary as in training, so that inference is consistent with the trained model.
- Confirms that the vectorizer was built successfully and prints the vocabulary size and the maximum sequence length.
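
The cleaning steps above can be sketched without pandas. Reading the file line by line also sidesteps CSV parsing, which could mis-split a token that contains a comma or a quote character. `clean_vocab` and `OOV_TOKENS` are hypothetical names, and the sample list is made up for illustration.

```python
OOV_TOKENS = {"[unk]", "<unk>", "unk"}


def clean_vocab(lines):
    """Strip whitespace, drop empty entries, and drop a leading OOV token."""
    vocab = [line.strip() for line in lines]
    vocab = [tok for tok in vocab if tok]
    if vocab and vocab[0].lower() in OOV_TOKENS:
        vocab = vocab[1:]
    return vocab


# Made-up sample lines, including an empty entry and a token with a comma.
sample = ["", "[UNK]", "the", " profit ", "sales,net"]
print(clean_vocab(sample))  # ['the', 'profit', 'sales,net']
```

With the real file, the function could be fed an open file handle, since iterating over a text file yields its lines.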

In [6]:
EXP = "impr2"
model = load_model(f"model_{EXP}.keras")
print("Model loaded!")

Model loaded!


- Sets the experiment name (impr2).
- Loads the previously saved model (model_impr2.keras).
- Prints a message confirming that the model loaded successfully.

In [7]:
texts = [
    "Revenue jumped and management raised full-year guidance.",
    "Profit declined sharply and the outlook remains weak.",
    "Shares were little changed following the announcement.",
    "Customer demand is strong and margins improved this quarter.",
    "The company cut its forecast due to softer sales.",
    "Analysts expect stable performance with limited upside.",
    "New product launch exceeded expectations across regions.",
    "Operational issues led to delays and higher costs."
]

In [8]:
# vectorize -> predict
X = text_vectorization(np.array(texts)).numpy()   # the TextVectorization layer was rebuilt above
probs = model.predict(X, verbose=0)
preds = probs.argmax(1)

for t, p, pr in zip(texts, preds, probs):
    print("text :", t)
    print("pred :", idx2label[int(p)], "| probs:", np.round(pr, 4))
    print("-"*60)

text : Revenue jumped and management raised full-year guidance.
pred : positive | probs: [0.0836 0.0327 0.8836]
------------------------------------------------------------
text : Profit declined sharply and the outlook remains weak.
pred : neutral | probs: [0.2346 0.6622 0.1032]
------------------------------------------------------------
text : Shares were little changed following the announcement.
pred : neutral | probs: [0.3554 0.4494 0.1953]
------------------------------------------------------------
text : Customer demand is strong and margins improved this quarter.
pred : positive | probs: [0.0395 0.0145 0.9461]
------------------------------------------------------------
text : The company cut its forecast due to softer sales.
pred : positive | probs: [0.2454 0.1903 0.5643]
------------------------------------------------------------
text : Analysts expect stable performance with limited upside.
pred : positive | probs: [0.2456 0.2936 0.4608]
----------------------------------

**Inference Conclusions**
1. The model is strong at detecting positive sentiment (for example "revenue jumped" and "customer demand is strong"), with consistently high probabilities (0.88 and 0.95 in the outputs above).
2. However, it makes many errors on negative sentences, which are often predicted as positive or neutral: "Profit declined sharply..." is labeled neutral (0.66) and "The company cut its forecast..." is labeled positive (0.56).
3. On neutral sentences, the model is sometimes unsure when mild positive and negative words are mixed, so its predictions are less stable (for example, "Shares were little changed..." is labeled neutral with a probability of only 0.45).
4. Overall, this inference run suggests the model is still biased toward the positive class. It needs dataset improvements (balancing the negative and neutral classes) or additional linguistic features to capture negative nuance better.
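
The bias described in point 4 can be made visible by tallying the predicted labels and measuring how far the top class is ahead of the runner-up. The sketch below uses only the six probability rows printed above and assumes the class order negative, neutral, positive; `top_label_and_margin` is a hypothetical helper.

```python
from collections import Counter

labels = ["negative", "neutral", "positive"]

# Probability rows copied from the six predictions printed above.
probs = [
    [0.0836, 0.0327, 0.8836],
    [0.2346, 0.6622, 0.1032],
    [0.3554, 0.4494, 0.1953],
    [0.0395, 0.0145, 0.9461],
    [0.2454, 0.1903, 0.5643],
    [0.2456, 0.2936, 0.4608],
]


def top_label_and_margin(row):
    """Return the argmax label and the gap between the two largest probabilities."""
    ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    return labels[ranked[0]], row[ranked[0]] - row[ranked[1]]


decoded = [top_label_and_margin(r) for r in probs]
counts = Counter(label for label, _ in decoded)

print(dict(counts))
for label, margin in decoded:
    print(f"{label:8s} margin={margin:.3f}")
```

Of the six visible predictions, four are positive, two are neutral, and none is negative. The smallest margin belongs to the third sentence, which matches the instability noted in point 3. Six samples are far too few for a real evaluation, so a labeled validation set would be needed to confirm the bias.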