# Step 1 — Import Libraries
We import the libraries needed for our sentiment analyzer:
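- pandas → store and inspect the data as a table
- scikit-learn → provides the train/test split, vectorizer, model, and metric imported below
- TfidfVectorizer → convert text to numbers
- LogisticRegression → a simple linear classification model
- accuracy_score → measure how often the model's predictions are correct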

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 2 — Create Dataset
We create a small dataset of 40 short sentences.
Each sentence is paired with a sentiment label: positive, negative, or neutral.

In [3]:
data = {
    'text': [
        "I love this product",
        "This is amazing",
        "I hate this so much",
        "This is terrible",
        "I feel happy today",
        "I am very sad",
        "This is the worst day",
        "I am feeling great",
        "I like this movie",
        "I dislike this book",
        "This app is okay",
        "The weather is fine",
        "I enjoy spending time with friends",
        "I am bored",
        "I am excited for the trip",
        "I feel tired",
        "This restaurant is fantastic",
        "I am frustrated with work",
        "The service was average",
        "I feel relaxed after yoga",
        "The concert was amazing",
        "I don't like this song",
        "My day is fine",
        "I am feeling miserable",
        "The food is delicious",
        "I am unhappy with the results",
        "The presentation was okay",
        "I love my new phone",
        "I hate waiting in line",
        "The movie was okay",
        "I feel very joyful today",
        "The internet connection is bad",
        "I am neutral about this event",
        "This cake is wonderful",
        "I feel stressed",
        "The weather is okay",
        "I am thrilled with my score",
        "I dislike the traffic",
        "I feel calm",
        "I am extremely happy"
    ],
    'sentiment': [
        "positive",
        "positive",
        "negative",
        "negative",
        "positive",
        "negative",
        "negative",
        "positive",
        "positive",
        "negative",
        "neutral",
        "neutral",
        "positive",
        "negative",
        "positive",
        "negative",
        "positive",
        "negative",
        "neutral",
        "positive",
        "positive",
        "negative",
        "neutral",
        "negative",
        "positive",
        "negative",
        "neutral",
        "positive",
        "negative",
        "neutral",
        "positive",
        "negative",
        "neutral",
        "positive",
        "negative",
        "neutral",
        "positive",
        "negative",
        "neutral",
        "positive"
    ]
}

df = pd.DataFrame(data)
df

|   | text | sentiment |
|---|------|-----------|
| 0 | I love this product | positive |
| 1 | This is amazing | positive |
| 2 | I hate this so much | negative |
| 3 | This is terrible | negative |
| 4 | I feel happy today | positive |
| 5 | I am very sad | negative |
| 6 | This is the worst day | negative |
| 7 | I am feeling great | positive |
| 8 | I like this movie | positive |
| 9 | I dislike this book | negative |
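
With a dataset this small, it can help to check how many examples each label has before training. The following is a minimal sketch, assuming a DataFrame with a `sentiment` column like the one above; the six-row `toy_df` is a hypothetical stand-in, not the notebook's data.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df (same column names, fewer rows)
toy_df = pd.DataFrame({
    "text": [
        "I love this product",
        "This is terrible",
        "This app is okay",
        "I am feeling great",
        "I hate this so much",
        "This is amazing",
    ],
    "sentiment": ["positive", "negative", "neutral",
                  "positive", "negative", "positive"],
})

# Count how many rows carry each label
label_counts = toy_df["sentiment"].value_counts()
print(label_counts)

# Share of each label, useful for spotting class imbalance
label_share = toy_df["sentiment"].value_counts(normalize=True)
print(label_share.round(2))
```

Running the same two `value_counts()` calls on the notebook's `df` would show whether one label dominates the others.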


# Step 3 — Split Data
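We split our dataset into training and testing sets. With test_size=0.2, 32 of the 40 sentences go to training and 8 to testing; random_state=42 makes the split reproducible.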
- X_train / y_train → used to train the model
- X_test / y_test → used to test the model's performance

In [4]:
X_train, X_test, y_train, y_test = train_test_split(
    df['text'], df['sentiment'], test_size=0.2, random_state=42
)
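
A plain random split does not guarantee that every label appears in the test set in the same proportion as in the full data. One possible variation, shown here as a sketch on hypothetical toy labels, is the `stratify` argument of `train_test_split`. Note that adding it to the cell above would change the split, and therefore the accuracy reported later.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical toy labels: 15 positive, 10 negative, 5 neutral
labels = ["positive"] * 15 + ["negative"] * 10 + ["neutral"] * 5
texts = [f"sentence {i}" for i in range(len(labels))]

# stratify=labels asks for the same label proportions in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Count the labels that ended up in the 6-sentence test set
test_counts = Counter(y_te)
print(test_counts)
```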

# Step 4 — Convert Text to Numbers
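Machine learning models work on numbers, not raw text.
We use TfidfVectorizer to turn each sentence into a vector of TF-IDF weights, where words that are frequent in a sentence but rare across the dataset score higher.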

In [5]:
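vectorizer = TfidfVectorizer()
# Learn the vocabulary and IDF weights from the training text only
X_train_vectors = vectorizer.fit_transform(X_train)
# Reuse that vocabulary for the test text; words not seen in training are ignored
X_test_vectors = vectorizer.transform(X_test)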

# Step 5 — Train the Model
We train a Logistic Regression model on the training data.
This model will learn patterns in text to predict sentiment.

In [6]:
model = LogisticRegression()
model.fit(X_train_vectors, y_train)

| Parameter | Value |
|-----------|-------|
| penalty | 'l2' |
| dual | False |
| tol | 0.0001 |
| C | 1.0 |
| fit_intercept | True |
| intercept_scaling | 1 |
| class_weight | None |
| random_state | None |
| solver | 'lbfgs' |
| max_iter | 100 |
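
Besides a single predicted label, a fitted LogisticRegression can report a probability for each class. The sketch below trains on a hypothetical six-sentence toy set (not the notebook's data) and prints the class order from `classes_` next to the output of `predict_proba`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training set (not the notebook's data)
train_texts = [
    "I love this", "This is great", "I hate this",
    "This is awful", "This is okay", "It is fine",
]
train_labels = ["positive", "positive", "negative",
                "negative", "neutral", "neutral"]

demo_vectorizer = TfidfVectorizer()
demo_model = LogisticRegression()
demo_model.fit(demo_vectorizer.fit_transform(train_texts), train_labels)

# classes_ gives the column order used by predict_proba
print(demo_model.classes_)

# One row per input sentence, one column per class; each row sums to 1
probs = demo_model.predict_proba(demo_vectorizer.transform(["I love it"]))
for label, p in zip(demo_model.classes_, probs[0]):
    print(f"{label}: {p:.2f}")
```

On very small datasets the probabilities tend to sit close to each other, which is a hint that the model has seen too few examples to be confident.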


# Step 6 — Test the Model
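We use the testing data to check how accurate the model is.
Accuracy is the fraction of test predictions that match the true labels (1.0 means every prediction was correct). With only 8 test sentences, each prediction changes the score by 0.125, so treat the number as a rough indication.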

In [7]:
y_pred = model.predict(X_test_vectors)
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.625
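
A single accuracy number does not show which labels get confused with each other. As a sketch, `confusion_matrix` breaks the result down per class. The labels and predictions below are hypothetical, chosen so that 5 of 8 are correct; they are not the notebook's actual predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

labels = ["negative", "neutral", "positive"]

# Hypothetical true labels and predictions for 8 test sentences
# (illustrative only, not the notebook's actual predictions)
y_true = ["positive", "positive", "positive", "negative",
          "negative", "negative", "neutral", "neutral"]
y_hat = ["positive", "positive", "positive", "negative",
         "negative", "positive", "positive", "negative"]

acc = accuracy_score(y_true, y_hat)  # 5 correct out of 8
cm = confusion_matrix(y_true, y_hat, labels=labels)

print("Accuracy:", acc)
print(cm)  # rows = true label, columns = predicted label
```

In this made-up example the neutral row has no correct predictions, a pattern that accuracy alone would hide.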


# Step 7 — Try Your Own Sentence
We can now pass any sentence through the fitted vectorizer and the model to see the prediction:
positive, negative, or neutral.

In [8]:
sentence = "I am very happy today"
sentence_vector = vectorizer.transform([sentence])
print("Prediction:", model.predict(sentence_vector)[0])

Prediction: positive
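
The vectorize-then-predict steps can also be bundled so that raw strings go straight in. This is one possible arrangement using scikit-learn's `make_pipeline`, not what the notebook does; the toy data and the helper name `predict_sentiment` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training set (not the notebook's data)
train_texts = [
    "I love this", "I feel happy", "I hate this",
    "I feel sad", "This is okay", "It is fine",
]
train_labels = ["positive", "positive", "negative",
                "negative", "neutral", "neutral"]

# The pipeline applies the vectorizer, then the classifier
sentiment_pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
sentiment_pipeline.fit(train_texts, train_labels)


def predict_sentiment(sentence: str) -> str:
    """Return the predicted label for one raw sentence."""
    return sentiment_pipeline.predict([sentence])[0]


prediction = predict_sentiment("I am very happy today")
print("Prediction:", prediction)
```

A pipeline also means there is only one object to save and load, instead of a separate vectorizer and model.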


In [10]:
import joblib

# Save the vectorizer
joblib.dump(vectorizer, 'vectorizer.pkl')

# Save the model
joblib.dump(model, 'model.pkl')

print("Files saved!")

Files saved!
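
Saved files are only useful if they can be loaded back and give the same predictions. The sketch below checks that round trip with `joblib.load`. It trains on hypothetical toy data and writes to a temporary folder so that it does not touch the notebook's own `model.pkl` and `vectorizer.pkl`.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training set (not the notebook's data)
texts = ["I love this", "This is great", "I hate this", "This is awful"]
labels = ["positive", "positive", "negative", "negative"]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    vec_path = os.path.join(tmp_dir, "vectorizer.pkl")
    model_path = os.path.join(tmp_dir, "model.pkl")

    # Save both objects, then load them back from disk
    joblib.dump(vec, vec_path)
    joblib.dump(clf, model_path)
    loaded_vec = joblib.load(vec_path)
    loaded_clf = joblib.load(model_path)

# The loaded pair should reproduce the original predictions
sample = ["I love this", "This is awful"]
before = clf.predict(vec.transform(sample))
after = loaded_clf.predict(loaded_vec.transform(sample))
print(list(before) == list(after))
```

The vectorizer and the model must always be loaded as a pair, because the model's features only make sense with the vocabulary it was trained on.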


In [11]:
import os

# List all files in the current folder
files = os.listdir()

if 'model.pkl' in files and 'vectorizer.pkl' in files:
    print("✅ Success! Both files are present.")
else:
    print("❌ Files not found. Please run the save code again.")

✅ Success! Both files are present.
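
A variation on the check above, sketched with `pathlib`: it reports which file is missing rather than only that something is missing, and it does not count a directory that happens to share the name. The helper `missing_files` is a hypothetical name introduced for this example.

```python
from pathlib import Path


def missing_files(names, folder="."):
    """Return the names that do not exist as regular files inside folder."""
    base = Path(folder)
    return [name for name in names if not (base / name).is_file()]


required = ["model.pkl", "vectorizer.pkl"]
missing = missing_files(required)

if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Both files are present.")
```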


In [12]:
!pip install streamlit

