
✅ Step 1: Loading the Dataset



In [7]:
import pandas as pd

# Load the data from a CSV file
df = pd.read_csv("Reviews.csv")

# Show the first 5 reviews
print(df.head())


                                              review  label
0  The food was delicious, and the service was ex...      1
1  Oh wow, this place is amazing… if you love wai...      0
2  I absolutely love this phone. The battery last...      1
3     Best phone ever! Died after 10 minutes of use!      0
4  Fantastic hotel! The staff was so friendly and...      1


✅ Step 2: Converting Text to Numerical Data using TF-IDF

In [8]:
# Import the required libraries
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the data from a CSV file
df = pd.read_csv("Reviews.csv")

# Build the TF-IDF vectorizer
tfidf_vectorizer = TfidfVectorizer()

# Convert the review text to a numeric representation
# Note: this fits on every review, including those later held out for testing
X_tfidf = tfidf_vectorizer.fit_transform(df["review"])

# Show the shape of the transformed data
print("TF-IDF:", X_tfidf.shape)  # number of reviews × number of unique terms


TF-IDF: (25, 146)
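
To make the TF-IDF step less of a black box, here is a minimal from-scratch sketch on a hypothetical three-document corpus (not taken from Reviews.csv). It assumes the smoothed IDF formula and L2 row normalization documented as scikit-learn's defaults, and it uses plain whitespace tokenization instead of the vectorizer's own tokenizer, so the numbers are illustrative rather than a reproduction of `TfidfVectorizer`.

```python
import math
from collections import Counter

# Hypothetical toy corpus, not taken from Reviews.csv
docs = [
    "great food great service",
    "slow service",
    "great phone",
]

tokenized = [doc.split() for doc in docs]
vocab = sorted({tok for doc in tokenized for tok in doc})
n_docs = len(docs)

# Document frequency: the number of documents each term appears in
df_counts = Counter(tok for doc in tokenized for tok in set(doc))

# Smoothed IDF (assumed to match scikit-learn's documented default)
idf = {t: math.log((1 + n_docs) / (1 + df_counts[t])) + 1 for t in vocab}

def tfidf_row(tokens):
    """Return one L2-normalized TF-IDF row, ordered like `vocab`."""
    counts = Counter(tokens)
    raw = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm if norm else 0.0 for v in raw]

matrix = [tfidf_row(doc) for doc in tokenized]
print(len(matrix), "x", len(vocab))
```

The matrix has one row per document and one column per distinct term, which is how the `(25, 146)` shape above should be read: 25 reviews by 146 distinct terms. A term that occurs in many documents, such as "great" here, receives a lower IDF weight than a term that occurs in only one.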


✅ Step 3: Splitting Data and Training Models (SVM & Naïve Bayes)





In [9]:
# Import the required libraries
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Split the data into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_tfidf, df["label"], test_size=0.2, random_state=42)

# Create and train the SVM model
svm_model = SVC(kernel='linear')  # use a linear kernel
svm_model.fit(X_train, y_train)

# Create and train the Naïve Bayes model
nb_model = MultinomialNB()
nb_model.fit(X_train, y_train)

# Predict on the test data
y_pred_svm = svm_model.predict(X_test)
y_pred_nb = nb_model.predict(X_test)

# Compute the accuracy of both models
accuracy_svm = accuracy_score(y_test, y_pred_svm)
accuracy_nb = accuracy_score(y_test, y_pred_nb)

# Print the results
print("SVM model accuracy:", accuracy_svm)
print("Naïve Bayes model accuracy:", accuracy_nb)


SVM model accuracy: 0.8
Naïve Bayes model accuracy: 0.8
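
It helps to see how coarse this accuracy measurement is. The arithmetic below uses the 25 rows reported by `X_tfidf.shape` and the `test_size=0.2` passed to `train_test_split`. It assumes the test set size is the ceiling of `test_size * n_samples`, which is how scikit-learn is understood to handle a fractional `test_size`.

```python
import math

n_samples = 25   # rows reported by X_tfidf.shape
test_size = 0.2  # value passed to train_test_split

# Assumed rule for a fractional test_size: round the test count up
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Every accuracy value a test set of this size can produce
possible = [correct / n_test for correct in range(n_test + 1)]

print("train rows:", n_train, "test rows:", n_test)
print("possible accuracies:", possible)
```

With 5 test reviews, accuracy can only move in steps of 0.2, so a score of 0.8 means exactly one review was misclassified by each model.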


✅ Step 4: Testing Models and Comparing Results



In [10]:
# Create two new reviews (one sincere, one sarcastic) to test the models
new_reviews = ["This restaurant has the best service and food! Highly recommended.",
               "Oh wow, I just love waiting an hour to get cold food. Amazing!"]

# Convert the new reviews to the same TF-IDF representation
X_new_tfidf = tfidf_vectorizer.transform(new_reviews)

# Predict with both models
svm_prediction = svm_model.predict(X_new_tfidf)
nb_prediction = nb_model.predict(X_new_tfidf)

# Print the results
print("SVM Prediction:", svm_prediction)
print("Naïve Bayes Prediction:", nb_prediction)


SVM Prediction: [1 0]
Naïve Bayes Prediction: [1 0]
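
One detail worth keeping in mind when reading these predictions: `transform` reuses the vocabulary learned by `fit_transform`, so words in a new review that never appeared in the training text contribute nothing. The sketch below mimics only that lookup step, using a hypothetical five-word vocabulary and a simplified tokenizer (lowercasing, tokens of two or more word characters). The real vectorizer also applies IDF weights and normalization.

```python
import re

# Hypothetical vocabulary learned at fit time (term -> column index)
vocabulary = {"amazing": 0, "food": 1, "love": 2, "service": 3, "waiting": 4}

# Simplified tokenizer: tokens of two or more word characters
TOKEN_RE = re.compile(r"\b\w\w+\b")

def to_counts(text, vocabulary):
    """Count known terms in `text`; terms outside the vocabulary are dropped."""
    counts = [0] * len(vocabulary)
    unknown = []
    for tok in TOKEN_RE.findall(text.lower()):
        if tok in vocabulary:
            counts[vocabulary[tok]] += 1
        else:
            unknown.append(tok)
    return counts, unknown

counts, unknown = to_counts(
    "Oh wow, I just love waiting an hour to get cold food. Amazing!", vocabulary
)
print(counts)   # counts for the known terms, in column order
print(unknown)  # tokens that were ignored
```

With only 25 training reviews, a sizeable share of the words in any new review may fall into the ignored group, which limits how much the two predictions above can be trusted.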


1️⃣ Did both models agree on the classification?
✅ Yes. SVM and Naïve Bayes returned the same predictions for the two new reviews:

The first review was classified as genuine (1).
The second review was classified as sarcastic (0).
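
2️⃣ Which model had higher accuracy?
✅ Neither. Both models reached 80% accuracy on the test set, which here means 4 of the 5 held-out reviews (20% of 25) were classified correctly. With a test set this small, the tie says little about which model is better in general.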

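3️⃣ Which model was faster?
✅ Training time was not measured in this notebook, and with only 25 reviews any difference is unlikely to be noticeable. In general, Naïve Bayes is faster than SVM, especially on large text datasets.

Naïve Bayes only needs per-class feature counts, which it can collect in a single pass over the training data.
SVM has to solve an optimization problem to find the maximum-margin decision boundary, and the training time of a kernel SVM such as SVC grows at least quadratically with the number of samples.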
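
Since the notebook does not time either model, a small helper like the one below could be used to check the speed claim empirically. `time_call` is a hypothetical name introduced here for illustration; the sketch times a stand-in workload so that it runs on its own.

```python
import time

def time_call(fn, *args, repeats=5, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload so the sketch is self-contained
elapsed = time_call(sorted, list(range(10_000, 0, -1)))
print(f"best of 5: {elapsed:.6f} s")

# In the notebook, the same helper could wrap the fits from Step 3:
#   time_call(svm_model.fit, X_train, y_train)
#   time_call(nb_model.fit, X_train, y_train)
```

Taking the best of several runs reduces noise from other processes, which matters when each fit takes well under a millisecond.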

4️⃣ When would SVM be better than Naïve Bayes and vice versa?
✅ SVM is better when:

The classes are not linearly separable and a non-linear kernel (for example RBF) is needed; the linear kernel used in this notebook cannot model such boundaries.
High accuracy is more important than speed.
The dataset is small to medium-sized.

✅ Naïve Bayes is better when:

The dataset is very large (millions of reviews).
You need fast predictions in real-time applications.
Individual words are strong, roughly independent signals of the class, as in spam detection.
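
The points above follow from how little work Naïve Bayes does. The from-scratch sketch below trains a multinomial Naïve Bayes classifier on hypothetical toy reviews (not from Reviews.csv) using raw word counts and add-one smoothing. It is meant as an illustration of the idea, not a reproduction of `MultinomialNB`, which was fitted on TF-IDF values in this notebook.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data in the notebook's label scheme: 1 = genuine, 0 = sarcastic
train = [
    ("great food friendly staff", 1),
    ("love this phone great battery", 1),
    ("oh wow love waiting an hour", 0),
    ("best phone ever died after ten minutes", 0),
]

# Training is a single counting pass over the data
class_docs = Counter()
word_counts = defaultdict(Counter)
for text, label in train:
    class_docs[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def log_posterior(text, label, alpha=1.0):
    """Unnormalized log P(label | text) with add-alpha (Laplace) smoothing."""
    total = sum(word_counts[label].values())
    score = math.log(class_docs[label] / sum(class_docs.values()))
    for w in text.split():
        if w in vocab:  # words never seen in training are skipped
            score += math.log(
                (word_counts[label][w] + alpha) / (total + alpha * len(vocab))
            )
    return score

def predict(text):
    """Return the label with the highest log posterior."""
    return max(class_docs, key=lambda label: log_posterior(text, label))

print(predict("great friendly staff"))
print(predict("oh wow waiting an hour"))
```

Each word adds its own log-probability to the score independently of the others. That independence assumption is what keeps training and prediction cheap, and it is also why the model struggles when meaning depends on word combinations, as sarcasm often does.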
