
# AI Internship Projects - Plasmid Innovation (M. Shoaib Naseer Ghodimar)

**Internship Duration:** 1 June 2025 – 1 August 2025  
**Organization:** Plasmid Innovation Ltd.

This presentation-style Colab/Jupyter notebook summarizes the major and minor projects completed during the internship: **Spam News Detection** (Major) and **K-Means Clustering** (Minor).

## Overview

During my internship, I worked on two projects:

- **Spam News Detection** — a supervised NLP-based classifier to distinguish spam/fake news from real news.
- **K-Means Clustering** — an unsupervised clustering demo to segment sample 2D data into groups.

The notebook includes short code snippets, simulated outputs, and screenshots representing the results.

## Project 1 — Spam News Detection (Major)

**Objective:** Build a classifier to detect spam/fake news using NLP techniques (TF-IDF + Naive Bayes).

**Methodology (brief):** Preprocess the text, vectorize it with TF-IDF, train a Naive Bayes classifier, and evaluate it with accuracy, precision, recall, and F1-score.

**Representative code (illustrative):**

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Illustrative example on a tiny hand-labelled sample
sample_texts = [
    'Breaking: PM announces new AI policy',
    'Win ₹1 lakh now!!! Click below',
    'Stock market sees record growth this week',
    'Limited offer!! Free iPhone for all users'
]
labels = [0, 1, 0, 1]  # 0 = Not Spam, 1 = Spam

# Convert the texts to TF-IDF features and fit a multinomial Naive Bayes model
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(sample_texts)
clf = MultinomialNB()
clf.fit(X, labels)

# Predict on the same four training sentences (a sanity check, not an evaluation)
preds = clf.predict(X)
for t, p in zip(sample_texts, preds):
    print(f'"{t}" -->', 'Spam' if p == 1 else 'Not Spam')

# Simulated metrics: hardcoded for presentation, not computed from this sample
print('\nModel Accuracy: 93.4%')
print('Precision: 0.92 | Recall: 0.90 | F1-Score: 0.91')
```

```text
"Breaking: PM announces new AI policy" --> Not Spam
"Win ₹1 lakh now!!! Click below" --> Spam
"Stock market sees record growth this week" --> Not Spam
"Limited offer!! Free iPhone for all users" --> Spam

Model Accuracy: 93.4%
Precision: 0.92 | Recall: 0.90 | F1-Score: 0.91
```
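
The representative code predicts on the same four sentences it was trained on, and its metric lines are hardcoded (simulated) rather than computed. A minimal sketch of how accuracy, precision, recall, and F1 could be measured on a held-out split follows. The twelve-sentence corpus and the `clean_text` preprocessor are hypothetical placeholders; a real run would load a labelled dataset such as the Kaggle Fake & Real News data instead, and the printed numbers from this toy data mean nothing about the real model.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def clean_text(text):
    """Hypothetical preprocessor: lowercase, replace non-letters with spaces, squeeze spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical toy data; replace with a real labelled dataset (0 = Not Spam, 1 = Spam)
texts = [
    "PM announces new AI policy", "Stock market sees record growth",
    "Monsoon arrives early this year", "Parliament passes education bill",
    "Scientists publish new climate study", "City opens new metro line",
    "Win cash now click here", "Free iPhone limited offer click now",
    "Claim your free prize today", "Win a free holiday click below",
    "Limited offer win cash prize", "Free gift card click now",
]
labels = [0] * 6 + [1] * 6

# Stratified split keeps the spam / not-spam ratio equal in train and test
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=42
)

# TF-IDF (with the custom preprocessor) followed by multinomial Naive Bayes
model = make_pipeline(TfidfVectorizer(preprocessor=clean_text), MultinomialNB())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", zero_division=0
)
print(f"Accuracy: {acc:.3f} | Precision: {prec:.3f} | Recall: {rec:.3f} | F1: {f1:.3f}")
```

With real data, `classification_report(y_test, y_pred)` from `sklearn.metrics` prints the same metrics per class in a single table.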


### Screenshot — Spam News Detection output (as seen in the internship notebook)

![spam_output](attachment:spam_output.png)
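
For intuition on what `MultinomialNB` does with the TF-IDF features, this sketch re-derives its predictions from the fitted parameters: each class score is the log prior plus the feature values weighted by that class's log term probabilities, and the highest score wins. It reuses the four sample sentences from the representative code above and relies only on the fitted attributes `class_log_prior_`, `feature_log_prob_`, and `feature_count_`, assuming additive (Laplace) smoothing with `alpha=1.0`.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

sample_texts = [
    "Breaking: PM announces new AI policy",
    "Win ₹1 lakh now!!! Click below",
    "Stock market sees record growth this week",
    "Limited offer!! Free iPhone for all users",
]
labels = [0, 1, 0, 1]  # 0 = Not Spam, 1 = Spam

X = TfidfVectorizer().fit_transform(sample_texts)
clf = MultinomialNB(alpha=1.0).fit(X, labels)

# Joint log-likelihood per class: log P(class) + sum_j x_j * log P(term_j | class)
scores = X @ clf.feature_log_prob_.T + clf.class_log_prior_
manual_preds = clf.classes_[np.argmax(scores, axis=1)]

# The smoothed per-class term probabilities behind feature_log_prob_
counts = clf.feature_count_ + clf.alpha
manual_log_prob = np.log(counts / counts.sum(axis=1, keepdims=True))

print(manual_preds, clf.predict(X))
print(np.allclose(manual_log_prob, clf.feature_log_prob_))
```

The "naive" part is visible in the score: terms contribute independently and additively in log space, which is what makes the model fast to train on large sparse TF-IDF matrices.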

## Project 2 — K-Means Clustering (Minor)

**Objective:** Demonstrate K-Means clustering on synthetic 2D data and visualize cluster assignments.

**Methodology (brief):** Generate blobs, apply `KMeans(n_clusters=3)`, inspect cluster centers and plot results.
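
K-Means alternates two steps until the centers stop moving: assign each point to its nearest center, then move each center to the mean of its assigned points (Lloyd's algorithm). The NumPy sketch below spells this out on the same kind of blob data. It is a simplified illustration under stated assumptions: it starts from `k` random data points (scikit-learn's `KMeans` uses k-means++ seeding by default), runs a single initialization, and simply keeps the old center if a cluster ever becomes empty. The helper names `nearest_center` and `lloyd_kmeans` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs


def nearest_center(X, centers):
    """Index of the closest center (Euclidean distance) for every point."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations; initial centers are k distinct random data points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_center(X, centers)               # assignment step
        new_centers = np.array([                           # update step
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers, nearest_center(X, centers)


X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)
centers, labels = lloyd_kmeans(X, k=3)
inertia = ((X - centers[labels]) ** 2).sum()  # within-cluster sum of squares
print(np.round(centers, 2))
print(f"Inertia: {inertia:.2f}")
```

On well-separated blobs like these the result usually lands near the centers printed by `KMeans`, possibly in a different order. With an unlucky random start, Lloyd's algorithm can stop at a worse local optimum, which is why scikit-learn combines k-means++ seeding with several restarts (`n_init`).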

**Representative code (illustrative):**

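```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2D data: 300 points drawn around 3 centers
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

# Fit K-Means with k = 3
km = KMeans(n_clusters=3, random_state=0)
km.fit(X)
centers = km.cluster_centers_
print(f'Number of clusters = {km.n_clusters}')
print('Cluster Centers:')
for c in centers:
    print([round(float(x), 2) for x in c])

# Visualize cluster assignments and centers (the original notebook's plot appears as a screenshot below)
plt.scatter(X[:, 0], X[:, 1], c=km.labels_, s=20, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='x', s=200)
plt.show()
```

```text
Number of clusters = 3
Cluster Centers:
[-1.61, 2.86]
[1.95, 0.83]
[0.96, 4.37]
```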


### Screenshot — K-Means Clustering output (as seen in the internship notebook)

![kmeans_output](attachment:kmeans_output.png)
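
Once fitted, the model assigns any new point to its nearest center, and `inertia_` is the within-cluster sum of squared distances that K-Means minimizes. The short check below confirms both relationships. It refits on the same synthetic data, setting `n_init=10` explicitly (an assumption; the cell above uses the library default), so the center order may differ from the printed output; the three new points are arbitrary examples.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Arbitrary new points to assign
new_points = np.array([[-1.5, 3.0], [2.0, 1.0], [1.0, 4.5]])

# Nearest-center rule computed by hand, compared with km.predict
dists = np.linalg.norm(new_points[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
manual_labels = dists.argmin(axis=1)
print(manual_labels, km.predict(new_points))

# inertia_ = sum of squared distances from each training point to its assigned center
manual_inertia = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
print(np.isclose(manual_inertia, km.inertia_))
```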

## Conclusion

This notebook summarizes the two projects completed during the internship. The Spam News Detection project demonstrates a text-classification pipeline (TF-IDF features with a Naive Bayes classifier), and the K-Means project demonstrates unsupervised clustering of 2D data and visualization of the resulting clusters.

### Skills Learned

- Text preprocessing and feature extraction (TF-IDF)
- Supervised classification (Naive Bayes, Logistic Regression)
- Unsupervised learning (K-Means)
- Model evaluation metrics and visualizations

### References
1. scikit-learn documentation — https://scikit-learn.org
2. Kaggle Fake & Real News Dataset
