# 📘 Level 2: Basic NLP Models
---
Welcome to the second level! Let's start building simple NLP models.

Topics we will cover:
- Sentiment Analysis (Classical ML)
- Text Classification (Manual categories)
- Language Detection

Each topic includes:
- Definition 🧠
- Why Use It 🎯
- Code Examples with Explanation 💻
- Mini Assignments ✍️


# 1. Sentiment Analysis (using Classical Machine Learning)
**Definition:**  
Sentiment Analysis is the task of classifying the opinion expressed in a text as positive, negative, or neutral. The example below uses only two classes, positive and negative.

**Why use Sentiment Analysis?**  
- Understand customer feedback automatically.
- Monitor brand perception and analyze social media posts.



In [2]:

# Import Libraries
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Sample Data
texts = ["I love this product", "This is a bad movie", "Amazing experience!", "Worst service ever", "I feel great", "I hate waiting"]
labels = [1, 0, 1, 0, 1, 0]  # 1=Positive, 0=Negative

# Text Vectorization
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)

# Model Training
model = MultinomialNB()
model.fit(X_train, y_train)

# Prediction
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
    

Accuracy: 0.5



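✅ **Explanation:**  
- `CountVectorizer` converts each text into a vector of word counts.
- `MultinomialNB`, a Naive Bayes classifier, is trained on the training split.
- `accuracy_score` compares the predictions with the true test labels.
- With six sentences and `test_size=0.3`, the test set holds only two examples, so an accuracy of 0.5 means one right and one wrong and says little about the model.
- The vectorizer is fit on all texts before the split. That is fine for a demo, but in a real project fit it on the training data only so the test vocabulary does not leak into training.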
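
To make the text-to-numbers step more concrete, here is a minimal pure-Python sketch of a bag-of-words count matrix. It is a simplified stand-in for what `CountVectorizer` does, not its actual implementation. The helper names `tokenize` and `bag_of_words` and the variable `sample_texts` are hypothetical; the tokenization rule (lowercase, keep tokens of two or more word characters) is chosen to mimic the scikit-learn default.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep tokens of two or more word characters."""
    return re.findall(r"\b\w\w+\b", text.lower())


def bag_of_words(texts):
    """Return (vocabulary, matrix) where each row holds per-word counts for one text."""
    tokenized = [tokenize(t) for t in texts]
    # Sorted vocabulary so that every word gets a stable column index.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {word: i for i, word in enumerate(vocabulary)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for word, count in Counter(toks).items():
            row[index[word]] = count
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical sample: three of the sentences from the sentiment example.
sample_texts = ["I love this product", "This is a bad movie", "I hate waiting"]
vocabulary, matrix = bag_of_words(sample_texts)

print(vocabulary)
for text, row in zip(sample_texts, matrix):
    print(row, text)
```

Single-character tokens such as "I" and "a" are dropped by this rule, which is why they do not appear as columns. Each row is the numeric representation that a classifier would receive instead of the raw sentence.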
    


# 2. Text Classification (Manual Categories)
**Definition:**  
Classify text into predefined topics like 'sports', 'politics', 'technology'.

**Why use Text Classification?**  
- Organize large document collections.
- News categorization.



In [4]:

# Sample Data
texts = ["The match was exciting", "Elections are coming soon", "New AI technology released", "Government passed a new law", "The player scored a goal"]
categories = ["sports", "politics", "technology", "politics", "sports"]

# Text Vectorization
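vectorizer = CountVectorizer()  # new vectorizer so this cell does not rely on the one created earlier
X = vectorizer.fit_transform(texts)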

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, categories, test_size=0.3, random_state=42)

# Model Training
model = MultinomialNB()
model.fit(X_train, y_train)

# Prediction
y_pred = model.predict(X_test)
print("Predictions:", y_pred, "Actual:", y_test)


Predictions: ['politics' 'sports'] Actual: ['politics', 'sports']



✅ **Explanation:**  
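- The steps are the same as for sentiment analysis (vectorize, split, train, predict); the only difference is that the labels are three topic names instead of a binary 0/1.
- With five sentences the test set again holds only two examples, so matching predictions here do not prove the model generalizes.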
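
To show why the same classifier handles more than two classes, here is a minimal pure-Python sketch of the idea behind multinomial Naive Bayes: each class gets a score made of its prior plus the smoothed log-probabilities of the words in the text, and the highest score wins. This is an illustrative simplification, not scikit-learn's implementation. The function names `train_naive_bayes` and `predict_naive_bayes` and the query sentence are hypothetical; `alpha=1.0` (Laplace smoothing) is assumed because it matches the `MultinomialNB` default.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep tokens of two or more word characters."""
    return re.findall(r"\b\w\w+\b", text.lower())


def train_naive_bayes(texts, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict_naive_bayes(model, text, alpha=1.0):
    """Return the class with the highest log-score for the given text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, doc_count in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(doc_count / total_docs)  # class prior
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training are ignored
            smoothed = (word_counts[label][token] + alpha) / (
                total_words + alpha * len(vocabulary)
            )
            score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)


# The five training sentences from the example above.
train_texts = [
    "The match was exciting",
    "Elections are coming soon",
    "New AI technology released",
    "Government passed a new law",
    "The player scored a goal",
]
train_labels = ["sports", "politics", "technology", "politics", "sports"]

nb_model = train_naive_bayes(train_texts, train_labels)
print(predict_naive_bayes(nb_model, "The player scored in the match"))
```

Nothing in the scoring loop depends on the number of classes, which is why moving from two sentiment labels to three topics requires no change to the method.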
    


# 3. Language Detection
**Definition:**  
Identify the language of a given text.

**Why use Language Detection?**  
- Routing chats to the right language team.
- Preprocessing before translation tasks.



In [6]:

# Install langdetect library if not installed
# pip install langdetect

from langdetect import detect

# Examples
texts = ["Bonjour tout le monde", "Hello everyone", "Hola amigo", "Wie geht's dir?"]

for text in texts:
    print(f"'{text}' --> {detect(text)}")
    

'Bonjour tout le monde' --> fr
'Hello everyone' --> no
'Hola amigo' --> so
'Wie geht's dir?' --> af



✅ **Explanation:**  
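- `langdetect` supports 55 languages out of the box and needs only a text string as input.
- Short inputs are unreliable: in the output above only the French sentence is identified correctly, while the English, Spanish, and German greetings come back as `no` (Norwegian), `so` (Somali), and `af` (Afrikaans).
- Detection is non-deterministic on short or ambiguous text; setting `DetectorFactory.seed = 0` before the first call makes results repeatable.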
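
Because very short strings are easy to misdetect, one possible safeguard is a small wrapper that refuses to guess below a minimum length. The sketch below is an assumption-laden illustration: the function `detect_language`, its `detector` parameter, and the threshold `min_chars=20` are all hypothetical and would need tuning on real data. When no `detector` is passed, it falls back to `langdetect.detect` with a fixed seed.

```python
def detect_language(text, detector=None, min_chars=20):
    """Return a language code, or "unknown" when the text is too short to trust.

    `min_chars` is an arbitrary illustrative threshold, not a langdetect setting.
    """
    cleaned = text.strip()
    if len(cleaned) < min_chars:
        return "unknown"
    if detector is None:
        # Imported lazily so the length check can be tried without langdetect installed.
        from langdetect import DetectorFactory, detect

        DetectorFactory.seed = 0  # fixed seed so repeated runs give the same answer
        detector = detect
    return detector(cleaned)
```

With `langdetect` installed, `detect_language("Hola amigo")` returns `"unknown"` instead of a misleading code, while longer inputs are passed on to the detector. Passing a custom `detector` callable makes it possible to swap in another library or a stub for testing.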
    


# 📚 Mini Assignment
- Create your own mini Sentiment Analysis model.
- Build a mini Text Classification project (2 or 3 categories).
- Try detecting language from a paragraph instead of a single sentence.
    