# Topic 05 - Problem 05: Creating Binary Features from Text

---

## 1. About the Problem

In machine learning, most models work on numbers and **cannot consume raw text directly**.  
One common approach is to convert text into **binary (0/1) features**, where each feature answers a yes/no question about the text.

Examples:
- Does a review contain the word *“good”*?
- Does a message contain *“urgent”*?
- Does a job description mention *“python”*?

In this problem, I will create a binary feature that indicates **whether a product description contains the word "premium"**.

---



## 2. Solution Code

```python
import pandas as pd

# Sample dataset
data = {
    "product_description": [
        "This is a premium quality product",
        "Budget friendly option",
        "Premium design with advanced features",
        "Standard model"
    ]
}

df = pd.DataFrame(data)

# Create a binary feature from text:
# lowercase the text, check for the substring "premium", then cast True/False to 1/0
df['is_premium(0|1)'] = (
    df['product_description'].str.lower().str.contains('premium').astype(int)
)

print(df)
```


```text
                     product_description  is_premium(0|1)
0      This is a premium quality product                1
1                 Budget friendly option                0
2  Premium design with advanced features                1
3                         Standard model                0
```


---

## 3. Explanation (What is happening)

- **str.lower()**  
  → Ensures case-insensitive matching

- **str.contains("premium")**  
  → Returns True if the substring "premium" appears anywhere in the text (including inside longer words such as "premiums"), otherwise False

- **astype(int)**  
  → Converts:
  - True → 1  
  - False → 0  

This converts raw text into a numeric ML-friendly feature.
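
One caveat worth sketching: `str.contains` matches substrings, and a missing description would break the `astype(int)` step. The example below is a minimal sketch, not part of the original solution. The sample strings (including the `None` entry) are made up for illustration. It compares a plain substring match with a whole-word match using the regex `\bpremium\b`, and uses `na=False` so that missing text becomes 0.

```python
import pandas as pd

# Hypothetical sample data, including a longer word and a missing value
descriptions = pd.Series([
    "This is a premium quality product",
    "Premiums are paid monthly",
    "PREMIUM design",
    None,
])

# Substring match: case-insensitive, missing values treated as "no match"
substring_flag = descriptions.str.contains(
    "premium", case=False, regex=False, na=False
).astype(int)

# Whole-word match: \b marks a word boundary, so "Premiums" is not matched
word_flag = descriptions.str.contains(
    r"\bpremium\b", case=False, regex=True, na=False
).astype(int)

print(substring_flag.tolist())  # [1, 1, 1, 0]
print(word_flag.tolist())       # [1, 0, 1, 0]
```

Which variant is appropriate depends on the data: the substring version is more forgiving of plurals and typos, while the word-boundary version avoids accidental matches inside longer words.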

---

## 4. Summary / Takeaways

By solving this problem, I learned:

1. How to convert text into binary features
2. Why ML models require numerical representations
3. How simple text features can give a model useful signals for prediction
4. The bridge between text data and ML pipelines

This problem clearly shows a **feature engineering mindset** and makes a good small example to share on GitHub.

---

Next, I’ll move toward:
- Counting word occurrences
- Text length–based features
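
As a rough preview of those next steps, here is a minimal sketch using pandas string methods. The column names `premium_count`, `char_length`, and `word_count` are my own choices, and the third description is a made-up example so that one row contains the word twice.

```python
import pandas as pd

# Sample data; the third row is hypothetical and mentions "premium" twice
df = pd.DataFrame({
    "product_description": [
        "This is a premium quality product",
        "Budget friendly option",
        "Premium design with premium materials",
    ]
})

text = df["product_description"].str.lower()

# Count how many times "premium" occurs in each description
df["premium_count"] = text.str.count("premium")

# Text length features: number of characters and number of whitespace-separated words
df["char_length"] = df["product_description"].str.len()
df["word_count"] = df["product_description"].str.split().str.len()

print(df[["premium_count", "char_length", "word_count"]])
```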
