# **Computer Vision with AI: Hands-On Projects**

## **Section 1: Introduction to Computer Vision (Theory)**


### **What is Computer Vision?**
Computer Vision (CV) is a field of artificial intelligence (AI) that enables machines to **see, interpret, and understand visual data**, loosely mirroring what human vision does. It involves teaching machines to process images and videos, recognize objects, detect patterns, and make decisions based on visual inputs.

- **Why Does Computer Vision Matter?**
  - An estimated **3 billion or more images** are shared online every day.
  - On the ImageNet classification benchmark, top-5 error fell from about **26% in 2011 to under 4% in 2015** as deep CNNs took over.
  - CV is used in **self-driving cars, medical imaging, security systems, augmented reality, and more**.



### **How Does Computer Vision Work?**
1. **Image Acquisition**: Capturing images or videos using cameras.
2. **Preprocessing**: Cleaning and enhancing images (e.g., resizing, filtering).
3. **Feature Extraction**: Identifying key patterns or features (e.g., edges, textures).
4. **Model Training**: Using machine learning (ML) models, especially **Convolutional Neural Networks (CNNs)**, to learn from labeled datasets.
5. **Prediction**: The model makes predictions (e.g., classifying objects, detecting faces).
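Steps 2 and 3 can be illustrated without any CV library. The sketch below is a minimal NumPy illustration on a synthetic image, not part of any project: it converts an RGB array to grayscale with the BT.601 luma weights (the same weights OpenCV's `COLOR_BGR2GRAY` uses) and then extracts an edge feature map with 3x3 Sobel filters. `to_grayscale` and `sobel_edges` are hypothetical helper names; in the projects below, OpenCV and pre-trained models do this work for you.

```python
import numpy as np

def to_grayscale(rgb):
    """Preprocessing: weighted sum of R, G, B (BT.601 luma weights), scaled to [0, 1]."""
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb.astype(np.float64) @ weights) / 255.0

def sobel_edges(gray):
    """Feature extraction: gradient magnitude from 3x3 Sobel filters (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Slide the 3x3 kernels over the image (cross-correlation, as CV libraries do)
    for i in range(3):
        for j in range(3):
            patch = gray[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

# Synthetic 8x8 "acquired" image: dark left half, bright right half
img = np.zeros((8, 8, 3), dtype=np.uint8)
img[:, 4:] = 255

gray = to_grayscale(img)
edges = sobel_edges(gray)
print(edges.max(axis=0))  # strongest responses sit at the dark/bright boundary
```

A CNN performs the same kind of sliding-window filtering in step 4, except that it learns the kernel weights from labeled data instead of using hand-designed ones like Sobel.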



### **Key Applications of Computer Vision**
1. **Object Detection**: Identifying objects in images (e.g., detecting cars in traffic).
2. **Facial Recognition**: Recognizing or verifying individuals (e.g., unlocking smartphones).
3. **Medical Imaging**: Analyzing X-rays, MRIs, and CT scans for diagnostics.
4. **Augmented Reality (AR)**: Overlaying digital content on the real world (e.g., Snapchat filters).
5. **Autonomous Vehicles**: Enabling cars to navigate and avoid obstacles.


### **Why Learn Computer Vision?**
- It’s a **high-demand skill** in industries like healthcare, automotive, and entertainment.
- It combines **creativity and technology**, making it fun and rewarding.
- You’ll build **real-world applications** that can impact millions of lives.

## **Section 2: Setting Up the Environment**


### **Installing Required Libraries**
We’ll use popular Python libraries for computer vision (the code cells assume Google Colab, whose `cv2_imshow` helper displays images inline):
- **OpenCV**: For image processing.
- **MediaPipe**: For pose estimation.
- **DeepFace**: For emotion detection.
- **Transformers**: For image captioning.


In [None]:
!pip install opencv-python mediapipe deepface transformers
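Because a pip package name and its import name are not always the same (`opencv-python` is imported as `cv2`), a quick check like the following can confirm the installs before running the projects. It is a small sketch that relies only on the standard library's `importlib.util.find_spec`, which looks a module up without importing it; `check_installed` is a hypothetical helper name.

```python
import importlib.util

# pip package name -> module name used in `import` (they differ for OpenCV)
PACKAGES = {
    "opencv-python": "cv2",
    "mediapipe": "mediapipe",
    "deepface": "deepface",
    "transformers": "transformers",
}

def check_installed(packages):
    """Return {pip_name: bool}, True if the corresponding module can be found."""
    return {pip: importlib.util.find_spec(mod) is not None for pip, mod in packages.items()}

for pip_name, ok in check_installed(PACKAGES).items():
    print(f"{pip_name:15s} {'OK' if ok else 'MISSING'}")
```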

## **Section 3: Hands-On Projects**


### **Project 1: Face Blurring Tool**
#### **Goal**: Detect and blur faces in an image for privacy.

#### **Why It’s Useful**:
- Protects privacy in images shared online.
- Used in security systems and social media platforms.

#### **Libraries Used**:
- **OpenCV**: A powerful library for image processing.
- **Haar Cascades**: Classical (pre-deep-learning) face detectors that ship pre-trained with OpenCV.
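Haar cascades (the Viola-Jones approach) score image windows with rectangle features: the pixel sum of one rectangle minus that of an adjacent one, for example a bright forehead above a darker eye band. An integral image (summed-area table) makes any rectangle sum cost four lookups. The NumPy sketch below illustrates that idea only; the two-rectangle feature is a hypothetical example, not one of the features stored in OpenCV's `haarcascade_frontalface_default.xml`, and OpenCV's internal implementation differs in detail.

```python
import numpy as np

def integral_image(gray):
    """Summed-area table with a zero row/column prepended: ii[y, x] = gray[:y, :x].sum()."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left corner is (x, y), in four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Hypothetical vertical two-rectangle feature: top half sum minus bottom half sum."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)

# Toy 4x4 window: bright "forehead" rows above a darker "eye band"
img = np.zeros((4, 4), dtype=np.uint8)
img[:2] = 200
img[2:] = 50
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 4, 4))  # 1200 (8 * 200 - 8 * 50)
```

A cascade chains many such features into stages, rejecting most non-face windows in the first cheap stages, which is why it runs fast enough on a CPU.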


In [None]:

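import cv2
from google.colab.patches import cv2_imshow  # Colab-only; use cv2.imshow locally

# Load the pre-trained Haar cascade for frontal faces (ships with opencv-python)
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Load image (cv2.imread returns None instead of raising if the file is missing)
image = cv2.imread("group_photo.jpg")
if image is None:
    raise FileNotFoundError("Could not read group_photo.jpg")

# Haar cascades work on intensity, so detect on a grayscale copy
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# scaleFactor: how much the image size is reduced at each scale
# minNeighbors: how many overlapping candidates are needed to keep a detection
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(f"Detected {len(faces)} face(s)")

# Blur each face region; the kernel size must be odd, and a large one
# with sigma=30 makes faces unrecognizable
for (x, y, w, h) in faces:
    face = image[y:y+h, x:x+w]
    image[y:y+h, x:x+w] = cv2.GaussianBlur(face, (99, 99), 30)

# Show result
cv2_imshow(image)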
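`cv2.GaussianBlur` replaces each pixel with a Gaussian-weighted average of its neighbors. Because the 2D Gaussian is separable, the same result comes from a horizontal 1D pass followed by a vertical one. The NumPy sketch below shows that idea on a single region; it is an illustration, not OpenCV's implementation (OpenCV's default border mode is reflect-101, while this sketch replicates edge pixels), and `gaussian_kernel_1d` and `blur_region` are hypothetical names.

```python
import numpy as np

def gaussian_kernel_1d(ksize, sigma):
    """Normalized 1D Gaussian weights; ksize must be odd."""
    half = ksize // 2
    x = np.arange(-half, half + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur_region(image, x, y, w, h, ksize=15, sigma=5.0):
    """Gaussian-blur image[y:y+h, x:x+w] in place using two separable 1D passes."""
    roi = image[y:y + h, x:x + w].astype(np.float64)
    k = gaussian_kernel_1d(ksize, sigma)
    pad = ksize // 2
    # Replicate edge pixels so the output keeps the ROI's size (grayscale or color)
    pad_width = [(pad, pad), (pad, pad)] + [(0, 0)] * (roi.ndim - 2)
    padded = np.pad(roi, pad_width, mode="edge")
    # Horizontal pass (along axis 1), then vertical pass (along axis 0)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
    image[y:y + h, x:x + w] = np.clip(np.rint(out), 0, 255).astype(image.dtype)
    return image

# Checkerboard "face" inside a 40x40 grayscale image
img = np.zeros((40, 40), dtype=np.uint8)
img[10:30, 10:30] = (np.indices((20, 20)).sum(axis=0) % 2) * 255
before = img[10:30, 10:30].std()
blur_region(img, 10, 10, 20, 20)
print(before, img[10:30, 10:30].std())  # detail (standard deviation) drops sharply
```

Larger `ksize` and `sigma` values average over wider neighborhoods, which is why the project uses `(99, 99)` and `30` to make faces unrecognizable rather than merely soft.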



### **Project 2: AI-Powered Sketch Artist**
#### **Goal**: Convert an image into a pencil sketch.

#### **Why It’s Useful**:
- Turns photos into artistic sketches.
- Used in photo editing apps and creative tools.

#### **Libraries Used**:
- **OpenCV**: For image transformations.


In [None]:

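import cv2
from google.colab.patches import cv2_imshow  # Colab-only; use cv2.imshow locally

# Load image (cv2.imread returns None if the file cannot be read)
image = cv2.imread("photo.jpg")
if image is None:
    raise FileNotFoundError("Could not read photo.jpg")

# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Invert the grayscale image
inverted = cv2.bitwise_not(gray)

# Blur the inverted image; sigma=0 lets OpenCV derive it from the (odd) kernel size
blurred = cv2.GaussianBlur(inverted, (21, 21), 0)

# "Color dodge" blend: divide the grayscale image by the inverted blur.
# Flat areas saturate to white; pixels darker than their neighborhood stay gray.
sketch = cv2.divide(gray, cv2.bitwise_not(blurred), scale=256.0)

# Show result
cv2_imshow(sketch)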
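The last step is a "color dodge" blend: each output pixel is `gray * 256 / (255 - blurred)`, clipped to 255. Where a pixel matches its blurred surroundings the ratio is about 256/255 or more, so it turns white; where it is darker than its neighborhood (the dark side of an edge) it stays gray. The NumPy sketch below, with the hypothetical name `color_dodge`, is meant to mirror that arithmetic, including OpenCV's documented behavior of returning 0 for integer division by zero; rounding details are an assumption.

```python
import numpy as np

def color_dodge(gray, blurred_inverted):
    """Mimic cv2.divide(gray, 255 - blurred_inverted, scale=256) on uint8 arrays."""
    denom = 255.0 - blurred_inverted.astype(np.float64)
    num = gray.astype(np.float64) * 256.0
    # OpenCV's integer divide yields 0 where the divisor is 0; reproduce that here
    out = np.divide(num, denom, out=np.zeros_like(num), where=denom != 0)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# A pixel equal to its blurred neighborhood (100 vs. inverted blur 155)
flat = color_dodge(np.array([[100]], np.uint8), np.array([[155]], np.uint8))
# A pixel darker than its neighborhood (50 vs. local average 150, inverted 105)
edge = color_dodge(np.array([[50]], np.uint8), np.array([[105]], np.uint8))
print(flat, edge)  # [[255]] [[85]]
```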



### **Project 3: Emotion Detector**
#### **Goal**: Detect emotions from facial expressions.

#### **Why It’s Useful**:
- Used in customer feedback systems, mental health apps, and security.

#### **Libraries Used**:
- **DeepFace**: A Python framework that wraps pre-trained face-analysis models, including an emotion classifier.


In [None]:

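from deepface import DeepFace
from PIL import Image
from IPython.display import display

# Path to an image containing at least one face
image_path = "face.jpg"

# Analyze emotions (model weights are downloaded on first use).
# Recent DeepFace versions return a list with one dict per detected face and
# raise ValueError if no face is found (enforce_detection=True by default).
results = DeepFace.analyze(image_path, actions=["emotion"])

# Print the dominant emotion for every detected face
for i, face in enumerate(results):
    print(f"Face {i}: {face['dominant_emotion']}")

# Image.show() opens an external viewer; display() renders inline in a notebook
display(Image.open(image_path))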
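`dominant_emotion` is just the label with the highest score. Assuming each face's result also carries an `"emotion"` dict mapping labels to percentage scores (as in recent DeepFace versions), looking at the runner-up scores shows how confident the prediction is. The scores below are hypothetical values in that shape, not real model output, and `top_emotions` is a hypothetical helper.

```python
def top_emotions(scores, k=3):
    """Return the k highest-scoring (emotion, score) pairs, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]

# Hypothetical scores in the shape of results[0]["emotion"] (percentages)
example_scores = {"angry": 2.1, "disgust": 0.1, "fear": 4.3, "happy": 71.5,
                  "sad": 6.0, "surprise": 3.2, "neutral": 12.8}

for emotion, score in top_emotions(example_scores):
    print(f"{emotion:10s} {score:5.1f}%")
```

With real output you would pass `results[0]["emotion"]`; a second score close to the first suggests an ambiguous expression.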



### **Project 4: Pose Estimation**
#### **Goal**: Detect and visualize human poses in an image.

#### **Why It’s Useful**:
- Used in fitness apps, animation, and surveillance.

#### **Libraries Used**:
- **MediaPipe**: Google's framework for on-device perception; its Pose solution detects 33 body landmarks.


In [None]:

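import cv2
import mediapipe as mp
from google.colab.patches import cv2_imshow  # Colab-only; use cv2.imshow locally

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

# Load image (cv2.imread returns None if the file cannot be read)
image = cv2.imread("person.jpg")
if image is None:
    raise FileNotFoundError("Could not read person.jpg")

# static_image_mode=True treats the input as an independent photo (no video tracking);
# MediaPipe expects RGB, while OpenCV loads BGR
with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# Draw pose landmarks and their connections on the original BGR image
if results.pose_landmarks:
    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
else:
    print("No person detected")

# Show result
cv2_imshow(image)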
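Fitness apps typically turn landmarks into joint angles, for example the elbow angle during a curl. Each landmark (e.g. `results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_ELBOW]`) has `x` and `y` normalized to [0, 1] by image width and height, so the sketch below first converts to pixels; otherwise angles are distorted on non-square images. The coordinates are hypothetical values, and `to_pixels` and `joint_angle` are hypothetical helpers.

```python
import math

def to_pixels(landmark_xy, width, height):
    """Scale a normalized (x, y) landmark to pixel coordinates."""
    x, y = landmark_xy
    return x * width, y * height

def joint_angle(a, b, c):
    """Angle in degrees at point b between segments b->a and b->c (0 to 180)."""
    ang = math.degrees(math.atan2(c[1] - b[1], c[0] - b[0]) -
                       math.atan2(a[1] - b[1], a[0] - b[0]))
    ang = abs(ang)
    return 360.0 - ang if ang > 180.0 else ang

# Hypothetical normalized shoulder, elbow, wrist positions on a 640x480 image
shoulder = to_pixels((0.50, 0.30), 640, 480)
elbow = to_pixels((0.50, 0.50), 640, 480)
wrist = to_pixels((0.65, 0.50), 640, 480)
print(round(joint_angle(shoulder, elbow, wrist), 1))  # 90.0
```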



### **Project 5: AI-Powered Image Captioning**
#### **Goal**: Generate captions for images.

#### **Why It’s Useful**:
- Helps visually impaired individuals understand images.
- Used in social media and content creation.

#### **Libraries Used**:
- **Transformers**: Hugging Face's library of pre-trained models for text, vision, and multimodal tasks; here we use the BLIP image-captioning model.


In [None]:

from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

# Load the pre-trained BLIP captioning model (downloaded on first use)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# BLIP expects 3-channel RGB input; convert() also handles grayscale or RGBA files
image = Image.open("image.jpg").convert("RGB")

# Preprocess into pixel tensors, then generate caption tokens
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(out[0], skip_special_tokens=True)

print("Caption:", caption)
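`model.generate` builds the caption one token at a time, each step conditioned on the image features and the tokens produced so far, until an end token or the length limit (`max_new_tokens`). Greedy decoding, always taking the highest-scoring next token, is the simplest of the strategies `generate` supports (others include beam search via `num_beams`). The toy sketch below shows that loop with a hypothetical lookup table standing in for the model; `greedy_decode`, `TOY_SCORES`, and `toy_model` are illustrative names, not part of Transformers.

```python
def greedy_decode(next_token_scores, start="<s>", end="</s>", max_new_tokens=10):
    """Repeatedly append the highest-scoring next token until `end` or the length limit."""
    tokens = [start]
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)  # dict: candidate token -> score
        best = max(scores, key=scores.get)
        if best == end:
            break
        tokens.append(best)
    return tokens[1:]  # drop the start token

# Hypothetical stand-in for the model: scores depend only on the previous token
TOY_SCORES = {
    "<s>": {"a": 0.9, "the": 0.1},
    "a": {"dog": 0.6, "cat": 0.4},
    "dog": {"on": 0.7, "</s>": 0.3},
    "on": {"a": 0.2, "beach": 0.8},
    "beach": {"</s>": 1.0},
}

def toy_model(tokens):
    return TOY_SCORES[tokens[-1]]

print(" ".join(greedy_decode(toy_model)))  # a dog on beach
```

Greedy choices can miss a better overall caption (here "the" is never explored), which is the trade-off beam search addresses at extra compute cost.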
