MediaPipe is an open-source framework developed by Google that offers customizable machine learning (ML) pipelines to process media data such as images, video, and audio. It provides a wide range of pre-trained ML models and components to perform various tasks, including:

1. Pose Estimation: Detecting and tracking human body poses in images and video.
2. Hand Tracking: Identifying and tracking hands in images and video.
3. Object Detection: Detecting and tracking objects in images and video streams.
4. Face Detection: Detecting and tracking faces in images and video.
5. Face Mesh: Estimating facial landmarks in real-time.
6. Holistic: Combining multiple components like face detection, pose estimation, and hand tracking to create holistic pipelines.
7. Selfie Segmentation: Segmenting a person's image from the background in real-time.
8. Hair Segmentation: Segmenting hair from images and video.



In [1]:
pip install mediapipe --user


[notice] A new release of pip is available: 24.1.2 -> 24.2
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.


In [2]:
import mediapipe as mp
import cv2

2024-08-27 10:23:05.240817: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [3]:
mp_facedetection = mp.solutions.face_detection  # face detection solution module
mp_drawings = mp.solutions.drawing_utils  # helpers for drawing annotations

mp_facedetection = mp.solutions.face_detection: This line binds the face detection module from the MediaPipe library to the variable mp_facedetection. The module provides functionality for detecting faces in images or video streams.
mp_drawings = mp.solutions.drawing_utils: This line binds the drawing utilities module from the MediaPipe library to the variable mp_drawings. The module contains functions for drawing annotations, such as bounding boxes or landmarks, on images or video frames.


In [4]:
face_detection = mp_facedetection.FaceDetection()  # created with the default settings

I0000 00:00:1724734421.678471   56678 gl_context.cc:357] GL version: 2.1 (2.1 INTEL-20.7.2), renderer: Intel(R) Iris(TM) Plus Graphics 640


The FaceDetection class in the MediaPipe library uses a pre-trained machine learning model to detect faces in images or video frames. The model has been trained on a large dataset of annotated images, allowing it to learn features and patterns associated with human faces.
The line face_detection = mp_facedetection.FaceDetection() creates an instance of this class. The resulting face_detection object is then used to detect faces in images or video frames.


In [5]:
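video = cv2.VideoCapture(0)  # open the default webcam
while True:
    suc, img = video.read()
    if not suc:
        # No frame could be read (camera unavailable or stream ended)
        break
    # MediaPipe expects RGB input; OpenCV delivers frames in BGR
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    result = face_detection.process(img)
    # Convert back to BGR so OpenCV displays the colors correctly
    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    if result.detections:
        for det in result.detections:
            mp_drawings.draw_detection(img, det)
    cv2.imshow('FACE', img)
    # Press 'q' to leave the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

video.release()
cv2.destroyAllWindows()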

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1724734421.706340   91683 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.


In [11]:
result.detections

[label_id: 0
 score: 0.8885659575462341
 location_data {
   format: RELATIVE_BOUNDING_BOX
   relative_bounding_box {
     xmin: 0.39209404587745667
     ymin: 0.5205647349357605
     width: 0.2354254424571991
     height: 0.41853463649749756
   }
   relative_keypoints {
     x: 0.47817090153694153
     y: 0.6443884968757629
   }
   relative_keypoints {
     x: 0.5718467235565186
     y: 0.6577954888343811
   }
   relative_keypoints {
     x: 0.532436728477478
     y: 0.7609246373176575
   }
   relative_keypoints {
     x: 0.5238144397735596
     y: 0.8360314965248108
   }
   relative_keypoints {
     x: 0.40303340554237366
     y: 0.6522131562232971
   }
   relative_keypoints {
     x: 0.6068379878997803
     y: 0.6754410862922668
   }
 }]
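The six relative_keypoints entries in the output above carry no names. A minimal sketch of labeling them follows, assuming they arrive in the order of MediaPipe's FaceKeyPoint enum (right eye, left eye, nose tip, mouth center, right ear tragion, left ear tragion). That order is not stated in the output itself, and KEYPOINT_NAMES and name_keypoints are hypothetical names chosen for this illustration.

```python
# Assumed order, taken from MediaPipe's FaceKeyPoint enum (not from the output above).
KEYPOINT_NAMES = [
    "right_eye", "left_eye", "nose_tip",
    "mouth_center", "right_ear_tragion", "left_ear_tragion",
]


def name_keypoints(relative_keypoints):
    """Pair each (x, y) keypoint with its assumed name, in order."""
    if len(relative_keypoints) != len(KEYPOINT_NAMES):
        raise ValueError("expected six keypoints")
    return dict(zip(KEYPOINT_NAMES, relative_keypoints))


# Relative (x, y) values copied from the output above.
keypoints = [
    (0.47817090153694153, 0.6443884968757629),
    (0.5718467235565186, 0.6577954888343811),
    (0.532436728477478, 0.7609246373176575),
    (0.5238144397735596, 0.8360314965248108),
    (0.40303340554237366, 0.6522131562232971),
    (0.6068379878997803, 0.6754410862922668),
]

named = name_keypoints(keypoints)
for name, (x, y) in named.items():
    print(f"{name}: x={x:.3f}, y={y:.3f}")
```

Under this assumption the values are at least plausible: the right eye and right ear have smaller x values than their left counterparts, as expected for an unmirrored camera image.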

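face_detection.process(img) runs the detector on the frame and stores the outcome in the variable result. Its detections attribute, shown above, holds one entry per detected face, each with a confidence score, a bounding box, and six keypoints expressed in coordinates relative to the frame size.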
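Because the bounding box in the detection output is relative (values between 0 and 1), it has to be scaled by the frame size before it can be used as pixel coordinates. The sketch below shows one way to do that. The attribute names follow the printed output, while detection_to_pixel_box, make_detection, and the 640x480 frame size are assumptions made for illustration. The stand-in object only mimics the attribute layout so the example runs without a camera.

```python
from types import SimpleNamespace


def detection_to_pixel_box(det, frame_w, frame_h):
    """Return (left, top, right, bottom) in pixels, clamped to the frame."""
    box = det.location_data.relative_bounding_box
    left = max(0, int(box.xmin * frame_w))
    top = max(0, int(box.ymin * frame_h))
    right = min(frame_w, int((box.xmin + box.width) * frame_w))
    bottom = min(frame_h, int((box.ymin + box.height) * frame_h))
    return left, top, right, bottom


def make_detection(xmin, ymin, width, height):
    """Hypothetical stand-in with the same attribute layout as the printed detection."""
    box = SimpleNamespace(xmin=xmin, ymin=ymin, width=width, height=height)
    return SimpleNamespace(location_data=SimpleNamespace(relative_bounding_box=box))


# Bounding box values copied from the output above; the frame size is assumed.
det = make_detection(0.39209404587745667, 0.5205647349357605,
                     0.2354254424571991, 0.41853463649749756)
print(detection_to_pixel_box(det, 640, 480))
```

With a real frame, the width and height would come from img.shape rather than fixed numbers, and the same function could be called on each entry of result.detections.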

Reading a Frame: suc, img = video.read() captures a frame from the video feed. suc is False if no frame could be read.

Color Conversion (BGR to RGB): img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) converts the image from BGR (OpenCV's default) to RGB (required by MediaPipe).

Face Detection: result = face_detection.process(img) processes the image to detect faces.

Color Conversion (RGB to BGR): img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR) converts the image back to BGR for display with OpenCV.
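For a three-channel frame, the two conversions above amount to reversing the channel order of every pixel. The following is a pure-Python sketch of that idea on a tiny nested-list "frame". It is an illustration only, not how OpenCV implements cvtColor, and swap_red_blue is a hypothetical helper name.

```python
def swap_red_blue(frame):
    """Reverse the channel order of every pixel (BGR -> RGB or RGB -> BGR)."""
    return [[pixel[::-1] for pixel in row] for row in frame]


# A 2x2 frame in BGR order: (255, 0, 0) is pure blue in BGR.
bgr_frame = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]

rgb_frame = swap_red_blue(bgr_frame)
print(rgb_frame[0][0])  # (0, 0, 255): still pure blue, now in RGB order

# Applying the swap twice restores the original frame.
print(swap_red_blue(rgb_frame) == bgr_frame)  # True
```

This is why the frame can be converted to RGB for MediaPipe and back to BGR for display without losing information.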

Drawing Face Detections:
If faces are detected (if result.detections:), a loop iterates through each detected face.

mp_drawings.draw_detection(img, det) draws the bounding box and keypoints of the detected face on the image.

Displaying the Image: cv2.imshow('FACE', img) displays the image with the detected faces.

Exit Condition: if cv2.waitKey(1) & 0xFF == ord('q'): breaks the loop and exits if the 'q' key is pressed.


cv2.waitKey(1): Waits for 1 millisecond for a key event.
cv2.waitKey(1) & 0xFF: The & 0xFF operation keeps only the lowest 8 bits of the result, so it falls within the ASCII range.
== ord('q'): Compares the result to the ASCII value of 'q' to check if 'q' was pressed.
If 'q' is pressed, the loop breaks, ending the video capture.
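The masking step can be checked without a camera by feeding plain integers through the same expression. In the sketch below, is_quit_key is a hypothetical helper, and 0x100071 is a made-up raw value that stands for a key code with extra high bits set. The value -1 is what cv2.waitKey returns when no key was pressed.

```python
def is_quit_key(raw_key, quit_char="q"):
    """Keep the lowest 8 bits of the key code and compare with the quit character."""
    return (raw_key & 0xFF) == ord(quit_char)


print(is_quit_key(ord("q")))   # True: plain ASCII code 113
print(is_quit_key(0x100071))   # True: high bits are masked off, leaving 0x71
print(is_quit_key(-1))         # False: -1 & 0xFF is 255, so no key matches
print(is_quit_key(ord("Q")))   # False: the comparison is case-sensitive
```

In Python, & binds more tightly than ==, so the unparenthesized form used in the loop evaluates the mask first as well.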