In [1]:
import mediapipe as mp
import numpy as np

In [2]:
import mediapipe as mp
import cv2

print("✅ MediaPipe works!")

✅ MediaPipe works!


# Stage 1: Basic Info and Basic Code

# What is MediaPipe?
MediaPipe is an open-source framework by Google for building real-time machine learning pipelines, especially for computer vision tasks.

In simple words: MediaPipe helps you detect faces, hands, poses, objects, gestures, etc. in videos or images using pretrained ML models, fast enough to run in real time.

# Why MediaPipe?
| ✅ Feature | 💬 What It Means |
|---|---|
| Pre-trained Models | No need to train; just use and go! |
| Real-time Processing | Detect gestures and poses instantly on a webcam |
| Cross-Platform | Works on Python, Android, iOS, C++, Web |
| Lightweight | Can run on low-end devices like phones |

# What does MediaPipe do?

| 🧩 Module | 💡 What It Does |
|---|---|
| Hands | Detects 21 hand landmarks per hand 🤚 |
| Face Mesh | Detects 468 facial landmarks 😯 |
| Pose | Full-body pose detection 🧍‍♂️ |
| Holistic | Combines Hands + Face + Pose 🧠 |
| Selfie Segmentation | Separates person from background ✂️ |
| Objectron | Detects 3D objects like shoes, cups 🛍️ |

# How It Works
1. Takes input from a webcam or video
2. Passes it through ML models
3. Gives you back keypoints/landmarks

We can use these landmarks to build cool apps like:

 1. Gesture controls
 2. Pose-based games
 3. Virtual makeup
 4. Air drawing 
 5. Sign language recognition
 6. Fitness posture checkers


# Task 1: Real-time Hand Landmark Detection
🧾 What This Will Do:

Open the webcam 📸

Detect our hands ✋

Draw 21 landmark points + hand connections in real-time

In [13]:
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)        #Opens the default webcam (0) for capturing live video frames

hands = mp.solutions.hands.Hands()
draw = mp.solutions.drawing_utils

while True:         # Keeps running frame-by-frame until you press 'p' to quit.
    ret, frame = cap.read() #cap.read() captures one frame from webcam
                           # If it fails (ret == False), we stop the loop
    if not ret:
        break

    img_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) #OpenCV captures in BGR, but MediaPipe needs RGB
                                        #This converts the color format
    result = hands.process(img_rgb)

    if result.multi_hand_landmarks:
        for hand in result.multi_hand_landmarks:
            draw.draw_landmarks(frame, hand, mp.solutions.hands.HAND_CONNECTIONS)



    frame = cv2.flip(frame, 1)  # Mirror the frame horizontally so on-screen left/right match your own movement
    cv2.imshow("Hand Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('p'):     # Checks if you pressed the 'p' key; if yes, breaks the loop
        break

cap.release()
cv2.destroyAllWindows()


mp.solutions is a subpackage inside the MediaPipe package.

It groups the ready-made solutions: hands, face_mesh, pose, holistic, etc.

When you write mp.solutions.hands, you are accessing the hands module.

Each solution module exposes its model as a class (for example mp.solutions.hands.Hands) rather than as plain functions, so you create the object once and call its methods on every frame.

mediapipe/

│

├── solutions/

│ ├── hands.py ← Hand tracking model

│ ├── pose.py ← Full-body pose model

│ ├── face_mesh.py ← Facial landmark model

│ ├── holistic.py ← Combo of all above

│ └── drawing_utils.py← Functions for drawing

mp.solutions.hands: The hand tracking module (stored as mp_hands in the later cells)

hands = mp_hands.Hands(): Initializes the model with default settings (detects up to 2 hands and tracks them across video frames)

mp_draw = mp.solutions.drawing_utils: A helper that draws the landmark points and the connections between them (like skeleton lines)

# results = hands.process(img_rgb)
This line is where MediaPipe does its job.

🔍 Step-by-step Explanation:

img_rgb:

This is the current image frame (from your webcam, for example).

It has been converted to RGB format because MediaPipe expects RGB images, not BGR (which OpenCV uses by default).

hands.process(img_rgb):

MediaPipe looks at the image and tries to find hands in it.

It uses a pre-trained deep learning model to do this.

It analyzes the image and checks: "Are there any hands here? If yes, where exactly?"

results:

The output of the .process() method.

It contains information about what was detected in the image.

🖐️ results.multi_hand_landmarks

If MediaPipe finds hands, this will be a list with one entry per detected hand. If no hand is found, it is None, which is why the code tests it with if before looping.

Each hand has 21 landmarks (important points like fingertips, joints, etc.).

You can loop through this list to draw or analyze hand positions.
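To see why the `if results.multi_hand_landmarks:` check matters, here is a small sketch that runs without a webcam. It uses `SimpleNamespace` objects as stand-ins for the real result object, and the helper name `detected_hands` is made up for this notebook (it is not part of MediaPipe). The values are invented; only the shape of the data follows what the cells above use.

```python
from types import SimpleNamespace

def detected_hands(results):
    """Return the detected hands as a list, or [] when nothing was found."""
    # multi_hand_landmarks is None (not an empty list) when no hand is detected,
    # so "or []" turns it into something that is always safe to loop over.
    return list(results.multi_hand_landmarks or [])

# Stand-in for a frame where no hand was found
no_hands = SimpleNamespace(multi_hand_landmarks=None)

# Stand-in for a frame with one hand: 21 landmarks with made-up values
fake_hand = SimpleNamespace(
    landmark=[SimpleNamespace(x=0.5, y=0.5, z=0.0) for _ in range(21)]
)
one_hand = SimpleNamespace(multi_hand_landmarks=[fake_hand])

print(len(detected_hands(no_hands)))              # 0
print(len(detected_hands(one_hand)))              # 1
print(len(detected_hands(one_hand)[0].landmark))  # 21
```

Looping directly over `None` would raise a `TypeError`, which is the crash the `if` check in the webcam loop avoids.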

    # Loop through the 21 landmarks of one detected hand
    for id, lm in enumerate(hand_landmarks.landmark):
        # lm holds x, y, z; x and y are normalized to the image width and height
        h, w, c = img.shape  # Get image height, width, channels
        cx, cy = int(lm.x * w), int(lm.y * h)  # Convert to pixel values

        print(f"Landmark {id}: (x={cx}, y={cy})")

This code prints the pixel coordinates of all 21 points of the hand.

A breakdown of the code:

hand_landmarks.landmark → List of 21 points on the hand.

enumerate(...) → Gives you both the landmark index (id) and the landmark itself (lm).

lm.x and lm.y → Coordinates in normalized form (0 to 1).

cx, cy → Converted to pixel positions using image width and height.

print(...) → Shows the (x, y) location of each point in the frame.

# Stage 2 :
# Task 1: Extract Hand Landmark Coordinates

In [29]:
import cv2
import mediapipe as mp

# Initialize MediaPipe Hands
mp_hands = mp.solutions.hands
hands = mp_hands.Hands()
mp_draw = mp.solutions.drawing_utils

# Start webcam
cap = cv2.VideoCapture(0)

while True:
    success, img = cap.read()
    if not success:
        break
    img = cv2.flip(img,1)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(img_rgb)

    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            # Draw hand landmarks
            mp_draw.draw_landmarks(img, hand_landmarks, mp_hands.HAND_CONNECTIONS)

            # Extract landmark coordinates
            for id, lm in enumerate(hand_landmarks.landmark):
                h, w, c = img.shape  # Get image dimensions
                cx, cy = int(lm.x * w), int(lm.y * h)
                print(f"Landmark {id}: X={cx}, Y={cy}")

    cv2.imshow("MediaPipe Hands", img)

    if cv2.waitKey(1) & 0xFF == ord('p'):
        break

cap.release()
cv2.destroyAllWindows()


Landmark 0: X=79, Y=327
Landmark 1: X=122, Y=322
Landmark 2: X=164, Y=301
Landmark 3: X=197, Y=280
Landmark 4: X=221, Y=259
Landmark 5: X=142, Y=239
Landmark 6: X=163, Y=199
Landmark 7: X=174, Y=172
Landmark 8: X=183, Y=148
Landmark 9: X=116, Y=225
Landmark 10: X=132, Y=178
Landmark 11: X=142, Y=146
Landmark 12: X=149, Y=120
Landmark 13: X=89, Y=223
Landmark 14: X=100, Y=177
Landmark 15: X=108, Y=147
Landmark 16: X=115, Y=123
Landmark 17: X=61, Y=230
Landmark 18: X=63, Y=195
Landmark 19: X=66, Y=170
Landmark 20: X=70, Y=149
Landmark 0: X=83, Y=320
Landmark 1: X=129, Y=312
Landmark 2: X=172, Y=291
Landmark 3: X=202, Y=268
Landmark 4: X=226, Y=248
Landmark 5: X=149, Y=223
Landmark 6: X=170, Y=181
Landmark 7: X=182, Y=155
Landmark 8: X=191, Y=132
Landmark 9: X=122, Y=211
Landmark 10: X=138, Y=162
Landmark 11: X=146, Y=130
Landmark 12: X=153, Y=104
Landmark 13: X=95, Y=210
Landmark 14: X=108, Y=162
Landmark 15: X=115, Y=132
Landmark 16: X=122, Y=108
Landmark 17: X=67, Y=218
Landmark 18: X=

# What the drawing part of the code above does
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_draw.draw_landmarks(img, hand_landmarks, mp_hands.HAND_CONNECTIONS)

If any hands are detected, the code:

Loops through each detected hand

Draws the 21 landmark points and their connections on the hand

# Extract Pixel Coordinates
# Code
for id, lm in enumerate(hand_landmarks.landmark):
    h, w, c = img.shape
    cx, cy = int(lm.x * w), int(lm.y * h)
    print(f"Landmark {id}: X={cx}, Y={cy}")
# Explanation
for id, lm in enumerate(hand_landmarks.landmark):
    hand_landmarks.landmark is a list containing 21 landmarks for a hand.
    Each lm is one landmark, holding x, y, z values (x and y are normalized to the 0 to 1 range; z is a relative depth measured from the wrist).
    enumerate() is used so we get:
    id → index number of landmark (0–20)
    lm → the landmark object itself (with x, y, z)
h, w, c = img.shape

    img.shape gives:
    h = image height in pixels
    w = image width in pixels
    c = number of color channels (should be 3 → B, G, R)

    Example:

    If the webcam frame is 640×480 with 3 color channels, then img.shape will be (480, 640, 3)

cx, cy = int(lm.x * w), int(lm.y * h)

    lm.x and lm.y are normalized values between 0 and 1

    → 0 = left/top edge, 1 = right/bottom edge (lm.x = 0 → point is at the left edge, lm.x = 1 → point is at the right edge, lm.x = 0.5 → point is in the middle horizontally; the same holds for lm.y from top to bottom)

    Multiply them by the actual image width and height to convert them to pixel positions

    int() drops the fractional part, because pixel coordinates must be whole numbers.

    Example:

    If lm.x = 0.5 and image width is 640:

    cx = int(0.5 * 640) = 320

    Same for cy

    Before multiplying:

    👉 lm.x and lm.y are between 0 and 1

    After multiplying by w and h:

    👉 You get pixel values

    👉 Pixel values are not between 0 and 1 anymore

    👉 They now range from 0 to image width and 0 to image height

    # Why we need pixel values/positions:

    ✅ Because we want to interact with the actual image, for example:

    Drawing a circle at a point

    Drawing lines between points

    We have to use actual pixel positions, since images in OpenCV, PIL, or any imaging library are stored as arrays of pixels addressed by integer coordinates. OpenCV drawing functions take a point as (x, y), while the underlying array is indexed as img[y, x], where:

    x goes from 0 to image width-1

    y goes from 0 to image height-1

    📌 So pixel positions are needed for:

    Accurately placing/drawing things on the actual image

    Interacting with image arrays (since arrays use integer indices)

    Measuring actual pixel distances or areas

    ✅ Model predicts normalized positions

    ✅ We convert them to pixel positions to use on the image

    ✅ Image pixel values stay unchanged unless we intentionally modify them
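The conversion described above can be wrapped in a small helper. This is only a sketch: the function name `to_pixel` is made up for this notebook, and the clamping step is an extra precaution that the cells above do not use. It assumes a landmark can land slightly outside the 0 to 1 range (for example when part of the hand is outside the frame), in which case the plain multiplication would give an index outside the image.

```python
def to_pixel(x_norm, y_norm, width, height):
    """Convert normalized landmark coordinates to integer pixel coordinates."""
    cx = int(x_norm * width)
    cy = int(y_norm * height)
    # Clamp so the result is always a valid index: 0..width-1 and 0..height-1
    cx = min(max(cx, 0), width - 1)
    cy = min(max(cy, 0), height - 1)
    return cx, cy

# A 640x480 frame, as in the example above
print(to_pixel(0.5, 0.5, 640, 480))    # (320, 240)
print(to_pixel(1.0, 1.0, 640, 480))    # (639, 479)
print(to_pixel(-0.1, 0.25, 640, 480))  # (0, 120)
```

Without the clamp, `lm.x = 1.0` would map to x = 640, which is one past the last valid column (639) of a 640-pixel-wide image.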

# Task 2 : Landmark ID Mapping to Hand Anatomy (Mapping)
Understand Landmark IDs

📖 Why This Matters:

Each hand has 21 landmarks — each with a unique ID from 0 to 20.

If we know which ID belongs to which part of the hand, we can build logic like:

“If landmark 8 is above landmark 6 → index finger is up”

“If thumb tip (4) is to the left of thumb MCP (2) → thumb is extended” 
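
To make the ID-to-anatomy mapping concrete, here is a plain-Python lookup table. The names follow those of MediaPipe's `mp.solutions.hands.HandLandmark` enum; the list `HAND_LANDMARK_NAMES` and the helper `landmark_name` are illustrative names for this notebook, not part of MediaPipe.

```python
# Index in the list = landmark ID (0 to 20)
HAND_LANDMARK_NAMES = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]

def landmark_name(landmark_id):
    """Return the anatomical name for a landmark ID between 0 and 20."""
    return HAND_LANDMARK_NAMES[landmark_id]

# The IDs used in the rules above
for landmark_id in (2, 4, 6, 8):
    print(landmark_id, landmark_name(landmark_id))
```

Each finger takes four consecutive IDs, and the tip is always the last of the four, which is why the tips are 4, 8, 12, 16 and 20.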

# Task 3: Build Logic to Detect If a Finger is Up or Down (How to Decide?)

In [41]:
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=1)  # max_num_hands limits how many hands are detected (here: 1)

mp_draw = mp.solutions.drawing_utils

while True:
    success, img = cap.read()
    if not success:  # Stop if the webcam returned no frame
        break
    img = cv2.flip(img, 1)  # Mirror the frame (selfie view)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(img_rgb)
    
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            # Draw landmarks
            mp_draw.draw_landmarks(img, hand_landmarks, mp_hands.HAND_CONNECTIONS)
            # Get landmark positions
            h, w, c = img.shape
            landmarks = []

            for id, lm in enumerate(hand_landmarks.landmark):
                cx, cy = int(lm.x * w), int(lm.y * h)
                landmarks.append((id, cx, cy))

            
            # Now check if Index Finger is up
            index_tip_y = landmarks[8][2]
            index_pip_y = landmarks[6][2]

            if index_tip_y < index_pip_y:
                cv2.putText(img, "Index Finger UP", (10, 50),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            else:
                cv2.putText(img, "Index Finger DOWN", (10, 50),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    
    cv2.imshow("Hand Tracking", img)

    if cv2.waitKey(1) & 0xFF == ord('p'):
        break

cap.release()
cv2.destroyAllWindows()


# Explanation of the code
h, w, c = img.shape
landmarks = []
for id, lm in enumerate(hand_landmarks.landmark):
    cx, cy = int(lm.x * w), int(lm.y * h)
    landmarks.append((id, cx, cy))

    img.shape gives the dimensions of the image, which are used to convert the normalized coordinates to pixels.
    We then create a list called landmarks, which stores the landmark information for the current frame.
    The landmarks list holds a tuple of (ID, X, Y) for each landmark.
    Here:
    id is the landmark ID (like 0 for wrist, 8 for index finger tip, etc.).
    cx is the X position of the landmark in pixels.
    cy is the Y position of the landmark in pixels.


index_tip_y = landmarks[8][2]
index_pip_y = landmarks[6][2]

    The landmarks list lets you access the position of any landmark by its ID (e.g., landmarks[8] for the index finger tip).
    📌 Why landmarks[8][2]? What does the 2 mean?
    Python list/tuple indexing works like this:
    [0] → first value
    [1] → second value
    [2] → third value
    Meaning:
    landmarks[8][0] → 8 (the landmark ID)
    landmarks[8][1] → cx (X position)
    landmarks[8][2] → cy (Y position)

    Here we're grabbing the Y coordinate (vertical position in pixels) of the index finger tip.

    📖 Why do we need the Y position?
    Image Y coordinates grow downward, so a point that is higher on screen has a smaller Y. To check if a finger is up, we test whether the tip's Y is smaller than the middle (PIP) joint's Y.

if index_tip_y < index_pip_y:
    cv2.putText(img, "Index Finger UP", (10, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
else:
    cv2.putText(img, "Index Finger DOWN", (10, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

    This checks whether the tip of the index finger is above its proximal interphalangeal (PIP) joint in the image.
    If it is, the code draws the text "Index Finger UP" on the image at coordinates (10, 50), using the FONT_HERSHEY_SIMPLEX font with a text scale of 1.
    Color is (0, 255, 0) → green (OpenCV colors are in BGR order). Thickness is 2 pixels.
    If the tip is not above the PIP joint (i.e., it's at the same level or lower in the image), the code draws "Index Finger DOWN" at the same position.
    Color is (0, 0, 255) → red. Thickness is 2 pixels.
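
The same comparison can be tried without a webcam. Below is a minimal sketch with made-up pixel values; the function name `is_finger_up` is hypothetical and only wraps the tip-versus-PIP rule used in the cell above. It expects the same list of (id, x, y) tuples that the loop builds.

```python
def is_finger_up(landmarks, tip_id, pip_id):
    """landmarks: list of (id, x, y) pixel tuples, one per landmark."""
    tip_y = landmarks[tip_id][2]
    pip_y = landmarks[pip_id][2]
    # Smaller y means higher up in the image
    return tip_y < pip_y

# Made-up hand: every landmark sits at y=300
landmarks = [(i, 100, 300) for i in range(21)]

landmarks[8] = (8, 100, 200)           # index tip raised above the PIP joint
print(is_finger_up(landmarks, 8, 6))   # True

landmarks[8] = (8, 100, 350)           # index tip folded below the PIP joint
print(is_finger_up(landmarks, 8, 6))   # False
```

Note that this rule assumes the hand is roughly upright in the frame; if the hand points downward, "tip above joint" no longer means the finger is extended.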

# Task 4 :  Detect and track both hands if present

In [45]:
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=2)  # Allow up to 2 hands
mp_draw = mp.solutions.drawing_utils

while True:
    success, img = cap.read()
    if not success:  # Stop if the webcam returned no frame
        break
    img = cv2.flip(img, 1)  # Mirror the frame (selfie view)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(img_rgb)

    if results.multi_hand_landmarks:
        for hand_no, hand_landmarks in enumerate(results.multi_hand_landmarks):
            # Draw landmarks
            mp_draw.draw_landmarks(img, hand_landmarks, mp_hands.HAND_CONNECTIONS)

            # Get image dimensions
            h, w, c = img.shape

            for id, lm in enumerate(hand_landmarks.landmark):
                cx, cy = int(lm.x * w), int(lm.y * h)
                if id == 8:  # Index Finger Tip
                    cv2.circle(img, (cx, cy), 5, (255, 0, 255), cv2.FILLED)
                    cv2.putText(img, f"Hand {hand_no+1} - Index Tip", (cx-50, cy-20),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,255,0), 2)

    cv2.imshow("Multiple Hands", img)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()


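# Task 5: Count Number of Fingers Up in the Right Hand

MediaPipe returns all 21 landmarks for every detected hand.

Instead of checking all 21 landmarks every time, we keep a list of just the fingertip IDs: [4, 8, 12, 16, 20] (thumb, index, middle, ring, pinky).

These are always the tips, and we use them to:

Count how many fingers are raised.

Check gesture conditions.

The thumb check in the code below compares x positions in one fixed direction, so it is only correct for a right hand (palm toward the camera, mirrored frame). The check for the other four fingers works for either hand.

Do finger detection logic like: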

In [51]:
import cv2
import mediapipe as mp

# Initialize video capture
cap = cv2.VideoCapture(0)

# MediaPipe hands module setup
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=2)  # Now detect up to 2 hands
mp_draw = mp.solutions.drawing_utils

# Landmark indices for finger tips
finger_tips_ids = [4, 8, 12, 16, 20]

while True:
    success, img = cap.read()
    if not success:  # Stop if the webcam returned no frame
        break
    img = cv2.flip(img, 1)  # Mirror the frame (selfie view)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(img_rgb)

    total_count = 0  # This will hold the sum of fingers up from all detected hands.

    if results.multi_hand_landmarks:      #If hands are detected, loop over each detected hand.
        for hand_landmarks in results.multi_hand_landmarks:
            lm_list = []
            h, w, c = img.shape
            for id, lm in enumerate(hand_landmarks.landmark):
                lm_list.append((int(lm.x * w), int(lm.y * h)))        #Converts MediaPipe’s normalized landmark positions (0-1) 
                                                         #to pixel coordinates based on image width and height to get Pixel Positions of Landmarks
 
            fingers = []

            # Thumb (check x-axis because it's sideways)
            if lm_list[finger_tips_ids[0]][0] < lm_list[finger_tips_ids[0] - 1][0]:
                fingers.append(1)
            else:
                fingers.append(0)

            # Other four fingers (check y-axis)
            for id in range(1, 5):
                if lm_list[finger_tips_ids[id]][1] < lm_list[finger_tips_ids[id] - 2][1]:
                    fingers.append(1)
                else:
                    fingers.append(0)

            total_count += fingers.count(1) #Counts how many 1s are in the fingers list (fingers up)
                                             #and adds to the overall total_count.

            # Draw landmarks and connections
            mp_draw.draw_landmarks(img, hand_landmarks, mp_hands.HAND_CONNECTIONS)

    # Display the total count on screen
    cv2.rectangle(img, (20, 300), (270, 425), (0, 255, 0), cv2.FILLED) #Draws a green rectangle
    cv2.putText(img, str(total_count), (45, 400), cv2.FONT_HERSHEY_SIMPLEX,
                5, (255, 0, 0  ), 10) #Prints the current total finger count in large blue text.

    cv2.imshow("Finger Counter", img)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()


# 🤔 Why do we create the fingers list?
We want to store the state (up/down) of each of the 5 fingers, one by one, for the hand currently being analyzed.

Each value in this list will be either:

1 → finger is up ✅

0 → finger is down ❌

After working out whether a finger is up (based on tip position vs. joint), we append the result (0 or 1) to this list.
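
Here is a small offline sketch of how the fingers list gets filled and counted, using made-up pixel positions instead of a webcam. The function name `fingers_up` and the constant `FINGER_TIP_IDS` are illustrative; the comparisons are the same as in the Task 5 cell, including its assumption that the thumb rule is for a right hand in a mirrored frame.

```python
FINGER_TIP_IDS = [4, 8, 12, 16, 20]  # thumb, index, middle, ring, pinky

def fingers_up(lm_list):
    """lm_list: 21 (x, y) pixel tuples. Returns five 0/1 flags, thumb first."""
    fingers = []

    # Thumb: compare x of the tip (4) with the joint just below it (3).
    # This direction assumes a right hand, palm toward the camera, mirrored frame.
    fingers.append(1 if lm_list[4][0] < lm_list[3][0] else 0)

    # Other four fingers: tip above the PIP joint (two IDs below the tip)
    for tip_id in FINGER_TIP_IDS[1:]:
        fingers.append(1 if lm_list[tip_id][1] < lm_list[tip_id - 2][1] else 0)

    return fingers

# Made-up hand: every point at (200, 300), then raise the index and middle tips
lm_list = [(200, 300)] * 21
lm_list[8] = (200, 150)
lm_list[12] = (200, 140)

state = fingers_up(lm_list)
print(state)           # [0, 1, 1, 0, 0]
print(state.count(1))  # 2
```

Keeping the per-finger flags (rather than only a running total) also makes it possible to recognise specific gestures later, for example `[0, 1, 1, 0, 0]` for a "peace" sign.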

In [54]:
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=2)  # Allow up to 2 hands
mp_draw = mp.solutions.drawing_utils

while True:
    success, img = cap.read()
    if not success:  # Stop if the webcam returned no frame
        break
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(img_rgb)

    if results.multi_hand_landmarks:
        for hand_no, hand_landmarks in enumerate(results.multi_hand_landmarks):
            # Draw landmarks
            mp_draw.draw_landmarks(img, hand_landmarks, mp_hands.HAND_CONNECTIONS)

            # Get image dimensions
            h, w, c = img.shape

            for id, lm in enumerate(hand_landmarks.landmark):
                cx, cy = int(lm.x * w), int(lm.y * h)
                if id == 8:  # Index Finger Tip
                    cv2.circle(img, (cx, cy), 5, (255, 0, 255), cv2.FILLED)
                    cv2.putText(img, f"Hand {hand_no+1} - Index Tip", (cx-50, cy-20),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,255,0), 2)

    cv2.imshow("Multiple Hands", img)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()


# Task 6 : Updated code of task 5 with left/right detection

In [57]:
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=2)
mp_draw = mp.solutions.drawing_utils

finger_tips_ids = [4, 8, 12, 16, 20]

while True:
    success, img = cap.read()
    if not success:  # Stop if the webcam returned no frame
        break
    img = cv2.flip(img, 1)  # Mirror the frame (selfie view)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(img_rgb)

    total_count = 0

    if results.multi_hand_landmarks:
        for hand_index, hand_landmarks in enumerate(results.multi_hand_landmarks):
            lm_list = []
            h, w, c = img.shape
            for id, lm in enumerate(hand_landmarks.landmark):
                lm_list.append((int(lm.x * w), int(lm.y * h)))

            fingers = []

            # Get handedness: "Right" or "Left"
            hand_label = results.multi_handedness[hand_index].classification[0].label

            # Thumb logic depends on hand
            if hand_label == "Right":
                if lm_list[finger_tips_ids[0]][0] < lm_list[finger_tips_ids[0] - 1][0]:
                    fingers.append(1)
                else:
                    fingers.append(0)
            else:  # Left hand
                if lm_list[finger_tips_ids[0]][0] > lm_list[finger_tips_ids[0] - 1][0]:
                    fingers.append(1)
                else:
                    fingers.append(0)

            # Other four fingers (common for both hands)
            for id in range(1, 5):
                if lm_list[finger_tips_ids[id]][1] < lm_list[finger_tips_ids[id] - 2][1]:
                    fingers.append(1)
                else:
                    fingers.append(0)

            total_count += fingers.count(1)

            # Draw landmarks
            mp_draw.draw_landmarks(img, hand_landmarks, mp_hands.HAND_CONNECTIONS)

    # Display total count
    cv2.rectangle(img, (20, 300), (270, 425), (0, 255, 0), cv2.FILLED)
    cv2.putText(img, str(total_count), (45, 400), cv2.FONT_HERSHEY_SIMPLEX,
                5, (255, 0, 0), 10)

    cv2.imshow("Finger Counter", img)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()


hand_label = results.multi_handedness[hand_index].classification[0].label

    results.multi_handedness:- gives you a list with the handedness of each detected hand (i.e. whether it's a left or right hand). The code relies on it being in the same order as results.multi_hand_landmarks.

    hand_index:- points to a specific hand from the detected list (like the first or second hand detected in the frame).

    classification[0].label:- accesses the classification result for that hand, which will be either "Left" or "Right".

MediaPipe predicts handedness with its internal trained model rather than with a hand-written rule. The label assumes the input image is mirrored (selfie view), which is why the frame is flipped with cv2.flip(img, 1) before processing; without the flip, the labels would need to be swapped.



# Detailed 

1.Hand Detection
    First, MediaPipe's palm detection model finds potential hand regions in the image (it's a lightweight detector designed for speed).

2.Hand Landmark Model
    Once a hand is detected, another model runs inside that region to identify 21 hand landmarks (key points like fingertips, joints, wrist etc.) in 3D (x, y, z coordinates).
    0: Wrist
    1-4: Thumb points
    5-8: Index finger points
    9-12: Middle finger points
    13-16: Ring finger points
    17-20: Pinky finger points

3.Handedness Classification
    Together with the 21 landmarks, the landmark model also outputs a left/right classification for the hand.


📏 How Does It Decide Left or Right?

The exact cues are learned by the model and are not exposed through the API. Intuitively, the spatial arrangement of the hand helps, for instance which side of the palm the thumb is on relative to the index and middle fingers.

For example, for a palm facing the camera in a mirrored frame:

If the thumb is on the left side of the detected hand region → it's likely a Right hand.

If the thumb is on the right side of the detected hand region → it's likely a Left hand.

This is done internally by a trained classification model that has learned to recognize these patterns from labeled training images.


Along with the label ('Left' or 'Right'), MediaPipe also provides a confidence score — how sure the model is about its decision.

results.multi_handedness[hand_index].classification[0].score
 Example: 0.95 (95% confidence)
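
The label and the score can be combined in a small offline sketch. Both helper names (`thumb_is_open`, `trusted_label`) and the 0.8 threshold are assumptions made for illustration; they are not MediaPipe functions or recommended values. The thumb comparison is the same one used in the Task 6 cell, and the pixel values are made up.

```python
def thumb_is_open(lm_list, hand_label):
    """lm_list: 21 (x, y) pixel tuples from a mirrored frame.
    hand_label: "Left" or "Right", as read from multi_handedness."""
    tip_x = lm_list[4][0]  # thumb tip
    ip_x = lm_list[3][0]   # joint just below the tip
    if hand_label == "Right":
        return tip_x < ip_x  # right thumb opens toward smaller x
    return tip_x > ip_x      # left thumb opens toward larger x

def trusted_label(label, score, min_score=0.8):
    """Return the label only if the model is confident enough, else None."""
    # min_score=0.8 is an arbitrary illustrative threshold
    return label if score >= min_score else None

# Made-up hand: every point at (200, 300), thumb tip moved left to x=150
lm_list = [(200, 300)] * 21
lm_list[4] = (150, 300)

print(thumb_is_open(lm_list, "Right"))  # True
print(thumb_is_open(lm_list, "Left"))   # False
print(trusted_label("Right", 0.95))     # Right
print(trusted_label("Left", 0.55))      # None
```

One possible use is to skip the thumb for a frame when `trusted_label` returns `None`, so a shaky left/right guess does not flip the finger count.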

# Task 7: Measure Distance Between Two Landmarks

In [61]:
import cv2
import mediapipe as mp
import math

mp_hands = mp.solutions.hands
hands = mp_hands.Hands()
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)

while True:
    success, img = cap.read()
    if not success:  # Stop if the webcam returned no frame
        break
    img = cv2.flip(img, 1)  # Mirror the frame (selfie view)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(img_rgb)

    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_draw.draw_landmarks(img, hand_landmarks, mp_hands.HAND_CONNECTIONS)

            # Get height, width
            h, w, c = img.shape

            # Get landmark positions
            x1, y1 = int(hand_landmarks.landmark[4].x * w), int(hand_landmarks.landmark[4].y * h)
            x2, y2 = int(hand_landmarks.landmark[8].x * w), int(hand_landmarks.landmark[8].y * h)

            # Draw circles at points
            cv2.circle(img, (x1, y1), 10, (255, 0, 0), cv2.FILLED)
            cv2.circle(img, (x2, y2), 10, (255, 0, 0), cv2.FILLED)

            # Draws a green line connecting both tips.
            cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 3)

            # Calculate distance
            distance = int(math.hypot(x2 - x1, y2 - y1))
            print("Distance:", distance)

            # Show distance on screen
            cv2.putText(img, f'{distance}', ((x1 + x2)//2, (y1 + y2)//2),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

    cv2.imshow("Hand Tracking", img)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()


Distance: 138
Distance: 118
Distance: 125
Distance: 126
Distance: 129
Distance: 131
Distance: 134
Distance: 129
Distance: 128
Distance: 36
Distance: 39
Distance: 45
Distance: 36
Distance: 33
Distance: 33
Distance: 34
Distance: 31
Distance: 33
Distance: 39
Distance: 41
Distance: 58
Distance: 64
Distance: 116
Distance: 129
Distance: 132
Distance: 143
Distance: 141
Distance: 141
Distance: 141
Distance: 140
Distance: 138
Distance: 139
Distance: 127
Distance: 124
Distance: 125
Distance: 123
Distance: 123
Distance: 123
Distance: 123
Distance: 129
Distance: 130
Distance: 131
Distance: 125
Distance: 125
Distance: 126
Distance: 126
Distance: 127
Distance: 129
Distance: 128
Distance: 130
Distance: 123
Distance: 122
Distance: 123
Distance: 102
Distance: 25
Distance: 15
Distance: 17
Distance: 15
Distance: 14
Distance: 15
Distance: 12
Distance: 14
Distance: 10
Distance: 14
Distance: 23
Distance: 69
Distance: 105
Distance: 109
Distance: 164
Distance: 176
Distance: 176
Distance: 177
Distance: 186
Dis