# AI Virtual Mouse 

In this project, we are going to create an AI-based mouse controller. We will first detect the hand landmarks, then move the cursor and click based on those points. We will also apply smoothing to make it more usable.

I have created a Python file called `HandTrackingModule.py` that already provides useful methods such as `fingersUp()` and `findDistance()`, which make this new project much easier to build.

For this we will need a few libraries/modules:

- mediapipe
- numpy
- HandTrackingModule (the one we created)
- OpenCV
- time
- autopy: a Python library for automating tasks by controlling the keyboard, mouse, and screen. It simulates user input, which makes it useful for GUI automation, testing, and even building bots.

We will import all of them, but first autopy has to be installed:

In [1]:
!pip install autopy

Collecting autopy
  Using cached autopy-4.0.0.tar.gz (20 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'error'


  error: subprocess-exited-with-error
  
  python setup.py egg_info did not run successfully.
  exit code: 1
  
  [6 lines of output]
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "C:\Users\User\AppData\Local\Temp\pip-install-sps3kris\autopy_0e3aef85221d4aea94dfd151b3f796eb\setup.py", line 8, in <module>
      from setuptools_rust import Binding, RustExtension
  ModuleNotFoundError: No module named 'setuptools_rust'
  [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

Encountered error while generating package metadata.

See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.


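The installation fails: the log shows pip falling back to a source build (`autopy-4.0.0.tar.gz`) that stops because `setuptools_rust` is not installed. As far as I can tell, autopy only supports older Python versions such as 3.8, and if we downgrade Python some other packages might stop working. So instead we will use an alternative, `PyAutoGUI`, which can be installed with: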

```bash
pip install pyautogui
```

In [1]:
import cv2
import numpy as np
import HandTrackingModule as htm
import time
import pyautogui
from pynput.keyboard import Controller, Key

Okay, so the first thing we will do is open the webcam with OpenCV to check that it works.

In [2]:
cap = cv2.VideoCapture(1)

Second, we need a fixed frame width and height rather than the camera's default, so we will set them with `cap.set()`.

`cap.set(propId, value)` sets a property of the video capture object `cap`. Take `cap.set(3, 450)` as an example:

- `3`: the property identifier. OpenCV identifies properties by integer values. `3` corresponds to `cv2.CAP_PROP_FRAME_WIDTH` (`CV_CAP_PROP_FRAME_WIDTH` in the old C API), the width of the captured frames. The height is `4`, `cv2.CAP_PROP_FRAME_HEIGHT`.
- `450`: the value assigned to the property. In this case, it sets the frame width to 450 pixels.

`cap.set()` adjusts the resolution of the frames captured from the camera. If the camera does not support the requested resolution, the setting may not take effect.

In [3]:
WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

True
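
Since a requested resolution may not take effect, it can help to read the values back with `cap.get()`. The sketch below is only an illustration: `set_resolution` and `FakeCapture` are hypothetical names, and `FakeCapture` mimics just the `set`/`get` methods of `cv2.VideoCapture` so the snippet runs without a camera.

```python
# Same integer values as cv2.CAP_PROP_FRAME_WIDTH / cv2.CAP_PROP_FRAME_HEIGHT.
CAP_PROP_FRAME_WIDTH = 3
CAP_PROP_FRAME_HEIGHT = 4


def set_resolution(cap, width, height):
    """Request a resolution and return the (width, height) the capture reports."""
    cap.set(CAP_PROP_FRAME_WIDTH, width)
    cap.set(CAP_PROP_FRAME_HEIGHT, height)
    return int(cap.get(CAP_PROP_FRAME_WIDTH)), int(cap.get(CAP_PROP_FRAME_HEIGHT))


class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture that only supports 640x480."""

    def __init__(self):
        self._props = {CAP_PROP_FRAME_WIDTH: 640.0, CAP_PROP_FRAME_HEIGHT: 480.0}

    def set(self, prop_id, value):
        return True  # accepts the call but keeps its native resolution

    def get(self, prop_id):
        return self._props[prop_id]


actual = set_resolution(FakeCapture(), 1920, 1080)
print(actual)  # (640, 480): the request was not honoured
```

With a real camera you would pass `cv2.VideoCapture(1)` instead of `FakeCapture()` and compare the returned size with what you asked for.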

In [8]:
import cv2

# Try camera indices 0 through 9
for i in range(10):
    cap = cv2.VideoCapture(i)
    if cap.isOpened():
        print(f"Camera found at index {i}")
        cap.release()  # Release the camera
    else:
        print(f"No camera at index {i}")


Camera found at index 0
Camera found at index 1
No camera at index 2
No camera at index 3
No camera at index 4
No camera at index 5
No camera at index 6
No camera at index 7
No camera at index 8
No camera at index 9


Now we will write the code for capturing frames from the webcam.

In [9]:
cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    # Read a frame (image) from the video capture.
    # success is True if the frame was read correctly; frame is the actual image.
    success, frame = cap.read()
    if not success:
        break  # no frame available, so stop instead of crashing in imshow

    # Display the captured frame in a window titled "image".
    cv2.imshow("image", frame)

    # Resize the window
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)

    # cv2.waitKey(10) waits up to 10 milliseconds for a key press.
    # & 0xFF keeps only the low 8 bits of the result, because some systems
    # may return more than 8 bits.
    # == ord("q") checks whether the pressed key is "q"; ord("q") gives the
    # ASCII value of the character "q".
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break  # "q" was pressed, so exit the loop
        
cap.release()
cv2.destroyAllWindows()
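
To see why the `& 0xFF` mask matters, here is a small pure-Python illustration. The `raw_key` value is a made-up example of a key code that carries extra high bits; it is not output captured from OpenCV.

```python
Q_KEY = ord("q")  # 113

# Hypothetical raw return value: the code for "q" with an extra high bit set.
raw_key = 0x100000 | Q_KEY

print(raw_key == Q_KEY)         # False: the high bit gets in the way
print(raw_key & 0xFF == Q_KEY)  # True: the mask keeps only the low 8 bits

# cv2.waitKey returns -1 when no key was pressed; masked, that becomes 255.
print(-1 & 0xFF)                # 255
```

In Python, `&` binds tighter than `==`, so `cv2.waitKey(10) & 0xFF == ord("q")` is evaluated as `(cv2.waitKey(10) & 0xFF) == ord("q")`.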

So that is all good!

Next we can add our detector for hand tracking. First, let's go over the steps we will take to build this project:

1. **Find hand landmarks:** This is the first step.
2. **Get the tip of the index and middle fingers:** The idea is that if only the index finger is up, the mouse cursor moves (**moving mode**). If the middle finger is also up, we are in **clicking mode**: the cursor no longer moves, and we check the distance between the two fingers. If that distance is less than a certain value we detect a **click**, so you click by bringing your fingers together. To move the cursor again, put your middle finger down to return to moving mode.
3. **Check which fingers are up:** Once we have the tips of both fingers, we check which fingers are up.
4. **Only index finger (moving mode):** Based on that information, we check whether we are in moving mode (only the index finger up).
5. **Convert coordinates:** In moving mode we convert the coordinates. Why? The webcam gives us values in a range of, say, 640 by 480, while the screen may be Full HD at 1920 by 1080 (as it was for the author of the original tutorial), so we need to convert the coordinates to get the correct position.
6. **Smoothen values:** We then smooth the values so that the cursor is not jittery and does not flicker a lot.
7. **Move mouse:** Once the values are smoothed, we can simply move the mouse.
8. **Both index and middle fingers up (clicking mode):** We check whether we are in clicking mode, which is when both fingers are up.
9. **Find distance between fingers:** We find the distance between the two fingers.
10. **Click mouse if distance is short:** If the distance is short, we click.
11. **Frame rate:** We check the frame rate. Steps 11 and 12 are pretty easy.
12. **Display:** We have already done this step, since it just displays/renders what is happening.
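
The mode decision in steps 4 and 8 can be sketched as a small function. This is only an illustration under the assumption that the list returned by `fingersUp()` has the index finger at position 1 and the middle finger at position 2; `select_mode` is a hypothetical helper, not part of `HandTrackingModule`.

```python
def select_mode(fingers):
    """Return "move", "click" or "idle" from a fingersUp()-style list.

    Assumes fingers[1] is the index finger and fingers[2] the middle finger,
    with 1 meaning "up" and 0 meaning "down".
    """
    index_up = fingers[1] == 1
    middle_up = fingers[2] == 1
    if index_up and not middle_up:
        return "move"   # steps 4-7: convert, smooth, move the cursor
    if index_up and middle_up:
        return "click"  # steps 8-10: measure the distance, click if short
    return "idle"       # any other gesture is ignored


print(select_mode([0, 1, 0, 0, 0]))  # move
print(select_mode([0, 1, 1, 0, 0]))  # click
print(select_mode([0, 0, 0, 0, 0]))  # idle
```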

These steps might seem like a lot, but some of them are easy and several are a single line, so don't worry.

We can start with the frame rate, since it is very simple. We compute it with the `time` module and draw it on the frame using `cv2.putText()`, which takes the `frame`, the text (in our case the `fps`), the position coordinates, the font (`cv2.FONT_HERSHEY_PLAIN`), the font scale (`3`), the color, and finally the thickness (also `3`).

In [5]:
previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(
        frame, # image to put text on
        str(int(fps)), # the text
        (20, 50), # position
        cv2.FONT_HERSHEY_PLAIN, # font
        3, # scale/size of font
        (255, 0, 0), # color
        3 # thickness
    )
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()

If you run the cell above, you should see the FPS drawn on your webcam feed.
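
The cell computes `1/(current_time - previous_time)` with `previous_time` starting at 0, so the first value is meaningless, and a zero time difference would raise `ZeroDivisionError`. A slightly more defensive variant might look like the sketch below; `FpsCounter` is a hypothetical helper, and timestamps are passed in explicitly so it can be tried without a camera.

```python
class FpsCounter:
    """Frames per second from the time between consecutive ticks."""

    def __init__(self):
        self.previous_time = None

    def tick(self, current_time):
        """Return the FPS for this frame, or 0.0 when it cannot be computed."""
        previous, self.previous_time = self.previous_time, current_time
        if previous is None or current_time <= previous:
            return 0.0  # first frame, or no measurable time elapsed
        return 1.0 / (current_time - previous)


counter = FpsCounter()
print(counter.tick(10.00))         # 0.0 (first frame)
print(round(counter.tick(10.04)))  # 25
print(counter.tick(10.04))         # 0.0 (no time elapsed)
```

Inside the capture loop this would be called as `fps = counter.tick(time.time())`.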

Now that we have the FPS and are displaying it, we can work through the rest of the steps.

First we have to get the landmarks. To do that, we declare the detector above the loop by creating a detector object from the `HandTrackingModule` module.

The detector accepts options such as the maximum number of hands. We expect only one hand, so we pass `maxHands=1`.

In [4]:
detector = htm.handDetector(
    maxHands=1
)

Then, inside the loop, we apply the detector's `findHands()` to the frame.

Next we find the position of the hand with:

        lmlist, bbox = detector.findPosition(frame)

In [7]:
previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    # 1. find hand landmarks
    frame = detector.findHands(frame) 
    lmlist, bbox = detector.findPosition(frame)
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(frame, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()



When you run the code above, it should start detecting your hand and draw the bounding box, the fingers, and the landmarks. That completes step 1.

Next we check that the length of `lmlist` is not 0, and if so we get the tip information: `x1` and `y1` are the coordinates of the index fingertip.

We do the same for the middle finger and store the result in `x2` and `y2`.

These give us the coordinates of the index and middle fingertips.

We do not need to draw anything at this point; we can just print the values with `print(x1, y1, x2, y2)`.

In [8]:
previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    # 1. find hand landmarks
    frame = detector.findHands(frame) 
    lmlist, bbox = detector.findPosition(frame)
    
    # 2. Get the tip of the index and middle fingers
    if len(lmlist) != 0:
        x1, y1 = lmlist[8][1:]
        x2, y2 = lmlist[12][1:]
        print(x1, y1, x2, y2)
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(frame, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()

611 386 565 385
575 308 525 299
574 292 523 278
529 185 486 184
532 166 467 132
478 103 409 74
443 72 380 50
426 58 369 39
424 59 367 41
420 62 363 46
418 63 360 48
414 68 358 51
411 71 356 54
412 75 360 56
415 76 361 57
416 76 358 57
415 76 346 73
407 83 369 173
408 84 375 211
415 87 394 266
417 89 396 269
418 89 393 274
417 90 393 277
417 90 391 277
418 89 390 280
417 88 390 280
420 89 394 279
423 89 392 274
434 89 388 251
452 95 379 141
453 111 378 80
453 115 380 78
448 133 394 78
454 153 406 86
455 193 428 141
455 198 430 159
474 197 445 170
465 211 452 130
470 228 448 93
474 222 444 87
474 223 443 82
479 218 446 79
465 217 460 83
497 136 460 73
498 132 458 71
500 118 458 74
520 118 472 80
570 155 523 114
896 524 926 573


You can see above that it prints the fingertip coordinates. Put either finger up or down, or both, and you will see that it keeps working.

That completes step 2.

Now we move on to step 3, checking which fingers are up. This is extremely simple because the `HandTrackingModule` module already has a method named `fingersUp()`. All we have to do is call it and print the result.

In [9]:
previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    # 1. find hand landmarks
    frame = detector.findHands(frame) 
    lmlist, bbox = detector.findPosition(frame)
    
    # 2. Get the tip of the index and middle fingers
    if len(lmlist) != 0:
        x1, y1 = lmlist[8][1:]
        x2, y2 = lmlist[12][1:]
        # print(x1, y1, x2, y2)
        
        # 3. Check which fingers are up
        fingers = detector.fingersUp()
        print(fingers)
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(frame, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()

[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[0, 1, 1, 1, 1]
[0, 1, 0, 0, 0]
[0, 1, 0, 0, 0]
[0, 1, 0, 0, 0]
[0, 1, 0, 0, 0]
[0, 1, 0, 0, 0]
[0, 1, 0, 0, 0]
[0, 1, 0, 0, 0]
[0, 1, 0, 0, 0]
[0, 1, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 1, 0, 0]
[0, 0, 1, 0, 0]
[0, 0, 1, 0, 0]
[0, 0, 1, 0, 0]
[0, 0, 1, 0, 0]
[0, 0, 1, 0, 0]
[0, 0, 1, 0, 0]
[0, 0, 1, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 1, 1, 0]
[0, 0, 1, 1, 1]
[0, 0, 1, 1, 1]
[0, 0, 1

When you run the code above, you will see a list of five 0s and 1s, one flag per finger. If you lift a finger its entry becomes 1, otherwise it is 0, and the same goes for the other fingers. It's pretty cool.
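
The source of `fingersUp()` is not shown here, so the following is only a guess at how such a method is typically written, not the module's actual code. It assumes `lmlist` rows have the form `[id, x, y]` (consistent with `lmlist[8][1:]` above) and uses the MediaPipe fingertip ids 4, 8, 12, 16 and 20; `fingers_up` is a hypothetical name.

```python
TIP_IDS = [4, 8, 12, 16, 20]  # MediaPipe Hands fingertip landmark ids


def fingers_up(lmlist):
    """Return five 0/1 flags (thumb first) from [id, x, y] landmark rows."""
    fingers = []
    # Thumb: compare x of the tip with the joint below it. Which direction
    # counts as "up" depends on the hand and on mirroring (assumption here).
    fingers.append(1 if lmlist[TIP_IDS[0]][1] > lmlist[TIP_IDS[0] - 1][1] else 0)
    # Other fingers: up when the tip is above the joint two landmarks below.
    # Image y grows downwards, so "above" means a smaller y.
    for tip in TIP_IDS[1:]:
        fingers.append(1 if lmlist[tip][2] < lmlist[tip - 2][2] else 0)
    return fingers


# Synthetic hand: every landmark at y=300, then raise only the index fingertip.
lmlist = [[i, 200, 300] for i in range(21)]
lmlist[8][2] = 150
print(fingers_up(lmlist))  # [0, 1, 0, 0, 0]
```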

Now let's go to step 4, where we check whether only the index finger is up. We test that `fingers[1]` (the index finger) equals 1 (up) and `fingers[2]` (the middle finger) equals 0 (down).

When the index finger is up and the middle finger is down we are in moving mode, so we need to track where the fingertip is and send those points to the mouse cursor.

Before that, we need step 5 to convert the coordinates from one range to another using `np.interp()`. We convert `x1` from the range 0 to the webcam width into the range 0 to the screen width, do the same for the height, and store the results in `x3` and `y3`.

We don't have the width and height of our screen yet, so to get the exact values we read them with `pyautogui`.

In [5]:
# getting screen width and height
width_screen, height_screen =  pyautogui.size()
width_screen, height_screen

(1366, 768)
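
To make the conversion concrete, here is a pure-Python sketch of the linear mapping that `np.interp()` performs for a two-point range. `map_range` is a hypothetical helper written for illustration; the actual code uses `np.interp()`.

```python
def map_range(value, in_min, in_max, out_min, out_max):
    """Linearly map value from [in_min, in_max] to [out_min, out_max].

    Inputs outside the input range are clamped to the nearest end, which is
    also what np.interp does by default.
    """
    value = min(max(value, in_min), in_max)
    ratio = (value - in_min) / (in_max - in_min)
    return out_min + ratio * (out_max - out_min)


WIDTH_CAM, HEIGHT_CAM = 640, 480
width_screen, height_screen = 1366, 768  # the values printed above

print(map_range(320, 0, WIDTH_CAM, 0, width_screen))    # 683.0
print(map_range(480, 0, HEIGHT_CAM, 0, height_screen))  # 768.0
print(map_range(700, 0, WIDTH_CAM, 0, width_screen))    # 1366.0 (clamped)
```

A fingertip in the middle of the camera frame therefore lands in the middle of the screen, and anything beyond the frame edge stays pinned to the screen edge.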

Now that we have these values, we can continue:

In [3]:
detector = htm.handDetector(
    maxHands=1
)

# getting screen width and height
width_screen, height_screen =  pyautogui.size()
width_screen, height_screen
previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    # 1. find hand landmarks
    frame = detector.findHands(frame) 
    lmlist, bbox = detector.findPosition(frame)
    
    # 2. Get the tip of the index and middle fingers
    if len(lmlist) != 0:
        x1, y1 = lmlist[8][1:]
        x2, y2 = lmlist[12][1:]
        # print(x1, y1, x2, y2)
        
        # 3. Check which fingers are up
        fingers = detector.fingersUp()
        
        # 4. Only Index Finger : Moving Mode
        if fingers[1]==1  and fingers[2]==0:
            
            # 5. Convert Coordinates
            x3 = np.interp(x1, (0, WIDTH_CAM), (0, width_screen))
            y3 = np.interp(y1, (0, HEIGHT_CAM), (0, height_screen))
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(frame, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()



`x3` and `y3` are now the converted points, and we will send these values to the mouse. We will do step 6, smoothing the values, later, once we have seen what the raw result looks like. For now we do step 7 using **pyautogui**:

        # Move the mouse to the (x3, y3) coordinates
        pyautogui.moveTo(x3, y3)
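
Step 6 is left for later in this walkthrough. As a preview only, one common way to smooth a jittery position is to move just a fraction of the way toward the new target on every frame. This is an assumption about the general technique, not necessarily the method used later; `smooth` and `smoothening` are hypothetical names.

```python
def smooth(previous, target, smoothening):
    """Move a fraction 1/smoothening of the way from previous to target."""
    return previous + (target - previous) / smoothening


# Hypothetical jittery x positions; the cursor starts at 0.
cursor_x = 0.0
for target_x in [100, 104, 98, 102]:
    cursor_x = smooth(cursor_x, target_x, smoothening=2)
    print(cursor_x)  # 50.0, 77.0, 87.5, 94.75
```

A larger `smoothening` value gives a steadier but slower cursor, and a value of 1 disables smoothing entirely.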

In [5]:
detector = htm.handDetector(
    maxHands=1
)

# getting screen width and height
width_screen, height_screen =  pyautogui.size()
width_screen, height_screen

previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    # 1. find hand landmarks
    frame = detector.findHands(frame) 
    lmlist, bbox = detector.findPosition(frame)
    
    # 2. Get the tip of the index and middle fingers
    if len(lmlist) != 0:
        x1, y1 = lmlist[8][1:]
        x2, y2 = lmlist[12][1:]
        # print(x1, y1, x2, y2)
        
        # 3. Check which fingers are up
        fingers = detector.fingersUp()
        
        # 4. Only Index Finger : Moving Mode
        if fingers[1]==1  and fingers[2]==0:
            
            # 5. Convert Coordinates
            x3 = np.interp(x1, (0, WIDTH_CAM), (0, width_screen))
            y3 = np.interp(y1, (0, HEIGHT_CAM), (0, height_screen))
            
            # 6. Smoothen Values
            
            # 7. Move Mouse
            pyautogui.moveTo(x3, y3)
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(frame, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()

If you run the code above, you can now move the mouse with your index finger, which is really cool!

You might find that when you move your hand to the left the cursor goes right, which is very annoying. To fix it, we simply flip the x coordinate.

To flip it we only need to mirror along the width: subtract `x3` from the width of the screen.

        pyautogui.moveTo(width_screen - x3, y3)
          
Now you can run it and try again. In my setup the direction was already correct, so I did not need the flip.

That is good enough. Next, we can draw a circle on the tip of the index finger so that we know when we are moving the mouse.

We do this whenever we are in moving mode, using:

```python
cv2.circle(
    frame,
    (x1, y1),       # center: where we want to draw
    15,             # radius
    (255, 0, 255),  # color of the circle
    cv2.FILLED      # fill the circle
)
```

Now when your index finger is up, meaning you are in moving mode, it should draw a purple circle on the fingertip.

In [6]:
# Disable fail-safe (not recommended)
pyautogui.FAILSAFE = False

detector = htm.handDetector(
    maxHands=1
)

# getting screen width and height
width_screen, height_screen =  pyautogui.size()
width_screen, height_screen

previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    # 1. find hand landmarks
    frame = detector.findHands(frame) 
    lmlist, bbox = detector.findPosition(frame)
    
    # 2. Get the tip of the index and middle fingers
    if len(lmlist) != 0:
        x1, y1 = lmlist[8][1:]
        x2, y2 = lmlist[12][1:]
        # print(x1, y1, x2, y2)
        
        # 3. Check which fingers are up
        fingers = detector.fingersUp()
        
        # 4. Only Index Finger : Moving Mode
        if fingers[1]==1  and fingers[2]==0:
            
            # 5. Convert Coordinates
            x3 = np.interp(x1, (0, WIDTH_CAM), (0, width_screen))
            y3 = np.interp(y1, (0, HEIGHT_CAM), (0, height_screen))
            
            # 6. Smoothen Values
            
            # 7. Move Mouse
            pyautogui.moveTo(x3, y3)
            cv2.circle(frame, (x1, y1), 15, (255,0,255), cv2.FILLED)
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(frame, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()

If you run into a `FailSafeException`, that is a built-in PyAutoGUI feature designed to prevent automation from running out of control. When the mouse is moved to one of the corners of the screen (for example the top-left corner), PyAutoGUI raises a `FailSafeException` and stops execution so the user can regain control.

You can disable this fail-safe by setting `pyautogui.FAILSAFE` to `False`, though this is not recommended for safety reasons. It removes the stop trigger when the mouse is moved to a corner.

```python
# Disable fail-safe (not recommended)
pyautogui.FAILSAFE = False
```
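
An alternative that keeps the fail-safe enabled is to stop the script itself from sending the cursor into a corner. The sketch below assumes the fail-safe only reacts to the exact corner pixels and simply clamps the target position one pixel inside the screen; `keep_off_corners` is a hypothetical helper, not a PyAutoGUI function.

```python
def keep_off_corners(x, y, width_screen, height_screen, margin=1):
    """Clamp (x, y) so it stays at least margin pixels inside the screen."""
    x = min(max(x, margin), width_screen - 1 - margin)
    y = min(max(y, margin), height_screen - 1 - margin)
    return x, y


print(keep_off_corners(0, 0, 1366, 768))       # (1, 1)
print(keep_off_corners(1366, 768, 1366, 768))  # (1364, 766)
print(keep_off_corners(683, 384, 1366, 768))   # (683, 384)
```

The clamped values would then be passed to `pyautogui.moveTo()`, while moving the physical mouse into a corner would still stop the script.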

One of the main issues now is that when we move the index finger toward the edges of the frame, especially all the way down, the hand is not detected properly.

To fix this, we can define a region in which we detect the movement: instead of using the whole frame, we use a smaller range. How can we do that?

First, let's draw that region using `cv2.rectangle()`. We define a `frame_reduction` variable above the loop and then do this:

```python
cv2.rectangle(
    frame,
    (frame_reduction, frame_reduction), # top-left corner of the rectangle
    (WIDTH_CAM - frame_reduction, HEIGHT_CAM - frame_reduction), # bottom-right corner
    (255, 0, 255), # color of rectangle
    2 # thickness
)
```

So this will draw a Rectangle

In [7]:
# Disable fail-safe (not recommended)
pyautogui.FAILSAFE = False

detector = htm.handDetector(
    maxHands=1
)

frame_reduction = 100

# getting screen width and height
width_screen, height_screen =  pyautogui.size()
width_screen, height_screen

previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    # 1. find hand landmarks
    frame = detector.findHands(frame) 
    lmlist, bbox = detector.findPosition(frame)
    
    # 2. Get the tip of the index and middle fingers
    if len(lmlist) != 0:
        x1, y1 = lmlist[8][1:]
        x2, y2 = lmlist[12][1:]
        # print(x1, y1, x2, y2)
        
        # 3. Check which fingers are up
        fingers = detector.fingersUp()
        
        cv2.rectangle(frame, (frame_reduction, frame_reduction),
                        (WIDTH_CAM - frame_reduction,
                        HEIGHT_CAM - frame_reduction),
                        (255, 0, 255),
                        2 
                     )
        
        # 4. Only Index Finger : Moving Mode
        if fingers[1]==1  and fingers[2]==0:
            
            # 5. Convert Coordinates
            x3 = np.interp(x1, (0, WIDTH_CAM), (0, width_screen))
            y3 = np.interp(y1, (0, HEIGHT_CAM), (0, height_screen))
            
            # 6. Smoothen Values
            
            # 7. Move Mouse
            pyautogui.moveTo(x3, y3)
            cv2.circle(frame, (x1, y1), 15, (255,0,255), cv2.FILLED)
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(frame, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()

If you run the code above you will see a rectangle. The idea is that when the index finger reaches the top of the rectangle, the cursor should reach the top of the screen, and similarly for the bottom, left, and right. This way the hand stays within detection range and we can still move the cursor all the way to the edges.

So how do we reflect this in `x3` and `y3`?

It is very simple: in the `np.interp()` calls, replace the input range `(0, WIDTH_CAM)` with `(frame_reduction, WIDTH_CAM - frame_reduction)` and `(0, HEIGHT_CAM)` with `(frame_reduction, HEIGHT_CAM - frame_reduction)`, and that's it.

In [9]:
# Disable fail-safe (not recommended)
pyautogui.FAILSAFE = False

detector = htm.handDetector(
    maxHands=1
)

frame_reduction = 100

# getting screen width and height
width_screen, height_screen =  pyautogui.size()
width_screen, height_screen

previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    # 1. find hand landmarks
    frame = detector.findHands(frame) 
    lmlist, bbox = detector.findPosition(frame)
    
    # 2. Get the tip of the index and middle fingers
    if len(lmlist) != 0:
        x1, y1 = lmlist[8][1:] # 8 is the point for index finger
        x2, y2 = lmlist[12][1:] # 12 is for middle finger
        # print(x1, y1, x2, y2)
        
        # 3. Check which fingers are up
        fingers = detector.fingersUp()
        
        cv2.rectangle(frame, (frame_reduction, frame_reduction),
                        (WIDTH_CAM - frame_reduction,
                        HEIGHT_CAM - frame_reduction),
                        (255, 0, 255),
                        2 
                     )
        
        # 4. Only Index Finger : Moving Mode
        if fingers[1]==1  and fingers[2]==0:
            
            # 5. Convert Coordinates
            x3 = np.interp(x1, (frame_reduction, WIDTH_CAM-frame_reduction), (0, width_screen))
            y3 = np.interp(y1, (frame_reduction, HEIGHT_CAM-frame_reduction), (0, height_screen))
            
            # 6. Smoothen Values
            
            # 7. Move Mouse
            pyautogui.moveTo(x3, y3)
            cv2.circle(frame, (x1, y1), 15, (255,0,255), cv2.FILLED)
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(frame, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()



If we run the code above, we can move the mouse cursor all the way to the edges of the screen by moving the finger to the edges of the rectangle.
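
As a quick check of the new mapping, the sketch below evaluates the same two `np.interp()` calls for a few fingertip positions. The screen size 1366 by 768 is the example value printed earlier, `to_screen` is a hypothetical wrapper, and the results are rounded only to keep the printed values tidy.

```python
import numpy as np

WIDTH_CAM, HEIGHT_CAM = 640, 480
width_screen, height_screen = 1366, 768  # example screen size from earlier
frame_reduction = 100


def to_screen(x1, y1):
    """Map a fingertip position inside the rectangle to screen coordinates."""
    x3 = np.interp(x1, (frame_reduction, WIDTH_CAM - frame_reduction), (0, width_screen))
    y3 = np.interp(y1, (frame_reduction, HEIGHT_CAM - frame_reduction), (0, height_screen))
    return int(round(float(x3))), int(round(float(y3)))


print(to_screen(100, 100))  # (0, 0): top-left corner of the rectangle
print(to_screen(540, 380))  # (1366, 768): bottom-right corner
print(to_screen(320, 240))  # (683, 384): centre of the frame
print(to_screen(20, 470))   # (0, 768): outside the rectangle, clamped
```

Positions outside the rectangle are clamped to the screen edge, which is why the hand can stay comfortably inside the camera frame.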

Next we detect clicks. Step 8 is the case where both the index and middle fingers are up. When both are up, we need to find the distance between the two fingertips, which we can do with `detector.findDistance()`. We pass `8`, the landmark for the index fingertip, and `12`, the landmark for the middle fingertip; these are the landmark IDs used by the MediaPipe hand model.

We then unpack the return value into `length`, `frame`, and `lineInfo`.

The main thing we need is the length, the distance between the two fingers. Let's simply print it.

In [11]:
# Disable fail-safe (not recommended)
pyautogui.FAILSAFE = False

detector = htm.handDetector(
    maxHands=1
)

frame_reduction = 100

# getting screen width and height
width_screen, height_screen =  pyautogui.size()
width_screen, height_screen

previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    # 1. find hand landmarks
    frame = detector.findHands(frame) 
    lmlist, bbox = detector.findPosition(frame)
    
    # 2. Get the tip of the index and middle fingers
    if len(lmlist) != 0:
        x1, y1 = lmlist[8][1:] # 8 is the point for index finger
        x2, y2 = lmlist[12][1:] # 12 is for middle finger
        # print(x1, y1, x2, y2)
        
        # 3. Check which fingers are up
        fingers = detector.fingersUp()
        
        cv2.rectangle(frame, (frame_reduction, frame_reduction),
                        (WIDTH_CAM - frame_reduction,
                        HEIGHT_CAM - frame_reduction),
                        (255, 0, 255),
                        2 
                     )
        
        # 4. Only Index Finger : Moving Mode
        if fingers[1]==1  and fingers[2]==0:
            
            # 5. Convert Coordinates
            x3 = np.interp(x1, (frame_reduction, WIDTH_CAM-frame_reduction), (0, width_screen))
            y3 = np.interp(y1, (frame_reduction, HEIGHT_CAM-frame_reduction), (0, height_screen))
            
            # 6. Smoothen Values
            
            # 7. Move Mouse
            pyautogui.moveTo(x3, y3)
            cv2.circle(frame, (x1, y1), 15, (255,0,255), cv2.FILLED)
            
        # 8. Both index and mid fingers are up: Clicking mode
        if fingers[1] == 1 and fingers[2] == 1:
            length, frame, lineInfo = detector.findDistance(8, 12, frame)
            print(length)
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(frame, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()

70.00714249274856
67.23094525588644
60.83584469702052
62.177166226839255
59.16924876994806
52.478567053607705
42.80186911806539
38.948684188300895
38.2099463490856
40.36087214122113
40.45985664828782
43.139309220245984
41.72529209005013
40.311288741492746
41.72529209005013
41.72529209005013
41.036569057366385
41.012193308819754
40.311288741492746
40.36087214122113
41.773197148410844
41.773197148410844
41.72529209005013
39.59797974644666
38.18376618407357
37.48332962798263
41.72529209005013
38.18376618407357
36.87817782917155
38.948684188300895
37.64306044943742
40.311288741492746
36.76955262170047
37.64306044943742
38.28837943815329
38.28837943815329
36.796738985948195
36.235341863986875
36.124783736376884


As you can see, the code not only draws a center point between the two fingertips, which indicates that both fingers are up, but also prints the length between the two fingers.

Next, we can treat the gesture as a click whenever the length falls below a certain value. First we need to choose that threshold: run the code above again, bring both fingers together, and note the length it prints. We will use that value as the threshold.

The closed-finger length comes out at around 40-45, so let's set the threshold to 40: if the length is less than 40, we treat it as a click. When a click is detected we will also redraw the same center circle in a different color, so we can see that the click registered, and then use pyautogui to perform the click. Note that the code below uses `(0, 255, 255)`, which is yellow in OpenCV's BGR order; use `(0, 255, 0)` if you want green.

To draw that center circle in the new color, we simply use `lineInfo[4]` and `lineInfo[5]`, the x and y coordinates of the center point.
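The threshold check described above can be tried out on its own, without a camera. This is only a minimal sketch, not the code inside `HandTrackingModule`: the helper names `fingertip_distance` and `is_click` and the sample coordinates are made up for the example, and the length is assumed to be the straight-line pixel distance between the two fingertips.

```python
import math

CLICK_THRESHOLD = 40  # pixels; the value chosen above, tune it for your camera


def fingertip_distance(p1, p2):
    """Straight-line distance in pixels between two (x, y) points."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def is_click(length, threshold=CLICK_THRESHOLD):
    """Return True when the fingertips are closer than the threshold."""
    return length < threshold


# Hypothetical fingertip positions (index tip, middle tip)
apart = fingertip_distance((300, 200), (360, 236))     # fingers spread
together = fingertip_distance((300, 200), (318, 224))  # fingers closed

print(round(apart, 2), is_click(apart))
print(round(together, 2), is_click(together))
```

Keeping the threshold in a single named constant makes it easy to adjust after checking the lengths your own camera prints.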

In [16]:
# Disable fail-safe (not recommended)
pyautogui.FAILSAFE = False

detector = htm.handDetector(
    maxHands=1
)

frame_reduction = 100

# getting screen width and height
width_screen, height_screen =  pyautogui.size()
width_screen, height_screen

previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    if not success:  # stop if the camera did not return a frame
        break
    # 1. find hand landmarks
    frame = detector.findHands(frame) 
    lmlist, bbox = detector.findPosition(frame)
    
    # 2. Get the tip of the index and middle fingers
    if len(lmlist) != 0:
        x1, y1 = lmlist[8][1:] # 8 is the point for index finger
        x2, y2 = lmlist[12][1:] # 12 is for middle finger
        # print(x1, y1, x2, y2)
        
        # 3. Check which fingers are up
        fingers = detector.fingersUp()
        
        cv2.rectangle(frame, (frame_reduction, frame_reduction),
                        (WIDTH_CAM - frame_reduction,
                        HEIGHT_CAM - frame_reduction),
                        (255, 0, 255),
                        2 
                     )
        
        # 4. Only Index Finger : Moving Mode
        if fingers[1]==1  and fingers[2]==0:
            
            # 5. Convert Coordinates
            x3 = np.interp(x1, (frame_reduction, WIDTH_CAM-frame_reduction), (0, width_screen))
            y3 = np.interp(y1, (frame_reduction, HEIGHT_CAM-frame_reduction), (0, height_screen))
            
            # 6. Smoothen Values
            
            # 7. Move Mouse
            pyautogui.moveTo(x3, y3)
            cv2.circle(frame, (x1, y1), 15, (255,0,255), cv2.FILLED)
            
        # 8. Both index and mid fingers are up: Clicking mode
        if fingers[1] == 1 and fingers[2] == 1:
            length, frame, lineInfo = detector.findDistance(8, 12, frame)
            #print(length)
            if length < 40:
                cv2.circle(frame, (lineInfo[4], lineInfo[5]), 15, (0,255,255), cv2.FILLED)
            
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(frame, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()

If you run the code above and raise both your index and middle fingers, the center circle changes color when you bring them together.

Now we need to perform the actual click, for which we will use `pyautogui`. It is very simple:

    # Perform a mouse click at the current position
    pyautogui.click()

In [18]:
# Disable fail-safe (not recommended)
pyautogui.FAILSAFE = False

detector = htm.handDetector(
    maxHands=1
)

frame_reduction = 100

# getting screen width and height
width_screen, height_screen =  pyautogui.size()
width_screen, height_screen

previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    if not success:  # stop if the camera did not return a frame
        break
    # 1. find hand landmarks
    frame = detector.findHands(frame) 
    lmlist, bbox = detector.findPosition(frame)
    
    # 2. Get the tip of the index and middle fingers
    if len(lmlist) != 0:
        x1, y1 = lmlist[8][1:] # 8 is the point for index finger
        x2, y2 = lmlist[12][1:] # 12 is for middle finger
        # print(x1, y1, x2, y2)
        
        # 3. Check which fingers are up
        fingers = detector.fingersUp()
        
        cv2.rectangle(frame, (frame_reduction, frame_reduction),
                        (WIDTH_CAM - frame_reduction,
                        HEIGHT_CAM - frame_reduction),
                        (255, 0, 255),
                        2 
                     )
        
        # 4. Only Index Finger : Moving Mode
        if fingers[1]==1  and fingers[2]==0:
            
            # 5. Convert Coordinates
            x3 = np.interp(x1, (frame_reduction, WIDTH_CAM-frame_reduction), (0, width_screen))
            y3 = np.interp(y1, (frame_reduction, HEIGHT_CAM-frame_reduction), (0, height_screen))
            
            # 6. Smoothen Values
            
            # 7. Move Mouse
            pyautogui.moveTo(x3, y3)
            cv2.circle(frame, (x1, y1), 15, (255,0,255), cv2.FILLED)
            
        # 8. Both index and mid fingers are up: Clicking mode
        if fingers[1] == 1 and fingers[2] == 1:
            # 9. Find distance between index and mid finger
            length, frame, lineInfo = detector.findDistance(8, 12, frame)
            #print(length)
            
            # 10. Click mouse if the distance is short
            if length < 40:
                cv2.circle(frame, (lineInfo[4], lineInfo[5]), 15, (0,255,255), cv2.FILLED)
                pyautogui.click()
            
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(frame, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()

Perfect! If you run the code above, you can see that clicking works.
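One thing to watch for: the loop calls `pyautogui.click()` on every frame in which the fingers stay closed, so a single pinch can produce several clicks. A possible way around this, sketched here as an assumption and not part of the code above, is to click only on the frame where the length first drops below the threshold. The `ClickLatch` class and its `update` method are hypothetical names, and the readings are sample values loosely based on the lengths printed earlier.

```python
class ClickLatch:
    """Report a click only when the fingers go from open to closed."""

    def __init__(self, threshold=40):
        self.threshold = threshold
        self.closed = False  # were the fingers closed on the previous frame?

    def update(self, length):
        """Return True only on the frame where length first drops below the threshold."""
        is_closed = length < self.threshold
        fire = is_closed and not self.closed
        self.closed = is_closed
        return fire


latch = ClickLatch(threshold=40)
# Sample lengths, one per frame (illustrative values)
readings = [70.0, 52.5, 38.9, 38.2, 40.4, 39.6, 38.2]
clicks = [latch.update(r) for r in readings]
print(clicks)
```

In the main loop this would be used as `if latch.update(length): pyautogui.click()`. Readings that hover right around the threshold can still trigger a second click, so the threshold may need some tuning.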

The problem now is that the cursor is very jittery and shaky, so it is hard to control. This is a big problem, because it keeps us from using the Virtual Mouse properly. So what can we do? We can now implement step 6 and smooth the values.

Instead of sending the raw target position to the mouse, we will dampen it a little, so the cursor moves toward the target step by step.

First we create a variable called `smoothening` and give it a value of, say, 5. This is an arbitrary starting value; we can tweak it and see which value gives the smoothest movement (the final code below uses 3).

We also need to create a few more variables:

- `plocX` : the previous location of x
- `plocy` : the previous location of y
- `curlocX` : the current location of x
- `curlocy` : the current location of y

We can initialize all four values to 0.

We then update these values on every iteration to smooth the mouse movement, using the following formula, where `x3` and `y3` are the converted screen coordinates from step 5:

    curlocX = plocX + (x3 - plocX) / smoothening
    curlocy = plocy + (y3 - plocy) / smoothening


Now we can simply pass `curlocX` and `curlocy` to `pyautogui.moveTo()`.

Finally, we set the previous location to the current one (`plocX, plocy = curlocX, curlocy`), so the next iteration starts from where the cursor is now.
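To see what the update rule does, here is a small standalone sketch with no camera or mouse involved. The function name `smooth_step` and the target value of 100 are assumptions for illustration; only the update rule itself comes from the text above.

```python
def smooth_step(prev, target, smoothening):
    """Move from prev toward target by 1/smoothening of the remaining gap."""
    return prev + (target - prev) / smoothening


smoothening = 5
ploc = 0.0       # previous cursor position on one axis
target = 100.0   # where the raw (unsmoothed) coordinate says to go

positions = []
for _ in range(5):
    curloc = smooth_step(ploc, target, smoothening)
    positions.append(round(curloc, 2))
    ploc = curloc  # current becomes previous for the next frame

print(positions)  # [20.0, 36.0, 48.8, 59.04, 67.23]
```

Each frame covers a fifth of the remaining distance, so the cursor approaches the target gradually instead of jumping to it. A larger `smoothening` gives steadier but slower movement, while a value of 1 reproduces the raw, unsmoothed position.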

In [4]:
# Initialize the keyboard controller
keyboard = Controller()

In [8]:
# Disable fail-safe (not recommended)
pyautogui.FAILSAFE = False

detector = htm.handDetector(
    maxHands=1
)

smoothening = 3
plocX, plocy = 0,0
curlocX, curlocy = 0,0

frame_reduction = 50



# getting screen width and height
width_screen, height_screen =  pyautogui.size()
width_screen, height_screen

previous_time = 0

cap = cv2.VideoCapture(1)

WIDTH_CAM = 640
HEIGHT_CAM = 480
cap.set(3, WIDTH_CAM)
cap.set(4, HEIGHT_CAM)

while cap.isOpened():
    success, frame = cap.read()
    if not success:  # stop if the camera did not return a frame
        break
    # 1. find hand landmarks
    frame = detector.findHands(frame) 
    lmlist, bbox = detector.findPosition(frame)
    
    # 2. Get the tip of the index and middle fingers
    if len(lmlist) != 0:
        x0, y0 = lmlist[4][1:]  # 4 for Thumb tip
        x1, y1 = lmlist[8][1:] # 8 is the point for index finger
        x2, y2 = lmlist[12][1:] # 12 is for middle finger
        # print(x1, y1, x2, y2)
        
        # 3. Check which fingers are up
        fingers = detector.fingersUp()
        
        cv2.rectangle(frame, (frame_reduction, frame_reduction),
                        (WIDTH_CAM - frame_reduction,
                        HEIGHT_CAM - frame_reduction),
                        (255, 0, 255),
                        2 
                     )
        
        # 4. Only Index Finger : Moving Mode
        if fingers[1]==1  and fingers[2]==0:
            
            # 5. Convert Coordinates
            x3 = np.interp(x1, (frame_reduction, WIDTH_CAM-frame_reduction), (0, width_screen))
            y3 = np.interp(y1, (frame_reduction, HEIGHT_CAM-frame_reduction), (0, height_screen))
            
            # 6. Smoothen Values
            curlocX = plocX + (x3 - plocX) / smoothening
            curlocy = plocy + (y3 - plocy) / smoothening
            
            # 7. Move Mouse
            pyautogui.moveTo(curlocX, curlocy)
            cv2.circle(frame, (x1, y1), 15, (255,0,255), cv2.FILLED)
            plocX, plocy = curlocX, curlocy
            
        # 8. Both index and mid fingers are up: Clicking mode
        if fingers[1] == 1 and fingers[2] == 1:
            # 9. Find distance between index and mid finger
            length, frame, lineInfo = detector.findDistance(8, 12, frame)
            #print(length)
            
            # 10. Click mouse if the distance is short
            if length < 40:
                cv2.circle(frame, (lineInfo[4], lineInfo[5]), 15, (0,255,255), cv2.FILLED)
                pyautogui.click()
                
            
    
    # 11. Frame Rate
    current_time = time.time()
    fps = 1/(current_time - previous_time)
    previous_time = current_time
    cv2.putText(frame, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 0), 3)
    
    # 12. Display
    cv2.imshow("image", frame)
    cv2.resizeWindow("image", WIDTH_CAM, HEIGHT_CAM)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break 
        
cap.release()
cv2.destroyAllWindows()

Wow! As you can see, it works very well!

All of this is possible thanks to the `HandTrackingModule` module. Without it, creating a project like this would be much more difficult and take a lot more time.