# 1. Drawing Hand Landmarks

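Hand recognition means locating the hand's landmarks (characteristic points). MediaPipe returns 21 of them, including the wrist, the finger joints and the fingertips. The x and y coordinates are normalized to [0, 1] by the image width and height. z is depth relative to the wrist: the wrist itself has z = 0, and smaller values mean the point is closer to the camera.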

<img src=https://i.imgur.com/qpRACer.png />
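MediaPipe numbers the landmarks from 0 (wrist) to 20 (pinky tip). For reference, the sketch below lists them by index. The names are copied as plain strings from MediaPipe's `mp_hands.HandLandmark` enum so the snippet runs without MediaPipe installed. With MediaPipe available, `mp_hands.HandLandmark.INDEX_FINGER_TIP` gives the same index, 8.

```python
# Index -> name of the 21 MediaPipe hand landmarks (same order as mp_hands.HandLandmark)
HAND_LANDMARK_NAMES = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]

# Fingertips are every fourth landmark, starting with the thumb tip
FINGERTIP_INDICES = [i for i, name in enumerate(HAND_LANDMARK_NAMES) if name.endswith("_TIP")]

print(len(HAND_LANDMARK_NAMES))  # 21
print(FINGERTIP_INDICES)          # [4, 8, 12, 16, 20]
```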

We start by importing the required libraries.

OpenCV captures the camera frames and performs the preliminary image transformations (color conversion, mirroring) so that MediaPipe can correctly detect the hand and its landmarks.

In [22]:
import mediapipe as mp
import cv2
import numpy as np

We take two modules from mp.solutions:

1. mp_drawing - draws the landmark points and the lines connecting them onto the image.
2. mp_hands - detects the hand with the chosen confidence thresholds.

In [23]:
mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

A first pass: detect the hand in every camera frame and draw its landmarks on the image. Press q to close the window.

In [24]:
cap = cv2.VideoCapture(0)

with mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            # No frame from the camera (disconnected or busy): stop the loop
            break
        
        #BGR to RGB
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        
        #Flip horizontal
        image = cv2.flip(image, 1)
        
        #Set flag
        image.flags.writeable = False
        
        #Detections
        results = hands.process(image)
        
        #Set flag back to True
        image.flags.writeable = True
        
        #RGB to BGR
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        
        #print(results)
        
        #Rendering results
        if results.multi_hand_landmarks:
            for num, hand in enumerate(results.multi_hand_landmarks):
                mp_drawing.draw_landmarks(image, hand, mp_hands.HAND_CONNECTIONS, 
                                         mp_drawing.DrawingSpec(color=(0,255,0), thickness=2, circle_radius=4), 
                                         mp_drawing.DrawingSpec(color=(0,0,255), thickness=2, circle_radius=4))

        #image = cv2.flip(image, 0)
        cv2.imshow("Hand Tracking", image)

        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
    print(image.shape)

    cap.release()
    cv2.destroyAllWindows()

(480, 640, 3)
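The two preprocessing steps in the loop only rearrange pixel data. OpenCV delivers frames in BGR channel order, while MediaPipe expects RGB. `cv2.flip(image, 1)` mirrors the frame around the vertical axis, which gives the selfie-style view. The commented-out `cv2.flip(image, 0)` would flip it upside down instead. (The `writeable = False` flag is a performance hint taken from the MediaPipe examples: it lets the image be passed by reference.)

The sketch below is a plain-Python illustration of what the two steps do, not how OpenCV implements them. It works on a tiny image stored as nested lists. The `bgr_to_rgb` and `flip_horizontal` helpers are hypothetical names.

```python
# A tiny 2x2 "image" stored as rows of (B, G, R) pixels, the layout OpenCV uses
frame = [
    [(255, 0, 0), (0, 255, 0)],   # row 0: blue, green
    [(0, 0, 255), (10, 20, 30)],  # row 1: red, custom color
]

def bgr_to_rgb(img):
    """Reverse the channel order of every pixel (the effect of COLOR_BGR2RGB on 3-channel images)."""
    return [[tuple(reversed(px)) for px in row] for row in img]

def flip_horizontal(img):
    """Mirror each row left-to-right (the effect of cv2.flip(img, 1))."""
    return [list(reversed(row)) for row in img]

rgb = bgr_to_rgb(frame)
mirrored = flip_horizontal(rgb)
print(rgb[0][0])    # (0, 0, 255): the same blue pixel, now in RGB order
print(mirrored[0])  # [(0, 255, 0), (0, 0, 255)]
```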


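The detections are stored in results. Its multi_hand_landmarks attribute is a list with one entry per detected hand (None when no hand is visible), and each entry holds the 21 landmarks. These are the landmarks of the first detected hand: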

In [25]:
results.multi_hand_landmarks[0].landmark

[x: 0.6646337509155273
y: 0.7779116630554199
z: 0.0
, x: 0.5861634016036987
y: 0.7216818332672119
z: -0.02502651885151863
, x: 0.525833010673523
y: 0.6241156458854675
z: -0.04007631167769432
, x: 0.49093371629714966
y: 0.5338039994239807
z: -0.05749374255537987
, x: 0.45626354217529297
y: 0.4671103060245514
z: -0.07552837580442429
, x: 0.5921728014945984
y: 0.46859803795814514
z: -0.0071991002187132835
, x: 0.5680752992630005
y: 0.3497309386730194
z: -0.029075084254145622
, x: 0.557329535484314
y: 0.2765547037124634
z: -0.04595037177205086
, x: 0.5513691902160645
y: 0.2124011516571045
z: -0.058871712535619736
, x: 0.6478991508483887
y: 0.4594009816646576
z: -0.020010558888316154
, x: 0.6458236575126648
y: 0.32941722869873047
z: -0.03746825084090233
, x: 0.6463074684143066
y: 0.2499026656150818
z: -0.05505750700831413
, x: 0.6490587592124939
y: 0.18397444486618042
z: -0.07051093876361847
, x: 0.6992640495300293
y: 0.48383551836013794
z: -0.03934817388653755
, x: 0.7216429114341736
y: 0.
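Because x and y are fractions of the frame size, they must be scaled to get pixel positions, for example to place text next to a fingertip. The sketch below shows one way to do this. It uses a hypothetical `Landmark` namedtuple in place of MediaPipe's landmark objects, which expose the same `.x`, `.y`, `.z` attributes. The `to_pixels` helper is also hypothetical. It clamps the result to the frame, because landmarks can fall slightly outside [0, 1].

```python
import math
from collections import namedtuple

# Hypothetical stand-in for a MediaPipe landmark (real ones also expose .x, .y, .z)
Landmark = namedtuple("Landmark", ["x", "y", "z"])

def to_pixels(landmark, width, height):
    """Convert normalized x/y to integer pixel coordinates, clamped to the frame."""
    px = max(0, min(math.floor(landmark.x * width), width - 1))
    py = max(0, min(math.floor(landmark.y * height), height - 1))
    return px, py

# Wrist (landmark 0) from the output above, on the 640x480 frame reported by image.shape
wrist = Landmark(x=0.6646337509155273, y=0.7779116630554199, z=0.0)
print(to_pixels(wrist, width=640, height=480))  # (425, 373)
```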

# 2. Saving Landmark Positions to a CSV File

In [26]:
import csv
import os
import numpy as np

We check the total number of landmarks.

In [27]:
num_coords = len(results.multi_hand_landmarks[0].landmark)
print(num_coords)

21


We create the column labels: the class first, then the x, y and z coordinates of each landmark.

In [28]:
landmarks = ['class']
for val in range(0, num_coords):
    landmarks += ['x{}'.format(val), 'y{}'.format(val), 'z{}'.format(val)]

In [12]:
landmarks

['class',
 'x0', 'y0', 'z0', 'x1', 'y1', 'z1', 'x2', 'y2', 'z2',
 'x3', 'y3', 'z3', 'x4', 'y4', 'z4', 'x5', 'y5', 'z5',
 'x6', 'y6', 'z6', 'x7', 'y7', 'z7', 'x8', 'y8', 'z8',
 'x9', 'y9', 'z9', 'x10', 'y10', 'z10', 'x11', 'y11', 'z11',
 'x12', 'y12', 'z12', 'x13', 'y13', 'z13', 'x14', 'y14', 'z14',
 'x15', 'y15', 'z15', 'x16', 'y16', 'z16', 'x17', 'y17', 'z17',
 'x18', 'y18', 'z18', 'x19', 'y19', 'z19', 'x20', 'y20', 'z20']
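The header has 1 + 21 × 3 = 64 entries. This matches the value of len(hand_landmarks_row) shown further on. The sketch below builds an equivalent header more compactly with a single list comprehension. `NUM_LANDMARKS` is a hypothetical constant holding the value of num_coords.

```python
NUM_LANDMARKS = 21  # the value printed by num_coords above

# Same order as the loop: x0, y0, z0, x1, y1, z1, ...
header = ["class"] + [f"{axis}{i}" for i in range(NUM_LANDMARKS) for axis in ("x", "y", "z")]

print(len(header))   # 64 = 1 class column + 21 landmarks * 3 coordinates
print(header[:4])    # ['class', 'x0', 'y0', 'z0']
print(header[-3:])   # ['x20', 'y20', 'z20']
```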

We create the CSV file and write the column labels as its first row.

In [30]:
with open('coords.csv', mode='w', newline='') as f:
    csv_writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(landmarks)
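Note that mode='w' truncates the file, so re-running this cell erases every gesture recorded so far. If gestures are recorded over several sessions, one option is to write the header only when the file is new. This is a sketch, not part of the original notebook. `ensure_csv_header` is a hypothetical helper.

```python
import csv
import os
import tempfile

def ensure_csv_header(path, header):
    """Write `header` as the first row only if the file is missing or empty.

    Hypothetical helper: unlike a bare open(path, mode='w'), it never
    truncates rows collected in earlier sessions.
    Returns True if the header was written, False if the file was left as is.
    """
    if os.path.exists(path) and os.path.getsize(path) > 0:
        return False
    with open(path, mode="w", newline="") as f:
        csv.writer(f).writerow(header)
    return True

# Usage: the second call leaves the existing file untouched
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "coords.csv")
    print(ensure_csv_header(path, ["class", "x0", "y0", "z0"]))  # True
    print(ensure_csv_header(path, ["class", "x0", "y0", "z0"]))  # False
```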

The class_name variable holds the label of the gesture currently being recorded. While the recording cell runs, every frame with a detected hand is appended to the CSV file under that label, until 1500 samples have been collected or q is pressed. To record another gesture, change class_name and run the cell again.
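Each recorded frame becomes one CSV row: the gesture label followed by the 63 flattened coordinates, in the same x, y, z order as the header. The notebook does the flattening with NumPy's flatten. The sketch below shows the same step in plain Python. It uses a hypothetical `Landmark` namedtuple and a hypothetical `landmarks_to_row` helper, with dummy values in place of a real detection.

```python
from collections import namedtuple

# Hypothetical stand-in for MediaPipe landmarks (real ones expose .x, .y, .z)
Landmark = namedtuple("Landmark", ["x", "y", "z"])

def landmarks_to_row(class_name, landmarks):
    """Flatten [(x, y, z), ...] into [class_name, x0, y0, z0, x1, ...] (one CSV row)."""
    return [class_name] + [coord for lm in landmarks for coord in (lm.x, lm.y, lm.z)]

# 21 dummy landmarks standing in for one detected hand
fake_hand = [Landmark(i / 100, i / 50, -i / 1000) for i in range(21)]
row = landmarks_to_row("Open", fake_hand)
print(len(row))   # 64
print(row[:4])    # ['Open', 0.0, 0.0, 0.0]
```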

In [54]:
class_name = "Open"

In [55]:
cap = cv2.VideoCapture(0)

detections = 0
with mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            # No frame from the camera (disconnected or busy): stop the loop
            break
        
        #BGR to RGB
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        
        #Flip horizontal
        image = cv2.flip(image, 1)
        
        #Set flag
        image.flags.writeable = False
        
        #Detections
        results = hands.process(image)
        
        #Set flag back to True
        image.flags.writeable = True
        
        #RGB to BGR
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        
        #print(results)
        
        #Rendering results
        if results.multi_hand_landmarks:
            for num, hand in enumerate(results.multi_hand_landmarks):
                mp_drawing.draw_landmarks(image, hand, mp_hands.HAND_CONNECTIONS, 
                                         mp_drawing.DrawingSpec(color=(0,255,0), thickness=2, circle_radius=4), 
                                         mp_drawing.DrawingSpec(color=(0,0,255), thickness=2, circle_radius=4))
                
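        # Save the first detected hand as one CSV row (frames without a hand are skipped)
        if results.multi_hand_landmarks:
            hand_landmarks = results.multi_hand_landmarks[0].landmark
            # Flatten the 21 (x, y, z) triplets into 63 values and prepend the label
            hand_landmarks_row = list(np.array([[landmark.x, landmark.y, landmark.z] for landmark in hand_landmarks]).flatten())
            hand_landmarks_row.insert(0, class_name)
            
            with open('coords.csv', mode='a', newline='') as f:
                csv_writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
                csv_writer.writerow(hand_landmarks_row)
            detections += 1
        
        # Stop once enough samples of this gesture have been collected
        if detections >= 1500:
            break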

        #image = cv2.flip(image, 0)
        cv2.imshow("Hand Tracking", image)

        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
    print(image.shape)

    cap.release()
    cv2.destroyAllWindows()

(480, 640, 3)


In [31]:
len(hand_landmarks_row)

64

# 3. Training Models with scikit-learn

In [32]:
import pandas as pd
from sklearn.model_selection import train_test_split

We read all the data from the CSV file.

In [56]:
df = pd.read_csv('coords.csv')

In [34]:
df.head()

,class,x0,y0,z0,x1,y1,z1,x2,y2,z2,...,z17,x18,y18,z18,x19,y19,z19,x20,y20,z20
0,Open,0.573203,0.64056,0.0,0.499717,0.596618,-0.023945,0.44707,0.513179,-0.038492,...,-0.057663,0.682798,0.355524,-0.083973,0.697056,0.304105,-0.098364,0.707266,0.256017,-0.107347
1,Open,0.576247,0.646744,0.0,0.501655,0.600668,-0.019519,0.446323,0.514822,-0.030754,...,-0.044279,0.686625,0.355156,-0.063872,0.701243,0.304934,-0.076318,0.708923,0.257824,-0.084473
2,Open,0.577344,0.648723,0.0,0.502655,0.602043,-0.017698,0.447916,0.515996,-0.028837,...,-0.048054,0.688223,0.357956,-0.068268,0.703907,0.307465,-0.080551,0.713025,0.26075,-0.088573
3,Open,0.578213,0.651113,0.0,0.502575,0.603344,-0.018102,0.447733,0.515504,-0.029073,...,-0.047166,0.689426,0.358991,-0.067121,0.704428,0.309513,-0.079505,0.712927,0.263294,-0.087619
4,Open,0.579336,0.651718,0.0,0.503827,0.602943,-0.018932,0.44896,0.517329,-0.03049,...,-0.048345,0.690076,0.356923,-0.068354,0.704977,0.306609,-0.080476,0.713523,0.260047,-0.088444


In [57]:
df.tail()

,class,x0,y0,z0,x1,y1,z1,x2,y2,z2,...,z17,x18,y18,z18,x19,y19,z19,x20,y20,z20
18822,Open,0.467846,0.406861,0.0,0.428533,0.380948,-0.008333,0.395,0.332535,-0.014978,...,-0.036459,0.507578,0.225739,-0.048563,0.515882,0.19622,-0.054918,0.520815,0.170179,-0.059214
18823,Open,0.479289,0.399037,0.0,0.440316,0.368913,-0.006411,0.407408,0.317506,-0.012222,...,-0.036765,0.523634,0.222356,-0.04923,0.533162,0.191859,-0.055645,0.539148,0.163603,-0.059932
18824,Open,0.483824,0.393661,0.0,0.445903,0.364271,-0.007689,0.414518,0.313127,-0.013561,...,-0.03411,0.531351,0.219048,-0.045684,0.541211,0.190878,-0.051768,0.547371,0.165572,-0.055823
18825,Open,0.488795,0.389167,0.0,0.451866,0.359073,-0.008251,0.421664,0.307815,-0.014395,...,-0.034072,0.538266,0.216589,-0.045693,0.548472,0.188789,-0.051936,0.5552,0.163757,-0.056152
18826,Open,0.494083,0.38454,0.0,0.457702,0.3544,-0.008339,0.428187,0.303309,-0.014619,...,-0.034583,0.545277,0.215185,-0.046321,0.55599,0.187927,-0.052506,0.563251,0.163348,-0.056651


In [36]:
df[df['class']=='Fist']

,class,x0,y0,z0,x1,y1,z1,x2,y2,z2,...,z17,x18,y18,z18,x19,y19,z19,x20,y20,z20
893,Fist,0.656125,0.654117,0.0,0.569530,0.638938,-0.026891,0.483274,0.555184,-0.044985,...,-0.076638,0.654862,0.389905,-0.103299,0.656953,0.464484,-0.090600,0.678349,0.483450,-0.071660
894,Fist,0.650962,0.655531,0.0,0.567307,0.649745,-0.037093,0.476494,0.562160,-0.052693,...,-0.039232,0.660754,0.378976,-0.061604,0.658466,0.457000,-0.050921,0.677963,0.469749,-0.032807
895,Fist,0.642972,0.647645,0.0,0.563097,0.642524,-0.038610,0.475390,0.553085,-0.056490,...,-0.037658,0.658460,0.379504,-0.060410,0.655274,0.453308,-0.050536,0.673471,0.466329,-0.032854
896,Fist,0.641329,0.645809,0.0,0.561098,0.641453,-0.040462,0.474805,0.554611,-0.059773,...,-0.037179,0.656818,0.380161,-0.060614,0.656590,0.452866,-0.051575,0.673701,0.465277,-0.034630
897,Fist,0.643294,0.654593,0.0,0.562677,0.645861,-0.035969,0.477000,0.560907,-0.051397,...,-0.034409,0.659196,0.386266,-0.055901,0.655459,0.458902,-0.046396,0.673958,0.472202,-0.029651
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17322,Fist,0.880197,0.274399,0.0,0.831428,0.252549,-0.020512,0.776574,0.199264,-0.045410,...,-0.046241,0.927589,0.113589,-0.043851,0.922207,0.156737,-0.031414,0.907116,0.147849,-0.022297
17323,Fist,0.877812,0.275025,0.0,0.829071,0.251459,-0.018144,0.774391,0.200918,-0.043276,...,-0.048025,0.924914,0.114345,-0.044778,0.918183,0.156934,-0.031974,0.902419,0.147757,-0.022989
17324,Fist,0.876423,0.275874,0.0,0.826541,0.253047,-0.017919,0.771227,0.201744,-0.042834,...,-0.047798,0.922215,0.114968,-0.044430,0.916394,0.156909,-0.031314,0.901559,0.146656,-0.022021
17325,Fist,0.869560,0.278645,0.0,0.820412,0.255662,-0.016948,0.764987,0.205553,-0.040883,...,-0.050110,0.914737,0.120443,-0.046376,0.908467,0.162532,-0.033749,0.892263,0.152109,-0.025170


In [58]:
x = df.drop('class', axis=1)
y = df['class']

In [19]:
y

0        Open
1        Open
2        Open
3        Open
4        Open
         ... 
15822    Open
15823    Open
15824    Open
15825    Open
15826    Open
Name: class, Length: 15827, dtype: object

In [20]:
x

,x0,y0,z0,x1,y1,z1,x2,y2,z2,x3,...,z17,x18,y18,z18,x19,y19,z19,x20,y20,z20
0,0.573203,0.640560,0.0,0.499717,0.596618,-0.023945,0.447070,0.513179,-0.038492,0.415968,...,-0.057663,0.682798,0.355524,-0.083973,0.697056,0.304105,-0.098364,0.707266,0.256017,-0.107347
1,0.576247,0.646744,0.0,0.501655,0.600668,-0.019519,0.446323,0.514822,-0.030754,0.413325,...,-0.044279,0.686625,0.355156,-0.063872,0.701243,0.304934,-0.076318,0.708923,0.257824,-0.084473
2,0.577344,0.648723,0.0,0.502655,0.602043,-0.017698,0.447916,0.515996,-0.028837,0.415523,...,-0.048054,0.688223,0.357956,-0.068268,0.703907,0.307465,-0.080551,0.713025,0.260750,-0.088573
3,0.578213,0.651113,0.0,0.502575,0.603344,-0.018102,0.447733,0.515504,-0.029073,0.415643,...,-0.047166,0.689426,0.358991,-0.067121,0.704428,0.309513,-0.079505,0.712927,0.263294,-0.087619
4,0.579336,0.651718,0.0,0.503827,0.602943,-0.018932,0.448960,0.517329,-0.030490,0.417358,...,-0.048345,0.690076,0.356923,-0.068354,0.704977,0.306609,-0.080476,0.713523,0.260047,-0.088444
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15822,0.745920,0.822766,0.0,0.673529,0.781417,-0.021865,0.617996,0.709655,-0.041506,0.583282,...,-0.084872,0.847413,0.515584,-0.113214,0.862228,0.455671,-0.127817,0.871616,0.398791,-0.138076
15823,0.747752,0.823369,0.0,0.675269,0.782351,-0.023667,0.619182,0.710473,-0.044135,0.585453,...,-0.086734,0.848156,0.514670,-0.115502,0.861851,0.454837,-0.130281,0.870234,0.397057,-0.140667
15824,0.750831,0.823639,0.0,0.677713,0.781122,-0.020728,0.620577,0.708134,-0.039233,0.587027,...,-0.083076,0.847371,0.510916,-0.110269,0.860105,0.451476,-0.124075,0.867597,0.394660,-0.133652
15825,0.752100,0.819934,0.0,0.679781,0.775837,-0.020275,0.623208,0.702549,-0.038121,0.590963,...,-0.082703,0.850726,0.507875,-0.109655,0.862297,0.449699,-0.122998,0.868528,0.394151,-0.132132


We split the collected data into two parts: 70% for training and 30% (test_size=0.3) for testing.

In [59]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=3451)
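train_test_split shuffles the rows before splitting. random_state=3451 makes the shuffle repeatable, so the same rows land in the test set on every run. The sketch below is a simplified pure-Python version of the idea. `split_indices` is a hypothetical helper. Like scikit-learn, it puts ceil(test_size × n) samples in the test part, but it does not reproduce scikit-learn's exact shuffling, so the same seed selects different rows.

```python
import math
import random

def split_indices(n_samples, test_size=0.3, seed=3451):
    """Shuffle sample indices with a fixed seed and cut them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # deterministic for a given seed
    n_test = math.ceil(test_size * n_samples)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))  # 7 3
```

train_test_split also accepts stratify=y. This keeps the proportion of each gesture the same in both parts, which can matter when some gestures have far fewer recorded samples than others.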

In [39]:
y_train.values

array(['Open', 'Open', 'Love', ..., 'Open', 'OK', 'Peace'], dtype=object)

# 4. Training Classification Models

In [39]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

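We create a dictionary of four pipelines. Each one standardizes the features with StandardScaler and then applies a different classifier: logistic regression (lr), ridge classifier (rd), random forest (rf) or gradient boosting (gb).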

In [60]:
pipelines = {
    'lr':make_pipeline(StandardScaler(), LogisticRegression()),
    'rd':make_pipeline(StandardScaler(), RidgeClassifier()),
    'rf':make_pipeline(StandardScaler(), RandomForestClassifier()),
    'gb':make_pipeline(StandardScaler(), GradientBoostingClassifier()),
}
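StandardScaler learns each feature's mean and standard deviation on the training data. It then maps every value to z = (x - mean) / std. The tree models (rf, gb) do not need this step. Logistic regression, however, converges more easily on standardized features, and the ridge penalty then treats all features on the same scale.

The sketch below applies the same transformation to one column. `standardize` is a hypothetical helper. Like StandardScaler, it uses the population standard deviation. For a constant feature it only centers the values instead of dividing by zero. z0 is one such feature, since it is always 0.0 for the wrist.

```python
import math

def standardize(column):
    """Return (value - mean) / std for one feature, using the population std (ddof=0)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        # Constant feature: only center it (every value becomes 0.0)
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]

print([round(z, 4) for z in standardize([1.0, 2.0, 3.0])])  # [-1.2247, 0.0, 1.2247]
```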

We train the four models one after another.

!!! TODO: TEST OTHER METHODS !!!

In [61]:
fit_models = {}

for algo, pipeline in pipelines.items():
    model = pipeline.fit(x_train, y_train)
    fit_models[algo] = model

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

The warning comes from LogisticRegression. Its default lbfgs solver stopped at the default limit of max_iter=100 iterations before fully converging. The features are already standardized, so the usual remedy is to raise the limit, for example with LogisticRegression(max_iter=1000).


In [110]:
fit_models['rf'].predict(x_test)

array(['Fist', 'Peace', 'Love', ..., 'Peace', 'Love', 'Open'],
      dtype=object)

# 5. Model Evaluation

In [42]:
from sklearn.metrics import accuracy_score
import pickle

We compare the accuracy of each model on the test set using accuracy_score.

In [62]:
for algo, model in fit_models.items():
    yhat = model.predict(x_test)
    print(algo, accuracy_score(y_test, yhat))

lr 0.9792883696229421
rd 0.9644184811471057
rf 0.9950433705080545
gb 0.9890246061249779
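accuracy_score is simply the share of test samples whose predicted label equals the true one. rf's 0.995 therefore means roughly 5 misclassified samples per 1000. The sketch below shows the computation. `accuracy` is a hypothetical helper, and the labels are chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = ["Open", "Fist", "Peace", "Love"]
y_pred = ["Open", "Fist", "Love", "Love"]
print(accuracy(y_true, y_pred))  # 0.75
```

A single accuracy number can hide one weak gesture. sklearn.metrics.classification_report and confusion_matrix break the result down per class.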


In [39]:
fit_models['rf'].predict(x_test)

array(['Open', 'Fist', 'Fist', 'Open', 'Open', 'Open', 'Open', 'Fist',
       'Open', 'Open', 'Open', 'Open', 'Open', 'Open', 'Fist', 'Open',
       'Fist', 'Open', 'Open', 'Open', 'Open', 'Open', 'Open', 'Open',
       'Open', 'Open', 'Fist', 'Fist', 'Open', 'Open', 'Fist', 'Open',
       'Open', 'Fist', 'Fist', 'Fist', 'Open', 'Open', 'Open', 'Fist',
       'Open', 'Fist', 'Open', 'Open', 'Open', 'Fist', 'Open', 'Fist',
       'Open', 'Open', 'Open', 'Open', 'Fist', 'Open', 'Fist', 'Open',
       'Open', 'Open', 'Fist', 'Open', 'Open', 'Open', 'Open', 'Open',
       'Fist', 'Fist', 'Open', 'Open', 'Fist', 'Open', 'Open', 'Open',
       'Open', 'Fist', 'Fist', 'Open', 'Open', 'Open', 'Open', 'Fist',
       'Fist', 'Open', 'Open', 'Open', 'Fist', 'Open', 'Fist', 'Open',
       'Open', 'Open', 'Open', 'Fist', 'Fist', 'Open', 'Open', 'Open',
       'Open', 'Open', 'Open', 'Open', 'Fist', 'Open', 'Open', 'Fist',
       'Open', 'Open', 'Fist', 'Open', 'Fist', 'Fist', 'Open', 'Open',
      

In [18]:
y_test

102     Open
1547    Fist
1840    Fist
1359    Open
1092    Open
        ... 
732     Open
793     Open
833     Open
953     Open
777     Open
Name: class, Length: 574, dtype: object

The most accurate model, the random forest (rf), is saved in binary form with the pickle module.

In [63]:
with open('gesture_recognition.pkl', 'wb') as f:
    pickle.dump(fit_models['rf'], f)
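pickle turns the fitted pipeline (scaler plus forest) into bytes and back, so the model can be reused without retraining. There are two practical caveats:

- pickle.load can execute arbitrary code, so only load files from a trusted source.
- A pickled scikit-learn model should be loaded with the same scikit-learn version it was saved with.

The sketch below shows the round trip with a plain dictionary standing in for the model. The dictionary is hypothetical, and its labels are the ones seen in the outputs above.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the fitted pipeline: any picklable object round-trips the same way
artifact = {"model_name": "rf", "classes": ["Fist", "Love", "OK", "Open", "Peace"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gesture_recognition.pkl")
    with open(path, "wb") as f:   # 'wb': pickle writes bytes, not text
        pickle.dump(artifact, f)
    with open(path, "rb") as f:   # 'rb' when loading
        restored = pickle.load(f)

print(restored == artifact)  # True: an equal copy is rebuilt
print(restored is artifact)  # False: it is a new object
```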

# 6. Detection

We load the model back from the file.

In [45]:
with open('gesture_recognition.pkl', 'rb') as f:
    model = pickle.load(f)

In [47]:
model

Pipeline(steps=[('standardscaler', StandardScaler()),
                ('randomforestclassifier', RandomForestClassifier())])

In [48]:
model.predict(x_test)

array(['Love', 'Open', 'Open', ..., 'Peace', 'Love', 'Open'], dtype=object)

We test the model on live frames from the camera.

In [51]:
cap = cv2.VideoCapture(0)

with mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            # No frame from the camera (disconnected or busy): stop the loop
            break
        
        #BGR to RGB
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        
        #Flip horizontal
        image = cv2.flip(image, 1)
        
        #Set flag
        image.flags.writeable = False
        
        #Detections
        results = hands.process(image)
        
        #Set flag back to True
        image.flags.writeable = True
        
        #RGB to BGR
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        
        #Rendering results
        if results.multi_hand_landmarks:
            for num, hand in enumerate(results.multi_hand_landmarks):
                mp_drawing.draw_landmarks(image, hand, mp_hands.HAND_CONNECTIONS, 
                                         mp_drawing.DrawingSpec(color=(0,255,0), thickness=2, circle_radius=4), 
                                         mp_drawing.DrawingSpec(color=(0,0,255), thickness=2, circle_radius=4))
            
            # Classify the first detected hand
            hand_landmarks = results.multi_hand_landmarks[0].landmark
            hand_landmarks_row = list(np.array([[landmark.x, landmark.y, landmark.z] for landmark in hand_landmarks]).flatten())
            
            # Reuse the training column names (the header without 'class') so the
            # pipeline receives the same feature names it was fitted on
            x_live = pd.DataFrame([hand_landmarks_row], columns=landmarks[1:])
            gesture_class = model.predict(x_live)[0]
            #gesture_prob = model.predict_proba(x_live)[0]
            
            # Draw the predicted gesture label in the top-left corner
            cv2.putText(image, str(gesture_class), (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 2, cv2.LINE_AA)

        cv2.imshow("Hand Tracking", image)

        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
    print(image.shape)

    cap.release()
    cv2.destroyAllWindows()



(480, 640, 3)
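The commented-out predict_proba line hints at a possible refinement. Instead of always showing the top class, the overlay could include the model's confidence. It could also fall back to a placeholder when the forest is unsure, for example for a hand in a pose it was never trained on.

The sketch below shows that decision. `format_prediction` is a hypothetical helper, and the 0.6 threshold is an assumption. In the notebook, `classes` would correspond to model.classes_ and `probabilities` to one row of model.predict_proba(...).

```python
def format_prediction(classes, probabilities, min_confidence=0.6):
    """Pick the most probable class and build an overlay string such as 'Open 0.93'.

    Below `min_confidence` (an assumed threshold) the label is replaced by '?'.
    """
    best = max(range(len(classes)), key=lambda i: probabilities[i])
    prob = probabilities[best]
    label = classes[best] if prob >= min_confidence else "?"
    return f"{label} {prob:.2f}"

classes = ["Fist", "Love", "OK", "Open", "Peace"]
print(format_prediction(classes, [0.02, 0.01, 0.02, 0.93, 0.02]))  # Open 0.93
print(format_prediction(classes, [0.30, 0.25, 0.15, 0.20, 0.10]))  # ? 0.30
```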
