* In this dataset, we are given images, descriptions, and their respective groups.
* Images that share the same group contain the same type of product.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
import collections

In [None]:
images_path='../input/shopee-product-matching/train_images'
train_df=pd.read_csv('../input/shopee-product-matching/train.csv')
test_df=pd.read_csv('../input/shopee-product-matching/test.csv')
sample_sub=pd.read_csv('../input/shopee-product-matching/sample_submission.csv')

print('Shape of Train: ',train_df.shape)
train_df.head()

In [None]:
print('Number of Groups in our dataset: ',train_df['label_group'].nunique())

**Images Plotting**

In [None]:
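# Plot all images that belong to one label group side by side
def plot_img(ids_, lbl_id):
    images = []
    for id_ in ids_:
        img = cv2.imread(os.path.join(images_path, id_))
        if img is None:  # cv2.imread returns None when the file cannot be read
            continue
        # OpenCV loads images as BGR; matplotlib expects RGB
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        images.append(img)
    if not images:
        return

    # squeeze=False keeps axes_ 2-D, so indexing also works for a single image
    f, axes_ = plt.subplots(1, len(images), squeeze=False)
    f.suptitle(str(lbl_id))
    for ix, img in enumerate(images):
        axes_[0, ix].imshow(img)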

In [None]:
# Let's take 3 label ids
ids_0 = train_df['image'][train_df['label_group'] == 3648931069]
ids_1 = train_df['image'][train_df['label_group'] == 4093212188]
ids_2 = train_df['image'][train_df['label_group'] == 2395904891]

# Plot the images belonging to these 3 groups
plot_img(ids_0, 3648931069)
plot_img(ids_1, 4093212188)
plot_img(ids_2, 2395904891)

In [None]:
# Now let's look at how the unique label ids are distributed
labels_uq = train_df['label_group'].unique()
sns.displot(labels_uq, kde=True)
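
The plot above shows how the label ids themselves are spread out. A related view is how many images each group contains. The following is a minimal sketch on a hypothetical toy frame (`toy_df` and its values are made up for illustration and stand in for `train_df`), using `value_counts`:

```python
import pandas as pd

# Hypothetical toy data standing in for train_df; the values are made up.
toy_df = pd.DataFrame({
    'image': ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg'],
    'label_group': [10, 10, 20, 20, 20],
})

# Number of images in each group, indexed by label_group.
group_sizes = toy_df['label_group'].value_counts()

# How many groups have each size (group size -> number of groups).
size_distribution = group_sizes.value_counts().sort_index()
print(size_distribution)
```

Swapping `toy_df` for `train_df` should give the same kind of summary for the real data, assuming the `label_group` column is present as loaded earlier.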

**F1 Score**
* This metric is the harmonic mean of precision and recall.
* precision -> Out of all labels predicted positive, how many are actually positive.
* recall -> Also called sensitivity: out of all actually positive labels, how many were predicted positive.
![](https://i.imgur.com/qFmteYs.png)
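
As a minimal sketch of the definitions above, precision, recall, and F1 can be computed by hand from the counts of true positives, false positives, and false negatives. The helper name `f1_from_labels` and the toy labels are illustrative assumptions, not part of the dataset or of any library. Returning 0 when a denominator is zero is one common convention, assumed here.

```python
def f1_from_labels(true_y, pred_y):
    """Return (precision, recall, f1) for binary labels where 1 is the positive class."""
    pairs = list(zip(true_y, pred_y))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Assumed convention: a ratio with a zero denominator is reported as 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up toy labels: 2 true positives, 1 false positive, 1 false negative.
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = f1_from_labels(toy_true, toy_pred)
print('precision: {:.3f}, recall: {:.3f}, f1: {:.3f}'.format(precision, recall, f1))
```

Note that true negatives never enter the calculation, which is why F1 reacts so strongly when the positive class is missed.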


In [None]:
# The easiest way to compute the F1 score is through the sklearn library
# Also, let's look at the difference between the accuracy score and the F1 score
from sklearn.metrics import f1_score, accuracy_score
true_y = [1,1,0,0,0,0,0,0,0,0,0,0]
pred_y = [0,0,0,0,0,0,0,0,0,0,0,0]

# No positive predictions are made, so precision is undefined;
# zero_division=0 reports it as 0 instead of raising a warning
print('f1 Score: {:.3f}'.format(f1_score(true_y, pred_y, zero_division=0)))

# Same labels and predictions, now scored with accuracy
print('Accuracy Score: {:.3f}'.format(accuracy_score(true_y, pred_y)))

In [None]:
'''As we can see, the F1 score focuses on the positive class: it ignores true negatives. The F1 score above
is 0, which tells us that our model is performing poorly.
The accuracy score of 0.83, however, might suggest that our model predicts labels well,
even though it never predicts the positive class.
This is why we must not rely on accuracy alone and should evaluate our model with several metrics.'''