# Intro
Welcome to the [Herbarium 2022 - FGVC9](https://www.kaggle.com/c/herbarium-2022-fgvc9//) competition.

![](https://storage.googleapis.com/kaggle-competitions/kaggle/33679/logos/header.png)

<font size="4"><span style="color: royalblue;">Please vote the notebook up if it helps you. Feel free to leave a comment above the notebook. Thank you. </span></font>

# Libraries

In [None]:
import os
import pandas as pd
import numpy as np
import cv2
import matplotlib.pyplot as plt
import json

# Path

In [None]:
path = '/kaggle/input/herbarium-2022-fgvc9/'
os.listdir(path)

# Load Data

In [None]:
samp_subm = pd.read_csv(path+'sample_submission.csv')

In [None]:
with open(path+'train_metadata.json') as f:
    train_data = json.load(f)

with open(path+'test_metadata.json') as f:
    test_data = json.load(f)

# Overview

In [None]:
print('Number of train samples:', len(train_data['images']))
print('Number of submission samples:', len(samp_subm))

For each image Id, we should predict the corresponding image label (category_id) in the Predicted column.

In [None]:
samp_subm.head()
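
The layout described above can be checked before uploading. The following is a minimal, standard-library sketch rather than an official validator: the function name `check_submission` is hypothetical, and the header `Id,Predicted` is an assumption based on the column names mentioned above.

```python
import csv
import io

def check_submission(csv_text, expected_ids, columns=('Id', 'Predicted')):
    """Return a list of problems found in a submission CSV given as text.

    `columns` is an assumed header; adjust it to the real sample submission.
    `expected_ids` are compared as strings, because csv yields strings.
    """
    id_col, pred_col = columns
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != list(columns):
        return [f'unexpected header: {reader.fieldnames}']

    problems = []
    seen = set()
    for row in reader:
        if row[id_col] in seen:
            problems.append(f'duplicate Id: {row[id_col]}')
        seen.add(row[id_col])
        # Each prediction should be a single integer category_id.
        try:
            int(row[pred_col])
        except (TypeError, ValueError):
            problems.append(f'non-integer prediction for Id {row[id_col]}')

    for missing in sorted(set(expected_ids) - seen):
        problems.append(f'missing Id: {missing}')
    return problems

# Toy data, invented for illustration only.
good = 'Id,Predicted\n0,5\n1,7\n'
bad = 'Id,Predicted\n0,5\n0,x\n'
print(check_submission(good, ['0', '1']))
print(check_submission(bad, ['0', '1']))
```

In the notebook, the text of `submission.csv` and the `Id` values of `samp_subm` (converted to strings) could be passed in the same way.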

# EDA

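## Train Data Structure
The training metadata is a dictionary with several top-level keys. We list the keys and then look at the first entry under each one: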

In [None]:
train_data.keys()

In [None]:
train_data['annotations'][0]

In [None]:
train_data['images'][0]

In [None]:
train_data['categories'][0]

In [None]:
train_data['genera'][0]

In [None]:
train_data['institutions'][0]

In [None]:
train_data['distances'][0]

In [None]:
train_data['license'][0]
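
The lists above are linked by id fields rather than by position, so the label of an image is found by going from the image to its annotation and from there to the category. The sketch below shows that join on toy records. The key names `image_id`, `category_id`, `file_name` and `species` are assumed to match the real metadata, the values are invented, and `build_records` is a hypothetical helper.

```python
# Toy metadata mirroring the assumed layout of train_metadata.json.
# All values are invented for illustration.
toy = {
    'images': [
        {'image_id': 'img_a', 'file_name': 'a.jpg'},
        {'image_id': 'img_b', 'file_name': 'b.jpg'},
    ],
    'annotations': [
        {'image_id': 'img_a', 'category_id': 7},
        {'image_id': 'img_b', 'category_id': 3},
    ],
    'categories': [
        {'category_id': 3, 'species': 'alba'},
        {'category_id': 7, 'species': 'rubra'},
    ],
}

def build_records(meta):
    """Join images -> annotations -> categories on their id fields."""
    category_of_image = {a['image_id']: a['category_id'] for a in meta['annotations']}
    category_by_id = {c['category_id']: c for c in meta['categories']}
    records = []
    for image in meta['images']:
        category_id = category_of_image[image['image_id']]
        records.append({
            'image_id': image['image_id'],
            'file_name': image['file_name'],
            'category_id': category_id,
            'species': category_by_id[category_id]['species'],
        })
    return records

records = build_records(toy)
print(records[0])
```

Note that in the toy data the first image belongs to category 7, which is the second entry of `categories`; indexing `categories` by the image position would return the wrong species.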

## Test Data Structure
The test metadata is a list of records, each with the fields "image_id", "file_name", "licenses".

In [None]:
test_data[0]

## Plot an Example
We look at the first image of the training set:

In [None]:
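row = 0
image = train_data['images'][row]
# Positions in 'categories' do not line up with positions in 'images', so the
# category is looked up through the image's annotation (assumes 'image_id' and
# 'category_id' keys, as in the metadata entries printed above).
anno = next(a for a in train_data['annotations'] if a['image_id'] == image['image_id'])
category = next(c for c in train_data['categories'] if c['category_id'] == anno['category_id'])
print('image:', image)
print('category:', category)

In [None]:
file_name = image['file_name']
species = category['species']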

In [None]:
img = cv2.imread(path+'train_images/'+file_name)
print('Shape:', img.shape)

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(7, 7))
ax.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.set_title(species)
plt.show()

## Plot Some Examples
We define a simple function that plots the first 16 training images in a 4 x 4 grid:

In [None]:
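def plot_examples():
    # Look up each image's species through its annotation; positions in
    # 'categories' do not correspond to positions in 'images'.
    # Assumes 'image_id' and 'category_id' keys in the metadata.
    category_of_image = {a['image_id']: a['category_id'] for a in train_data['annotations']}
    species_of_category = {c['category_id']: c['species'] for c in train_data['categories']}

    fig, axs = plt.subplots(4, 4, figsize=(20, 20))
    fig.subplots_adjust(hspace=.1, wspace=.1)

    axs = axs.ravel()
    for i in range(16):
        image = train_data['images'][i]
        img = cv2.imread(path+'train_images/'+image['file_name'])
        axs[i].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        axs[i].set_title(species_of_category[category_of_image[image['image_id']]])
        axs[i].set_xticklabels([])
        axs[i].set_yticklabels([])
    plt.show()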

In [None]:
plot_examples()

## Categories

In [None]:
df_train_image = pd.json_normalize(train_data['images'])
df_train_anno = pd.json_normalize(train_data['annotations'])

In [None]:
df_train_anno['category_id'].value_counts()
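
The raw counts are easier to read as a few summary numbers. The sketch below is one possible way to condense them, shown on toy ids. The helper `summarize_counts` and the `rare_threshold` parameter are hypothetical and not part of the original notebook.

```python
from collections import Counter
from statistics import median

def summarize_counts(category_ids, rare_threshold=5):
    """Summarize how many images each category has.

    `rare_threshold` is an arbitrary illustrative cut-off: categories with
    fewer images than this are counted as rare.
    """
    counts = Counter(category_ids)
    sizes = sorted(counts.values())
    return {
        'n_categories': len(counts),
        'min': sizes[0],
        'median': median(sizes),
        'max': sizes[-1],
        'n_rare': sum(1 for s in sizes if s < rare_threshold),
    }

# Toy ids, invented for illustration: category 1 has 10 images,
# category 2 has 4 and category 3 has 1.
toy_ids = [1] * 10 + [2] * 4 + [3] * 1
summary = summarize_counts(toy_ids)
print(summary)
```

On the real data, `summarize_counts(df_train_anno['category_id'])` should work as well, since iterating over a pandas Series yields its values.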

# Export Data

In [None]:
# Constant baseline: every test image gets the same category_id
samp_subm['Predicted'] = 2774

In [None]:
samp_subm.to_csv('submission.csv', index=False)
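
The notebook does not say how the constant category was chosen. One common way to pick a constant baseline is to use the most frequent training category. The sketch below illustrates that idea on toy annotations. It is an assumption about a reasonable baseline, not a description of the author's method, and `most_frequent_category` is a hypothetical helper.

```python
from collections import Counter

def most_frequent_category(annotations):
    """Return the category_id that occurs most often in the annotations."""
    counts = Counter(a['category_id'] for a in annotations)
    # most_common(1) returns a one-element list: [(category_id, count)]
    category_id, _ = counts.most_common(1)[0]
    return category_id

# Toy annotations, invented for illustration.
toy_annotations = [
    {'category_id': 4},
    {'category_id': 9},
    {'category_id': 9},
]
baseline = most_frequent_category(toy_annotations)
print(baseline)
```

In the notebook, the same value could be obtained with `most_frequent_category(train_data['annotations'])` or with `df_train_anno['category_id'].value_counts().idxmax()`, and then assigned to `samp_subm['Predicted']`.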