# <span style="color:#6D72C3;">**Phoenix: the cleansing flame is eternal, dying to be reborn.**</span>
    
### <span style="color:#6D72C3;">**Table of contents**</span>
<a id="table-of-contents"></a>
- [1. Introduction](#1)
- [2. Dataset Overview](#2)
    - [2.1 Train Dataset](#2.1)
        - [2.1.1 Quick view](#2.1.1)
        - [2.1.2 Data types](#2.1.2)
        - [2.1.3 Basic Statistics](#2.1.3)
        - [2.1.4 Target Column](#2.1.4)
    - [2.2 Test Dataset](#2.2)
        - [2.2.1 Quick view](#2.2.1)
        - [2.2.2 Data types](#2.2.2)
    - [2.3 eBird Dataset](#2.3)
        - [2.3.1 Quick view](#2.3.1)
        - [2.3.2 Data types](#2.3.2)
- [3. Exploratory Data Analysis](#3)

In [None]:

import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt

import librosa
import librosa.display
import IPython.display as ipd


BASE_DIR = '../input/birdclef-2022'  # no trailing slash: the f-strings below add the separator
train = pd.read_csv(f'{BASE_DIR}/train_metadata.csv')
test = pd.read_csv(f'{BASE_DIR}/test.csv')
ebird = pd.read_csv(f'{BASE_DIR}/eBird_Taxonomy_v2021.csv')
ss = pd.read_csv(f'{BASE_DIR}/sample_submission.csv')

import warnings
warnings.filterwarnings('ignore')


[back to top](#table-of-contents)
<a id="1"></a>
# <span style="color:#6D72C3;">**Introduction**</span>

<span style="color:#514F59;">As the “extinction capital of the world,” Hawai'i has lost 68% of its bird species, the consequences of which can harm entire food chains. Researchers use population monitoring to understand how native birds react to changes in the environment and conservation efforts. But many of the remaining birds across the islands are isolated in difficult-to-access, high-elevation habitats. With physical monitoring difficult, scientists have turned to sound recordings. Known as bioacoustic monitoring, this approach could provide a passive, low-labor, and cost-effective strategy for studying endangered bird populations.</span>

<span style="color:#514F59;">In this competition, you’ll use your machine learning skills to identify bird species by sound. Specifically, you'll develop a model that can process continuous audio data and then acoustically recognize the species. The best entries will be able to train reliable classifiers with limited training data.</span>

## <span style="color:#6D72C3;">**Evaluation**</span>

$\text{Macro-F1} = \frac{1}{Q} \sum_{j=1}^{Q} \frac{2 \times p_j \times r_j}{p_j + r_j}$

where $Q$ is the number of classes and $p_j$, $r_j$ are the precision and recall for class $j$.
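
As a rough illustration of the formula, here is a minimal pure-Python sketch that computes per-class precision and recall and averages the resulting F1 scores without weighting. The function name `macro_f1`, the toy labels, and the convention of scoring a class as 0 when its precision and recall are both 0 are assumptions made for this example; it is not the competition's official scoring code.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for c in classes:
        # Assumption: a class with no predictions (or no true samples) scores 0
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)

# Toy labels, not taken from the competition data
score = macro_f1(['a', 'a', 'b', 'b'], ['a', 'b', 'b', 'b'])
print(round(score, 4))
```

Because every class contributes equally to the mean, a rare species that is never predicted correctly pulls the score down as much as a common one would.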

In [None]:
"""
                                            TORCH ME
                                           UPVOTE IT :)

:::::::::::::::::::::~?YY7!::~~^^^^!7?JJJJJJJJJJJJJJJJJJJJJ?????YGG5J??5PPPGGGGGGGGGGGGGGG5PGGP55PBB
.....................^?JJ?P5?~^^^~7?????????????????????JJJJJ??JGGG5???7?5PPPGGGGGGGGGBBBGPGBGG555GB
.....................^?JY?PGGPJ~~7???????????????????????????JJPBGBY??????JPGGGGGGGGB#GGPBGGGPBBPPGB
..:::::::............^?YJ?5GGG5^7???????JJJJJJJJJJJJJJ?????????PBBGY???????75GGGGGGB##P5YYG55YB#BGBB
:::::::::::::........^?J~~Y5GG!~??JJJ?????????????????JJJJJJJ??5BGY5JJ??????75GGGGGGBBBBBBBBBBBP5PGG
::::::::::::::.......^?~^!!!Y?:?J????????????????????????????Y?JGJ?J??JJ?????75GGGGGPPGGGGGP5YJ?????
::::::::::::::.......~~^~:^~:.~J?????????????????????????????JJ??JJJJJ?JJ?????7PGP5PGGGGGGP5YYJJ???7
::::::::::::::.......~^!~^^~~^???J???????J????????JJ????????J?JJJJ???JJJJY??????GPPGGGPPPPPP5YYYJJJJ
:::::::::::::.......:^:~^^^^~~!?Y?J?????JJ????????YJ??????JJYJ?J????????JYY?????5PP555555PPGGPPPPPPP
:::::::::::.........^:^:^^^^^^!JYJY?????Y????????JJJ??????YYYJ????????????J?????JPPPPPP555PPPGPPPPPG
....................^::^^^~^^^7YJYY?????Y???????J??J?????JYJJ?J???????????JJ?????YPPPPPPPPPPPPP55PP5
...................:^.::^^!^^~?JJJJ????JJ???????J7?J??YJ?YYJJ?JJ???JJJ?????Y?????JPP555555555PPPGGPP
...................^:::::~~^^??YJYY????YY????J?J?7?J??5JJJJJYJJYJ??JYJ?????Y??????PGGGGGGGGGGGGBBBBB
...................~^~^^^!^^777YJ?J????YY???YJJ?77?J?J5JY?JJ?77?Y???YY?????Y??????YGGBBBBBBBBBBBBBBB
...................^^!^^^!^!~.~JY?JJ???YY??J5?J777?J?YYJYY55JJ?7?Y??JY?????Y??????JGBBBBBBBBBBBBBBBB
:::::::::::........~~!^^~!~~:~?5GGGGP5YYY??J5JJ777?JJJPGGB#BBBP5YYY?JY?????Y???????PBBBBBBBBBBBBBBBB
??????????????????J!~!^^!!!7!~!YGGGGG55Y???????777?77?J7JPPPPPPJJJ5PJY????JY???????YBGBBBBBBBBBBBBBB
777777777777777777?!^7^^7!~?^.^?J5G5PJ!7777777777777777!!J5PYYPJ!7JPYY????JY???????JGGBBBBBBBBBBBBBB
..................^~^7^^?^:~!~~7JJ??5J777777?77777777777YY?77YPJJYY??J????JYJJ??????PBBBBBBBBBBBBBBB
..................~~^7^^J^:::^^?5J??J?777777?77777777777?JJYJYY???77?J????YY?YJ?????5G55PGBBBBBBBBBB
::::::::::::......~^^7~^?!:::::!777777777777?77777777777777777777777?Y????YJ?YJ?????YYJYYY5GBBBBBBBB
:::::::::::::::...~^^!~^??:::::^777777777777?77777777777777777777777?Y????Y??J??????JYYJJYYGBBBBBBB#
:::::::::::::::...!^^~!^?J!:::::!77777777777777777777777777777777777?Y???JY?J????????YYJJYYBBBBBBB##
:::::::::::::::...!^^^7^???^::::~777777777777?????777777777777777777?Y???JYJ?????????YYYJJY5GBBBB###
::::::::::::::::.:!^^^7^?Y?7:::::77777777????????JJJY?77777777777777?Y???YY??????????JYJJYYJJPBBB###
::::::::::::::::..!7^^!~7YJJ~::::~7777775GPPPGGGGGGBBB5??7777777777?JY???YJ?????JJ????YYJJJJJJYPB###
::::::::::::::::..!7^^!!7Y?JJ^::::!7777?PBB#BBGGGGGGBBBJ?777777777?YYY???YJ?????JY????Y5JJJJJJJJP###
::::::::::::::....77^^~7!Y???7:.:::!7777YBGP5YYYYYYY55P??77777777?YJJY???YJ??????YJ???JYJJJJJJJJJPB#
::::::::::::......7!^^~!!YJ????~:::^7777?P5YYYYYYYYYY5J77777777?JYJ?JY??JY???????JY????YJJJJJJJJJJPB
:::::::::.........!~^~~!~YJ???YJ7!^:^!777?JYYYYYYY5YJ?7777777?JYJ???JY??JY????????YY???YJJJJJJJJJJYJ
..................~~^~~!~YJ??JY??J?7!~~7777???JJJJ??77777??JJJYYJ???YY??JY?????????YY??JYJJJJJJJYJ??
..................!^^!~7~JJ??JY??Y??JJJ7!777777777777???JJ????J7Y???JY??YY??????????YJ?JYJJJJJYJ????
.................:!^!?^!~JJ??JJ??YJ??JY???777?777???JJ???77??J77JJ??JY??Y5??????JJ??JYJ?YJJJJ???????
.................^~^!!~7~JY??JJ??JY??JYY?7?????JJJJ??777????7777?JJ?JY??Y5Y?????JYJ?JJ5JYYJ?????????
.................~~~~!~7^?Y??JJ???YJ?JY?????77???77???J???77777777?JYY??YJYJ???JJJYJJJJ5YY?????JJYYY
.................!^!.!~7~?Y??JYJJJY5YY5777????777????777777777777777?Y??5YY5JJJJJJJYJJJJY5???JPGBBBB
^~~~^^^~~~^^^^^^!!~7^7~!~?YJ??JJY555YYY777777J?7?J777777777777777777Y5?J555555YYYJJJ5JJJJ5Y?JGBBBBBB
5555555555555555Y~?YYYJ?~??!~~~JYY55YY?7777777J?J7777777777777777777Y5?JP555555YYYYY5555YJYY5BGGGGGB
555555555555555?7!!~~?5J~J?^~!YY!7YYJ?77777777?YJ7777777777777777777?YJJP5YY5555YYYYY55555555PPPGGGB
GGGGGGGG555555Y~^^^~!YJ7!J?^^~77!!!777777777777J777777777777777777777YJJ5Y55555YYYYYY5P5Y55555555PGB
GGGGGGG5555555J~~~~^7?77!J7~!7~^::.~7777777777J5J77777777777777777777YJYJ7?JY5P55YYY55P55P555555PBBB
GGGGGGP55555557~~~~~~!!?!J55PJ::::::^!7777777?PPPY7777777777777777777YJYJ777?G#BBGGPP55555555555BBBB
BBBBBB555555557~~~J5PPP57YGPP7:::::::^!7777775PPPPY777777777777777777YJYJ7??JB#######BPPBGGP555GBBBB
BB#BBG555555557~~?GGGGG5?5GPP7:::::::::~!777YPPPPPPY77777777777777777JYYJ???5########BPG####P5PBB#B#
BBBBBP555555557~~YGGGGGPJPGPP!::::::::::^~!JPPPPPPPPY77777777777777??JYYJ???G########BPB####PPG#B###
BBBB#P555555557~~5GGGGGGJPGPG7:::::::::::::755PPPPPP5?77777777777?????5YJ??Y#########GP#####PPB#####
BBBBB555555555?~~5GGGGGGYGGPG?::::::::::::::^JPPPPPY?7777777??????????5YY?JP#########GB#####PG######
                        #E5D4ED   #6D72C3   #5941A9   #514F59   #1D1128
""";

[back to top](#table-of-contents)
<a id="2"></a>

# <span style="color:#6D72C3;">**Dataset Overview**</span>
[back to top](#table-of-contents)
<a id="2.1"></a>

## <span style="color:#6D72C3;">**Train Dataset**</span>

|Variable|Definition|
|---|---|
| primary_label | A code for the bird species. You can review detailed information about the bird codes by appending the code to `https://ebird.org/species/`. |
| secondary_labels | Background species as annotated by the recordist. An empty list does not mean that no background birds are audible. |
| type | Type of bird sound. |
| latitude | - |
| longitude | - |
| scientific_name | - |
| common_name | - |
| author | The eBird user who provided the recording. |
| license | - |
| rating | Float value between 0.0 and 5.0 as an indicator of the quality rating on Xeno-canto and the number of background species, where 5.0 is the highest and 1.0 is the lowest. 0.0 means that this recording has no user rating yet. |
| time | - |
| url | - |
| filename | The associated audio file. |
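
Two of the definitions above translate directly into small helpers. The sketch below builds the eBird species page URL by appending a code to the base address, and treats a rating of 0.0 as "not rated yet". The helper names `species_url` and `is_rated`, and the sample code `akiapo`, are illustrative assumptions.

```python
EBIRD_SPECIES_URL = "https://ebird.org/species/"

def species_url(primary_label):
    """Return the eBird species page for a bird code (hypothetical helper)."""
    return EBIRD_SPECIES_URL + primary_label

def is_rated(rating):
    """A rating of 0.0 means the recording has no user rating yet."""
    return rating > 0.0

# 'akiapo' is used here only as an illustrative code
print(species_url("akiapo"))
print(is_rated(0.0), is_rated(3.5))
```

On the loaded frame, the same helpers could be applied with `train['primary_label'].map(species_url)` and `train['rating'].map(is_rated)`.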

[back to top](#table-of-contents)
<a id="2.1.1"></a>

### <span style="color:#6D72C3;">**Quick view**</span>

In [None]:
train.head(3)

In [None]:
print(f'Number of rows: {train.shape[0]}; Number of columns: {train.shape[1]}; No of missing values: {sum(train.isna().sum())}')

[back to top](#table-of-contents)
<a id="2.1.2"></a>

### <span style="color:#6D72C3;">**Data types**</span>

In [None]:
train.dtypes

[back to top](#table-of-contents)
<a id="2.1.3"></a>

### <span style="color:#6D72C3;">**Basic Statistics**</span>

In [None]:
train.describe()

[back to top](#table-of-contents)
<a id="2.1.4"></a>

### <span style="color:#6D72C3;">**Target Column**</span>

In [None]:
print('Frequency of each target class:')
train['primary_label'].value_counts()

[back to top](#table-of-contents)
<a id="2.2"></a>

## <span style="color:#6D72C3;">**Test Dataset**</span>

|Variable|Definition|
|---|---|
| row_id | A unique identifier for the row. |
| file_id | A unique identifier for the audio file. |
| bird | The eBird code for the row. There is one row for each of the scored species per 5-second window per audio file. |
| end_time | The last second of the 5-second time window (5, 10, 15, etc.). |
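
Since `end_time` marks the last second of each 5-second window, the matching slice of a loaded waveform can be derived from it. The sketch below is one way to do that. The helper name `window_bounds` and the default sample rate of 32000 Hz are assumptions, so check the rate actually used when loading the audio.

```python
WINDOW_SECONDS = 5

def window_bounds(end_time, sample_rate=32000):
    """Return (start_sample, end_sample) for the window ending at end_time.

    sample_rate is an assumed default; pass the rate the audio was loaded with.
    """
    start_time = end_time - WINDOW_SECONDS
    return start_time * sample_rate, end_time * sample_rate

# Sample ranges for the first three windows of a file
for end_time in (5, 10, 15):
    print(end_time, window_bounds(end_time))
```

With a waveform `y` loaded at the same rate, `y[start:end]` would then select the samples scored by that row.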


[back to top](#table-of-contents)
<a id="2.2.1"></a>

### <span style="color:#6D72C3;">**Quick view**</span>

In [None]:
test

[back to top](#table-of-contents)
<a id="2.2.2"></a>

### <span style="color:#6D72C3;">**Data types**</span>

In [None]:
test.dtypes

[back to top](#table-of-contents)
<a id="2.3"></a>

## <span style="color:#6D72C3;">**eBird Dataset**</span>

[back to top](#table-of-contents)
<a id="2.3.1"></a>

### <span style="color:#6D72C3;">**Quick view**</span>

In [None]:
ebird.head(10)

In [None]:
print(f'Number of rows: {ebird.shape[0]}; Number of columns: {ebird.shape[1]}; No of missing values: {sum(ebird.isna().sum())}')

[back to top](#table-of-contents)
<a id="2.3.2"></a>

### <span style="color:#6D72C3;">**Data types**</span>

In [None]:
ebird.dtypes

[back to top](#table-of-contents)
<a id="3"></a>

# <span style="color:#6D72C3;">**Exploratory Data Analysis**</span>

In [None]:
#ref https://www.kaggle.com/code/m1y7k8/birdclef-2022-eda/notebook#When-are-birds-singing?%E2%8C%9A
import geopandas as gpd
facecolor = '#1D1128'

# initialize an axis
fig, ax = plt.subplots(figsize=(26,20), facecolor=facecolor)
# plot map on axis
# note: gpd.datasets was removed in geopandas 1.0, so this call needs an older release
countries = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
countries.plot(color="lightgrey", ax=ax)

ax.set_facecolor(facecolor)
for s in ["top","right", "left", "bottom"]:
    ax.spines[s].set_visible(False)

# plot points: one scatter layer per species, coordinates rounded to 0.1 degree
for bird, dfg in train.groupby("primary_label"):
    # assign() returns a copy, so the grouped frame is not modified in place
    dfg = dfg.assign(longitude=np.around(dfg.longitude, 1),
                     latitude=np.around(dfg.latitude, 1))
    dfgg = dfg.groupby(["longitude", "latitude"]).size().reset_index(name="counts")
    dfgg.plot(x="longitude", y="latitude", kind="scatter", ax=ax, label=bird, alpha=0.5, color='#6D72C3')

ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.25), ncol=15, fancybox=True, shadow=False, facecolor=facecolor, edgecolor=facecolor, labelcolor='#E5D4ED')
# get axes limits
x_lo, x_up = ax.get_xlim()
y_lo, y_up = ax.get_ylim()
# add minor ticks with a specified spacing (deg)
deg = 5
# add grid
ax.set_xticks(np.arange(np.ceil(x_lo), np.ceil(x_up), deg), minor=True)
ax.set_yticks(np.arange(np.ceil(y_lo), np.ceil(y_up), deg), minor=True)
ax.grid(which='major', axis='y', zorder=0, color='#E5D4ED', linewidth=0.4, alpha=0.3)
ax.grid(which='major', axis='x', zorder=0, color='#E5D4ED', linewidth=0.4, alpha=0.3)
plt.text(-70, 145, 'Geo Scatter Map of Bird Types', color='#E5D4ED', fontsize=36, fontweight='bold');
plt.text(-70.5, 145.5, 'Geo Scatter Map of Bird Types', color='#E5D4ED', fontsize=36, fontweight='bold');
plt.text(-71, 146, 'Geo Scatter Map of Bird Types', color='#6D72C3', fontsize=36, fontweight='bold');


In [None]:
def formatter(v):
    if type(v) is str:
        return v
    if pd.isna(v) or v <= 0:
        return ''
    if v == int(v):
        return f'{v:.0f}'
    return f'{v:.3f}'
# build a 1x2 figure: primary labels (left) and scientific names (right)
background_color = '#1D1128'
fig = plt.figure(figsize=(20, 8), facecolor=background_color)
gs = fig.add_gridspec(1, 2)
gs.update(wspace=0.2, hspace=0.5)
ax0, ax1 = (fig.add_subplot(gs[0, col]) for col in range(2))
for ax in (ax0, ax1):
    ax.set_facecolor(background_color)
    for s in ["top", "right", "bottom", "left"]:
        ax.spines[s].set_visible(False)

# value_counts(ascending=True) sorts the rarest classes first,
# so [:25] keeps the 25 least common classes (shares in percent)
for ax, column in ((ax0, 'primary_label'), (ax1, 'scientific_name')):
    share = train[column].value_counts(ascending=True, normalize=True)[:25] * 100
    ax.barh(share.index, share.values, color='#6D72C3')
    ax.tick_params(axis='y', colors="#E5D4ED")
    for ind, val in enumerate(share):
        ax.text(val-0.006, ind, formatter(val), fontweight="bold", color='#E5D4ED', fontsize=14)
ax1.tick_params(axis='y', width=2)

# title with a two-layer offset shadow; the text matches what the bars actually show
plt.text(-0.05, 27, 'Least Common Classes', color='#E5D4ED', fontsize=36, fontweight='bold');
plt.text(-0.0505, 27.1, 'Least Common Classes', color='#E5D4ED', fontsize=36, fontweight='bold');
plt.text(-0.051, 27.2, 'Least Common Classes', color='#6D72C3', fontsize=36, fontweight='bold');


# IN PROGRESS

:( went to bed