# Bird Analysis

In [1]:
import pandas as pd
import plotly.express as px

In [2]:
df = pd.read_csv('birds_ch_2018-2022.csv', delimiter=';')
df.head()

| | ID_SIGHTING | ID_SPECIES | NAME_SPECIES | DATE | TIMING | COORD_LAT | COORD_LON | PRECISION | ALTITUDE | TOTAL_COUNT | ATLAS_CODE_CH | ID_OBSERVER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14731644 | 371.0 | Blaumeise | 2018-01-21 | | 46.217211 | 7.582658 | Exakte Lokalisierung | 1150 | 1.0 | 0 | 11750.0 |
| 1 | 15360340 | 361.0 | Saatkrähe | 2018-03-24 | 10:41:00 | 46.923721 | 7.481304 | Exakte Lokalisierung | 510 | | 0 | 2246.0 |
| 2 | 15360731 | 358.0 | Rabenkrähe | 2018-03-24 | | 46.887983 | 7.545741 | Ort | 520 | | 0 | 3539.0 |
| 3 | 15360732 | 495.0 | Feldsperling | 2018-03-24 | | 46.887983 | 7.545741 | Ort | 520 | | 0 | 3539.0 |
| 4 | 15360733 | 518.0 | Buchfink | 2018-03-24 | | 46.887983 | 7.545741 | Ort | 520 | | 0 | 3539.0 |
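The IDs and counts in this output are floats (371.0, 1.0, 11750.0). That is consistent with how pandas stores an integer column containing missing values: as float64. `ID_SPECIES`, `TOTAL_COUNT` and `ID_OBSERVER` all have missing entries in the NaN table below. A minimal sketch of a dtype-aware load follows, assuming the raw file holds whole-number IDs and ISO dates. It uses pandas' nullable `Int64` dtype and `parse_dates`, and runs on a small inline sample built from the rows above rather than on `birds_ch_2018-2022.csv` itself.

```python
import io

import pandas as pd

# Inline sample built from the first two rows of the head() output above, in
# the ';'-separated layout that read_csv is told to expect. TIMING and
# TOTAL_COUNT are left empty where the original output shows them as missing.
SAMPLE = """ID_SIGHTING;ID_SPECIES;NAME_SPECIES;DATE;TIMING;COORD_LAT;COORD_LON;PRECISION;ALTITUDE;TOTAL_COUNT;ATLAS_CODE_CH;ID_OBSERVER
14731644;371;Blaumeise;2018-01-21;;46.217211;7.582658;Exakte Lokalisierung;1150;1;0;11750
15360340;361;Saatkrähe;2018-03-24;10:41:00;46.923721;7.481304;Exakte Lokalisierung;510;;0;2246
"""

# Nullable "Int64" keeps IDs and counts as integers even when some values are
# missing (plain int64 cannot hold NaN, so pandas would fall back to float64).
DTYPES = {
    "ID_SPECIES": "Int64",
    "TOTAL_COUNT": "Int64",
    "ID_OBSERVER": "Int64",
}

df_sample = pd.read_csv(
    io.StringIO(SAMPLE),
    delimiter=";",
    dtype=DTYPES,
    parse_dates=["DATE"],  # DATE becomes datetime64 instead of plain strings
)
print(df_sample.dtypes)
print(df_sample[["ID_SPECIES", "TOTAL_COUNT", "ID_OBSERVER"]])
```

With `Int64` IDs, `groupby('ID_OBSERVER')` drops missing observers by default, and the IDs display without a trailing `.0`.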


## Columns
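- Sighting ID (`ID_SIGHTING`)
- Species ID and name (`ID_SPECIES`, `NAME_SPECIES`)
- Date and time (`DATE`, `TIMING`)
- Coordinates, precision and altitude (`COORD_LAT`, `COORD_LON`, `PRECISION`, `ALTITUDE`)
- Count (`TOTAL_COUNT`)
- Atlas code (`ATLAS_CODE_CH`)
- Observer ID (`ID_OBSERVER`)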
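The coordinate columns can be sanity-checked before any mapping, which is the user-error check on coordinates listed in the ideas. A minimal sketch, assuming the dataset is meant to cover Switzerland as the file name `birds_ch_2018-2022.csv` suggests. The bounding box (roughly 45.8 to 47.9 °N, 5.9 to 10.5 °E, rounded outward) is an approximation, and `flag_outside_bbox` is a hypothetical helper. A box is only a coarse filter: it also contains parts of neighboring countries, so a proper check would use the national border polygon.

```python
import pandas as pd

# Approximate bounding box of Switzerland in WGS84 degrees (rounded outward).
# Assumption: coarse filter only, it also covers parts of neighboring countries.
CH_BBOX = {"lat_min": 45.8, "lat_max": 47.9, "lon_min": 5.9, "lon_max": 10.5}


def flag_outside_bbox(frame: pd.DataFrame, bbox: dict = CH_BBOX) -> pd.Series:
    """Return a boolean Series that is True where a sighting lies outside the box.

    Missing coordinates are also flagged, because between() is False for NaN.
    """
    inside = (
        frame["COORD_LAT"].between(bbox["lat_min"], bbox["lat_max"])
        & frame["COORD_LON"].between(bbox["lon_min"], bbox["lon_max"])
    )
    return ~inside


# Two coordinate pairs from head() plus one hypothetical point with latitude
# and longitude swapped, a typical data-entry error.
coords_demo = pd.DataFrame({
    "COORD_LAT": [46.217211, 46.923721, 7.481304],
    "COORD_LON": [7.582658, 7.481304, 46.923721],
})
print(flag_outside_bbox(coords_demo))
```

On the full data, `df[flag_outside_bbox(df)]` would list the candidate errors to inspect on a map.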

## Ideas
- Basic Stats
    - How many of each species?
    - How often are they sighted? ✅
    - Do we have power users? ✅
    - Which columns are often missing? ✅
    - Mapping

- User Error
    - Day and time
    - Coordinates on a map
    - Correlation with population size (e.g. only rare birds are listed, while pigeons and other very common birds are not, because there are so many of them)

- Feasibility of approach
    - How many sightings per species (class balance)?
    - Feature importance
    - Precision of simple baseline models

- Extra Ideas
    - Features
        - Slope
        - Aspect
    - Rare sighting? -> look up the corresponding eBird report
    - Sightings near the border: cross-check with data from neighboring countries

- For Simon
    - Spatial analysis
    - Data acquisition
    - HERE Maps
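The slope and aspect features listed under Extra Ideas would come from a digital elevation model (DEM). A minimal sketch, assuming a north-up elevation grid with square cells of known size in metres (row 0 is the northern edge, columns increase eastwards). Obtaining the DEM and sampling it at each sighting's coordinates are not covered here. `slope_aspect` is a hypothetical helper. It uses simple central differences via `np.gradient` rather than the 3x3 Horn kernel common in GIS software. Aspect follows the usual convention: the downhill-facing direction in degrees clockwise from north.

```python
from __future__ import annotations

import numpy as np


def slope_aspect(dem: np.ndarray, cell_size: float) -> tuple[np.ndarray, np.ndarray]:
    """Slope (degrees) and aspect (degrees clockwise from north) of a north-up DEM.

    Assumes row 0 is the northern edge and columns increase eastwards.
    """
    # np.gradient returns the derivative along axis 0 (rows) and axis 1 (columns).
    d_row, d_col = np.gradient(dem, cell_size)
    dz_dx = d_col   # elevation change per metre towards the east
    dz_dy = -d_row  # rows increase southwards, so flip the sign to point north
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # The slope faces downhill, i.e. along (-dz_dx, -dz_dy) in (east, north);
    # arctan2(east, north) gives the compass bearing, wrapped to [0, 360).
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


# Toy DEM: a plane rising 10 m per 10 m cell towards the east.
rising_east = np.tile(np.arange(5) * 10.0, (5, 1))
slope_deg, aspect_deg = slope_aspect(rising_east, cell_size=10.0)
print(slope_deg[2, 2], aspect_deg[2, 2])
```

For the toy plane the slope is 45° and the aspect is 270° (the slope faces west, i.e. downhill).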

# Basic Stats

## NaN values 💬

In [3]:
# Percentage of missing values per column
df.isnull().sum() * 100 / len(df)

ID_SIGHTING       0.000000
ID_SPECIES        0.000645
NAME_SPECIES      0.000000
DATE              0.000000
TIMING           63.002890
COORD_LAT         0.000000
COORD_LON         0.000000
PRECISION         0.000000
ALTITUDE          0.000000
TOTAL_COUNT      12.565139
ATLAS_CODE_CH     0.000000
ID_OBSERVER       0.000010
dtype: float64
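`TIMING` is missing in about 63% of rows and `TOTAL_COUNT` in about 12.6%. Breaking the missing share down by year hints at whether the gap is systematic or random. A minimal sketch with a hypothetical helper `missing_share_by_year`, demonstrated on the five rows shown by `head()`. Passing a different grouping key (for example `ID_OBSERVER`) to the same `groupby` would show whether a few observers account for most of the gaps.

```python
import pandas as pd


def missing_share_by_year(frame: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of rows with a missing `column`, grouped by the year of DATE."""
    year = pd.to_datetime(frame["DATE"]).dt.year
    # isna() gives booleans; their mean per group is the missing fraction.
    return frame[column].isna().groupby(year).mean() * 100


# Demo on the five rows shown by head() above (only DATE and TIMING are needed).
timing_demo = pd.DataFrame({
    "DATE": ["2018-01-21", "2018-03-24", "2018-03-24", "2018-03-24", "2018-03-24"],
    "TIMING": [None, "10:41:00", None, None, None],
})
print(missing_share_by_year(timing_demo, "TIMING"))
```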

## Birdos 🦜 

### How many species?

In [4]:
n_species = df['NAME_SPECIES'].nunique()
print('Number of species in dataset:', n_species)

Number of species in dataset: 497


### How many of each species? 

In [3]:
n_per_species = df.groupby('NAME_SPECIES').size()

In [6]:
fig = px.bar(n_per_species)
fig.show()
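The bar chart shows the spread of sightings per species. To put numbers on the class-balance question under "Feasibility of approach", the counts can also be summarised directly. A minimal sketch with a hypothetical helper `balance_summary` and an arbitrary rarity threshold of 100 sightings; on the real data it would be called as `balance_summary(n_per_species)`. The toy counts below are made up and only show the output shape.

```python
import pandas as pd


def balance_summary(counts: pd.Series, min_count: int = 100) -> dict:
    """Summarise how unevenly sightings are spread across classes (e.g. species)."""
    return {
        "n_classes": int(counts.size),
        # Most common class divided by rarest class.
        "imbalance_ratio": float(counts.max() / counts.min()),
        # Share of classes with fewer than `min_count` sightings: candidates to
        # merge, drop, or handle with resampling or class weights in a model.
        "share_rare": float((counts < min_count).mean()),
        # Share of all sightings contributed by the single most common class.
        "top_class_share": float(counts.max() / counts.sum()),
    }


# Toy counts (hypothetical, not from the dataset).
toy_counts = pd.Series({"A": 5000, "B": 3000, "C": 50, "D": 950})
print(balance_summary(toy_counts))
```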

### Top 10 birdos

In [4]:
# Ten most frequently reported species
fig = px.bar(n_per_species.sort_values(ascending=False).head(10))
fig.show()

## Observers 👀

### How many observers?

In [8]:
n_observers = df['ID_OBSERVER'].nunique()
print('Number of observers in dataset:', n_observers)

Number of observers in dataset: 8885


### Top 10 observers

In [9]:
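# Cast the float IDs to strings so plotly treats them as categorical labels.
# Side effect: the single missing ID becomes the string 'nan' and is counted
# as an observer of its own (last row of the output below).
df['ID_OBSERVER'] = df['ID_OBSERVER'].astype(str)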
n_per_observer = df.groupby('ID_OBSERVER').size().sort_values(ascending=False)
n_per_observer

ID_OBSERVER
11442.0    227058
1270.0     182857
1052.0     166358
3135.0     145420
9615.0     116936
            ...  
22335.0         1
22338.0         1
22339.0         1
22340.0         1
nan             1
Length: 8886, dtype: int64

In [10]:
fig = px.bar(n_per_observer.head(10))
fig.show()
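The top-observer counts answer "Do we have power users?" qualitatively. Two numbers make it concrete: the share of all sightings contributed by the top N observers, and how few observers are needed to reach half of all sightings. A minimal sketch with hypothetical helpers `top_n_share` and `observers_for_share`. On the real data they would be called as `top_n_share(n_per_observer, 10)` and `observers_for_share(n_per_observer, 0.5)`; the single `nan` entry with one sighting has a negligible effect. The toy counts below are made up.

```python
import pandas as pd


def top_n_share(counts: pd.Series, n: int = 10) -> float:
    """Fraction of all sightings contributed by the n most active observers."""
    return float(counts.nlargest(n).sum() / counts.sum())


def observers_for_share(counts: pd.Series, share: float = 0.5) -> int:
    """Smallest number of observers who together report at least `share` of sightings."""
    cumulative = counts.sort_values(ascending=False).cumsum() / counts.sum()
    # Observers strictly below the target, plus the one that crosses it; capped
    # at the total in case rounding keeps the last cumulative value below 1.0.
    return min(int((cumulative < share).sum()) + 1, int(counts.size))


# Toy per-observer counts (hypothetical, not from the dataset).
toy = pd.Series([50, 30, 10, 5, 5], index=list("abcde"))
print(top_n_share(toy, 2), observers_for_share(toy, 0.6))
```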