# **Project Name**    - **Bird Species Observation Analysis**



##### **Contribution**    - Individual


# **GitHub Link -**

https://github.com/ishadvay3928/Bird-Species-Observation-Analysis/blob/main/Bird_Species_Observation_EDA_Analysis.ipynb

# **Problem Statement**


The project aims to analyze the distribution and diversity of bird species in two distinct ecosystems: forests and grasslands. By examining bird species observations across these habitats, the goal is to understand how environmental factors, such as vegetation type, climate, and terrain, influence bird populations and their behavior. The study will involve working on the provided observational data of bird species present in both ecosystems, identifying patterns of habitat preference, and assessing the impact of these habitats on bird diversity. The findings can provide valuable insights into habitat conservation, biodiversity management, and the effects of environmental changes on avian communities.



#### **Define Your Business Objective?**

* Study bird species distribution and diversity across forest and grassland habitats.

* Identify habitat preferences and key environmental influences on bird presence.

* Determine peak observation times to optimize fieldwork and tourism activities.

* Highlight dominant and rare species for targeted conservation efforts.

* Recognize top observers to improve and motivate data collection.

* Provide actionable insights for biodiversity conservation and eco-tourism planning.



# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

### Dataset Loading

In [None]:
# Load the forest dataset

# Path to the multi-sheet Excel workbook (Colab upload location)
file_path = "/content/Bird_Monitoring_Data_FOREST.XLSX"

# Read the Excel file with multiple sheets
excel_data = pd.ExcelFile(file_path)

# Get all sheet names
sheet_names = excel_data.sheet_names

# Read data from all sheets into a dictionary
sheets_dict = {sheet: excel_data.parse(sheet) for sheet in sheet_names}

In [None]:
Forest_combined_df = pd.concat(
    [df.assign(Sheet=sheet_name) for sheet_name, df in sheets_dict.items()],
    ignore_index=True
)

In [None]:
# Drop the helper 'Sheet' column; Admin_Unit_Code already identifies the unit
Forest_combined_df = Forest_combined_df.drop(columns=['Sheet'])

In [None]:
# Load the grassland dataset

# Path to the multi-sheet Excel workbook (Colab upload location)
file_path = "/content/Bird_Monitoring_Data_GRASSLAND.XLSX"

# Read the Excel file with multiple sheets
excel_data = pd.ExcelFile(file_path)

# Get all sheet names
sheet_names = excel_data.sheet_names

# Read data from all sheets into a dictionary
sheets_dict = {sheet: excel_data.parse(sheet) for sheet in sheet_names}

In [None]:
Grassland_combined_df = pd.concat(
    [df.assign(Sheet=sheet_name) for sheet_name, df in sheets_dict.items()],
    ignore_index=True
)

In [None]:
# Drop the helper 'Sheet' column; Admin_Unit_Code already identifies the unit
Grassland_combined_df = Grassland_combined_df.drop(columns=['Sheet'])

### Dataset First View

In [None]:
# Dataset First Look
Forest_combined_df.head()

In [None]:
Grassland_combined_df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
Forest_combined_df.shape

In [None]:
Grassland_combined_df.shape

### Dataset Information

In [None]:
# Dataset Info
Forest_combined_df.info()

In [None]:
Grassland_combined_df.info()

#### Duplicate Values

In [None]:
# Duplicate Value Count
Forest_combined_df.duplicated().sum()

In [None]:
Grassland_combined_df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count of datasets
Forest_combined_df.isnull().sum()

In [None]:
Grassland_combined_df.isnull().sum()

In [None]:
# Visualizing the missing values
import missingno as msno
msno.bar(Forest_combined_df)

In [None]:
# Same view for the grassland dataset (missingno is imported in the previous cell)
msno.bar(Grassland_combined_df)

### What did you know about your dataset?

**Forest dataset**
- The dataset has 8546 rows and 29 columns.
- Five columns have missing values. `Sub_Unit_Code` has the most, with 7824 missing.
- Among those columns, `ID_Method` has the fewest, with 1 missing value.

**Grassland dataset**
- The dataset has 8531 rows and 29 columns.
- Five columns have missing values. `Sub_Unit_Code` has the most, with 8531 missing, so the column is entirely empty.
- Among those columns, `ID_Method` has the fewest, with 1 missing value.
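
The counts above can be read off `isnull().sum()`, but a percentage column makes it easier to judge which columns are worth keeping. The following is a minimal sketch under stated assumptions: `missing_summary` is a hypothetical helper, and the toy frame is illustrative data, not the real monitoring records.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only the columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Hypothetical toy frame, not the real monitoring data
toy = pd.DataFrame({
    "Sub_Unit_Code": [np.nan, np.nan, np.nan, "A"],
    "ID_Method": ["Singing", None, "Calling", "Singing"],
    "Year": [2018, 2018, 2018, 2018],
})
summary = missing_summary(toy)
print(summary)
```

On the real data the same helper could be called as `missing_summary(Forest_combined_df)` and `missing_summary(Grassland_combined_df)`.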


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
Forest_combined_df.columns

In [None]:
Grassland_combined_df.columns

In [None]:
#Dataset Describe
Forest_combined_df.describe(include='all')

In [None]:
Grassland_combined_df.describe(include='all')

### Variables Description

**The dataset contains observational data for bird species recorded across multiple forest and grassland sites. It includes detailed columns describing location, observation methods, bird species, and environmental conditions.**

- Admin_Unit_Code: The code for the administrative unit (e.g., "ANTI") where the observation was conducted.
- Sub_Unit_Code: The sub-unit within the administrative unit for further classification.
- Site_Name: The name of the specific observation site within the unit.
- Plot_Name: A unique identifier for the specific plot where observations were recorded.
- Location_Type: The habitat type of the observation area (e.g., "Forest").
- Year: The year in which the observation took place.
- Date: The exact date of the observation.
- Start_Time: The start time of the observation session.
- End_Time: The end time of the observation session.
- Observer: The individual who conducted the observation.
- Visit: The count of visits made to the same observation site or plot.
- Interval_Length: The duration of the observation interval (e.g., "0-2.5 min").
- ID_Method: The method used to identify the species (e.g., "Singing," "Calling," "Visualization").
- Distance: The distance of the observed species from the observer (e.g., "<= 50 Meters").
- Flyover_Observed: Indicates whether the bird was observed flying overhead (TRUE/FALSE).
- Sex: The sex of the observed bird (e.g., Male, Female, Undetermined).
- Common_Name: The common name of the observed bird species (e.g., "Eastern Towhee").
- Scientific_Name: The scientific name of the observed bird species (e.g., Pipilo erythrophthalmus).
- AcceptedTSN: The Taxonomic Serial Number for the observed species.
- NPSTaxonCode: A unique code assigned to the taxon of the species.
- AOU_Code: The American Ornithologists' Union code for the species.
- PIF_Watchlist_Status: Indicates whether the species is on the Partners in Flight Watchlist (e.g., "TRUE" for at-risk species).
- Regional_Stewardship_Status: Denotes the conservation priority within the region (TRUE/FALSE).
- Temperature: The temperature recorded at the time of observation (in degrees).
- Humidity: The humidity percentage recorded at the time of observation.
- Sky: The sky condition during the observation (e.g., "Cloudy/Overcast").
- Wind: The wind condition (e.g., "Calm (< 1 mph) smoke rises vertically").
- Disturbance: Notes any disturbances that could affect the observation (e.g., "No effect on count").
- Initial_Three_Min_Cnt: A TRUE/FALSE flag indicating whether the bird was detected within the first three minutes of the session.

**Sheets Information:**

Each Excel workbook contains multiple sheets, one per administrative unit, and the sheet names match the values in the Admin_Unit_Code column:

- ANTI: Data for the Antietam National Battlefield.
- CATO: Data for the Catoctin Mountain Park.
- CHOH: Data for the Chesapeake and Ohio Canal National Historical Park.
- GWMP: Data for the George Washington Memorial Parkway.
- HAFE: Data for Harpers Ferry National Historical Park.
- MANA: Data for the Manassas National Battlefield Park.
- MONO: Data for the Monocacy National Battlefield.
- NACE: Data for National Capital Parks-East.
- PRWI: Data for the Prince William Forest Park.
- ROCR: Data for the Rock Creek Park.
- WOTR: Data for the Wolf Trap National Park for the Performing Arts.
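
Because the sheet names are expected to match `Admin_Unit_Code`, that assumption can be checked while the sheets are stacked. This is a sketch, not the notebook's actual loading code: `combine_sheets` is a hypothetical helper, and the toy dictionary stands in for the dict of DataFrames that `pd.read_excel(path, sheet_name=None)` returns.

```python
import pandas as pd

def combine_sheets(sheets: dict) -> pd.DataFrame:
    """Stack per-sheet frames after checking each sheet's Admin_Unit_Code."""
    for name, frame in sheets.items():
        codes = set(frame["Admin_Unit_Code"].dropna().unique())
        unexpected = codes - {name}
        if unexpected:
            raise ValueError(
                f"Sheet {name!r} holds other unit codes: {sorted(unexpected)}"
            )
    return pd.concat(list(sheets.values()), ignore_index=True)

# Hypothetical stand-in for the dict returned by pd.read_excel(path, sheet_name=None)
toy_sheets = {
    "ANTI": pd.DataFrame({
        "Admin_Unit_Code": ["ANTI", "ANTI"],
        "Common_Name": ["Eastern Towhee", "Carolina Wren"],
    }),
    "CATO": pd.DataFrame({
        "Admin_Unit_Code": ["CATO"],
        "Common_Name": ["Northern Cardinal"],
    }),
}
combined = combine_sheets(toy_sheets)
print(combined["Admin_Unit_Code"].value_counts())
```

If the check passes, a separate `Sheet` column adds no information, since `Admin_Unit_Code` already identifies the unit.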

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable of dataset.
Forest_combined_df.nunique()

In [None]:
Grassland_combined_df.nunique()

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# Merge both datasets
Merged_df = pd.concat([Forest_combined_df, Grassland_combined_df], ignore_index=True)

In [None]:
# Drop Sub_Unit_Code column as it has very few non-null values
Merged_df.drop(columns=['Sub_Unit_Code'], inplace=True)

In [None]:
# impute null values in Site_Name column
Merged_df['Site_Name'] = Merged_df['Site_Name'].fillna("Unknown")

# impute null values in Distance column
Merged_df['Distance'] = Merged_df['Distance'].fillna("Unknown")

# impute null values in Sex column
Merged_df['Sex'] = Merged_df['Sex'].fillna("Undetermined")

# impute null values in NPSTaxonCode column
Merged_df['NPSTaxonCode'] = Merged_df['NPSTaxonCode'].fillna("N/A")

# impute null values in TaxonCode column
Merged_df['TaxonCode'] = Merged_df['TaxonCode'].fillna("N/A")

In [None]:
# impute null values in Previously_Obs column using mode
Merged_df['Previously_Obs'] = Merged_df['Previously_Obs'].fillna(Merged_df['Previously_Obs'].mode()[0])

In [None]:
# Drop rows where 'ID_Method' or 'AcceptedTSN' are null
Merged_df = Merged_df.dropna(subset=['ID_Method', 'AcceptedTSN'])

In [None]:
# CHANGE DATATYPES

# Fix Year column
Merged_df['Year'] = pd.to_numeric(Merged_df['Year'], errors='coerce').astype('Int64')

# Handle 'Start_Time' and 'End_Time' columns
# Convert to string and strip spaces
Merged_df['Start_Time'] = Merged_df['Start_Time'].astype(str).str.strip()
Merged_df['End_Time'] = Merged_df['End_Time'].astype(str).str.strip()

# Extract only the HH:MM:SS part (last 8 characters)
Merged_df['Start_Time'] = Merged_df['Start_Time'].str[-8:]
Merged_df['End_Time'] = Merged_df['End_Time'].str[-8:]

# Convert to proper time format
Merged_df['Start_Time'] = pd.to_datetime(Merged_df['Start_Time'], format='%H:%M:%S', errors='coerce').dt.time
Merged_df['End_Time'] = pd.to_datetime(Merged_df['End_Time'], format='%H:%M:%S', errors='coerce').dt.time

# Create a column for observation hour (unparseable times become NaN instead of raising)
Merged_df['Observation_Hour'] = pd.to_datetime(Merged_df['Start_Time'].astype(str), format='%H:%M:%S', errors='coerce').dt.hour


In [None]:
# Columns that hold TRUE/FALSE-style flags
bool_cols = ['Flyover_Observed', 'PIF_Watchlist_Status',
             'Regional_Stewardship_Status', 'Initial_Three_Min_Cnt']

# Normalize the text, then map it to booleans; values outside the mapping become NaN
for col in bool_cols:
    Merged_df[col] = Merged_df[col].astype(str).str.strip().str.lower().map(
        {'true': True, 'false': False, 'yes': True, 'no': False}
    )

# Repeated text columns are stored as category dtype
cat_cols = ['Admin_Unit_Code', 'Site_Name', 'Plot_Name', 'Location_Type',
            'Observer', 'ID_Method', 'Distance', 'Sex', 'Common_Name',
            'Scientific_Name', 'Sky', 'Wind', 'Disturbance']

for col in cat_cols:
    Merged_df[col] = Merged_df[col].astype('category')

In [None]:
# Drop Duplicates from merged dataset
Merged_df.drop_duplicates(inplace=True)

In [None]:
Merged_df.info()

In [None]:
# Save cleaned dataset
Merged_df.to_csv("Bird_Monitoring_Clean_Merged_dataset.csv", index=False)

### What all manipulations have you done and insights you found?

#### **Key Manipulations:**

* **Merged Forest and Grassland Datasets** to create a single unified dataset for analysis.
* **Dropped `Sub_Unit_Code`** as it contained very few non-null values (sparse, low-utility data).
* **Imputed Missing Values**:

  * `Site_Name` and `Distance` → `"Unknown"`
  * `Sex` → `"Undetermined"`
  * `NPSTaxonCode` & `TaxonCode` → `"N/A"`
  * `Previously_Obs` → Filled using the most frequent (mode) value.
* **Dropped Rows Missing `ID_Method` or `AcceptedTSN`** so that every remaining record carries essential identification information.
* **Converted `Year` to Integer** (`Int64`) for consistent numeric analysis.
* **Cleaned and Standardized Time Fields** (`Start_Time` and `End_Time`): removed extra spaces, extracted `HH:MM:SS`, converted to proper time format.
* **Created `Observation_Hour` Column** from `Start_Time` to enable hourly trend analysis.
* **Standardized Boolean-like Columns** (`Flyover_Observed`, `PIF_Watchlist_Status`, `Regional_Stewardship_Status`, `Initial_Three_Min_Cnt`) by mapping variations like `'yes'/'no'` and `'true'/'false'` to `True`/`False`.
* **Converted Categorical Columns** (e.g., location codes, observer, species, environmental conditions) to category dtype for efficiency and consistency.
* **Removed Duplicate Rows** to maintain data integrity.
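
The time-cleaning and hour-extraction steps can be condensed into one function. This is a minimal sketch assuming the raw values end in an `HH:MM:SS` string, as the wrangling code does; `extract_hour` and the toy series are hypothetical and not part of the original notebook.

```python
import pandas as pd

def extract_hour(raw_times: pd.Series) -> pd.Series:
    """Parse trailing HH:MM:SS text and return the hour as nullable Int64."""
    # Keep the last 8 characters, which hold HH:MM:SS when a date prefix is present
    text = raw_times.astype(str).str.strip().str[-8:]
    # errors="coerce" turns anything unparseable into NaT instead of raising
    parsed = pd.to_datetime(text, format="%H:%M:%S", errors="coerce")
    return parsed.dt.hour.astype("Int64")

# Hypothetical raw values: clean, date-prefixed with padding, missing, and garbage
raw = pd.Series(["06:15:00", " 1900-01-01 07:30:00 ", None, "not a time"])
hours = extract_hour(raw)
print(hours)
```

Using the nullable `Int64` dtype keeps the hour as an integer even when some rows are missing, which avoids labels such as `6.0` on a count plot.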


#### **Insights Gained:**

* **Data Deduplication** ensures no repeated entries, preventing double counting in species observations.
* **Consistent Missing Value Handling** preserves maximum usable data while avoiding gaps in analysis.
* **Categorical Standardization** improves the accuracy of grouping, filtering, and summary statistics.
* **Time Cleaning and Hour Extraction** enables meaningful time-based pattern detection (e.g., peak bird activity hours).
* **Boolean Standardization** supports reliable filtering and aggregation for conservation status and observation methods.
* **Dropping Low-Value Columns** removes noise and improves dataset quality for focused analysis.
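
One caveat on boolean standardization: `Series.map` with a dictionary returns NaN for any value that is not a key, so unexpected spellings disappear silently. A hedged sketch of a pre-check follows; `unmapped_values` and the toy series are hypothetical, while the mapping mirrors the one used in the wrangling code.

```python
import pandas as pd

BOOL_MAP = {"true": True, "false": False, "yes": True, "no": False}

def unmapped_values(column: pd.Series) -> list:
    """List normalized values that BOOL_MAP would turn into NaN."""
    cleaned = column.astype(str).str.strip().str.lower()
    return sorted(set(cleaned) - set(BOOL_MAP))

# Hypothetical flag column with one unexpected spelling and one missing value
toy_flags = pd.Series([True, "FALSE", " yes", "Y", None])
leftovers = unmapped_values(toy_flags)
print(leftovers)
```

Running this on each boolean-like column before mapping shows whether the NaN values come from genuinely missing data or from spellings the mapping does not cover.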

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 (Habitat Type Distribution)

In [None]:
# Habitat Type Distribution
plt.figure(figsize=(12, 6))
Merged_df['Location_Type'].value_counts().plot(kind='pie',autopct ='%1.1f%%', colors =['cornflowerblue', 'plum'])
plt.title("Observations by Habitat Type")
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a pie chart because it clearly shows the proportional distribution of observations across different habitat types.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that 55.6% of observations occur in forests, while 44.4% are in grasslands.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, this insight can guide resource allocation toward forest habitats, which have higher observations. No negative growth is indicated, but grassland-focused strategies might need extra attention to balance biodiversity monitoring.

#### Chart - 2 (Sex Distribution)

In [None]:
# Sex Ratio
sex_ratio = Merged_df['Sex'].value_counts(normalize=True) * 100
sex_ratio.plot(kind='pie', autopct='%1.1f%%', startangle=90, colors=['silver', 'lightblue', 'pink'])
plt.ylabel("")
plt.title("Sex Distribution of Observed Birds")
plt.show()

##### 1. Why did you pick the specific chart?

I chose a pie chart as it visually emphasizes the large disparity in sex identification among observed birds.

##### 2. What is/are the insight(s) found from the chart?

The data shows that 79% of observations have undetermined sex, 20.2% are male, and only 0.8% are female.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. This insight signals a gap in data quality: improving sex identification could enhance research and conservation planning, while leaving it unresolved may hinder accurate analysis and lead to less effective strategies.

#### Chart - 3 (Top 10 Bird Species)

In [None]:
# top 10 Bird Species
top_birds = Merged_df['Common_Name'].value_counts().nlargest(10)

# Convert to DataFrame for seaborn
top_birds_df = top_birds.reset_index()
top_birds_df.columns = ['Common_Name', 'Observation_Count']

plt.figure(figsize=(12, 6))
sns.barplot(
    data=top_birds_df,
    x='Observation_Count',
    y='Common_Name',
    palette='Set3',
    order=top_birds_df['Common_Name']
)
plt.title("Top 10 Bird Species by Observation Count")
plt.xlabel("Count")
plt.ylabel("Bird Species")
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose a horizontal bar chart as it effectively compares observation counts across multiple bird species.


##### 2. What is/are the insight(s) found from the chart?

The Northern Cardinal and Carolina Wren have the highest observation counts, indicating their dominance in the dataset.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, this insight can guide targeted conservation or tourism initiatives for popular species, with no direct negative growth, though lower-count species may require more monitoring to avoid decline.

#### Chart - 4 (Distance Distribution)

In [None]:
# Distance Distribution
plt.figure(figsize=(8, 6))
sns.countplot(data=Merged_df, x='Distance', palette='viridis')
plt.title("Bird Observation Distance Distribution")
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart was chosen to effectively compare the counts of observations across different distance categories, as it clearly shows the relative magnitude of each group.

##### 2. What is/are the insight(s) found from the chart?

The insights are that most bird observations occurred at a distance of 50-100 meters, followed by those within 50 meters. A significantly smaller number of observations had an unknown distance.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. This data could be used to optimize equipment or training for observers, potentially leading to a positive business impact. There are no insights that directly indicate negative growth, as the chart simply shows a distribution of data without any associated negative outcomes.

#### Chart - 5 (Top Observers)

In [None]:
# Top Observers
top_observers = Merged_df['Observer'].value_counts()
sns.barplot(x=top_observers.index, y=top_observers.values, palette='Set3')
plt.title("Top Observers")
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart was chosen to easily compare the number of observations made by the top three observers.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that Elizabeth Oswald is the most frequent observer, followed closely by Kimberly Serno, with Brian Swimelar having the lowest number of observations among the top three.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights from this chart could help create a positive business impact by identifying and rewarding the top observers, which can motivate them and others to increase their contributions. There are no insights that directly indicate negative growth, as the chart simply highlights the varying performance among the top contributors without any negative context.

#### Chart - 6 (Observations by Hour of Day)

In [None]:
# Observations by hour of day
plt.figure(figsize=(10, 6))
sns.countplot(data=Merged_df, x='Observation_Hour', palette='coolwarm')
plt.title("Bird Observations by Hour of Day")
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart was chosen to display the count of bird observations for each hour, as it effectively compares the number of observations across a discrete time series.

##### 2. What is/are the insight(s) found from the chart?

The insights are that bird observations peak at 7 a.m., with a high volume also occurring at 6 a.m. The number of observations decreases steadily after 7 a.m., with the lowest number of observations happening at 10 a.m.
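
The peak hour can also be read from the counts directly rather than from the bars. This is a small sketch on hypothetical hour values; on the real data the series would be `Merged_df['Observation_Hour']`.

```python
import pandas as pd

# Hypothetical observation hours, not the real data
hours = pd.Series([6, 6, 7, 7, 7, 8, 10], name="Observation_Hour")

# Count observations per hour and order the result chronologically
hourly = hours.value_counts().sort_index()

# Hour with the most observations, and each hour's share of the total
peak_hour = int(hourly.idxmax())
share = (hourly / hourly.sum() * 100).round(1)
print(peak_hour)
print(share)
```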

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights can lead to a positive business impact by helping organizations schedule and optimize resources for bird-watching tours or research, concentrating efforts during peak observation times. There are no insights that directly indicate negative growth, as the chart simply shows the temporal distribution of observations.

#### Chart - 7 (Bird Species Observations Over Time)

In [None]:
# Bird Species Observations Over Time

# Group by Date and count bird observations
bird_counts = Merged_df.groupby('Date')['Common_Name'].count().reset_index()

plt.figure(figsize=(10, 4))
sns.lineplot(data=bird_counts, x='Date', y='Common_Name')

plt.title("Bird Species Observations Over Time")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A line chart was chosen to show the trend of bird species observations over time, as it is the most effective way to visualize changes and patterns over a continuous period.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals significant daily fluctuations in the number of bird species observations, with no clear long-term upward or downward trend. There are several noticeable peaks and valleys throughout the observed period.
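
Daily counts are noisy, so a rolling mean is one way to check whether a trend is hiding behind the fluctuations. The sketch below is an illustration on hypothetical dates; `daily_counts_with_trend` is not part of the original notebook, and the window counts survey days rather than calendar days because surveys are not necessarily run every day.

```python
import pandas as pd

def daily_counts_with_trend(dates: pd.Series, window: int = 3) -> pd.DataFrame:
    """Count observations per day and add a rolling mean over survey days."""
    daily = pd.to_datetime(dates).dt.normalize().value_counts().sort_index()
    return pd.DataFrame({
        "count": daily,
        "rolling_mean": daily.rolling(window, min_periods=1).mean(),
    })

# Hypothetical observation dates: 2, 4 and 3 records on three consecutive days
toy_dates = pd.Series(
    ["2018-05-01"] * 2 + ["2018-05-02"] * 4 + ["2018-05-03"] * 3
)
trend = daily_counts_with_trend(toy_dates)
print(trend)
```

Plotting `rolling_mean` alongside `count` would make a sustained rise or fall easier to see than the raw line alone.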

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights can help businesses by informing them of the variability in daily observations, which could be useful for scheduling and resource planning. There are no insights that directly indicate negative growth, as the chart shows fluctuation rather than a consistent decline.

#### Chart - 8 (Boolean Category Distributions)

In [None]:
# Boolean Category Distributions

# Create subplots: 1 row, 4 columns
fig, axes = plt.subplots(1, 4, figsize=(18, 6))

# Plot 1: Flyover Observed
sns.countplot(data=Merged_df, x='Flyover_Observed', hue='Flyover_Observed', palette='Set1', legend=False, ax=axes[0])
axes[0].set_title('Flyover Observed')
axes[0].tick_params(axis='x', rotation=0)

# Plot 2: PIF Watchlist Status
sns.countplot(data=Merged_df, x='PIF_Watchlist_Status', hue='PIF_Watchlist_Status', palette='Set2', legend=False, ax=axes[1])
axes[1].set_title('PIF Watchlist Status')
axes[1].tick_params(axis='x', rotation=0)

# Plot 3: Regional Stewardship Status
sns.countplot(data=Merged_df, x='Regional_Stewardship_Status', hue='Regional_Stewardship_Status', palette='coolwarm', legend=False, ax=axes[2])
axes[2].set_title('Regional Stewardship Status')
axes[2].tick_params(axis='x', rotation=0)

# Plot 4: Initial_Three_Min_Cnt
sns.countplot(data=Merged_df, x='Initial_Three_Min_Cnt', hue='Initial_Three_Min_Cnt', palette='viridis', legend=False, ax=axes[3])
axes[3].set_title('Initial_Three_Min_Cnt')
axes[3].tick_params(axis='x', rotation=0)

# Main title
fig.suptitle('Boolean Category Distributions in Bird Observations', fontsize=18)

# Adjust layout
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()


##### 1. Why did you pick the specific chart?

Bar charts were chosen to compare the distributions of observations for four different boolean categories, as this layout allows for a clear, side-by-side comparison of the counts for 'True' and 'False' in each category.

##### 2. What is/are the insight(s) found from the chart?

In most observations, the bird was not a flyover, was not on the PIF Watchlist, and was observed for more than three minutes. The Regional Stewardship Status was likewise 'False' in the majority of cases.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights can lead to a positive business impact by helping to focus conservation efforts on birds with a 'True' PIF Watchlist or Regional Stewardship Status, which are relatively low in number. There are no insights that directly indicate negative growth, as the charts simply provide a distribution of data without any negative outcomes.

#### Chart - 9 (Temperature Vs Humidity By Distance)

In [None]:
# Temperature Vs Humidity By Distance
sns.scatterplot(data=Merged_df, x='Temperature', y='Humidity', hue='Distance', alpha=0.6)
plt.title("Temperature vs Humidity by Distance")
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot was chosen to visualize the relationship between temperature and humidity, while using different colors to represent different observation distances. This allows for the simultaneous analysis of three variables.

##### 2. What is/are the insight(s) found from the chart?

The chart shows no clear correlation between temperature and humidity. Observations were made across a wide range of temperatures and humidity levels, and there is no apparent clustering based on the observation distance.
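
A visual impression of "no clear correlation" can be backed by a number. The sketch below computes Pearson's r on hypothetical values; `temperature_humidity_corr` is an illustrative helper, and the toy frame is deliberately linear so the function is easy to check. It says nothing about the real dataset.

```python
import pandas as pd

def temperature_humidity_corr(df: pd.DataFrame) -> float:
    """Pearson r between Temperature and Humidity, ignoring unparseable rows."""
    numeric = (
        df[["Temperature", "Humidity"]]
        .apply(pd.to_numeric, errors="coerce")
        .dropna()
    )
    return float(numeric["Temperature"].corr(numeric["Humidity"]))

# Hypothetical readings; the last row is dropped because Temperature is missing
toy = pd.DataFrame({
    "Temperature": [15.0, 18.0, 21.0, 24.0, None],
    "Humidity": [90.0, 80.0, 70.0, 60.0, 55.0],
})
r = temperature_humidity_corr(toy)
print(round(r, 3))
```

A value near 0 on the real data would support the reading of the scatter plot, while a value closer to -1 or 1 would suggest the plot is hiding a relationship behind overplotting.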

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights from this chart do not directly suggest a positive or negative business impact, as they indicate a lack of correlation. The data shows that the distance of an observation is not dependent on temperature or humidity, which could be useful for businesses to know when planning observations, but does not inherently create growth or decline.


#### Chart - 10 (Sex Distribution by Habitat Type)

In [None]:
# Sex Distribution by Habitat Type
sns.countplot(data=Merged_df, x='Sex', hue='Location_Type', palette='pastel')
plt.title("Sex Distribution by Habitat")
plt.show()

##### 1. Why did you pick the specific chart?

A grouped bar chart was chosen to effectively compare the sex distribution of birds across two different habitat types (Forest and Grassland) side-by-side.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that most observations in both habitats have an undetermined sex. A higher number of male birds were observed in grasslands, while a higher number of observations with undetermined sex occurred in forests.
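
Raw counts favor whichever habitat has more records, so within-habitat percentages give a fairer comparison. This is a minimal sketch using `pd.crosstab` on hypothetical rows; on the real data the inputs would be `Merged_df['Location_Type']` and `Merged_df['Sex']`.

```python
import pandas as pd

# Hypothetical observations, four per habitat
toy = pd.DataFrame({
    "Location_Type": ["Forest"] * 4 + ["Grassland"] * 4,
    "Sex": ["Undetermined", "Undetermined", "Undetermined", "Male",
            "Undetermined", "Undetermined", "Male", "Male"],
})

# normalize="index" makes each habitat row sum to 1; scale to percentages
shares = pd.crosstab(toy["Location_Type"], toy["Sex"], normalize="index") * 100
print(shares)
```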

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. The insights can help businesses by highlighting the need for more accurate sex determination during observations, especially for forest habitats. This could lead to improved data quality and a positive impact, but there is no direct evidence of negative growth.

#### Chart - 11 (Top Bird Species Distribution Across Environmental Factors)

In [None]:
# Top Bird Species Distribution Across Environmental Factors

# Create subplots: 1 row, 3 columns
fig, axes = plt.subplots(1, 3, figsize=(24, 8), sharey=True)  # sharey for same y-axis scale

# Plot 1: Top Bird species by Wind
sns.countplot(
    data=Merged_df,
    x='Common_Name', hue='Wind', palette='coolwarm',
    order=top_birds_df['Common_Name'],
    ax=axes[0]
)
axes[0].set_title("Top Bird Species by Wind")
axes[0].tick_params(axis='x', rotation=90)
axes[0].legend(title="Wind")

# Plot 2: Top Bird species by Sky
sns.countplot(
    data=Merged_df,
    x='Common_Name', hue='Sky', palette='Set2',
    order=top_birds_df['Common_Name'],
    ax=axes[1]
)
axes[1].set_title("Top Bird Species by Sky")
axes[1].tick_params(axis='x', rotation=90)
axes[1].legend(title="Sky")

# Plot 3: Top Bird species by Disturbance
sns.countplot(
    data=Merged_df,
    x='Common_Name', hue='Disturbance', palette='viridis',
    order=top_birds_df['Common_Name'],
    ax=axes[2]
)
axes[2].set_title("Top Bird Species by Disturbance")
axes[2].tick_params(axis='x', rotation=90)
axes[2].legend(title="Disturbance")

# Main title
fig.suptitle("Top Bird Species Distribution Across Environmental Factors", fontsize=20)

# Adjust layout
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()


##### 1. Why did you pick the specific chart?

A grouped bar chart was chosen to effectively compare the distribution of the top bird species across three different environmental factors (wind, sky, and disturbance) side-by-side. This allows for a detailed comparison of how each species' observations are influenced by these factors.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that most observations for the top bird species occurred during calm wind conditions, under clear or few clouds, and when there was no or moderate disturbance. The specific distribution varies by bird species.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights can help businesses by informing the optimal conditions for birdwatching tours or research, leading to a positive business impact. By knowing when to plan activities for a higher chance of observation, they can improve customer satisfaction or research outcomes. No direct negative growth is indicated, as the data simply shows correlations.

#### Chart - 12 (Observer Activity by Habitat Type)

In [None]:
# Observer Activity by Habitat Type
plt.figure(figsize =(8,6))
sns.countplot(data=Merged_df, x='Observer', hue='Location_Type', order=top_observers.index, palette='viridis')
plt.title("Top Observers by Habitat Type")
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

A grouped bar chart was chosen to effectively compare the number of observations made by each of the top observers in two different habitat types, Forest and Grassland, allowing for a side-by-side comparison.

##### 2. What is/are the insight(s) found from the chart?

The insights are that all three top observers made more observations in forest habitats than in grasslands. Elizabeth Oswald made the most observations overall, with the largest difference between her forest and grassland counts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights can create a positive business impact by helping to understand and leverage the strengths of top observers. For instance, an organization could identify Elizabeth Oswald as a potential leader for forest-based observation projects. There are no insights that directly indicate negative growth, as the chart shows a difference in contribution rather than a decline.


#### Chart - 13 (Observation Hour by Habitat Type)

In [None]:
# Observation Hour by Habitat Type
plt.figure(figsize =(8,6))
sns.countplot(data=Merged_df, x='Observation_Hour', hue='Location_Type', palette='coolwarm')
plt.title("Observation Hour by Habitat Type")
plt.show()


##### 1. Why did you pick the specific chart?

A grouped bar chart was chosen to effectively compare the number of observations across different hours of the day for two distinct habitat types, Forest and Grassland.

##### 2. What is/are the insight(s) found from the chart?

The insights are that forest observations peak at 7 a.m. and then decline, while grassland observations are more consistent in the morning and peak slightly later. At 10 a.m., there are more grassland observations than forest ones.
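
The per-habitat peak can be extracted from an hour-by-habitat table instead of being read off the bars. The following sketch uses hypothetical rows; on the real data the inputs would be `Merged_df['Observation_Hour']` and `Merged_df['Location_Type']`.

```python
import pandas as pd

# Hypothetical observations: four forest records and five grassland records
toy = pd.DataFrame({
    "Observation_Hour": [6, 7, 7, 8, 6, 7, 8, 8, 9],
    "Location_Type": ["Forest"] * 4 + ["Grassland"] * 5,
})

# Rows are hours, columns are habitats, cells are observation counts
table = pd.crosstab(toy["Observation_Hour"], toy["Location_Type"])

# idxmax on each column returns the hour with the highest count per habitat
peak_hours = table.idxmax()
print(table)
print(peak_hours)
```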

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights can help create a positive business impact by informing better scheduling of activities. For example, a business could organize forest-based tours earlier in the morning and grassland tours later in the morning to maximize observation opportunities. There are no insights that directly indicate negative growth, as the chart only shows the distribution of observations over time.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

* **Prioritize habitat-specific strategies** — Focus more monitoring and conservation in forests, which currently have higher observation counts, while strengthening grassland biodiversity programs to balance efforts.
* **Enhance data collection quality** — Train observers to improve sex determination and accurately record observation distances to close current data gaps.
* **Leverage peak observation times** — Schedule eco-tourism activities and research during early morning hours (especially 6–7 a.m.) to maximize sightings.
* **Target conservation for rare species** — Create special monitoring programs for low-observation species to prevent population decline.
* **Empower top observers** — Recognize and incentivize leading contributors like Elizabeth Oswald to mentor others and improve team-wide performance.
* **Integrate environmental condition tracking** — Use insights on wind, sky, and disturbance levels to plan optimal birdwatching and research conditions.
* **Diversify observation coverage** — Increase monitoring in varied seasons and times to capture a more complete picture of bird diversity.
* **Promote eco-tourism marketing** — Highlight popular species and high-sighting times to attract birdwatching enthusiasts and boost tourism revenue.
* **Collaborate with conservation bodies** — Partner with wildlife NGOs, research institutions, and local communities to implement habitat-specific conservation measures.


# **Conclusion**

This analysis highlights significant patterns in bird species distribution across forests and grasslands, revealing valuable insights for conservation and biodiversity management. Forest habitats showed slightly higher observation rates, with notable dominance by a few adaptable species. Observation success was strongly influenced by early morning hours, calm weather, and low disturbance. However, gaps in sex identification and limited monitoring of grassland species indicate areas for improvement. Leveraging top observers, enhancing data collection accuracy, and balancing habitat focus can strengthen conservation outcomes. These findings provide a foundation for informed decision-making, optimized resource allocation, and sustainable biodiversity preservation efforts in diverse ecosystems.
