#### Randy Baicich

# Capstone Project 2: 

# *Endangered Fish Species Flash Cards*

The dataset used in this project is available for download from [Kaggle](https://www.kaggle.com/datasets/harshithgupta/endangered-fish-data?select=Combined_Less.csv).

*In this notebook, I explore the Endangered Fish Species dataset: I import and clean the data, compute a few summary statistics, and visualize how records are distributed across species, years, and temperatures. I will also use Streamlit, an app framework, to build an interactive application that lets users explore the dataset themselves. The goal of the project is to draw attention to endangered fish species and their conservation needs, and to encourage action that helps preserve aquatic ecosystems.*

## Part 1: *Import and Clean the CSV file.*

#### *Import all necessary libraries.*

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

#### *Import the sourced Endangered Fish Species data.*

In [None]:
# Use a raw string so backslashes in the Windows path are not treated as escape sequences
data = pd.read_csv(r'C:\Users\RedneckRandy\Documents\GitHub\Capstone-Project-2\Combined_Less.csv')
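
A hard-coded absolute path ties the notebook to one machine. The following is a minimal sketch of a more portable loader, assuming the CSV is stored somewhere reachable by a relative path; the helper name `load_csv` and the demo file are hypothetical and not part of the original project.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_csv(path) -> pd.DataFrame:
    """Read a CSV, failing early with a clear message if the file is missing."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"CSV not found: {csv_path.resolve()}")
    return pd.read_csv(csv_path)


# Demonstration with a throwaway file (hypothetical contents).
# In the notebook, the call might look like load_csv("Combined_Less.csv").
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.csv"
    demo_path.write_text("Species,State\nExample fish,OH\n")
    demo = load_csv(demo_path)

print(demo.shape)
```

Because `pathlib.Path` handles separators itself, the same call works on Windows, macOS, and Linux.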


#### *Clean the CSV file/data.*

In [None]:
# Strip surrounding whitespace, then capitalize each column name
# (first letter upper case, remaining letters lower case)
data.columns = [col.strip().capitalize() for col in data.columns]

In [None]:
# Remove any spaces inside column names
data.columns = data.columns.str.replace(' ', '')

In [None]:
# Remove columns starting with "Unnamed"
data = data.loc[:, ~data.columns.str.startswith('Unnamed')]
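
The three cleaning steps above can be combined into one function and checked on a small frame. This is only a sketch: the helper name `clean_columns` and the toy column names are assumptions for illustration, not the actual headers of the Kaggle file.

```python
import pandas as pd


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with normalized column names and no 'Unnamed' columns."""
    out = df.copy()
    # Strip first so leading whitespace cannot block capitalization,
    # then capitalize and remove inner spaces.
    out.columns = [
        str(col).strip().capitalize().replace(" ", "") for col in out.columns
    ]
    # pandas names columns with empty headers "Unnamed: N"; drop them.
    return out.loc[:, ~out.columns.str.startswith("Unnamed")]


# Hypothetical toy frame; these column names are illustrative only.
toy = pd.DataFrame({" state ": ["OH"], "SUM TOTAL": [3], "Unnamed: 0": [0]})
cleaned = clean_columns(toy)
print(list(cleaned.columns))  # ['State', 'Sumtotal']
```

Note that `str.capitalize()` lower-cases everything after the first character, so a header written in all capitals ends up with only its first letter capitalized.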

## Part 2: *Analysis of the data.*

In [None]:
# Show the state with the most occurrences in the "State" column
most_common_state = data['State'].mode()[0]
print("State with the most occurrences:", most_common_state)

In [None]:
# Show Year with the most occurrences in the "Year" column
most_common_year = data['Year'].mode()[0]
print("Year with the most occurrences:", most_common_year)

In [None]:
# Total of the 'Sumtotal' column (the cleaning step capitalizes names,
# so the original 'SUMTOTAL' header is now 'Sumtotal')
total_sum_total = data['Sumtotal'].sum()
print("Total SUMTOTAL:", total_sum_total)
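
Summing a column only gives a meaningful total if the column is numeric. The sketch below, using made-up values rather than the real data, shows one way to coerce a column before summing and to count how many entries could not be converted.

```python
import pandas as pd

# Hypothetical values; the real column contents are not shown in this notebook.
raw = pd.Series(["10", "5", "n/a", None], name="total")

# Non-numeric entries become NaN instead of raising an error.
numeric = pd.to_numeric(raw, errors="coerce")

total = numeric.sum()                    # NaN values are skipped by default
unparsed = int(numeric.isna().sum())     # entries that could not be converted

print(total, unparsed)  # 15.0 2
```

Reporting the number of unparsed entries alongside the total makes it clear how much of the column actually contributed to the sum.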


In [None]:
# Find the most common temperature in the "Temp" column
most_common_temp = data['Temp'].mode()[0]
print("Most common temperature:", most_common_temp)
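
`Series.mode()` returns every value that ties for the highest frequency, sorted, so taking `[0]` silently reports only the first of them. A small sketch with made-up values shows how to inspect ties with `value_counts()`:

```python
import pandas as pd

# Hypothetical values, not taken from the dataset.
states = pd.Series(["OH", "TX", "OH", "TX", "CA"], name="State")

modes = states.mode()            # all tied modes, in sorted order
counts = states.value_counts()   # frequency of every distinct value

print(list(modes))                        # ['OH', 'TX']
print(counts.loc[list(modes)].to_dict())  # {'OH': 2, 'TX': 2}
```

If `len(modes) > 1`, it may be more honest to report all tied values than a single "most common" one.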

## Part 3: *Visualize the data and communicate your results.*

#### *Visualization 1: Bar plot of Species vs State.*

In [None]:
data.groupby('Species')['State'].count().plot(kind='bar')
plt.xlabel('Species')
plt.ylabel('Count')
plt.title('Species vs State')
plt.show()
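
The bar plot above counts rows per species, which is not the same as the number of states a species appears in. A minimal sketch on toy data (the species and state values are invented for illustration) contrasts the two aggregations:

```python
import pandas as pd

# Hypothetical records; species "A" appears three times in two states.
toy = pd.DataFrame(
    {
        "Species": ["A", "A", "A", "B"],
        "State": ["OH", "OH", "TX", "CA"],
    }
)

grouped = toy.groupby("Species")["State"]
summary = pd.DataFrame(
    {
        "Records": grouped.count(),           # rows per species
        "DistinctStates": grouped.nunique(),  # different states per species
    }
)

print(summary)
```

Either column of `summary` could be passed to `.plot(kind='bar')`, depending on which question the chart is meant to answer.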

#### *Visualization 2: Scatter plot of Years.*

In [None]:
data.plot.scatter(x='Year', y='Index')
plt.xlabel('Year')
plt.ylabel('Index')
plt.title('Scatter plot of Year')
plt.show()

#### *Visualization 3: Bar plot of Species vs Temperature.*

In [None]:
data.groupby('Species')['Temp'].mean().plot(kind='bar')
plt.xlabel('Species')
plt.ylabel('Temperature')
plt.title('Species vs Temperature')
plt.show()

#### *Save the new cleaned CSV.*

In [None]:
data.to_csv('endangered_fish_sorted.csv', index=False)

## Part 4: *Findings, Summary, and Conclusion.*