In [None]:
"""
Authors: Zhenggang Li and Ryleigh J. Bruce
Date: May 14th, 2024

## Purpose: This notebook is a fast.ai ML model
## Note: The authors generated this text in part with GPT-4,
OpenAI’s large-scale language-generation model. Upon generating
draft code, the authors reviewed, edited, and revised the code
to their own liking and take ultimate responsibility for
the content of this code.
"""


This notebook is part of a project exploring the application of AI in wildlife documentation, with a focus on the classification of animal images. Within that project, this notebook loads a spreadsheet of animal sightings from Google Drive and plots summary graphs of the data.

## Module: Mounting Google Drive in the Notebook

Here the `drive` module is imported from `google.colab`, allowing the Colab environment to access files stored in Google Drive. Google Drive is then mounted at `/content/drive` so that the notebook can read those files.

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

This line of code stores the location of the dataset in the variable `file_path`, so that the following modules can refer to the file without repeating the full path.

In [None]:
# File path on Google Drive
file_path = '/content/drive/My Drive/shared-data/Notebook datafiles/combined_csv_animal_flag_justanimals_location_flat.xlsx'
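Before reading the file, it can help to confirm that the path resolves once Drive is mounted. The cell below is a minimal sketch using only the standard library; the helper name `dataset_exists` and the example path are hypothetical and are not part of the original notebook.

```python
from pathlib import Path

def dataset_exists(path):
    """Return True if `path` points to an existing file."""
    return Path(path).is_file()

# Hypothetical path, used here only to show the failure case
example_path = "/content/drive/My Drive/does-not-exist.xlsx"

if not dataset_exists(example_path):
    print(f"File not found: {example_path}")
```

In the notebook itself, the same check could be run as `dataset_exists(file_path)` after the mount step.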

# Generating a Graph using Data From an Excel File

## Module: Import Relevant Python Libraries

In this module, we use the `pip` package installer to install `openpyxl`, a Python library for reading and writing Excel files. Installing it allows later modules to load the structured data in the Excel document so that it can be visualized.

In [None]:
!pip install openpyxl

In this module, the `pandas` and `matplotlib.pyplot` libraries are imported under aliases that are more easily recalled in subsequent code. The `pandas` library provides tools that allow for working with structured data, and the `matplotlib.pyplot` library is a plotting library that allows for the creation of visualizations of structured data such as graphs and plots.

In [None]:
#Import the pandas and matplotlib libraries using aliases
import pandas as pd
import matplotlib.pyplot as plt

## Module: Using the Pandas Library to Read an Excel File

The module below uses the alias `pd`, established in the previous module, to reference the `pandas` library. The `pd.read_excel()` function reads the data from the file whose location is stored in the variable `file_path`, and the `engine='openpyxl'` argument specifies that the `openpyxl` engine should be used to read the Excel file. The result is stored in the DataFrame `df`, which gives the script access to the data in the spreadsheet.

In [None]:
# Read Excel file
df = pd.read_excel(file_path, engine='openpyxl')
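The later modules rely on the `SpeciesList`, `SpeciesCount`, and `MoonPhase` columns, so it may be worth checking that they are present after loading. The sketch below shows one way to do this on a small in-memory DataFrame; the helper `missing_columns` and the sample data are hypothetical and stand in for the real `df`.

```python
import pandas as pd

def missing_columns(frame, required):
    """Return the names in `required` that are not columns of `frame`."""
    return [name for name in required if name not in frame.columns]

# Hypothetical stand-in for the DataFrame loaded from Excel
sample = pd.DataFrame({"SpeciesList": ["Deer"], "SpeciesCount": [2]})

missing = missing_columns(sample, ["SpeciesList", "SpeciesCount", "MoonPhase"])
print(missing)
```

With the real data, `missing_columns(df, [...])` returning an empty list would indicate that all expected columns were found.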

## Module: Creating a DataFrame and Aggregating Data Using the Pandas Library

The following line of code groups the rows of `df` by the value in the `SpeciesList` column, so that rows naming the same species fall into the same group. The `sum()` function then adds up the `SpeciesCount` values within each group, giving the total number of sightings per species. Finally, `reset_index()` converts the result back into a regular DataFrame, `species_sightings`, with `SpeciesList` and `SpeciesCount` columns.

In [None]:
#Process the data to get total sightings for each species
species_sightings = df.groupby('SpeciesList')['SpeciesCount'].sum().reset_index()
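To see what the grouping and summing step does, the same line can be applied to a toy table. The species names and counts below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up sightings: Deer appears in two rows
toy = pd.DataFrame({
    "SpeciesList": ["Deer", "Coyote", "Deer", "Moose"],
    "SpeciesCount": [2, 1, 3, 1],
})

# Same pattern as above: group by species, sum the counts, flatten the index
toy_sightings = toy.groupby("SpeciesList")["SpeciesCount"].sum().reset_index()
print(toy_sightings)
```

The two Deer rows collapse into a single row with a count of 5, and the groups come back sorted alphabetically by species name.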

## Module: Plotting a Graph Using Extracted Structured Data

The following module uses functions from the `matplotlib.pyplot` library to plot a graph of the extracted data.

Using the `figsize` parameter, the script creates a figure that is ten inches wide and six inches tall. The `plt.bar()` function then draws a bar chart with the `SpeciesList` values from the `species_sightings` DataFrame along the x-axis and the corresponding `SpeciesCount` totals as the bar heights. The color of the bars is specified using the `color` parameter.

The desired title for the bar graph is then set, as well as the x-axis and y-axis labels which will aid in graph legibility. The x-axis labels are rotated 45 degrees and are aligned to the right of the x-axis ticks to further enhance legibility.

The line `plt.tight_layout()` automatically adjusts the spacing around the plot so that the title, axis labels, and rotated tick labels fit within the 10" x 6" figure.

In [None]:
# Plot the data
plt.figure(figsize=(10, 6))
plt.bar(species_sightings['SpeciesList'], species_sightings['SpeciesCount'], color='skyblue')

# Adding titles and labels
plt.title('Total Number of Sightings for Each Animal Species') #adjust to the desired title name
plt.xlabel('Animal Species') #adjust to desired x-axis label
plt.ylabel('Total Number of Sightings') #adjust to desired y-axis label
plt.xticks(rotation=45, ha='right')

# Display the plot
plt.tight_layout()
plt.show()

# Manipulating Existing Code to Generate Additional Graphs

Now that a base module of code has been established for generating graphs, a variety of graphs can be produced by altering simple values within the code.

The following line of code creates a `species_sightings` DataFrame grouped by the `MoonPhase` column in the datasheet by changing the `'SpeciesList'` string to `'MoonPhase'`.

In [None]:
#Determine the total number of species sightings during each moon phase
species_sightings = df.groupby('MoonPhase')['SpeciesCount'].sum().reset_index()
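One thing to keep in mind is that `groupby` returns the groups sorted alphabetically, which is not the order of the lunar cycle. If the phases should appear in cycle order, an ordered categorical can be used to sort them. The sketch below assumes four hypothetical phase labels; the actual values in the `MoonPhase` column may differ and the list would need to be adjusted to match.

```python
import pandas as pd

# Hypothetical phase labels; the real MoonPhase values may differ
phase_order = ["New Moon", "First Quarter", "Full Moon", "Last Quarter"]

# Made-up data standing in for df
toy = pd.DataFrame({
    "MoonPhase": ["Full Moon", "New Moon", "Last Quarter", "First Quarter", "Full Moon"],
    "SpeciesCount": [4, 1, 2, 3, 1],
})

by_phase = toy.groupby("MoonPhase")["SpeciesCount"].sum().reset_index()

# Convert the column to an ordered categorical, then sort by that order
by_phase["MoonPhase"] = pd.Categorical(
    by_phase["MoonPhase"], categories=phase_order, ordered=True
)
by_phase = by_phase.sort_values("MoonPhase").reset_index(drop=True)
print(by_phase)
```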

Since the `matplotlib.pyplot` library was imported previously, the script has access to a range of plotting functions (an extensive list may be found [here](https://matplotlib.org/stable/plot_types/index.html)). To alter the type of plot being produced, simply change the `bar()` function in the second line of the module below to the desired plot type. In this example, it has been changed to `plot()`, which draws a line graph.

Graph color can also be altered by changing the value of the `color` parameter to the desired color's name or hexadecimal code. In this example, it has been changed from `'skyblue'` to `'#CCCCFF'`, the hexadecimal code for periwinkle.

In [None]:
# Plot the data
plt.figure(figsize=(10, 6))
plt.plot(species_sightings['MoonPhase'], species_sightings['SpeciesCount'], color='#CCCCFF')

# Adding titles and labels
plt.title('Total Number of Sightings for Each Moon Phase')
plt.xlabel('Moon Phase')
plt.ylabel('Total Number of Sightings')
plt.xticks(rotation=45, ha='right')

# Display the plot
plt.tight_layout()
plt.show()
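Since the two plotting modules differ only in the column names, plot type, color, and labels, those values could also be passed to a small helper instead of being edited by hand. The function below is a sketch of that idea, not part of the original notebook: `plot_sightings` and its parameters are hypothetical names, and it uses Matplotlib's `Axes` methods (`ax.bar`, `ax.plot`) rather than the `plt.` functions used above.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_sightings(data, x_col, y_col, kind="bar", color="skyblue", title=""):
    """Plot `y_col` against `x_col` using the Axes method named by `kind`."""
    fig, ax = plt.subplots(figsize=(10, 6))
    draw = getattr(ax, kind)  # looks up ax.bar, ax.plot, and so on
    draw(data[x_col], data[y_col], color=color)
    ax.set_title(title)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig, ax

# Made-up data standing in for species_sightings
toy = pd.DataFrame({"MoonPhase": ["Full Moon", "New Moon"], "SpeciesCount": [5, 1]})
fig, ax = plot_sightings(
    toy, "MoonPhase", "SpeciesCount",
    kind="plot", color="#CCCCFF",
    title="Total Number of Sightings for Each Moon Phase",
)
```

Calling the helper with `kind="bar"` and `color="skyblue"` would produce a bar chart in the style of the first graph; in a notebook, `plt.show()` can follow the call to display the figure.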