# Exploratory Data Analysis

## Introduction to EDA

Exploratory data analysis typically moves through five stages:

1. Collection
2. Preparation
3. Summarisation
4. Visualisation
5. Interpretation

## Shark Research Challenge

Welcome to the Shark Research Challenge! You have been hired as a data scientist by the Oceanic Research Institute to analyze data about sharks.
The institute has collected detailed information on shark species, gender, geographical locations, and movement patterns.
Your expertise is needed to uncover insights that help marine biologists better understand shark behaviour and migration patterns.

### Scenario

The institute is planning a major expedition and needs your help to make data-driven decisions. They have provided you with a dataset named `sharks.csv`, which includes various observations of sharks. The research team has tasked you with the following specific goals:

### Goals

1. **Species Analysis**: Determine the most and least common shark species in the dataset. This will help the team focus on species that may need more conservation efforts or those that are thriving.
2. **Gender and Health Analysis**: Provide insights into the gender, size and weight distribution of sharks. Understanding gender ratios and their characteristics can help in studying breeding patterns and population dynamics. Think about populations, but also consider individual sharks. Are there any interesting insights?
3. **Geographical Analysis**: Map the locations where sharks have been observed. Identifying hotspots can help in planning future research expeditions and conservation efforts. Where and when should you go to find a Great White? How about a Mako?
4. **Temporal Analysis**: Analyze the temporal trends in the dataset to see how shark observations vary over time. This includes understanding seasonal patterns and long-term trends.
5. **Movement Analysis**: Investigate the total distances travelled by the sharks. This will aid in understanding their migratory routes and the extent of their travel.

To assist the team, you need to perform data cleaning, exploratory data analysis (EDA), and generate visualizations. Use your Python and basic statistics knowledge to complete the following tasks.

## Instructions 

### Step 1: Load and Inspect the Dataset

1. Load the dataset from the provided CSV file.
2. Inspect the first few rows and basic information about the dataset.

### Step 2: Data Cleaning

1. Inspect the `weight` and `length` columns. Convert these columns to numerical values (weight in kg and length in meters).
2. Handle missing values in the dataset appropriately.
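
The `head()` output later in this notebook shows weights written as `686 lb` and lengths as `9 ft 10 in.`. The following is a minimal sketch of the conversion, assuming every non-missing value follows one of those two patterns; anything else becomes `NaN` so that you can inspect it separately. The helper names `weight_to_kg` and `length_to_m` and the new columns `weight_kg` and `length_m` are made up for this illustration.

```python
import re

import numpy as np
import pandas as pd

LB_TO_KG = 0.45359237  # one international pound in kilograms
FT_TO_M = 0.3048       # one foot in metres
IN_TO_M = 0.0254       # one inch in metres


def weight_to_kg(value):
    """Convert a string such as '686 lb' to kilograms (NaN if it does not match)."""
    if pd.isna(value):
        return np.nan
    match = re.search(r"([\d,]*\.?\d+)\s*lbs?", str(value))
    if match is None:
        return np.nan
    return float(match.group(1).replace(",", "")) * LB_TO_KG


def length_to_m(value):
    """Convert a string such as '9 ft 10 in.' to metres (NaN if it does not match)."""
    if pd.isna(value):
        return np.nan
    text = str(value)
    feet = re.search(r"(\d+(?:\.\d+)?)\s*ft", text)
    inches = re.search(r"(\d+(?:\.\d+)?)\s*in", text)
    if feet is None and inches is None:
        return np.nan
    total = 0.0
    if feet is not None:
        total += float(feet.group(1)) * FT_TO_M
    if inches is not None:
        total += float(inches.group(1)) * IN_TO_M
    return total


# Small sample: the first row uses values from the dataset, the second is missing.
sample = pd.DataFrame({"weight": ["686 lb", None], "length": ["9 ft 10 in.", None]})
sample["weight_kg"] = sample["weight"].apply(weight_to_kg)
sample["length_m"] = sample["length"].apply(length_to_m)
print(sample)
```

On the real data you would apply the same two helpers to `sharks_df["weight"]` and `sharks_df["length"]`. Keeping missing values as `NaN` is one reasonable default, because pandas statistics such as `mean()` and `median()` skip them; whether to drop or fill them instead depends on the question you are answering.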

### Step 3: Species Distribution

1. Analyze the species distribution in the dataset.
2. Create a bar plot to visualize the top 10 shark species by count.
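
One thing to watch: each row in the dataset is a location record, so the same shark appears many times (the first five rows all belong to a shark named Oprah). Counting rows therefore measures tracking activity, while counting unique `id` values measures individuals. The sketch below shows both, assuming `id` identifies one shark; the function names and the toy rows are illustrative only.

```python
import pandas as pd


def species_counts(df, per_shark=True):
    """Count species per tagged shark (unique `id`) or per row."""
    if per_shark:
        df = df.drop_duplicates(subset="id")
    return df["species"].value_counts()


def plot_top_species(counts, n=10):
    """Horizontal bar plot of the n most common species (needs matplotlib)."""
    ax = counts.head(n).sort_values().plot(kind="barh")
    ax.set_xlabel("Count")
    ax.set_ylabel("Species")
    ax.set_title(f"Top {n} shark species")
    return ax


# Toy data: "Toy species B" is a placeholder, not a species from the dataset.
toy = pd.DataFrame({
    "id": [3, 3, 3, 7, 8],
    "species": ["White Shark (Carcharodon carcharias)"] * 3 + ["Toy species B"] * 2,
})
per_row = species_counts(toy, per_shark=False)
per_shark = species_counts(toy, per_shark=True)
print(per_row)
print(per_shark)
```

In the toy data the White Shark leads by rows (3 against 2) but not by individuals (1 against 2), which is why the choice of unit matters. `value_counts()` sorts in descending order, so `counts.head(10)` gives the most common species and `counts.tail(10)` the least common.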

### Step 4: Gender and Characteristics Distribution

1. Analyze the distributions of gender, size and weight in the dataset.
2. Create a bar plot to visualize the gender distribution.
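
A possible starting point is sketched below. It assumes one row per shark is the right unit for gender ratios, labels missing genders as `Unknown` instead of dropping them, and assumes a numeric column such as `weight_kg` already exists from the cleaning step (that column name and the toy values are hypothetical).

```python
import pandas as pd


def gender_counts(df):
    """Number of individual sharks per gender, with missing values kept as 'Unknown'."""
    one_per_shark = df.drop_duplicates(subset="id")
    return one_per_shark["gender"].fillna("Unknown").value_counts()


def size_by_gender(df, columns):
    """Median of the given numeric columns per gender, one row per shark."""
    one_per_shark = df.drop_duplicates(subset="id")
    return one_per_shark.groupby("gender")[columns].median()


def plot_gender(counts):
    """Bar plot of the gender counts (needs matplotlib)."""
    ax = counts.plot(kind="bar")
    ax.set_xlabel("Gender")
    ax.set_ylabel("Number of sharks")
    return ax


# Toy data for illustration only; the weights are not taken from the dataset.
toy = pd.DataFrame({
    "id": [3, 3, 7, 8, 9],
    "gender": ["Female", "Female", "Male", None, "Female"],
    "weight_kg": [311.2, 311.2, 150.0, 90.0, 200.0],
})
counts = gender_counts(toy)
medians = size_by_gender(toy, ["weight_kg"])
print(counts)
print(medians)
```

The median is used here because a few very large sharks can pull the mean upwards; `describe()` on the same grouping gives the full picture if you want quartiles as well.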

### Step 5: Geographical Distribution

1. Plot the geographical distribution of sharks using latitude and longitude data.
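
A plain scatter plot of longitude against latitude already works as a rough map. To put numbers on the hotspots, one simple option is to count observations per grid cell, as in the sketch below. The cell size of one degree and the function names are arbitrary choices for illustration; the sample coordinates are the first five rows of the dataset.

```python
import numpy as np
import pandas as pd


def hotspot_counts(df, cell_deg=1.0):
    """Count observations per latitude/longitude grid cell of `cell_deg` degrees."""
    cells = pd.DataFrame({
        "lat_cell": np.floor(df["latitude"] / cell_deg) * cell_deg,
        "lon_cell": np.floor(df["longitude"] / cell_deg) * cell_deg,
    })
    # DataFrame.value_counts() counts unique rows, most frequent first.
    return cells.value_counts().rename("observations").reset_index()


def plot_locations(df):
    """Scatter plot of all observations (needs matplotlib)."""
    ax = df.plot.scatter(x="longitude", y="latitude", s=2, alpha=0.3)
    ax.set_xlabel("Longitude (degrees)")
    ax.set_ylabel("Latitude (degrees)")
    ax.set_title("Shark observations")
    return ax


sample = pd.DataFrame({
    "latitude": [-34.60661, -34.78752, -34.42487, -34.704323, -34.65556],
    "longitude": [21.15244, 19.42479, 21.09754, 20.210134, 19.37459],
})
hot = hotspot_counts(sample)
print(hot)
```

To answer species-specific questions such as where to find a particular species, filter the DataFrame on `species` before calling either function, or pass `hue="species"` to `sns.scatterplot` to colour the points by species.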

### Step 6: Temporal Trends

1. Convert the `datetime` and `tagDate` columns to datetime objects.
2. Analyze the number of records per year and per month.
3. Create line plots to visualize these trends.
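
These steps can be sketched as follows. The two format strings are assumptions based on the first rows of the dataset (`2014-07-06 04:57:28` and `7 March 2012`); `errors="coerce"` turns anything that does not match into `NaT` instead of raising an error. The function names are illustrative, and `%B` expects English month names.

```python
import pandas as pd


def parse_dates(df):
    """Return a copy with `datetime` and `tagDate` converted to datetime64."""
    out = df.copy()
    out["datetime"] = pd.to_datetime(
        out["datetime"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
    )
    out["tagDate"] = pd.to_datetime(out["tagDate"], format="%d %B %Y", errors="coerce")
    return out


def records_per_year(df):
    """Number of records in each calendar year, in chronological order."""
    return df["datetime"].dt.year.value_counts().sort_index()


def records_per_month(df):
    """Number of records in each month of the year (1-12), pooled over all years."""
    return df["datetime"].dt.month.value_counts().sort_index()


def plot_trend(counts, xlabel):
    """Line plot of a count series (needs matplotlib)."""
    ax = counts.plot(kind="line", marker="o")
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Number of records")
    return ax


# The first five rows of the dataset.
sample = pd.DataFrame({
    "datetime": [
        "2014-07-06 04:57:28", "2014-06-23 02:40:09", "2014-06-15 13:15:44",
        "2014-06-03 02:23:57", "2014-05-28 19:53:57",
    ],
    "tagDate": ["7 March 2012"] * 5,
})
parsed = parse_dates(sample)
per_year = records_per_year(parsed)
per_month = records_per_month(parsed)
print(per_year)
print(per_month)
```

After parsing, check `parsed["datetime"].isna().sum()` to see how many values failed to convert. Pooling months across years highlights seasonality, while the per-year counts show the long-term trend; keep in mind that more records in a year may simply mean more tagged sharks.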

### Step 7: Distance Traveled

1. Analyze the distribution of the total distance traveled by sharks.
2. Create a histogram to visualize the distribution.
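
In the first rows of the dataset, `dist_total` repeats the same value on every record of a shark, so a histogram over all rows would weight heavily tracked sharks more than others. A minimal sketch that reduces the data to one value per shark first is shown below. Taking the maximum per `id` is an assumption: it is harmless if the value is constant per shark and still gives the total if the column turns out to be cumulative.

```python
import pandas as pd


def distance_per_shark(df):
    """One total distance per shark, taken as the maximum `dist_total` per `id`."""
    return df.groupby("id")["dist_total"].max()


def plot_distance_hist(distances, bins=30):
    """Histogram of the per-shark distances (needs matplotlib)."""
    ax = distances.plot(kind="hist", bins=bins)
    ax.set_xlabel("Total distance travelled")
    ax.set_ylabel("Number of sharks")
    return ax


# Shark 3 uses the value from the dataset; shark 7 is a made-up toy value.
toy = pd.DataFrame({
    "id": [3, 3, 3, 7],
    "dist_total": [2816.662, 2816.662, 2816.662, 100.0],
})
distances = distance_per_shark(toy)
print(distances.describe())
```

The dataset does not state the unit of `dist_total`, so the axis label above leaves it out; add it once you have confirmed it.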

### Step 8: Summary of Key Insights

Write a brief summary of the key insights you have gained from the data analysis. Focus on gender distribution, species distribution, geographical distribution, temporal trends, and distance traveled.


---
**Jupyter Refresher**

Click a cell to select it, edit its code, then press Shift+Enter to run it.

---

### Data Collection
Import the data and any libraries that you might need.

In [2]:
import pandas as pd # Data manipulation
import numpy as np # Data manipulation
import matplotlib.pyplot as plt # Visualisations
import seaborn as sns # Visualisations

Next, read the dataset from the CSV file into a pandas `DataFrame`, a table of rows and columns similar to an Excel sheet. The `info()` method below lists each column with its non-null count and data type, and `describe()` reports summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for the numeric columns. Finally, `head()` shows the first five rows of the DataFrame.

In [7]:
file_path = 'sharks.csv'
# Name the DataFrame sharks_df so that it matches the cells below
sharks_df = pd.read_csv(file_path)

In [8]:
# Display basic information about the dataset
sharks_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65793 entries, 0 to 65792
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   active      65793 non-null  int64  
 1   datetime    65793 non-null  object 
 2   id          65793 non-null  int64  
 3   latitude    65793 non-null  float64
 4   longitude   65793 non-null  float64
 5   name        65793 non-null  object 
 6   gender      65647 non-null  object 
 7   species     65793 non-null  object 
 8   weight      60385 non-null  object 
 9   length      65220 non-null  object 
 10  tagDate     65793 non-null  object 
 11  dist_total  65793 non-null  float64
dtypes: float64(3), int64(2), object(7)
memory usage: 6.0+ MB
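
The `info()` output above shows that `gender`, `weight` and `length` contain missing values: 146, 5,408 and 573 rows respectively, out of 65,793. A small helper like the hypothetical `missing_summary` below produces the same counts together with percentages, which makes it easier to decide how to handle each column. The toy rows are for illustration only.

```python
import pandas as pd


def missing_summary(df):
    """Missing-value count and percentage per column, largest first."""
    missing = df.isna().sum()
    summary = pd.DataFrame({
        "missing": missing,
        "percent": (100 * missing / len(df)).round(2),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy data for illustration only.
toy = pd.DataFrame({
    "id": [3, 7, 8, 9],
    "gender": ["Female", None, "Male", "Female"],
    "weight": ["686 lb", None, None, "686 lb"],
})
summary = missing_summary(toy)
print(summary)
```

Because each shark appears on many rows, a single shark without a recorded weight accounts for many missing rows; counting missing values after `drop_duplicates(subset="id")` tells you how many individuals are affected.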


In [9]:
sharks_df.describe()

| | active | id | latitude | longitude | dist_total |
|---|---|---|---|---|---|
| count | 65793.0 | 65793.0 | 65793.0 | 65793.0 | 65793.0 |
| mean | 1.0 | 119.90999 | 9.703767 | -35.911564 | 12567.781934 |
| std | 0.0 | 91.296923 | 31.761692 | 59.129796 | 12754.985808 |
| min | 1.0 | 3.0 | -45.62415 | -103.96867 | 0.0 |
| 25% | 1.0 | 38.0 | -28.39886 | -75.47251 | 3048.274 |
| 50% | 1.0 | 98.0 | 27.95428 | -68.3984 | 8177.352 |
| 75% | 1.0 | 202.0 | 38.18823 | 20.9761 | 17811.853 |
| max | 1.0 | 326.0 | 53.65843 | 155.8543 | 46553.182 |


In [6]:
sharks_df.head()

| | active | datetime | id | latitude | longitude | name | gender | species | weight | length | tagDate | dist_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2014-07-06 04:57:28 | 3 | -34.60661 | 21.15244 | Oprah | Female | White Shark (Carcharodon carcharias) | 686 lb | 9 ft 10 in. | 7 March 2012 | 2816.662 |
| 1 | 1 | 2014-06-23 02:40:09 | 3 | -34.78752 | 19.42479 | Oprah | Female | White Shark (Carcharodon carcharias) | 686 lb | 9 ft 10 in. | 7 March 2012 | 2816.662 |
| 2 | 1 | 2014-06-15 13:15:44 | 3 | -34.42487 | 21.09754 | Oprah | Female | White Shark (Carcharodon carcharias) | 686 lb | 9 ft 10 in. | 7 March 2012 | 2816.662 |
| 3 | 1 | 2014-06-03 02:23:57 | 3 | -34.704323 | 20.210134 | Oprah | Female | White Shark (Carcharodon carcharias) | 686 lb | 9 ft 10 in. | 7 March 2012 | 2816.662 |
| 4 | 1 | 2014-05-28 19:53:57 | 3 | -34.65556 | 19.37459 | Oprah | Female | White Shark (Carcharodon carcharias) | 686 lb | 9 ft 10 in. | 7 March 2012 | 2816.662 |


### Species Analysis

