<p style="text-align:center">
    <a href="https://www.ict.mahidol.ac.th/en/" target="_blank">
    <img src="https://www3.ict.mahidol.ac.th/ICTSurveysV2/Content/image/MUICT2.png" width="400" alt="Faculty of ICT">
    </a>
</p>

# Lab04: Basic Visualization

This lab assesses your ability to create effective data visualizations using Matplotlib and Seaborn. You will work with real-world datasets to generate various plot types, including line plots for time series data, bar charts for categorical data, and histograms/scatter plots for numerical data.

__Instructions:__
1. Append your ID to the end of this Jupyter notebook's file name. For example, ```ITCS227_Lab04_Assignment_6788123.ipynb```
2. Complete each task in the lab.
3. Once finished, raise your hand to call a TA.
4. The TA will check your work and give you an appropriate score.
5. Submit the source code to MyCourses for record-keeping.

## Task 01: Time Series Visualization with Matplotlib and Seaborn

In this task, we'll use the Global Land Temperatures By Major City dataset from Berkeley Earth ('files/GlobalLandTemperatures-small.csv'). For this example, we'll focus on Bangkok, Thailand.

In [1]:
!pip install pandas matplotlib seaborn



In [2]:
# Install necessary libraries (if you haven't already):
# !pip install pandas matplotlib seaborn requests

#%matplotlib notebook

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
# Load the dataset into a dataframe
try:
    df = pd.read_csv('files/GlobalLandTemperatures-small.csv')
except pd.errors.ParserError as e:
    print(f"Error parsing CSV data: {e}")
    raise

In [4]:
# Filter data for Bangkok
bangkok_df = df[df['City'] == 'Bangkok'].copy()

# Convert 'dt' to datetime
bangkok_df['dt'] = pd.to_datetime(bangkok_df['dt'])

# Set 'dt' as index
bangkok_df.set_index('dt', inplace=True)

print("First 5 rows of Bangkok temperature data:")
print(bangkok_df.head())
print("\nDataset Info:")
bangkok_df.info()

First 5 rows of Bangkok temperature data:
            AverageTemperature  AverageTemperatureUncertainty     City  \
dt                                                                       
1950-01-01              25.109                          0.194  Bangkok   
1950-02-01              27.185                          0.301  Bangkok   
1950-03-01              29.122                          0.356  Bangkok   
1950-04-01              29.255                          0.387  Bangkok   
1950-05-01              28.805                          0.437  Bangkok   

             Country Latitude Longitude  
dt                                       
1950-01-01  Thailand   13.66N    99.91E  
1950-02-01  Thailand   13.66N    99.91E  
1950-03-01  Thailand   13.66N    99.91E  
1950-04-01  Thailand   13.66N    99.91E  
1950-05-01  Thailand   13.66N    99.91E  

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 765 entries, 1950-01-01 to 2013-09-01
Data columns (total 6 columns):
 #   Co
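
The filtering cell above converts `dt` to datetimes and uses it as the index. As a minimal sketch of why that helps, the toy example below (hypothetical values, not taken from the lab dataset) shows that a `DatetimeIndex` lets you select a whole year with a plain string:

```python
import pandas as pd

# Toy monthly data (hypothetical values, not from the lab dataset)
toy = pd.DataFrame({
    "dt": ["1950-01-01", "1950-02-01", "1951-01-01"],
    "AverageTemperature": [25.0, 27.0, 26.0],
})

# Parse the strings into timestamps and use them as the index
toy["dt"] = pd.to_datetime(toy["dt"])
toy = toy.set_index("dt")

# With a DatetimeIndex, a year string selects every row in that year
rows_1950 = toy.loc["1950"]
print(rows_1950)
```

The same idea should apply to `bangkok_df`, for example when you want to plot only a few decades of the series.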

### Basic Line Plots (Matplotlib)
Plot the average temperature over time - AverageTemperature on the y-axis and Date on the x-axis.

In [5]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(12, 6))
plt.plot(bangkok_df.index, bangkok_df['AverageTemperature'])
plt.title('Average Temperature in Bangkok Over Time')
plt.xlabel('Date')
plt.ylabel('Average Temperature (°C)')
plt.show()
```
</details>

### Time Series Plots with Seaborn
Use Seaborn's `lineplot` to draw the same time series, with AverageTemperature on the y-axis and the dates on the x-axis.

In [6]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(12, 6))
sns.lineplot(x=bangkok_df.index, y='AverageTemperature', data=bangkok_df)
plt.title('Average Temperature in Bangkok Over Time (Seaborn)')
plt.xlabel('Date')
plt.ylabel('Average Temperature (°C)')
plt.show()
```
</details>

### Rolling Statistics
a) Calculate a 12-month rolling average of the average temperature.

In [7]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
bangkok_df['12_month_rolling_avg'] = bangkok_df['AverageTemperature'].rolling(window=12).mean()
```
</details>
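
To see what a rolling window does, here is a small sketch on a toy series (the values and the window size of 3 are illustrative assumptions). Each output value is the mean of the current entry and the two before it, so the first `window - 1` entries have no complete window and come out as NaN:

```python
import pandas as pd

# Toy series (hypothetical values) to illustrate a rolling mean
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Window of 3: each result averages the current value and the two before it
rolling_mean = s.rolling(window=3).mean()
print(rolling_mean)

# Entries without a full window are NaN
print("NaN entries:", rolling_mean.isna().sum())
```

With `window=12` on monthly data, the first 11 months of the rolling column are NaN for the same reason, which is why the rolling line starts slightly later than the raw series in a plot.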


b) Plot a 12-month rolling average of the average temperature, using either the traditional Matplotlib or Seaborn methods.

In [8]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
#Matplotlib 
plt.figure(figsize=(12, 6))
plt.plot(bangkok_df.index, bangkok_df['AverageTemperature'], label='Average Temperature', alpha=0.7)
plt.plot(bangkok_df.index, bangkok_df['12_month_rolling_avg'], label='12-Month Rolling Average', color='red')
plt.title('Average Temperature and 12-Month Rolling Average in Bangkok')
plt.xlabel('Date')
plt.ylabel('Average Temperature (°C)')
plt.legend()
plt.show()

#Seaborn
plt.figure(figsize=(12, 6))
sns.lineplot(x=bangkok_df.index, y='AverageTemperature', data=bangkok_df, label='Average Temperature', alpha=0.7)
sns.lineplot(x=bangkok_df.index, y='12_month_rolling_avg', data=bangkok_df, label='12-Month Rolling Average')
plt.title('Average Temperature and 12-Month Rolling Average in Bangkok (Seaborn)')
plt.xlabel('Date')
plt.ylabel('Average Temperature (°C)')
plt.legend()
plt.show()
```
</details>

## Task 02: Visualizing Numerical and Categorical Data with Seaborn
In this task, you'll explore the Palmer Penguins dataset using Seaborn to create informative visualizations. This dataset contains measurements for three different penguin species observed in the Palmer Archipelago, Antarctica.

**Dataset Description:**
- `species`: Penguin species (Adelie, Chinstrap, Gentoo)
- `island`: Island where the penguin was observed (Torgersen, Biscoe, Dream)
- `bill_length_mm`: Bill length in millimeters
- `bill_depth_mm`: Bill depth in millimeters
- `flipper_length_mm`: Flipper length in millimeters
- `body_mass_g`: Body mass in grams
- `sex`: Penguin sex (Male, Female)

In [9]:
# Setup and Imports
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

In [10]:
# Load the dataset (using the Palmer Penguins dataset)
penguins = sns.load_dataset('penguins')

*Note*: Seaborn provides example datasets to help you explore its visualization features. Learn more about the available datasets here: https://www.geeksforgeeks.org/seaborn-datasets-for-data-science/

In [11]:
# Display first few rows
penguins.head()

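|   | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---------|--------|----------------|---------------|-------------------|-------------|-----|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |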


In [12]:
# Preliminary data processing

# Check for missing values (print, because only the last expression in a cell is displayed)
print(penguins.isnull().sum())

# Drop rows with missing values
penguins.dropna(inplace=True)
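
As a small illustration of the two steps above, the sketch below uses a toy DataFrame (hypothetical rows, only loosely modelled on the penguins columns) to show what `isnull().sum()` reports and how `dropna()` changes the row count:

```python
import numpy as np
import pandas as pd

# Toy data with one incomplete row (hypothetical values)
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo"],
    "bill_length_mm": [39.1, np.nan, 46.1],
    "sex": ["Male", None, "Female"],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# dropna() removes every row that has at least one missing value
cleaned = toy.dropna()
print(len(toy), "rows before,", len(cleaned), "rows after")
```

Dropping rows is the simplest option and is what this lab uses; whether it is appropriate for other datasets depends on how much data would be lost.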

### Distribution Plots (Histograms and KDEs)
a) Create a histogram of 'bill_length_mm'. Add a Kernel Density Estimate (KDE) curve.

In [13]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(8,6))
sns.histplot(penguins['bill_length_mm'],kde=True)
plt.title('Distribution of Bill Length')
plt.xlabel('Bill Length (mm)')
plt.ylabel('Frequency')
plt.show()
```
</details>

b) Create a similar plot for 'body_mass_g'.

In [14]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(8,6))
sns.histplot(penguins['body_mass_g'],kde=True)
plt.title('Distribution of Body Mass')
plt.xlabel('Body Mass (g)')
plt.ylabel('Frequency')
plt.show()
```
</details>

### Box Plots and Violin Plots (Categorical vs. Numerical)
a) Create box plots of 'flipper_length_mm' for each 'species'.

In [15]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(8,6))
sns.boxplot(x='species',y='flipper_length_mm',data=penguins)
plt.title('Flipper Length by Species')
plt.xlabel('Species')
plt.ylabel('Flipper Length (mm)')
plt.show()
```
</details>
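
To help read a box plot, the following sketch computes the numbers a box plot typically draws, using NumPy on a toy sample (hypothetical flipper lengths). It assumes the common convention of whiskers reaching at most 1.5 × IQR beyond the quartiles, which is the default in Matplotlib and Seaborn as far as we are aware:

```python
import numpy as np

# Toy sample (hypothetical flipper lengths in mm)
values = np.array([180, 185, 190, 195, 200, 230])

# The box spans Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1, median, Q3:", q1, median, q3)
print("Outliers:", outliers)
```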

b) Create violin plots of 'bill_depth_mm' for each 'island'.

In [16]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(8,6))
sns.violinplot(x='island',y='bill_depth_mm',data=penguins)
plt.title('Bill Depth by Island')
plt.xlabel('Island')
plt.ylabel('Bill Depth (mm)')
plt.show()
```
</details>

### Scatter Plots and Pair Plots (Numerical vs. Numerical)
a) Create a scatter plot of 'bill_length_mm' vs. 'bill_depth_mm'. Color the points by 'species'.

In [17]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(8,6))
sns.scatterplot(x='bill_length_mm',y='bill_depth_mm',hue='species',data=penguins)
plt.title('Bill Length vs. Bill Depth by Species')
plt.xlabel('Bill Length (mm)')
plt.ylabel('Bill Depth (mm)')
plt.show()
```
</details>

b) Create a pair plot to visualize relationships between all numerical variables, colored by 'species'.

In [18]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
sns.pairplot(penguins,hue='species')
plt.show()
```
</details>

### Count Plots (Categorical Data)
a) Create a count plot to show the number of penguins from each 'island'.

In [19]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(8,6))
sns.countplot(x='island',data=penguins)
plt.title('Number of Penguins per Island')
plt.xlabel('Island')
plt.ylabel('Count')
plt.show()
```
</details>

b) Create a count plot to show the distribution of 'sex' within each 'species'. (Hint: use the 'hue' parameter)

In [20]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(8,6))
sns.countplot(x='species',hue='sex',data=penguins)
plt.title('Sex Distribution within Each Species')
plt.xlabel('Species')
plt.ylabel('Count')
plt.show()
```
</details>
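
A count plot draws one bar per category, with the bar height equal to the number of rows in that category. As a rough sketch of the numbers behind such plots, the toy example below (hypothetical rows) computes the same counts with pandas: `value_counts()` corresponds to a plain count plot and `pd.crosstab` to a count plot split by `hue`:

```python
import pandas as pd

# Toy data (hypothetical rows)
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"],
    "sex": ["Male", "Female", "Male", "Male", "Female"],
})

# One count per category: the bar heights of a simple count plot
per_species = toy["species"].value_counts()
print(per_species)

# Counts per (species, sex) pair: the bar heights when hue='sex' is used
per_species_and_sex = pd.crosstab(toy["species"], toy["sex"])
print(per_species_and_sex)
```

Printing these tables next to your plots is a quick way to check that the bars show what you expect.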

### Heatmap (Correlation Matrix)
a) Calculate the correlation matrix for the numerical variables.

In [21]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
correlation_matrix = penguins.corr(numeric_only=True)
```
</details>
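
The sketch below shows what the correlation matrix looks like on a toy DataFrame (hypothetical values chosen so that two columns are perfectly linearly related). It assumes a pandas version that supports the `numeric_only` argument of `DataFrame.corr`, which skips text columns such as `species`:

```python
import pandas as pd

# Toy data (hypothetical values); body_mass_g is an exact linear function
# of flipper_length_mm, so their correlation should be 1
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "flipper_length_mm": [180.0, 190.0, 210.0, 220.0],
    "body_mass_g": [3500.0, 3700.0, 4100.0, 4300.0],
    "bill_depth_mm": [18.5, 18.0, 15.0, 14.5],
})

# Pairwise Pearson correlations between the numeric columns only
corr = toy.corr(numeric_only=True)
print(corr)
```

Values close to 1 or -1 indicate a strong linear relationship, values near 0 a weak one, and the diagonal is always 1 because every column correlates perfectly with itself.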

b) Create a heatmap to visualize the correlation matrix.

In [22]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(8,6))
sns.heatmap(correlation_matrix,annot=True,cmap='coolwarm') # annot for values and cmap for color
plt.title('Correlation Heatmap')
plt.show()
```
</details>

## Optional: Other types of plots
This optional task focuses on two powerful Seaborn plot types: joint plots and swarm plots. These plots are particularly useful for exploring relationships between variables, especially when dealing with distributions and categorical data. We'll continue using the Palmer Penguins dataset.

### Joint Plots

a) Create a joint plot of 'bill_length_mm' vs. 'bill_depth_mm' using the default 'scatter' kind.

In [23]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
sns.jointplot(x='bill_length_mm',y='bill_depth_mm',data=penguins)
plt.suptitle("Joint Plot of Bill Length vs. Bill Depth (Scatter)", y=1.02) # Title adjustment
plt.show()
```
</details>

b) Create a joint plot of 'flipper_length_mm' vs. 'body_mass_g' using the 'kde' kind.

In [24]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
sns.jointplot(x='flipper_length_mm',y='body_mass_g',data=penguins,kind='kde')
plt.suptitle("Joint Plot of Flipper Length vs. Body Mass (KDE)", y=1.02)
plt.show()
```
</details>

c) Create a joint plot of 'bill_length_mm' vs. 'body_mass_g' using the 'hex' kind.

In [25]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
sns.jointplot(x='bill_length_mm',y='body_mass_g',data=penguins,kind='hex')
plt.suptitle("Joint Plot of Bill Length vs. Body Mass (Hex)", y=1.02)
plt.show()
```
</details>

d) Create a joint plot with regression line and histograms using kind='reg'.

In [26]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
sns.jointplot(x="bill_length_mm", y="body_mass_g", data=penguins, kind="reg")
plt.suptitle("Joint Plot with Regression and Histograms", y=1.02)
plt.show()
```
</details>

### JointGrid for more customization
a) Create a JointGrid for 'bill_length_mm' vs. 'bill_depth_mm'. Plot a scatterplot on the joint axes, a histogram on the marginal x axis, and a kde on the marginal y axis.

In [27]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
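g = sns.JointGrid(x="bill_length_mm", y="bill_depth_mm", data=penguins)
g.plot_joint(sns.scatterplot)
# Histogram on the marginal x axis, KDE on the marginal y axis
sns.histplot(data=penguins, x="bill_length_mm", ax=g.ax_marg_x)
sns.kdeplot(data=penguins, y="bill_depth_mm", ax=g.ax_marg_y)
plt.suptitle("Custom JointGrid", y=1.02)
plt.show()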
```
</details>

### Swarm Plots
a) Create a swarm plot of 'bill_length_mm' for each 'species'.

In [28]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(8, 6))  # Adjust figure size for better readability
sns.swarmplot(x='species', y='bill_length_mm', data=penguins)
plt.title('Bill Length Distribution by Species (Swarm Plot)')
plt.xlabel('Species')
plt.ylabel('Bill Length (mm)')
plt.show()
```
</details>

b) Create a swarm plot of 'body_mass_g' for each 'island'.

In [29]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(8, 6))
sns.swarmplot(x='island', y='body_mass_g', data=penguins)
plt.title('Body Mass Distribution by Island (Swarm Plot)')
plt.xlabel('Island')
plt.ylabel('Body Mass (g)')
plt.show()
```
</details>

c) Combine a swarm plot with a box plot of 'bill_length_mm' for each 'species' by drawing both on the same axes.

In [30]:
# Your code here

<details><summary><span style="color:red">&#x1F6C8; Help</span> (Use this only as a last resort!!)</summary>
    
```Python
plt.figure(figsize=(8, 6))
sns.boxplot(x='species', y='bill_length_mm', data=penguins, whis=np.inf)
sns.swarmplot(x='species', y='bill_length_mm', data=penguins, color=".2")
plt.title("Bill Length by Species (Boxplot + Swarmplot)")
plt.show()
```
</details>

<p style="text-align:center;">That's it! Congratulations! <br> 
    Now, call a TA to check your solution. Then, upload your code to MyCourses.</p>