# 🧪 Visualizing Biological Data with ggplot2 in R
## 💡 Introduction
In this notebook, we will explore the classic iris dataset, which is commonly used to demonstrate basic data analysis and visualization techniques in R. While this dataset represents measurements of flower parts, the same principles apply when visualizing gene expression data, sequencing results, or other biological measurements.

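## ⚙️ Load R Environment in Jupyter
The `rpy2.ipython` extension provides the `%%R` cell magic, which runs the contents of a cell as R code from a Python kernel.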

In [None]:
%load_ext rpy2.ipython

## 📦 Load the iris Dataset

In [None]:
%%R
data(iris)

Let’s inspect the first few rows to understand the structure.

In [None]:
%%R
head(iris)
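
Beyond `head()`, base R offers a few other quick ways to inspect structure. The following is a minimal sketch using only base R functions; the variable names `dims` and `species_counts` are illustrative and not part of the original notebook. In this notebook it would run inside a `%%R` cell.

```r
# Load the built-in dataset (ships with R's datasets package)
data(iris)

# Number of rows (samples) and columns (variables)
dims <- dim(iris)
print(dims)

# Column types: four numeric measurements plus one factor (Species)
str(iris)

# Per-column summaries; for the factor column this is a count per level
summary(iris)

# Number of samples per species
species_counts <- table(iris$Species)
print(species_counts)
```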

## 🔍 Explore the Dataset Structure

In [None]:
%%R
# View first 3 rows
iris[1:3,]

In [None]:
%%R
# View the first 3 rows of the first 3 columns (Sepal.Length, Sepal.Width, Petal.Length)
iris[1:3,1:3]

Check the column names to understand the variables available:

In [None]:
%%R
colnames(iris)

In [None]:
%%R
# Names of columns 1, 3, and 4 (Sepal.Length, Petal.Length, Petal.Width)
colnames(iris)[c(1,3,4)]

In [None]:
%%R
# First rows of columns 1, 3, and 4
head(iris[, c(1, 3, 4)])
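
Positional indices such as `c(1,3,4)` silently select different columns if the column order ever changes. One possible alternative is to select by name. The sketch below assumes only base R; `selected`, `by_name`, and `by_index` are illustrative names.

```r
data(iris)

# Select columns by name rather than by position
selected <- c("Sepal.Length", "Petal.Length", "Petal.Width")
by_name <- iris[, selected]

# The same selection by position, as in the cells above
by_index <- iris[, c(1, 3, 4)]

# Check that both approaches return the same data frame
print(identical(by_name, by_index))

head(by_name)
```

Name-based selection is usually easier to read in a shared notebook, at the cost of a little more typing.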

## 🎨 Basic Data Visualization with ggplot2
First, load the required libraries: ggplot2 for plotting and ggforce for the ellipse annotations used later.

In [None]:
%%R

# Uncomment the next line if the packages are not installed
# install.packages(c("ggplot2", "ggforce"))
library(ggplot2)
library(ggforce)

### 1️⃣ Basic Scatter Plot
Let’s plot Sepal Length vs Sepal Width, colored and shaped by Species:

In [None]:
%%R
ggplot(iris) + 
  geom_point(aes(x = Sepal.Length, y = Sepal.Width,
                 col = Species, shape = Species)) +
  labs(title = "Sepal Dimensions by Species",
       x = "Sepal Length (cm)",
       y = "Sepal Width (cm)",
       color = "Species",
       shape = "Species") +
  theme_minimal()

### 2️⃣ Encoding More Dimensions (Size, Color Gradient)
We can encode 5 variables in one plot:
- `x`: Sepal.Length
- `y`: Sepal.Width
- `color`: Petal.Width (continuous)
- `size`: Petal.Length (continuous)
- `shape`: Species (categorical)

In [None]:
%%R
ggplot(iris) + 
  geom_point(aes(x = Sepal.Length, y = Sepal.Width,
                 col = Petal.Width, shape = Species, size = Petal.Length)) +
  scale_color_viridis_c() +
  labs(title = "Iris Data with Encoded Petal Traits",
       x = "Sepal Length (cm)",
       y = "Sepal Width (cm)",
       color = "Petal Width (cm)",
       size = "Petal Length (cm)",
       shape = "Species") +
  theme_classic()
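
Five encodings in a single panel can be hard to read. One possible alternative, sketched here with ggplot2's `facet_wrap()`, is to give each species its own panel instead of mapping it to `shape`. This is a variation for comparison and not part of the original notebook; `p_facets` is an illustrative name.

```r
library(ggplot2)

# One panel per species instead of mapping Species to shape
p_facets <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width,
                 color = Petal.Width, size = Petal.Length)) +
  scale_color_viridis_c() +
  facet_wrap(~ Species) +
  labs(title = "Iris Data Faceted by Species",
       x = "Sepal Length (cm)",
       y = "Sepal Width (cm)",
       color = "Petal Width (cm)",
       size = "Petal Length (cm)") +
  theme_classic()

print(p_facets)
```

By default the panels share both axes, so positions remain comparable across species.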

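### 3️⃣ Swap the Roles (Petal Dimensions on X/Y)
Here the petal measurements move to the axes, while the sepal measurements are encoded as color and size.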

In [None]:
%%R
ggplot(iris) + 
  geom_point(aes(x = Petal.Length, y = Petal.Width,
                 col = Sepal.Width, shape = Species, size = Sepal.Length)) +
  scale_color_viridis_c() +
  labs(title = "Petal Dimensions by Species",
       x = "Petal Length (cm)",
       y = "Petal Width (cm)",
       color = "Sepal Width (cm)",
       size = "Sepal Length (cm)") +
  theme_light()

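### 4️⃣ Visualizing Per-Sample Values with Segments
Each vertical segment represents one sample, drawn from 0 up to its petal length. Because the rows of `iris` are ordered by species, the three species appear as contiguous blocks.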

In [None]:
%%R
ggplot(iris) +
  geom_segment(aes(x = seq_len(nrow(iris)), y = 0,
                   xend = seq_len(nrow(iris)), yend = Petal.Length,
                   col = Species)) +
  labs(title = "Petal Length per Sample",
       x = "Sample Index", y = "Petal Length (cm)",
       color = "Species") +
  theme_minimal()
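
The segment plot shows one value per sample. If the goal is to compare the distribution of petal length between species, a boxplot is one common alternative. The sketch below swaps in `geom_boxplot()`, which the original notebook does not use; `p_box` and `medians` are illustrative names.

```r
library(ggplot2)

# Summarize petal length per species as a boxplot
p_box <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.6) +
  labs(title = "Petal Length by Species",
       x = "Species",
       y = "Petal Length (cm)") +
  theme_minimal() +
  theme(legend.position = "none")  # the x axis already names the species

print(p_box)

# Numeric counterpart: median petal length per species
medians <- tapply(iris$Petal.Length, iris$Species, median)
print(medians)
```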


### 5️⃣ 📏 Cluster Visualization Using ggforce
Ellipses can help visualize grouping or clustering based on species.

In [None]:
%%R
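# Draw the points, then an enclosing ellipse around each species
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_mark_ellipse(aes(fill = Species)) +
  labs(x = "Sepal Length (cm)", y = "Sepal Width (cm)") +
  theme_minimal()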
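
If ggforce is not available, ggplot2's own `stat_ellipse()` offers a related view. Note that this is a different technique: `geom_mark_ellipse()` encloses the points of each group, whereas `stat_ellipse()` draws a confidence ellipse (by default at the 95% level, assuming a multivariate t-distribution), so some points may fall outside it. A minimal sketch, with `p_ellipse` as an illustrative name:

```r
library(ggplot2)

# Confidence ellipses per species using ggplot2 only
p_ellipse <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                              color = Species)) +
  geom_point() +
  stat_ellipse(level = 0.95) +
  labs(title = "Sepal Dimensions with 95% Ellipses",
       x = "Sepal Length (cm)",
       y = "Sepal Width (cm)",
       color = "Species") +
  theme_minimal()

print(p_ellipse)
```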

  

## 🧠 Summary

| What We Learned             | Description                                                              |
|:----------------------------|:-------------------------------------------------------------------------|
| Data loading and subsetting | Loaded the `iris` dataset and selected rows and columns by index         |
| ggplot2 basics              | Created scatter plots with `geom_point()`                                |
| Multiple aesthetics         | Mapped several variables at once using `color`, `size`, and `shape`      |
| Per-sample values           | Drew one segment per sample with `geom_segment()`                        |
| Clustering visualization    | Used `ggforce::geom_mark_ellipse()` to highlight groups                  |
