# Task 1: Load the data
Import the libraries needed to create the figures. A library is imported with `import library_name`. Adding `as` gives the library a shorter alias (for example, `import pandas as pd`), which makes your code more concise and easier to read.
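
As a small illustration of the alias syntax, the block below imports the three libraries used in this guide under their conventional short names. The aliases `pd`, `plt`, and `sns` are community conventions rather than requirements; any valid name would work.

```python
import pandas as pd               # tables and data loading
import matplotlib.pyplot as plt   # figures and axes
import seaborn as sns             # statistical plots built on matplotlib

# After the import, the alias stands in for the full library name
print(pd.__name__, plt.__name__, sns.__name__)
```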

First, load the dataset using the `read_excel()` function from the pandas library.
```python
import pandas as pd
df = pd.read_excel("dataset_name.xlsx")  # the file name must be a quoted string
```

# Task 2: Inspect data
## 1. `describe()` - Statistical Summary

**Purpose:** Generates descriptive statistics for numerical columns in your dataset.

**What it shows:**
- **count**: Number of non-null values
- **mean**: Average value
- **std**: Standard deviation (measure of spread)
- **min**: Minimum value
- **25%**: First quartile (25th percentile)
- **50%**: Median (50th percentile)
- **75%**: Third quartile (75th percentile)
- **max**: Maximum value
```python
# Basic usage: summarizes numerical columns only
print(penguins.describe())
```
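
To make the output easier to interpret, here is a minimal sketch on a made-up table. The column names and values are hypothetical and only serve the illustration. It shows that `describe()` skips the text column and that single statistics can be read from the result with `.loc`.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "body_mass_g": [3700.0, 3800.0, 5000.0, 5100.0],
})

summary = toy.describe()                     # the text column is left out
print(summary)

# Individual statistics can be read with .loc[row, column]
print(summary.loc["mean", "body_mass_g"])    # mean of the column
print(summary.loc["50%", "body_mass_g"])     # median of the column
```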

## 2. `columns` - Column Names (attribute, not a function)

**Purpose:** Returns the column labels of the DataFrame as an Index object.
```python
# Get column names
print(penguins.columns)
```

## 3. `info()` - Concise Summary

**Purpose:** Provides a concise summary of the DataFrame including memory usage, data types, and non-null counts.

**What it shows:**
- **RangeIndex**: Number of rows
- **Data columns**: Number of columns
- **Column names with data types**: Each column's name and dtype
- **Non-null count**: How many non-missing values per column
- **Dtypes summary**: Count of each data type
- **Memory usage**: Approximate memory consumption
```python
penguins.info()
```

## 4. `head()` - First Rows

**Purpose:** Returns the first n rows of the DataFrame (default n=5).
```python
# First 5 rows (default)
print(penguins.head())

# First 10 rows
print(penguins.head(10))
```


# Task 3: Line plots
To draw a figure for a specific mouse, first select the rows that belong to that mouse. This is done with `data_mouse = df[df['Mouse'] == mouse_id]`.
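
The filtering expression works in two steps, which the sketch below spells out on a made-up table. Only the `Mouse` column name comes from this guide; `Day`, `YL32`, and all values are assumptions for the example.

```python
import pandas as pd

# Hypothetical toy table (values are made up)
df = pd.DataFrame({
    "Mouse": [1, 1, 2, 2],
    "Day": [0, 1, 0, 1],
    "YL32": [100.0, 250.0, 90.0, 300.0],
})

mouse_id = 1
mask = df["Mouse"] == mouse_id    # boolean Series: True where the row belongs to mouse 1
data_mouse = df[mask]             # keep only the rows where the mask is True

print(mask.tolist())
print(data_mouse)
```

Writing `df[df['Mouse'] == mouse_id]` on one line does the same thing without the intermediate `mask` variable.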


## Creating a Line Plot: Step-by-Step Guide

## Objective
Create a line plot to visualize how YL32 abundance changes over time for Mouse 1.

---

## Step 1: Set Up the Figure
First, create a figure and axes object with appropriate size:
```python
fig, ax = plt.subplots(figsize=(8, 5))
```

**What this does:**
- Creates a blank canvas (figure) that is 8 inches wide and 5 inches tall
- `ax` is the axes object where we'll draw our plot

---



## Step 2: Plot the Line with Markers
Now plot your data with both a line and markers:
```python
ax.plot(day, yl32, marker="o", linewidth=2, markersize=8)
```
To get the `day` and `yl32` values, select the matching columns from your filtered data with `data['column_name']`.

**Parameters:**
- First argument: x-axis data (Day)
- Second argument: y-axis data (YL32)
- `marker="o"`: Add circular markers at each point
- `linewidth=2`: Make the line 2 points thick
- `markersize=8`: Set the marker size to 8 points

---

## Step 3: Add X-axis Label
Label the x-axis to indicate what it represents:
```python
ax.set_xlabel("X label", fontsize=12)
```

---

## Step 4: Add Y-axis Label
Label the y-axis with the measurement and units:
```python
ax.set_ylabel("Y label", fontsize=12)
```

---

## Step 5: Add a Title
Give your plot a descriptive title:
```python
ax.set_title("Mouse 1: YL32 over time", fontsize=14, fontweight="bold")
```

**Tips:**
- Make the title larger than axis labels (fontsize=14 vs 12)
- Use bold to make it stand out

---

## Step 6: Add a Grid (Optional)
Add a background grid to make it easier to read values:
```python
ax.grid(alpha=0.3, linestyle="--")
```

**Parameters:**
- `alpha=0.3`: Make grid semi-transparent (0=invisible, 1=solid)
- `linestyle="--"`: Use dashed lines instead of solid

---

## Step 7: Finalize and Display
Adjust layout and show the plot:
```python
plt.tight_layout()
plt.show()
```

**What this does:**
- `tight_layout()`: Automatically adjusts spacing to prevent labels from overlapping
- `show()`: Displays the plot in the output
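
Steps 1 to 7 can be put together as in the sketch below. The data are made up, and the column names `Day` and `YL32` are assumptions; replace them with the names in your own dataset. `plt.show()` is left out so the block also runs in scripts without a display; in a notebook the figure appears at the end of the cell.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical measurements for one mouse (values are made up)
data = pd.DataFrame({
    "Day": [0, 1, 2, 3, 4],
    "YL32": [1.0e3, 5.0e3, 2.0e4, 8.0e4, 1.5e5],
})
day = data["Day"]
yl32 = data["YL32"]

fig, ax = plt.subplots(figsize=(8, 5))                      # Step 1: figure and axes
ax.plot(day, yl32, marker="o", linewidth=2, markersize=8)    # Step 2: line with markers
ax.set_xlabel("Day", fontsize=12)                            # Step 3: x label
ax.set_ylabel("YL32 abundance", fontsize=12)                 # Step 4: y label
ax.set_title("Mouse 1: YL32 over time",
             fontsize=14, fontweight="bold")                 # Step 5: title
ax.grid(alpha=0.3, linestyle="--")                           # Step 6: grid
plt.tight_layout()                                           # Step 7: adjust spacing
```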

---

## Try These Modifications

### Change the color:
```python
ax.plot(day, yl32,
        marker="o", linewidth=2, markersize=8, color='steelblue')
```

### Change the marker style:
```python
marker="s"   # Square
marker="^"   # Triangle up
marker="v"   # Triangle down
marker="D"   # Diamond
marker="*"   # Star
```

### Save the figure:
```python
plt.savefig('figure_name.png', dpi=300, bbox_inches='tight')
```

## Plot the Log YL32 Value for the Same Mouse

**Use the previous steps:**
```python
fig, ax = plt.subplots(figsize=(8, 5))
print(data.head())  # head is a method, so it needs parentheses

# Add labels and title
ax.set_xlabel("Day", fontsize=12)
ax.set_ylabel("log10 YL32 abundance", fontsize=12)
ax.set_title("Mouse 1: log10 YL32 over time", fontsize=14, fontweight="bold")

# Add grid for easier reading
ax.grid(alpha=0.3, linestyle="--")

# Save figure
plt.tight_layout()
plt.savefig('fig.png')
```

---

**Important:** Plot the data with the same command, but **change the y value to log₁₀(YL32)**. Place the call after `plt.subplots()` and before `plt.savefig()`:
```python
ax.plot(day, data["log10 YL32"], marker="o", linewidth=2, markersize=8)
```
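
If your dataset does not already contain a log-transformed column, one way to create it is with NumPy. This is a minimal sketch on made-up values; it assumes the raw abundances are stored in a column named `YL32`.

```python
import numpy as np
import pandas as pd

# Hypothetical raw abundances (made-up values)
data = pd.DataFrame({"Day": [0, 1, 2], "YL32": [100.0, 1000.0, 10000.0]})

# log10 is only defined for positive numbers; a zero would give -inf
data["log10 YL32"] = np.log10(data["YL32"])

print(data)
```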


## Plot the log YL32 value for all mice

**Use the previous steps:**
```python
fig, ax = plt.subplots(figsize=(8, 5))
print(data.head())  # head is a method, so it needs parentheses

# Add labels and title
ax.set_xlabel("Day", fontsize=12)
ax.set_ylabel("log10 YL32 abundance", fontsize=12)
ax.set_title("All mice: log10 YL32 over time", fontsize=14, fontweight="bold")

# Add grid for easier reading
ax.grid(alpha=0.3, linestyle="--")

# Save figure
plt.tight_layout()
plt.savefig('fig_all_mice.png')  # savefig needs a file name
```

---

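**Important:** matplotlib's `ax.plot()` has no `hue` argument. To draw one line per mouse, use seaborn's `sns.lineplot()` and pass the mouse column to `hue`. Column names must match your dataset exactly; check them with `columns` as shown in Task 2.
```python
# "Day" is assumed to be the name of your time column; replace it if yours differs
sns.lineplot(data=df, x="Day", y="log10 YL32", hue="Mouse", marker="o", linewidth=2, markersize=8, ax=ax)
```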
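
A single `ax.plot()` call draws one line, so another way to get one line per mouse with matplotlib alone is to loop over the mice. The sketch below uses made-up data and assumes the columns are named `Mouse`, `Day`, and `log10 YL32`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data for two mice (values are made up)
df = pd.DataFrame({
    "Mouse": [1, 1, 1, 2, 2, 2],
    "Day": [0, 1, 2, 0, 1, 2],
    "log10 YL32": [3.0, 4.2, 5.1, 2.8, 3.9, 5.4],
})

fig, ax = plt.subplots(figsize=(8, 5))

# groupby yields (mouse_id, rows_for_that_mouse) pairs
for mouse_id, mouse_data in df.groupby("Mouse"):
    ax.plot(mouse_data["Day"], mouse_data["log10 YL32"],
            marker="o", linewidth=2, markersize=8, label=f"Mouse {mouse_id}")

ax.set_xlabel("Day", fontsize=12)
ax.set_ylabel("log10 YL32 abundance", fontsize=12)
ax.legend(title="Mouse")   # uses the label given to each line
plt.tight_layout()
```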



# Task 4: Creating a Box Plot for Bacterial Abundance Distribution

## Objective
Create a box plot to visualize the **distribution of bacterial abundance** (log₁₀-transformed) across different timepoints and treatment groups.

---

## Step 1: Define the Column Name
First, create a variable for the log-transformed column name based on the strain you want to plot:
```python
log_col = f'log10 {strain}'
```

**What this does:**
- Uses an f-string to create the column name dynamically
- Example: if `strain = 'YL32'`, then `log_col = 'log10 YL32'`
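
A quick way to see the f-string at work is to print the result. The second strain name below is a placeholder, not a column from the dataset.

```python
strain = "YL32"                  # strain name used in this guide
log_col = f"log10 {strain}"      # {strain} is replaced by the value of the variable
print(log_col)                   # log10 YL32

# Changing the variable changes the column name; "OtherStrain" is a placeholder
for name in ["YL32", "OtherStrain"]:
    print(f"log10 {name}")
```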

---

## Step 2: Remove Missing Data
Filter out rows with missing values in the log column:
```python
df_filtered = df.dropna(subset=[log_col])
```

**Why?**
- Ensures your plot only includes complete data
- Prevents errors from NaN values
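
To check what the filter removes, you can count the missing values before and compare the row counts after. A minimal sketch on made-up data; the group names are hypothetical.

```python
import numpy as np
import pandas as pd

log_col = "log10 YL32"
# Hypothetical data with one missing measurement
df = pd.DataFrame({
    "Group": ["Control", "Control", "Treated", "Treated"],
    log_col: [5.1, np.nan, 4.2, 3.9],
})

n_missing = df[log_col].isna().sum()        # number of NaN values in the column
df_filtered = df.dropna(subset=[log_col])   # drop only rows where log_col is NaN

print(n_missing)
print(len(df), len(df_filtered))
```

`subset=[log_col]` matters: without it, `dropna()` would also drop rows that have a missing value in any other column.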

---

## Step 3: Create the Figure
Set up your figure and axes with appropriate size:
```python
fig, ax = plt.subplots(figsize=figsize)
```

**Note:** `figsize` should be defined earlier (e.g., `figsize=(10, 6)`)

---

## Step 4: Create the Box Plot
Use seaborn's `boxplot()` to visualize the distribution:
```python
sns.boxplot(
    data=df_filtered,
    x='timepoint after AB application',
    y=log_col,
    hue='Group',
    ax=ax,
    palette='husl',
    linewidth=1.5,
)
```

**Parameters explained:**
- **`data=df_filtered`**: Your dataframe, with the missing values removed in Step 2
- **`x='timepoint after AB application'`**: X-axis shows different timepoints
- **`y=log_col`**: Y-axis shows log₁₀ bacterial abundance
- **`hue='Group'`**: Different colors for different treatment groups
- **`palette='husl'`**: Color scheme (creates visually distinct colors)
- **`linewidth=1.5`**: Thickness of box plot lines

---

## Step 5: Add Labels, Title, and Save the Figure
Follow the same steps as in the previous plots to add the x and y labels and the title, then save the figure.

**Parameters of `plt.savefig()`:**
- **`dpi=300`**: High resolution (300 dots per inch)
- **`bbox_inches='tight'`**: Remove extra whitespace around the plot
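
Steps 1 to 5 of the box plot can be combined as in the sketch below. All data are randomly generated, and the group names `Control` and `Treated`, the label texts, and the output file name are assumptions for the example.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up data: 2 groups x 3 timepoints x 8 mice (random values)
rng = np.random.default_rng(0)
strain = "YL32"
log_col = f"log10 {strain}"                                   # Step 1
rows = []
for group, shift in [("Control", 0.0), ("Treated", -1.0)]:
    for timepoint in [0, 1, 2]:
        for value in rng.normal(loc=6.0 + shift * timepoint, scale=0.5, size=8):
            rows.append({"Group": group,
                         "timepoint after AB application": timepoint,
                         log_col: value})
df = pd.DataFrame(rows)
df_filtered = df.dropna(subset=[log_col])                     # Step 2

figsize = (10, 6)
fig, ax = plt.subplots(figsize=figsize)                       # Step 3
sns.boxplot(data=df_filtered, x="timepoint after AB application", y=log_col,
            hue="Group", ax=ax, palette="husl", linewidth=1.5)  # Step 4

# Step 5: labels, title, save
ax.set_xlabel("Timepoint after AB application", fontsize=12)
ax.set_ylabel(f"log10 {strain} abundance", fontsize=12)
ax.set_title(f"{strain} abundance by timepoint and group",
             fontsize=14, fontweight="bold")
plt.tight_layout()

# Saved to the system temp directory so the sketch leaves no file in your project
out_path = os.path.join(tempfile.gettempdir(), "boxplot_example.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
```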


---


## Box Plot Components

**Understanding the box plot:**
- **Box**: Shows the interquartile range (IQR) - middle 50% of data
- **Line inside box**: Median value
- **Whiskers**: Extend to the most extreme data point within 1.5 × IQR of the box edges (or to the min/max if no point lies beyond that range)
- **Diamonds/circles**: Outliers beyond whiskers
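
The components above can be computed by hand, which helps when checking a plot. This sketch uses made-up numbers and the 1.5 × IQR rule, which is the default whisker setting in matplotlib and seaborn.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # made-up data with one extreme value

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1                        # height of the box

# Fences: points outside these limits are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme real data points inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3, iqr)           # 3.25 5.5 7.75 4.5
print(whisker_low, whisker_high)     # 1 9
print(outliers)                      # [30]
```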


## Experiment with:
- Different strains: change the value of `strain` in Step 1 and re-run the steps above





## b) Violin plot: replace `boxplot` with `violinplot` in Step 4

To create the violin plot, use seaborn's `violinplot()` to visualize the data distribution:
```python
sns.violinplot(
    data=df_filtered,
    x='timepoint after AB application',
    y=log_col,
    hue='Group',
    ax=ax,
    palette='husl',
    linewidth=1.5,
)
```

**Parameters explained:**
- **`data=df_filtered`**: Your dataframe, with the missing values removed in Step 2
- **`x='timepoint after AB application'`**: Variable on x-axis (categories)
- **`y=log_col`**: Variable on y-axis (bacterial abundance)
- **`hue='Group'`**: Color violins by experimental group
- **`ax=ax`**: Draw on the axes we created
- **`palette='husl'`**: Color scheme for different groups
- **`linewidth=1.5`**: Thickness of violin outlines

---


## New Step 5: Overlay Individual Data Points
Show the individual data points on top of the box plot. Reuse the box plot code from above and insert this call as the new Step 5, right after `sns.boxplot()`. The rest of the code stays the same.
```python
sns.stripplot(
    data=df_filtered,
    x='timepoint after AB application',
    y=log_col,
    hue='Group',
    ax=ax,
    dodge=True,
    alpha=0.6,
    size=4,
    color='black',
    legend=False
)
```

**Parameters explained:**
- **`dodge=True`**: Separates points by group (aligns with boxes)
- **`alpha=0.6`**: Makes points semi-transparent
- **`size=4`**: Size of each point
- **`color='black'`**: All points are black
- **`legend=False`**: Don't create a second legend


# Task 5: Errorbar

## Step 1: Calculate Summary Statistics
Compute the mean, standard error, and count for each group at each timepoint:
```python
from scipy import stats

summary = df.groupby(['Group', 'timepoint after AB application'])[log_col].agg([
    ('mean', 'mean'),
    ('sem', stats.sem),  # Standard Error of Mean
    ('count', 'count')
]).reset_index()
```

**What this does:**
- **`groupby()`**: Groups data by 'Group' and 'timepoint after AB application'
- **`agg()`**: Calculates multiple statistics:
  - **`mean`**: Average value
  - **`sem`**: Standard Error of the Mean (measures precision of the mean)
  - **`count`**: Number of observations
- **`reset_index()`**: Converts the result back to a regular dataframe
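
To see what the summary table looks like, here is a sketch on a tiny made-up dataset. It swaps in pandas' built-in `'sem'` aggregation with named columns instead of `stats.sem`; both use the sample standard deviation (`ddof=1`) by default. The hand calculation at the end checks one value.

```python
import numpy as np
import pandas as pd

log_col = "log10 YL32"
x_col = "timepoint after AB application"
# Hypothetical data: one timepoint, three mice per group
df = pd.DataFrame({
    "Group": ["Control"] * 3 + ["Treated"] * 3,
    x_col: [0, 0, 0, 0, 0, 0],
    log_col: [1.0, 2.0, 3.0, 4.0, 4.0, 4.0],
})

# Named aggregation: new_column_name="function"
summary = (
    df.groupby(["Group", x_col])[log_col]
      .agg(mean="mean", sem="sem", count="count")
      .reset_index()
)
print(summary)

# Check one value by hand: SEM = sample standard deviation / sqrt(n)
control = df.loc[df["Group"] == "Control", log_col]
manual_sem = control.std(ddof=1) / np.sqrt(len(control))
print(manual_sem)
```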

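**Note:** Make sure to import `from scipy import stats` at the beginning of your notebook. `stats.sem` returns NaN for any group that contains missing values, so use `df_filtered` from Task 4 in place of `df` if your column has NaNs.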

---

## Step 2: Create the Figure
Set up your figure with appropriate size:
```python
fig, ax = plt.subplots(figsize=figsize)
```

---

## Step 3: Prepare Colors for Each Group
Get the unique groups and assign colors:
```python
groups = summary['Group'].unique()
colors = sns.color_palette("husl", n_colors=len(groups))
```

**What this does:**
- **`groups`**: Gets a list of all unique group names
- **`colors`**: Creates a color palette with one color per group

---

## Step 4: Plot Each Group with Error Bars
Loop through each group and plot its data:
```python
for i, group in enumerate(groups):
    # Filter data for current group
    group_data = summary[summary['Group'] == group]
    
    # Plot with error bars
    ax.errorbar(
        group_data['timepoint after AB application'],
        group_data['mean'],
        yerr=group_data['sem'],
        marker='o',
        markersize=8,
        linewidth=2,
        capsize=5,
        capthick=2,
        label=group,
        color=colors[i],
        alpha=0.8
    )
```

**Parameters explained:**
- **`x`**: Timepoints (x-axis values)
- **`y`**: Mean values (y-axis values)
- **`yerr`**: Error bar heights (SEM in this case)
- **`marker='o'`**: Circular markers at each point
- **`markersize=8`**: Size of the markers
- **`linewidth=2`**: Thickness of the connecting line
- **`capsize=5`**: Width of error bar caps
- **`capthick=2`**: Thickness of error bar caps
- **`label=group`**: Name for legend
- **`color=colors[i]`**: Color for this group
- **`alpha=0.8`**: Slight transparency (80% opaque)
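
The loop can be tried without the real dataset by starting from a small summary table. In the sketch below the means and SEMs are made up, the group names are hypothetical, and the colors are left to matplotlib's default cycle to keep the example short. The label, legend, and grid lines at the end follow the same pattern as in the earlier tasks.

```python
import pandas as pd
import matplotlib.pyplot as plt

x_col = "timepoint after AB application"
# Made-up summary table in the shape produced by Step 1
summary = pd.DataFrame({
    "Group": ["Control", "Control", "Control", "Treated", "Treated", "Treated"],
    x_col: [0, 1, 2, 0, 1, 2],
    "mean": [6.0, 6.1, 5.9, 6.0, 4.5, 3.8],
    "sem": [0.10, 0.15, 0.12, 0.11, 0.30, 0.25],
})

fig, ax = plt.subplots(figsize=(10, 6))
for group in summary["Group"].unique():
    group_data = summary[summary["Group"] == group]
    # One errorbar() call per group: line, markers, and vertical error bars
    ax.errorbar(group_data[x_col], group_data["mean"], yerr=group_data["sem"],
                marker="o", markersize=8, linewidth=2,
                capsize=5, capthick=2, label=group, alpha=0.8)

ax.set_xlabel("Timepoint after AB application", fontsize=12)
ax.set_ylabel("Mean ± SEM (log10 YL32)", fontsize=12)
ax.legend(title="Group")             # uses the label passed to errorbar()
ax.grid(alpha=0.3, linestyle="--")
plt.tight_layout()
```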

---

### Alternative: Use Standard Deviation instead
```python
summary = df.groupby(['Group', 'timepoint after AB application'])[log_col].agg([
    ('mean', 'mean'),
    ('std', 'std'),  # Standard Deviation
    ('count', 'count')
]).reset_index()

# Then plot with yerr=group_data['std']
# Don't forget to update y-label to "Mean ± SD"
```

---

## Customization Tips

### Change marker styles for each group:
```python
markers = ['o', 's', '^', 'D', 'v']  # circle, square, triangle, diamond, triangle down

for i, group in enumerate(groups):
    # ... in errorbar():
    marker=markers[i % len(markers)]
```

### Add confidence intervals (95% CI):
```python
# 95% CI ≈ 1.96 × SEM (normal approximation, reasonable for large samples)
summary['ci'] = 1.96 * summary['sem']

# Then use yerr=group_data['ci']
# Update y-label to "Mean ± 95% CI"
```
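
The factor 1.96 comes from the normal distribution and is a good approximation only when each group has many observations. For small groups, a common alternative is the critical value of Student's t distribution with `count - 1` degrees of freedom. The sketch below compares the two on made-up SEM and count values.

```python
import numpy as np
from scipy import stats

# Made-up per-group results in the shape of the summary table
sem = np.array([0.10, 0.30])
count = np.array([3, 30])

# Two-sided 95% interval: 97.5th percentile of Student's t with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=count - 1)
ci_t = t_crit * sem
ci_normal = 1.96 * sem

print(t_crit)               # larger than 1.96, especially for the group with n = 3
print(ci_t, ci_normal)
```

With the summary table from Step 1, the same idea would read `summary['ci'] = stats.t.ppf(0.975, df=summary['count'] - 1) * summary['sem']`.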

### Remove connecting lines (show only points):
```python
ax.errorbar(..., linestyle='none')  # No line, only markers and error bars
```

