## Lab 1: Bash Scripting for Automation

### Requirements:

1. You are expected to complete **2 problems during the 107-minute lab session**.

2. **Before you leave**, please ask me to review your code — I need to record your progress.

3. To complete and submit this lab, you must choose **one** of the following options:
   - **Option 1:** Copy this notebook to your Google Drive and edit it there.  
     - File → Save a copy in Drive  
     - Make your edits, then go to File → Download → `.ipynb`.
   - **Option 2:** Download this notebook to your local computer and edit it locally with your favorite IDE (or reopen the downloaded file in Colab).  
     - File → Download → `.ipynb`.
4. Before submission, rename the file to `CSI3680-Lab1-YOURNAME.ipynb`, replacing `YOURNAME` with your full name (e.g., `CSI3680-Lab1-LoriXu.ipynb`).
5. Submit the **renamed and completed** `.ipynb` file to Moodle.
   - Make sure you submit your final edited `.ipynb` file, not a blank copy.


### ⚠️ A Note on AI Tools

While I cannot control whether you use AI tools, **you are fully responsible for the code you submit**. If you choose to use any assistance, make sure you thoroughly understand what your script is doing. You may be asked to explain your work during or after the lab.


### Before We Start...
Let me quickly introduce a trick that will make your life (and mine) much easier when working in Google Colab/Jupyter Notebook.

Since you've already learned about **Here Documents**, we can use them to **create and run shell scripts directly inside Colab**, without manually uploading files to the `/content` directory.

Here's how you can create a script called `demo.sh`, write code into it, and run it — all in one cell:

```bash
%%bash
cat << 'BASH' > demo.sh
#!/bin/bash
echo "Hello from demo.sh"
# Add your script logic here
BASH

chmod +x demo.sh
./demo.sh
```

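Quoting the delimiter (`'BASH'`) makes Bash write the here-document body exactly as typed. `$variables`, `$(...)` command substitutions, and backslash escapes are left untouched instead of being expanded while the file is created. Always include the quotes so that the variables in your script are evaluated when the script runs, not when it is written.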

You can change `demo.sh` to any filename you want (e.g., `user_summary.sh`, `backup.sh`).
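
To see what quoting the delimiter changes, you can paste this small comparison into a `%%bash` cell. The variable `name` and its value are made up for the demo.

```bash
name="world"

# Unquoted delimiter: $name is expanded when the here-document is read
cat << EOF
Unquoted: Hello, $name
EOF

# Quoted delimiter: the body is passed through literally, $name and all
cat << 'EOF'
Quoted: Hello, $name
EOF
```

The first `cat` prints `Unquoted: Hello, world` and the second prints `Quoted: Hello, $name`. With `cat << 'BASH' > demo.sh` this literal behavior is what you want: any `$1` or `$count` in the script body stays in the file and is evaluated each time the script runs.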

### Problem 1: User Summary Generator

You are asked to generate a custom summary of system users using a given `passwd.fake` file.

**Requirements**

Write a script `user_summary.sh` that:
1. Reads from a fake password file called `passwd.fake`, which is formatted like `/etc/passwd` (colon-separated).
2. For each line, prints:
   - The username (first field)
   - The UID (third field)
   - A label:
     - "regular user" if UID >= 1000
     - "system user" if UID < 1000
3. At the end, prints how many regular users there are.

Use `awk` for field processing.
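
If `awk` field handling is still new, here is a minimal sketch of the pieces involved: `-F:`, numbered fields, an `if`/`else`, a counter, and an `END` block. It runs on hypothetical `name:id:shell` records rather than on `passwd.fake`.

```bash
# Hypothetical colon-separated records: name:id:shell
records='alpha:10:/bin/bash
beta:2000:/bin/zsh
gamma:3000:/bin/bash'

# -F: splits each line on colons, so $1 is the name and $3 is the shell.
# The main block runs once per line; END runs once after the last line.
printf '%s\n' "$records" | awk -F: '
{
  if ($3 == "/bin/bash") {
    print $1 " uses bash"
    bash_count++
  } else {
    print $1 " uses " $3
  }
}
END {
  # "+ 0" makes awk print 0 instead of an empty string if nothing matched
  print "Bash users:", bash_count + 0
}'
```

The last line printed is `Bash users: 2`. When you match the exact output format, keep in mind that `print a, b` separates its arguments with awk's output separator (a single space by default), while `print a b` joins them with nothing in between.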

**Expected Output Example for Problem 1**

When your script is working correctly, the output should look like this:
```
root (UID=0) is a system user
user1 (UID=1000) is a regular user
user2 (UID=1001) is a regular user
nobody (UID=65534) is a regular user
Total regular users:  3
```

Make sure your script prints **exactly this format**, as I may check for exact wording and spacing.


First things first, let's set up the fake data:

```bash
%%bash
# Simulate passwd.fake for Problem 1
cat << 'EOF' > passwd.fake
root:x:0:0:root:/root:/bin/bash
user1:x:1000:1000:User One:/home/user1:/bin/bash
user2:x:1001:1001:User Two:/home/user2:/bin/zsh
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
EOF
```

Now that `passwd.fake` exists in the current directory, complete the `awk` program below so it processes each line of the file.

```bash
%%bash
cat << 'BASH' > user_summary.sh
#!/bin/bash
# This script summarizes users from a passwd-like file using awk.
awk -F: '
{
  # TODO: Print the username and UID
  # Example: alice (UID=1000)


  # TODO: Check if UID >= 1000
  # If so, print "is a regular user" and count it
  # Else, print "is a system user"
}
END {
  # TODO: Print total number of regular users
}
' passwd.fake
BASH

chmod +x user_summary.sh
./user_summary.sh
```

### Problem 2: Disk Report & Backup Automation

Write a script called `disk_backup.sh` that does the following:
1. Use the `df -h .` command to check disk usage of the filesystem that contains the current directory.
   - Save the output to a file called `disk_report.txt`.
   - Append a line to the file that shows the current date and time.
2. Create a compressed backup of the folder `mydir` in a file called `backup.tar.gz`.
3. Print:
   - The number of entries in the archive, as listed by `tar -tf` (this count includes the `mydir/` directory itself, so 5 files give 6)
   - The total size of the backup file
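
Here is a minimal sketch of the redirection pattern in step 1. It uses a hypothetical file name, `demo_report.txt`, so it does not touch your real report. `>` creates or overwrites a file, and `>>` appends to it. The final command shows how `df` differs from `du -sh .`, which measures the space used by the current directory itself.

```bash
report="demo_report.txt"   # hypothetical name, not the lab's disk_report.txt

# df -h . describes the filesystem that holds ".", in human-readable units;
# ">" creates the file (or overwrites it if it already exists)
df -h . > "$report"

# ">>" appends one more line; $(date) inserts the current date and time
echo "Generated on: $(date)" >> "$report"

cat "$report"

# For comparison: du -sh . reports the space used by the current directory itself
du -sh .
```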


Again, we need to create some fake data:

```bash
%%bash
# Create dummy files for backup testing
mkdir -p mydir
touch mydir/file{1..5}.txt
echo "hello world" > mydir/file1.txt
```

```bash
%%bash

cat << 'BASH' > disk_backup.sh
#!/bin/bash

# TODO: Save disk usage of the current directory to disk_report.txt

# TODO: Add the current timestamp to the report

# TODO: Create a compressed backup of the mydir directory
# Hint: Use 'tar'

# TODO: Count the number of files in the archive
# Hint: Use 'tar -tf'
file_count=$()
echo "Archived $file_count files."

# TODO: Show the size of the backup archive
# Hint: Use 'du' to get the size
# Hint: Use 'cut -f1' to get the size value only
backup_size=$()
echo "Backup size: $backup_size"
BASH

chmod +x disk_backup.sh
./disk_backup.sh
```


**Expected Output Example for Problem 2**

When your script is working correctly, the output should look like this:
```
Archived 6 files.
Backup size: 4.0K
```

Make sure your script prints **exactly this format**, as I may check for exact wording and spacing.
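
The count is 6 rather than 5 because `tar -tf` lists the `mydir/` directory entry as well as the files inside it. The sketch below shows this on a throwaway directory with the hypothetical name `demo_dir`, which holds 2 files. Paste it into a `%%bash` cell to try it.

```bash
# Start from a clean, throwaway directory with 2 empty files
rm -rf demo_dir demo.tar.gz
mkdir demo_dir
touch demo_dir/a.txt demo_dir/b.txt

# -c create, -z compress with gzip, -f name of the archive file
tar -czf demo.tar.gz demo_dir

# -t lists the contents, one entry per line: demo_dir/ plus the 2 files
tar -tf demo.tar.gz

# Counting the listed lines therefore gives 3, not 2
tar -tf demo.tar.gz | wc -l

# du -h prints "<size><TAB><name>"; cut -f1 keeps just the size field
du -h demo.tar.gz | cut -f1
```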


### Problem 3: Argument Math
Write a script called `math_ops.sh` that:
1. Accepts a list of **integer numbers** as command-line arguments.
2. If no arguments are provided, prints a usage message and exits.
3. If arguments are given:
   - Prints each argument on its own line
   - Computes the sum of all arguments
   - Computes the average to 2 decimal places using `bc` (note that `bc` with `scale=2` truncates rather than rounds, so 20 / 3 gives `6.66`)

```bash
%%bash

cat << 'BASH' > math_ops.sh
#!/bin/bash

# TODO: Check if at least one argument is provided
# If not, print: "Usage: ./math_ops.sh num1 num2 ..." and exit


sum=0

# TODO: Loop through each argument and print it
# Also compute the sum

# TODO: Calculate the average using bc
# Hint: echo "scale=2; $sum / $#" | bc   ($# is the number of arguments)

# TODO: Print sum and average


BASH

chmod +x math_ops.sh
# Test run with some numbers
./math_ops.sh 3 7 10
```


**Expected Output Example for Problem 3**

When your script is working correctly, the output should look like this:

```
Argument: 3
Argument: 7
Argument: 10
Sum: 20
Average: 6.66
```

Make sure your script prints **exactly this format**, as I may check for exact wording and spacing.


### Problem 4: Log Analyzer (with `getopts`)

Write a script called `log_analyzer.sh` that processes a log file with command-line options. It should support the following flags:

| Option | Description |
| ------ | ----------- |
| `-f`   | Path to the log file (required) |
| `-e`   | Count how many lines contain "error" (case-insensitive) |
| `-i`   | List each unique IP address that appears after the word "from", with its count (most frequent first) |
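
If `getopts` is new to you, here is a minimal sketch of the usual `while getopts ... case` loop. The function name `parse_demo` and its options `-n NAME` and `-v` are hypothetical and exist only for this demo. In the option string `"n:v"`, the colon after `n` means `-n` takes an argument (available as `$OPTARG`), while `v` is a plain on/off flag.

```bash
# Hypothetical demo: -n NAME (required) and -v (optional flag)
parse_demo() {
  local name="" verbose=false opt
  OPTIND=1   # reset getopts' position so the function can be called again

  while getopts "n:v" opt; do
    case "$opt" in
      n) name="$OPTARG" ;;   # the value that followed -n
      v) verbose=true ;;
      *) echo "Usage: parse_demo -n NAME [-v]" >&2; return 1 ;;
    esac
  done

  # -z is true when the string is empty, i.e. -n was never given
  if [ -z "$name" ]; then
    echo "Usage: parse_demo -n NAME [-v]" >&2
    return 1
  fi

  echo "name=$name verbose=$verbose"
}

parse_demo -n alice -v
parse_demo -v || echo "(missing -n was rejected)"
```

In a script, the same loop reads the script's own arguments. There you would typically use `exit 1` instead of `return 1`.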


#### Mini Intro: `sort` and `uniq -c`
1. Suppose we have a list of IPs (some repeated). Try this in a Bash cell:
    ```bash
    echo -e "192.168.0.1\n10.0.0.3\n192.168.0.2\n10.0.0.3\n192.168.0.2" | sort
    ```
    This sorts the lines alphabetically, so identical IPs end up next to each other.

2. Now, let's count how many times each IP appears:
    ```bash
    echo -e "192.168.0.1\n10.0.0.3\n192.168.0.2\n10.0.0.3\n192.168.0.2" | sort | uniq -c
    ```
    `uniq -c` collapses runs of identical *adjacent* lines and prefixes each with its count. That is why the input must be sorted first; otherwise, duplicates that are not next to each other are counted separately.

3. Let's go one step further and sort by count (highest first):
    ```bash
    echo -e "192.168.0.1\n10.0.0.3\n192.168.0.2\n10.0.0.3\n192.168.0.2" | sort | uniq -c | sort -nr
    ```
    `sort -nr` means:
    - `-n`: compare lines by their leading number (here, the count) numerically rather than as text
    - `-r`: reverse the order, so the highest count comes first
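
The `log_analyzer.sh` starter code in this problem also hints at `grep` and awk's `$NF`. Here is a minimal sketch of how they combine with the pipeline above. It runs on hypothetical log lines (deliberately not `sample.log`).

```bash
# Hypothetical log lines (not sample.log)
lines='Warning: disk low
error: login failed from alpha
ERROR: timeout from beta
info: login ok from alpha'

# grep -i ignores case; -c prints how many lines matched instead of the lines
printf '%s\n' "$lines" | grep -ic 'error'

# Keep lines containing " from ", print each line's last field ($NF),
# then count and rank the values as in step 3
printf '%s\n' "$lines" | grep ' from ' | awk '{print $NF}' | sort | uniq -c | sort -nr
```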

Let's create some fake data to start:

```bash
%%bash
# Simulate sample.log
cat << 'EOF' > sample.log
[2025-09-25 10:00] INFO Connection from 192.168.0.1
[2025-09-25 10:01] ERROR Failed to open file
[2025-09-25 10:02] WARNING Retry attempt
[2025-09-25 10:03] ERROR Timeout from 10.0.0.3
[2025-09-25 10:04] INFO Connection from 192.168.0.2
[2025-09-25 10:05] ERROR Disk full from 10.0.0.3
[2025-09-25 10:06] INFO Connection from 192.168.0.2
EOF
```

```bash
%%bash
cat << 'BASH' > log_analyzer.sh
#!/bin/bash
# Initialize flags
show_errors=false
show_ips=false

# TODO: Use getopts to handle -f (file), -e (error count), -i (top IPs)
# Hint: f requires an argument; e and i are flags


# TODO: Check if the logfile variable is set
# Hint: [ -z "$logfile" ] is true when the variable is empty or unset (i.e., -f was not given)

# TODO: Check if the file exists

# TODO: If -e is set, count lines with 'error' (case-insensitive)

# TODO: If -i is set, extract IPs that appear after the word 'from'
# Hint: Use grep
# Hint: in awk, $NF means the last field on the line
# Hint: Use sort + uniq -c to count duplicates, then sort -nr to rank them

BASH

chmod +x log_analyzer.sh
# Test run
./log_analyzer.sh -f sample.log -e -i
```


**Expected Output Example for Problem 4**

When your script is working correctly, the output should look like this:

```
Error lines: 3
IP addresses:
      2 192.168.0.2
      2 10.0.0.3
      1 192.168.0.1
```
Make sure your script prints **exactly this format**, as I may check for exact wording and spacing.


---

### AI Usage Declaration (REQUIRED)

Please **check the box** that best describes your use of AI tools (e.g., ChatGPT, GitHub Copilot, Google Gemini, etc.) during this lab.  
There is no penalty for using AI, but you must be honest and take full responsibility for understanding your code.

☐ I did **not** use any AI assistance for this lab.

☐ I used AI for **minor guidance only** (e.g., fixing syntax errors, clarifying commands).

☐ I used AI for **larger portions** of the code (e.g., writing full functions or scripts), but I made sure I fully understand it.

☐ I relied heavily on AI and **do not fully understand** all parts of the code I submitted.

*(You may be asked to explain your answers or your code in person.)*
