# Linux and Shell for Data Engineers (Interview Edition)

## 📀 Section 0: Set Up WSL2 + Ubuntu

**Before You Begin:**

This notebook guides you through core Linux skills using **WSL2 (Windows Subsystem for Linux)**.

### Install WSL2 and Ubuntu

1. Open PowerShell and run:
```powershell
wsl --install
```
2. Reboot if prompted. `wsl --install` sets up **Ubuntu** as the default distribution
3. Launch "Ubuntu" from the Start menu and create a UNIX username and password when asked

### Verify Your Environment
Run these in your WSL terminal:
```bash
whoami
uname -a
lsb_release -a
echo $SHELL
```
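
To confirm the shell is really running inside WSL, one common heuristic is to look for "microsoft" in the kernel version string. The sketch below assumes that convention holds on your build; `is_wsl` is a hypothetical helper name, not a standard command.

```shell
# is_wsl: hypothetical helper; succeeds if the kernel version string mentions Microsoft.
# Takes an optional file argument so it can be tried against a sample string.
is_wsl() {
  grep -qi 'microsoft' "${1:-/proc/version}" 2>/dev/null
}

if is_wsl; then
  echo "Running under WSL"
else
  echo "Not running under WSL (or the heuristic did not match)"
fi
```

This is handy at the top of setup scripts that need to behave differently on WSL and on a native Linux host.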

**🔍 Interview Questions:**
- What version of Linux are you running?
- How would you explain what WSL2 is to a hiring manager?

---

## 📁 Section 1: Create a Project Workspace

```bash
mkdir -p ~/de_lab    # -p: no error if the directory already exists
cd ~/de_lab
mkdir -p raw logs processed scripts archive
```

Create sample log files:
```bash
touch logs/access.log logs/error.log logs/system.log
```

Add sample content:
```bash
echo "INFO pipeline started" > logs/access.log
echo "ERROR file missing" > logs/error.log
echo "INFO checkpoint created" > logs/system.log
```
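
The same setup can be written so that it is safe to rerun. The following is a minimal sketch under a scratch directory from `mktemp -d`, so trying it out does not touch `~/de_lab`; the `base` variable and the second log line are only illustrative.

```shell
# Scratch location for the demo; swap in "$HOME/de_lab" for the real workspace.
base="$(mktemp -d)/de_lab"

# -p creates missing parents and does not fail if a directory already exists,
# so running this loop a second time is harmless.
for d in raw logs processed scripts archive; do
  mkdir -p "$base/$d"
done

# > truncates the file and writes; >> appends to it.
echo "INFO pipeline started" > "$base/logs/access.log"
echo "INFO pipeline finished" >> "$base/logs/access.log"

ls "$base"
```

Note the difference between `>` and `>>`: the commands in this section use `>`, which replaces any existing content each time they run.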

**🤔 Interview Questions:**
- How would you organize data files for a pipeline project?
- What directory structure would you use for separating raw vs cleaned data?

---

## 🗳️ Section 2: Navigation and File Basics

```bash
pwd
ls -la
cd logs
ls -lh
```

Create and move files:
```bash
cd ~/de_lab
mkdir temp
cd temp
touch one.txt two.txt three.txt
mkdir archive
mv *.txt archive/
```
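
One caveat with `mv *.txt archive/`: if no file matches, the shell passes the literal pattern to `mv`, which then fails. A hedged sketch of a guarded version follows, using a loop that skips the unmatched pattern; the scratch directory and the `.csv` file names are made up for illustration.

```shell
# Work in a throwaway directory so nothing in ~/de_lab is touched.
work="$(mktemp -d)"
cd "$work" || exit 1
touch sales.csv users.csv notes.txt   # hypothetical sample files
mkdir -p csv_files

moved=0
for f in *.csv; do
  # If nothing matched, $f is the literal string "*.csv"; skip it.
  [ -e "$f" ] || continue
  mv -- "$f" csv_files/
  moved=$((moved + 1))
done

echo "Moved $moved file(s)"
ls csv_files
```

The `--` tells `mv` that everything after it is a file name, which protects against names that begin with a dash.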

**🤔 Interview Questions:**
- What command shows your current directory?
- How would you move all `.csv` files into a subfolder?

---

## 🔍 Section 3: Viewing and Editing Files

```bash
cd ~/de_lab   # Section 2 left you in ~/de_lab/temp
echo "Hello, Data Engineer!" > greeting.txt
cat greeting.txt
nano greeting.txt
```

```bash
head -n 5 logs/error.log
tail -n 5 logs/error.log
```
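
The sample logs hold only one line each, so `head` and `tail` have little to show. A small sketch with a generated file makes the behavior visible; the file name `events.csv` and its contents are invented for the demo.

```shell
demo="$(mktemp -d)/events.csv"   # hypothetical sample file

# Header line followed by 100 numbered rows.
{
  echo "id,status"
  for i in $(seq 1 100); do echo "$i,ok"; done
} > "$demo"

head -n 3 "$demo"             # header plus the first two rows
tail -n 2 "$demo"             # the last two rows
tail -n +2 "$demo" | wc -l    # row count with the header skipped
```

`tail -n +2` means "start at line 2", which is a common way to drop a CSV header before piping data onward.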

**🤔 Interview Questions:**
- How do you examine just the top of a large file?
- When would you use `nano` vs `cat`?

---

## 🔎 Section 4: Search and Filter

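Create sample data:
```bash
cd ~/de_lab
echo -e "alice\nbob\nalice\ncarol" > names.txt
```

Search, count, and deduplicate:
```bash
grep -i "error" logs/error.log   # case-insensitive match
wc -l logs/error.log             # number of lines in the file
sort names.txt | uniq -c         # occurrences of each name (uniq needs sorted input)
```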
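
Two details behind the interview questions in this section are easy to miss: `grep -c` counts matching lines rather than occurrences, and `uniq` only collapses adjacent duplicates, so input must be sorted first. A minimal sketch with made-up sample data:

```shell
dir="$(mktemp -d)"
printf 'ERROR a, error b\nINFO ok\nerror c\n' > "$dir/app.log"   # invented log lines
printf 'alice\nbob\nalice\ncarol\n' > "$dir/names.txt"

# Lines that contain "error" (case-insensitive).
line_count=$(grep -ci 'error' "$dir/app.log")

# Total occurrences: -o prints each match on its own line.
match_count=$(grep -oi 'error' "$dir/app.log" | wc -l)

# Distinct names: sort -u deduplicates, wc -l counts.
unique_names=$(sort -u "$dir/names.txt" | wc -l)

echo "lines=$line_count matches=$match_count unique=$unique_names"
```

Here the first log line contains the word twice, so the line count and the occurrence count differ.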

**🤔 Interview Questions:**
- How would you count how many times the word "error" appears in a log file?
- How would you count the number of unique names in a file?

---

## 🔐 Section 5: Permissions and Ownership

```bash
touch secure.txt
chmod 600 secure.txt
ls -l secure.txt
```
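
Octal modes read as three digits for owner, group, and others, where read = 4, write = 2, and execute = 1. The sketch below assumes GNU `stat` (as shipped on Ubuntu) for printing the mode; the file names are illustrative.

```shell
dir="$(mktemp -d)"
touch "$dir/secure.txt" "$dir/shared.txt"

chmod 600 "$dir/secure.txt"   # owner: read+write (4+2); group and others: nothing
chmod 644 "$dir/shared.txt"   # owner: read+write; group and others: read only (4)

# %a prints the octal mode, %A the symbolic form, %n the file name (GNU stat).
stat -c '%a %A %n' "$dir/secure.txt" "$dir/shared.txt"
```

Mode `644` is the usual answer for "readable by everyone, editable only by the owner".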

**🤔 Interview Questions:**
- What does `chmod 600` do?
- How do you make a file readable by all users but only editable by the owner?

---

## ⚖️ Section 6: Processes and System Monitoring

```bash
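sleep 60 &                 # start a background job
jobs                       # list background jobs in this shell
ps aux | grep '[s]leep'    # the [s] pattern keeps grep from matching itself
kill %1                    # stop job number 1
df -h                      # free disk space per filesystem
du -sh ~/de_lab/logs/      # total size of the logs directory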
```
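
Job specs such as `%1` rely on job control, which is normally available only in an interactive shell. In a script, a common pattern is to capture the process ID from `$!` instead. A minimal sketch:

```shell
sleep 60 &            # start a background process
pid=$!                # $! holds the PID of the most recent background command

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$pid" 2>/dev/null; then
  echo "sleep is running with PID $pid"
fi

kill "$pid"           # send SIGTERM (the default signal)

# wait reaps the process; a status above 128 means it was ended by a signal.
status=0
wait "$pid" 2>/dev/null || status=$?
echo "sleep exited with status $status"
```

Waiting on the PID after `kill` avoids leaving a zombie process and lets the script confirm how the process ended.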

**🤔 Interview Questions:**
- How do you check if a job is running in the background?
- How would you find and stop a process that's using too much memory?

---

## 📊 Section 7: Environment Variables and Aliases

```bash
export DATA_PATH=~/de_lab/raw
echo $DATA_PATH
alias ll='ls -alh'
```

Make it persistent:
```bash
echo "export DATA_PATH=~/de_lab/raw" >> ~/.bashrc
echo "alias ll='ls -alh'" >> ~/.bashrc
source ~/.bashrc
```
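
Running the `echo ... >> ~/.bashrc` lines more than once appends duplicates. One way to guard against that is to append only when the exact line is missing. The sketch below assumes a helper named `append_once` (hypothetical) and writes to a scratch file rather than the real `~/.bashrc`.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line already exists.
append_once() {
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

rc_file="$(mktemp)"   # stand-in for ~/.bashrc while experimenting

append_once "$rc_file" 'export DATA_PATH="$HOME/de_lab/raw"'
append_once "$rc_file" "alias ll='ls -alh'"
append_once "$rc_file" "alias ll='ls -alh'"   # second call is a no-op

cat "$rc_file"
```

Using `$HOME` inside single quotes stores the variable reference itself, so the path is resolved each time the file is sourced.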

**🤔 Interview Questions:**
- Why do engineers use environment variables in production scripts?
- How would you persist an alias across sessions?

---

## 📂 Section 8: Shell Script Writing

```bash
cd ~/de_lab/scripts
nano archive.sh
```

Inside `archive.sh`:
```bash
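#!/bin/bash
# Resolve paths relative to this script so it works from any directory.
lab_dir="$(cd "$(dirname "$0")/.." && pwd)"
mkdir -p "$lab_dir/archive"
shopt -s nullglob                 # an unmatched *.csv expands to nothing
files=("$lab_dir"/*.csv)
if (( ${#files[@]} > 0 )); then mv -- "${files[@]}" "$lab_dir/archive/"; fi
echo "Moved ${#files[@]} file(s) on $(date)" >> "$lab_dir/move.log"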
```

```bash
chmod +x archive.sh
./archive.sh
```
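
A more defensive variant is sketched here. It assumes bash, stops on errors with `set -euo pipefail`, and writes each run to its own timestamped log file. The script name `archive_safe.sh`, the `DE_LAB` variable, and the log naming scheme are assumptions for illustration, not part of the lab.

```shell
lab="$(mktemp -d)"                       # scratch stand-in for ~/de_lab
mkdir -p "$lab/scripts" "$lab/logs"
touch "$lab/a.csv" "$lab/b.csv"          # invented sample files

# Write the script; the quoted 'EOF' keeps $variables from expanding here.
cat > "$lab/scripts/archive_safe.sh" <<'EOF'
#!/bin/bash
set -euo pipefail                        # exit on errors, unset variables, failed pipes
root="${DE_LAB:?set DE_LAB to the workspace path}"
log="$root/logs/archive_$(date +%Y%m%d_%H%M%S).log"
mkdir -p "$root/archive"
for f in "$root"/*.csv; do
  [ -e "$f" ] || continue                # no matches: skip the literal pattern
  mv -- "$f" "$root/archive/"
  echo "$(date '+%F %T') moved $(basename "$f")" >> "$log"
done
EOF

# Run it through bash explicitly, passing the workspace path in the environment.
DE_LAB="$lab" bash "$lab/scripts/archive_safe.sh"
ls "$lab/archive" "$lab/logs"
```

Passing the workspace path through an environment variable keeps the script free of hard-coded locations, and one log file per run makes it easy to see which execution moved which files.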

**🤔 Interview Questions:**
- What’s the difference between making a script executable with `chmod +x` and running it as `./script.sh`, versus running it with `bash script.sh`?
- Why should log entries carry a timestamp, and when would you write each run to its own timestamped file?

---

## ✅ Section 9: Interview Challenge Tasks (No Command Hints)

1. Display the last 5 lines of the largest `.log` file
2. Move all `.log` files into a `backup` folder
3. Count how many lines in `error.log` contain the word "missing"
4. Start a background process and kill it
5. Make a file writable by all users
6. Find the word "checkpoint" in any file
7. List all `.txt` files inside `archive/`
8. Write a shell script to move `.json` files to `processed/`
9. Set an environment variable `PIPELINE_USER` and use it in a command
10. Add a permanent alias to your shell for `ll='ls -alh'`

---

## 🎓 Final Reflection

> - ✅ List 5 shell commands you're now fluent with:
> - 💬 How would you explain your Linux fluency in a job interview?
> - 🧠 What’s one new concept or tool that surprised you?
> - 🛠️ What’s one task you’d automate tomorrow with a shell script?

