In [1]:
# Colab cell
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [2]:
# Adjust these two for YOUR repo
REPO_OWNER = "ywanglab"
REPO_NAME  = "STAT4160"   # e.g., unified-stocks-team1

BASE_DIR   = "/content/drive/MyDrive/dspt25"
CLONE_DIR  = f"{BASE_DIR}/{REPO_NAME}"
REPO_URL   = f"https://github.com/{REPO_OWNER}/{REPO_NAME}.git"

import os, pathlib
pathlib.Path(BASE_DIR).mkdir(parents=True, exist_ok=True)


In [3]:
import os, pathlib

if not pathlib.Path(CLONE_DIR).exists():
    !git clone {REPO_URL} {CLONE_DIR}
else:
    # Folder already exists: cd into it (uncomment a line below to check status or pull the latest changes)
    os.chdir(CLONE_DIR)
    # !git status
    # !git pull --rebase # !git pull --ff-only
os.chdir(CLONE_DIR)
print("Working dir:", os.getcwd())

Working dir: /content/drive/MyDrive/dspt25/STAT4160



```python
# Verify we’re on Linux
!uname -a
```

* Prints kernel/system info: kernel name, hostname, kernel release and version (including the build date), architecture (`x86_64`), and OS name.
* The general shape of the output is:

  ```
  Linux <hostname> <kernel-release> <#build, SMP flags, build date> x86_64 GNU/Linux
  ```

`uname` = **Unix name**: it reports basic information about the operating system and kernel.

### `uname -a`

The `-a` flag means *“all”*: print every piece of info `uname` can provide in one line.

Typical output on an Ubuntu machine (on Colab the kernel string looks different, as the cell output further down shows):

```
Linux 1234567890abcdef 5.15.0-107-generic #117-Ubuntu SMP Wed Jun 12 18:19:38 UTC 2024 x86_64 GNU/Linux
```

### Breaking it down

* **Linux** → Kernel name (system type).
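* **1234567890abcdef** → Hostname (the machine's network node name; in Colab it looks like a random container ID).
* **5.15.0-107-generic** → Kernel release (version string).
* **#117-Ubuntu SMP Wed Jun 12 18:19:38 UTC 2024** → Kernel version: build number, vendor tag, SMP (Symmetric Multi-Processing) support, and build date/time.
* **x86\_64** → Machine hardware architecture (64-bit Intel/AMD). On some systems, Colab included, it appears three times: machine, processor type, and hardware platform (`uname -a` omits the last two when they are unknown).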
* **GNU/Linux** → Operating system type (kernel + userland).
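
Each field can also be printed on its own with a single-letter flag, which is handy when a script needs just one value (for example, checking the architecture before downloading a binary). A minimal sketch using the standard `-s`, `-n`, `-r`, and `-m` flags; the variable name `KERNEL` is only for illustration:

```bash
# Print the fields that `uname -a` packs into one line, one at a time
KERNEL=$(uname -s)    # kernel name, e.g. Linux
echo "kernel name:    $KERNEL"
echo "hostname:       $(uname -n)"
echo "kernel release: $(uname -r)"
echo "architecture:   $(uname -m)"
```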





```python
!lsb_release -a || cat /etc/os-release
```

* Tries to print **Linux distribution info**.
* `lsb_release -a` shows fields like Distributor, Release, Codename.
* If `lsb_release` isn’t installed, `||` means “otherwise run”: the right-hand command runs only if the left one fails, so it falls back to `cat /etc/os-release`, which has fields like `NAME="Ubuntu"` and `VERSION_ID="22.04"`.

`lsb_release` is a small tool that reports the **Linux distribution information** according to the [Linux Standard Base (LSB)](https://refspecs.linuxfoundation.org/lsb.shtml).

---

### `lsb_release -a`

The `-a` flag means **“all information”**, so you see every available field.

Example output on an Ubuntu 20.04 machine:

```
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal
```

---

### What each field means

* **Distributor ID** → The vendor/distribution (Ubuntu, Debian, Fedora, etc.).
* **Description** → A human-readable string (includes name, version, LTS tag).
* **Release** → Version number (20.04, 22.04, etc.).
* **Codename** → Development codename (Ubuntu 20.04 = *focal*, 22.04 = *jammy*, etc.).

### Notes

* Not all distros install `lsb_release` by default.

  * In Colab, if you run `!lsb_release -a` and get `command not found`, you can install it:

    ```bash
    !apt-get install -y lsb-release
    ```
* That’s why many tutorials give a fallback:

  ```bash
  lsb_release -a || cat /etc/os-release
  ```
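
Because `/etc/os-release` is written as shell-style `KEY="value"` lines, a script can source it to read individual fields. A minimal sketch, assuming the file exists (it does on Ubuntu and most modern distros); the command substitution `$( ... )` runs in a subshell, so the sourced variables don't leak into your session:

```bash
# Source /etc/os-release inside a subshell and keep just two of its fields
if [ -r /etc/os-release ]; then
  OS_SUMMARY=$( . /etc/os-release && echo "${NAME:-unknown} ${VERSION_ID:-unknown}" )
else
  OS_SUMMARY="unknown (no /etc/os-release)"
fi
echo "Distribution: $OS_SUMMARY"
```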







In [None]:
# Verify we’re on Linux
!uname -a
!lsb_release -a || cat /etc/os-release
# Where are we?
!pwd
!ls -la

Linux 435528ece323 6.1.123+ #1 SMP PREEMPT_DYNAMIC Sun Mar 30 16:01:29 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy
/content/drive/MyDrive/dspt25/STAT4160
total 674
-rw------- 1 root root      0 Sep  8 20:47 1
drwx------ 2 root root   4096 Sep 11 15:40 backups
drwx------ 2 root root   4096 Aug 14 21:30 data
drwx------ 2 root root   4096 Sep  4 14:47 docs
drwx------ 2 root root   4096 Aug 28 15:48 docs1
drwx------ 2 root root   4096 Aug 28 14:15 _freeze
drwx------ 2 root root   4096 Aug 15 21:30 .git
-rw------- 1 root root    205 Sep  4 14:47 .gitattributes
drwx------ 2 root root   4096 Aug 22 20:49 .github
-rw------- 1 root root    452 Sep 11 21:30 .gitignore
-rw------- 1 root root     76 Sep 14 22:20 greet.sh
drwx------ 2 root root   4096 Aug 25 13:18 homework
-rw------- 1 root root 604095 Sep  4 14:48 index.pdf
-rw------- 1 root root    188 Sep 11 15:21 index.qmd
drwx------ 

`whoami` is one of the simplest but most useful Unix/Linux commands.

---

### 🔹 What it does

* Prints the **effective username** of the current process.
* In plain English: *“Which user am I logged in as?”*

---

### 🔹 Example

```bash
$ whoami
david
```

This tells you that your shell is running as the user `david`.

In **Colab** (and many Docker/VM environments), you’ll usually see:

```bash
$ whoami
root
```

because the notebook process runs with root privileges.
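
`whoami` prints a name; when a script needs to *decide* something based on the user, the numeric ID from `id -u` is easier to test, since root is always UID 0. A minimal sketch (the messages are illustrative):

```bash
# whoami prints the effective user name; id -u prints its numeric user ID (0 = root)
USER_NAME=$(whoami)
USER_ID=$(id -u)
echo "user: $USER_NAME  uid: $USER_ID"
if [ "$USER_ID" -eq 0 ]; then
  echo "running as root: apt-get works without sudo"
else
  echo "not root: system-wide installs need sudo"
fi
```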



In [None]:
!echo $SHELL
!whoami
!pwd
!ls -lah /

/bin/bash
root
/content/drive/MyDrive/dspt25/STAT4160
total 456K
drwxr-xr-x   1 root root 4.0K Sep 17 12:59 .
drwxr-xr-x   1 root root 4.0K Sep 17 12:59 ..
lrwxrwxrwx   1 root root    7 Jun 27  2024 bin -> usr/bin
drwxr-xr-x   2 root root 4.0K Apr 18  2022 boot
drwxr-xr-x   1 root root 4.0K Sep 17 13:03 content
-rw-r--r--   1 root root 4.3K Jul 10  2024 cuda-keyring_1.1-1_all.deb
drwxr-xr-x   1 root root 4.0K Sep 15 18:04 datalab
drwxr-xr-x   5 root root  360 Sep 17 12:59 dev
-rwxr-xr-x   1 root root    0 Sep 17 12:59 .dockerenv
drwxr-xr-x   1 root root 4.0K Sep 17 12:59 etc
drwxr-xr-x   2 root root 4.0K Apr 18  2022 home
drwxr-xr-x   3 root root 4.0K Sep 17 12:59 kaggle
lrwxrwxrwx   1 root root    7 Jun 27  2024 lib -> usr/lib
lrwxrwxrwx   1 root root    9 Jun 27  2024 lib32 -> usr/lib32
lrwxrwxrwx   1 root root    9 Jun 27  2024 lib64 -> usr/lib64
lrwxrwxrwx   1 root root   10 Jun 27  2024 libx32 -> usr/libx32
drwxr-xr-x   2 root root 4.0K Jun 27  2024 media
drwxr-xr-x   2 root root 

In [None]:
ls -li

total 657
62 -rw------- 1 root root      0 Sep  8 20:47 1
65 drwx------ 2 root root   4096 Sep 11 15:40 backups/
39 drwx------ 2 root root   4096 Aug 14 21:30 data/
49 drwx------ 2 root root   4096 Sep  4 14:47 docs/
48 drwx------ 2 root root   4096 Aug 28 15:48 docs1/
47 drwx------ 2 root root   4096 Aug 28 14:15 _freeze/
70 -rw------- 1 root root     76 Sep 14 22:20 greet.sh
45 drwx------ 2 root root   4096 Aug 25 13:18 homework/
52 -rw------- 1 root root 604095 Sep  4 14:48 index.pdf
64 -rw------- 1 root root    188 Sep 11 15:21 index.qmd
67 -rw------- 1 root root   1976 Sep 11 21:20 Makefile
66 -rw------- 1 root root   2015 Sep 11 21:20 Makefile.bak
53 -rw------- 1 root root     23 Sep  4 14:48 myfirstfrommac.txt
54 -rw------- 1 root root     22 Sep  4 14:48 myfirstlocalfile.txt
41 drwx------ 2 root root   4096 Aug 17 20:07 notebooks/
46 drwx------ 2 root root   4096 Aug 27 20:43 quarto-

**symbolic link** (symlink):

```
lrwxrwxrwx  1 root root    7 Jun 27  2024 bin -> usr/bin
^---------  ^ ^    ^       ^ ^^^^^^^^^^^^ ^^^^^^^^^^^^^^
type/perms  | |    |       | mtime        filename -> target
            | |    |       size
            | |    group
            | owner
            link count
```


* **`l`** at the very beginning → it’s a **symbolic link** (not a regular file or directory).
* **`rwxrwxrwx`** → permissions. For symlinks these are usually shown as all-read/write/execute, but **they’re ignored**; access is determined by the target (`usr/bin`).
* **`1`** → link count (normally `1` for a symlink).
* **`root root`** → owner = root, group = root.
* **`7`** → file size, i.e. the length of the link’s target path string `"usr/bin"` (7 characters).
* **`Jun 27 2024`** → last modification time (when the symlink itself was created/changed, not necessarily the target).
* **`bin -> usr/bin`** → file is called `bin`, and it points to `usr/bin`.


* Whenever you access `/bin`, the system transparently redirects you to `/usr/bin`.
* This is a common setup on modern Linux distros: historically `/bin` was its own directory, but now it’s just a symlink to unify everything under `/usr/bin`.
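
To check the size and target claims for yourself, you can recreate a link shaped like `bin -> usr/bin` in a throwaway directory (created with `mktemp -d`, so nothing real on the system is touched). A minimal sketch; `readlink` prints the path string stored in the link:

```bash
# Recreate a `bin -> usr/bin` style symlink in a scratch directory
WORK=$(mktemp -d)
cd "$WORK"
mkdir -p usr/bin
ln -s usr/bin bin

ls -l bin                                   # first char is `l`; line ends with `bin -> usr/bin`
TARGET=$(readlink bin)                      # the path string stored inside the link
echo "target: $TARGET (${#TARGET} characters)"
LINK_SIZE=$(ls -l bin | awk '{print $5}')   # 5th column of ls -l = size in bytes
echo "size reported by ls -l: $LINK_SIZE"
```

Both numbers come out as 7: the "size" of a symlink is just the length of the path it stores.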


---

##  Hard link vs Symlink

### 1) **Hard link**

* Both names point to the **same inode** (same actual file data on disk).
* If you delete one, the data still exists until *all* hard links are gone.
* Must be on the same filesystem (partition).
* No “arrow” — they’re indistinguishable at the inode level.

```
+-----------------------+
|  Inode #12345         |  <-- contains metadata (size, perms) + data blocks
|  Data: "Hello world"  |
+-----------------------+
   ^           ^
   |           |
fileA.txt   fileB.txt   (hard links — both are first-class names)
```

`ls -li` would show the same inode number for both.

---

### 2) **Symbolic link (symlink)**

* A tiny file that just **stores a path string** to another file or directory.
* Works across filesystems and can point to directories.
* If the target goes away, the symlink is “broken.”

```
fileA.txt
   |
   v
+-----------------------+
|  Inode #12345         |
|  Data: "Hello world"  |
+-----------------------+

fileB.txt (symlink)
   |
   v
(string "fileA.txt")
```

When you open `fileB.txt`, the system follows the string → finds `fileA.txt`.

---

###  Key Differences

| Feature                  | Hard Link                                    | Symlink                       |
| ------------------------ | -------------------------------------------- | ----------------------------- |
| Same inode?              | Yes                                          | No (points to another path)   |
| Works if target deleted? | Yes (data survives until last link removed)  | No (becomes broken)           |
| Cross-filesystem?        | No                                           | Yes                           |
| Can link dirs?           | Usually no (except `.` and `..`)             | Yes                           |

---

### Real example in Linux

```bash
# Hard link: both point to same inode (fileA.txt must already exist)
ln fileA.txt fileB.txt

# Symlink: points to path
ln -s fileA.txt fileC.txt

ls -li
```

Abridged output (only inode, permissions, and name shown):

```
12345 -rw-r--r-- fileA.txt
12345 -rw-r--r-- fileB.txt               # hard link, same inode
12346 lrwxrwxrwx fileC.txt -> fileA.txt  # symlink
```
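
The "works if target deleted?" row of the table can be checked directly. A minimal sketch in a scratch directory, reusing the file names from the example above:

```bash
# Compare what happens to a hard link and a symlink when the original name is removed
WORK=$(mktemp -d)
cd "$WORK"
echo "Hello world" > fileA.txt
ln fileA.txt fileB.txt        # hard link: a second name for the same inode
ln -s fileA.txt fileC.txt     # symlink: stores the string "fileA.txt"

ls -li                        # fileA.txt and fileB.txt share an inode number
rm fileA.txt

cat fileB.txt                 # still prints: Hello world
if [ -L fileC.txt ] && [ ! -e fileC.txt ]; then
  echo "fileC.txt is now a broken symlink"
fi
```

`[ -L ... ]` tests the link itself, while `[ -e ... ]` follows it; a link that exists but points nowhere is exactly a broken symlink.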

---

So:

* `/bin -> usr/bin` is a **symlink** (like `fileC.txt` above).
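* A **hard link** would mean `/bin` and `/usr/bin` are the *same inode*, which isn't allowed: most filesystems refuse hard links to directories, and a hard link can never cross filesystems (so a separate `/usr` partition would rule it out as well).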


A **hard link** requires two directory entries to point to the **same inode**.
But inodes are **unique only within a single filesystem (partition)**.
So:

* If `fileA.txt` is on `/dev/sda1` and you try to hard-link it into `/mnt/usb` (on `/dev/sdb1`), it fails:

  ```bash
  ln fileA.txt /mnt/usb/fileB.txt
  # ln: failed to create hard link ... Invalid cross-device link
  ```

---

A **symlink** doesn’t care about inodes.
It’s just a **tiny file containing a path string** (e.g., `"../mnt/usb/fileA.txt"`).
When you access the symlink, the kernel follows the stored path and resolves it — even if the target is on another filesystem.

So you *can* do this:

```bash
ln -s /mnt/usb/fileA.txt ~/fileB.txt
```

Now `~/fileB.txt` is a symlink, and whenever you open it, the kernel goes to `/mnt/usb/fileA.txt`, which lives on a completely different filesystem.

---

###  Example

```
/ (root, ext4 on /dev/sda1)
│
├── home/user/fileB.txt  (symlink → /mnt/usb/fileA.txt)
│
└── mnt/usb (vfat on /dev/sdb1)
     └── fileA.txt
```

* Hard link: ❌ impossible here (different filesystems).
* Symlink: ✅ works fine; `/home/user/fileB.txt` points across into `/mnt/usb`.

---



## Create Hard link

* Command:

  ```bash
  ln TARGET LINK_NAME
  ```
* Example:

  ```bash
  echo "hello" > fileA.txt
  ln fileA.txt fileB.txt
  ```

Now:

```bash
ls -li fileA.txt fileB.txt
```

might show:

```
12345 -rw-r--r-- 2 user user 6 Sep 15 10:00 fileA.txt
12345 -rw-r--r-- 2 user user 6 Sep 15 10:00 fileB.txt
```

Same inode number (`12345`), link count `2`. Both names point to the *same* file data.



## Create Symbolic link (symlink)

* Command:

  ```bash
  ln -s TARGET LINK_NAME
  ```
* Example:

  ```bash
  ln -s fileA.txt fileC.txt
  ```

Now:

```bash
ls -l fileC.txt
```

shows:

```
lrwxrwxrwx 1 user user 9 Sep 15 10:00 fileC.txt -> fileA.txt
```

`fileC.txt` is a shortcut pointing to `fileA.txt`.
If you delete `fileA.txt`, `fileC.txt` becomes a **broken symlink**.



* `&&` → run the next command only if the previous one succeeded.
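
`||` is the mirror image: it runs the next command only if the previous one *failed*, which is how the `lsb_release -a || cat /etc/os-release` fallback earlier works. A small sketch using the built-in `true` and `false` commands, which do nothing except succeed or fail:

```bash
# `&&` runs the right side only if the left side exits with status 0 (success);
# `||` runs it only if the left side fails (non-zero status)
true  && echo "printed: true succeeded"
false && echo "never printed: false failed"
false || echo "printed: false failed, so || fired"
RESULT=$(false || echo "fallback")
echo "RESULT=$RESULT"
```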


```bash
!touch a.txt b.txt && ls -l
```

* `touch a.txt b.txt` → creates two empty files (`a.txt` and `b.txt`).


```bash
!echo "hello" > a.txt
```

* Writes the string `"hello"` into `a.txt`, overwriting any previous contents.



```bash
!cp a.txt c.txt && mv b.txt docs.txt && ls -l
```

* `cp a.txt c.txt` → copy `a.txt` into a new file `c.txt`.
* `mv b.txt docs.txt` → rename `b.txt` to `docs.txt`.
* `ls -l` → now you’ll see: `a.txt`, `c.txt`, `docs.txt`.


```bash
!head -n 1 a.txt && tail -n +1 c.txt
```

* `head -n 1 a.txt` → prints the first line of `a.txt` (should be `hello`).
* `tail -n +1 c.txt` → prints from line 1 onward of `c.txt` (also `hello`).
* So you see the same text twice.


```bash
!rm -i c.txt  # you can skip -i in automated runs
```

* `rm -i` → remove `c.txt`, with `-i` = “interactive” (asks for confirmation: `rm: remove regular file 'c.txt'?`).
* A notebook cell has no terminal where you can reliably answer that prompt, so the file is usually left in place; drop `-i` in automated runs.




### 1. `cd` inside a `!` or `%%bash` cell

* In Colab, **each `!` or `%%bash` cell runs in its own subshell**.
* If you do:

  ```python
  !cd /content/play
  !echo hello > a.txt
  ```

  the `cd` only affects the first subshell. The second `!` starts fresh in the **notebook kernel's current working directory** (`/content` by default in Colab, or wherever `%cd` / `os.chdir` last moved it).
  → So `a.txt` ends up there, not in `/content/play`.

**Fix:** chain them in the same subshell:

```python
!cd /content/play && echo hello > a.txt
```

or use `%%bash` with multiple lines:

```bash
%%bash
cd /content/play
echo hello > a.txt
```

---

### 2. Using `%cd` (IPython magic)

* If you use `%cd`, it changes the working directory for the **whole notebook kernel**.
* Then `!echo hello > a.txt` will create the file in that new directory.

Example:

```python
%cd /content/play
!echo hello > a.txt
!ls -l
```
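
The same effect can be reproduced in a plain terminal by starting a child shell explicitly with `bash -c`. This is an analogy for what each `!` line does, not a description of how Colab implements it:

```bash
# A `cd` inside a child shell dies with that shell; the parent stays where it was
START=$(pwd)
bash -c 'cd /tmp && echo "child shell is in: $(pwd)"'
AFTER=$(pwd)
echo "parent is still in: $AFTER"
```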




In [None]:
!mkdir -p /content/play && cd /content/play && pwd
!touch a.txt b.txt && ls -l .
!echo "hello" > a.txt
!cp a.txt c.txt && mv b.txt docs.txt && ls -l .
!head -n 1 a.txt && tail -n +1 c.txt
!rm -i c.txt  # you can skip -i in automated runs
!ls -l .

/content/play
total 657
-rw------- 1 root root      0 Sep  8 20:47 1
-rw------- 1 root root      6 Sep 17 13:33 a.txt
drwx------ 2 root root   4096 Sep 11 15:40 backups
-rw------- 1 root root      0 Sep 17 13:33 b.txt
drwx------ 2 root root   4096 Aug 14 21:30 data
drwx------ 2 root root   4096 Sep  4 14:47 docs
drwx------ 2 root root   4096 Aug 28 15:48 docs1
-rw------- 1 root root      0 Sep 17 13:28 docs.txt
drwx------ 2 root root   4096 Aug 28 14:15 _freeze
-rw------- 1 root root     76 Sep 14 22:20 greet.sh
drwx------ 2 root root   4096 Aug 25 13:18 homework
-rw------- 1 root root 604095 Sep  4 14:48 index.pdf
-rw------- 1 root root    188 Sep 11 15:21 index.qmd
-rw------- 1 root root   1976 Sep 11 21:20 Makefile
-rw------- 1 root root   2015 Sep 11 21:20 Makefile.bak
-rw------- 1 root root     23 Sep  4 14:48 myfirstfrommac.txt
-rw------- 1 root root     22 Sep  4 14:48 myfirstlocalfile.txt
drwx------ 2 root root   4096 Aug 17 20:07 notebooks
drwx------ 2 root root   4096 Aug 27 

In [None]:
%%bash
mkdir -p /content/play && cd /content/play && pwd
touch a.txt b.txt && ls -l .
echo "hello" > a.txt
cp a.txt c.txt && mv b.txt docs.txt && ls -l .
head -n 1 a.txt && tail -n +1 c.txt
rm -i c.txt  # you can skip -i in automated runs
ls -l .

/content/play
total 0
-rw-r--r-- 1 root root 0 Sep 17 13:38 a.txt
-rw-r--r-- 1 root root 0 Sep 17 13:38 b.txt
total 8
-rw-r--r-- 1 root root 6 Sep 17 13:38 a.txt
-rw-r--r-- 1 root root 6 Sep 17 13:38 c.txt
-rw-r--r-- 1 root root 0 Sep 17 13:38 docs.txt
hello
hello


rm: remove regular file 'c.txt'? 

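Wildcards (globs): `*` matches any string; `?` matches a single character; `[abc]` matches any one character from the set. `{foo,bar}` is **brace expansion**, which generates words rather than matching existing files.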
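
A quick way to see what each wildcard matches is to create a few empty files in a scratch directory and let `echo` print the expansions. A minimal sketch (the file names are arbitrary):

```bash
# Try each wildcard against a handful of empty files
WORK=$(mktemp -d)
cd "$WORK"
touch a.txt b.txt c.txt day1.csv day12.csv

echo *.txt       # a.txt b.txt c.txt   (* = any string, including empty)
echo day?.csv    # day1.csv            (? = exactly one character)
echo [ab].txt    # a.txt b.txt         ([ab] = one character from the set)
```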



```bash
{item1,item2,item3}
```

Expands into separate words: `item1 item2 item3`.


```bash
echo {foo,bar}
```

Output:

```
foo bar
```


###  More uses

**1. Prefixes and suffixes**

```bash
echo file{1,2,3}.txt
```

Output:

```
file1.txt file2.txt file3.txt
```

**2. Adjacent braces (every combination)**

```bash
echo {A,B}{1,2}
```

Output:

```
A1 A2 B1 B2
```

**3. Ranges**

```bash
echo {1..5}
# → 1 2 3 4 5

echo {a..d}
# → a b c d
```

**4. With steps**

```bash
echo {0..10..2}
# → 0 2 4 6 8 10
```

---

###  Notes

* Expansion happens **before** the command runs (like globbing with `*`).
* It’s not a loop; it just creates multiple arguments in one shot.
* Commonly used for quick file creation:

  ```bash
  touch project/{notes,report,data}.txt
  ```

  creates three files inside `project/`: `notes.txt`, `report.txt`, `data.txt` (the `project/` directory must already exist).
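
The practical difference between braces and globs: brace expansion produces words whether or not any file exists, while a glob only matches files that are already there. A sketch in a scratch directory, which creates `project/` first so the `touch` succeeds:

```bash
# Brace expansion builds names; globbing matches names that already exist
WORK=$(mktemp -d)
cd "$WORK"
mkdir -p project                      # touch will not create the directory for you
touch project/{notes,report,data}.txt
ls project

echo {x,y}.log    # x.log y.log  (generated even though no such files exist)
echo *.log        # *.log        (a glob with no match is left as-is, by default)
```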



In [None]:
!ls -1 /bin/ba*  | head

/bin/base32
/bin/base64
/bin/basename
/bin/basenc
/bin/bash
/bin/bashbug


`find . -name "*.txt"`

* **Purpose:** locate files by name (or other attributes).
* `.` → start search from current directory.
* `-name "*.txt"` → match filenames ending in `.txt`.
* Output: a list of paths to matching files.

`grep -R "pattern" .`

* **Purpose:** search file **contents** for a text pattern.
* `-R` (or `--recursive`) → search through all files under the given directory (`.` here).
* `"pattern"` → the text/regex to match inside files.
* Output: matching lines, with filename + line content.

**Example**

```bash
grep -R "error" .
```

might print (matching is case-sensitive; add `-i` to also catch `Error`):

```
./logs/app.log:2025-09-14 error: file not found
./src/main.py:# raise ValueError("error")
```

---

###  Key difference

* `find` → *search by file properties* (name, size, date, type).
* `grep` → *search inside files* (line contents).
* They work great together:

  ```bash
  find . -name "*.txt" -exec grep -Hn "pattern" {} \;
  ```

  → find only `.txt` files and then search each for `"pattern"`.

  That’s a very handy **`find … -exec grep …`** pattern.

1. **`find .`**
   → start searching in the current directory (`.`).

2. **`-name "*.txt"`**
   → only consider files whose names end with `.txt`.

3. **`-exec … {} \;`**

   * Run a command on each matching file.
   * `{}` is a placeholder that gets replaced with the filename.
   * `\;` ends the `-exec` clause (the backslash escapes the `;` so the shell doesn’t eat it).

4. **`grep -Hn "pattern"`**

   * `grep` searches file contents for `"pattern"`.
   * `-H` prints the filename in front of each match. With `\;`, grep receives one file per run, and without `-H` it would leave the filename out.
   * `-n` shows the **line number** of each match.

So: for each `.txt` file under `.`, grep its contents for `"pattern"`, printing the filename + line number + matching line.

---

### Example

Suppose your directory has:

* `notes/todo.txt`
* `docs/report.txt`

and inside `todo.txt` you have:

```
1. fix pattern match
2. review notes
```

Run:

```bash
find . -name "*.txt" -exec grep -Hn "pattern" {} \;
```

Output:

```
./notes/todo.txt:1:1. fix pattern match
```

---

###  Variations

* Run `grep` on **all matches at once** (faster if many files):

  ```bash
  find . -name "*.txt" -exec grep -n "pattern" {} +
  ```

  The `+` batches multiple filenames into one grep run, instead of one-by-one.

* Case-insensitive:

  ```bash
  find . -name "*.txt" -exec grep -ni "pattern" {} +
  ```

* Show only filenames that contain a match:

  ```bash
  find . -name "*.txt" -exec grep -l "pattern" {} +
  ```
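
When `-exec … \;` runs grep on one file at a time, grep sees a single file and by default omits the filename from its output; `-H` forces it back on. A minimal sketch reproducing the `notes/todo.txt` example in a scratch directory:

```bash
# Rebuild the two-file example and compare grep -n with grep -Hn under -exec ... \;
WORK=$(mktemp -d)
cd "$WORK"
mkdir -p notes docs
printf '1. fix pattern match\n2. review notes\n' > notes/todo.txt
printf 'quarterly numbers\n' > docs/report.txt

find . -name "*.txt" -exec grep -n "pattern" {} \;     # prints: 1:1. fix pattern match
HITS=$(find . -name "*.txt" -exec grep -Hn "pattern" {} \;)
echo "$HITS"                                           # ./notes/todo.txt:1:1. fix pattern match
```

With the batched `{} +` form grep usually receives several files at once and prints filenames anyway, but adding `-H` keeps the output format the same even when only one file matches.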







In [None]:
%%bash
cd /content/play
find . -name "*.txt"
grep -R "pattern" .

./c.txt
./docs.txt
./a.txt


CalledProcessError: Command 'b'cd /content/play\nfind . -name "*.txt"\ngrep -R "pattern" .\n'' returned non-zero exit status 1.
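
The `CalledProcessError` above is not a crash: `grep` exits with status 1 when it finds no match (0 = match found, 2 = an error such as an unreadable file), and `%%bash` reports a non-zero status from the cell's last command as an error. A minimal sketch of checking and neutralizing that status:

```bash
WORK=$(mktemp -d)
cd "$WORK"
echo "hello" > a.txt

STATUS=0
grep -R "pattern" . || STATUS=$?           # capture the status without stopping the script
echo "grep exit status: $STATUS"           # 1 = no match

grep -R "pattern" . || true                # treat "no match" as success (also hides status 2 errors)
grep -Rq "hello" . && echo "found hello"   # -q: no output, only the exit status
```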

```bash
cat > hello.sh <<'EOF'
#!/usr/bin/env bash
echo "Hello from a script"
EOF
```



1. **`cat > hello.sh`**

   * Redirects `cat`’s standard output into the file `hello.sh`.
   * So whatever `cat` prints will get saved into `hello.sh`.

2. **`<<'EOF'`** → *here-document (heredoc)*

   * This tells the shell: *“take everything from the following lines until you see a line with just `EOF`.”*
   * All that text is fed into `cat` as input.
   * The quotes around `'EOF'` mean: *“don’t expand variables, commands, etc. inside the block.”* (so `$VAR` would be written literally, not expanded).

3. **Inside the heredoc**

   ```
   #!/usr/bin/env bash
   echo "Hello from a script"
   ```

   This becomes the contents of `hello.sh`.

4. **Closing delimiter `EOF`**

   * When the shell sees this line, it ends the heredoc and finishes writing the file.


File `hello.sh` is created with contents:

```bash
#!/usr/bin/env bash
echo "Hello from a script"
```
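
The effect of quoting the delimiter is easiest to see side by side. A minimal sketch (the variable `NAME` is only for illustration); the quoted form is what you want when writing scripts, so that `$1` or `$NAME` reach the script unexpanded:

```bash
WORK=$(mktemp -d)
cd "$WORK"
NAME="Colab"

# Unquoted delimiter: $NAME is expanded while the file is written
cat > unquoted.txt <<EOF
Hello, $NAME
EOF

# Quoted delimiter: the text is written literally
cat > quoted.txt <<'EOF'
Hello, $NAME
EOF

cat unquoted.txt    # Hello, Colab
cat quoted.txt      # Hello, $NAME
```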



In [None]:
%%bash
cd /content/play
cat > hello.sh <<'EOF'
#!/usr/bin/env bash
echo "Hello from a script"
EOF
ls -l hello.sh
chmod u+x hello.sh
./hello.sh
ls

-rw-r--r-- 1 root root 47 Sep 17 13:55 hello.sh
Hello from a script
a.txt
c.txt
docs.txt
hello.sh
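
The `chmod u+x` step in the cell above is what makes `./hello.sh` runnable: a freshly written file usually has no execute bit. A minimal sketch in a scratch directory showing the permission string before and after (the exact bits depend on your `umask`; `bash hello.sh` would run the script even without the execute bit):

```bash
WORK=$(mktemp -d)
cd "$WORK"
printf '#!/usr/bin/env bash\necho "Hello from a script"\n' > hello.sh

ls -l hello.sh       # typically -rw-r--r--: no x bit, so ./hello.sh would fail with "Permission denied"
chmod u+x hello.sh   # u = the file's owner, + = add, x = execute permission
ls -l hello.sh       # typically -rwxr--r--
OUT=$(./hello.sh)
echo "$OUT"          # Hello from a script
```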



###  File descriptors

* `0` = **stdin** (input)
* `1` = **stdout** (normal output)
* `2` = **stderr** (error output)

---

###  `2> stderr`

* `2>` means “redirect **stderr** (file descriptor 2) to …”
* Example:

  ```bash
  ls /no/such/dir 2> errors.txt
  ```

  * The error message goes into `errors.txt`.
  * Nothing shows on the screen.
  * Standard output (fd 1) is unaffected.

---

### `&> stdout+stderr`

* `&>` is shorthand in bash for redirecting **both stdout (1) and stderr (2)** to the same destination.
* Example:

  ```bash
  ls /etc /no/such/dir &> all_output.txt
  ```

  * Both the directory listing (`stdout`) and the error message (`stderr`) are written to `all_output.txt`.

Equivalent long form:

```bash
ls /etc /no/such/dir > all_output.txt 2>&1
```

* `> all_output.txt` → send stdout to the file.
* `2>&1` → send stderr (`2`) to wherever stdout (`1`) is currently going.

---

###  Quick demo

```bash
# Send stdout only
echo "hello" > out.txt

# Send stderr only
ls /bad/path 2> err.txt

# Send both together
(ls /etc; ls /bad/path) &> both.txt
```

---

**Summary:**

* `2>` = redirect just **stderr**.
* `&>` = redirect **both stdout + stderr** to the same file.
* Classic form `> file 2>&1` does the same thing as `&> file`.
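
Because `2>&1` copies wherever stdout points *at that moment*, the order of redirections matters. A minimal sketch in a scratch directory (the file names are illustrative):

```bash
WORK=$(mktemp -d)
cd "$WORK"
touch exists.txt

# stdout -> file first, then stderr -> "wherever stdout goes now" = the file
ls exists.txt missing.txt > right.txt 2>&1 || true

# stderr -> "wherever stdout goes now" = the screen, THEN stdout -> file
ls exists.txt missing.txt 2>&1 > wrong.txt || true   # the error message appears on screen

cat right.txt   # the listing and the "missing.txt" error
cat wrong.txt   # only: exists.txt
```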




### 1. Count the number of lines in the file

```bash
wc -l stocks.csv
```

* `wc` = **word count** (also counts lines, bytes, etc.).
* `-l` = lines only.
* Prints the number of lines in `stocks.csv`.
* If your CSV has 6 rows (including header), this prints `6 stocks.csv`.
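
Since `wc -l` counts the header too, a common idiom for counting only data rows is to skip line 1 with `tail -n +2` first. Note that `wc -l` counts newline characters, so a final line without a trailing newline is not counted. A sketch using the same sample data as the cell further down:

```bash
WORK=$(mktemp -d)
cd "$WORK"
printf 'ticker,price\nAAPL,180\nMSFT,420\nAAPL,181\nNVDA,120\nAAPL,182\n' > stocks.csv

wc -l stocks.csv                        # 6 stocks.csv  (header included)
ROWS=$(tail -n +2 stocks.csv | wc -l)   # count data rows only
echo "data rows: $ROWS"                 # 5 (BSD wc may pad the number with spaces)
```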

---

### 2. List unique tickers with counts

```bash
cut -d, -f1 stocks.csv | tail -n +2 | sort | uniq -c | sort -nr
```

* `cut -d, -f1` → split each line on comma (`,`), take field 1 (the `ticker`).
* `tail -n +2` → skip the header row (start at line 2).
* `sort` → sort tickers alphabetically (needed before `uniq`).
* `uniq -c` → collapse each run of identical adjacent lines into one line prefixed with its count → “how many times each ticker appears.”
* `sort -nr` → numeric reverse sort → biggest counts first.

Output example:

```
3 AAPL
1 NVDA
1 MSFT
```

Means `AAPL` appears 3 times, `NVDA` once, `MSFT` once.

---

### 3. Filter rows with AAPL and take the last 2

```bash
grep '^AAPL,' stocks.csv | tail -n 2
```

* `grep '^AAPL,' stocks.csv` → find rows where the first field is `AAPL` (`^` = beginning of line, so it won’t match e.g. `MYAAPL`).
* `tail -n 2` → show the last 2 of those lines.

 If your CSV looked like:

```
ticker,price
AAPL,180
MSFT,420
AAPL,181
NVDA,120
AAPL,182
```

This prints:

```
AAPL,181
AAPL,182
```




In [None]:
%%bash
cd /content/play
# Make a small CSV
cat > stocks.csv <<'EOF'
ticker,price
AAPL,180
MSFT,420
AAPL,181
NVDA,120
AAPL,182
EOF
# Count rows, list unique tickers with counts
wc -l stocks.csv
cut -d, -f1 stocks.csv | tail -n +2 | sort | uniq -c | sort -nr
# Filter rows with AAPL and take the last 3
grep '^AAPL,' stocks.csv | tail -n 3

6 stocks.csv
      3 AAPL
      1 NVDA
      1 MSFT
AAPL,180
AAPL,181
AAPL,182



```bash
!apt-get update -qq && apt-get install -y -qq vim
```


1. **`apt-get update`**

   * Downloads the latest package lists (metadata about what versions are available).
   * Must be run before installing, so you don’t try to fetch outdated versions.

2. **`-qq` (quiet quiet)**

   * Suppresses most output.
   * `-q` = quiet, `-qq` = even quieter.
   * In Colab it keeps the logs from being too spammy.

3. **`apt-get install -y -qq vim`**

   * Installs the `vim` package.
   * `-y` = assume “yes” to prompts (otherwise you’d have to type `y` to confirm).
   * `-qq` again for less output.




In [None]:
!apt-get update -qq && apt-get install -y -qq vim

###  Searching in `vim`

1. Press `/` → this puts you in **search mode** (look at the bottom of the screen).
2. Type your search string (e.g. `pattern`) and press **Enter**.

   * The cursor jumps to the **next match** after the current position.
   * Matches are highlighted if the `hlsearch` option is on (`:set hlsearch`).

To jump between matches:

* **`n`** → move to the **next match** in the same direction as your last search.
* **`N`** → move to the **previous match** (opposite direction).

So:

* If you searched `/pattern` (forward), `n` goes forward again, `N` goes backward.
* If you searched `?pattern` (backward), then `n` goes backward, `N` goes forward.

---

###  Example

Inside `vim`, type:

```
/TODO
```

Press **Enter** → cursor jumps to first `TODO`.

* Press `n` to go to the next `TODO`.
* Press `N` to go back to the previous one.

---
**Summary:**

* `/pattern` → search forward for “pattern.”
* `?pattern` → search backward.
* `n` → repeat in same direction.
* `N` → repeat in opposite direction.




### shebang
```bash
#!/usr/bin/env bash
```

* Tells the OS: *“run this script with bash”*.
* `env` looks up `bash` in your `$PATH`, so the script works even if `bash` is installed somewhere other than `/bin/bash`.


```bash
set -euo pipefail
```

This is a best practice to make Bash scripts safer:

* `-e` → exit immediately if a command fails (non-zero status).
* `-u` → error if you try to use an unset variable.
* `-o pipefail` → if any command in a pipeline fails, the whole pipeline fails.

Together, they prevent “silent failures.”
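
Each of the three options is easy to observe in a throwaway `bash -c` child shell, so the failures don't affect your own session. A minimal sketch (`UNDEFINED_VAR` is assumed not to be set in your environment):

```bash
# Exit status of a pipeline: the last command's only, unless pipefail is set
bash -c 'false | true; echo "without pipefail: $?"'                  # 0
bash -c 'set -o pipefail; false | true; echo "with pipefail: $?"'    # 1

# -u turns a mistyped variable name into an error instead of an empty string
bash -c 'set -u; echo "value: $UNDEFINED_VAR"' 2>/dev/null || echo "set -u stopped the script"

# -e stops at the first failing command, so the last echo never runs
bash -c 'set -e; false; echo "never printed"' || echo "set -e stopped the script"
```

`-e` has well-known exceptions: a failing command used as an `if` condition or on the left of `&&`/`||` does not stop the script.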


```bash
NAME=${1:-World}
```

* `$1` = the first argument passed to the script.
* `:-World` = if `$1` is empty or unset, use `"World"` as the default.
* Example:

  * Run with `./greet.sh Alice` → `NAME="Alice"`.
  * Run with `./greet.sh` → `NAME="World"`.
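
The `:-` form treats an *empty* argument like a missing one; the related `${1-World}` form (no colon) substitutes the default only when the argument is missing entirely. A small sketch with two hypothetical functions, `greet` and `greet2`, contrasting them (inside a function, `$1` is the function's first argument):

```bash
# Hypothetical helpers contrasting ${1:-default} and ${1-default}
greet()  { local name=${1:-World}; echo "Hello, $name!"; }   # default if unset OR empty
greet2() { local name=${1-World};  echo "Hello, $name!"; }   # default only if unset

greet          # Hello, World!
greet Alice    # Hello, Alice!
greet ""       # Hello, World!
greet2 ""      # Hello, !
```

A side benefit of `${1:-World}` in `greet.sh`: because the default is supplied, `set -u` does not abort the script when it is run with no argument.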



In [None]:
%%bash
cd /content/play
cat > greet.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
NAME=${1:-World}
echo "Hello, $NAME!"
EOF
chmod u+x greet.sh
./greet.sh
./greet.sh Linux

Hello, World!
Hello, Linux!


bash: line 1: cd: /content/play: No such file or directory


In a shell (Linux, macOS, Colab, etc.), the command

```bash
which python
```

asks: *“Which executable will be run if I type `python`?”*


* The shell looks at your **PATH** (a list of directories it searches for commands).
* `which` shows the **first matching executable** it finds.

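Example on Ubuntu:

```bash
$ which python
/usr/bin/python
```

On many modern distros that file is itself a symlink to a Python 3 interpreter. `which` reports the first matching name on `PATH` without following the link; `readlink -f "$(which python)"` shows the file it finally resolves to.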

---

### Variants

* `which python3` → find the `python3` executable.
* `which pip` → find the `pip` executable being used.
* `type -a python` → show all matches in PATH, not just the first.
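
Two related checks: printing `PATH` one entry per line shows the order in which the shell searches, and `command -v` (a shell builtin specified by POSIX) answers the same question as `which`. A sketch that looks up `python3` and, if found, follows any symlinks with GNU `readlink -f`:

```bash
# List PATH one directory per line, in the order the shell searches them
echo "$PATH" | tr ':' '\n'

# command -v prints the path of what would run, or nothing if the command isn't found
PY=$(command -v python3 || true)
if [ -n "$PY" ]; then
  echo "python3 resolves to: $PY"
  echo "real file (symlinks followed): $(readlink -f "$PY")"
else
  echo "python3 is not on PATH"
fi
```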




In [None]:
!echo $PATH
!which python

```bash
!tree -L 2 /content/play
```


* `tree` → a command-line program that prints directories as a tree diagram.
* `-L 2` → “limit depth to 2 levels.”

  * Level 1 = the entries directly inside `/content/play`
  * Level 2 = the contents of those subdirectories
* `/content/play` → the directory to inspect.

---

### Example output

If `/content/play` contained this:

```
/content/play
├── a.txt
├── data
│   ├── notes.txt
│   └── stocks.csv
├── docs.txt
└── hello.sh
```

* The **first level** shows files directly in `/content/play`.
* The **second level** shows the contents of the `data/` subdirectory.
* Deeper nested subdirectories would be hidden because of `-L 2`.
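
If `tree` isn't installed, `find` with `-maxdepth` gives a rough, unformatted equivalent; its depth counts the same way (depth 1 = entries in the starting directory, depth 2 = their contents). A sketch in a scratch directory with one file nested too deep to show:

```bash
# A rough stand-in for `tree -L 2` when tree is not installed
WORK=$(mktemp -d)
cd "$WORK"
mkdir -p data/raw
touch a.txt data/stocks.csv data/raw/deep.csv

find . -maxdepth 2 | sort    # lists ./data/stocks.csv but not ./data/raw/deep.csv
```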




In [None]:
!apt-get update -qq
!apt-get install -y -qq tree  # example
!tree -L 2 /content/play


```python
%pip install -q pandas
```

* `%pip` is an **IPython magic**, so it installs into the **same Python environment** that your notebook kernel is using.
* `-q` = quiet, reduces the amount of log spam.
* This ensures `pandas` is available for immediate import in your notebook.

*(Using `%pip` is safer than `!pip` because it installs into the notebook’s own interpreter, so packages land where `import` will look for them.)*


```bash
python -c "import pandas as pd; print(pd.__version__)"
```

* `python -c "…"` runs a short Python command given as a string.
* `import pandas as pd` loads pandas.
* `print(pd.__version__)` prints the installed version number.






In [None]:
%pip install -q pandas
!python -c "import pandas as pd; print(pd.__version__)"

* `cmd --help` (quick usage summary; supported by most GNU tools)
* `man cmd` (manual; may be minimal in Colab)

###  `apropos`

* Searches the **manual (man) page descriptions** for keywords.
* Syntax:

  ```bash
  apropos keyword
  ```
* Example:

  ```bash
  apropos copy
  ```

  might print:

  ```
  cp (1)               - copy files and directories
  memcpy (3)           - copy memory area
  strncpy (3)          - copy a string
  ```
* Useful when you don’t know the exact command name, but know what you want to do.
* Behind the scenes, it searches the man page database (`whatis` database).

---

###  `tldr`

* Community-driven **simplified manuals** with **concise examples**.
* Syntax:

  ```bash
  tldr command
  ```
* Example:

  ```bash
  tldr tar
  ```

  might show:

  ```
  tar

  Create an archive from files:
    tar -cf archive.tar file1 file2

  Extract an archive:
    tar -xf archive.tar

  List contents:
    tar -tf archive.tar
  ```
* Much shorter and more practical than the full `man tar`.
* Needs to be installed first:

  ```bash
  sudo apt-get install -y tldr   # on Ubuntu/Debian
  tldr --update                  # fetch latest pages
  ```

---

###  Key difference

* **`apropos`** = search engine for man pages (find what command you might need).
* **`tldr`** = cheat sheet with examples (learn quickly how to *use* the command).



In [None]:
!ls -1 /usr/bin/g* 2>/dev/null | wc -l

```bash
awk -F, 'NR>1 {max[$1] = ($2>max[$1] ? $2 : max[$1])} END {for (k in max) print k "," max[k]}' stocks.csv | sort
```


1. **`-F,`**

   * Sets the field separator to a comma, so `$1` = first column (ticker), `$2` = second column (price).

2. **`NR>1 { … }`**

   * `NR` = record number (line number).
   * `NR>1` skips the header line.
   * For each subsequent line:

     ```awk
     max[$1] = ($2 > max[$1] ? $2 : max[$1])
     ```

     * `max[$1]` is an associative array keyed by ticker.
     * Compare the current price `$2` with the stored maximum.
     * If greater, update it; otherwise keep the old max.

3. **`END { for (k in max) print k "," max[k] }`**

   * After reading all lines, loop over the array and print `ticker,max_price`.

4. **`| sort`**

   * Pipe the output into `sort` so the results are ordered alphabetically by ticker.


###  Example

Input `stocks.csv`:

```
ticker,price
AAPL,180
MSFT,420
AAPL,181
NVDA,120
AAPL,182
```

Running the command above prints:

```
AAPL,182
MSFT,420
NVDA,120
```

**Summary:**
This `awk` command finds the **maximum price per ticker** in a CSV, skipping the header, and prints the results as `ticker,max_price`, sorted alphabetically.
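
The same associative-array pattern handles other per-ticker summaries. A minimal sketch computing the mean price per ticker on the same sample `stocks.csv`, written to a scratch directory. One caveat about the max version above: it relies on an unset `max[$1]` comparing as 0, which is fine for positive prices but would go wrong if all of a ticker's values were negative; testing `!($1 in max)` before comparing avoids that.

```bash
WORK=$(mktemp -d)
cd "$WORK"
cat > stocks.csv <<'EOF'
ticker,price
AAPL,180
MSFT,420
AAPL,181
NVDA,120
AAPL,182
EOF

# sum[] and n[] are associative arrays keyed by ticker ($1)
MEANS=$(awk -F, 'NR>1 { sum[$1] += $2; n[$1]++ }
                 END  { for (k in sum) printf "%s,%.1f\n", k, sum[k] / n[k] }' stocks.csv | sort)
echo "$MEANS"
# AAPL,181.0
# MSFT,420.0
# NVDA,120.0
```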




In [None]:
%%bash
cd /content/play
awk -F, 'NR>1 {max[$1] = ($2>max[$1] ? $2 : max[$1])} END {for (k in max) print k "," max[k]}' stocks.csv | sort

AAPL,182
MSFT,420
NVDA,120


bash: line 1: cd: /content/play: No such file or directory
