# Linux/Shell Commands Primer for Drug Discovery
## Prerequisites for the Alzheimer's Drug Discovery Project

This notebook introduces essential Linux/shell commands used in computational drug discovery workflows. Run through these examples to get comfortable before diving into the main project.

**Learning Objectives:**
- Navigate file systems
- Manage files and directories
- Install Python packages
- Run external tools (like PaDEL)
- Understand command-line data processing

---
# 1. File System Navigation

Understanding where you are and what files exist is fundamental.

## 1.1 Print Working Directory (`pwd`)
Shows your current location in the file system.

In [None]:
# Where am I?
!pwd

## 1.2 List Directory Contents (`ls`)
See what files and folders exist.

In [None]:
# Basic listing
!ls

In [None]:
# Detailed listing with hidden files
# -l = long format (permissions, size, date)
# -a = all files (including hidden ones starting with .)
# -h = human-readable sizes (KB, MB, GB)
!ls -lah

In [None]:
# List only CSV files
# 2>/dev/null hides the error if nothing matches; || runs echo only when ls fails
!ls *.csv 2>/dev/null || echo "No CSV files found"
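
The same listing can be done from Python without leaving the notebook. The sketch below uses the standard-library `pathlib` module. `list_dir` is a hypothetical helper name introduced here for illustration, and the demo runs in a throwaway temporary directory.

```python
import tempfile
from pathlib import Path


def list_dir(directory, pattern="*"):
    """Return the sorted names of entries in `directory` matching a glob pattern."""
    return sorted(p.name for p in Path(directory).glob(pattern))


# Demo in a temporary directory so nothing in your project is touched
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv", "notes.txt"):
        (Path(tmp) / name).touch()

    print("cwd:", Path.cwd())               # roughly `pwd`
    print("all:", list_dir(tmp))            # roughly `ls`
    print("csv:", list_dir(tmp, "*.csv"))   # roughly `ls *.csv`
```

Returning a list (rather than printing) makes it easy to loop over the matching files later in Python.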

## 1.3 Change Directory (`cd`)
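In Jupyter, each `!` command runs in its own temporary subshell, so `!cd some_dir` does not carry over to later commands or cells. Use Python's `os.chdir()` (or the `%cd` magic) instead.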

In [None]:
import os

# Show current directory
print("Before:", os.getcwd())

# Change to parent directory
original_dir = os.getcwd()
os.chdir('..')
print("After going up:", os.getcwd())

# Return to where we started so later cells create files in the right place
os.chdir(original_dir)
print("Back in:", os.getcwd())
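
Because `os.chdir()` affects every later cell, it is easy to end up in an unexpected directory. One possible pattern is a small context manager that always returns to the starting directory. `working_directory` is a hypothetical helper written for this primer, not part of any library (Python 3.11+ offers `contextlib.chdir` for the same purpose).

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch to `path`, then return to the original directory."""
    original = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(original)  # runs even if the body raises an exception


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        print("Inside:", os.getcwd())
print("Back in:", os.getcwd())
```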

---
# 2. File Operations

Creating, viewing, copying, and managing files.

## 2.1 Create Files (`touch`, `echo`)

In [None]:
# Create an empty file
!touch test_file.txt
!ls -l test_file.txt

In [None]:
# Create a file with content using echo
!echo "Hello, Drug Discovery!" > greeting.txt
!cat greeting.txt

In [None]:
# Append to a file (>> instead of >)
!echo "This is line 2" >> greeting.txt
!echo "This is line 3" >> greeting.txt
!cat greeting.txt
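
The `>` and `>>` redirections map onto Python's file modes: `"w"` truncates and writes, `"a"` appends. A minimal sketch, where `write_lines` is a hypothetical helper and the file lives in a temporary directory:

```python
import tempfile
from pathlib import Path


def write_lines(path, lines, append=False):
    """Write lines to a file: mode "w" overwrites like `>`, mode "a" appends like `>>`."""
    mode = "a" if append else "w"
    with open(path, mode, encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")


with tempfile.TemporaryDirectory() as tmp:
    greeting = Path(tmp) / "greeting.txt"
    write_lines(greeting, ["Hello, Drug Discovery!"])                          # like >
    write_lines(greeting, ["This is line 2", "This is line 3"], append=True)  # like >>
    greeting_text = greeting.read_text(encoding="utf-8")                      # like cat

print(greeting_text, end="")
```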

## 2.2 View File Contents (`cat`, `head`, `tail`)

In [None]:
# cat = concatenate and print entire file
!cat greeting.txt

In [None]:
# Create a longer file for demo
!seq 1 20 > numbers.txt
!cat numbers.txt

In [None]:
# head = first N lines (default 10)
!head -5 numbers.txt

In [None]:
# tail = last N lines
!tail -5 numbers.txt
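
For large data files it can be handy to peek at the first or last lines from Python without loading the whole file. The sketch below is one way to do it with the standard library. `head` and `tail` here are hypothetical Python helpers named after the shell commands.

```python
import tempfile
from collections import deque
from itertools import islice
from pathlib import Path


def head(path, n=10):
    """Return the first n lines of a text file (stops reading after n lines)."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


def tail(path, n=10):
    """Return the last n lines of a text file (keeps only n lines in memory)."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


with tempfile.TemporaryDirectory() as tmp:
    numbers = Path(tmp) / "numbers.txt"
    numbers.write_text("".join(f"{i}\n" for i in range(1, 21)), encoding="utf-8")
    first_five = head(numbers, 5)
    last_five = tail(numbers, 5)

print(first_five)
print(last_five)
```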

## 2.3 Count Lines, Words, Characters (`wc`)

In [None]:
# wc = word count
# Output: lines, words, bytes (use -m to count characters instead)
!wc numbers.txt

In [None]:
# Just line count (-l)
!wc -l numbers.txt
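
To see what the three default `wc` columns measure, here is a rough Python equivalent. `wc` below is a hypothetical helper that counts newline characters, whitespace-separated words, and bytes.

```python
import tempfile
from pathlib import Path


def wc(path):
    """Return (lines, words, bytes) for a file, like the default `wc` columns."""
    data = Path(path).read_bytes()
    # lines = newline characters, words = whitespace-separated tokens
    return data.count(b"\n"), len(data.split()), len(data)


with tempfile.TemporaryDirectory() as tmp:
    numbers = Path(tmp) / "numbers.txt"
    numbers.write_bytes("".join(f"{i}\n" for i in range(1, 21)).encode("ascii"))
    counts = wc(numbers)

print(counts)  # (20, 20, 51)
```

The byte count is 51 because the nine single-digit numbers take 2 bytes each (digit plus newline) and the eleven two-digit numbers take 3 bytes each.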

## 2.4 Copy, Move, Remove Files

In [None]:
# Copy a file
!cp greeting.txt greeting_backup.txt
!ls greeting*

In [None]:
# Move/rename a file
!mv greeting_backup.txt greeting_copy.txt
!ls greeting*

In [None]:
# Remove files (be careful!)
!rm greeting_copy.txt test_file.txt numbers.txt greeting.txt
!ls greeting* 2>/dev/null || echo "Files cleaned up!" 
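
The same copy, rename, and remove steps are available from Python through `shutil` and `pathlib`. This is a sketch in a temporary directory. It assumes Python 3.8+ for `Path.unlink(missing_ok=True)` and for `Path.rename` returning the new path.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    src = folder / "greeting.txt"
    src.write_text("Hello, Drug Discovery!\n", encoding="utf-8")

    backup = Path(shutil.copy(src, folder / "greeting_backup.txt"))  # like cp
    renamed = backup.rename(folder / "greeting_copy.txt")            # like mv
    after_move = sorted(p.name for p in folder.iterdir())

    renamed.unlink()              # like rm (raises if the file is missing)
    src.unlink(missing_ok=True)   # like rm -f (silent if the file is missing)
    after_remove = list(folder.iterdir())

print(after_move)
print(after_remove)
```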

---
# 3. Package Management (`pip`)

Installing Python libraries for data science and chemistry.

## 3.1 Basic pip Commands

In [None]:
# Check pip version
!pip --version

In [None]:
# List installed packages
!pip list | head -20

In [None]:
# Check if a specific package is installed
!pip show pandas
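
When a notebook needs to branch on whether a package is present, checking from Python is often more convenient than reading `pip show` output. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+). `installed_version` is a hypothetical helper, and the last package name is a deliberately made-up one to show the "not installed" case.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("pandas", "rdkit", "surely-not-a-real-package-xyz"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```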

## 3.2 Installing Packages

In [None]:
# Install a single package
# -q = quiet (less output)
!pip install rdkit -q
print("rdkit installed!")

In [None]:
# Install multiple packages at once
!pip install seaborn matplotlib numpy -q
print("Visualization packages installed!")

## 3.3 Using `uv` for Faster Installation (Recommended)

`uv` is a fast Python package installer and dependency resolver. Its `uv pip` interface accepts most of the same arguments as `pip`, and installs are typically much quicker.

In [None]:
# Install uv first
!pip install uv -q
print("uv installed!")

In [None]:
# Use uv to install packages (much faster!)
# --system = install to system Python (needed in Colab)
!uv pip install --system pandas numpy scikit-learn --quiet
print("Packages installed with uv!")

---
# 4. Working with Data Files

Commands useful for exploring CSV and text data files.

## 4.1 Create a Sample CSV File

In [None]:
# Create a sample molecule data file
!echo "molecule_id,smiles,ic50,class" > sample_molecules.csv
!echo "MOL001,CCO,500,active" >> sample_molecules.csv
!echo "MOL002,CCCO,1500,intermediate" >> sample_molecules.csv
!echo "MOL003,CCCCO,15000,inactive" >> sample_molecules.csv
!echo "MOL004,CC(C)O,800,active" >> sample_molecules.csv
!echo "MOL005,CCOCC,12000,inactive" >> sample_molecules.csv

!cat sample_molecules.csv

## 4.2 Explore CSV with Shell Commands

In [None]:
# Count rows (excluding header)
# tail -n +2 = print from line 2 onward, i.e. skip the header row
!tail -n +2 sample_molecules.csv | wc -l

In [None]:
# View specific columns using cut
# -d',' = delimiter is comma
# -f1,3 = fields 1 and 3
!cut -d',' -f1,3 sample_molecules.csv

In [None]:
# Search for patterns using grep
# Find all "active" molecules
# -w = whole-word match; a plain "active" pattern would also match "inactive"
!grep -w "active" sample_molecules.csv

In [None]:
# Count occurrences of each class (skip the header so "class" is not counted)
!tail -n +2 sample_molecules.csv | cut -d',' -f4 | sort | uniq -c
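
For comparison, the same exploration can be sketched in Python with the standard-library `csv` module. The data is repeated inline so the example is self-contained. Filtering uses exact equality on the `class` column, so `inactive` rows are never mistaken for `active` ones.

```python
import csv
import io
from collections import Counter

SAMPLE_CSV = """molecule_id,smiles,ic50,class
MOL001,CCO,500,active
MOL002,CCCO,1500,intermediate
MOL003,CCCCO,15000,inactive
MOL004,CC(C)O,800,active
MOL005,CCOCC,12000,inactive
"""

# DictReader consumes the header row and keys each row by column name
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

n_rows = len(rows)                                                   # like tail -n +2 | wc -l
id_ic50 = [(r["molecule_id"], int(r["ic50"])) for r in rows]         # like cut -f1,3
active_ids = [r["molecule_id"] for r in rows if r["class"] == "active"]
class_counts = Counter(r["class"] for r in rows)                     # like sort | uniq -c

print(n_rows)
print(active_ids)
print(dict(class_counts))
```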

## 4.3 Create SMILES File (for PaDEL)

In drug discovery, we often need to create `.smi` files for fingerprint calculation.

In [None]:
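%%bash
# %%bash runs the whole cell in bash, so awk's $1 and $2 are not touched by IPython
# SMILES first, then ID, tab-separated, no header (cut cannot reorder columns, so use awk)
awk -F',' 'NR > 1 {print $2 "\t" $1}' sample_molecules.csv > sample.smi
cat sample.smi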
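
A Python version makes the column order explicit. The sketch below assumes the downstream tool expects the SMILES string in the first column followed by the molecule name, which is the usual `.smi` layout. `write_smi` is a hypothetical helper and the two rows are inline sample data.

```python
import csv
import tempfile
from pathlib import Path


def write_smi(rows, path):
    """Write one 'SMILES<TAB>ID' line per molecule, with no header row."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for row in rows:
            writer.writerow([row["smiles"], row["molecule_id"]])


molecules = [
    {"molecule_id": "MOL001", "smiles": "CCO"},
    {"molecule_id": "MOL004", "smiles": "CC(C)O"},
]

with tempfile.TemporaryDirectory() as tmp:
    smi_path = Path(tmp) / "sample.smi"
    write_smi(molecules, smi_path)
    smi_text = smi_path.read_text(encoding="utf-8")

print(smi_text, end="")
```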

---
# 5. Downloading Files (`wget`, `curl`)

Getting data from the internet.

In [None]:
# Download a file using wget
# -q = quiet
# -O = output filename
!wget -q -O test_download.txt https://raw.githubusercontent.com/datasets/covid-19/main/README.md
!head -5 test_download.txt
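
If `wget` is not available, a download can also be sketched with Python's standard-library `urllib`. `download` is a hypothetical helper. To keep the demo runnable offline it fetches a local `file://` URL, but an `https://` URL is passed the same way.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download(url, dest, timeout=30):
    """Stream the contents of `url` into the file `dest` and return its path."""
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return Path(dest)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_text("example content\n", encoding="utf-8")

    # A local file:// URL stands in for a real web address in this demo
    copy = download(source.as_uri(), Path(tmp) / "test_download.txt")
    downloaded_text = copy.read_text(encoding="utf-8")

print(downloaded_text, end="")
```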

In [None]:
# Clean up
!rm -f test_download.txt sample_molecules.csv sample.smi

---
# 6. Running External Tools

In drug discovery, we often run external tools like PaDEL for fingerprint calculation.

## 6.1 Running Shell Scripts

In [None]:
# Create a simple shell script
!echo '#!/bin/bash' > hello.sh
!echo 'echo "Hello from shell script!"' >> hello.sh
!echo 'echo "Current date: $(date)"' >> hello.sh
!cat hello.sh

In [None]:
# Make it executable and run it directly
# (chmod +x is what allows ./hello.sh; `bash hello.sh` would work even without it)
!chmod +x hello.sh
!./hello.sh
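
When a tool's output needs to be captured or its exit status checked, `subprocess.run` gives more control than `!`. A minimal sketch: `run_command` is a hypothetical helper, and the child process here is just the Python interpreter so the example runs anywhere. An external tool would be launched the same way with its own argument list.

```python
import subprocess
import sys


def run_command(args):
    """Run a command, raise CalledProcessError on a non-zero exit, return its stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# Arguments are passed as a list, so no shell quoting is needed
output = run_command([sys.executable, "-c", "print('Hello from a child process!')"])
print(output, end="")
```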

In [None]:
# Clean up
!rm hello.sh

## 6.2 Checking Java (Required for PaDEL)

In [None]:
# PaDEL requires Java - check if it's installed
# java -version prints to stderr, so 2>&1 sends it through the pipe to head
!java -version 2>&1 | head -1
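
A notebook can also check for an external tool before trying to run it. The sketch below uses `shutil.which`, which returns the executable's full path or `None` if it is not on `PATH`. `find_tool` is a hypothetical wrapper name.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


java_path = find_tool("java")
if java_path is None:
    print("Java not found on PATH; install it before running Java-based tools.")
else:
    print("Java found at:", java_path)
```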

---
# 7. Quick Reference Card

| Command | Purpose | Example |
|---------|---------|---------|
| `pwd` | Print working directory | `!pwd` |
| `ls` | List files | `!ls -lah` |
| `cat` | View file contents | `!cat file.txt` |
| `head` | First N lines | `!head -10 file.csv` |
| `tail` | Last N lines | `!tail -10 file.csv` |
| `wc -l` | Count lines | `!wc -l file.csv` |
| `cp` | Copy file | `!cp src.txt dst.txt` |
| `mv` | Move/rename | `!mv old.txt new.txt` |
| `rm` | Remove file | `!rm file.txt` |
| `grep` | Search pattern (`-w` = whole word) | `!grep -w "active" data.csv` |
| `cut` | Extract columns | `!cut -d',' -f1,2 data.csv` |
| `pip install` | Install package | `!pip install pandas` |
| `wget` | Download file | `!wget URL` |

---

**Next Step:** Now that you're comfortable with shell commands, proceed to the **Python Libraries Primer** notebook!