# Learning Supercomputer

In this tutorial, I take notes on how to get started with a supercomputer. The one I am using right now is at UTC.

### LogIn:

- Using ssh. The command line: `ssh user_name@epyc.simcenter.utc.edu`. Then enter the password. These are the HPC-specific ID and password.

Or, I can log in through OnDemand:

- https://utc-ondemand.research.utc.edu

# Check the accessible resources you have

In a supercomputer, there are multiple __Nodes__. A node is essentially a complete computer unit within the supercomputer. It has its own processors, RAM, storage, and network interface. Some nodes are CPU-based, some are GPU-based. CPU-focused nodes can have non-dedicated GPUs. GPU-based nodes also have CPUs.


## JobID and UserName:

`squeue -u $USER`

```
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
152370   general sys/dash   -----  R    1:32:02      1 epyc15
```

So, 152370 is the ID of a job that I am running, and the user name is `-----`.
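To read the same table from a script, a minimal sketch like the one below could split the default `squeue` layout into one dictionary per job. The function names `parse_squeue` and `my_jobs` are made up for these notes, and the parser assumes job names contain no spaces.

```python
import subprocess

# Sample text copied from the squeue output above (user name masked as in the notes).
SAMPLE = """\
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
152370   general sys/dash   -----  R    1:32:02      1 epyc15
"""


def parse_squeue(text):
    """Turn squeue's default table into a list of dicts keyed by column name."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    jobs = []
    for line in lines[1:]:
        # Split into at most len(header) fields so a multi-word reason stays whole.
        fields = line.split(None, len(header) - 1)
        jobs.append(dict(zip(header, fields)))
    return jobs


def my_jobs(user):
    """Run squeue for one user; only works on a machine where Slurm is installed."""
    result = subprocess.run(
        ["squeue", "-u", user], capture_output=True, text=True, check=True
    )
    return parse_squeue(result.stdout)


if __name__ == "__main__":
    for job in parse_squeue(SAMPLE):
        print(job["JOBID"], job["PARTITION"], job["ST"], job["NODELIST(REASON)"])
```

On the cluster, `my_jobs(os.environ["USER"])` would be the scripted counterpart of `squeue -u $USER`.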


## Node

To list the partitions in the cluster and the nodes in each one, use the command:

`sinfo`

Then check which host (node) you're currently on:

`hostname`

And to see which partition and node your job is using:

`squeue -u $USER`

```
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
152370   general sys/dash   -----  R    1:32:02      1 epyc15
```

So, `epyc15` is the exact node that I am using.
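The same facts can usually be read from inside a running job. The sketch below assumes the session was started by Slurm, which sets environment variables such as `SLURM_JOB_ID`, `SLURM_JOB_PARTITION`, and `SLURM_JOB_NODELIST`. The helper name `where_am_i` is made up for these notes, and on a login node the Slurm values will simply be `None`.

```python
import os
import socket


def where_am_i(env=None):
    """Collect the host name plus Slurm's job variables, if they are set."""
    env = os.environ if env is None else env
    return {
        "hostname": socket.gethostname(),           # same answer as the `hostname` command
        "job_id": env.get("SLURM_JOB_ID"),          # None outside a Slurm job
        "partition": env.get("SLURM_JOB_PARTITION"),
        "nodelist": env.get("SLURM_JOB_NODELIST"),
    }


if __name__ == "__main__":
    for key, value in where_am_i().items():
        print(f"{key}: {value}")
```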


## Core

To see total cores available in that node:

`lscpu`

Or, specifically: 

`lscpu | grep "CPU"`
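Since `lscpu` prints `Key: value` lines, a short sketch can load them into a dictionary and filter them much like the `grep` above. The sample text below is hypothetical (the numbers are made up, not taken from the cluster), and `parse_lscpu` and `cpu_lines` are names invented for these notes.

```python
import subprocess

# Hypothetical excerpt in lscpu's "Key: value" layout; the numbers are made up.
SAMPLE = """\
CPU(s):              64
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           2
Model name:          Example CPU
"""


def parse_lscpu(text):
    """Map each lscpu key to its value, both as stripped strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def cpu_lines(info):
    """Rough equivalent of `grep "CPU"`: keep entries that mention CPU."""
    return {k: v for k, v in info.items() if "CPU" in k or "CPU" in v}


def run_lscpu():
    """Call the real lscpu; only works on Linux."""
    result = subprocess.run(["lscpu"], capture_output=True, text=True, check=True)
    return parse_lscpu(result.stdout)


if __name__ == "__main__":
    for key, value in cpu_lines(parse_lscpu(SAMPLE)).items():
        print(f"{key}: {value}")
```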





#### CPU focused node

```
Server/Node
└── Socket (Physical CPU Package)
    └── Cores (Physical Processing Units)
        └── Threads (Logical Processing Units)
```

In regular PCs/laptops, there is only one socket, where the processor is placed or soldered. In HPC, a node can have more than one socket. A single socket holds many cores. A single core can run one thread or multiple threads. You can visualize core vs thread like this:

__Core__
```
Core (Kitchen)
├── Can process instructions (Cook meals)
├── Has its own resources:
│   ├── Registers (Cooking tools)
│   ├── ALU (Stove)
│   └── Cache (Ingredients shelf)
└── Works independently of other cores
```

__Single Thread__
```
Core (Kitchen)
└── Thread 1 (One Chef)
    └── Uses 100% of core resources
```

__Multi Thread__
```
Core (Kitchen)
├── Thread 1 (Chef 1)
│   └── Uses core resources when Thread 2 is waiting
└── Thread 2 (Chef 2)
    └── Uses core resources when Thread 1 is waiting
```
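The socket, core, and thread hierarchy gives a simple count: logical CPUs = sockets × cores per socket × threads per core. The sketch below works that out for an assumed layout of 1 socket, 6 cores, and 2 threads per core (the single socket is an assumption), and also asks the operating system how many CPUs the current process may use. The function names are made up for these notes.

```python
import os


def logical_cpus(sockets, cores_per_socket, threads_per_core):
    """Logical CPUs = sockets x cores per socket x threads per core."""
    return sockets * cores_per_socket * threads_per_core


def usable_cpus():
    """CPUs this process may run on; falls back to the OS-wide count."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux only
        return len(os.sched_getaffinity(0))
    return os.cpu_count()


if __name__ == "__main__":
    # Assumed layout: 1 socket, 6 cores, 2 threads per core.
    print(logical_cpus(1, 6, 2))
    print(usable_cpus())
```

Inside a scheduled job, `usable_cpus()` may be smaller than the node's total when the scheduler restricts the job to the cores it was allocated.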


Also, let's learn about CPU cache:

__Cache__
```
L1 Cache = Chef's Immediate Workspace
├── Smallest (32KB in your CPU)
├── L1i: instruction cache, L1d: Data Cache
├── Fastest access
└── Like ingredients right on chef's cutting board

L2 Cache = Kitchen Counter
├── Larger (1MB in your CPU)
├── Slightly slower
└── Like ingredients kept within arm's reach

L3 Cache = Kitchen Storage Room
├── Largest (38.5MB in your CPU)
├── Slower than L1/L2, but faster than RAM
└── Like the pantry/refrigerator shared by all chefs

Data Access Flow:
CPU Register → L1 → L2 → L3 → RAM → Storage
(Fastest) ---------------→ (Slowest)

```
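On Linux, the cache levels of a core can also be read from sysfs. The sketch below assumes the usual layout under `/sys/devices/system/cpu/cpu0/cache/index*/` with `level`, `type`, and `size` files. The helper names are made up for these notes, and the function returns an empty list where that directory does not exist.

```python
from pathlib import Path

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def size_to_bytes(text):
    """Convert a sysfs size string such as '32K' to bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return int(text[:-1]) * UNITS[text[-1].upper()]
    return int(text)


def read_caches(cpu_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Return (name, size_in_bytes) for each cache entry found under cpu_dir."""
    caches = []
    for index in sorted(Path(cpu_dir).glob("index*")):
        try:
            level = (index / "level").read_text().strip()
            kind = (index / "type").read_text().strip()
            size = size_to_bytes((index / "size").read_text())
        except (OSError, ValueError):
            continue  # skip entries that are missing or unreadable
        caches.append((f"L{level} {kind}", size))
    return caches


if __name__ == "__main__":
    for name, size in read_caches():
        print(f"{name}: {size / 1024:.0f} KiB")
```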



For detailed CPU information:

- `lscpu`



## RAM

To see the overall RAM summary of the node you're on (`-h` means human-readable format):

`free -h`

From that, you'll know the `free` memory size. Then check the memory settings of the job:

`scontrol show job __job#___ | grep -i 'mem'`

If the output says `MinMemoryNode=0`, then I can use all usable free memory.
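The `scontrol show job` output is a series of `key=value` tokens, so the memory fields can be picked out in a script as well. This is a minimal sketch: the sample line is illustrative rather than copied from the cluster, the function names are made up for these notes, and values that contain spaces are not handled.

```python
import subprocess

# Illustrative line in scontrol's key=value layout; the values are hypothetical.
SAMPLE = "   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0"


def parse_fields(text):
    """Split whitespace-separated key=value tokens into a dict."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def job_memory_fields(job_id):
    """Rough equivalent of `scontrol show job <id> | grep -i mem`; needs Slurm."""
    result = subprocess.run(
        ["scontrol", "show", "job", str(job_id)],
        capture_output=True, text=True, check=True,
    )
    fields = parse_fields(result.stdout)
    return {k: v for k, v in fields.items() if "mem" in k.lower()}


if __name__ == "__main__":
    print(parse_fields(SAMPLE)["MinMemoryNode"])
```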



Other useful commands:

- `lscpu`: detailed CPU information
- `nproc`: number of processing units
- `cat /proc/cpuinfo`: raw CPU info
- `top`: real-time CPU usage and processes
- `htop`: interactive process viewer (if installed)

# GPT code to get a summary:

```python
import platform

import psutil  # third-party package: pip install psutil

# CPU Information
print("=== CPU INFO ===")
print(f"Physical cores: {psutil.cpu_count(logical=False)}")
print(f"Total cores (including logical): {psutil.cpu_count(logical=True)}")
freq = psutil.cpu_freq()  # may be None on systems that do not expose the frequency
if freq is not None:
    print(f"CPU Frequency: {freq.current:.2f} MHz")
# interval=1 samples for one second; without it the first call returns meaningless values
print(f"CPU Usage per core: {psutil.cpu_percent(interval=1, percpu=True)}%")

# Memory Information
print("\n=== MEMORY INFO ===")
memory = psutil.virtual_memory()
print(f"Total RAM: {memory.total / (1024**3):.2f} GB")
print(f"Available RAM: {memory.available / (1024**3):.2f} GB")
print(f"RAM Usage: {memory.percent}%")

# Disk Information
print("\n=== DISK INFO ===")
partitions = psutil.disk_partitions()
for partition in partitions:
    try:
        partition_usage = psutil.disk_usage(partition.mountpoint)
    except OSError:
        # Skip mount points that cannot be read (for example, an empty drive)
        continue
    print(f"\nDevice: {partition.device}")
    print(f"Mountpoint: {partition.mountpoint}")
    print(f"File system type: {partition.fstype}")
    print(f"Total Size: {partition_usage.total / (1024**3):.2f} GB")
    print(f"Used: {partition_usage.used / (1024**3):.2f} GB")
    print(f"Free: {partition_usage.free / (1024**3):.2f} GB")
    print(f"Usage: {partition_usage.percent}%")

# System Information
print("\n=== SYSTEM INFO ===")
print(f"Operating System: {platform.system()}")
print(f"OS Version: {platform.version()}")
print(f"Machine: {platform.machine()}")
print(f"Processor: {platform.processor()}")
```

Sample output, from a run on a Windows machine:

```
=== CPU INFO ===
Physical cores: 6
Total cores (including logical): 12
CPU Frequency: 1105.00 MHz
CPU Usage per core: [15.6, 8.5, 15.8, 10.3, 13.4, 10.7, 13.8, 9.3, 12.7, 8.7, 13.0, 9.3]%

=== MEMORY INFO ===
Total RAM: 15.76 GB
Available RAM: 4.23 GB
RAM Usage: 73.2%

=== DISK INFO ===

Device: C:\
Mountpoint: C:\
File system type: NTFS
Total Size: 199.37 GB
Used: 143.65 GB
Free: 55.72 GB
Usage: 72.1%

=== SYSTEM INFO ===
Operating System: Windows
OS Version: 10.0.26100
Machine: AMD64
Processor: Intel64 Family 6 Model 166 Stepping 0, GenuineIntel
```
