## Lab 1 - Unix shell

**Exercise 1 - Searching for users with highest number of running processes**

The shell command `ps -ef` displays information about every running process (the `--no-headers` option eliminates the first line or *header*):  

In [1]:
%%bash
n=$(ps -ef --no-headers | wc -l)
echo "there are $n processes:"
ps -ef | head -5
echo "..."
ps -ef | tail -5

there are 2350 processes:
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 Mar27 ?        00:03:19 /sbin/init
root           2       0  0 Mar27 ?        00:00:01 [kthreadd]
root           3       2  0 Mar27 ?        00:00:00 [rcu_gp]
root           4       2  0 Mar27 ?        00:00:00 [rcu_par_gp]
...
root     3856561       2  0 Apr07 ?        00:00:08 [kworker/192:2-mm_percpu_wq]
root     3999334       2  0 Apr07 ?        00:00:00 [kworker/28:2]
root     4190481       2  0 Apr08 ?        00:00:08 [kworker/174:2-events]
root     4190492       2  0 Apr08 ?        00:00:06 [kworker/227:1-rcu_gp]
root     4190497       2  0 Apr08 ?        00:00:00 [kworker/191:0]


The first column, `UID`, contains the username. Using only the `cut`, `sort` and `uniq` commands, list the users with the highest number of running processes, sorted in descending order. The result should look something like:
```
   2279 root
     18 ana_mat+
     11 jupyter+
      8 aandres
      7 shima_m+
      7 mahdi_m+
      4 systemd+
      2 kernoops
      2 avahi
      1 syslog
      1 rtkit
      1 message+
```
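
As a warm-up, the counting idiom behind this exercise can be tried on a toy input before touching the real `ps` output. The sketch below is only one possible arrangement: the helper name `count_first_field` is hypothetical, three of the input lines are copied from the listing above, and the `alice` line is made up for illustration.

```shell
# Hypothetical helper: count how often each value of the first
# space-separated field occurs, most frequent first.
count_first_field() {
  cut -d ' ' -f 1 |   # keep only the first column
    sort |            # put identical values on adjacent lines (uniq needs this)
    uniq -c |         # collapse each run into "count value"
    sort -rn          # order by count, numerically, descending
}

# Toy input: three lines from the listing above and one made-up user.
sample='root           1       0  0 Mar27 ?        00:03:19 /sbin/init
root           2       0  0 Mar27 ?        00:00:01 [kthreadd]
alice       4242       1  0 Apr08 ?        00:00:00 bash
root           3       2  0 Mar27 ?        00:00:00 [rcu_gp]'

printf '%s\n' "$sample" | count_first_field
```

`uniq` only merges adjacent duplicates, which is why the first `sort` comes before it; the second `sort` then ranks the counts.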

In the previous output, long usernames are truncated (those ending with a `+`). Try the command `ps -eo user --no-headers` instead, which prints only the username and does not truncate it. The final result could look like:
```
   2279 root
     18 ana_mathmode
     11 jupyter-mpenagaricano
      8 aandres
      7 shima_mathmode
      7 mahdi_mathmode
      2 kernoops
      2 avahi
      1 systemd-timesync
      1 systemd-resolve
      1 systemd-oom
      1 systemd-network
      1 syslog
      1 rtkit
      1 messagebus
```

**Exercise 2 - List of open ports**

The shell command `netstat -tulpn` lists open ports (append `2> /dev/null` to discard anything written to `stderr`):

In [2]:
%%bash
netstat -tulpn 2> /dev/null | head -5
echo ...
netstat -tulpn 2> /dev/null | tail -5

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:15001         0.0.0.0:*               LISTEN      -                   
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8099          0.0.0.0:*               LISTEN      -                   
...
udp        0      0 127.0.0.53:53           0.0.0.0:*                           -                   
udp        0      0 0.0.0.0:5353            0.0.0.0:*                           -                   
udp        0      0 0.0.0.0:39761           0.0.0.0:*                           -                   
udp6       0      0 :::5353                 :::*                                -                   
udp6       0      0 :::38216                :::*                                -                   


The first column, `Proto`, shows the protocol (`tcp`, `tcp6`, `udp`, `udp6`). The fourth, `Local Address`, shows the IP address and port number (`address:port`) on the local machine where a network service is listening or a connection is established. The local address `127.0.0.1` refers to the loopback interface, a virtual network interface used for communication **within the same machine**. The local address `0.0.0.0` is a wildcard that makes the service listen on **all available network interfaces on the machine**, including loopback, local network, and external interfaces.
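
To make the distinction concrete, the sketch below filters three lines copied from the listing above with `grep`. The variable names are only illustrative. The dots are escaped because an unescaped `.` in a regular expression matches any character.

```shell
# Three lines copied from the netstat listing above.
sample='tcp        0      0 127.0.0.1:15001         0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:8099          0.0.0.0:*               LISTEN      -'

# Services reachable only from the same machine (loopback).
loopback_only=$(printf '%s\n' "$sample" | grep ' 127\.0\.0\.1:')

# Services listening on every interface (wildcard as the local address).
# The Foreign Address column also shows 0.0.0.0, but followed by ":*",
# so the pattern requires a digit after the colon.
all_interfaces=$(printf '%s\n' "$sample" | grep ' 0\.0\.0\.0:[0-9]')

printf 'loopback only:\n%s\n' "$loopback_only"
printf 'all interfaces:\n%s\n' "$all_interfaces"
```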

Using only `grep`, `tr`, `cut` and `sort` commands, try to get in a single line the sorted list of ports being used only internally (i.e. `Local Address = 127.0.0.1:port`):
```
631 3000 8099 15001 34193 34239 34833 35809 38791 39045 40863 41117 42391 43415 43775 45335 45567 45673 46083 46719 47077 48343 48363 49331 50011 52303 53959 54075 54533 56515 57143 57385 57497 58607 60237 60481
```
**Hint:** The command `tr -s " "` squeezes (compresses) multiple consecutive spaces into a single space:

In [3]:
!echo "Squeeze     the    white  spaces    in    this   text" | tr -s " "

Squeeze the white spaces in this text
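
Building on the hint, here is a minimal sketch of how squeezing makes the columns addressable with `cut`, using one line copied from the `netstat` listing above. The variable names are illustrative, and the four port numbers in the last step are taken from the expected output of this exercise.

```shell
line='tcp        0      0 127.0.0.1:15001         0.0.0.0:*               LISTEN      -'

# After squeezing, fields are separated by exactly one space,
# so cut can address the "Local Address" column as field 4.
local_address=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 4)

# The port is whatever follows the colon.
port=$(printf '%s\n' "$local_address" | cut -d ':' -f 2)

echo "$local_address"
echo "$port"

# tr can also replace newlines with spaces, which joins a sorted column
# into a single line (sort -n compares numerically, so 631 precedes 3000).
ports_line=$(printf '%s\n' 8099 15001 631 3000 | sort -n | tr '\n' ' ')
echo "$ports_line"
```

Without `-n`, `sort` would compare the ports as text and place `15001` before `631`.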


**Exercise 3 - Processing the Iris flower data set**

The **Iris Flower Data Set** (https://en.wikipedia.org/wiki/Iris_flower_data_set) is a well-known dataset used in statistics, machine learning, and data science. The dataset contains measurements of physical characteristics of three species of iris flowers:
* Species: Iris setosa, versicolor, and virginica.
* Features: Four numerical measurements (in centimeters) for each flower:
   * Sepal length
   * Sepal width
   * Petal length
   * Petal width
* Size: 150 samples (50 samples per species).

You can see a CSV (Comma-Separated Values) version of the dataset at https://raw.githubusercontent.com/pandas-dev/pandas/refs/heads/main/pandas/tests/io/data/csv/iris.csv :

In [4]:
%%bash
URL="https://raw.githubusercontent.com/pandas-dev/pandas/refs/heads/main/pandas/tests/io/data/csv/iris.csv"
curl -s "$URL" | head -5
echo ...
curl -s "$URL" | tail -5

SepalLength,SepalWidth,PetalLength,PetalWidth,Name
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
...
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica

Without saving the `iris.csv` file locally, extract all the unique species names (i.e. the unique values of the `Name` field):
```
Iris-setosa
Iris-versicolor
Iris-virginica
```
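
One possible pipeline shape is sketched below on an offline stand-in, so that it runs without network access. The function `iris_sample` is hypothetical and prints the header plus four rows copied from the listing above; in the lab, its place would be taken by the `curl -s` command. The helper name `unique_names` is also an assumption.

```shell
# Hypothetical stand-in for the curl command: header plus four rows
# copied from the listing above.
iris_sample() {
  cat <<'EOF'
SepalLength,SepalWidth,PetalLength,PetalWidth,Name
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
EOF
}

# Hypothetical helper: read CSV on stdin, print the distinct Name values.
unique_names() {
  tail -n +2 |          # skip the header line
    cut -d ',' -f 5 |   # keep only the fifth field, Name
    sort -u             # sort and drop duplicate lines
}

iris_sample | unique_names
```

Because everything flows through pipes, nothing is written to disk. The sample contains only two species, so only two names are printed here.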

Without saving the `iris.csv` file locally, create one CSV file per species (without the `Name` column). The output of

`head -5 *.csv && echo -e "\n-------\n" && wc -l *.csv`

should display:

```
==> Iris-setosa.csv <==
SepalLength,SepalWidth,PetalLength,PetalWidth
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,1.3,0.2
4.6,3.1,1.5,0.2

==> Iris-versicolor.csv <==
SepalLength,SepalWidth,PetalLength,PetalWidth
7.0,3.2,4.7,1.4
6.4,3.2,4.5,1.5
6.9,3.1,4.9,1.5
5.5,2.3,4.0,1.3

==> Iris-virginica.csv <==
SepalLength,SepalWidth,PetalLength,PetalWidth
6.3,3.3,6.0,2.5
5.8,2.7,5.1,1.9
7.1,3.0,5.9,2.1
6.3,2.9,5.6,1.8

-------

  51 Iris-setosa.csv
  51 Iris-versicolor.csv
  51 Iris-virginica.csv
 153 total
```
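
A minimal sketch of one way to approach the split, again on an offline stand-in rather than the real download. The function `iris_sample` and the temporary directory `outdir` are assumptions made for illustration; in the lab, each call to `iris_sample` would be the `curl -s` command and the files would go in the working directory.

```shell
# Hypothetical stand-in for the curl command: header plus four rows
# copied from the listing earlier in this exercise.
iris_sample() {
  cat <<'EOF'
SepalLength,SepalWidth,PetalLength,PetalWidth,Name
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
EOF
}

# Write into a scratch directory so the sketch leaves no files behind
# in the working directory.
outdir=$(mktemp -d)

# Loop over the distinct species names found in the fifth field.
for species in $(iris_sample | tail -n +2 | cut -d ',' -f 5 | sort -u); do
  {
    # Header without the Name column.
    iris_sample | head -n 1 | cut -d ',' -f 1-4
    # Rows of this species (name anchored at end of line), first four fields.
    iris_sample | grep ",$species\$" | cut -d ',' -f 1-4
  } > "$outdir/$species.csv"
done

wc -l "$outdir"/*.csv
```

This version reads the stream once per species plus once for the header, which keeps the pipeline simple at the cost of repeated downloads when the source is a URL.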