## Extracting data using `cut` command

We can extract a specific column (field) from a delimited text file by specifying:

-   the delimiter, using the `-d` option, and
-   the field number(s), using the `-f` option.
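A minimal sketch of these options, using a hypothetical two-line sample in the same colon-delimited layout as `data/passwd` (the accounts `alice` and `bob` are made up). It shows a single field, a list of fields, an open-ended range (`-f6-`, meaning field 6 through the last field), and the fact that an empty field between two adjacent delimiters still counts as a field:

```shell
# Hypothetical sample in the colon-delimited /etc/passwd layout
sample='alice:x:1001:1001:Alice:/home/alice:/bin/bash
bob:x:1002:1002::/home/bob:/bin/sh'

# A single field: the login name
printf '%s\n' "$sample" | cut -d':' -f1

# A list of fields: login name and shell, still joined by ':'
printf '%s\n' "$sample" | cut -d':' -f1,7

# An open-ended range: field 6 through the last field
printf '%s\n' "$sample" | cut -d':' -f6-

# bob's comment field (field 5) is empty, so cut prints an empty line for it
printf '%s\n' "$sample" | cut -d':' -f5
```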

```shell
head data/passwd
```

```
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
```


```shell
cut -d":" -f1 data/passwd
```

```
root
daemon
bin
sys
sync
games
man
lp
mail
news
uucp
proxy
www-data
backup
list
irc
gnats
nobody
_apt
messagebus
theia
mongodb
ntp
cassandra
postgres
```


```shell
cut -d":" -f1,3,6 data/passwd
```

```
root:0:/root
daemon:1:/usr/sbin
bin:2:/bin
sys:3:/dev
sync:4:/bin
games:5:/usr/games
man:6:/var/cache/man
lp:7:/var/spool/lpd
mail:8:/var/mail
news:9:/var/spool/news
uucp:10:/var/spool/uucp
proxy:13:/bin
www-data:33:/var/www
backup:34:/var/backups
list:38:/var/list
irc:39:/var/run/ircd
gnats:41:/var/lib/gnats
nobody:65534:/nonexistent
_apt:100:/nonexistent
messagebus:101:/nonexistent
theia:1000:/home/theia
mongodb:102:/var/lib/mongodb
ntp:103:/nonexistent
cassandra:104:/var/lib/cassandra
postgres:105:/var/lib/postgresql
```


```shell
cut -d":" -f3-6 data/passwd
```

```
0:0:root:/root
1:1:daemon:/usr/sbin
2:2:bin:/bin
3:3:sys:/dev
4:65534:sync:/bin
5:60:games:/usr/games
6:12:man:/var/cache/man
7:7:lp:/var/spool/lpd
8:8:mail:/var/mail
9:9:news:/var/spool/news
10:10:uucp:/var/spool/uucp
13:13:proxy:/bin
33:33:www-data:/var/www
34:34:backup:/var/backups
38:38:Mailing List Manager:/var/list
39:39:ircd:/var/run/ircd
41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats
65534:65534:nobody:/nonexistent
100:65534::/nonexistent
101:102::/nonexistent
1000:1000:,,,:/home/theia
102:106::/var/lib/mongodb
103:107::/nonexistent
104:108:Cassandra database,,,:/var/lib/cassandra
105:109:PostgreSQL administrator,,,:/var/lib/postgresql
```


## Transforming data using `tr`

`tr` is a filter command used to translate, squeeze, and/or delete characters.

### Translate from one character set to another

The command below translates all lowercase letters to uppercase:

```shell
echo "Shell Scripting" | tr "[a-z]" "[A-Z]"
```

```
SHELL SCRIPTING
```
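`tr` pairs the two sets position by position: the first character of the first set becomes the first character of the second set, and so on. A small illustration of that mapping (not part of this exercise) is ROT13, which shifts every letter 13 places simply by listing the alphabet in rotated order in the second set:

```shell
# Each letter in SET1 maps to the letter at the same position in SET2:
# A-M become N-Z and N-Z wrap around to A-M (likewise for lowercase)
echo "Shell Scripting" | tr 'A-Za-z' 'N-ZA-Mn-za-m'
# prints: Furyy Fpevcgvat

# Applying the same mapping twice restores the original text
echo "Shell Scripting" | tr 'A-Za-z' 'N-ZA-Mn-za-m' | tr 'A-Za-z' 'N-ZA-Mn-za-m'
# prints: Shell Scripting
```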


You can also use the predefined character classes for this purpose:

```shell
echo "Shell Scripting" | tr "[:lower:]" "[:upper:]"
```

```
SHELL SCRIPTING
```


### Squeeze repeating occurrences of characters

The `-s` option replaces each sequence of a repeated character with a single occurrence of that character.

The command below squeezes each run of spaces in the output of the `ps` command into a single space:

```shell
ps
```

```
  PID TTY           TIME CMD
```


```shell
ps | tr -s " "
```

```
 PID TTY TIME CMD
56445 ttys000 0:00.45 /bin/zsh -c ps | tr -s " "
56447 ttys000 0:00.00 tr -s
```
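Squeezing is especially useful right before `cut`, because `cut` treats every single delimiter as a field boundary: with several spaces in a row, the "fields" between them are empty strings. A sketch using a made-up line padded with spaces in the style of `ps` output:

```shell
# Hypothetical ps-style line with a run of four spaces between columns
line='56445 ttys000    0:00.45 zsh'

# Without squeezing, field 3 is an empty string between two adjacent spaces
echo "$line" | cut -d' ' -f3

# After tr -s collapses each run of spaces to one, field 3 is the TIME column
echo "$line" | tr -s ' ' | cut -d' ' -f3
# prints: 0:00.45
```

Note that a leading run of spaces, as in the `ps` header line above, is squeezed to one space rather than removed, so field 1 of such a line is empty.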


### Delete characters

We can delete specified characters using the `-d` option.

The command below deletes all digits:

```shell
echo "My login pin is 5634" | tr -d "[:digit:]"
```

```
My login pin is 
```
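A common use of `-d` when preparing data files is stripping the carriage return (`\r`) that Windows-style line endings leave at the end of every line, where it would otherwise end up inside the last field. A minimal sketch with made-up input (the helper name `crlf_rows` is hypothetical):

```shell
# Hypothetical input: two CSV rows with Windows-style CRLF line endings
crlf_rows() {
  printf 'root,0,/root\r\ndaemon,1,/usr/sbin\r\n'
}

# Delete every carriage return; only the LF line endings remain
crlf_rows | tr -d '\r'

# od -c shows each byte, so the \r characters are visible before, gone after
crlf_rows | od -c
crlf_rows | tr -d '\r' | od -c
```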


## Loading data into PostgreSQL database

In this exercise, we will create a table called **users** in the PostgreSQL database. This table will hold the user account information.

The table **users** will have the following columns (matching the `create table` statement used below):

1.  username
2.  userid
3.  homedirectory

### Connect to the database server

Run the command below to log in to the PostgreSQL server:

```shell
psql --host=database-1.cy8ltogyfgas.us-east-1.rds.amazonaws.com --port=5432 --username=postgres --password
```

You will get the psql prompt: 'postgres=#'

### Connect to a database

We will use a database called **postgres**, which exists by default.

To connect to this database, run the following command at the 'postgres=#' prompt.

```
\c postgres
```

You will get the following message:

`You are now connected to database "postgres" as user "postgres".`

### Create the table

Run the following statement at the 'postgres=#' prompt:

```sql
create table users(username varchar(50), userid int, homedirectory varchar(100));
```

If the table is created successfully, you will get the message below.

`CREATE TABLE`

In the rest of this exercise, we will carry out the following steps, which can also be collected into a shell script:

-   Extract the user name, user id, and home directory path of each user account defined in the `data/passwd` file.
-   Save the data in comma-separated values (CSV) format.
-   Load the data from the CSV file into the `users` table in the PostgreSQL database.
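A minimal sketch of these steps as reusable shell functions. The names `extract_users` and `load_users` are hypothetical, and the psql connection settings are assumed to be supplied by the caller, for example through the standard `PGHOST`, `PGUSER`, `PGPASSWORD` and `PGDATABASE` environment variables that psql reads:

```shell
# Extract user name (field 1), user id (field 3) and home directory (field 6)
# from a passwd-style file, then turn the ':' separators into ','.
extract_users() {
  cut -d':' -f1,3,6 "$1" | tr ':' ','
}

# Load a CSV file into the users table with psql's client-side \copy.
# printf is used instead of echo so the backslash reaches psql unchanged;
# ON_ERROR_STOP=1 makes psql exit with a non-zero status on the first error.
load_users() {
  printf '%s\n' "\\copy users FROM '$1' DELIMITER ',' CSV" | psql -v ON_ERROR_STOP=1
}
```

With those functions defined, the whole pipeline is `extract_users data/passwd > data/transformed-data.csv` followed by `load_users data/transformed-data.csv`.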

### Extract required user information from `data/passwd`

In this step, we will extract the user name (field 1), user id (field 3), and home directory path (field 6) using the `cut` command.

```shell
cut -d":" -f1,3,6 data/passwd > data/extracted-data.txt
```

```shell
head -5 data/extracted-data.txt
```

```
root:0:/root
daemon:1:/usr/sbin
bin:2:/bin
sys:3:/dev
sync:4:/bin
```
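One `cut` behaviour worth knowing when it runs over a real file: a line that contains no delimiter at all (for example a blank line or a comment, if the input might have any) is passed through unchanged, whatever fields you ask for. The `-s` option suppresses such lines. A sketch with made-up input (the helper name `passwd_like` is hypothetical):

```shell
# Hypothetical passwd-style input with a stray comment line and a blank line
passwd_like() {
  printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' '# local accounts' '' \
    'theia:x:1000:1000:,,,:/home/theia:/bin/bash'
}

# Without -s, the comment and the blank line come through untouched
passwd_like | cut -d':' -f1,3,6

# With -s, only lines that actually contain ':' are printed
passwd_like | cut -s -d':' -f1,3,6
```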


### Transform the data into CSV format

The extracted columns are still separated by the original `:` delimiter.

We need to convert this into a `,`-delimited (CSV) file, which `tr` can do by translating every `:` into `,`.

```shell
tr ":" "," < data/extracted-data.txt > data/transformed-data.csv
```

```shell
head -5 data/transformed-data.csv
```

```
root,0,/root
daemon,1,/usr/sbin
bin,2,/bin
sys,3,/dev
sync,4,/bin
```
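Replacing `:` with `,` is only safe because none of the three extracted fields contains a comma of its own. The comment field (field 5) often does, for example `,,,` for `theia` and `Cassandra database,,,` for `cassandra` in the `-f3-6` output earlier, and extracting it would add extra CSV columns. A hedged sketch of a pre-check that counts lines already containing a comma before the conversion (the helper name `extracted` and its sample lines are made up):

```shell
# Hypothetical extracted lines; the second one includes a comment field with commas
extracted() {
  printf '%s\n' 'root:0:/root' 'theia:1000:,,,:/home/theia'
}

# grep -c prints the number of matching lines; a non-zero count means
# tr ':' ',' would produce rows with extra columns
commas=$(extracted | grep -c ',')
if [ "$commas" -ne 0 ]; then
  echo "warning: $commas line(s) already contain a comma" >&2
fi
```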


### Load the data into the table ‘users’ in PostgreSQL

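To load data from a shell script, we will use the `psql` client utility non-interactively.

This is done by sending the database commands to `psql` through a command pipeline. The commands are written with `printf '%s\n'`, which prints each argument on its own line without interpreting backslashes. `echo` is avoided here because in some shells (for example `dash`, the `/bin/sh` on Debian and Ubuntu) it treats sequences such as `\c` as escape codes, which would corrupt psql's backslash commands. If the server requires a password, supply it non-interactively, for example through the `PGPASSWORD` environment variable or a `~/.pgpass` file.

```shell
printf '%s\n' '\c postgres' "\copy users FROM '/content/data/transformed-data.csv' DELIMITER ',' CSV" | psql --host=database-1.cy8ltogyfgas.us-east-1.rds.amazonaws.com --username=postgres
```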
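A quoted here-document is another common way to feed several commands to `psql` non-interactively: with a quoted delimiter (`'EOF'`), the shell passes the body through unchanged, so backslash commands such as `\copy` need no extra escaping. A minimal sketch, where the helper name `load_commands` is hypothetical and the file path is the one used in this exercise:

```shell
# Print the psql input for the load step; the quoted 'EOF' delimiter stops the
# shell from expanding or altering anything inside the body
load_commands() {
  cat <<'EOF'
\connect postgres
\copy users FROM '/content/data/transformed-data.csv' DELIMITER ',' CSV
SELECT count(*) FROM users;
EOF
}

# Inspect exactly what psql would receive
load_commands
```

To run it, pipe the output into psql, for example `load_commands | psql --host=database-1.cy8ltogyfgas.us-east-1.rds.amazonaws.com --username=postgres`.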

Run the command below to verify that the `users` table is populated with the data:

```shell
printf '%s\n' '\c postgres' 'SELECT * FROM users;' | psql --host=database-1.cy8ltogyfgas.us-east-1.rds.amazonaws.com --username=postgres
```