# 1. Using `cut` command


## A. Applying cut to text


<br>**IN**:
<br>`echo "database" | cut -c1-4`
<br>**OUT**: 
<br>data
<br>
<br>**IN**:
<br>`echo "database" | cut -c5-8`
<br>**OUT**: 
<br>base
<br>
<br>**IN**:
<br>`echo "database" | cut -c1,5`
<br>**OUT**: 
<br>db
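
The ranges above always name both ends. As a minimal sketch, `cut -c` also accepts open-ended ranges, where a missing start or end means "from the beginning" or "to the end of the line". The variable names are only illustrative.

```shell
# Open-ended character ranges with cut -c.
word="database"

# From character 5 to the end of the line.
tail_part=$(echo "$word" | cut -c5-)

# From the start of the line through character 4.
head_part=$(echo "$word" | cut -c-4)

echo "$tail_part"   # base
echo "$head_part"   # data
```

This is handy when the length of the line is not known in advance.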

---

## B. Applying cut to a delimiter-separated file

<br>Suppose we have a file `/etc/passwd` with delimiter ":" and content as below:<br><br>
`root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin`
<br>
<br>
<br>1. To extract a column, use `-d` to specify the delimiter and `-f` to specify the field (column) number
<br>
<br>**IN:** <br> 
`cut -d":" -f1 /etc/passwd`
<br>**OUT:** <br>
`
root
daemon
bin
sys
`
<br><br>
<br>2. To extract more than one column, pass the column numbers to `-f`, separated by commas
<br>
<br>**IN:** <br> 
`cut -d":" -f1,3,5 /etc/passwd`
<br>**OUT:** <br>
`
root:0:root
daemon:1:daemon
bin:2:bin
sys:3:sys
`
<br><br>
<br>3. To extract a continuous range of columns, pass in the range in `-f`
<br>
<br>**IN:** <br> 
`cut -d":" -f1-3 /etc/passwd`
<br>**OUT:** <br>
`
root:x:0
daemon:x:1
bin:x:2
sys:x:3
`
<br><br>
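
The field options can be tried without touching the real `/etc/passwd`. The sketch below assumes a small temporary sample file (the `sample` variable and its two lines are illustrative) and shows two behaviours worth knowing: `cut` prints fields in file order no matter how they are listed in `-f`, and `-f` accepts open-ended ranges.

```shell
# Build a small sample file so the example does not depend on the real /etc/passwd.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
EOF

# Fields 1 and 6: user name and home directory.
name_home=$(cut -d":" -f1,6 "$sample")

# Listing the fields in a different order does not reorder the output.
reversed=$(cut -d":" -f6,1 "$sample")

# An open-ended range: field 6 through the last field.
from_six=$(cut -d":" -f6- "$sample")

echo "$name_home"
echo "$reversed"
echo "$from_six"

rm -f "$sample"
```

If the columns need to appear in a different order, a tool such as `awk` is usually a better fit than `cut`.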


# 2. Using `tr` command

The `tr` command translates or deletes characters read from standard input.

Transform from lower to upper
<br>**IN:**<br>`echo "Shell Scripting" | tr "[:lower:]" "[:upper:]"`
<br>**OUT:**<br>`SHELL SCRIPTING`
    
Transform from upper to lower
<br>**IN:**<br>`echo "Shell Scripting" | tr "[:upper:]" "[:lower:]"`
<br>**OUT:**<br>`shell scripting`

The `-d` option deletes every occurrence of the listed characters.
<br>
Delete digits
<br>
<br>**IN:**<br>
`echo "1234 are number followed by 567" | tr -d "[:digit:]"`
<br>**OUT:**<br>
` are number followed by `

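Delete specific characters. `tr -d` treats its argument as a set of characters, not as a word, so every `f`, `o`, `l`, `w`, `e` and `d` in the input is removed.
<br>**IN:**<br>
`echo "1234 are number followed by 567" | tr -d "followed"`
<br>**OUT:**<br>
`1234 ar numbr  by 567` (two spaces remain where "followed" was removed)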
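
Because `tr` works on sets of characters rather than on words, deleting a whole word calls for a different tool. The sketch below compares `tr -d` with `sed` on the same input and also shows `tr -s`, which squeezes repeated characters. Using `sed` here is a suggested alternative, not part of the original walkthrough, and the variable names are illustrative.

```shell
text="1234 are number followed by 567"

# tr -d removes every f, o, l, w, e and d, wherever they occur.
chars_deleted=$(echo "$text" | tr -d "followed")

# sed removes the word itself, together with the space after it.
word_deleted=$(echo "$text" | sed 's/followed //')

# tr -s squeezes each run of the given character into a single one.
squeezed=$(echo "a   b    c" | tr -s " ")

echo "$chars_deleted"   # 1234 ar numbr  by 567
echo "$word_deleted"    # 1234 are number by 567
echo "$squeezed"        # a b c
```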

# 3. Working with postgres server

<br>a. Start postgres: `start_postgres`
<br>b. Connect to database server with credentials: `psql --username=postgres --host=localhost`
<br>c. Connect to database: `\c template1`
<br>d. Create Table: `create table users(username varchar(50), userid int, homedirectory varchar(200));`
<br>e. Quit psql client: `\q`

# Bash Script

"""
# This script
# Extracts data from /etc/passwd file into a CSV file.

# The csv data file contains the user name, user id and 
# home directory of each user account defined in /etc/passwd

# Transforms the text delimiter from ":" to ",".
# Loads the data from the CSV file into a table in PostgreSQL database.

# Extract Phase
echo "Extracting data"

# Extract the columns 1 (user name), 3 (user id) and 
# 6 (home directory path) from /etc/passwd

cut -d":" -f1,3,6 /etc/passwd > extracted-data.txt

# Transform Phase
echo "Transforming data"

# read the extracted data and replace the colons with commas.

tr ":" "," < extracted-data.txt > transformed-data.csv

# Load Phase
echo "Loading data"

# Send the instructions to connect to 'template1' and
# copy the file to the table 'users' through command pipeline.

echo "\c template1; \COPY users FROM '/home/project/transformed-data.csv' DELIMITERS ',' CSV;" | psql --username=postgres --host=localhost

"""
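
The extract and transform phases can be tried without a database. The following is a minimal sketch, assuming a scratch directory and a hypothetical two-line sample in `/etc/passwd` format (`workdir` and `passwd.sample` are illustrative names). It also pipes `cut` straight into `tr`, which skips the intermediate `extracted-data.txt` file.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)

# Hypothetical sample input in /etc/passwd format.
cat > "$workdir/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
EOF

# Extract fields 1, 3 and 6, then turn the remaining colons into commas.
cut -d":" -f1,3,6 "$workdir/passwd.sample" | tr ":" "," > "$workdir/transformed-data.csv"

csv=$(cat "$workdir/transformed-data.csv")
echo "$csv"

rm -rf "$workdir"
```

Each output line has three comma-separated values, matching the `username`, `userid` and `homedirectory` columns of the `users` table.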


# cp-access-log.sh
# This script downloads the file 'web-server-access-log.txt.gz'
# from "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0250EN-SkillsNetwork/labs/Bash%20Scripting/ETL%20using%20shell%20scripting/".

# The script then extracts the .txt file using gunzip.

# The .txt file contains the timestamp, latitude, longitude 
# and visitor id apart from other data.

# Transforms the text delimiter from "#" to "," and saves to a csv file.
# Loads the data from the CSV file into the table 'access_log' in PostgreSQL database.

# Download the access log file

wget "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0250EN-SkillsNetwork/labs/Bash%20Scripting/ETL%20using%20shell%20scripting/web-server-access-log.txt.gz"

# Unzip the file
gunzip -f web-server-access-log.txt.gz

# Extraction
echo "Extracting data"

cut -d"#" -f1-4 web-server-access-log.txt > extracted-data.txt

# Transformation
echo "Transforming data"

tr "#" "," < extracted-data.txt > transformed-data.csv

# Loading
echo "Loading data"

echo "\c template1;\COPY access_log FROM '/home/project/transformed-data.csv' DELIMITERS ',' CSV HEADER;" | psql --username=postgres --host=localhost




# Verify that the data was loaded:
    echo '\c template1; \\SELECT * from access_log;' | psql --username=postgres --host=localhost
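
The `HEADER` option in the `COPY` command suggests that the log file starts with a row of column names. As a rough sketch under that assumption, the snippet below runs the same `cut` and `tr` steps on a hypothetical two-line sample. The column names, the fifth column and all values are made up for illustration and are not taken from the real log.

```shell
workdir=$(mktemp -d)

# Hypothetical sample log: a header row followed by one record.
cat > "$workdir/access-log.sample" <<'EOF'
timestamp#latitude#longitude#visitorid#extra
2019-01-01 00:00:00#12.34#56.78#abc123#ignored
EOF

# Keep the first four fields and convert the "#" delimiter to ",".
cut -d"#" -f1-4 "$workdir/access-log.sample" | tr "#" "," > "$workdir/transformed-data.csv"

# The first line is the header that COPY ... CSV HEADER skips.
header=$(head -n 1 "$workdir/transformed-data.csv")
first_row=$(tail -n +2 "$workdir/transformed-data.csv" | head -n 1)

echo "$header"
echo "$first_row"

rm -rf "$workdir"
```

Without `HEADER`, `COPY` would try to load the column names as a data row.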