# **Hands-on Lab: Populating a Data Warehouse using PostgreSQL**

https://www.coursera.org/learn/getting-started-with-data-warehousing-and-bi-analytics/ungradedLti/7tDHW/hands-on-lab-populating-a-data-warehouse-using-postgresql



## **Purpose of the Lab:**

The lab is designed to provide hands-on experience in creating and managing a production database using PostgreSQL within the IBM Skills Network Labs (SN Labs) Cloud IDE. You will learn how to launch a PostgreSQL server instance, utilize the pgAdmin graphical user interface (GUI) for database operations, and execute essential tasks like creating a database, designing tables, and loading data. The lab focuses on building a foundation in database management by guiding learners through the process of setting up a 'Production' database and populating it with data following a star schema design.

## **Benefits of Learning the Lab:**

This lab builds practical skills in SQL, database creation, table design, and data manipulation, which are useful in data engineering, database administration, and data science roles. The hands-on approach consolidates your understanding of database schemas and SQL queries, so you can manage and analyze data more effectively in real-world scenarios. Familiarity with pgAdmin and the Cloud IDE environment also prepares you for more advanced database projects.

## **Software Used in this Lab**

To complete this lab, you will use the PostgreSQL relational database service available as part of the IBM Skills Network Labs (SN Labs) Cloud IDE. SN Labs is a virtual lab environment used in this course.



## **Database Used in this Lab**

This lab uses the Production database.

The Production database contains three tables:

- DimCustomer
- DimMonth
- FactBilling
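
To make the star schema layout concrete, here is a minimal sketch of how two dimension tables and one fact table could be declared. It is illustrative only: the real definitions live in star-schema.sql, and apart from `customerid` and `billedamount` in FactBilling (both used later in this lab), every column name, data type, and constraint below is an assumption.

```sql
-- Illustrative sketch only; run star-schema.sql in the lab, not this.
-- Column names, types, and constraints are assumptions unless noted.
CREATE TABLE "DimCustomer" (
    customerid integer PRIMARY KEY,   -- assumed to match FactBilling.customerid
    category   varchar(10)            -- hypothetical descriptive attribute
);

CREATE TABLE "DimMonth" (
    monthid    integer PRIMARY KEY,   -- hypothetical key name
    monthname  varchar(10),           -- hypothetical descriptive attribute
    quarter    integer                -- hypothetical descriptive attribute
);

CREATE TABLE "FactBilling" (
    billid       integer PRIMARY KEY, -- hypothetical key name
    customerid   integer NOT NULL REFERENCES "DimCustomer" (customerid),
    monthid      integer NOT NULL REFERENCES "DimMonth" (monthid),
    billedamount integer NOT NULL     -- column name from the lab; type assumed
);
```

The fact table sits at the center and holds the measures, while each dimension table is reached through a single key, which is what gives the schema its star shape.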

## **Objectives**

In this lab you will:

- Create the Production database and its tables in a PostgreSQL instance.
- Populate the Production data warehouse by loading the tables from SQL scripts.

## **Lab Structure**

In this lab, you will complete several tasks in which you will learn how to create tables and load data in the PostgreSQL database service using the pgAdmin graphical user interface (GUI) tool.

## **Data Used in this Lab**

The following SQL files are used in this lab:

- [DimCustomer](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/PostgresData/DimCustomer.sql)
- [DimMonth](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/PostgresData/DimMonth.sql)
- [FactBilling](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/PostgresData/FactBilling.sql)
- [Star Schema](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/PostgresData/star-schema.sql)

# **Task A: Create a database**

- Access the pgAdmin GUI tool.

- In the left tree-view, right-click **Databases**, then select **Create > Database**.
  
- In the Database box, type Production as the name for your new database, and then click Save. Proceed to Task B.

<img src = https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/BIWorkaroundFiles/week2/images/3.png>

<img src = https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/BIWorkaroundFiles/week2/images/4.png>
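
If you prefer SQL to the GUI, the same result can be sketched with a single statement run from a query window connected to another database (for example, the default `postgres` database, which is an assumption about your instance). The double quotes matter: without them PostgreSQL folds the name to lowercase and creates `production` instead.

```sql
-- Sketch of the SQL equivalent of the pgAdmin steps above.
-- Quoting preserves the capital P in the database name.
CREATE DATABASE "Production";
```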

# **Task B: Create tables**

Now that your PostgreSQL service is active and you have created the **Production** database using pgAdmin, create the tables that will store the data you load later.

At the top of the page, open the **Query tool** and click **Open File**. A **Select File** page pops up. Click the **Upload** icon as shown in the screenshot.

<img src = https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/BIWorkaroundFiles/week2/images/6.png>

Drag and drop the star-schema.sql file onto the blank page that appears. Once the file has loaded successfully, click the X icon on the left-hand side of the page as shown in the screenshot.

<img src = https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/BIWorkaroundFiles/week2/images/7.png>

After you click the X icon, a new page lists the star-schema.sql file. Select it from the list and click **Select**.

<img src = https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/BIWorkaroundFiles/week2/images/8.png>

Once the file opens, click **Run** to execute star-schema.sql.

<img src = https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/BIWorkaroundFiles/week2/images/9.png>

Next, right-click the Production database and select **Refresh** from the dropdown.

<img src = https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/BIWorkaroundFiles/week2/images/10.png>

After the refresh, the three tables (DimCustomer, DimMonth, FactBilling) appear under Databases > Production > Schemas > public > Tables.

<img src = https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/BIWorkaroundFiles/week2/images/11.png>
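
As an optional cross-check from the Query tool, the standard `information_schema.tables` view can list the tables in the `public` schema. This is a sketch that assumes star-schema.sql creates its tables in `public`, as the tree view above suggests.

```sql
-- List user tables in the public schema of the current database.
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
  AND table_type = 'BASE TABLE'
ORDER BY table_name;
```

If the script ran successfully, the result should include DimCustomer, DimMonth, and FactBilling.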

# **Task C: Load tables**

Open the **Query tool**, click **Open File**, and then click the **Upload** icon.

<img src = https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/BIWorkaroundFiles/week2/images/13.png>

Drag and drop the DimCustomer.sql file onto the blank page that appears. Once the file has loaded successfully, click the small X icon on the left-hand side of the page as shown in the screenshot.

After you click the X icon, a new page lists the DimCustomer.sql file. Select it from the list and click **Select**.

<img src = https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/BIWorkaroundFiles/week2/images/15.png>

Once the file opens, click **Run** to execute DimCustomer.sql.

<img src = https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/BIWorkaroundFiles/week2/images/16.png>

Note: Repeat the steps in Task C to upload the remaining SQL files and insert data into DimMonth and FactBilling.

Run the command below in the PostgreSQL Query tool.

    select count(*) from public."DimMonth";

You should see an output as seen in the image below.

<img src = https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0260EN-SkillsNetwork/labs/BIWorkaroundFiles/week2/images/29.png>

You are encouraged to run more SQL queries.
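
For instance, one possible follow-up is to check all three tables at once. The sketch below uses only the table names from this lab; the output column aliases `table_name` and `row_count` are illustrative choices.

```sql
-- Row counts for every table in the star schema, in one result set.
SELECT 'DimCustomer' AS table_name, count(*) AS row_count FROM public."DimCustomer"
UNION ALL
SELECT 'DimMonth', count(*) FROM public."DimMonth"
UNION ALL
SELECT 'FactBilling', count(*) FROM public."FactBilling";
```

A count of zero for any table suggests its load script has not been run yet.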

# **Practice exercises**

### **Problem 1: Using the PostgreSQL tool, find the count of rows in the table FactBilling**


    
Use the SELECT statement with the count function on the FactBilling table.

    select count(*) from public."FactBilling";

### **Problem 2: Using the PostgreSQL tool, create a simple MQT (a materialized view in PostgreSQL) named avg_customer_bill with fields customerid and averagebillamount.**


    
Use the CREATE MATERIALIZED VIEW command.

    CREATE MATERIALIZED VIEW avg_customer_bill (customerid, averagebillamount) AS
    (select customerid, avg(billedamount)
    from public."FactBilling"
    group by customerid
    );

Click the Run All button to run the statement. You should see the status Success in the Result section.

### **Problem 3: Refresh the newly created MQT**

    
Use the refresh materialized view command.

    REFRESH MATERIALIZED VIEW avg_customer_bill;
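
To confirm that the view exists and holds data after the refresh, you could query PostgreSQL's `pg_matviews` system view. This is a sketch that assumes the view was created in the `public` schema, which is the default when no schema is named.

```sql
-- ispopulated is true once the materialized view contains data.
SELECT matviewname, ispopulated
FROM pg_matviews
WHERE schemaname = 'public'
  AND matviewname = 'avg_customer_bill';
```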

### **Problem 4: Using the newly created MQT find the customers whose average billing is more than 11000.**


    
Use the SELECT statement on the MQT with a WHERE clause on the averagebillamount column.

    select * from avg_customer_bill where averagebillamount > 11000;

Congratulations! You have successfully finished the Populating a Data Warehouse lab.