- Removing Duplicates
- Standardising the data
- Evaluating Null or blank values
- Removing any unnecessary columns
-
Display of duplicate records in the table
Redundant data is identified by assigning a row number to each record. A row number greater than 1 indicates that the same record occurs more than once, and those extra rows are eliminated.
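The row-number approach to duplicates can be sketched as follows. This is a minimal illustration and not the project's actual query: it uses Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer), and the table name `layoffs_staging`, its columns and the sample rows are all hypothetical.

```python
import sqlite3


def build_sample():
    """Create a hypothetical staging table that contains one duplicated record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE layoffs_staging ("
        "company TEXT, industry TEXT, total_laid_off INTEGER, laid_off_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO layoffs_staging VALUES (?, ?, ?, ?)",
        [
            ("Acme", "Retail", 100, "2023-01-05"),
            ("Acme", "Retail", 100, "2023-01-05"),  # exact duplicate of the row above
            ("Globex", "Finance", 50, "2023-02-10"),
        ],
    )
    return conn


def remove_duplicates(conn):
    """Delete every row whose row number within its group of identical values exceeds 1."""
    conn.execute(
        """
        DELETE FROM layoffs_staging
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY company, industry, total_laid_off, laid_off_date
                           ORDER BY rowid
                       ) AS row_num
                FROM layoffs_staging
            ) AS numbered
            WHERE row_num > 1
        )
        """
    )


conn = build_sample()
remove_duplicates(conn)
remaining = conn.execute(
    "SELECT company, COUNT(*) FROM layoffs_staging GROUP BY company ORDER BY company"
).fetchall()
print(remaining)
```

Partitioning by every column means only fully identical records share a partition, so the first copy keeps row number 1 and survives. Other databases may need a different way to target the duplicate rows, since `rowid` is specific to SQLite.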
-
Standardising the data
Finding issues in the data and fixing them, such as changing the date format to 'YYYY-MM-DD'.
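As a rough sketch of the date standardisation, the snippet below rewrites a text date column into 'YYYY-MM-DD'. The original input format is not stated in these notes, so 'MM/DD/YYYY' is an assumption, as are the table and column names. It uses Python's `sqlite3` with a small helper registered as a SQL function; a database with a built-in date-parsing function would not need the helper.

```python
import sqlite3
from datetime import datetime


def to_iso_date(text):
    """Convert an 'MM/DD/YYYY' string (assumed input format) to 'YYYY-MM-DD'."""
    if text is None:
        return None  # leave NULLs untouched
    return datetime.strptime(text, "%m/%d/%Y").date().isoformat()


def standardise_dates(conn):
    """Register the helper as a SQL function and rewrite the date column in place."""
    conn.create_function("to_iso_date", 1, to_iso_date)
    conn.execute("UPDATE layoffs_staging SET laid_off_date = to_iso_date(laid_off_date)")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE layoffs_staging (company TEXT, laid_off_date TEXT)")
conn.executemany(
    "INSERT INTO layoffs_staging VALUES (?, ?)",
    [("Acme", "3/6/2023"), ("Globex", "12/16/2022"), ("Initech", None)],
)
standardise_dates(conn)
dates = [
    row[0]
    for row in conn.execute("SELECT laid_off_date FROM layoffs_staging ORDER BY company")
]
print(dates)
```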
-
Evaluating Null or blank values
Removing null and blank values from the staged data, as demonstrated by the industry column below.
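One possible way to handle the industry column is sketched here: blank strings are first converted to NULL so both cases can be treated alike, the affected rows are listed, and then they are removed. Whether the project deletes these rows or fills them in another way is not stated in these notes, so the deletion step is an assumption, as are the table name and sample data.

```python
import sqlite3


def blanks_to_null(conn):
    """Turn empty or whitespace-only industry values into NULL."""
    conn.execute("UPDATE layoffs_staging SET industry = NULL WHERE TRIM(industry) = ''")


def companies_missing_industry(conn):
    """Return the companies whose industry is NULL, in alphabetical order."""
    rows = conn.execute(
        "SELECT company FROM layoffs_staging WHERE industry IS NULL ORDER BY company"
    )
    return [row[0] for row in rows]


def delete_missing_industry(conn):
    """Remove rows that still have no industry (assumed handling, see lead-in)."""
    conn.execute("DELETE FROM layoffs_staging WHERE industry IS NULL")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE layoffs_staging (company TEXT, industry TEXT)")
conn.executemany(
    "INSERT INTO layoffs_staging VALUES (?, ?)",
    [("Acme", "Retail"), ("Globex", ""), ("Initech", None), ("Umbrella", "   ")],
)
blanks_to_null(conn)
missing = companies_missing_industry(conn)
delete_missing_industry(conn)
kept = [row[0] for row in conn.execute("SELECT company FROM layoffs_staging")]
print(missing, kept)
```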
-
Partition By results
-
Substrings + Use cases
with Fuzzymatch
-
Window Functions vs Group By
with rolling total
A rolling total starts at a specific value and adds the values from subsequent rows within each partition. In this case the starting point is Pam's salary, which is added to Angela's salary to give 83k, and so on until the final value of 124k. The data is partitioned by the unique values of gender (male vs. female), so the total restarts at Jim's 45k and the same rule applies until the final value of 313k.
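The rolling total can be reproduced with a windowed `SUM`. In this sketch the table name, employee ids and most individual salaries are assumptions chosen so that the running totals match the figures quoted above (83k and 124k for the female partition, 45k rising to 313k for the male partition); only those quoted figures come from the notes.

```python
import sqlite3

# Hypothetical sample rows: (employee_id, first_name, gender, salary).
EMPLOYEES = [
    (1, "Jim", "Male", 45000),
    (2, "Pam", "Female", 36000),
    (3, "Dwight", "Male", 63000),
    (4, "Angela", "Female", 47000),
    (5, "Toby", "Male", 50000),
    (6, "Michael", "Male", 65000),
    (7, "Meredith", "Female", 41000),
    (8, "Stanley", "Male", 48000),
    (9, "Kevin", "Male", 42000),
]


def rolling_totals():
    """Return (first_name, gender, salary, rolling_total) with the total restarting per gender."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        "employee_id INTEGER, first_name TEXT, gender TEXT, salary INTEGER)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", EMPLOYEES)
    return conn.execute(
        """
        SELECT first_name, gender, salary,
               SUM(salary) OVER (PARTITION BY gender ORDER BY employee_id) AS rolling_total
        FROM employees
        ORDER BY gender, employee_id
        """
    ).fetchall()


results = rolling_totals()
for row in results:
    print(row)
```

Unlike `GROUP BY`, which would collapse each gender into a single row, the window function keeps every employee row and attaches the running total to it.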
with ROW_NUMBER, RANK and DENSE_RANK
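The difference between the three ranking functions only shows up when values tie. The sketch below uses a hypothetical `employees` table with two equal salaries; the names and figures are invented for illustration. `ROW_NUMBER` gets a tie-breaker on name so its output is deterministic.

```python
import sqlite3


def ranked_salaries():
    """Return (name, salary, row_num, rank_num, dense_rank_num) ordered by row number."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("Ann", 50000), ("Bob", 50000), ("Cy", 40000), ("Dee", 30000)],
    )
    return conn.execute(
        """
        SELECT name, salary,
               ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,  -- always unique
               RANK()       OVER (ORDER BY salary DESC) AS rank_num,       -- ties share a rank, next rank is skipped
               DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank_num  -- ties share a rank, no gap
        FROM employees
        ORDER BY row_num
        """
    ).fetchall()


ranking = ranked_salaries()
for row in ranking:
    print(row)
```

With the tied salaries, `RANK` jumps from 1 to 3 while `DENSE_RANK` continues with 2, and `ROW_NUMBER` simply counts 1 to 4.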