![image.png](attachment:image.png)

# Week 5 Overview
This week, we will explore the essentials of data types, including structured, semi-structured, and unstructured data, while examining how to manage them and the tools that support their handling. We'll dive into database fundamentals by reviewing the evolution of database systems, their key components, and the role of database management systems (DBMS), as well as data models and architecture. Lastly, we'll focus on relational database basics, covering the relational model, keys and relationships, and the use of structured query language (SQL).

![image-2.png](attachment:image-2.png)

### Learning Objectives
At the end of this week, you will be able to: 
- Differentiate between structured, unstructured, and semi-structured data  
- Recall database fundamentals and components  
- Explain relational database fundamentals

# Topic Overview: Structured, Semi-Structured, and Unstructured Data
In this topic, we cover the concept of data types in databases, reviewing use cases and examples to build foundational knowledge. We explore the challenges and considerations involved in managing different data types, including structured, unstructured, and semi-structured data. Finally, we highlight tools and technologies that streamline the handling of diverse data types, ensuring effective storage, processing, and analysis. 

### Learning Objectives 
- Differentiate between structured, unstructured, and semi-structured data

## 1.1 Lesson: Data Types
In this video, we explore structured, semi-structured, and unstructured data, highlighting their unique characteristics, use cases, and management challenges. We’ll also review the tools and technologies commonly used to handle each type of data, helping you choose the right solution for efficient and scalable database management.

### Structured Data
- In finance, structured data is crucial for processing transactions and generating reports. For example, a bank’s database will hold customer account information, balances, and transaction history in a structured format for quick retrieval through queries and reporting.

### Unstructured Data
- A common use case for unstructured data is in social media analytics. Social media platforms generate a massive amount of unstructured data, such as user posts, comments, and interactions. This data can be analyzed to understand user sentiment and engagement.

### Semi-Structured Data
- In e-commerce, semi-structured data, like product reviews stored in JavaScript Object Notation (JSON) format, can be used to build recommendation engines. The flexibility of semi-structured data allows companies to incorporate new fields, like customer ratings or product specifications, without restructuring the database, as a fixed relational schema would typically require.
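To make the flexibility of semi-structured data concrete, here is a minimal sketch using Python's built-in `json` module. The review records and their field names (`product_id`, `review`, `rating`, `specs`) are hypothetical; the point is only that the second record carries extra fields and the first does not, yet both parse and can be processed together.

```python
import json

# Two hypothetical product reviews; the second has fields the first lacks.
raw_reviews = """
[
  {"product_id": 101, "review": "Works well"},
  {"product_id": 102, "review": "Too small", "rating": 2, "specs": {"size": "S"}}
]
"""

reviews = json.loads(raw_reviews)

# Read the optional field with .get() so records without it still work.
ratings = [r.get("rating") for r in reviews]
print(ratings)  # [None, 2]
```

In a relational table, adding `rating` would usually mean altering the table definition first; here the new field simply appears in the records that have it.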


## 1.2 Lesson: Considerations in Managing Different Data Types
Managing diverse data types in modern organizations requires a sophisticated understanding of their unique characteristics and challenges. **Structured data**, with its well-defined schema and organized format, typically demands rigorous validation processes and strict data governance policies to maintain consistency and integrity. Organizations must implement comprehensive data quality checks, establish clear data entry protocols, and maintain detailed metadata documentation to ensure structured data remains reliable and usable over time. 

**Semi-structured** data presents distinct management challenges due to its flexible schema and varying formats. While it offers greater adaptability than structured data, organizations must develop robust parsing mechanisms and implement flexible storage solutions that can accommodate evolving data structures. JSON and XML documents, common forms of semi-structured data, require specialized tools for validation, indexing, and querying. Organizations need to balance the flexibility of semi-structured formats with the need for consistent data retrieval and analysis capabilities. 
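As a rough illustration of what parsing and validating semi-structured data can involve, the sketch below checks a JSON record for a set of required fields. The function name `validate_record` and the required field names are assumptions made for this example, not part of any specific tool.

```python
import json

# Hypothetical set of fields every record is expected to carry.
REQUIRED_FIELDS = {"product_id", "review"}


def validate_record(text):
    """Parse one JSON record and return (record, list_of_problems)."""
    try:
        record = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["expected a JSON object"]
    missing = sorted(REQUIRED_FIELDS - record.keys())
    return record, [f"missing field: {name}" for name in missing]


record, problems = validate_record('{"product_id": 1}')
print(problems)  # ['missing field: review']
```

Extra fields are accepted without complaint, which preserves the flexibility of the format while still guaranteeing the fields that downstream queries depend on.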

**Unstructured data**, such as text documents, images, and videos, requires fundamentally different management approaches. Organizations must implement sophisticated content management systems that can handle large file sizes, provide efficient search capabilities, and maintain appropriate metadata. Storage considerations become particularly critical, as unstructured data often consumes significantly more space than its structured counterparts. Additionally, organizations need to implement effective tagging and categorization systems to make unstructured data discoverable and usable. 

The integration of these different data types often presents the greatest challenge. Organizations must develop comprehensive data management strategies that can seamlessly handle transitions between structured, semi-structured, and unstructured data. This might involve implementing data lakes or hybrid storage solutions that can accommodate all data types while maintaining data lineage and ensuring proper access controls. Modern data management platforms must also consider the performance implications of processing and analyzing different data types, particularly when real-time or near-real-time access is required.

## 1.3 Lesson: Tools and Technologies for Handling Different Data Types
For **structured data**, traditional relational database management systems (RDBMS) like Oracle, PostgreSQL, and MySQL remain foundational tools, offering robust transaction support, data integrity enforcement, and sophisticated query optimization. These systems excel at managing tabular data with fixed schemas and provide comprehensive features for data validation, indexing, and complex joins. Enterprise-grade tools like SAP HANA and Microsoft SQL Server extend these capabilities with in-memory processing and advanced analytics features particularly relevant to data science applications. 

When dealing with **semi-structured** data, document-oriented databases like MongoDB and Couchbase have emerged as powerful solutions. These NoSQL databases excel at handling JSON-like documents with flexible schemas, making them ideal for applications with evolving data structures. Meanwhile, graph databases such as Neo4j and Amazon Neptune specialize in managing highly interconnected data, offering unique capabilities for relationship analysis and pattern matching that traditional RDBMS cannot efficiently provide. 

For **unstructured data** management, specialized systems have evolved to address specific needs. Content management systems like Alfresco and SharePoint provide robust platforms for document management. For media content, GridFS (MongoDB's specification for storing large files) handles storage of large binary objects, while vector databases such as Milvus support similarity search over embeddings derived from images and video. Apache Solr and Elasticsearch have become essential tools for full-text search and analysis of unstructured text data, providing advanced features like faceted search and configurable text analysis.

Modern data lake technologies like Apache Hadoop and cloud-based solutions such as Amazon S3 and Azure Data Lake Storage have revolutionized the ability to store and process multiple data types in a single ecosystem. These platforms provide the flexibility to store raw data in any format while offering integration with various processing tools. Data virtualization platforms like Denodo and TIBCO Data Virtualization further enable organizations to create unified views across different data types and storage systems, abstracting the complexity of underlying data structures from end users.

# Topic Overview: Database Fundamentals
In this topic, we review the fundamentals of database systems, exploring their evolution and core functions. We cover the key components of database systems, delve into the roles of DBMSs, and examine how they facilitate data organization and access. Finally, we discuss data models and database architecture, providing a comprehensive understanding of how databases are designed and structured to meet diverse needs. 

### Learning Objectives 
- Recall database fundamentals and components

## 2.1 Lesson: Overview of Database Systems
In this video, we’ll cover the fundamentals of database systems, including their evolution, key components, and the role of DBMSs. We’ll explore how databases have advanced from early hierarchical models to modern relational and NoSQL systems and discuss the essential functions of a DBMS, such as data storage, transaction management, and security. Finally, we’ll dive into data models and database architectures, explaining how data is organized and accessed in various systems, from centralized to cloud-based solutions.

### Understanding Database Fundamentals and Concepts

**Key Components of Database Systems**

- Database Engine: Manages
    - Transactions
    - Queries
    - Data manipulation
- Database Schema: Defines how a database is organized, including tables, fields, relationships, and constraints.
    - Tables store data in a relational format, with each record (row) representing an individual data entry.
    - Indexes optimize data retrieval speed by providing a lookup mechanism.
    - Views are virtual tables providing specific data representations, useful for security and simplifying queries.
    - Metadata describes the structure, format, and characteristics of the stored data.
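The schema components listed above can be seen side by side in a small sketch. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a full DBMS; the `customers` table, the index, and the view are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table, an index on one of its columns, and a view over it.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT
);
CREATE INDEX idx_customers_city ON customers (city);
CREATE VIEW delhi_customers AS
    SELECT customer_id, name FROM customers WHERE city = 'Delhi';
""")

conn.execute("INSERT INTO customers (name, city) VALUES ('Asha', 'Delhi')")
conn.execute("INSERT INTO customers (name, city) VALUES ('Ravi', 'Mumbai')")

# The view behaves like a table but exposes only selected rows and columns.
view_rows = conn.execute("SELECT name FROM delhi_customers").fetchall()
print(view_rows)  # [('Asha',)]

# Metadata: SQLite keeps a catalog of schema objects in sqlite_master.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
print(catalog)
```

Other database systems expose the same kind of metadata through their own catalog views, so the query against `sqlite_master` is specific to SQLite.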


In the video, we traced the evolution of database systems and the different models that handle various data needs. Early hierarchical and network models evolved into relational databases that use SQL for structured data, followed by the emergence of NoSQL databases like MongoDB and Cassandra, designed to handle unstructured and semi-structured data.

As reviewed, modern organizations often employ a combination of database types, including relational (SQL), NoSQL, and NewSQL, based on their specific needs and objectives. For document-style semi-structured records, we highlighted the document model, using JSON as an example. We also briefly mentioned that distributed architectures are used when high availability is needed, while centralized architectures are simpler to manage but have scalability limitations.

![image-3.png](attachment:image-3.png)


## 2.2 Lesson: Components of a Database Management System (DBMS)
___
A DBMS is a powerful tool that enables us to efficiently store, retrieve, and manage data. To understand how a DBMS works, let’s explore its key components and their roles: 

### Database Engine
___
This is the heart of the DBMS. It handles all the behind-the-scenes work of storing, retrieving, and updating data, ensuring your database operations run smoothly.

### Query Processor
___
Ever wonder how your SQL queries turn into results? That’s the job of the query processor. It interprets your commands, optimizes their execution, and ensures the database engine understands your request.
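One way to peek at the query processor's decisions is to ask the database for its execution plan. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` through Python's built-in `sqlite3` module. The table and index names are hypothetical, and the exact wording of the plan varies between SQLite versions, so treat the output as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE INDEX idx_customers_city ON customers (city);
""")

# Ask SQLite how it would execute the query instead of running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM customers WHERE city = ?",
    ("Delhi",),
).fetchall()

# The human-readable description is the last column of each plan row.
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)
```

The plan should mention `idx_customers_city`, showing that the optimizer chose an index lookup over a full table scan for the `city` filter.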

### Schema Management
___
Think of this as the blueprint keeper. It organizes the structure of the database — tables, relationships, constraints, and indexes — so your data remains structured and accessible.

### Data Dictionary
___
This component acts like a reference book for your database. It keeps track of metadata, such as table definitions and user permissions, making it easier to manage the database.

### Transaction Management
___
When multiple operations need to occur together (like a bank transfer), transaction management ensures they’re processed reliably, maintaining the database’s integrity by adhering to atomicity, consistency, isolation, and durability (ACID) principles.
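The bank transfer scenario can be sketched with Python's built-in `sqlite3` module. The `accounts` table, the `transfer` function, and the balances are hypothetical; the `CHECK` constraint stands in for a business rule that forbids overdrafts. The sketch illustrates atomicity: either both updates are committed or neither is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " account_id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount from source to target; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is rolled back

balances = conn.execute(
    "SELECT account_id, balance FROM accounts ORDER BY account_id"
).fetchall()
print(ok, failed, balances)  # True False [(1, 70), (2, 80)]
```

In the failed transfer, the credit to account 2 had already been applied when the debit violated the constraint. The rollback undoes that credit, so the database never shows money appearing from nowhere.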

### Concurrency Control
___
In a multi-user environment, this component ensures that everyone can access the database without conflicts or data inconsistencies, even when several people are working simultaneously. 

### Storage Management
___
This handles how data is stored on disk or other media. It’s all about making sure the database uses space efficiently while maintaining performance.

### Backup and Recovery Module
___

Data loss can be catastrophic, but this component provides mechanisms to back up data and recover it in case of failures or corruption.

### Security Management
___
Protecting your data is crucial. Security management controls access to the database, ensuring that only authorized users can view or modify sensitive information.

### Reporting and Monitoring Tools
___
Finally, these tools help you keep an eye on your database’s performance and create useful reports or dashboards to analyze your data.

# Topic Overview: Relational Database Fundamentals
In this topic, we introduce the relational database model and its guiding principles, providing a foundation for understanding its structure. We then explore the role of keys and relationships in organizing data and delve into the use of SQL as a powerful tool for interacting with relational databases.  

### Learning Objectives 
- Explain relational database fundamentals 

## 3.1 Lesson: Relational Database Model
In this video, we’ll cover the relational database model, introduced by Edgar F. Codd in 1970. We'll explore its core principles, including tables, primary and foreign keys, normalization, and the ACID properties that ensure data integrity. Despite newer models, the relational model remains essential for managing structured data and is foundational in data-related fields.


A relational database model is a way of organizing data into tables, also known as relations, where data is stored in rows and columns. Each table represents an entity, like customers or orders, and the columns define the attributes of those entities. This model is based on mathematical principles, making it robust and flexible for structured data storage. 

**Key principles of the relational model include:** 
- Primary Keys: Unique identifiers for each row in a table, ensuring data is distinct. 
- Foreign Keys: Fields that link tables together, establishing relationships between entities while enforcing referential integrity. 
- Relationships: Defining how tables and records are connected using primary and foreign keys. 
- Normalization: The process of organizing data to reduce redundancy and improve efficiency, ensuring consistency and minimizing anomalies. 
- ACID Properties: Ensuring reliable database transactions, where ACID stands for atomicity, consistency, isolation, and durability. 
- Indexes: Structures that enhance query performance by providing fast access to rows based on specific column values. 

Relational databases also enforce data types and constraints, ensuring that each column holds valid and consistent data. Users interact with these databases using SQL to retrieve, update, and manipulate data efficiently. 
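The following sketch shows primary key, foreign key, and constraint enforcement in action, using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and the `try_insert` helper are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; most other relational systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
    total       REAL NOT NULL CHECK (total >= 0)
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")


def try_insert(sql):
    """Run an INSERT and report either 'ok' or the constraint that rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Reusing primary key 1 violates uniqueness.
duplicate_key = try_insert("INSERT INTO customers VALUES (1, 'Ravi')")
# Customer 99 does not exist, so this order violates referential integrity.
orphan_order = try_insert("INSERT INTO orders VALUES (11, 99, 5.0)")

print(duplicate_key)
print(orphan_order)
```

Both inserts are rejected by the database itself, so invalid rows cannot enter the tables regardless of which application tries to write them.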

This model is the foundation of most traditional database systems and is used for applications requiring structured, consistent data, such as customer relationship management (CRM) systems, enterprise resource planning (ERP) systems, and transactional systems in industries like banking and health care. Modern relational databases have also adapted to handle distributed data and integrate with NoSQL systems to meet evolving business needs.

## 3.2 Lesson: Keys and Relationships
In this video, we’ll cover the essential concepts of keys and relationships in relational databases. These concepts are critical for organizing and linking data across tables. We’ll focus on primary keys, foreign keys, and the types of relationships—one-to-one, one-to-many, and many-to-many—that connect data. Understanding how keys work ensures data integrity and efficient querying, while relationships define how data is accessed and combined across multiple tables.

### Relational Database Terminology

| Term | Definition |
| :--- | :--- |
| `Keys` | Attributes or sets of attributes that help to uniquely identify records in a table and establish links between tables. |
| `Primary Key` | Unique identifier for each record in a table. It ensures that no two rows in a table have the same value for the primary key attribute. Primary keys cannot contain null values. |
| `Foreign Key` | Field or collection of fields in one table that refers to the primary key in another table. Foreign keys create links between two tables, establishing relationships between the entities they represent. This allows data to be joined and queried across tables. |

As we discussed, keys are a fundamental component of any relational database system. They provide the foundation for defining relationships and for storing data efficiently. As we get further into the module, you will see keys used again and again to relate entities and information.

### Relationships
| Relationship Type | Rules |
| :--- | :--- |
| `One-to-One` | In a one-to-one relationship, each record in table A corresponds to exactly one record in table B, and vice versa. This can be useful for separating sensitive or optional data from the main table. |
| `One-to-Many` | In a one-to-many relationship, a single record from table A can relate to multiple records from table B, while each record in table B relates to only one record in table A. This is established using a foreign key in table B. |
| `Many-to-Many` | In a many-to-many relationship, multiple records in table A can relate to multiple records in table B, and vice versa. This relationship is typically implemented using a junction or a bridge table that contains foreign keys referencing both tables. | 
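A many-to-many relationship with a junction table can be sketched as follows, using Python's built-in `sqlite3` module. The `students`, `courses`, and `enrollments` tables are hypothetical examples; `enrollments` is the junction table, and its composite primary key prevents the same student from being enrolled in the same course twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Junction table: one row per (student, course) pairing.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students (student_id),
    course_id  INTEGER NOT NULL REFERENCES courses (course_id),
    PRIMARY KEY (student_id, course_id)
);

INSERT INTO students VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO courses  VALUES (10, 'Databases'), (20, 'Statistics');
INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# Resolve the relationship by joining through the junction table.
roster = conn.execute("""
    SELECT s.name, c.title
    FROM enrollments AS e
    JOIN students AS s ON s.student_id = e.student_id
    JOIN courses  AS c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title
""").fetchall()
print(roster)
# [('Asha', 'Databases'), ('Asha', 'Statistics'), ('Ravi', 'Databases')]
```

Seen from either side, the junction table is the "many" end of a one-to-many relationship, which is why a many-to-many relationship is often described as two one-to-many relationships joined together.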


## 3.3 Lesson: Structured Query Language
Suppose you’ve designed a well-structured relational database — tables are organized, relationships are defined, and normalization has minimized redundancy. But how do you actually interact with this database? How do you retrieve specific data, update records, or generate reports? 

### Unlocking the Potential of Relational Databases
Enter SQL, the powerful language that brings your database to life. Whether you need to find all customers who made a purchase last week, update inventory levels, or join data across multiple tables, SQL is the tool that makes it possible. Let’s explore how SQL empowers you to unlock the full potential of a relational database. 

SQL is the standard tool for interacting with relational databases. It allows you to communicate with the database to perform a variety of operations, such as retrieving data, updating records, deleting entries, and even defining the structure of the database itself. SQL transforms the static data stored in tables into actionable insights, enabling businesses and applications to make data-driven decisions. 

### Core Components of SQL
#### Data Querying
SQL’s most common use is for querying data using the `SELECT` statement. This command allows you to specify the columns you want to retrieve, apply conditions with `WHERE`, and even sort results with `ORDER BY`. 

For example:

![image-6.png](attachment:image-6.png)
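For a version you can run yourself, here is a minimal sketch of `SELECT` with `WHERE` and `ORDER BY`, executed through Python's built-in `sqlite3` module. The `customers` table and its rows are hypothetical, and SQLite stands in for the course database, so minor syntax details may differ elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT,
    city TEXT,
    age  INTEGER
);
INSERT INTO customers VALUES
    (1, 'Asha',  'Delhi',  34),
    (2, 'Ravi',  'Mumbai', 28),
    (3, 'Meera', 'Delhi',  41);
""")

# Choose columns, filter rows with WHERE, and sort with ORDER BY.
# The ? placeholder passes the city value safely as a parameter.
rows = conn.execute(
    "SELECT name, age FROM customers WHERE city = ? ORDER BY age DESC",
    ("Delhi",),
).fetchall()
print(rows)  # [('Meera', 41), ('Asha', 34)]
```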

#### Data Manipulation
SQL also includes commands to modify data directly within the database, including: 

- `INSERT`: Adds new records. 
- `UPDATE`: Modifies existing records. 
- `DELETE`: Removes unwanted data.

For example:

![image-5.png](attachment:image-5.png)
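The three data manipulation commands can also be tried in a small runnable sketch. It uses Python's built-in `sqlite3` module, and the `inventory` table and its items are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER NOT NULL)"
)

# INSERT: add new records.
conn.execute("INSERT INTO inventory VALUES ('widget', 10)")
conn.execute("INSERT INTO inventory VALUES ('gadget', 0)")

# UPDATE: modify an existing record.
conn.execute("UPDATE inventory SET quantity = quantity - 3 WHERE item = 'widget'")

# DELETE: remove records that match a condition; rowcount reports how many.
deleted = conn.execute("DELETE FROM inventory WHERE quantity = 0").rowcount

remaining = conn.execute("SELECT item, quantity FROM inventory").fetchall()
print(deleted, remaining)  # 1 [('widget', 7)]
```

Both `UPDATE` and `DELETE` apply to every row matching the `WHERE` clause, so omitting that clause affects the whole table.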

#### Data Definition
Additionally, SQL allows you to define and modify the database structure with data definition language (DDL) commands, including: 

- `CREATE TABLE`: Creates new tables. 
- `ALTER TABLE`: Modifies the structure of an existing table. 
- `DROP TABLE`: Deletes a table and its data. 

For example:

![image-4.png](attachment:image-4.png)
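The DDL commands can be exercised in the same way. This sketch uses Python's built-in `sqlite3` module; the `products` table is hypothetical, and `PRAGMA table_info` is a SQLite-specific way to list a table's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE: define a new table.
conn.execute(
    "CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# ALTER TABLE: add a column to the existing table.
conn.execute("ALTER TABLE products ADD COLUMN price REAL")

# Inspect the resulting structure; column names are in position 1 of each row.
columns = [row[1] for row in conn.execute("PRAGMA table_info(products)")]
print(columns)  # ['product_id', 'name', 'price']

# DROP TABLE: remove the table and all of its data.
conn.execute("DROP TABLE products")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)  # []
```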

#### Data Control

SQL also manages access to the database with data control language (DCL) commands: 

`GRANT`: Provides permissions to users. 

![image-7.png](attachment:image-7.png)

`REVOKE`: Removes permissions from users.

![image-8.png](attachment:image-8.png)


#### Why SQL Matters
SQL bridges the gap between raw data and actionable insights. It enables users to:

- Analyze data trends by aggregating and summarizing data with functions like:  
    - `COUNT`: returns the number of rows that match the query's conditions
    - `SUM`: returns the total of the values in a numeric column
    - `AVG`: returns the average of the values in a numeric column
- Combine data from multiple tables using JOIN operations to reveal relationships and dependencies. 
- Ensure data integrity through constraints and transaction control, ensuring reliable and consistent database operations. 
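The aggregate functions mentioned above are typically combined with `GROUP BY` to summarize data per category. Here is a minimal sketch using Python's built-in `sqlite3` module; the `orders` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, total REAL);
INSERT INTO orders VALUES
    (1, 'Asha', 20.0),
    (2, 'Asha', 40.0),
    (3, 'Ravi', 15.0);
""")

# One summary row per customer: order count, total spend, average order value.
summary = conn.execute("""
    SELECT customer, COUNT(*), SUM(total), AVG(total)
    FROM orders
    GROUP BY customer
    ORDER BY customer
""").fetchall()
print(summary)  # [('Asha', 2, 60.0, 30.0), ('Ravi', 1, 15.0, 15.0)]
```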


#### The Power of SQL
The true power of SQL lies in its ability to simplify complex operations and make them accessible. From basic queries to advanced data transformations, SQL is a versatile language that empowers users to interact with relational databases efficiently. As we dive deeper into SQL, you’ll discover how to write effective queries, optimize performance, and harness its full potential to bring your database designs to life. 

We will provide examples of some of these commands in an SQL Query editor during the Live Session, and you will get some hands-on experience using SQL in the Azure Portal using your Azure SQL Database as a part of this week’s homework. 

## All you need to know about SQL Joins

### What are Joins?
Joins in SQL are clauses used to combine rows from two or more tables, based on a related column between those tables. They are predominantly used when a user needs to extract data from tables that have one-to-many or many-to-many relationships between them.

Now that you know what joins are, let us look at the different types of joins.

#### How many types of Joins are there in SQL?
There are mainly four types of joins that you need to understand. They are:

- INNER JOIN
- FULL JOIN
- LEFT JOIN
- RIGHT JOIN

The image below illustrates each type:

![image-9.png](attachment:image-9.png)
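The difference between join types is easiest to see by running two of them on the same data. The sketch below compares `INNER JOIN` and `LEFT JOIN` using Python's built-in `sqlite3` module. The simplified `employees` and `projects` tables are hypothetical and exist only for this illustration. `RIGHT JOIN` and `FULL JOIN` are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projects  (project_id INTEGER PRIMARY KEY, emp_id INTEGER, title TEXT);

INSERT INTO employees VALUES (1, 'Vardhan'), (2, 'Himani'), (3, 'Aayushi');
-- Employee 3 has no project, which is what separates the two joins.
INSERT INTO projects VALUES (100, 1, 'Billing'), (101, 2, 'Reporting');
""")

# INNER JOIN: only employees that have a matching project row.
inner_rows = conn.execute("""
    SELECT e.name, p.title
    FROM employees AS e
    INNER JOIN projects AS p ON p.emp_id = e.emp_id
    ORDER BY e.emp_id
""").fetchall()

# LEFT JOIN: every employee, with NULL (None in Python) where no project matches.
left_rows = conn.execute("""
    SELECT e.name, p.title
    FROM employees AS e
    LEFT JOIN projects AS p ON p.emp_id = e.emp_id
    ORDER BY e.emp_id
""").fetchall()

print(inner_rows)  # [('Vardhan', 'Billing'), ('Himani', 'Reporting')]
print(left_rows)   # [('Vardhan', 'Billing'), ('Himani', 'Reporting'), ('Aayushi', None)]
```

A `RIGHT JOIN` is the mirror image of a `LEFT JOIN`, keeping every row of the second table, and a `FULL JOIN` keeps unmatched rows from both sides.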

### How do I know which join to use in SQL?

Let us look at each one of them. To make the concept concrete, I will use the following three tables to show how to perform join operations.

#### Employee Table: 

| EmpID | EmpFname | EmpLname | Age | EmailID | PhoneNo | Address |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | Vardhan | Kumar | 22 | vardy@abc.com | 9876543210 | Delhi |
| 2 | Himani | Sharma | 32 | himani@abc.com | 9977554422 | Mumbai |
| 3 | Aayushi | Shreshth | 24 | aayushi@abc.com | 9977555121 | Kolkata |
| 4 | Hemanth | Sharma | 25 | hemanth@abc.com | 9876545666 | Bengaluru |
| 5 | Swatee | Kapoor | 26 | swatee@abc.com | 9544567777 | Hyderabad |
