### Procedural vs Declarative

#### Procedural
- **How to do.**
- Detailed steps, instructions
- Control flow (loops, conditionals)
- Python scripts

#### Declarative
- **What to do.**
- Desired outcome, not steps
- High-level operations
- SQL queries

#### Key Differences
- **Focus**:
  - *Procedural: process*
  - *Declarative: result*
- **Complexity**:
  - *Procedural: more detailed, more code*
  - *Declarative: more abstract, less code*

#### Examples
- **Procedural**: Adding a student to a CSV file
  ```python
  add_student('21', 'New Student', 'M', '2003-05-01', 'new.student@example.com', '303456789012')
  ```
- **Declarative**: Adding a student in SQL
  ```sql
  INSERT INTO students (RollNumber, Name, Gender, DateOfBirth, Email, Aadhar) VALUES (21, 'New Student', 'M', '2003-05-01', 'new.student@example.com', '303456789012');
  ```
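
The two snippets above can be made concrete with a small, self-contained sketch. The notes only show the call to `add_student`, so its body, the CSV layout and the extra `path` parameter are assumptions. The procedural version spells out every step (scan the file for a duplicate roll number, then append). The declarative version hands one `INSERT` to an engine (Python's built-in `sqlite3`, assumed here as the DBMS) and lets a `PRIMARY KEY` reject duplicates.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical CSV layout, mirroring the columns of the SQL example.
FIELDS = ["RollNumber", "Name", "Gender", "DateOfBirth", "Email", "Aadhar"]


def add_student(roll, name, gender, dob, email, aadhar, path="students.csv"):
    """Procedural (hypothetical body): every step of *how* to store the row."""
    exists = os.path.exists(path)
    # Step 1: scan the whole file to enforce unique roll numbers by hand.
    if exists:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                if row["RollNumber"] == str(roll):
                    return False  # duplicate: nothing is written
    # Step 2: append the row, writing a header first if the file is new.
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if not exists:
            writer.writerow(FIELDS)
        writer.writerow([roll, name, gender, dob, email, aadhar])
    return True


def add_student_sql(conn, roll, name, gender, dob, email, aadhar):
    """Declarative: state *what* to insert; the engine enforces the key."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO students (RollNumber, Name, Gender, DateOfBirth, Email, Aadhar) "
                "VALUES (?, ?, ?, ?, ?, ?)",
                (roll, name, gender, dob, email, aadhar),
            )
        return True
    except sqlite3.IntegrityError:  # PRIMARY KEY violated
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (RollNumber INTEGER PRIMARY KEY, Name TEXT, Gender TEXT, "
    "DateOfBirth TEXT, Email TEXT, Aadhar TEXT)"
)
csv_path = os.path.join(tempfile.mkdtemp(), "students.csv")
row = ("New Student", "M", "2003-05-01", "new.student@example.com", "303456789012")
print(add_student("21", *row, path=csv_path))  # True
print(add_student("21", *row, path=csv_path))  # False: duplicate roll number
print(add_student_sql(conn, 21, *row))         # True
print(add_student_sql(conn, 21, *row))         # False: duplicate key
```

The procedural function has to encode the uniqueness rule itself; in the declarative version that rule lives in the table definition.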

### Drawbacks of Using File Systems to Store Data

- **Data Redundancy and Inconsistency**
  - *Duplicate data in multiple files, inconsistent information*

- **Difficulty in Accessing Data**
  - *Requires new programs for each task*

- **Data Isolation**
  - *Data spread across multiple files and formats*

- **Integrity Problems**
  - *Constraints hidden in code, hard to manage*
  - *Example: Ensuring account balance > 0*

- **Atomicity of Updates**
  - *Partial updates during failures lead to inconsistencies*
  - *Example: Funds transfer should fully complete or not happen*

- **Concurrent Access Issues**
  - *Concurrent access by multiple users is needed for performance*
  - *Uncontrolled access causes inconsistencies*
  - *Example: Two users withdrawing from the same account simultaneously*

- **Security Problems**
  - *Difficult to restrict user access to specific data*
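
Two of these problems, integrity and atomicity, are handled declaratively by a DBMS. A minimal sketch, assuming Python's built-in `sqlite3` as the DBMS; the `accounts` table, its rows and the `transfer` helper are hypothetical, and the constraint is written as a non-negative balance. The rule lives in the schema instead of being hidden in application code, and a transfer that fails halfway leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integrity constraint declared once, in the schema (hypothetical table).
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            # Credit first, so a failing debit must undo an already-applied step.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) violated
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


print(transfer(conn, "A", "B", 30), balances(conn))   # True {'A': 70, 'B': 80}
print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 70, 'B': 80}
```

In the failed transfer, B's credit had already been applied; the rollback removes it, so the funds transfer either fully completes or does not happen.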

### Data Management Overview

**Physical Data Management (Bookkeeping)**
- _**Durability**: Prone to physical damage (rodents, humidity)_
- _**Scalability**: Difficult to maintain over time with multiple registers_
- _**Security**: Susceptible to tampering_
- _**Retrieval**: Time-consuming to find previous entries_
- _**Consistency**: Vulnerable to human errors_

**Electronic Data Management (Spreadsheets)**
- _**Durability**: Less prone to physical damage_
- _**Scalability**: Easier to search, insert, and modify records_
- _**Security**: Can be password protected_
- _**Usability**: Reduces manual effort in computations_

**Challenges with File Systems**
- _**Efficiency Issues**: Slow with growing data_
- _**Limitations**: Spreadsheet row limits_
- _**Data Consistency**: Challenges with concurrent processing_
- _**Access Control**: Difficult to manage permissions centrally_
- _**Risk**: System crashes can cause data loss_

### History of Database Systems

**1950s and early 1960s**
- *Data processing with magnetic tapes*
- *Sequential access only*
- ***Punched cards for input***

**Late 1960s and 1970s**
- ***Hard disks allow direct data access***
- *Network and hierarchical data models prevalent*
- ***Ted Codd defines relational data model***
- *IBM Research begins System R prototype*
- *UC Berkeley starts Ingres prototype*
- ***High-performance transaction processing***

**1980s**
- *Research prototypes evolve into commercial systems*
- ***SQL standardization***
- ***Parallel and distributed databases***
- *Object-oriented database systems*

**1990s**
- *Large decision support and data mining*
- ***Multi-terabyte data warehouses***
- *Rise of Web commerce*

**Later 2000s**
- *Giant data storage systems emerge*

### Comparison: File Handling via Python vs DBMS

**Scalability (Data Amount)**
- *Py: Difficult for insert, update, query; DBMS: Built-in scalability for large data.*

**Scalability (Structure Changes)**
- *Py: Hard to alter record structure; DBMS: Easy attribute changes with SQL.*

**Execution Time**
- *Py: Seconds (e.g., a few seconds for a 1GB file); DBMS: Milliseconds, thanks to indexing and query optimization.*

**Persistence**
- *Py: Manual updates to files; DBMS: Automatic data persistence.*

**Robustness**
- *Py: Backup and recovery must be handled manually; DBMS: Automated backup and recovery.*

**Security**
- *Py: Relies on OS-level file permissions, which are hard to make fine-grained; DBMS: User-specific access control at the database level.*

**Productivity**
- *Py: Coding-intensive for data management; DBMS: Streamlined with standard queries.*

**Arithmetic Operations**
- *Py: Supports arithmetic; DBMS: Limited arithmetic capabilities.*

**Costs**
- *Py: Low hardware, software costs; DBMS: High investment needed.*

### Comparison (Detailed)

**Scalability**
- *Py: Efficiency decreases as records increase due to search time and OS limitations.*
- *DBMS: Efficiently scales with built-in indexing for quick data access.*

**Structural Change**
- *Py: Adding/removing attributes requires manual handling per record.*
- *DBMS: Attributes can be added with default values via `ALTER TABLE`; constraints help ensure attributes are removed safely.*
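
To make the contrast concrete, a hedged sketch of adding a hypothetical `Department` attribute to every record. With a flat file, every record is read and rewritten; with SQL (SQLite via Python's `sqlite3`, assumed as the DBMS), one `ALTER TABLE ... ADD COLUMN ... DEFAULT` statement covers the existing rows.

```python
import csv
import io
import sqlite3

# --- File-based: rewrite every record to add the new attribute. ---
original = "RollNumber,Name\n1,Alice\n2,Bob\n"
reader = csv.DictReader(io.StringIO(original))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["Department"], lineterminator="\n")
writer.writeheader()
for row in reader:  # manual, per-record handling
    row["Department"] = "CSE"  # hypothetical default value
    writer.writerow(row)
print(out.getvalue())

# --- DBMS: one declarative statement; existing rows get the default. ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (RollNumber INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.execute("ALTER TABLE students ADD COLUMN Department TEXT DEFAULT 'CSE'")
print(conn.execute("SELECT * FROM students ORDER BY RollNumber").fetchall())
# [(1, 'Alice', 'CSE'), (2, 'Bob', 'CSE')]
```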

**Time and Efficiency**
- *Py: Quick for small datasets; slower than DBMS for large datasets. 1GB file: a few seconds.*
- *DBMS: Fast, even for large datasets, due to query optimization and indexing. 1GB file: a few milliseconds.*

**Programmer's Productivity**
- *Py: Manual enforcement of data constraints; requires extensive maintenance.*
- *DBMS: Built-in mechanisms ensure data consistency and integrity, reducing programmer effort.*

**Arithmetic Operations**
- *Py: Extensive support for arithmetic and logical operations.*
- *DBMS: Limited support; complex computations must be done externally to SQL.*
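
One common way to live with this split, sketched under assumptions (SQLite via Python's `sqlite3`; the `marks` table and scores are hypothetical): let SQL do the filtering and simple aggregates such as `AVG`, then pull the values into Python for computations that are awkward to express in SQL, here a sample standard deviation from the standard `statistics` module.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (RollNumber INTEGER, Subject TEXT, Score REAL)")
conn.executemany(
    "INSERT INTO marks VALUES (?, ?, ?)",
    [(1, "DBMS", 78), (2, "DBMS", 85), (3, "DBMS", 92), (1, "Python", 88)],
)

# Simple arithmetic and aggregation stay in SQL.
avg_dbms = conn.execute("SELECT AVG(Score) FROM marks WHERE Subject = 'DBMS'").fetchone()[0]

# Heavier statistics are computed in Python on the fetched values.
scores = [s for (s,) in conn.execute("SELECT Score FROM marks WHERE Subject = 'DBMS'")]
spread = statistics.stdev(scores)

print(avg_dbms, round(spread, 2))  # 85.0 7.0
```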

**Costs and Complexity**
- *Py: Low-cost setup and maintenance; minimal specialized resources.*
- *DBMS: High initial and ongoing costs; requires dedicated hardware, software, and DBAs.*

### Levels of Abstraction

**Physical Level**
- *Describes how records are stored physically on a storage device*

**Logical Level**
- *Defines data stored in database and relationships among data fields*

**View Level**
- *Application programs hide data type details*
- *Views can hide sensitive information (e.g., employee salary) for security*
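
The view level is typically realised with SQL views. A minimal sketch, assuming SQLite via Python's `sqlite3`; the `employees` table and `employee_directory` view are hypothetical. The view omits the salary column, so anyone querying it never sees salaries. A column is then added to the base table, and the view's output does not change, which is the idea behind logical data independence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Sales", 50000), (2, "Ravi", "IT", 65000)],
)

# View level: expose only non-sensitive columns.
conn.execute("CREATE VIEW employee_directory AS SELECT id, name, dept FROM employees")


def directory(conn):
    return conn.execute("SELECT * FROM employee_directory ORDER BY id").fetchall()


before = directory(conn)
# Logical-level change: the base table gains a new attribute.
conn.execute("ALTER TABLE employees ADD COLUMN phone TEXT")
after = directory(conn)

print(before)           # [(1, 'Asha', 'Sales'), (2, 'Ravi', 'IT')]
print(before == after)  # True
```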

### Schema and Instances

- _**Schema**: Defines the structure and organization of data._
- _**Instance**: Represents the actual data values stored in the database at a specific point in time._
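
The notes later compare a schema to a variable's type and an instance to its value. That analogy can be sketched in Python (a loose analogy, not how a DBMS stores schemas; the `Student` fields are a hypothetical subset): a frozen dataclass plays the schema, and the tuple of records held at a given moment plays the instance.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Student:
    """Schema: the structure every record must follow (like a type)."""
    RollNumber: int
    Name: str
    Email: str


# Instance at time t0: the actual data stored right now (like a value).
instance_t0 = (Student(1, "Alice", "alice@example.com"),)

# An insert produces a new instance; the schema itself is unchanged.
instance_t1 = instance_t0 + (Student(2, "Bob", "bob@example.com"),)

print([f.name for f in fields(Student)])  # ['RollNumber', 'Name', 'Email']
print(len(instance_t0), len(instance_t1))  # 1 2
```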

### Physical Data Independence

- _**Definition**: Ability to modify the physical schema without altering the logical schema._
- _**Implication**: Applications rely on the logical schema for data access and operations, regardless of changes in how data is physically stored._
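
A sketch of physical data independence, assuming SQLite via Python's `sqlite3` (the table, index name and data are hypothetical). Creating an index changes how rows are located physically, which `EXPLAIN QUERY PLAN` reveals, while the application's query text and its result stay the same. The plan's exact wording varies across SQLite versions, so the comments show typical output only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (RollNumber INTEGER PRIMARY KEY, Name TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(i, f"Student{i}", f"s{i}@example.com") for i in range(1, 1001)],
)

QUERY = "SELECT * FROM students WHERE Name = ?"  # application query, fixed


def run(conn):
    """Return the query result and the engine's chosen access path."""
    rows = conn.execute(QUERY, ("Student500",)).fetchall()
    plan = " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Student500",)))
    return rows, plan


rows_before, plan_before = run(conn)
# Physical-level change: add an access structure.
conn.execute("CREATE INDEX idx_students_name ON students (Name)")
rows_after, plan_after = run(conn)

print(plan_before)                # typically "SCAN students"
print(plan_after)                 # typically "SEARCH students USING INDEX idx_students_name (Name=?)"
print(rows_before == rows_after)  # True
```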

### Data Models

- _**Definition**: A collection of tools for describing:_
  - *Data*
  - *Data relationships*
  - *Data semantics*
  - *Data constraints*
  
- _**Relational Model**: Organizes data into tables (relations) consisting of rows (tuples) and columns (attributes), linked by keys._
- _**Entity-Relationship Data Model**: Visualizes entities (objects) and their relationships in databases, aiding in database design._
- _**Object-Based Data Models**: Extend relational models to include object-oriented features, supporting complex data types and inheritance._

#### Other Older Models

- _**Network Model**: Organizes data in a graph-like structure with nodes representing records and edges defining relationships._
- _**Hierarchical Model**: Structures data in a tree-like format, where each record except the root has exactly one parent record and may have multiple child records._
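
To contrast two of these models, a hedged sketch of the same hypothetical department/instructor data in two shapes, using plain Python structures as stand-ins rather than a real DBMS. In the hierarchical form each instructor sits under exactly one parent department. In the relational form the tables are flat and linked by a key value (`dept_name`) that a join follows.

```python
# Hierarchical model: a tree; each child record has exactly one parent.
hierarchical = {
    "Physics": {"building": "Watson", "instructors": [{"id": 22222, "name": "Einstein"}]},
    "Comp. Sci.": {"building": "Taylor", "instructors": [{"id": 10101, "name": "Srinivasan"},
                                                         {"id": 45565, "name": "Katz"}]},
}

# Relational model: flat tables (sets of tuples) linked by a key attribute.
department = {("Physics", "Watson"), ("Comp. Sci.", "Taylor")}  # (dept_name, building)
instructor = {(22222, "Einstein", "Physics"), (10101, "Srinivasan", "Comp. Sci."),
              (45565, "Katz", "Comp. Sci.")}                    # (id, name, dept_name)


def instructors_in_building(building):
    """Join instructor and department on dept_name, then filter by building."""
    return sorted(name for (_, name, d) in instructor
                  for (dname, bldg) in department if d == dname and bldg == building)


# The tree answers the same question by walking parent -> children.
tree_answer = sorted(i["name"] for dept in hierarchical.values()
                     if dept["building"] == "Taylor" for i in dept["instructors"])

print(instructors_in_building("Taylor"))                 # ['Katz', 'Srinivasan']
print(tree_answer == instructors_in_building("Taylor"))  # True
```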

### Data Definition Language (DDL)
- *Defines database schema.*
- *Generates data dictionary.*

#### Data Dictionary

- *Contains metadata (information about data), including:*
  - ***Database schema***
    - *Structure and organization of data.*
  - ***Integrity constraints***
    - *Ensure data accuracy and consistency.*
    - *Example: Primary Key (e.g., ID uniquely identifies instructors).*
  - ***Authorization information***
    - *Controls access to data.*
    - *Determines who can access what.*
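
As one concrete example of a data dictionary, SQLite (used here through Python's `sqlite3`; the `instructor` table is hypothetical) updates its system catalog whenever DDL runs. The schema text, including constraints, is stored in the `sqlite_master` table, and `PRAGMA table_info` reports each column and whether it belongs to the primary key. SQLite has no GRANT-style user authorization, so that part of the dictionary is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: defines the schema and, as a side effect, fills the catalog.
conn.execute(
    "CREATE TABLE instructor ("
    " ID TEXT PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary INTEGER CHECK (salary > 0))"
)

# Metadata: the stored schema definition.
name, sql = conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'").fetchone()
print(name)  # instructor
print(sql)   # the CREATE TABLE text, including the CHECK constraint

# Metadata: per-column info; a non-zero pk marks primary-key columns.
for cid, col, ctype, notnull, dflt, pk in conn.execute("PRAGMA table_info(instructor)"):
    print(col, ctype, "NOT NULL" if notnull else "", "PK" if pk else "")
```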

### Data Manipulation Language (DML)

- *Language for accessing and manipulating data organized by the data model (query language).*

- *Pure*: *Used for proving properties about computational power and for optimization.*
  - *Relational Algebra (focus in this course)*
  - *Tuple Relational Calculus*
  - *Domain Relational Calculus*

- *Commercial*: *Used in commercial systems.*
  - *SQL is the most widely used commercial language.*
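
Relational algebra treats a relation as a set of tuples. A toy sketch (plain Python, not a real query engine; the `students` relation is hypothetical) of two of its operators, selection σ and projection π, next to the SQL that expresses the same request through Python's `sqlite3`:

```python
import sqlite3

# A relation: attribute names plus a set of tuples.
ATTRS = ("RollNumber", "Name", "Gender")
students = {(1, "Alice", "F"), (2, "Bob", "M"), (3, "Chitra", "F")}


def selection(relation, attrs, predicate):
    """σ: keep tuples satisfying the predicate (called with a dict per tuple)."""
    return {t for t in relation if predicate(dict(zip(attrs, t)))}


def projection(relation, attrs, keep):
    """π: keep only the named attributes; set semantics drop duplicates."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in relation}


# π_Name(σ_Gender='F'(students))
algebra = projection(selection(students, ATTRS, lambda r: r["Gender"] == "F"), ATTRS, ["Name"])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (RollNumber INTEGER, Name TEXT, Gender TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", sorted(students))
# DISTINCT gives the SQL query the same set semantics as the algebra.
sql = set(conn.execute("SELECT DISTINCT Name FROM students WHERE Gender = 'F'"))

print(sorted(algebra))  # [('Alice',), ('Chitra',)]
print(algebra == sql)   # True
```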

### Database Design

- **Logical Design**
  - *Defines database schema.*
  - *Involves:*
    - *Choosing relation schemas.*
    - *Deciding how attributes are distributed among the relation schemas.*

- **Physical Design**
  - *Determines physical layout of database.*

### Design Approaches

- **Entity Relationship Model**
  - *Captures business requirements.*
  - *Models enterprise as entities and relationships.*
  - *Diagrammatically represented by an entity-relationship diagram.*

- **Normalization Theory**
  - *Computer Science perspective.*
  - *Formalizes and tests designs for inefficiencies.*
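
Normalization theory builds on functional dependencies (FDs): X → Y holds when any two tuples that agree on X also agree on Y. A minimal sketch, assuming a hypothetical flat `enrolment` relation, that tests an FD against one instance. An instance can refute an FD but cannot prove it holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in this instance, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same lhs, different rhs: FD violated
    return True


# Hypothetical flat design mixing enrolment and course facts in one table.
enrolment = [
    {"roll": 1, "course": "DBMS", "instructor": "Rao", "room": "A1"},
    {"roll": 2, "course": "DBMS", "instructor": "Rao", "room": "A1"},
    {"roll": 1, "course": "Python", "instructor": "Iyer", "room": "B2"},
]

print(fd_holds(enrolment, ["course"], ["instructor", "room"]))  # True
print(fd_holds(enrolment, ["roll"], ["course"]))                # False
```

Because `course` alone determines `instructor` and `room` while a row is identified by (`roll`, `course`), those values repeat for every enrolled student. That redundancy is what a normalized design would remove by moving them into a separate course table.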

### Object-Relational Data Models

- **Relational Model**
  - *Flat, atomic values.*
  
- **Object-Relational Data Models**
  - *Extend relational model with object-oriented features.*
  - *Support complex attribute types (e.g., non-atomic values, nested relations).*
  - *Preserve relational foundations and declarative data access.*
  - *Ensure upward compatibility with existing relational languages.*
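
The difference between flat atomic values and non-atomic ones can be pictured in plain Python (an illustration, not an object-relational DBMS; the `phones` attribute and numbers are hypothetical). In the nested form, one attribute holds a set of phone numbers. In the flat relational form, every value is atomic and the set is spread across rows.

```python
# Object-relational style: an attribute may hold a non-atomic value (a set).
nested = [
    {"RollNumber": 1, "Name": "Alice", "phones": frozenset({"98400", "98411"})},
    {"RollNumber": 2, "Name": "Bob", "phones": frozenset({"99000"})},
]

# Flat relational style: only atomic values, one row per (student, phone).
flat = sorted((s["RollNumber"], s["Name"], p) for s in nested for p in s["phones"])


def nest(rows):
    """Rebuild the nested form from the flat rows by grouping per student."""
    grouped = {}
    for roll, name, phone in rows:
        grouped.setdefault((roll, name), set()).add(phone)
    return [{"RollNumber": r, "Name": n, "phones": frozenset(ps)}
            for (r, n), ps in sorted(grouped.items())]


print(flat)                  # [(1, 'Alice', '98400'), (1, 'Alice', '98411'), (2, 'Bob', '99000')]
print(nest(flat) == nested)  # True
```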

### XML: eXtensible Markup Language

- **Definition**
  - *Defined by the World Wide Web Consortium (W3C).*
  - *Describes name-value pairs using tags.*

- **Original Purpose**
  - *Initially a document markup language, not database-focused.*

- **Data Exchange**
  - *Enables creation of custom tags and structures for data exchange.*
  - *Used widely for data interchange formats beyond documents.*

- **Tools and Support**
  - *Numerous tools for parsing, browsing, and querying XML documents.*

#### Demo XML Data of Students

```xml
<students>
  <student>
    <RollNumber>1</RollNumber>
    <Name>Alice</Name>
    <Gender>Female</Gender>
    <DateOfBirth>2000-03-15</DateOfBirth>
    <Email>alice@example.com</Email>
    <Aadhar>123456789012</Aadhar>
  </student>
  <student>
    <RollNumber>2</RollNumber>
    <Name>Bob</Name>
    <Gender>Male</Gender>
    <DateOfBirth>2001-05-20</DateOfBirth>
    <Email>bob@example.com</Email>
    <Aadhar>234567890123</Aadhar>
  </student>
  <!-- Add more students as needed -->
</students>
```
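
As one example of that tooling, Python's standard `xml.etree.ElementTree` module can parse student data like the above and answer simple queries with its limited XPath support. A sketch on a trimmed copy of the data (the Aadhar elements are dropped; the queries are chosen for illustration):

```python
import xml.etree.ElementTree as ET

XML = """
<students>
  <student>
    <RollNumber>1</RollNumber><Name>Alice</Name><Gender>Female</Gender>
    <DateOfBirth>2000-03-15</DateOfBirth><Email>alice@example.com</Email>
  </student>
  <student>
    <RollNumber>2</RollNumber><Name>Bob</Name><Gender>Male</Gender>
    <DateOfBirth>2001-05-20</DateOfBirth><Email>bob@example.com</Email>
  </student>
</students>
"""

root = ET.fromstring(XML.strip())

# Iterate over every <student> element and read a child's text.
names = [s.findtext("Name") for s in root.findall("student")]

# ElementTree's XPath subset supports [tag='text'] predicates.
females = [s.findtext("Email") for s in root.findall("student[Gender='Female']")]

print(names)    # ['Alice', 'Bob']
print(females)  # ['alice@example.com']
```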

### Database Engine Components

**1. Storage Manager**
- *Interface between low-level data and applications.*
- *Interacts with the OS file manager.*
- *Efficient storage, retrieval, and updates.*
- *Tasks:*
  - *Storage access.*
  - *File organization.*
  - *Indexing and hashing.*

**2. Query Processing**
- *Steps: Parsing, translation, optimization, evaluation.*
- *Cost estimation based on statistical information.*

**3. Transaction Management**
- *Ensures database consistency despite failures.*
- *A transaction is a collection of operations that performs a single logical function.*
- *Handles system and transaction failures.*
- *Concurrency control manages interactions among concurrent transactions.*
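
Why concurrency control is needed can be shown with a toy, deterministic simulation (plain Python, not how any DBMS schedules work; the balance and amounts are hypothetical). Two withdrawals from the same account each read the balance and then write back a new value. If both reads happen before either write, one update is lost; running them one after the other, the outcome concurrency control aims to guarantee, keeps both.

```python
def run_schedule(balance, schedule):
    """Execute (txn, op, amount) steps against one shared balance.

    Each transaction keeps a local copy of what it read, as a program would.
    """
    local = {}
    for txn, op, amount in schedule:
        if op == "read":
            local[txn] = balance           # read shared value into a local variable
        elif op == "write":
            balance = local[txn] - amount  # write back based on the (possibly stale) read
    return balance


# T1 withdraws 100, T2 withdraws 50, starting from 1000.
interleaved = [("T1", "read", 0), ("T2", "read", 0),
               ("T1", "write", 100), ("T2", "write", 50)]
serial = [("T1", "read", 0), ("T1", "write", 100),
          ("T2", "read", 0), ("T2", "write", 50)]

print(run_schedule(1000, interleaved))  # 950: T1's withdrawal is lost
print(run_schedule(1000, serial))       # 850: both withdrawals applied
```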



### Database Architecture

**Centralized**
- *Single database server.*
- *Limited scalability and fault tolerance.*
- *Simple setup and management.*

**Distributed**
- *Data distributed across multiple sites or servers.*
- *Improved fault tolerance and scalability.*
- *Complex data management and synchronization.*

**Cloud**
- *Database services provided via cloud infrastructure.*
- *Scalable, flexible, and cost-effective.*
- *Accessed via internet, offers on-demand resources.*

# Activity/Assignment Questions

- **Evolution of data management practices:**
  
  1. *Punched cards for input*
  2. *Hard disks for direct access to data*
  3. *Parallel and distributed databases*
  4. *Data warehousing*

- **Sybase**, **Informix**, **MySQL**, and **PostgreSQL** are all client/server RDBMSs.

- **File handling via Python vs DBMS**
  - *File handling via Python takes more time compared to DBMS.*
  - *DBMS is optimized for insert, update, and querying operations.*

- **Ease of handling records**
  - *DBMS is easier for insert, update, and querying compared to Python.*

- **Arithmetic operations**
  - *DBMS has limited support for complex arithmetic operations.*
  - *Python is better for extensive arithmetic and logical operations.*

- **Preferred Applications for DBMS**
  - *Applications with large datasets.*
  - *Applications with concurrent transactions.*

- **Preferred Applications for File Systems**
  - *Applications with small datasets.*
  - *Applications with no dedicated database administrators.*

- **Schemas in database systems**
  - *Describes the overall structure of the database system.*
  - *Similar to the type information of a variable in programming languages.*

- **Instances in database system**
  - *It is the actual content of the database at a particular instant of time.*
  - *Similar to the value of a variable in programming languages.*


- **Levels of Database Abstraction**
  - *Physical level: Describes how records are actually stored.*
  - *Logical level: Describes data and relationships.*
  - *View level: Application programs hide data details.*

- **Logical Data Independence**
  - *Changes at the logical level of DBMS do not affect the view level.*

- **Physical Data Independence**
  - *Changes at the physical level do not affect the logical and view levels.*

- **Programmer Working on Database Data Structures**
  - *Working at the Physical Level of abstraction.*

- **Defines and manipulates the schema of a database**
  - *Data Definition Language (DDL)*

- **Creates new records and manipulates existing records in a database**
  - *Data Manipulation Language (DML)*


- **Data Dictionary Content**
  - *Database schema*
  - *Integrity constraints*
  - *Authorizations*

- **Modification of Data Dictionary**
  - *Only commands from Data Definition Language (DDL) can lead to modifications in the Data Dictionary.*

- **Logical Design of a Database**
  - *How should a database be structured to fulfill business requirements?*
  - *What are the attributes to be recorded in the database?*
  - *How can the attributes be distributed among the collection of schemas?*

- **Physical Design of a Database**
  - *What should be the physical layout of the database?*


- **Entity Relationship Model (E-R Model)**
  - *Associated with a diagrammatic representation known as E-R diagram.*
  - *Models a real-world enterprise as collections of entities and relationships.*

- **Normalization Theory**
  - *Tries to formalize and evaluate a design as good or bad, and tests the quality of design.*


- **Database Design Models**
  - *Entity-Relationship Model is widely used in the planning and designing phase of a database system.*

- **Data Model for Dynamic Interconnected Data**
  - *Graph Model is suitable for systems maintaining large sets of complex interconnected data with dynamically changing semantics.*

- **Object-Relational Data Model**
  - *Like the relational model, it supports atomic values.*
  - *It allows attributes of tuples to have complex data types, including non-atomic values.*
  - *It provides upward compatibility with existing relational languages.*

- **XML (Extensible Markup Language)**
  - *Defined by the WWW Consortium (W3C).*
  - *Its ability to specify new tags and create nested tag structures made it versatile for exchanging data, not just documents.*
  - *A wide variety of tools are available for parsing, browsing, and querying XML documents/data.*


- **Concurrency Control Manager**
  - *Controls the interaction among concurrent transactions to ensure consistency of the database.*

- **Storage Manager**
  - *Provides the interface between the low-level data stored in the database and application programs.*


- **Complex Arithmetic Computation in DBMS**
  - *DBMS does NOT provide an efficient platform for doing complex arithmetic computations on the data.*
  
- **Creating Access Rules**
  - *It is NOT easier to create access rules in a file system compared to a DBMS.*

- **Data Redundancy**
  - *Storing multiple copies of the same data within the system increases data redundancy.*

- **Handling Exceptions**
  - *try-except blocks in Python are used for handling exceptions.*
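
A short sketch of that pattern applied to the data-management examples in these notes, assuming Python's `sqlite3` (the table, values and helper names are hypothetical): catching the engine's `sqlite3.IntegrityError` for a duplicate key, and `FileNotFoundError` for a missing data file, instead of letting the program crash.

```python
import os
import sqlite3
import tempfile


def insert_student(conn, roll, name):
    """Try an insert; report constraint violations instead of crashing."""
    try:
        with conn:  # rolls back the failed statement's transaction
            conn.execute("INSERT INTO students (RollNumber, Name) VALUES (?, ?)", (roll, name))
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"
    else:
        return "inserted"


def read_lines(path):
    """File-system counterpart: a missing file raises FileNotFoundError."""
    try:
        with open(path) as f:
            return f.read().splitlines()
    except FileNotFoundError:
        return []


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (RollNumber INTEGER PRIMARY KEY, Name TEXT)")
print(insert_student(conn, 1, "Alice"))  # inserted
print(insert_student(conn, 1, "Bob"))    # rejected: ... (message text depends on SQLite version)
print(read_lines(os.path.join(tempfile.mkdtemp(), "missing.csv")))  # []
```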