### <span style="color:darkred">**Chapter 1 - Doing It With Tables (1-17)**</span>
---

#### <span style="color:blue">**Introduction**</span>

We are going to learn about the concepts, principles, techniques, and practices of database design (**DBD**), so that we can put to more effective use the "tools of the trade" - which include database management systems (**DBMS**), SQL, and stored procedures.

- DBMSs are essential tools for building most applications that store and retrieve data and for managing all the details of where to put the bits and bytes and how to get them out again.

- We are also going to go into enough detail that senior technology managers who want to understand the subject more deeply can make better decisions about the architecture of complex systems of which relational DBs are a part.

- Today, databases (**DB**) underlie almost all significant business and professional computer applications. Apart from a few non-relational DBs developed for high-volume transaction processing, most of the new DBs created since the mid-1980s have been relational DBs.

- Although you may read about DB systems based on "new" models for "big data", these complement, rather than replace, relational DBMSs.

> <span style="color:brown">*What DB models are optimized for AI?*</span>

- Prior to relational DBs, there were two data models:
  - Hierarchical model
  - Network model

#### <span style="color:blue">**The Early Days of DBMSs**</span>

DBMSs have existed since the 1960s. In those early days, DBs were based on hierarchical and network models.

- Hierarchical models were path dependent: reaching a piece of data meant following a chain from a starting point to an ending point, and this path dependency created ambiguity in the logic needed to extract information from the model.

- A lot of thought had to go into what queries users might make of the DB. If the designer got the model wrong, the user would have great difficulty in getting to the required data.

#### <span style="color:blue">**The Relational Model**</span>

Relational DBMSs represent the third type of model and the one that is almost universally accepted today.

The relational DB was introduced by **Edgar Frank Codd** in 1970 in his paper "A Relational Model of Data for Large Shared Data Banks", *Communications of the ACM, Volume 13, Number 6, June 1970*, pp. 377-387.

- In Codd's original paper, what we call tables, columns, and rows today were referred to as relations, domains (later called attributes), and tuples respectively. The *degree* of a relation is its number of columns.

#### <span style="color:blue">**Relational DBMS Products Emerge**</span>

It took about 12 years after Codd's original paper for reliable DBMS software to be written, debugged, and brought to market. Two of the first DBMSs to be adopted by business in the early 1980s were **Ingres** and **Oracle**.

For over a decade after that, the old "network" DBMSs continued to be widely used for existing business applications because relational DBMSs consumed a lot of computing power.

To this day, only ***relational*** from Codd's original terminology continues to be used when discussing DBMSs, where relational means "based on relations" or, more simply, "based on tables".

A "query" tells the DB to make associations between the columns of two or more tables at the time the query is executed.

- There is a subtle but important point here: connections between tables exist in the minds of the DB designer and DB users, but for the DBMS these connections do not exist outside the context of the queries.
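
A minimal sketch of this point, using Python's built-in `sqlite3` module. The `author` and `book` tables and their contents are hypothetical, chosen only for illustration. No relationship between the two tables is declared anywhere; the association exists only in the `WHERE` clause, at the time the query runs.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO book VALUES (10, 'First Title', 1), (11, 'Second Title', 2);
""")

# The query itself states which columns of the two tables are associated.
rows = conn.execute("""
    SELECT author.name, book.title
    FROM author, book
    WHERE author.author_id = book.author_id
    ORDER BY book.book_id
""").fetchall()

print(rows)
```

A different query could just as easily associate the same tables on other columns; the DBMS holds no opinion about which association is the "right" one.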

<h7> <span style="color:blue"><b>Query Languages</b></span> </h7>

After defining the relational DB, Codd next defined a query language that users would use to interact with relational DBs.

- However, because the data in a relational DB was to be stored in a radically different way, the relational model required a completely new query language.

When retrieving data, a query operates in one of two ways:

- **Key-based** queries that retrieve rows based on filters against key columns.
- **Non-key-based** queries that retrieve rows based on filters against non-key columns.

Because retrieval of data according to values in non-key columns cannot take advantage of indexes, non-key-based queries place a heavier processing load on the DBMS relative to key-based queries and take longer to execute.

- Relational DBMSs internally generate sorting indexes on key columns and therefore DB users are not aware of their existence.
- If users are found to be performing a lot of non-key-based queries, DB admins can instruct the DBMS to generate a secondary set of sorting indexes for those non-key columns.
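
The difference can be observed, as a rough sketch, with SQLite's `EXPLAIN QUERY PLAN` statement via Python's `sqlite3` module. The `customer` table, its data, and the index name `idx_customer_city` are assumptions made up for this example, and the exact wording of the plan text varies between SQLite versions; other DBMSs expose their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(i, f"name{i}", f"city{i % 10}") for i in range(1, 101)],
)

def plan(sql):
    """Return SQLite's description of how it would execute `sql`."""
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Key-based query: the DBMS can search directly on the key.
key_plan = plan("SELECT * FROM customer WHERE customer_id = 42")

# Non-key-based query: without an index, every row has to be scanned.
nonkey_plan = plan("SELECT * FROM customer WHERE city = 'city3'")

# A DB admin adds a secondary index on the non-key column.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
indexed_plan = plan("SELECT * FROM customer WHERE city = 'city3'")

print(key_plan)
print(nonkey_plan)
print(indexed_plan)
```

The first and third plans should report a search, while the second reports a full scan of the table.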

In the early days query languages developed in two modes:

- Those using **relational algebra** that implemented procedural methods to extract information.
- Those using **relational calculus** that implemented a declarative method to extract information. This mode came to be preferred, because putting the way queries were executed entirely in the hands of the user (as the procedural mode does) turned out to be a poor choice.
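
The contrast between the two modes can be sketched as follows. This is an illustration only, not a real relational algebra implementation: the helper functions `select`, `join`, and `project`, and the employee and department data, are all hypothetical. In the procedural version the user decides which operation runs and in what order; in the declarative version (SQL, via `sqlite3`) the user describes the result and the DBMS decides how to produce it.

```python
import sqlite3

employees = [
    {"emp_id": 1, "name": "Ana", "dept_id": 10},
    {"emp_id": 2, "name": "Ben", "dept_id": 20},
    {"emp_id": 3, "name": "Cy", "dept_id": 10},
]
departments = [
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 20, "dept_name": "Research"},
]

# --- Procedural, algebra-like: the user chooses each step and its order ---
def select(rows, predicate):
    """Keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def join(left, right, column):
    """Combine rows from both inputs that agree on `column`."""
    return [{**l, **r} for l in left for r in right if l[column] == r[column]]

def project(rows, columns):
    """Keep only the named columns of each row."""
    return [tuple(r[c] for c in columns) for r in rows]

sales = select(departments, lambda d: d["dept_name"] == "Sales")
procedural_result = project(join(employees, sales, "dept_id"), ["name"])

# --- Declarative, calculus-like: describe the result, not the steps ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.executemany("INSERT INTO employee VALUES (:emp_id, :name, :dept_id)", employees)
conn.executemany("INSERT INTO department VALUES (:dept_id, :dept_name)", departments)
declarative_result = conn.execute("""
    SELECT employee.name
    FROM employee, department
    WHERE employee.dept_id = department.dept_id
      AND department.dept_name = 'Sales'
    ORDER BY employee.emp_id
""").fetchall()

print(procedural_result, declarative_result)
```

Both versions return the same rows; what differs is who decides the order of the selection and the join.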

<h7> <span style="color:blue"><b>DBMSs Today</b></span> </h7>

<h7> <span style="color:blue"><b>Schemas, Subschemas, and Views</b></span> </h7>

<h7> <span style="color:blue"><b>Normalization</b></span> </h7>

<h7> <span style="color:blue"><b>The Story So Far</b></span> </h7>

<h7> <span style="color:blue"><b>Summary of Jargon</b></span> </h7>

<h4> <span style="color:darkred"><b>Chapter 2 - Thinking About Data More Clearly (18-35)</b></span> </h4>

<h7> <span style="color:blue"><b>The Entity-Relationship Model (E-R Model)</b></span> </h7>

<h7> <span style="color:blue"><b>Key Selection</b></span> </h7>

<h7> <span style="color:blue"><b>Attributes of Relationships</b></span> </h7>

<h7> <span style="color:blue"><b>Summary of Jargon (Part 2)</b></span> </h7>

<h7> <span style="color:blue"><b>Drawing an E-R Diagram</b></span> </h7>

<h7> <span style="color:blue"><b>Using an E-R Diagram to Design a Database</b></span> </h7>

<h7> <span style="color:blue"><b>Naming Tables</b></span> </h7>

<h7> <span style="color:blue"><b>Normalization</b></span> </h7>

<h7> <span style="color:blue"><b>Creating the Database</b></span> </h7>

<h7> <span style="color:blue"><b>A Brief Note on the Writing Conventions Used in this Book</b></span> </h7>

<h7> <span style="color:blue"><b>Case-Sensitivity in DBMSs</b></span> </h7>

<h7> <span style="color:blue"><b>The Database Design Process Summarized</b></span> </h7>

<h4> <span style="color:darkred"><b>Chapter 3 - Nulls, Keys, and Cardinality (36-43)</b></span> </h4>

<h7> <span style="color:blue"><b>The Final Steps in Creating Tables</b></span> </h7>

<h7> <span style="color:blue"><b>NULL and NOT NULL Constraints</b></span> </h7>

<h7> <span style="color:blue"><b>Designation of Key Columns</b></span> </h7>

<h7> <span style="color:blue"><b>Cardinality of Relationships</b></span> </h7>

<h4> <span style="color:darkred"><b>Chapter 4 - Normalization of Relational Database Designs (44-54)</b></span> </h4>

<h7> <span style="color:blue"><b>Normalization: an Art, not a Science</b></span> </h7>

<h6>$\hspace{10pt}$ Normalization is not a "mechanical" exercise - nobody has managed to write a program that takes non-normalized database specifications as inputs (table names, column names for key / non-key attributes) and outputs a fully normalized database.<br><br>
$\hspace{10pt}$ Normalization is an intellectual exercise in that it requires an understanding of the meaning of the data entities and relationships.<br><br>
$\hspace{10pt}$ Whenever you see tables with non-key attributes, you have to think hard about whether each non-key attribute is fundamentally an attribute of the entities in the entity set from which the table was derived.<br>
$\hspace{10pt}$ Furthermore, you also have to think hard about the subtle rules that govern the values that attributes may take.</h6>

<h7> <span style="color:blue"><b>A Brief History of Normalization</b></span> </h7>

<h6>$\hspace{10pt}$ Codd identified the problem of anomalies arising from insertions and deletions (and, in some special cases, updates).<br><br>
$\hspace{10pt}$ Eleven years later, Ronald Fagin defined a condition called Domain/Key Normal Form (DK/NF) and showed that a DB normalized according to DK/NF cannot exhibit insertion/deletion anomalies (modification anomalies).<br><br>
$\hspace{10pt}$ According to Codd, 1NF is the least normalized form. At a minimum, all a 1NF table needs is one or more key columns. Codd went on to define 2NF and 3NF; 4NF and 5NF were added later by other researchers, notably Fagin.</h6>
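
A deletion anomaly of the kind Codd identified can be sketched with Python's `sqlite3` module. The tables `flat_order`, `customer`, and `customer_order`, and their contents, are hypothetical and chosen only to make the effect visible: in the flat design, the customer's city is recorded only inside order rows, so deleting the last order also deletes what was known about the customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Non-normalized: customer facts live only inside order rows.
    CREATE TABLE flat_order (
        order_id INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_city TEXT
    );
    INSERT INTO flat_order VALUES (1, 'Ana', 'Lisbon');

    -- Normalized: customer facts are stored once, in their own table.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE customer_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'Ana', 'Lisbon');
    INSERT INTO customer_order VALUES (1, 1);
""")

# Delete the customer's only order in both designs.
conn.execute("DELETE FROM flat_order WHERE order_id = 1")
conn.execute("DELETE FROM customer_order WHERE order_id = 1")

# Flat design: the city has vanished together with the order.
flat_cities = conn.execute(
    "SELECT customer_city FROM flat_order WHERE customer_name = 'Ana'"
).fetchall()

# Normalized design: the customer row is unaffected.
normalized_cities = conn.execute(
    "SELECT city FROM customer WHERE name = 'Ana'"
).fetchall()

print(flat_cities, normalized_cities)
```

The insertion anomaly is the mirror image: in the flat design there is nowhere to record a new customer's city until that customer places an order.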

<h7> <span style="color:blue"><b>Learning to Perform Normalization</b></span> </h7>

<h7> <span style="color:blue"><b>Good-Enough Normalization</b></span> </h7>

<h7> <span style="color:blue"><b>The Rule Most Often Broken</b></span> </h7>

<h7> <span style="color:blue"><b>Practicing Normalization</b></span> </h7>

<h4> <span style="color:darkred"><b>Chapter 5 - Some Practical Aspects of Using a DBMS (55-70)</b></span> </h4>

<h7> <span style="color:blue"><b>Introduction</b></span> </h7>

<h7> <span style="color:blue"><b>Classes of Database Users</b></span> </h7>

<h6> The value of most DBs lies in the fact that the data contained in them (a) is a result of the actions of many users and (b) is made available to many users. </h6>
<h6> In practice, most DB "users" are unaware that they are interacting with a shared DB, because they interact with application software, which in turn interacts with the DB. </h6>
<h6> Application software essentially "talks" to the DB using SQL. </h6>
<h6> Because DBs can have many users, one important characteristic of all major DBMSs is that they have a means of recognizing users and granting to those users the rights that are appropriate for their roles. </h6>
<h6> Users who access a DB directly would typically have rights to read, add, delete, and update rows in the DB. </h6>
<h6> Users who access a DB indirectly via application software will typically not have individual accounts on the DBMS. </h6>
<h6> Indirect users interact via an application software layer, and it is the application software that is granted access to the DB, via a generic account. </h6>

<h6> Read-only is the lowest level of privilege that can be given to a DB user or application software. </h6>
<h6> Various hands-on users include: </h6>
<h6> 1) DB designers - who are typically given fairly high-level privileges on development DBs but reduced privileges on the corresponding production DBs. </h6>
<h6> 2) DB administrators - who are responsible for keeping the DB in good working order and are typically given the highest level of privileges. </h6>
<h6> 3) Application programmers - who write and maintain application software that interacts with the DBs and have privileges similar to those of DB designers. </h6>
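
As a loose analogy for the read-only privilege level, the sketch below opens the same SQLite file twice through Python's `sqlite3` module: once normally, and once in read-only mode. This is an assumption-laden stand-in. SQLite has no user accounts or `GRANT` mechanism, so the read-only restriction here applies to the connection rather than to a user; server DBMSs enforce privileges per account. The file name `shared.db` and the `note` table are hypothetical.

```python
import pathlib
import sqlite3
import tempfile

# Create a small database file with a read-write connection.
path = pathlib.Path(tempfile.mkdtemp()) / "shared.db"
rw = sqlite3.connect(str(path))
rw.execute("CREATE TABLE note (note_id INTEGER PRIMARY KEY, body TEXT)")
rw.execute("INSERT INTO note (body) VALUES ('hello')")
rw.commit()
rw.close()

# Reopen the same file in read-only mode using SQLite's URI syntax.
ro = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)

# Reading is allowed.
read_result = ro.execute("SELECT body FROM note").fetchall()

# Writing is refused by the DBMS.
try:
    ro.execute("INSERT INTO note (body) VALUES ('blocked')")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
ro.close()

print(read_result, write_blocked)
```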

<h7> <span style="color:blue"><b>Stored Procedures</b></span> </h7>

<h7> <span style="color:blue"><b>Interacting with a DBMS</b></span> </h7>

<h7> <span style="color:blue"><b>Creating Tables</b></span> </h7>

<h7> <span style="color:blue"><b>Foreign Keys</b></span> </h7>

<h7> <span style="color:blue"><b>Table Aliases</b></span> </h7>

<h6> A table alias is a name (typically a very short one) that a user, or a stored procedure programmer, assigns to a table within a query or a stored procedure. </h6>
<h6> Table aliases make it needlessly hard for someone else to follow how an SQL command or stored procedure works, making maintenance unnecessarily difficult. </h6>
<h6> Table aliases are particularly confusing when different programmers use different aliases for the same table. </h6>
<h5> <span style="color:red">Don't use table aliases.</span> </h5>
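
For comparison, here is a small sketch of the same query written with full table names and with aliases, run through Python's `sqlite3` module. The `employee` and `department` tables and the aliases `e` and `d` are hypothetical. Both forms return identical rows; the difference is only in how easily a later reader can tell which table each column belongs to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO employee VALUES (1, 'Ana', 10), (2, 'Ben', 20);
""")

# Full table names: every column is visibly tied to its table.
with_full_names = conn.execute("""
    SELECT employee.name, department.dept_name
    FROM employee, department
    WHERE employee.dept_id = department.dept_id
    ORDER BY employee.emp_id
""").fetchall()

# Aliases: shorter, but the reader must look up what e and d stand for.
with_aliases = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employee e, department d
    WHERE e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

print(with_full_names == with_aliases)
```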