Databases today are essential to every business.

The power of databases:
- comes from a body of knowledge and technology that has developed over several decades
- is embodied in a specialized software package

Databases and communication between nodes are **core components** of the most complex networked systems.

## 1. What is a Database System?

A **Database system** (DBS) consists of a **Database** (DB) and a **Database Management System** (DBMS).

- A Database is a (typically very large) integrated collection of interrelated data which are stored in files.
    * Data describe information and activities about one or more related organizations (part of the real world).
- A Database Management System
    * is a collection of software packages designed to store, access, and manage databases.
    * It provides users and applications with an environment that is convenient and efficient to use.

![alt](Pictures\1_1.png)

## 2. Purpose of a Database System

- **Data integration** - All data are uniformly managed.
- **Efficient Data Access** - Database languages are provided to store, access and manage data.
- **Data Dictionary** - Contains all data about objects and structures of the database (metadata)
- **User/Application-Views** - Different views for different users and applications
- **Integrity Constraints** - are enforced by the DBMS
- **Security Mechanisms** - to protect data from security threats
- **Transactions** - combine sets of operations on data into logical units
- **Synchronization** of concurrent user transactions
- **Recovery** of data after a system crash
- **ad-hoc** queries, report generation, interfaces to other database systems, interfaces for application programming
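
Two of the items above, integrity constraints and transactions, can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for a DBMS; the `account` table, its columns, and the `transfer` helper are hypothetical names chosen for this example only.

```python
import sqlite3

# Hypothetical schema: an "account" table whose balance must never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one logical unit (a transaction)."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint was violated, so both updates were undone.
        return False


ok = transfer(conn, 1, 2, 30)         # within the available balance
rejected = transfer(conn, 1, 2, 500)  # would make the source balance negative
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

The second transfer fails on its second `UPDATE`, and the rollback also undoes the first one, so the data never show a half-finished transfer.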

## 3. History of Relational Database Systems

- 1970: Ted Codd (IBM) - relational model as conceptual foundation of relational DBS
- 1974: System R (IBM) - first prototype of an RDBMS
    - only two modules: RDS (optimizing SQL processor), RSS (access method);
      approx. 80,000 lines of code (PL/I, PL/S, Assembler), approx. 1.2 MB code size
    - query language SEQUEL
    - first installation 1977
- 1975: University of California at Berkeley (UCB) - Ingres
    - query language QUEL
    - predecessor of Postgres, Sybase,...
- 1979: Oracle Version 2
- Today: USD 24 billion market

## 4. Different Views of Data

A major purpose of a DBMS is to provide users with an abstract view of data, i.e. it hides details of how data are stored and maintained on a computer.

![alt](Pictures\1_2.png)

-  **Physical Level**
Describes how data records are actually stored and how files and indexes are organized and used

-  **Logical Level**
Sometimes also called **Conceptual Level**; describes what data are stored in the DBS in terms of entities and relationships.
Emphasis is on the logical structure of the database.

-  **View Level**
Describes how users and applications see the data.


Abstraction is achieved by describing each level in terms of a **database schema**, which, in turn, is based on a data model.


## 5. Data Models, Schemas, and Instances

A **Data Model** is a collection of concepts for describing
- data and relationships among data
- data semantics and data constraints
    
Object-Based logical Models:
- Entity-Relationship (ER) Model
- Object-Oriented (OO) Model

Record-Based logical Models
- Relational Model
- Network Model
- Hierarchical Model

A database **schema** is a description of a particular collection of data, using a given data **model**.

An **instance** of a database schema is the actual content of the database at a particular point in time.

Schemas exist at different levels of abstraction:
1. **Physical Schema**: storage structures associated with relations
2. **Conceptual (Logical) Schema**: typically builds the basis for designing a database.
3. **View (External) Schemas**: typically determined during requirements analysis (often require integration into one conceptual schema)

## 5.1 Data independence

Data independence is the ability to modify the schema definition at one level without affecting the schema definition at a higher level.

Achieved through the use of three levels of data abstraction (also called **three level schema architecture**):
- Logical Data Independence: ability to modify logical schema without causing application programs to be rewritten.
- Physical Data Independence: ability to modify physical schema without causing logical schema or applications to be rewritten (occasionally necessary to improve performance)

![alt](Pictures\1_3.png)
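
The following is a minimal sketch of both kinds of data independence, using Python's built-in `sqlite3` module. All table, view, and index names are hypothetical. The application only queries a view; the tables behind it are reorganised and an index is added without touching the application query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Version 1 of the logical schema: one table; the application only uses a view.
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO employee VALUES (1, 'Ada', 'London'), (2, 'Edgar', 'San Jose');
    CREATE VIEW employee_view AS SELECT id, name, city FROM employee;
""")

APP_QUERY = "SELECT name, city FROM employee_view ORDER BY id"
before = conn.execute(APP_QUERY).fetchall()

# Logical change: city moves to its own table.
# Only the view definition is adapted; APP_QUERY stays as it is.
conn.executescript("""
    DROP VIEW employee_view;
    CREATE TABLE address (emp_id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO address SELECT id, city FROM employee;
    CREATE TABLE employee_v2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO employee_v2 SELECT id, name FROM employee;
    DROP TABLE employee;
    CREATE VIEW employee_view AS
        SELECT e.id, e.name, a.city
        FROM employee_v2 AS e JOIN address AS a ON a.emp_id = e.id;
""")
after_logical_change = conn.execute(APP_QUERY).fetchall()

# Physical change: a new index alters how data can be accessed, not what is returned.
conn.execute("CREATE INDEX idx_address_city ON address(city)")
after_physical_change = conn.execute(APP_QUERY).fetchall()
```

In all three cases the application receives the same rows, which is the point of separating the levels.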

## 6. Database Languages

A Database Management System (DBMS) offers two different types of languages (for the user):

### 6.1 Data Definition Language (DDL)
Specification language (notation) for defining a database schema; includes syntax and semantics.

The DDL compiler generates a set of tables that is stored in the DBMS's data dictionary (which contains metadata, i.e. data about data).

Data storage and definition language - special type of DDL in which storage structures and access methods used by the DBS are specified.
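
As an illustration of DDL and the data dictionary, here is a minimal sketch with Python's built-in `sqlite3` module. The `product` table is a hypothetical example. In SQLite, the role of the data dictionary is played by the catalog table `sqlite_master`; other systems use different catalog names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statement: defines part of the schema, stores no user data.
conn.execute(
    "CREATE TABLE product ("
    " prod_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " price REAL)"
)

# The metadata produced by the DDL statement can be queried like ordinary data.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# PRAGMA table_info lists the columns recorded for a table (column 1 is the name).
columns = [row[1] for row in conn.execute("PRAGMA table_info(product)")]
```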


### 6.2. Data Manipulation Language (DML)
Language for **accessing** and **manipulating** the data that is organized according to the underlying data model.

Two classes of languages:
- Procedural - user specifies how required data is retrieved
- Declarative - user specifies what data is required without specifying how to get those data
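
The difference between the two classes can be sketched as follows. The employee data are made up for illustration; the procedural part is plain Python, the declarative part is SQL executed through the built-in `sqlite3` module.

```python
import sqlite3

rows = [("Ada", 36, 5200), ("Edgar", 47, 6100), ("Grace", 29, 4800)]

# Procedural style: the program spells out HOW to find the data
# (scan every record, test it, collect the matches, sort them).
procedural = []
for name, age, salary in rows:
    if salary > 5000:
        procedural.append(name)
procedural.sort()

# Declarative style: the query states WHAT is wanted; the DBMS chooses how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, age INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
declarative = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM employee WHERE salary > 5000 ORDER BY name"
    )
]
```

Both variants produce the same list of names, but only the first one fixes the access strategy in the application code.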

### 6.3. Data Administration

## 7. Database Design

1. Requirements Analysis
2. Conceptual Design
3. Logical Design
4. Data Definition
5. Physical Design
6. Implementation and Maintenance

![alt](Pictures\1_4.png)

## 7.1. Design steps by example

### - Requirements Analysis
![alt](Pictures\1_5.png)

### - Conceptual Design

![alt](Pictures\1_6.png)

### - Logical Design

![alt](Pictures\1_7.png)

### - Data Definition

![alt](Pictures\1_8.png)

## Exercises/ Questions - 1

### Question 1.
- We can store (very) large collections of data in a traditional file system or in a database system. What are the distinctions (advantages and disadvantages)?

### Answer
Advantages of DBs:
- Uniform access to data
- Encapsulation of data storage and maintenance tasks
- Special algorithms for data manipulations and searching
- Ad-hoc commands, easy-to-use syntax.

Disadvantages of DBs:
- Inner logic may be specific to a DBMS
- Require extra metadata
- Less flexibility
- Require extra resources for running database
- Performance may degrade in case of very large files
- Handle binary data poorly (storing it is possible, but inefficient)


Advantages of file system storage:
- Very high (usually) performance
- High compatibility
- High flexibility
- Also hides physical level of data

Disadvantages of file system storage:
- No instruments for data analysis and manipulation; users have to implement them themselves
- Higher possibility to break the system
- Low-level access to data (returns raw binary data)


#### Summary

When comparing a traditional file system with a database system, remember that database systems are built on top of a file system and rely on the file access it provides at a lower level. A file system is the better choice if you have binary data and need to implement your own algorithms to access it. A database is more useful if you want to store logical data such as integers, strings, or objects together with their semantics and relationships, and you need tools for maintaining these data but lack the time or intention to build them yourself.

## 8. Conceptual Database Design

The first step is the abstract representation of the structure of a database

Questions that are addressed during conceptual design:

- What are **the entities and relationships** of interest (mini-world)?
- **What information about entities and relationships** among entities needs to be stored in the database?
- What are the **constraints** (or business rules) that (must) hold for the entities and relationships?

Design is independent of all physical considerations (DBMS, OS, ...).

### 8.1. Entity-Relationship Data Model

The most common model for **Conceptual** Database Design is the **entity-relationship** model.

#### Entity-Relationship model (ER model)
- 1976 P.P.Chen
- "The Entity-Relationship Model - Toward a Unified View of Data"
- Today there are many extensions of the model

A database schema in the ER model can be represented pictorially (**Entity-Relationship diagram**)

#### 8.1.1. Entities

![alt](Pictures\1_9.png)

Entity $e_i$:
- real-world object or thing with an independent existence and which is distinguishable from other objects.


#### 8.1.2. Attributes

Attribute $a_i$:
- An entity is represented by a set of attributes (its descriptive **properties**), e.g. name, age, salary, price etc.
- Attribute **values** that describe each entity become a **major part of** the data eventually stored in a database.

Types:
- simple (atomic) or composite
- single-valued or multi-valued
- stored or derived
- each of the above can be an *optional* attribute, in case an entity may not have an applicable value for an attribute -> NULL.

With each attribute a **domain** is associated, i.e., a set of permitted values for an attribute.

Possible domains are *number, string, date* etc.
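
The attribute types listed above can be sketched with Python dataclasses. The `Employee` and `Name` classes and all field names are hypothetical; type annotations stand in for domains, and `None` stands in for NULL.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Name:
    """Composite attribute: built from several simple attributes."""
    first: str
    last: str


@dataclass
class Employee:
    emp_id: int                                      # simple, single-valued, stored
    name: Name                                       # composite
    birth_date: date                                 # stored; its domain is "date"
    phones: List[str] = field(default_factory=list)  # multi-valued
    office: Optional[str] = None                     # optional: None plays the role of NULL

    def age(self, today: date) -> int:
        """Derived attribute: computed from birth_date instead of being stored."""
        had_birthday = (today.month, today.day) >= (
            self.birth_date.month,
            self.birth_date.day,
        )
        return today.year - self.birth_date.year - (0 if had_birthday else 1)


e = Employee(1, Name("Ada", "Lovelace"), date(1990, 12, 10), ["555-0100", "555-0101"])
```

Keeping `age` as a method rather than a field mirrors the stored versus derived distinction: a stored age would become stale, while a derived one is recomputed from the stored birth date.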

#### 8.1.3 Entity Types and Sets

*Entity Type E* - Collection of entities that all have the same attributes, e.g. persons, cars, customers etc.

*Entity Set E* - Collection of entities of a particular entity type at any point in time; the entity set is typically referred to using the same name as the entity type.

![alt](Pictures\1_10.png)

#### 8.1.4. Key attributes

Entities of an entity type need to be distinguishable.

A **key** of an entity type $E$ is a set $K$ of one or more attributes whose values uniquely determine each entity in an entity set.
- Given any two distinct entities $e_1$ and $e_2$ in $E$, $e_1$ and $e_2$ cannot have identical values for each of the attributes in the key $K$.
- It is possible for $e_1$ and $e_2$ to agree in some of these attributes, but never in all attributes.

There can also be **more than one possible** key for an entity set. Then it is customary to pick one key as the 'preferred key', and to act as if that were the only key.

If there are no attributes with the key property, we define an **artificial key**.
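
The uniqueness condition in the key definition can be checked mechanically on a given entity set. The sketch below uses a hypothetical `is_key` helper and made-up person data. Note the limitation: passing the check on one entity set only shows that the attributes are not ruled out as a key, because a key must hold for every possible entity set of the type.

```python
def is_key(entities, attrs):
    """Return True if no two entities agree on all attributes in `attrs`."""
    seen = set()
    for e in entities:
        values = tuple(e[a] for a in attrs)
        if values in seen:
            return False  # two entities share all key values
        seen.add(values)
    return True


persons = [
    {"ssn": "111", "name": "Ann", "city": "Oslo"},
    {"ssn": "222", "name": "Ann", "city": "Bergen"},
    {"ssn": "333", "name": "Bo", "city": "Oslo"},
]

ssn_is_key = is_key(persons, ["ssn"])
name_is_key = is_key(persons, ["name"])              # two persons are called Ann
name_city_is_key = is_key(persons, ["name", "city"])
```

Here `name` and `city` together happen to be unique, which shows that two entities may agree on some key attributes as long as they never agree on all of them.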

#### 8.1.5. Relationships, Types, and Sets

##### 8.1.5.1 Relationship Type $R$
- describes a set of similar relationships
- An $n$-ary relationship type $R$ links $n$ entity types $E_1$, ...,$E_n$.


![alt](Pictures\1_11.png)

---
### Constraints on Relationship Types

Limit the number of possible combinations of entities that may participate in a relationship set.

There are two types of constraints:
- **cardinality ratio**
- **participation constraints**


## Cardinality Constraints

Types of **cardinality ratio** for binary relationships:

- **Many-to-Many, N:M**

![alt](Pictures\1_14.png)

"An employee can work in many departments ($\ge0$), and a department can have several employees."

- **Many-to-One, N:1**

Often called a **functional relationship**

![alt](Pictures\1_15.png)

"An employee can work in at most one department ($\le1$), and a department can have several employees."


- **One-to-Many, 1:N**

![alt](Pictures\1_16.png)

The reverse version of **N:1** relationship

- **One-to-One, 1:1**

![alt](Pictures\1_17.png)
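
To make the four ratios concrete, the sketch below classifies a binary relationship set given as pairs. The function name and the employee/department data are hypothetical. It reports the tightest ratio that the given pairs satisfy; the actual constraint is declared on the relationship type and must hold for every relationship set.

```python
from collections import Counter


def cardinality_ratio(pairs):
    """Classify a binary relationship set given as (e1, e2) pairs.

    Reading used here: in N:1 each left entity is related to at most
    one right entity (e.g. employee -> department).
    """
    pairs = set(pairs)
    left_counts = Counter(e1 for e1, _ in pairs)
    right_counts = Counter(e2 for _, e2 in pairs)
    left_at_most_one = all(c <= 1 for c in left_counts.values())
    right_at_most_one = all(c <= 1 for c in right_counts.values())
    if left_at_most_one and right_at_most_one:
        return "1:1"
    if left_at_most_one:
        return "N:1"
    if right_at_most_one:
        return "1:N"
    return "N:M"


many_to_one = cardinality_ratio([("alice", "sales"), ("bob", "sales"), ("carol", "hr")])
many_to_many = cardinality_ratio([("alice", "sales"), ("alice", "hr"), ("bob", "sales")])
one_to_one = cardinality_ratio([("alice", "sales"), ("bob", "hr")])
one_to_many = cardinality_ratio([("sales", "alice"), ("sales", "bob")])
```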

Instead of a cardinality ratio or participation constraint, more precise **cardinalities** can be associated with relationship types:

![alt](Pictures\1_20.png)

Each entity $e_1 \in E_1$ must participate in relationship set $R$ **at least** $min_1$ and **at most** $max_1$ times (analogous for $e_2 \in E_2$).

![alt](Pictures\1_21.png)
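
A (min, max) cardinality can be verified by counting how often each entity occurs in the relationship set. The following is a minimal sketch; the function name, its parameters, and the sample data are assumptions made for illustration, and `max_count=None` is used here to mean "no upper bound".

```python
from collections import Counter


def satisfies_min_max(entities, relationship_set, position, min_count, max_count):
    """Check that every entity occurs between min_count and max_count times
    at the given tuple position of the relationship set."""
    counts = Counter(rel[position] for rel in relationship_set)
    for e in entities:
        n = counts[e]  # Counter yields 0 for entities that never participate
        if n < min_count or (max_count is not None and n > max_count):
            return False
    return True


employees = {"alice", "bob", "carol"}
departments = {"sales", "hr", "legal"}
works_in = {("alice", "sales"), ("bob", "sales"), ("carol", "hr")}

# Every employee works in exactly one department: (1, 1).
employees_ok = satisfies_min_max(employees, works_in, 0, 1, 1)
# "legal" has no employees, so (1, *) fails for departments while (0, *) holds.
departments_need_one = satisfies_min_max(departments, works_in, 1, 1, None)
departments_optional = satisfies_min_max(departments, works_in, 1, 0, None)
```

A minimum of 1 corresponds to total participation, a minimum of 0 to optional participation.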

--- 

## Participation constraint

Specifies whether the existence of an entity $e \in E$ depends on being related to another entity via the relationship type $R$.

- **total**: each entity $e \in E$ _must_ participate in a relationship, it *cannot exist without* that participation (total participation aka existence dependency).

![alt](Pictures\1_18.png)

- **partial**: default; each entity $e \in E$ can participate in a relationship, but does not have to

##### 8.1.5.2 Relationship (instance)

- Association among two or more entities, e.g. "customer 'Bill' orders product 'SkyPhone' ".
- Each relationship in a relationship set $R$ of a relationship type involves entities

$$e_1 \in E_1,..., e_n \in E_n$$
$$R \subseteq \{(e_1, ..., e_n) \mid e_1 \in E_1, ... , e_n \in E_n\}$$

where $(e_1, ..., e_n)$ is a relationship.

![alt](Pictures\1_22.png)
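
The subset condition can be spelled out for a small binary case. In this sketch the entity sets and the `orders` relationship set are made-up examples; `itertools.product` builds the set of all possible combinations, of which a relationship set must be a subset.

```python
from itertools import product

customers = {"Bill", "Ann"}
products = {"SkyPhone", "SkyPad"}

# All possible (customer, product) combinations: the Cartesian product.
all_combinations = set(product(customers, products))

# A relationship set contains only the combinations that actually occur.
orders = {("Bill", "SkyPhone"), ("Ann", "SkyPhone")}
orders_is_valid = orders <= all_combinations

# A tuple that mentions an entity outside the entity sets is not allowed.
bad_orders = {("Bill", "Toaster")}
bad_is_valid = bad_orders <= all_combinations
```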

## Weak Entity Sets

Existence of the weak entities depends on the existence of their owner entity.

A weak entity type is an entity type whose identification (key) involves an N:1 relationship to an owner entity type.

- partial key - attribute(s) for identification of the weak entities relating to *one* owner entity
- full key = partial key of the weak entities + key of the owner entity set
![alt](Pictures\1_19.png)
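
As a sketch of partial and full keys, consider hypothetical rooms (weak entities) that belong to buildings (owner entities). The data and the `unique_on` helper are assumptions made for this example.

```python
def unique_on(entities, attrs):
    """Return True if the given attributes distinguish all entities."""
    values = [tuple(e[a] for a in attrs) for e in entities]
    return len(values) == len(set(values))


# "building" holds the key of the owner entity, "room_no" is the partial key.
rooms = [
    {"building": "A", "room_no": 101, "seats": 30},
    {"building": "A", "room_no": 102, "seats": 20},
    {"building": "B", "room_no": 101, "seats": 50},
]

# The partial key alone is ambiguous across owners ...
partial_key_global = unique_on(rooms, ["room_no"])
# ... but it identifies a room among the rooms of one owner ...
rooms_in_a = [r for r in rooms if r["building"] == "A"]
partial_key_within_owner = unique_on(rooms_in_a, ["room_no"])
# ... and together with the owner's key it identifies every room.
full_key = unique_on(rooms, ["building", "room_no"])
```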

##### 8.1.5.3 Degree of a relationship

Refers to the number of entity types that participate in the relationship type (binary, ternary, ...).

##### 8.1.5.4 Roles

The same entity type can participate more than once in a relationship type.

![alt](Pictures\1_12.png)


Role labels clarify semantics of a relationship, i.e. the way in which an entity participates in a relationship.

##### 8.1.5.5 Relationship Attributes

A relationship type can have attributes describing properties of a relationship.

- "customer 'Bon' ordered product 'SkyPhone' on October 15, 2014, for $650"
- These are attributes that cannot be associated with participating entities only, i.e., they only make sense in the context of a relationship.

Note that a relationship does not have key attributes. The identification of a particular relationship in a relationship set occurs through the keys of participating entities.

#### 8.2. Example of an Entity-Relationship Diagram

![alt](Pictures\1_13.png)

- **Rectangles** - entity types
- **Ellipses** - attributes
- **Diamonds** - relationship types
- **Lines** - link attributes to entity types and entity types to relationship types
- **Underlined names** - key attributes
- **Empty Circle** - represents an optional (null) attribute
- **Double Ellipses** - multi-valued attributes

## Handout / Exercise - 2

It is possible to define entities and their relationships in a number of different ways (in the same model)

- Should a real-world concept be modeled as an entity type, attribute, or relationship type? It should be modeled as an entity type when it reflects a real-world object and has information about its own characteristics.

- Is "Address" an attribute or an entity type? It depends on the use one wants to make of the address information.

![alt](Pictures\1_23.png)
- Here a supplier cannot offer the same product at different prices. Why? Because Price is a relationship attribute, a relationship is identified only by the keys of Supplier and Product, so two offers that differ only in price cannot be distinguished: Offers(Supplier_key, Product_key, Price)


Modeling price as an entity type resolves the problem: Offers(Supplier_Key, Product_Key, Price_Key); Price(Price_Key, Amount)
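
Both variants can be tried out in a minimal sketch with Python's built-in `sqlite3` module. Mapping the relationship types to tables with composite primary keys is an assumption made for this illustration, as are the table names `offers` and `offers2` and the sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Variant 1: price is a relationship attribute; the key is (supplier_key, product_key).
conn.execute(
    "CREATE TABLE offers ("
    " supplier_key TEXT, product_key TEXT, price REAL,"
    " PRIMARY KEY (supplier_key, product_key))"
)
conn.execute("INSERT INTO offers VALUES ('S1', 'P1', 9.99)")
try:
    conn.execute("INSERT INTO offers VALUES ('S1', 'P1', 12.50)")
    second_price_accepted = True
except sqlite3.IntegrityError:
    # Same supplier and product: the key cannot tell the two offers apart.
    second_price_accepted = False

# Variant 2: price is an entity type; price_key becomes part of the relationship's key.
conn.execute("CREATE TABLE price (price_key INTEGER PRIMARY KEY, amount REAL)")
conn.execute(
    "CREATE TABLE offers2 ("
    " supplier_key TEXT, product_key TEXT,"
    " price_key INTEGER REFERENCES price(price_key),"
    " PRIMARY KEY (supplier_key, product_key, price_key))"
)
conn.executemany("INSERT INTO price VALUES (?, ?)", [(1, 9.99), (2, 12.50)])
conn.executemany(
    "INSERT INTO offers2 VALUES (?, ?, ?)",
    [("S1", "P1", 1), ("S1", "P1", 2)],
)
n_offers = conn.execute("SELECT COUNT(*) FROM offers2").fetchone()[0]
```

In the first variant the second insert is rejected; in the second variant both offers coexist because they differ in `price_key`.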


![alt](Pictures\1_24.png)



## 9. Steps in Designing an Entity-Relationship Schema

1. Identify entity types (entity type vs. attribute)
2. Identify relationship types
3. Identify and associate attributes with entity and relationship types
4. Determine attribute domains
5. Determine key attributes for entity types
6. Associate (refined) cardinality ratio(s) with relationship types