# 2. Enforce data consistency with attribute constraints
**After building a simple database, it's now time to make use of its features. You'll specify data types in columns, enforce column uniqueness, and disallow NULL values in this chapter.**

## Better data quality with constraints
### Better data quality with constraints
So far you've learned how to set up a simple database that consists of multiple tables. Apart from storing different entity types, such as professors, in different tables, you haven't made much use of database features. In the end, the idea of a database is to push data into a certain structure – a pre-defined model, where you enforce data types, relationships, and other rules. Generally, these rules are called integrity constraints, although different names exist.

### Integrity constraints
Integrity constraints can roughly be divided into three types.

The simplest ones are probably the so-called **attribute constraints**. For example, a certain attribute, represented through a database column, could have the integer data type, allowing only integers to be stored in this column.

Secondly, there are so-called **key constraints**. Primary keys, for example, uniquely identify each record, or row, of a database table. 

Lastly, there are **referential integrity constraints**. In short, they glue different database tables together. 
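
To make the three categories concrete, here is a minimal sketch in PostgreSQL. The `departments` and `employees` tables and their columns are hypothetical and not part of the course database; they only illustrate where each kind of constraint appears in a table definition.

```sql
-- Hypothetical tables, for illustration only
CREATE TABLE departments (
    -- Key constraint: the primary key uniquely identifies each row
    department_id integer PRIMARY KEY,
    -- Attribute constraints: a data type plus NOT NULL
    name varchar(64) NOT NULL
);

CREATE TABLE employees (
    employee_id integer PRIMARY KEY,
    lastname varchar(64) NOT NULL,
    -- Referential integrity constraint: must match an existing department
    department_id integer REFERENCES departments (department_id)
);
```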

### Why constraints?
So why should you know about **constraints**? Well, they press the data into a certain form. With good constraints in place, people who type in birthdates, for example, always have to enter them in the same form. Data entered by humans is often very tedious to pre-process. So **constraints give you consistency**, meaning that a row in a certain table has exactly the same form as the next row, and so forth. All in all, they help to solve a lot of data quality issues. While enforcing constraints on human-entered data is difficult and tedious, database management systems can be a great help. In the next chapters and exercises, you'll explore how.

### Data types as attribute constraints
You'll start with attribute constraints in this chapter. In their simplest form, attribute constraints are data types that can be specified for each column of a table. Here you see the beginning of a list of all data types in PostgreSQL.

Name | Aliases | Description
:---|:---|:---
bigint | int8 | signed eight-byte integer
bigserial | serial8 | autoincrementing eight-byte integer
bit \[ (*n*) ] | - | fixed-length bit string
bit varying \[ (*n*) ] | varbit \[ (*n*) ] | variable-length bit string
boolean | bool | logical Boolean (true/false)
box | - | rectangular box on a plane
bytea | - | binary data ("byte array")
... | ... | ...
cidr | - | IPv4 or IPv6 network address


There are basic data types for numbers, such as `bigint`, or strings of characters, such as `character varying`. There are also more high-level data types like `cidr`, which can be used for IP addresses. Implementing such a type on a column would disallow anything that doesn't fit the structure of an IP.
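
As a rough sketch of how a high-level type acts as a constraint, the following uses a hypothetical `networks` table (not part of the course database). The first insert should succeed, while the commented-out one would be rejected because the value is not a valid network address.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE networks (
    network cidr
);

-- Accepted: a well-formed IPv4 network in CIDR notation
INSERT INTO networks (network)
VALUES ('192.168.100.128/25');

-- Rejected, because the text cannot be read as a network address:
-- INSERT INTO networks (network)
-- VALUES ('not-a-network');
```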

### Dealing with data types (casting)
Data types also restrict possible SQL operations on the stored data. For example, it is impossible to calculate a product from an integer *and* a text column, as shown here in the example. 
```sql
CREATE TABLE weather (
    temperature integer,
    wind_speed text);
SELECT temperature * wind_speed AS wind_chill
FROM weather;
```
The text column `wind_speed` may store numbers, but PostgreSQL doesn't know how to use text in a calculation.
```
operator does not exist: integer * text
HINT: No operator matches the given name and argument type(s).
You might need to add explicit type casts.
```

The solution for this is type casts, that is, on-the-fly type conversions. In this case, you can use the `CAST` function, followed by the column name, the AS keyword, and the desired data type, and PostgreSQL will turn `wind_speed` into an integer right before the calculation.
```sql
SELECT temperature * CAST(wind_speed AS integer) AS wind_chill
FROM weather;
```
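
PostgreSQL also offers a shorthand for casts, the `::` operator. The sketch below is equivalent to the `CAST` query above and reuses the same `weather` table; note that `::` is PostgreSQL-specific, whereas `CAST` is standard SQL.

```sql
-- Same calculation, using PostgreSQL's :: cast shorthand
SELECT temperature * wind_speed::integer AS wind_chill
FROM weather;
```

Either form fails at run time if a value in `wind_speed` cannot be interpreted as an integer, for example `'strong'`.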

## Types of database constraints
Which of the following are used to enforce a database constraint?
1. **Foreign keys**
2. ~SQL aggregate functions~
3. **The BIGINT data type**
4. **Primary keys**

*SQL aggregate functions are not used to enforce constraints, but to do calculations on data.*

## Conforming with data types
For demonstration purposes, I created a fictional database table that only holds three records. The columns have the data types `date`, `integer`, and `text`, respectively.
```sql
CREATE TABLE transactions (
 transaction_date date, 
 amount integer,
 fee text
);
```
Have a look at the contents of the `transactions` table.

The `transaction_date` accepts `date` values. According to [the PostgreSQL documentation](https://www.postgresql.org/docs/10/datatype-datetime.html#DATATYPE-DATETIME-INPUT), it accepts values in the form of `YYYY-MM-DD`, `DD/MM/YY`, and so forth.

Both columns `amount` and `fee` appear to be numeric; however, the latter is modeled as `text`.

- Execute the given sample code.
- As it doesn't work, have a look at the error message and correct the statement accordingly – then execute it again.

```sql
-- Let's add a record to the table
INSERT INTO transactions (transaction_date, amount, fee) 
VALUES ('2018-24-09', 5454, '30');

-- Doublecheck the contents
SELECT *
FROM transactions;
```

```
date/time field value out of range: "2018-24-09"
LINE 3: VALUES ('2018-24-09', 5454, '30');
                ^
HINT:  Perhaps you need a different "datestyle" setting.
```

```sql
-- Let's add a record to the table
INSERT INTO transactions (transaction_date, amount, fee) 
VALUES ('2018-09-24', 5454, '30');

-- Doublecheck the contents
SELECT *
FROM transactions;
```


```
transaction_date | amount | fee
-----------------|--------|----
1999-01-08       | 500    | 20
2001-02-20       | 403    | 15
2001-03-20       | 3430   | 35
2018-09-24       | 5454   | 30
```

*You can see that data types provide certain restrictions on how data can be entered into a table. This may be tedious at the moment of insertion, but saves a lot of headache in the long run.*
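
The hint in the error message refers to PostgreSQL's `DateStyle` setting, which controls how ambiguous date input is interpreted. The following sketch shows the idea; the literal values are only examples, and the session setting is changed purely for demonstration.

```sql
-- ISO 8601 input (YYYY-MM-DD) is unambiguous and accepted in any mode
SELECT CAST('2018-09-24' AS date);

-- Show how ambiguous input is currently interpreted (e.g. "ISO, MDY")
SHOW datestyle;

-- With day-month-year ordering, 24/09/2018 is read as 24 September 2018
SET datestyle = 'ISO, DMY';
SELECT CAST('24/09/2018' AS date);

-- Restore the default for the session
RESET datestyle;
```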

## Type CASTs
Type casts are a possible solution for data type issues. If you know that a certain column stores numbers as `text`, you can cast the column to a numeric type such as `integer`.
```sql
SELECT CAST(some_column AS integer)
FROM table_name;
```
Now, the `some_column` column is temporarily represented as `integer` instead of `text`, meaning that you can perform numeric calculations on the column.

- Execute the given sample code.
- As it doesn't work, add an `integer` type cast at the right place and execute it again.

```sql
-- Calculate the net amount as amount + fee
SELECT transaction_date, amount + fee AS net_amount 
FROM transactions;
```

```
operator does not exist: integer + text
LINE 2: SELECT transaction_date, amount + fee AS net_amount 
                                        ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
```

```sql
-- Calculate the net amount as amount + fee
SELECT transaction_date, amount + CAST(fee AS integer) AS net_amount
FROM transactions;
```

```
transaction_date | net_amount
-----------------|-----------
1999-01-08       | 520
2001-02-20       | 418
2001-03-20       | 3465
2018-09-24       | 5484
```

*Sometimes, type casts are necessary to work with data. However, it is better to store columns in the right data type in the first place.*
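
One way to act on this advice is to convert the `fee` column itself, so later queries no longer need a cast. This is a sketch rather than part of the exercise; it uses the `ALTER TABLE ... ALTER COLUMN ... TYPE ... USING` syntax covered later in this chapter, and it assumes every stored `fee` value is a valid whole number. If one is not, the statement fails and the table is left unchanged.

```sql
-- Permanently store fee as integer instead of text
ALTER TABLE transactions
ALTER COLUMN fee
TYPE integer
USING CAST(fee AS integer);

-- The calculation now works without a cast
SELECT transaction_date, amount + fee AS net_amount
FROM transactions;
```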

## Working with data types
Working with data types is straightforward in a database management system like PostgreSQL.

### Working with data types
As said before, data types are attribute constraints and are therefore implemented for single columns of a table. They define the so-called "domain" of values in a column, that is, what form these values can take and what form they cannot. Therefore, they also define what operations are possible with the values in the column, as you saw in the previous exercises. Through this, consistent storage is enforced, so a street number will always be an actual number, and a postal code will always have no more than 6 digits, according to your conventions. This greatly helps with data quality.

### The most common types
Here are the most common types in PostgreSQL. Note that these types are specific to PostgreSQL but appear in many other database management systems as well, and they mostly conform to the SQL standard. 

The `text` type allows character strings of any length, while the `varchar` and `char` types specify a maximum number of characters, or a character string of fixed length, respectively. You'll use these two for your database. The `boolean` type allows for two boolean values, for example, `true` and `false` or `1` and `0`, and for a third unknown value, expressed through `NULL`.

Then there are various formats for `date` and `time` calculations, also with `timezone` support. `numeric` is a general type for any sort of numbers with arbitrary precision, while `integer` allows only whole numbers in a certain range. If that range is not enough for your numbers, there's also `bigint` for larger numbers.
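
If you are unsure which type PostgreSQL assigns to a value, the built-in `pg_typeof` function reports it. The literals and column aliases below are arbitrary examples chosen to touch the types mentioned above.

```sql
SELECT
    pg_typeof(42) AS whole_number,                 -- integer
    pg_typeof(3000000000) AS large_number,         -- bigint (too large for integer)
    pg_typeof(5.54) AS decimal_number,             -- numeric
    pg_typeof(CAST('abc' AS text)) AS some_string, -- text
    pg_typeof(true) AS flag,                       -- boolean
    pg_typeof(DATE '2018-09-24') AS calendar_day;  -- date
```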

### Specifying types upon table creation
Here's an example of how types are specified upon table creation. Let's say the social security number, `ssn`, should be stored as an integer as it only contains whole numbers. The name may be a string with a maximum of 64 characters, which might or might not be enough. The date of birth, `dob`, is naturally stored as a date, while the average grade is a numeric value with a precision of 3 and a scale of 2, meaning that numbers with a total of three digits and two digits after the fractional point are allowed. Lastly, the information whether the tuition of the student was paid is, of course, a boolean one, as it can be either true or false.
```sql
CREATE TABLE students (
    ssn integer,
    name varchar(64),
    dob date,
    average_grade numeric(3, 2), --e.g. 5.54
    tuition_paid boolean
);
```
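
To see these types act as constraints, here is a short sketch against the `students` table defined above. The inserted values are made up for illustration; the commented-out statement is expected to fail.

```sql
-- Accepted: every value fits its column's type
INSERT INTO students (ssn, name, dob, average_grade, tuition_paid)
VALUES (123456789, 'Ada Example', '1999-01-08', 5.54, true);

-- Rejected: 12.5 needs two digits before the decimal point,
-- but numeric(3, 2) leaves room for only one
-- INSERT INTO students (ssn, name, dob, average_grade, tuition_paid)
-- VALUES (987654321, 'Bob Example', '2001-02-20', 12.5, false);
```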

### Alter types after table creation
Altering types after table creation is also straightforward: just use the `ALTER TABLE ALTER COLUMN` statement shown here.
```sql
ALTER TABLE students
ALTER COLUMN name
TYPE varchar(128)
```

In this case, the maximum name length is extended to 128 characters. Sometimes it may be necessary to truncate column values or transform them in some other way so that they fit the new data type. Then you can use the `USING` keyword and specify a transformation that should happen before the type is altered. Let's say you'd want to turn the `average_grade` column into an integer type. With `USING`, you can state explicitly how each value should be converted, for example by rounding it to the nearest integer, instead of relying on PostgreSQL's default conversion.
```sql
ALTER TABLE students
ALTER COLUMN average_grade
TYPE integer
-- Rounds 5.54 to 6 before the type conversion
USING ROUND(average_grade);
```
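
After altering a column, you may want to confirm the result. One option, sketched here, is to query the standard `information_schema.columns` view for the `students` table.

```sql
-- List each column of students with its current data type
SELECT column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name = 'students'
ORDER BY ordinal_position;
```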

## Change types with ALTER COLUMN
The syntax for changing the data type of a column is straightforward. The following code changes the data type of the `column_name` column in `table_name` to `varchar(10)`:
```sql
ALTER TABLE table_name
ALTER COLUMN column_name
TYPE varchar(10)
```
Now it's time to start adding constraints to the database.

- Have a look at the distinct `university_shortname` values in the `professors` table and take note of the length of the strings.

```sql
-- Select the university_shortname column
SELECT DISTINCT(university_shortname) 
FROM professors;
```

```
university_shortname
--------------------
ULA
UNE
EPF
UBA
USG
UBE
UZH
UGE
UFR
USI
ETH
```

- Now specify a fixed-length character type with the correct length for `university_shortname`.

```sql
-- Specify the correct fixed-length character type
ALTER TABLE professors
ALTER COLUMN university_shortname
TYPE char(3);
```

- Change the type of the `firstname` column to `varchar(64)`.

```sql
-- Change the type of firstname
ALTER TABLE professors
ALTER COLUMN firstname
TYPE varchar(64);
```
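
Instead of judging string lengths by eye before choosing a type, you could let the database report them. This sketch uses the `professors` table from the exercise and the `LENGTH` function.

```sql
-- Shortest and longest university_shortname, in characters
SELECT MIN(LENGTH(university_shortname)) AS shortest,
       MAX(LENGTH(university_shortname)) AS longest
FROM professors;
```

If both values are equal, a fixed-length type such as `char(n)` is a reasonable fit.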

## Convert types USING a function
If you don't want to reserve too much space for a certain `varchar` column, you can truncate the values before converting its type.

For this, you can use the following syntax:
```sql
ALTER TABLE table_name
ALTER COLUMN column_name
TYPE varchar(x)
USING SUBSTRING(column_name FROM 1 FOR x)
```
You should read it like this: Because you want to reserve only `x` characters for `column_name`, you have to retain a `SUBSTRING` of every value, i.e. the first `x` characters of it, and throw away the rest. This way, the values will fit the `varchar(x)` requirement.

- Run the sample code as is and take note of the error.
- Now use `SUBSTRING()` to reduce `firstname` to 16 characters so its type can be altered to `varchar(16)`.

```sql
-- Convert the values in firstname to a max. of 16 characters
ALTER TABLE professors 
ALTER COLUMN firstname 
TYPE varchar(16)
USING SUBSTRING(firstname FROM 1 FOR 16)
```

*However, it's best not to truncate any values in your database, so we'll revert this column to `varchar(64)`.*
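
Before truncating, it can help to preview which values would actually be cut. The query below is a sketch of such a check on the `professors` table; it changes nothing.

```sql
-- Show names longer than 16 characters next to their truncated form
SELECT firstname,
       SUBSTRING(firstname FROM 1 FOR 16) AS truncated
FROM professors
WHERE LENGTH(firstname) > 16;
```

Widening the column again with `TYPE varchar(64)` restores the limit, but not any characters that were already discarded.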

---
## The not-null and unique constraints
In the last part of this chapter, you'll get to know two special attribute constraints: the not-null and unique constraints.

### The not-null constraint
As the name already says, the not-null constraint disallows any `NULL` values on a given column. This **must hold true for the existing state of the database**, but **also for any future state**. Therefore, you can only specify a not-null constraint on a column that doesn't hold any `NULL` values yet. And: It won't be possible to insert `NULL` values in the future.

### What does NULL mean?
Before I go on explaining how to specify not-null constraints, I want you to think about `NULL` values. What do they actually mean to you? There's no clear definition. `NULL` can mean a couple of things, for example, that the value is unknown, or does not exist at all. It can also be possible that a value does not apply to the column. Let's look into an example.

### What does NULL mean? An example
Let's say we define a table `students`. 
```sql
CREATE TABLE students (
	ssn integer not null,
	lastname varchar(64) not null,
	home_phone integer,
	office_phone integer
);
```

The first two columns for the social security number and the last name cannot be `NULL`, which makes sense: this should be known and apply to every student. The `home_phone` and `office_phone` columns, though, should allow for null values, which is the default, by the way. Why? First of all, these numbers can be unknown, for any reason, or simply not exist, because a student might not have a phone. Also, some values just don't apply: some students might not have an office, so they don't have an office phone, and so forth. So, one important takeaway is that two `NULL` values do not necessarily have the same meaning. This also means that comparing `NULL` with `NULL` never results in `TRUE`; the result of the comparison is itself `NULL`.
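
A quick way to see how PostgreSQL treats `NULL` in comparisons is to evaluate a few expressions directly. This is a minimal sketch that needs no table and assumes default settings; the column aliases are arbitrary.

```sql
SELECT
    NULL = NULL AS equals_result,          -- NULL (unknown), not true
    NULL IS NULL AS is_null_result,        -- true
    NULL IS DISTINCT FROM NULL AS differs; -- false
```

This is why checks for missing values are written with `IS NULL` rather than `= NULL`.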

### How to add or remove a not-null constraint
You've just seen how to add a not-null constraint to certain columns when creating a table: just add `not null` after the respective columns. But you can also add and remove not-null constraints to and from existing tables. To add a not-null constraint to an existing table, you can use the `ALTER COLUMN SET NOT NULL` syntax as shown here.
```sql
ALTER TABLE students
ALTER COLUMN home_phone
SET NOT NULL;
```
Similarly, to remove a not-null constraint, you can use `ALTER COLUMN DROP NOT NULL`.
```sql
ALTER TABLE students
ALTER COLUMN ssn
DROP NOT NULL;
```
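
Because a not-null constraint must hold for existing rows too, `SET NOT NULL` fails if the column already contains `NULL` values. A sketch of checking first, using the `students` table from above:

```sql
-- Count rows that would violate a not-null constraint on home_phone
SELECT COUNT(*) AS missing_home_phones
FROM students
WHERE home_phone IS NULL;
```

Only when this count is 0 can the constraint be added to `home_phone`.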

### The unique constraint
The unique constraint on a column makes sure that there are no duplicates in a column. So any given value in a column can only exist once. This, for example, makes sense for university short names, as storing universities more than once leads to unnecessary redundancy. However, it doesn't make sense for university cities, as two universities can co-exist in the same city. Just as with the not-null constraint, you can only add a unique constraint if the column doesn't hold any duplicates before you apply it.

### Adding unique constraints
Here's how to create columns with unique constraints. Just add the `UNIQUE` keyword after the respective table column. 
```sql
CREATE TABLE table_name (
	column_name data_type UNIQUE
);
```
You can also add a unique constraint to an existing table. For that, you have to use the `ADD CONSTRAINT` syntax. This is different from adding a `NOT NULL` constraint. 
```sql
ALTER TABLE table_name
ADD CONSTRAINT some_name UNIQUE(column_name);
```
However, it's a pattern that frequently occurs. You'll see plenty of other examples of `ADD CONSTRAINT` in the remainder of this course.
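
Since a unique constraint can only be added when no duplicates exist, it is worth checking first. The sketch below uses the same placeholder names `table_name` and `column_name` as the syntax above.

```sql
-- List values that occur more than once in column_name.
-- NULLs are skipped: by default, a unique constraint
-- does not treat them as duplicates of each other.
SELECT column_name, COUNT(*) AS occurrences
FROM table_name
WHERE column_name IS NOT NULL
GROUP BY column_name
HAVING COUNT(*) > 1;
```

If this returns no rows, the column is ready for a unique constraint.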


## Disallow NULL values with SET NOT NULL
The `professors` table is almost ready now. However, it still allows for `NULL`s to be entered. Although some information might be missing about some professors, there are certainly columns that *always* need to be specified.

- Add a not-null constraint for the `firstname` column.

```sql
-- Disallow NULL values in firstname
ALTER TABLE professors 
ALTER COLUMN firstname SET NOT NULL;
```

- Add a not-null constraint for the `lastname` column.

```sql
-- Disallow NULL values in lastname
ALTER TABLE professors
ALTER COLUMN lastname SET NOT NULL;
```
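
To confirm that both constraints are in place, one option is the `is_nullable` column of the standard `information_schema.columns` view. This is a sketch for the `professors` table; it reports `NO` for columns that reject `NULL` values.

```sql
-- Check which columns of professors still accept NULL values
SELECT column_name, is_nullable
FROM information_schema.columns
WHERE table_name = 'professors';
```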

## Make your columns UNIQUE with ADD CONSTRAINT
As seen in the video, you add the `UNIQUE` keyword after the `column_name` that should be unique. This, of course, only works for *new* tables:
```sql
CREATE TABLE table_name (
 column_name data_type UNIQUE
);
```
If you want to add a unique constraint to an *existing* table, you do it like this:
```sql
ALTER TABLE table_name
ADD CONSTRAINT some_name UNIQUE(column_name);
```
Note that this is different from the `ALTER COLUMN` syntax for the not-null constraint. Also, you have to give the constraint a name `some_name`.

- Add a unique constraint to the `university_shortname` column in `universities`. Give it the name `university_shortname_unq`.

```sql
-- Make universities.university_shortname unique
ALTER TABLE universities
ADD CONSTRAINT university_shortname_unq UNIQUE(university_shortname);
```

- Add a unique constraint to the `organization` column in `organizations`. Give it the name `organization_unq`.

```sql
-- Make organizations.organization unique
ALTER TABLE organizations
ADD CONSTRAINT organization_unq UNIQUE(organization);
```

*Making sure `universities.university_shortname` and `organizations.organization` only contain unique values is a prerequisite for turning them into so-called primary keys.*
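
The named constraints can be looked up afterwards in the standard `information_schema.table_constraints` view. The following sketch lists the unique constraints on the two tables from this exercise.

```sql
-- List the unique constraints on both tables
SELECT table_name, constraint_name, constraint_type
FROM information_schema.table_constraints
WHERE table_name IN ('universities', 'organizations')
  AND constraint_type = 'UNIQUE';
```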