# Database roles

- Manage database access permissions
- Manage authentication (who can log in, and with which password)
- A role can represent a single user or a group of users
- Roles are global across a database cluster, not tied to one database

Benefits:
- Roles live on after individual users are deleted
- Roles can be created before user accounts exist
- Save DBAs time

Pitfall:
- Sometimes a role gives a specific user too much access

# Create a role

### Empty role creation

`CREATE ROLE role_name;`

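### Create a user role with a password that expires

`CREATE ROLE role_name WITH LOGIN PASSWORD 'PasswordForIntern' VALID UNTIL '2020-01-01';`

`VALID UNTIL` sets when the password stops being valid. `CREATE ROLE` defaults to `NOLOGIN`, so add `LOGIN` (or use `CREATE USER`, which implies it) when the role must be able to connect.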

### Create a role with the `CREATEDB` attribute

`CREATE ROLE role_name CREATEDB;`

### Alter a role to give it the `CREATEROLE` attribute

`ALTER ROLE role_name CREATEROLE;`
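To check which attributes a role ended up with, the built-in `pg_roles` view can be queried. A minimal sketch, assuming a PostgreSQL session as a superuser; `demo_analyst` is a hypothetical role name:

```sql
-- Hypothetical role combining the statements above
CREATE ROLE demo_analyst WITH LOGIN PASSWORD 'PasswordForIntern' VALID UNTIL '2020-01-01';
ALTER ROLE demo_analyst CREATEDB;
ALTER ROLE demo_analyst CREATEROLE;

-- pg_roles lists every role in the cluster with its attributes
SELECT rolname, rolcanlogin, rolcreatedb, rolcreaterole, rolvaliduntil
FROM pg_roles
WHERE rolname = 'demo_analyst';
```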

# GRANT privileges to roles

`GRANT privilege_name ON object TO role_name;`

- `privilege_name`: `SELECT`, `INSERT`, `UPDATE`, `DELETE`, etc.
- `object`: a table, view, etc.
- `role_name`: an individual user or a group role
- See Common PostgreSQL roles
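A concrete `GRANT`, with `has_table_privilege(role, table, privilege)` used to confirm the result. A minimal sketch, assuming PostgreSQL and a superuser session; `demo_film` and `demo_intern` are hypothetical names:

```sql
-- Hypothetical table and role
CREATE TABLE demo_film (film_id INT PRIMARY KEY, title TEXT NOT NULL);
CREATE ROLE demo_intern;

-- Let the intern read, add and change rows, but not delete them
GRANT SELECT, INSERT, UPDATE ON demo_film TO demo_intern;

SELECT has_table_privilege('demo_intern', 'demo_film', 'UPDATE');  -- true
SELECT has_table_privilege('demo_intern', 'demo_film', 'DELETE');  -- false
```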

# REVOKE privileges from roles

`REVOKE UPDATE ON table_name FROM role_name;`
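`REVOKE` removes only the privileges it names; anything else granted earlier stays in place. A minimal sketch, assuming PostgreSQL and a superuser session; `demo_rental` and `demo_clerk` are hypothetical names:

```sql
CREATE TABLE demo_rental (rental_id INT PRIMARY KEY, return_date DATE);
CREATE ROLE demo_clerk;
GRANT SELECT, UPDATE ON demo_rental TO demo_clerk;

-- Take back UPDATE only; SELECT is kept
REVOKE UPDATE ON demo_rental FROM demo_clerk;

SELECT has_table_privilege('demo_clerk', 'demo_rental', 'SELECT');  -- true
SELECT has_table_privilege('demo_clerk', 'demo_rental', 'UPDATE');  -- false
```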

# Users and groups

### Group role

`CREATE ROLE group_role WITH CREATEDB CREATEROLE;`

### User role

`CREATE ROLE user_role WITH LOGIN PASSWORD 'PasswordForIntern' VALID UNTIL '2020-01-01';`

### Change a user role's password

`ALTER ROLE user_role WITH PASSWORD 's3cur3p@ssw0rd';`

### Add a user to a group role (the user gets the group's privileges)

`GRANT group_role TO user_role;`

### Remove a user from a group role

`REVOKE group_role FROM user_role;`
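The group pattern end to end: privileges are granted once to the group role, and each member inherits them (roles have the `INHERIT` attribute by default). A minimal sketch, assuming PostgreSQL and a superuser session; `demo_payment`, `demo_data_team` and `demo_alice` are hypothetical names:

```sql
CREATE TABLE demo_payment (payment_id INT PRIMARY KEY, amount NUMERIC(5,2));

-- Group role: no LOGIN, just a bundle of privileges
CREATE ROLE demo_data_team;
GRANT SELECT ON demo_payment TO demo_data_team;

-- User role that can log in
CREATE ROLE demo_alice WITH LOGIN PASSWORD 's3cur3p@ssw0rd';

-- Membership gives demo_alice the group's privileges
GRANT demo_data_team TO demo_alice;
SELECT has_table_privilege('demo_alice', 'demo_payment', 'SELECT');  -- true

-- Ending the membership removes the inherited privilege
REVOKE demo_data_team FROM demo_alice;
SELECT has_table_privilege('demo_alice', 'demo_payment', 'SELECT');  -- false
```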

# Delete a role

`DROP ROLE role_name;`
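`DROP ROLE` fails while the role still owns objects or holds privileges. A common cleanup is to hand its objects to another role first. A minimal sketch, assuming PostgreSQL and a superuser session; all `demo_…` names are hypothetical. `REASSIGN OWNED` and `DROP OWNED` act only on the current database, so they must be repeated in every database where the role owns something:

```sql
CREATE ROLE demo_temp_owner;
CREATE ROLE demo_new_owner;
CREATE TABLE demo_report (report_id INT);
ALTER TABLE demo_report OWNER TO demo_temp_owner;

-- DROP ROLE demo_temp_owner; would fail here: the role still owns demo_report

-- Transfer ownership, drop any remaining privileges, then drop the role
REASSIGN OWNED BY demo_temp_owner TO demo_new_owner;
DROP OWNED BY demo_temp_owner;
DROP ROLE IF EXISTS demo_temp_owner;
```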

# Partitioning

- Split a large table into smaller parts
- Queries and calculations run faster on the smaller parts
- Two kinds: vertical and horizontal partitioning

# Vertical Partitioning

- Split a table by columns into separate tables; each table keeps the primary key so rows can be joined back together
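Vertical partitioning done by hand: a rarely used, bulky column moves to its own table that shares the primary key. A minimal sketch, assuming PostgreSQL; `demo_product` and `demo_product_description` are hypothetical tables:

```sql
-- Hypothetical wide table; description is large and rarely queried
CREATE TABLE demo_product (
  product_id  INT PRIMARY KEY,
  name        TEXT NOT NULL,
  price       NUMERIC(8,2),
  description TEXT
);
INSERT INTO demo_product VALUES (1, 'Widget', 9.99, 'A long product description');

-- Move the column into its own table, keyed by the same primary key
CREATE TABLE demo_product_description (
  product_id  INT PRIMARY KEY REFERENCES demo_product (product_id),
  description TEXT
);
INSERT INTO demo_product_description
SELECT product_id, description FROM demo_product;
ALTER TABLE demo_product DROP COLUMN description;

-- Join on the shared key when the full row is needed
SELECT p.product_id, p.name, p.price, d.description
FROM demo_product AS p
JOIN demo_product_description AS d USING (product_id);
```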

# Horizontal Partitioning

- Split a table by rows into separate tables with the same columns, e.g. one table per date range

# Creating Horizontal Partitions

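- Create the parent table with `PARTITION BY RANGE (column)`
- Create the smaller tables as `PARTITION OF` the parent table `FOR VALUES FROM (x) TO (y)`; the lower bound is inclusive and the upper bound is exclusive
- Create an `INDEX` on the parent table's partition key column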

### Using Range()

```
CREATE TABLE table_name (
...
some_col DATE NOT NULL
)
PARTITION BY RANGE (some_col);

CREATE TABLE table_name_p1 PARTITION OF table_name
FOR VALUES FROM ('2019-01-01') TO ('2019-04-01');  -- upper bound is exclusive
...
CREATE TABLE table_name_p2 PARTITION OF table_name
FOR VALUES FROM ('2019-09-01') TO ('2020-01-01');

-- Index the partition key column (a column name, not a quoted string)
CREATE INDEX ON table_name (some_col);
```
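A filled-in version of the range template, showing where rows land. A minimal sketch, assuming PostgreSQL 11 or later (for `CREATE INDEX` on a partitioned table); `demo_sales` and its partitions are hypothetical names:

```sql
CREATE TABLE demo_sales (
  sale_id   INT,
  amount    NUMERIC(8,2),
  sale_date DATE NOT NULL
)
PARTITION BY RANGE (sale_date);

CREATE TABLE demo_sales_2019_q1 PARTITION OF demo_sales
  FOR VALUES FROM ('2019-01-01') TO ('2019-04-01');
CREATE TABLE demo_sales_2019_q2 PARTITION OF demo_sales
  FOR VALUES FROM ('2019-04-01') TO ('2019-07-01');

CREATE INDEX ON demo_sales (sale_date);

-- Rows are inserted into the parent and routed to the matching partition
INSERT INTO demo_sales VALUES
  (1, 10.00, '2019-03-31'),  -- last day of Q1: stored in demo_sales_2019_q1
  (2, 25.50, '2019-04-01');  -- equals Q1's exclusive upper bound: stored in demo_sales_2019_q2

-- tableoid reveals which partition physically holds each row
SELECT tableoid::regclass AS partition, sale_id, sale_date
FROM demo_sales
ORDER BY sale_id;

-- A filter on the partition key lets the planner skip the other partitions
EXPLAIN SELECT * FROM demo_sales WHERE sale_date < '2019-02-01';
```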

### Using List()

```
CREATE TABLE table_name (
  film_id INT,
  title TEXT NOT NULL,
  release_year TEXT
)
PARTITION BY LIST (release_year);

CREATE TABLE table_name_1
	PARTITION OF table_name FOR VALUES IN ('2019');

CREATE TABLE table_name_2
	PARTITION OF table_name FOR VALUES IN ('2018');

CREATE TABLE table_name_3
	PARTITION OF table_name FOR VALUES IN ('2017');

-- Fails if film contains a release_year with no matching partition
INSERT INTO table_name
SELECT film_id, title, release_year FROM film;

```
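With only listed values, a row whose key matches no partition is rejected with an error. A `DEFAULT` partition (PostgreSQL 11 or later) catches those rows instead. A minimal sketch; `demo_film_by_year` and its partitions are hypothetical names:

```sql
CREATE TABLE demo_film_by_year (
  film_id      INT,
  title        TEXT NOT NULL,
  release_year TEXT
)
PARTITION BY LIST (release_year);

CREATE TABLE demo_film_2019 PARTITION OF demo_film_by_year FOR VALUES IN ('2019');
CREATE TABLE demo_film_2018 PARTITION OF demo_film_by_year FOR VALUES IN ('2018');
-- Catch-all for any value not listed above
CREATE TABLE demo_film_other PARTITION OF demo_film_by_year DEFAULT;

INSERT INTO demo_film_by_year VALUES
  (1, 'Film A', '2019'),
  (2, 'Film B', '2018'),
  (3, 'Film C', '2006');  -- no matching list value: routed to demo_film_other

SELECT tableoid::regclass AS partition, film_id, release_year
FROM demo_film_by_year
ORDER BY film_id;
```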

# Pros/cons

- Indices of heavily-used partitions fit in memory
- Less-used partitions can be moved to slower, cheaper storage and heavily-used ones to faster storage
- Useful for both OLAP and OLTP

However,
- Partitioning an existing table can be a hassle
- Some constraints cannot be set
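One such restriction in PostgreSQL: a primary key or unique constraint on a partitioned table must include all partition key columns, because each partition enforces uniqueness only on its own rows. A minimal sketch, assuming PostgreSQL 11 or later; `demo_event` is a hypothetical table:

```sql
-- Accepted: the primary key includes the partition key column event_date
CREATE TABLE demo_event (
  event_id   INT,
  event_date DATE NOT NULL,
  PRIMARY KEY (event_id, event_date)
)
PARTITION BY RANGE (event_date);

-- Rejected if uncommented: PRIMARY KEY (event_id) alone omits event_date
-- CREATE TABLE demo_event_bad (
--   event_id   INT PRIMARY KEY,
--   event_date DATE NOT NULL
-- )
-- PARTITION BY RANGE (event_date);
```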

# Sharding

- Horizontal partitioning applied across several machines: each machine (shard) holds a subset of a table's rows
- Used for MPP (massively parallel processing)
- Each shard processes its part of a calculation in parallel
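Real sharding needs several servers, but the row-routing idea can be seen on a single server with hash partitioning. In the sketch below, each partition stands in for a shard that would live on a separate machine in a sharded setup. This is a single-machine analogy, not sharding itself; it assumes PostgreSQL 11 or later, and `demo_customer` and its partitions are hypothetical names:

```sql
CREATE TABLE demo_customer (
  customer_id INT NOT NULL,
  name        TEXT
)
PARTITION BY HASH (customer_id);

-- Three "shards": a row goes to the partition whose remainder matches its key's hash
CREATE TABLE demo_customer_s0 PARTITION OF demo_customer FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE demo_customer_s1 PARTITION OF demo_customer FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE demo_customer_s2 PARTITION OF demo_customer FOR VALUES WITH (MODULUS 3, REMAINDER 2);

INSERT INTO demo_customer
SELECT g, 'customer ' || g FROM generate_series(1, 300) AS g;

-- Per-partition counts: the kind of partial result each shard would compute
-- before the partial results are combined
SELECT tableoid::regclass AS shard, count(*)
FROM demo_customer
GROUP BY tableoid
ORDER BY shard;
```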

# Unified Data Model

- Need an overview of all data that lives in different systems / machines / formats, e.g. MongoDB, CSV files, MySQL
- Different update cadence for different data (1 hour, 30 minutes, 1 day)
- Use cases: dashboards, recommendation engines
- Transform the different formats into one chosen format for a unified model, with ETL tooling such as Apache Airflow or Scriptella
- A unified data store (e.g. Amazon Redshift) holds all the transformed data

# Choosing a data integration tool

- Flexible: connects to all data sources
- Reliable: can still be maintained a year from now
- Scalable: scales well as data volume grows
- Automated testing and proactive alerts: notifies you when data gets corrupted
- Security: data restricted at its source remains restricted in the unified data model
- Data governance (lineage): it should be traceable where data originated and where it is regularly used

# DBMS

A DBMS (database management system) is the software interface between a database and its end users. It manages three aspects:
- Data
- Database schema
- Database engine

Two general types:
- SQL DBMS
- NoSQL DBMS


# RDBMS

- Relational Database Management System (SQL DBMS)
- Based on the relational model of data
- SQL as query language
- A good fit when data is structured and its schema rarely changes
- A good fit when data must be consistent

# NoSQL DBMS

- Less structured / flexible
- Often document-centered
- Used when there is no clear schema or the schema varies
- Large quantities of data
- When data changes frequently
- Rapidly growing data that needs to scale
- Types: key-value store, document store, columnar database, graph database

# key-value store

- Stores data as key-value pairs
- Key: unique identifier
- Value: anything
- Use case: managing the shopping cart of an online buyer
- Example: Redis

# document store

- Key: unique identifier
- Value: a document
- Documents are semi-structured (e.g. JSON)
- Use case: content management system
- Example: MongoDB

# columnar database

- Store data by column rather than by row
- Fast and scalable
- Use case: big data analytics where speed is important
- Example: Cassandra

# graph database

- Data is interconnected and best represented as a graph
- Introduces a lot of complexity
- Use case: social media data, recommendations
- Example: Neo4j