<a href="https://colab.research.google.com/github/hrbolek/learning/blob/master/itsystems/database.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Database Systems

## Introduction

Database systems are used to store data.

By their nature, data fall into two basic categories:
- Homogeneous data
- Heterogeneous data

Homogeneous data are data whose records all share the same structure, in contrast to heterogeneous data, where the structure varies from record to record.

Homogeneous data can be compared to an Excel table with a fixed number of columns in which the values are stored.
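
The distinction can be sketched in a few lines of Python. The records below are made up for illustration, and `same_structure` is a hypothetical helper written for this sketch, not a library function.

```python
# Homogeneous data: every record has the same fields in the same order.
homogeneous = [
    (1, 'Alice', 400),
    (2, 'Bob', 800),
]

# Heterogeneous data: the records do not share one structure.
heterogeneous = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'tags': ['a', 'b'], 'address': {'city': 'Brno'}},
]

def same_structure(records):
    """Return True when all dict records have an identical set of keys."""
    key_sets = {frozenset(record) for record in records}
    return len(key_sets) <= 1

# Give the tuples column names so both data sets can be checked the same way.
names = ['id', 'name', 'score']
as_dicts = [dict(zip(names, row)) for row in homogeneous]

print(same_structure(as_dicts))       # True
print(same_structure(heterogeneous))  # False
```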


## A few thoughts on data streams
With large data volumes it is undesirable, and often not even possible, to process a data set as a whole. Instead, a collection (```list```, ```array```, etc.) is processed element by element.

In [10]:
data = [0, 1, 2, 3]
def oldFashioned(data, cislo):
  result = []
  for item in data:
    result.append(item + cislo)
  return result

prictenoOld = oldFashioned(data, 2)
print(data, '-->', prictenoOld)

[0, 1, 2, 3] --> [2, 3, 4, 5]


In [12]:
def newWay(data, cislo):
  for item in data:
    yield item + cislo

prictenoNew = list(newWay(data, 2))
print(data, '-->', prictenoNew)    

[0, 1, 2, 3] --> [2, 3, 4, 5]


### Why ```list()```

In [13]:
prictenoWOList = (newWay(data, 2))
print(data, '-->', prictenoWOList)
print('Running the computation')
pricteno2List = list(newWay(data, 2))
print(data, '-->', pricteno2List)


[0, 1, 2, 3] --> <generator object newWay at 0x7f4b13b62f68>
Running the computation
[0, 1, 2, 3] --> [2, 3, 4, 5]


### Generators
Functions that contain a ```yield``` expression are generator functions. They exist in many programming languages, for example Python, JavaScript, and C#. Calling such a function does not return a value but a generator, which is an object with defined properties and methods. Calling one of its methods repeatedly (typically ```next```) produces, one by one, the values that make up the processed sequence.

In the example, the ```list``` function converts the generator into a list. Only at that moment is the computation actually performed. Study the following lines of code carefully.

In [16]:
def demo(data, cislo):
  for item in data:
    print('adding', item, '+', cislo, '=', item + cislo)
    yield item + cislo

generator = demo(data, 2)
print('generator:', data, '-->', generator)
print('Only now do we run the computation')
vysledek = list(generator)
print('result', data, '-->', vysledek)

generator: [0, 1, 2, 3] --> <generator object demo at 0x7f4b13a6d150>
Only now do we run the computation
adding 0 + 2 = 2
adding 1 + 2 = 3
adding 2 + 2 = 4
adding 3 + 2 = 5
result [0, 1, 2, 3] --> [2, 3, 4, 5]
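
The lazy behaviour can also be observed by driving a generator by hand with the built-in `next`. This is a minimal sketch; `add_to_each` is an illustrative name introduced here and mirrors the generator functions above.

```python
def add_to_each(data, number):
    # Generator function: the body runs only while values are being requested.
    for item in data:
        yield item + number

gen = add_to_each([0, 1, 2], 2)

first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes right after the first yield
rest = list(gen)     # consumes whatever is left
print(first, second, rest)  # 2 3 [4]

try:
    next(gen)        # the generator is exhausted now
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)     # True
```

Once a generator is exhausted it cannot be rewound; to iterate again, call the generator function again.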


### Generators II
Generators are effectively state machines: each call to ```next``` resumes the function where it last stopped. This makes them well suited to defining a sequence of actions over a data stream. In essence, the same idea of suspending and resuming a computation is what operating systems build on when they switch between tasks.

Consider the functions

$g_1(x)=x+1$

$g_2(x)=x+2$

Both functions can be generalized by the function

$f(x, y)=x+y$

since

$g_1(x)=f(x,1)$

$g_2(x)=f(x,2)$

We can define a functional $F$ such that

$F(f, 1)=g_1$

$F(f, 2)=g_2$

Functionals (functions that take or return other functions) are the basis of functional programming. F# is one language suited to functional programming, but elements of functional programming are also available (and often used) in Python, JavaScript, and C#.
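
A sequence of actions over a data stream can be sketched by chaining generators, each consuming the previous one. The stage names `read_numbers`, `keep_even`, and `square` are hypothetical and serve only this illustration.

```python
def read_numbers(limit):
    # Source stage: produces 0, 1, ..., limit - 1 one at a time.
    for n in range(limit):
        yield n

def keep_even(numbers):
    # Filter stage: passes on only the even values.
    for n in numbers:
        if n % 2 == 0:
            yield n

def square(numbers):
    # Transform stage: squares every value it receives.
    for n in numbers:
        yield n * n

# Nothing is computed yet; the stages are only wired together.
pipeline = square(keep_even(read_numbers(10)))

# Each value travels through all three stages before the next one is read.
result = list(pipeline)
print(result)  # [0, 4, 16, 36, 64]
```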

In [17]:
def f(x, y):
  return x + y

def F(f, x):
  def g(y):
    return f(x, y)
  return g

g1 = F(f, 1)
g2 = F(f, 2)

print('f(5, 10) =', f(5, 10))
print('g1(3) =', g1(3))
print('g2(3) =', g2(3))

f(5, 10) = 15
g1(3) = 4
g2(3) = 5
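
Python's standard library offers `functools.partial`, which plays a role similar to the hand-written `F`: it fixes the first argument of a function and returns a new callable. The sketch below redefines `f` so that it runs on its own.

```python
from functools import partial

def f(x, y):
    return x + y

# partial(f, 1) behaves like a function g1 with g1(y) == f(1, y).
g1 = partial(f, 1)
g2 = partial(f, 2)

print(g1(3), g2(3))  # 4 5
```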


In [0]:
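import pandas as pd
from IPython.display import display  # explicit import, so the helper does not rely on the notebook built-in

def displayData(data):
  # Materialize any iterable of rows (list, generator, filter object) as a table and render it
  df = pd.DataFrame(data)
  display(df)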

### Example of homogeneous data


In [0]:
dataStudents = [
  (1,'Monique Davis',400,'Literature','Monique@someOtherSchool.edu','2017-08-16 15:34:50','2017-09-02 19:33:56'),
  (2,'Teri Gutierrez',800,'Programming','Teri@someOtherSchool.edu','2017-08-16 15:34:50','2017-09-02 19:33:56'),
  (3,'Spencer Pautier',1000,'Programming','Spencer@someOtherSchool.edu','2017-08-16 15:34:50','2017-09-02 19:33:56'),
  (4,'Louis Ramsey',1200,'Programming','Louis@someOtherSchool.edu','2017-08-16 15:34:50','2017-09-02 19:33:56'),
  (5,'Alvin Greene',1200,'Programming','Alvin@someOtherSchool.edu','2017-08-16 15:34:50','2017-09-02 19:33:56'),
  (6,'Sophie Freeman',1200,'Programming','Sophie@someOtherSchool.edu','2017-08-16 15:34:50','2017-09-02 19:33:56'),
  (7,'Edgar Frank \"Ted\" \"Codd\"',2400,'Computer Science','Edgar@someOtherSchool.edu','2017-08-16 15:35:33','2017-09-02 19:33:56'),
  (8,'Donald D. Chamberlin',2400,'Computer Science','Donald@someOtherSchool.edu','2017-08-16 15:35:33','2017-09-02 19:33:56'),
  (9,'Raymond F. Boyce',2400,'Computer Science','Raymond@someOtherSchool.edu','2017-08-16 15:35:33','2017-09-02 19:33:56')]

In [0]:
dataContactInfo = [
  (1,'Monique.Davis@freeCodeCamp.org','555-555-5551',97111),
  (2,'Teri.Gutierrez@freeCodeCamp.org','555-555-5552',97112),
  (3,'Spencer.Pautier@freeCodeCamp.org','555-555-5553',97113),
  (4,'Louis.Ramsey@freeCodeCamp.org','555-555-5554',0),
  (5,'Alvin.Green@freeCodeCamp.org','555-555-5555',97115),
  (6,'Sophie.Freeman@freeCodeCamp.org','555-555-5556',97116),
  (7,'Maximo.Smith@freeCodeCamp.org','555-555-5557',97117),
  (8,'Michael.Roach@freeCodeCamp.ort','555-555-5558',97118)
]

In [21]:
def tuple2dictionary(names, item):
  result = {}
  for name, value in zip(names, item):
    result[name] = value
  return result

def tuple2dictionarySeq(names, sequence):
  for item in sequence:
    yield tuple2dictionary(names, item)

namesStudents = ['studentID', 'FullName', 'sat_score', 'programOfStudy', 'schoolEmailAdr', 'rcd_Created', 'rcd_Updated']
namesContacts = ['studentID', 'studentEmailAddr',  'student-phone-cell', 'student-US-zipcode']

namedStudents = tuple2dictionarySeq(namesStudents, dataStudents)
namedContacts = tuple2dictionarySeq(namesContacts, dataContactInfo)

displayData(dataStudents)
displayData(tuple2dictionarySeq(names=namesStudents, sequence=dataStudents))

Unnamed: 0,0,1,2,3,4,5,6
0,1,Monique Davis,400,Literature,Monique@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
1,2,Teri Gutierrez,800,Programming,Teri@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
2,3,Spencer Pautier,1000,Programming,Spencer@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
3,4,Louis Ramsey,1200,Programming,Louis@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
4,5,Alvin Greene,1200,Programming,Alvin@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
5,6,Sophie Freeman,1200,Programming,Sophie@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
6,7,"Edgar Frank ""Ted"""" Codd""",2400,Computer Science,Edgar@someOtherSchool.edu,2017-08-16 15:35:33,2017-09-02 19:33:56
7,8,Donald D. Chamberlin,2400,Computer Science,Donald@someOtherSchool.edu,2017-08-16 15:35:33,2017-09-02 19:33:56
8,9,Raymond F. Boyce,2400,Computer Science,Raymond@someOtherSchool.edu,2017-08-16 15:35:33,2017-09-02 19:33:56


Unnamed: 0,studentID,FullName,sat_score,programOfStudy,schoolEmailAdr,rcd_Created,rcd_Updated
0,1,Monique Davis,400,Literature,Monique@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
1,2,Teri Gutierrez,800,Programming,Teri@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
2,3,Spencer Pautier,1000,Programming,Spencer@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
3,4,Louis Ramsey,1200,Programming,Louis@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
4,5,Alvin Greene,1200,Programming,Alvin@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
5,6,Sophie Freeman,1200,Programming,Sophie@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
6,7,"Edgar Frank ""Ted"""" Codd""",2400,Computer Science,Edgar@someOtherSchool.edu,2017-08-16 15:35:33,2017-09-02 19:33:56
7,8,Donald D. Chamberlin,2400,Computer Science,Donald@someOtherSchool.edu,2017-08-16 15:35:33,2017-09-02 19:33:56
8,9,Raymond F. Boyce,2400,Computer Science,Raymond@someOtherSchool.edu,2017-08-16 15:35:33,2017-09-02 19:33:56


## Additional study resources

https://www.w3schools.com/sql/



## Relational algebra

Operators
- Union,
- Intersection,
- Difference (the resemblance to set operations is no coincidence)
---
- Selection (SQL ```where```)
- Projection (SQL ```select```)
- Cartesian product (SQL ```join```, which combines the product with a selection)
- Rename (SQL ```as```)
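
The first three operators map directly onto Python's built-in `set` type when each row is stored as a tuple. This is a minimal sketch with made-up rows; both relations must have the same columns for the operations to make sense.

```python
# Two relations with the same columns (studentID, first name).
programming = {(2, 'Teri'), (3, 'Spencer'), (4, 'Louis')}
high_score = {(3, 'Spencer'), (4, 'Louis'), (7, 'Edgar')}

union = programming | high_score          # rows in either relation
intersection = programming & high_score   # rows in both relations
difference = programming - high_score     # rows only in the first relation

print(sorted(union))         # [(2, 'Teri'), (3, 'Spencer'), (4, 'Louis'), (7, 'Edgar')]
print(sorted(intersection))  # [(3, 'Spencer'), (4, 'Louis')]
print(sorted(difference))    # [(2, 'Teri')]
```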


### Selection using Python

In [30]:
def createSelect(queryF):
  def selectF(generator):
    return filter(queryF, generator)
  return selectF

def createSelectEx(queryF): # same as createSelect
  def selectF(generator):
    return (item for item in generator if queryF(item)) # see https://docs.python.org/3/howto/functional.html
  return selectF

studentsWithHighScoreSelection = createSelect(lambda item: item['sat_score'] >= 1000)
namedStudents = tuple2dictionarySeq(namesStudents, dataStudents)

subsetResult = studentsWithHighScoreSelection(namedStudents)
displayData(subsetResult)

Unnamed: 0,studentID,FullName,sat_score,programOfStudy,schoolEmailAdr,rcd_Created,rcd_Updated
0,3,Spencer Pautier,1000,Programming,Spencer@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
1,4,Louis Ramsey,1200,Programming,Louis@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
2,5,Alvin Greene,1200,Programming,Alvin@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
3,6,Sophie Freeman,1200,Programming,Sophie@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56
4,7,"Edgar Frank ""Ted"""" Codd""",2400,Computer Science,Edgar@someOtherSchool.edu,2017-08-16 15:35:33,2017-09-02 19:33:56
5,8,Donald D. Chamberlin,2400,Computer Science,Donald@someOtherSchool.edu,2017-08-16 15:35:33,2017-09-02 19:33:56
6,9,Raymond F. Boyce,2400,Computer Science,Raymond@someOtherSchool.edu,2017-08-16 15:35:33,2017-09-02 19:33:56


### Projection using Python

In [24]:
def createProjection(mapF):
  def projectionF(generator):
    for item in generator:
      yield mapF(item)
  return projectionF

def createProjectionEx(mapF): # same as createProjection
  def projectionF(generator):
    return (mapF(item) for item in generator) # see https://docs.python.org/3/howto/functional.html / Generator expressions and list comprehensions
  return projectionF

def createProjection2(names):
  def projectionF(generator):
    for item in generator:
      result = {}
      for name in names:
        result[name] = item[name]
      yield result
  return projectionF

namedStudents = tuple2dictionarySeq(namesStudents, dataStudents)
someStudentsColumns = createProjection(lambda item: {'studentID': item['studentID'], 'FullName': item['FullName']})
someColumnsResult = someStudentsColumns(namedStudents)
displayData(someColumnsResult)

namedStudents = tuple2dictionarySeq(namesStudents, dataStudents)
someStudentsColumns2 = createProjection2(['studentID', 'FullName', 'programOfStudy'])
someColumnsResult2 = someStudentsColumns2(namedStudents)
displayData(someColumnsResult2)

Unnamed: 0,studentID,FullName
0,1,Monique Davis
1,2,Teri Gutierrez
2,3,Spencer Pautier
3,4,Louis Ramsey
4,5,Alvin Greene
5,6,Sophie Freeman
6,7,"Edgar Frank ""Ted"""" Codd"""
7,8,Donald D. Chamberlin
8,9,Raymond F. Boyce


Unnamed: 0,studentID,FullName,programOfStudy
0,1,Monique Davis,Literature
1,2,Teri Gutierrez,Programming
2,3,Spencer Pautier,Programming
3,4,Louis Ramsey,Programming
4,5,Alvin Greene,Programming
5,6,Sophie Freeman,Programming
6,7,"Edgar Frank ""Ted"""" Codd""",Computer Science
7,8,Donald D. Chamberlin,Computer Science
8,9,Raymond F. Boyce,Computer Science
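
Because every operator takes a stream of rows and returns another stream, selection and projection compose naturally. The sketch below repeats compact versions of `createSelect` and `createProjection2` so that it runs on its own, and uses a shortened, illustrative student list.

```python
def createSelect(queryF):
    def selectF(rows):
        return filter(queryF, rows)
    return selectF

def createProjection2(names):
    def projectionF(rows):
        for item in rows:
            yield {name: item[name] for name in names}
    return projectionF

# Shortened sample rows (illustrative subset of the student data).
students = [
    {'studentID': 1, 'FullName': 'Monique Davis', 'sat_score': 400},
    {'studentID': 3, 'FullName': 'Spencer Pautier', 'sat_score': 1000},
    {'studentID': 4, 'FullName': 'Louis Ramsey', 'sat_score': 1200},
]

where = createSelect(lambda item: item['sat_score'] >= 1000)
select = createProjection2(['studentID', 'FullName'])

# Roughly: SELECT studentID, FullName FROM student WHERE sat_score >= 1000
result = list(select(where(students)))
print(result)
```

The rows are filtered first and narrowed second, so the projection never touches rows that the selection has already discarded.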


### Rename using Python

In [27]:
def createRename(names):
  def renameF(generator):
    for item in generator:
      result = {**item}
      for old, new in names:
        del result[old]
        result[new] = item[old]
      yield result
  return renameF

namedStudents = tuple2dictionarySeq(namesStudents, dataStudents)
renamedColumnsStudents = createRename([('studentID', 'id'), ('FullName', 'name')])
renamedColumnsResults = renamedColumnsStudents(namedStudents)
displayData(renamedColumnsResults)



Unnamed: 0,sat_score,programOfStudy,schoolEmailAdr,rcd_Created,rcd_Updated,id,name
0,400,Literature,Monique@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56,1,Monique Davis
1,800,Programming,Teri@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56,2,Teri Gutierrez
2,1000,Programming,Spencer@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56,3,Spencer Pautier
3,1200,Programming,Louis@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56,4,Louis Ramsey
4,1200,Programming,Alvin@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56,5,Alvin Greene
5,1200,Programming,Sophie@someOtherSchool.edu,2017-08-16 15:34:50,2017-09-02 19:33:56,6,Sophie Freeman
6,2400,Computer Science,Edgar@someOtherSchool.edu,2017-08-16 15:35:33,2017-09-02 19:33:56,7,"Edgar Frank ""Ted"""" Codd"""
7,2400,Computer Science,Donald@someOtherSchool.edu,2017-08-16 15:35:33,2017-09-02 19:33:56,8,Donald D. Chamberlin
8,2400,Computer Science,Raymond@someOtherSchool.edu,2017-08-16 15:35:33,2017-09-02 19:33:56,9,Raymond F. Boyce


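### Cartesian product using Python

In [33]:
from itertools import product
# https://docs.python.org/3/library/itertools.html#itertools.product
def createCartesian(joinF):
  def cartesian(firstG, secondG):
    # product yields (left, right) pairs; keep the pairs accepted by joinF
    # and merge each pair into a single row
    return ({**left, **right} for left, right in product(firstG, secondG) if joinF(left, right))
  return cartesian

joinF = lambda left, right: left['studentID'] == right['studentID']
cartesian = createCartesian(joinF)

# Full Cartesian product: every student paired with every contact row.
# Generators can be consumed only once, so fresh ones are created for each call.
displayData(product(tuple2dictionarySeq(namesStudents, dataStudents),
                    tuple2dictionarySeq(namesContacts, dataContactInfo)))

namedStudents = tuple2dictionarySeq(namesStudents, dataStudents)
namedContacts = tuple2dictionarySeq(namesContacts, dataContactInfo)
cartesianResult = cartesian(namedStudents, namedContacts)
displayData(cartesianResult)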

## MySQL

In the MySQL environment (see the first stack), use phpMyAdmin to run the following SQL script, taken from [here](https://github.com/SteveChevalier/Distilling-Data/blob/master/schema_data_01_Student%20Schema%20and%20Data.sql)
```sql
-- ---------------------------------------------------
-- Part I - Create and Load Student Schema
-- ---------------------------------------------------
-- Create Schema (database) and set as default
CREATE DATABASE IF NOT EXISTS `student_examples`;
USE `student_examples`;

-- create student and student contact tables
DROP TABLE IF EXISTS `student`; 
CREATE TABLE `student` (
  `studentID` int(11) NOT NULL AUTO_INCREMENT,
  `FullName` text,
  `sat_score` int(11) DEFAULT NULL,
  `programOfStudy` text,
  `schoolEmailAdr` text,
  `rcd_Created` datetime DEFAULT CURRENT_TIMESTAMP,
  `rcd_Updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`studentID`));
  
DROP TABLE IF EXISTS `student-contact-info`;
CREATE TABLE `student-contact-info` (
  `studentID` int(11) DEFAULT NULL,
  `studentEmailAddr` text,
  `student-phone-cell` text,
  `student-US-zipcode` int(11) DEFAULT NULL);

-- Load data
INSERT INTO `student` 
	VALUES (1,'Monique Davis',400,'Literature','Monique@someOtherSchool.edu','2017-08-16 15:34:50','2017-09-02 19:33:56'),
		(2,'Teri Gutierrez',800,'Programming','Teri@someOtherSchool.edu','2017-08-16 15:34:50','2017-09-02 19:33:56'),
        (3,'Spencer Pautier',1000,'Programming','Spencer@someOtherSchool.edu','2017-08-16 15:34:50','2017-09-02 19:33:56'),
        (4,'Louis Ramsey',1200,'Programming','Louis@someOtherSchool.edu','2017-08-16 15:34:50','2017-09-02 19:33:56'),
        (5,'Alvin Greene',1200,'Programming','Alvin@someOtherSchool.edu','2017-08-16 15:34:50','2017-09-02 19:33:56'),
        (6,'Sophie Freeman',1200,'Programming','Sophie@someOtherSchool.edu','2017-08-16 15:34:50','2017-09-02 19:33:56'),
        (7,'Edgar Frank \"Ted\"\" Codd\"',2400,'Computer Science','Edgar@someOtherSchool.edu','2017-08-16 15:35:33','2017-09-02 19:33:56'),
        (8,'Donald D. Chamberlin',2400,'Computer Science','Donald@someOtherSchool.edu','2017-08-16 15:35:33','2017-09-02 19:33:56'),
        (9,'Raymond F. Boyce',2400,'Computer Science','Raymond@someOtherSchool.edu','2017-08-16 15:35:33','2017-09-02 19:33:56');

INSERT INTO `student-contact-info` 
	VALUES (1,'Monique.Davis@freeCodeCamp.org','555-555-5551',97111),
    (2,'Teri.Gutierrez@freeCodeCamp.org','555-555-5552',97112),
    (3,'Spencer.Pautier@freeCodeCamp.org','555-555-5553',97113),
    (4,'Louis.Ramsey@freeCodeCamp.org','555-555-5554',0),
    (5,'Alvin.Green@freeCodeCamp.org','555-555-5555',97115),
    (6,'Sophie.Freeman@freeCodeCamp.org','555-555-5556',97116),
    (7,'Maximo.Smith@freeCodeCamp.org','555-555-5557',97117),
    (8,'Michael.Roach@freeCodeCamp.ort','555-555-5558',97118);


-- end Part I schema create and data import
```
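
To try the relational operators in SQL without a MySQL server, the sketch below loads a reduced copy of the two tables into an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a stand-in assumed here for convenience: the table name `contact_info`, the reduced column set, and the column types are simplifications and do not match the MySQL schema above exactly.

```python
import sqlite3

# In-memory database, discarded when the connection is closed.
conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE student (studentID INTEGER PRIMARY KEY, FullName TEXT, sat_score INTEGER);
CREATE TABLE contact_info (studentID INTEGER, studentEmailAddr TEXT);
INSERT INTO student VALUES
  (1, 'Monique Davis', 400),
  (3, 'Spencer Pautier', 1000),
  (9, 'Raymond F. Boyce', 2400);
INSERT INTO contact_info VALUES
  (1, 'Monique.Davis@freeCodeCamp.org'),
  (3, 'Spencer.Pautier@freeCodeCamp.org');
""")

# Projection (SELECT), rename (AS), product with selection (JOIN ... ON), selection (WHERE).
query = """
SELECT s.studentID AS id, s.FullName AS name, c.studentEmailAddr
FROM student AS s
JOIN contact_info AS c ON s.studentID = c.studentID
WHERE s.sat_score >= 1000
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(3, 'Spencer Pautier', 'Spencer.Pautier@freeCodeCamp.org')]
conn.close()
```

Student 9 passes the score filter but has no contact row, so the inner join drops it; student 1 has a contact row but fails the filter.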