### 10Apr25
#### Database Normalization and Algorithms
- designing a database
    - generative AI is actually pretty good at identifying potential data properties and generating test data
    - what makes a bad relation?
        - attributes that don't belong together
        - attributes with lots of NULL values
    - general guidelines
        - each tuple should represent a single entity or relationship instance
            - the schema should be easily explained relation by relation
            - attributes should be easy to interpret
            - foreign keys should be used to refer to other entities
                - entity and relationship attributes should be kept separate as much as possible
        - design a schema that avoids anomalies from insert/delete/updates
            - consider `EMP_PROJ(Emp#, Proj#, Hours)`
                - you cannot insert a project unless an employee is assigned to it
                - deleting a project also deletes the only record of any employee who works solely on that project
                - updating project details requires updating every tuple for that project
                    - inefficient, and inconsistent if any tuple is missed
        - relations should be designed to minimize NULL values in their tuples
            - attributes that are frequently NULL may be better off in a separate relation
        - relations should be designed to satisfy lossless join
            - no spurious tuples should be generated when joining relations
            - non-additive or losslessness of joins
                - **Lossless Join Property:** a decomposition of R into R1 and R2 is lossless if joining R1 and R2 returns exactly the tuples of R, with no spurious tuples
                    - **exam question^**
                - **Lossy Join:** a decomposition of R into R1 and R2 is lossy if joining R1 and R2 returns tuples that were not in R
                    - **exam question^**
                - if you join the decomposed relations, you should get back exactly the original relation
                - any extra tuple in the join that was not in the original relation is a spurious tuple
            - preservation of dependencies
                - every FD of the original relation should still be enforceable on the decomposed relations, ideally without having to join them first
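A minimal Python sketch of how a bad decomposition produces spurious tuples. The relation, attribute names, and rows are made up for illustration and are not from the slides; relations are modelled as lists of dicts.

```python
def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            shared = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in shared):
                joined.append({**t1, **t2})
    return joined


def project(rel, attrs):
    """Projection onto attrs with duplicate elimination."""
    seen = {tuple(t[a] for a in attrs) for t in rel}
    return [dict(zip(attrs, row)) for row in sorted(seen)]


def as_set(rel):
    """Order-independent view of a relation, for comparisons."""
    return {tuple(sorted(t.items())) for t in rel}


# Hypothetical relation: who works on which project, and in which city.
works = [
    {"emp": "Ann", "proj": "P1", "city": "Toledo"},
    {"emp": "Bob", "proj": "P2", "city": "Toledo"},
]

# Decompose on "city", which is not a key of either piece.
r1 = project(works, ["emp", "city"])
r2 = project(works, ["proj", "city"])

rejoined = natural_join(r1, r2)
spurious = as_set(rejoined) - as_set(works)
print(len(rejoined), "tuples after join,", len(spurious), "spurious")
```

The join still contains both original tuples, but it also pairs Ann with P2 and Bob with P1, so the original relation can no longer be recovered.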
- how to do this? Normalization
    - multiple normal forms
        - the normal forms are not mutually exclusive
        - dictate how to design a database
        - each normal form builds on the previous one
        - progressively refine the design to minimize redundancy and improve integrity of data
        - help to prevent anomalies from poorly structured relations and data
        - 1NF, 2NF, 3NF, BCNF
            - 3NF is the most commonly used in industry
            - BCNF is stronger and more restrictive
                - it is easier to define than 3NF
            - a well designed ER model usually yields relational tables that are already in BCNF or even 4NF
    - how?
        - refining the schema
        - decomposing relations with undesirable properties into smaller relations
            - these follow a series of rules called normal forms
    - objectives
        - reduce data redundancy
            - minimize storage space
        - improve data integrity
            - ensure accuracy, consistency, and reliability of data by preventing anomalies
        - simplify maintenance
            - reducing complexity and risk for error
    - determining the normal form:
        - the form is determined by the **primary keys** and **functional dependencies**
            - Functional dependencies are the relationships between attributes in a relation
        - i.e. you must understand how one piece of data relates to another within the entity
        - the key helps determine if there are partial or transitive dependencies
            - partial dependency: a functional dependency where a non-prime attribute is functionally dependent on part of a candidate key
                - i.e. if the primary key is a composite key, then a non-prime attribute is dependent on only part of the composite key
                - this is bad because it means that the relation is not in 2NF
            - transitive dependency: a functional dependency where a non-prime attribute is functionally dependent on another non-prime attribute
                - i.e. key -> X and X -> A, where X is not a key, so A depends on the key only indirectly through X
                - this is bad because it means that the relation is not in 3NF
        - the functional dependencies help identify potential redundancies and other areas of concern
            - i.e. if the same data is stored in multiple places, then it is redundant
            - FD e.g.
                - ZIP_CODE -> CITY means:
                    - for every ZIP_CODE, there is a unique CITY
                    - if two records have the same ZIP_CODE, they must have the same CITY
                    - a zip code determines exactly one city (a city can still have several zip codes, so the FD does not run the other way)
                    - the city is functionally dependent on the zip code
                    - knowing the zip code allows you to determine the city
                    - attribute ZIP_CODE functionally determines CITY
                    - attribute CITY is functionally dependent on ZIP_CODE
    - determining the normal form
        - more to follow
- functional dependencies
    - constraints derived from the meaning and interrelationships of data attributes
    - a set of attributes X functionally determines a set of attributes Y if each value of X is associated with exactly one value of Y
        - i.e. if you know the value of X, you can determine the value of Y
    - X -> Y holds if all tuples with the same X value also have the same Y value
    - an FD is like a partial key
        - it uniquely determines some but not all attributes
    - a key is a special case of an FD
        - a key uniquely determines all attributes in the relation
    - these are derived from real-world constraints
        - e.g. a student ID uniquely identifies a student
        - e.g. a course code uniquely identifies a course
    - e.g.
        -   ```sql
            Courses (cid, title, prof, office)
            prof -> office
            ```
    -   e.g. within the company database
        - ```sql
            project_number -> {project_name, project_location}
            {SSN, project_number} -> hours
            ```
    - keys vs FDs
        - see slides around 25
        - e.g. what FDs hold in the following relation?
            - `Offerings (Course_id, Teacher_id, Hour, Room, Stu_id, Grade)`
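A small Python sketch of checking an FD against sample data, assuming rows are stored as dicts. The `Courses` rows below are invented for illustration. Note that data can only refute an FD; a real FD comes from the meaning of the attributes, not from one table instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# Made-up rows for Courses(cid, title, prof, office).
courses = [
    {"cid": 101, "title": "Databases", "prof": "Lee", "office": "B12"},
    {"cid": 102, "title": "Compilers", "prof": "Lee", "office": "B12"},
    {"cid": 103, "title": "Networks", "prof": "Kim", "office": "C07"},
]

print(fd_holds(courses, ["prof"], ["office"]))  # True
print(fd_holds(courses, ["office"], ["cid"]))   # False
```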


### 17Apr25
- examples
    - syntax
        -   ```
            R(a,b,c)
            K
            FD F = {a->b, b->c, ...}
            ```
    - if `a` functionally determines `b` and `c`, you can also write `a->b,c` or `a->bc`
    - if a functional dependency violates a normal form, you can decompose the relation
        - i.e. split the relation into two or more relations
- decomposition must be lossless
    - lossy decomposition is worse than normal form violations
    - lossy $\implies R \subset R_1 \bowtie R_2$
        - i.e. the join contains every original tuple plus spurious tuples that were never in the original relation
            - what is lost is information: you can no longer tell which tuples are real
    - lossless $\implies R_1 \bowtie R_2 \equiv R$
        - i.e. the join of the two relations is equivalent to the original relation
            - nothing is lost in the join
    - you can test for losslessness after performing a decomposition
        - compute the intersection of the attribute sets of the two relations
        - if $R_1 \cap R_2$ is a key for either relation, then the decomposition is lossless
            - e.g. `R(A,B,C,D,E)` with `FDs = {A->B, B->C, A->D, D->E}`
                - **likely exam question**
                - decompose into `R1(A,B,C)` and `R2(B,D,E)`
                - $R_1 \cap R_2 = B$
                    - does `B` functionally determine all of `R_1` or all of `R_2`?
                        - no, `B` only functionally determines `C`
                    - `B` is not a key for either relation
                    - $\implies$ the decomposition is lossy
                - if instead the two relations share only `A`, e.g. `R1(A,B,C)` and `R2(A,D,E)`, then $R_1 \cap R_2 = A$
                    - does `A` functionally determine all of `R_1` or all of `R_2`?
                        - yes, `A` functionally determines `B` directly and `C` through `B->C`
                    - `A` is a key for `R_1`
                    - $\implies$ the decomposition is lossless
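The intersection test above can be sketched in Python with an attribute-closure helper. This is a minimal sketch for binary decompositions only, assuming single-letter attribute names so that a string like `"ABC"` can stand for a set of attributes. The second decomposition, `R2(A,D,E)`, is an assumed example of two relations that share only `A`.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) FDs given as strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary test: the shared attributes must determine all of R1 or all of R2."""
    common = set(r1) & set(r2)
    reach = closure(common, fds)
    return set(r1) <= reach or set(r2) <= reach


FDS = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E")]

print(is_lossless("ABC", "BDE", FDS))  # False: B+ = {B, C}
print(is_lossless("ABC", "ADE", FDS))  # True:  A+ covers all of R1
```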
- trivial functional dependencies
    - an FD `X->Y` is trivial if $Y \subseteq X$; it holds in every relation
    - self
        - e.g. `A->A`
- closure of an FD set
    - it may be possible to imply other FDs from a set of FDs
    - e.g. `R(A,B,C)` with `FDs = {A->B, B->C}`
        - `A->C` is implied by `A->B` and `B->C`
        - $F^+$ denotes the closure of $F$
            - i.e. $F^+$ is the set of all FDs that can be derived from $F$
    - derived using the Armstrong Axioms
        - reflexivity
            - if $Y \subseteq X$, then $X \to Y$
        - augmentation
            - if $X \to Y$, then $XZ \to YZ$
        - transitivity
            - if $X \to Y$ and $Y \to Z$, then $X \to Z$
    - examples
        -   ```
            R(A,B,C,G,H,I)
            F = {A->B, A->C, CG->H, CG->I, B->H}
            ```
            - some other members of $F^+$
                - `A->H`
                    - by transitivity from `A->B` and `B->H`
                - `AG->I`
                    - by augmentation of `A->C` with `G` to get `AG->CG`
                    - then by transitivity with `CG->I`
                    - hence `AG->CG->I` (the two steps combined are the pseudo-transitivity rule)
    - algorithm for computing $F^+$
        - input the set of FDs $F$ and start with $F^+ = F$
        - repeat until no change:
            - for each FD $f \in F^+$:
                - add all FDs generated by reflexivity (if $Y \subseteq X$, then $X \to Y$)
                - add all FDs generated by augmentation (if $X \to Y$, then $XZ \to YZ$)
            - for each pair of FDs $f_1, f_2 \in F^+$:
                - if they can be combined by transitivity, add the resulting FD
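Enumerating all of $F^+$ grows exponentially, so in practice it is usually enough to test one FD at a time: `X->Y` is in $F^+$ exactly when $Y \subseteq X^+$. The sketch below swaps in that attribute-closure shortcut instead of the full algorithm, again assuming single-letter attributes written as strings, and checks the example FDs from above.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) FDs given as strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is a member of F+."""
    return set(rhs) <= closure(lhs, fds)


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(implies(F, "A", "H"))   # True, via A->B and B->H
print(implies(F, "AG", "I"))  # True, via AG->CG and CG->I
print(implies(F, "G", "I"))   # False, G alone determines nothing
```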




- determining the normal form of relations
    - **likely test questions**
    - testing for 2NF
        - definition: no non-prime attribute is partially dependent on any candidate key
            - i.e. no partial dependencies
        - check: for any `X->A`
            - is A a prime attribute of R?
                - prime: an attribute that is part of a key
            - is X a proper subset of a key of R (a partial key)?
            - if A is non-prime and X is a partial key, `X->A` violates 2NF: decompose R
    - testing for 3NF
        - definition: all non-prime attributes depend only on the candidate keys
            - i.e. no transitive dependencies
        - check: for any `X->A`
            - is X a (super)key of R?
            - is A a prime attribute of R?
            - if the answer to both is no, `X->A` violates 3NF: decompose R
    - testing for BCNF
        - definition: for every non-trivial FD `X->A`, X is a (super)key of R
        - check: for any `X->A`
            - is X a (super)key of R?
                - if no, decompose R
    - e.g. relation
        - given `R(A1, A2, A3, ...,An)`
        - given $K \subseteq \{A_1, A_2, A_3, \ldots, A_n\}$ is a key of $R$
        - given R has a set of FDs of form `X->A` where `X` and `A` are subsets of $\{A_1, A_2, A_3, \ldots, A_n\}$
        - Question: is R in 2NF, 3NF, BCNF? If not, transform to the desired form.
            - check slides
    - achieving 2NF
        - **likely exam question**
        - consider `R(A,B,C,D,E)`, key = `{A,B}`, FDs = `{B->D}`
            - check steps
                - B determines D
                - B is not the whole key
                - D is not prime
                - hence `B->D` is a partial dependency
                - **R is not in 2NF**
            - decompose
                - split into `R1(A,B,C,E)` and `R2(B,D)`
                    - `{A,B}` is the key for `R1`
        - consider
            -   ```
                Relation Employee_Project(SSN, P_number, Hours, E_name, P_name, P_location)
                Key:{SSN, P_number}
                ```
            - check steps
                - look at the dependencies
                    - FD1: `SSN,P_number -> Hours`
                    - FD2: `SSN -> E_name`
                    - FD3: `P_number -> P_name, P_location`
                        <img src="images/2NF_fail.png">
                        - possible anomalies
                            - deleting the last employee attached to a project deletes the project
                - decompose
                    - project into `Employee_Project(SSN, P_number, Hours)`, `Project(P_number, P_name, P_location)`, and `Employee(SSN, E_name)`
                        - now you can delete an employee without deleting the project
                        - retrieve the original table by joining the three relations on `SSN` and `P_number`
                    <img src="images/decomposition.png">
    - achieving 3NF
        - consider a relation in 2NF
            - i.e. no non-prime attributes are partially dependent on any candidate key
        - recall the test for 3NF
            - if `X->A` is an FD, then at least one of the following must hold:
                - `X` is a (super)key of `R`
                - `A` is a prime attribute of `R`
        - consider
            -   ```
                Employee_Department(E_name, SSN, Address, D_num, D_name, M_SSN)
                Key: {SSN}
                ```
                <img src="images/3NF_fail.png">
                - `SSN->E_name, Address, D_num`
                - `D_num->D_name, M_SSN`
            - decompose
                - project into `Employee_Department(SSN, E_name, Address, D_num)`, `Department(D_num, D_name, M_SSN)`
                    - join the two relations to get the original relation
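The 2NF, 3NF, and BCNF checks can be sketched as one Python function that reports the lowest normal form each FD breaks. This is a sketch under stated assumptions: the candidate keys are supplied by hand, attributes are tuples of names, and the function names are made up for illustration. It is run on the two relations from the examples above.

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) FDs given as tuples of names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lowest_violation(lhs, attr, fds, keys, all_attrs):
    """Return '2NF', '3NF', 'BCNF', or None for the FD lhs -> attr."""
    lhs = set(lhs)
    if attr in lhs:                          # trivial FD, never a violation
        return None
    if closure(lhs, fds) >= set(all_attrs):  # lhs is a superkey
        return None
    prime = set().union(*keys)
    if attr in prime:                        # allowed by 3NF, rejected by BCNF
        return "BCNF"
    if any(lhs < set(k) for k in keys):      # proper subset of a key: partial dependency
        return "2NF"
    return "3NF"                             # non-key lhs, non-prime rhs


def violations(all_attrs, fds, keys):
    """List (lhs, attribute, violated form) for every offending FD."""
    found = []
    for lhs, rhs in fds:
        for attr in rhs:
            form = lowest_violation(lhs, attr, fds, keys, all_attrs)
            if form:
                found.append((lhs, attr, form))
    return found


EP_ATTRS = ("SSN", "P_number", "Hours", "E_name", "P_name", "P_location")
EP_FDS = [
    (("SSN", "P_number"), ("Hours",)),
    (("SSN",), ("E_name",)),
    (("P_number",), ("P_name", "P_location")),
]
ED_ATTRS = ("E_name", "SSN", "Address", "D_num", "D_name", "M_SSN")
ED_FDS = [
    (("SSN",), ("E_name", "Address", "D_num")),
    (("D_num",), ("D_name", "M_SSN")),
]

print(violations(EP_ATTRS, EP_FDS, [("SSN", "P_number")]))
print(violations(ED_ATTRS, ED_FDS, [("SSN",)]))
```

The first call flags `SSN -> E_name` and both `P_number` FDs as 2NF violations; the second flags the `D_num` FDs as 3NF violations.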


- e.g. the LOTS relation
    - a lot is a piece of land which someone owns
    - consider a state with two counties, Earp and Kidd
    - both counties keep records about lots
    - each county had internal lot numbering
    - later the state took over lot management
    - each lot was given a state lot number
    - now LOTS has a primary key ID, the state lot number
        - it also has a candidate key, the combination of the county, and county lot number
    - each county still needs to know its lot number
    <img src="images/LOTS_Relation.png">

    - if ID# were the only key, LOTS would be in 2NF, because a single-attribute key cannot have partial dependencies

F = {A → BC, CD → E, B → D, E → A}
R is decomposed into R1(A,B,C) and R2(A,D,E).
Is the decomposition lossy or lossless? Show your work.

Answer:
Intersection: R1 ∩ R2 = {A}
A$^+$ = {A,B,C,D,E}, so A is a key of R1 and the decomposition is lossless.

### 22Apr25
- <img src="images/normal_forms.png" width="500">
- see slide deck 11 "Brief Definitions"
#### back to the lot example
- rate is still partially dependent on county
    - county is not a key 
    - **whenever you have a relation with only two attributes, you are automatically in BCNF**
- discovering new FDs from the data may cause you to need to further decompose a relation
### other stuff
- two sets of FDs $F$ and $G$ are equivalent if they have the same closure
    - i.e. if $F^+ = G^+$
- why keys matter
    - sometimes the key isn't obvious
        - due to poor diagram, new FDs emerging, decomposition, etc
- finding keys via FDs
    - analyze the FDs
    - create two columns of attributes
        - left hand side: attributes on the left of the FD
        - right hand side: attributes on the right of the FD
    - draw arrows connecting matching attributes
    - are there any that are only pointed to (appear only on the right-hand side)?
        - these are never part of a candidate key
    - are there any that are always pointing (appear only on the left-hand side, never on the right)?
        - these must be part of every key
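The left/right column trick can be sketched as a brute-force candidate key search in Python. It is a sketch that assumes single-letter attributes written as strings and is only practical for small schemas. Attributes that never appear on a right side go into every candidate, and attributes that appear only on a right side are never tried. The two FD sets come from practice problems elsewhere in these notes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) FDs given as strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    all_attrs = set(all_attrs)
    lhs_attrs = set().union(*(set(l) for l, _ in fds))
    rhs_attrs = set().union(*(set(r) for _, r in fds))
    core = all_attrs - rhs_attrs       # never determined: part of every key
    rhs_only = rhs_attrs - lhs_attrs   # never determine anything: part of no key
    optional = sorted(all_attrs - core - rhs_only)
    keys = []
    for size in range(len(optional) + 1):      # smallest candidates first
        for extra in combinations(optional, size):
            cand = core | set(extra)
            if any(k <= cand for k in keys):   # contains a smaller key: not minimal
                continue
            if closure(cand, fds) >= all_attrs:
                keys.append(cand)
    return keys


def show(keys):
    return ["".join(sorted(k)) for k in keys]


print(show(candidate_keys("ABCDE", [("BCA", "D"), ("BCE", "A"), ("A", "E"), ("E", "D")])))
print(show(candidate_keys("ABCDE", [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")])))
```

In the first relation `B` and `C` never appear on a right side, so both keys contain them; in the second every attribute appears on some right side, so the search has to try all combinations.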


### 29Apr25
- Project part 6
    - demo all of the features of the database system which have been implemented
    - submit GitHub link by the due date, the demo may take place afterwards
    - demo process:
        - option 1: schedule a time with the GTA or professor
            - it is a good idea to test the queries in advance and have them ready to copy-paste in and go
        - option 2:
            - if remotely accessible, email instructions for accessing and what queries to run to the GTA or professor
                - make sure to include steps (queries) to show off any and all relevant features
        - option 3:
            - export a bunch of query outputs to text files, email to GTA, they will respond with more queries to run
        - option 1 is preferred but slots are limited
- HW4 Solutions
    - **HW4 1 & 2 are likely exam questions**
    - 1:
        - R1 $\cap$ R2 = {A}
        - A is a key of R1
        - **the decomposition is lossless**
    - 2:
        - R1 $\cap$ R2 = {C}
        - C$^+$ = {C}
        - C is not a key of R1 or R2
        - **the decomposition is lossy**
    - 3:
        - many FDs possible
    - 4:
        - you *can* compute the keys of R
        - **you do not have to**
        - R1(B,D) with key B
        - R2(A,B,C,E) with F = {A->BC, E->A}
            - both FDs satisfy BCNF because their left sides (A and E) are each a key of R2
        - The above is in BCNF and is a valid answer
        - you do lose FD CD->E though
            - if this is important, you can add another relation
        - R3(C,D,E) with F = {CD->E}
    - 5:
        - E, G, and H never appear on the right side of any FD $\implies$ each must be part of every key
        - start with {EGH}$^+$ = {EGH}
            - add D
        - the key is {DEGH}
        - R1(A,B,C) Key: AB, FD: AB->C
        - R2 violates 3NF
        - R3(D,E,B) key: DE, FD: DE->B
        - R4(A,D,E,G,H) key: DEGH, FD: DEH->A
        - lost FD D->C
            - could add another relation but not required to meet 3NF
    - 6 A:
        - R1(A,B,C) key: A, FD A->BC
        - R3(A,D,E) key: AD, FD AD->E
        - R4(A,D,G), key: ADG
    - 6 B:
        - the decomposition is lossless because A is a key of R1
    - 6 C:
        -  no, you lose BD->E and CD->AB
- when is it ok to drop an FD during decomposition?
    - if it is non critical and has no impact on the data


#### Database Security
- CIA
    - Confidentiality
        - only authorized users can access the data
    - Integrity
        - the data is accurate and consistent
    - Availability
        - the data is available when needed
- access control models
    - define user permissions for accessing resources and data
    - key components
        - access control policy
            - the rules
        - access control mechanism
            - the implementation of the rules
    - specifications
        - subject: the entity (user or program) that is requesting access
        - object: the resource (file, database, etc) that is being accessed
        - access rights: the permissions granted to the subject for the object
    - policy types
        - Discretionary Access Control (DAC)
            - the owner of the resource determines which subjects can access specific objects
            - achieved with `GRANT` and `REVOKE` commands
                - e.g. `GRANT SELECT ON table_name TO user_name`
                    - can include a `WITH GRANT OPTION` clause
                        - allows the user to grant the same privileges to other users
                        - should be used with caution
                - e.g. users A1, A3, A4
                    - A1 grants A3 `SELECT` on `Emp` and `Dept` with the `WITH GRANT OPTION`
                    - A3 grants A4 `SELECT` on `Emp` and `Dept`
                    - A4 can now `SELECT` on `Emp` and `Dept` but cannot grant it to others
            - the timing of `REVOKE` is important
            - usually implemented via an access control list
            - e.g. UNIX file permissions
        - Mandatory Access Control (MAC)
            - access is based on security labels
            - subjects have security clearance levels
            - objects have security classifications
            - dominance relation
                - a subject can access an object if the subject's clearance level is the same or higher than the object's classification level
            - e.g. military classification levels
            - could be implemented by including a classification level attribute for each record in the database
                - e.g. a user with secret clearance can access files with secret and unclassified classifications while a user with top secret clearance can access files with top secret, secret, and unclassified classifications
        - Role-Based Access Control (RBAC)
            - access is based on roles assigned to users
            - roles are assigned to users based on their job functions
            - e.g. a user may have the role of "admin" or "user"
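A minimal Python sketch of the MAC dominance rule, assuming each record carries a classification attribute as suggested above. The `Level` enum, function names, and sample records are made up for illustration; a real DBMS would enforce this inside the engine rather than in application code.

```python
from enum import IntEnum


class Level(IntEnum):
    """Ordered security levels; a higher value dominates a lower one."""
    UNCLASSIFIED = 0
    SECRET = 1
    TOP_SECRET = 2


def dominates(clearance, classification):
    """A subject may read an object at or below its own clearance."""
    return clearance >= classification


def readable_rows(rows, clearance):
    """Filter records down to those the subject's clearance dominates."""
    return [r for r in rows if dominates(clearance, r["level"])]


records = [
    {"id": 1, "level": Level.UNCLASSIFIED},
    {"id": 2, "level": Level.SECRET},
    {"id": 3, "level": Level.TOP_SECRET},
]

print([r["id"] for r in readable_rows(records, Level.SECRET)])      # [1, 2]
print([r["id"] for r in readable_rows(records, Level.TOP_SECRET)])  # [1, 2, 3]
```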
- encryption
    - protect data at rest and in transit
    - key types
        - data at rest
            - encrypts data stored on disk
            - e.g. AES, TDE
                - TDE (Transparent Data Encryption) is transparent to the users
        - data in transit
            - encrypts data being transmitted over a network
            - e.g. SSL/TLS, HTTPS
        - field level encryption
            - encrypts specific fields in a database
            - e.g. credit card numbers, social security numbers
    - implementation
        - transparent encryption systems are built into the DBMS, no application changes are needed
        - application-level encryption encrypts data before it is sent to the DBMS
            - requires application changes
        - column-level encryption encrypts specific columns in a table
- fault tolerance, backups, recovery
    - protect against data loss

### Database Manipulation and Programming
- methods of interaction
    - direct manipulation
        - e.g. SQL commands
    - batch command files of pre-written SQL commands
        - e.g. `*.sql` files
    - GUI interface
        - e.g. web apps for hotel reservations, banking, etc
    - database applications
        - e.g. programs written in Java, C#, Python, etc including database access logic
    - most database interactions are done through applications
- programming
    - embedded
        - host language is used to write the program
    - libraries
        - use a library to access the database
    - API
        - use an API to access the database
- impedance mismatch
    - difference between the data model of the host language and the data model of the database
    - binding
        - the process of converting data between the host language types and the database types
    - database constructs like relations, tuples, and attributes need to be mapped to host language constructs like classes, objects, and attributes


 "### 24Apr25\n",
    "- exercise: closure of attributes\n",
    "    - relation `R(A_CustomerID, B_OrderID, C_ProductID, D_OrderDate, E_ProductPrice)\n",
    "        - see slide 3\n",
    "        - calculate the closure of {A}, {A,B}, {B,C}\n",
    "            - given A->D, B->C, A,B->E, C->E\n",
    "            - algorithm\n",
    "                - initialize closure set: $X^+ = X$\n",
    "                - while FD `Y->Z` in set of FDs `F`, \n",
    "                    - if $Y \\subseteq X^+$, then add $Z$ to $X^+$\n",
    "                    - repeat until no change\n",
    "            - closure of {A}\n",
    "                - $A^+ = {A,D}\n",
    "            - closure of {A,B}\n",
    "                - $(A,B)^+ = {A,B,C,D,E}$\n",
    "            - closure of {B,C}\n",
    "                - $(B,C)^+ = {B,C,E}$\n",
    "        - derive any additional FDs\n",
    "            - `B->C->E`\n",
    "                - `B->E` is derived from `B->C` and `C->E`\n",
    "    - relation `Enrollment(A_StudentID, B_Course, C_Professor, D_Department, E_Semester, F_Grade)`\n",
    "        - given `ABE->F, B->C, C->D, AE->D`\n",
    "        - derived FDs\n",
    "            - `B->C->D`\n",
    "                - `B->D` is derived from `B->C` and `C->D`\n",
    "            - `B->C`, augment with `E`, `BE->CE`, decompose `BE->C`\n",
    "                - course and semester determines professor\n",
    "    - relation `R(A_CustomerID, B_OrderID, C_ProductID, D_OrderDate, E_OrderPrice)`\n",
    "        - find the candidate keys\n",
    "        - create left and right sides of the FDs\n",
    "            | Left |    | Right |\n",
    "            | ---- |----| ----- |\n",
    "            | A    |    | C     |\n",
    "            | B    |    | D     |\n",
    "            | C    |    | E     |\n",
    "    - relation `R(A,B,C,D,E)`\n",
    "        - given `BCA->D, BCE->A, A->E, E->D`\n",
    "        - create left and right sides of the FDs\n",
    "            | Left |    | Right |\n",
    "            | ---- |----| ----- |\n",
    "            | B    |    | D     |\n",
    "            | C    |    | A     |\n",
    "            | A    |    | E     |\n",
    "            | E    |    |       |\n",
    "            - A and E are on both sides\n",
    "            - start with `{BC}` because they are both on the left side\n",
    "                - start adding attributes\n",
    "                - ${BCA}^+ = {BCEAD}$\n",
    "            - ${B}^+ = {B}$\n",
    "            - ${C}^+ = {C}$\n",
    "            - **see slide 15 for the complete closure, it is a full slide and possibly one extra slide of explanation** \n",
    "    - normalize into 2NF, 3NF, BCNF\n",
    "        - relation `R(C,T,H,R,S,G)`\n",
    "            - FDs: `C->T, HE->C, HT->R, CS->G, HS->R`\n",
    "            - find key:\n",
    "                - create left and right sides of the FDs\n",
    "                -   | Left |    | Right |\n",
    "                    |------|----| ----- |\n",
    "                    | C    |    | T     |\n",
    "                    | HE   |    | C     |   \n",
    "                    | HT   |    | R     |\n",
    "                    | CS   |    | G     |\n",
    "                    | HS   |    | R     |\n",
    "                - eliminate anything appearing on both sides\n",
    "                -   | Left |    | Right |\n",
    "                    | ---- |----| ----- |\n",
    "                    | H    |    | R     |\n",
    "                    | S    |    | G     |\n",
    "                - key must contain H, and S\n",
    "                    - ${HS}^+ = {HSRCGT}$\n",
    "                - key is {H,S}\n",
    "            - check for 2NF\n",
    "                - check every FD X->A\n",
    "                    - either A is a prime attribute or X is part of a key\n",
    "                        - see slides\n",
    "                    - **R is in 2NF**\n",
    "            - check for 3NF\n",
    "                - check every FD X->A\n",
    "                    - either A is a prime attribute or X is a (super)key\n",
    "                    - see slide 21 & 22\n",
    "                    - CS-> G\n",
    "                        - CS is not a key\n",
    "                        - G is not a prime attribute\n",
    "                        - **violates 3NF, decompose**\n",
    "                        - decompose into \n",
    "                            - `R1(C,S,G)`, key = `{C,S}`, FD = `CS->G`\n",
    "                                - this is in 3NF\n",
    "                            - `R2(C,T,H,R,S)`, key = `{H,S}`, FD = `C->T, HR->C, HT->R, HS->R`\n",
    "                                - now check if this is in 3NF\n",
    "                                    - no, most of the remaining FDs are not in 3NF because the right side is not a key\n",
    "                                - start with C->T\n",
    "                                    - C is not a key\n",
    "                                    - T is not a prime attribute\n",
    "                                    - decompose into\n",
    "                                        - R21(C,T), key = `{C}`, FD = `C->T`\n",
    "                                            - this is in 3NF\n",
    "                                            - also BCNF because it only has two attributes\n",
    "                                        - R22(C,H,R,S), key = `{H,S}`, FD = `HR->C, HT->R, HS->R`\n",
    "                                            - and so on...\n",
    "                    - to fix lossy decomposition, you can add a new relation containing the FD you lost\n",
    "                        - **see slides 25 and 26**\n",
    "                    - the decomposition in slide 26 is in BCNF because each relation only has one FD\n",
    "- one of the homework problems as an example\n",
    "    - question 6: \n",
    "        - part a\n",

### 01May25
- revoking permissions
    - revoking privileges does not guarantee that the user cannot access the data
        - the user may still have privileges through other roles or means
    - options: `CASCADE`, `RESTRICT`
        - used to control privilege propagation
        - `CASCADE`: also revoke all privileges that they granted to other users
            - the "nuclear" option
            - e.g. if user A1 was granted privileges and granted them to A2, a cascade revoke will revoke the privileges from both A1 and A2
        - `RESTRICT`: fails if the user has granted privileges to other users
            - used to avoid unintended privilege revocation
            - e.g. if user A1 was granted privileges and granted them to A2, a restricted revoke will fail and both users will retain the privileges
- grant and revoke diagram notation
    - P\*\* represents the source of privilege P
    - P\* represents a P with the `WITH GRANT OPTION`
    - A/P\*\*
        - see slide 18 because he's powering through this shit
            - I hope it isn't important
- grant and revoke example
    - grant
        - Owner: `GRANT SELECT, INSERT ON Employee TO Admin WITH GRANT OPTION`
            - this will allow Admin to grant the same privileges to other users
        - Admin: `GRANT SELECT ON Employee TO Manager`
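        - Manager: `GRANT SELECT ON Employee TO Staff`
            - this only succeeds if Manager holds `SELECT` with the grant option, which Admin's grant above does not include as written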
    - revoke
        -  `REVOKE SELECT ON Employee FROM Admin CASCADE`
            - this will revoke the privileges from Admin and all users that Admin granted privileges to
- general notes
    - circular grants won't survive a revoke
        - e.g. if A1 grants A2 some privileges and A2 grants A1 these same privileges, a revoke will revoke the privileges from both users
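A toy Python model of `GRANT` / `REVOKE` with `CASCADE` and `RESTRICT`, to make the propagation rules concrete. This is a simplified sketch, not the algorithm of any real DBMS: the `GrantTable` class and its methods are hypothetical, grant timing is ignored, and a grant survives only if it can be traced back to the owner through grant options. It assumes the owner issues the revoke and that Manager received the grant option, so that Manager can grant to Staff.

```python
class GrantTable:
    """Toy model of privileges on a single object."""

    def __init__(self, owner):
        self.owner = owner
        self.grants = set()  # (grantor, grantee, privilege, with_grant_option)

    def _supported(self, grants):
        """Keep only grants that trace back to the owner through grant options."""
        live, can_grant = set(), set()
        changed = True
        while changed:
            changed = False
            for g in grants - live:
                grantor, grantee, priv, option = g
                if grantor == self.owner or (grantor, priv) in can_grant:
                    live.add(g)
                    if option:
                        can_grant.add((grantee, priv))
                    changed = True
        return live

    def has(self, user, priv):
        return user == self.owner or any(
            g[1] == user and g[2] == priv for g in self.grants)

    def grant(self, grantor, grantee, priv, option=False):
        new = self.grants | {(grantor, grantee, priv, option)}
        if self._supported(new) != new:
            raise PermissionError(f"{grantor} may not grant {priv}")
        self.grants = new

    def revoke(self, grantor, grantee, priv, cascade=False):
        remaining = {g for g in self.grants if g[:3] != (grantor, grantee, priv)}
        live = self._supported(remaining)
        if live != remaining and not cascade:
            raise PermissionError("RESTRICT: dependent grants exist")
        self.grants = live


t = GrantTable("Owner")
for p in ("SELECT", "INSERT"):
    t.grant("Owner", "Admin", p, option=True)
t.grant("Admin", "Manager", "SELECT", option=True)  # assumption: option added
t.grant("Manager", "Staff", "SELECT")

try:
    t.revoke("Owner", "Admin", "SELECT")             # RESTRICT refuses
except PermissionError as err:
    print(err)

t.revoke("Owner", "Admin", "SELECT", cascade=True)   # removes the whole chain
print(t.has("Staff", "SELECT"), t.has("Admin", "INSERT"))  # False True
```

Because support is traced from the owner, a circular pair of grants between two users also disappears once the grant that fed the cycle is revoked.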

### back to embedded programming
- SQLCODE variable
    - like a return code
    - used to convey the status of the last SQL statement to the host language
    - important values
        - 0: success
        - 100: no more data available in the result
        - <0: error
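        - -803: unique constraint violation (a DB2 code; specific negative values are vendor-specific)
        - -305: a NULL value was fetched without an indicator variable (a DB2 code)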
- PHP
    - a server side scripting language
    - can be used to create dynamic web pages which interact with a database
        - e.g. you can embed SQL queries in PHP code
### transactions
- must be managed carefully in multi-user environments
- transactions are a logical unit of work
    - begin with a `BEGIN TRANSACTION` statement
    - end with a `COMMIT` or `ROLLBACK` statement
        - `COMMIT` makes the changes permanent
        - `ROLLBACK` undoes the changes
- ACID properties
    - used to ensure data integrity and consistency
    - Atomicity
        - all or nothing
        - if a transaction fails, all changes are rolled back
    - Consistency
        - the database is in a consistent state before and after the transaction
    - Isolation
        - transactions are isolated
        - multiple transactions can be executed concurrently without interfering with each other
    - Durability
        - once a transaction is committed, the changes are permanent
            - i.e. they persist even in the event of a system failure
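A short Python sketch of atomicity using the standard `sqlite3` module with an in-memory database. The `account` table, its `CHECK` constraint, and the `transfer` helper are made up for illustration. The connection is used as a context manager, which commits if the block succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; any failure undoes both updates."""
    try:
        with conn:  # commit on success, rollback on exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


print(transfer(conn, "alice", "bob", 500), balances(conn))  # overdraft: nothing changes
print(transfer(conn, "alice", "bob", 30), balances(conn))   # both updates applied
```

In the failed transfer the credit to `bob` has already run when the debit violates the constraint; the rollback removes it, so the database never shows half a transfer.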

- transaction issues
    - lost update:
        - transaction 1 updates a record and transaction 2 updates the same record, overwriting the changes made by transaction 1
            - transaction 2 starts before transaction 1 ends
    - uncommitted dependency/dirty read:
        - transaction 1 reads a record that is being updated by transaction 2, but transaction 2 has not yet committed the changes. transaction 2 rolls back, and transaction 1 has read invalid data
            - transaction 1 reads the uncommitted changes made by transaction 2
    - inconsistent analysis:
        - transaction 1 reads several records (e.g. to compute a total) while transaction 2 updates some of them, so transaction 1 sees some values from before the update and some from after
            - the result does not match any state the database was actually in
        - "a transaction reads partial results from another transaction that simultaneously interacts with (and updates) the same data items"
    - unrepeatable read:
        - two reads of the same record return different values
    - phantom read:
        - a transaction repeats a query and gets new rows that were not visible in the first query
            - e.g. a transaction reads a set of rows that satisfy a condition, and then another transaction inserts a new row that satisfies the same condition
        - difference between unrepeatable read and phantom read
            - unrepeatable read: the same row is read twice and returns new/different **values**
            - phantom read: the same query is executed twice and returns new/different **rows**
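
The lost update case can be sketched without a real DBMS by interleaving the reads and writes of two transactions by hand. The `db` dictionary and the seat-count numbers below are hypothetical; the point is only the ordering of the steps.

```python
# Hypothetical one-record "database".
db = {"seats": 10}

# Interleaved schedule: transaction 2 starts before transaction 1 ends.
t1_read = db["seats"]        # T1 reads the record
t2_read = db["seats"]        # T2 reads the same, still unchanged, record
db["seats"] = t1_read - 3    # T1 writes its update
db["seats"] = t2_read - 2    # T2 writes from its stale read, overwriting T1
interleaved_result = db["seats"]

# Serial schedule: T2 starts only after T1 has finished.
db = {"seats": 10}
db["seats"] = db["seats"] - 3  # T1
db["seats"] = db["seats"] - 2  # T2
serial_result = db["seats"]

print(interleaved_result, serial_result)
```

In the interleaved run, T1's subtraction disappears because T2 computed its new value from a read taken before T1 wrote.
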
- resolving transaction issues
    - serialization
        - ensures that the final state of the database after concurrent transactions is the same as if the transactions were executed one-by-one in some serial order
        - prevents anomalies
        - guarantees correctness
        - types
            - conflict serializability
                - a schedule is conflict serializable if swapping adjacent non-conflicting operations can turn it into a serial schedule
            - view serializability
                - a schedule is view serializable if every transaction reads the same values, and the final writes are the same, as in some serial schedule
    - achieving serialization 
        - locking: transactions acquire locks on the data they are accessing to block conflicting operations
        - timestamp ordering: assigns a timestamp to each transaction and ensures that conflicting operations are executed in the order of those timestamps
        - optimistic concurrency control: allows transactions to execute without locking, but checks for conflicts at commit time
    - isolation levels
        - database setting that controls how and when transactions see each other's changes
        - balance safety and speed
        - levels
            - <img src="images/isolation_levels.png" width="500">
    - locks
        - used to control access to data
        - types
            - exclusive lock
                - prevents other transactions from reading or writing the locked data
            - shared lock
                - allows other transactions to read the locked data, but not write to it
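
The shared/exclusive rules above can be sketched as a toy lock table. This is an illustration only: the `LockTable` class and its method names are made up, and a real lock manager would also queue waiting transactions and detect deadlocks instead of just returning `False`.

```python
class LockTable:
    """Toy lock table: many shared (S) holders or one exclusive (X) holder per item."""

    def __init__(self):
        self._locks = {}  # item -> (mode, set of transaction ids)

    def acquire(self, txn, item, mode):
        """Return True if txn gets the lock, False if it would have to wait."""
        if mode not in ("S", "X"):
            raise ValueError("mode must be 'S' or 'X'")
        held = self._locks.get(item)
        if held is None:
            self._locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # The sole holder may re-acquire, or upgrade S to X.
            new_mode = "X" if "X" in (held_mode, mode) else "S"
            self._locks[item] = (new_mode, holders)
            return True
        if held_mode == "S" and mode == "S":
            holders.add(txn)  # shared locks are compatible with each other
            return True
        return False  # any combination involving X conflicts

    def release(self, txn, item):
        """Drop txn's lock on item, freeing the item if no holders remain."""
        held = self._locks.get(item)
        if held is None:
            return
        _, holders = held
        holders.discard(txn)
        if not holders:
            del self._locks[item]

locks = LockTable()
r1 = locks.acquire("T1", "row1", "S")  # first reader
r2 = locks.acquire("T2", "row1", "S")  # second reader shares the lock
w_blocked = locks.acquire("T3", "row1", "X")  # writer must wait for the readers
locks.release("T1", "row1")
locks.release("T2", "row1")
w_granted = locks.acquire("T3", "row1", "X")  # item is free, writer gets it
r_blocked = locks.acquire("T1", "row1", "S")  # reader must wait for the writer
print(r1, r2, w_blocked, w_granted, r_blocked)
```
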


### 06May25
#### Final Exam
- covers content since Exam 3
    - lecture PDFs numbered 10-12
    - normalization, definitions, examples, exercises, and DB topics
#### back to transactions
- locking
    - used to control access to data
- transaction outcomes
    - committed
        - record update is permanently written to the database
    - aborted
        - record update is not written to the database
        - no trace of the transaction is left in the database
    - failures can occur but should be rare and recovery protocols should be in place
- system log
    - used to record all transactions and their outcomes
        - so that you can always recover to a consistent state
    - sequential, append only structure
    - periodically backed up
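
A rough sketch of how an append-only log supports recovery, assuming a simplified redo-only scheme: after a crash, only the writes of transactions that have a `COMMIT` record are replayed. The record layout and the `recover` function are hypothetical; real systems also keep undo information and checkpoints.

```python
def recover(log):
    """Rebuild the database state by redoing only committed transactions' writes."""
    committed = {txn for kind, txn, *_ in log if kind == "COMMIT"}
    state = {}
    for kind, txn, *rest in log:
        if kind == "WRITE" and txn in committed:
            key, value = rest
            state[key] = value  # later writes to the same key win
    return state

log = []  # sequential, append-only
log.append(("BEGIN", "T1"))
log.append(("WRITE", "T1", "x", 1))
log.append(("BEGIN", "T2"))
log.append(("WRITE", "T2", "y", 99))
log.append(("WRITE", "T1", "x", 2))
log.append(("COMMIT", "T1"))
# Simulated crash here: T2 never committed.

recovered = recover(log)
print(recovered)
```

T2's write to `y` leaves no trace in the recovered state, which matches the "aborted" outcome described above.
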
- synchronization points
    - boundaries between completed transactions
    - snapshots of the database at a specific point in time
- architectures
    - centralized
        - single machine system
        - simple system, centralized control
        - bottlenecks under heavy load, lack of redundancy
    - client-server
        - dedicated server, multiple clients
        - accessed via a network
        - e.g. MySQL, PostgreSQL in web apps
    - distributed
        - multiple servers, multiple clients
        - queries and updates may span multiple databases
        - e.g. global systems like banks
        - resilient, scalable, and generally high performance
        - more complex management
        - increased operational overhead
            - redundant storage increases costs
            - security risks from multiple access points
            - must work transparently
                - user should not need to know data location
                - data should be the same across all locations for all users
                - user should not need to know the schema details, etc
- fragmentation
    - dividing a database into smaller pieces
    - complete fragmentation: all data is accounted for
        - no loss during splits
    - horizontal: union all rows to get the original table
    - vertical: full outer join the fragments on the primary key to get the original table
    - hybrid: split rows by region and columns by sensitivity
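
A small sketch of why a complete fragmentation loses nothing, using plain Python lists of dictionaries in place of tables. The `employees` data, the region split, and the public/private column split are assumptions for illustration. Each vertical fragment keeps the primary key `id` so the pieces can be joined back together.

```python
employees = [
    {"id": 1, "name": "Ana", "region": "EU", "salary": 50000},
    {"id": 2, "name": "Bo", "region": "US", "salary": 60000},
    {"id": 3, "name": "Cy", "region": "EU", "salary": 55000},
]

def by_id(rows):
    """Sort rows by primary key so tables can be compared regardless of order."""
    return sorted(rows, key=lambda r: r["id"])

# Horizontal fragmentation: split the rows, rebuild with a union.
eu = [r for r in employees if r["region"] == "EU"]
non_eu = [r for r in employees if r["region"] != "EU"]
horizontal_rebuilt = by_id(eu + non_eu)

# Vertical fragmentation: split the columns, rebuild with a join on the key.
public = [{"id": r["id"], "name": r["name"], "region": r["region"]} for r in employees]
private = [{"id": r["id"], "salary": r["salary"]} for r in employees]
private_by_id = {r["id"]: r for r in private}
vertical_rebuilt = by_id([{**p, **private_by_id[p["id"]]} for p in public])

print(horizontal_rebuilt == employees, vertical_rebuilt == employees)
```
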
- replication
    - full: copy the entire DB at every site
        - maximizes availability but increases update time
    - partial
        - critical fragments are duplicated
        - replication schema defines which parts to copy
        - a balance of availability vs update speed
- federated database system
    - a collection of autonomous databases
- CAP theorem
    - Consistency
        - all copies always match
    - Availability
        - a copy is always available
    - Partition Tolerance
        - the system continues to operate despite network issues
    - distributed database systems can only guarantee two of the three
        - e.g. banking systems prioritize accuracy over uptime
            - a bank may be down for maintenance but the data is always accurate
        - e.g. social media platforms prioritize uptime over accuracy
            - tweets may still be available through a partial outage
- big data
    - volume, velocity, variety, veracity, value