# DATABASE NORMALIZATION


# 1) Database Normalization
    * Normalization is a systematic approach to decomposing tables in order to
    * eliminate data redundancy (repetition) and undesirable characteristics
    * such as insertion, update, and deletion anomalies.
    * It is a multi-step process that puts data into tabular form,
    * removing duplicated data from the relational tables.
![view](images/redundancy.png)
> Normalization is used mainly for two purposes:

    1) Eliminating redundant (duplicated) data.
    2) Ensuring data dependencies make sense, i.e. data is logically stored.
  
>  Problems Without Normalization

    * If a table is not properly normalized and has data redundancy,
    * it will not only take up extra storage space but will also make
    * it difficult to handle and update the database without risking data loss.
    * Insertion, update, and deletion anomalies are very frequent if the database is not normalized.
>  Insertion Anomaly
    * For a new admission, a student's data cannot be inserted until the student opts for a branch,
    * or else we have to set the branch information to NULL.
    * Also, if we insert data for 100 students of the same branch,
    * the branch information is repeated for all 100 students.
    * These scenarios are insertion anomalies.

>  Update Anomaly
    * What if Mr. X leaves the college, or is no longer the HOD of the computer science department?
    * In that case all the affected student records have to be updated, and
    * if we miss any record by mistake, the data becomes inconsistent. This is an update anomaly.

>  Deletion Anomaly
    * In our Student table, two different kinds of information are kept together:
    * student information and branch information. Hence, at the end of the academic year,
    * if student records are deleted, we also lose the branch information. This is a deletion anomaly.

>  Normalization Rule
    * Normalization rules are divided into the following normal forms:
    1) First Normal Form
    2) Second Normal Form
    3) Third Normal Form
    4) BCNF
    5) Fourth Normal Form
    6) Fifth Normal Form
  
>  First Normal Form (1NF):-
![view](images/1nf.gif) 1NF Normalization.
    * For a table to be in the First Normal Form, it should follow these 4 rules:
    1) It should only have single-valued (atomic) attributes/columns.
    2) Values stored in a column should be of the same domain.
    3) All the columns in a table should have unique names.
    4) The order in which data is stored does not matter.
 
>  Second Normal Form (2NF):-
    * For a table to be in the Second Normal Form:
    1) It should be in the First Normal Form.
    2) It should not have any partial dependency.
     ![view](images/2NF.gif)
    * The following functional dependencies exist:
    * 1. The attribute ProfessorName is functionally dependent on the attribute IDProf (IDProf --> ProfessorName)
    * 2. The attribute StudentName is functionally dependent on the attribute IDSt (IDSt --> StudentName)
    * 3. The attribute Grade is fully functionally dependent on IDSt and IDProf together (IDSt, IDProf --> Grade)
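
The three dependencies listed above can be tested against sample data. The sketch below is illustrative only: the helper `holds` and the sample rows are hypothetical, and a check on sample rows can only show that a dependency is not violated by that data, not that it is true of the schema in general.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows using the column names from the example.
rows = [
    {"IDSt": 1, "StudentName": "Ana", "IDProf": 10, "ProfessorName": "Kim", "Grade": 5},
    {"IDSt": 1, "StudentName": "Ana", "IDProf": 20, "ProfessorName": "Lee", "Grade": 4},
    {"IDSt": 2, "StudentName": "Bo", "IDProf": 10, "ProfessorName": "Kim", "Grade": 3},
]

print(holds(rows, ["IDProf"], ["ProfessorName"]))   # True
print(holds(rows, ["IDSt"], ["StudentName"]))       # True
print(holds(rows, ["IDSt", "IDProf"], ["Grade"]))   # True
print(holds(rows, ["IDSt"], ["Grade"]))             # False: Grade needs the whole key
```

The first two results are the partial dependencies (each name depends on only one part of the key), while Grade depends on the full key.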
>  Third Normal Form (3NF):-
    * A table is said to be in the Third Normal Form when:
    * It is in the Second Normal Form.
    * It doesn't have any transitive dependency.
    * Transitive dependency: if X->Y and Y->Z, then X->Z holds only indirectly, through Y.
![view](images/3NF.gif)
    * 1. Name, Account_No, Bank_Code_No are functionally dependent on ID (ID --> Name, Account_No, Bank_Code_No)
    * 2. Bank is functionally dependent on Bank_Code_No (Bank_Code_No --> Bank)
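
The chain ID --> Bank_Code_No --> Bank can be found mechanically from the two dependencies above. This is a small sketch, assuming every determinant is a single attribute. The function name `transitive_dependencies` is made up for this example.

```python
# Direct dependencies from the example: determinant -> dependent attributes.
fds = {
    "ID": {"Name", "Account_No", "Bank_Code_No"},
    "Bank_Code_No": {"Bank"},
}


def transitive_dependencies(key, fds):
    """Yield (key, via, target) whenever key -> via and via -> target."""
    for via in sorted(fds.get(key, ())):
        for target in sorted(fds.get(via, ())):
            yield (key, via, target)


found = list(transitive_dependencies("ID", fds))
print(found)  # [('ID', 'Bank_Code_No', 'Bank')]
```

Bank is determined by ID only through Bank_Code_No, so moving (Bank_Code_No, Bank) into its own table removes the transitive dependency.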

>  Boyce-Codd Normal Form (BCNF):-
    * Boyce-Codd Normal Form is a stricter version of the Third Normal Form.
    * It deals with a certain type of anomaly that is not handled by 3NF.
    * A 3NF table that does not have multiple overlapping candidate keys is guaranteed to be in BCNF.
    * For a table R to be in BCNF, the following conditions must be satisfied:
    * R must be in the Third Normal Form.
    * For each non-trivial functional dependency ( X → Y ), X should be a super key.

> Fourth Normal Form (4NF)
    * A table is said to be in the Fourth Normal Form when:
    * It is in the Boyce-Codd Normal Form.
    * It doesn't have any multi-valued dependency.
    

# Rules for First Normal Form
   ### Rule 1: Single Valued Attributes
    * Each column of your table should be single-valued, which means
    * a cell should not contain multiple values.
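
As an illustration of Rule 1, the sketch below turns a multi-valued column into atomic values by producing one row per value. The column names and sample data are hypothetical.

```python
# Hypothetical rows where "subject" holds several values in a single cell.
raw = [
    {"rollno": 101, "name": "Akon", "subject": "OS, CN"},
    {"rollno": 103, "name": "Ckon", "subject": "Java"},
]

# One row per (student, subject) pair, so every cell holds exactly one value.
atomic = [
    {"rollno": r["rollno"], "name": r["name"], "subject": s.strip()}
    for r in raw
    for s in r["subject"].split(",")
]

for row in atomic:
    print(row)
```

The student name is now repeated across rows. That repetition is what the later normal forms deal with.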

   ### Rule 2: Attribute Domain should not change
    * This is more of a "common sense" rule.
    * In each column, the values stored must be of the same kind or type.

    * For example: if you have a column dob to store the dates of birth of a set of people,
    * then you must not store the names of some of them
    * in that column alongside the dates of birth of others.
    * It should hold only a date of birth for every record/row.

   ### Rule 3: Unique name for Attributes/Columns
    * This rule expects each column in a table to have a unique name.
    * This avoids confusion when retrieving data or performing any other operation on the stored data.
    * If two or more columns have the same name, the DBMS cannot tell them apart.

   ### Rule 4: Order doesn't matter
    * This rule says that the order in which you store the data in your table doesn't matter.
![view](images/1NF-form.png)

# Rules for 2NF:-
* The table must satisfy two conditions:
  * 1) The table should be in the First Normal Form.
  * 2) There should be no partial dependency.
  * First, understand what a dependency is.
![view](images/dependency.png)
  
  ### Partial Dependency
  * When a table has a composite primary key (a key made up of more than one column)
  * and one column of that key alone uniquely identifies some other column,
  * we say that the table has a partial dependency.
![view](images/partial.png)

  * The simple solution in this case is to move the partially dependent columns into a separate table, together with the part of the key they depend on.
  ## Quick Recap
   ### For a table to be in the Second Normal form,
     * it should be in the First Normal form and it should not have Partial Dependency.
     * Partial Dependency exists, when for a composite primary key, 
             * any attribute in the table depends only on a part of the primary key and not on the complete primary key.
     * To remove Partial dependency, we can divide the table,
             * remove the attribute which is causing partial dependency, 
             * and move it to some other table where it fits in well.
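
The recap above can be sketched as a small decomposition. The table and column names (`score`, `student_id`, `subject_id`, `marks`, `teacher`) are assumptions made for illustration. Here `teacher` depends only on `subject_id`, which is just one part of the composite key.

```python
# Hypothetical table with composite key (student_id, subject_id).
score = [
    {"student_id": 10, "subject_id": 1, "marks": 70, "teacher": "Java Teacher"},
    {"student_id": 10, "subject_id": 2, "marks": 75, "teacher": "C++ Teacher"},
    {"student_id": 11, "subject_id": 1, "marks": 80, "teacher": "Java Teacher"},
]

# teacher depends on subject_id alone, so it moves to a table keyed by subject_id.
subject = {row["subject_id"]: row["teacher"] for row in score}

# What remains depends on the whole key (student_id, subject_id).
score_2nf = [
    {k: row[k] for k in ("student_id", "subject_id", "marks")} for row in score
]

print(subject)    # {1: 'Java Teacher', 2: 'C++ Teacher'}
print(score_2nf)
```

Each teacher is now stored once per subject instead of once per score row.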


# Rules for 3NF:-
  * When a table is in the Second Normal Form and
  * has no transitive dependency, then it is in the Third Normal Form.
![view](images/NF3.png)
  * The primary key of our Score table is a composite key,
  * which means it is made up of two attributes or columns → student_id + subject_id
  * Our new column exam_name depends on both student_id and subject_id.
  * For example, a mechanical engineering student will have a Workshop exam
  * but a computer science student won't. And some subjects have Practical exams
  * while others don't. So we can say that exam_name depends on both student_id and subject_id.

  * And what about our second new column, total_marks? Does it depend on the Score table's primary key?

  * The column total_marks depends on exam_name, because the total score changes with the exam type.
  * For example, practical exams carry fewer marks while theory exams carry more.

    * But exam_name is just another column in the Score table.
    * It is not the primary key or even a part of the primary key, and total_marks depends on it.

    ### This is Transitive Dependency.
        * It occurs when a non-prime attribute depends on another non-prime attribute
        * rather than directly on the prime attributes or the primary key.
![view](images/transitive.png)

   ### The advantage of removing transitive dependency is:-

        * Amount of data duplication is reduced.
        * Data integrity is achieved.
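
One possible way to remove the transitive dependency is to move total_marks into its own table keyed by exam_name and recover it with a join. The sketch below uses Python's built-in `sqlite3` module. The `exam` table, the `marks` column, and the sample values are assumptions, and keying the new table on exam_name is just one reasonable choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- total_marks now lives with the attribute it actually depends on.
    CREATE TABLE exam (
        exam_name   TEXT PRIMARY KEY,
        total_marks INTEGER NOT NULL
    );
    CREATE TABLE score (
        student_id INTEGER,
        subject_id INTEGER,
        marks      INTEGER,
        exam_name  TEXT REFERENCES exam(exam_name),
        PRIMARY KEY (student_id, subject_id)
    );
    INSERT INTO exam VALUES ('Theory', 100), ('Practical', 50);
    INSERT INTO score VALUES
        (10, 1, 70, 'Theory'),
        (10, 2, 40, 'Practical'),
        (11, 1, 80, 'Theory');
    """
)

# total_marks is stored once per exam type and recovered with a join.
rows = conn.execute(
    """
    SELECT s.student_id, s.subject_id, s.marks, e.total_marks
    FROM score AS s
    JOIN exam AS e ON e.exam_name = s.exam_name
    ORDER BY s.student_id, s.subject_id
    """
).fetchall()
print(rows)  # [(10, 1, 70, 100), (10, 2, 40, 50), (11, 1, 80, 100)]
```

Changing the total marks of an exam type is now a single-row update in `exam`.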

# Rules for BCNF
    * For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two conditions:

    1) It should be in the Third Normal Form.
    2) For any dependency A → B, A should be a super key.
          * The second point sounds a bit tricky, right?
          * In simple words, it means that for a dependency A → B,
          * A cannot be a non-prime attribute if B is a prime attribute.
![view](images/BCNFex.png)

    * student_id and subject together form the primary key.
    * One professor teaches only one subject, but one subject may have two different professors.
    * There is a dependency between professor and subject here: subject depends on the professor (professor → subject).
    
    * The table above satisfies the 1st Normal Form because all the values are atomic,
            * column names are unique, and all the values stored in a particular column are of the same domain.

    * This table also satisfies the 2nd Normal Form, as there is no partial dependency.

    * And there is no transitive dependency, hence the table also satisfies the 3rd Normal Form.

    * But this table is not in Boyce-Codd Normal Form.

### Why is this table not in BCNF?
    * In the table above, student_id and subject form the primary key, which means the subject column is a prime attribute.

    * But there is one more dependency: professor → subject.

    * And while subject is a prime attribute, professor is a non-prime attribute, which is not allowed by BCNF.
 
![view](images/BCNF2.png)
     
     * A more generic view of BCNF:-
![view](images/BCNF.png)
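
The super key condition can be checked with an attribute closure: a set of attributes is a super key when its closure contains every column of the table. The following is a minimal sketch for the student_id / subject / professor example. The helper names `closure` and `bcnf_violations` are made up for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial dependencies whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != all_attrs
    ]


attrs = {"student_id", "subject", "professor"}
fds = [
    (frozenset({"student_id", "subject"}), frozenset({"professor"})),
    (frozenset({"professor"}), frozenset({"subject"})),
]

violations = bcnf_violations(attrs, fds)
print(violations)  # only professor -> subject is reported
```

The closure of {professor} is {professor, subject}, which does not include student_id, so professor is not a super key and professor → subject violates BCNF.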

# Fourth Normal Form (4NF)

      * Fourth Normal Form comes into the picture when a multi-valued dependency occurs in a relation.
# Rules for 4th Normal Form
   
    1) It should be in the Boyce-Codd Normal Form.
    2) The table should not have any multi-valued dependency.
# What is Multi-valued Dependency?
    1) For a dependency A → B, if multiple values of B exist for a single value of A,
         * then the table may have a multi-valued dependency.
    2) Also, a table should have at least 3 columns for it to have a multi-valued dependency.
    3) And, for a relation R(A,B,C),
        * if there is a multi-valued dependency between A and B, then B and C should be independent of each other.
![view](images/NF4.png)
![view](images/NF42.png)
![view](images/NF43.png)
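
A multi-valued dependency is usually removed by splitting the table into one table per independent fact. The sketch below assumes a hypothetical table of (s_id, course, hobby) rows in which a student's courses and hobbies are independent of each other. The names and values are illustrative, not taken from the figures.

```python
# Hypothetical table: every course of a student is paired with every hobby.
enrolment = {
    (1, "Science", "Cricket"),
    (1, "Science", "Hockey"),
    (1, "Maths", "Cricket"),
    (1, "Maths", "Hockey"),
    (2, "C#", "Cricket"),
}

# 4NF decomposition: one table per independent multi-valued fact.
course = {(s, c) for s, c, _ in enrolment}
hobby = {(s, h) for s, _, h in enrolment}

# Joining the two tables on s_id gives back the original rows.
rejoined = {(s, c, h) for s, c in course for s2, h in hobby if s == s2}
print(rejoined == enrolment)  # True
```

After the split, adding a new hobby for student 1 is a single row in `hobby` rather than one row per course.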

# Fifth Normal Form (5NF)
     * Fifth Normal Form is generally not implemented in real-life database design,
     * but you should know what it is.
     * The table should be in 4NF.
     * It should not have any join dependency (other than those implied by its candidate keys).
       * A join dependency means the table can be divided into smaller tables and re-joined to give exactly the original table.
       * No data should be lost in the process.
     * It is also known as PJNF (Project-Join Normal Form).
     * So if the table has a join dependency, decompose the table.
![view](images/5nf.jpg)
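
Whether a three-column table can be split without changing its contents can be checked by projecting it onto each pair of columns and joining the projections back. This is a rough sketch with made-up data (agent, company, product). The helpers `project` and `join3` are hypothetical names.

```python
def project(rows, idx):
    """Project a set of tuples onto the column positions in idx."""
    return {tuple(r[i] for i in idx) for r in rows}


def join3(ab, bc, ac):
    """Natural join of the three pairwise projections of an (a, b, c) table."""
    return {
        (a, b, c)
        for a, b in ab
        for b2, c in bc
        if b == b2 and (a, c) in ac
    }


def rejoin(rows):
    return join3(project(rows, (0, 1)), project(rows, (1, 2)), project(rows, (0, 2)))


# Hypothetical (agent, company, product) rows.
rows = {
    ("Smith", "Ford", "car"),
    ("Smith", "Ford", "truck"),
    ("Smith", "GM", "car"),
    ("Jones", "Ford", "car"),
}
print(rejoin(rows) == rows)  # True: this data survives the split unchanged

# Without one row, the join invents it again, so the split is not safe here.
fewer = rows - {("Smith", "Ford", "car")}
print(rejoin(fewer) == fewer)  # False
```

The first data set satisfies the join dependency, so it can be stored as three smaller tables. The second does not, because re-joining produces a row that was never in the table.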