# WEEK 18: 
https://learnwith.campusx.in/s/courses/637339afe4b0615a1bbed390/take

# Datatypes in SQL

SQL has several standard data types, including:

- __INT :__ A numeric data type that can store whole numbers. Used to store IDs, counts, or other numeric values that do not have a decimal component.


- __DECIMAL :__ A numeric data type that can store numbers with a fixed number of digits to the right of the decimal point. Used to store monetary values, measurements, or other values with a known number of decimal places.


- __FLOAT :__ A numeric data type that stores approximate floating-point numbers. Used for values where a small rounding error is acceptable, such as scientific measurements or mathematical calculations.

    - Other numeric types : __BIGINT, SMALLINT, TINYINT, NUMERIC, REAL__



- __VARCHAR :__ A character and string data type that can store variable-length strings of characters. Used to store text values, such as names, addresses, or descriptions.

    - Other string types : __CHAR, TEXT__


- __DATE :__ A date and time data type that stores the date (year, month, and day) without the time. Used to store dates such as birthdays, hire dates, or transaction dates.
    
    - Other date and time types : __TIME, DATETIME, TIMESTAMP__


- __BOOLEAN :__ A data type that stores either true or false. Used to store two-state values, such as yes/no or on/off flags. In MySQL, BOOLEAN is a synonym for TINYINT(1), with 0 meaning false and 1 meaning true.

These are some of the most commonly used data types in SQL, and the specific data type used will depend on the type of data being stored and the requirements of the application.
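The choice between FLOAT and DECIMAL is a good illustration of matching the type to the data. The sketch below uses Python rather than a database: `struct` round-trips a value through 4 bytes to imitate single-precision storage, and Python's `Decimal` stands in for an exact DECIMAL column. It is an illustration of the rounding behaviour, not of MySQL internals.

```python
import struct
from decimal import Decimal

# Round-trip 0.1 through 4 bytes, the size of a single-precision float.
single = struct.unpack("f", struct.pack("f", 0.1))[0]

# Ten additions of 0.1 in binary floating point drift away from 1.0 ...
float_total = sum([0.1] * 10)

# ... while decimal arithmetic keeps the exact value.
decimal_total = sum([Decimal("0.10")] * 10)

print(single)                            # no longer exactly 0.1
print(float_total == 1.0)                # False
print(decimal_total == Decimal("1.00"))  # True
```

This is why monetary values are usually kept in DECIMAL, while FLOAT is reserved for measurements that tolerate a small error.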






## Numeric Types : 

1. __INT :__ The INT data type is used to store integers with a maximum value of
2147483647 and a minimum value of -2147483648. Examples of data that can be
stored in INT include employee IDs, order numbers, and product IDs.


2. __TINYINT :__ The TINYINT data type is used to store integers with a maximum value of
127 and a minimum value of -128. Examples of data that can be stored in TINYINT
include Boolean values, such as 0 for false and 1 for true.


3. __SMALLINT :__ The SMALLINT data type is used to store integers with a maximum
value of 32767 and a minimum value of -32768. Examples of data that can be
stored in SMALLINT include quantities of items, such as the number of products
sold in a transaction.


4. __MEDIUMINT :__ The MEDIUMINT data type is used to store integers with a maximum
value of 8388607 and a minimum value of -8388608. Examples of data that can be
stored in MEDIUMINT include the number of visitors to a website or the number of
followers on a social media platform.


5. __BIGINT :__ The BIGINT data type is used to store integers with a maximum value of
9223372036854775807 and a minimum value of -9223372036854775808.
Examples of data that can be stored in BIGINT include the total revenue generated
by a company or the number of views on a YouTube video.


6. __FLOAT :__ The FLOAT data type is used to store single-precision floating-point
numbers, which are approximate values with a decimal point. Examples of data that can be
stored in FLOAT include the temperature of a room or a sensor reading. Exact amounts
such as prices are better stored in DECIMAL.


7. __DOUBLE :__ The DOUBLE data type is used to store double-precision floating-point
numbers, which are numbers with a decimal point that can store more digits than
FLOAT. Examples of data that can be stored in DOUBLE include very large or very
small numbers, such as the distance between planets in the solar system or the
size of an atom.


8. __DECIMAL :__ The DECIMAL data type is used to store exact decimal values with a
fixed total number of digits (precision) and a fixed number of digits after the decimal
point (scale). DECIMAL(10, 2), for example, holds up to 8 digits before the decimal point
and 2 after it. Examples of data that can be stored in DECIMAL include financial values,
such as the cost of an item or the total balance in a bank account.
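The integer limits listed above follow directly from the storage size of each type. A minimal sketch in Python, assuming MySQL's documented storage sizes (1, 2, 3, 4, and 8 bytes) and signed two's-complement values; the helper name `signed_range` is made up for this illustration:

```python
# Storage size in bytes for each MySQL integer type.
INTEGER_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

def signed_range(num_bytes):
    """Return (min, max) for a signed two's-complement integer of the given size."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

ranges = {name: signed_range(size) for name, size in INTEGER_BYTES.items()}
for name, (low, high) in ranges.items():
    print(f"{name:<9} {low} to {high}")
```

Declaring a column as UNSIGNED keeps the same number of bytes but shifts the range to start at 0.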

![image.png](attachment:image.png)

## String Data Type:

__1. CHAR :__ This data type is used to store fixed-length strings. The length
of the string is specified when the table is created, and the field will
always use that amount of space, even when the string stored in it is
shorter. A longer string does not fit: it is rejected or truncated,
depending on the SQL mode. For example, if you define a
CHAR(10) field and store the string "hello" in it, MySQL will pad the
string with spaces so that it takes up 10 characters. CHAR fields are
useful when you have a field that always contains the same length of
data, such as a state abbreviation or a country code.

__2. VARCHAR :__ This data type is used to store variable-length strings. The
length of the string can be up to a specified maximum, but the field
will only use as much space as it needs to store the actual data. For
example, if you define a VARCHAR(10) field and store the string
"hello" in it, MySQL will store only the 5 characters plus a 1- or 2-byte
length prefix. VARCHAR fields are useful when you have a field that can contain
varying amounts of data, such as a user's name or address.


__3. TEXT :__ This data type is used to store larger amounts of variable-length
string data than VARCHAR. It can store up to 65,535 bytes, which is fewer than
65,535 characters when the text contains multi-byte characters. TEXT fields are
useful when you need to store large amounts of text data, such as blog posts or comments.


__4. MEDIUMTEXT :__ This data type is used to store even larger amounts of
text data than TEXT. It can store up to 16,777,215 bytes.
MEDIUMTEXT fields are useful when you need to store very large
amounts of text data, such as long-form articles or legal documents.


__5. LONGTEXT :__ This data type is used to store the largest amounts of
text data. It can store up to 4,294,967,295 bytes. LONGTEXT
fields are useful when you need to store extremely large amounts of
text data, such as entire books or large collections of data.
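The difference between CHAR and VARCHAR described above can be modelled in a few lines of Python. This is only a model of the behaviour, not how MySQL stores data internally: the function names are invented for the illustration, and rejecting oversize values corresponds to MySQL's strict SQL mode (in non-strict mode the value is truncated instead).

```python
def store_char(value, length):
    """Model CHAR(length): reject oversize values, right-pad the rest with spaces."""
    if len(value) > length:
        raise ValueError(f"value is longer than CHAR({length})")
    return value.ljust(length)

def store_varchar(value, max_length):
    """Model VARCHAR(max_length): reject oversize values, store the rest unpadded."""
    if len(value) > max_length:
        raise ValueError(f"value is longer than VARCHAR({max_length})")
    return value

char_value = store_char("hello", 10)
varchar_value = store_varchar("hello", 10)
print(repr(char_value), len(char_value))        # padded to 10 characters
print(repr(varchar_value), len(varchar_value))  # stays at 5 characters
```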

### ENUM and SET : 

__6. ENUM :__ The ENUM data type is used to store one value from a predefined list, much like a
drop-down menu. You specify the list of possible values for an ENUM column, and the column can only
store one of these values. The ENUM data type can be used to ensure that only
valid values are stored in a column, and it can also save storage space compared to
storing string values. Example - gender



__7. SET :__ The SET data type is similar to ENUM, but it can store multiple values. You can
specify a list of possible values for a SET column, and the column can store any
combination of these values. The SET data type can be used to store sets of values,
such as tags or categories, in a single column. Example - hobbies
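A rough sketch of both ideas in Python with SQLite, which stands in for MySQL here. SQLite has no ENUM or SET type, so this is an emulation under stated assumptions: a CHECK constraint plays the role of ENUM, and an integer bitmask plays the role of SET (MySQL also stores a SET value as a bitmap, one bit per member in definition order). The table, the member lists, and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no ENUM type, so a CHECK constraint plays that role here.
conn.execute(
    "CREATE TABLE person ("
    " name TEXT NOT NULL,"
    " gender TEXT CHECK (gender IN ('male', 'female', 'other')),"
    " hobbies INTEGER NOT NULL DEFAULT 0)"
)

# SET members, each mapped to one bit in definition order.
HOBBIES = ["reading", "music", "sports", "travel"]

def encode_set(values):
    """Combine the chosen members into a single integer bitmask."""
    mask = 0
    for value in values:
        mask |= 1 << HOBBIES.index(value)  # raises ValueError for unknown members
    return mask

def decode_set(mask):
    """Turn a bitmask back into the list of member names."""
    return [name for bit, name in enumerate(HOBBIES) if mask & (1 << bit)]

conn.execute("INSERT INTO person VALUES (?, ?, ?)",
             ("Asha", "female", encode_set(["music", "travel"])))

# A value outside the predefined list is refused, as with ENUM.
try:
    conn.execute("INSERT INTO person VALUES (?, ?, ?)", ("Ravi", "unknown", 0))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored_mask = conn.execute(
    "SELECT hobbies FROM person WHERE name = 'Asha'"
).fetchone()[0]
print(rejected, stored_mask, decode_set(stored_mask))
```

In MySQL itself the same columns would be declared directly, for example as `ENUM('male', 'female', 'other')` and `SET('reading', 'music', 'sports', 'travel')`.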

![image.png](attachment:image.png)

### 8. BLOB

The BLOB (Binary Large Object) data type in MySQL is used to store large binary
data, such as images, audio, video, or other multimedia content.
In MySQL, there are four types of BLOB data types that can be used to store binary
data with different maximum sizes:


- __TINYBLOB :__ Maximum length of 255 bytes. TINYBLOB is the smallest BLOB data
type in MySQL. It can be used to store small binary data, such as icons, small
images, or serialized objects.


- __BLOB :__ Maximum length of 65,535 bytes (64 KB). BLOB is a medium-sized BLOB
data type that can be used to store larger binary data, such as images, audio,
video, or other multimedia files.


- __MEDIUMBLOB :__ Maximum length of 16,777,215 bytes (16 MB). MEDIUMBLOB is
a larger BLOB data type that can be used to store even larger binary data, such
as high-resolution images or longer audio or video files.


- __LONGBLOB :__ Maximum length of 4,294,967,295 bytes (4 GB). LONGBLOB is the
largest BLOB data type in MySQL, and it can be used to store very large binary
data, such as very high-resolution images, long audio or video files, or even
entire documents.


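___LOAD_FILE(PATH)___ reads the file at the given path on the database server and returns its contents, which can then be inserted into a BLOB column. It returns NULL if the file cannot be read, for example when the user lacks the FILE privilege.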

![image.png](attachment:image.png)

#### Pros of storing files in BLOB columns:
- BLOB columns allow you to store binary data directly in the database, without needing to
store the file externally.


- Storing files in the database can simplify backup and restore procedures, as all the data is
in one place.


- Access to BLOB data can be controlled through database user permissions.


#### Cons of storing files in BLOB columns:


- Storing large files in the database can slow down database performance and increase
storage requirements.


- If you need to access the file outside of the database (e.g. to share it with another
application or user), you'll need to extract it from the database.


- Some file types may not be well-suited for storage in BLOB columns, depending on their size, structure, and how they are accessed.
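Storing and reading back binary data can be sketched as follows. The example uses Python's built-in SQLite driver in place of MySQL, and the table name, file name, and payload are made up for the illustration; with a real file, the bytes would come from something like `open(path, "rb").read()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT PRIMARY KEY, content BLOB)")

# Stand-in for the bytes of a real file read from disk.
payload = bytes(range(256)) * 4

# Binary data is passed as a bound parameter, never pasted into the SQL text.
conn.execute("INSERT INTO files VALUES (?, ?)", ("sample.bin", payload))

name, content = conn.execute("SELECT name, content FROM files").fetchone()
print(name, len(content), content == payload)
```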

## Datetime

In MySQL, there are several temporal data types that can be used to
store and manipulate time and date values. These include:

__1. DATE__ - used for storing date values in the format YYYY-MM-DD.


__2. TIME__ - used for storing time values in the format HH:MM:SS.


__3. DATETIME__ - used for storing date and time values in the format
YYYY-MM-DD HH:MM:SS.


__4. TIMESTAMP__ - used for storing date and time values in the format
YYYY-MM-DD HH:MM:SS. It has a range of 1970-01-01 00:00:01
UTC to 2038-01-19 03:14:07 UTC.


__5. YEAR__ - used for storing year values in 2-digit or 4-digit format (YYYY
or YY). If the year is specified with 2 digits, it is assumed to be in
the range 1970-2069 (inclusive).
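Two of the limits above can be checked with a little arithmetic. The sketch below is plain Python, not MySQL: it shows that the upper bound of TIMESTAMP matches the largest signed 32-bit count of seconds since 1970-01-01 UTC, and it spells out the usual 2-digit year rule (70-99 become 1970-1999, 00-69 become 2000-2069). The function name is invented for the illustration.

```python
from datetime import datetime, timezone

# Largest value of a signed 32-bit counter of seconds since 1970-01-01 UTC.
max_seconds = 2 ** 31 - 1
timestamp_max = datetime.fromtimestamp(max_seconds, tz=timezone.utc)

def expand_two_digit_year(yy):
    """Map a 2-digit year to 4 digits: 70-99 -> 1970-1999, 0-69 -> 2000-2069."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a value from 0 to 99")
    return 1900 + yy if yy >= 70 else 2000 + yy

print(timestamp_max.strftime("%Y-%m-%d %H:%M:%S"))  # 2038-01-19 03:14:07
print(expand_two_digit_year(70), expand_two_digit_year(69))  # 1970 2069
```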

## Spatial Datatypes

- __GEOMETRY__ - The GEOMETRY data type is a generic spatial data type
that can store any type of geometric data, including points, lines, and
polygons.


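Useful functions: `ST_ASTEXT()` returns a geometry as text (WKT), while `ST_X()` and `ST_Y()` return the X and Y coordinates of a point.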


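- __JSON__ - The JSON data type stores JSON documents (objects and arrays) and checks that each stored value is valid JSON.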

![Screenshot%202023-08-22%20035508.png](attachment:Screenshot%202023-08-22%20035508.png)

### NOTE : use JSON_EXTRACT with the path `$.key_name` to extract the value stored under `key_name`
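The path syntax can be tried out with SQLite's `json_extract`, which accepts the same `$.key_name` paths. This is a sketch under assumptions: it relies on a SQLite build with JSON support (standard in recent Python distributions), and the table and JSON document are made up. One difference to keep in mind is that MySQL's JSON_EXTRACT returns string values with their JSON quotes, so `JSON_UNQUOTE` or the `->>` operator is typically used there to get plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, specs TEXT)")
conn.execute(
    "INSERT INTO products (specs) VALUES (?)",
    ('{"brand": "Acme", "ram_gb": 16, "ports": {"usb": 3}}',),
)

# '$' is the document root; '.key' steps into an object member.
brand, ram_gb, usb_ports = conn.execute(
    "SELECT json_extract(specs, '$.brand'),"
    "       json_extract(specs, '$.ram_gb'),"
    "       json_extract(specs, '$.ports.usb')"
    " FROM products"
).fetchone()
print(brand, ram_gb, usb_ports)
```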

---
---
---

### Why can't a single table hold all data?

A single table in SQL may not be able to hold all the data due to the following reasons:

1. **Size Limitations:**
   - Tables have practical size limits determined by the database management system and underlying hardware.
   - Large datasets may exceed these limits, leading to performance issues or data truncation.


2. **Performance Impact:**
   - As a table grows, querying and updating its data becomes slower due to increased processing time.
   - Indexes may become less effective, impacting query performance.


3. **Data Organization:**
   - Large and complex datasets often require logical organization into multiple related tables.
   - Separating data based on relationships enhances data integrity and simplifies querying.


4. **Maintenance Complexity:**
   - Managing a single massive table becomes difficult, affecting maintenance, backups, and recovery processes.
   - Smaller, specialized tables are easier to maintain and optimize.


5. **Concurrency Challenges:**
   - A single table may face contention issues when multiple users attempt to access or modify the data concurrently.
   - Dividing data into smaller tables can alleviate concurrency problems.


6. **Data Integrity:**
   - A single table can result in data duplication and inconsistency when dealing with different types of information.
   - Normalization (splitting data into related tables) helps maintain data integrity.


7. **Flexibility and Scalability:**
   - Using multiple tables allows for easier scalability by distributing data across different resources.
   - It's challenging to scale a monolithic table as system demands grow.


8. **Security and Access Control:**
   - Dividing data into smaller tables enables finer-grained access control.
   - It's easier to restrict access to specific data subsets with multiple tables.


9. **Complexity of Queries:**
   - Complex queries on a single large table can become convoluted and less efficient.
   - Breaking data into smaller tables makes queries more manageable and optimized.


10. **Data Redundancy:**
    - A single table might store redundant information for different entities, leading to inefficient storage.
    - Normalization minimizes redundancy by separating data logically.


11. **Data Backup and Recovery:**
    - Backing up and recovering data from a single large table can be time-consuming and error-prone.
    - Smaller tables facilitate more efficient backup and recovery processes.

In summary, while a single table might be suitable for small datasets, using multiple tables offers advantages in terms of performance, organization, maintenance, scalability, security, and overall database efficiency for larger and more complex datasets.

#  NORMALIZATION

Database normalization is a process used to organize data in a database to reduce data redundancy and
remove undesirable dependencies between columns.

The goal of normalization is to ensure that each piece of data is stored in one place, in a structured
way, to minimize the risk of inconsistencies and improve the overall efficiency and usability of the database.


There are several levels of database normalization, each with its own set of rules and guidelines. The most
commonly used levels of normalization are:


- __a. First Normal Form (1NF):__ This level requires that all data in a table is stored in a way that each column
contains only atomic (indivisible) values, and there are no repeating groups or arrays.


- __b. Second Normal Form (2NF):__ This level requires that each non-key attribute in a table is dependent on the
entire primary key, not just a part of it.


- __c. Third Normal Form (3NF):__ This level requires that each non-key attribute in a table is dependent only on the
primary key and not on any other non-key attributes.


- __d. There are higher levels of normalization, such as Fourth Normal Form (4NF) and Fifth Normal Form (5NF)__, but
they are less commonly used in practice.

### 1st Normal Form : 

A table is in 1 NF if:

- There are only single-valued attributes: each column contains atomic values, and comma-separated values are not allowed.


- The domain of each Attribute/Column does not change: every value in a column has the same data type.


- There is a unique name for every Attribute/Column.


- The order in which data is stored does not matter.

#### eg : Here we can see that the Address and Skills columns have comma-separated values

![image.png](attachment:image.png)

#### Assigning a new row to each skill puts the table in 1NF, but the data is redundant

![image.png](attachment:image.png)

#### To remove the redundancy, we will create 3 tables

![image.png](attachment:image.png)

![image.png](attachment:image.png)
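The same split can be sketched in code. The rows, table names, and column names below are hypothetical (they are not taken from the images above), and SQLite is used in place of MySQL: a comma-separated skills column is broken into a `person` table, a `skill` table, and a `person_skill` table that links the two.

```python
import sqlite3

# Hypothetical unnormalized rows: the skills column holds comma-separated values.
raw_rows = [
    (1, "Amit", "Python, SQL"),
    (2, "Neha", "SQL, Excel"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE skill (skill_id INTEGER PRIMARY KEY, skill_name TEXT NOT NULL UNIQUE);
    CREATE TABLE person_skill (
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        skill_id INTEGER NOT NULL REFERENCES skill(skill_id),
        PRIMARY KEY (person_id, skill_id)
    );
""")

for person_id, name, skills in raw_rows:
    conn.execute("INSERT INTO person VALUES (?, ?)", (person_id, name))
    for skill_name in (part.strip() for part in skills.split(",")):
        # Each skill name is stored once, then linked to the person.
        conn.execute("INSERT OR IGNORE INTO skill (skill_name) VALUES (?)", (skill_name,))
        skill_id = conn.execute(
            "SELECT skill_id FROM skill WHERE skill_name = ?", (skill_name,)
        ).fetchone()[0]
        conn.execute("INSERT INTO person_skill VALUES (?, ?)", (person_id, skill_id))

skill_count = conn.execute("SELECT COUNT(*) FROM skill").fetchone()[0]
link_count = conn.execute("SELECT COUNT(*) FROM person_skill").fetchone()[0]
sql_people = [row[0] for row in conn.execute(
    "SELECT p.name FROM person p"
    " JOIN person_skill ps ON ps.person_id = p.person_id"
    " JOIN skill s ON s.skill_id = ps.skill_id"
    " WHERE s.skill_name = 'SQL' ORDER BY p.name"
)]
print(skill_count, link_count, sql_people)
```

Every column now holds a single value, and "who knows SQL?" becomes an ordinary join instead of a text search inside a comma-separated string.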

### 2nd Normal Form

A table is in 2NF if:

1. It is already in 1NF


2. It does not contain any partial dependency


___Partial dependency occurs when a non-key attribute is dependent on only a part of the
primary key instead of the entire key:___

![image.png](attachment:image.png)

Here the primary key is a combination of OrderID and ProductID.

Product Name depends only on ProductID, which is just one part of the primary key. This is a partial dependency, so Product Name should not be part of that table.

So two tables will be formed:

![image.png](attachment:image.png)
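A minimal sketch of such a split, using SQLite in place of MySQL. The `quantity` column and the sample rows are assumptions added for the illustration; only OrderID, ProductID, and Product Name come from the discussion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- product_name depends on product_id alone, so it gets its own table.
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    -- quantity depends on the whole key (order_id, product_id).
    CREATE TABLE order_items (
        order_id INTEGER NOT NULL,
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        quantity INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO products VALUES (10, 'Keyboard'), (20, 'Mouse');
    INSERT INTO order_items VALUES (1, 10, 2), (1, 20, 1), (2, 10, 5);
""")

# Renaming a product now touches exactly one row, however many orders use it.
renamed = conn.execute(
    "UPDATE products SET product_name = 'Mechanical Keyboard' WHERE product_id = 10"
).rowcount

order_lines = conn.execute(
    "SELECT oi.order_id, p.product_name, oi.quantity"
    " FROM order_items oi JOIN products p ON p.product_id = oi.product_id"
    " ORDER BY oi.order_id, oi.product_id"
).fetchall()
print(renamed, order_lines)
```

With the product name kept in the order table instead, the same rename would have to be repeated on every order line for that product, and a missed row would leave the data inconsistent.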

### 3rd Normal Form

A table is in 3NF if:

- It is already in 2NF


- There is no transitive dependency.

___A transitive dependency exists when a non-key attribute depends on another
non-key attribute, which is not a part of the primary key.___

![image.png](attachment:image.png)

#### Creating 2 different tables :

![image.png](attachment:image.png)
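A transitive dependency can be spotted with a small check on sample rows. The sketch below is plain Python with a hypothetical employee table (not the one in the images), and the helper `determines` is invented for the illustration. Note that sample data can only show that a dependency is not contradicted; whether it really holds is a matter of the business rules.

```python
def determines(rows, lhs, rhs):
    """Return True if equal values in the lhs columns always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical table with primary key emp_id.
employees = [
    {"emp_id": 1, "name": "Amit", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 2, "name": "Neha", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 3, "name": "Ravi", "dept_id": 20, "dept_name": "Finance"},
]

# emp_id -> dept_id -> dept_name is a chain that passes through a non-key column.
transitive = (
    determines(employees, ["emp_id"], ["dept_id"])
    and determines(employees, ["dept_id"], ["dept_name"])
    and not determines(employees, ["dept_id"], ["emp_id"])
)

# 3NF split: dept_name moves to a table keyed by dept_id.
employee_table = [{k: r[k] for k in ("emp_id", "name", "dept_id")} for r in employees]
department_table = sorted({(r["dept_id"], r["dept_name"]) for r in employees})
print(transitive, department_table)
```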

### BCNF

BCNF (Boyce-Codd Normal Form) is a higher level of database normalization that ensures that a database table has minimal redundancy and all functional dependencies are properly represented. A table is in BCNF if, for every non-trivial functional dependency, the left-hand side of the dependency is a superkey (a unique identifier for each row).

Here's a short explanation of BCNF with an example:

**Example: Employee_Project Table**

Consider a table that stores information about employees and the projects they work on. Each row contains the employee's ID, name, project ID, and project name.

| Employee ID | Employee Name | Project ID | Project Name  |
|-------------|---------------|------------|---------------|
| 1           | Alice         | 101        | Project A     |
| 2           | Bob           | 102        | Project B     |
| 1           | Alice         | 103        | Project C     |
| 3           | Carol         | 102        | Project B     |

In this example, the key of the table is {Employee ID, Project ID}: together, these two columns identify each row. Two further functional dependencies hold: Employee ID -> Employee Name and Project ID -> Project Name.

The table is not in BCNF because the left-hand sides of these two dependencies, {Employee ID} and {Project ID}, are not superkeys. An employee can work on several projects and a project can have several employees, so names are repeated: Alice appears twice, and so does Project B.

To bring the design to BCNF, we split it into three tables:

**Table 1: Employees**

| Employee ID | Employee Name |
|-------------|---------------|
| 1           | Alice         |
| 2           | Bob           |
| 3           | Carol         |

**Table 2: Projects**

| Project ID | Project Name  |
|------------|---------------|
| 101        | Project A     |
| 102        | Project B     |
| 103        | Project C     |

**Table 3: Employee_Projects**

| Employee ID | Project ID |
|-------------|------------|
| 1           | 101        |
| 2           | 102        |
| 1           | 103        |
| 3           | 102        |

Now Employee ID -> Employee Name holds in Employees and Project ID -> Project Name holds in Projects, and in each table the left-hand side is the key. Employee_Projects records which employee works on which project, and its only key is the full pair {Employee ID, Project ID}. All three tables are in BCNF, and each name is stored only once.

In essence, BCNF ensures that every non-trivial functional dependency in a table is based on a superkey, reducing redundancy and improving data integrity in the database.
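The BCNF condition can be checked mechanically on the Employee_Project rows above. The sketch below is plain Python with invented helper names. It looks for dependencies whose left-hand side is not a superkey, then splits the data into employees, projects, and the assignments linking them, and confirms that joining the pieces gives back the original rows. As with any check on sample data, it can only show that a dependency is consistent with these rows, not prove that it holds in general.

```python
rows = [
    (1, "Alice", 101, "Project A"),
    (2, "Bob", 102, "Project B"),
    (1, "Alice", 103, "Project C"),
    (3, "Carol", 102, "Project B"),
]
EMP_ID, EMP_NAME, PROJ_ID, PROJ_NAME = range(4)

def determines(rows, lhs, rhs):
    """True if rows that agree on the lhs columns also agree on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_superkey(rows, cols):
    """True if the given columns take a distinct value combination in every row."""
    return len({tuple(row[i] for i in cols) for row in rows}) == len(rows)

# Dependencies whose left-hand side is not a superkey violate BCNF.
violations = [
    lhs for lhs, rhs in [([EMP_ID], [EMP_NAME]), ([PROJ_ID], [PROJ_NAME])]
    if determines(rows, lhs, rhs) and not is_superkey(rows, lhs)
]

# Decompose into employees, projects, and the assignments that link them.
employees = {r[EMP_ID]: r[EMP_NAME] for r in rows}
projects = {r[PROJ_ID]: r[PROJ_NAME] for r in rows}
assignments = {(r[EMP_ID], r[PROJ_ID]) for r in rows}

# Joining the three tables reproduces the original rows (a lossless split).
rejoined = sorted((e, employees[e], p, projects[p]) for e, p in assignments)
print(violations, rejoined == sorted(rows))
```

The assignments table matters: without it, the Employees and Projects tables alone would no longer say who works on which project.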


---
---
---

## ER Diagram

ER diagram stands for Entity-Relationship diagram. It is a graphical representation
of entities and their relationships to each other. ER diagrams are used in database
design to visualize the entities, attributes, and relationships involved in a system.

There are three basic types of relationships in an ER diagram:

1. __One-to-One (1:1) :__ Each entity in one set is associated with only one entity
in the other set, and vice versa.


2. __One-to-Many (1:N) :__ Each entity in one set is associated with one or more
entities in the other set, but each entity in the other set is associated with
only one entity in the first set.


3. __Many-to-Many (N:M) :__ Each entity in one set is associated with one or
more entities in the other set, and each entity in the other set is
associated with one or more entities in the first set.

eg : 

![image.png](attachment:image.png)
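When an ER diagram is turned into tables, each relationship type usually maps to a particular key pattern. The sketch below uses SQLite in place of MySQL, with hypothetical entities (departments, students, ID cards, courses) that are not taken from the diagram above: a plain foreign key gives one-to-many, a UNIQUE foreign key gives one-to-one, and a junction table gives many-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One-to-many: each student row points at exactly one department.
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES departments(dept_id)
    );
    -- One-to-one: the UNIQUE foreign key allows at most one ID card per student.
    CREATE TABLE id_cards (
        card_id INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL UNIQUE REFERENCES students(student_id)
    );
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Many-to-many: a junction table holds one row per (student, course) pair.
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        course_id INTEGER NOT NULL REFERENCES courses(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO departments VALUES (1, 'Physics');
    INSERT INTO students VALUES (1, 'Amit', 1), (2, 'Neha', 1);
    INSERT INTO id_cards VALUES (100, 1);
    INSERT INTO courses VALUES (1, 'Mechanics'), (2, 'Optics');
    INSERT INTO enrollments VALUES (1, 1), (1, 2), (2, 1);
""")

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

second_card_rejected = rejected("INSERT INTO id_cards VALUES (101, 1)")
orphan_student_rejected = rejected("INSERT INTO students VALUES (3, 'Ravi', 99)")
print(second_card_rejected, orphan_student_rejected)
```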

---
---

## SQL Injection

SQL injection is a type of cyber attack in which an attacker injects SQL into a website's input fields, such as a form field, to trick the website's database into revealing sensitive information or performing unintended actions. It occurs when a website doesn't properly handle user input, allowing attackers to insert their own SQL code into the query.


By executing their own SQL code, attackers can escalate their account privileges, view someone else's private
information, or make other modifications to the database.


**How SQL Injection Works:**
1. Websites use databases to store and retrieve data. They often use SQL (Structured Query Language) to interact with the database.


2. Attackers input malicious SQL code into a website's input fields (like search boxes or login forms).


3. If the website doesn't validate or sanitize the input properly, the attacker's SQL code gets combined with the legitimate SQL code.


4. This combination of code can trick the database into doing something harmful, like revealing sensitive data or modifying the database.

**Example:**
Imagine a login form where you input your username and password. If the website doesn't properly validate the input and an attacker inputs something like:

```
username: admin' OR '1'='1
password: somepassword
```

The SQL code that the website sends to the database might end up looking like:

```sql
SELECT * FROM users WHERE username='admin' OR '1'='1' AND password='somepassword';
```

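Because AND is evaluated before OR, the condition reads as `username='admin' OR ('1'='1' AND password='somepassword')`. The `'1'='1'` part always evaluates to true, and the `username='admin'` branch matches the admin row whatever password was entered, so the attacker might gain unauthorized access.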
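The attack can be reproduced safely against an in-memory database. The sketch below uses Python with SQLite in place of MySQL. The `users` table, its rows, and the function name are made up, and the passwords are stored in plain text only to keep the illustration short (a real system would store password hashes).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, password TEXT NOT NULL)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("admin", "s3cret"), ("bob", "hunter2")])

def build_login_query(username, password):
    """UNSAFE on purpose: user input is pasted straight into the SQL text."""
    return ("SELECT username FROM users "
            f"WHERE username='{username}' AND password='{password}'")

# The attacker does not know the admin password.
query = build_login_query("admin' OR '1'='1", "somepassword")
matched = conn.execute(query).fetchall()
print(query)
print(matched)  # the admin row is returned despite the wrong password
```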

**How to Avoid SQL Injection:**


1. **Use Prepared Statements:** Instead of directly embedding user input into SQL queries, use parameterized queries or prepared statements provided by your programming language or framework. These send the query text and the user-supplied values separately, so the input is always treated as data and never executed as SQL.


2. **Input Validation:** Validate and sanitize user input before sending it to the database. Check if the input matches the expected format and type.


3. **Escape User Input:** If you can't use prepared statements, escape user input by converting special characters to their safe counterparts before using them in SQL queries.


4. **Least Privilege:** Make sure your database users only have the necessary permissions. This limits the potential damage if an attacker gets in.


5. **Error Handling:** Don't show detailed error messages to users. Attackers can exploit these messages to understand your database structure.


6. **Update and Patch:** Keep your software, database, and libraries up to date. Security vulnerabilities are often fixed in updates.


7. **Web Application Firewall (WAF):** Implement a WAF that can help detect and block SQL Injection attempts.
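The first two defences can be sketched in a few lines. As before, this uses Python with SQLite in place of MySQL, and the `users` table, the username rule, and the function names are assumptions made for the illustration; plain-text passwords appear only to keep the example short. The `?` placeholders are SQLite's parameter style; MySQL drivers for Python typically use `%s` instead.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, password TEXT NOT NULL)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("admin", "s3cret"), ("bob", "hunter2")])

USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")  # hypothetical allowlist

def is_valid_username(username):
    """Input validation: accept only the characters a username may contain."""
    return USERNAME_PATTERN.fullmatch(username) is not None

def login(username, password):
    """Return the matching username, or None. Values are bound, never concatenated."""
    row = conn.execute(
        "SELECT username FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()
    return row[0] if row else None

attack = "admin' OR '1'='1"
print(is_valid_username(attack))       # False
print(login(attack, "somepassword"))   # None
print(login("admin", "s3cret"))        # admin
```

With bound parameters, the injected text is compared against the username column as an ordinary string, so it matches no row.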
