## SQL


- **SQL (Structured Query Language) is a language used for managing and manipulating data in relational databases. It allows you to define tables and to insert, update, retrieve, and delete data in a database.**


- It is widely used for data management in
many applications, websites, and businesses. In simple terms, SQL is used to
communicate with and control databases.

## MySQL


- MySQL is a relational database management system

- MySQL is open-source

- MySQL is free to use (the Community Edition is released under the GPL; Oracle also sells commercial editions)

- MySQL is ideal for both small and large applications

- MySQL is very fast, reliable, scalable, and easy to use

- MySQL is cross-platform

- MySQL largely follows the ANSI/ISO SQL standard, with its own extensions and some deviations

- MySQL was first released in 1995

- MySQL is developed, distributed, and supported by Oracle Corporation

## Difference between Relational and NoSQL databases

- Relational databases use the relational model, which organizes data into tables with rows and columns, and uses structured query language (SQL) to access and manipulate the data. They are well suited for structured data, such as financial transactions, and are commonly used in business applications.


- NoSQL databases, on the other hand, are designed to handle large amounts of unstructured or semi-structured data, such as social media posts, log files, or user-generated content. They use a variety of data storage models, including key-value, document-based, column-based, and graph databases. NoSQL databases are designed to be horizontally scalable, allowing them to handle large amounts of data and high levels of traffic.


___In summary, relational databases are well suited for structured data, while NoSQL databases are designed to handle unstructured data and scale horizontally.___

## Difference between SQL and MySQL

SQL (Structured Query Language) is a standard language for managing and manipulating relational databases. It's used to insert, update, and retrieve data in a database.


MySQL is an open-source relational database management system that uses SQL as its primary language. 

In other words, MySQL is a database management system that implements the SQL language. It's one of the most popular database systems in use today and is widely used for web applications and data storage.


__To put it simply, SQL is a language, while MySQL is a database management system that uses the SQL language.__

## Database

### What is a Database?

A database is an organized collection of data that is stored and accessed electronically (in relational databases, in the form of tables). It's designed to help users manage, manipulate, and retrieve data efficiently. Databases are used in a wide range of applications, including e-commerce, financial systems, and customer relationship management.

### Types of Databases

There are several types of databases, including:



- __Relational databases: Store data in tables with rows and columns and use structured query language (SQL) to access data.__
    - MySQL
    - SQL Server
    - PostgreSQL
    - SQLite
    - MariaDB



- __Non-relational databases (NoSQL): Store data in a format other than tables, such as key-value pairs, documents, wide columns, or graphs.__
    - HBase
    - MongoDB
    - Cassandra



- Centralized databases: Store data in a single, centralized location and allow multiple users to access the data from different locations.



- Distributed databases: Store data on multiple servers and allow multiple users to access the data from different locations.



- Operational databases: Store real-time data and are designed to support the day-to-day operations of an organization.



- Data warehouses: Store historical data for analysis and decision-making purposes.



- In-memory databases: Store data in RAM for faster access and processing.



- Cloud databases: Store data on remote servers and allow access over the internet.


These are the main types of databases, and different applications may use different types depending on their specific requirements.

## Relational databases : 

![Screenshot%202023-08-03%20033810.png](attachment:Screenshot%202023-08-03%20033810.png)

- columns = attributes


- rows = tuples


- number of rows = cardinality


- number of columns = degree of relation


- set of allowed values (type) of a column = domain
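A minimal sketch of degree and cardinality. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the `student` table and its rows are hypothetical. The degree is read from the query's column metadata, and the cardinality is a row count.

```python
import sqlite3

# Throwaway in-memory database; the student table and rows are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student (roll_no, name, age) VALUES (?, ?, ?)",
    [(1, "asha", 20), (2, "ravi", 21), (3, "meena", 19)],
)

# Degree: number of columns (attributes), read from the result's column metadata
degree = len(conn.execute("SELECT * FROM student").description)

# Cardinality: number of rows (tuples)
cardinality = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print("degree:", degree)            # degree: 3
print("cardinality:", cardinality)  # cardinality: 3
```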

## DBMS

- DBMS stands for "Database Management System." It is software that enables users and applications to interact with a database, providing functionalities for data storage, retrieval, modification, and management. 


- DBMS acts as an intermediary between users and the database, ensuring data integrity, security, and efficient access to the stored information. It allows users to define, create, and manipulate databases, making it easier to organize, query, and update data in a structured manner. 


- Examples of popular DBMSs include MySQL, Oracle, Microsoft SQL Server, and PostgreSQL.

![Screenshot%202023-08-03%20041411.png](attachment:Screenshot%202023-08-03%20041411.png)

## Difference between DBMS and RDBMS?

**DBMS (Database Management System):**

- Manages and stores data in a structured way.


- Provides basic data storage and retrieval capabilities.


- Doesn't necessarily enforce relationships between data elements.


- Can be non-relational and handle various data formats.


- Doesn't necessarily guarantee ACID properties (Atomicity, Consistency, Isolation, Durability).

**RDBMS (Relational Database Management System):**


- Organizes data into structured tables with predefined schemas.


- Enforces strong relationships between data using keys (primary, foreign).


- Uses SQL (Structured Query Language) for querying and manipulating data.


- Typically supports ACID transactions (in MySQL this depends on the storage engine: InnoDB supports transactions, MyISAM does not).


- Ensures data integrity through constraints and normalization.

In simple terms, a DBMS is a broader term that includes systems managing any type of data, while an RDBMS specifically deals with structured data using tables and SQL.

## Database keys:

Keys are attributes or sets of attributes that play a fundamental role in ensuring data integrity, data uniqueness, and establishing relationships between tables. Keys are used to identify and access records in a database table efficiently. 

1. **Primary Key:** A unique identifier for each record in a table, ensuring data integrity and fast data retrieval.


2. **Foreign Key:** A column (or set of columns) that establishes a link between two tables, enforcing referential integrity and maintaining data consistency across related tables. It usually refers to the primary key of another table.


3. **Candidate Key:** A minimal set of attributes that uniquely identifies each record in a table. A table may have several candidate keys; one of them is chosen as the primary key.


4. **Unique Key:** Ensures that each value in the specified column is unique. Unlike the primary key, a table can have multiple unique keys, and a unique key can usually contain NULL values.


5. **Composite Key:** A combination of two or more columns that, together, uniquely identify each record in a table.


6. **Super Key:** Any set of one or more attributes that uniquely identifies records. Every candidate key is a super key, and adding extra attributes to a candidate key gives another super key.


7. **Alternate Key:** Another candidate key that is not chosen as the primary key.


8. **Surrogate Key:** A system-generated unique identifier (for example, an auto-incrementing integer) used as the primary key when a natural primary key is not available or is not suitable for performance or security reasons.

### example : 
1. **Product_ID (Primary Key):** A unique identifier for each product.
2. **Product_Name:** The name of the product.
3. **Category_ID (Foreign Key):** A reference to the primary key of the "Categories" table, linking products to their respective categories.
4. **SKU (Unique Key):** A unique Stock Keeping Unit assigned to each product.
5. **Barcode (Unique Key):** A unique barcode for each product.
6. **Price:** The price of the product.
7. **Stock_Quantity:** The current stock quantity of the product.
8. **Product_Code:** A value combining Product_ID and Category_ID (e.g., `1-101`).

Here's a sample table with some example data:

| Product_ID | Product_Name     | Category_ID | SKU      | Barcode      | Price   | Stock_Quantity | Product_Code |
|------------|------------------|-------------|----------|--------------|---------|----------------|--------------|
| 1          | Laptop           | 101         | LAP-001  | 123456789012 | $800.00 | 50             | 1-101        |
| 2          | Smartphone       | 102         | PHN-002  | 987654321098 | $500.00 | 100            | 2-102        |
| 3          | Headphones       | 103         | HPD-003  | 456789012345 | $50.00  | 200            | 3-103        |

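In this example, Product_ID is the primary key, Category_ID is a foreign key, and SKU and Barcode are unique keys. Product_Code combines Product_ID and Category_ID. Because Product_ID alone already identifies each product, this combination is a super key rather than a true (minimal) composite key. A composite key is needed only when no single column is unique on its own.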
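A minimal sketch of the product table's keys. It uses Python's built-in `sqlite3` module as a stand-in for MySQL. The category names are hypothetical, and Barcode is stored as TEXT so that leading zeros would survive. The sketch shows that the unique key on SKU rejects a second product with the same SKU.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.execute("""
    CREATE TABLE Categories (
        Category_ID   INTEGER PRIMARY KEY,
        Category_Name TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE Products (
        Product_ID     INTEGER PRIMARY KEY,                        -- primary key
        Product_Name   TEXT NOT NULL,
        Category_ID    INTEGER REFERENCES Categories(Category_ID), -- foreign key
        SKU            TEXT UNIQUE,                                -- unique key
        Barcode        TEXT UNIQUE,                                -- unique key
        Price          REAL,
        Stock_Quantity INTEGER
    )""")

# Hypothetical category names
conn.executemany("INSERT INTO Categories VALUES (?, ?)",
                 [(101, "Computers"), (102, "Phones"), (103, "Audio")])
conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "Laptop", 101, "LAP-001", "123456789012", 800.00, 50),
    (2, "Smartphone", 102, "PHN-002", "987654321098", 500.00, 100),
    (3, "Headphones", 103, "HPD-003", "456789012345", 50.00, 200),
])

# A new product reusing the SKU "LAP-001" violates the unique key on SKU
try:
    conn.execute("INSERT INTO Products VALUES "
                 "(4, 'Tablet', 101, 'LAP-001', '111111111111', 300.00, 10)")
    duplicate_sku_rejected = False
except sqlite3.IntegrityError:
    duplicate_sku_rejected = True

product_count = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]
print(duplicate_sku_rejected, product_count)  # True 3
```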

### Criteria to become Primary Key:

To become a primary key of a table, a column (or set of columns) must fulfill the following criteria. The database enforces the first two; the rest are design guidelines:

1. **Uniqueness:** Every value in the column must be unique; no two rows in the table can have the same value for the primary key column.


2. **Non-Nullability:** The primary key column must not allow NULL values. Each row in the table must have a valid value for the primary key.


3. **Irreducibility:** The primary key should be minimal, meaning it should consist of the smallest number of columns required to uniquely identify each row in the table.


4. **Stability:** The primary key value should not change over time, as it serves as a stable identifier for each row.


5. **Unchangeability:** The primary key value should not be modified after its initial insertion into the table to maintain data consistency.


6. **Uniformity:** The data type and format of the primary key column should be consistent across all rows in the table.


By satisfying these criteria, a column can be designated as the primary key of a table, ensuring data integrity and efficient data retrieval through fast indexing.

### Entity

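An entity is a real-world object or concept about which data is stored. In a table, each row usually represents one entity, and the entity's properties are stored as attributes (columns).

eg : entities: student, restaurant, car; attributes: a student's age, a car's number plate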

## Cardinality of Relationships

In the context of relationships, cardinality describes how many rows in one table can be associated with how many rows in another table. This is different from the cardinality of a single relation, which is simply its number of rows.


There are three main types of cardinality:

1. **One-to-One (1:1) Cardinality:** In a one-to-one cardinality, each value in one table's column is related to exactly one value in another table's column, and vice versa. This relationship indicates a strict and unique pairing between rows in the two tables.
    - eg : each person is assigned a unique government-issued identification number (such as a Social Security Number). Each identification number corresponds to only one individual, and vice versa. This is an example of one-to-one cardinality.


2. **One-to-Many (1:N) Cardinality:** In a one-to-many cardinality, each value in one table's column can be related to multiple values in another table's column, but each value in the second table's column is related to only one value in the first table's column. This type of cardinality is the most common in database relationships.
    - eg : In a library database, each book title has one ISBN, but the library can hold many physical copies of that title, and each copy belongs to exactly one title. The relationship between titles and copies is therefore one-to-many.


3. **Many-to-Many (N:N) Cardinality:** In a many-to-many cardinality, each value in one table's column can be related to multiple values in another table's column, and vice versa. This relationship requires an intermediate table (often called a junction or link table) to create unique combinations of values between the two tables.
    - eg : In a university database, students can enroll in multiple courses, and each course can have multiple students. Therefore, there is a many-to-many relationship between students and courses, requiring an intermediate link table to track the enrollments.


Cardinality plays a crucial role in designing database schemas and establishing relationships between tables. Understanding the cardinality of relations helps ensure data consistency, efficiency, and appropriate database normalization.
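A minimal sketch of the many-to-many case, using Python's built-in `sqlite3` module as a stand-in for MySQL. The `students`, `courses`, and `enrollments` tables and their rows are hypothetical. The junction table holds one row per (student, course) pair, and joining through it recovers both directions of the relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: one row per (student, course) enrollment
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(student_id),
        course_id  INTEGER REFERENCES courses(course_id),
        PRIMARY KEY (student_id, course_id)  -- a student enrolls in a course at most once
    );
    INSERT INTO students VALUES (1, 'asha'), (2, 'ravi');
    INSERT INTO courses  VALUES (10, 'SQL'), (20, 'Python');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# One student -> many courses, and one course -> many students
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrollments e
    JOIN students s ON s.student_id = e.student_id
    JOIN courses  c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title
""").fetchall()
print(rows)  # [('asha', 'Python'), ('asha', 'SQL'), ('ravi', 'SQL')]

sql_students = conn.execute("SELECT COUNT(*) FROM enrollments WHERE course_id = 10").fetchone()[0]
print(sql_students)  # 2
```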

![Screenshot%202023-08-03%20050202.png](attachment:Screenshot%202023-08-03%20050202.png)

### Drawbacks of databases:

![Screenshot%202023-08-03%20051052.png](attachment:Screenshot%202023-08-03%20051052.png)

___
___

### Special character use: 'kumar\\'s'

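```sql
-- Option 1: escape the single quote with a backslash
INSERT INTO employee (firstname, lastname, salary, phoneno, location) 
VALUES ('kapil', 'kumar\'s', 10000, 943345566, 'bangalore');

-- Option 2 (alternative): wrap the value in double quotes (MySQL default; not when ANSI_QUOTES mode is on)
INSERT INTO employee (firstname, lastname, salary, phoneno, location) 
VALUES ('kapil', "kumar's", 10000, 943345566, 'bangalore');
```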
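Two more portable options, sketched with Python's built-in `sqlite3` module as a stand-in for MySQL. SQLite does not accept backslash escapes. Standard SQL writes a quote inside a string literal by doubling it (`'kumar''s'`). Parameter binding passes the value separately from the SQL text, so no escaping is needed, and it also guards against SQL injection. The three-column `employee` table here is a simplified, hypothetical version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (firstname TEXT, lastname TEXT, location TEXT)")

# Standard SQL: double the single quote inside the literal
conn.execute("INSERT INTO employee VALUES ('kapil', 'kumar''s', 'bangalore')")

# Parameter binding: the driver sends the value separately, no escaping needed
conn.execute("INSERT INTO employee VALUES (?, ?, ?)", ("rohit", "kumar's", "saharsa"))

lastnames = [row[0] for row in conn.execute("SELECT lastname FROM employee ORDER BY firstname")]
print(lastnames)  # ["kumar's", "kumar's"]
```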

### View all the databases:

```sql
SHOW DATABASES;
```

### Create database:

```sql
CREATE database mohitdb;
```

### Select the database to use:

```sql
USE mohitdb;
```

#### NOTE : don't use the DATABASE keyword when selecting the database (write `USE mohitdb;`, not `USE DATABASE mohitdb;`)

### See the tables present in the database:

```sql
SHOW tables;
```


### DROP DATABASE:

```sql
Drop database mohitdb;
```

### UNIQUE KEY:

In SQL, a unique key is a constraint that ensures that the values in a particular column or set of columns are unique across all rows in a table. 

### __There are two ways to create a unique key in SQL__

- **1. Single or multiple unique keys**

**Creating two separate unique keys:** In this approach, you create two separate unique keys, each on a different column, so each column must be unique on its own.

For example, consider a table named "Employee" with columns "EmpID", "FirstName", and "LastName". You can create one unique key on the "FirstName" column and another on the "LastName" column. A unique key on "EmpID" would be redundant, because it is already the primary key:

```sql
CREATE TABLE Employee (
  EmpID INT PRIMARY KEY,
  FirstName VARCHAR(50),
  LastName VARCHAR(50),
  UNIQUE (FirstName),   -- no two employees may share a first name
  UNIQUE (LastName));   -- no two employees may share a last name
```

-  __2. Combination of columns as a unique key__

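__Combining two columns as a single unique key:__ In this approach, you create a single unique key on a combination of two columns, so only the pair must be unique: two employees may share a first name as long as their last names differ. For example, consider the same "Employee" table: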

```sql
CREATE TABLE Employee (
  EmpID INT PRIMARY KEY,
  FirstName VARCHAR(50),
  LastName VARCHAR(50),
  UNIQUE (FirstName, LastName));
```
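A minimal sketch contrasting the two approaches, using Python's built-in `sqlite3` module as a stand-in for MySQL. The helper `try_insert` and the sample names are hypothetical. Inserting a second employee with the same first name but a different last name fails with separate unique keys but succeeds with the combined key.

```python
import sqlite3

def try_insert(conn, row):
    """Insert a row into Employee; return True if accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Approach 1: separate unique keys, each column must be unique on its own
separate = sqlite3.connect(":memory:")
separate.execute("""CREATE TABLE Employee (
    EmpID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
    UNIQUE (FirstName), UNIQUE (LastName))""")

# Approach 2: one unique key on the combination; only the pair must be unique
combined = sqlite3.connect(":memory:")
combined.execute("""CREATE TABLE Employee (
    EmpID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
    UNIQUE (FirstName, LastName))""")

results = {}
for name, conn in [("separate", separate), ("combined", combined)]:
    try_insert(conn, (1, "mohit", "kumar"))
    # Same first name, different last name
    results[name] = try_insert(conn, (2, "mohit", "sharma"))

print(results)  # {'separate': False, 'combined': True}
```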

### Difference between PRIMARY KEY and UNIQUE KEY

In SQL, both "PRIMARY KEY" and "UNIQUE KEY" are constraints used to ensure the uniqueness of values in a table. However, there are some differences between them:


- Uniqueness and NULLs - A primary key must be unique across all rows in the table and cannot contain NULL values. __A unique key must also be unique, but can contain NULL values__ (in MySQL, several rows may hold NULL in a UNIQUE column, because NULLs are not treated as equal to each other).


- Number of Constraints - A table can have only one primary key constraint, but can have multiple unique key constraints.


- Indexing - In SQL Server, a primary key creates a clustered index by default, while a unique key creates a non-clustered index by default. A clustered index stores the table's rows in index order, while a non-clustered index is a separate structure that points to the rows. In MySQL's InnoDB engine, the primary key is always the clustered index and unique keys are secondary indexes.


- Reference in Foreign Key - A foreign key in another table usually references the primary key of the referenced table. It can also reference a unique key, although referencing the primary key is the more common and simpler design.
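A minimal sketch of the NULL difference, using Python's built-in `sqlite3` module as a stand-in for MySQL. The `account` table is hypothetical. SQLite needs an explicit `NOT NULL` on a non-integer primary key, which MySQL implies automatically. The sketch shows two NULLs accepted in the unique column, while the primary key rejects both a duplicate and a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account (
    code  TEXT NOT NULL PRIMARY KEY,  -- explicit NOT NULL: SQLite otherwise allows NULL in a non-integer PK
    email TEXT UNIQUE                 -- unique key, NULLs allowed
)""")

# Two NULL emails are accepted, because NULLs are not considered equal to each other
conn.execute("INSERT INTO account VALUES ('A1', NULL)")
conn.execute("INSERT INTO account VALUES ('A2', NULL)")

def rejected(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_pk = rejected("INSERT INTO account VALUES ('A1', 'x@example.com')")
null_pk = rejected("INSERT INTO account VALUES (NULL, 'y@example.com')")
null_email_rows = conn.execute("SELECT COUNT(*) FROM account WHERE email IS NULL").fetchone()[0]

print(duplicate_pk, null_pk, null_email_rows)  # True True 2
```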

### Scalar and Aggregate functions:

SQL has many built-in functions for operating on data. They fall into two categories:


- __Aggregate functions :__ These functions operate on the values of a column across multiple rows and return a single value that summarizes the data. Examples of aggregate functions include SUM, AVG, COUNT, MIN, and MAX. Aggregate functions are typically used in the SELECT clause of a query (often together with GROUP BY) to calculate summary information for a group of rows, such as the total or the average of values.


- __Scalar functions :__ These functions operate on a single value (from one row, or from user input) and return a single value for each row. Examples of scalar functions include mathematical functions like ROUND, ABS, and CEIL, as well as string functions like UPPER, LOWER, and SUBSTRING. Scalar functions can be used in the SELECT and WHERE clauses of a query to modify or manipulate individual values in the result set.
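A minimal sketch of the difference, using Python's built-in `sqlite3` module as a stand-in for MySQL. SQLite spells the substring function `SUBSTR`, which MySQL also accepts. The sample rows come from the `employee` examples in these notes. The aggregate query returns one summary row, while the scalar query returns one transformed value per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (firstname TEXT, salary INTEGER, location TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("kapil", 10000, "bangalore"),
    ("rohit", 20000, "saharsa"),
    ("mohit", 50000, "patna"),
])

# Aggregate functions: many rows in, one summary row out
total, average, n, lowest, highest = conn.execute(
    "SELECT SUM(salary), AVG(salary), COUNT(*), MIN(salary), MAX(salary) FROM employee"
).fetchone()
print(total, n, lowest, highest)  # 80000 3 10000 50000
print(round(average, 2))          # 26666.67

# Scalar functions: applied to each row, one value per row
scalar_rows = conn.execute(
    "SELECT UPPER(firstname), SUBSTR(location, 1, 3), LENGTH(firstname) "
    "FROM employee ORDER BY salary"
).fetchall()
print(scalar_rows)  # [('KAPIL', 'ban', 5), ('ROHIT', 'sah', 5), ('MOHIT', 'pat', 5)]
```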

### Default values:

It is the value that a column will take if no value is specified during an insert operation. You can specify a default value for a column in the table definition using the "DEFAULT" keyword. The syntax for setting a default value in SQL is as follows:

```sql
CREATE TABLE Employee (
    ID INT PRIMARY KEY,
    Name VARCHAR(50),
    Age INT DEFAULT 25);
```

### Creating tables:

```sql
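CREATE TABLE employee (
    id INT AUTO_INCREMENT,            -- generated automatically on insert
    firstname varchar(20) NOT NULL,
    lastname varchar(20),             -- nullable
    salary INT NOT NULL,
    phoneno INT,                      -- VARCHAR is usually safer for phone numbers (leading zeros; 10 digits can overflow INT)
    location varchar(20),
    PRIMARY KEY (id),
    UNIQUE (firstname, lastname));    -- the same (firstname, lastname) pair cannot appear twice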
```

### Description of a table

```sql
DESC Table_name;
```

![descsql.jpg](attachment:descsql.jpg)

## INSERTING VALUES in a table:

#### Because `id` is AUTO_INCREMENT, we don't need to include it in the column list

```sql
INSERT INTO employee (firstname, lastname, salary, phoneno, location) 
VALUES 
('kapil', 'sharma', 10000, 943345566, 'bangalore'),
('rohit','kumar',20000,746353532,'saharsa'),
('mohit','kumar',50000,986559,'patna'),
('mohit',NULL,40000,83736578,'patna');
```

![sql1.jpg](attachment:sql1.jpg)

>**We can pass NULL as a value only if the column is not declared NOT NULL.**

### NOTE : Don't use the VALUES keyword when inserting rows taken from another table; use a SELECT statement instead (INSERT INTO ... SELECT ...)

#### eg : select statement inside INSERT : 

![image.png](attachment:image.png)
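A minimal sketch of `INSERT INTO ... SELECT`, using Python's built-in `sqlite3` module as a stand-in for MySQL. The `patna_staff` target table is hypothetical. The SELECT supplies the rows, so no VALUES keyword appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (firstname TEXT, lastname TEXT, salary INTEGER, location TEXT);
    CREATE TABLE patna_staff (firstname TEXT, lastname TEXT, salary INTEGER);  -- hypothetical target table
    INSERT INTO employee VALUES
        ('kapil', 'sharma', 10000, 'bangalore'),
        ('rohit', 'kumar',  20000, 'saharsa'),
        ('mohit', 'kumar',  50000, 'patna'),
        ('mohit', NULL,     40000, 'patna');
""")

# No VALUES keyword: the SELECT supplies the rows to insert
conn.execute("""
    INSERT INTO patna_staff (firstname, lastname, salary)
    SELECT firstname, lastname, salary FROM employee WHERE location = 'patna'
""")

copied = conn.execute("SELECT firstname, salary FROM patna_staff ORDER BY salary").fetchall()
print(copied)  # [('mohit', 40000), ('mohit', 50000)]
```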

### Deleting tables in a database:

To delete a table from a database in SQL, you can use the "DROP TABLE" statement. The basic syntax for this statement is as follows:

```sql
DROP TABLE Employee;
```

## Difference between DELETE, TRUNCATE and DROP 

In SQL, "DELETE", "TRUNCATE", and "DROP" all remove data, but at different levels:

- The "DELETE" statement is used to remove one or more rows from a table based on a specified condition. The basic syntax for the "DELETE" statement is as follows:

```sql
DELETE FROM table_name [WHERE condition];
```

- The "TRUNCATE" statement, on the other hand, is used to remove all data from a table, but the table structure remains intact. The basic syntax for the "TRUNCATE" statement is as follows:

```sql
TRUNCATE TABLE table_name;
```

- The "DROP" statement, on the other hand, is used to delete the entire table, including its structure and data. The basic syntax for the "DROP" statement is as follows:

```sql
DROP TABLE Employee;
```

__In summary, <br></br>the "DELETE" statement removes one or more rows from a table based on a specified condition, <br></br>the "TRUNCATE" statement removes all data from a table but leaves the table structure intact, <br></br>and the "DROP" statement completely deletes the table, including its structure and data.__
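A minimal sketch of the three levels, using Python's built-in `sqlite3` module as a stand-in for MySQL. SQLite has no TRUNCATE statement. A `DELETE` without a WHERE clause is its equivalent, so that is what the middle step uses. The helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, location TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [("kapil", "bangalore"), ("rohit", "saharsa"), ("mohit", "patna")])

def row_count():
    return conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]

def table_exists():
    # sqlite_master lists the tables that currently exist
    return conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'Employee'"
    ).fetchone()[0] == 1

conn.execute("DELETE FROM Employee WHERE location = 'patna'")  # removes matching rows only
after_delete = (row_count(), table_exists())

conn.execute("DELETE FROM Employee")  # no WHERE: removes all rows, keeps the table (TRUNCATE-like)
after_delete_all = (row_count(), table_exists())

conn.execute("DROP TABLE Employee")   # removes the table itself
after_drop = table_exists()

print(after_delete, after_delete_all, after_drop)  # (2, True) (0, True) False
```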

## DDL vs DML

__DDL $\Longrightarrow$ Data Definition Language.__

It includes the SQL commands that can be used to ___define the database schema.___ It simply deals with descriptions of the database schema and is used to create and modify the structure of database objects in the database. Examples of DDL statements include __CREATE, ALTER, DROP, TRUNCATE.__
<br></br>

__DML $\Longrightarrow$  Data Manipulation Language.__

It includes the SQL commands that can be used to ___manage data stored in the database.___ This includes inserting, updating, and deleting data. Examples of DML statements include __SELECT, INSERT, UPDATE, DELETE__

___In short, DDL is used to create and modify database structure, while DML is used to manage the data stored in the database.___

### DML : CRUD operations

**CRUD stands for "Create, Read, Update, and Delete"** and refers to the four basic operations that can be performed on a database. These operations are the foundation of database management. The equivalent CRUD operations in SQL are:



- **Create -** This operation is used to insert new data into a database. The "INSERT INTO" statement is used to perform this operation:

```sql
INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...);
```

- **Read -** This operation is used to retrieve data from a database. The "SELECT" statement is used to perform this operation:

```sql
SELECT column1, column2, ... FROM table_name [WHERE condition];
```

- **Update -** This operation is used to modify existing data in a database. The "UPDATE" statement is used to perform this operation:

```sql
UPDATE table_name SET column1 = value1, column2 = value2, ... [WHERE condition];
```

- **Delete -** This operation is used to remove data from a database. The "DELETE" statement is used to perform this operation:

```sql
DELETE FROM table_name [WHERE condition];
```
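A minimal sketch of the four CRUD operations end to end, using Python's built-in `sqlite3` module as a stand-in for MySQL. In SQLite, an `INTEGER PRIMARY KEY` column assigns ids automatically, similar to AUTO_INCREMENT. The `?` placeholders are the `sqlite3` module's parameter syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, firstname TEXT, salary INTEGER)")

# Create
conn.execute("INSERT INTO employee (firstname, salary) VALUES (?, ?)", ("kapil", 10000))
conn.execute("INSERT INTO employee (firstname, salary) VALUES (?, ?)", ("rohit", 20000))

# Read
before = conn.execute("SELECT firstname, salary FROM employee ORDER BY id").fetchall()
print(before)  # [('kapil', 10000), ('rohit', 20000)]

# Update (only rows matching the WHERE condition)
conn.execute("UPDATE employee SET salary = ? WHERE firstname = ?", (15000, "kapil"))

# Delete (only rows matching the WHERE condition)
conn.execute("DELETE FROM employee WHERE firstname = ?", ("rohit",))

remaining = conn.execute("SELECT firstname, salary FROM employee").fetchall()
print(remaining)  # [('kapil', 15000)]
```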

### WHERE clause()

```sql
select * from employee where location = 'patna';
```

![sql1111.jpg](attachment:sql1111.jpg)

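### NOTE : In MySQL, string comparisons are case-insensitive by default!!!

SQL keywords are case-insensitive, and MySQL's default collations also compare strings case-insensitively, so 'Patna' matches 'patna'. To make a comparison case-sensitive, use __BINARY__:

```sql
-- compares raw bytes, so 'Patna' no longer matches (BINARY is deprecated from MySQL 8.0.27; CAST(location AS BINARY) is the alternative)
select * from employee where binary location ='patna';
```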

### ALIASING()

```sql
select firstname, salary, location, lastname AS Title from employee;
```

![image.png](attachment:image.png)

```sql
SELECT 
    os AS 'operating_system',
    model,
    price,
    rating 
FROM campusx.smartphones;
```

![image.png](attachment:image.png)
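A minimal sketch showing that an alias only renames the column in the result set, using Python's built-in `sqlite3` module as a stand-in for MySQL. The in-memory `smartphones` table and its single row are hypothetical, and the `campusx.` database prefix is dropped because SQLite has no such schema here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the columns used in the query above
conn.execute("CREATE TABLE smartphones (os TEXT, model TEXT, price INTEGER, rating INTEGER)")
conn.execute("INSERT INTO smartphones VALUES ('android', 'phone-a', 15000, 80)")

cur = conn.execute("SELECT os AS operating_system, model, price, rating FROM smartphones")
column_names = [d[0] for d in cur.description]  # result-set column names
print(column_names)  # ['operating_system', 'model', 'price', 'rating']
```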

### UPDATE() - use SET

Updating rows in the table

```sql
UPDATE employee 
SET lastname = 'kr' 
WHERE (firstname = 'mohit' and salary = 40000);
```

![image.png](attachment:image.png)

#### Increasing every salary by 700 (no WHERE clause, so all rows are updated):

```sql
UPDATE employee set salary = salary + 700;
```

![image.png](attachment:image.png)
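A minimal sketch of both updates, using Python's built-in `sqlite3` module as a stand-in for MySQL. `cursor.rowcount` reports how many rows each UPDATE changed. The targeted update touches one row, while the update without WHERE touches every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (firstname TEXT, lastname TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("kapil", "sharma", 10000),
    ("mohit", "kumar", 50000),
    ("mohit", None, 40000),
])

# Targeted update: only rows matching the WHERE condition change
cur = conn.execute("UPDATE employee SET lastname = 'kr' WHERE firstname = 'mohit' AND salary = 40000")
targeted = cur.rowcount

# No WHERE clause: every row is updated
cur = conn.execute("UPDATE employee SET salary = salary + 700")
all_rows = cur.rowcount

salaries = [s for (s,) in conn.execute("SELECT salary FROM employee ORDER BY salary")]
renamed = conn.execute("SELECT lastname FROM employee WHERE salary = 40700").fetchone()[0]
print(targeted, all_rows, salaries, renamed)  # 1 3 [10700, 40700, 50700] kr
```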

## ALTER - to alter schema of Table

The ALTER command in SQL is used to modify the structure of a database table after it has already been created. It can be used to add, modify or drop columns, add or drop constraints, or modify the properties of existing columns or constraints.

### 1. Adding a new column:

```sql
ALTER table employee ADD COLUMN age INT NOT NULL;
```

![image.png](attachment:image.png)

![image.png](attachment:image.png)

#### ADDING COLUMN between 2 Columns:

```sql
-- AFTER places the new column immediately after the existing column NAME_TABLE
ALTER TABLE customers ADD COLUMN surname VARCHAR(30) NOT NULL AFTER NAME_TABLE;
```

### 2. Modifying an existing column: use MODIFY

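#### Changing the character limit of the location column to 27 and setting its default to 'Bangalore':

MODIFY replaces the whole column definition, so restate every attribute you want to keep (for example, add NOT NULL to the definition if the column must reject NULLs).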

```sql
ALTER TABLE employee MODIFY COLUMN location varchar(27) DEFAULT 'Bangalore';
```

![image.png](attachment:image.png)

### 3. Dropping a column:

```sql
ALTER TABLE employee DROP COLUMN age;
```


### 4. Adding a new constraint:

```sql
ALTER TABLE table_name
ADD CONSTRAINT constraint_name constraint_type (column_name);
```

##### Add a primary key on the id column

eg : 
```sql
ALTER TABLE employee add primary key(id);
```

### 5. Dropping a constraint:

```sql
ALTER TABLE table_name
DROP CONSTRAINT constraint_name;
```

###### Drop the primary key on the id column

eg : 
```sql
ALTER TABLE employee drop primary key;
```

##### NOTE : You don't need to specify the column name 'id' when dropping the PRIMARY KEY

Because a table can have only one primary key, you do not need to name the column or columns that make it up. The DROP PRIMARY KEY clause alone is enough to remove the constraint. If a key column is AUTO_INCREMENT (as `id` is in `employee`), MySQL rejects the drop while that column would be left without an index, so remove AUTO_INCREMENT first with MODIFY.

##### DROP a UNIQUE Constraint :

```sql
ALTER TABLE Persons
DROP INDEX UC_Person;
```

### NOTE : Constraints cannot be modified in place; drop them and add them again in the correct format

### 6. Renaming column names

```sql
ALTER TABLE sleep
CHANGE COLUMN `Wakeup time` wakeup_time VARCHAR(255);
```

#### Note: use backticks (\`), not ' or ", around column names that contain spaces or special characters

___
___

### DATA INTEGRITY:

Session 31 - SQL DDL Commands : https://www.youtube.com/watch?v=ny1mh6VUpnQ


Data integrity in databases refers to the ___accuracy, completeness, and consistency___
of the data stored in a database. 


It is a measure of the reliability and
trustworthiness of the data and ensures that the data in a database is protected
from errors, corruption, or unauthorized changes.

#### There are various methods used to ensure data integrity, including:

- __Constraints:__ Constraints in databases are rules or conditions that must be met for data to be
inserted, updated, or deleted in a database table. They are used to enforce the
integrity of the data stored in a database and to prevent data from becoming
inconsistent or corrupted.



- __Transactions:__ a sequence of database operations that are treated as a single unit
of work.




- __Normalization:__ a design technique that minimizes data redundancy and ensures
data consistency by organizing data into separate tables.

### CONSTRAINTS : 

Constraints in MySQL are rules that enforce data integrity and consistency. They specify the conditions that data must meet in order to be inserted, updated, or deleted from a table. Constraints ensure that the data in a table remains consistent and meets certain requirements.

There are several types of constraints in MySQL:

- __PRIMARY KEY:__ Enforces uniqueness and defines a column or a combination of columns as the primary key of a table.


- __FOREIGN KEY:__ Enforces referential integrity and ensures that the values in a foreign key column match the values in the referenced column of a referenced table.


- __UNIQUE:__ Enforces uniqueness and ensures that the values in a column or a combination of columns are unique within the table.


- __NOT NULL:__ Enforces non-NULL values and ensures that a value is entered in a column for every row in a table.


- __CHECK:__ Enforces conditional constraints and allows you to specify conditions that data must meet in order to be inserted or updated in a table. MySQL enforces CHECK constraints from version 8.0.16; earlier versions parse and ignore them.


- __DEFAULT:__ Specifies a default value for a column.


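- __AUTO_INCREMENT:__ Automatically generates a unique, increasing integer for a column (typically the primary key) when a row is inserted without a value for it. Strictly speaking, it is a column attribute rather than a constraint.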



___Constraints can be specified when creating a table, or they can be added or modified later using ALTER TABLE statements. They are an important tool for maintaining the integrity and consistency of your data.___

```sql
CREATE TABLE Employees (
    EmployeeID INT AUTO_INCREMENT PRIMARY KEY,
    EmployeeName VARCHAR(50) NOT NULL,
    EmployeeCode INT UNIQUE,
    Salary DECIMAL(10, 2) DEFAULT 0.00,
    Department VARCHAR(100) CHECK (LENGTH(Department) > 2),
    HireDate DATE,
    ManagerID INT,
    FOREIGN KEY (ManagerID) REFERENCES Managers(ManagerID)
);
```


In this query:


- `EmployeeID` is an ___auto-incrementing primary key column___ representing the unique identifier for each employee.


- `EmployeeName` is a VARCHAR column storing the name of the employee, and it ___cannot be NULL___.


- `EmployeeCode` is an ___INT column with a UNIQUE constraint,___ ensuring each employee has a unique employee code.


- `Salary` is a DECIMAL column representing the ___salary of the employee, with a default value of 0.00.___


- `Department` is a VARCHAR column representing the department in which the employee works. It has a ___CHECK constraint to ensure the department name is longer than 2 characters.___


- `HireDate` is a DATE column representing the date on which the employee was hired.


- `ManagerID` is an INT column serving as a ___foreign key, referencing the primary key `ManagerID`___ in the table `Managers`. It establishes a relationship between employees and their managers.
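A minimal sketch of NOT NULL, UNIQUE, and DEFAULT at work, using Python's built-in `sqlite3` module as a stand-in for MySQL. SQLite spells AUTO_INCREMENT as `AUTOINCREMENT`. The CHECK, HireDate, and Managers foreign key from the query above are left out to keep the sketch short. The helper `violates` is hypothetical and returns SQLite's error message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        EmployeeID   INTEGER PRIMARY KEY AUTOINCREMENT,
        EmployeeName TEXT NOT NULL,
        EmployeeCode INTEGER UNIQUE,
        Salary       REAL DEFAULT 0.00
    )""")

# Salary is omitted, so the DEFAULT fills it in
conn.execute("INSERT INTO Employees (EmployeeName, EmployeeCode) VALUES ('asha', 501)")
salary = conn.execute("SELECT Salary FROM Employees WHERE EmployeeCode = 501").fetchone()[0]
print(salary)  # 0.0

def violates(sql):
    """Run sql; return the constraint error message, or None if it succeeds."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

not_null_msg = violates("INSERT INTO Employees (EmployeeName, EmployeeCode) VALUES (NULL, 502)")
unique_msg = violates("INSERT INTO Employees (EmployeeName, EmployeeCode) VALUES ('ravi', 501)")
print(not_null_msg)  # NOT NULL constraint failed: Employees.EmployeeName
print(unique_msg)    # UNIQUE constraint failed: Employees.EmployeeCode
```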

#### Q - Add a constraint so that the combination of (name, email, password) cannot be duplicated:

![Screenshot%202023-08-03%20053655.png](attachment:Screenshot%202023-08-03%20053655.png)

## <span class="mark">CHECK</span>

The CHECK constraint is used to limit the value range that can be placed in a column.

If you define a CHECK constraint on a column it will allow only certain values for this column.

If you define a CHECK constraint on a table it can limit the values in certain columns based on values in other columns in the row.

```sql
CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    CHECK (Age>=18)
);
```
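A minimal sketch of the `Persons` CHECK constraint, using Python's built-in `sqlite3` module as a stand-in for MySQL (8.0.16 or later enforces CHECK the same way). A row that makes the condition false is rejected. A NULL age is accepted, because `NULL >= 18` is unknown rather than false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        ID        INTEGER NOT NULL,
        LastName  TEXT NOT NULL,
        FirstName TEXT,
        Age       INTEGER,
        CHECK (Age >= 18)
    )""")

conn.execute("INSERT INTO Persons VALUES (1, 'kumar', 'mohit', 25)")  # passes the check

try:
    conn.execute("INSERT INTO Persons VALUES (2, 'sharma', 'kapil', 16)")
    underage_rejected = False
except sqlite3.IntegrityError:  # CHECK violations raise IntegrityError
    underage_rejected = True

# NULL passes: CHECK only rejects rows where the condition is FALSE
conn.execute("INSERT INTO Persons VALUES (3, 'singh', 'ravi', NULL)")

row_count = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]
print(underage_rejected, row_count)  # True 2
```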

## FOREIGN KEY

- A foreign key is a column in a table that is a reference to the primary key of another table. 


- It is used to establish a relationship between two tables, ensuring data integrity and consistency. 


- The purpose of a foreign key is to prevent actions that would create orphaned records in the child table (referencing table) when records in the parent table (referenced table) are deleted or updated.


- __Foreign Key constraint is used to prevent actions that would destroy links between two tables__

### NOTE : The table with the foreign key is called the child table, and the table with the primary key is called the parent table or referenced table

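In MySQL, when you create a foreign key constraint, the referenced column must be indexed; this index is used to enforce referential integrity. A primary key is indexed automatically, but otherwise you must add an index yourself, as in this example where `sl_no` in `test3` references `id` in `test2`: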



```sql
ALTER TABLE test2 ADD INDEX (id);
```

After creating the index, you should be able to add the foreign key constraint without any issues:

```sql
ALTER TABLE test3 ADD FOREIGN KEY (sl_no) REFERENCES test2 (id);
```

Make sure to create the index in the referenced table (parent_table) before creating the foreign key constraint in the referencing table (child).

#### example : 


```sql
ALTER TABLE students ADD FOREIGN KEY(course_id) REFERENCES courses(course_id);
```

- Parent table - courses


- Child table - students


### __Because of this constraint, a course_id that is not present in the courses table (parent) cannot be added to the students table__

```sql
-- Create the main table

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    EmployeeName VARCHAR(100),
    DepartmentID INT
);
```

```sql
-- Create the referenced table

CREATE TABLE Departments (
    DepartmentID INT PRIMARY KEY,
    DepartmentName VARCHAR(100)
);
```

```sql
-- Add a foreign key constraint to link Employees table to Departments table

ALTER TABLE Employees ADD FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID);
```
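A minimal sketch of the Employees/Departments link, using Python's built-in `sqlite3` module as a stand-in for MySQL. SQLite cannot add a foreign key with ALTER TABLE, so the sketch declares it in CREATE TABLE. SQLite also enforces foreign keys only after `PRAGMA foreign_keys = ON`. The helper `attempt` and the sample rows are hypothetical. Both an orphan insert and deleting a still-referenced parent are rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: foreign keys are off by default
conn.executescript("""
    CREATE TABLE Departments (
        DepartmentID   INTEGER PRIMARY KEY,
        DepartmentName TEXT
    );
    CREATE TABLE Employees (
        EmployeeID   INTEGER PRIMARY KEY,
        EmployeeName TEXT,
        DepartmentID INTEGER REFERENCES Departments(DepartmentID)
    );
    INSERT INTO Departments VALUES (1, 'Sales');
    INSERT INTO Employees VALUES (100, 'asha', 1);
""")

def attempt(sql):
    """Return 'ok' if the statement runs, 'rejected' if a constraint blocks it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

orphan_insert = attempt("INSERT INTO Employees VALUES (101, 'ravi', 99)")  # department 99 does not exist
parent_delete = attempt("DELETE FROM Departments WHERE DepartmentID = 1")  # still referenced by employee 100
print(orphan_insert, parent_delete)  # rejected rejected
```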

### Benefits of adding foreign key:

- __Enforcing referential integrity :__ A foreign key constraint is used to enforce referential integrity, which ensures that data entered into the database is consistent and accurate. The foreign key constraint ensures that a value in the foreign key column of a table must match a value in the primary key column of another table.



- __Preventing orphaned records :__ A foreign key constraint helps to prevent orphaned records, which are records that have no corresponding parent record in the referenced table. When a foreign key constraint is in place, by default the database will not allow you to delete a record in the referenced table if there are matching records in the referencing table (unless an ON DELETE action such as CASCADE is defined).


- __Simplifying data retrieval :__ By establishing relationships between tables through foreign key constraints, you can simplify data retrieval by joining tables to retrieve related data.


- __Improving database performance :__ In MySQL (InnoDB), a foreign key column is indexed automatically if no suitable index exists, which can speed up joins on that column. Note that checking the constraint adds some overhead to inserts, updates, and deletes.


__Overall, adding a foreign key and referencing it is an important aspect of database design and helps to ensure the integrity, accuracy, and performance of your database.__

### How Foreign Key can be violated

Foreign key constraints can be violated in several ways. Some common violations include:

- Inserting a value into the foreign key column that doesn't match a value in the referenced column: This can occur when you try to insert a row into a table that has a foreign key constraint, and the value in the foreign key column doesn't match a value in the referenced column of the referenced table.


- Deleting a row from the referenced table: This can occur when you try to delete a row from the referenced (parent) table that is still referenced by one or more rows in the referencing (child) table, and the constraint has no `ON DELETE CASCADE` or `ON DELETE SET NULL` action.


- Updating a value in the referenced column: This can occur when you try to update a value in the referenced column that is still referenced by one or more rows in the referencing table, and the constraint has no `ON UPDATE CASCADE` or `ON UPDATE SET NULL` action.


- Setting the foreign key column to NULL: a foreign key constraint by itself allows NULL, because a NULL value is not checked against the parent table. An insert or update that sets the column to NULL fails only if the column is also declared `NOT NULL`, and that is a NOT NULL violation rather than a foreign key violation.


- Duplicate values in the foreign key column: these are *not* a violation. Many child rows may reference the same parent row (a one-to-many relationship); it is the referenced column (normally a primary or unique key) that holds unique values, not the foreign key column.
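A minimal sketch of the cases above, assuming two hypothetical demo tables `dept_demo` and `emp_demo` (not part of the original notes, named so the script does not touch real tables). The accepted statements run; the violating ones are left commented out with the MySQL (InnoDB) error number you should expect.

```sql
-- Hypothetical demo tables; drop the child first because it references the parent
DROP TABLE IF EXISTS emp_demo;
DROP TABLE IF EXISTS dept_demo;

CREATE TABLE dept_demo (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(100)
);

CREATE TABLE emp_demo (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(100),
    dept_id  INT,                                    -- nullable foreign key column
    FOREIGN KEY (dept_id) REFERENCES dept_demo(dept_id)
);

INSERT INTO dept_demo VALUES (10, 'Sales'), (20, 'HR');

-- Accepted: two employees share department 10 (duplicates are fine),
-- and a NULL foreign key just means "no department yet"
INSERT INTO emp_demo VALUES
(1, 'Asha',  10),
(2, 'Ravi',  10),
(3, 'Meera', NULL);

-- Each of these is rejected by MySQL (InnoDB) with a foreign key error:
-- INSERT INTO emp_demo VALUES (4, 'Kiran', 99);             -- no department 99 (error 1452)
-- DELETE FROM dept_demo WHERE dept_id = 10;                 -- still referenced (error 1451)
-- UPDATE dept_demo SET dept_id = 11 WHERE dept_id = 10;     -- still referenced (error 1451)

SELECT * FROM emp_demo;
```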

### Adding `course_no` from the students table as a FOREIGN KEY referencing `course_id` in the courses table

```sql
ALTER TABLE students ADD FOREIGN KEY(course_no) REFERENCES courses(course_id);
```

### <span class="mark">CASCADE KEYS</span>

- Cascading in MySQL refers to what happens to the rows of a child table (the table holding the foreign key) when you update or delete the parent rows they reference.


- Cascade actions define what should happen to the child records when those operations are performed on the parent records.


- When a CASCADE action is specified for a foreign key, changes made to the referenced key in the parent table automatically propagate to the matching rows of the child table.

CASCADE can be attached to either of the two referential actions of a foreign key:

- __ON UPDATE CASCADE:__ When the referenced key value in the parent table is updated, the corresponding foreign key values in the child table are updated automatically.


- __ON DELETE CASCADE:__ When a row is deleted from the parent table, all child rows with matching foreign key values are deleted automatically.

The other referential actions are `SET NULL`, `RESTRICT` and `NO ACTION`; in MySQL, `NO ACTION` (the default) behaves like `RESTRICT`.

Here's an example SQL query that demonstrates the implementation of a CASCADE DELETE foreign key constraint:

```sql
-- Create the parent table
CREATE TABLE Authors (
    AuthorID INT PRIMARY KEY,
    AuthorName VARCHAR(100)
);

-- Create the child table with a foreign key referencing the Authors table
CREATE TABLE Books (
    BookID INT PRIMARY KEY,
    BookTitle VARCHAR(200),
    AuthorID INT,
    CONSTRAINT fk_AuthorID
        FOREIGN KEY (AuthorID)
        REFERENCES Authors(AuthorID)
        ON DELETE CASCADE
);

-- Insert some data into the Authors table
INSERT INTO Authors (AuthorID, AuthorName) VALUES
(1, 'John Doe'),
(2, 'Jane Smith');

-- Insert some data into the Books table
INSERT INTO Books (BookID, BookTitle, AuthorID) VALUES
(101, 'Book 1', 1),
(102, 'Book 2', 1),
(103, 'Book 3', 2);

-- Now, let's delete the author with AuthorID 1
DELETE FROM Authors WHERE AuthorID = 1;
```

In this example, the foreign key constraint `fk_AuthorID` in the `Books` table has the `ON DELETE CASCADE` option, meaning that when an author with `AuthorID = 1` is deleted from the `Authors` table, all related books with `AuthorID = 1` will also be automatically deleted from the `Books` table. This ensures that the database remains consistent and avoids orphaned records in the child table.

### DISTINCT()

### DISTINCT COMBINATION OF 2 COLUMNS:

![image.png](attachment:image.png)
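DISTINCT over two columns removes duplicate *combinations* of the two values, not duplicates within each column separately. A minimal sketch on a hypothetical `distinct_demo` table:

```sql
-- Hypothetical table: one row per student, with city and course
DROP TABLE IF EXISTS distinct_demo;
CREATE TABLE distinct_demo (
    student_name VARCHAR(50),
    city         VARCHAR(50),
    course       VARCHAR(50)
);

INSERT INTO distinct_demo VALUES
('Asha',  'Pune',  'SQL'),
('Ravi',  'Pune',  'SQL'),      -- same (city, course) pair as Asha
('Meera', 'Pune',  'Python'),
('Kiran', 'Delhi', 'SQL');

-- DISTINCT applies to the whole selected row: (Pune, SQL), (Pune, Python), (Delhi, SQL)
SELECT DISTINCT city, course
FROM distinct_demo;
```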

### ORDER BY()

### ORDER BY on 2 columns, one with ASCENDING other with DESCENDING:
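Each column in ORDER BY gets its own direction, and the second column only breaks ties in the first. A minimal sketch on a hypothetical `order_demo` table:

```sql
-- Hypothetical table of students, their city and marks
DROP TABLE IF EXISTS order_demo;
CREATE TABLE order_demo (
    student_name VARCHAR(50),
    city         VARCHAR(50),
    marks        INT
);
INSERT INTO order_demo VALUES
('Asha', 'Pune', 82), ('Ravi', 'Delhi', 91), ('Meera', 'Pune', 95), ('Kiran', 'Delhi', 78);

-- City A to Z; within the same city, highest marks first
-- Result order: Ravi (Delhi 91), Kiran (Delhi 78), Meera (Pune 95), Asha (Pune 82)
SELECT student_name, city, marks
FROM order_demo
ORDER BY city ASC, marks DESC;
```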

### WILDCARD

A wildcard character is used to substitute one or more characters in a string.

Wildcard characters are used with the LIKE operator. The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

- __% : Matches zero or more characters.__ 

For example, the pattern '%m%' matches any string that contains an "m" character, such as "mat", "rampart", or "maze".


- __`_` (underscore):__ Matches exactly one character. 

For example, the pattern '_og' matches any string that contains exactly three characters and ends with "og", such as "dog" or "fog".

These wildcard characters work with the LIKE (and NOT LIKE) operator. REGEXP uses regular-expression syntax instead, where `.` matches any single character and `.*` matches any run of characters. Wildcard searches can be slow on large tables: a pattern that starts with a wildcard (such as '%m%') cannot use an index on the column, so every row has to be scanned.
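The two examples above, checked on a small hypothetical `wildcard_demo` table. The data is lowercase so the result does not depend on whether the column's collation is case-sensitive.

```sql
-- Hypothetical one-column table of words
DROP TABLE IF EXISTS wildcard_demo;
CREATE TABLE wildcard_demo (word VARCHAR(20));
INSERT INTO wildcard_demo VALUES
('mat'), ('rampart'), ('maze'), ('dog'), ('fog'), ('frog'), ('cat');

-- % : zero or more characters -> words containing an 'm': mat, rampart, maze
SELECT word FROM wildcard_demo WHERE word LIKE '%m%';

-- _ : exactly one character -> three-letter words ending in 'og': dog, fog (not frog)
SELECT word FROM wildcard_demo WHERE word LIKE '_og';
```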

### LIKE ()

###### location values that start with 'pat' or 'ran':

![image.png](attachment:image.png)

###### names of students with exactly 2 characters

![image.png](attachment:image.png)

##### starts with the letter 'a' and has at least 3 characters:

`WHERE CustomerName LIKE 'a_%_%'`

Finds any values that start with "a" and are at least 3 characters in length.
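The three LIKE patterns above, as a minimal sketch on a hypothetical `like_demo` table (lowercase values, for the same collation reason). Each pattern needs its own LIKE: writing `location LIKE 'pat%' OR 'ran%'` would silently drop the second pattern, because `'ran%'` on its own is evaluated as a false value.

```sql
-- Hypothetical table of names and locations
DROP TABLE IF EXISTS like_demo;
CREATE TABLE like_demo (name VARCHAR(30), location VARCHAR(30));
INSERT INTO like_demo VALUES
('al', 'patna'), ('bo', 'ranchi'), ('asha', 'pune'), ('amy', 'rampur'), ('an', 'delhi');

-- location starts with 'pat' or 'ran' -> patna, ranchi ('rampur' starts with 'ram')
SELECT name, location FROM like_demo
WHERE location LIKE 'pat%' OR location LIKE 'ran%';

-- name has exactly 2 characters: two underscores and no % -> al, bo, an
SELECT name FROM like_demo WHERE name LIKE '__';

-- name starts with 'a' and has at least 3 characters -> asha, amy
SELECT name FROM like_demo WHERE name LIKE 'a_%_%';
```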

# ORDER OF EXECUTION :

https://www.youtube.com/watch?v=JUCTcHsNkyM&list=PLtgiThe4j67rAoPmnCQmcgLS4iIc5ungg&index=8

![image.png](attachment:image.png)

__Flash jumped Wonder girl having se* during office__ : FROM, JOIN, WHERE, GROUP BY, HAVING, SELECT, DISTINCT, ORDER BY

The order of execution in SQL is the *logical* order in which the clauses of a single query are evaluated, which differs from the order in which they are written. The optimizer may physically run things differently, but the result is always as if the clauses were processed in this order:

- __From Clause:__ The database engine reads the rows from the specified tables and views, performing any JOINs.


- __Where Clause:__ The database engine filters the data based on the conditions specified in the WHERE clause.


- __Group By Clause:__ The database engine groups the data based on the columns specified in the GROUP BY clause.


- __Having Clause:__ The database engine filters the grouped data based on the conditions specified in the HAVING clause.


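- __Select Clause:__ The database engine computes the expressions, functions and aliases specified in the SELECT clause, then applies DISTINCT if present. This is why a column alias defined in SELECT cannot be used in WHERE.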


- __Order By Clause:__ The database engine sorts the data based on the columns specified in the ORDER BY clause.


- __Limit Clause:__ The database engine limits the number of rows returned by the query based on the value specified in the LIMIT clause.
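A minimal sketch on a hypothetical `exec_demo` table, with each clause annotated by its logical step. ORDER BY runs after SELECT, so it can use the `avg_marks` alias, while WHERE could not.

```sql
-- Hypothetical table of marks per city
DROP TABLE IF EXISTS exec_demo;
CREATE TABLE exec_demo (city VARCHAR(20), marks INT);
INSERT INTO exec_demo VALUES
('Pune', 80), ('Pune', 90), ('Pune', 40),
('Delhi', 70), ('Delhi', 65),
('Goa', 99);

SELECT city, AVG(marks) AS avg_marks   -- 5. SELECT: compute output columns and aliases
FROM exec_demo                         -- 1. FROM: read the table
WHERE marks >= 50                      -- 2. WHERE: drop individual rows (Pune 40 goes)
GROUP BY city                          -- 3. GROUP BY: one group per city
HAVING COUNT(*) >= 2                   -- 4. HAVING: drop whole groups (Goa goes)
ORDER BY avg_marks DESC                -- 6. ORDER BY: Pune 85, Delhi 67.5
LIMIT 1;                               -- 7. LIMIT: keep only Pune
```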

# AGGREGATE FUNCTIONS

- __AVG:__ returns the average value of a column.


- __COUNT:__ `COUNT(*)` returns the number of rows; `COUNT(column)` returns the number of non-NULL values in that column.


- __SUM:__ returns the sum of values in a column.


- __MIN:__ returns the minimum value in a column.


- __MAX:__ returns the maximum value in a column.
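All of these except `COUNT(*)` ignore NULLs, which matters most for AVG. A minimal sketch on a hypothetical `agg_demo` table with one NULL mark:

```sql
-- Hypothetical table: student d has no marks recorded
DROP TABLE IF EXISTS agg_demo;
CREATE TABLE agg_demo (student VARCHAR(20), marks INT);
INSERT INTO agg_demo VALUES ('a', 10), ('b', 20), ('c', 30), ('d', NULL);

SELECT
    COUNT(*)     AS all_rows,   -- 4: counts rows, NULL or not
    COUNT(marks) AS non_null,   -- 3: skips the NULL
    SUM(marks)   AS total,      -- 60
    AVG(marks)   AS average,    -- 20: 60 / 3, not 60 / 4
    MIN(marks)   AS lowest,     -- 10
    MAX(marks)   AS highest     -- 30
FROM agg_demo;
```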

### 1. COUNT()

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

### MEDIAN()

- **odd number of values :** the middle value after sorting (ascending or descending).

- **even number of values :** the average of the two middle values.

- https://www.youtube.com/watch?v=fwPk1RXlorQ&ab_channel=AnkitBansal

### Median for even number of rows:

#### step 1:

```sql
select *, total_rows * 1.0 / 2, (total_rows * 1.0 / 2) + 1
from (
    select lat,
           COUNT(*) OVER () as total_rows,
           ROW_NUMBER() OVER (ORDER BY lat asc) as rn
    from LAT_N
) as t1
WHERE rn between total_rows * 1.0 / 2 AND (total_rows * 1.0 / 2) + 1;
```

![image.png](attachment:image.png)

#### step 2 : 

```sql
select avg(lat)
from (
    select lat,
           COUNT(*) OVER () as total_rows,
           ROW_NUMBER() OVER (ORDER BY lat asc) as rn
    from LAT_N
) as t1
WHERE rn between total_rows * 1.0 / 2 AND (total_rows * 1.0 / 2) + 1;
```

![image.png](attachment:image.png)

### Median for even number of rows: (same works for odd too)

```sql
select *, total_rows * 1.0 / 2, (total_rows * 1.0 / 2) + 1
from (
    select lat,
           COUNT(*) OVER () as total_rows,
           ROW_NUMBER() OVER (ORDER BY lat asc) as rn
    from LAT_N
) as t1
WHERE rn between total_rows * 1.0 / 2 AND (total_rows * 1.0 / 2) + 1;
```

![image.png](attachment:image.png)

```sql
select avg(lat) as average_lat
from (
    select lat,
           COUNT(*) OVER () as total_rows,
           ROW_NUMBER() OVER (ORDER BY lat asc) as rn
    from LAT_N
) as t1
WHERE rn between total_rows * 1.0 / 2 AND (total_rows * 1.0 / 2) + 1;
```

![image.png](attachment:image.png)

### Question : finding median company wise:

```sql
select company_name, total_rows, employee_salary,
       total_rows * 1.0 / 2,
       (total_rows * 1.0 / 2) + 1
from (
    select *,
           ROW_NUMBER() OVER (PARTITION BY company_name ORDER BY employee_salary asc) as rn,
           COUNT(*) OVER (PARTITION BY company_name) as total_rows
    from salary
) as t1;
```

![Screenshot%202023-09-08%20032132.png](attachment:Screenshot%202023-09-08%20032132.png)

#### now finding median (adding where clause):

```sql
select company_name, total_rows, employee_salary,
       total_rows * 1.0 / 2,
       (total_rows * 1.0 / 2) + 1
from (
    select *,
           ROW_NUMBER() OVER (PARTITION BY company_name ORDER BY employee_salary asc) as rn,
           COUNT(*) OVER (PARTITION BY company_name) as total_rows
    from salary
) as t1
WHERE rn between total_rows * 1.0 / 2 AND (total_rows * 1.0 / 2) + 1;
```

![image.png](attachment:image.png)

#### For companies with an even number of rows there is still no single median: the filter returns the two middle values.


#### so we GROUP BY company_name and calculate the average of those 2 values:

```sql
select company_name, total_rows, avg(employee_salary),
       total_rows * 1.0 / 2,
       (total_rows * 1.0 / 2) + 1
from (
    select *,
           ROW_NUMBER() OVER (PARTITION BY company_name ORDER BY employee_salary asc) as rn,
           COUNT(*) OVER (PARTITION BY company_name) as total_rows
    from salary
) as t1
WHERE rn between total_rows * 1.0 / 2 AND (total_rows * 1.0 / 2) + 1
-- total_rows is constant within a company; grouping by it too keeps the query
-- valid under MySQL's default ONLY_FULL_GROUP_BY mode
GROUP BY company_name, total_rows;
```

![image.png](attachment:image.png)

### 2. Group BY()

##### THIS WON'T WORK

___

![image.png](attachment:image.png)

##### GROUPBY on multiple columns

![image.png](attachment:image.png)

# GROUP BY - CampusX

https://youtu.be/nsKcmOly0UY?t=2399

#### Group smartphones by brand and get the count, average price, max rating, avg screen size, and avg battery capacity

```sql
SELECT brand_name, 
count(*) as num_phones, 
AVG(price) as average_price, 
MAX(rating) as max_rating,
AVG(screen_size) as average_screen_size,
AVG(battery_capacity) as average_battery
FROM smartphones
GROUP BY brand_name
ORDER BY num_phones DESC;
```

![image.png](attachment:image.png)

## GROUP BY on 2 columns

#### Group smartphones by the brand and processor brand and get the average price and the average primary camera resolution (rear)

```sql
SELECT brand_name, processor_brand,
AVG(price) as average_price,
AVG(primary_camera_rear) as average_primary_camera_rear
FROM smartphones
GROUP BY brand_name,processor_brand
ORDER BY brand_name;
```

![image.png](attachment:image.png)

### 3. MIN

![image.png](attachment:image.png)

##### This won't work : student name with min years of experience

##### instead this will be correct

![image.png](attachment:image.png)
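A common reason the first form misbehaves: pairing a plain column with `MIN()` and no GROUP BY is rejected under MySQL's default `ONLY_FULL_GROUP_BY` mode, and where it is allowed the name comes from an arbitrary row. One standard fix is to find the minimum in a subquery and then return the rows that have it. A minimal sketch on a hypothetical `min_demo` table:

```sql
-- Hypothetical table; ravi and kiran tie for the minimum
DROP TABLE IF EXISTS min_demo;
CREATE TABLE min_demo (student_fname VARCHAR(20), years_of_experience INT);
INSERT INTO min_demo VALUES ('asha', 5), ('ravi', 1), ('meera', 3), ('kiran', 1);

-- Find the minimum first, then return every row that has it (ties included)
SELECT student_fname, years_of_experience
FROM min_demo
WHERE years_of_experience = (SELECT MIN(years_of_experience) FROM min_demo);
```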

### 4. MAX

##### MAX number of years_of_joing from each student_company

![image.png](attachment:image.png)

![image.png](attachment:image.png)

### 5. SUM()

![image.png](attachment:image.png)

### 6. AVG()

![image.png](attachment:image.png)

### 7. STD() - standard deviation (population; same as STDDEV_POP())

![image.png](attachment:image.png)

### 8. VARIANCE() - variance (population; same as VAR_POP())

![image.png](attachment:image.png)

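## ROUNDING OFF DECIMALS:

While creating the column, instead of INT we can use DECIMAL(5,2): 5 is the precision (total number of digits) and 2 is the scale (digits after the decimal point), so the column stores up to 3 digits before the decimal point and 2 after it.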

![image.png](attachment:image.png)
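The range follows directly from the precision $p$ (total digits) and scale $s$ (digits after the point):

$$
\texttt{DECIMAL}(p, s):\qquad \text{digits before the point} = p - s, \qquad \text{largest value} = 10^{\,p-s} - 10^{-s}
$$

$$
\texttt{DECIMAL}(5, 2):\qquad 5 - 2 = 3 \text{ digits before the point}, \qquad 10^{3} - 10^{-2} = 1000 - 0.01 = 999.99
$$

So a DECIMAL(5,2) column accepts values from -999.99 to 999.99, and MySQL rounds a value with extra fractional digits to the scale (for example, 12.345 is stored as 12.35).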

### BETWEEN ()

```sql
select student_fname,student_lname,source_of_joining, student_company,years_of_experience 
from students 
WHERE years_of_experience BETWEEN 8 and 12;
```

![image.png](attachment:image.png)
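BETWEEN includes both end points, so it is the same as `>= low AND <= high`. A minimal sketch on a hypothetical `between_demo` table where the boundary values 8 and 12 are both returned:

```sql
-- Hypothetical table with values just inside and just outside the range
DROP TABLE IF EXISTS between_demo;
CREATE TABLE between_demo (student_fname VARCHAR(20), years_of_experience INT);
INSERT INTO between_demo VALUES ('a', 7), ('b', 8), ('c', 10), ('d', 12), ('e', 13);

-- Inclusive: returns b (8), c (10) and d (12)
SELECT student_fname, years_of_experience
FROM between_demo
WHERE years_of_experience BETWEEN 8 AND 12;
```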

### IN()

```sql
select student_fname,student_lname,source_of_joining, student_company,years_of_experience 
from students 
WHERE student_company IN ('walmart','amazon','flipkart');
```

![image.png](attachment:image.png)

### NOT IN()

```sql
select student_fname,student_lname,source_of_joining, student_company,years_of_experience 
from students 
WHERE student_company NOT IN ('walmart','amazon','flipkart');
```

![image.png](attachment:image.png)
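NOT IN has a NULL pitfall: a row whose value is NULL is never returned, and if the list itself contains NULL, no row is returned at all, because comparing anything with NULL gives "unknown" rather than true. A minimal sketch on a hypothetical `in_demo` table:

```sql
-- Hypothetical table; student d has no company recorded
DROP TABLE IF EXISTS in_demo;
CREATE TABLE in_demo (student_fname VARCHAR(20), student_company VARCHAR(20));
INSERT INTO in_demo VALUES
('a', 'walmart'), ('b', 'google'), ('c', 'amazon'), ('d', NULL);

-- Only b is returned: d is skipped because NULL NOT IN (...) is unknown, not true
SELECT student_fname FROM in_demo
WHERE student_company NOT IN ('walmart', 'amazon', 'flipkart');

-- A NULL inside the list makes every comparison unknown: no rows at all
SELECT student_fname FROM in_demo
WHERE student_company NOT IN ('walmart', 'amazon', NULL);
```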

## CAST ()

The CAST() function converts a value (of any type) into the specified datatype.

#### euclidean distance : 

```sql
SELECT 
CAST(SQRT(POWER(MAX(LAT_N)-MIN(LAT_N),2) + POWER(MAX(LONG_W)-MIN(LONG_W),2)) AS DECIMAL (9,4))
FROM STATION;
```

### ROUND()

__The ROUND() function rounds a number to a specified number of decimal places.__

eg : Round the number to 2 decimal places

**output : 135.38**

### CEIL ()

The CEIL() function returns the smallest integer value that is bigger than or equal to a number.


__Note: This function is equal to the CEILING() function.__

**output : 26**

### FLOOR()

The FLOOR() function returns the largest integer value that is smaller than or equal to a number.

Note: Also look at the ROUND(), CEIL(), CEILING(), TRUNCATE(), and DIV functions.

**output : 25**

### TRUNCATE ()

The TRUNCATE() function truncates a number to the specified number of decimal places.

eg :

```sql
SELECT TRUNCATE(135.37565768, 2);
```

**output : 135.37**

eg : Return a number truncated to 0 decimal places:

```sql
SELECT TRUNCATE(345.156, 0);
```

**output : 345**

| Parameter | Description |
|-----------|-------------|
| number | Required. The number to be truncated |
| decimals | Required. The number of decimal places to truncate to |

### ABS() : absolute value of a number, i.e. a negative number becomes positive

```sql
SELECT ABS(1 - price) As 'temp' FROM smartphones;
```

![image.png](attachment:image.png)

### 3rd largest:

#### find the phone with 3rd largest battery

```sql
SELECT model, battery_capacity
FROM smartphones
ORDER BY battery_capacity DESC
LIMIT 2,1;
```

![image.png](attachment:image.png)

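#### LIMIT 2,1 means skip (offset) the first 2 rows and return the next 1 row, i.e. the 3rd row. If several phones share a battery_capacity, this is the 3rd row, not necessarily the 3rd largest distinct value.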

### 3rd and 4th Largest:

```sql
SELECT model, battery_capacity
FROM smartphones
ORDER BY battery_capacity DESC
LIMIT 2,2;
```

#### LIMIT 2,2 means skip the first 2 rows and return the next 2 rows, i.e. the 3rd and 4th.

![image.png](attachment:image.png)
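MySQL also accepts the more portable `LIMIT count OFFSET skip` form. Ties matter here: to get the 3rd largest *distinct* capacity, apply DISTINCT before the offset. A minimal sketch on a hypothetical `limit_demo` table where the two largest capacities are tied:

```sql
-- Hypothetical table; p2 and p3 tie at 7000
DROP TABLE IF EXISTS limit_demo;
CREATE TABLE limit_demo (model VARCHAR(20), battery_capacity INT);
INSERT INTO limit_demo VALUES
('p1', 6000), ('p2', 7000), ('p3', 7000), ('p4', 5000), ('p5', 4500);

-- Same as LIMIT 2, 1: skip 2 rows, return 1 -> p1 (6000),
-- the 3rd row but only the 2nd largest distinct value
SELECT model, battery_capacity
FROM limit_demo
ORDER BY battery_capacity DESC
LIMIT 1 OFFSET 2;

-- 3rd largest distinct capacity: 7000, 6000, 5000 -> 5000
SELECT DISTINCT battery_capacity
FROM limit_demo
ORDER BY battery_capacity DESC
LIMIT 1 OFFSET 2;
```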

# CASE

The CASE expression lets you add conditional (if/then/else) logic to a query. 

The CASE statement is often used in the SELECT clause of a query to transform data on the fly and to provide an alternate value if a condition is met.

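- CASE is most often used in the SELECT clause, but it can appear anywhere an expression is allowed, e.g. in WHERE, ORDER BY, GROUP BY, or inside an aggregate such as SUM(CASE ... END).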


- __CASE must include the following components: WHEN, THEN, and END.__ ELSE is an optional component.


- Between WHEN and THEN you can write any condition you could write in a WHERE clause, including several conditions combined with AND and OR.


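- You can include multiple WHEN branches, plus an optional ELSE for any unaddressed conditions. The WHEN conditions are checked in order and the first one that is true wins; if none is true and there is no ELSE, the result is NULL.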
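A minimal sketch on a hypothetical `case_demo` table showing first-match evaluation in SELECT, and CASE used inside ORDER BY to impose a custom sort order:

```sql
-- Hypothetical table of experience levels
DROP TABLE IF EXISTS case_demo;
CREATE TABLE case_demo (student_fname VARCHAR(20), years_of_experience INT);
INSERT INTO case_demo VALUES ('a', 10), ('b', 2), ('c', 6), ('d', 3);

SELECT student_fname,
       CASE
           WHEN years_of_experience < 4  THEN 'fresher'
           WHEN years_of_experience <= 8 THEN 'lead'      -- only reached when the first WHEN is false
           ELSE 'manager'
       END AS job_level
FROM case_demo
-- CASE inside ORDER BY: managers first, then leads, then freshers (a, c, b, d)
ORDER BY CASE
             WHEN years_of_experience > 8  THEN 1
             WHEN years_of_experience >= 4 THEN 2
             ELSE 3
         END,
         student_fname;
```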

### example 1 : 

##### label years_of_experience below 4 as 'fresher', 4 to 8 as 'Lead' and anything higher as 'MANAGER' in a new column (NULL gets its own label)

```sql
select student_id, student_fname, source_of_joining, student_company, years_of_experience,

CASE

    WHEN years_of_experience IS NULL THEN 'whoareyou'

    WHEN years_of_experience < 4 THEN 'fresher'

    WHEN years_of_experience BETWEEN 4 and 8 THEN 'Lead'

    ELSE 'MANAGER'

END AS 'POSITION'

FROM students;
```

![image.png](attachment:image.png)

### example 2 : 

__Q - Write a query identifying the type of each record in the TRIANGLES table using its three side lengths. Output one of the following statements for each record in the table:__

- __Equilateral :__ It's a triangle with 3 sides of equal length.
- __Isosceles :__ It's a triangle with 2 sides of equal length.
- __Scalene :__ It's a triangle with 3 sides of differing lengths.
- __Not A Triangle :__ The given values of A, B, and C don't form a triangle.

```sql
SELECT *,
CASE
    WHEN ((A+B)>C AND (A+C)>B AND (B+C)>A) AND (A=B AND B=C AND A=C) THEN 'Equilateral'
    WHEN ((A+B)>C AND (A+C)>B AND (B+C)>A) AND (A!=B AND B!=C AND A!=C)  THEN 'Scalene'
    WHEN ((A+B)>C AND (A+C)>B AND (B+C)>A) AND ((A!=C AND A=B) OR (B!=C AND A=C) OR (A!=B AND B=C)) THEN 'Isosceles' 
    ELSE 'Not A Triangle'
END
FROM TRIANGLES
```

![Screenshot%202023-09-06%20170955.png](attachment:Screenshot%202023-09-06%20170955.png)
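Because WHEN branches are evaluated in order, the triangle check can be done once in the first branch, so the later branches no longer need to repeat it. A sketch of that alternative way of writing the query above, on a hypothetical `triangles_demo` table with illustrative rows:

```sql
-- Hypothetical sample rows, one per expected category
DROP TABLE IF EXISTS triangles_demo;
CREATE TABLE triangles_demo (A INT, B INT, C INT);
INSERT INTO triangles_demo VALUES (20, 20, 23), (20, 20, 20), (20, 21, 22), (13, 14, 30);

SELECT A, B, C,
CASE
    WHEN A + B <= C OR A + C <= B OR B + C <= A THEN 'Not A Triangle'  -- validity checked once
    WHEN A = B AND B = C THEN 'Equilateral'
    WHEN A = B OR B = C OR A = C THEN 'Isosceles'   -- reached only if not all three are equal
    ELSE 'Scalene'
END AS triangle_type
FROM triangles_demo;
```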

### Question: Create a pivot of the input data in sql

https://www.youtube.com/watch?v=4p-G7fGhqRk&list=PLavw5C92dz9Ef4E-1Zi9KfCTXS_IN8gXZ&index=14

#### Input and output:

![Screenshot%202023-08-27%20004257.png](attachment:Screenshot%202023-08-27%20004257.png)

#### customer and month wise cross tab

```sql
select customer_id,
    SUM(CASE WHEN monthname(sales_date) = 'January' THEN amount ELSE 0 END) as 'Jan-21',
    SUM(CASE WHEN monthname(sales_date) = 'February' THEN amount ELSE 0 END) as 'Feb-21',
    SUM(CASE WHEN monthname(sales_date) = 'March' THEN amount ELSE 0 END) as 'Mar-21',
    SUM(CASE WHEN monthname(sales_date) = 'April' THEN amount ELSE 0 END) as 'Apr-21',
    SUM(CASE WHEN monthname(sales_date) = 'May' THEN amount ELSE 0 END) as 'May-21',
    SUM(CASE WHEN monthname(sales_date) = 'June' THEN amount ELSE 0 END) as 'June-21',
    SUM(CASE WHEN monthname(sales_date) = 'July' THEN amount ELSE 0 END) as 'July-21',
    SUM(CASE WHEN monthname(sales_date) = 'August' THEN amount ELSE 0 END) as 'Aug-21',
    SUM(CASE WHEN monthname(sales_date) = 'September' THEN amount ELSE 0 END) as 'Sept-21',
    SUM(CASE WHEN monthname(sales_date) = 'October' THEN amount ELSE 0 END) as 'Oct-21',
    SUM(CASE WHEN monthname(sales_date) = 'November' THEN amount ELSE 0 END) as 'Nov-21',
    SUM(CASE WHEN monthname(sales_date) = 'December' THEN amount ELSE 0 END) as 'Dec-21'
  from test3
  group by customer_id;
```

![image.png](attachment:image.png)

#### month wise cross tab : 

```sql

select 'Total' kungfu, -- constant label column for the totals row ('kungfu' is just an arbitrary alias)
    SUM(CASE WHEN monthname(sales_date) = 'January' THEN amount ELSE 0 END) as 'Jan-21',
    SUM(CASE WHEN monthname(sales_date) = 'February' THEN amount ELSE 0 END) as 'Feb-21',
    SUM(CASE WHEN monthname(sales_date) = 'March' THEN amount ELSE 0 END) as 'Mar-21',
    SUM(CASE WHEN monthname(sales_date) = 'April' THEN amount ELSE 0 END) as 'Apr-21',
    SUM(CASE WHEN monthname(sales_date) = 'May' THEN amount ELSE 0 END) as 'May-21',
    SUM(CASE WHEN monthname(sales_date) = 'June' THEN amount ELSE 0 END) as 'June-21',
    SUM(CASE WHEN monthname(sales_date) = 'July' THEN amount ELSE 0 END) as 'July-21',
    SUM(CASE WHEN monthname(sales_date) = 'August' THEN amount ELSE 0 END) as 'Aug-21',
    SUM(CASE WHEN monthname(sales_date) = 'September' THEN amount ELSE 0 END) as 'Sept-21',
    SUM(CASE WHEN monthname(sales_date) = 'October' THEN amount ELSE 0 END) as 'Oct-21',
    SUM(CASE WHEN monthname(sales_date) = 'November' THEN amount ELSE 0 END) as 'Nov-21',
    SUM(CASE WHEN monthname(sales_date) = 'December' THEN amount ELSE 0 END) as 'Dec-21'
  from test3;
```

![image.png](attachment:image.png)

#### sum of total amount customer wise :

```sql
select customer_id,sum(amount) as `Total`
from test3
group by customer_id
```

![image.png](attachment:image.png)

### final output: 

- customer and month wise cross tab UNION month wise cross tab (tablename - t)


- LEFT JOIN with sum of total amount customer wise (tablename t2)

```sql
-- (scratch query) per-customer, per-month totals in long format; not needed for the pivot below
SELECT DISTINCT customer_id,monthname(sales_date),
SUM(amount) OVER(PARTITION BY customer_id, monthname(sales_date) ORDER BY customer_id)
FROM test3;

select t.*,t2.Total from 
  (select customer_id,
    SUM(CASE WHEN monthname(sales_date) = 'January' THEN amount ELSE 0 END) as 'Jan-21',
    SUM(CASE WHEN monthname(sales_date) = 'February' THEN amount ELSE 0 END) as 'Feb-21',
    SUM(CASE WHEN monthname(sales_date) = 'March' THEN amount ELSE 0 END) as 'Mar-21',
    SUM(CASE WHEN monthname(sales_date) = 'April' THEN amount ELSE 0 END) as 'Apr-21',
    SUM(CASE WHEN monthname(sales_date) = 'May' THEN amount ELSE 0 END) as 'May-21',
    SUM(CASE WHEN monthname(sales_date) = 'June' THEN amount ELSE 0 END) as 'June-21',
    SUM(CASE WHEN monthname(sales_date) = 'July' THEN amount ELSE 0 END) as 'July-21',
    SUM(CASE WHEN monthname(sales_date) = 'August' THEN amount ELSE 0 END) as 'Aug-21',
    SUM(CASE WHEN monthname(sales_date) = 'September' THEN amount ELSE 0 END) as 'Sept-21',
    SUM(CASE WHEN monthname(sales_date) = 'October' THEN amount ELSE 0 END) as 'Oct-21',
    SUM(CASE WHEN monthname(sales_date) = 'November' THEN amount ELSE 0 END) as 'Nov-21',
    SUM(CASE WHEN monthname(sales_date) = 'December' THEN amount ELSE 0 END) as 'Dec-21'
  from test3
  group by customer_id 

   UNION

select 'Total' kungfu, -- for 'Total' cell
    SUM(CASE WHEN monthname(sales_date) = 'January' THEN amount ELSE 0 END) as 'Jan-21',
    SUM(CASE WHEN monthname(sales_date) = 'February' THEN amount ELSE 0 END) as 'Feb-21',
    SUM(CASE WHEN monthname(sales_date) = 'March' THEN amount ELSE 0 END) as 'Mar-21',
    SUM(CASE WHEN monthname(sales_date) = 'April' THEN amount ELSE 0 END) as 'Apr-21',
    SUM(CASE WHEN monthname(sales_date) = 'May' THEN amount ELSE 0 END) as 'May-21',
    SUM(CASE WHEN monthname(sales_date) = 'June' THEN amount ELSE 0 END) as 'June-21',
    SUM(CASE WHEN monthname(sales_date) = 'July' THEN amount ELSE 0 END) as 'July-21',
    SUM(CASE WHEN monthname(sales_date) = 'August' THEN amount ELSE 0 END) as 'Aug-21',
    SUM(CASE WHEN monthname(sales_date) = 'September' THEN amount ELSE 0 END) as 'Sept-21',
    SUM(CASE WHEN monthname(sales_date) = 'October' THEN amount ELSE 0 END) as 'Oct-21',
    SUM(CASE WHEN monthname(sales_date) = 'November' THEN amount ELSE 0 END) as 'Nov-21',
    SUM(CASE WHEN monthname(sales_date) = 'December' THEN amount ELSE 0 END) as 'Dec-21'
  from test3) as t

LEFT JOIN

(select customer_id,sum(amount) as `Total`
from test3
group by customer_id) as t2
ON t.customer_id=t2.customer_id
```

![image.png](attachment:image.png)

___
___

# JOINS

https://www.youtube.com/watch?v=9joG6P9ZhPM&list=PLtgiThe4j67rAoPmnCQmcgLS4iIc5ungg&index=12

![joins_SQL.png](attachment:joins_SQL.png)

![image.png](attachment:image.png)

In SQL, there are several types of joins that you can use to combine data from two or more tables. The main types of joins are:

### 1. Inner Join: 

> Returns only the rows that have matching values in both tables.

```sql
SELECT *
FROM table1
INNER JOIN table2
ON table1.column = table2.column;
```

```sql
select * from superhero INNER JOIN identity ON superhero.superhero_name = identity.superhero;
```

![image.png](attachment:image.png)

### 2. Left Join (or Left Inclusive Join):

> Returns all the rows from the left table (table1), and the matching rows from the right table (table2). If there is no match, NULL values will be returned for right table's columns.

```sql
SELECT *
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;
```

```sql
select * from superhero LEFT JOIN identity ON superhero.superhero_name = identity.superhero;
```

![image.png](attachment:image.png)

##### using Left join, keeping only children that have a parent_id

```sql
select child.member_id as child_id, child.name as child_name, child.age as child_age, parent.name as parent_name, parent.age as parent_age, child.parent_id as parent_id
from relations as child

LEFT JOIN relations as parent

ON child.parent_id = parent.member_id
WHERE child.parent_id is NOT NULL;
```

![image.png](attachment:image.png)
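Strictly speaking, a left-exclusive join keeps only the left rows that found *no* match, which is tested by checking a column of the right table for NULL; filtering on `child.parent_id IS NOT NULL` does not do this. A minimal sketch on a hypothetical `relations_demo` table with the same columns as `relations`:

```sql
-- Hypothetical family table; cousin points at a parent (9) that does not exist
DROP TABLE IF EXISTS relations_demo;
CREATE TABLE relations_demo (
    member_id INT PRIMARY KEY,
    name      VARCHAR(20),
    age       INT,
    parent_id INT
);
INSERT INTO relations_demo VALUES
(1, 'grandpa', 80, NULL),
(2, 'dad',     50, 1),
(3, 'kid',     20, 2),
(4, 'cousin',  18, 9);

-- Left exclusive: children with no matching parent row -> grandpa and cousin
SELECT child.member_id, child.name
FROM relations_demo AS child
LEFT JOIN relations_demo AS parent
       ON child.parent_id = parent.member_id
WHERE parent.member_id IS NULL;
```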

### 3. Right Join (or Right Inclusive Join):

> Returns all the rows from the right table (table2), and the matching rows from the left table (table1). If there is no match, NULL values will be returned for left table's columns.

```sql
SELECT *
FROM table1
RIGHT JOIN table2
ON table1.column = table2.column;
```

```sql
select * from superhero RIGHT JOIN identity ON superhero.superhero_name = identity.superhero;
```

![image.png](attachment:image.png)

##### using right join, keeping only parents that have a child (child.parent_id IS NOT NULL)

```sql
SELECT child.member_id AS child_id, child.name AS child_name, child.age AS child_age, parent.name AS parent_name, parent.age AS parent_age, child.parent_id AS parent_id
from relations AS child

RIGHT JOIN relations AS parent

ON child.parent_id = parent.member_id
WHERE child.parent_id is NOT NULL;
```

![image.png](attachment:image.png)
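A right-exclusive join keeps the right rows with no match: here, members who are nobody's parent. Since `a RIGHT JOIN b` returns the same rows as `b LEFT JOIN a`, the sketch below swaps the two copies and uses a LEFT JOIN (in MySQL, `FROM relations_demo AS child RIGHT JOIN relations_demo AS parent ON ... WHERE child.member_id IS NULL` is equivalent). It assumes the same hypothetical `relations_demo` data as a left-exclusive example:

```sql
-- Hypothetical family table (same shape as relations)
DROP TABLE IF EXISTS relations_demo;
CREATE TABLE relations_demo (
    member_id INT PRIMARY KEY,
    name      VARCHAR(20),
    age       INT,
    parent_id INT
);
INSERT INTO relations_demo VALUES
(1, 'grandpa', 80, NULL),
(2, 'dad',     50, 1),
(3, 'kid',     20, 2),
(4, 'cousin',  18, 9);

-- Right exclusive: parent-side rows with no matching child -> kid and cousin
SELECT parent.member_id, parent.name
FROM relations_demo AS parent
LEFT JOIN relations_demo AS child
       ON child.parent_id = parent.member_id
WHERE child.member_id IS NULL;
```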

### 4. Full Outer Join:

> Returns all the rows from both tables: matching rows are combined, and where a row has no match, the other table's columns are filled with NULL.

```sql
SELECT *
FROM table1
FULL OUTER JOIN table2
ON table1.column = table2.column;
```

### NOTE : MySQL does not support FULL OUTER JOIN; emulate it by combining a LEFT JOIN and a RIGHT JOIN with UNION:

```sql
SELECT * from superhero 
LEFT JOIN identity ON superhero.superhero_name = identity.superhero
UNION
SELECT * from superhero
RIGHT JOIN identity ON superhero.superhero_name = identity.superhero;
```

![image.png](attachment:image.png)
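Plain UNION also collapses rows that are genuinely duplicated in the source tables. A more exact emulation keeps every LEFT JOIN row and adds, with UNION ALL, only the right rows that found no match. A minimal sketch on hypothetical `hero_demo` / `identity_demo` tables (the second half is written as a swapped LEFT JOIN, which is equivalent to a RIGHT JOIN):

```sql
-- Hypothetical demo tables; 'flash' appears twice on purpose
DROP TABLE IF EXISTS hero_demo;
DROP TABLE IF EXISTS identity_demo;
CREATE TABLE hero_demo     (superhero_name VARCHAR(20), hero_power VARCHAR(20));
CREATE TABLE identity_demo (superhero VARCHAR(20), real_name VARCHAR(20));
INSERT INTO hero_demo     VALUES ('batman', 'money'), ('flash', 'speed'), ('flash', 'speed');
INSERT INTO identity_demo VALUES ('batman', 'bruce'), ('superman', 'clark');

-- Part 1: every left row, matched or not (batman and both flash rows)
SELECT h.superhero_name, h.hero_power, i.superhero, i.real_name
FROM hero_demo AS h
LEFT JOIN identity_demo AS i ON h.superhero_name = i.superhero
UNION ALL
-- Part 2: only right rows with no match (superman), so batman is not listed twice
SELECT h.superhero_name, h.hero_power, i.superhero, i.real_name
FROM identity_demo AS i
LEFT JOIN hero_demo AS h ON h.superhero_name = i.superhero
WHERE h.superhero_name IS NULL;
```

This returns 4 rows, whereas the UNION version returns 3 because the two identical flash rows are merged.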

### 5. Cross Join: 

> Returns the Cartesian product of the two tables, which means it returns every possible combination of rows from both tables.

### NOTE : in CROSS JOIN we do not need a common key (no ON clause)

```sql
SELECT *
FROM table1
CROSS JOIN table2;
```

```sql
select * from superhero CROSS JOIN identity;
```

![cross_join_sql.jpg](attachment:cross_join_sql.jpg)
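The size of a CROSS JOIN result is always the product of the two row counts. A minimal sketch on hypothetical `sizes_demo` and `colors_demo` tables:

```sql
-- Hypothetical lookup tables with 3 and 2 rows
DROP TABLE IF EXISTS sizes_demo;
DROP TABLE IF EXISTS colors_demo;
CREATE TABLE sizes_demo  (size_label VARCHAR(5));
CREATE TABLE colors_demo (color VARCHAR(10));
INSERT INTO sizes_demo  VALUES ('S'), ('M'), ('L');
INSERT INTO colors_demo VALUES ('red'), ('blue');

-- Every size paired with every color: 3 x 2 = 6 rows
SELECT s.size_label, c.color
FROM sizes_demo AS s
CROSS JOIN colors_demo AS c
ORDER BY s.size_label, c.color;
```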

### 6. SELF JOIN :

> A SELF JOIN in SQL is a regular join, but the table is joined with itself. 

>In other words, a SELF JOIN combines rows from the same table based on a related column between two or more rows within the same table.

To perform a SELF JOIN, you must specify an alias for the table because you're joining it with itself. This allows you to distinguish between the two copies of the same table in the join condition.

```sql
SELECT *
FROM table_name AS t1
JOIN table_name AS t2
ON t1.column_name = t2.column_name;
```

### NOTE : since we are joining the table with itself, the two copies must be given different aliases

### eg: fetch each child's name and age together with their parent's name and age:

#### each child's parent_id is matched against member_id in the second copy of the table to fetch the parent's name

![image.png](attachment:image.png)

##### using self join

```sql
SELECT child.member_id AS child_id, child.name AS child_name, child.age AS child_age, parent.name AS parent_name, parent.age AS parent_age, child.parent_id AS parent_id
from relations AS child

JOIN relations AS parent

ON child.parent_id = parent.member_id;
```

![image.png](attachment:image.png)

### example of self join : 

![Screenshot%202023-09-08%20040413.png](attachment:Screenshot%202023-09-08%20040413.png)

##### the joined table :

![Screenshot%202023-09-08%20040709.png](attachment:Screenshot%202023-09-08%20040709.png)

##### self join now to get friend's name

```sql

with cte as 
    (SELECT s.*, f.friend_id from students_p as s
    JOIN friends_p as f
    ON s.id = f.id)
    
select * from cte
JOIN students_p as t2
ON cte.friend_id=t2.id
ORDER BY cte.id asc;
```

![Screenshot%202023-09-08%20040830.png](attachment:Screenshot%202023-09-08%20040830.png)

## JOIN on basis of 2 columns : 

![image.png](attachment:image.png)
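When a row is identified by two columns together, both must appear in the ON clause; joining on only one of them multiplies rows. A minimal sketch on hypothetical `sales_demo` and `targets_demo` tables keyed by region and year:

```sql
-- Hypothetical tables keyed by (region, yr)
DROP TABLE IF EXISTS sales_demo;
DROP TABLE IF EXISTS targets_demo;
CREATE TABLE sales_demo   (region VARCHAR(10), yr INT, amount INT);
CREATE TABLE targets_demo (region VARCHAR(10), yr INT, target INT);
INSERT INTO sales_demo   VALUES ('north', 2022, 100), ('north', 2023, 150), ('south', 2023, 90);
INSERT INTO targets_demo VALUES ('north', 2022, 120), ('north', 2023, 140), ('south', 2022, 80);

-- A row matches only when BOTH region and yr are equal -> the two north rows
SELECT s.region, s.yr, s.amount, t.target
FROM sales_demo AS s
JOIN targets_demo AS t
  ON s.region = t.region
 AND s.yr     = t.yr;
```

Joining on `region` alone would instead return 5 rows (2 x 2 for north plus 1 for south), pairing sales with targets from the wrong year.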

## JOINING 3 tables

```sql
SELECT
    e.first_name,
    e.last_name,
    d.department_name,
    s.salary_amount
FROM
    employees e
JOIN
    departments d ON e.department_id = d.department_id
JOIN
    salaries s ON e.employee_id = s.employee_id;
```

## JOINING 4 tables

```sql
SELECT ab.Region,cl.Edition,
MAX(cl.CL) as max_cl
FROM cl_country as cl
LEFT JOIN ab_country as ab
ON (cl.Country = ab.Country AND cl.Edition = ab.Edition)
LEFT JOIN cd_country as cd
ON (ab.Country = cd.Country AND ab.Edition = cd.Edition AND ab.Region = cd.Region)
LEFT JOIN efg_country as efg
ON (ab.Country = efg.Country AND ab.Edition = efg.Edition AND ab.Region = efg.Region)
WHERE cl.Edition=2020
GROUP BY ab.region
ORDER BY max_cl DESC;
```

![image.png](attachment:image.png)

##### example 2 : 

```sql
SELECT DISTINCT c.*, 
  count(DISTINCT lm.lead_manager_code), 
  count(DISTINCT sm.senior_manager_code),
  count(DISTINCT m.manager_code), 
  count(DISTINCT e.employee_code)
from company as c
JOIN lead_manager as lm
ON c.company_code = lm.company_code
JOIN senior_manager as sm
ON (sm.company_code=c.company_code AND lm.lead_manager_code=sm.lead_manager_code)
JOIN manager as m
ON m.senior_manager_code=sm.senior_manager_code
JOIN employee as e
ON e.manager_code = m.manager_code
GROUP BY c.company_code,c.founder
ORDER BY c.company_code;
```

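| company_code | founder | lead managers | senior managers | managers | employees |
|--------------|---------|---------------|-----------------|----------|-----------|
| C1           | Angela  | 1             | 2               | 5        | 13        |
| C10          | Earl    | 1             | 1               | 2        | 3         |
| C100         | Aaron   | 1             | 2               | 4        | 10        |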


## SET Operations

1. __UNION:__ The UNION operator is used to combine the results of two or more SELECT
statements into a single result set. The UNION operator removes duplicate rows
between the various SELECT statements.


2. __UNION ALL:__ The UNION ALL operator is similar to the UNION operator, but it does
not remove duplicate rows from the result set.


3. __INTERSECT:__ The INTERSECT operator returns only the rows that appear in both
result sets of two SELECT statements.


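4. __EXCEPT:__ The EXCEPT operator (called MINUS in Oracle) returns only the distinct rows that appear
in the first result set but not in the second result set of two SELECT statements.

MySQL supports INTERSECT and EXCEPT only from version 8.0.31; on older versions use IN / NOT IN subqueries or a LEFT JOIN ... IS NULL instead.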
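INTERSECT and EXCEPT side by side, as a minimal sketch on hypothetical `users_demo` and `orders_demo` tables (needs MySQL 8.0.31 or later). Both operators return each qualifying value once, even though user 2 has two orders.

```sql
-- Hypothetical tables: users 2 and 4 have orders, user 5 is not a registered user
DROP TABLE IF EXISTS users_demo;
DROP TABLE IF EXISTS orders_demo;
CREATE TABLE users_demo  (user_id INT);
CREATE TABLE orders_demo (order_id INT, user_id INT);
INSERT INTO users_demo  VALUES (1), (2), (3), (4);
INSERT INTO orders_demo VALUES (10, 2), (11, 2), (12, 4), (13, 5);

-- Users who have ordered (in both result sets) -> 2, 4
SELECT user_id FROM users_demo
INTERSECT
SELECT user_id FROM orders_demo;

-- Users who have never ordered -> 1, 3
SELECT user_id FROM users_demo
EXCEPT
SELECT user_id FROM orders_demo;
```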

### EXCEPT : find customers who have never ordered

```sql
select user_id from users
EXCEPT 
select user_id from orders;
```

![image.png](attachment:image.png)

## UNION

### RULES to be followed:


- The number of columns in each table must be the same.


- The data types of the columns in each table must be compatible and correspond to each other in order.


- The names of the columns in the result set are taken from the first SELECT statement, so it's a good idea to give the columns in the first query meaningful names (or aliases) to make the result set easier to understand.



- The tables must be from the same database or accessible from the same database connection.

## Difference between UNION and UNION ALL

The UNION and UNION ALL operators in SQL are used to combine the results of two or more SELECT statements into a single result set.

The main difference between UNION and UNION ALL is how duplicate rows are handled:

- __UNION:__ The UNION operator removes duplicate rows from the final result set. If the same row is present in multiple SELECT statements, it will appear only once in the final result set.


- __UNION ALL:__ The UNION ALL operator does not remove any duplicate rows. All rows, including duplicates, from each SELECT statement are included in the final result set. Because it skips the duplicate-removal step, UNION ALL is usually faster than UNION.
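A minimal sketch of both operators on hypothetical `jan_demo` and `feb_demo` tables that share one customer. It also shows the column-naming rule: the result column is called `customer`, taken from the first SELECT.

```sql
-- Hypothetical monthly customer lists; ravi appears in both
DROP TABLE IF EXISTS jan_demo;
DROP TABLE IF EXISTS feb_demo;
CREATE TABLE jan_demo (customer VARCHAR(10));
CREATE TABLE feb_demo (buyer VARCHAR(10));
INSERT INTO jan_demo VALUES ('asha'), ('ravi');
INSERT INTO feb_demo VALUES ('ravi'), ('meera');

-- UNION: duplicates removed -> asha, ravi, meera (3 rows)
SELECT customer FROM jan_demo
UNION
SELECT buyer FROM feb_demo;

-- UNION ALL: everything kept -> asha, ravi, ravi, meera (4 rows)
SELECT customer FROM jan_demo
UNION ALL
SELECT buyer FROM feb_demo;
```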

## Difference between WHERE and HAVING clause in SQL

The WHERE and HAVING clauses in SQL are both used to filter rows from a result set based on certain conditions. 

However, they are used in different contexts and have different purposes:

- __WHERE clause :__ The WHERE clause is used to filter rows from a result set before aggregating the data. It filters rows based on the values in individual columns and returns only the rows that meet the specified conditions. The WHERE clause is applied to individual rows, and it filters out rows that do not meet the conditions.


- __HAVING clause :__ The HAVING clause is used to filter rows from a result set after aggregating the data. It filters groups of rows based on the result of an aggregate function, such as SUM, AVG, COUNT, etc. The HAVING clause is applied to groups of rows, and it filters out groups that do not meet the conditions.

### NOTE : To filter on the result of a GROUP BY (e.g. on an aggregate like COUNT(*)), use HAVING; WHERE must come before GROUP BY and cannot reference aggregate functions

### NOTE : what WHERE is to SELECT is what HAVING is to GROUP BY

![image.png](attachment:image.png)

##### only using where clause :

```sql
select source_of_joining , count(*) as number 
FROM students 
where years_of_experience > 3 
GROUP BY source_of_joining;
```

![image.png](attachment:image.png)

![image.png](attachment:image.png)

##### using where and having clause

```sql
SELECT source_of_joining , 
count(*) as number 
from students 
where years_of_experience > 3 
GROUP BY source_of_joining 
HAVING number >1;
```

![image.png](attachment:image.png)
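The same pattern as a self-contained sketch on a hypothetical `joining_demo` table, so the effect of each filter can be checked. It uses `HAVING COUNT(*)` rather than the alias: MySQL lets HAVING refer to a SELECT alias as an extension, but standard SQL does not.

```sql
-- Hypothetical students with their joining source and experience
DROP TABLE IF EXISTS joining_demo;
CREATE TABLE joining_demo (student_fname VARCHAR(20), source_of_joining VARCHAR(20), years_of_experience INT);
INSERT INTO joining_demo VALUES
('a', 'linkedin', 5), ('b', 'linkedin', 7), ('c', 'linkedin', 2),
('d', 'youtube', 6),  ('e', 'google', 1),   ('f', 'google', 4);

SELECT source_of_joining, COUNT(*) AS num_students
FROM joining_demo
WHERE years_of_experience > 3     -- row filter before grouping: drops c and e
GROUP BY source_of_joining        -- linkedin 2, youtube 1, google 1
HAVING COUNT(*) > 1;              -- group filter after grouping: only linkedin (2) survives
```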

## HAVING clause - Campusx 

https://youtu.be/nsKcmOly0UY?t=5262

### find the avg rating of smartphone brands that have more than 20 phones

```sql
SELECT brand_name, 
COUNT(*) as count_phones,
ROUND(AVG(rating)) as avg_rating
FROM smartphones
GROUP BY brand_name
HAVING count_phones>20
ORDER BY avg_rating DESC;
```

![image.png](attachment:image.png)

## USING WHERE and HAVING together

###  Find the top 7 brands with the highest avg ram among phones that have a refresh rate of at least 90 Hz and fast charging available, ignoring brands that have fewer than 10 such phones

```sql
SELECT brand_name, 
count(*) as phone_count,
ROUND(avg(ram_capacity)) as average_ram
FROM smartphones

WHERE refresh_rate >= 90 AND fast_charging_available = 1

GROUP BY brand_name

HAVING phone_count > 10
ORDER BY average_ram DESC
LIMIT 7;
```

![image.png](attachment:image.png)

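## ROLLUP, CUBE, GROUPING SETS

- ROLLUP is a GROUP BY extension that generates aggregated results at several levels in a single query. 


- It's commonly used in data warehousing and reporting scenarios to create hierarchical summaries of data at different levels of granularity. 


- ROLLUP(a, b, c) aggregates over the column prefixes (a, b, c), (a, b), (a) and finally () for the grand total. In the summary rows, the rolled-up columns are shown as NULL.


- NOTE : The `GROUP BY ROLLUP(...)`, `CUBE(...)` and `GROUPING SETS (...)` syntax in this section is standard SQL, as supported by PostgreSQL, SQL Server and Oracle. MySQL 8.0 accepts only the `GROUP BY a, b WITH ROLLUP` form and does not support CUBE or GROUPING SETS.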

```sql
SELECT product, category, region, SUM(sales_amount) AS total_sales
FROM sales_data
GROUP BY ROLLUP(product, category, region);
```

The result of this query would include aggregated sales amounts for each combination of product, category, and region, along with subtotals for different levels of summarization:

| product  | category  | region  | total_sales | note                   |
|----------|-----------|---------|-------------|------------------------|
| Product1 | Category1 | Region1 | 1000        |                        |
| Product1 | Category1 | Region2 | 1500        |                        |
| Product1 | Category1 | NULL    | 2500        | Subtotal for Category1 |
| Product1 | NULL      | NULL    | 2500        | Subtotal for Product1  |
| Product2 | Category2 | Region1 | 800         |                        |
| Product2 | Category2 | Region2 | 1200        |                        |
| Product2 | Category2 | NULL    | 2000        | Subtotal for Category2 |
| Product2 | NULL      | NULL    | 2000        | Subtotal for Product2  |
| NULL     | NULL      | NULL    | 4500        | Grand total            |


In this example, the ROLLUP operation has created a hierarchy of summarization levels based on the specified columns (product, category, and region). It calculates subtotals and grand totals to provide a comprehensive overview of the data at different levels of aggregation.

### Difference between rollup and groupby


The key distinction is that ROLLUP allows you to generate __multiple levels of aggregation, including subtotals and grand totals,__ making it useful for generating summary reports with varying levels of detail.

| product_category | product_type | sales_date | amount |
|------------------|--------------|------------|--------|
| Electronics     | Smartphone   | 2023-08-01 | 500    |
| Electronics     | Laptop       | 2023-08-01 | 800    |
| Clothing        | T-Shirt      | 2023-08-01 | 50     |
| Electronics     | Smartphone   | 2023-08-02 | 450    |
| Clothing        | Jeans        | 2023-08-02 | 70     |


#### Groupby query:
```sql
SELECT product_category, product_type, SUM(amount) AS total_amount
FROM sales
GROUP BY product_category, product_type;
```

##### result : 

| product_category | product_type | total_amount |
|------------------|--------------|--------------|
| Electronics     | Smartphone   | 950          |
| Electronics     | Laptop       | 800          |
| Clothing        | T-Shirt      | 50           |
| Clothing        | Jeans        | 70           |


#### rollup query : 

```sql
SELECT product_category, product_type, SUM(amount) AS total_amount
FROM sales
GROUP BY ROLLUP (product_category, product_type);
```

##### results:

| product_category | product_type | total_amount | note                     |
|------------------|--------------|--------------|--------------------------|
| Electronics      | Smartphone   | 950          |                          |
| Electronics      | Laptop       | 800          |                          |
| Electronics      | NULL         | 1750         | Subtotal for Electronics |
| Clothing         | T-Shirt      | 50           |                          |
| Clothing         | Jeans        | 70           |                          |
| Clothing         | NULL         | 120          | Subtotal for Clothing    |
| NULL             | NULL         | 1870         | Grand total              |


In the ROLLUP result:

- We get the ordinary totals for each (product_category, product_type) combination, exactly as plain GROUP BY returns them, e.g. Electronics with Smartphone and Electronics with Laptop.


- __We also get subtotals for each individual product_category, showing the total amount for all Electronics products and all Clothing products.__


- __Finally, we have the grand total of all sales.__
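SQLite (bundled with Python) has no ROLLUP, so this sketch emulates `ROLLUP(product_category, product_type)` on the sample `sales` table above with `UNION ALL`, which makes the three aggregation levels explicit. It illustrates what ROLLUP computes, not how any engine implements it; NULL marks a rolled-up column, as in the result table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product_category TEXT, product_type TEXT, sales_date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Electronics", "Smartphone", "2023-08-01", 500),
        ("Electronics", "Laptop", "2023-08-01", 800),
        ("Clothing", "T-Shirt", "2023-08-01", 50),
        ("Electronics", "Smartphone", "2023-08-02", 450),
        ("Clothing", "Jeans", "2023-08-02", 70),
    ],
)

# ROLLUP(product_category, product_type) aggregates over the column prefixes:
# (product_category, product_type), then (product_category), then () for the grand total.
rollup_rows = conn.execute(
    """
    SELECT product_category, product_type, SUM(amount) FROM sales
    GROUP BY product_category, product_type
    UNION ALL
    SELECT product_category, NULL, SUM(amount) FROM sales
    GROUP BY product_category
    UNION ALL
    SELECT NULL, NULL, SUM(amount) FROM sales
    """
).fetchall()

for row in rollup_rows:
    print(row)
```

Row order from a UNION ALL is not guaranteed, so add an ORDER BY if the layout matters.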

### CUBE and Grouping sets:

watch it to understand - https://www.youtube.com/watch?v=KLPULneM4mo

### GROUPING SETS:

The GROUPING SETS operation allows you to specify multiple grouping sets within a single query. This provides more flexibility in choosing specific combinations of columns for subtotals and totals.

```sql
SELECT product_category, product_type, SUM(amount) AS total_amount
FROM sales
GROUP BY GROUPING SETS (
    (product_category, product_type),
    (product_category),
    ()
);

```

| product_category | product_type | total_amount | note                     |
|------------------|--------------|--------------|--------------------------|
| Electronics      | Smartphone   | 950          |                          |
| Electronics      | Laptop       | 800          |                          |
| Clothing         | T-Shirt      | 50           |                          |
| Clothing         | Jeans        | 70           |                          |
| Electronics      | NULL         | 1750         | Subtotal for Electronics |
| Clothing         | NULL         | 120          | Subtotal for Clothing    |
| NULL             | NULL         | 1870         | Grand total              |


#### Difference between Grouping sets and ROLLUP 

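- The key difference is that ROLLUP generates subtotals and grand totals automatically based on the columns specified in the ROLLUP clause. In contrast, GROUPING SETS allows you to explicitly define multiple grouping sets, giving you more control over which subtotals and grand totals you want to include in the result. For example, ROLLUP(product_category, product_type) is shorthand for GROUPING SETS ((product_category, product_type), (product_category), ()), which is why the GROUPING SETS query above returns the same rows as the ROLLUP query.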

- While ROLLUP provides a structured approach with hierarchical subtotals, GROUPING SETS offers more flexibility to create custom combinations of subtotals and totals in a single query.

### CUBE

The CUBE operation generates subtotals and grand totals for all possible combinations of columns specified in the CUBE clause. It provides a more comprehensive approach, creating aggregates for every possible combination of dimensions.

```sql
SELECT product_category, product_type, SUM(amount) AS total_amount
FROM sales
GROUP BY CUBE (product_category, product_type);

```

| product_category | product_type | total_amount | note                     |
|------------------|--------------|--------------|--------------------------|
| Electronics      | Smartphone   | 950          |                          |
| Electronics      | Laptop       | 800          |                          |
| Clothing         | T-Shirt      | 50           |                          |
| Clothing         | Jeans        | 70           |                          |
| Electronics      | NULL         | 1750         | Subtotal for Electronics |
| Clothing         | NULL         | 120          | Subtotal for Clothing    |
| NULL             | Smartphone   | 950          | Subtotal for Smartphone  |
| NULL             | Laptop       | 800          | Subtotal for Laptop      |
| NULL             | T-Shirt      | 50           | Subtotal for T-Shirt     |
| NULL             | Jeans        | 70           | Subtotal for Jeans       |
| NULL             | NULL         | 1870         | Grand total              |


#### Difference between Roll up and CUBE:

- The key difference is in the scope of aggregation. While ROLLUP generates subtotals and grand totals in a hierarchical manner for specific columns, CUBE generates subtotals and grand totals for all possible combinations of the specified columns. CUBE provides a more comprehensive overview of the data, but it can result in a larger result set.



In summary, ROLLUP is more focused on structured subtotals, while CUBE provides a broader view by including aggregates for all possible combinations of dimensions. The choice between them depends on the level of detail and insight you need from your aggregated data.
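The same UNION ALL emulation extends to CUBE by adding the grouping set that ROLLUP skips, `(product_type)` on its own. This is again a sketch on SQLite (via Python's `sqlite3`) using the sample `sales` rows with the unused dates dropped. With n columns CUBE yields 2^n grouping sets, versus n + 1 for ROLLUP, which is why its result grows faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_category TEXT, product_type TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Electronics", "Smartphone", 500),
        ("Electronics", "Laptop", 800),
        ("Clothing", "T-Shirt", 50),
        ("Electronics", "Smartphone", 450),
        ("Clothing", "Jeans", 70),
    ],
)

# CUBE(product_category, product_type) = every subset of the two columns:
# (category, type), (category), (type) and () -> 4 grouping sets.
cube_rows = conn.execute(
    """
    SELECT product_category, product_type, SUM(amount) FROM sales
    GROUP BY product_category, product_type
    UNION ALL
    SELECT product_category, NULL, SUM(amount) FROM sales GROUP BY product_category
    UNION ALL
    SELECT NULL, product_type, SUM(amount) FROM sales GROUP BY product_type
    UNION ALL
    SELECT NULL, NULL, SUM(amount) FROM sales
    """
).fetchall()

# 4 detail rows + 2 category subtotals + 4 type subtotals + 1 grand total
print(len(cube_rows))  # 11
```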

### REPLACE() in sql: 

#### Question : 

Samantha was tasked with calculating the average monthly salary of all employees in the EMPLOYEES table, but did not realize her __keyboard's 0 key was broken__ until after completing the calculation. She wants your help finding the difference between her miscalculation (using salaries with any zeros removed) and the actual average salary.


Write a query calculating the amount of error (i.e. __actual - miscalculated average monthly salaries__), and round it up to the next integer.

```sql
select 
CEIL(AVG(Salary) - AVG(replace(Salary,0,'')))
from employees;
```

#### output : 2253
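A small reproducible check of the same idea, assuming SQLite through Python's `sqlite3` module and two made-up salaries (the 2253 above comes from the exercise's own data). SQLite needs explicit casts around REPLACE, and its CEIL function is only present in builds with math functions enabled, so the rounding up is done with Python's `math.ceil` in this sketch.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (salary INTEGER)")
# Made-up salaries: with the zeros removed, 100 becomes 1 and 1020 becomes 12.
conn.executemany("INSERT INTO employees VALUES (?)", [(100,), (1020,)])

# SQLite version of AVG(Salary) - AVG(REPLACE(Salary, 0, '')):
# cast to text, strip every '0' character, cast back to an integer.
actual, miscalculated = conn.execute(
    """
    SELECT AVG(salary),
           AVG(CAST(REPLACE(CAST(salary AS TEXT), '0', '') AS INTEGER))
    FROM employees
    """
).fetchone()

# Round the error up to the next integer.
error = math.ceil(actual - miscalculated)
print(actual, miscalculated, error)  # 560.0 6.5 554
```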

---

# SUBQUERIES in SQL - Campusx

https://youtu.be/YYq47MN3TZI

A subquery is a query within another query. It is a SELECT statement that is
nested inside another SELECT, INSERT, UPDATE, or DELETE statement. A non-correlated
subquery is executed first, and its result is then used as a value or condition
for the outer query. A correlated subquery is instead re-evaluated for each row of the outer query.

#### Subqueries can be used inside:

1. SELECT


2. FROM


3. WHERE


4. HAVING


5. INSERT


6. UPDATE


7. DELETE

### Types of Subqueries:

In SQL, subqueries are queries nested within another query to perform specific tasks. There are several types of subqueries, each serving different purposes:

1. **Scalar Subquery:**
   - Returns a single value (one row and one column) to the outer query.
   - Often used in expressions, comparisons, or calculations.
   - Example: `SELECT name, (SELECT MAX(salary) FROM employees) AS max_salary FROM employees;`


2. **Single-Row Subquery:**
   - Returns at most one row to the outer query (a scalar subquery is the special case with a single column).
   - Typically used with single-value comparison operators such as `=`, `<`, `>`, `<=`, `>=` and `<>`.
   - Example: `SELECT name FROM employees WHERE salary = (SELECT MAX(salary) FROM employees);`


3. **Multi-Row Subquery:**
   - Returns multiple rows with one or more columns to the outer query.
   - Typically used with the `IN`, `ANY` or `ALL` operators.
   - Example: `SELECT name FROM employees WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'New York');`


4. **Correlated Subquery:**
   - References columns from the outer query in the inner query.
   - Executed once for each row processed in the outer query.
   - Example: `SELECT name FROM employees e WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id);`


5. **Nested Subquery:**
   - A subquery within another subquery.
   - Provides a way to perform more complex queries by building on multiple levels of nesting.
   - Example: `SELECT name FROM employees WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'New York'));`


6. **Correlated EXISTS Subquery:**
   - Checks for the existence of rows in the subquery result.
   - Used with the `EXISTS` keyword in conditions.
   - Example: `SELECT name FROM customers c WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.order_date > '2023-01-01');`

These subquery types offer various ways to manipulate and retrieve data based on specific conditions or relationships within a database.
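A runnable sketch of the scalar and multi-row cases from the list, assuming SQLite via Python's `sqlite3` and small made-up `employees` and `departments` tables shaped like the ones in the examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE departments (department_id INTEGER, name TEXT, location TEXT);
    CREATE TABLE employees (name TEXT, salary INTEGER, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Sales', 'New York'), (2, 'IT', 'Boston'), (3, 'HR', 'New York');
    INSERT INTO employees VALUES ('Asha', 90, 1), ('Ben', 70, 2), ('Chen', 60, 3), ('Dev', 90, 2);
    """
)

# Scalar subquery: returns exactly one value, so "=" is safe.
top_earners = conn.execute(
    """SELECT name FROM employees
       WHERE salary = (SELECT MAX(salary) FROM employees)
       ORDER BY name"""
).fetchall()

# Multi-row subquery: returns a list of department ids, consumed by IN.
new_york_staff = conn.execute(
    """SELECT name FROM employees
       WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'New York')
       ORDER BY name"""
).fetchall()

print(top_earners)     # [('Asha',), ('Dev',)]
print(new_york_staff)  # [('Asha',), ('Chen',)]
```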

### Combination of 2 conditions :

#### Find the most profitable movie of each year

![image.png](attachment:image.png)

### 2nd example query on 2 conditions:

![image.png](attachment:image.png)

### Correlated subquery - often slower than an equivalent JOIN or GROUP BY

https://youtu.be/YYq47MN3TZI?t=4944

A correlated subquery is a type of SQL subquery that refers to a column from the outer query within its own query block. This creates a relationship, or correlation, between the inner and outer queries. The inner query's result depends on the specific row being processed in the outer query. Correlated subqueries are used to compare values between the main query and the subquery for filtering or matching purposes. They are typically slower and less efficient compared to non-correlated subqueries.

In short, a correlated subquery connects an inner query to an outer query using data from the outer query's current row.

![Screenshot%202023-08-25%20221233.png](attachment:Screenshot%202023-08-25%20221233.png)

#### Find all the movies that have a rating higher than the average rating of movies in the same genre:
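One way to write this query, as a sketch assuming a hypothetical `movies(name, genre, score)` table in SQLite (via Python's `sqlite3`) with made-up rows. It runs the correlated version and the JOIN + GROUP BY rewrite suggested in the heading above, and checks that they agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movies (name TEXT, genre TEXT, score REAL);
    INSERT INTO movies VALUES
        ('M1', 'Action', 8.0), ('M2', 'Action', 6.0), ('M3', 'Action', 7.0),
        ('M4', 'Drama', 9.0), ('M5', 'Drama', 5.0);
    """
)

# Correlated subquery: the inner AVG depends on m1.genre from the current outer row.
correlated = conn.execute(
    """
    SELECT m1.name, m1.genre, m1.score
    FROM movies AS m1
    WHERE m1.score > (SELECT AVG(m2.score) FROM movies AS m2 WHERE m2.genre = m1.genre)
    ORDER BY m1.name
    """
).fetchall()

# Same result with a derived table: compute each genre's average once, then join.
joined = conn.execute(
    """
    SELECT m.name, m.genre, m.score
    FROM movies AS m
    JOIN (SELECT genre, AVG(score) AS avg_score FROM movies GROUP BY genre) AS g
      ON m.genre = g.genre
    WHERE m.score > g.avg_score
    ORDER BY m.name
    """
).fetchall()

print(correlated)  # [('M1', 'Action', 8.0), ('M4', 'Drama', 9.0)]
```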

### SUB query in SELECT statement

#### Get the percentage of votes for each movie compared to the total number of votes.


![image.png](attachment:image.png)

### Not efficient sub-query : SELECT inside SELECT

#### Display each movie's name, genre, score and the average score of its genre

![image.png](attachment:image.png)

### Difference between subqueries and co related sub-queries:

Subqueries and correlated subqueries are both types of SQL queries, but they serve different purposes and have distinct characteristics:

1. **Subquery (Non-Correlated Subquery):**
   - A subquery is a query nested within another query.
   - It can be executed independently of the outer query and produces a result that is used by the outer query.
   - The subquery is evaluated first, and its result is then used in the main query.
   - It doesn't have a direct relationship with the outer query's data.
   - Typically, subqueries are more efficient than correlated subqueries.



2. **Correlated Subquery:**
   - A correlated subquery is a subquery that references columns from the outer query.
   - The subquery's execution depends on the data of the current row being processed in the outer query.
   - It is executed repeatedly, once for each row processed by the outer query.
   - Correlated subqueries can be less efficient and slower than non-correlated subqueries, especially for large datasets.
   - They are used when comparing data between the inner and outer queries is necessary.

In essence, the key difference is that a correlated subquery establishes a connection between the inner and outer queries by utilizing the current row's data in the outer query, whereas a non-correlated subquery operates independently of the outer query's data.

# WINDOW FUNCTIONS

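Window functions in SQL compute a value for each row from a set of related rows (the window), such as the rows in the same group or the rows just before and after it. Unlike GROUP BY aggregates, they do not collapse rows: every row of the result keeps its own values and gets the computed value alongside them. Aggregate functions (SUM, AVG, COUNT, etc.) can be used as window functions, and there are dedicated ones such as ROW_NUMBER, RANK and LAG.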


The window specification is defined using the __OVER() clause in SQL__, which specifies
the partitioning and ordering of the rows that the window function will operate
on.


Window functions allow you to perform calculations that depend on the values of multiple rows in a query, and can be useful for tasks such as calculating running totals, moving averages, percentiles, and more.


### NOTE : A window function must always be followed by an OVER() clause.

### How a WINDOW function is different from GROUP BY (SQL GROUP BY or pandas groupby)

**WINDOW Functions:**
- Operate within the context of individual rows.


- Calculate values based on a window of related rows around each row.


- Values are added as new columns to the existing rows.


- Suitable for analytical calculations within rows.


- __Retains the same number of rows in the result.__

**GROUP BY (SQL) / groupby (pandas):**


- Groups rows with the same values in specified columns.


- Aggregates data within each group using functions like sum, mean, etc.


- Reduces the number of rows, representing each group with a single row.


- Used for summarizing data based on column values.


- __Changes the number of rows in the result by collapsing groups.__
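A quick way to see the row-count difference, sketched with SQLite through Python's `sqlite3` (window functions need SQLite 3.25 or newer) on a made-up `students` table: GROUP BY returns one row per location, while the window version keeps every student and repeats the location average on each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, location TEXT, years_of_experience INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("A", "Pune", 2), ("B", "Pune", 4), ("C", "Delhi", 5)],
)

# GROUP BY: one output row per location.
grouped = conn.execute(
    "SELECT location, AVG(years_of_experience) FROM students GROUP BY location ORDER BY location"
).fetchall()

# Window function: every student row is kept, with its location's average added.
windowed = conn.execute(
    """SELECT name, location,
              AVG(years_of_experience) OVER (PARTITION BY location) AS location_avg
       FROM students
       ORDER BY name"""
).fetchall()

print(grouped)   # [('Delhi', 5.0), ('Pune', 3.0)]
print(windowed)  # [('A', 'Pune', 3.0), ('B', 'Pune', 3.0), ('C', 'Delhi', 5.0)]
```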

## OVER clause

The OVER clause turns a function into a window function and defines the window: the set of rows, related to the current row, that the calculation is performed over. 

It is used with aggregate functions like SUM, AVG, MIN and MAX, and with ranking and value functions like ROW_NUMBER, RANK, LAG and LEAD.

__NOTE : if nothing is passed inside OVER(), the window is the entire result set, so the aggregate is computed over all rows and repeated on every row__

#### USING JOIN

```sql
SELECT student_fname, student_lname, students.location, total_count, average_experience from students 
JOIN 
(select location, count(location) as total_count, avg(years_of_experience) as average_experience from students GROUP BY location) as temptable 
ON students.location = temptable.location;
```

![image.png](attachment:image.png)

#### SAME using OVER AND PARTITION BY

```sql
SELECT student_fname, student_lname, location, COUNT(location) OVER (PARTITION BY location) as total_students,
avg(years_of_experience)  OVER (PARTITION BY location) AS avg_exp 
from students;
```

![image.png](attachment:image.png)

## PARTITION


PARTITION BY is an optional part of the OVER clause (it is not part of GROUP BY). It divides the rows of a result set into partitions based on the values of the specified column or columns.

The window function is then computed separately within each partition, much like an aggregate per group in GROUP BY, except that rows are not collapsed: every input row stays in the result, with its partition's value alongside it.

__The PARTITION BY clause is useful when you want the same calculation (SUM, AVG, RANK, etc.) done per group while still seeing the individual rows.__

### BENEFIT of PARTITION BY over GROUP BY :

Because rows are not collapsed, the SELECT list can mix non-aggregated columns with the per-partition result. With GROUP BY, every selected column must be listed in GROUP BY, be wrapped in an aggregate function, or be functionally dependent on the grouped columns (MySQL enforces this when the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default).

## NOTE : The AS alias for a window function's result goes after the closing parenthesis of OVER(...), e.g. `AVG(marks) OVER (PARTITION BY branch) AS branch_avg`

##### partition on location and assign rank to years of experience based on location

![image.png](attachment:image.png)

##### Highest years of experience from each location

![image.png](attachment:image.png)

#### marks greater than respective branch's avg marks 

![image.png](attachment:image.png)

## PARTITION BY on Multiple columns

#### cheapest flights between 2 cities

![image.png](attachment:image.png)

## Types of Window functions :

- __ROW_NUMBER:__ Assigns a unique number to each row in the result set, based on the order specified in the ORDER BY clause of the OVER clause.


- __RANK:__ Assigns a rank to each row in the result set, based on the order specified in the ORDER BY clause of the OVER clause. Rows with the same values receive the same rank, and a gap is left in the ranking for the next unique value.


- __DENSE_RANK:__ Assigns a rank to each row in the result set, based on the order specified in the ORDER BY clause of the OVER clause. Rows with the same values receive the same rank, and there is no gap in the ranking for the next unique value.


- __NTILE:__ Divides the result set into a specified number of groups, or tiles, and assigns a number to each row indicating which tile it belongs to.


- __PERCENT_RANK:__ Calculates the relative rank of each row within the result set as a fraction between 0 and 1.


- __CUME_DIST:__ Calculates the cumulative distribution of a value within the result set, expressed as a fraction between 0 and 1.


- __LEAD and LAG:__ Return the value of a specified column from a row at a specified offset from the current row, either ahead (LEAD) or behind (LAG) in the result set.

__To use a window function, call it in the SELECT list (or ORDER BY) and follow it with an OVER clause that specifies the window for the function.__

## 1. ROW NUMBER

### NOTE : ROW_NUMBER() is almost always used with ORDER BY inside OVER(); without it, the order in which rows are numbered is not guaranteed

![image.png](attachment:image.png)

##### 5th highest salary

![image.png](attachment:image.png)

## 2. RANK  and DENSE RANK

RANK and DENSE_RANK are both used to assign a rank to each row within a result set (or partition), based on the values in one or more ORDER BY columns. 

However, there is a difference in the way that they handle ties (rows with equal values in the ranking column).

- RANK assigns the same rank to tied rows and skips the next rank number. For example, if two rows have the same value and are assigned rank 1, the next row would be assigned rank 3.


- DENSE_RANK, on the other hand, assigns the same rank to tied rows and does not skip any rank numbers. For example, if two rows have the same value and are assigned rank 1, the next row would be assigned rank 2.

__In summary, RANK can have "gaps" in the rank numbers, while DENSE_RANK always assigns consecutive rank numbers.__
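The tie handling can be checked directly. This sketch assumes SQLite via Python's `sqlite3` (3.25+) and made-up marks with a tie at 85; `name` is added as a tie-breaker for ROW_NUMBER so its output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("A", 90), ("B", 85), ("C", 85), ("D", 80)],
)

rows = conn.execute(
    """
    SELECT name, marks,
           ROW_NUMBER() OVER (ORDER BY marks DESC, name) AS row_num,  -- always 1, 2, 3, ...
           RANK()       OVER (ORDER BY marks DESC)       AS rnk,      -- ties share a rank, then a gap
           DENSE_RANK() OVER (ORDER BY marks DESC)       AS dense_rnk -- ties share a rank, no gap
    FROM scores
    ORDER BY marks DESC, name
    """
).fetchall()

for row in rows:
    print(row)
# ('A', 90, 1, 1, 1)
# ('B', 85, 2, 2, 2)
# ('C', 85, 3, 2, 2)
# ('D', 80, 4, 4, 3)
```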

### NOTE : For RANK and DENSE_RANK, ORDER BY is effectively mandatory; without it all rows are peers and every row gets rank 1

#### ROW NUMBER

![image.png](attachment:image.png)

#### RANK

```sql
SELECT *, 
RANK() OVER (ORDER BY years_of_experience DESC) as ranking 
from students;
```

![image.png](attachment:image.png)

#### rank branch wise

![image.png](attachment:image.png)

#### DENSE RANK

```sql
SELECT student_fname, student_lname, location, years_of_experience, 
DENSE_RANK() over(ORDER BY years_of_experience DESC) as dense_ranking 
from students;
```

![image.png](attachment:image.png)

### RANK () vs DENSE_RANK()

![image.png](attachment:image.png)

## 3. LEAD and LAG

Return the value of a specified column from a row at a specified offset from the current row, either ahead (LEAD) or behind (LAG) in the result set.
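A runnable sketch of LAG and LEAD per partition, assuming SQLite via Python's `sqlite3` (3.25+) and a few made-up `students` rows. It shows the default offset of 1, an offset of 2, and the NULLs at the edges of each partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, source_of_joining TEXT, years_of_experience INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("A", "linkedin", 1), ("B", "linkedin", 3), ("C", "linkedin", 6),
     ("D", "youtube", 2), ("E", "youtube", 4)],
)

rows = conn.execute(
    """
    SELECT name,
           -- previous row's value in the same partition (NULL for the first row)
           LAG(years_of_experience) OVER (PARTITION BY source_of_joining ORDER BY years_of_experience) AS prev_exp,
           -- value from two rows back (NULL for the first two rows)
           LAG(years_of_experience, 2) OVER (PARTITION BY source_of_joining ORDER BY years_of_experience) AS prev2_exp,
           -- next row's value (NULL for the last row)
           LEAD(years_of_experience) OVER (PARTITION BY source_of_joining ORDER BY years_of_experience) AS next_exp
    FROM students
    ORDER BY source_of_joining, years_of_experience
    """
).fetchall()

for row in rows:
    print(row)
```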

##### LAG : partitioned by source_of_joining

![image.png](attachment:image.png)

LAG returns NULL for the first row of each source_of_joining partition, because there is no previous row to read

##### With an offset of 2: the value from two rows earlier in each partition

![image.png](attachment:image.png)

##### LEAD : partitioned by source_of_joining

![image.png](attachment:image.png)

LEAD returns NULL for the last row of each source_of_joining partition, because there is no following row to read

##### With an offset of 2: the value from two rows later in each partition

![image.png](attachment:image.png)

### Month on Month Growth using LAG


![image.png](attachment:image.png)

## 4. FIRST_VALUE() - first value in each group (partition)

![image.png](attachment:image.png)

### Student name who has highest marks overall

![image.png](attachment:image.png)

###  name and branch and marks of only toppers

![image.png](attachment:image.png)

## 5. LAST_VALUE() - last value in each group (partition)

![image.png](attachment:image.png)

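### NOTE : LAST_VALUE() seems to give wrong results here because of the default FRAME clause

___Default FRAME clause (when the window has an ORDER BY):___

`over (partition by source_of_joining order by years_of_experience desc range between unbounded preceding and current row) as least_experienced`

With this frame the window ends at the current row (and any rows tied with it), so LAST_VALUE() just returns the current row's value.

##### Changing CURRENT ROW to UNBOUNDED FOLLOWING makes the frame cover the whole partition, so LAST_VALUE() works: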

![image.png](attachment:image.png)
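To see the frame effect in isolation, here is a sketch with SQLite via Python's `sqlite3` (3.25+) and made-up rows; SQLite uses the same default frame as MySQL when ORDER BY is present. With the default frame LAST_VALUE just echoes the current row; with the full-partition frame it returns the least experienced person in each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, source_of_joining TEXT, years_of_experience INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("A", "linkedin", 6), ("B", "linkedin", 3), ("C", "linkedin", 1),
     ("D", "youtube", 4), ("E", "youtube", 2)],
)

rows = conn.execute(
    """
    SELECT name,
           -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           LAST_VALUE(name) OVER (
               PARTITION BY source_of_joining ORDER BY years_of_experience DESC
           ) AS default_frame,
           -- explicit frame covering the whole partition
           LAST_VALUE(name) OVER (
               PARTITION BY source_of_joining ORDER BY years_of_experience DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS full_frame
    FROM students
    ORDER BY source_of_joining, years_of_experience DESC
    """
).fetchall()

for row in rows:
    print(row)
# ('A', 'A', 'C')
# ('B', 'B', 'C')
# ('C', 'C', 'C')
# ('D', 'D', 'E')
# ('E', 'E', 'E')
```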

### another example

![image.png](attachment:image.png)

### name and branch of lowest marks

#### OR

![image.png](attachment:image.png)

## FRAME clause


- A frame is a subset of the current partition, chosen relative to the current row.


- It defines the scope of the calculation performed by a window function: it specifies which rows are included in the calculation, based on their position relative to the current row.


___The ROWS clause___ specifies how many rows should be included in the frame
relative to the current row. For example, ROWS 3 PRECEDING means that the
frame includes the current row and the three rows that precede it in the partition.


___The BETWEEN___ clause specifies the boundaries of the frame.


The FRAME clause has two components:

- __Frame unit (ROWS or RANGE) :__ ROWS counts physical rows relative to the current row, while RANGE works on the ORDER BY value, so rows with the same value (peers) are always included or excluded together.


- __Frame extent (BETWEEN start AND end) :__ each bound is one of UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, n FOLLOWING or UNBOUNDED FOLLOWING. Both bounds are inclusive, so UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING covers all rows of the partition, and 1 PRECEDING AND 1 FOLLOWING covers the current row and its two neighbors.

Examples


- __ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW__ - means that the
frame includes all rows from the beginning of the partition up to and including the
current row.


- __ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING:__ the frame includes the
current row and the row immediately before and after it.


- __ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING:__ the
frame includes all rows in the partition.


- __ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING:__ the frame includes the
current row and the three rows before it and the two rows after it.
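The example frames can be compared side by side. This sketch assumes SQLite via Python's `sqlite3` (3.25+) and a made-up `daily_sales(day, amount)` table, and applies three of the frames above to SUM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)", [(1, 10), (2, 20), (3, 30), (4, 40)]
)

rows = conn.execute(
    """
    SELECT day,
           -- running total: start of partition up to the current row
           SUM(amount) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running,
           -- current row plus one neighbor on each side (fewer at the edges)
           SUM(amount) OVER (ORDER BY day ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbors,
           -- whole partition on every row
           SUM(amount) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS whole
    FROM daily_sales
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
# (1, 10, 30, 100)
# (2, 30, 60, 100)
# (3, 60, 90, 100)
# (4, 100, 70, 100)
```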

## 6. Nth_Value

The NTH_VALUE function in MySQL is a window function that returns the nth value __(any value from a position specified by us)__ in a set of values, based on a specified order. The function returns the value of a specified expression for the nth row in the window frame, where the frame is defined using the OVER clause.

##### 2nd most experienced person from each group

![image.png](attachment:image.png)

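### NOTE : if a group has fewer rows than the n passed to NTH_VALUE, that group returns NULL. With the default frame, rows that come before the nth row of a group also get NULL, because their frame has not reached the nth row yet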
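A sketch of both NULL cases, assuming SQLite via Python's `sqlite3` (3.25+) and made-up rows where the `youtube` group has a single member. Widening the frame to the whole partition removes the early-row NULL, but a partition with fewer than n rows still returns NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, source_of_joining TEXT, years_of_experience INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("A", "linkedin", 6), ("B", "linkedin", 3), ("C", "linkedin", 1), ("D", "youtube", 4)],
)

rows = conn.execute(
    """
    SELECT name,
           -- default frame: only rows up to the current one are visible
           NTH_VALUE(name, 2) OVER (
               PARTITION BY source_of_joining ORDER BY years_of_experience DESC
           ) AS default_frame,
           -- full frame: the whole partition is visible on every row
           NTH_VALUE(name, 2) OVER (
               PARTITION BY source_of_joining ORDER BY years_of_experience DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS full_frame
    FROM students
    ORDER BY source_of_joining, years_of_experience DESC
    """
).fetchall()

for row in rows:
    print(row)
# ('A', None, 'B')   first row: frame holds 1 row, no 2nd value yet
# ('B', 'B', 'B')
# ('C', 'B', 'B')
# ('D', None, None)  youtube has only one row
```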

## 7. Ntile

Segmentation using NTILE is a technique in SQL for dividing a dataset into (nearly) equal-sized groups based on some ordering, and then performing calculations or analysis on each group separately using window functions. It returns a number representing the group or tile that each row belongs to.


The NTILE function in SQL is a window function that returns the ntile value for a __set of rows (buckets)__ based on a specified order. 



![image-2.png](attachment:image-2.png)

### NOTE : if the total number of rows doesn't divide evenly, the leftover rows go to the first buckets, one each (e.g. 10 rows in 3 buckets gives sizes 4, 3, 3)
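The bucket sizes can be checked with a short sketch, assuming SQLite via Python's `sqlite3` (3.25+). `bucket_sizes` is a hypothetical helper that loads the ids 1..n_rows into a table and counts the rows NTILE puts in each bucket.

```python
import sqlite3


def bucket_sizes(n_rows, n_buckets):
    """Return how many rows NTILE(n_buckets) puts in each bucket, in bucket order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, n_rows + 1)])
    # n_buckets is formatted in as an integer literal; int() keeps anything else out.
    query = f"""
        SELECT bucket, COUNT(*)
        FROM (SELECT NTILE({int(n_buckets)}) OVER (ORDER BY id) AS bucket FROM t) AS tiles
        GROUP BY bucket
        ORDER BY bucket
    """
    sizes = [count for _, count in conn.execute(query)]
    conn.close()
    return sizes


print(bucket_sizes(10, 3))  # [4, 3, 3]
print(bucket_sizes(11, 3))  # [4, 4, 3]
```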

### Using cases in Ntile

![image.png](attachment:image.png)

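## 8. CUME_DIST() - cumulative distribution

The cumulative distribution function (CDF) describes the probability distribution of a random variable: at a value x, it gives the probability that the variable is less than or equal to x. 

It can be used to describe a discrete, continuous or mixed variable. For a discrete variable it is obtained by summing the probability mass function up to x; for a continuous variable, by integrating the probability density function.

In SQL, CUME_DIST() applies this idea to the rows of a partition: it returns (number of rows that come before or tie with the current row in the ORDER BY order) / (total rows in the partition), a value greater than 0 and at most 1.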

![image.png](attachment:image.png)

#### students with marks above the 90th percentile:

![image.png](attachment:image.png)

### Another example:

![image.png](attachment:image.png)

#### Round off to 3 decimal places using ROUND(value, 3)

![image.png](attachment:image.png)

### using it to fetch first 35% from each group

![image.png](attachment:image.png)

## 9. PERCENT_RANK

percent_rank is a window function in SQL that calculates the __relative rank of a row within a set of rows,__ represented as a decimal value between 0 and 1, starting at 0 for the first row(s) in the ORDER BY order; the last row gets 1 unless it is tied with others.

The percent_rank function is used in a similar way to the rank function, but instead of returning the rank as an integer, it returns the rank as a decimal value. The percent_rank function is calculated as: __(rank - 1) / (number of rows in the partition - 1).__

```sql
select student_id, student_fname, location, source_of_joining, years_of_experience,
round(percent_rank() over (partition by source_of_joining order by years_of_experience desc), 3) as pct_rank
from students;
```

![image.png](attachment:image.png)

## Difference between cume_dist() and percent_rank()

The difference between percent_rank and cume_dist lies __in the way they calculate the relative position of a row within a set of rows.__

- __percent_rank__ returns the relative rank of a row as a decimal value between 0 and 1, starting at 0 for the first row(s) in the ORDER BY order; the last row gets 1 unless it is tied with others. 

__The percent_rank function is calculated as (rank - 1) / (total number of rows - 1).__


- __cume_dist__ returns the cumulative distribution of a value within a set of values, represented as a decimal value greater than 0 and at most 1. 

The cume_dist function calculates the fraction of rows that come before or tie with the current row in the ORDER BY order, within the set of rows defined by the PARTITION BY clause.
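Putting the two formulas side by side on four made-up marks with a tie (10, 20, 20, 30) makes the difference concrete. The sketch assumes SQLite via Python's `sqlite3` (3.25+), whose PERCENT_RANK and CUME_DIST follow the same definitions as MySQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (marks INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)", [(10,), (20,), (20,), (30,)])

rows = conn.execute(
    """
    SELECT marks,
           RANK()         OVER (ORDER BY marks) AS rnk,
           PERCENT_RANK() OVER (ORDER BY marks) AS pct_rank,  -- (rank - 1) / (4 - 1)
           CUME_DIST()    OVER (ORDER BY marks) AS cum_dist   -- rows up to and tied with current / 4
    FROM scores
    ORDER BY marks
    """
).fetchall()

for marks, rnk, pct_rank, cum_dist in rows:
    print(marks, rnk, round(pct_rank, 3), cum_dist)
# 10 1 0.0 0.25
# 20 2 0.333 0.75
# 20 2 0.333 0.75
# 30 4 1.0 1.0
```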

## 10. CUMULATIVE SUM

Cumulative sum is another type of calculation that can be performed using
window functions. A cumulative sum calculates the sum of a set of values up to a
given point in time, and includes all previous values in the calculation.
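A sketch of the same pattern as the Kohli query below, assuming SQLite via Python's `sqlite3` (3.25+) and a tiny made-up ball-by-ball `ipl` table. Per-match runs are computed in a derived table first and the running total is taken over that; the `ORDER BY ID` inside OVER() is what fixes the order in which runs accumulate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipl (ID INTEGER, batter TEXT, batsman_run INTEGER)")
# Rows are inserted out of match order on purpose; only ORDER BY decides the running order.
conn.executemany(
    "INSERT INTO ipl VALUES (?, ?, ?)",
    [(3, "V Kohli", 4), (1, "V Kohli", 4), (2, "V Kohli", 1),
     (1, "V Kohli", 6), (3, "V Kohli", 4), (2, "Other", 6)],
)

rows = conn.execute(
    """
    SELECT ID, runs_scored,
           SUM(runs_scored) OVER (ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS career_runs
    FROM (SELECT ID, SUM(batsman_run) AS runs_scored
          FROM ipl
          WHERE batter = 'V Kohli'
          GROUP BY ID) AS per_match
    ORDER BY ID
    """
).fetchall()

print(rows)  # [(1, 10, 10), (2, 1, 11), (3, 8, 19)]
```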

### Career runs of Virat Kohli after his 50th and 100th matches

```sql
SELECT * FROM 
    (SELECT 
    CONCAT("Match-",CAST(ROW_NUMBER() OVER(ORDER BY ID) AS CHAR)) AS match_number,
    SUM(batsman_run) as 'runs_scored',
    SUM(SUM(batsman_run)) OVER(ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as career_runs
    from ipl
    WHERE batter = 'V Kohli'
    GROUP BY ID) as t
WHERE match_number IN ('Match-50','Match-100');
```

![image.png](attachment:image.png)

## 11. CUMULATIVE AVERAGE

Cumulative average is another type of average that can be calculated using
window functions. A cumulative average calculates the average of a set of values
up to a given point in time, and includes all previous values in the calculation.

```sql
SELECT * FROM 
    (SELECT 
    CONCAT("Match-",CAST(ROW_NUMBER() OVER(ORDER BY ID) AS CHAR)) AS match_number,
    SUM(batsman_run) as 'runs_scored',
    SUM(SUM(batsman_run)) OVER(ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as career_runs,
    AVG(SUM(batsman_run)) OVER (ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as career_average
    from ipl
    WHERE batter = 'V Kohli' 
    GROUP BY ID) as t;
```

![image.png](attachment:image.png)

## 12. MOVING AVERAGE/ ROLLING AVERAGE

![image.png](attachment:image.png)

```sql
SELECT * FROM 
    (SELECT 
    CONCAT("Match-",CAST(ROW_NUMBER() OVER(ORDER BY ID) AS CHAR)) AS match_number,
    SUM(batsman_run) as 'runs_scored',
    SUM(SUM(batsman_run)) OVER(ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as career_runs,
    AVG(SUM(batsman_run)) OVER (ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as career_average,
    AVG(SUM(batsman_run)) OVER (ORDER BY ID ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) as moving_average
    from ipl
    WHERE batter = 'V Kohli'
    GROUP BY ID) as t;
```

![image.png](attachment:image.png)
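A compact check of the cumulative and moving averages, assuming SQLite via Python's `sqlite3` (3.25+) and four made-up per-match scores. A 3-match window (`2 PRECEDING`) is used instead of 10 so the numbers are easy to verify by hand; at the start of the series the window simply holds fewer rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE per_match (ID INTEGER, runs_scored INTEGER)")
conn.executemany(
    "INSERT INTO per_match VALUES (?, ?)", [(1, 10), (2, 20), (3, 30), (4, 40)]
)

rows = conn.execute(
    """
    SELECT ID,
           -- cumulative average: every match so far
           AVG(runs_scored) OVER (ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS career_average,
           -- moving average: current match and the 2 before it
           AVG(runs_scored) OVER (ORDER BY ID ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
               AS moving_average
    FROM per_match
    ORDER BY ID
    """
).fetchall()

for row in rows:
    print(row)
# (1, 10.0, 10.0)
# (2, 15.0, 15.0)
# (3, 20.0, 20.0)
# (4, 25.0, 30.0)
```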

## 13. Percentage of total

Percent of total refers to the percentage or proportion of a specific value in
relation to the total value. It is a commonly used metric to represent the relative
importance or contribution of a particular value within a larger group or
population.

### Percentage of total sales an item brings to restaurants
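One way to compute this, sketched with SQLite via Python's `sqlite3` (3.25+) on a hypothetical `orders(item, amount)` table: sum sales per item, then divide by the grand total from an empty `OVER ()` window. Multiplying by `100.0` keeps SQLite from doing integer division (MySQL's `/` already returns a decimal).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (item TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Pizza", 300), ("Pizza", 200), ("Burger", 250), ("Pasta", 250)],
)

rows = conn.execute(
    """
    SELECT item,
           item_sales,
           -- SUM(...) OVER () is the grand total, repeated on every row
           ROUND(100.0 * item_sales / SUM(item_sales) OVER (), 2) AS pct_of_total
    FROM (SELECT item, SUM(amount) AS item_sales FROM orders GROUP BY item) AS per_item
    ORDER BY pct_of_total DESC, item
    """
).fetchall()

print(rows)  # [('Pizza', 500, 50.0), ('Burger', 250, 25.0), ('Pasta', 250, 25.0)]
```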

## 14. Percentage of change

Percent change is a way of expressing the difference between two
values as a percentage of the original value. It is often used to measure
how much a value has increased or decreased over a given period of
time, or to compare two different values.

![image.png](attachment:image.png)

```sql
SELECT 
YEAR(Date),QUARTER(Date), SUM(views) AS 'views',
((SUM(views) - LAG(SUM(views)) OVER (ORDER BY YEAR(Date),QUARTER(Date)))/LAG(SUM(views)) 
OVER(ORDER BY YEAR(Date),QUARTER(Date)))*100 AS 'Percent_change'
FROM youtube_views
GROUP BY YEAR(Date),QUARTER(Date)
ORDER BY YEAR(Date),QUARTER(Date);
```

## 15. Percentiles & Quantiles

A __Quantile__ is a measure of the distribution of a dataset that divides the ordered data into
any number of groups holding equal shares of the data. For example, a dataset could be divided into
__deciles__ (ten equal parts), __quartiles__ (four equal parts), __percentiles__ (100 equal
parts), or any other number of intervals.


Each quantile represents a value below which a certain percentage of the data
falls. For example, the 25th percentile (also known as the first quartile, or Q1)
represents the value below which 25% of the data falls. The 50th percentile (also
known as the median) represents the value below which 50% of the data falls, and
so on.
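MySQL does not provide a built-in median or percentile aggregate, so a common workaround is to number the rows and pick the middle one(s). The sketch below assumes SQLite via Python's `sqlite3` (3.25+) and a hypothetical `scores(marks)` table; `median` is an illustrative helper that averages the two middle rows when the count is even.

```python
import sqlite3


def median(values):
    """Median of `values`, computed in SQL with ROW_NUMBER() and COUNT(*) OVER ()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (marks INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?)", [(v,) for v in values])
    (result,) = conn.execute(
        """
        WITH ordered AS (
            SELECT marks,
                   ROW_NUMBER() OVER (ORDER BY marks) AS rn,
                   COUNT(*) OVER () AS n
            FROM scores
        )
        -- Integer division in SQLite: for n = 4 this keeps rows 2 and 3, for n = 5 only row 3.
        -- In MySQL, "/" returns a decimal, so write (n + 1) DIV 2 and (n + 2) DIV 2 instead.
        SELECT AVG(marks) FROM ordered
        WHERE rn IN ((n + 1) / 2, (n + 2) / 2)
        """
    ).fetchone()
    conn.close()
    return result


print(median([40, 10, 30, 20]))      # 25.0
print(median([50, 10, 40, 20, 30]))  # 30.0
```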

___

## EXISTS / NOT EXISTS

#### used in correlated nested query

```sql
SELECT column_name(s)
FROM table_name
WHERE EXISTS
(SELECT column_name FROM table_name WHERE condition);
```

- The EXISTS operator is used to test for the existence of any record in a subquery.


- The EXISTS operator returns TRUE if the subquery returns one or more records


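- __Conceptually, the inner query is evaluated once for each row of the outer query, and evaluation stops as soon as one matching row is found. In practice the optimizer may rewrite EXISTS as a semi-join, so it is not necessarily executed row by row.__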

```sql
SELECT *
FROM users
WHERE EXISTS (
  SELECT *
  FROM orders
  WHERE orders.user_id = users.user_id
);
```

![image.png](attachment:image.png)

In this example, we're using EXISTS in a subquery to check, for each row of the users table, whether the orders table has any row with the same user_id. If the subquery returns at least one row, the EXISTS condition is true and that user is included in the result; if it returns no rows, the condition is false and that user is left out. So the query returns only the users who have placed at least one order, each listed once.
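A runnable sketch of EXISTS and NOT EXISTS, assuming SQLite via Python's `sqlite3` and made-up `users` and `orders` rows. A user with several orders still appears once, which is one practical difference from an INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER);
    INSERT INTO users VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
    """
)

# EXISTS: keep a user if at least one order row matches; Asha has two orders but appears once.
with_orders = conn.execute(
    """SELECT name FROM users
       WHERE EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.user_id)
       ORDER BY name"""
).fetchall()

# NOT EXISTS: keep a user only if no order row matches.
without_orders = conn.execute(
    """SELECT name FROM users
       WHERE NOT EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.user_id)
       ORDER BY name"""
).fetchall()

print(with_orders)     # [('Asha',), ('Chen',)]
print(without_orders)  # [('Ben',)]
```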

## ANY and ALL

```sql
SELECT column_name(s)
FROM table_name
WHERE column_name operator ANY
  (SELECT column_name
  FROM table_name
  WHERE condition);
```

__The ANY operator :__

- returns a boolean value as a result


- returns TRUE if ANY of the subquery values meet the condition


- ANY means that the condition will be true if the operation is true for any of the values returned by the subquery.

#### Products:
| product_id | product_name | product_price | category_id |
|------------|--------------|---------------|-------------|
| 1          | Laptop       | 1000          | 1           |
| 2          | Smartphone   | 500           | 1           |
| 3          | Tablet       | 300           | 1           |
| 4          | TV           | 800           | 2           |
| 5          | Headphones   | 50            | 3           |


#### Orders:

| order_id | customer_id | product_id |
|----------|-------------|------------|
| 101      | 201         | 1          |
| 102      | 202         | 3          |
| 103      | 203         | 2          |
| 104      | 201         | 4          |
| 105      | 203         | 5          |


- Find products that have been ordered by at least one customer
```sql
SELECT product_name
FROM Products
WHERE product_id = ANY (
    SELECT product_id
    FROM Orders);
```

#### output:

| product_name |
|--------------|
| Laptop       |
| Smartphone   |
| Tablet       |
| TV           |
| Headphones   |


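#### Example 2: ANY with `>`

With the sample Products table above there is no `category_id = 4`, so the subquery returns no rows. A comparison with `ANY` over an empty result is FALSE, so this query returns an empty result set.
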
```sql
SELECT product_name FROM Products
WHERE product_price > ANY (SELECT product_price FROM Products WHERE category_id = 4);
```
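To see ANY and ALL actually return rows, here is a sketch that reloads the Products sample data above and compares against category 1 (prices 1000, 500 and 300). For a non-empty subquery without NULLs, `> ANY` behaves like "greater than the minimum" and `< ALL` like "less than the minimum". It assumes MySQL 8.0 (or another database that supports ANY/ALL).

```sql
-- Same sample data as the Products table above
CREATE TABLE Products (
    product_id    INT PRIMARY KEY,
    product_name  VARCHAR(50),
    product_price INT,
    category_id   INT
);
INSERT INTO Products VALUES
    (1, 'Laptop', 1000, 1), (2, 'Smartphone', 500, 1), (3, 'Tablet', 300, 1),
    (4, 'TV', 800, 2), (5, 'Headphones', 50, 3);

-- > ANY: price greater than at least one category-1 price (i.e. greater than 300)
-- returns Laptop, Smartphone, TV
SELECT product_name FROM Products
WHERE product_price > ANY (SELECT product_price FROM Products WHERE category_id = 1);

-- < ALL: price lower than every category-1 price (i.e. lower than 300)
-- returns Headphones
SELECT product_name FROM Products
WHERE product_price < ALL (SELECT product_price FROM Products WHERE category_id = 1);
```

Tablet (300) is excluded from both results because the comparisons are strict.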


### The ALL operator:

```sql
SELECT column_name(s)
FROM table_name
WHERE column_name operator ALL
  (SELECT column_name
  FROM table_name
  WHERE condition);
```

- returns a boolean value as a result


- returns TRUE if ALL of the subquery values meet the condition


- is used with SELECT, WHERE and HAVING statements


- ALL means that the condition will be true only if the operation is true for all values returned by the subquery.

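- Find customers whose `customer_id` equals *every* `customer_id` that ordered a category-1 product (this assumes a `Customers` table with `customer_id` and `customer_name`, not shown above). Note that this is __not__ "customers who ordered all products from category 1": with the sample Orders data, three different customers (201, 202, 203) ordered category-1 products, so no single customer matches and the query returns no rows. Also, if the subquery returned no rows at all, `= ALL` would be TRUE and every customer would be returned.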
```sql
SELECT c.customer_id, c.customer_name
FROM Customers c
WHERE c.customer_id = ALL (
    SELECT o.customer_id
    FROM Orders o
    JOIN Products p ON o.product_id = p.product_id
    WHERE p.category_id = 1
);
```
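If the goal really is "customers who ordered every product in category 1", a common pattern (relational division) is to count each customer's distinct category-1 products and compare that with the total number of category-1 products. This sketch reuses the sample Products and Orders rows from above and adds two hypothetical orders (106 and 107) so that customer 201 qualifies.

```sql
CREATE TABLE Products (
    product_id    INT PRIMARY KEY,
    product_name  VARCHAR(50),
    product_price INT,
    category_id   INT
);
CREATE TABLE Orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    product_id  INT
);
INSERT INTO Products VALUES
    (1, 'Laptop', 1000, 1), (2, 'Smartphone', 500, 1), (3, 'Tablet', 300, 1),
    (4, 'TV', 800, 2), (5, 'Headphones', 50, 3);
INSERT INTO Orders VALUES
    (101, 201, 1), (102, 202, 3), (103, 203, 2), (104, 201, 4), (105, 203, 5),
    (106, 201, 2), (107, 201, 3);  -- 106 and 107 are hypothetical extra orders

-- Keep customers whose number of distinct category-1 products
-- equals the total number of category-1 products
SELECT o.customer_id
FROM Orders AS o
JOIN Products AS p ON o.product_id = p.product_id
WHERE p.category_id = 1
GROUP BY o.customer_id
HAVING COUNT(DISTINCT o.product_id) =
       (SELECT COUNT(*) FROM Products WHERE category_id = 1);
```

With these rows only customer 201 has bought products 1, 2 and 3, so the query returns 201.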

# WITH clause / Common Table Expression (CTE) / Subquery factoring

https://www.youtube.com/watch?v=QNfnuK-1YYY&list=PLavw5C92dz9Ef4E-1Zi9KfCTXS_IN8gXZ&index=8

The WITH clause (supported in MySQL 8.0 and later) is also known as a __Common Table Expression (CTE)__. It simplifies complex SQL queries by defining a temporary, named result set that can be referred to within the main query.

A CTE is essentially a named subquery that can be used within a SELECT, INSERT, UPDATE, or DELETE statement.

Here's a breakdown of the syntax:

1. `WITH`: This keyword signals the start of the `WITH` clause.


2. `cte_name`: This is the name you assign to the Common Table Expression. You'll use this name to reference the CTE in the main query.


3. `(column1, column2, ...)`: Optionally, you can specify the column names for the CTE. This is especially useful if you want to give explicit names to the columns in the CTE, which can make the main query more readable.


4. `AS`: This keyword separates the CTE name and column specification from the actual subquery.


5. Subquery: This is where you define the query that generates the temporary result set. This subquery can be a `SELECT` statement that retrieves data from one or more tables and can involve joins, filters, and other operations.


6. Main query: After defining the CTE, you can use it within the main query. The main query can reference the CTE as if it were a table or a subquery, making the code more organized and understandable.


__Start with WITH, give the CTE a name (here `cte`), optionally list its column names, then write AS followed by the subquery in parentheses.__

```sql
WITH cte AS (
  SELECT column1, column2, SUM(column3) AS total
  FROM table_name
  GROUP BY column1, column2
)
SELECT *
FROM cte
WHERE total > 100;
```

### Q- Fetch employees who earn more than average salary of all employees:

> `average_salary` is the name given to the result of `(select avg(salary) from emp_data)`. It behaves like a temporary one-row table whose single column is named `avg_sal`.

```sql
with average_salary(avg_sal) as
    (select avg(salary) from emp_data)

select *
from emp_data as e, average_salary
where e.salary > average_salary.avg_sal; 
```

![image.png](attachment:image.png)

### Q- Apple stores with total sales greater than the average total sales across all stores:

__Steps :__

1) Total Sales per store  $\longrightarrow$ Total_sales


2) Find the average sales of all stores together. $\longrightarrow$ Avg_sales



3) Find the stores where Total_sales > Avg_sales

##### - Without the WITH clause (using subqueries)

![image.png](attachment:image.png)

![image.png](attachment:image.png)
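Since the subquery version is shown only as screenshots, here is a sketch of an equivalent written with derived tables. It assumes a hypothetical `iphone` table with the columns the CTE version relies on (`store_id`, `store_name`, `quantity`, `cost`) and made-up sample rows. Notice that the per-store total has to be written twice, once for the outer comparison and once inside the average; the CTE version avoids that repetition.

```sql
-- Hypothetical sample data: one row per sale
CREATE TABLE iphone (
    store_id   INT,
    store_name VARCHAR(50),
    quantity   INT,
    cost       INT
);
INSERT INTO iphone VALUES
    (1, 'Store A', 10, 100), (1, 'Store A', 5, 100),  -- Store A total: 1500
    (2, 'Store B',  3, 100),                          -- Store B total:  300
    (3, 'Store C', 20, 100);                          -- Store C total: 2000

SELECT t.store_id, t.store_name, t.total_sales, a.avg_sales
FROM (SELECT store_id, store_name, SUM(quantity * cost) AS total_sales  -- step 1: total per store
      FROM iphone
      GROUP BY store_id, store_name) AS t
JOIN (SELECT AVG(total_sales) AS avg_sales                              -- step 2: average of the totals
      FROM (SELECT SUM(quantity * cost) AS total_sales
            FROM iphone
            GROUP BY store_id) AS per_store) AS a
  ON t.total_sales > a.avg_sales;                                       -- step 3: keep stores above it
```

With this data the average is about 1266.67, so Store A (1500) and Store C (2000) are returned.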

##### - Using the WITH clause

- Using the WITH clause, we first compute each store's total sales and the average of those totals, then use that average as the filter condition.


- The first CTE is `Total_sales`.


- The second CTE is `Avg_sales`, which reads from `Total_sales`.

```sql
WITH Total_sales (store_id, store_name, total_sales_each_store) AS (
         SELECT s.store_id, s.store_name, SUM(s.quantity * s.cost) AS total_sales_each_store
         FROM iphone AS s
         GROUP BY s.store_id, s.store_name
     ),
     Avg_sales (avg_overall) AS (
         SELECT ROUND(AVG(total_sales_each_store), 2) AS avg_overall
         FROM Total_sales
     )
SELECT *
FROM Total_sales AS ts
JOIN Avg_sales AS av
  ON ts.total_sales_each_store > av.avg_overall;
```

![image.png](attachment:image.png)

#### Question
Ketty gives Eve a task to generate a report containing three columns: Name, Grade and Mark. Ketty doesn't want the NAMES of those students who received a grade lower than 8. The report must be in descending order by grade -- i.e. higher grades are entered first. 


If there is more than one student with the same grade (8-10) assigned to them, order those particular students by their name alphabetically. 


Finally, if the grade is lower than 8, use "NULL" as their name and list them by their grades in descending order. 


If there is more than one student with the same grade (1-7) assigned to them, order those particular students by their marks in ascending order.


```sql
-- Assumption: grades are bands of 10 marks (90-100 -> 10, 80-89 -> 9, ..., 0-9 -> 1)
WITH cte AS (
    SELECT name,
           marks,
           CASE
               WHEN marks >= 90 THEN 10
               WHEN marks >= 80 THEN 9
               WHEN marks >= 70 THEN 8
               WHEN marks >= 60 THEN 7
               WHEN marks >= 50 THEN 6
               WHEN marks >= 40 THEN 5
               WHEN marks >= 30 THEN 4
               WHEN marks >= 20 THEN 3
               WHEN marks >= 10 THEN 2
               ELSE 1
           END AS grades
    FROM students
)
SELECT
    CASE WHEN grades >= 8 THEN name ELSE 'NULL' END AS names,  -- hide names below grade 8
    grades,
    marks
FROM cte
-- ORDER BY belongs in the outer query: an ORDER BY inside a CTE does not guarantee the final order
ORDER BY grades DESC,
         CASE WHEN grades >= 8 THEN name END,  -- grades 8-10: alphabetical by name
         marks;                                -- grades 1-7: ascending marks
```

![Screenshot%202023-09-07%20233642.png](attachment:Screenshot%202023-09-07%20233642.png)

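### Using two CTEs together

A later CTE can read from an earlier one. The query below counts the challenges created by each hacker, then keeps the hackers with the highest count plus any hacker whose count is not shared with anyone else.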

```sql
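-- CTE 1: number of challenges created by each hacker
WITH cte AS (
    SELECT c.hacker_id, h.name, COUNT(*) AS c_created
    FROM challenges2 AS c
    JOIN hackers2 AS h ON c.hacker_id = h.hacker_id
    GROUP BY c.hacker_id, h.name
),
-- CTE 2: reads CTE 1 and counts how many hackers share each c_created value
-- (the ORDER BY that used to be here had no effect on the final result)
cte2 AS (
    SELECT *,
           COUNT(*) OVER (PARTITION BY c_created) AS total_count
    FROM cte
)
-- Keep hackers with the maximum count, plus hackers whose count is unique
SELECT hacker_id, name, c_created
FROM cte2
WHERE c_created = (SELECT MAX(c_created) FROM cte2)
   OR total_count = 1
ORDER BY c_created DESC, hacker_id;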
```

### Benefits of using a CTE (WITH clause):

- __Improved readability:__ A CTE can make a complex query easier to understand by breaking it down into smaller, named sub-queries that can be referenced within the main query.


- __Reusability within a statement:__ A CTE can be referenced several times in the same statement (for example, in the store query above, `Avg_sales` reads from `Total_sales`, which the main query also uses). Its scope is that single statement only; to reuse a query across statements, create a view instead.


- __Performance (case by case):__ A CTE is not automatically faster than the equivalent subquery. In MySQL 8.0 the optimizer may merge a CTE into the outer query or materialize it into an internal temporary table; materializing once can help when the CTE is referenced several times, but the actual effect depends on the query and should be checked with EXPLAIN.


- __Improved maintainability:__ By encapsulating a sub-query within a CTE, the query becomes self-contained, making it easier to understand and maintain, especially for complex queries.


- __Easier debugging:__ Each CTE can be run on its own as a plain SELECT to inspect its intermediate result, which makes it easier to find which step of a long query is wrong.

In summary, the WITH clause mainly improves readability and maintainability; any performance gain depends on how the optimizer handles the CTE.

___
___

## COALESCE() - returns the first non-NULL value

The COALESCE function in SQL is used to return the first non-NULL value from a list of expressions. 


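__It takes one or more expressions as arguments__ and returns the first expression that is not NULL; if all of them are NULL, it returns NULL. Unlike MySQL's `IFNULL()` (or SQL Server's `ISNULL()`), which take exactly two arguments, COALESCE is part of the SQL standard and accepts any number of arguments. (In MySQL, `ISNULL(expr)` only tests for NULL and returns 1 or 0.)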

The COALESCE function works as follows:

- It evaluates the expressions in the order they are provided.
- It returns the value of the first non-null expression.
- If all expressions are null, it returns null.
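A quick check of these rules using literals only, so it runs without any table (MySQL). `IFNULL` is MySQL's two-argument shortcut; COALESCE generalizes it to any number of arguments.

```sql
-- COALESCE scans its arguments left to right and stops at the first non-NULL one
SELECT COALESCE(NULL, NULL, 'third', 'fourth') AS first_non_null,   -- 'third'
       COALESCE(NULL, NULL)                    AS all_null,         -- NULL
       IFNULL(NULL, 'fallback')                AS two_arg_version;  -- 'fallback'
```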

#### Example 1: Using COALESCE to Replace NULL Values:

| student_id | student_name | grade |
|------------|--------------|-------|
| 1          | Alice        | 85    |
| 2          | Bob          | NULL  |
| 3          | Carol        | 92    |
| 4          | Dave         | NULL  |
| 5          | Eve          | 78    |


```sql
SELECT student_id, student_name, COALESCE(grade, 'N/A') AS final_grade
FROM Students;
```

| student_id | student_name | final_grade |
|------------|--------------|-------------|
| 1          | Alice        | 85          |
| 2          | Bob          | N/A         |
| 3          | Carol        | 92          |
| 4          | Dave         | N/A         |
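| 5          | Eve          | 78          |

Here MySQL returns the column as text because `'N/A'` is a string. Stricter databases such as PostgreSQL reject mixing an integer column with a text literal in COALESCE, so there you would cast the grade to text first.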


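#### Example 2: Using COALESCE with Multiple Columns:

This assumes the table also has an `extra_credit` column (not shown above) that is NULL for Bob and Dave, so both fall through to the final default of 0.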

```sql
SELECT student_id, student_name, COALESCE(grade, extra_credit, 0) AS final_grade
FROM Students;
```

| student_id | student_name | final_grade |
|------------|--------------|-------------|
| 1          | Alice        | 85          |
| 2          | Bob          | 0           |
| 3          | Carol        | 92          |
| 4          | Dave         | 0           |
| 5          | Eve          | 78          |


#### Example 3: Using COALESCE in a Conditional Expression:

```sql
SELECT student_id, student_name,
       CASE WHEN grade >= 90 THEN 'A'
            WHEN grade >= 80 THEN 'B'
            ELSE 'C'
       END AS letter_grade,
       COALESCE(grade, 0) AS final_grade
FROM Students;
```

| student_id | student_name | letter_grade | final_grade |
|------------|--------------|--------------|-------------|
| 1          | Alice        | B            | 85          |
| 2          | Bob          | C            | 0           |
| 3          | Carol        | A            | 92          |
| 4          | Dave         | C            | 0           |
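| 5          | Eve          | C            | 78          |

Note that a NULL grade fails every `WHEN` test and falls through to `ELSE`, so Bob and Dave get a letter grade of 'C' even though they have no grade.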


### Difference between IS NOT NULL and COALESCE

- Use IS NOT NULL to filter rows where a specific column or expression has a non-null value.


- Use COALESCE to handle NULL values by providing an alternative non-null value, which is particularly useful when displaying data or performing calculations.
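A side-by-side sketch using the Students sample data from Example 1: IS NOT NULL removes rows, COALESCE keeps every row and substitutes a value. The last query shows why the choice matters in calculations, since AVG skips NULLs but counts substituted zeros.

```sql
-- Sample data from Example 1
CREATE TABLE Students (
    student_id   INT PRIMARY KEY,
    student_name VARCHAR(50),
    grade        INT
);
INSERT INTO Students VALUES
    (1, 'Alice', 85), (2, 'Bob', NULL), (3, 'Carol', 92), (4, 'Dave', NULL), (5, 'Eve', 78);

-- IS NOT NULL filters: Bob and Dave are dropped (3 rows)
SELECT student_name, grade
FROM Students
WHERE grade IS NOT NULL;

-- COALESCE substitutes: all 5 rows are kept, NULL grades shown as 0
SELECT student_name, COALESCE(grade, 0) AS grade
FROM Students;

-- AVG ignores NULLs (255 / 3); AVG over COALESCE counts them as 0 (255 / 5)
SELECT AVG(grade)              AS avg_ignoring_nulls,
       AVG(COALESCE(grade, 0)) AS avg_nulls_as_zero
FROM Students;
```

With this data the two averages are 85 and 51.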


___

# Indexes

Good video: https://www.youtube.com/watch?v=fsG1XaZEa78

- An index is a database object that makes data retrieval faster.


- It is created on columns that are frequently used in searches, such as columns in WHERE, JOIN and ORDER BY clauses.


- Indexes work similarly to the index of a book, helping the database locate the desired data more efficiently. Without indexes, the DBMS would need to scan the entire table, which can become inefficient for large datasets.


- Indexes are created on specific columns of a table and store a copy of the data in those columns in a separate data structure. This allows the DBMS to rapidly locate data without having to scan the entire table.


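- __Indexes for PRIMARY KEY and UNIQUE constraints are created automatically when the constraint is defined, and dropped together with the table.__


- Indexes speed up SELECT queries but slow down INSERT, UPDATE and DELETE, because every index on the table must also be updated. So it is not a good idea to create an index on every column.


- An index stores a redundant copy of data that already exists in the table, so it consumes extra storage space.


- Each table can have only one clustered index, usually on the primary key (in MySQL's InnoDB engine, the primary key is the clustered index).


- A table can have many non-clustered indexes, up to an engine-specific limit (InnoDB allows up to 64 secondary indexes per table).

__Index key:__ the column (or columns) on which an index is created.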

## Types of indexes:

### 1. Clustered Index

- A clustered index determines the physical order in which the table's rows are stored: the rows are kept sorted by the index key, so the table data and the index are effectively the same structure.



- Because the rows can be stored in only one order, __a table can have only one clustered index__. The primary key is usually chosen as the clustered index key, which gives fast access to rows by primary key value.


- A clustered index key can consist of a single column (typically the primary key) or several columns (a composite key), in which case it is called a composite clustered key.

### 2. Non-clustered Index

- __Usually created on columns other than the primary key__


- A non-clustered index does not change the physical order of the table's rows. Instead, it is a separate structure that maps the values of one or more columns to the location of the matching rows.


- When a query uses a non-clustered index, the database first looks up the index to find where the matching rows are, then fetches the remaining columns from the table (in InnoDB, the secondary index stores the primary key value, which is used to find the row in the clustered index). This extra step is skipped when the index already contains every column the query needs (a covering index).


-  A table can have multiple non-clustered indexes.

## Difference between Clustered and Non-clustered Indexes

- The main difference between clustered and non-clustered indexes is that a clustered index physically reorders the rows of a table to match the index, while a non-clustered index provides a mapping of values to physical locations but does not change the physical order of the table.


-  A table can have multiple non-clustered indexes, but only one clustered index.
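In MySQL's InnoDB engine you do not create the clustered index explicitly: the PRIMARY KEY becomes the clustered index, and every other index is a secondary (non-clustered) index. A sketch with a hypothetical `employees` table, assuming MySQL 8.0 with InnoDB:

```sql
-- Hypothetical table, for illustration only
CREATE TABLE employees (
    emp_id INT PRIMARY KEY,        -- clustered index: rows are stored in emp_id order
    email  VARCHAR(100) UNIQUE,    -- UNIQUE constraint: a secondary index is created automatically
    dept   VARCHAR(50),
    salary INT
) ENGINE = InnoDB;

-- An explicit non-clustered (secondary) index
CREATE INDEX idx_emp_dept ON employees (dept);

-- Lists three indexes: PRIMARY, email and idx_emp_dept
SHOW INDEX FROM employees;
```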

## SEEK and SCAN in sql

https://www.youtube.com/watch?v=gZu2ZldwrK4

"Seek" and "Scan" are two ways a database can __search for data in a table__. The terms come mainly from SQL Server execution plans; in MySQL, `EXPLAIN` shows comparable access types, such as `const`, `ref` and `range` (seek-like) versus `index` and `ALL` (scan-like).

- __Seek__ uses an index to jump directly to the matching rows. It works for equality lookups and for range conditions (such as `BETWEEN`, `<`, `>`) on indexed columns, on both unique and non-unique indexes, and it is fast because only the relevant part of the index is read.


- __Scan__, on the other hand, reads the entire table (or an entire index) and checks each row against the condition. It is slower for selective queries, but it works even when no index exists for the columns being searched, and it is the natural choice when a query needs all or most of the rows.


__In summary, a seek reads only the rows it needs through an index, while a scan reads everything and filters as it goes.__
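A sketch for MySQL 8.0 of how to check which access method the optimizer picked: run EXPLAIN and look at the `type` column. The `orders_demo` table and its rows are hypothetical, and on such a tiny table the optimizer's choice can vary, so treat the comments as typical outcomes rather than guarantees.

```sql
-- Hypothetical table with an index on customer_id but not on amount
CREATE TABLE orders_demo (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    amount      INT,
    INDEX idx_customer (customer_id)
);
INSERT INTO orders_demo VALUES (1, 10, 250), (2, 11, 90), (3, 10, 40), (4, 12, 300);

-- Filter on the indexed column: seek-like access (type is typically ref)
EXPLAIN SELECT * FROM orders_demo WHERE customer_id = 10;

-- Filter on a column with no index: full table scan (type is typically ALL)
EXPLAIN SELECT * FROM orders_demo WHERE amount > 100;
```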

### CREATE INDEX

The CREATE INDEX statement is used to create indexes in tables.

Indexes are used to retrieve data from the database more quickly than otherwise. The users cannot see the indexes, they are just used to speed up searches/queries.

```sql
CREATE INDEX index_name
ON table_name (column_name);
```

> __Note:__ Updating a table with indexes takes more time than updating a table without them, because the indexes also need to be updated. So only create indexes on columns that will be frequently searched against.

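#### CREATE UNIQUE INDEX

A unique index also rejects duplicate values in the indexed column(s) (MySQL still allows multiple NULLs).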

```sql
CREATE UNIQUE INDEX index_name
ON table_name (column_name);
```

#### COMPOSITE INDEX

```sql
CREATE INDEX index_name
ON table_name (column1, column2);
```

#### DROP INDEX

```sql
ALTER TABLE table_name
DROP INDEX index_name;
```

#### View the indexes of a table

```sql
SHOW INDEX FROM table_name;  -- MySQL (user_indexes is an Oracle data-dictionary view)
```

### Types of scans in sql:

In SQL, database systems use different methods to retrieve and process data during query execution. These methods are often referred to as "scans." Each type of scan has its own characteristics and is suitable for different scenarios.


Here are some common types of scans in SQL:

1. **Table Scan (Full Table Scan):**
   - A table scan involves reading the entire table to retrieve data. __It is typically used when no suitable index exists, or when the query needs most of the table's rows anyway.__
   - It can be inefficient for large tables as it involves reading every row, regardless of the query conditions.


2. **Index Scan:**
   - An index scan reads all (or a large part) of an index in key order instead of reading the table itself.
   - It can be cheaper than a table scan when the index is smaller than the table, or when it already provides the order the query needs (for example, for ORDER BY).


3. **Index Seek:**
   - An index seek involves looking up specific rows in an index to retrieve the necessary data directly.
   - It's highly efficient for queries with equality conditions or range conditions on indexed columns.


4. **Clustered Index Scan:**
   - A clustered index scan reads all rows of a table based on the order of the clustered index. It's efficient when you need to retrieve all rows of a table in the same order as the clustered index.


5. **Clustered Index Seek:**
   - A clustered index seek involves directly accessing specific rows in a table using the clustered index key.
   - It's efficient for queries that can be satisfied using the clustered index and requires retrieving a small number of rows.


6. **RID (Row ID) Lookup:**
   - A RID lookup occurs when a non-clustered index is used on a heap (a table without a clustered index): the index finds the matching rows, and a separate lookup by Row ID then fetches the remaining columns.
   - It can be less efficient because of the additional lookup step.


7. **Key Lookup (Bookmark Lookup):**
   - A key lookup is the clustered-table counterpart of a RID lookup: it happens when the non-clustered index does __not__ include all the columns the query needs, so the missing columns are fetched from the clustered index using the clustering key.
   - It's often slower than an index seek alone due to the additional I/O involved.


8. **Covering Index Scan:**
   - A covering index scan occurs when all the required columns for a query are available in the index itself.
   - It can be very efficient since it avoids accessing the main table.


9. **Heap Scan:**
   - A heap scan is similar to a table scan but is specifically for heap-organized tables (tables without a clustered index).
   - It reads all rows sequentially from the table.


10. **Parallel Scan:**
    - A parallel scan involves distributing the scanning process across multiple processors or threads, improving query performance for large datasets.


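11. **Bitmap Scan:**
    - A bitmap scan builds an in-memory bitmap of matching row locations from one or more indexes, combines bitmaps with bitwise AND/OR when several conditions apply, and then fetches the matching rows (for example, PostgreSQL's Bitmap Heap Scan). Some databases, such as Oracle, also offer on-disk bitmap indexes.


The names above come from different database systems (for example, RID lookup and key lookup are SQL Server terms, and bitmap scans are typical of PostgreSQL), so not every engine has every type. The choice of scan method depends on the query conditions, the available indexes, and the optimizer's cost estimates; the optimizer aims to pick the cheapest method for each query.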

### B-Tree index

A B-tree index, often referred to simply as a "B-tree," is a type of index structure commonly used in relational database management systems (RDBMS) to improve the efficiency of querying data. B-trees are designed to provide fast access to data in a sorted order, making them well-suited for scenarios where data needs to be retrieved based on a range of values.

Here's an overview of what a B-tree index is and how it works:

**B-Tree Index:**

A B-tree is a self-balancing tree structure where each node can have many child nodes. The "B" is often read as "balanced", although its inventors never said what it stands for. The tree stays balanced by splitting and merging nodes as data is inserted or deleted, so every leaf is at the same depth.

**Characteristics of B-Tree Index:**

1. **Sorted Order:** B-trees maintain data in a sorted order based on the indexed columns. This enables efficient range-based queries and ordered retrieval.

2. **Balanced Structure:** B-trees are self-balancing, ensuring that the height of the tree remains relatively small. This ensures efficient search operations.

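3. **Multiple Levels:** B-trees have multiple levels of nodes; each level down narrows the range of keys being searched until the search reaches a leaf.

4. **Branching Factor:** Each node can have many children (the "branching factor" or fan-out). A high fan-out keeps the tree shallow, so even a large table needs only a few node reads per lookup.

5. **Root and Leaf Nodes:** B-trees have a root node at the top, which branches into intermediate nodes and ultimately into leaf nodes. In the B+-tree variant used by most databases (including InnoDB), the leaf nodes hold the keys together with either the row data (clustered index) or a reference to the row (in an InnoDB secondary index, the primary key value), and neighbouring leaves are linked so ranges can be read sequentially.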

6. **Efficient Insertion and Deletion:** B-trees maintain their balance and structure during insertion and deletion operations, optimizing performance.

**How B-Tree Indexes Are Utilized:**

When you create a B-tree index on a column or set of columns in a table, the DBMS creates a separate data structure that organizes the indexed data in a B-tree format. This index structure is then used by the DBMS to quickly locate the rows that satisfy query conditions involving the indexed columns.

B-tree indexes are particularly effective for scenarios where you need to perform range-based searches, such as finding records within a specific date range, or retrieving data in ascending or descending order based on indexed columns.

Example of Creating a B-Tree Index:

```sql
CREATE INDEX idx_sales_date ON sales(sales_date);
```

In this example, an index named "idx_sales_date" is created on the "sales_date" column of the "sales" table. This B-tree index would improve the efficiency of queries involving date-based range searches.

B-tree indexes are a fundamental tool in database optimization, helping to significantly enhance the performance of various types of queries.
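A sketch of the range search described above, reusing the `idx_sales_date` index on a hypothetical `sales` table (the `sale_id` and `amount` columns and all rows are made up for illustration). Because the B-tree keeps `sales_date` sorted, the database can jump to the first date in the range and read forward until the range ends.

```sql
-- Hypothetical sales table, for illustration only
CREATE TABLE sales (
    sale_id    INT PRIMARY KEY,
    sales_date DATE,
    amount     INT
);
CREATE INDEX idx_sales_date ON sales (sales_date);
INSERT INTO sales VALUES
    (1, '2023-01-15', 100), (2, '2023-02-20', 200),
    (3, '2023-03-05', 150), (4, '2023-04-10', 300);

-- Range predicate on the indexed column: a range lookup on idx_sales_date is possible,
-- and the index order can also satisfy the ORDER BY
SELECT sale_id, sales_date, amount
FROM sales
WHERE sales_date BETWEEN '2023-02-01' AND '2023-03-31'
ORDER BY sales_date;

-- By contrast, wrapping the column in a function, e.g. WHERE YEAR(sales_date) = 2023,
-- generally prevents this kind of range lookup on idx_sales_date
```

With these rows the query returns sales 2 and 3.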

---

## MySQL String Functions
- __ASCII__	$\Longrightarrow$ Returns the ASCII value for the specific character

- __CHAR_LENGTH__	$\Longrightarrow$ Returns the length of a string (in characters)

- __CHARACTER_LENGTH__	$\Longrightarrow$ Returns the length of a string (in characters)

- __CONCAT__	$\Longrightarrow$ Concatenates two or more strings

- __CONCAT_WS__	$\Longrightarrow$ Concatenates two or more strings with a separator

- __FIELD__	$\Longrightarrow$ Returns the index position of a value in a list of values

- __FIND_IN_SET__	$\Longrightarrow$ Returns the position of a string within a comma-separated list of strings

- __FORMAT__	$\Longrightarrow$ Formats a number to a format like "#,###,###.##", rounded to a specified number of decimal places

- __INSERT__	$\Longrightarrow$ Inserts a string within a string at the specified position and for a certain number of characters

- __INSTR__	$\Longrightarrow$ Returns the position of the first occurrence of a string in another string

- __LCASE__	$\Longrightarrow$ Converts a string to lower-case

- __LEFT__	$\Longrightarrow$  Extracts a number of characters from a string (starting from left)

- __LENGTH__	$\Longrightarrow$ Returns the length of a string (in bytes)

- __LOCATE__	$\Longrightarrow$ Returns the position of the first occurrence of a substring in a string

- __LOWER__	$\Longrightarrow$ Converts a string to lower-case

- __LPAD__	$\Longrightarrow$ Left-pads a string with another string, to a certain length

- __LTRIM__	$\Longrightarrow$ Removes leading spaces from a string

- __MID__	$\Longrightarrow$ Extracts a substring from a string (starting at any position)

- __POSITION__	$\Longrightarrow$ Returns the position of the first occurrence of a substring in a string

- __REPEAT__	$\Longrightarrow$ Repeats a string as many times as specified

- __REPLACE__	$\Longrightarrow$ Replaces all occurrences of a substring within a string, with a new substring

- __REVERSE__	$\Longrightarrow$ Reverses a string and returns the result

- __RIGHT__	$\Longrightarrow$ Extracts a number of characters from a string (starting from right)

- __RPAD__	$\Longrightarrow$ Right-pads a string with another string, to a certain length

- __RTRIM__	$\Longrightarrow$ Removes trailing spaces from a string

- __SPACE__	$\Longrightarrow$ Returns a string of the specified number of space characters

- __STRCMP__	$\Longrightarrow$ Compares two strings (returns 0 if they are equal, -1 if the first is smaller, 1 otherwise)

- __SUBSTR__	$\Longrightarrow$ Extracts a substring from a string (starting at any position)

- __SUBSTRING__	$\Longrightarrow$ Extracts a substring from a string (starting at any position)

- __SUBSTRING_INDEX__	$\Longrightarrow$ Returns the part of a string before the given number of occurrences of a delimiter (counting from the right if the number is negative)

- __TRIM__	$\Longrightarrow$ Removes leading and trailing spaces from a string

- __UCASE__	$\Longrightarrow$ Converts a string to upper-case

- __UPPER__	$\Longrightarrow$ Converts a string to upper-case
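A few of the functions above applied to literal inputs, so the query runs without any table (MySQL 8.0). Note the reversed argument order of `LOCATE(substr, str)` versus `INSTR(str, substr)`.

```sql
SELECT CONCAT('My', 'SQL')                       AS concat_result,     -- 'MySQL'
       CONCAT_WS('-', '2023', '09', '07')         AS concat_ws_result,  -- '2023-09-07'
       SUBSTRING_INDEX('www.mysql.com', '.', 2)   AS before_2nd_dot,    -- 'www.mysql'
       SUBSTRING_INDEX('www.mysql.com', '.', -1)  AS after_last_dot,    -- 'com'
       LPAD('7', 3, '0')                          AS padded,            -- '007'
       REPLACE('a-b-c', '-', '+')                 AS replaced,          -- 'a+b+c'
       LOCATE('bar', 'foobar')                    AS locate_pos,        -- 4
       INSTR('foobar', 'bar')                     AS instr_pos,         -- 4
       FIND_IN_SET('b', 'a,b,c')                  AS set_pos,           -- 2
       STRCMP('apple', 'banana')                  AS strcmp_result;     -- -1
```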

## MySQL Numeric Functions


- __ABS__	$\Longrightarrow$ 	Returns the absolute value of a number
- __ACOS__	$\Longrightarrow$ 	Returns the arc cosine of a number
- __ASIN__	$\Longrightarrow$ 	Returns the arc sine of a number
- __ATAN__	$\Longrightarrow$ 	Returns the arc tangent of one or two numbers
- __ATAN2__	$\Longrightarrow$ 	Returns the arc tangent of two numbers
- __AVG__	$\Longrightarrow$ 	Returns the average value of an expression
- __CEIL__	$\Longrightarrow$ 	Returns the smallest integer value that is >= to a number
- __CEILING__	$\Longrightarrow$ 	Returns the smallest integer value that is >= to a number
- __COS__	$\Longrightarrow$ 	Returns the cosine of a number
- __COT__	$\Longrightarrow$ 	Returns the cotangent of a number
- __COUNT__	$\Longrightarrow$ 	Returns the number of records returned by a select query
- __DEGREES__	$\Longrightarrow$ 	Converts a value in radians to degrees
- __DIV__	$\Longrightarrow$ 	Used for integer division
- __EXP__	$\Longrightarrow$ 	Returns e raised to the power of a specified number
- __FLOOR__	$\Longrightarrow$   Returns the largest integer value that is <= to a number
- __GREATEST__	$\Longrightarrow$ 	Returns the greatest value of the list of arguments
- __LEAST__	$\Longrightarrow$ 	Returns the smallest value of the list of arguments
- __LN__	$\Longrightarrow$ 	Returns the natural logarithm of a number
- __LOG__	$\Longrightarrow$ 	Returns the natural logarithm of a number, or the logarithm of a number to a specified base
- __LOG10__	$\Longrightarrow$ 	Returns the base-10 logarithm of a number
- __LOG2__	$\Longrightarrow$ 	Returns the base-2 logarithm of a number
- __MAX__	$\Longrightarrow$ 	Returns the maximum value in a set of values
- __MIN__	$\Longrightarrow$ 	Returns the minimum value in a set of values
- __MOD__	$\Longrightarrow$ 	Returns the remainder of a number divided by another number
- __PI__	$\Longrightarrow$ 	Returns the value of PI
- __POW__	$\Longrightarrow$ 	Returns the value of a number raised to the power of another number
- __POWER__	$\Longrightarrow$ 	Returns the value of a number raised to the power of another number
- __RADIANS__	$\Longrightarrow$ 	Converts a degree value into radians
- __RAND__	$\Longrightarrow$ 	Returns a random number
- __ROUND__	$\Longrightarrow$ 	Rounds a number to a specified number of decimal places
- __SIGN__	$\Longrightarrow$ 	Returns the sign of a number
- __SIN__	$\Longrightarrow$ 	Returns the sine of a number
- __SQRT__	$\Longrightarrow$ 	Returns the square root of a number
- __SUM__	$\Longrightarrow$ 	Calculates the sum of a set of values
- __TAN__	$\Longrightarrow$ 	Returns the tangent of a number
- __TRUNCATE__	$\Longrightarrow$ 	Truncates a number to the specified number of decimal places
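A literal-only sketch (MySQL 8.0) contrasting some easily confused pairs: ROUND vs TRUNCATE, CEIL vs FLOOR on a negative number, and `DIV` vs `MOD`.

```sql
SELECT ROUND(2.567, 2)    AS rounded,       -- 2.57
       TRUNCATE(2.567, 2) AS truncated,     -- 2.56 (digits are cut, not rounded)
       CEIL(2.1)          AS ceil_result,   -- 3
       FLOOR(-2.1)        AS floor_neg,     -- -3 (rounds down, away from zero for negatives)
       7 DIV 2            AS int_div,       -- 3
       MOD(7, 2)          AS remainder,     -- 1
       POW(2, 10)         AS pow_result,    -- 1024
       GREATEST(4, 9, 2)  AS greatest_val;  -- 9
```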

## MySQL Advanced Functions

- __BIN__	$\Longrightarrow$ 	Returns a binary representation of a number
- __BINARY__	$\Longrightarrow$ 	Converts a value to a binary string
- __CASE__	$\Longrightarrow$ 	Goes through conditions and returns a value when the first condition is met
- __CAST__	$\Longrightarrow$ 	Converts a value (of any type) into a specified datatype
- __COALESCE__	$\Longrightarrow$ 	Returns the first non-null value in a list
- __CONNECTION_ID__	$\Longrightarrow$ 	Returns the unique connection ID for the current connection
- __CONV__	$\Longrightarrow$ 	Converts a number from one numeric base system to another
- __CONVERT__	$\Longrightarrow$ 	Converts a value into the specified datatype or character set
- __CURRENT_USER__	$\Longrightarrow$ 	Returns the user name and host name for the MySQL account that the server used to authenticate the current client
- __DATABASE__	$\Longrightarrow$ 	Returns the name of the current database
- __IF__	$\Longrightarrow$ 	Returns a value if a condition is TRUE, or another value if a condition is FALSE
- __IFNULL__	$\Longrightarrow$ 	Returns a specified value if the expression is NULL, otherwise returns the expression
- __ISNULL__	$\Longrightarrow$ 	Returns 1 or 0 depending on whether an expression is NULL
- __LAST_INSERT_ID__	$\Longrightarrow$ 	Returns the first AUTO_INCREMENT value generated by the most recent INSERT statement on the current connection
- __NULLIF__	$\Longrightarrow$ 	Compares two expressions and returns NULL if they are equal. Otherwise, the first expression is returned
- __SESSION_USER__	$\Longrightarrow$ 	Returns the current MySQL user name and host name
- __SYSTEM_USER__	$\Longrightarrow$ 	Returns the current MySQL user name and host name
- __USER__	$\Longrightarrow$ 	Returns the current MySQL user name and host name
- __VERSION__	$\Longrightarrow$ 	Returns the current version of the MySQL database
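A literal-only sketch (MySQL 8.0) of several of the functions above. `NULLIF(x, 0)` in a divisor is a common idiom: it turns a zero divisor into NULL, so the division yields NULL rather than a division-by-zero problem.

```sql
SELECT IF(5 > 3, 'yes', 'no')     AS if_result,      -- 'yes'
       IFNULL(NULL, 'fallback')   AS ifnull_result,  -- 'fallback'
       NULLIF(5, 5)               AS nullif_equal,   -- NULL (arguments are equal)
       NULLIF(5, 3)               AS nullif_diff,    -- 5
       10 / NULLIF(0, 0)          AS safe_division,  -- NULL
       BIN(10)                    AS bin_result,     -- '1010'
       CONV('ff', 16, 10)         AS conv_result,    -- '255'
       CAST('42' AS UNSIGNED)     AS cast_result;    -- 42
```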