# Primary Key and Foreign Key in SQL 

### Primary Key

A primary key is a field (column), or a set of fields, that uniquely identifies each row (record) in a database table. Primary key values must be unique, and a primary key column cannot contain NULL values. A table can have only one primary key, which may consist of a single field or multiple fields. A primary key made up of multiple fields is called a composite key.
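The title frames these ideas in SQL, while the rest of this notebook uses Python. The sketch below therefore uses Python's built-in `sqlite3` module to show a single-field primary key and a composite key being enforced by a real database. It assumes SQLite; the tables `customers` and `order_lines` and their columns are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")

# Single-field primary key.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
# Composite key: only the *pair* (order_id, line_no) has to be unique.
# NOT NULL is written out because SQLite, unlike standard SQL, otherwise
# tolerates NULLs in most primary key columns.
conn.execute(
    "CREATE TABLE order_lines ("
    " order_id INTEGER NOT NULL,"
    " line_no INTEGER NOT NULL,"
    " product TEXT,"
    " PRIMARY KEY (order_id, line_no))"
)

def rejected(sql):
    """Return True if SQLite refuses `sql` with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
dup_single = rejected("INSERT INTO customers VALUES (1, 'Bob')")        # same customer_id

conn.execute("INSERT INTO order_lines VALUES (10, 1, 'pen')")
conn.execute("INSERT INTO order_lines VALUES (10, 2, 'ink')")           # new pair: accepted
dup_pair = rejected("INSERT INTO order_lines VALUES (10, 1, 'paper')")  # repeated pair
null_key = rejected("INSERT INTO order_lines VALUES (NULL, 3, 'clip')") # NULL in the key
print(dup_single, dup_pair, null_key)  # True True True
```

Note that `(10, 2)` is accepted even though `order_id` 10 already exists: in a composite key, uniqueness applies to the combination of fields, not to each field separately.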

### Foreign Key

A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables. It acts as a cross-reference between tables because it references the primary key of another table, thereby establishing a link between them. Each foreign key value must either match an existing key value in the referenced table or be NULL.
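A minimal sketch of a foreign key in SQLite, again via Python's `sqlite3` module. The table and column names are hypothetical. One SQLite-specific assumption matters here: SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers (customer_id),"
    " total_order INTEGER)"
)

conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (100, 1, 1000)")  # customer 1 exists: accepted

def violates(sql):
    """Return True if SQLite rejects `sql` with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

orphan_rejected = violates("INSERT INTO orders VALUES (101, 99, 500)")   # no customer 99
delete_blocked = violates("DELETE FROM customers WHERE customer_id = 1")  # still referenced
print(orphan_rejected, delete_blocked)  # True True
```

The foreign key works in both directions. A child row cannot point to a missing parent, and (with the default `ON DELETE` behaviour) a parent row cannot be deleted while child rows still reference it.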

### Why are primary keys and foreign keys used?

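 - Unique identification of rows in a table. Uniquely identifying rows is important because it enables:
    - Efficient row update and delete operations.
    - Faster queries, because most database systems automatically build an index on the primary key.
 - Preventing duplicate rows (rows with the same key value) from being entered into the database.
 - Enforcing referential integrity: a foreign key value must match an existing key in the referenced table, so rows cannot point to records that do not exist.
 - Linking related tables so that their data can be combined with joins.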
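The query-speed benefit comes from the index that databases typically maintain on the primary key. The sketch below asks SQLite for its query plan (`EXPLAIN QUERY PLAN`) for two lookups: one by the primary key and one by an unindexed column. It assumes SQLite via Python's `sqlite3` module, and the `customers` table is hypothetical. The exact wording of the plan text differs between SQLite versions, so the comments describe it only loosely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)

def plan(sql):
    """Return the 'detail' text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

by_key = plan("SELECT name FROM customers WHERE customer_id = 2")
by_email = plan("SELECT name FROM customers WHERE email = 'bob@example.com'")
print(by_key)    # a SEARCH that uses the INTEGER PRIMARY KEY
print(by_email)  # a full SCAN of the table, since email has no index
```

A SEARCH jumps straight to the matching row, while a SCAN reads every row. The gap grows with the size of the table.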
   

```python
import pandas as pd
import numpy as np
```

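Note:

pandas has no built-in way to declare primary key or foreign key constraints. In the examples below, a DataFrame's index plays the role of a primary key, and a regular column holding values of another DataFrame's index plays the role of a foreign key.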
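Because pandas does not enforce these rules, any checks have to be done by hand. The following is a minimal sketch, assuming the index stands in for the primary key. `set_primary_key` is a hypothetical helper, not a pandas function. It relies on `DataFrame.set_index(..., verify_integrity=True)`, which raises `ValueError` when the new index contains duplicates, and adds an explicit check for missing values.

```python
import pandas as pd

# Hypothetical data: customer_id 2 appears twice, violating the "primary key".
raw = pd.DataFrame({'customer_id': [1, 2, 2],
                    'name': ['Alice', 'Bob', 'Bobby']})

def set_primary_key(df, column):
    """Hypothetical helper: use `column` as the index, enforcing primary key rules."""
    if df[column].isna().any():
        raise ValueError(f"{column!r} contains missing values")
    # verify_integrity=True makes set_index raise ValueError on duplicate keys.
    return df.set_index(column, verify_integrity=True)

try:
    set_primary_key(raw, 'customer_id')
    rejected = False
except ValueError:
    rejected = True

# After removing the duplicate (keeping the first occurrence) the key is valid.
ok = set_primary_key(raw.drop_duplicates('customer_id'), 'customer_id')
print(rejected, ok.index.is_unique)  # True True
```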

```python
# Create a DataFrame for customers (with primary key)
customers_data = {'customer_id': [1, 2, 3],
                  'name': ['Alice', 'Bob', 'Charlie'],
                  'email': ['alice@example.com', 'bob@example.com', 'charlie@example.com']}
customers = pd.DataFrame(customers_data)

# Set 'customer_id' as the primary key (we will consider indexes as primary keys)
customers.set_index('customer_id', inplace=True)

display(customers)
```

| customer_id | name | email |
|---|---|---|
| 1 | Alice | alice@example.com |
| 2 | Bob | bob@example.com |
| 3 | Charlie | charlie@example.com |


Each `customer_id` is unique and identifies exactly one customer, so it gives access to that customer's details (name and email).
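Because the index is unique, looking up a key value with `.loc` returns exactly one customer's data. Here is a short sketch that rebuilds the same customer data so that it runs on its own:

```python
import pandas as pd

customers = pd.DataFrame({'customer_id': [1, 2, 3],
                          'name': ['Alice', 'Bob', 'Charlie'],
                          'email': ['alice@example.com', 'bob@example.com',
                                    'charlie@example.com']}).set_index('customer_id')

# A single key value identifies exactly one row ...
bob_email = customers.loc[2, 'email']
# ... and a list of key values selects several customers at once.
subset = customers.loc[[1, 3], 'name']
print(bob_email)        # bob@example.com
print(subset.tolist())  # ['Alice', 'Charlie']
```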

```python
# Create a DataFrame for orders (with foreign key)
order_date = ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-03']
total_order = [1000, 2000, 3000, 4000]
customer_id = [1, 2, 1, 3]
orders_data = {'order_date': order_date,
               'total_order': total_order,
               'customer_id': customer_id}
orders = pd.DataFrame(orders_data)

display(orders)
```

| | order_date | total_order | customer_id |
|---|---|---|---|
| 0 | 2020-01-01 | 1000 | 1 |
| 1 | 2020-01-01 | 2000 | 2 |
| 2 | 2020-01-02 | 3000 | 1 |
| 3 | 2020-01-03 | 4000 | 3 |


In the `orders` table, the `customer_id` column is the foreign key. It references `customer_id`, the index (primary key) of the `customers` table, and this shared key is what allows the two tables to be joined.
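Since pandas does not enforce the foreign key, nothing prevents an order from pointing at a customer that does not exist. The following is a minimal sketch of a manual check. It assumes a hypothetical fifth order for a nonexistent `customer_id` 4. `Series.isin` flags every foreign key value that has a matching primary key, and the remaining rows are the orphans.

```python
import pandas as pd

customers = pd.DataFrame({'customer_id': [1, 2, 3],
                          'name': ['Alice', 'Bob', 'Charlie']}).set_index('customer_id')
# The last order (customer 4) is hypothetical: no such customer exists.
orders = pd.DataFrame({'order_date': ['2020-01-01', '2020-01-01', '2020-01-02',
                                      '2020-01-03', '2020-01-04'],
                       'total_order': [1000, 2000, 3000, 4000, 5000],
                       'customer_id': [1, 2, 1, 3, 4]})

# Emulated foreign key check: every orders.customer_id must exist in customers.index.
valid = orders['customer_id'].isin(customers.index)
orphans = orders.loc[~valid]
print(orphans['customer_id'].tolist())  # [4]
```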

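```python
# Join orders with customers on the shared key.
# `on` can name a column in one DataFrame and an index level in the other.
merged_data = pd.merge(orders, customers, on='customer_id')

display(merged_data)
```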

| | order_date | total_order | customer_id | name | email |
|---|---|---|---|---|---|
| 0 | 2020-01-01 | 1000 | 1 | Alice | alice@example.com |
| 1 | 2020-01-02 | 3000 | 1 | Alice | alice@example.com |
| 2 | 2020-01-01 | 2000 | 2 | Bob | bob@example.com |
| 3 | 2020-01-03 | 4000 | 3 | Charlie | charlie@example.com |
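`pd.merge` performs an inner join by default, so an order whose `customer_id` has no matching customer would silently disappear from the result. The sketch below again uses a hypothetical order for customer 4. A left join with `indicator=True` keeps such orders and marks them as `'left_only'` in a `_merge` column. Passing `validate='many_to_one'` also makes pandas raise `pandas.errors.MergeError` if the right-hand key is not unique, that is, if it could not serve as a primary key.

```python
import pandas as pd

customers = pd.DataFrame({'customer_id': [1, 2, 3],
                          'name': ['Alice', 'Bob', 'Charlie']}).set_index('customer_id')
# The order for customer 4 is hypothetical: no such customer exists.
orders = pd.DataFrame({'total_order': [1000, 2000, 3000, 4000, 5000],
                       'customer_id': [1, 2, 1, 3, 4]})

merged = pd.merge(orders, customers, on='customer_id', how='left',
                  validate='many_to_one',  # right-hand key must be unique
                  indicator=True)          # adds a '_merge' column
unmatched = merged.loc[merged['_merge'] == 'left_only', 'customer_id']
print(len(merged), unmatched.tolist())  # 5 [4]
```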
