## Problem 1 : Data Modelling 

Imagine you are designing a database for an e-commerce platform. The database should store information about products, customers, orders, and order items. Each order can contain multiple order items, and each order item is associated with a specific product. Each customer can have multiple orders. Customer details such as shipping address, contact number etc. can change over time. We want to retain the historical information as well in our schema.

1. Design a **star-schema / snowflake schema model** for the above requirements
    1. Use an entity-relationship diagram (ERD) that represents the relationships between these entities
    2. Include the necessary attributes and primary/foreign key relationships. Briefly explain your design choices.
2. Generate and insert sample data in the above model. Include the process and code of generating random data in your submission. You data should have:
    1. At least 2 years of order history
    2. At least 10 products; at least 2 products with variants.
    3. At least 10 customers

### Part 1: Designing 


First we list down all the entities that we have to include 

Entities :

**products** - All the products in our platform

**Customers** - Platform's customers 

**Orders** - The orders that a customer places 

**OrderItems** - The product that a particular order contains 

Conditions : 

Each order can have multiple order items and each order item is associated with a specific product

Each customer can have multiple orders. Customer details such as shipping address, contact number etc. can change over time

## Relationship between Entities : 

**Customer - Orders** : One Customer can have multiple orders but one order can be of one and a particular customer 
[One - Many]

**Orders-Items** : One Order can have multiple orders as well as a particular order can be present in multiple orders 
[Many - One]

**Item-Products**: A particular item is associated with a specific product .
[One - One]

**Customer-Address** - A customer can have multiple addresses but one exact address can be of that one particular customer 
[One-Many]

**Customer-Contacts** - A customer can have multiple contacts but one exact contact no can be of that one particular customer
[One-Many]



Hence we draw the Entity Relationship Diagram as Follows : 

![Image Description](https://storage.googleapis.com/reunion-task/All-files/ER_diagram-1.png)



And the Star Schema will look like : 
![SchemDiagram](https://storage.googleapis.com/reunion-task/All-files/SnowFlakeSchema.jpg)





### Part 2: Generating Data 


In [1]:
pip install mysql-connector-python

Note: you may need to restart the kernel to use updated packages.


In [2]:
import mysql.connector  
import datetime
import random
#Creating the connection object   
cnx = mysql.connector.connect(
    host='34.71.225.65',
    user='root'
)
  
#creating the cursor object  
cur = cnx.cursor()  
  
try:  
    dbs = cur.execute("show databases")  
except:  
    myconn.rollback()  
for x in cur:  
    print(x)  

# Enable autocommit mode
cnx.autocommit = True


('ecommercedb',)
('information_schema',)
('mysql',)
('performance_schema',)
('sys',)


In [None]:
cur.execute("""create database ecommercedb""")

In [3]:
cur.execute("""use ecommercedb""")

In [190]:
# Creating all the tables with the necessary foreign key and primary key constraints according to our Schema diagram :

cur.execute("""CREATE TABLE Products (
product_id int PRIMARY KEY NOT NULL,
product_name VARCHAR(255),
product_price VARCHAR(255),
product_category VARCHAR(255)



)""")



cur.execute("""CREATE TABLE Customers(
cust_id int PRIMARY KEY NOT NULL,
cust_name VARCHAR(255),
cust_email VARCHAR(255));
""")

cur.execute("""CREATE TABLE Orders(
order_id INT PRIMARY KEY NOT NULL,
cust_id INT,
foreign key(cust_id) REFERENCES Customers(cust_id));""")

cur.execute("""CREATE TABLE Contacts(
contact_id INT PRIMARY KEY NOT NULL,
cust_id Int,
FOREIGN key(cust_id) references Customers(cust_id));""")

cur.execute("""CREATE TABLE Address(
address int PRIMARY KEY NOT NULL,
cust_id Int,
FOREIGN key(cust_id) references Customers(cust_id));
""")

cur.execute("""CREATE TABLE Order_item(
order_item_id INT PRIMARY KEY NOT NULL,
order_id INT ,
product_id INT ,
order_quantity INT ,
FOREIGN KEY (order_id) references Orders(order_id),
FOREIGN KEY (product_id) references Products(product_id));
""")

In [207]:
cur.execute("show tables;")

In [208]:
cur.fetchall()

[('Address',),
 ('Contacts',),
 ('Customers',),
 ('Order_item',),
 ('Orders',),
 ('Products',)]

In [197]:
cur.execute("""ALTER TABLE Orders ADD order_date Datetime ;
""")

In [198]:
# Changing the name of the column due to a mistake and adding a new column named city 

cur.execute("""
ALTER TABLE Address CHANGE address address_id INT;
ALTER TABLE Address ADD city VARCHAR(255);""")

In [19]:
# Generating Random Data : 
# let us generate customers : 

customers = []
for i in range(1,11):
    customer_id = i
    customer_name = "Customer" + str(i)
    customer_email = customer_name + "@example.com"
    customers.append((customer_id,customer_name, customer_email))


customers

[(1, 'Customer1', 'Customer1@example.com'),
 (2, 'Customer2', 'Customer2@example.com'),
 (3, 'Customer3', 'Customer3@example.com'),
 (4, 'Customer4', 'Customer4@example.com'),
 (5, 'Customer5', 'Customer5@example.com'),
 (6, 'Customer6', 'Customer6@example.com'),
 (7, 'Customer7', 'Customer7@example.com'),
 (8, 'Customer8', 'Customer8@example.com'),
 (9, 'Customer9', 'Customer9@example.com'),
 (10, 'Customer10', 'Customer10@example.com')]

In [15]:
# Generating Random Products with 3 categories
products = []
for i in range(1,11):
    product_id = i
    product_name = "Product" + str(random.randint(1,11))
    product_price = str(random.randint(10, 1000))
    product_category = random.choice(["Electronics", "Apparel", "Home Goods"])
    products.append((product_id,product_name, product_price, product_category))
    
products

[(1, 'Product9', '206', 'Apparel'),
 (2, 'Product2', '558', 'Home Goods'),
 (3, 'Product8', '701', 'Electronics'),
 (4, 'Product1', '450', 'Apparel'),
 (5, 'Product1', '495', 'Apparel'),
 (6, 'Product11', '411', 'Electronics'),
 (7, 'Product5', '552', 'Electronics'),
 (8, 'Product2', '138', 'Home Goods'),
 (9, 'Product3', '541', 'Apparel'),
 (10, 'Product4', '942', 'Apparel')]

In [41]:
orders = []
for i in range(1,101):
    order_id = i
    cust_id = random.randint(1,9)
    order_date = str(datetime.datetime.now() - datetime.timedelta(days=random.randint(1,365)))
    order_date = order_date[:10]
    orders.append((order_id, cust_id, order_date))

In [225]:
orders

[(1, 2, '2022-09-16'),
 (2, 8, '2022-10-31'),
 (3, 8, '2022-11-04'),
 (4, 8, '2023-04-05'),
 (5, 4, '2022-07-21'),
 (6, 1, '2022-08-16'),
 (7, 10, '2023-03-07'),
 (8, 1, '2022-10-30'),
 (9, 3, '2022-12-08'),
 (10, 4, '2022-10-13'),
 (11, 9, '2023-04-04'),
 (12, 8, '2022-11-05'),
 (13, 9, '2022-07-26'),
 (14, 6, '2022-08-15'),
 (15, 5, '2022-10-11'),
 (16, 7, '2023-01-01'),
 (17, 8, '2022-08-27'),
 (18, 8, '2023-02-11'),
 (19, 8, '2022-07-17'),
 (20, 2, '2023-06-03'),
 (21, 8, '2023-06-02'),
 (22, 8, '2023-06-24'),
 (23, 8, '2023-01-27'),
 (24, 1, '2022-10-25'),
 (25, 4, '2023-04-20'),
 (26, 5, '2022-09-21'),
 (27, 3, '2023-04-16'),
 (28, 9, '2023-05-04'),
 (29, 8, '2023-06-15'),
 (30, 9, '2023-02-11'),
 (31, 7, '2022-09-25'),
 (32, 2, '2022-09-22'),
 (33, 10, '2022-10-30'),
 (34, 1, '2023-03-28'),
 (35, 5, '2022-12-14'),
 (36, 1, '2022-12-26'),
 (37, 1, '2022-07-21'),
 (38, 10, '2022-10-01'),
 (39, 3, '2022-12-19'),
 (40, 8, '2022-11-09'),
 (41, 7, '2023-02-21'),
 (42, 9, '2023-05-15')

In [30]:
contacts = []

for i in range(1,11):
    contact_id = i
    cust_id = random.randint(1,9)
    contact_no = random.randint(100000,500000)
    contacts.append((contact_id,cust_id,contact_no))
    
contacts

[(1, 7, 390728),
 (2, 1, 275902),
 (3, 6, 267036),
 (4, 2, 342702),
 (5, 9, 452523),
 (6, 3, 240069),
 (7, 2, 202327),
 (8, 3, 238306),
 (9, 3, 260018),
 (10, 3, 104306)]

In [39]:
addresses = []
for i in range(1,7):
    address_id = i
    cust_id = cust_id = random.randint(1,9)
    city = "CITY"+str(i)
    addresses.append((address_id,cust_id,city))
    
addresses

[(1, 4, 'CITY1'),
 (2, 2, 'CITY2'),
 (3, 2, 'CITY3'),
 (4, 5, 'CITY4'),
 (5, 2, 'CITY5'),
 (6, 1, 'CITY6')]

In [12]:
order_items = []

order_items_id = []

for i in range(1,50):
    order_items_id.append(i)
    
order_id = []
for i in range(1,50):
    order_id.append(random.randint(11,100))
    
product_id = []

for i in range(1,50):
    product_id.append(random.randint(1,10))
    
order_quantity = []
    
for i in range(1,50):
    order_quantity.append(random.randint(1,15)) 

    

In [13]:
for i in range(0,len(order_items_id)):
    order_items.append((order_items_id[i],order_id[i],product_id[i],order_quantity[i]))
    


In [159]:
order_items

[(1, 36, 6, 1),
 (2, 82, 3, 8),
 (3, 94, 7, 4),
 (4, 98, 9, 12),
 (5, 50, 4, 10),
 (6, 100, 7, 11),
 (7, 61, 10, 11),
 (8, 38, 9, 14),
 (9, 89, 3, 10),
 (10, 96, 1, 12),
 (11, 59, 4, 14),
 (12, 77, 1, 8),
 (13, 11, 6, 6),
 (14, 91, 1, 12),
 (15, 67, 6, 3),
 (16, 76, 8, 7),
 (17, 31, 4, 2),
 (18, 79, 9, 11),
 (19, 47, 10, 2),
 (20, 62, 8, 2),
 (21, 79, 4, 5),
 (22, 43, 3, 5),
 (23, 57, 6, 14),
 (24, 93, 9, 1),
 (25, 47, 7, 8),
 (26, 40, 3, 12),
 (27, 80, 9, 7),
 (28, 23, 2, 13),
 (29, 39, 1, 13),
 (30, 59, 5, 10),
 (31, 32, 10, 14),
 (32, 82, 6, 12),
 (33, 24, 1, 7),
 (34, 47, 4, 1),
 (35, 64, 1, 6),
 (36, 36, 5, 4),
 (37, 93, 5, 12),
 (38, 38, 4, 13),
 (39, 49, 3, 3),
 (40, 80, 10, 7),
 (41, 29, 7, 3),
 (42, 53, 2, 6),
 (43, 92, 5, 1),
 (44, 68, 7, 2),
 (45, 30, 4, 8),
 (46, 11, 6, 1),
 (47, 23, 9, 7),
 (48, 16, 5, 1),
 (49, 40, 4, 10)]

In [20]:
# Close and reopen the cursor
for customer in customers:
    cur.execute("INSERT INTO Customers (cust_id, cust_name, cust_email) VALUES (%s, %s, %s)", customer)

cur.fetchall()  # Fetch and consume the results of the previous query


[]

In [21]:
cur.execute("SELECT * FROM Customers")
cur.fetchall()

[(1, 'Customer1', 'Customer1@example.com'),
 (2, 'Customer2', 'Customer2@example.com'),
 (3, 'Customer3', 'Customer3@example.com'),
 (4, 'Customer4', 'Customer4@example.com'),
 (5, 'Customer5', 'Customer5@example.com'),
 (6, 'Customer6', 'Customer6@example.com'),
 (7, 'Customer7', 'Customer7@example.com'),
 (8, 'Customer8', 'Customer8@example.com'),
 (9, 'Customer9', 'Customer9@example.com'),
 (10, 'Customer10', 'Customer10@example.com')]

In [23]:
# Insert data into Products table
for product in products:
    cur.execute("INSERT INTO Products (product_id, product_name, product_price, product_category) VALUES (%s, %s, %s, %s)", product)

cur.fetchall()  # Fetch and consume the results of the previous query

[]

In [24]:
cur.execute("SELECT * FROM Products")
cur.fetchall()

[(1, 'Product9', '206', 'Apparel'),
 (2, 'Product2', '558', 'Home Goods'),
 (3, 'Product8', '701', 'Electronics'),
 (4, 'Product1', '450', 'Apparel'),
 (5, 'Product1', '495', 'Apparel'),
 (6, 'Product11', '411', 'Electronics'),
 (7, 'Product5', '552', 'Electronics'),
 (8, 'Product2', '138', 'Home Goods'),
 (9, 'Product3', '541', 'Apparel'),
 (10, 'Product4', '942', 'Apparel')]

In [9]:
orders

[(1, 6, '2022-10-11'),
 (2, 2, '2022-08-30'),
 (3, 3, '2023-01-07'),
 (4, 5, '2022-12-08'),
 (5, 4, '2023-04-14'),
 (6, 8, '2023-03-17'),
 (7, 3, '2023-01-15'),
 (8, 9, '2022-10-11'),
 (9, 10, '2022-11-29'),
 (10, 1, '2022-07-17'),
 (11, 6, '2022-12-13'),
 (12, 5, '2023-03-27'),
 (13, 7, '2023-06-01'),
 (14, 8, '2023-02-07'),
 (15, 2, '2022-09-30'),
 (16, 8, '2022-07-18'),
 (17, 4, '2022-09-15'),
 (18, 10, '2022-08-08'),
 (19, 2, '2023-05-18'),
 (20, 2, '2023-02-28'),
 (21, 10, '2023-02-11'),
 (22, 6, '2022-08-22'),
 (23, 9, '2022-07-13'),
 (24, 5, '2023-01-05'),
 (25, 5, '2023-04-09'),
 (26, 7, '2022-12-01'),
 (27, 9, '2022-11-09'),
 (28, 10, '2023-01-02'),
 (29, 10, '2022-10-20'),
 (30, 8, '2023-07-08'),
 (31, 10, '2022-12-04'),
 (32, 9, '2023-06-07'),
 (33, 4, '2023-05-29'),
 (34, 10, '2022-09-20'),
 (35, 3, '2022-12-11'),
 (36, 9, '2023-05-30'),
 (37, 2, '2023-03-24'),
 (38, 5, '2023-01-31'),
 (39, 3, '2022-07-23'),
 (40, 1, '2022-07-22'),
 (41, 7, '2022-07-18'),
 (42, 3, '2023-06-

In [25]:
contacts

[(556058, 8),
 (673544, 10),
 (835786, 7),
 (358407, 3),
 (883157, 2),
 (605788, 10),
 (128568, 5),
 (546285, 8),
 (754539, 4),
 (660995, 1)]

In [None]:
for contact in contacts:
    cur.execute("INSERT INTO Contacts (contact_id, cust_id,contact_no) VALUES (%s, %s,%s)", contact)


In [None]:
for address in addresses:
    cur.execute("INSERT INTO Address (address_id, cust_id,city) VALUES (%s, %s,%s)", address)


In [None]:
for order in orders:
    cur.execute("INSERT INTO Orders (order_id, cust_id, order_date) VALUES (%s, %s, %s)", order)


In [46]:
for i in order_items:
    cur.execute("INSERT INTO Order_item (order_item_id, order_id, product_id, order_quantity) VALUES (%s, %s, %s, %s)", i)


In [49]:
# Select all rows from the Customers table
cur.execute("SELECT * FROM Customers")
customers_data = cur.fetchall()
customers_columns = [desc[0] for desc in cur.description]

# Select all rows from the Products table
cur.execute("SELECT * FROM Products")
products_data = cur.fetchall()
products_columns = [desc[0] for desc in cur.description]

# Select all rows from the Orders table
cur.execute("SELECT * FROM Orders")
orders_data = cur.fetchall()
orders_columns = [desc[0] for desc in cur.description]

# Select all rows from the Contacts table
cur.execute("SELECT * FROM Contacts")
contacts_data = cur.fetchall()
contacts_columns = [desc[0] for desc in cur.description]

# Select all rows from the Order_item table
cur.execute("SELECT * FROM Order_item")
order_items_data = cur.fetchall()
order_items_columns = [desc[0] for desc in cur.description]

# Print the data from each table with column headers
print("Customers Data:")
print(customers_columns)
for row in customers_data:
    print(row)

print("\nProducts Data:")
print(products_columns)
for row in products_data:
    print(row)

print("\nOrders Data:")
print(orders_columns)
for row in orders_data:
    print(row)

print("\nContacts Data:")
print(contacts_columns)
for row in contacts_data:
    print(row)

print("\nOrder_item Data:")
print(order_items_columns)
for row in order_items_data:
    print(row)


Customers Data:
['cust_id', 'cust_name', 'cust_email']
(1, 'Customer1', 'Customer1@example.com')
(2, 'Customer2', 'Customer2@example.com')
(3, 'Customer3', 'Customer3@example.com')
(4, 'Customer4', 'Customer4@example.com')
(5, 'Customer5', 'Customer5@example.com')
(6, 'Customer6', 'Customer6@example.com')
(7, 'Customer7', 'Customer7@example.com')
(8, 'Customer8', 'Customer8@example.com')
(9, 'Customer9', 'Customer9@example.com')
(10, 'Customer10', 'Customer10@example.com')

Products Data:
['product_id', 'product_name', 'product_price', 'product_category']
(1, 'Product9', '206', 'Apparel')
(2, 'Product2', '558', 'Home Goods')
(3, 'Product8', '701', 'Electronics')
(4, 'Product1', '450', 'Apparel')
(5, 'Product1', '495', 'Apparel')
(6, 'Product11', '411', 'Electronics')
(7, 'Product5', '552', 'Electronics')
(8, 'Product2', '138', 'Home Goods')
(9, 'Product3', '541', 'Apparel')
(10, 'Product4', '942', 'Apparel')

Orders Data:
['order_id', 'cust_id', 'order_date']
(1, 3, datetime.datetime(2