# Assignment #7 - Data Gathering and Warehousing - DSSA-5102

Instructor: Melissa Laurino<br>
Spring 2025<br>

Name: Thinh Le
<br>
Date: April 05, 2025
<br>
<br>
**At this time in the semester:** <br>
- We have explored a dataset. <br>
- We have cleaned our dataset. <br>
- We created a GitHub account with a repository for this class and included a metadata README file about our data. <br>
- We introduced general SQL syntax, queries, and applications in Python.<br>
- We created our own databases from scratch using MySQL Workbench and Python (SQLAlchemy and MySQL Connector), both on our local MySQL server and as local files on our machine.
<br>

Now we will create and populate **all** tables for our dataset in our database and finalize our EER diagram.<br>

We created a database three different ways in our previous assignment: one database on our local MySQL server, one test database stored locally that integrates with MySQL, and one test database stored only locally as a .db file on your machine. Now we will create all tables and populate them with the data from your dataset. (Feel free to practice with all methods, but the first method is encouraged because it allows you to create your schema diagram.) After populating your database, create a visual database schema diagram in MySQL Workbench. <br>
<br>
Be sure to comment all code. Include a .png image of your database schema from MySQL Workbench in your Blackboard submission or GitHub repository.

## Load libraries

In [17]:
# Load necessary packages:
# For database server connection
import mysql.connector
from sqlalchemy.orm import (
    DeclarativeBase,
    Mapped,
    mapped_column,
    relationship,
    Session,
)
from sqlalchemy import create_engine, ForeignKey
# Data types in MySQL
# Ref: https://docs.sqlalchemy.org/en/20/dialects/mysql.html#mysql-data-types
from sqlalchemy.dialects.mysql import (
    BIGINT,
    MEDIUMINT,
    SMALLINT,
    INTEGER,
    FLOAT,
    DATETIME,
    TEXT,
    MEDIUMTEXT,
    NVARCHAR,
    VARCHAR,
    DATE
)
# For data manipulation
import pandas as pd

## Connect to MySQL server

In [21]:
# Connect to the MySQL server
host = '127.0.0.1'
port = 3306
user = 'root'
password = '123456'

mysql_connection = mysql.connector.connect(
    # Server address
    host=host,
    # Port number
    port=port,
    # Username
    user=user,
    # Password
    password=password)
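
The cell above hard-codes the password in the notebook. One possible alternative, sketched here, reads the connection settings from environment variables instead. The variable names `MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_USER`, and `MYSQL_PASSWORD` and the helper `load_mysql_settings` are assumptions made for illustration; they are not part of the assignment.

```python
import os


def load_mysql_settings(env=None):
    """Build connection settings from environment variables.

    The variable names below are hypothetical; rename them to match your setup.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("MYSQL_HOST", "127.0.0.1"),
        "port": int(env.get("MYSQL_PORT", "3306")),
        "user": env.get("MYSQL_USER", "root"),
        # No default for the password: a missing variable raises KeyError.
        "password": env["MYSQL_PASSWORD"],
    }


# A plain dict stands in for os.environ so the sketch runs anywhere.
settings = load_mysql_settings({"MYSQL_PASSWORD": "example-only"})
print(settings["host"], settings["port"], settings["user"])
```

The resulting dictionary could then be passed as `mysql.connector.connect(**settings)`, since its keys match the keyword arguments used in the cell above.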

## Create database

In [22]:
# Create a cursor - an object that can execute operations such as SQL statements
cursor = mysql_connection.cursor()

database_name = "thinh_db"
cursor.execute(f"CREATE DATABASE IF NOT EXISTS {database_name}")
# thinh_db will be the name of the database once it is created.

cursor.close()
mysql_connection.close()

print("Database created successfully in MySQL Workbench!")

Database created successfully in MySQL Workbench!
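
The `CREATE DATABASE` statement above interpolates `database_name` with an f-string, because identifiers cannot be passed as query parameters. If the name ever came from user input, it might be worth validating it first. The following is a minimal sketch under that assumption; the helper `create_database_sql` and the conservative character pattern are illustrative choices, not part of the assignment.

```python
import re

# Conservative pattern: letters, digits, and underscores, at most 64 characters
# (64 is MySQL's documented length limit for database names).
_SAFE_NAME = re.compile(r"[A-Za-z0-9_]{1,64}")


def create_database_sql(name):
    """Return a CREATE DATABASE statement, rejecting unexpected names."""
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"Unsafe database name: {name!r}")
    # Backticks quote the identifier in MySQL.
    return f"CREATE DATABASE IF NOT EXISTS `{name}`"


print(create_database_sql("thinh_db"))
```

The returned string could be passed to `cursor.execute(...)` in place of the f-string used above.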


## Read dataset

In [23]:
df = pd.read_csv('laptop_prices.csv')

View dataset info

In [24]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11768 entries, 0 to 11767
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   brand               11768 non-null  object 
 1   processor           11768 non-null  object 
 2   ram_gb              11768 non-null  float64
 3   storage             11768 non-null  object 
 4   gpu                 11768 non-null  object 
 5   screen_size_inch    11768 non-null  float64
 6   resolution          11768 non-null  object 
 7   battery_life_hours  11768 non-null  float64
 8   weight_kg           11768 non-null  float64
 9   operating_system    11768 non-null  object 
 10  price               11768 non-null  float64
dtypes: float64(5), object(6)
memory usage: 1011.4+ KB


## Create models

Define base model:

In [25]:
class Base(DeclarativeBase):
  pass

Define laptop price model:

In [26]:
class LaptopPrices(Base):
  # Table name
  __tablename__ = 'laptop_prices'

  # Columns
  id: Mapped[int] = mapped_column(SMALLINT(unsigned=True),
                                  primary_key=True,
                                  autoincrement=True)
  brand: Mapped[str] = mapped_column(NVARCHAR(255))
  processor: Mapped[str] = mapped_column(NVARCHAR(255))
  ram_gb: Mapped[float] = mapped_column(FLOAT)
  storage: Mapped[str] = mapped_column(NVARCHAR(255))
  gpu: Mapped[str] = mapped_column(NVARCHAR(255))
  screen_size_inch: Mapped[float] = mapped_column(FLOAT)
  resolution: Mapped[str] = mapped_column(NVARCHAR(255))
  battery_life_hours: Mapped[float] = mapped_column(FLOAT)
  weight_kg: Mapped[float] = mapped_column(FLOAT)
  operating_system: Mapped[str] = mapped_column(NVARCHAR(255))
  price: Mapped[float] = mapped_column(FLOAT)
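
The `id` column uses an unsigned `SMALLINT`, a 2-byte integer whose largest value is 65535. A quick sanity check, sketched below, estimates how many full loads of the 11768-row dataset fit in that range. It assumes ids are assigned contiguously starting at 1, which MySQL does not strictly guarantee for bulk inserts; the helper name `full_loads_that_fit` is hypothetical.

```python
# Largest value an unsigned 2-byte integer (MySQL SMALLINT UNSIGNED) can hold.
SMALLINT_UNSIGNED_MAX = 2**16 - 1  # 65535

ROWS_IN_DATASET = 11768  # row count reported by df.info()


def full_loads_that_fit(rows_per_load, max_id=SMALLINT_UNSIGNED_MAX):
    """Number of complete loads whose auto-increment ids stay within max_id."""
    return max_id // rows_per_load


print(full_loads_that_fit(ROWS_IN_DATASET))
```

If the table were expected to grow well beyond the current dataset, a wider type such as `INTEGER(unsigned=True)` might be a safer choice for the primary key.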

## Create engine

In [27]:
database_url = f"mysql+mysqlconnector://{user}:{password}@{host}:{port}/{database_name}"

# Create an engine that connects to the MySQL server
engine = create_engine(database_url)

print("Engine created successfully!")

Engine created successfully!
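
The database URL above works because the password contains no special characters. A password containing characters such as `@` or `/` would need to be percent-encoded before being placed in the URL. A minimal sketch using the standard library's `quote_plus` follows; the helper `build_mysql_url` and the sample password are made up for illustration.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, port, database):
    """Assemble a mysql+mysqlconnector URL with the credentials percent-encoded."""
    return (
        f"mysql+mysqlconnector://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# "p@ss/word" is a made-up password containing characters that need escaping.
print(build_mysql_url("root", "p@ss/word", "127.0.0.1", 3306, "thinh_db"))
```

The returned string could be passed to `create_engine(...)` in the same way as `database_url` above.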


## Create table in database

In [28]:
# Create the tables if they do not already exist.
Base.metadata.create_all(engine)

## Insert data to table

In [29]:
df.to_sql(LaptopPrices.__tablename__,
          con=engine,
          # Append rows to the existing table (re-running this cell adds the data again)
          if_exists="append",
          index=False,
          # Insert 1000 records at a time
          chunksize=1000,
          # Pass multiple values in a single INSERT clause.
          method="multi")

print(f'Successfully inserted all data into {LaptopPrices.__tablename__}')

Successfully inserted all data into laptop_prices


Result:

![Data insert result](data-insert-result.png)
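
Because `if_exists="append"` adds rows to an existing table, re-running the insert cell would load the dataset a second time. One way to guard against that is to check the row count before inserting. The sketch below illustrates the idea with Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server; the sample rows and the helper `load_once` are made up for illustration.

```python
import sqlite3

rows = [("Dell", 999.0), ("HP", 799.0), ("Apple", 1299.0)]  # made-up sample rows

# In-memory SQLite database standing in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE laptop_prices ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, brand TEXT, price REAL)"
)


def load_once(connection, data):
    """Insert data only when the table is still empty; return rows inserted."""
    (existing,) = connection.execute(
        "SELECT COUNT(*) FROM laptop_prices"
    ).fetchone()
    if existing:
        return 0  # already loaded, skip to avoid duplicates
    connection.executemany(
        "INSERT INTO laptop_prices (brand, price) VALUES (?, ?)", data
    )
    connection.commit()
    return len(data)


first = load_once(conn, rows)
second = load_once(conn, rows)  # simulates re-running the insert cell
(total,) = conn.execute("SELECT COUNT(*) FROM laptop_prices").fetchone()
print(first, second, total)
conn.close()
```

With the MySQL engine from this notebook, the same check could be done by running a `SELECT COUNT(*)` query against `laptop_prices` before calling `df.to_sql`.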

## Get database schema

![Table diagram](database_diagram.png)

In [30]:
# Dispose of the engine, closing its pooled database connections
engine.dispose()

**MySQL Workbench**<br>
To export your database schema as a .PNG:<br>
->Go to your EER Diagram<br>
->File<br>
->Export<br>
->Export as .PNG