# Python Phase - Reseller

This notebook is the skeleton of our Construct Week project. It holds the main part of the analysis and then connects to a MySQL database to load the cleaned files, which in turn feed the final dashboard.

For this phase, we have a total of 7 datasets, all of them unclean: the values are misaligned and clustered together. To address this, each dataset has first been rearranged so that the data sits in its proper columns.

We shall now begin the basic EDA (Exploratory Data Analysis) and make sure each dataset is cleaned and ready to be used in creating the database and then the dashboard.

In [76]:
# Importing all the essential libraries for the analysis to be done.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Importing libraries to connect and inject all the values from the dataset into the database server.
from sqlalchemy import create_engine, text
import mysql.connector

In [77]:
# Creating a connector so that the server can be connected here.
db_connector = mysql.connector.connect(
    host = "127.0.0.1",       
    username = "root",
    password = "MySQL12345",
    database = "patternseekers"
)

# A custom message that is displayed once the connection call has returned without raising an error.
print("You have successfully connected to your database.")

You have successfully connected to your database.


In [78]:
# Creating a SQLAlchemy engine, which pandas uses to write the records made here into the database.
# The URL is written as a plain string; nesting double quotes inside an f-string only works on Python 3.12+.
engine = create_engine("mysql+mysqlconnector://root:MySQL12345@127.0.0.1/patternseekers")
print("The connection to the MySQL Engine is now functional.")

The connection to the MySQL Engine is now functional.
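The credentials above are written directly into the notebook. As a minimal sketch of one alternative, the connection URL could be assembled from environment variables, with the user name and password percent-encoded so that characters such as `@` or `/` do not break the URL. The helper `build_mysql_url` and the variable names `PS_DB_USER` and `PS_DB_PASSWORD` are assumptions made for illustration and are not part of the original project.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, database):
    """Return a SQLAlchemy-style URL for the mysql+mysqlconnector driver."""
    # quote_plus escapes characters that would otherwise be read as URL syntax.
    return (
        f"mysql+mysqlconnector://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}"
    )


# Hypothetical environment variable names; the fallbacks are placeholders.
db_url = build_mysql_url(
    user=os.environ.get("PS_DB_USER", "root"),
    password=os.environ.get("PS_DB_PASSWORD", "change-me"),
    host="127.0.0.1",
    database="patternseekers",
)
```

The resulting string can be passed to `create_engine(db_url)` in place of the literal URL.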


In [79]:
# Importing the dataset from its directory path.
reseller_df = pd.read_csv('Resellers [FIXED].csv')

# Displaying the dataset.
reseller_df

Unnamed: 0,ResellerKey,Business Type,Reseller,City,State-Province,Country-Region
0,277,Specialty Bike Shop,The Bicycle Accessories Company,Alhambra,California,United States
1,455,Value Added Reseller,Timely Shipping Service,Alpine,California,United States
2,609,Value Added Reseller,Good Toys,Auburn,California,United States
3,492,Specialty Bike Shop,Basic Sports Equipment,Baldwin Park,California,United States
4,365,Specialty Bike Shop,Distinctive Store,Barstow,California,United States
...,...,...,...,...,...,...
696,340,Value Added Reseller,Nearby Cycle Shop,West Sussex,England,United Kingdom
697,106,Value Added Reseller,West Side Mart,Wokingham,England,United Kingdom
698,448,Warehouse,Action Bicycle Specialists,Woolston,England,United Kingdom
699,358,Value Added Reseller,Mail Market,York,England,United Kingdom


In [80]:
# Displaying basic information about the dataset (row count, columns, non-null counts and data types).
reseller_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 701 entries, 0 to 700
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   ResellerKey     701 non-null    int64 
 1   Business Type   701 non-null    object
 2   Reseller        701 non-null    object
 3   City            701 non-null    object
 4   State-Province  701 non-null    object
 5   Country-Region  701 non-null    object
dtypes: int64(1), object(5)
memory usage: 33.0+ KB


In [81]:
# Checking for NULL values in each column of the dataset.
reseller_df.isnull().sum()

ResellerKey       0
Business Type     0
Reseller          0
City              0
State-Province    0
Country-Region    0
dtype: int64

In [82]:
# Searching for duplicate rows in the dataset (if any exist).
reseller_df.duplicated().sum()

np.int64(0)
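Called without arguments, `duplicated()` compares entire rows, so a table with a unique `ResellerKey` will always report 0 here. The summary statistics further down show 699 unique reseller names across 701 rows, which suggests some names repeat under different keys. A small sketch of how such repeats could be surfaced with the `subset` parameter, using a made-up three-row frame (`toy_df` and its values are illustrative only):

```python
import pandas as pd

# Illustrative data: two rows share a reseller name but have different keys.
toy_df = pd.DataFrame({
    "ResellerKey": [1, 2, 3],
    "Reseller": ["Friendly Bike Shop", "Good Toys", "Friendly Bike Shop"],
    "City": ["Toronto", "Auburn", "York"],
})

# Whole-row comparison: no duplicates, because every key differs.
full_row_dupes = int(toy_df.duplicated().sum())

# Comparison on the name only; keep=False marks every member of a repeated group.
name_dupes = toy_df[toy_df.duplicated(subset=["Reseller"], keep=False)]
```

Whether repeated names are genuine duplicates or distinct branches of the same business is a judgment that needs the source data, so the sketch only lists them.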

In [83]:
# Identifying the data types to see what we will be dealing with.
reseller_df.dtypes

ResellerKey        int64
Business Type     object
Reseller          object
City              object
State-Province    object
Country-Region    object
dtype: object

In [84]:
# Displaying summary statistics for every column of the dataset.
reseller_df.describe(include='all')

Unnamed: 0,ResellerKey,Business Type,Reseller,City,State-Province,Country-Region
count,701.0,701,701,701,701,701
unique,,3,699,451,65,6
top,,Value Added Reseller,Friendly Bike Shop,Toronto,California,United States
freq,,238,2,24,78,427
mean,351.0,,,,,
std,202.505555,,,,,
min,1.0,,,,,
25%,176.0,,,,,
50%,351.0,,,,,
75%,526.0,,,,,


In [85]:
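# Converting the 'ResellerKey' column to category type, since it is an identifier rather than a numeric quantity.
# Note: every key is unique here, so this conversion does not actually save memory.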
reseller_df['ResellerKey'] = reseller_df['ResellerKey'].astype('category')

In [86]:
# Converting 'Business Type', 'City', 'State-Province', 'Country-Region' to title case for consistency.
reseller_df['Business Type'] = reseller_df['Business Type'].str.strip().str.title()
reseller_df['City'] = reseller_df['City'].str.strip().str.title()
reseller_df['State-Province'] = reseller_df['State-Province'].str.strip().str.title()
reseller_df['Country-Region'] = reseller_df['Country-Region'].str.strip().str.title()
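The four near-identical lines above could also be written as a loop over the column names. This is a sketch on a small made-up frame (`toy_df` is illustrative), and it also shows one caveat of `str.title()`: it lowercases everything after the first letter of each word, so acronyms lose their capitals.

```python
import pandas as pd

# Illustrative data with stray whitespace and inconsistent casing.
toy_df = pd.DataFrame({
    "Business Type": ["  value added reseller ", "WAREHOUSE"],
    "Country-Region": ["united states", " United Kingdom"],
})

text_columns = ["Business Type", "Country-Region"]
for column in text_columns:
    # Trim surrounding whitespace, then capitalise the first letter of each word.
    toy_df[column] = toy_df[column].str.strip().str.title()

# Caveat: acronyms are not preserved by title casing.
acronym = pd.Series(["USA"]).str.title().iloc[0]
```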

In [87]:
# Displaying the first 5 rows to verify that the changes have taken place.
reseller_df.head()

Unnamed: 0,ResellerKey,Business Type,Reseller,City,State-Province,Country-Region
0,277,Specialty Bike Shop,The Bicycle Accessories Company,Alhambra,California,United States
1,455,Value Added Reseller,Timely Shipping Service,Alpine,California,United States
2,609,Value Added Reseller,Good Toys,Auburn,California,United States
3,492,Specialty Bike Shop,Basic Sports Equipment,Baldwin Park,California,United States
4,365,Specialty Bike Shop,Distinctive Store,Barstow,California,United States


In [93]:
# Renaming the columns before pushing the data into the database, to avoid errors while querying in MySQL.
reseller_df.columns = reseller_df.columns.str.replace(' ', '_')
reseller_df.columns = reseller_df.columns.str.replace('-', '_')

# Pushing all the data into the MySQL database.
reseller_df.to_sql(
    name = 'Resellers',
    con=engine,
    index = False,
    if_exists = 'append'
)

# Custom message to ensure the operation has been completed successfully.
print("Table 'Resellers' has been created and data has been inserted successfully.")

Table 'Resellers' has been created and data has been inserted successfully.
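With `if_exists = 'append'`, running the cell a second time inserts the same rows again. The sketch below illustrates that behaviour and the `'replace'` alternative. It uses an in-memory SQLite database as a stand-in for the MySQL server, which is an assumption made so the example runs without a server, and it also shows the two column renames collapsed into a single regular expression. `toy_df` and `row_count` are illustrative names.

```python
import sqlite3

import pandas as pd

toy_df = pd.DataFrame({
    "ResellerKey": [277, 455],
    "Business Type": ["Specialty Bike Shop", "Value Added Reseller"],
    "State-Province": ["California", "California"],
})

# Replace spaces and hyphens with underscores in one pass.
toy_df.columns = toy_df.columns.str.replace(r"[ -]", "_", regex=True)

# In-memory SQLite stands in for the MySQL engine used in the notebook.
con = sqlite3.connect(":memory:")


def row_count(connection):
    """Return the number of rows currently stored in the Resellers table."""
    return connection.execute("SELECT COUNT(*) FROM Resellers").fetchone()[0]


# Appending twice stores every row twice.
toy_df.to_sql("Resellers", con, index=False, if_exists="append")
toy_df.to_sql("Resellers", con, index=False, if_exists="append")
rows_after_append = row_count(con)

# 'replace' drops the table and recreates it, so a re-run stays idempotent.
toy_df.to_sql("Resellers", con, index=False, if_exists="replace")
rows_after_replace = row_count(con)

con.close()
```

Note that `'replace'` also discards any keys or constraints defined on the existing table, so `'append'` into a table that was emptied first may suit a schema managed on the MySQL side.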


