SPRINT_8_2 - THE FOLLOWING SCRIPTS ARE THE ONES USED IN POWER BI TO REPLICATE THE VISUALIZATIONS OF SPRINT_8_1

Script used to connect to MySQL and load the database into Power BI.

In [None]:
from sqlalchemy import create_engine
import pandas as pd

# Defined up front so the finally block can reference it even if create_engine fails
engine = None

try:
    # Connection string format: dialect+driver://user:password@host:port/database
    engine = create_engine("mysql+pymysql://root:0000@localhost:3306/empresa")

    # Load every table of the database into its own DataFrame
    df_transaction = pd.read_sql("SELECT * FROM transaction;", engine)
    df_product = pd.read_sql("SELECT * FROM product;", engine)
    df_user = pd.read_sql("SELECT * FROM user;", engine)
    df_credit_card = pd.read_sql("SELECT * FROM credit_card;", engine)
    df_card_status = pd.read_sql("SELECT * FROM card_status;", engine)
    df_company = pd.read_sql("SELECT * FROM company;", engine)
    df_transaction_product = pd.read_sql("SELECT * FROM transaction_product;", engine)

    # Just to get data into Power BI, send back one table or merge/join others if needed
    dataset = df_transaction

except Exception as e:
    # Return the error message as a one-row table instead of failing silently
    dataset = pd.DataFrame({'error': [str(e)]})

finally:
    # Release the connection pool only if the engine was actually created
    if engine is not None:
        engine.dispose()
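
The comment in the script above mentions merging or joining tables before handing them to Power BI. A minimal sketch of such a join is shown here on small stand-in DataFrames. The key column name company_id and the sample values are assumptions made for illustration; they are not taken from the real schema.

```python
import pandas as pd

# Toy stand-ins for two of the tables loaded above.
# The column names (company_id, country) and values are illustrative assumptions.
df_transaction = pd.DataFrame({
    "id": ["t1", "t2", "t3"],
    "company_id": ["c1", "c2", "c1"],
    "amount": [100.0, 250.5, 80.0],
})
df_company = pd.DataFrame({
    "id": ["c1", "c2"],
    "country": ["Germany", "Spain"],
})

# Rename the company key so it does not clash with the transaction id,
# then left-join so every transaction is kept even without a matching company
dataset = df_transaction.merge(
    df_company.rename(columns={"id": "company_id"}),
    on="company_id",
    how="left",
)

print(dataset)
```

A left join is used here so that no transaction disappears from the result; an inner join would be the alternative if only transactions with a known company should be kept.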

LEVEL 1

Exercise 1 - A numeric variable.

By default, Power BI removes duplicate rows from the data it passes to a Python visual. If only "amount" were selected, transactions with the same amount would collapse into a single row and the histogram would be wrong. To prevent this, the visual also includes a column with unique values, so that no two rows are identical. In this case I chose the "id" column from the "transaction" table. Columns used: df_transaction(id) and df_transaction(amount)
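
The effect of the duplicate removal can be reproduced in plain pandas. The sketch below assumes that Power BI's behavior is equivalent to calling drop_duplicates() on the selected columns; the sample data is made up for illustration.

```python
import pandas as pd

# Made-up sample: three transactions share the same amount
amounts = pd.DataFrame({
    "id": ["t1", "t2", "t3", "t4"],
    "amount": [50.0, 50.0, 50.0, 120.0],
})

# Selecting only "amount": identical rows collapse into one
only_amount = amounts[["amount"]].drop_duplicates()

# Selecting "id" and "amount": every row is unique, so nothing is lost
with_id = amounts[["id", "amount"]].drop_duplicates()

print(len(only_amount))  # 2
print(len(with_id))      # 4
```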

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Remove rows with missing values in relevant columns
dataset = dataset.dropna(subset=['id', 'amount'])

# Set Seaborn style
sns.set_theme(style='whitegrid')

plt.figure(figsize=(8, 5))
sns.histplot(data=dataset, x='amount', bins=30, kde=True)

plt.title('Distribution of Transaction Amounts')
plt.xlabel('Amount')
plt.ylabel('Frequency')
plt.grid(True)
plt.tight_layout()
plt.show()

Exercise 2 - Two numeric variables

The columns selected in Power BI for this visual are: df_product(price), df_transaction(amount), df_transaction(id) 

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set Seaborn style before creating the figure
sns.set_theme(style='whitegrid')

plt.figure(figsize=(8, 5))
sns.scatterplot(data=dataset, x='price', y='amount')
plt.title('Transaction Amount vs Product Price')
plt.xlabel('Price')
plt.ylabel('Amount')
plt.tight_layout()
plt.show()

Exercise 3 - One categorical variable.

The columns selected in Power BI for this visual are: df_company(country) and df_company(id)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Count number of companies per country
country_counts = dataset.groupby('country').size().sort_values(ascending=False)

# Plot
country_counts.plot(kind='bar')
plt.title('Number of Companies by Country')
plt.xlabel('Country')
plt.ylabel('Count')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

Exercise 4 - One categorical variable and one numeric variable.

The columns selected in Power BI for this visual are: df_company(country) and df_transaction(amount)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Drop missing data
dataset = dataset.dropna(subset=['country', 'amount'])

# Group and calculate average
avg_amount = dataset.groupby('country')['amount'].mean().sort_values(ascending=False)

avg_amount.plot(kind='bar', title='Average Transaction Amount per Country')
plt.ylabel('Average Amount')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

Exercise 5 - Two categorical variables.

The columns selected in Power BI for this visual are: df_company(country), df_transaction(id) and df_transaction(declined)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Replace numeric with categorical status
dataset['status'] = dataset['declined'].apply(lambda x: 'Declined' if x == 1 else 'Accepted')

# Group and pivot
df_country_status = dataset.groupby(['country', 'status']).size().unstack(fill_value=0)

# Plot
ax = df_country_status.plot(kind='bar', color=['blue', 'red'])

plt.title('Accepted and Declined Transactions by Country')
plt.xlabel('Country')
plt.ylabel('Number of Transactions')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Status', loc='upper left')
plt.tight_layout()
plt.show()
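
To make the "group and pivot" step of the script above easier to follow, here is a small sketch of what groupby(...).size().unstack(fill_value=0) produces. The data is invented for illustration, and map() with a dictionary is used as one possible alternative to the lambda for labelling the status.

```python
import pandas as pd

# Invented sample: Spain has no declined transactions
toy = pd.DataFrame({
    "country": ["Germany", "Germany", "Spain", "Spain", "Spain"],
    "declined": [0, 1, 0, 0, 0],
})

# Turn the 0/1 flag into a readable label
toy["status"] = toy["declined"].map({0: "Accepted", 1: "Declined"})

# Count rows per (country, status), then move status into columns.
# fill_value=0 fills combinations that never occur, such as Spain/Declined.
counts = toy.groupby(["country", "status"]).size().unstack(fill_value=0)

print(counts)
```

The result has one row per country and one column per status, which is the shape the bar plot needs to draw one bar group per country.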

Exercise 6 - Three variables.

The columns selected in Power BI for this visual are: df_product(product_name), df_transaction(amount), df_transaction_product(product_id) and df_transaction_product(transaction_id)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Drop rows with missing data
dataset = dataset.dropna(subset=['product_id', 'transaction_id', 'product_name', 'amount'])

# Group by product name and sum the amount
product_sales = dataset.groupby('product_name')['amount'].sum().reset_index()

# Sort and get top 10
top_10 = product_sales.sort_values(by='amount', ascending=False).head(10)

top_10.set_index('product_name')['amount'].plot.pie(
    autopct='%1.1f%%',
    figsize=(8, 8),
    ylabel=''
)

plt.title("Top 10 Products by Total Amount Sold")
plt.tight_layout()
plt.show()
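
The ranking step in the script above (sum per product, sort, keep the top rows) can also be written with Series.nlargest. This is only a sketch of an alternative on made-up data, not the version used in Power BI.

```python
import pandas as pd

# Made-up sample of product sales
toy = pd.DataFrame({
    "product_name": ["A", "B", "A", "C", "B", "A"],
    "amount": [10.0, 5.0, 20.0, 1.0, 7.0, 3.0],
})

# Total amount per product
totals = toy.groupby("product_name")["amount"].sum()

# Keep the two products with the highest totals, already sorted descending
top_2 = totals.nlargest(2)

print(top_2)
```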

Exercise 7 - One pairplot.

The columns selected in Power BI for this visual are: df_product(price) and df_transaction(amount)

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df_clean = dataset.dropna(subset=['price', 'amount'])

sns.pairplot(df_clean, vars=['price', 'amount'], diag_kind='kde')
plt.tight_layout()
plt.show()

LEVEL 2

Exercise 1 - Correlation of all the numeric variables.

The columns selected in Power BI for this visual are: df_product(price), df_product(weight), df_transaction(amount), df_transaction(lat) and df_transaction(longitude)

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

dataset = dataset.dropna(subset=['amount', 'lat', 'longitude', 'price', 'weight'])

# Select the numeric columns
numeric_cols = dataset[['amount', 'lat', 'longitude', 'price', 'weight']]

# Compute the correlation matrix
corr = numeric_cols.corr()

# Plot heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Correlation Between Numeric Variables')
plt.tight_layout()
plt.show()
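
As a reading aid for the heatmap, the sketch below shows what DataFrame.corr() returns on a tiny invented dataset. By default it computes the Pearson correlation, so a perfectly increasing linear relation gives 1 and a perfectly decreasing one gives -1.

```python
import pandas as pd

# Invented sample: amount rises with price, weight falls with price
toy = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0],
    "amount": [2.0, 4.0, 6.0, 8.0],
    "weight": [4.0, 3.0, 2.0, 1.0],
})

# Pairwise Pearson correlation between all numeric columns
corr = toy.corr()

print(corr.round(2))
```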