Analytics Programming — D598  
QKN1 — QKN1 Task 2: Coding  
Student - John D. Pickering    
Programming Language - Python  
Other Information: pandas and NumPy  
Date: 9/12/2025

#### Table of Contents  
Task 2
- Step 1: Import the data file into a data frame.
- Step 2: Identify any duplicate rows in the data set.
- Step 3: Group all IDs by state, then run descriptive statistics (mean, median, min, and max) for all numeric variables by state and store the result as a new data frame.
- Step 4: Filter the data frame to identify all businesses with debt-to-equity ratios that are negative.
- Step 5: Create a new data frame that provides the debt-to-income ratio for every business in the data set. Debt-to-income ratio is defined as long-term debt divided by revenue.
- Step 6: Concatenate the debt-to-income ratio data frame you created with the original data frame.

### A: Create a program in GitLab using Python or R to perform the data analysis described in Task 1.

#### Import Dependencies

In [1]:
# -----------------------------
# Import Dependencies
# -----------------------------
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import zscore
import warnings

warnings.filterwarnings('ignore')

#### Import 'D598 Data Set.xlsx' Into Pandas Dataframe

In [2]:
# -----------------------------
# Step 1. Import the data file into a data frame.
# -----------------------------
df = pd.read_excel('D598 Data Set.xlsx')
print('Dataset has been imported')


Dataset has been imported


#### Identify Duplicate Rows

In [3]:
# -----------------------------
# Step 2. Identify any duplicate rows in the data set.
# Send duplicates to Excel for review. 
# -----------------------------
duplicates = df[df.duplicated()]

print("Duplicate Rows:")
if duplicates.empty:
    print("0 duplicate rows found.")
else:
    print(f"{len(duplicates)} duplicate rows found. Exporting to Excel for review...")
    # Export duplicate rows to an Excel file
    duplicates.to_excel("duplicates_review.xlsx", index=False)
    print("Duplicate rows exported to 'duplicates_review.xlsx'")

Duplicate Rows:
0 duplicate rows found.
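
With its default arguments, `df.duplicated()` flags only the second and later copies of a fully identical row. The sketch below uses hypothetical toy data (the values are invented for illustration and do not come from the data set) to show two variations that may help during review: `keep=False` returns every copy in a duplicate group, and `subset` restricts the comparison to a key column such as `Business ID`.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not from the D598 data set)
toy = pd.DataFrame({
    "Business ID": [1, 2, 2, 3, 3],
    "Total Revenue": [100, 200, 200, 300, 350],
})

# Default: only the later copies of fully identical rows are flagged
later_copies = toy[toy.duplicated()]

# keep=False: every member of a fully identical group is flagged
all_copies = toy[toy.duplicated(keep=False)]

# subset: rows that share a Business ID, even if other columns differ
same_id = toy[toy.duplicated(subset=["Business ID"], keep=False)]

print(len(later_copies), len(all_copies), len(same_id))
```

Checking the key column separately is a design choice: two rows with the same ID but different financial values would not count as duplicates under the default full-row comparison.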


#### Group All Business IDs by State and Run Descriptive Statistics

In [4]:
# -----------------------------
# Step 3. Group all IDs by state, then run descriptive statistics (mean, median, min, & max) 
# for all numeric variables by state and store this result as a new data frame
# -----------------------------
grouped_stats = df.groupby("Business State").agg({
    "Total Long-term Debt": ["mean", "median", "min", "max"],
    "Total Equity": ["mean", "median", "min", "max"],
    "Debt to Equity": ["mean", "median", "min", "max"],
    "Total Liabilities": ["mean", "median", "min", "max"],
    "Total Revenue": ["mean", "median", "min", "max"],
    "Profit Margin": ["mean", "median", "min", "max"],
})

# Reorder the column MultiIndex so statistics come first, then variables
grouped_stats = grouped_stats.swaplevel(axis=1).sort_index(axis=1)

# Style for readability
styled = (
    grouped_stats.style
    .format("{:,.2f}")                           # commas + 2 decimals
    .set_caption("Descriptive Statistics by State")  # table caption
    .highlight_min(color="lightcoral", axis=0)   # highlight min values
    .highlight_max(color="lightgreen", axis=0)   # highlight max values
    .background_gradient(cmap="Blues", axis=0)   # gradient shading
)

styled

,max,max,max,max,max,max,mean,mean,mean,mean,mean,mean,median,median,median,median,median,median,min,min,min,min,min,min
Business State,Debt to Equity,Profit Margin,Total Equity,Total Liabilities,Total Long-term Debt,Total Revenue,Debt to Equity,Profit Margin,Total Equity,Total Liabilities,Total Long-term Debt,Total Revenue,Debt to Equity,Profit Margin,Total Equity,Total Liabilities,Total Long-term Debt,Total Revenue,Debt to Equity,Profit Margin,Total Equity,Total Liabilities,Total Long-term Debt,Total Revenue
Alabama,0.47,0.51,2858019000.0,3764193000.0,1343464000.0,1256317000.0,0.35,0.4,1441038739.5,1893945500.0,674389000.0,696830858.0,0.35,0.4,1441038739.5,1893945500.0,674389000.0,696830858.0,0.22,0.3,24058479.0,23698000.0,5314000.0,137344716.0
Arizona,1.1,0.21,59153000.0,110938000.0,65088000.0,215580000.0,1.1,0.21,59153000.0,110938000.0,65088000.0,215580000.0,1.1,0.21,59153000.0,110938000.0,65088000.0,215580000.0,1.1,0.21,59153000.0,110938000.0,65088000.0,215580000.0
Arkansas,1.08,0.12,115946000.0,214408000.0,108843000.0,555005000.0,0.84,0.1,108242000.0,213882000.0,89604500.0,478909000.0,0.84,0.1,108242000.0,213882000.0,89604500.0,478909000.0,0.61,0.07,100538000.0,213356000.0,70366000.0,402813000.0
California,1.04,0.51,321175000.0,788811000.0,334997000.0,276869000.0,0.17,0.29,45359816.43,80181000.0,32739642.86,88695592.93,0.15,0.3,14271500.0,16821000.0,4721500.0,37531500.0,-1.37,-0.08,-15691000.0,2658000.0,15000.0,1100539.0
Colorado,1.06,0.66,933971000.0,582275000.0,375322000.0,696473000.0,0.62,0.35,200758906.0,145452750.0,84700250.0,248654209.62,0.87,0.31,60778500.0,65000500.0,29327000.0,202399000.0,0.05,0.12,12845248.0,6787000.0,3375000.0,7616000.0
Connecticut,4.21,0.56,4399091.0,33874000.0,18512000.0,70980512.0,4.21,0.56,4399091.0,33874000.0,18512000.0,70980512.0,4.21,0.56,4399091.0,33874000.0,18512000.0,70980512.0,4.21,0.56,4399091.0,33874000.0,18512000.0,70980512.0
Delaware,0.87,0.41,278773000.0,558749000.0,117592000.0,556236000.0,0.41,0.21,151442750.0,249186250.0,61563250.0,278948750.0,0.36,0.23,139309500.0,199115500.0,60165000.0,256689500.0,0.05,-0.02,48379000.0,39765000.0,8331000.0,46180000.0
Florida,0.96,0.51,192646000.0,94447000.0,45593000.0,34437774.0,0.32,-0.44,54754857.25,27098500.0,12303500.0,26644697.75,0.15,0.3,11887882.0,5500000.0,1809000.0,30178739.0,0.0,-2.85,2597665.0,2947000.0,3000.0,11783539.0
Hawaii,0.52,0.35,1173200000.0,1110400000.0,605500000.0,365200000.0,0.52,0.35,1173200000.0,1110400000.0,605500000.0,365200000.0,0.52,0.35,1173200000.0,1110400000.0,605500000.0,365200000.0,0.52,0.35,1173200000.0,1110400000.0,605500000.0,365200000.0
Idaho,1.01,0.44,426319000.0,376236000.0,4911000.0,2347485000.0,0.51,0.28,215582507.0,191699000.0,2463500.0,1177167075.0,0.51,0.28,215582507.0,191699000.0,2463500.0,1177167075.0,0.0,0.13,4846014.0,7162000.0,16000.0,6849150.0
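
Step 3 asks for statistics on all numeric variables, and the cell above lists each column by name. As a minimal sketch on hypothetical toy data (invented values, with column names borrowed from the data set), `select_dtypes` can pick the numeric columns automatically. Dropping `Business ID` is an assumption made here because it is an identifier rather than a measurement.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not from the D598 data set)
toy = pd.DataFrame({
    "Business ID": [1, 2, 3, 4],
    "Business State": ["Ohio", "Ohio", "Utah", "Utah"],
    "Total Revenue": [100.0, 300.0, 50.0, 150.0],
    "Profit Margin": [0.10, 0.30, 0.20, 0.40],
})

# Select numeric columns automatically, excluding the identifier column
numeric_cols = toy.select_dtypes(include="number").columns.drop("Business ID")

# One row per state, four statistics per numeric variable
stats = toy.groupby("Business State")[list(numeric_cols)].agg(
    ["mean", "median", "min", "max"]
)
print(stats)
```

This keeps the aggregation in step with the data if columns are added or renamed, at the cost of being less explicit than the hand-written dictionary.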


#### Filter Data Frame to Identify Negative Debt-to-Equity Ratios

In [5]:
# -----------------------------
# Step 4. Filter the data frame to identify all businesses with debt-to-equity ratios that are negative.
# -----------------------------
negative_de_ratio = df[df["Debt to Equity"] < 0]

print("\nBusinesses with Negative Debt-to-Equity Ratios:")
print(negative_de_ratio)


Businesses with Negative Debt-to-Equity Ratios:
     Business ID Business State  Total Long-term Debt  Total Equity  \
18     934562013           Ohio           263880000.0  -111297000.0   
57    8343652013     Washington            10603000.0   -13271658.0   
87    9323722013     California            21560000.0   -15691000.0   
109  10919832013           Utah             2010000.0    -3602481.0   
117  11245242013     California              556000.0    -2063203.0   
142  14535932013        Montana            16459000.0    -3842372.0   
143  14639722013       New York              187000.0   -13037879.0   

     Debt to Equity  Total Liabilities  Total Revenue  Profit Margin  
18        -2.370953        592174000.0      719783000       0.320697  
57        -0.798921         16625000.0        8949401       0.448119  
87        -1.374036         30048000.0       37782000       0.505955  
109       -0.557949          6302000.0       17757388       0.732562  
117       -0.269484        
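
Every business listed in the output above has negative `Total Equity`, and the displayed ratios are consistent with `Total Long-term Debt` divided by `Total Equity` (for example, 263,880,000 / -111,297,000 is about -2.37). The sketch below, on hypothetical toy data with invented values, shows one way to check that a negative ratio coincides with negative equity rather than negative debt.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not from the D598 data set)
toy = pd.DataFrame({
    "Business ID": [1, 2, 3],
    "Total Long-term Debt": [100.0, 50.0, 20.0],
    "Total Equity": [200.0, -25.0, -40.0],
})
# Assumption: the ratio is long-term debt divided by equity
toy["Debt to Equity"] = toy["Total Long-term Debt"] / toy["Total Equity"]

# Same boolean-mask filter as the cell above
negative = toy[toy["Debt to Equity"] < 0]

# True when every negative-ratio row also has negative equity
all_negative_equity = bool((negative["Total Equity"] < 0).all())

print(negative[["Business ID", "Debt to Equity"]])
print(all_negative_equity)
```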

#### Create a New Data Frame for Debt-to-Income Ratio

In [6]:
# -----------------------------
# Step 5. Create a new data frame that provides the debt-to-income ratio for every business in the data set.
# Debt-to-income ratio is defined as long-term debt divided by revenue.
# Created dataframe df_dti for Debt to Income Ratio. 
# -----------------------------
# -----------------------------------------------------
# Function: create_debt_to_income
# Creates a DataFrame with Business ID and Debt-to-Income ratio
# Handles division-by-zero by setting result to NaN
# -----------------------------------------------------
def create_debt_to_income(df):
    df_dti = pd.DataFrame()
    df_dti["Business ID"] = df["Business ID"]

    # Use np.where to handle division by zero
    df_dti["Debt-to-Income"] = np.where(
        df["Total Revenue"] == 0,                # Uses 0 as the condition
        np.nan,                                  # If true then update to NaN (Not a Number)
        df["Total Long-term Debt"] / df["Total Revenue"]  # If not 0 then do the math. 
    )

    return df_dti


# Example usage
df_dti = create_debt_to_income(df)

print("\nDebt-to-Income Ratio DataFrame:")
print(df_dti.head())



Debt-to-Income Ratio DataFrame:
   Business ID  Debt-to-Income
0     41872013        0.123500
1     76232013        0.182665
2    160992013        0.049974
3    197452013        0.264664
4    241042013        0.036268
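
`np.where` evaluates both of its value arguments before choosing between them, so the division in the function above still runs for rows where revenue is zero. The result is simply discarded for those rows. As an alternative sketch on hypothetical toy data (invented values), zero revenue can be replaced with `NaN` before dividing, so the division never sees a zero denominator.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only (not from the D598 data set)
toy = pd.DataFrame({
    "Business ID": [1, 2, 3],
    "Total Long-term Debt": [50.0, 10.0, 30.0],
    "Total Revenue": [200.0, 0.0, 120.0],
})

# Replace zero revenue with NaN first; NaN then propagates through the division
revenue = toy["Total Revenue"].replace(0, np.nan)

dti = toy[["Business ID"]].copy()
dti["Debt-to-Income"] = toy["Total Long-term Debt"] / revenue
print(dti)
```

Both approaches should give the same result. This version just makes the zero handling explicit in the denominator.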


#### Concatenate the Debt-to-Income Ratio to the Original Data Frame

In [8]:
# -----------------------------
# Step 6. Concatenate the debt-to-income ratio data frame you created with the original data frame.
# -----------------------------
df_final = pd.concat([df, df_dti["Debt-to-Income"]], axis=1)

print("\nFinal DataFrame with Debt-to-Income Ratio:")
print(df_final.head())


Final DataFrame with Debt-to-Income Ratio:
   Business ID Business State  Total Long-term Debt  Total Equity  \
0     41872013       Kentucky            16889000.0    18046000.0   
1     76232013           Iowa             6252000.0    18293621.0   
2    160992013          Texas            19200000.0   177858000.0   
3    197452013       Delaware           117592000.0   278773000.0   
4    241042013       Illinois             4408000.0    52064000.0   

   Debt to Equity  Total Liabilities  Total Revenue  Profit Margin  \
0        0.935886         25986000.0      136753000       0.023663   
1        0.341758         14474000.0       34226553       0.265015   
2        0.107951         72787000.0      384196000       0.130413   
3        0.421820        558749000.0      444306000       0.196768   
4        0.084665         19898000.0      121541000       0.168305   

   Debt-to-Income  
0        0.123500  
1        0.182665  
2        0.049974  
3        0.264664  
4        0.036268  
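
`pd.concat(..., axis=1)` aligns rows by index label. That works here because `df_dti` was built from `df` and shares its index. If either frame were sorted, filtered, or re-indexed in between, a key-based merge on `Business ID` might be a safer choice. The sketch below uses hypothetical toy data (invented values) to show the difference.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not from the D598 data set)
left = pd.DataFrame({
    "Business ID": [1, 2, 3],
    "Total Revenue": [200, 100, 120],
})
# The ratio frame arrives in a different row order with a fresh 0..2 index
right = pd.DataFrame({
    "Business ID": [3, 1, 2],
    "Debt-to-Income": [0.3, 0.1, 0.2],
})

# Index-based concat pairs rows by label, so the ratios land on the wrong IDs
by_position = pd.concat([left, right["Debt-to-Income"]], axis=1)

# Key-based merge pairs rows by Business ID; validate guards against repeated keys
by_key = left.merge(right, on="Business ID", how="left", validate="one_to_one")

print(by_position)
print(by_key)
```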


### B.  Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.

No sources were used in writing this program.