# 🧩 SQL Query Analysis Summary
This notebook contains five SQL queries used to extract key business insights from the cement production dataset.
Each query is designed to support different aspects of factory performance analysis and Power BI dashboard visualization.

In [29]:
import pandas as pd
import sqlite3

# Load the cleaned monthly cement dataset.
df = pd.read_csv("/content/cleaned_cement_data.csv")

# Preview the first rows to confirm the columns loaded as expected.
print(df.head())

        month  production  sales  demand  population       gdp  disbusment  \
0  2010-01-01       347.0  322.0   346.0       122.4  182277.0    60314.00   
1  2010-02-01       306.0  285.0   338.0       122.5  181018.0    61213.92   
2  2010-03-01       236.0  245.0   276.0       122.6  179759.0    62113.83   
3  2010-04-01       234.0  212.0   245.0       122.8  178500.0    63013.75   
4  2010-05-01       296.0  289.0   312.0       122.9  177354.0    63913.67   

   interestrate  efficiency  fulfillment  production_gap  
0         10.25    0.927954     0.930636            25.0  
1         10.33    0.931373     0.843195            21.0  
2         10.42    1.038136     0.887681            -9.0  
3         10.50    0.905983     0.865306            22.0  
4         10.58    0.976351     0.926282             7.0  


In [30]:
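# Create (or open) a local SQLite database file.
conn = sqlite3.connect("cement_factory.db")
# Write the DataFrame to the cement_sales table, replacing any earlier copy.
df.to_sql("cement_sales", conn, index=False, if_exists="replace")

print("✅ Database created & data loaded successfully!")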

✅ Database created & data loaded successfully!


# 1. KPI Efficiency `kpi_efficiency.csv`

**Purpose:**  
Calculate the monthly average of production efficiency and demand fulfillment rates.

**Formulas:**  
- `Efficiency = Sales ÷ Production`  
- `Fulfillment = Sales ÷ Demand`  
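
As a quick sanity check, the two formulas can be recomputed from the raw columns and compared with the stored `efficiency` and `fulfillment` values. The sketch below hard-codes the first three rows from the `df.head()` preview; the `rows` list and the `TOLERANCE` constant are illustrative names, not part of the notebook.

```python
# Rows copied from the df.head() preview:
# (production, sales, demand, efficiency, fulfillment)
rows = [
    (347.0, 322.0, 346.0, 0.927954, 0.930636),
    (306.0, 285.0, 338.0, 0.931373, 0.843195),
    (236.0, 245.0, 276.0, 1.038136, 0.887681),
]

TOLERANCE = 1e-6  # the preview prints six decimals, so allow for rounding

checks = []
for production, sales, demand, efficiency, fulfillment in rows:
    # Efficiency = Sales / Production, Fulfillment = Sales / Demand
    efficiency_ok = abs(sales / production - efficiency) < TOLERANCE
    fulfillment_ok = abs(sales / demand - fulfillment) < TOLERANCE
    checks.append(efficiency_ok and fulfillment_ok)

print(all(checks))
```

Note that a ratio above 1 (the third row) means sales exceeded that month's production.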

**Insight:**  
Shows how much of each month's output is sold and how much of market demand is met.  
- Low Efficiency → sales lag production, so unsold stock accumulates  
- Low Fulfillment → sales fall short of demand, which may point to a supply or distribution constraint  

In [42]:
# kpi_efficiency.sql

query = """
SELECT
    strftime('%Y-%m', month) AS period,
    ROUND(AVG(efficiency)*100, 2) AS avg_efficiency,
    ROUND(AVG(fulfillment)*100, 2) AS avg_fulfillment
FROM cement_sales
GROUP BY strftime('%Y-%m', month)
ORDER BY period;

"""

result = pd.read_sql_query(query, conn)
display(result.head())

result.to_csv("/content/kpi_efficiency.csv", index=False, encoding="utf-8-sig")
print("✅ Saved: kpi_efficiency.csv")

| | period | avg_efficiency | avg_fulfillment |
|---|---|---|---|
| 0 | 2010-01 | 92.8 | 93.06 |
| 1 | 2010-02 | 93.14 | 84.32 |
| 2 | 2010-03 | 103.81 | 88.77 |
| 3 | 2010-04 | 90.6 | 86.53 |
| 4 | 2010-05 | 97.64 | 92.63 |


✅ Saved: kpi_efficiency.csv


# 2. Yearly Summary `yearly_summary.csv`

**Purpose:**  
Summarize annual totals for production and sales, along with average efficiency and fulfillment.  

**Insight:**  
Helps compare performance across years and evaluate long-term production trends.  
It highlights which years the factory performed best and which years need improvement.
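
One detail worth keeping in mind when comparing years: `AVG(efficiency)` is the mean of the monthly ratios, which generally differs from total sales divided by total production. The sketch below illustrates the difference on an in-memory SQLite table holding only the five rows from the `df.head()` preview; the table name `sample` and the aliases `mean_of_ratios` and `ratio_of_totals` are assumptions made for this example.

```python
import sqlite3

# First five months from the df.head() preview: (month, production, sales).
rows = [
    ("2010-01-01", 347.0, 322.0),
    ("2010-02-01", 306.0, 285.0),
    ("2010-03-01", 236.0, 245.0),
    ("2010-04-01", 234.0, 212.0),
    ("2010-05-01", 296.0, 289.0),
]

conn = sqlite3.connect(":memory:")  # throwaway database for the illustration
conn.execute("CREATE TABLE sample (month TEXT, production REAL, sales REAL)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)

mean_of_ratios, ratio_of_totals = conn.execute(
    """
    SELECT
        AVG(sales / production) * 100      AS mean_of_ratios,  -- what AVG(efficiency) measures
        SUM(sales) / SUM(production) * 100 AS ratio_of_totals  -- volume-weighted alternative
    FROM sample
    """
).fetchone()
conn.close()

print(round(mean_of_ratios, 2), round(ratio_of_totals, 2))
```

Which figure to report is a design choice: the ratio of totals weights high-volume months more heavily, while the mean of ratios treats every month equally.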

In [43]:
# yearly_summary.sql

query = """
SELECT
    CAST(strftime('%Y', month) AS INTEGER) AS year,
    ROUND(SUM(production), 2) AS total_production,
    ROUND(SUM(sales), 2) AS total_sales,
    ROUND(AVG(efficiency)*100, 2) AS avg_efficiency,
    ROUND(AVG(fulfillment)*100, 2) AS avg_fulfillment
FROM cement_sales
GROUP BY CAST(strftime('%Y', month) AS INTEGER)
ORDER BY year;

"""

result = pd.read_sql_query(query, conn)
display(result.head())

result.to_csv("/content/yearly_summary.csv", index=False, encoding="utf-8-sig")
print("✅ Saved: yearly_summary.csv")

| | year | total_production | total_sales | avg_efficiency | avg_fulfillment |
|---|---|---|---|---|---|
| 0 | 2010 | 3182.0 | 3019.0 | 95.74 | 92.35 |
| 1 | 2011 | 4013.0 | 3780.0 | 94.76 | 95.86 |
| 2 | 2012 | 4621.0 | 4525.0 | 98.09 | 97.32 |
| 3 | 2013 | 5272.0 | 5132.0 | 96.66 | 99.36 |
| 4 | 2014 | 5327.0 | 5197.0 | 97.8 | 100.15 |


✅ Saved: yearly_summary.csv


# 3. GDP Correlation `gdp_correlation.csv`

**Purpose:**  
Compute the Pearson correlation coefficient between cement sales and two economic factors: GDP and interest rate.

**Insight:**  
- **Positive correlation** between GDP and sales (0.584 here) → sales tend to grow with economic expansion  
- **Negative correlation** between interest rate and sales (-0.331 here) → higher interest rates tend to coincide with weaker sales, consistent with slower housing demand
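
The sum-based Pearson formula is easy to mistype in SQL, so it can help to cross-check it against a mean-centred calculation in plain Python. The sketch below does this for GDP and sales on the five rows from the `df.head()` preview only, so its result will not match the full-dataset figure. The in-memory table `sample` is an assumption for the example, and `SQRT` is registered explicitly because some SQLite builds ship without the built-in math functions.

```python
import math
import sqlite3

# (gdp, sales) pairs from the df.head() preview.
pairs = [
    (182277.0, 322.0),
    (181018.0, 285.0),
    (179759.0, 245.0),
    (178500.0, 212.0),
    (177354.0, 289.0),
]

conn = sqlite3.connect(":memory:")
# Register SQRT so the query also runs on builds without SQLite's math functions.
conn.create_function("SQRT", 1, math.sqrt)
conn.execute("CREATE TABLE sample (gdp REAL, sales REAL)")
conn.executemany("INSERT INTO sample VALUES (?, ?)", pairs)

(r_sql,) = conn.execute(
    """
    SELECT (COUNT(*) * SUM(gdp * sales) - SUM(gdp) * SUM(sales)) /
           SQRT((COUNT(*) * SUM(gdp * gdp) - SUM(gdp) * SUM(gdp)) *
                (COUNT(*) * SUM(sales * sales) - SUM(sales) * SUM(sales)))
    FROM sample
    """
).fetchone()
conn.close()

# Reference value: Pearson r from mean-centred sums.
n = len(pairs)
mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
var_y = sum((y - mean_y) ** 2 for _, y in pairs)
r_ref = cov / math.sqrt(var_x * var_y)

print(round(r_sql, 3), round(r_ref, 3))
```

If the two printed values agree, the SQL expression is a faithful transcription of the Pearson formula.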

In [45]:
# gdp_correlation.sql

query = """
SELECT
    ROUND(
        (COUNT(*) * SUM(gdp * sales) - SUM(gdp) * SUM(sales)) /
        SQRT((COUNT(*) * SUM(gdp * gdp) - SUM(gdp) * SUM(gdp)) *
             (COUNT(*) * SUM(sales * sales) - SUM(sales) * SUM(sales))),
        3
    ) AS corr_gdp_sales,
    ROUND(
        (COUNT(*) * SUM(interestrate * sales) - SUM(interestrate) * SUM(sales)) /
        SQRT((COUNT(*) * SUM(interestrate * interestrate) - SUM(interestrate) * SUM(interestrate)) *
             (COUNT(*) * SUM(sales * sales) - SUM(sales) * SUM(sales))),
        3
    ) AS corr_interest_sales
FROM cement_sales;

"""

result = pd.read_sql_query(query, conn)
display(result.head())

result.to_csv("/content/gdp_correlation.csv", index=False, encoding="utf-8-sig")
print("✅ Saved: gdp_correlation.csv")

| | corr_gdp_sales | corr_interest_sales |
|---|---|---|
| 0 | 0.584 | -0.331 |


✅ Saved: gdp_correlation.csv


# 4. Low Efficiency `low_efficiency.csv`

**Purpose:**  
List the ten lowest-efficiency months among those where efficiency fell below 90%.  

**Insight:**  
Because efficiency is sales ÷ production, these are months when sales fell well short of output. They may correspond to machine downtime, supply chain delays, or operational inefficiencies.  
Helps maintenance teams plan predictive maintenance and reduce future losses.
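
Because the query combines a `WHERE` filter with `LIMIT`, the result can contain fewer months than actually fall below the cutoff. The sketch below makes the cutoff and the cap explicit as bound parameters and counts the full set separately. It uses a small in-memory table built from rows shown in this notebook's outputs; `sample`, `threshold`, and `top_n` are hypothetical names for the example.

```python
import sqlite3

# (month, production, sales) rows taken from outputs shown in this notebook.
rows = [
    ("2010-01-01", 347.0, 322.0),
    ("2010-03-01", 236.0, 245.0),
    ("2019-04-01", 464.0, 311.0),
    ("2019-05-01", 369.0, 195.0),
    ("2019-06-01", 323.0, 200.0),
    ("2019-07-01", 360.0, 256.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (month TEXT, production REAL, sales REAL)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)

threshold = 0.9  # efficiency cutoff (90%)
top_n = 3        # hypothetical cap, playing the role of LIMIT 10

# How many months fall below the cutoff in total?
(below_count,) = conn.execute(
    "SELECT COUNT(*) FROM sample WHERE sales / production < ?", (threshold,)
).fetchone()

# Only the worst months, lowest efficiency first.
worst = conn.execute(
    """
    SELECT month, ROUND(sales / production * 100, 2) AS efficiency_percent
    FROM sample
    WHERE sales / production < ?
    ORDER BY sales / production ASC
    LIMIT ?
    """,
    (threshold, top_n),
).fetchall()
conn.close()

print(below_count, worst)
```

Comparing `below_count` with `len(worst)` shows at a glance whether the cap is hiding additional low-efficiency months.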


In [46]:
# low_efficiency.sql

query = """
SELECT
    month,
    production,
    sales,
    ROUND(efficiency * 100, 2) AS efficiency_percent
FROM cement_sales
WHERE efficiency < 0.9
ORDER BY efficiency ASC
LIMIT 10;

"""

result = pd.read_sql_query(query, conn)
display(result.head())

result.to_csv("/content/low_efficiency.csv", index=False, encoding="utf-8-sig")
print("✅ Saved: low_efficiency.csv")

| | month | production | sales | efficiency_percent |
|---|---|---|---|---|
| 0 | 2019-05-01 | 369.0 | 195.0 | 52.85 |
| 1 | 2019-06-01 | 323.0 | 200.0 | 61.92 |
| 2 | 2019-04-01 | 464.0 | 311.0 | 67.03 |
| 3 | 2019-11-01 | 771.0 | 531.0 | 68.87 |
| 4 | 2019-07-01 | 360.0 | 256.0 | 71.11 |


✅ Saved: low_efficiency.csv


# 5. Production Gap `production_gap.csv`

**Purpose:**  
Calculate the average difference between produced and sold quantities per month.

**Formula:**  
`Production Gap = Production − Sales`

**Insight:**  
Shows inventory or stock buildup patterns.  
- Positive Gap → overproduction (stock increase)  
- Negative Gap → sales exceeding production (stock depletion)
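
To see buildup over time rather than month by month, the gaps can be accumulated into a running total. The sketch below does this for the five gap values from the `df.head()` preview. Treating the running total as a change in stock is an assumption: it ignores the unknown opening inventory and any stock movements other than production and sales.

```python
from itertools import accumulate

# Monthly gaps (production - sales) from the df.head() preview.
periods = ["2010-01", "2010-02", "2010-03", "2010-04", "2010-05"]
gaps = [25.0, 21.0, -9.0, 22.0, 7.0]

# Running total of the gap: the implied change in stock since the start
# of the sample, relative to an unknown opening inventory.
cumulative = list(accumulate(gaps))

for period, gap, total in zip(periods, gaps, cumulative):
    print(f"{period}  gap={gap:+.1f}  cumulative={total:+.1f}")
```

A steadily rising total suggests sustained overproduction, while a falling total suggests sales are drawing down stock.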

In [47]:
# production_gap.sql

query = """
SELECT
    strftime('%Y-%m', month) AS period,
    ROUND(AVG(production - sales), 2) AS avg_production_gap
FROM cement_sales
GROUP BY strftime('%Y-%m', month)
ORDER BY period;

"""

result = pd.read_sql_query(query, conn)
display(result.head())

result.to_csv("/content/production_gap.csv", index=False, encoding="utf-8-sig")
print("✅ Saved: production_gap.csv")

| | period | avg_production_gap |
|---|---|---|
| 0 | 2010-01 | 25.0 |
| 1 | 2010-02 | 21.0 |
| 2 | 2010-03 | -9.0 |
| 3 | 2010-04 | 22.0 |
| 4 | 2010-05 | 7.0 |


✅ Saved: production_gap.csv
