<a href="https://colab.research.google.com/github/erena29/Data-Analysis-SQL/blob/main/Real%20Estate%20Property%20Management%20Analysis/Real_Estate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Property Management Data Analysis with SQL**

## Data Import and Database Setup

In [1]:
import pandas as pd
import sqlite3

In [2]:
# Load the SQL extension
%load_ext sql

# Create a SQLite database
%sql sqlite://

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
expense = pd.read_csv('/content/drive/MyDrive/Dataset/real_estate/Expense.csv')
property = pd.read_csv('/content/drive/MyDrive/Dataset/real_estate/Property.csv')
sales = pd.read_csv('/content/drive/MyDrive/Dataset/real_estate/Sales.csv')

In [5]:
# SQLite's DROP TABLE accepts one table per statement, so drop each table separately
%sql DROP TABLE IF EXISTS expense;
%sql DROP TABLE IF EXISTS property;
%sql DROP TABLE IF EXISTS sales;
# Persist the DataFrames as tables in SQLite
%sql --persist expense
%sql --persist property
%sql --persist sales

 * sqlite://
 * sqlite://
 * sqlite://


'Persisted sales'

## Sample Data

In [6]:
%%sql
SELECT * FROM sales LIMIT 5

 * sqlite://
Done.


index,SaleID,PropertyID_Sale,SaleDate,MeansofSales,ClientID,PaymentStatus
0,1,1,2023-07-14,Broker,131,Paid
1,2,3,2024-04-28,Direct,198,Paid
2,3,4,2022-01-05,Online,123,Pending
3,4,5,2022-02-04,Online,198,Paid
4,5,6,2022-06-17,Online,198,Pending


In [7]:
%%sql
SELECT * FROM property LIMIT 5;

 * sqlite://
Done.


index,PropertyID,Type,SquareFootage,Price,Status
0,10,Townhouse,1200,200000,Sold
1,128,Townhouse,1800,200000,Sold
2,129,Condo,2500,800000,Sold
3,185,Apartment,1000,1000000,Sold
4,3,Townhouse,750,700000,Sold


In [8]:
%%sql
SELECT * FROM expense LIMIT 5;

 * sqlite://
Done.


index,ExpenseID,PropertyID_Expense,ExpenseType,Amount
0,1,1,Maintenance,100000
1,2,3,Maintenance,300000
2,3,4,Maintenance,250000
3,4,5,Renovation,50000
4,5,6,Maintenance,250000


## SQL Analysis

### **Yearly Financial Overview and Property Sales**

In [9]:
%%sql
SELECT
  strftime('%Y', s.SaleDate) AS Year, -- MySQL equivalent: EXTRACT(YEAR FROM s.SaleDate)
  SUM(e.Amount) AS Total_Expense,
  SUM(p.Price) - SUM(e.Amount) AS Total_Income,
  COUNT(CASE WHEN p.Status = 'Sold' THEN 1 END) AS Properties_Sold
FROM sales AS s
LEFT JOIN property AS p ON s.PropertyID_Sale = p.PropertyID
LEFT JOIN expense AS e ON s.PropertyID_Sale = e.PropertyID_Expense
GROUP BY Year
ORDER BY Year;

 * sqlite://
Done.


Year,Total_Expense,Total_Income,Properties_Sold
2022,10350000,19550000,56
2023,10550000,30250000,60
2024,8700000,16800000,44


### **Top 3 Months by Revenue in 2024**

In [10]:
%%sql
SELECT
  strftime('%m', s.SaleDate) AS Month,
  SUM(p.Price) AS Revenue
FROM sales AS s
LEFT JOIN property AS p
  ON s.PropertyID_Sale = p.PropertyID
WHERE strftime('%Y', s.SaleDate) = '2024'
GROUP BY Month
ORDER BY Revenue DESC
LIMIT 3;

 * sqlite://
Done.


Month,Revenue
7,6100000
8,3900000
4,3700000


### **Distribution of Sales by Means of Sale**

In [11]:
%%sql
SELECT
  MeansofSales,
  COUNT(MeansofSales) AS Count
FROM sales
GROUP BY MeansofSales

 * sqlite://
Done.


MeansofSales,Count
Broker,61
Direct,45
Online,54


### **Comparison of Payment Status by Sale Method**

In [12]:
%%sql
SELECT
    MeansofSales,
    COUNT(CASE WHEN PaymentStatus = 'Paid' THEN 1 END) AS Paid_Count,
    COUNT(CASE WHEN PaymentStatus = 'Pending' THEN 1 END) AS Pending_Count,
    ROUND(
        (COUNT(CASE WHEN PaymentStatus = 'Paid' THEN 1 END) * 100.0) / COUNT(*),
        2
    ) AS Paid_Percentage
FROM sales
GROUP BY MeansofSales
ORDER BY MeansofSales;

 * sqlite://
Done.


MeansofSales,Paid_Count,Pending_Count,Paid_Percentage
Broker,23,38,37.7
Direct,17,28,37.78
Online,24,30,44.44


### **Revenue by Property Type and Year**

In [13]:
%%sql
SELECT
  p.Type AS Property_Type,
  SUM(CASE WHEN strftime('%Y', s.SaleDate) = '2022' THEN p.Price ELSE 0 END) AS Revenue_2022,
  SUM(CASE WHEN strftime('%Y', s.SaleDate) = '2023' THEN p.Price ELSE 0 END) AS Revenue_2023,
  SUM(CASE WHEN strftime('%Y', s.SaleDate) = '2024' THEN p.Price ELSE 0 END) AS Revenue_2024,
  SUM(p.Price) AS Total_Revenue
FROM sales AS s
LEFT JOIN property AS p
  ON s.PropertyID_Sale = p.PropertyID
GROUP BY Property_Type
ORDER BY Total_Revenue DESC;

 * sqlite://
Done.


Property_Type,Revenue_2022,Revenue_2023,Revenue_2024,Total_Revenue
Condo,6500000,16800000,7500000,30800000
Single Family,9900000,7800000,7300000,25000000
Apartment,7400000,8900000,6900000,23200000
Townhouse,6100000,7300000,3800000,17200000


### **Months with Revenue Smaller than the Yearly Average**

1.  **Average Monthly Revenue by Year:**
The `avg_year` CTE (Common Table Expression) wraps a subquery that groups sales by year and month and sums the revenue for each month (`Monthly_Revenue`). The outer query then averages those monthly totals within each year, producing `Avg_Yearly_Revenue`, the yearly benchmark against which each month is compared.

2.   **Filtering Months with Revenue Below the Yearly Average:**
The `filtered_months` CTE recomputes revenue per year and month, joins each row to its year's benchmark in `avg_year`, and keeps only the months whose total revenue falls below that benchmark.

3.  **Selecting Months That Fall Below the Average in More than One Year:**
The final query groups `filtered_months` by month and keeps the months that appear more than once, that is, months that underperformed the yearly average in at least two different years.
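
The three steps can be sketched on a small in-memory table to make the logic easier to follow. This is a minimal sketch, not the notebook's query: the table `toy_sales`, its columns, and all values are hypothetical, and the monthly totals are placed in their own CTE (`monthly`) instead of being recomputed in two places.

```python
import sqlite3

# Hypothetical toy data (sale_date, price); not taken from the real dataset.
toy_rows = [
    ("2022-01-05", 60), ("2022-01-20", 40),   # Jan 2022 total: 100
    ("2022-02-10", 500), ("2022-03-10", 300),
    ("2023-01-15", 200), ("2023-02-15", 900), ("2023-03-15", 400),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_sales (sale_date TEXT, price INTEGER)")
conn.executemany("INSERT INTO toy_sales VALUES (?, ?)", toy_rows)

query = """
WITH monthly AS (            -- step 1a: revenue per year and month
  SELECT strftime('%Y', sale_date) AS year,
         strftime('%m', sale_date) AS month,
         SUM(price) AS revenue
  FROM toy_sales
  GROUP BY year, month
),
avg_year AS (                -- step 1b: average of the monthly totals per year
  SELECT year, AVG(revenue) AS avg_monthly_revenue
  FROM monthly
  GROUP BY year
),
below_avg AS (               -- step 2: months below their year's average
  SELECT m.year, m.month
  FROM monthly AS m
  JOIN avg_year AS a ON a.year = m.year
  WHERE m.revenue < a.avg_monthly_revenue
)
SELECT month                 -- step 3: months that are below average in 2+ years
FROM below_avg
GROUP BY month
HAVING COUNT(*) > 1
ORDER BY month;
"""

months = [row[0] for row in conn.execute(query)]
print(months)
conn.close()
```

In this toy data the yearly averages are 300 for 2022 and 500 for 2023. January is below the average in both years, while March is below it only in 2023, so only January survives the final `HAVING` filter.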

In [14]:
%%sql
-- Average of the monthly revenue totals within each year
WITH avg_year AS (
  SELECT
    Year,
    ROUND(AVG(Monthly_Revenue), 2) AS Avg_Yearly_Revenue
  FROM (
    SELECT
      strftime('%Y', SaleDate) AS Year,
      strftime('%m', SaleDate) AS Month,
      SUM(Price) AS Monthly_Revenue
    FROM sales AS s
    LEFT JOIN property AS p
      ON s.PropertyID_Sale = p.PropertyID
    GROUP BY Year, Month
  ) AS Monthly_Revenues
  GROUP BY Year
  ORDER BY Year
)
-- Filter the months where revenue is below the yearly average
, filtered_months AS(
  SELECT
    strftime('%Y', SaleDate) AS Year,
    strftime('%m', SaleDate) AS Month,
    SUM(Price) AS Revenue
  FROM sales AS s
  LEFT JOIN property AS p
    ON s.PropertyID_Sale = p.PropertyID
  JOIN avg_year AS avg
    ON strftime('%Y', SaleDate) = avg.Year
  GROUP BY Year, Month, avg.Avg_Yearly_Revenue
  HAVING SUM(p.Price) < avg.Avg_Yearly_Revenue
  ORDER BY Month DESC
)
-- Select months that fall below the yearly average in more than one year
SELECT Month
FROM filtered_months
GROUP BY Month
HAVING COUNT(Month) > 1
ORDER BY Month DESC;

 * sqlite://
Done.


Month
11
10
8
6
5
1


### **Percentage of Sold Properties and Net Income by Price Category**

In [15]:
%%sql
SELECT
    CASE
        WHEN Price BETWEEN 0 AND 300000 THEN '1. Low'
        WHEN Price BETWEEN 300001 AND 500000 THEN '2. Affordable'
        WHEN Price BETWEEN 500001 AND 800000 THEN '3. Mid-Range'
        WHEN Price > 800000 THEN '4. High'
    END AS PriceCategory,
    ROUND(AVG(Price),2) AS Average_Price,
    ROUND(COUNT(CASE WHEN Status = 'Sold' THEN 1 END) * 100.0 / COUNT(*), 2) AS Sold_Percentage,
    SUM(Price) - SUM(Amount) AS Total_Income
FROM property AS p
LEFT JOIN expense AS e ON p.PropertyID = e.PropertyID_Expense
GROUP BY PriceCategory
ORDER BY PriceCategory

 * sqlite://
Done.


PriceCategory,Average_Price,Sold_Percentage,Total_Income
1. Low,250000.0,78.0,4600000
2. Affordable,450000.0,69.57,15400000
3. Mid-Range,677777.78,90.74,27400000
4. High,950000.0,80.0,40300000


The **Mid-Range** category has the highest sold percentage (90.74%), which suggests the strongest demand, while the **High** category leads in total income (40,300,000). This suggests that a balanced portfolio should focus on adding **Mid-Range** and **High** properties to improve both sold percentage and income. The **Affordable** category has the lowest sold percentage (69.57%) but still contributes substantial income (15,400,000) and may attract budget-conscious buyers.
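
The ranking behind this reading can be double-checked with a few lines of Python. This is a small sketch that copies the sold percentages and total income from the query output above into a list; the variable names are illustrative only.

```python
# Figures copied from the query output above: (category, sold %, total income)
rows = [
    ("1. Low", 78.0, 4_600_000),
    ("2. Affordable", 69.57, 15_400_000),
    ("3. Mid-Range", 90.74, 27_400_000),
    ("4. High", 80.0, 40_300_000),
]

top_sold = max(rows, key=lambda r: r[1])      # category with the highest sold percentage
lowest_sold = min(rows, key=lambda r: r[1])   # category with the lowest sold percentage
top_income = max(rows, key=lambda r: r[2])    # category with the highest total income

# Share of total income contributed by the leading category
income_share = top_income[2] / sum(r[2] for r in rows)

print("Highest sold %:", top_sold[0])
print("Lowest sold %:", lowest_sold[0])
print("Highest income:", top_income[0], f"({income_share:.1%} of total)")
```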

### **Total Expenses by Property Type and Expense Category**

In [16]:
%%sql
SELECT
  ExpenseType,
  SUM(Amount) AS Expense
FROM expense
GROUP BY ExpenseType
ORDER BY Expense

 * sqlite://
Done.


ExpenseType,Expense
Renovation,6650000
Maintenance,11450000
Property Taxes,11500000


In [17]:
%%sql
SELECT
    pr.Type AS Property_Type,
    SUM(CASE WHEN ExpenseType = 'Maintenance' THEN e.Amount END) AS Maintenance,
    SUM(CASE WHEN ExpenseType = 'Property Taxes' THEN e.Amount END) AS PropTaxes,
    SUM(CASE WHEN ExpenseType = 'Renovation' THEN e.Amount END) AS Renovation
FROM expense e
LEFT JOIN property pr ON e.PropertyID_Expense = pr.PropertyID
GROUP BY pr.Type

 * sqlite://
Done.


Property_Type,Maintenance,PropTaxes,Renovation
Apartment,2400000,3850000,1700000
Condo,3100000,2300000,1900000
Single Family,3350000,3450000,1600000
Townhouse,2600000,1900000,1450000
