# **real_estate_dummy_data**

---

<br><br><br>

## **Objectives**

---

- Explore different ways to apply analytics to dummy real-estate data.
- The data is generated by a script (the Faker-based functions in the Data Generation section).

<br><br><br>

## **Imports / Environment Setup**

---

In [49]:
# imports
import numpy as np
import pandas as pd
import plotly.express as px
import lux
lux.config.default_display = 'lux'
# custom imports
# from fp_data_toolbox import eda, notifier, visualization
# visualization.set_plotly_defaults()
# notifier.setup() #Enable for windows toast notifications on Jupyter cell complete
# Magics env settings...
%matplotlib inline
# env variables
df = pd.DataFrame() # creating empty dataframe variable
params = {} # creating empty parameters dictionary

<br><br><br>

## **Functions**

---

In [50]:
import xlwings as xw

def xw_df_to_table(df, tgt_wb_name, tgt_sheet, tgt_table):
    # Look up the workbook by name; fall back to the active workbook if not found
    try:
        wb = xw.books[tgt_wb_name]
    except Exception:
        wb = xw.books.active
    # Select the worksheet that holds the named table; add the sheet if it is missing
    try:
        ws = wb.sheets[tgt_sheet]
    except Exception:
        ws = wb.sheets.add(tgt_sheet)
    # Update the table if it already exists; otherwise create it at A1, then fill it
    if tgt_table in [table.name for table in ws.tables]:
        ws.tables[tgt_table].update(df, index=False)
    else:
        ws.tables.add(source=ws['A1'], name=tgt_table).update(df, index=False)

<br><br><br>

## **Data Generation**

---

Here's a plan for each table, including the types of columns that might be relevant:

- Agents:
    - Agent ID (unique identifier)
    - Name
    - Email
    - Phone
    - Region

- Asset Inventory:
    - Asset ID (unique identifier)
    - Type (e.g., House, Apartment, Land)
    - Location
    - Price
    - Agent ID (link to Agents table)

- Customers:
    - Customer ID (unique identifier)
    - Name
    - Email
    - Phone
    - Preferred Region
    - Agent ID (assigned agent)

- Contractors:
    - Contractor ID (unique identifier)
    - Name
    - Specialization (e.g., Plumbing, Electrical, General)
    - Email
    - Phone

- Purchase Journal:
    - Transaction ID (unique identifier)
    - Customer ID
    - Asset ID
    - Purchase Price
    - Purchase Date
    - Agent ID

In [51]:
from faker import Faker
import pandas as pd
import numpy as np

# Initialize Faker generator
fake = Faker()

# Function to generate random data for Agents
def generate_agents(n=100):
    data = {
        'Agent_ID': range(1, n + 1),
        'Name': [fake.name() for _ in range(n)],
        'Email': [fake.email() for _ in range(n)],
        'Phone': [fake.phone_number() for _ in range(n)],
        'Region': [fake.state() for _ in range(n)],
    }
    return pd.DataFrame(data)

# Function to generate random data for Asset Inventory
# n_agents must match the number of rows in the Agents table so every Agent_ID resolves
def generate_asset_inventory(n=200, n_agents=100):
    data = {
        'Asset_ID': range(1, n + 1),
        'Type': [np.random.choice(['House', 'Apartment', 'Land']) for _ in range(n)],
        'Location': [fake.address() for _ in range(n)],
        'Price': [round(np.random.uniform(50000, 500000), 2) for _ in range(n)],
        'Agent_ID': [np.random.randint(1, n_agents + 1) for _ in range(n)],  # upper bound is exclusive
    }
    return pd.DataFrame(data)

# Function to generate random data for Customers
# n_agents must match the number of rows in the Agents table so every Agent_ID resolves
def generate_customers(n=150, n_agents=100):
    data = {
        'Customer_ID': range(1, n + 1),
        'Name': [fake.name() for _ in range(n)],
        'Email': [fake.email() for _ in range(n)],
        'Phone': [fake.phone_number() for _ in range(n)],
        'Preferred_Region': [fake.state() for _ in range(n)],
        'Agent_ID': [np.random.randint(1, n_agents + 1) for _ in range(n)],  # upper bound is exclusive
    }
    return pd.DataFrame(data)

# Function to generate random data for Contractors
def generate_contractors(n=50):
    data = {
        'Contractor_ID': range(1, n + 1),
        'Name': [fake.name() for _ in range(n)],
        'Specialization': [np.random.choice(['Plumbing', 'Electrical', 'General']) for _ in range(n)],
        'Email': [fake.email() for _ in range(n)],
        'Phone': [fake.phone_number() for _ in range(n)],
    }
    return pd.DataFrame(data)

# Function to generate random data for Purchase Journal
# n_customers, n_assets and n_agents must match the sizes of the tables they reference
def generate_purchase_journal(n=300, n_customers=150, n_assets=200, n_agents=100):
    data = {
        'Transaction_ID': range(1, n + 1),
        'Customer_ID': [np.random.randint(1, n_customers + 1) for _ in range(n)],
        'Asset_ID': [np.random.randint(1, n_assets + 1) for _ in range(n)],
        'Purchase_Price': [round(np.random.uniform(50000, 500000), 2) for _ in range(n)],
        'Purchase_Date': [fake.date_between(start_date='-2y', end_date='today') for _ in range(n)],
        'Agent_ID': [np.random.randint(1, n_agents + 1) for _ in range(n)],
    }
    return pd.DataFrame(data)
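
The generators above draw from NumPy's global random state one value at a time, so each run produces different tables. As a minimal sketch (not part of the original notebook), the numeric and categorical columns could instead be drawn in one vectorized call each from a seeded `np.random.default_rng`, which makes a run repeatable. The function name `generate_asset_inventory_seeded` and its `seed` and `n_agents` parameters are hypothetical, and the `Location` column is left out because it depends on Faker.

```python
import numpy as np
import pandas as pd


def generate_asset_inventory_seeded(n=200, n_agents=100, seed=0):
    """Hypothetical seeded variant of generate_asset_inventory (no Location column)."""
    rng = np.random.default_rng(seed)  # local generator; the global state is untouched
    return pd.DataFrame({
        'Asset_ID': np.arange(1, n + 1),
        'Type': rng.choice(['House', 'Apartment', 'Land'], size=n),
        'Price': rng.uniform(50000, 500000, size=n).round(2),
        'Agent_ID': rng.integers(1, n_agents + 1, size=n),  # upper bound is exclusive
    })


first = generate_asset_inventory_seeded(seed=42)
second = generate_asset_inventory_seeded(seed=42)
print(first.equals(second))  # compares two runs that used the same seed
```

Passing the generator's seed as an argument keeps the function free of hidden global state, which may be easier to reason about than calling `np.random.seed` once at the top of the notebook.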

In [52]:
# Generating data for each table
agents_df = generate_agents()
asset_inventory_df = generate_asset_inventory()
customers_df = generate_customers()
contractors_df = generate_contractors()
purchase_journal_df = generate_purchase_journal()
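
Because the foreign-key columns (`Agent_ID`, `Customer_ID`, `Asset_ID`) are drawn at random, it can be worth confirming that every key points at an existing row before exporting. The helper below is only a sketch: `find_orphans` is a hypothetical name, and the two small frames are stand-ins for illustration rather than the generated tables.

```python
import pandas as pd


def find_orphans(child, parent, key):
    """Return the rows of child whose key value has no match in parent[key]."""
    return child[~child[key].isin(parent[key])]


# Tiny stand-in tables (illustrative values only)
agents = pd.DataFrame({'Agent_ID': [1, 2, 3]})
assets = pd.DataFrame({'Asset_ID': [10, 11, 12], 'Agent_ID': [1, 3, 4]})

orphans = find_orphans(assets, agents, 'Agent_ID')
print(orphans)  # rows whose Agent_ID is missing from the agents table
```

With the notebook's tables the call would be `find_orphans(asset_inventory_df, agents_df, 'Agent_ID')`; an empty result means the keys are consistent.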



<br><br><br>

## **Cleaning**

---

In [53]:
# convert date fields to datetime dtype (for better pandas based analysis)
# eda.cast_as_datetime(df,'shp_dt');
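
`fake.date_between` returns Python `date` objects, so `Purchase_Date` is not stored as a pandas datetime column. A sketch of the conversion using plain pandas (instead of the commented-out `eda.cast_as_datetime` helper, whose behavior is not shown in this notebook) could look like the following; the sample frame is a stand-in for `purchase_journal_df`, and `Purchase_Month` is a hypothetical derived column.

```python
import datetime as dt

import pandas as pd

# Stand-in for purchase_journal_df holding Python date objects (illustrative values)
journal = pd.DataFrame({
    'Transaction_ID': [1, 2],
    'Purchase_Date': [dt.date(2023, 1, 15), dt.date(2023, 6, 30)],
})

# Cast to a datetime64 dtype so the .dt accessor becomes available
journal['Purchase_Date'] = pd.to_datetime(journal['Purchase_Date'])

# Example of what the cast enables: a monthly period column for grouping
journal['Purchase_Month'] = journal['Purchase_Date'].dt.to_period('M')
print(journal.dtypes)
```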

In [54]:
# TODO - fill this in later as I work on more projects
# TODO - At some point, transition the data cleaning operations to their own functions

In [55]:
# Data cleaning operations here

<br><br><br>

## **Export to XL**

---

In [56]:
   
# Export each table to a sheet and an Excel table named after its DataFrame variable
tables = {
    'agents_df': agents_df,
    'asset_inventory_df': asset_inventory_df,
    'customers_df': customers_df,
    'contractors_df': contractors_df,
    'purchase_journal_df': purchase_journal_df,
}
for name, table_df in tables.items():
    xw_df_to_table(
        table_df,
        tgt_wb_name = '', # leave blank for active workbook
        tgt_sheet = name,
        tgt_table = name,
    )

<br><br><br>

## **EDA**

---

In [None]:
# Data processing operations here
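
One possible starting point, sketched here with small stand-in tables rather than the generated ones, is to join the purchase journal to the agents table and total the sales by region. The column names follow the Data Generation section; the values are made up for illustration.

```python
import pandas as pd

# Stand-in tables (illustrative values only)
agents = pd.DataFrame({'Agent_ID': [1, 2, 3], 'Region': ['Ohio', 'Texas', 'Ohio']})
journal = pd.DataFrame({
    'Transaction_ID': [1, 2, 3, 4],
    'Agent_ID': [1, 2, 3, 1],
    'Purchase_Price': [100000.0, 250000.0, 150000.0, 50000.0],
})

# Attach each transaction's agent region; validate guards against duplicate agents
sales = journal.merge(agents, on='Agent_ID', how='left', validate='many_to_one')

# Total purchase value and transaction count per region, largest total first
by_region = (
    sales.groupby('Region')['Purchase_Price']
    .agg(total='sum', transactions='count')
    .sort_values('total', ascending=False)
)
print(by_region)
```

The same pattern would apply to `purchase_journal_df` and `agents_df`, and swapping `Region` for `Agent_ID` gives a per-agent view.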

<br><br><br>

## **Feature Engineering**

---

In [None]:
# TODO - fill this in later as I work on more projects

In [None]:
# Data processing operations here

<br><br><br>

## **Predictive Modeling**

---

In [None]:
# TODO - fill this in later as I work on more projects

In [None]:
# Data processing operations here

<br><br><br>

## **Visualization**

---

In [None]:
# TODO - fill this in later as I work on more projects

In [None]:
# Data processing operations here

<br><br><br>

## **Outputs**

---

In [None]:
# Final output operations here

In [None]:
#stop