<a href="https://colab.research.google.com/github/reyhanfisena/fe-architrons-pa/blob/main/ModelAI_PA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Import Dataset

In [5]:
import pandas as pd

# Load the dataset to understand its structure
df = pd.read_csv('Dataset-FIX.csv')

# Display the first few rows of the dataset and column information
df.head(), df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 151 entries, 0 to 150
Data columns (total 14 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   No                   151 non-null    int64  
 1   Tipe Rumah           151 non-null    object 
 2   Luas Tanah (m²)      151 non-null    int64  
 3   Luas Bangunan (m²)   151 non-null    int64  
 4   Jumlah Lantai        151 non-null    int64  
 5   Jumlah Kamar Tidur   149 non-null    float64
 6   Jumlah Kamar Mandi   149 non-null    float64
 7   Tipe Atap            150 non-null    object 
 8   Tipe Dinding         151 non-null    object 
 9   Tipe Pondasi         151 non-null    object 
 10  Material Utama       150 non-null    object 
 11  Jumlah Tenaga Kerja  150 non-null    float64
 12  Durasi (Hari)        150 non-null    float64
 13  Biaya Proyek (Rp)    149 non-null    object 
dtypes: float64(4), int64(4), object(6)
memory usage: 16.6+ KB


(   No Tipe Rumah  Luas Tanah (m²)  Luas Bangunan (m²)  Jumlah Lantai  \
 0   1    Tipe 54               79                  54              1   
 1   2    Tipe 36               73                  36              1   
 2   3    Tipe 60              190                  60              2   
 3   4    Tipe 54              198                  54              1   
 4   5    Tipe 54              112                  54              2   
 
    Jumlah Kamar Tidur  Jumlah Kamar Mandi        Tipe Atap Tipe Dinding  \
 0                 3.0                 1.0        Atap Seng   Bata Merah   
 1                 2.0                 1.0        Atap Seng   Bata Merah   
 2                 3.0                 3.0    Genteng Beton       Batako   
 3                 3.0                 2.0  Genteng Keramik       Batako   
 4                 2.0                 3.0    Genteng Beton  Bata Ringan   
 
       Tipe Pondasi Material Utama  Jumlah Tenaga Kerja  Durasi (Hari)  \
 0  Beton Bertulang         
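
The `info()` summary above lists several columns with fewer than 151 non-null values. As a rough sketch of how those gaps could be counted directly, the snippet below tallies missing entries per column. The small `sample` frame is hypothetical: it borrows three column names from the dataset, but its values are invented and are not rows from `Dataset-FIX.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame; column names follow the dataset,
# values are invented for illustration only.
sample = pd.DataFrame({
    'Jumlah Kamar Tidur': [3.0, np.nan, 2.0, 3.0],
    'Tipe Atap': ['Atap Seng', 'Genteng Beton', None, 'Atap Seng'],
    'Jumlah Lantai': [1, 1, 2, 1],
})

# Count missing entries per column and keep only the columns that have any.
missing = sample.isna().sum()
missing = missing[missing > 0]
print(missing)
```

Applied to the real frame, the same two lines on `df` would presumably list the columns that the preprocessing step has to fill.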

# Data Preprocessing

In [6]:
# Step 1: Data Preprocessing

# Handle missing values: median for numerical columns, mode for categorical columns.
# Assign the result back instead of calling fillna(..., inplace=True) on df[col],
# because that chained form acts on a copy and will stop working in pandas 3.0.
for col in ['Jumlah Kamar Tidur', 'Jumlah Kamar Mandi', 'Jumlah Tenaga Kerja', 'Durasi (Hari)']:
    df[col] = df[col].fillna(df[col].median())
# 'Biaya Proyek (Rp)' is still stored as text at this point, so it is filled with its mode
for col in ['Tipe Atap', 'Material Utama', 'Biaya Proyek (Rp)']:
    df[col] = df[col].fillna(df[col].mode()[0])

# Clean and convert 'Biaya Proyek (Rp)' to numeric by removing 'Rp' and commas, then casting to float
df['Biaya Proyek (Rp)'] = df['Biaya Proyek (Rp)'].replace({'Rp': '', ',': ''}, regex=True).astype(float)

# Drop non-relevant columns for prediction (e.g., 'No')
df.drop(columns=['No'], inplace=True)

# Convert categorical columns to numeric using one-hot encoding
categorical_columns = ['Tipe Rumah', 'Tipe Atap', 'Tipe Dinding', 'Tipe Pondasi', 'Material Utama']
dataset = pd.get_dummies(df, columns=categorical_columns, drop_first=True)

# Separate features and target variables for both models
X = df.drop(columns=['Durasi (Hari)', 'Biaya Proyek (Rp)'])
y_duration = df['Durasi (Hari)']  # Target for duration prediction
y_cost = df['Biaya Proyek (Rp)']  # Target for cost prediction

# Check the prepared dataset
X.head(), y_duration.head(), y_cost.head()



(  Tipe Rumah  Luas Tanah (m²)  Luas Bangunan (m²)  Jumlah Lantai  \
 0    Tipe 54               79                  54              1   
 1    Tipe 36               73                  36              1   
 2    Tipe 60              190                  60              2   
 3    Tipe 54              198                  54              1   
 4    Tipe 54              112                  54              2   
 
    Jumlah Kamar Tidur  Jumlah Kamar Mandi        Tipe Atap Tipe Dinding  \
 0                 3.0                 1.0        Atap Seng   Bata Merah   
 1                 2.0                 1.0        Atap Seng   Bata Merah   
 2                 3.0                 3.0    Genteng Beton       Batako   
 3                 3.0                 2.0  Genteng Keramik       Batako   
 4                 2.0                 3.0    Genteng Beton  Bata Ringan   
 
       Tipe Pondasi Material Utama  Jumlah Tenaga Kerja  
 0  Beton Bertulang           Kayu                  8.0  
 1        
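
The preprocessing cell cleans the cost column, one-hot encodes the categorical columns into `dataset`, and then separates the features from the two targets. In the output above, `X` still contains string columns such as `Tipe Rumah`, because it is taken from `df` rather than from the encoded `dataset`. The sketch below walks through the same steps with the features drawn from the encoded frame, on the assumption that the downstream models need numeric inputs. The `toy` frame, the `_toy` variable names, and the `'Rp150,000,000'` cost format are hypothetical and are not taken from `Dataset-FIX.csv`.

```python
import pandas as pd

# Hypothetical rows; column names follow the notebook, values are invented.
toy = pd.DataFrame({
    'Tipe Rumah': ['Tipe 36', 'Tipe 54', 'Tipe 36'],
    'Luas Bangunan (m²)': [36, 54, 36],
    'Durasi (Hari)': [60.0, 90.0, 65.0],
    'Biaya Proyek (Rp)': ['Rp150,000,000', 'Rp250,000,000', 'Rp160,000,000'],
})

# Strip the currency prefix and thousands separators, then cast to float.
toy['Biaya Proyek (Rp)'] = (
    toy['Biaya Proyek (Rp)']
    .replace({'Rp': '', ',': ''}, regex=True)
    .astype(float)
)

# One-hot encode the categorical column; drop_first removes one reference level.
encoded = pd.get_dummies(toy, columns=['Tipe Rumah'], drop_first=True)

# Take the features from the encoded frame so no text columns remain.
targets = ['Durasi (Hari)', 'Biaya Proyek (Rp)']
X_toy = encoded.drop(columns=targets)
y_duration_toy = encoded['Durasi (Hari)']
y_cost_toy = encoded['Biaya Proyek (Rp)']

print(X_toy.columns.tolist())
print(y_cost_toy.tolist())
```

In the notebook itself, the equivalent change would be to build `X` from `dataset` instead of `df`, if the encoded columns are indeed what the models should receive.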