## Prerequisites
To create a conda environment, follow these steps:

1. **Install Conda**: If you haven't already, download and install Anaconda or Miniconda from the official website.

2. **Create a new environment**: Open your terminal or command prompt and run the following command:
    ```
    conda create --name myenv
    ```
    Replace `myenv` with your desired environment name.

3. **Activate the environment**: Use the following command to activate your new environment:
    ```
    conda activate myenv
    ```

4. **Install necessary packages**: You can now install any packages you need using `conda install` or `pip install`. For example:
    ```
    conda install pandas numpy matplotlib scikit-learn
    ```
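
Once the packages are installed, it can help to confirm that the active environment actually sees them. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` and the `REQUIRED` list are illustrative and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as installed in step 4 (note: "scikit-learn", not "sklearn")
REQUIRED = ["pandas", "numpy", "matplotlib", "scikit-learn"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any package is reported as missing, check that `conda activate myenv` was run in the same terminal session that launches the notebook.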

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

This notebook walks through a basic linear regression example. The data comes from the Kaggle dataset at https://www.kaggle.com/datasets/aravinii/house-price-prediction-treated-dataset?resource=download

In [3]:
# Load the dataset from the CSV file
df = pd.read_csv('data/df_test.csv')

# Display the first few rows of the dataframe
df.head()

Unnamed: 0,date,price,bedrooms,grade,has_basement,living_in_m2,renovated,nice_view,perfect_condition,real_bathrooms,has_lavatory,single_floor,month,quartile_zone
0,2014-09-26,305000.0,2,1,False,76.18046,False,False,True,1,False,True,9,2
1,2014-05-14,498000.0,3,2,True,210.88981,False,False,False,2,True,True,5,2
2,2015-03-23,590000.0,2,4,False,262.91549,False,False,False,2,True,False,3,2
3,2014-07-15,775000.0,3,3,False,159.79316,False,False,False,1,True,False,7,3
4,2015-04-14,350000.0,2,1,False,92.903,False,False,False,1,True,True,4,3
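
The `Unnamed: 0` column in the output above is the name pandas gives to a column whose header field is empty, which usually happens when a CSV was saved together with its row index. Assuming that is the case here (the notebook does not say so), the sketch below shows how `index_col=0` would read that column as the index instead. The two-row CSV text is a made-up stand-in for `data/df_test.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV mimicking a file saved with its index:
# the first header field is empty.
csv_text = ",date,price\n0,2014-09-26,305000.0\n1,2014-05-14,498000.0\n"

# Default read: the unnamed first column becomes a regular data column.
default_read = pd.read_csv(io.StringIO(csv_text))
print(list(default_read.columns))  # ['Unnamed: 0', 'date', 'price']

# With index_col=0 the same column is used as the row index instead.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed_read.columns))  # ['date', 'price']
```

Applying this to the real file would remove one feature from the model, so the metrics reported later in this notebook would no longer match exactly.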


In [8]:
# Convert all boolean columns to numeric
bool_columns = df.select_dtypes(include='bool').columns
df[bool_columns] = df[bool_columns].astype(int)

# Display the first few rows of the dataframe to verify the changes
df.head()

Unnamed: 0,date,price,bedrooms,grade,has_basement,living_in_m2,renovated,nice_view,perfect_condition,real_bathrooms,has_lavatory,single_floor,month,quartile_zone
0,2014-09-26,305000.0,2,1,0,76.18046,0,0,1,1,0,1,9,2
1,2014-05-14,498000.0,3,2,1,210.88981,0,0,0,2,1,1,5,2
2,2015-03-23,590000.0,2,4,0,262.91549,0,0,0,2,1,0,3,2
3,2014-07-15,775000.0,3,3,0,159.79316,0,0,0,1,1,0,7,3
4,2015-04-14,350000.0,2,1,0,92.903,0,0,0,1,1,1,4,3
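
After the conversion, a quick check can show which columns are still non-numeric and would need to be dropped or encoded before fitting. This is a small sketch on a toy frame with illustrative values, not the real dataset. The variable `non_numeric` is introduced here for the example only.

```python
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only)
toy = pd.DataFrame({
    "price": [305000.0, 498000.0],
    "has_basement": [False, True],
    "date": ["2014-09-26", "2014-05-14"],
})

# Same conversion as in the cell above: booleans become 0/1 integers
bool_columns = toy.select_dtypes(include="bool").columns
toy[bool_columns] = toy[bool_columns].astype(int)

# Columns that are still not numeric after the conversion
non_numeric = toy.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # ['date']
```

In the toy frame only `date` remains non-numeric, which matches the column the notebook drops before training.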


In [9]:
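# Define the feature columns and the target column.
# 'date' is dropped because it is a non-numeric string column.
# Note: 'Unnamed: 0' (apparently a leftover row index) stays in X as a feature.
X = df.drop(columns=['price', 'date'])
y = df['price']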

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse}')
print(f'R^2 Score: {r2}')

Mean Squared Error: 10874224015.367113
R^2 Score: 0.7531813609577539
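
To make these two numbers easier to interpret, the sketch below writes out the standard definitions that `mean_squared_error` and `r2_score` compute by default, and converts the reported MSE into a root mean squared error (RMSE), which is expressed in the same units as `price`. The helper names `mse` and `r2` are illustrative, not part of the notebook.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the target's variance explained by the model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# RMSE for the MSE value printed above
reported_mse = 10874224015.367113
rmse = float(np.sqrt(reported_mse))
print(f"RMSE: {rmse:,.0f}")
```

The RMSE comes out at roughly 104,000 price units, so a typical prediction error is of that order, while the R^2 of about 0.75 means the model explains around three quarters of the variance in `price` on the test split.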
