<h3>Tool-assisted housing prices</h3>

The goal of this notebook is to work through this project with a typical Kaggle-style approach and a heavy sprinkle of ChatGPT's help.
The primary objective is to use ChatGPT as a tool.
Some possible questions to answer at the end:

    1. What problems did you have using ChatGPT?
    2. How did you prompt ChatGPT?
    3. Where do you think ChatGPT helped the most in the process?
    4. How would you rate the effectiveness of ChatGPT's answers?

Below is a list of tasks to accomplish.
I want ChatGPT to assist with each of them.
I'll document my prompts and its answers (somehow, no idea how yet):

    1. Data setup
    2. EDA
    3. Pipeline
    4. Modeling
    5. Feature engineering
    6. Ensemble

In [12]:
import pandas as pd
from IPython.display import display, HTML  # This import came from a ChatGPT response!

In [2]:
df_train = pd.read_csv('data/train.csv')

<h5><b>Data setup</b></h5>


In [3]:
df_train.describe()

,Id,MSSubClass,LotFrontage,LotArea,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SalePrice
count,1460.0,1460.0,1201.0,1460.0,1460.0,1460.0,1460.0,1460.0,1452.0,1460.0,...,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0
mean,730.5,56.89726,70.049958,10516.828082,6.099315,5.575342,1971.267808,1984.865753,103.685262,443.639726,...,94.244521,46.660274,21.95411,3.409589,15.060959,2.758904,43.489041,6.321918,2007.815753,180921.19589
std,421.610009,42.300571,24.284752,9981.264932,1.382997,1.112799,30.202904,20.645407,181.066207,456.098091,...,125.338794,66.256028,61.119149,29.317331,55.757415,40.177307,496.123024,2.703626,1.328095,79442.502883
min,1.0,20.0,21.0,1300.0,1.0,1.0,1872.0,1950.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2006.0,34900.0
25%,365.75,20.0,59.0,7553.5,5.0,5.0,1954.0,1967.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,2007.0,129975.0
50%,730.5,50.0,69.0,9478.5,6.0,5.0,1973.0,1994.0,0.0,383.5,...,0.0,25.0,0.0,0.0,0.0,0.0,0.0,6.0,2008.0,163000.0
75%,1095.25,70.0,80.0,11601.5,7.0,6.0,2000.0,2004.0,166.0,712.25,...,168.0,68.0,0.0,0.0,0.0,0.0,0.0,8.0,2009.0,214000.0
max,1460.0,190.0,313.0,215245.0,10.0,9.0,2010.0,2010.0,1600.0,5644.0,...,857.0,547.0,552.0,508.0,480.0,738.0,15500.0,12.0,2010.0,755000.0


Running `df_train.describe()` gives me a monstrous table.
We'll use ChatGPT to get around this.
I'll upload a link to the chat for anyone interested.

So my first prompt was:
We're doing some analysis with python inside of a jupyter notebook. Can you create a way to view tables in a more concise manner?

    It gave me useful info on installing pandas, but not quite what I wanted.

The second prompt:
I already have pandas, thanks. I would like my tables in the output to be more concise. Can you help me out with that?

    This one was more precise. It showed me how to change the `max_rows`, `max_columns`, `max_colwidth`, and `precision` display options, but I don't want to truncate my data.
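For reference, here's a minimal sketch of those four options in action. The option names are real pandas settings; the values and the toy DataFrame (standing in for `df_train.describe()`) are just assumptions for illustration. Wrapping them in `pd.option_context` applies them only inside the `with` block, so the notebook's global settings aren't touched.

```python
import pandas as pd

# Hypothetical wide frame standing in for df_train.describe()
df = pd.DataFrame({f"col{i}": [i * 1.23456, i * 2.5] for i in range(30)})

before = pd.get_option("display.max_columns")

# option_context changes the display options only inside the with-block
with pd.option_context(
    "display.max_rows", 20,      # show at most 20 rows, "..." for the rest
    "display.max_columns", 10,   # collapse the middle columns into "..."
    "display.max_colwidth", 50,  # cut long cell text at 50 characters
    "display.precision", 2,      # print floats with 2 decimal places
):
    inside = pd.get_option("display.precision")
    print(df)

# The previous settings are restored automatically on exit
after = pd.get_option("display.max_columns")
print(before == after)  # True
```

Every one of these options trades completeness for compactness, which is exactly why they didn't fit what I was after.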

Third prompt:
I don't want to truncate my data, I would like to still view everything without it taking up a lot of space. Is there a function you can provide me to achieve this?

    This one seems interesting. I'll try it out and see what it looks like in the next cell.

In [14]:
# Let pandas choose the line width (auto-detection only works in a terminal, not a notebook)
pd.set_option('display.width', None)

# Show every column instead of collapsing the middle ones into "..."
pd.set_option('display.max_columns', None)

# Don't cut off long cell contents
pd.set_option('display.max_colwidth', None)  # had to change -1 to None; ChatGPT made a mistake here

# Render the whole DataFrame as plain text (to_string() doesn't truncate by default)
print(df_train.describe().to_string())

                Id   MSSubClass  LotFrontage        LotArea  OverallQual  OverallCond    YearBuilt  YearRemodAdd   MasVnrArea   BsmtFinSF1   BsmtFinSF2    BsmtUnfSF  TotalBsmtSF     1stFlrSF     2ndFlrSF  LowQualFinSF    GrLivArea  BsmtFullBath  BsmtHalfBath     FullBath     HalfBath  BedroomAbvGr  KitchenAbvGr  TotRmsAbvGrd   Fireplaces  GarageYrBlt   GarageCars   GarageArea   WoodDeckSF  OpenPorchSF  EnclosedPorch    3SsnPorch  ScreenPorch     PoolArea       MiscVal       MoSold       YrSold      SalePrice
count  1460.000000  1460.000000  1201.000000    1460.000000  1460.000000  1460.000000  1460.000000   1460.000000  1452.000000  1460.000000  1460.000000  1460.000000  1460.000000  1460.000000  1460.000000   1460.000000  1460.000000   1460.000000   1460.000000  1460.000000  1460.000000   1460.000000   1460.000000   1460.000000  1460.000000  1379.000000  1460.000000  1460.000000  1460.000000  1460.000000    1460.000000  1460.000000  1460.000000  1460.000000   1460.000000  1460.000000 

I will say it's a lot more concise, but it looks pretty ugly. Time to get more precise with my prompting.

Fourth prompt:
This looks a lot better. However, I would like a function that allows me to use this each time and show case the data in scrollable html table.

    Holy smokes, it actually gave me a function to use. However, at a glance I can see that I need more arguments for my use cases, and it still uses -1 instead of None. Before using it, let's see whether ChatGPT can fix that mistake once I point it out, and whether it can allow for more inputs.
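ChatGPT's function isn't reproduced here, but the core idea can be sketched in a few lines. The names `scrollable_table`, `show_scrollable` and `max_height` are my own, hypothetical choices; the sketch only relies on `DataFrame.to_html` and IPython's `HTML`/`display`. The table is wrapped in a fixed-height `<div>` with `overflow: auto`, so nothing gets truncated and the browser adds scrollbars instead.

```python
import pandas as pd


def scrollable_table(df, max_height="400px"):
    """Return HTML that shows all of df inside a fixed-height, scrollable box.

    Hypothetical helper, not the function ChatGPT produced.
    """
    # to_html() renders every row and column unless max_rows/max_cols are passed
    table_html = df.to_html()
    # overflow: auto adds scrollbars when the table is taller or wider than the box
    return f'<div style="max-height: {max_height}; overflow: auto;">{table_html}</div>'


def show_scrollable(df, max_height="400px"):
    """Render the scrollable table as notebook output."""
    # Imported here so scrollable_table() still works where IPython isn't installed
    from IPython.display import HTML, display

    display(HTML(scrollable_table(df, max_height)))
```

In a notebook cell this would be called as `show_scrollable(df_train.describe(), max_height="300px")`. `to_html` also accepts a `table_id` argument, which is one natural place to hook in extra inputs like a table ID.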

Fifth prompt:
Thanks! I noticed that the -1 for pd.set_option results in this error: ValueError: Value must be a nonnegative integer or None.
Can you fix that value error and allow for more inputs such as table name and table ID

    I'm honestly blown away by this response. It even explains how to use the function and why my extra inputs are useful. Cool. I'm not sure I like the 1000 it put in place of -1, so I'll stick with None for now. Tool-assisted: I still have to put in the work here.
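The error quoted in the fifth prompt is easy to reproduce. As far as I know, older pandas (1.x) only warned about a negative `display.max_colwidth`, while recent versions reject it with a `ValueError`; the `try`/`except` below is just there so the sketch runs on either. `None` is the supported way to say "no limit", which is also why I'd rather use it than an arbitrary cap like 1000.

```python
import pandas as pd

try:
    # ChatGPT's original suggestion: -1 used to mean "no limit"
    pd.set_option("display.max_colwidth", -1)
except ValueError as err:
    # Recent pandas versions reject negative widths outright
    print(f"ValueError: {err}")

# None is the supported way to turn off column-width truncation
pd.set_option("display.max_colwidth", None)
print(pd.get_option("display.max_colwidth"))  # None
```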