<a href="https://colab.research.google.com/github/josamontiel/boston-median-house-prices/blob/main/boston_median_home_prices_2021_2022.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [138]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from pandas import read_csv # Import the read_csv function directly
                            # so it can be called without the pd. prefix
%matplotlib inline
from pylab import *
from natsort import index_natsorted
import seaborn as sns
sns.set_theme(style="ticks")
import sqlite3

# Connect to a SQLite database with sqlite3
connection = sqlite3.connect('mydatabase.db')
crsr_for_database = connection.cursor()
print("Connected to 'mydatabase.db'")

# Load the three datasets used in this project,
# one DataFrame per CSV file

single_family_data = read_csv('/content/datasets/single_family_home.csv')
condo_data = read_csv('/content/datasets/median_condo_price.csv')
black_and_latino_data = read_csv('/content/datasets/black_and_latino_mortgage_rates.csv')

# Display every float with 2 decimal places (this pandas option is global,
# so it applies to all float columns, including '% Change')
pd.options.display.float_format = '{:.2f}'.format

# Prints a message to show that everything is loaded up
print("All systems GO!")

Connected to 'mydatabase.db'
All systems GO!
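The setup cell opens a SQLite connection that this section does not use yet. A minimal sketch of how a DataFrame could be written to SQLite and queried back, assuming an in-memory database (`:memory:`), a hypothetical table name `single_family`, and shortened hypothetical column names; the three rows are copied from the single-family data shown in this notebook.

```python
import sqlite3

import pandas as pd

# Three rows copied from the single-family data; the column names are
# hypothetical short versions that are easier to reference in SQL.
homes = pd.DataFrame({
    "municipality": ["Boston", "Cambridge", "Lowell"],
    "price_2021": [3462500, 1537500, 415000],
    "price_2022": [4400000, 1775000, 439000],
})

# An in-memory database keeps the sketch self-contained;
# the notebook itself connects to a file on disk.
connection = sqlite3.connect(":memory:")

# Write the DataFrame to a table, replacing it if it already exists.
homes.to_sql("single_family", connection, index=False, if_exists="replace")

# Query it back: towns whose median price rose by more than $200,000.
query = """
    SELECT municipality, price_2022 - price_2021 AS increase
    FROM single_family
    WHERE price_2022 - price_2021 > 200000
    ORDER BY increase DESC
"""
result = pd.read_sql_query(query, connection)
print(result)
connection.close()
```

Keeping the SQL column names free of spaces and symbols avoids having to quote names such as `% Change` in every query.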


In [139]:
# Some formatting before we begin

# Convert the values in the '% Change' column to floats.
# The trailing '%' symbol has to be removed first; casting with
# astype(str) keeps this cell safe to re-run once the column is numeric.
single_family_data['% Change'] = (
    single_family_data['% Change'].astype(str).str.rstrip('%').astype(float)
)


# Display every float with 2 decimal places (global pandas option)
pd.options.display.float_format = '{:.2f}'.format
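A slightly more defensive version of the percent conversion, as a sketch: the toy series mimics the raw '% Change' strings (values taken from this notebook's output), and the helper `percent_to_float` is a hypothetical name. It strips the trailing '%' with `str.rstrip` and parses with `pd.to_numeric`. Because it casts to `str` first, running it again on an already-numeric column is harmless, whereas `.str[:-1]` raises an `AttributeError` on a float column and would chop a digit off any value that lacks a '%'.

```python
import pandas as pd

# Toy column shaped like the raw '% Change' values (strings ending in '%').
raw = pd.Series(["27.1%", "15.4%", "-21.2%"])


def percent_to_float(series: pd.Series) -> pd.Series:
    """Strip a trailing '%' (if present) and convert to float."""
    return pd.to_numeric(series.astype(str).str.rstrip("%"))


once = percent_to_float(raw)
twice = percent_to_float(once)  # re-running neither fails nor changes values
print(once.tolist())
```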

## Single-Family Median Price Data

In this section we comb through the single-family median sale price dataset, which covers January to June of 2021 and the same months of 2022.

We are not testing a particular hypothesis; the aim is simply to present the data in a way that makes sense to the end user.

In [140]:
single_family_data.dtypes

Municipality                           object
Communities Type                       object
Median Sale Price, Jan - June 2021      int64
Median Sale Price, Jan - June 2022      int64
% Change                              float64
Notes                                  object
dtype: object

In [141]:
single_family_data.describe()

| | Median Sale Price, Jan - June 2021 | Median Sale Price, Jan - June 2022 | % Change |
|---|---|---|---|
| count | 147.0 | 147.0 | 147.0 |
| mean | 721655.58 | 805912.24 | 10.79 |
| std | 380077.98 | 474227.22 | 9.58 |
| min | 347500.0 | 370250.0 | -21.2 |
| 25% | 491250.0 | 547000.0 | 5.65 |
| 50% | 615000.0 | 660000.0 | 11.2 |
| 75% | 801250.0 | 865000.0 | 15.6 |
| max | 3462500.0 | 4400000.0 | 44.4 |
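The `mean` row for '% Change' is an unweighted average of the per-town changes, which is not the same as the percentage change of the mean prices. The latter works out to a 2021-price-weighted average of the per-town changes, so expensive towns count for more. A quick check using the rounded figures from the table above:

```python
# Mean median-sale prices and mean '% Change', copied from describe().
mean_2021 = 721655.58
mean_2022 = 805912.24
mean_of_pct_changes = 10.79

# Percentage change of the averages: equivalent to weighting each town's
# change by its 2021 price, since sum(p21 * r) = sum(p22 - p21).
pct_change_of_means = (mean_2022 - mean_2021) / mean_2021 * 100
print(round(pct_change_of_means, 2))
```

The result, roughly 11.68%, sits above the unweighted 10.79%, which suggests that on average the higher-priced towns saw somewhat larger percentage gains (Boston at 27.1% and Brookline at 37.4% in the rows shown later are examples).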


In [142]:

pd.unique(single_family_data['Communities Type'])


array(['Metro Core Communities', 'Regional Urban Centers',
       'Streetcar Suburbs', 'Developing Suburbs', 'Maturing Suburbs',
       'Rural Towns'], dtype=object)
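With six community types available, one natural summary is the number of towns and the median '% Change' per type. As a sketch, the frame below holds only the ten rows shown by `head()` and `tail()` in this notebook, so 'Streetcar Suburbs' is missing and the groups are far too small to draw conclusions from; on the full data the same `groupby` call would apply to `single_family_data` directly.

```python
import pandas as pd

# The ten rows shown by head() and tail() in this notebook.
sample = pd.DataFrame({
    "Municipality": ["Boston", "Cambridge", "Lowell", "Brockton", "Quincy",
                     "Essex", "Dunstable", "Nahant", "Ashby", "Plympton"],
    "Communities Type": ["Metro Core Communities", "Metro Core Communities",
                         "Regional Urban Centers", "Regional Urban Centers",
                         "Regional Urban Centers", "Developing Suburbs",
                         "Developing Suburbs", "Maturing Suburbs",
                         "Rural Towns", "Developing Suburbs"],
    "% Change": [27.1, 15.4, 5.8, 13.9, 5.8, 7.1, 11.7, 28.0, 5.8, 7.7],
})

# Number of towns per community type and the median % change in each,
# sorted from the largest median change to the smallest.
summary = (
    sample.groupby("Communities Type")["% Change"]
    .agg(["count", "median"])
    .sort_values("median", ascending=False)
)
print(summary)
```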

In [143]:
# Check for missing values before dropping anything.
# The only 'NaN' values in this dataset are in the 'Notes' column
# (146 of the 147 rows); the other two datasets were complete
# and did not need to be amended.

# Count the 'NaN' values in each column
print(f"Total number of 'NaN' inputs: \n\n{single_family_data.isna().sum()}\n")

# isnull() is an alias of isna(), so these counts are identical
print(f"Total number of 'Null' inputs: \n{single_family_data.isnull().sum()}\n")

# Print all of the column names
print(f"\n{single_family_data.columns}")

# Remove the 'Notes' column, which is empty in all but one row.
# errors='ignore' avoids a KeyError if this cell is re-run after the column is gone.
single_family_data = single_family_data.drop(columns=['Notes'], errors='ignore')

Total number of 'NaN' inputs: 

Municipality                            0
Communities Type                        0
Median Sale Price, Jan - June 2021      0
Median Sale Price, Jan - June 2022      0
% Change                                0
Notes                                 146
dtype: int64

Total number of 'Null' inputs: 
Municipality                            0
Communities Type                        0
Median Sale Price, Jan - June 2021      0
Median Sale Price, Jan - June 2022      0
% Change                                0
Notes                                 146
dtype: int64


Index(['Municipality', 'Communities Type',
       'Median Sale Price, Jan - June 2021',
       'Median Sale Price, Jan - June 2022', '% Change', 'Notes'],
      dtype='object')
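One subtlety with the 'Notes' column: because one of its 147 values is present, `dropna(axis=1, how="all")` would not remove it. A sketch of two alternatives on a toy frame (the town names and note text are hypothetical): a missing-share threshold, where the 0.5 cut-off is an arbitrary assumption, and `drop(..., errors="ignore")`, which can be re-run safely.

```python
import numpy as np
import pandas as pd

# Toy frame: 'Notes' is empty in every row but one, like the real column.
df = pd.DataFrame({
    "Municipality": ["A", "B", "C", "D"],
    "Notes": [np.nan, np.nan, "see footnote", np.nan],
})

# how="all" only drops columns that are entirely missing, so 'Notes' stays.
print(df.dropna(axis=1, how="all").columns.tolist())

# Keep only columns whose share of missing values is at most the cut-off.
MAX_MISSING_SHARE = 0.5  # arbitrary example threshold
kept = df.loc[:, df.isna().mean() <= MAX_MISSING_SHARE]
print(kept.columns.tolist())

# drop(..., errors="ignore") does nothing once the column is already gone.
once = df.drop(columns=["Notes"], errors="ignore")
twice = once.drop(columns=["Notes"], errors="ignore")
print(twice.columns.tolist())
```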


In [144]:
# Check the columns after deleting the 'Notes' column
print(single_family_data.columns)

Index(['Municipality', 'Communities Type',
       'Median Sale Price, Jan - June 2021',
       'Median Sale Price, Jan - June 2022', '% Change'],
      dtype='object')


In [145]:
# The '%' character must be stripped from every value in the '% Change' column
# before it can be converted to float; this is handled in the formatting cell (In [139])

In [146]:
# Show the column names of each dataset to get a little
# more insight into the values we will be seeing
def show_column_names():
    datasets = {
        "Single family data": single_family_data,
        "Condo data": condo_data,
        "Black and Latino data": black_and_latino_data,
    }
    for label, df in datasets.items():
        print(f"\n{label}:\n")
        for col in df.columns:
            print(col)

show_column_names()


Single family data:

Municipality
Communities Type
Median Sale Price, Jan - June 2021
Median Sale Price, Jan - June 2022
% Change

Condo data:

Town
Community Type
Condo.2021
Condo.2022
Condo.PercChange

Black and Latino data:

Municipality
Community Type
Percent of Home Loans to Black and Latino Buyers
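The printout shows that the datasets name their key columns differently ('Municipality' vs 'Town', 'Communities Type' vs 'Community Type'), so the names would need aligning before any join. A sketch, assuming town names are spelled identically across the CSV files; the condo numbers are placeholders, not values from the dataset, and the condo frame's 'Community Type' column is dropped to avoid a duplicate column after the join.

```python
import pandas as pd

# Toy frames reusing the column names printed above.
single_family = pd.DataFrame({
    "Municipality": ["Boston", "Lowell"],
    "Communities Type": ["Metro Core Communities", "Regional Urban Centers"],
    "% Change": [27.1, 5.8],
})
condo = pd.DataFrame({
    "Town": ["Boston", "Lowell"],
    "Community Type": ["Metro Core Communities", "Regional Urban Centers"],
    "Condo.PercChange": [1.0, 2.0],  # placeholder values, not real data
})

# Align the key column name and drop the redundant type column.
condo = condo.rename(columns={"Town": "Municipality"}).drop(columns=["Community Type"])

# validate="one_to_one" raises if a town appears twice on either side.
combined = single_family.merge(condo, on="Municipality", how="inner",
                               validate="one_to_one")
print(combined.columns.tolist())
```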


#### Showing the Head and Tail

In [147]:
# Show the first 5 rows of data.
# No sorting has been applied yet, so rows appear
# in the order they were entered in the CSV file

single_family_data.head()

| | Municipality | Communities Type | Median Sale Price, Jan - June 2021 | Median Sale Price, Jan - June 2022 | % Change |
|---|---|---|---|---|---|
| 0 | Boston | Metro Core Communities | 3462500 | 4400000 | 27.1 |
| 1 | Cambridge | Metro Core Communities | 1537500 | 1775000 | 15.4 |
| 2 | Lowell | Regional Urban Centers | 415000 | 439000 | 5.8 |
| 3 | Brockton | Regional Urban Centers | 377500 | 430000 | 13.9 |
| 4 | Quincy | Regional Urban Centers | 605000 | 640000 | 5.8 |
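The '% Change' column can be cross-checked against the two price columns with (2022 price - 2021 price) / 2021 price * 100. A sketch on the five rows above, rounding to one decimal place to match the published figures; the 0.05 tolerance used to flag disagreements is an assumption based on that rounding.

```python
import pandas as pd

# The five rows shown by head() above.
first_rows = pd.DataFrame({
    "Municipality": ["Boston", "Cambridge", "Lowell", "Brockton", "Quincy"],
    "Median Sale Price, Jan - June 2021": [3462500, 1537500, 415000, 377500, 605000],
    "Median Sale Price, Jan - June 2022": [4400000, 1775000, 439000, 430000, 640000],
    "% Change": [27.1, 15.4, 5.8, 13.9, 5.8],
})

old = first_rows["Median Sale Price, Jan - June 2021"]
new = first_rows["Median Sale Price, Jan - June 2022"]

# Percentage change relative to the 2021 price, rounded like the source data.
recomputed = ((new - old) / old * 100).round(1)
print(recomputed.tolist())

# List any town where the published figure disagrees with the recomputation.
mismatch = (recomputed - first_rows["% Change"]).abs() > 0.05
print(first_rows.loc[mismatch, "Municipality"].tolist())
```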


In [148]:
# This command will show us the last 5 rows of data
single_family_data.tail()

| | Municipality | Communities Type | Median Sale Price, Jan - June 2021 | Median Sale Price, Jan - June 2022 | % Change |
|---|---|---|---|---|---|
| 142 | Essex | Developing Suburbs | 625750 | 670000 | 7.1 |
| 143 | Dunstable | Developing Suburbs | 617500 | 690000 | 11.7 |
| 144 | Nahant | Maturing Suburbs | 805000 | 1030000 | 28.0 |
| 145 | Ashby | Rural Towns | 350000 | 370250 | 5.8 |
| 146 | Plympton | Developing Suburbs | 487500 | 525000 | 7.7 |


In [149]:
# Sort towns from highest to lowest 2021 median sale price
single_family_start_price = single_family_data.sort_values(
    by="Median Sale Price, Jan - June 2021", ascending=False
)
print(single_family_start_price)
single_family_start_price.head()

    Municipality        Communities Type  Median Sale Price, Jan - June 2021  \
0         Boston  Metro Core Communities                             3462500   
100       Weston        Maturing Suburbs                             1852500   
13     Brookline       Streetcar Suburbs                             1850000   
41     Wellesley        Maturing Suburbs                             1665000   
1      Cambridge  Metro Core Communities                             1537500   
..           ...                     ...                                 ...   
113     Townsend      Developing Suburbs                              378000   
3       Brockton  Regional Urban Centers                              377500   
6       Lawrence  Regional Urban Centers                              360000   
145        Ashby             Rural Towns                              350000   
61       Wareham      Developing Suburbs                              347500   

     Median Sale Price, Jan - June 2022

| | Municipality | Communities Type | Median Sale Price, Jan - June 2021 | Median Sale Price, Jan - June 2022 | % Change |
|---|---|---|---|---|---|
| 0 | Boston | Metro Core Communities | 3462500 | 4400000 | 27.1 |
| 100 | Weston | Maturing Suburbs | 1852500 | 2182500 | 17.8 |
| 13 | Brookline | Streetcar Suburbs | 1850000 | 2542000 | 37.4 |
| 41 | Wellesley | Maturing Suburbs | 1665000 | 2055000 | 23.4 |
| 1 | Cambridge | Metro Core Communities | 1537500 | 1775000 | 15.4 |
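Sorting and then calling `head()` can be done in one step with `DataFrame.nlargest`, which returns the same rows when there are no ties. A sketch using the five rows above; re-ranking the same towns by '% Change' puts Brookline (37.4%) ahead of Boston despite Boston's higher price.

```python
import pandas as pd

# Rows copied from the sorted output above.
homes = pd.DataFrame({
    "Municipality": ["Boston", "Weston", "Brookline", "Wellesley", "Cambridge"],
    "Median Sale Price, Jan - June 2021": [3462500, 1852500, 1850000, 1665000, 1537500],
    "Median Sale Price, Jan - June 2022": [4400000, 2182500, 2542000, 2055000, 1775000],
    "% Change": [27.1, 17.8, 37.4, 23.4, 15.4],
})

price_2021 = "Median Sale Price, Jan - June 2021"

# Top three by 2021 price in one call, and the sort-then-head equivalent.
top_2021 = homes.nlargest(3, price_2021)
sorted_head = homes.sort_values(price_2021, ascending=False).head(3)
print(top_2021["Municipality"].tolist())

# Ranking the same towns by growth instead of price.
by_growth = homes.sort_values("% Change", ascending=False)
print(by_growth["Municipality"].tolist())
```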
