# **Cost of Living Data**

## **Data Exploration and Analysis**



In [8]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.preprocessing import LabelEncoder

In [3]:
costOfLivingFilepath = os.path.join(os.getcwd(), 'datasets', 'CostOfLivingIndex2021.csv')

costOfLivingDF = pd.read_csv(costOfLivingFilepath, low_memory=False)
costOfLivingDF.columns.to_list()

['City', 'State', 'Cost of Living Index']

#### 1. Check for missing values ####

In [4]:
costOfLivingDF.isnull().sum()

City                    0
State                   0
Cost of Living Index    0
dtype: int64

#### 2. Formatting Data ####
- City and State are categorical variables and need to be encoded
- The Cost of Living Index is numerical and should be treated as a continuous feature
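
The two points above can be sketched as a quick type check. This is a minimal illustration, not the notebook's own code: `sample_df` is a hypothetical stand-in for `costOfLivingDF`, and its index values are stored as strings on purpose to show the conversion step.

```python
import pandas as pd

# Hypothetical stand-in for costOfLivingDF; the three rows mirror the notebook output.
sample_df = pd.DataFrame({
    "City": ["Abilene", "Adrian", "Akron"],
    "State": ["TX", "MI", "OH"],
    # Stored as strings here (an assumption) to demonstrate the numeric conversion.
    "Cost of Living Index": ["89.1", "90.5", "89.4"],
})

# Convert the index to a float column; values that cannot be parsed become NaN.
sample_df["Cost of Living Index"] = pd.to_numeric(
    sample_df["Cost of Living Index"], errors="coerce"
)

# Mark the text columns as categorical so the later encoding step is explicit.
for col in ["City", "State"]:
    sample_df[col] = sample_df[col].astype("category")

print(sample_df.dtypes)
```

If the CSV already parses the index as `float64`, the `pd.to_numeric` call is a no-op and only the categorical conversion has any effect.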

#### 3. Feature Engineering ####
Combine City and State into a single feature, Location.

In [5]:
costOfLivingDF['Location'] = costOfLivingDF['City'] + ", " + costOfLivingDF['State']

In [6]:
costOfLivingDF['Location']

0         Abilene, TX
1          Adrian, MI
2           Akron, OH
3      Alamogordo, NM
4          Albany, GA
            ...      
505      Wheeling, WV
506    New London, CT
507        Daphne, AL
508      Victoria, TX
509      Aberdeen, WA
Name: Location, Length: 510, dtype: object
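
One reason to combine the two columns is that a city name alone can be ambiguous: the dataset contains an Albany in GA, OR, and NY. The sketch below checks this on a hypothetical four-row frame (`sample_df` is not the real dataset), assuming the same concatenation as the cell above.

```python
import pandas as pd

# Hypothetical rows; the city/state pairs are taken from the notebook output.
sample_df = pd.DataFrame({
    "City": ["Albany", "Albany", "Albany", "Akron"],
    "State": ["GA", "OR", "NY", "OH"],
})

# Same concatenation as in the notebook cell.
sample_df["Location"] = sample_df["City"] + ", " + sample_df["State"]

# City alone repeats, while the combined Location does not.
city_has_duplicates = bool(sample_df["City"].duplicated().any())
location_is_unique = bool(sample_df["Location"].is_unique)

print(city_has_duplicates)
print(location_is_unique)
```

Running the same two checks on `costOfLivingDF` would confirm whether every one of the 510 rows has a distinct Location.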

#### 4. Convert Categorical Features ####
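Location is a categorical variable, so it must be converted to a numerical format with either one-hot encoding or label encoding. Label encoding is used here; note that `LabelEncoder` assigns integer codes in sorted order of the labels, so the codes carry no meaningful ranking.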

In [9]:
# Label encoding
label_encoder = LabelEncoder()
costOfLivingDF['Location_encoded'] = label_encoder.fit_transform(costOfLivingDF['Location'])
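
For comparison, here is a rough sketch of both encoding options on three hypothetical locations. The `locations` series is illustrative only; `pd.get_dummies` is shown as one possible way to one-hot encode, not as the method this notebook uses.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample; the names come from the notebook output.
locations = pd.Series(["Albany, OR", "Albany, GA", "Albany, NY"], name="Location")

# Label encoding: codes follow the sorted order of the unique labels.
encoder = LabelEncoder()
codes = encoder.fit_transform(locations)
print(codes.tolist())

# The fitted encoder can map the codes back to the original strings.
decoded = encoder.inverse_transform(codes)
print(decoded.tolist())

# One-hot encoding: one indicator column per distinct location.
one_hot = pd.get_dummies(locations, prefix="Location")
print(one_hot.shape)
```

With 510 distinct locations, one-hot encoding would add 510 columns, which is one practical reason to prefer a single label-encoded column. The trade-off is that the integer codes imply an ordering that has no real meaning.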

In [12]:
costOfLivingDF.head(10)

|   | City | State | Cost of Living Index | Location | Location_encoded |
|---|---|---|---|---|---|
| 0 | Abilene | TX | 89.1 | Abilene, TX | 1 |
| 1 | Adrian | MI | 90.5 | Adrian, MI | 2 |
| 2 | Akron | OH | 89.4 | Akron, OH | 3 |
| 3 | Alamogordo | NM | 85.8 | Alamogordo, NM | 4 |
| 4 | Albany | GA | 87.3 | Albany, GA | 5 |
| 5 | Albany | OR | 105.4 | Albany, OR | 7 |
| 6 | Albany | NY | 100.1 | Albany, NY | 6 |
| 7 | Albertville | AL | 90.9 | Albertville, AL | 8 |
| 8 | Albuquerque | NM | 92.9 | Albuquerque, NM | 9 |
| 9 | Alexandria | LA | 86.2 | Alexandria, LA | 10 |
