## Notebook Objectives

The objective of this notebook is to transform Walmart location data for the United States and Canada, downloaded from [poi factory](http://www.poi-factory.com/poifiles), for later use in the RV Nav API. Because the CSV is relatively small, it was partly cleaned in MS Excel beforehand. MS Excel is still a great tool for data cleaning!

In [1]:
import pandas as pd
import numpy as np

In [3]:
# The CSV has no header row, so column names are supplied manually
df = pd.read_csv('../Data/Walmart_United States & Canada.csv',
                 header=None,
                 names=['Longitude', 'Latitude', 'Store Type',
                        'Store ID', 'Gas', 'Address', 'City', 'State',
                        'Zip Code', 'Parking', 'Phone Number'])
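One thing to watch when loading this file: `read_csv` infers column types, so a US ZIP code with a leading zero can come back as an integer with the zero dropped whenever the values parse as numbers. A minimal sketch, using two hypothetical rows (made-up coordinates and store IDs) in an in-memory CSV instead of the real file, shows how the `dtype` argument keeps ZIP codes as text:

```python
import io

import pandas as pd

# Two hypothetical rows, using a subset of the real file's column order
sample = io.StringIO(
    "-71.3,42.6,Walmart SC,#9999,01852\n"
    "-149.9,61.2,Walmart SC,#2070,99503\n"
)
cols = ['Longitude', 'Latitude', 'Store Type', 'Store ID', 'Zip Code']

# Default type inference parses the ZIP column as integers and loses the leading zero
inferred = pd.read_csv(sample, header=None, names=cols)

sample.seek(0)
# Forcing the ZIP column to str keeps every code exactly as written in the file
as_text = pd.read_csv(sample, header=None, names=cols, dtype={'Zip Code': str})

print(inferred['Zip Code'].tolist())  # [1852, 99503]
print(as_text['Zip Code'].tolist())   # ['01852', '99503']
```

In the real file the Canadian postal codes already force the column to `object` (see the `df.info()` output), but passing `dtype` explicitly avoids depending on that.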

In [4]:
df.head()

| | Longitude | Latitude | Store Type | Store ID | Gas | Address | City | State | Zip Code | Parking | Phone Number |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -149.881507 | 61.192165 | Walmart SC | #2070 | | 3101 A St | Anchorage | AK | 99503 | (NOP) | (907) 563-5900 |
| 1 | -149.74255 | 61.21274 | Walmart SC | #4359 | | 7405 Debarr Rd | Anchorage | AK | 99504 | (NOP) | (907) 339-9039 |
| 2 | -149.868161 | 61.140234 | Walmart SC | #2071 | | 8900 Old Seward Hwy | Anchorage | AK | 99515 | (NOP) | (907) 344-5300 |
| 3 | -149.535324 | 61.30978 | Walmart SC | #2188 | | 18600 Eagle River Rd | Eagle River | AK | 99577 | (NOP) | (907) 694-9780 |
| 4 | -147.688971 | 64.857356 | Walmart SC | #2722 | | 537 Johansen Expy | Fairbanks | AK | 99701 | | (907) 451-9900 |


In [6]:
# Normalize the stray 'Gas.' label (trailing period) to 'Gas'
df['Gas'] = df['Gas'].replace('Gas.', 'Gas')
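`Series.replace` with a plain string (and the default `regex=False`) matches whole cell values, not substrings, so only cells that are exactly `'Gas.'` change. A small sketch on a hypothetical Series contrasts it with the substring replacement done by `Series.str.replace`; the `'Gas./Diesel'` value is invented for illustration:

```python
import pandas as pd

# Hypothetical Gas values, including the stray 'Gas.' spelling
gas = pd.Series(['Gas.', 'Gas', 'Gas/Diesel', 'Gas./Diesel'])

# Whole-value match: only the cell equal to 'Gas.' is rewritten
exact = gas.replace('Gas.', 'Gas')

# Substring match: every occurrence of 'Gas.' inside a string is rewritten
substring = gas.str.replace('Gas.', 'Gas', regex=False)

print(exact.tolist())      # ['Gas', 'Gas', 'Gas/Diesel', 'Gas./Diesel']
print(substring.tolist())  # ['Gas', 'Gas', 'Gas/Diesel', 'Gas/Diesel']
```

For this column the whole-value form is the safer choice, since it cannot alter labels that merely contain the pattern.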

In [7]:
df['Gas'].value_counts()

Gas/Diesel    2148
Gas           1296
Name: Gas, dtype: int64

In [8]:
# Replace NaN (blank Gas entries) with 'No Gas'
df['Gas'] = df['Gas'].fillna('No Gas')

In [9]:
# Verify results
df['Gas'].value_counts()

No Gas        2916
Gas/Diesel    2148
Gas           1296
Name: Gas, dtype: int64
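The first `value_counts()` accounts for only 3,444 of the 6,360 rows (2,148 + 1,296) because `value_counts()` skips NaN by default; the remaining 2,916 rows are the blanks that became `'No Gas'`. A short sketch on hypothetical data shows the `dropna=False` option, which would have revealed the blanks up front, and checks that `fillna` matches the `replace(np.nan, ...)` form:

```python
import numpy as np
import pandas as pd

# Hypothetical Gas column with two blank (NaN) entries
gas = pd.Series(['Gas/Diesel', np.nan, 'Gas', np.nan])

# By default value_counts() ignores NaN, so the blanks are not tallied
print(gas.value_counts().sum())              # 2
# dropna=False counts the missing values as their own category
print(gas.value_counts(dropna=False).sum())  # 4

# fillna produces the same result as replace(np.nan, ...) for this column
filled = gas.fillna('No Gas')
print(filled.equals(gas.replace(np.nan, 'No Gas')))  # True
```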

In [10]:
df.head()

| | Longitude | Latitude | Store Type | Store ID | Gas | Address | City | State | Zip Code | Parking | Phone Number |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -149.881507 | 61.192165 | Walmart SC | #2070 | No Gas | 3101 A St | Anchorage | AK | 99503 | (NOP) | (907) 563-5900 |
| 1 | -149.74255 | 61.21274 | Walmart SC | #4359 | No Gas | 7405 Debarr Rd | Anchorage | AK | 99504 | (NOP) | (907) 339-9039 |
| 2 | -149.868161 | 61.140234 | Walmart SC | #2071 | No Gas | 8900 Old Seward Hwy | Anchorage | AK | 99515 | (NOP) | (907) 344-5300 |
| 3 | -149.535324 | 61.30978 | Walmart SC | #2188 | No Gas | 18600 Eagle River Rd | Eagle River | AK | 99577 | (NOP) | (907) 694-9780 |
| 4 | -147.688971 | 64.857356 | Walmart SC | #2722 | No Gas | 537 Johansen Expy | Fairbanks | AK | 99701 | | (907) 451-9900 |


In [11]:
# Clean/correct parking values: '(NOP)' becomes 'No Parking', blanks become 'Parking Available'
df['Parking'] = df['Parking'].replace('(NOP)', 'No Parking')
df['Parking'] = df['Parking'].fillna('Parking Available')
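Unlike the Gas column, the Parking column is not re-counted after cleaning. A quick guard that only the two expected labels remain would catch any raw flag other than `(NOP)`, which the exact-match `replace` would otherwise leave untouched. A minimal sketch on hypothetical raw values:

```python
import numpy as np
import pandas as pd

# Hypothetical raw Parking values: either the '(NOP)' flag or blank
parking = pd.Series(['(NOP)', np.nan, '(NOP)', np.nan, np.nan])

cleaned = parking.replace('(NOP)', 'No Parking').fillna('Parking Available')

# Fail loudly if any value was not mapped to one of the two expected labels
allowed = {'No Parking', 'Parking Available'}
unexpected = set(cleaned.unique()) - allowed
assert not unexpected, f"Unmapped parking values: {unexpected}"

counts = cleaned.value_counts()
print(counts['No Parking'], counts['Parking Available'])  # 2 3
```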

In [14]:
# Check Store Type Values
df['Store Type'].value_counts()

Walmart  SC           3568
Murphy USA            1121
Wm Nbrhd Mkt           686
Sam's Club             591
Walmart                364
Wm Pharm/Clinic         16
Walmart Pickup           5
WM Nbrhd Mkt             5
Walmart Superenter       1
Walmart SC               1
Walmart Pharmacy         1
wm Nbrhd Mkt             1
Name: Store Type, dtype: int64

In [15]:
# Clean/correct store types: map each variant spelling to one canonical label
store_type_fixes = {
    'Walmart  SC': 'Walmart Supercenter',
    'Walmart SC': 'Walmart Supercenter',
    'Walmart Superenter': 'Walmart Supercenter',
    'Wm Nbrhd Mkt': 'Walmart Neighborhood Market',
    'WM Nbrhd Mkt': 'Walmart Neighborhood Market',
    'wm Nbrhd Mkt': 'Walmart Neighborhood Market',
    'Wm Pharm/Clinic': 'Walmart Pharmacy/Clinic',
}
df['Store Type'] = df['Store Type'].replace(store_type_fixes)
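Most of the Store Type variants differ only in letter case or extra spaces. As an alternative (not what this notebook does), a sketch that normalizes those differences before looking up a canonical name needs one mapping entry per store type instead of one per spelling. The sample labels and the `canonical` dictionary below are hypothetical:

```python
import pandas as pd

# Hypothetical raw labels showing the kinds of variants seen in the data
raw = pd.Series(['Walmart  SC', 'Walmart SC', 'WM Nbrhd Mkt', 'wm Nbrhd Mkt', 'Murphy USA'])

# Canonical names keyed by a normalized form (lower case, single spaces)
canonical = {
    'walmart sc': 'Walmart Supercenter',
    'wm nbrhd mkt': 'Walmart Neighborhood Market',
}

# Collapse runs of whitespace and lower-case each label before the lookup
key = raw.str.split().str.join(' ').str.lower()

# Labels with no mapping keep their original spelling
cleaned = key.map(canonical).fillna(raw)

print(cleaned.tolist())
# ['Walmart Supercenter', 'Walmart Supercenter', 'Walmart Neighborhood Market',
#  'Walmart Neighborhood Market', 'Murphy USA']
```

Normalization does not fix genuine typos such as `'Walmart Superenter'`; those still need their own entry in the mapping.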

In [16]:
# Verify corrections
df['Store Type'].value_counts()

Walmart Supercenter            3570
Murphy USA                     1121
Walmart Neighborhood Market     692
Sam's Club                      591
Walmart                         364
Walmart Pharmacy/Clinic          16
Walmart Pickup                    5
Walmart Pharmacy                  1
Name: Store Type, dtype: int64

In [18]:
# Check that there are no null values, especially in the Longitude/Latitude columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6360 entries, 0 to 6359
Data columns (total 11 columns):
Longitude       6360 non-null float64
Latitude        6360 non-null float64
Store Type      6360 non-null object
Store ID        6360 non-null object
Gas             6360 non-null object
Address         6360 non-null object
City            6360 non-null object
State           6360 non-null object
Zip Code        6360 non-null object
Parking         6360 non-null object
Phone Number    6360 non-null object
dtypes: float64(2), object(9)
memory usage: 546.6+ KB
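`df.info()` confirms that no coordinates are missing, but not that they are plausible. A minimal sketch of an extra check, assuming the standard ranges of -90 to 90 for latitude and -180 to 180 for longitude, flags rows where the two columns might have been swapped. The sample frame is hypothetical, with its last row deliberately swapped:

```python
import pandas as pd

# Hypothetical coordinates; the last row has latitude and longitude swapped
coords = pd.DataFrame({
    'Longitude': [-149.881507, -147.688971, 61.21274],
    'Latitude': [61.192165, 64.857356, -149.74255],
})

# Total number of missing values across both columns
print(int(coords.isna().sum().sum()))  # 0

# Flag rows whose values fall outside the valid latitude/longitude ranges
out_of_range = ~coords['Latitude'].between(-90, 90) | ~coords['Longitude'].between(-180, 180)
bad = coords[out_of_range]
print(bad.index.tolist())  # [2]
```

Because every location in this dataset is in the US or Canada, a stricter check (for example, negative longitudes only) would also be reasonable.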


In [19]:
# One last verification
df.head()

| | Longitude | Latitude | Store Type | Store ID | Gas | Address | City | State | Zip Code | Parking | Phone Number |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -149.881507 | 61.192165 | Walmart Supercenter | #2070 | No Gas | 3101 A St | Anchorage | AK | 99503 | No Parking | (907) 563-5900 |
| 1 | -149.74255 | 61.21274 | Walmart Supercenter | #4359 | No Gas | 7405 Debarr Rd | Anchorage | AK | 99504 | No Parking | (907) 339-9039 |
| 2 | -149.868161 | 61.140234 | Walmart Supercenter | #2071 | No Gas | 8900 Old Seward Hwy | Anchorage | AK | 99515 | No Parking | (907) 344-5300 |
| 3 | -149.535324 | 61.30978 | Walmart Supercenter | #2188 | No Gas | 18600 Eagle River Rd | Eagle River | AK | 99577 | No Parking | (907) 694-9780 |
| 4 | -147.688971 | 64.857356 | Walmart Supercenter | #2722 | No Gas | 537 Johansen Expy | Fairbanks | AK | 99701 | Parking Available | (907) 451-9900 |


Our dataset is now ready to be used in our API!

In [20]:
df.to_csv('Walmart_API.csv')
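By default `to_csv` also writes the DataFrame's integer index as an unnamed first column. Whether the RV Nav API expects that column is not stated here. A sketch on a hypothetical two-row frame shows what reading the file back looks like, and the two usual ways to handle it:

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the cleaned data
df_small = pd.DataFrame({'Store ID': ['#2070', '#4359'], 'City': ['Anchorage', 'Anchorage']})

# to_csv writes the index by default; reading it back yields an 'Unnamed: 0' column
with_index = pd.read_csv(io.StringIO(df_small.to_csv()))
print(with_index.columns.tolist())  # ['Unnamed: 0', 'Store ID', 'City']

# Option 1: drop the index when writing
no_index = pd.read_csv(io.StringIO(df_small.to_csv(index=False)))
print(no_index.columns.tolist())    # ['Store ID', 'City']

# Option 2: keep the index in the file and tell read_csv it is the index
restored = pd.read_csv(io.StringIO(df_small.to_csv()), index_col=0)
print(restored.columns.tolist())    # ['Store ID', 'City']
```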