**Data extraction was done manually by downloading the CSV files from the municipal open data portals.**

In [13]:
import pandas as pd
import numpy as np

In [14]:
df = pd.read_csv("calgary_business_licences.csv")

**Drop the columns that are not needed.**

In [15]:
drop_cols = ['COMDISTNM','JOBSTATUSDESC','longitude','latitude','Point']
df = df.drop(drop_cols,axis=1)
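As called above, `DataFrame.drop` raises a `KeyError` if any listed column is missing, which can happen if a portal changes its export schema. Here is a minimal sketch on a small toy frame (the values are made up for illustration) that passes `errors="ignore"` so absent columns are skipped:

```python
import pandas as pd

# Toy frame standing in for the Calgary export (made-up values)
df = pd.DataFrame({
    "TRADENAME": ["ROOM 210"],
    "longitude": [-114.07],
    "latitude": [51.04],
})

drop_cols = ["COMDISTNM", "JOBSTATUSDESC", "longitude", "latitude", "Point"]

# errors="ignore" skips names that are not present instead of raising KeyError
df = df.drop(columns=drop_cols, errors="ignore")
print(list(df.columns))  # ['TRADENAME']
```

The trade-off is that a typo in `drop_cols` is also silently ignored.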

**Rename columns.**

In [16]:
df = df.rename(columns={'TRADENAME':'Business Name','LICENCETYPES':'Category','JOBCREATED':'Issue Date','ADDRESS': 'Address'})
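By default `rename` silently ignores keys that are not in the frame, so a misspelled source column would leave the old name in place. Here is a minimal sketch on a toy frame (made-up values) that uses `errors="raise"` to turn that into a `KeyError`:

```python
import pandas as pd

# Toy frame with the raw Calgary column names (made-up values)
raw = pd.DataFrame({
    "TRADENAME": ["ROOM 210"],
    "LICENCETYPES": ["PERSONAL SERVICE"],
    "JOBCREATED": ["2015/09/16"],
    "ADDRESS": ["#210 815 1 ST SW"],
})

rename_map = {
    "TRADENAME": "Business Name",
    "LICENCETYPES": "Category",
    "JOBCREATED": "Issue Date",
    "ADDRESS": "Address",
}

# errors="raise" fails loudly if a source column is missing,
# instead of silently leaving it unrenamed
df = raw.rename(columns=rename_map, errors="raise")
print(list(df.columns))

try:
    raw.rename(columns={"NOT_A_COLUMN": "X"}, errors="raise")
except KeyError as exc:
    print("KeyError:", exc)
```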

**Reorder the columns in the dataframe. The ordering is arbitrary; I chose what felt most intuitive to read.**

In [17]:
# Select by name rather than position, so the result does not depend on the CSV's column order
df = df[['Business Name', 'Category', 'Issue Date', 'Address']]

In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35715 entries, 0 to 35714
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Business Name  35715 non-null  object
 1   Category       35715 non-null  object
 2   Issue Date     35715 non-null  object
 3   Address        35715 non-null  object
dtypes: object(4)
memory usage: 1.1+ MB


In [19]:
df

| | Business Name | Category | Issue Date | Address |
|---|---|---|---|---|
| 0 | ROOM 210 | PERSONAL SERVICE | 2015/09/16 | #210 815 1 ST SW |
| 1 | AIKIDO BOZANKAN | PERSONAL SERVICE | 2007/12/05 | 7004G 5 ST SE |
| 2 | CAREWELL PHARMACY | RETAIL DEALER - PREMISES | 2017/06/06 | #104 580 ACADIA DR SE |
| 3 | DUCKTOES COMPUTER SERVICES | MANUFACTURER | 2011/01/04 | 902 CENTRE ST NE |
| 4 | CDN CONTROLS | CONTRACTOR (NO PROVINCIAL LICENCE REQUIRED) | 2018/06/04 | #1610 311 6 AV SW |
| ... | ... | ... | ... | ... |
| 35710 | ST JAMES CORNER RESTAURANT AND IRISH PUB | ALCOHOL BEVERAGE SALES (DRINKING EST/RESTAURANT) | 2004/08/20 | 1219 1 ST SW |
| 35711 | OUTER SPACE STORAGE | WAREHOUSING | 2003/10/15 | #100 4279 120 AV SE |
| 35712 | EL VAPE | RETAIL DEALER - PREMISES | 2021/06/15 | 399 17 AV SW |
| 35713 | HAIRITAGE STUDIO (THE) | RETAIL DEALER - PREMISES | 2021/11/03 | #24 20 DOUGLAS WOODS DR SE |


**I chose to include the phone information from the Toronto dataset. To merge all three city datasets into one unified database, every table needs the same set of columns.**

**Add the missing columns.**

In [20]:
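# Columns present in other cities' data but not in Calgary's; filled with NaN
df.insert(0,'License Number',np.nan)
df.insert(4,'Expiry Date',np.nan)
df.insert(5,'Phone Number',np.nan)
df.insert(6,'Phone Ext.',np.nan)
df.insert(7,'Phone Type',np.nan)
# Tag every row with its source city so the merged table stays traceable
df.insert(9,'City','Calgary')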
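An alternative to the chain of positional `df.insert` calls is to define the target schema once and let `DataFrame.reindex` add and order the columns in one step. Here is a minimal sketch that assumes the ten-column layout used in this notebook is the shared schema. The `UNIFIED_COLUMNS` name is hypothetical and the frame values are made up:

```python
import pandas as pd

# Hypothetical shared schema for all city tables (column names from this notebook)
UNIFIED_COLUMNS = [
    "License Number", "Business Name", "Category", "Issue Date",
    "Expiry Date", "Phone Number", "Phone Ext.", "Phone Type",
    "Address", "City",
]

# Toy frame after renaming (made-up values)
df = pd.DataFrame({
    "Business Name": ["ROOM 210"],
    "Category": ["PERSONAL SERVICE"],
    "Issue Date": ["2015/09/16"],
    "Address": ["#210 815 1 ST SW"],
})

# reindex adds any missing columns (filled with NaN) and puts all columns in schema order
df = df.reindex(columns=UNIFIED_COLUMNS)
df["City"] = "Calgary"
print(df.columns.tolist())
```

Because the column list lives in one place, each city's notebook could reuse it, which avoids off-by-one mistakes in the `insert` positions.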

**Change the column dtypes to better match their values.**

In [21]:
df['Expiry Date'] = pd.to_datetime(df['Expiry Date'])
df["Issue Date"] = pd.to_datetime(df["Issue Date"])
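Here `pd.to_datetime` infers the date format. Below is a minimal sketch of making the parse explicit. It assumes every `Issue Date` in the export uses the `YYYY/MM/DD` layout seen in the preview. With `errors="coerce"`, any value that does not match becomes `NaT` instead of stopping the notebook. The sample values are made up:

```python
import pandas as pd

# Made-up sample values in the portal's YYYY/MM/DD layout, plus one bad entry
issue = pd.Series(["2015/09/16", "2007/12/05", "not a date"])

# An explicit format avoids ambiguous inference; errors="coerce" turns
# unparseable entries into NaT instead of raising
parsed = pd.to_datetime(issue, format="%Y/%m/%d", errors="coerce")
print(parsed)
print("unparsed rows:", parsed.isna().sum())
```

Counting the `NaT` values after parsing shows how many rows failed, which is worth checking before the merge.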

**Note that 'Issue Date' and 'Expiry Date' now have the datetime64 dtype.**

In [22]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35715 entries, 0 to 35714
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   License Number  0 non-null      float64       
 1   Business Name   35715 non-null  object        
 2   Category        35715 non-null  object        
 3   Issue Date      35715 non-null  datetime64[ns]
 4   Expiry Date     0 non-null      datetime64[ns]
 5   Phone Number    0 non-null      float64       
 6   Phone Ext.      0 non-null      float64       
 7   Phone Type      0 non-null      float64       
 8   Address         35715 non-null  object        
 9   City            35715 non-null  object        
dtypes: datetime64[ns](2), float64(4), object(4)
memory usage: 2.7+ MB


In [23]:
df

| | License Number | Business Name | Category | Issue Date | Expiry Date | Phone Number | Phone Ext. | Phone Type | Address | City |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | ROOM 210 | PERSONAL SERVICE | 2015-09-16 | NaT | | | | #210 815 1 ST SW | Calgary |
| 1 | | AIKIDO BOZANKAN | PERSONAL SERVICE | 2007-12-05 | NaT | | | | 7004G 5 ST SE | Calgary |
| 2 | | CAREWELL PHARMACY | RETAIL DEALER - PREMISES | 2017-06-06 | NaT | | | | #104 580 ACADIA DR SE | Calgary |
| 3 | | DUCKTOES COMPUTER SERVICES | MANUFACTURER | 2011-01-04 | NaT | | | | 902 CENTRE ST NE | Calgary |
| 4 | | CDN CONTROLS | CONTRACTOR (NO PROVINCIAL LICENCE REQUIRED) | 2018-06-04 | NaT | | | | #1610 311 6 AV SW | Calgary |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 35710 | | ST JAMES CORNER RESTAURANT AND IRISH PUB | ALCOHOL BEVERAGE SALES (DRINKING EST/RESTAURANT) | 2004-08-20 | NaT | | | | 1219 1 ST SW | Calgary |
| 35711 | | OUTER SPACE STORAGE | WAREHOUSING | 2003-10-15 | NaT | | | | #100 4279 120 AV SE | Calgary |
| 35712 | | EL VAPE | RETAIL DEALER - PREMISES | 2021-06-15 | NaT | | | | 399 17 AV SW | Calgary |
| 35713 | | HAIRITAGE STUDIO (THE) | RETAIL DEALER - PREMISES | 2021-11-03 | NaT | | | | #24 20 DOUGLAS WOODS DR SE | Calgary |


In [25]:
df.to_csv('/Users/graemebalint/Documents/Python/Jupyter Notebooks/Canadian Businesses/csv_reformatted/calgary_reformatted.csv',index=False)
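The stated goal is to merge all three city tables into one. Once each city has been written out with identical columns, stacking them row by row takes a single `pd.concat` call. Here is a minimal sketch that uses tiny made-up frames with a reduced three-column schema. In practice each frame would come from `pd.read_csv` on a reformatted file. The third city is a placeholder, since this section only names Calgary and Toronto:

```python
import pandas as pd

# Reduced schema for illustration; the real tables share the full column list
columns = ["Business Name", "Category", "City"]

# Made-up single-row frames standing in for the reformatted city CSVs
calgary = pd.DataFrame([["BUSINESS A", "PERSONAL SERVICE", "Calgary"]], columns=columns)
toronto = pd.DataFrame([["BUSINESS B", "RETAIL", "Toronto"]], columns=columns)
other = pd.DataFrame([["BUSINESS C", "WAREHOUSING", "OtherCity"]], columns=columns)  # placeholder city

# Fail early if any table deviates from the shared schema
for frame in (calgary, toronto, other):
    assert list(frame.columns) == columns

# Stack the tables row-wise; ignore_index gives the result a fresh 0..n-1 index
combined = pd.concat([calgary, toronto, other], ignore_index=True)
print(combined)
```

Checking the column lists before concatenating matters because `pd.concat` aligns on column names and fills any mismatched columns with NaN instead of raising an error.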