# Individual Milestone

James Huvenaars, 30031411

Data 604, L01

Nov. 20, 2023

Our research project delves into the intricate relationships between demographics, crime, and housing prices in Calgary, seeking to unveil crucial insights for informed decision-making, be it for personal choices or the development of effective policies. 

Understanding the interplay between demographics, crime rates, and housing prices is pivotal, as it can offer valuable perspectives for shaping individual decisions and formulating impactful policies. By unraveling these complex relationships, we aim to contribute to a deeper understanding of the factors influencing Calgary's socio-economic landscape.

Moreover, to enhance transparency and accountability, we have assigned specific team members to oversee each dataset. I am responsible for the demographics dataset, the 2019 Census by Community published by the City of Calgary through Open Calgary (Open Calgary, n.d.). The 2019 census was chosen because it was the most recent one with community-level information available.

My aim with this individual milestone is to develop queries that will supplement and support our final combined analysis. The guiding questions for that analysis are as follows.




Guiding Questions: 

What is the relationship between crime and property assessments? 
- Is there a connection between the historical property assessments and the types of crimes committed in different community districts? If so, what is it?
- Do areas with higher crime rates tend to have lower assessed property values? 
- Do areas with certain crime types tend to have different assessed property values?

What is the relationship between community demographics and property assessments? 
- What is the relationship between gender distribution and property assessments?
- Is there a relationship between population density and property assessments?
- Are property assessments in communities related to age demographics?

What is the relationship between demographics and crime rates? 
- Is crime in communities more prevalent when certain age groups are more present?
- Is there a relationship between property types and crime types/rates? 
- Is there a relationship between gender demographics and crime types/rates?


## Data Preparation

In [3]:
import pandas as pd
import csv
import mysql.connector
from mysql.connector import errorcode

In [4]:
#importing data
data = pd.read_csv('https://raw.githubusercontent.com/ethan2411/Data-603-604/main/604%20Data/Census_by_Community_2019_20231027.csv')
data.head()

Unnamed: 0,CLASS,CLASS_CODE,COMM_CODE,NAME,SECTOR,SRG,COMM_STRUCTURE,CNSS_YR,FOIP_IND,RES_CNT,...,OTHER_5_14,OTHER_15_19,OTHER_20_24,OTHER_25_34,OTHER_35_44,OTHER_45_54,OTHER_55_64,OTHER_65_74,OTHER_75,multipolygon
0,Residential,1,LEG,LEGACY,SOUTH,DEVELOPING,BUILDING OUT,2019,,6420,...,0,0,0,0,0,0,0,0,0,MULTIPOLYGON (((-114.021996041091 50.863078904...
1,Residential,1,HPK,HIGHLAND PARK,CENTRE,BUILT-OUT,1950s,2019,,3838,...,0,0,0,0,0,0,0,0,0,MULTIPOLYGON (((-114.0691626854784 51.09565033...
2,Residential,1,CNS,CORNERSTONE,NORTHEAST,DEVELOPING,2000s,2019,,2648,...,0,0,0,0,0,0,0,0,0,MULTIPOLYGON (((-113.91839732026011 51.1760690...
3,Residential,1,MON,MONTGOMERY,NORTHWEST,BUILT-OUT,1950s,2019,,4515,...,0,0,0,0,0,0,0,0,0,MULTIPOLYGON (((-114.16457918083577 51.0814533...
4,Residential,1,TEM,TEMPLE,NORTHEAST,BUILT-OUT,1960s/1970s,2019,,10977,...,0,0,0,0,0,0,0,0,0,MULTIPOLYGON (((-113.93512706147847 51.0960756...


In [5]:
#checking for null values in the dataframe and seeing if any can be dropped from our analysis. 

missing_val=data.isnull().sum()
non_missing_val= data.notnull().sum()
total_val=data.shape[0]
percentage_missing=missing_val/total_val*100

#finding the columns with any null values (with a percentage >0)
drop_columns= percentage_missing[percentage_missing > 0].index
drop_columns

Index(['SRG', 'FOIP_IND'], dtype='object')

The two columns that are missing data (have null values) are SRG and FOIP_IND.

FOIP_IND is defined as: Indicates results subject to Freedom of Information and Protection of Privacy Legislation. Freedom of Information and Protection of Privacy rules are applied to the data to ensure that no individual can be identified in any of the data released. (Open Calgary, n.d.)

SRG is defined as: Reflects the yearly development capacity or housing supply as outlined in the Suburban Residential Growth document, the valid values are: BUILT-OUT, DEVELOPING, NON RESIDENTIAL, and N/A. (Open Calgary, n.d.)

In order to properly import the dataframe into SQL we have to remove or deal with nulls. 

In this case, neither SRG nor FOIP_IND is relevant to our analysis based on our goals and guiding questions. Therefore, they can be dropped from our dataframe entirely. Since none of the other columns have null values, they will all be imported into MySQL.

Additionally, the column "multipolygon" was removed because it took up a large amount of storage and caused import problems due to its size. The multipolygon data will be used in our final analysis for visualizations; however, it is not relevant to the preliminary analysis.

In [6]:
#dropping the columns with null values. 
df_dropped = data.drop(columns=drop_columns, inplace=False)
df_dropped = df_dropped.drop(columns="multipolygon")
df_dropped

Unnamed: 0,CLASS,CLASS_CODE,COMM_CODE,NAME,SECTOR,COMM_STRUCTURE,CNSS_YR,RES_CNT,DWELL_CNT,PRSCH_CHLD,...,OTHER_0_4,OTHER_5_14,OTHER_15_19,OTHER_20_24,OTHER_25_34,OTHER_35_44,OTHER_45_54,OTHER_55_64,OTHER_65_74,OTHER_75
0,Residential,1,LEG,LEGACY,SOUTH,BUILDING OUT,2019,6420,2766,850,...,0,0,0,0,0,0,0,0,0,0
1,Residential,1,HPK,HIGHLAND PARK,CENTRE,1950s,2019,3838,2277,325,...,0,0,0,0,0,0,0,0,0,0
2,Residential,1,CNS,CORNERSTONE,NORTHEAST,2000s,2019,2648,1285,199,...,0,0,0,0,0,0,0,0,0,0
3,Residential,1,MON,MONTGOMERY,NORTHWEST,1950s,2019,4515,2013,328,...,0,0,0,0,0,0,0,0,0,0
4,Residential,1,TEM,TEMPLE,NORTHEAST,1960s/1970s,2019,10977,3733,908,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
301,Residual Sub Area,4,01H,01H,WEST,UNDEVELOPED,2019,0,0,0,...,0,0,0,0,0,0,0,0,0,0
302,Residential,1,HID,HIDDEN VALLEY,NORTH,1980s/1990s,2019,11566,3880,762,...,0,6,6,6,0,0,9,0,0,0
303,Residential,1,RIV,RIVERBEND,SOUTHEAST,1980s/1990s,2019,9244,3474,579,...,0,0,0,0,0,0,0,0,0,0
304,Residential,1,RID,RIDEAU PARK,CENTRE,INNER CITY,2019,594,342,0,...,0,0,0,0,0,0,0,0,0,0


## Connecting to mySQL 

In [7]:
# Raw string: in a plain string the backslashes in this Windows path are parsed
# as escape sequences, and "\U" raises a 'unicodeescape' SyntaxError.
filepath = r"C:\Users\James\Documents\School\password.txt"

with open(filepath) as f:
    passw = f.read().strip()  # drop any trailing newline from the file

# attempt a connection
myconnection = mysql.connector.connect(user='james_huvenaars',
                                       password=passw,  # the variable, not the literal string 'passw'
                                       host='datasciencedb2.ucalgary.ca',
                                       database='james_huvenaars',
                                       allow_local_infile=True)
myconnection

In [None]:
# CREATE TABLE STATEMENT
create_statement = '''create table james_huvenaars.calgary_census_2019 (
    CLASS varchar(100),
    CLASS_CODE int,
    COMM_CODE varchar(100),
    NAME varchar(100),
    SECTOR varchar(100),
    COMM_STRUCTURE varchar(100),
    CNSS_YR int,
    RES_CNT int,
    DWELL_CNT int,
    PRSCH_CHLD int,
    ELECT_CNT int,
    EMPLYD_CNT int,
    OWNSHP_CNT int,
    DOG_CNT int,
    CAT_CNT int,
    PUB_SCH int,
    SEP_SCH int,
    PUBSEP_SCH int,
    OTHER_SCH int,
    UNKNWN_SCH int,
    SING_FAMLY int,
    DUPLEX int,
    MULTI_PLEX int,
    APARTMENT int,
    TOWN_HOUSE int,
    MANUF_HOME int,
    CONV_STRUC int,
    COMUNL_HSE int,
    RES_COMM int,
    OTHER_RES int,
    NURSING_HM int,
    OTHER_INST int,
    HOTEL_CNT int,
    OTHER_MISC int,
    APT_NO_RES int,
    APT_OCCPD int,
    APT_OWNED int,
    APT_PERSON int,
    APT_VACANT int,
    APT_UC int,
    APT_NA int,
    CNV_NO_RES int,
    CNV_OCCPD int,
    CNV_OWNED int,
    CNV_PERSON int,
    CNV_VACANT int,
    CNV_UC int,
    CNV_NA int,
    DUP_NO_RES int,
    DUP_OCCPD int,
    DUP_OWNED int,
    DUP_PERSON int,
    DUP_VACANT int,
    DUP_UC int,
    DUP_NA int,
    MFH_NO_RES int,
    MFH_OCCPD int,
    MFH_OWNED int,
    MFH_PERSON int,
    MFH_VACANT int,
    MFH_UC int,
    MFH_NA int,
    MUL_NO_RES int,
    MUL_OCCPD int,
    MUL_OWNED int,
    MUL_PERSON int,
    MUL_VACANT int,
    MUL_UC int,
    MUL_NA int,
    OTH_NO_RES int,
    OTH_OCCPD int,
    OTH_OWNED int,
    OTH_PERSON int,
    OTH_VACANT int,
    OTH_UC int,
    OTH_NA int,
    TWN_NO_RES int,
    TWN_OCCPD int,
    TWN_OWNED int,
    TWN_PERSON int,
    TWN_VACANT int,
    TWN_UC int,
    TWN_NA int,
    SF_NO_RES int,
    SF_OCCPD int,
    SF_OWNED int,
    SF_PERSON int,
    SF_VACANT int,
    SF_UC int,
    SF_NA int,
    OTH_STRTY int,
    DWELSZ_1 int,
    DWELSZ_2 int,
    DWELSZ_3 int,
    DWELSZ_4_5 int,
    DWELSZ_6 int,
    MALE_CNT int,
    FEMALE_CNT int,
    MALE_0_4 int,
    MALE_5_14 int,
    MALE_15_19 int,
    MALE_20_24 int,
    MALE_25_34 int,
    MALE_35_44 int,
    MALE_45_54 int,
    MALE_55_64 int,
    MALE_65_74 int,
    MALE_75 int,
    FEM_0_4 int,
    FEM_5_14 int,
    FEM_15_19 int,
    FEM_20_24 int,
    FEM_25_34 int,
    FEM_35_44 int,
    FEM_45_54 int,
    FEM_55_64 int,
    FEM_65_74 int,
    FEM_75 int,
    MF_0_4 int,
    MF_5_14 int,
    MF_15_19 int,
    MF_20_24 int,
    MF_25_34 int,
    MF_35_44 int,
    MF_45_54 int,
    MF_55_64 int,
    MF_65_74 int,
    MF_75 int,
    OTHER_CNT int,
    OTHER_0_4 int,
    OTHER_5_14 int,
    OTHER_15_19 int,
    OTHER_20_24 int,
    OTHER_25_34 int,
    OTHER_35_44 int,
    OTHER_45_54 int,
    OTHER_55_64 int,
    OTHER_65_74 int,
    OTHER_75 int
    );'''
    

create_cursor = myconnection.cursor()
try:
    create_cursor.execute(create_statement)
except mysql.connector.Error as err:
    if err.errno == errorcode.ER_TABLE_EXISTS_ERROR:
        print("Ooops! We already have that table")
    else:
        print(err.msg)
else:
    print("Table created successfully!")

create_cursor.close()

Ooops! We already have that table


True

In [None]:
insertCursor = myconnection.cursor()

columnString = "`,`".join([str(currentColumn) for currentColumn in df_dropped.columns.tolist()])
#print (columnString)

# inserting rows one by one from the DataFrame is sufficient for now
for i, currentRow in df_dropped.iterrows():
    #print (tuple(currentRow))
    insertCommand = "INSERT INTO `calgary_census_2019` (`" + columnString + "`) VALUES (" + "%s,"*(len(currentRow)-1) + "%s)"
    #print (insertCommand)
    #print(tuple(currentRow))
    insertCursor.execute(insertCommand, tuple(currentRow))
    
myconnection.commit()

insertCursor.close()

True
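The row-by-row insert above works for a table of this size. As a minimal sketch of a batched alternative, the statement can be built once and handed to the cursor's `executemany` method together with all rows. The helper name `build_insert_statement` is hypothetical, and the three columns shown are only an illustrative subset of the census table.

```python
def build_insert_statement(table, columns):
    """Return a parameterized INSERT statement for the given table and columns."""
    column_list = ", ".join(f"`{name}`" for name in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders})"


# Illustrative subset of the census columns (the real table has many more)
statement = build_insert_statement("calgary_census_2019", ["NAME", "RES_CNT", "DWELL_CNT"])
print(statement)

# With a live connection, all rows could then be sent in one call (not run here):
# rows = list(df_dropped.itertuples(index=False, name=None))
# insertCursor.executemany(statement, rows)
# myconnection.commit()
```

Building the statement once keeps the column list and the placeholders in step, and the values are still passed as parameters rather than formatted into the SQL string.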

## Query 1 - Breaking down communities by age demographics 

My initial set of queries is centered around age demographics to unravel their implications on crime rates and property assessments. We are driven by two key guiding questions:

- Are property assessments in communities related to age demographics?
- Is crime in communities more prevalent when certain age groups are more present?

To address these questions, our queries focus on examining the proportion of different age groups in each community. Proportions were favored over counts to alleviate the bias introduced by more populous neighborhoods. The age groups were segmented into four brackets that follow the census age columns: Children (0-19), Young adults (20-44), Middle-aged adults (45-74), and Elderly (75+).

We identified the top ten communities with the highest proportion of each age group, offering a snapshot of how age demographics vary across neighborhoods.

In our final project, analyzing the correlation between age demographics and property assessments is pivotal. It helps us understand if specific age groups are associated with distinct property values. This insight can guide decisions related to property development, community planning, and investment strategies. Furthermore, exploring the prevalence of different age groups in communities with varying crime rates aims to uncover patterns suggesting correlations between age demographics and crime. This information is essential for informed policy development and targeted interventions.

Should distinct differences emerge within a particular age group, our plan is to expand the analysis. Sub-group analysis, such as comparing people aged 45-54 to those aged 55-64, will provide a more detailed understanding of the data.
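Because the four subqueries that follow differ only in which age columns they sum, they could be generated from one template. The sketch below is one possible way to do that, not the approach used in this notebook. It assumes, as the subqueries do, that the `MF_*` and `OTHER_*` columns together cover all residents in an age band. The helper name `build_age_proportion_query` and the added `WHERE RES_CNT > 0` filter are my own assumptions.

```python
def build_age_proportion_query(suffixes, label, limit=10):
    """Build a top-N query for the share of residents in the given age bands."""
    # One MF_ and one OTHER_ column per age band, e.g. MF_75 and OTHER_75
    columns = [f"{prefix}_{suffix}" for suffix in suffixes for prefix in ("MF", "OTHER")]
    total = " + ".join(columns)
    return (
        f"SELECT NAME, ({total}) / RES_CNT AS '{label}' "
        "FROM calgary_census_2019 WHERE RES_CNT > 0 "  # assumption: skip empty communities
        f"ORDER BY ({total}) / RES_CNT DESC LIMIT {limit};"
    )


# Age brackets expressed as the census column suffixes
AGE_BRACKETS = {
    "Children proportion": ["0_4", "5_14", "15_19"],
    "Young adults proportion": ["20_24", "25_34", "35_44"],
    "Middle age adults proportion": ["45_54", "55_64", "65_74"],
    "Elderly proportion": ["75"],
}

queries = {label: build_age_proportion_query(s, label) for label, s in AGE_BRACKETS.items()}
print(queries["Elderly proportion"])
```

A finer split, such as 45-54 against 55-64, would then only need a new entry in `AGE_BRACKETS`.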

### Subquery 1.1 - Which 10 communities have the highest proportion of people over 75?

In [None]:
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT DISTINCT NAME, (MF_75 + OTHER_75)/RES_CNT AS 'Elderly proportion' FROM calgary_census_2019 ORDER BY (MF_75 + OTHER_75)/RES_CNT DESC LIMIT 10;")

read_cursor.execute(query_string)

for (comm_name) in read_cursor:
    print(comm_name)
    
read_cursor.close()

{'NAME': 'GREENVIEW INDUSTRIAL PARK', 'Elderly proportion': Decimal('0.4400')}
{'NAME': 'PUMP HILL', 'Elderly proportion': Decimal('0.2232')}
{'NAME': 'CHINATOWN', 'Elderly proportion': Decimal('0.1999')}
{'NAME': 'ST. ANDREWS HEIGHTS', 'Elderly proportion': Decimal('0.1705')}
{'NAME': 'SETON', 'Elderly proportion': Decimal('0.1631')}
{'NAME': 'COUNTRY HILLS VILLAGE', 'Elderly proportion': Decimal('0.1567')}
{'NAME': 'CHRISTIE PARK', 'Elderly proportion': Decimal('0.1541')}
{'NAME': 'PALLISER', 'Elderly proportion': Decimal('0.1536')}
{'NAME': 'SHAWNEE SLOPES', 'Elderly proportion': Decimal('0.1497')}
{'NAME': 'EAGLE RIDGE', 'Elderly proportion': Decimal('0.1325')}


True

### Subquery 1.2 - Which 10 communities have the highest proportion of people under 20?

In [None]:
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT DISTINCT NAME, (MF_0_4 + MF_5_14 + MF_15_19 + OTHER_0_4 + OTHER_5_14 + OTHER_15_19)/RES_CNT AS 'Children proportion' FROM calgary_census_2019 ORDER BY (MF_0_4 + MF_5_14 + MF_15_19 + OTHER_0_4 + OTHER_5_14 + OTHER_15_19)/RES_CNT DESC LIMIT 10;")

read_cursor.execute(query_string)

for (comm_name) in read_cursor:
    print(comm_name)
    
read_cursor.close()

{'NAME': 'UNIVERSITY DISTRICT', 'Children proportion': Decimal('0.4227')}
{'NAME': 'WEST SPRINGS', 'Children proportion': Decimal('0.3644')}
{'NAME': 'COUGAR RIDGE', 'Children proportion': Decimal('0.3513')}
{'NAME': 'AUBURN BAY', 'Children proportion': Decimal('0.3400')}
{'NAME': 'ASPEN WOODS', 'Children proportion': Decimal('0.3369')}
{'NAME': 'TUSCANY', 'Children proportion': Decimal('0.3349')}
{'NAME': 'CITYSCAPE', 'Children proportion': Decimal('0.3328')}
{'NAME': 'MAHOGANY', 'Children proportion': Decimal('0.3306')}
{'NAME': 'EVANSTON', 'Children proportion': Decimal('0.3303')}
{'NAME': 'NEW BRIGHTON', 'Children proportion': Decimal('0.3293')}


True

### Subquery 1.3 - Which 10 communities have the highest proportion of young adults?

In [None]:
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT DISTINCT NAME, (MF_20_24 + MF_25_34 + MF_35_44 + OTHER_20_24 + OTHER_25_34 + OTHER_35_44)/RES_CNT AS 'Young adults proportion' FROM calgary_census_2019 ORDER BY (MF_20_24 + MF_25_34 + MF_35_44 + OTHER_20_24 + OTHER_25_34 + OTHER_35_44)/RES_CNT DESC LIMIT 10;")

read_cursor.execute(query_string)

for (comm_name) in read_cursor:
    print(comm_name)
    
read_cursor.close()

{'NAME': 'UNIVERSITY OF CALGARY', 'Young adults proportion': Decimal('0.9983')}
{'NAME': 'BELTLINE', 'Young adults proportion': Decimal('0.7574')}
{'NAME': 'SUNALTA', 'Young adults proportion': Decimal('0.6971')}
{'NAME': 'CLIFF BUNGALOW', 'Young adults proportion': Decimal('0.6934')}
{'NAME': 'LOWER MOUNT ROYAL', 'Young adults proportion': Decimal('0.6838')}
{'NAME': 'MISSION', 'Young adults proportion': Decimal('0.6783')}
{'NAME': 'BANKVIEW', 'Young adults proportion': Decimal('0.6537')}
{'NAME': 'DOWNTOWN COMMERCIAL CORE', 'Young adults proportion': Decimal('0.6531')}
{'NAME': 'SUNNYSIDE', 'Young adults proportion': Decimal('0.6336')}
{'NAME': 'BANFF TRAIL', 'Young adults proportion': Decimal('0.5935')}


True

### Subquery 1.4 - Which 10 communities have the highest proportion of middle aged adults?

In [None]:
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT DISTINCT NAME, (MF_45_54 + MF_55_64 + MF_65_74 + OTHER_45_54 + OTHER_55_64 + OTHER_65_74)/RES_CNT AS 'Middle age adults proportion' FROM calgary_census_2019 ORDER BY (MF_45_54 + MF_55_64 + MF_65_74 + OTHER_45_54 + OTHER_55_64 + OTHER_65_74)/RES_CNT DESC LIMIT 10;")

read_cursor.execute(query_string)

for (comm_name) in read_cursor:
    print(comm_name)
    
read_cursor.close()

{'NAME': 'EAU CLAIRE', 'Middle age adults proportion': Decimal('0.6591')}
{'NAME': 'SHEPARD INDUSTRIAL', 'Middle age adults proportion': Decimal('0.5686')}
{'NAME': 'DIAMOND COVE', 'Middle age adults proportion': Decimal('0.5594')}
{'NAME': 'GREENWOOD/GREENBRIAR', 'Middle age adults proportion': Decimal('0.5558')}
{'NAME': 'BEL-AIRE', 'Middle age adults proportion': Decimal('0.5308')}
{'NAME': 'BRITANNIA', 'Middle age adults proportion': Decimal('0.5261')}
{'NAME': 'RIDEAU PARK', 'Middle age adults proportion': Decimal('0.5253')}
{'NAME': 'EAGLE RIDGE', 'Middle age adults proportion': Decimal('0.5199')}
{'NAME': 'HAMPTONS', 'Middle age adults proportion': Decimal('0.5054')}
{'NAME': 'RED CARPET', 'Middle age adults proportion': Decimal('0.4944')}


True

## Query 2 - Gender Breakdown by Community

Our subsequent set of queries focuses on gender demographics, aiming to shed light on their influence on crime rates and property assessments. We are guided by two key questions:

- Is there a relationship between gender demographics and crime types/rates?
- What is the relationship between gender distribution and property assessments?

To address these questions, we formulated queries to examine the proportion of different genders in each community. Proportions were chosen over counts to mitigate biases introduced by population variations. Genders were categorized into three groups: Male, Female, and Other.

We then sorted the communities by the proportion of female residents to organize the dataset, though the ordering can be changed for future analysis as needed.

Analyzing the relationship between gender demographics and crime types/rates is crucial. This exploration can uncover patterns indicating correlations between gender distribution and specific crime types or rates. Such insights are valuable for targeted crime prevention strategies. Understanding the relationship between gender distribution and property assessments is equally important. This analysis can reveal if certain gender demographics are associated with different property values, guiding decisions related to property development, community planning, and investment strategies.

If distinct differences emerge within a particular gender group, our plan is to expand the analysis further. Sub-group analysis, such as comparing crime rates or property assessments among males, females, and others, will provide a more nuanced understanding of the data.

In [None]:
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT DISTINCT NAME, MALE_CNT/RES_CNT AS 'Male Proportion', FEMALE_CNT/RES_CNT AS 'Female Proportion', OTHER_CNT/RES_CNT AS 'Other Proportion' FROM calgary_census_2019 WHERE RES_CNT > 0 AND MALE_CNT > 0 ORDER BY FEMALE_CNT/RES_CNT DESC;")

read_cursor.execute(query_string)

for (comm_name) in read_cursor:
    print(comm_name)
    
read_cursor.close()

{'NAME': 'SETON', 'Male Proportion': Decimal('0.4233'), 'Female Proportion': Decimal('0.5714'), 'Other Proportion': Decimal('0.0000')}
{'NAME': 'GREENVIEW INDUSTRIAL PARK', 'Male Proportion': Decimal('0.3333'), 'Female Proportion': Decimal('0.5689'), 'Other Proportion': Decimal('0.0000')}
{'NAME': 'LINCOLN PARK', 'Male Proportion': Decimal('0.4459'), 'Female Proportion': Decimal('0.5533'), 'Other Proportion': Decimal('0.0000')}
{'NAME': 'GARRISON WOODS', 'Male Proportion': Decimal('0.4448'), 'Female Proportion': Decimal('0.5507'), 'Other Proportion': Decimal('0.0000')}
{'NAME': 'KELVIN GROVE', 'Male Proportion': Decimal('0.4499'), 'Female Proportion': Decimal('0.5488'), 'Other Proportion': Decimal('0.0000')}
{'NAME': 'GARRISON GREEN', 'Male Proportion': Decimal('0.4500'), 'Female Proportion': Decimal('0.5485'), 'Other Proportion': Decimal('0.0000')}
{'NAME': 'COUNTRY HILLS VILLAGE', 'Male Proportion': Decimal('0.4524'), 'Female Proportion': Decimal('0.5445'), 'Other Proportion': Decima

True
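The three proportions do not have to sum to one for every community. As a small sanity check, the sketch below computes the gap for two rows copied from the output above. The helper name `proportion_gap` is hypothetical, and the check only flags rows worth inspecting; it does not explain why a gap exists.

```python
from decimal import Decimal


def proportion_gap(row):
    """Return 1 minus the sum of the male, female, and other proportions."""
    total = row["Male Proportion"] + row["Female Proportion"] + row["Other Proportion"]
    return Decimal("1") - total


# Two rows copied from the query output above
rows = [
    {"NAME": "SETON", "Male Proportion": Decimal("0.4233"),
     "Female Proportion": Decimal("0.5714"), "Other Proportion": Decimal("0.0000")},
    {"NAME": "GREENVIEW INDUSTRIAL PARK", "Male Proportion": Decimal("0.3333"),
     "Female Proportion": Decimal("0.5689"), "Other Proportion": Decimal("0.0000")},
]

gaps = {row["NAME"]: proportion_gap(row) for row in rows}
for name, gap in gaps.items():
    print(name, gap)
```

For these two rows the gaps are 0.0053 and 0.0978, so communities with a large gap may deserve a closer look before their gender proportions are compared with crime or assessment data.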

## Query 3 - Property Type Prevalence by Community

In our latest set of queries, we turn our attention to property types, aiming to uncover their connection to crime rates and types. The guiding question driving this analysis is:

- Is there a relationship between property types and crime types/rates?

To address this question, our queries focus on examining the proportion of different property types in each community. Proportions are preferred over counts to account for variations in community sizes. Property types are categorized into five groups: Single Family, Apartment, Townhouse, Duplex, and Multiplex.

We have identified the top ten communities with the highest proportion of each property type. This exploration provides insights into how property types vary across neighborhoods.

Analyzing the relationship between property types and crime rates is pivotal. This exploration can reveal patterns indicating correlations between specific property types and crime rates or types. Understanding these connections is essential for implementing targeted crime prevention strategies.

If distinct differences emerge within a particular property type, our plan is to expand the analysis further. Sub-group analysis, such as comparing crime rates or types across different property types, will provide a more detailed understanding of the data.

In [None]:
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT DISTINCT NAME, SING_FAMLY/DWELL_CNT AS 'Single Family Proportion', DUPLEX/DWELL_CNT AS 'Duplex Proportion', APARTMENT/DWELL_CNT AS 'Apartment Proportion' , MULTI_PLEX/DWELL_CNT AS 'Multiplex Proportion', TOWN_HOUSE/DWELL_CNT AS 'Townhouse Proportion' FROM calgary_census_2019 ORDER BY NAME ASC;")

read_cursor.execute(query_string)

for (comm_name) in read_cursor:
    print(comm_name)
    
read_cursor.close()

{'NAME': '01B', 'Single Family Proportion': None, 'Duplex Proportion': None, 'Apartment Proportion': None, 'Multiplex Proportion': None, 'Townhouse Proportion': None}
{'NAME': '01C', 'Single Family Proportion': None, 'Duplex Proportion': None, 'Apartment Proportion': None, 'Multiplex Proportion': None, 'Townhouse Proportion': None}
{'NAME': '01F', 'Single Family Proportion': None, 'Duplex Proportion': None, 'Apartment Proportion': None, 'Multiplex Proportion': None, 'Townhouse Proportion': None}
{'NAME': '01H', 'Single Family Proportion': None, 'Duplex Proportion': None, 'Apartment Proportion': None, 'Multiplex Proportion': None, 'Townhouse Proportion': None}
{'NAME': '01I', 'Single Family Proportion': None, 'Duplex Proportion': None, 'Apartment Proportion': None, 'Multiplex Proportion': None, 'Townhouse Proportion': None}
{'NAME': '01K', 'Single Family Proportion': None, 'Duplex Proportion': None, 'Apartment Proportion': None, 'Multiplex Proportion': None, 'Townhouse Proportion': None

True
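The `None` values in the output above most likely come from communities with a `DWELL_CNT` of zero: MySQL evaluates a division by zero in a `SELECT` to NULL, which the connector returns as `None`. The sketch below mirrors that behaviour in plain Python. The function name `proportion` and the numbers passed to it are illustrative assumptions, not census values.

```python
from decimal import Decimal


def proportion(count, total):
    """Return count / total as a Decimal, or None when total is zero."""
    if total == 0:
        return None  # mirrors the NULL that MySQL returns for x / 0
    return Decimal(count) / Decimal(total)


no_dwellings = proportion(0, 0)    # e.g. an undeveloped area with no dwellings
one_in_four = proportion(1, 4)     # illustrative counts only
print(no_dwellings, one_in_four)

# One possible filter for the query itself (an assumption, not run here):
FILTERED_QUERY = (
    "SELECT NAME, SING_FAMLY/DWELL_CNT AS 'Single Family Proportion' "
    "FROM calgary_census_2019 WHERE DWELL_CNT > 0 ORDER BY NAME ASC;"
)
```

Filtering on `DWELL_CNT > 0` rather than `RES_CNT > 0` ties the condition to the column that is actually used as the denominator.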

### Subquery 3.1 - Which 10 communities have the highest proportion of single-family homes?

In [None]:
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT DISTINCT NAME, SING_FAMLY/DWELL_CNT AS 'Single Family Proportion' FROM calgary_census_2019 WHERE RES_CNT > 0 ORDER BY SING_FAMLY/DWELL_CNT DESC LIMIT 10;")

read_cursor.execute(query_string)

for (comm_name) in read_cursor:
    print(comm_name)
    
read_cursor.close()

{'NAME': 'BEL-AIRE', 'Single Family Proportion': Decimal('1.0000')}
{'NAME': 'BELVEDERE', 'Single Family Proportion': Decimal('1.0000')}
{'NAME': 'DIAMOND COVE', 'Single Family Proportion': Decimal('1.0000')}
{'NAME': 'EAST SHEPARD INDUSTRIAL', 'Single Family Proportion': Decimal('1.0000')}
{'NAME': 'GLENMORE PARK', 'Single Family Proportion': Decimal('1.0000')}
{'NAME': 'HOTCHKISS', 'Single Family Proportion': Decimal('1.0000')}
{'NAME': 'MAYFAIR', 'Single Family Proportion': Decimal('1.0000')}
{'NAME': 'ROXBORO', 'Single Family Proportion': Decimal('1.0000')}
{'NAME': 'ROYAL VISTA', 'Single Family Proportion': Decimal('1.0000')}
{'NAME': 'SADDLE RIDGE INDUSTRIAL', 'Single Family Proportion': Decimal('1.0000')}


True

### Subquery 3.2 - Which 10 communities have the highest proportion of duplexes?

In [None]:
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT DISTINCT NAME, DUPLEX/DWELL_CNT AS 'Duplex Proportion' FROM calgary_census_2019 WHERE RES_CNT > 0 ORDER BY DUPLEX/DWELL_CNT DESC LIMIT 10;")

read_cursor.execute(query_string)

for (comm_name) in read_cursor:
    print(comm_name)
    
read_cursor.close()

{'NAME': 'PINE CREEK', 'Duplex Proportion': Decimal('0.3714')}
{'NAME': 'CARRINGTON', 'Duplex Proportion': Decimal('0.3119')}
{'NAME': 'ROSEMONT', 'Duplex Proportion': Decimal('0.3040')}
{'NAME': 'ROSSCARROCK', 'Duplex Proportion': Decimal('0.2980')}
{'NAME': 'MONTGOMERY', 'Duplex Proportion': Decimal('0.2678')}
{'NAME': 'MOUNT PLEASANT', 'Duplex Proportion': Decimal('0.2610')}
{'NAME': 'RICHMOND', 'Duplex Proportion': Decimal('0.2565')}
{'NAME': 'CAPITOL HILL', 'Duplex Proportion': Decimal('0.2410')}
{'NAME': 'CEDARBRAE', 'Duplex Proportion': Decimal('0.2256')}
{'NAME': 'GLENBROOK', 'Duplex Proportion': Decimal('0.2248')}


True

### Subquery 3.3 - Which 10 communities have the highest proportion of multiplexes?

In [None]:
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT DISTINCT NAME, MULTI_PLEX/DWELL_CNT AS 'Multiplex Proportion' FROM calgary_census_2019 WHERE RES_CNT > 0 ORDER BY MULTI_PLEX/DWELL_CNT DESC LIMIT 10;")

read_cursor.execute(query_string)

for (comm_name) in read_cursor:
    print(comm_name)
    
read_cursor.close()

{'NAME': 'PALLISER', 'Multiplex Proportion': Decimal('0.0848')}
{'NAME': 'CORNERSTONE', 'Multiplex Proportion': Decimal('0.0537')}
{'NAME': 'SOUTHVIEW', 'Multiplex Proportion': Decimal('0.0526')}
{'NAME': 'SOUTH CALGARY', 'Multiplex Proportion': Decimal('0.0444')}
{'NAME': 'CAMBRIAN HEIGHTS', 'Multiplex Proportion': Decimal('0.0394')}
{'NAME': 'HILLHURST', 'Multiplex Proportion': Decimal('0.0355')}
{'NAME': 'SAGE HILL', 'Multiplex Proportion': Decimal('0.0306')}
{'NAME': 'BRENTWOOD', 'Multiplex Proportion': Decimal('0.0249')}
{'NAME': 'WINDSOR PARK', 'Multiplex Proportion': Decimal('0.0243')}
{'NAME': 'PARKDALE', 'Multiplex Proportion': Decimal('0.0200')}


True

### Subquery 3.4 - Which 10 communities have the highest proportion of townhouses?

In [None]:
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT DISTINCT NAME, TOWN_HOUSE/DWELL_CNT AS 'Townhouse Proportion' FROM calgary_census_2019 WHERE RES_CNT > 0 ORDER BY TOWN_HOUSE/DWELL_CNT DESC LIMIT 10;")

read_cursor.execute(query_string)

for (comm_name) in read_cursor:
    print(comm_name)
    
read_cursor.close()

{'NAME': 'QUEENS PARK VILLAGE', 'Townhouse Proportion': Decimal('1.0000')}
{'NAME': 'UNIVERSITY DISTRICT', 'Townhouse Proportion': Decimal('0.9483')}
{'NAME': 'POINT MCKAY', 'Townhouse Proportion': Decimal('0.6028')}
{'NAME': 'BELMONT', 'Townhouse Proportion': Decimal('0.5022')}
{'NAME': 'RUTLAND PARK', 'Townhouse Proportion': Decimal('0.5020')}
{'NAME': 'VISTA HEIGHTS', 'Townhouse Proportion': Decimal('0.4360')}
{'NAME': 'BRAESIDE', 'Townhouse Proportion': Decimal('0.3532')}
{'NAME': 'COACH HILL', 'Townhouse Proportion': Decimal('0.3428')}
{'NAME': 'DEER RIDGE', 'Townhouse Proportion': Decimal('0.3272')}
{'NAME': 'RANCHLANDS', 'Townhouse Proportion': Decimal('0.3111')}


True

### Subquery 3.5 - Which 10 communities have the highest proportion of apartments?

In [None]:
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT DISTINCT NAME, APARTMENT/DWELL_CNT AS 'Apartment Proportion' FROM calgary_census_2019 WHERE RES_CNT > 0 ORDER BY APARTMENT/DWELL_CNT DESC LIMIT 10;")

read_cursor.execute(query_string)

for (comm_name) in read_cursor:
    print(comm_name)
    
read_cursor.close()

{'NAME': 'DOWNTOWN WEST END', 'Apartment Proportion': Decimal('0.9971')}
{'NAME': 'DOWNTOWN COMMERCIAL CORE', 'Apartment Proportion': Decimal('0.9940')}
{'NAME': 'DOWNTOWN EAST VILLAGE', 'Apartment Proportion': Decimal('0.9930')}
{'NAME': 'CHINATOWN', 'Apartment Proportion': Decimal('0.9889')}
{'NAME': 'BELTLINE', 'Apartment Proportion': Decimal('0.9707')}
{'NAME': 'EAU CLAIRE', 'Apartment Proportion': Decimal('0.9661')}
{'NAME': 'MISSION', 'Apartment Proportion': Decimal('0.9477')}
{'NAME': 'MANCHESTER', 'Apartment Proportion': Decimal('0.9332')}
{'NAME': 'LOWER MOUNT ROYAL', 'Apartment Proportion': Decimal('0.9136')}
{'NAME': 'SUNALTA', 'Apartment Proportion': Decimal('0.7778')}


True

## Query 4 - Community names (bonus)

This final query was done as a bonus in preparation for joining my database with those of my groupmates. We will join all of the data on the community name, so the names are extracted here to be compared with the community names in the other datasets and to ensure consistent nomenclature across all dataframes.

In [None]:
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT DISTINCT NAME FROM calgary_census_2019 ORDER BY NAME ASC;")

read_cursor.execute(query_string)

for (comm_name) in read_cursor:
    print(comm_name)
    
read_cursor.close()

{'NAME': '01B'}
{'NAME': '01C'}
{'NAME': '01F'}
{'NAME': '01H'}
{'NAME': '01I'}
{'NAME': '01K'}
{'NAME': '02B'}
{'NAME': '02C'}
{'NAME': '02E'}
{'NAME': '02F'}
{'NAME': '02K'}
{'NAME': '02L'}
{'NAME': '03D'}
{'NAME': '03W'}
{'NAME': '05D'}
{'NAME': '05E'}
{'NAME': '05F'}
{'NAME': '05G'}
{'NAME': '06A'}
{'NAME': '06B'}
{'NAME': '06C'}
{'NAME': '09D'}
{'NAME': '09H'}
{'NAME': '09K'}
{'NAME': '09O'}
{'NAME': '09P'}
{'NAME': '09Q'}
{'NAME': '10D'}
{'NAME': '10E'}
{'NAME': '12A'}
{'NAME': '12B'}
{'NAME': '12C'}
{'NAME': '12I'}
{'NAME': '12J'}
{'NAME': '12K'}
{'NAME': '12L'}
{'NAME': '13A'}
{'NAME': '13B'}
{'NAME': '13C'}
{'NAME': '13D'}
{'NAME': '13E'}
{'NAME': '13F'}
{'NAME': '13G'}
{'NAME': '13H'}
{'NAME': '13I'}
{'NAME': '13J'}
{'NAME': '13L'}
{'NAME': '13M'}
{'NAME': 'ABBEYDALE'}
{'NAME': 'ACADIA'}
{'NAME': 'ALBERT PARK/RADISSON HEIGHTS'}
{'NAME': 'ALTADORE'}
{'NAME': 'ALYTH/BONNYBROOK'}
{'NAME': 'APPLEWOOD PARK'}
{'NAME': 'ARBOUR LAKE'}
{'NAME': 'ASPEN WOODS'}
{'NAME': 'AUBURN BAY'}
{'

True
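Once the names are extracted, they can be compared with the names from another dataset before any join is attempted. The sketch below is one way to do that with sets. The helper names `normalize` and `compare_names` are hypothetical, and the `other_names` list is made up for illustration; it is not taken from the crime or property assessment data.

```python
def normalize(name):
    """Upper-case a community name and collapse surrounding or repeated spaces."""
    return " ".join(name.strip().upper().split())


def compare_names(census_names, other_names):
    """Return the names found only in the census and only in the other dataset."""
    census = {normalize(n) for n in census_names}
    other = {normalize(n) for n in other_names}
    return sorted(census - other), sorted(other - census)


# Census names copied from the output above
census_names = ["ABBEYDALE", "ACADIA", "ALBERT PARK/RADISSON HEIGHTS"]
# Hypothetical names from a second dataset, for illustration only
other_names = ["Abbeydale", " acadia ", "ALBERT PARK"]

only_census, only_other = compare_names(census_names, other_names)
print("Only in census:", only_census)
print("Only in other dataset:", only_other)
```

Names that appear in only one of the two lists are the ones that would need to be reconciled by hand before joining on `NAME`.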

## References 

Census by community 2019 | Open Calgary. (n.d.). Retrieved October 28, 2023, from https://data.calgary.ca/Demographics/Census-by-Community-2019/rkfr-buzb

Community crime statistics | Open Calgary. (n.d.). Retrieved October 28, 2023, from https://data.calgary.ca/Health-and-Safety/Community-Crime-Statistics/78gh-n26t

Historical property assessments (Parcel) | Open Calgary. (n.d.). Retrieved October 28, 2023, from https://data.calgary.ca/Government/Historical-Property-Assessments-Parcel-/4ur7-wsgc

Open Calgary terms of use. (n.d.). Retrieved November 5, 2023, from https://data.calgary.ca/stories/s/Open-Calgary-Terms-of-Use/u45n-7awa/