Data extraction for parks in Nova Scotia, Canada and visualization

ht3886/GraphDB-Python

Overview

Data extraction and visualization for parks in Nova Scotia, Canada

Dataset Source

https://data.novascotia.ca/Lands-Forests-and-Wildlife/DNR-Camping-Parks-Reservation-Data-2016/4zt7-x443

About the Dataset

The dataset DNR Camping Parks Reservation Data 2016 lists camping sites in Nova Scotia. The information is collected through the reservation system that the general public uses to reserve camping sites in Nova Scotia. The dataset has 34,900 rows and 13 columns. It includes the park’s name (ParkName), the origin state and country of the water body, the total booking size (partySize), the type of rate (RateType), the type of booking (BookingType), Equipment, the booking start and end dates, the number of nights, and permits.

File Description

file1.py

I have used the csv module to read the contents of the dataset file DNR_Camping_Parks_Reservation_Data_2016.csv and write them to file1.csv. The data is read with ‘csv.reader’ and written with ‘csv.writer’. As these are csv files, the delimiter is a comma (,).

Output file: file1.csv
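
The read-and-write step described above can be sketched as follows. This is a minimal illustration, not the contents of file1.py: the function name copy_csv and the UTF-8 encoding are assumptions, while csv.reader, csv.writer and the comma delimiter come from the description.

```python
import csv


def copy_csv(src_path, dst_path):
    """Copy every row of src_path to dst_path and return the row count.

    Hypothetical helper; the name and the UTF-8 encoding are assumptions.
    """
    rows_written = 0
    # newline="" lets the csv module manage line endings itself
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=",")
        writer = csv.writer(dst, delimiter=",")
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Under these assumptions, calling copy_csv("DNR_Camping_Parks_Reservation_Data_2016.csv", "file1.csv") would produce file1.csv, with the header row counted as one of the rows written.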

file2.py

Here I have removed the unnecessary columns and kept only ParkName, State, PartySize, BookingType, RateType and Equipment.

Output file: file2.csv
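
One way to keep only the listed columns is sketched below. It assumes the csv header uses exactly the spellings given above (ParkName, State, PartySize, BookingType, RateType, Equipment); the function name select_columns is hypothetical, and file2.py may do this differently.

```python
import csv

# Column names as listed in the description; the exact header spelling
# in the real csv file is an assumption.
KEEP = ["ParkName", "State", "PartySize", "BookingType", "RateType", "Equipment"]


def select_columns(src_path, dst_path, columns=KEEP):
    """Write only the chosen columns of src_path to dst_path."""
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        # extrasaction="ignore" silently drops columns not in `columns`
        writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

Using DictReader and DictWriter selects columns by name rather than by position, so the sketch keeps working if the column order in the source file changes.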

file3.py

Scanned the "Equipment" column and replaced all “less than” with “LT” [e.g. “less than 30 ft.” becomes “LT30ft” after transforming]. Similarly, replaced all “Single tent” with “ST”. I have used the regex module to perform these substitutions, finding and replacing the words with the ‘.sub’ function. The two replacements are applied one after the other rather than simultaneously, to keep the code simpler.

Output file: file3.csv
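
The two consecutive substitutions can be sketched with re.sub as below. The function name abbreviate_equipment and the case-insensitive matching are assumptions. The example in the description ("LT30ft") suggests that spaces and the trailing period were also removed; this sketch does not attempt that step and only performs the two word replacements.

```python
import re


def abbreviate_equipment(value):
    """Apply the two replacements one after the other (hypothetical helper)."""
    # First pass: "less than" -> "LT" (ignoring case is an assumption)
    value = re.sub(r"less than", "LT", value, flags=re.IGNORECASE)
    # Second pass: "Single tent" -> "ST"
    value = re.sub(r"single tent", "ST", value, flags=re.IGNORECASE)
    return value
```

Running the passes in sequence keeps each pattern trivial. A single-pass alternative would need an alternation pattern plus a replacement function that maps each match to its abbreviation.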

file4.py

This file keeps only the 20 unique parks in Nova Scotia that have the largest "partySize" values.

Output file: file4.csv
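
A possible reading of this step is sketched below. It assumes each park is ranked by the largest party size seen in any of its rows, which the description does not state explicitly; summing party sizes per park would be another reading. The function name top_parks and the default column names are also assumptions.

```python
def top_parks(rows, n=20, name_key="ParkName", size_key="PartySize"):
    """Return up to n (park, largest party size) pairs, largest first.

    `rows` is any iterable of dict-like rows, e.g. a csv.DictReader.
    Assumption: a park is ranked by the maximum party size of its rows.
    """
    best = {}
    for row in rows:
        try:
            name = row[name_key]
            size = int(row[size_key])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable name or party size
        if name not in best or size > best[name]:
            best[name] = size
    # Sort parks by their largest party size, descending
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

Keeping one entry per park in a dictionary is what makes the result unique by park name, even though a park appears in many reservation rows.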

Visualization using Neo4j

Load data and node creation

Parks with identical ‘RateType’, connected using a ‘NeghbourByRate’ relation

Parks with identical ‘Equipment’, connected using a ‘NeghbourByEquipment’ relation

Final image file that has both relations: NeghbourByRate and NeghbourByEquipment

Using the visualization, found the park with the maximum "partySize"
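
The idea behind the two relations above, connecting parks that share the same ‘RateType’ or the same ‘Equipment’, can be illustrated outside Neo4j. The sketch below is plain Python, not the queries used in this project: the function name neighbour_pairs and the row layout (dicts with a ParkName key) are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def neighbour_pairs(parks, key):
    """Return sorted (park, park) pairs whose rows share the same `key` value.

    Hypothetical helper: `parks` is a list of dicts with a "ParkName" entry
    and an entry for `key` (e.g. "RateType" or "Equipment").
    """
    groups = defaultdict(list)
    for park in parks:
        groups[park[key]].append(park["ParkName"])
    pairs = []
    for names in groups.values():
        # every unordered pair inside a group is one neighbour relation
        pairs.extend(combinations(sorted(set(names)), 2))
    return sorted(pairs)
```

Calling neighbour_pairs(parks, "RateType") lists the pairs that a NeghbourByRate relation would connect, and neighbour_pairs(parks, "Equipment") does the same for NeghbourByEquipment.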
