This application was developed on Windows 10, so the build guide, commands, and installers are specific to Windows 10.
- Python 3.9 or higher
- Optional: Jupyter Notebook (only needed if you use the notebook option for the validation script). The .ipynb file is located inside ./validate/validate_data_ipynb.zip
- Access to the data files (2021-05.csv, 2021-06.csv, 2021-07.csv, station_data.csv) located in ./validate/data.zip
- MySQL (MySQL Workbench)
- TablePlus
- Using Python / Jupyter Notebook for data validation
- Using T3 Stack for backend and frontend
- Next.js
- Tailwind CSS (Styling)
- Prisma ORM
- tRPC (API)
- Rows removed (about 1.7 million):
- Trips shorter than 10 seconds were removed
- Trips under 10 meters were removed
- Duplicates were removed
- Columns 'departure_station_name', 'return_station_name' were removed
- Column name modifications:
- Whitespace was replaced with underscores
- Other special characters were removed
- All letters were converted to lowercase
- Columns were renamed as follows:
- Departure -> start_time
- Return -> end_time
- Covered distance (m) -> distance_m
- Duration (sec.) -> duration_s
- Departure station id -> start_station_id
- Return station id -> end_station_id
- All the data were merged into one file and sorted by start_time
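The trip-cleaning rules above can be sketched in plain Python. This is a minimal sketch and not the actual validate_data.py: the names `normalize_column` and `clean_trips` are hypothetical, only the standard library is used, and it assumes the rows have already been read into dictionaries keyed by the original CSV headers.

```python
import re

# Renames listed above, keyed by the name normalize_column() is assumed to produce.
RENAMES = {
    "departure": "start_time",
    "return": "end_time",
    "covered_distance_m": "distance_m",
    "duration_sec": "duration_s",
    "departure_station_id": "start_station_id",
    "return_station_id": "end_station_id",
}
DROPPED = {"departure_station_name", "return_station_name"}


def normalize_column(name):
    """Whitespace -> underscore, remove other special characters, lowercase."""
    name = re.sub(r"\s+", "_", name.strip())
    name = re.sub(r"[^0-9A-Za-z_]", "", name)
    return name.lower()


def clean_trips(rows):
    """Filter short trips, drop duplicates, and sort by start_time."""
    seen = set()
    cleaned = []
    for raw in rows:
        row = {}
        for key, value in raw.items():
            column = normalize_column(key)
            if column in DROPPED:
                continue
            row[RENAMES.get(column, column)] = value
        try:
            duration = float(row["duration_s"])
            distance = float(row["distance_m"])
        except (KeyError, TypeError, ValueError):
            continue  # assumption: rows with missing or non-numeric values are dropped
        if duration < 10 or distance < 10:
            continue
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue  # exact duplicate of a row already kept
        seen.add(fingerprint)
        cleaned.append(row)
    # ISO-style timestamps sort correctly as plain strings.
    return sorted(cleaned, key=lambda r: r["start_time"])
```

Merging the monthly files then amounts to concatenating their rows before calling `clean_trips` once, so duplicates and ordering are handled across all months.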
- All the column names were converted to lowercase
- Dropped columns:
- nimi
- fid
- stad
- operaattor
- namn
- kapasiteet
- adress
- kaupunki
- Columns were renamed as follows:
- id -> station_id
- osoite -> address
- y -> latitude
- x -> longitude
- The data were sorted by station_id and saved as 'station_data_new.csv'
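The station steps above could look roughly like the following. It is a sketch under assumptions rather than the project's script: `clean_stations` is a hypothetical name, the input is assumed to be comma-delimited text, and station ids are assumed to be numeric so they can be sorted as integers.

```python
import csv
import io

STATION_DROPPED = {"nimi", "fid", "stad", "operaattor", "namn", "kapasiteet", "adress", "kaupunki"}
STATION_RENAMES = {"id": "station_id", "osoite": "address", "y": "latitude", "x": "longitude"}


def clean_stations(csv_text):
    """Lowercase headers, drop unused columns, rename the rest, sort by station_id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    stations = []
    for raw in reader:
        row = {}
        for key, value in raw.items():
            if key is None:
                continue  # extra unnamed fields in a malformed line are ignored
            column = key.strip().lower()
            if column in STATION_DROPPED:
                continue
            row[STATION_RENAMES.get(column, column)] = value
        stations.append(row)
    # Sort numerically so that station 3 comes before station 20.
    return sorted(stations, key=lambda s: int(s["station_id"]))
```

Writing the result back out with `csv.DictWriter` would produce the cleaned station file.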
Latitude and longitude from station_data.csv were combined into a single value per station and stored in the trip columns 'start_station_location' and 'end_station_location'.
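One way to picture this step is a lookup from station id to a combined coordinate string. The sketch below is illustrative only: `add_station_locations` is a hypothetical helper, the "latitude,longitude" order is inferred from the example location URL in this README, and skipping trips with unknown stations is an assumption.

```python
def add_station_locations(trips, stations):
    """Attach 'latitude,longitude' strings to each trip by station id."""
    locations = {
        str(s["station_id"]): f'{s["latitude"]},{s["longitude"]}'
        for s in stations
    }
    enriched = []
    for trip in trips:
        start = locations.get(str(trip["start_station_id"]))
        end = locations.get(str(trip["end_station_id"]))
        if start is None or end is None:
            continue  # assumption: trips that reference unknown stations are skipped
        enriched.append(
            {**trip, "start_station_location": start, "end_station_location": end}
        )
    return enriched
```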
id
start_time
end_time
start_station_id
start_station_name
start_station_location
end_station_id
end_station_name
end_station_location
distance_m
duration_s
model Trip {
  id                     Int      @id @default(autoincrement())
  start_time             DateTime
  end_time               DateTime
  start_station_id       Int
  start_station_name     String
  start_station_location String
  end_station_id         Int
  end_station_name       String
  end_station_location   String
  distance_m             Int
  duration_s             Int
}
The provided data were processed with a Python script, which you can find in the 'validate' folder.
2021-05.csv
2021-06.csv
2021-07.csv
station_data.csv
There are two options for running the validation script: with Jupyter Notebook or with plain Python.
Make sure you have Jupyter Notebook installed. Open the validate_data.ipynb file and run all cells.
Make sure you have Python 3.9 or higher installed. Open the terminal and run the following commands:
cd validate
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
python validate_data.py
cd web-app
npm install
- Hosted on PlanetScale (using the Hobby plan)
- Recommended tools:
- TablePlus for populating (importing) CSV datafile
- JetBrains DataGrip for querying data etc.
Follow along with this straightforward video by the official PlanetScale team, and use TablePlus to create the connection.
npx prisma db push
I used MySQL Workbench to set up the server, so following along with this video is recommended if you don't have it installed.
If you have set up a MySQL server locally before and know how to do it, skip the video and follow these steps:
DATABASE_URL=mysql://root:password@127.0.0.1:3306/helsinki-city-bike
npx prisma db push
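The DATABASE_URL used in these steps has the shape scheme://user:password@host:port/database. If the push fails, it can help to check each part separately. The snippet below is a small sketch using only Python's standard library; `describe_database_url` is a hypothetical helper and the URL is the example value from this section.

```python
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a connection URL into its parts (the password is left out on purpose)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = describe_database_url("mysql://root:password@127.0.0.1:3306/helsinki-city-bike")
print(info)
```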
TablePlus is also used for importing, since it is a simple way to import data and much faster than, for example, MySQL Workbench.
- Whichever connection you chose, open it in TablePlus by double-clicking it, e.g. 'Local - Helsinki-city-bike'.
- Right-click your 'trip' table and choose Import -> From CSV.
- Locate the previously created CSV file in the repo under \validate\data and click Open.
- Change the delimiter to ';' and click Import.
- Depending on your connection, this will take a while (longer if you chose PlanetScale).
- When it's done, hit CTRL + R to reload the data.
- On localhost you see it immediately.
- With a PlanetScale connection, it might take a while to see the data.
- Try to query the data with:
SELECT * FROM trip WHERE start_time BETWEEN '2021-07-05 00:00:00' AND '2021-07-05 23:59:59';
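To cross-check the import, the same day can be counted directly in the CSV and compared with the number of rows the query returns. This is a sketch under assumptions: `count_trips_on` is a hypothetical helper, the file is assumed to use the ';' delimiter mentioned in the import steps, and 'start_time' values are assumed to begin with the date as YYYY-MM-DD. The sample rows are made up for illustration.

```python
import csv
import io


def count_trips_on(csv_lines, day, delimiter=";"):
    """Count rows whose start_time falls on `day` (formatted YYYY-MM-DD)."""
    reader = csv.DictReader(csv_lines, delimiter=delimiter)
    return sum(1 for row in reader if (row.get("start_time") or "").startswith(day))


# Tiny in-memory example; with a real file, pass open(path, newline="", encoding="utf-8").
sample = io.StringIO(
    "start_time;end_time;distance_m;duration_s\n"
    "2021-07-05 00:00:01;2021-07-05 00:10:00;1200;599\n"
    "2021-07-05 23:59:59;2021-07-06 00:20:00;3000;1201\n"
    "2021-07-06 00:00:00;2021-07-06 00:05:00;800;300\n"
)
print(count_trips_on(sample, "2021-07-05"))  # prints 2
```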
/location?date=2021-05-29&page=819&start_station_location=60.187712639,24.960554135&end_station_location=60.2244037765729,24.9525612440734&start_station_name=Sornainen%20(M)&end_station_name=Kylavoudintie&duration_s=1388&distance_m=4889&start_time=23:58:05&id=363696
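The location URL above appears to carry the whole trip in its query string. As an illustration of what it encodes, the sketch below decodes it with Python's standard library; `parse_location_url` is a hypothetical helper, and which fields are numeric is an assumption based on the Trip model.

```python
from urllib.parse import parse_qsl, urlsplit

EXAMPLE = (
    "/location?date=2021-05-29&page=819"
    "&start_station_location=60.187712639,24.960554135"
    "&end_station_location=60.2244037765729,24.9525612440734"
    "&start_station_name=Sornainen%20(M)&end_station_name=Kylavoudintie"
    "&duration_s=1388&distance_m=4889&start_time=23:58:05&id=363696"
)


def parse_location_url(url):
    """Decode the query string and convert coordinates and counters to numbers."""
    params = dict(parse_qsl(urlsplit(url).query))
    for key in ("start_station_location", "end_station_location"):
        lat, lon = params[key].split(",")
        params[key] = (float(lat), float(lon))
    for key in ("page", "duration_s", "distance_m", "id"):
        params[key] = int(params[key])
    return params


trip = parse_location_url(EXAMPLE)
print(trip["start_station_name"], trip["start_station_location"])
```

Note that `%20` in the station name is decoded to a space, while the comma inside each location value survives as a separator between latitude and longitude.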
- Fix TypeScript and other errors
- When going back from the location view, return to the correct page.