London Urban Demographic Analysis

Overview

This project examines the demographic subdivisions of the City of London and how they relate to overall quality of life. It explores age, income, housing, health indicators, education, transportation, and political factors to understand how these elements influence the well-being of different urban areas.

Motivation

The project is motivated by growing awareness of how demographic factors affect individuals' quality of life. As discussions about physical and mental health have gained prominence, understanding how demographic strata influence overall well-being has become crucial. The project uses sample data from the City of London for the years 2014 to 2016 to identify correlations between demographic characteristics and quality-of-life indicators.

Final Report

The final report is available at project/report.pdf. It describes the project in detail and presents its findings on the relationships between demographic characteristics and the well-being of different urban areas in London.

Usage

Ensure packages.json is present in project/.
Run pipeline.sh to install the dependencies and execute the data pipeline.

Note: A Kaggle API token must be available locally in order to connect to the remote datasets (~/.kaggle/kaggle.json).
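Because the pipeline depends on the token, it can help to check for it before running anything. The following is a minimal sketch, not part of the repository: the function name `kaggle_token_problem` is hypothetical, and the check assumes the standard token layout with `username` and `key` fields.

```python
import json
from pathlib import Path

# Default location named in the note above.
DEFAULT_TOKEN_PATH = Path.home() / ".kaggle" / "kaggle.json"


def kaggle_token_problem(token_path: Path = DEFAULT_TOKEN_PATH):
    """Return None if the token file looks usable, else a short problem description.

    Hypothetical helper: it only inspects the file and never contacts Kaggle.
    """
    if not token_path.is_file():
        return f"token file not found: {token_path}"
    try:
        token = json.loads(token_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return f"token file is not valid JSON: {token_path}"
    if not isinstance(token, dict):
        return f"token file does not contain a JSON object: {token_path}"
    # Assumed layout: a Kaggle token holds a username and an API key.
    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        return f"token file lacks field(s): {', '.join(missing)}"
    return None
```

Calling `kaggle_token_problem()` with no arguments checks the default path; a return value of `None` means the file exists and has both fields, not that Kaggle accepts the credentials.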

Project Structure

.github/: Directory for GitHub configuration.
- workflows/: Directory to store GitHub Actions workflows.
  - project-tests.yml: GitHub Actions workflow for project tests.
data/: Directory to store the project data.
- data.sqlite: SQLite database storing the cleaned and processed data.
- plots/: Directory to store generated plots and figures.
project/: Directory to store project files.
- analyse_data.py: Python script for data analysis and plotting.
- csv_files_info.json: Information about CSV files needed for analysis.
- packages.json: File specifying Python package dependencies.
- pipeline.sh: Shell script for pipeline orchestration.
- report.pdf: Final report with analysis results.
- retrieve_data.py: Python script for data retrieval, cleaning, and database population.
- system_tests.sh: Shell script for system tests.
- tests.sh: Shell script executing unit and system tests.
- unit_tests.py: Python script for unit tests.
README.md: Project overview, context, and instructions.

Data Pipeline

The data pipeline consists of two main components: pipeline.sh and retrieve_data.py.

`pipeline.sh`

This shell script installs necessary Python packages based on the specifications in packages.json and then executes the Python data retrieval script.
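The install step could be expressed in Python roughly as follows. This is a sketch under a stated assumption: the format of packages.json is not shown here, so the code assumes a flat JSON array of requirement strings, and the names `build_pip_command` and `install_packages` are hypothetical. The actual pipeline.sh may work differently.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_pip_command(packages_file: Path) -> list[str]:
    """Build a pip install command from a JSON array of requirement strings.

    ASSUMPTION: packages.json is a flat JSON array of strings; the real file
    may use another layout.
    """
    requirements = json.loads(packages_file.read_text(encoding="utf-8"))
    if not isinstance(requirements, list) or not all(
        isinstance(item, str) for item in requirements
    ):
        raise ValueError(f"{packages_file} must contain a JSON array of strings")
    # Use the running interpreter so packages land in the active environment.
    return [sys.executable, "-m", "pip", "install", *requirements]


def install_packages(packages_file: Path) -> None:
    """Run the pip command and raise CalledProcessError if pip fails."""
    command = build_pip_command(packages_file)
    if len(command) == 4:
        # Nothing beyond "python -m pip install": no requirements to install.
        return
    subprocess.run(command, check=True)
```

Separating command construction from execution keeps the parsing logic testable without touching the environment; `install_packages(Path("project/packages.json"))` would then perform the actual installation.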

`retrieve_data.py`

The Python script connects to Kaggle for data retrieval, checks file existence, downloads missing files, and processes existing files. It includes functions for cleaning the dataset and creating/updating SQLite database tables.
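The flow described above (check which files exist, fetch the missing ones, clean, write to SQLite) can be sketched with the standard library alone. This is an illustration, not the code in retrieve_data.py: all function names are hypothetical, the cleaning rule (trim whitespace, drop rows with empty values) is an assumption, and the Kaggle download is passed in as a callable because the actual download call is not shown in this README.

```python
import csv
import sqlite3
from pathlib import Path
from typing import Callable, Iterable


def missing_files(data_dir: Path, expected: Iterable[str]) -> list[str]:
    """Return the expected file names that are not yet present in data_dir."""
    return [name for name in expected if not (data_dir / name).is_file()]


def clean_rows(rows: Iterable[dict]) -> list[dict]:
    """Trim whitespace and drop rows containing an empty value (assumed rule)."""
    cleaned = []
    for row in rows:
        stripped = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None  # DictReader stores surplus fields under None
        }
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def load_csv_into_table(conn: sqlite3.Connection, csv_path: Path, table: str) -> int:
    """Replace `table` with the cleaned rows of csv_path; return the row count.

    Table and column names are interpolated into SQL, so they must come from
    trusted input and must not contain double quotes.
    """
    with csv_path.open(newline="", encoding="utf-8") as handle:
        rows = clean_rows(csv.DictReader(handle))
    if not rows:
        return 0
    columns = list(rows[0])
    quoted = ", ".join(f'"{column}"' for column in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({quoted})')
    conn.executemany(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})',
        [tuple(row[column] for column in columns) for row in rows],
    )
    conn.commit()
    return len(rows)


def run(
    data_dir: Path,
    expected: list[str],
    db_path: Path,
    download: Callable[[str, Path], None],
) -> dict[str, int]:
    """Download missing files, then load every expected CSV into its own table."""
    for name in missing_files(data_dir, expected):
        download(name, data_dir)  # hypothetical hook for the Kaggle download
    conn = sqlite3.connect(db_path)
    try:
        return {
            name: load_csv_into_table(conn, data_dir / name, Path(name).stem)
            for name in expected
        }
    finally:
        conn.close()
```

Each table is dropped and recreated on every run, which keeps the sketch idempotent; a script that updates existing tables in place, as the description suggests, would need a different write strategy.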
