
julian-m10/made-2324

London Urban Demographic Analysis

Overview

This project examines the demographic subdivisions of the City of London and their connection to overall quality of life. By exploring factors such as age, income, housing, health indicators, education, transportation, and politics, it aims to understand how these elements influence the well-being of different urban areas.

Motivation

The project is motivated by growing awareness of how demographic factors affect individuals' quality of life. As discussion of both physical and mental health has gained prominence, understanding how demographic strata shape overall well-being has become increasingly important. The project uses sample data from the City of London covering 2014 to 2016 to identify correlations between demographic characteristics and quality-of-life indicators.

Final Report

The final report is available at project/report.pdf. It describes the project in detail and presents insights into the relationships between demographic characteristics and the well-being of different urban areas in London.

Usage

  1. Ensure project/packages.json is present.
  2. Run project/pipeline.sh to install the dependencies and execute the data pipeline.

Note: A Kaggle API token must be available locally at ~/.kaggle/kaggle.json so that the pipeline can download the remote datasets.
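Both prerequisites can be checked before running pipeline.sh. The Python sketch below is a hypothetical helper, not part of the repository; it assumes it is run from the repository root, with packages.json inside project/ as listed under Project Structure.

```python
from pathlib import Path
from typing import List, Optional


def missing_prerequisites(project_dir: Path, home: Optional[Path] = None) -> List[str]:
    """Hypothetical pre-flight check: return a list of problems (empty means ready)."""
    home = home if home is not None else Path.home()
    problems = []
    # Dependency list read by pipeline.sh.
    packages = project_dir / "packages.json"
    if not packages.is_file():
        problems.append(f"missing {packages}")
    # Kaggle API token used to download the remote datasets.
    token = home / ".kaggle" / "kaggle.json"
    if not token.is_file():
        problems.append(f"missing Kaggle token at {token}")
    return problems
```

For example, missing_prerequisites(Path("project")) returns an empty list when both files are in place, and one message per missing file otherwise.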

Project Structure

  • .github/: GitHub configuration directory.

    • workflows/: Directory storing GitHub Actions workflows.
      • project-tests.yml: GitHub Actions workflow that runs the project tests.
  • data/: Directory to store the project data.

    • data.sqlite: SQLite database storing the cleaned and processed data.
    • plots/: Directory to store generated plots and figures.
  • project/: Directory to store project files.

    • analyse_data.py: Python script for data analysis and plotting.
    • csv_files_info.json: Information about CSV files needed for analysis.
    • packages.json: File specifying Python package dependencies.
    • pipeline.sh: Shell script for pipeline orchestration.
    • report.pdf: Final report with analysis results.
    • retrieve_data.py: Python script for data retrieval, cleaning, and database population.
    • system_tests.sh: Shell script for system tests.
    • tests.sh: Shell script executing unit and system tests.
    • unit_tests.py: Python script for unit tests.
  • README.md: Project overview, context, and instructions.
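The cleaned data ends up in data/data.sqlite. Its table names are not documented in this README, so a quick way to see what the pipeline produced is to list them. The helper below (list_tables, a hypothetical name) uses only Python's standard sqlite3 module and refuses to run if the file is missing, because sqlite3.connect would otherwise silently create an empty database.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the user tables in a SQLite database, sorted by name."""
    if not Path(db_path).is_file():
        # sqlite3.connect would create a new, empty file instead of failing.
        raise FileNotFoundError(db_path)
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

After a successful pipeline run, list_tables("data/data.sqlite") would show which tables retrieve_data.py created.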

Data Pipeline

The data pipeline consists of two main components: pipeline.sh and retrieve_data.py.

pipeline.sh

This shell script installs the Python packages specified in packages.json and then runs the data retrieval script, retrieve_data.py.
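The two steps performed by pipeline.sh can be sketched in Python as follows. The layout of packages.json is not documented in this README, so the sketch assumes it holds either a JSON list of requirement strings or an object mapping package names to versions; pip_install_args and run_pipeline are hypothetical names, not functions from the repository.

```python
import json
import subprocess
import sys
from typing import List


def pip_install_args(spec) -> List[str]:
    """Build a pip command from a parsed packages.json (assumed format).

    Accepts a list such as ["pandas", "kaggle"] or a mapping such as
    {"pandas": "2.1.4", "kaggle": ""}, where an empty version means "any".
    """
    if isinstance(spec, dict):
        reqs = [f"{name}=={ver}" if ver else name for name, ver in spec.items()]
    elif isinstance(spec, list):
        reqs = [str(req) for req in spec]
    else:
        raise ValueError("packages.json must contain a JSON list or object")
    # Use the current interpreter's pip so packages land in the same environment.
    return [sys.executable, "-m", "pip", "install", *reqs]


def run_pipeline(packages_file: str = "packages.json",
                 script: str = "retrieve_data.py") -> None:
    """Install dependencies, then run the retrieval script; stop on the first failure."""
    with open(packages_file, encoding="utf-8") as fh:
        spec = json.load(fh)
    subprocess.run(pip_install_args(spec), check=True)
    subprocess.run([sys.executable, script], check=True)
```

Passing check=True makes subprocess.run raise CalledProcessError on a non-zero exit status, so the retrieval script never runs against a half-installed environment, much like set -e in a shell script.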

retrieve_data.py

This Python script connects to Kaggle to retrieve the data. For each required file, it checks whether the file already exists locally, downloads it if it is missing, and then processes it. It also includes functions that clean the datasets and create or update the tables in the SQLite database.
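A minimal sketch of this check, download, clean, and store flow follows. It assumes each required CSV maps to a Kaggle dataset slug (the actual contents of csv_files_info.json are not shown here) and uses a deliberately simple cleaning rule, stripping whitespace and dropping rows with empty fields; the real script's cleaning is likely more specific. Downloads go through the kaggle package's KaggleApi, imported lazily, and the download parameter lets a test substitute a fake downloader. All function names are hypothetical.

```python
import csv
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Callable, Dict, List


def kaggle_download(dataset: str, dest: Path) -> None:
    """Download and unzip a Kaggle dataset (needs `kaggle` and ~/.kaggle/kaggle.json)."""
    from kaggle.api.kaggle_api_extended import KaggleApi  # lazy: only needed online

    api = KaggleApi()
    api.authenticate()
    api.dataset_download_files(dataset, path=str(dest), unzip=True)


def clean_rows(rows: List[dict]) -> List[Dict[str, str]]:
    """Assumed cleaning rule: strip whitespace, drop malformed rows or rows with empty fields."""
    cleaned = []
    for row in rows:
        if None in row:  # DictReader stores surplus fields under the key None
            continue
        stripped = {k.strip(): (v or "").strip() for k, v in row.items()}
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def _quote(name: str) -> str:
    """Quote an SQL identifier so arbitrary CSV headers are safe as column names."""
    return '"' + name.replace('"', '""') + '"'


def load_csv_into_sqlite(csv_path: Path, db_path: Path, table: str) -> int:
    """Replace `table` with the cleaned rows of csv_path; return the number of rows written."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        columns = [c.strip() for c in (reader.fieldnames or [])]
        rows = clean_rows(list(reader))
    if not columns:
        raise ValueError(f"{csv_path} has no header row")
    col_sql = ", ".join(f"{_quote(c)} TEXT" for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commit on success, roll back on error
            conn.execute(f"DROP TABLE IF EXISTS {_quote(table)}")
            conn.execute(f"CREATE TABLE {_quote(table)} ({col_sql})")
            conn.executemany(
                f"INSERT INTO {_quote(table)} VALUES ({placeholders})",
                [[row[c] for c in columns] for row in rows],
            )
    return len(rows)


def retrieve(files: Dict[str, str], data_dir: Path, db_path: Path,
             download: Callable[[str, Path], None] = kaggle_download) -> Dict[str, int]:
    """Hypothetical mapping {CSV filename: Kaggle slug}: fetch missing files, then load each."""
    data_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for filename, dataset in files.items():
        csv_path = data_dir / filename
        if not csv_path.exists():  # download only what is missing
            download(dataset, data_dir)
        counts[csv_path.stem] = load_csv_into_sqlite(csv_path, db_path, csv_path.stem)
    return counts
```

Because the download is skipped when a CSV already exists, re-running the pipeline only rebuilds the SQLite tables. Each table is dropped and recreated, so the database always reflects the latest cleaned files.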
