# Migration's Role in Shaping Ireland's Population and Economic Outlook ☘️
***

**Name:** 
Stephen Hasson

**Student No:** 
sba23014

**Student Email:**
sba23014@student.cct.ie

**Course:** 
CCT MSc in Data Analytics

**Assignment:**
MSC_DA_CA1

**Year:**
Sept-23 Intake

**Primary Data Sources:**
* https://data.cso.ie/product/pme
* https://datacatalog.worldbank.org/search/dataset/0037712/World-Development-Indicators
***

## README

**Introduction**

Welcome to this Jupyter Notebook, which analyses the role of migration in shaping Ireland's population and economic outlook.

This README explains how to read and interpret the notebook.

**Methodology Annotations**

During the analysis, you will encounter coloured blocks that serve to explain the methodology used for certain tasks, calculations, or data manipulations. These coloured blocks are designed to:

* Clarify the reasoning behind each step
* Explain any assumptions being made
* Provide additional resources or references if applicable

**How to Navigate**

* **Blue Blocks:** Specific methodology or rationale for a particular code cell
* **Green Blocks:** General comments or explanations
* **Yellow Blocks:** Warnings or limitations about a specific part of the code

**Prerequisites**
* Basic understanding of Python programming
* Basic understanding of Python libraries such as:
    * Pandas
    * NumPy
    * Matplotlib
    * Seaborn
    * Scikit-learn
***

<div class="alert alert-block alert-success"> <b>Explanation:</b> This README introduces the Jupyter Notebook and gives a high-level explanation of how to interpret its contents. I chose coloured blocks to annotate the analysis because they improve readability, make the notebook easier to navigate, and make the methodology more intuitive to follow.</div>

## Table of Contents

### 0. [CRISP-DM Framework](#crisp-dm_framework)
### 1. [Data Understanding Phase](#data_understanding_phase)
### 2. [Data Preparation Phase](#data_preparation_phase)
### 3. [Modelling Phase](#modelling_phase)
### 4. [Evaluation Phase](#evaluation_phase)
### 5. [Deployment Phase](#deployment_phase)
***

<a id='crisp-dm_framework'></a>
##  0. CRISP-DM Framework
***
*"CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is an industry-proven way to guide your data mining efforts.*

*As a methodology, it includes descriptions of the typical phases of a project, the tasks involved with each phase, and an explanation of the relationships between these tasks.*

*As a process model, CRISP-DM provides an overview of the data mining life cycle."*

##### Sources:
* https://www.ibm.com/docs/ko/spss-modeler/18.1.0?topic=dm-crisp-help-overview
* https://www.ibm.com/docs/en/SS3RA7_18.1.0/modeler_crispdm_ddita/clementine/images/crisp_process.jpg

```python
from IPython.display import display, HTML  # current import path; importing display from IPython.core.display is deprecated

# IBM's CRISP-DM process diagram (see the sources listed above)
CRISP_DM_DIAGRAM_URL = (
    "https://www.ibm.com/docs/en/SS3RA7_18.1.0/modeler_crispdm_ddita/"
    "clementine/images/crisp_process.jpg"
)

# Render the diagram centred in the cell output
display(HTML(f'<center><img src="{CRISP_DM_DIAGRAM_URL}" width="600" height="300" /></center>'))
```

### Business/Research Understanding Phase

* **Define project requirements and objectives**
    * “Migration's Role in Shaping Ireland's Population and Economic Outlook”
    * Objective: To quantify and understand the impact of migration on Ireland's population changes over the past 60 years, as well as its relationship with economic growth.
    * Deliverables: An analytical report summarising key insights, supported by charts and tables, and a predictive machine learning model.

* **Translate objectives into data exploration problem definition**
    * Primary objectives
        * What is the trend of population in Ireland over a specific time period?
        * What is the trend of migration in Ireland over a specific time period?
        * What is the trend of natural increase in Ireland over a specific time period?
        * What percentage of population change is due to net migration?
        * What percentage of population change is due to natural increase?
        * What correlation exists between migration and economic growth?
        * What caused the changes in population, migration, and economic growth trends over this period?
    * Potential secondary objectives
        * Where have immigrants come from, and where have emigrants moved to?

* **Prepare preliminary strategy to meet objectives**
    * Data needed: Population, migration, natural increase, economic indicators like GDP
    * Tools: Python for data preprocessing, analysis, visualisation, machine learning

* **General Programming Requirements:**
    * The project must be explored programmatically; this means that you must implement suitable Python tools (code and/or libraries) to complete the analysis required. All of this is to be implemented in a Jupyter Notebook. Your codebook should be properly annotated. The project documentation must include sound justifications and explanations of your code choices (code quality standards should also be applied). [0-50]
    * Briefly discuss your use of aspects of various programming paradigms in the development of your project. For example, this may include (but is not limited to) how they influenced your design decisions or how they helped you solve problems. Note that marks may not be awarded if the discussion does not involve your specific project. [0-50]

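The two percentage questions above rest on the standard demographic accounting identity: population change = natural increase (births minus deaths) + net migration (immigrants minus emigrants). The sketch below is a minimal example, not the final analysis, of how those shares and a Pearson correlation with economic growth could be computed. The column names `natural_increase`, `net_migration` and `gdp_growth` are hypothetical placeholders, and the toy values are invented for illustration; they are not CSO or World Bank figures.

```python
import pandas as pd


def decompose_population_change(df: pd.DataFrame) -> pd.DataFrame:
    """Add total population change and the % of it attributable to net
    migration and to natural increase. Assumes hypothetical columns
    'natural_increase' and 'net_migration' (persons per year)."""
    out = df.copy()
    # Accounting identity: change = natural increase + net migration
    out["population_change"] = out["natural_increase"] + out["net_migration"]
    # Shares are undefined when total change is zero, so leave those as NaN
    total = out["population_change"].where(out["population_change"] != 0)
    out["pct_from_net_migration"] = 100 * out["net_migration"] / total
    out["pct_from_natural_increase"] = 100 * out["natural_increase"] / total
    return out


def migration_growth_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between net migration and a growth measure
    (hypothetical column 'gdp_growth', e.g. annual % change in GDP)."""
    return float(df["net_migration"].corr(df["gdp_growth"], method="pearson"))


# Toy values for illustration only, not CSO or World Bank figures
toy = pd.DataFrame({
    "year": [2020, 2021, 2022],
    "natural_increase": [30_000, 25_000, 20_000],
    "net_migration": [10_000, -5_000, 60_000],
    "gdp_growth": [2.0, 1.0, 6.0],
})
result = decompose_population_change(toy)
print(result[["year", "population_change",
              "pct_from_net_migration", "pct_from_natural_increase"]])
print(migration_growth_correlation(toy))
```

In the 2021 toy row the two components have opposite signs, so net migration accounts for -25% of the change and natural increase for 125%. Shares outside the 0 to 100% range are expected in such years and need explaining rather than clipping. A correlation on its own also says nothing about causation, and two series that both trend over time can correlate strongly for that reason alone.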
### Data Understanding Phase
* **Collect data**
    * Obtain population related data from Ireland’s CSO website
    * Obtain economic data from Ireland’s CSO website or another source, e.g., the World Bank.
* **Perform exploratory data analysis (EDA)**
    * You must perform appropriate EDA on your dataset, rationalizing and detailing why you chose the specific methods and what insight you gained. [0-20]
    * Summarise your dataset clearly, using relevant descriptive statistics and appropriate plots. These should be carefully motivated and justified, and clearly presented. You should critically analyse your findings, in addition to including the necessary Python code, output and plots in the report. You are required to plot at least three graphs. [0-35]
    * Appropriate visualizations must be used to engender insight into the dataset and to illustrate your final insights gained in your analysis. [0-20]
    * All design and implementation of your visualizations must be justified and detailed in full, making reference to Tufte's principles. [0-30]
* **Assess data quality**
    * Check for missing values, duplicate records, and inconsistent data types.
* **Optionally, select interesting subsets**
    * Identify any particular areas that require further investigation, e.g., years with major immigration or emigration events, or inconsistent economic data.

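The data quality checks listed above map onto a few pandas calls: `isna()` for missing values, `duplicated()` for repeated rows, and `dtypes` plus a `pd.to_numeric(..., errors="coerce")` probe for numbers stored as text. A minimal sketch, assuming the CSO and World Bank extracts have already been loaded into DataFrames; the helper names and the toy frame are hypothetical:

```python
import pandas as pd


def data_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing-value counts, distinct values,
    plus the number of fully duplicated rows in the whole frame."""
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(dropna=True),
    })
    # duplicated() marks every repeat of an earlier identical row
    return report.assign(n_duplicate_rows=int(df.duplicated().sum()))


def non_numeric_counts(df: pd.DataFrame) -> pd.Series:
    """For each non-numeric column, count non-missing values that cannot be
    parsed as numbers (e.g. text placeholders in a numeric field)."""
    counts = {}
    for col in df.columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            parsed = pd.to_numeric(df[col], errors="coerce")
            counts[col] = int((parsed.isna() & df[col].notna()).sum())
    return pd.Series(counts, dtype="int64")


# Hypothetical toy frame: one duplicated row, one missing value, one text placeholder
toy = pd.DataFrame({
    "year": [2020, 2021, 2021, 2022],
    "net_migration": ["10000", "-5000", "-5000", "n/a"],
    "gdp_growth": [2.0, 1.0, 1.0, None],
})
print(data_quality_report(toy))
print(non_numeric_counts(toy))
```

A non-zero count from `non_numeric_counts` flags a column that needs cleaning and casting before analysis; whether such values are removed or imputed is then a Data Preparation decision.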
### Data Preparation Phase
* **Prepare for modelling in subsequent phases**
    * You must also rationalise, justify, and detail all the methods used to prepare the data for ML. [0-30]
    * Explain which project management framework (CRISP-DM, KDD or SEMMA) is required for a data science project. Discuss and justify with real-life scenarios. Provide an explanation of why you chose a supervised, unsupervised, or semi-supervised machine learning technique for the dataset you used for ML modeling. [0 - 20]
* **Select cases and variables appropriate for analysis**
    * Choose relevant columns and subsets to include
* **Cleanse and prepare data so it is ready for modeling tools**
    * Handle missing values by imputation or removal.
    * Standardise data formats.
    * Summarise your dataset clearly, using relevant descriptive statistics and appropriate plots. These should be carefully motivated and justified, and clearly presented. You should critically analyse your findings, in addition to including the necessary Python code, output and plots in the report. You are required to plot at least three graphs. [0-35]
    * Use two discrete distributions (Binomial and/or Poisson) in order to explain/identify some information about your dataset. You must explain your reasoning and the techniques you have used. Visualise your data and explain what happens with large samples in these cases. You must work with Python, and your mathematical reasoning must be documented in your report. [0-30]
    * Use Normal distribution to explain or identify some information about your dataset [0-20]
    * Explain the importance of the distributions used in the two preceding points in your analysis. Justify the choice of the variables and explain whether the variables used for the discrete distributions could be treated as normally distributed in this case. [0-15]
    * Appropriate visualizations must be used to engender insight into the dataset and to illustrate your final insights gained in your analysis. [0-20]
    * All design and implementation of your visualizations must be justified and detailed in full, making reference to Tufte's principles. [0-30]
* **Perform transformation of certain variables, if needed**
    * Create new variables if required such as ‘GDP per capita’, ‘Percentage Change’ etc.

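Several of the preparation tasks above can be sketched with pandas and `scipy.stats`. The framing of the distributions here is one possible assumption, not something the brief prescribes: each year is treated as an independent Bernoulli trial for "net migration is negative", giving a Binomial count of emigration years over a 10-year horizon; a Poisson distribution with λ = np is compared against it; and a Normal distribution is fitted by maximum likelihood to a continuous variable such as annual GDP growth. `max_normal_approx_gap` illustrates the large-sample behaviour the brief asks about: as n grows, Binomial(n, p) approaches N(np, np(1 - p)). The column names (`year`, `gdp`, `population`), the function names and the toy values are all hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def add_derived_variables(df: pd.DataFrame) -> pd.DataFrame:
    """Add GDP per capita and year-on-year population % change.
    Assumes hypothetical columns 'year', 'gdp' and 'population'."""
    out = df.sort_values("year").reset_index(drop=True)
    out["gdp_per_capita"] = out["gdp"] / out["population"]
    previous = out["population"].shift(1)  # prior year's population
    out["population_pct_change"] = (out["population"] / previous - 1) * 100
    return out


def binomial_emigration_years(net_migration, horizon=10):
    """Estimate p = share of years with net emigration (net migration < 0)
    and return (p, Binomial(horizon, p) pmf for k = 0..horizon).
    Assumes years are independent trials with a constant p."""
    values = np.asarray(net_migration, dtype=float)
    p = float(np.mean(values < 0))
    k = np.arange(horizon + 1)
    return p, stats.binom.pmf(k, horizon, p)


def poisson_approximation(p, horizon=10):
    """Poisson(lambda = horizon * p) pmf for k = 0..horizon."""
    k = np.arange(horizon + 1)
    return stats.poisson.pmf(k, horizon * p)


def max_normal_approx_gap(n, p):
    """Largest absolute gap between the Binomial(n, p) pmf and the
    N(np, np(1-p)) density at k = 0..n; requires 0 < p < 1."""
    k = np.arange(n + 1)
    normal = stats.norm.pdf(k, loc=n * p, scale=np.sqrt(n * p * (1 - p)))
    return float(np.max(np.abs(stats.binom.pmf(k, n, p) - normal)))


def fit_normal(values):
    """Maximum-likelihood Normal fit; returns (mean, standard deviation)."""
    mu, sigma = stats.norm.fit(np.asarray(values, dtype=float))
    return float(mu), float(sigma)


# Toy values for illustration only, not CSO figures
toy_net_migration = [-20_000, -10_000, 5_000, 30_000, -5_000,
                     40_000, 50_000, 20_000, -15_000, 10_000]
p, binom_pmf = binomial_emigration_years(toy_net_migration)
poisson_pmf = poisson_approximation(p)
print(f"p = {p:.2f}")
print(max_normal_approx_gap(10, p), max_normal_approx_gap(1000, p))
```

The independence assumption is strong for migration, where one year's flows are correlated with the next, so probabilities read from these pmfs are illustrative only. The Poisson approximation is also reliable only when p is small, which plotting `binom_pmf` against `poisson_pmf` makes visible.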
### Modelling Phase
* **Select and apply one or more modelling techniques**
    * Linear regression to establish relationships between migration and economic growth.
    * Time-series analysis for trend prediction.
    * Machine learning models have a wide range of uses, including prediction, classification, and clustering. It is advised that you assess several approaches (at least two) and choose appropriate hyperparameters for the optimal outcomes of machine learning models using a hyperparameter tuning approach, such as GridSearchCV or RandomizedSearchCV. [0 - 30]
* **Calibrate model settings to optimize results**
    * Adjust hyperparameters and validate the model using cross-validation.

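The modelling steps above (a regression between migration and economic variables, tuned with GridSearchCV and validated with cross-validation) could be sketched with scikit-learn as follows. Using Ridge regression as the second, tunable model and `TimeSeriesSplit` so that validation years always follow training years are assumptions of this sketch, not requirements of the brief; `X` and `y` are synthetic stand-ins for the real features and target, with rows assumed to be in year order.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, n_splits=5):
    """Cross-validated RMSE for a linear regression baseline and for a Ridge
    regression whose alpha is tuned by GridSearchCV. TimeSeriesSplit trains
    on earlier rows and validates on later ones, so rows must be in year order."""
    cv = TimeSeriesSplit(n_splits=n_splits)
    scoring = "neg_root_mean_squared_error"  # scikit-learn maximises scores, so RMSE is negated

    baseline = make_pipeline(StandardScaler(), LinearRegression())
    baseline_rmse = -cross_val_score(baseline, X, y, cv=cv, scoring=scoring).mean()

    search = GridSearchCV(
        make_pipeline(StandardScaler(), Ridge()),
        param_grid={"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},  # "ridge" is the step name make_pipeline assigns
        cv=cv,
        scoring=scoring,
    )
    search.fit(X, y)
    return {
        "linear_regression_cv_rmse": float(baseline_rmse),
        "ridge_cv_rmse": float(-search.best_score_),
        "ridge_best_alpha": search.best_params_["ridge__alpha"],
    }


# Synthetic stand-in data: 60 ordered "years", two features, linear signal plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=60)
results = compare_models(X, y)
print(results)
```

`best_score_` is measured on the same folds used to choose `alpha`, so it is slightly optimistic relative to the untuned baseline; holding out the most recent years as a final test set gives a fairer comparison.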
### Evaluation Phase
* **Evaluate one or more models for effectiveness**
    * Assess the model using metrics like R-squared, MAE, or RMSE.
    * Show the results of two or more ML modelling comparisons in a table or graph format. Review and critically examine the machine learning models' performance based on the selected metric for supervised, unsupervised, and semi-supervised approaches. [0 - 30]
    * Demonstrate the similarities and differences between your machine learning modelling results using tables or visualizations. Provide a report along with an explanation and interpretation of the relevance and effectiveness of your findings. [0 - 20]
* **Determine whether defined objectives achieved**
    * Verify if the model answers the research questions effectively.
* **Make decision regarding data exploration results before deploying to field**
    * Assess whether the model's accuracy is sufficient for it to be deployed in the field for this particular use case.

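For the evaluation criteria above, R-squared, MAE and RMSE for each model can be gathered into a single comparison table. A minimal sketch, assuming each fitted model has already produced predictions on the same held-out test set; the model names and toy arrays are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluation_table(y_true, predictions):
    """Return a DataFrame of R-squared, MAE and RMSE per model, best RMSE first.
    `predictions` maps each model name to its predictions on the same test set."""
    rows = []
    for name, y_pred in predictions.items():
        rows.append({
            "model": name,
            "R2": r2_score(y_true, y_pred),
            "MAE": mean_absolute_error(y_true, y_pred),
            # Square root of MSE works across scikit-learn versions
            "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        })
    return pd.DataFrame(rows).set_index("model").sort_values("RMSE")


# Toy arrays for illustration only
y_test = [1.0, 2.0, 3.0, 4.0]
table = evaluation_table(y_test, {
    "model_b": [2.0, 3.0, 4.0, 5.0],  # every prediction off by +1
    "model_a": [1.0, 2.0, 3.0, 4.0],  # perfect predictions
})
print(table)
```

Because RMSE penalises large errors more heavily than MAE, reporting both shows whether a model's error is driven by a few badly predicted years.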
### Deployment Phase
* **Make use of models created**
    * This will be done as the analysis progresses.
* **Simple deployment example: generate report**
    * Complete the report accompanying the CA1 analysis, compiling insights, charts, and tables into a comprehensive narrative.
* **Complex deployment example: implement parallel data exploration effort in another department**
    * N/A for the purposes of this analysis.
* **In business settings, the customer often carries out deployment based on your model**
    * N/A for the purposes of this analysis.
***

<a id='data_understanding_phase'></a>
##  1. Data Understanding Phase

### Collect data

### Perform exploratory data analysis (EDA)

### Assess data quality

### Select interesting subsets (optional)

<a id='data_preparation_phase'></a>
##  2. Data Preparation Phase

### Prepare for modelling in subsequent phases

### Select cases and variables appropriate for analysis

### Cleanse and prepare data so it is ready for modelling tools

### Perform transformation of certain variables, if needed

<a id='modelling_phase'></a>
##  3. Modelling Phase

<a id='evaluation_phase'></a>
##  4. Evaluation Phase

<a id='deployment_phase'></a>
##  5. Deployment Phase