# UK Visa Sponsorship Analysis 

This notebook analyzes trends and insights from UK visa sponsorship data.

### Data Overview

The dataset contains information on organizations that sponsor foreign nationals for UK visas, including:

- Organisation Name  
- Location (Town/City, County)
- Visa Type & Rating
- Visa Route

It covers visa routes such as Skilled Worker, Global Business Mobility, and Creative Worker.

### Data Cleaning

- Load the raw CSV data into a pandas DataFrame
- Handle missing values and inconsistencies
- Add columns such as region and visa category for better analysis

### Exploratory Analysis

- Which regions and counties attract the most visa sponsors?
- Which sectors and occupations dominate the visa demand?
- How do creative vs. skilled worker visas compare? 
- Trends over time?
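
The region/county and route questions above mostly reduce to counting rows per group. A minimal sketch, assuming the column names shown in the data preview (`County`, `Route`) and a tiny hand-made sample rather than the real file:

```python
import pandas as pd

# Tiny illustrative sample; the values are made up for this sketch, only the
# column names follow the dataset described above.
sample = pd.DataFrame({
    "County": ["Essex", "Essex", "Kent", None],
    "Route": ["Skilled Worker", "Creative Worker", "Skilled Worker", "Skilled Worker"],
})

# Sponsors per county; rows with a missing county are ignored by default.
sponsors_per_county = sample["County"].value_counts()

# Sponsors per route, expressed as a share of all rows.
route_share = sample["Route"].value_counts(normalize=True)

print(sponsors_per_county)
print(route_share)
```

On the real data the same two calls would run on `df`, ideally after the cleaning steps so that inconsistent spellings do not split one group into several.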

### Visualizations

- Choropleth map of UK showing concentration of sponsors by region
- Bar charts showing top visa categories, occupations
- Time-series of visa volume over the years

### Database Storage 

- Load cleaned dataset into PostgreSQL database
- Write SQL queries to analyze data
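
The load-then-query workflow can be sketched without a running database server. The example below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the table name `sponsors` and its simplified column names are assumptions made for illustration, and the rows are taken from the data preview:

```python
import sqlite3

# A few cleaned rows from the preview: (organisation, town, county, route).
rows = [
    ("architect uk ltd", "West Horndon", "Essex", "Skilled Worker"),
    ("zyxel communications uk ltd", "Wokingham", "Berkshire", "Skilled Worker"),
    ("zyzzle limited", "Surbiton", None, "Skilled Worker"),
]

# In-memory SQLite database used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sponsors (organisation TEXT, town TEXT, county TEXT, route TEXT)"
)
conn.executemany("INSERT INTO sponsors VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Count sponsors per route, most common first.
route_counts = conn.execute(
    "SELECT route, COUNT(*) AS n FROM sponsors GROUP BY route ORDER BY n DESC"
).fetchall()
conn.close()

print(route_counts)
```

With PostgreSQL the connection call and the parameter placeholder style would differ, while the `GROUP BY` query itself is standard SQL.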

### Conclusion

- Key insights and recommendations
- Potential next steps


## Table of Contents

1. [Introduction](#intro)
   - [Problem Statement](#problem)
   - [Project Goals](#goals)
   - [Data Overview](#data)
2. [Data Cleaning](#cleaning)
3. [Exploratory Data Analysis](#eda)
   - [Region/County Breakdown](#region)
   - [Top Visa Routes](#routes)
   - [Trends Over Time](#trends)
4. [Data Visualization](#viz)
   - [Choropleth Map](#map) 
   - [Route Comparison Plots](#plots)
5. [Modeling](#modeling)
   - [Predictive Modeling](#predictive)
6. [Database Storage](#database)   
7. [Conclusion](#conclusion)
8. [References](#references)


## Introduction <a id="intro"></a>

This analysis aims to uncover insights from UK visa sponsorship data. The data contains information on organizations sponsoring foreign nationals for UK visas across routes such as Skilled Worker, Global Business Mobility, and Creative Worker.

### Problem Statement <a id="problem"></a>

The UK has a robust visa system to allow organizations to sponsor foreign talent. The key questions this analysis will address are:

- Which regions attract the most visa sponsors and why? 
- What are the most common visa routes being sponsored?
- How do creative visas compare to skilled worker visas?
- Are there any trends over time?

### Project Goals <a id="goals"></a>

The goals are to analyze regional trends, study patterns across visa routes, develop visualizations and build a model to predict future visa volume.

### Data Overview <a id="data"></a>

The data contains information on sponsoring organizations including name, location, visa type & rating and specific visa route. This will help address the key questions and uncover regional and visa-type specific trends.


## Data Cleaning <a id="cleaning"></a>

As the first step, the raw CSV data is loaded into a Pandas dataframe for analysis. 

The dataset requires some cleansing and preprocessing before analysis:

- Handle missing values: rows with critical missing fields, such as location or visa type, are dropped
- Normalize organization names: names are standardized so that the same sponsor aggregates under one key
- Add a region column: a new column derived from the location data assigns each sponsor to a region
- Add a visa category: detailed routes are grouped into broad categories such as Skilled, Business, and Creative
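
The visa category step is not implemented in the cells that follow, so here is one way it might look. This is a sketch only: the function name `categorise_route` and the matching rules are assumptions, based on the route names visible in the data preview.

```python
def categorise_route(route):
    """Map a detailed route name to a broad category (illustrative rules)."""
    # Missing values arrive as NaN (a float) or None, not as strings.
    if not isinstance(route, str):
        return "Unknown"
    name = route.strip().lower()
    if name.startswith("skilled worker"):
        return "Skilled"
    if name.startswith("global business mobility"):
        return "Business"
    if "creative" in name:
        return "Creative"
    return "Other"


print(categorise_route("Skilled Worker"))
print(categorise_route(None))
```

Applied to the DataFrame, this could be used as `df['Route'].map(categorise_route)` to create the new column.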

After cleaning, the dataframe will be ready for exploratory analysis and visualizations. Key steps are:

- Assess missing data 
- Fill/drop missing values appropriately
- Apply transformations such as normalizing text and generating new columns
- Validate corrections to ensure quality

The end result is a clean, consistent dataset with all the required fields to drive further analysis on regional trends and visa category comparisons.
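
Assessing missing data before deciding what to fill or drop can be sketched as follows. The sample rows are copied from the data preview; treating `Town/City` as the critical field is an assumption made for illustration.

```python
import pandas as pd

# Small stand-in for the raw data, using rows from the preview.
sample = pd.DataFrame({
    "Organisation Name": [
        "*ABOUTCARE HASTINGS LTD",
        "@ Architect UK Ltd",
        "@ Home Accommodation Services Ltd",
    ],
    "Town/City": ["East Sussex", "West Horndon", "London"],
    "County": [None, "Essex", None],
})

# Count and share of missing values per column.
missing_count = sample.isna().sum()
missing_share = sample.isna().mean()

# Drop only the rows that are missing a field treated as critical.
cleaned = sample.dropna(subset=["Town/City"])

print(missing_count)
print(missing_share.round(2))
print(cleaned.shape)
```

A high missing share in a column such as `County` suggests filling it from another source rather than dropping rows, since dropping would discard most of the data.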

In [1]:
import pandas as pd

# Set data file path
csv_path = 'data/visa_sponsorship_data.csv' 

# Load CSV data into Pandas DataFrame
df = pd.read_csv(csv_path)

print("As the first step, the raw CSV data is loaded into a Pandas dataframe for analysis.")

df.head()

As the first step, the raw CSV data is loaded into a Pandas dataframe for analysis.


Unnamed: 0,Organisation Name,Town/City,County,Type & Rating,Route
0,(IECC Care) Independent Excel Care Consortium ...,Colchester,,Worker (A rating),Skilled Worker
1,*ABOUTCARE HASTINGS LTD,East Sussex,,Worker (A rating),Skilled Worker
2,???£ ESS LTD,Manchester,,Worker (A rating),Skilled Worker
3,@ Architect UK Ltd,West Horndon,Essex,Worker (A rating),Skilled Worker
4,@ Home Accommodation Services Ltd,London,,Worker (A rating),Skilled Worker


In [2]:
# Preview the dataset and its shape
print('The Shape of the data is: ', df.shape)
df.head()

The Shape of the data is:  (82197, 5)


Unnamed: 0,Organisation Name,Town/City,County,Type & Rating,Route
0,(IECC Care) Independent Excel Care Consortium ...,Colchester,,Worker (A rating),Skilled Worker
1,*ABOUTCARE HASTINGS LTD,East Sussex,,Worker (A rating),Skilled Worker
2,???£ ESS LTD,Manchester,,Worker (A rating),Skilled Worker
3,@ Architect UK Ltd,West Horndon,Essex,Worker (A rating),Skilled Worker
4,@ Home Accommodation Services Ltd,London,,Worker (A rating),Skilled Worker


In [3]:
# Drop rows with a missing organisation name
# (add 'Town/City' or 'Type & Rating' to subset to also require location or visa type)
df.dropna(subset=['Organisation Name'], inplace=True)
print('The Shape of the data is: ', df.shape)
df.head()

The Shape of the data is:  (82197, 5)


Unnamed: 0,Organisation Name,Town/City,County,Type & Rating,Route
0,(IECC Care) Independent Excel Care Consortium ...,Colchester,,Worker (A rating),Skilled Worker
1,*ABOUTCARE HASTINGS LTD,East Sussex,,Worker (A rating),Skilled Worker
2,???£ ESS LTD,Manchester,,Worker (A rating),Skilled Worker
3,@ Architect UK Ltd,West Horndon,Essex,Worker (A rating),Skilled Worker
4,@ Home Accommodation Services Ltd,London,,Worker (A rating),Skilled Worker


In [4]:
# Standardize organization name format
df['Organisation Name'] = df['Organisation Name'].str.lower() # convert to lowercase

# Remove special characters like &, -, / etc. (anything that is not a letter, digit or whitespace)
df['Organisation Name'] = df['Organisation Name'].str.replace(r'[^a-zA-Z\d\s]', '', regex=True)

# Trim whitespace last, so spaces left behind by removed characters are also dropped
df['Organisation Name'] = df['Organisation Name'].str.strip()
df['Town/City'] = df['Town/City'].str.strip()

print("Normalize organization names - Organization names are standardized for aggregation/analysis")
df

Normalize organization names - Organization names are standardized for aggregation/analysis


Unnamed: 0,Organisation Name,Town/City,County,Type & Rating,Route
0,iecc care independent excel care consortium li...,Colchester,,Worker (A rating),Skilled Worker
1,aboutcare hastings ltd,East Sussex,,Worker (A rating),Skilled Worker
2,ess ltd,Manchester,,Worker (A rating),Skilled Worker
3,architect uk ltd,West Horndon,Essex,Worker (A rating),Skilled Worker
4,home accommodation services ltd,London,,Worker (A rating),Skilled Worker
...,...,...,...,...,...
82192,zyxel communications uk ltd,Wokingham,Berkshire,Worker (A rating),Skilled Worker
82193,zyxel communications uk ltd,Wokingham,Berkshire,Worker (A rating),Global Business Mobility: Senior or Specialist...
82194,zyzzle limited,Surbiton,,Worker (A rating),Skilled Worker
82195,zza consulting limited,LONDON,,Worker (A rating),Skilled Worker
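
The same normalization can also be written as a plain function, which makes it easy to check individual names. A minimal sketch; the function name `normalize_name` is hypothetical, and the whitespace-collapsing step is an addition that is not part of the cell above:

```python
import re


def normalize_name(name):
    """Lowercase, drop non-alphanumeric characters, and tidy whitespace."""
    cleaned = re.sub(r"[^a-z\d\s]", "", name.lower())
    # Collapse runs of whitespace left behind by removed characters, then trim.
    return re.sub(r"\s+", " ", cleaned).strip()


print(normalize_name("@ Architect UK Ltd"))
```

Trimming after the character removal matters for names such as `@ Architect UK Ltd`, where deleting the leading symbol would otherwise leave a leading space.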


In [7]:
# Create a dictionary mapping each region to the counties it contains

county_to_region = {
    'Greater London': ['London'],
    'South East': ['Buckinghamshire', 'East Sussex', 'Hampshire', 'Kent', 'Oxfordshire', 'Surrey', 'West Sussex'],
    'South West': ['Bristol', 'Cornwall', 'Devon', 'Dorset', 'Gloucestershire', 'Somerset', 'Wiltshire'], 
    'East of England': ['Bedfordshire', 'Cambridgeshire', 'Essex', 'Hertfordshire', 'Norfolk', 'Suffolk'],
    'West Midlands': ['Herefordshire', 'Shropshire', 'Staffordshire', 'Warwickshire', 'West Midlands', 'Worcestershire'],
    'East Midlands': ['Derbyshire', 'Leicestershire', 'Lincolnshire', 'Northamptonshire', 'Nottinghamshire'],
    'North West': ['Cheshire', 'Cumbria', 'Greater Manchester', 'Lancashire', 'Merseyside'],
    'Yorkshire and the Humber': ['East Riding of Yorkshire', 'North Yorkshire', 'South Yorkshire', 'West Yorkshire'], 
    'North East': ['County Durham', 'Northumberland', 'Tyne and Wear']
}

# Invert the mapping so that each county points to its region
region_by_county = {
    county: region
    for region, counties in county_to_region.items()
    for county in counties
}

# Create a new column 'regions' by looking up each row's county
df['regions'] = df['County'].str.strip().map(region_by_county)

# Print the updated DataFrame
df


Unnamed: 0,Organisation Name,Town/City,County,Type & Rating,Route,regions
0,iecc care independent excel care consortium li...,Colchester,,Worker (A rating),Skilled Worker,
1,aboutcare hastings ltd,East Sussex,,Worker (A rating),Skilled Worker,
2,ess ltd,Manchester,,Worker (A rating),Skilled Worker,
3,architect uk ltd,West Horndon,Essex,Worker (A rating),Skilled Worker,East of England
4,home accommodation services ltd,London,,Worker (A rating),Skilled Worker,
...,...,...,...,...,...,...
82192,zyxel communications uk ltd,Wokingham,Berkshire,Worker (A rating),Skilled Worker,
82193,zyxel communications uk ltd,Wokingham,Berkshire,Worker (A rating),Global Business Mobility: Senior or Specialist...,
82194,zyzzle limited,Surbiton,,Worker (A rating),Skilled Worker,
82195,zza consulting limited,LONDON,,Worker (A rating),Skilled Worker,


In [8]:
import pandas as pd
# Create a dictionary mapping counties to regions
county_to_region = {
    'Greater London': ['London'],
    'Buckinghamshire': ['South East'],
    'East Sussex': ['South East'],
    'Hampshire': ['South East'],
    'Kent': ['South East'],
    'Oxfordshire': ['South East'],
    'Surrey': ['South East'],
    'West Sussex': ['South East'],
    'Bristol': ['South West'],
    'Cornwall': ['South West'],
    'Devon': ['South West'],
    'Dorset': ['South West'],
    'Gloucestershire': ['South West'],
    'Somerset': ['South West'],
    'Wiltshire': ['South West'],
    'Bedfordshire': ['East of England'],
    'Cambridgeshire': ['East of England'],
    'Essex': ['East of England'],
    'Hertfordshire': ['East of England'],
    'Norfolk': ['East of England'],
    'Suffolk': ['East of England'],
    'Herefordshire': ['West Midlands'],
    'Shropshire': ['West Midlands'],
    'Staffordshire': ['West Midlands'],
    'Warwickshire': ['West Midlands'],
    'West Midlands': ['West Midlands'],
    'Worcestershire': ['West Midlands'],
    'Derbyshire': ['East Midlands'],
    'Leicestershire': ['East Midlands'],
    'Lincolnshire': ['East Midlands'],
    'Northamptonshire': ['East Midlands'],
    'Nottinghamshire': ['East Midlands'],
    'Cheshire': ['North West'],
    'Cumbria': ['North West'],
    'Greater Manchester': ['North West'],
    'Lancashire': ['North West'],
    'Merseyside': ['North West'],
    'East Riding of Yorkshire': ['Yorkshire and the Humber'],
    'North Yorkshire': ['Yorkshire and the Humber'],
    'South Yorkshire': ['Yorkshire and the Humber'],
    'West Yorkshire': ['Yorkshire and the Humber'],
    'County Durham': ['North East'],
    'Northumberland': ['North East'],
    'Tyne and Wear': ['North East']
}

# Look up the region for a county; each dictionary value is a one-element list
def map_county_to_region(county):
    region = county_to_region.get(county)
    return region[0] if region else None

# Create a new column 'regions' and populate it based on the county-to-region mapping
df['regions'] = df['County'].str.strip().apply(map_county_to_region)

# Print the updated DataFrame
df


Unnamed: 0,Organisation Name,Town/City,County,Type & Rating,Route,regions
0,iecc care independent excel care consortium li...,Colchester,,Worker (A rating),Skilled Worker,
1,aboutcare hastings ltd,East Sussex,,Worker (A rating),Skilled Worker,
2,ess ltd,Manchester,,Worker (A rating),Skilled Worker,
3,architect uk ltd,West Horndon,Essex,Worker (A rating),Skilled Worker,East of England
4,home accommodation services ltd,London,,Worker (A rating),Skilled Worker,
...,...,...,...,...,...,...
82192,zyxel communications uk ltd,Wokingham,Berkshire,Worker (A rating),Skilled Worker,
82193,zyxel communications uk ltd,Wokingham,Berkshire,Worker (A rating),Global Business Mobility: Senior or Specialist...,
82194,zyzzle limited,Surbiton,,Worker (A rating),Skilled Worker,
82195,zza consulting limited,LONDON,,Worker (A rating),Skilled Worker,
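
Many rows in the preview have no `County` value (for example the London and Manchester rows), so a county-only lookup leaves them without a region. One possible extension, sketched here with a hypothetical `town_to_region` table and case-insensitive matching, falls back to `Town/City` when the county is missing or unknown:

```python
# Lowercase keys so that 'LONDON' and 'London' match the same entry.
county_to_region = {"essex": "East of England", "kent": "South East"}

# Hypothetical fallback table; not part of the original notebook.
town_to_region = {"london": "London", "manchester": "North West"}


def lookup_region(county, town):
    """Return the region for a row, preferring the county over the town."""
    for value, table in ((county, county_to_region), (town, town_to_region)):
        # Missing values are NaN or None, so only strings are looked up.
        if isinstance(value, str):
            region = table.get(value.strip().lower())
            if region is not None:
                return region
    return None


print(lookup_region("Essex", "West Horndon"))
print(lookup_region(None, "LONDON"))
```

On the DataFrame this could be applied row by row, for example with `df.apply(lambda row: lookup_region(row['County'], row['Town/City']), axis=1)`, at the cost of maintaining a second lookup table.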
