
Data 620 Web Analytics

Week Four - Assignment Centrality Measures

Said Naqwe

**Assignment: Centrality Measures Overview**

Centrality measures can help predict outcomes in a network. For this week's assignment, I’ll use a publicly available dataset to compute and compare centrality measures across nodes. Each node must carry at least one categorical variable, such as gender or political affiliation.

### Dataset Selection

For this assignment, I’ve chosen to use the OpenFlights.org dataset, which includes data on:

- **Airports**
- **Routes**

### Variables

Here are the important variables I’ll be working with:

#### Routes Dataset

This includes variables like Airline, Source Airport, Destination Airport, and Stops.

#### Airports Dataset

This dataset provides information on Airport ID, Name, City, Country, Latitude, and more.

### Nodes and Edges

I’ll create nodes from the Source Airport and Destination Airport variables in the Routes dataset. Each record showing a source and destination airport represents an edge between nodes. I can also use the Stops variable as an optional edge weight.
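As a minimal sketch of this mapping, the snippet below turns a few route records into a NetworkX graph. The three-letter codes and stop counts are hypothetical rows, not values read from routes.dat, and the choice of a directed graph (`nx.DiGraph`) is my assumption, since a route from A to B does not guarantee a route from B to A.

```python
import networkx as nx

# Hypothetical route records: (source airport, destination airport, stops).
# Illustrative values only, not rows taken from routes.dat.
sample_routes = [
    ("JFK", "LHR", 0),
    ("LHR", "JFK", 0),
    ("JFK", "SYD", 1),
    ("SYD", "AKL", 0),
]

# Assumption: a directed graph, so JFK -> LHR and LHR -> JFK stay distinct edges.
G = nx.DiGraph()
for source, destination, stops in sample_routes:
    # Each record becomes one edge; nodes are created implicitly.
    # Stops is stored as an edge attribute so it can serve as an optional weight.
    G.add_edge(source, destination, stops=stops)

print(G.number_of_nodes(), G.number_of_edges())
print(G["JFK"]["SYD"]["stops"])
```

Storing Stops as a named attribute rather than as `weight` keeps it optional: algorithms ignore it unless it is passed explicitly, for example through a `weight="stops"` argument.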

### Categorical Variable

I’ll introduce a categorical variable called `N_S_Hemisphere` based on the latitude of the airports. Airports with a negative latitude will be labeled `S` (South), and those with a positive latitude will be labeled `N` (North).
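The labeling rule can be sketched as a small function. The name `hemisphere` is hypothetical, and treating a latitude of exactly 0 as `N` is an assumption, since the rule above only covers negative and positive values.

```python
def hemisphere(latitude: float) -> str:
    """Return 'S' for negative latitudes and 'N' otherwise."""
    # Assumption: latitude 0 (the equator) is grouped with 'N'.
    return "S" if latitude < 0 else "N"


# -6.08169 is Goroka Airport's latitude from airports.dat;
# 51.47 is a hypothetical northern value added for contrast.
latitudes = [-6.08169, 51.47, 0.0]
labels = [hemisphere(lat) for lat in latitudes]
print(labels)
```

The same function can be passed to a pandas `Series.map` call on the latitude column once the airports data is loaded.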

### Centrality Measures

I’ll be focusing on these key centrality measures:

- **Degree Centrality**: This counts how many direct connections each node has.
- **Betweenness Centrality**: This measures how often a node lies on the shortest paths between other pairs of nodes, which identifies nodes that act as bridges within the network.
- **Closeness Centrality**: This scores each node by how short its shortest paths to all other nodes are; a smaller total distance means a higher score.
- **Eigenvector Centrality**: This scores a node by the centrality of its neighbors, so a node connected to well-connected nodes ranks higher.
- **PageRank**: This is a variant of eigenvector centrality that accounts for link direction, uncovering nodes whose influence extends beyond direct connections into the wider network.
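To make the five measures concrete, here is a minimal sketch on a toy graph. The graph (a hub `A` with three neighbors and a short tail through `D`) is hypothetical and unrelated to the flight data. The sketch also assumes SciPy is installed, which recent NetworkX releases rely on for `pagerank`.

```python
import networkx as nx

# Hypothetical toy graph: hub "A" linked to B, C, D, with E hanging off D.
G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")])

# Each NetworkX function returns a dict mapping node -> score.
measures = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
    "pagerank": nx.pagerank(G),
}

# Report the highest-scoring node under each measure.
for name, scores in measures.items():
    top = max(scores, key=scores.get)
    print(f"{name:12s} top node: {top} ({scores[top]:.3f})")
```

On a graph this small all five measures agree on the hub. On the airport network they can diverge, which is what makes the comparison worthwhile.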

### Data Loading and Analysis Plan

1. **Download Data**: Grab the .dat files from OpenFlights.org.
2. **Read Data**: Load the data into Pandas DataFrames.
3. **Select Variables**: Keep only the columns needed for the analysis.
4. **Merge Datasets**: Combine the Routes and Airports datasets.
5. **Create Categorical Variable**: Generate `N_S_Hemisphere` from Latitude.
6. **Save Data**: Write the cleaned and merged data to a .edges file.
7. **NetworkX**: Import the data into NetworkX for analysis.
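Steps 4 through 7 can be sketched end to end on miniature stand-in tables. The column names (`Source`, `Destination`, `Stops`, `IATA`, `Latitude`), the `source_hemisphere` edge attribute, and the three routes are hypothetical. The latitudes are rounded from the airports.dat preview. Attaching the hemisphere of the source airport to each edge is one possible design, not necessarily the one used later in this notebook.

```python
import os
import tempfile

import networkx as nx
import pandas as pd

# Hypothetical miniature stand-ins for routes.dat and airports.dat.
routes = pd.DataFrame({
    "Source": ["GKA", "MAG", "POM"],
    "Destination": ["MAG", "POM", "GKA"],
    "Stops": [0, 0, 1],
})
airports = pd.DataFrame({
    "IATA": ["GKA", "MAG", "POM"],
    "Latitude": [-6.08, -5.21, -9.44],
})

# Step 4: bring each source airport's latitude onto its route.
merged = routes.merge(airports, left_on="Source", right_on="IATA", how="left")

# Step 5: derive the hemisphere label (assumption: latitude 0 counts as 'N').
merged["N_S_Hemisphere"] = merged["Latitude"].map(lambda lat: "S" if lat < 0 else "N")

# Step 6: write space-separated "source destination stops hemisphere" lines.
path = os.path.join(tempfile.mkdtemp(), "routes.edges")
columns = ["Source", "Destination", "Stops", "N_S_Hemisphere"]
merged[columns].to_csv(path, sep=" ", header=False, index=False)

# Step 7: read the file back as a directed graph with typed edge attributes.
G = nx.read_edgelist(
    path,
    create_using=nx.DiGraph,
    data=[("stops", int), ("source_hemisphere", str)],
)

print(list(G.edges(data=True)))
```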



**Python Code**

In [None]:
import pandas as pd
import urllib.request

# URLs from OpenFlights data page
airports_url = "https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat"
routes_url = "https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat"

# Adding headers to bypass potential restrictions
req_airports = urllib.request.Request(airports_url, headers={'User-Agent': 'Mozilla/5.0'})
req_routes = urllib.request.Request(routes_url, headers={'User-Agent': 'Mozilla/5.0'})

# Reading the data
with urllib.request.urlopen(req_airports) as response:
    airports = pd.read_csv(response, header=None)

with urllib.request.urlopen(req_routes) as response:
    routes = pd.read_csv(response, header=None)

# Displaying the first few rows of each dataframe
print(airports.head())
print(routes.head())


   0                                            1             2   \
0   1                               Goroka Airport        Goroka   
1   2                               Madang Airport        Madang   
2   3                 Mount Hagen Kagamuga Airport   Mount Hagen   
3   4                               Nadzab Airport        Nadzab   
4   5  Port Moresby Jacksons International Airport  Port Moresby   

                 3    4     5         6           7     8   9  10  \
0  Papua New Guinea  GKA  AYGA -6.081690  145.391998  5282  10  U   
1  Papua New Guinea  MAG  AYMD -5.207080  145.789001    20  10  U   
2  Papua New Guinea  HGU  AYMH -5.826790  144.296005  5388  10  U   
3  Papua New Guinea  LAE  AYNZ -6.569803  146.725977   239  10  U   
4  Papua New Guinea  POM  AYPY -9.443380  147.220001   146  10  U   

                     11       12           13  
0  Pacific/Port_Moresby  airport  OurAirports  
1  Pacific/Port_Moresby  airport  OurAirports  
2  Pacific/Port_Moresby  airport