<a href="https://colab.research.google.com/github/parthivgv/EDA-with-Python-/blob/main/Metro_Network_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd

In [3]:
metro_data = pd.read_csv("/content/Delhi-Metro-Network.csv")

print(metro_data.head())

   Station ID         Station Name  Distance from Start (km)          Line  \
0           1             Jhil Mil                      10.3      Red line   
1           2  Welcome [Conn: Red]                      46.8     Pink line   
2           3          DLF Phase 3                      10.0   Rapid Metro   
3           4           Okhla NSIC                      23.8  Magenta line   
4           5           Dwarka Mor                      10.2     Blue line   

  Opening Date Station Layout   Latitude  Longitude  
0   2008-04-06       Elevated  28.675790  77.312390  
1   2018-10-31       Elevated  28.671800  77.277560  
2   2013-11-14       Elevated  28.493600  77.093500  
3   2017-12-25       Elevated  28.554483  77.264849  
4   2005-12-30       Elevated  28.619320  77.033260  


In [7]:
# checking for missing values
missing_values = metro_data.isnull().sum()
print(missing_values)

Station ID                  0
Station Name                0
Distance from Start (km)    0
Line                        0
Opening Date                0
Station Layout              0
Latitude                    0
Longitude                   0
dtype: int64


In [8]:
# checking data types
data_types = metro_data.dtypes
print(data_types)



Station ID                    int64
Station Name                 object
Distance from Start (km)    float64
Line                         object
Opening Date                 object
Station Layout               object
Latitude                    float64
Longitude                   float64
dtype: object


In [11]:
# converting 'Opening Date' to datetime format
metro_data['Opening Date'] = pd.to_datetime(metro_data['Opening Date'])
metro_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 285 entries, 0 to 284
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Station ID                285 non-null    int64         
 1   Station Name              285 non-null    object        
 2   Distance from Start (km)  285 non-null    float64       
 3   Line                      285 non-null    object        
 4   Opening Date              285 non-null    datetime64[ns]
 5   Station Layout            285 non-null    object        
 6   Latitude                  285 non-null    float64       
 7   Longitude                 285 non-null    float64       
dtypes: datetime64[ns](1), float64(3), int64(1), object(3)
memory usage: 17.9+ KB


In [17]:
#Geospatial Analysis

# defining a color scheme for the metro lines
line_colors = {
    'Red line': 'red',
    'Blue line': 'blue',
    'Yellow line': 'beige',
    'Green line': 'green',
    'Voilet line': 'purple',
    'Pink line': 'pink',
    'Magenta line': 'darkred',
    'Orange line': 'orange',
    'Rapid Metro': 'cadetblue',
    'Aqua line': 'black',
    'Green line branch': 'lightgreen',
    'Blue line branch': 'lightblue',
    'Gray line': 'lightgray'
}

In [13]:
import folium
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.templates.default = "plotly_white"

In [15]:
delhi_map_with_line_tooltip = folium.Map(location=[28.7041, 77.1025], zoom_start=11)
delhi_map_with_line_tooltip




In [18]:
# adding colored markers for each metro station with line name in tooltip
for index, row in metro_data.iterrows():
    line = row['Line']
    color = line_colors.get(line, 'black')  # Default color is black if line not found in the dictionary
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=f"{row['Station Name']}",
        tooltip=f"{row['Station Name']}, {line}",
        icon=folium.Icon(color=color)
    ).add_to(delhi_map_with_line_tooltip)

In [19]:
# Displaying the updated map
delhi_map_with_line_tooltip

Here is the map showing the geographical distribution of Delhi Metro stations. Each marker represents a metro station, and you can hover over or click on the markers to see the station name and the metro line it belongs to. This map provides a visual understanding of how the metro stations are spread across Delhi.

In [22]:
#Temporal Analysis

metro_data['Opening Year'] = metro_data['Opening Date'].dt.year
metro_data['Opening Year']



Unnamed: 0,Opening Year
0,2008
1,2018
2,2013
3,2017
4,2005
...,...
280,2015
281,2006
282,2009
283,2019


In [24]:
# counting the number of stations opened each year
stations_per_year = metro_data['Opening Year'].value_counts().sort_index()
stations_per_year

Unnamed: 0_level_0,count
Opening Year,Unnamed: 1_level_1
2002,6
2003,4
2004,11
2005,28
2006,9
2008,3
2009,17
2010,54
2011,13
2013,5


In [26]:
stations_per_year_df = stations_per_year.reset_index()
stations_per_year_df.columns = ['Year', 'Number of Stations']
stations_per_year_df

Unnamed: 0,Year,Number of Stations
0,2002,6
1,2003,4
2,2004,11
3,2005,28
4,2006,9
5,2008,3
6,2009,17
7,2010,54
8,2011,13
9,2013,5


In [27]:
fig = px.bar(stations_per_year_df, x='Year', y='Number of Stations',
             title="Number of Metro Stations Opened Each Year in Delhi",
             labels={'Year': 'Year', 'Number of Stations': 'Number of Stations Opened'})

In [28]:
fig

In [30]:

fig.update_layout(xaxis_tickangle=-45, xaxis=dict(tickmode='linear'),
                  yaxis=dict(title='Number of Stations Opened'),
                  xaxis_title="Year")
fig.show()

Some years show a significant number of new station openings, indicating phases of rapid network expansion.
Conversely, there are years with few or no new stations, which could be due to various factors like planning, funding, or construction challenges.

In [32]:
#Line Analysis:

stations_per_line = metro_data['Line'].value_counts()
stations_per_line

Unnamed: 0_level_0,count
Line,Unnamed: 1_level_1
Blue line,49
Pink line,38
Yellow line,37
Voilet line,34
Red line,29
Magenta line,25
Aqua line,21
Green line,21
Rapid Metro,11
Blue line branch,8


In [33]:
# calculating the total distance of each metro line (max distance from start)
total_distance_per_line = metro_data.groupby('Line')['Distance from Start (km)'].max()
total_distance_per_line

Unnamed: 0_level_0,Distance from Start (km)
Line,Unnamed: 1_level_1
Aqua line,27.1
Blue line,52.7
Blue line branch,8.1
Gray line,3.9
Green line,24.8
Green line branch,2.1
Magenta line,33.1
Orange line,20.8
Pink line,52.6
Rapid Metro,10.0


In [35]:
avg_distance_per_line = total_distance_per_line / (stations_per_line - 1)
avg_distance_per_line

Unnamed: 0_level_0,0
Line,Unnamed: 1_level_1
Aqua line,1.355
Blue line,1.097917
Blue line branch,1.157143
Gray line,1.95
Green line,1.24
Green line branch,1.05
Magenta line,1.379167
Orange line,4.16
Pink line,1.421622
Rapid Metro,1.0


In [37]:
line_analysis = pd.DataFrame({
    'Line': stations_per_line.index,
    'Number of Stations': stations_per_line.values,
    'Average Distance Between Stations (km)': avg_distance_per_line
})
line_analysis

Unnamed: 0_level_0,Line,Number of Stations,Average Distance Between Stations (km)
Line,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Aqua line,Blue line,49,1.355
Blue line,Pink line,38,1.097917
Blue line branch,Yellow line,37,1.157143
Gray line,Voilet line,34,1.95
Green line,Red line,29,1.24
Green line branch,Magenta line,25,1.05
Magenta line,Aqua line,21,1.379167
Orange line,Green line,21,4.16
Pink line,Rapid Metro,11,1.421622
Rapid Metro,Blue line branch,8,1.0


In [39]:
# sorting the DataFrame by the number of stations
line_analysis = line_analysis.sort_values(by='Number of Stations', ascending=False)
line_analysis

Unnamed: 0_level_0,Line,Number of Stations,Average Distance Between Stations (km)
Line,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Aqua line,Blue line,49,1.355
Blue line,Pink line,38,1.097917
Blue line branch,Yellow line,37,1.157143
Gray line,Voilet line,34,1.95
Green line,Red line,29,1.24
Green line branch,Magenta line,25,1.05
Magenta line,Aqua line,21,1.379167
Orange line,Green line,21,4.16
Pink line,Rapid Metro,11,1.421622
Rapid Metro,Blue line branch,8,1.0


In [40]:
line_analysis.reset_index(drop=True, inplace=True)
print(line_analysis)

                 Line  Number of Stations  \
0           Blue line                  49   
1           Pink line                  38   
2         Yellow line                  37   
3         Voilet line                  34   
4            Red line                  29   
5        Magenta line                  25   
6           Aqua line                  21   
7          Green line                  21   
8         Rapid Metro                  11   
9    Blue line branch                   8   
10        Orange line                   6   
11          Gray line                   3   
12  Green line branch                   3   

    Average Distance Between Stations (km)  
0                                 1.355000  
1                                 1.097917  
2                                 1.157143  
3                                 1.950000  
4                                 1.240000  
5                                 1.050000  
6                                 1.379167  
7        

The table presents a detailed analysis of the Delhi Metro lines, including the number of stations on each line and the average distance between stations.

In [43]:
# creating subplots
fig = make_subplots(rows=1, cols=2, subplot_titles=('Number of Stations Per Metro Line',
                                                    'Average Distance Between Stations Per Metro Line'),
                    horizontal_spacing=0.2)
fig

In [44]:
# plot for Number of Stations per Line
fig.add_trace(
    go.Bar(y=line_analysis['Line'], x=line_analysis['Number of Stations'],
           orientation='h', name='Number of Stations', marker_color='crimson'),
    row=1, col=1)

In [45]:
# plot for Average Distance Between Stations
fig.add_trace(
    go.Bar(y=line_analysis['Line'], x=line_analysis['Average Distance Between Stations (km)'],
           orientation='h', name='Average Distance (km)', marker_color='navy'),
    row=1, col=2
)

In [46]:
# update xaxis properties
fig.update_xaxes(title_text="Number of Stations", row=1, col=1)
fig.update_xaxes(title_text="Average Distance Between Stations (km)", row=1, col=2)

In [47]:
# update yaxis properties
fig.update_yaxes(title_text="Metro Line", row=1, col=1)
fig.update_yaxes(title_text="", row=1, col=2)


In [48]:
# update layout
fig.update_layout(height=600, width=1200, title_text="Metro Line Analysis", template="plotly_white")

In [49]:
fig.show()

In [51]:
layout_counts = metro_data['Station Layout'].value_counts()
layout_counts

Unnamed: 0_level_0,count
Station Layout,Unnamed: 1_level_1
Elevated,214
Underground,68
At-Grade,3


In [52]:
# creating the bar plot using Plotly
fig = px.bar(x=layout_counts.index, y=layout_counts.values,
             labels={'x': 'Station Layout', 'y': 'Number of Stations'},
             title='Distribution of Delhi Metro Station Layouts',
             color=layout_counts.index,
             color_continuous_scale='pastel')


In [53]:

# updating layout for better presentation
fig.update_layout(xaxis_title="Station Layout",
                  yaxis_title="Number of Stations",
                  coloraxis_showscale=False,
                  template="plotly_white")

fig.show()

The bar chart and the counts show the distribution of different station layouts in the Delhi Metro network.

Observations:
Elevated Stations: The majority of the stations are Elevated. It is a common design choice in urban areas to save space and reduce land acquisition issues.

Underground Stations: The Underground stations are fewer compared to elevated ones. These are likely in densely populated or central areas where above-ground construction is less feasible.

At-Grade Stations: There are only a few At-Grade (ground level) stations, suggesting they are less common in the network, possibly due to land and traffic considerations.

#Summary

So, this is how you can perform Delhi Metro Network Analysis using Python. Metro Network Analysis involves examining the network of metro systems to understand their structure, efficiency, and effectiveness. It typically includes analyzing routes, stations, traffic, connectivity, and other operational aspects.