https://www.researchgate.net/publication/369619791_Predicting_Natural_Gas_Pipeline_Failures_Caused_by_Natural_Forces_An_Artificial_Intelligence_Classification_Approach/download

Final outcome:
- Identify (predict) high-risk pipelines.
- Prioritize inspection and maintenance activities.
- Benefits: cost savings and improved safety.
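One way to turn these outcomes into an inspection list is to bin a per-pipeline risk score into priority levels and sort by it. This is a minimal sketch that assumes a model already outputs a risk probability per pipeline segment. The column names `segment_id` and `risk_score`, the 0.33/0.66 cut points and the sample rows are all hypothetical.

```python
import pandas as pd

def prioritize_inspections(scores: pd.DataFrame) -> pd.DataFrame:
    """Assign a low/medium/high priority and sort highest risk first.

    Assumes a hypothetical `risk_score` column holding model probabilities in [0, 1].
    """
    out = scores.copy()
    # Hypothetical cut points; real thresholds would come from cost/safety trade-offs.
    out["priority"] = pd.cut(
        out["risk_score"],
        bins=[0.0, 0.33, 0.66, 1.0],
        labels=["low", "medium", "high"],
        include_lowest=True,
    )
    # Highest-risk segments first, so inspections can be scheduled top-down.
    return out.sort_values("risk_score", ascending=False).reset_index(drop=True)

# Illustrative scores for three hypothetical segments.
demo = pd.DataFrame({"segment_id": ["A", "B", "C"], "risk_score": [0.12, 0.81, 0.47]})
print(prioritize_inspections(demo))
```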

- Dataset: https://www.phmsa.dot.gov/data-and-statistics/pipeline/data-visualization-overview
- Streamlit dashboard example: https://dataqoil.com/2022/02/20/creating-awesome-data-dashboard-with-plotly-in-streamlit/


Why we are doing this:

- Visualize energy consumption by source to show that natural gas remains a major energy source (in the United States in 2021, second only to oil).

In [28]:
import numpy as np
import pandas as pd
import plotly
import matplotlib.pyplot as plt
import plotly.express as px
from plotly.offline import iplot

In [29]:
#https://ourworldindata.org/energy
energy_source = pd.read_csv("../data/primary-energy-source-bar-3.csv")

In [30]:
energy_source

index,Entity,Code,Year,Coal consumption - TWh,Oil consumption - TWh,Gas consumption - TWh,Nuclear consumption - TWh,Hydro consumption - TWh,Wind consumption - TWh,Solar consumption - TWh,Other renewables (including geothermal and biomass) - TWh
0,Canada,CAN,1965,179.98587,642.81793,216.68422,0.363202,349.18494,,,
1,Canada,CAN,1966,176.45036,677.26220,237.22487,0.488053,387.36557,,,
2,Canada,CAN,1967,174.82216,723.10620,251.91719,0.434140,396.01184,,,
3,Canada,CAN,1968,189.92952,769.79310,280.52078,2.604839,402.74190,,,
4,Canada,CAN,1969,183.37021,801.54530,313.49630,1.501046,445.43494,,,
...,...,...,...,...,...,...,...,...,...,...,...
169,United States,USA,2018,3689.45850,10299.73000,8219.63300,2156.894300,765.97660,728.6906,249.51344,250.43823
170,United States,USA,2019,3150.46000,10283.93200,8509.99900,2155.131600,752.48486,787.8162,284.61578,234.43285
171,United States,USA,2020,2556.18140,9032.39700,8328.86800,2095.404300,742.66330,896.4879,346.77893,225.99403
172,United States,USA,2021,2936.89430,9863.39800,8358.03400,2060.684800,651.45520,999.6149,434.58510,224.51933
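To back the claim about natural gas with numbers rather than a chart alone, each source's share of total consumption can be computed. The sketch below uses the 2021 United States row shown above and assumes that the eight source columns together approximate total energy consumption.

```python
import pandas as pd

# 2021 United States values (TWh) copied from the output above.
usa_2021 = pd.Series({
    "Coal": 2936.89430,
    "Oil": 9863.39800,
    "Gas": 8358.03400,
    "Nuclear": 2060.68480,
    "Hydro": 651.45520,
    "Wind": 999.61490,
    "Solar": 434.58510,
    "Other renewables": 224.51933,
})

# Share of each source in the (approximate) total, largest first.
shares = (usa_2021 / usa_2021.sum()).sort_values(ascending=False)
print(shares.round(3))
```

With these values gas comes out at roughly 33% of the total, second only to oil at roughly 39%.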


In [23]:
energy_source.groupby("Entity")

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001BF0D55E8F0>
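On its own, `groupby` only returns a lazy `DataFrameGroupBy` object, as the output shows; an aggregation is needed to get numbers out. A minimal sketch on a few rows copied from the outputs above, pulling each country's most recent row and summary statistics for gas consumption:

```python
import pandas as pd

# A few rows copied from the outputs above (TWh).
df = pd.DataFrame({
    "Entity": ["Canada", "Canada", "United States", "United States"],
    "Year": [1968, 1969, 2020, 2021],
    "Gas consumption - TWh": [280.52078, 313.49630, 8328.86800, 8358.03400],
})

# idxmax gives the row label of the latest year within each group;
# .loc then selects those full rows.
latest = df.loc[df.groupby("Entity")["Year"].idxmax()]
print(latest)

# Summary statistics per country for the gas column.
gas_stats = df.groupby("Entity")["Gas consumption - TWh"].agg(["min", "max", "mean"])
print(gas_stats)
```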

In [31]:
energy_source.Entity.unique().tolist()

['Canada', 'India', 'United States']

In [33]:
energy_source.loc[energy_source['Entity']=="Canada"]

index,Entity,Code,Year,Coal consumption - TWh,Oil consumption - TWh,Gas consumption - TWh,Nuclear consumption - TWh,Hydro consumption - TWh,Wind consumption - TWh,Solar consumption - TWh,Other renewables (including geothermal and biomass) - TWh
0,Canada,CAN,1965,179.98587,642.81793,216.68422,0.363202,349.18494,,,
1,Canada,CAN,1966,176.45036,677.2622,237.22487,0.488053,387.36557,,,
2,Canada,CAN,1967,174.82216,723.1062,251.91719,0.43414,396.01184,,,
3,Canada,CAN,1968,189.92952,769.7931,280.52078,2.604839,402.7419,,,
4,Canada,CAN,1969,183.37021,801.5453,313.4963,1.501046,445.43494,,,
5,Canada,CAN,1970,196.87263,854.80023,346.3385,2.942503,467.60938,,,
6,Canada,CAN,1971,187.1267,877.85034,369.66525,12.107677,480.39542,,,
7,Canada,CAN,1972,176.5783,924.3328,415.7948,20.46132,537.10455,,,
8,Canada,CAN,1973,181.8583,1009.6328,442.17908,43.28345,575.5898,,,
9,Canada,CAN,1974,184.77744,1027.002,446.97964,41.70012,622.94006,,,


In [45]:
energy_source.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 174 entries, 0 to 173
Data columns (total 11 columns):
 #   Column                                                     Non-Null Count  Dtype  
---  ------                                                     --------------  -----  
 0   Entity                                                     174 non-null    object 
 1   Code                                                       174 non-null    object 
 2   Year                                                       174 non-null    int64  
 3   Coal consumption - TWh                                     174 non-null    float64
 4   Oil consumption - TWh                                      174 non-null    float64
 5   Gas consumption - TWh                                      174 non-null    float64
 6   Nuclear consumption - TWh                                  174 non-null    float64
 7   Hydro consumption - TWh                                    174 non-null    float64
 8   Wind consu
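The empty cells in the tables (for example Canada's wind and solar columns in the 1960s and 1970s) are loaded as `NaN`. A quick way to count them is sketched below on rows copied from the Canada output. Filling them with zero assumes that an unreported source contributed roughly nothing; that is an analysis choice, not something the dataset states.

```python
import numpy as np
import pandas as pd

# Canada rows for 1965-1967 copied from the output above; blank cells become NaN.
canada = pd.DataFrame({
    "Year": [1965, 1966, 1967],
    "Gas consumption - TWh": [216.68422, 237.22487, 251.91719],
    "Wind consumption - TWh": [np.nan, np.nan, np.nan],
    "Solar consumption - TWh": [np.nan, np.nan, np.nan],
})

# Count missing values per column.
missing = canada.isna().sum()
print(missing)

# Assumption: a source that was not reported yet contributed ~0 TWh.
canada_filled = canada.fillna(0.0)
```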

In [7]:
can_energy_source = energy_source[energy_source['Code']=='CAN']
can_energy_source.drop(columns=["Entity", "Code"])

index,Year,Coal consumption - TWh,Oil consumption - TWh,Gas consumption - TWh,Nuclear consumption - TWh,Hydro consumption - TWh,Wind consumption - TWh,Solar consumption - TWh,Other renewables (including geothermal and biomass) - TWh
0,1965,179.98587,642.81793,216.68422,0.363202,349.18494,,,
1,1966,176.45036,677.2622,237.22487,0.488053,387.36557,,,
2,1967,174.82216,723.1062,251.91719,0.43414,396.01184,,,
3,1968,189.92952,769.7931,280.52078,2.604839,402.7419,,,
4,1969,183.37021,801.5453,313.4963,1.501046,445.43494,,,
5,1970,196.87263,854.80023,346.3385,2.942503,467.60938,,,
6,1971,187.1267,877.85034,369.66525,12.107677,480.39542,,,
7,1972,176.5783,924.3328,415.7948,20.46132,537.10455,,,
8,1973,181.8583,1009.6328,442.17908,43.28345,575.5898,,,
9,1974,184.77744,1027.002,446.97964,41.70012,622.94006,,,
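`drop` returns a new DataFrame, so the cell above only displays the trimmed table; `can_energy_source` itself still contains `Entity` and `Code`. Because the same filter-and-drop step is repeated for Canada, the United States and India, a small helper keeps it in one place. A sketch; the helper name `country_frame` is hypothetical.

```python
import pandas as pd

def country_frame(df: pd.DataFrame, code: str) -> pd.DataFrame:
    """Return one country's rows without the Entity/Code columns (hypothetical helper)."""
    # .copy() makes it explicit that the result is independent of df.
    return df.loc[df["Code"] == code].drop(columns=["Entity", "Code"]).copy()

# Tiny illustrative frame with values from the outputs above.
sample = pd.DataFrame({
    "Entity": ["Canada", "India"],
    "Code": ["CAN", "IND"],
    "Year": [1965, 1965],
    "Gas consumption - TWh": [216.68422, 2.37252],
})
ind = country_frame(sample, "IND")
print(ind)
```

In the notebook this would read `can_energy_source = country_frame(energy_source, "CAN")`, and likewise for `"USA"` and `"IND"`.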


In [14]:
# One line per energy source (each column listed once)
fig = px.line(can_energy_source, x="Year", y=['Coal consumption - TWh',
       'Oil consumption - TWh', 'Gas consumption - TWh',
       'Nuclear consumption - TWh', 'Hydro consumption - TWh',
       'Wind consumption - TWh', 'Solar consumption - TWh',
       'Other renewables (including geothermal and biomass) - TWh'],
       title="Canada: energy consumption by source (TWh)")
fig.show()
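Passing a list to `y` makes Plotly Express treat the frame as wide-form and reshape it internally. Doing the reshape explicitly with `DataFrame.melt` gives a long table (one row per year and source) that is easier to filter or reuse, for example in a Streamlit dashboard. A sketch on two Canada rows from above; the `Source` and `TWh` column names are our own choice.

```python
import pandas as pd

# Two Canada rows copied from the output above (TWh).
wide = pd.DataFrame({
    "Year": [1965, 1966],
    "Coal consumption - TWh": [179.98587, 176.45036],
    "Gas consumption - TWh": [216.68422, 237.22487],
})

# Wide -> long: one row per (Year, Source) pair.
long = wide.melt(id_vars="Year", var_name="Source", value_name="TWh")
# Shorten labels for the legend: "Gas consumption - TWh" -> "Gas".
long["Source"] = long["Source"].str.replace(" consumption - TWh", "", regex=False)
print(long)
```

The long table can then be plotted with `px.line(long, x="Year", y="TWh", color="Source")`.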

In [13]:
can_energy_source.columns

Index(['Entity', 'Code', 'Year', 'Coal consumption - TWh',
       'Oil consumption - TWh', 'Gas consumption - TWh',
       'Nuclear consumption - TWh', 'Hydro consumption - TWh',
       'Wind consumption - TWh', 'Solar consumption - TWh',
       'Other renewables (including geothermal and biomass) - TWh'],
      dtype='object')
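Rather than typing the eight source columns by hand in every plotting cell, the list can be derived from the `columns` output above by keeping every column whose name ends in `- TWh`. A minimal sketch:

```python
import pandas as pd

# Column names exactly as printed by can_energy_source.columns above.
columns = pd.Index([
    "Entity", "Code", "Year", "Coal consumption - TWh",
    "Oil consumption - TWh", "Gas consumption - TWh",
    "Nuclear consumption - TWh", "Hydro consumption - TWh",
    "Wind consumption - TWh", "Solar consumption - TWh",
    "Other renewables (including geothermal and biomass) - TWh",
])

# Keep only the per-source consumption columns (all end with "- TWh").
source_cols = [c for c in columns if c.endswith("- TWh")]
print(len(source_cols))
```

In the notebook this becomes `source_cols = [c for c in can_energy_source.columns if c.endswith("- TWh")]`, followed by `px.line(can_energy_source, x="Year", y=source_cols)`.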

In [15]:
usa_energy_source = energy_source[energy_source['Code']=='USA']
usa_energy_source.drop(columns=["Entity", "Code"])

index,Year,Coal consumption - TWh,Oil consumption - TWh,Gas consumption - TWh,Nuclear consumption - TWh,Hydro consumption - TWh,Wind consumption - TWh,Solar consumption - TWh,Other renewables (including geothermal and biomass) - TWh
116,1965,3224.2441,6414.477,4159.206,10.92204,588.3642,,,42.48756
117,1966,3380.8462,6731.245,4482.756,16.487183,591.21124,,,44.817505
118,1967,3316.9966,6988.794,4733.1885,22.865034,671.8895,,,44.80583
119,1968,3433.0764,7457.961,5066.815,37.420605,674.651,,,49.46122
120,1969,3447.2375,7831.838,5454.107,41.60047,757.07355,,,51.566704
121,1970,3414.6594,8176.1914,5748.658,65.126785,749.57416,,,51.628983
122,1971,3229.2007,8461.684,5926.5264,113.81286,805.0521,,,52.73299
123,1972,3362.425,9119.912,5986.9644,161.56253,824.16016,,,59.056416
124,1973,3611.4897,9655.255,5937.959,249.34128,822.67194,,,63.077263
125,1974,3525.567,9271.507,5732.247,340.42935,908.6373,,,64.21887


In [16]:
# One line per energy source (each column listed once)
fig_usa = px.line(usa_energy_source, x="Year", y=['Coal consumption - TWh',
       'Oil consumption - TWh', 'Gas consumption - TWh',
       'Nuclear consumption - TWh', 'Hydro consumption - TWh',
       'Wind consumption - TWh', 'Solar consumption - TWh',
       'Other renewables (including geothermal and biomass) - TWh'],
       title="United States: energy consumption by source (TWh)")
fig_usa.show()
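Each per-country chart gets its own y-axis range (US gas consumption is in the thousands of TWh, Canada's in the hundreds, India's in single digits in the late 1960s), so comparing gas directly is easier in one table or chart. A sketch that pivots gas consumption into one column per country, using 1965 and 1966 values from the outputs above:

```python
import pandas as pd

# Gas consumption (TWh) for 1965-1966, copied from the outputs above.
long = pd.DataFrame({
    "Entity": ["Canada", "Canada", "India", "India", "United States", "United States"],
    "Year": [1965, 1966, 1965, 1966, 1965, 1966],
    "Gas consumption - TWh": [216.68422, 237.22487, 2.37252, 2.66327, 4159.206, 4482.756],
})

# One row per year, one column per country.
gas_by_country = long.pivot(index="Year", columns="Entity", values="Gas consumption - TWh")
print(gas_by_country)
```

On the full `energy_source` frame the same comparison can be drawn with `px.line(energy_source, x="Year", y="Gas consumption - TWh", color="Entity")`, optionally with `log_y=True` so India's much smaller values stay visible.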

In [17]:
ind_energy_source = energy_source[energy_source['Code']=='IND']
ind_energy_source.drop(columns=["Entity", "Code"])

index,Year,Coal consumption - TWh,Oil consumption - TWh,Gas consumption - TWh,Nuclear consumption - TWh,Hydro consumption - TWh,Wind consumption - TWh,Solar consumption - TWh,Other renewables (including geothermal and biomass) - TWh
58,1965,413.40735,146.99513,2.37252,0.0,56.67643,0.0,,
59,1966,412.41534,164.14967,2.66327,0.0,59.185505,0.0,,
60,1967,419.28525,169.36021,3.55878,0.0,66.229225,0.0,,
61,1968,433.8247,189.77089,3.9542,0.0,76.435425,0.0,,
62,1969,460.8412,227.92883,4.74504,2.056579,84.86436,0.0,,
63,1970,436.80887,226.89145,6.328466,3.711873,90.00013,0.0,,
64,1971,442.8722,243.33246,6.706219,2.758825,98.71657,0.0,0.0,0.0
65,1972,467.68307,262.38324,7.355906,2.620884,95.81544,0.0,0.0,0.0
66,1973,461.44937,276.95908,7.336657,5.54273,102.035866,0.0,0.0,0.0
67,1974,520.8561,271.54297,8.303968,5.103826,98.18077,0.0,0.0,0.0
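The India table shows gas consumption rising from about 2.4 TWh in 1965 to about 8.3 TWh in 1974. A compound annual growth rate summarises that trend as a single number: CAGR = (end / start)^(1 / years) - 1. A sketch using those two values from the output above:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1

# India gas consumption (TWh) in 1965 and 1974, from the output above.
rate = cagr(2.37252, 8.303968, 1974 - 1965)
print(f"{rate:.1%}")
```

With these values the rate is roughly 15% per year.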


In [44]:
# One line per energy source (each column listed once)
fig_ind = px.line(ind_energy_source, x="Year", y=['Coal consumption - TWh',
       'Oil consumption - TWh', 'Gas consumption - TWh',
       'Nuclear consumption - TWh', 'Hydro consumption - TWh',
       'Wind consumption - TWh', 'Solar consumption - TWh',
       'Other renewables (including geothermal and biomass) - TWh'],
       title="India: energy consumption by source (TWh)")

fig_ind.show()

PipelineIncidents low medium