### Imports

In [1]:
import pandas as pd

### 2. Data Sets

### 2.1 Air Quality

The first dataset covers air quality in Boston from 01/01/2014 to 03/09/2022. The data was downloaded from [here](https://aqicn.org/data-platform/register/). There are five parameters: PM2.5, O3, NO2, SO2, and CO. Each is described below (descriptions from Wikipedia):

**Particulate matter/particles (PM):** Atmospheric particulate matter, or fine particles, are tiny particles of solid or liquid suspended in a gas.

**Ground level ozone (O3):** Ozone is formed from NOx and VOCs. It is a key constituent of the troposphere. It is also an important constituent of certain regions of the stratosphere commonly known as the ozone layer. Photochemical and chemical reactions involving it drive many of the chemical processes that occur in the atmosphere by day and by night. At abnormally high concentrations brought about by human activities (largely the combustion of fossil fuel), it is a pollutant and a constituent of smog.

**Nitrogen dioxide (NO2):** An intermediate in the industrial synthesis of nitric acid, millions of tons of which are produced each year for use primarily in the production of fertilizers. At higher temperatures it is a reddish-brown gas.

**Sulfur dioxide (SO2):** It is a toxic gas responsible for the smell of burnt matches. It is released naturally by volcanic activity and is produced as a by-product of copper extraction and the burning of sulfur-bearing fossil fuels.

**Carbon monoxide (CO):** CO is a colorless, odorless, toxic gas. It is a product of combustion of fuel such as natural gas, coal or wood. Vehicular exhaust contributes to the majority of carbon monoxide let into the atmosphere. It creates a smog type formation in the air that has been linked to many lung diseases and disruptions to the natural environment and animals.

In [2]:
air_quality = pd.read_csv('boston-air-quality.csv')

In [3]:
air_quality.head()

Unnamed: 0,date,pm25,o3,no2,so2,co
0,2022/3/2,43,31,17,1.0,3
1,2022/3/3,43,28,13,1.0,3
2,2022/3/4,33,28,13,,2
3,2022/3/5,24,33,12,1.0,2
4,2022/3/6,35,24,16,,2


In [4]:
air_quality.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2970 entries, 0 to 2969
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   date    2970 non-null   object
 1    pm25   2970 non-null   object
 2    o3     2970 non-null   object
 3    no2    2970 non-null   object
 4    so2    2970 non-null   object
 5    co     2970 non-null   object
dtypes: object(6)
memory usage: 139.3+ KB


In [5]:
air_quality.describe()

Unnamed: 0,date,pm25,o3,no2,so2,co
count,2970,2970,2970,2970.0,2970.0,2970
unique,2970,80,69,42.0,9.0,11
top,2022/3/2,36,23,,,2
freq,1,104,161,387.0,1830.0,993
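
The `info()` output above lists the pollutant columns with leading spaces in their names and `object` dtype, and `describe()` reports a blank as the most frequent `no2` and `so2` value, which suggests the readings were loaded as strings. A minimal cleaning sketch follows. It assumes the raw file pads names and values with a space and leaves missing readings blank; the two-row `sample` is a hypothetical stand-in modeled on the `head()` output, not the real file.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the raw CSV: names and values carry a
# leading space, and a missing so2 reading is left blank.
raw = io.StringIO(
    "date, pm25, o3, no2, so2, co\n"
    "2022/3/2, 43, 31, 17, 1, 3\n"
    "2022/3/4, 33, 28, 13, , 2\n"
)
# dtype=str mimics the all-object columns reported by info().
sample = pd.read_csv(raw, dtype=str)

# Strip whitespace from the column names so ' pm25' becomes 'pm25'.
sample.columns = sample.columns.str.strip()

# Strip each value and convert to numbers; blanks become NaN.
pollutants = ['pm25', 'o3', 'no2', 'so2', 'co']
for col in pollutants:
    sample[col] = pd.to_numeric(sample[col].str.strip(), errors='coerce')

print(sample.dtypes)
```

The same steps could be applied to `air_quality` before any numeric analysis. `errors='coerce'` turns any unparseable value into `NaN`, so it is worth checking the null counts afterwards.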


In [8]:
air_quality['date'] = pd.to_datetime(air_quality['date'])

In [11]:
air_quality = air_quality.sort_values(by=['date'])

In [13]:
# drop=True discards the old index instead of keeping it as an extra 'index' column
air_quality = air_quality.reset_index(drop=True)

In [14]:
air_quality['date']

0      2014-01-01
1      2014-01-02
2      2014-01-03
3      2014-01-04
4      2014-01-05
          ...    
2965   2022-03-05
2966   2022-03-06
2967   2022-03-07
2968   2022-03-08
2969   2022-03-09
Name: date, Length: 2970, dtype: datetime64[ns]
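
The series has 2970 rows, while the span from 2014-01-01 to 2022-03-09 covers 2990 calendar days, so roughly 20 days appear to have no reading. One way to list such gaps is sketched below on a hypothetical three-day sample; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample: daily dates with 2014-01-03 deliberately left out.
dates = pd.Series(pd.to_datetime(['2014-01-01', '2014-01-02', '2014-01-04']))

# Every calendar day between the first and last observation.
full_range = pd.date_range(dates.min(), dates.max(), freq='D')

# Days in the full range that have no row in the data.
missing = full_range.difference(pd.DatetimeIndex(dates))
print(missing)
```

Replacing `dates` with `air_quality['date']` would report the missing days in the real data.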

### 2.2 Weather

The second dataset covers Boston's weather and contains many attributes. The date range is 01/01/2014 to 03/22/2022. The data was downloaded from [here](https://www.visualcrossing.com/weather/weather-data-services).

In [38]:
weather = pd.read_csv('weather.csv')

In [39]:
weather.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3003 entries, 0 to 3002
Data columns (total 34 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   index             3003 non-null   int64  
 1   name              3003 non-null   object 
 2   datetime          3003 non-null   object 
 3   tempmax           3003 non-null   float64
 4   tempmin           3003 non-null   float64
 5   temp              3003 non-null   float64
 6   feelslikemax      3003 non-null   float64
 7   feelslikemin      3003 non-null   float64
 8   feelslike         3003 non-null   float64
 9   dew               3003 non-null   float64
 10  humidity          3003 non-null   float64
 11  precip            3003 non-null   float64
 12  precipprob        72 non-null     float64
 13  precipcover       2931 non-null   float64
 14  preciptype        39 non-null     object 
 15  snow              2273 non-null   float64
 16  snowdepth         2273 non-null   float64


In [41]:
weather['datetime']

0       2014-01-01
1       2014-01-02
2       2014-01-03
3       2014-01-04
4       2014-01-05
           ...    
2998    2022-03-18
2999    2022-03-19
3000    2022-03-20
3001    2022-03-21
3002    2022-03-22
Name: datetime, Length: 3003, dtype: object
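
The `datetime` column is still stored as strings (`dtype: object`). Converting it to `datetime64` would let it be sorted, filtered, and compared in the same way as the air quality dates. A minimal sketch, using a hypothetical three-row stand-in for `weather`:

```python
import pandas as pd

# Hypothetical stand-in for weather['datetime'], stored as strings.
weather_sample = pd.DataFrame(
    {'datetime': ['2014-01-01', '2014-01-02', '2014-01-03']}
)

# Parse the ISO-formatted strings into datetime64 values.
weather_sample['datetime'] = pd.to_datetime(
    weather_sample['datetime'], format='%Y-%m-%d'
)
print(weather_sample['datetime'].dtype)
```

The explicit `format` is optional here; it is given so that a value in an unexpected format raises an error rather than being parsed ambiguously.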

### 2.3 Energy Consumption

In [69]:
# Natural Gas Delivered to Consumers in Massachusetts (Including Vehicle Fuel) (Million Cubic Feet).
# Monthly Data
# State Level
natural_gas_consumption=pd.read_csv('Natural_Gas_Delivered_to_Consumers_in_Massachusetts_(Including_Vehicle_Fuel).csv')

In [70]:
# Massachusetts Total Gasoline All Sales per Deliveries by Prime_Supplier
# Monthly Data
# State Level
gasoline_sales=pd.read_csv('Massachusetts_Total_Gasoline_All_Sales_per_Deliveries_by_Prime_Supplier.csv')

### 2.4 Yearly Data of Boston

### 2.4.1 Average Travel Time

The time range is 2013 to 2019.

In [44]:
commute_time=pd.read_csv('Commute Time.csv')
commute_time_boston=commute_time[commute_time['Geography']=='Boston, MA']
commute_time_boston

Unnamed: 0,ID Year,Year,Average Commute Time,Geography,ID Geography,Slug Geography
3,2019,2019,29.857915,"Boston, MA",16000US2507000,boston-ma
23,2018,2018,29.76157,"Boston, MA",16000US2507000,boston-ma
43,2017,2017,29.44444,"Boston, MA",16000US2507000,boston-ma
63,2016,2016,29.265221,"Boston, MA",16000US2507000,boston-ma
83,2015,2015,28.975732,"Boston, MA",16000US2507000,boston-ma
103,2014,2014,28.283146,"Boston, MA",16000US2507000,boston-ma
123,2013,2013,27.786776,"Boston, MA",16000US2507000,boston-ma


### 2.4.2 Most Common Transportation Type

The time range is 2013 to 2019.

In [45]:
com_transportation = pd.read_csv('Commuter Transportation.csv')
com_transportation

Unnamed: 0,ID Group,Group,ID Year,Year,Commute Means,Commute Means Moe,Geography,ID Geography,Slug Geography,share
0,0,Drove Alone,2019,2019,148128,6523.000000,"Boston, MA",16000US2507000,boston-ma,0.369479
1,1,Carpooled,2019,2019,23284,3084.685721,"Boston, MA",16000US2507000,boston-ma,0.058078
2,2,Public Transit,2019,2019,128238,6786.393298,"Boston, MA",16000US2507000,boston-ma,0.319867
3,3,Taxi,2019,2019,2450,709.000000,"Boston, MA",16000US2507000,boston-ma,0.006111
4,4,Motorcycle,2019,2019,281,236.000000,"Boston, MA",16000US2507000,boston-ma,0.000701
...,...,...,...,...,...,...,...,...,...,...
58,4,Motorcycle,2013,2013,204,187.000000,"Boston, MA",16000US2507000,boston-ma,0.000597
59,5,Bicycle,2013,2013,6662,1399.000000,"Boston, MA",16000US2507000,boston-ma,0.019492
60,6,Walked,2013,2013,49558,3918.000000,"Boston, MA",16000US2507000,boston-ma,0.145001
61,7,Other,2013,2013,2412,864.000000,"Boston, MA",16000US2507000,boston-ma,0.007057
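
The table holds one row per commute mode per year, so the most common mode in each year is the row with the largest `share` in that year. The sketch below shows one way to extract it, using only the five 2019 rows displayed above rather than the full table.

```python
import pandas as pd

# The five 2019 rows shown above (Group, Year, share only); not the full table.
sample = pd.DataFrame({
    'Group': ['Drove Alone', 'Carpooled', 'Public Transit', 'Taxi', 'Motorcycle'],
    'Year': [2019] * 5,
    'share': [0.369479, 0.058078, 0.319867, 0.006111, 0.000701],
})

# Row label of the largest share within each year, then select those rows.
top_idx = sample.groupby('Year')['share'].idxmax()
most_common = sample.loc[top_idx, ['Year', 'Group', 'share']]
print(most_common)
```

Running the same two lines on `com_transportation` would give one row per year from 2013 to 2019.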


### 2.4.3 Population of Boston

In [59]:
population = pd.read_csv('population of boston.csv')
# Keep rows from 2014 onward.
population = population[population['Year'] > 2013]
# Note: the 2021 and 2022 figures are projected. (https://worldpopulationreview.com/us-cities/boston-ma-population)
population

Unnamed: 0,Year,Population,Growth,GrowthRate
0,2022,696959,1453,0.0021
1,2021,695506,1453,0.0021
2,2020,694053,1453,0.0021
3,2019,692600,1453,0.0021
4,2018,691147,3359,0.0049
5,2017,687788,7940,0.0117
6,2016,679848,9357,0.014
7,2015,670491,7636,0.0115
8,2014,662855,9853,0.0151
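
The `Growth` and `GrowthRate` columns appear consistent with the year-over-year difference in `Population` and that difference divided by the previous year's population. This is an inference from the numbers, not something the source states. A short check on four of the rows above:

```python
import pandas as pd

# Four rows copied from the table above.
pop = pd.DataFrame({
    'Year': [2017, 2016, 2015, 2014],
    'Population': [687788, 679848, 670491, 662855],
})

# Sort oldest first so each row is compared with the previous year.
pop = pop.sort_values('Year').reset_index(drop=True)

# Absolute change and relative change versus the previous year.
pop['Growth'] = pop['Population'].diff()
pop['GrowthRate'] = pop['Population'].pct_change().round(4)
print(pop)
```

The first row (2014) has no previous year in this sample, so both computed values are `NaN` there. The remaining rows reproduce the `Growth` and `GrowthRate` figures in the table.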
