# DARTE_v2.gdb needs to be downloaded from the data source (ORNL)

In [7]:
%%time
import geopandas as gpd
gdb = 'DARTE_v2.gdb'
fc = gpd.read_file(gdb)

CPU times: user 1min 32s, sys: 1.4 s, total: 1min 34s
Wall time: 1min 34s


The dataset is large, so loading it into memory is slow (about 1.5 minutes in the run above; allow ~2 min).

## fc.columns lists the fields it contains

In [8]:
fc.columns

Index(['GEOID', 'bg_area_m2', 'geo_num', 'kgco2_1980', 'kgco2_1981',
       'kgco2_1982', 'kgco2_1983', 'kgco2_1984', 'kgco2_1985', 'kgco2_1986',
       'kgco2_1987', 'kgco2_1988', 'kgco2_1989', 'kgco2_1990', 'kgco2_1991',
       'kgco2_1992', 'kgco2_1993', 'kgco2_1994', 'kgco2_1995', 'kgco2_1996',
       'kgco2_1997', 'kgco2_1998', 'kgco2_1999', 'kgco2_2000', 'kgco2_2001',
       'kgco2_2002', 'kgco2_2003', 'kgco2_2004', 'kgco2_2005', 'kgco2_2006',
       'kgco2_2007', 'kgco2_2008', 'kgco2_2009', 'kgco2_2010', 'kgco2_2011',
       'kgco2_2012', 'kgco2_2013', 'kgco2_2014', 'kgco2_2015', 'kgco2_2016',
       'kgco2_2017', 'Shape_Length', 'Shape_Area', 'geometry'],
      dtype='object')

### GEOID is the ID of each region:

A 12-digit GEOID (Geographic Identifier) uniquely identifies a census block group in U.S. Census data. It is built by concatenating the codes of nested geographic units, from largest to smallest. (Individual census blocks, one level smaller, use a 15-digit GEOID.) The 12 digits break down as follows:

- The first two digits are the State code. Each state has a unique two-digit code.
- The next three digits are the County code. Within each state, each county has a unique three-digit code.
- The following six digits are the Census Tract code, which subdivides a county into smaller geographic areas.
- The final digit is the Block Group code. This is the smallest unit in this dataset (hence the `bg_` prefix in `bg_area_m2`), and within each tract, each block group has a unique one-digit code.
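
The breakdown above can be sketched as a small parser. This is only an illustration: `BlockGroupId`, `split_geoid`, and the sample GEOID string are hypothetical names and values, not part of the DARTE dataset or of geopandas.

```python
from typing import NamedTuple


class BlockGroupId(NamedTuple):
    """Components of a 12-digit GEOID (hypothetical helper type)."""
    state: str        # 2-digit state code
    county: str       # 3-digit county code within the state
    tract: str        # 6-digit census tract code within the county
    block_group: str  # final 1-digit code within the tract


def split_geoid(geoid: str) -> BlockGroupId:
    """Split a 12-digit GEOID string into its four components."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"expected a 12-digit GEOID string, got {geoid!r}")
    return BlockGroupId(geoid[:2], geoid[2:5], geoid[5:11], geoid[11])


# Made-up GEOID for illustration; its first five digits are 47037.
parts = split_geoid("470370101011")
print(parts)
```

The GEOID is kept as a string throughout so that leading zeros (for example state code `01`) are preserved.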


### bg_area_m2 is the area of each region, in m²

### kgco2_XXXX is the annual on-road CO2 emissions of each region in year XXXX, in kilograms
This data set provides a 38-year, 1-km resolution inventory of annual on-road CO2 emissions for the conterminous United States based on roadway-level vehicle traffic data and state-specific emissions factors for multiple vehicle types on urban and rural roads as compiled in the Database of Road Transportation Emissions (DARTE).



# How many unique regions?

In [12]:
len(fc.GEOID.unique())

220333

# Area of the first region

In [16]:
fc['bg_area_m2'][0]

6974537.80960697

# Extract the columns needed for the summaries and figures
'GEOID', 'kgco2_2017', 'bg_area_m2'

The 2017 CO2 emissions are used; the raw unit is kilograms.

The code below derives the county and state codes of each region from the leading digits of its GEOID.

In [17]:
# Take an explicit copy so the new columns are not assigned to a slice of fc
# (this avoids pandas' SettingWithCopyWarning).
data = fc[['GEOID','kgco2_2017','bg_area_m2']].copy()
data['County'] = data['GEOID'].str[:5]  # state (2 digits) + county (3 digits)
data['State'] = data['GEOID'].str[:2]   # state code (2 digits)


# National total

In [23]:
round(data.kgco2_2017.sum()/907.2)  # kg to US short tons (1 short ton is about 907.2 kg)

1747657704

# Davidson County, GEOID 47037

In [24]:
round(data[data.County == '47037'].kgco2_2017.sum()/907.2,0)

6026327.0

In [25]:
data[data.County == '47037'].bg_area_m2.sum()*3.861E-7  # m² to square miles

521.3176839131215
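
The two constants in the cells above are not explained in the notebook. They appear to be rounded unit conversions: 907.2 for kilograms per US short ton and 3.861E-7 for square miles per square meter. The sketch below derives both from their definitions so the rounding can be checked; the constant and function names are hypothetical.

```python
# One avoirdupois pound is defined as 0.45359237 kg; a US short ton is 2000 lb.
KG_PER_SHORT_TON = 2000 * 0.45359237   # about 907.18 (notebook uses 907.2)

# One international mile is defined as 1609.344 m.
M2_PER_SQ_MILE = 1609.344 ** 2         # about 2.59e6 m² per square mile


def kg_to_short_tons(kg: float) -> float:
    """Convert kilograms to US short tons."""
    return kg / KG_PER_SHORT_TON


def m2_to_sq_miles(m2: float) -> float:
    """Convert square meters to square miles."""
    return m2 / M2_PER_SQ_MILE


# Area of the first region, taken from the fc['bg_area_m2'][0] output above.
first_area_sq_mi = m2_to_sq_miles(6974537.80960697)
print(first_area_sq_mi)
print(1 / M2_PER_SQ_MILE)  # compare with the notebook's 3.861E-7
```

The rounded factors differ from the exact ones by a few parts in 100,000, which is negligible for the totals reported here.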

# TN, GEOID 47

In [27]:
round(data[data.State == '47'].kgco2_2017.sum()/907.2,0)

41989164.0

In [28]:
data[data.State == '47'].bg_area_m2.sum()*3.861E-7

41819.12192945298

# Southeast, GEOID ['01', '12', '13', '21', '28', '37', '45', '47']

In [29]:
states_list = ['01', '12', '13', '21', '28', '37', '45', '47']
co2 = round(data[data['State'].isin(states_list)].kgco2_2017.sum()/907.2,0)
area = round(data[data['State'].isin(states_list)].bg_area_m2.sum()*3.861E-7,0)
print(co2/area)  # emission density: short tons of CO2 per square mile

958.648356767102
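
The county, state, and multi-state cells above repeat the same three steps: filter rows, sum and convert emissions, sum and convert area. A minimal sketch of a helper that does this in one place follows. `summarize` is a hypothetical name, and the small `demo` frame is synthetic data standing in for `data`, so the numbers it produces are not DARTE results.

```python
import pandas as pd

KG_PER_SHORT_TON = 907.2    # same rounded factor as the notebook cells
SQ_MILES_PER_M2 = 3.861e-7  # same rounded factor as the notebook cells


def summarize(df: pd.DataFrame, mask: pd.Series) -> dict:
    """Return 2017 emissions (short tons), area (sq mi), and their ratio
    for the rows of df selected by the boolean mask."""
    sel = df[mask]
    tons = sel["kgco2_2017"].sum() / KG_PER_SHORT_TON
    sq_miles = sel["bg_area_m2"].sum() * SQ_MILES_PER_M2
    return {"tons": tons, "sq_miles": sq_miles,
            "tons_per_sq_mile": tons / sq_miles}


# Synthetic stand-in for `data` (made-up GEOIDs and values).
demo = pd.DataFrame({
    "GEOID": ["470370101011", "470370101012", "010010201001"],
    "kgco2_2017": [907200.0, 1814400.0, 453600.0],
    "bg_area_m2": [1.0e6, 2.0e6, 4.0e6],
})
demo["County"] = demo["GEOID"].str[:5]
demo["State"] = demo["GEOID"].str[:2]

county = summarize(demo, demo["County"] == "47037")
state = summarize(demo, demo["State"] == "47")
multi = summarize(demo, demo["State"].isin(["01", "47"]))
print(county, state, multi, sep="\n")
```

With the real data, the same calls would take `data` in place of `demo`, for example `summarize(data, data["State"].isin(states_list))`.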


In [30]:
co2 = round(data.kgco2_2017.sum()/907.2,0)  # national total in short tons

In [35]:
round(data.bg_area_m2.sum()*3.861E-7,0)

3957886.0

In [36]:
1747657704/(325.1*1.0E+6)  # national total / population (325.1 million): short tons per person

5.375754241771762

In [37]:
378679522/(65740587)  # presumably the Southeast total / Southeast population: short tons per person

5.760209016691622

In [38]:
41989164/(6.715*1.0E+6)  # TN total / population (6.715 million): short tons per person

6.253040059568131

In [116]:
6026327.0/687808  # Davidson County total / population: short tons per person

8.761641330138643
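
The last four cells divide an emissions total by a population figure typed in as a literal. A small sketch that keeps each pair together with a label is shown below. The numbers are copied from the cells above; the region labels are an inference from the matching totals earlier in the notebook (the Southeast label in particular is an assumption, since that total is never printed on its own), and the `regions` and `per_capita` names are hypothetical.

```python
# (emissions in short tons, population), copied from the cells above.
regions = {
    "United States": (1747657704, 325.1e6),
    "Southeast (assumed)": (378679522, 65740587),
    "Tennessee": (41989164, 6.715e6),
    "Davidson County": (6026327, 687808),
}

# Short tons of on-road CO2 per person in 2017.
per_capita = {name: tons / pop for name, (tons, pop) in regions.items()}

for name, value in per_capita.items():
    print(f"{name:<22}{value:6.2f}")
```

Keeping the totals and populations in one table makes it easier to see which figure belongs to which region and to update a population estimate later.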