In [1]:
import pandas as pd

# Methods

## Defining U.S. Production Boundaries

We define the production boundary for the timber asset account along two dimensions: tree species and spatial extent. The spatial extent is inferred from regions where timber markets are currently active. Active markets indicate that buyers (forestry and logging operations) and sellers (timber owners) value the trees as assets when making management decisions. There are four primary U.S. timber markets: the South (11 states), the Northeast (mostly Maine), the Lake States (Michigan, Minnesota, and Wisconsin), and the Pacific Northwest (Washington, Oregon, and Northern California). Each of these regions has specialized its timber markets around its distinct tree species distribution, topography, and climatic conditions.

### The Southern United States
    
The Southern U.S. market is dominated by yellow pine species (list them) including loblolly pine that makes up X% of total harvest. Because of the heavy concentration of yellow pine, standing timber prices are reported broadly, with species distinctions falling into only two categories: pine and hardwood. Pine timber is used for building materials, which require large sawlog-size timber, and for pulpwood for paper products, which relies on smaller trees harvested during thinning operations and on residue from processing larger timber. Given these market characteristics, we stratify the standing timber biomass by diameter size class, estimated from the U.S. Forest Service's National Forest Inventory.

Timber price data for the southern U.S. are curated by TimberMart South, a private firm providing price and market analysis across 11 states. Each state has two regions that generally divide the landscape between the coastal plain and the piedmont region. (what makes these regions different?). Similarly, the NFI estimates forest extent and condition across survey units designed to capture the climatic and topographic differences between the coastal plain, piedmont, and Appalachian mountain range.

### The Great Lakes Region

The Great Lakes region comprises the northern portions of Michigan, Minnesota, and Wisconsin. This region's timber market has formed around large, slow-growing hardwood tree species.

## Matching Timber Biomass Data with Market Prices

### Stumpage vs Delivered Price

We use the stumpage price because our objective is to value the timber asset prior to harvest. Once timber is harvested, the asset moves out of the timber account and into the national account for forestry and logging, where value is added through additional human and capital inputs. The delivered price is paid to loggers at the mill and differs from the stumpage price by the cost of harvesting the trees and transporting them to the mill.
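
The relationship between the two prices can be read as a simple residual. The sketch below assumes that stumpage equals the delivered price minus per-unit harvesting and hauling costs, which ignores any logger margin. The function name and the dollar figures are hypothetical and serve only to illustrate the arithmetic.

```python
def stumpage_price(delivered, harvest_cost, haul_cost):
    """Residual stumpage price: delivered price net of logging and transport costs.

    All arguments must share the same units (for example, dollars per ton).
    """
    return delivered - harvest_cost - haul_cost


# Hypothetical figures, not taken from any price report.
example = stumpage_price(delivered=60.0, harvest_cost=20.0, haul_cost=12.0)
print(example)  # 28.0
```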

### How to measure the stock of standing timber

Estimate the timber biomass at the species level, grouped first by FIA survey unit and then by diameter class. Diameter classes should be grouped to match the product categories for which prices are reported.

* In the southern U.S., diameter classes include non-merchantable (1"-4.9"), pulpwood (5"-11.9"), and sawtimber (12"+).
* In the Great Lakes region, the diameter classes differ only for large trees, which are either standard sawtimber size (up to 30") or veneer size (30"+).
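
As a minimal sketch of this grouping, the example below bins hypothetical tree-level records into the southern U.S. classes listed above and sums volume by survey unit, species group, and class. The data frame, its column names (`dia`, `product_class`), and the values are assumptions for illustration. The FIA data used in this notebook arrive already tabulated by size class.

```python
import numpy as np
import pandas as pd

# Hypothetical tree-level records; column names are assumptions for this sketch.
trees = pd.DataFrame({
    'unitcd': ['01', '01', '01', '03', '03'],
    'spgrpnm': ['Loblolly and shortleaf pines'] * 5,
    'dia': [3.2, 8.5, 14.0, 11.9, 12.0],     # diameter in inches
    'cuft': [0.0, 10.0, 40.0, 25.0, 30.0],   # volume in cubic feet
})

# Southern U.S. classes: [1, 5) non-merchantable, [5, 12) pulpwood, [12, inf) sawtimber
bins = [1.0, 5.0, 12.0, np.inf]
labels = ['non-merchantable', 'pulpwood', 'sawtimber']
trees['product_class'] = pd.cut(trees['dia'], bins=bins, labels=labels, right=False)

# Sum volume within each survey unit, species group, and diameter class.
stock = (
    trees.groupby(['unitcd', 'spgrpnm', 'product_class'], observed=True)['cuft']
    .sum()
    .reset_index()
)
print(stock)
```

Setting `right=False` makes each bin closed on the left, so a 12.0-inch tree falls in the sawtimber class rather than the pulpwood class.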



In [4]:
# load biomass data from the FS FIA's National Forest Inventory
biomass = pd.read_csv('../data/nca-timber-biomass.csv')

# columns 11-29 are the total volume of timber in cubic feet for each size class
# for example, column 11 is '`0003 5.0-6.9'; which is size class code 0003 and size class 5.0-6.9 inches
# we can use the pandas melt function to convert these columns into rows
biomass = biomass.melt(
    id_vars=biomass.columns[0:10],
    value_vars=biomass.columns[10:29],
    var_name='size_class',
    value_name='volume'
    )

# split the size class code and size class range into two columns
biomass[['size_class_code', 'size_class_range']] = biomass['size_class'].str.split(' ', n=1, expand=True)
# drop the first two characters of the size class code
biomass['size_class_code'] = biomass['size_class_code'].str[2:]
# drop the last character in the size class range
biomass['size_class_range'] = biomass['size_class_range'].str[:-1]

# the last four characters of 'EVALID' is the year
# create a new integer column 'year' with the year
biomass['year'] = biomass['EVALID'].astype(str).str[-4:].astype(int)

# rename SPGRPNM, SPCLASS, and STATENM to short lower-case names
biomass.rename(columns={'SPGRPNM': 'spgrpnm',
                        'SPCLASS': 'spclass',
                        'STATENM': 'state'}, inplace=True)

# format fips codes
# STATECD should be two characters
# COUNTYCD should be three characters
biomass['STATECD'] = biomass['STATECD'].astype(str).str.zfill(2)
biomass['COUNTYCD'] = biomass['COUNTYCD'].astype(str).str.zfill(3)
biomass['fips'] = biomass['STATECD'] + biomass['COUNTYCD']

# format survey unit codes; should be two characters
biomass['unitcd'] = biomass['UNITCD'].astype(str).str.zfill(2)

# replace NaN with 0 in the volume column and store the result as 'cuft'
biomass['cuft'] = biomass['volume'].fillna(0)

# keep only the columns we need
# year, state, fips, unitcd, spclass, spgrpnm, size_class_code, size_class_range, cuft
biomass = biomass[['year', 'state', 'fips', 'unitcd', 'spclass',
                   'spgrpnm', 'size_class_code',
                   'size_class_range', 'cuft']]

# in biomass, drop spgrpnm if noncommercial or Urban
biomass = biomass[~biomass['spgrpnm'].
                    isin(['Eastern noncommercial hardwoods',
                           'Urban-specific hardwoods'])]

# add spatial dictionary to biomass to identify price regions based on
# fips codes

# read in the spatial dictionary
spatial_dict = pd.read_csv('../data/priceRegions.csv')
# drop the columns we don't need
spatial_dict = spatial_dict[['fips', 'priceRegion']]

# format fips codes as strings
spatial_dict['fips'] = spatial_dict['fips'].astype(str).str.zfill(5)
spatial_dict['priceRegion'] = spatial_dict['priceRegion'].astype(str).str.zfill(2)

# merge the spatial dictionary with biomass on fips
biomass = pd.merge(biomass, spatial_dict, on='fips')

# filter for the year 2020
#biomass = biomass[biomass['year'] == 2020]

# print the unique years in biomass
#print(biomass['year'].unique())

print(biomass.head())


   year    state   fips unitcd   spclass                       spgrpnm  \
0  2001  Alabama  01001     03  Softwood      Longleaf and slash pines   
1  2001  Alabama  01001     03  Softwood  Loblolly and shortleaf pines   
2  2001  Alabama  01001     03  Softwood            Other yellow pines   
3  2001  Alabama  01001     03  Softwood                       Cypress   
4  2001  Alabama  01001     03  Softwood       Other eastern softwoods   

  size_class_code size_class_range          cuft priceRegion  
0            0003          5.0-6.9  3.394020e+07          01  
1            0003          5.0-6.9  4.878028e+08          01  
2            0003          5.0-6.9  4.852567e+06          01  
3            0003          5.0-6.9  3.417577e+06          01  
4            0003          5.0-6.9  2.159237e+06          01  


## Spatial Extent

We link price data to biomass volume using the county fips identifier. Each county belongs to a distinct survey unit and price region. 
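
Because every county should map to exactly one price region, the link can be checked at merge time. The sketch below uses toy stand-ins for the biomass table and the crosswalk (all values hypothetical) and relies on the `validate` and `indicator` options of `pd.merge`. The notebook's own merge uses the pandas default, an inner join, which drops unmatched counties without warning.

```python
import pandas as pd

# Toy stand-ins for the biomass table and the county crosswalk (hypothetical values).
biomass_demo = pd.DataFrame({
    'fips': ['01001', '01001', '01003', '99999'],
    'cuft': [100.0, 50.0, 75.0, 10.0],
})
crosswalk_demo = pd.DataFrame({
    'fips': ['01001', '01003'],
    'priceRegion': ['01', '02'],
})

# validate='m:1' raises if a county appears more than once in the crosswalk;
# indicator=True adds a '_merge' column that flags rows without a match.
linked = pd.merge(
    biomass_demo, crosswalk_demo,
    on='fips', how='left', validate='m:1', indicator=True,
)
unmatched = linked.loc[linked['_merge'] == 'left_only', 'fips'].unique()
print(unmatched)
```

A left join keeps every biomass row, so counties that lack a price region can be listed and resolved instead of disappearing from the account.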

In [5]:
# read in spatial crosswalk table
priceRegions = pd.read_csv('../data/priceRegions.csv')
# convert columns to character
priceRegions['fips'] = priceRegions['fips'].astype(str).str.zfill(5)
priceRegions['unitcd'] = priceRegions['unitcd'].astype(str).str.zfill(2)
priceRegions['statecd'] = priceRegions['statecd'].astype(str).str.zfill(2)
priceRegions['priceRegion'] = priceRegions['priceRegion'].astype(str).str.zfill(2)
priceRegions['priceRegion'] = priceRegions['statecd'] + priceRegions['priceRegion']

priceRegions

,fips,statecd,priceRegion,unitcd
0,01001,01,0101,3.0
1,01003,01,0102,1.0
2,01005,01,0102,3.0
3,01007,01,0101,4.0
4,01009,01,0101,5.0
...,...,...,...,...
1224,55113,55,5503,2.0
1225,55119,55,5503,2.0
1226,55121,55,5503,4.0
1227,55129,55,5503,2.0


### TimberMart North Stumpage Prices

In [8]:
# read in northern price data from excel file
import openpyxl

pricesNorth = pd.read_excel('../data/Timber Prices/TMN_Price_Series_September2023.xlsx')

# drop all rows where Region has exactly 2 characters
pricesNorth = pricesNorth[pricesNorth['Region'].str.len() != 2]

# filter for Market = 'Stumpage'
pricesNorth = pricesNorth[pricesNorth['Market'] == 'Stumpage']

# convert 'Period End Date' to datetime
pricesNorth['Period End Date'] = pd.to_datetime(pricesNorth['Period End Date'],
                                                errors='coerce')

# create a year variable from column "Period End Date"
pricesNorth['year'] = pricesNorth['Period End Date'].dt.year

# split the Region column into two columns on '-'
pricesNorth[['state_abbr', 'priceRegion']] = pricesNorth['Region'].str.split('-', n=1, expand=True)
pricesNorth['priceRegion'] = pricesNorth['priceRegion'].str.zfill(2)

# add a column for the state fips code
# first, create a dictionary of state abbreviations and fips codes for MN, WI, MI
state_fips = {'MN': '27', 'WI': '55', 'MI': '26'}
pricesNorth['statecd'] = pricesNorth['state_abbr'].map(state_fips)
pricesNorth['priceRegion'] = pricesNorth['statecd'] + pricesNorth['priceRegion']

# select only the columns we need
# year, priceRegion, Species, Product, $ Per Unit, Units
pricesNorth = pricesNorth[['year', 'priceRegion', 'Species',
                            'Product', '$ Per Unit', 'Units']]

# drop if $ Per Unit is NaN or year is NaN
pricesNorth = pricesNorth.dropna(subset=['$ Per Unit', 'year'])

# convert $ Per Unit to dollars per cubic foot ('cuftPrice')
# if the Units column is 'cord', divide by 128 (1 cord = 128 cubic feet)
# if the Units column is 'mbf', divide by 12
# (stated factors: 1 MBF = 1000 board feet and 12 board feet = 1 cubic foot)
pricesNorth['$ Per Unit'] = pricesNorth['$ Per Unit'].astype(float)
pricesNorth['cuftPrice'] = pricesNorth['$ Per Unit']
pricesNorth.loc[pricesNorth['Units'] == 'cord', 'cuftPrice'] = pricesNorth['$ Per Unit'] / 128
pricesNorth.loc[pricesNorth['Units'] == 'mbf', 'cuftPrice'] = pricesNorth['$ Per Unit'] / 12

# drop the $ Per Unit and Units columns
pricesNorth = pricesNorth.drop(columns=['$ Per Unit', 'Units'])

# rename variables
pricesNorth.rename(columns={'Species': 'priceSpecies'}, inplace=True)

# average prices over years within each region, species, and product
pricesNorth = pricesNorth.groupby(
    ['priceRegion', 'priceSpecies', 'Product'])['cuftPrice'].mean().reset_index()

# pivot the table so that each row is a unique
# priceRegion and priceSpecies
# and the columns are the products
pricesNorth = pricesNorth.pivot(
    index=['priceRegion', 'priceSpecies'],
    columns='Product',
    values='cuftPrice').reset_index()

# some prices are not reported for all products
# fill missing values with 0
pricesNorth = pricesNorth.fillna(0)

pricesNorth.groupby('priceSpecies').describe()

Product,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Sawtimber,Sawtimber,Sawtimber,Sawtimber,Sawtimber,Sawtimber,Sawtimber,Sawtimber
priceSpecies,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
Ash,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,9.667374,2.924213,5.465222,7.911426,9.276906,12.112972,13.421702
Aspen,8.0,0.23379,0.038959,0.166281,0.212307,0.234573,0.264151,0.278622,8.0,7.090165,2.480493,3.759807,5.754199,6.076828,8.808975,11.167589
Basswood,8.0,0.080035,0.029286,0.028779,0.065595,0.082433,0.09022,0.129494,8.0,11.178991,2.735776,6.781604,9.016016,12.406528,13.266694,13.843603
Beech,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,7.587131,2.093239,5.989583,6.522374,6.973366,7.388942,12.5
Birch,1.0,0.0,,0.0,0.0,0.0,0.0,0.0,1.0,14.583333,,14.583333,14.583333,14.583333,14.583333,14.583333
Black Ash,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,11.041667,5.597929,7.083333,9.0625,11.041667,13.020833,15.0
Black Cherry,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,19.182346,6.372149,11.734611,14.356585,20.184821,21.161942,29.088333
Black Walnut,1.0,0.0,,0.0,0.0,0.0,0.0,0.0,1.0,174.229645,,174.229645,174.229645,174.229645,174.229645,174.229645
Elm,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,5.477033,2.654668,3.5999,4.538467,5.477033,6.4156,7.354167
Hard Maple,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,35.526016,14.089273,11.016377,29.00998,35.652637,44.346964,55.098868
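
The conversion factors named in the comments of the cell above (1 cord = 128 cubic feet, 1 MBF = 1000 board feet, 12 board feet = 1 cubic foot) imply that 1 MBF is about 83.3 cubic feet. The sketch below is a dimensional check under those stated factors only; the function name and the prices are hypothetical. Note that the implied divisor for an MBF price is 1000/12, not the 12 used in the cell above, so sawtimber values converted this way are smaller by a factor of 1000/144, or about 6.9.

```python
CUFT_PER_CORD = 128.0          # factor stated in the cell above
BOARD_FEET_PER_MBF = 1000.0    # factor stated in the cell above
BOARD_FEET_PER_CUFT = 12.0     # factor stated in the cell above


def price_per_cuft(price, units):
    """Convert a price quoted per cord or per MBF to dollars per cubic foot."""
    if units == 'cord':
        return price / CUFT_PER_CORD
    if units == 'mbf':
        # $/MBF * (12 board feet per cubic foot) / (1000 board feet per MBF)
        return price * BOARD_FEET_PER_CUFT / BOARD_FEET_PER_MBF
    raise ValueError(f"unrecognized units: {units!r}")


# Hypothetical prices, for illustration only.
print(price_per_cuft(32.0, 'cord'))   # 0.25
print(price_per_cuft(250.0, 'mbf'))   # 3.0
```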


Split pulpwood and sawtimber into separate dataframes

In [23]:
pricesPulpNorth = pricesNorth.drop(columns = 'Sawtimber')
pricesPulpNorth = pricesPulpNorth[pricesPulpNorth['Pulpwood'] != 0]
pricesPulpNorth.groupby('priceSpecies').describe()

Product,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Pulpwood
priceSpecies,count,mean,std,min,25%,50%,75%,max
Aspen,8.0,0.23379,0.038959,0.166281,0.212307,0.234573,0.264151,0.278622
Basswood,8.0,0.080035,0.029286,0.028779,0.065595,0.082433,0.09022,0.129494
Jack Pine,8.0,0.297293,0.078331,0.183333,0.242509,0.288127,0.352947,0.405556
Oak,8.0,0.171348,0.026688,0.136951,0.158687,0.167964,0.179326,0.226764
Other Hardwood,8.0,0.144099,0.042189,0.06988,0.125598,0.153469,0.16735,0.193819
Other Hdwd,8.0,0.194219,0.064562,0.10275,0.143883,0.21992,0.244826,0.256984
Other Sfwd,8.0,0.218593,0.084367,0.065399,0.187677,0.236286,0.26218,0.339243
Other Softwood,8.0,0.137004,0.029011,0.089059,0.118867,0.138305,0.159814,0.169394
Red Pine,8.0,0.410263,0.102884,0.267858,0.327066,0.433153,0.468439,0.548446
Spruce/Fir,8.0,0.187332,0.037939,0.11433,0.176652,0.18561,0.203101,0.246292


In [24]:
pricesSawNorth = pricesNorth.drop(columns = 'Pulpwood')
pricesSawNorth = pricesSawNorth[pricesSawNorth['Sawtimber'] != 0]
pricesSawNorth.groupby('priceSpecies').describe()

Product,Sawtimber,Sawtimber,Sawtimber,Sawtimber,Sawtimber,Sawtimber,Sawtimber,Sawtimber
priceSpecies,count,mean,std,min,25%,50%,75%,max
Ash,8.0,9.667374,2.924213,5.465222,7.911426,9.276906,12.112972,13.421702
Aspen,8.0,7.090165,2.480493,3.759807,5.754199,6.076828,8.808975,11.167589
Basswood,8.0,11.178991,2.735776,6.781604,9.016016,12.406528,13.266694,13.843603
Beech,8.0,7.587131,2.093239,5.989583,6.522374,6.973366,7.388942,12.5
Birch,1.0,14.583333,,14.583333,14.583333,14.583333,14.583333,14.583333
Black Ash,2.0,11.041667,5.597929,7.083333,9.0625,11.041667,13.020833,15.0
Black Cherry,6.0,19.182346,6.372149,11.734611,14.356585,20.184821,21.161942,29.088333
Black Walnut,1.0,174.229645,,174.229645,174.229645,174.229645,174.229645,174.229645
Elm,2.0,5.477033,2.654668,3.5999,4.538467,5.477033,6.4156,7.354167
Hard Maple,8.0,35.526016,14.089273,11.016377,29.00998,35.652637,44.346964,55.098868


In [16]:
# process southern stumpage prices
pricesSouth = pd.read_csv('../data/Timber Prices/prices_south.csv')

# columns 4 through 69 are stumpage prices grouped by product type,
# state abbreviation and state region. for example, sawfl1 is sawtimber prices for
# state of Florida and Florida region 1. we can use the pandas melt function to
# convert these columns into rows

pricesSouth = pricesSouth.melt(
    id_vars=pricesSouth.columns[0:3],
    value_vars=pricesSouth.columns[3:69],
    var_name='product',
    value_name='price'
    )

# split the product column into three columns: product, stateAbbr, priceRegion
pricesSouth['stateAbbr'] = pricesSouth['product'].str[3:5].str.upper()
pricesSouth['priceRegion'] = pricesSouth['product'].str[5:].str.zfill(2)
pricesSouth['product'] = pricesSouth['product'].str[:3]

# drop product if equal to 'pre'
pricesSouth = pricesSouth[pricesSouth['product'] != 'pre']

# change 'saw' to 'Sawtimber' and 'plp' to 'Pulpwood'
pricesSouth['product'] = pricesSouth['product'].replace({'saw': 'Sawtimber',
                                                         'plp': 'Pulpwood'})

# change 'pine' and 'oak' to 'Pine' and 'Oak'
pricesSouth['type'] = pricesSouth['type'].replace({'pine': 'Pine',
                                                            'oak': 'Oak'})

# change column 'type' to 'priceSpecies'
pricesSouth.rename(columns={'type': 'priceSpecies'}, inplace=True)

# add state fips code
state_fips = {'AL': '01', 'AR': '05', 'FL': '12', 'GA': '13', 'KY': '21',
              'LA': '22', 'MS': '28', 'NC': '37', 'OK': '40', 'SC': '45',
              'TN': '47', 'TX': '48', 'VA': '51'}
pricesSouth['statecd'] = pricesSouth['stateAbbr'].map(state_fips)
pricesSouth['priceRegion'] = pricesSouth['statecd'] + pricesSouth['priceRegion']

# average prices over years by priceRegion, priceSpecies, and product
pricesSouth = pricesSouth.groupby(['priceRegion',
                                  'priceSpecies', 'product'])['price'].mean().reset_index()

# convert price to dollars per cubic foot from dollars per ton
# 1 ton = 40 cubic feet
pricesSouth['price'] = pricesSouth['price'] / 40
pricesSouth.rename(columns={'price': 'cuftPrice',
                            'product': 'Product'}, inplace=True)

pricesSouth.groupby('priceSpecies').describe()
#pricesSouth.head()

,cuftPrice,cuftPrice,cuftPrice,cuftPrice,cuftPrice,cuftPrice,cuftPrice,cuftPrice
priceSpecies,count,mean,std,min,25%,50%,75%,max
Oak,44.0,4.193308,3.764764,0.314828,0.563048,2.730342,7.904395,9.493706
Pine,44.0,3.019353,2.539561,0.405488,0.553277,2.170732,5.682393,6.275305
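
The column-name convention described in the cell above (three characters for the product, two for the state, and the rest for the region) can be exercised on a single label. This is a minimal sketch; the helper name `parse_price_column` and the label `'plpga2'` are assumptions chosen to follow the same pattern as `'sawfl1'`.

```python
PRODUCT_CODES = {'saw': 'Sawtimber', 'plp': 'Pulpwood'}


def parse_price_column(name):
    """Split a column label such as 'sawfl1' into (product, state, region)."""
    product = PRODUCT_CODES.get(name[:3], name[:3])  # unknown prefixes pass through
    state = name[3:5].upper()
    region = name[5:].zfill(2)
    return product, state, region


print(parse_price_column('sawfl1'))  # ('Sawtimber', 'FL', '01')
print(parse_price_column('plpga2'))  # ('Pulpwood', 'GA', '02')
```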


Combine north and south prices

In [13]:
# # concatenate pricesNorth and pricesSouth
# prices = pd.concat([pricesNorth, pricesSouth])

# # filter prices for only the years 2015-2020
# #prices = prices[(prices['year'] >= 2015) & (prices['year'] <= 2020)]

# # aggregate prices by region, species, and product
# prices = prices.groupby(['priceRegion',
#                          'priceSpecies', 'Product'])['cuftPrice'].mean().reset_index()

# # pivot the table so that each row is identified by priceRegion and priceSpecies
# # and each column is a unique product price
# prices = prices.pivot(index=['priceRegion', 'priceSpecies'],
#                         columns='Product', values='cuftPrice').reset_index()

# # some species do not have prices for all products; set these to 0
# prices = prices.fillna(0)

# prices.groupby('priceSpecies').describe()

Product,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Pulpwood,Sawtimber,Sawtimber,Sawtimber,Sawtimber,Sawtimber,Sawtimber,Sawtimber,Sawtimber
priceSpecies,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
Oak,22.0,0.567787,0.154789,0.314828,0.446424,0.562014,0.643183,0.862659,22.0,7.818829,1.207148,4.598024,7.624454,7.944612,8.678157,9.493706
Pine,22.0,0.561904,0.096103,0.405488,0.498704,0.549238,0.617454,0.777439,22.0,5.476802,0.737128,3.564024,5.167378,5.692835,6.030488,6.275305


## Match Species in pricesNorth with spgrpnm and spclass in biomass

Species list in the TimberMart North data

In [149]:
# print unique values in the Species column
print(pricesNorth['priceSpecies'].unique())

['Ash' 'Aspen' 'Basswood' 'Beech' 'Hard Maple' 'Jack Pine' 'Oak'
 'Other Hardwood' 'Other Softwood' 'Red Oak' 'Red Pine' 'Soft Maple'
 'Spruce' 'Spruce/Fir' 'White Birch' 'White Oak' 'White Pine'
 'Yellow Birch' 'Mixed Hdwd' 'Other Hdwd' 'Other Sfwd' 'Black Cherry'
 'Black Walnut' 'Elm' 'Hickory' 'Maple Unspecified' 'Pine' 'White Spruce'
 'Mixed Sftwd' nan 'Oak Unspecified' 'White Ash' 'Pine Unspecified'
 'Black Ash' 'Hemlock' 'Birch' 'Scrub Oak' 'Sawtimber ($/mbf)'
 'Spruce Unspecified']


Species groups in the FIA biomass data

In [150]:
# filter for state of Michigan, Wisconsin, and Minnesota
# print unique values in the spgrpnm column
speciesNorth = biomass[biomass['state'].isin(['Michigan', 'Wisconsin', 'Minnesota'])]
speciesNorth = speciesNorth[['spgrpnm']].sort_values('spgrpnm')
print(speciesNorth['spgrpnm'].unique())


['Ash' 'Basswood' 'Beech' 'Black walnut' 'Cottonwood and aspen (East)'
 'Eastern hemlock' 'Eastern white and red pines' 'Hard maple' 'Hickory'
 'Jack pine' 'Other eastern hard hardwoods' 'Other eastern soft hardwoods'
 'Other eastern softwoods' 'Other red oaks' 'Other yellow pines'
 'Select red oaks' 'Select white oaks' 'Soft maple'
 'Spruce and balsam fir' 'Tupelo and blackgum' 'Yellow birch']


* Rule: Prices should be interpolated to match biomass
* Every observation in the FIA data should be assigned a price.
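
One way to test the second rule is to compare the FIA groups against the set of groups that some price species maps to. The sketch below uses a short excerpt of species names in the `{'priceSpecies': 'fiaSpecies'}` format; the variable names are assumptions, and the excerpt is deliberately incomplete so that the check reports a gap.

```python
# Subset of the FIA species groups printed above.
fia_groups = ['Ash', 'Basswood', 'Hard maple', 'Tupelo and blackgum', 'Yellow birch']

# Excerpt of a price-species -> FIA-group crosswalk (illustrative, not complete).
crosswalk = {
    'Ash': 'Ash',
    'White Ash': 'Ash',
    'Basswood': 'Basswood',
    'Hard Maple': 'Hard maple',
    'Birch': 'Yellow birch',
}

# FIA groups that no price species points to still need an assigned price.
priced = set(crosswalk.values())
unpriced = sorted(g for g in fia_groups if g not in priced)
print(unpriced)  # ['Tupelo and blackgum']
```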

In [171]:
# create a dictionary to assign a 'Species' or group of 'Species' from
# the price data to a 'spgrpnm' from the biomass data. The length of the
# dictionary should be the number of unique values in the 'spgrpnm' column
# of the biomass data.

# species list from price data
# ['Ash''Black Ash'  'White Ash' 
#  'Aspen' 
# 'Basswood'
#  'Beech'
#  'Oak''Red Oak' 'Scrub Oak' 'Oak Unspecified'
#  'Hard Maple''Soft Maple''Maple Unspecified' 
#  'Spruce' 'Spruce/Fir' 'White Birch' 'White Oak' 
# 'Jack Pine''White Pine''Red Pine''Pine''Pine Unspecified'
#  'Yellow Birch' 
# 'Other Hardwood' 'Other Softwood' 
# 'Mixed Hdwd' 'Other Hdwd''Mixed Sftwd'  'Other Sfwd' 
# 'Black Cherry'
#  'Black Walnut' 
# 'Elm' 
# 'Hickory'  
# 'White Spruce''Spruce Unspecified'
#  'Hemlock'
#  'Birch' 

# species list from biomass data
# ['Ash' 
# 'Basswood' 
# 'Beech'
#  'Black walnut' 
# 'Cottonwood and aspen (East)'
#  'Eastern hemlock'
#  'Eastern white and red pines' 
# 'Hard maple' 
# 'Hickory'
#  'Jack pine' 
# 'Other eastern hard hardwoods'
#  'Other eastern soft hardwoods'
#  'Other eastern softwoods'
#  'Other red oaks' 
# 'Other yellow pines'
#  'Select red oaks' 
# 'Select white oaks' 
# 'Soft maple'
#  'Spruce and balsam fir' 
# 'Tupelo and blackgum' 
# 'Yellow birch']

# format is {'priceSpecies': 'fiaSpecies'}

# trying the reverse of the above
speciesCrosswalkNorth = {
    'Ash': 'Ash',
    'Basswood': 'Basswood',
    'Beech': 'Beech',
    'Black Walnut': 'Black walnut',
    'Aspen': 'Cottonwood and aspen (East)',
    'Hemlock': 'Eastern hemlock',
    'Red Pine': 'Eastern white and red pines',
    'White Pine': 'Eastern white and red pines',
    'Hard Maple': 'Hard maple',
    'Hickory': 'Hickory',
    'Jack Pine': 'Jack pine',
    'Pine': 'Loblolly and shortleaf pines',
    'Unspecified Pine': 'Loblolly and shortleaf pines',
    'Other Hardwood': 'Other eastern hard hardwoods',
    'Other Hdwd': 'Other eastern hard hardwoods',
    'Other Softwood': 'Other eastern softwoods',
    'Other Sfwd': 'Other eastern softwoods',
    'Red Oak': 'Other red oaks',
    'White Oak': 'Select white oaks',
    'Soft Maple': 'Soft maple',
    'White Spruce': 'Spruce and balsam fir',
    'Spruce': 'Spruce and balsam fir',
    'Spruce/Fir': 'Spruce and balsam fir',
    'Birch': 'Yellow birch',
    'Black Cherry': '',
    'Elm': '',
    'Mixed Hdwd': '',
    'Mixed Sftwd': '',
    'Oak': '',
    'Oak Unspecified': '',
    'Pine Unspecified': '',
    'Spruce Unspecified': '',
    'White Ash': 'Ash',
    'White Birch': '',
    'Yellow Birch': '',
    'Maple Unspecified': '',
    'Scrub Oak': '',
    'Black Ash': 'Ash'
}


# using the crosswalk, add an 'fiaSpecies' column to the price data
# by mapping each 'priceSpecies' value to its FIA species group
pricesNorth['fiaSpecies'] = pricesNorth['priceSpecies'].map(speciesCrosswalkNorth)

# print the priceSpecies where fiaSpecies is null
print(pricesNorth[pricesNorth['fiaSpecies'].isnull()]['priceSpecies'].unique())

# use the crosswalk to create a new column in the biomass data called 'priceSpecies'
# apply this to only the northern states of Michigan, Wisconsin, and Minnesota

#biomass.loc[biomass['state'].isin(['Michigan', 'Wisconsin', 'Minnesota']),
                #    'priceSpecies'] = biomass['spgrpnm'].map(speciesCrosswalkNorth)


['Black Cherry' 'Elm' 'Mixed Hdwd' 'Mixed Sftwd' 'Oak' 'Oak Unspecified'
 'Pine Unspecified' 'Sawtimber ($/mbf)' 'Spruce Unspecified' 'White Ash'
 'White Birch' 'Yellow Birch' 'Maple Unspecified' 'Scrub Oak' 'Black Ash']


In [152]:
# Repeat above steps for the southern states of Alabama, Arkansas,
#  Florida, Georgia, Kentucky, Louisiana, Mississippi, North Carolina,
#  Oklahoma, South Carolina, Tennessee, Texas, and Virginia

speciesSouth = biomass[biomass['state'].isin(['Alabama', 'Arkansas', 'Florida',
                                                'Georgia', 'Kentucky', 'Louisiana',
                                                'Mississippi', 'North Carolina',
                                                'Oklahoma', 'South Carolina',
                                                'Tennessee', 'Texas', 'Virginia'])]
speciesSouth = speciesSouth[['spclass']].sort_values('spclass')
print(speciesSouth['spclass'].unique())

['Hardwood' 'Softwood']


In [155]:
# create dictionary for spclass to priceSpecies for the southern states

speciesCrosswalkSouth = {
 'Softwood': 'Pine',
 'Hardwood': 'Oak'
}

# use the crosswalk to create a new column in the biomass data called 'priceSpecies'
# apply this to only the southern states of Alabama, Arkansas, Florida,
# Georgia, Kentucky,Louisiana, Mississippi, North Carolina,
# Oklahoma, South Carolina, Tennessee, Texas, and Virginia

biomass.loc[biomass['state'].isin(['Alabama', 'Arkansas', 'Florida',
                                    'Georgia', 'Kentucky', 'Louisiana',
                                    'Mississippi', 'North Carolina',
                                    'Oklahoma', 'South Carolina',
                                    'Tennessee', 'Texas', 'Virginia']),
            'priceSpecies'] = biomass['spclass'].map(speciesCrosswalkSouth)


Next, we determine which products (and prices) apply to which size classes.

- Sawtimber price applies to sizes 13 inches to 40+ inches
- Pulpwood price applies to sizes 5 inches to 12.9 inches
- Sizes under 5 inches are given a price lower than pulpwood (pre-merchantable)
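
The three rules above can be applied to the `size_class_range` labels by reading the lower bound of each range. This is a minimal sketch: the helper name is hypothetical, and every label other than `'5.0-6.9'`, including the open-ended `'40.0+'` form, is an assumption about how the FIA size classes are written.

```python
def product_for_size_class(size_class_range):
    """Assign a product to a size-class label such as '5.0-6.9'.

    The label format for the open-ended top class ('40.0+') is an assumption.
    """
    lower = float(size_class_range.rstrip('+').split('-')[0])
    if lower < 5.0:
        return 'Pre-merchantable'
    if lower < 13.0:
        return 'Pulpwood'
    return 'Sawtimber'


for label in ['1.0-2.9', '5.0-6.9', '11.0-12.9', '13.0-14.9', '40.0+']:
    print(label, product_for_size_class(label))
```

Keying on the lower bound keeps each two-inch class in a single product, which matches the 12.9/13 inch break in the list.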

In [None]:
# in biomass, create a new column for priceRegion that is the two digit state
# fips code followed by the two digit price region code
biomass['priceRegion'] = biomass['fips'].str[:2] + biomass['priceRegion']

# print unique priceSpecies values in biomass
print(biomass['priceSpecies'].unique())


# merge biomass with prices on priceRegion and priceSpecies; call it assetTable
# assetTable = pd.merge(
#     biomass,
#     prices,
#     on=['priceRegion', 'priceSpecies'],
#     how='left')
