# Forest Inventory and Analysis `10 points`

Source:

* Dataset: https://apps.fs.usda.gov/fia/datamart/datamart.html
* Documentation: https://www.fia.fs.fed.us/library/database-documentation/index.php
        
Description from [Data Is Plural](https://www.data-is-plural.com/archive/2019-08-21-edition/):

> The U.S. Forest Service’s Forest Inventory and Analysis program tracks “trends in forest area and location; in the species, size, and health of trees; in total tree growth, mortality, and removals by harvest; in wood production and utilization rates by various products; and in forest land ownership.” It also “serves as perhaps the largest publicly available” dataset of “downed and dead wood.” The inventory is available to download and comes with user guides.

**Topics:**

* Downloading files
* Opening Excel files
* Using parameters when opening Excel files
* When to do things manually vs doing things with code

## Automatic downloading `2 points`

To download the Excel files, you have to go to [this page](https://apps.fs.usda.gov/fia/datamart/datamart_excel.html) and click on the map, which takes you to a file like `https://apps.fs.usda.gov/fia/datamart/Workbooks/IL.xlsm`. Awful user interface!

Instead, I want you to use `requests` and a `for` loop to download all of the states automatically. You might find [this SO answer](https://stackoverflow.com/questions/44699682/how-to-save-a-file-to-a-specific-directory-in-python) useful.
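
A minimal sketch of saving the workbooks into their own folder, which is the idea behind the SO answer. It assumes the `Workbooks/<ABBR>.xlsm` URL pattern from the IL example holds for every state; the folder name `fia_workbooks` and the helper names are hypothetical.

```python
from pathlib import Path

# URL pattern assumed from the IL example: <domain><ABBR>.xlsm
DOMAIN = "https://apps.fs.usda.gov/fia/datamart/Workbooks/"


def workbook_url(abbr):
    """Build the download URL for one state's workbook."""
    return f"{DOMAIN}{abbr}.xlsm"


def save_path(abbr, folder="fia_workbooks"):
    """Create `folder` if needed and return where the workbook should be saved."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{abbr}.xlsm"


def download(abbr, folder="fia_workbooks"):
    """Download one workbook into `folder` and return its path."""
    import requests  # imported here so the path helpers above work without it

    r = requests.get(workbook_url(abbr), timeout=60)
    r.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    path = save_path(abbr, folder)
    path.write_bytes(r.content)
    return path
```

Calling `download(abbr)` inside the `for` loop over the abbreviations would fill the folder; wrapping the call in `try`/`except requests.RequestException` lets the loop skip states that fail.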

*Note that the page says they don't have information for every state.*
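
Because some states have no workbook, it helps to check what actually landed on disk. A sketch, assuming the files were saved as `<ABBR>.xlsm` in one folder: `.xlsm` files are ZIP archives (Office Open XML), so `zipfile.is_zipfile` is a cheap way to spot a saved HTML error page or an empty file. The helper names are hypothetical.

```python
import zipfile
from pathlib import Path


def looks_like_workbook(path):
    """True if `path` is a ZIP archive, as every .xlsx/.xlsm file is."""
    return zipfile.is_zipfile(path)


def remove_bad_downloads(abbrs, folder="."):
    """Delete downloads that are not real workbooks.

    Returns the abbreviations that have no usable workbook (missing or bad)."""
    bad = []
    for abbr in abbrs:
        path = Path(folder) / f"{abbr}.xlsm"
        if not path.exists() or not looks_like_workbook(path):
            path.unlink(missing_ok=True)  # missing_ok needs Python 3.8+
            bad.append(abbr)
    return bad
```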

```python
import pandas as pd
import numpy as np

df1 = pd.read_csv("stateabbr.csv")

abbreveations = df1['Abbr.'].to_numpy()
abbreveations
```

```
array(['AK', 'AL', 'AR', 'AS', 'AZ', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL',
       'FM', 'GA', 'GU', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA',
       'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH',
       'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'PR', 'PW',
       'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'VI', 'WA', 'WV',
       'WI', 'WY'], dtype=object)
```

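```python
%%time

import requests

domain = "https://apps.fs.usda.gov/fia/datamart/Workbooks/"

for abbr in abbreveations:
    url = f"{domain}{abbr}.xlsm"
    try:
        r = requests.get(url)
        # requests.get() does not raise for HTTP error statuses (such as 404 for
        # a missing file); they come back as normal responses, so check the
        # status instead of saving the error page as a .xlsm file
        r.raise_for_status()
        with open(f"{abbr}.xlsm", 'wb') as f:
            f.write(r.content)
    except requests.RequestException as e:
        print(f"{abbr}: download failed ({e})")
```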

```
CPU times: user 1.32 s, sys: 163 ms, total: 1.49 s
Wall time: 1min 35s
```


## Reading in the data `3 points`

### Read in the data for Virginia

**We're interested in sheet `SR004`**, which shows how many acres of forest land fall under each type of ownership, broken down by forest type group.

Read the file in so that the dataset looks like this:

|Forest type group|Total|National Forest|Other federal|State and local|Private|
|---|---|---|---|---|---|
|Total|16025876|1688425.0|518217.0|657963.0|13161271|
|...|...|...|...|...|...|
|Nonstocked|81574|0.0|1590.0|0.0|79984|

and your index goes up to `15`.
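
`read_excel` and `read_csv` share the `skiprows`, `nrows` and `na_values` parameters, so a small made-up CSV is enough to see what each one does without opening the workbook. The rows below are invented; only the column names follow the SR004 table.

```python
import io

import pandas as pd

# Made-up stand-in for a sheet laid out like SR004: two title lines,
# a header row, three data rows ('-' = no value), then a footnote
raw = io.StringIO(
    "Made-up report title\n"
    "Area of forest land in acres\n"
    "Forest type group,Total,National Forest,Private\n"
    "Total,100,40,60\n"
    "Spruce / fir group,30,-,30\n"
    "Nonstocked,5,-,5\n"
    "Footnote: all numbers are invented\n"
)

df = pd.read_csv(
    raw,
    skiprows=2,       # drop the two title lines; the next line becomes the header
    nrows=3,          # parse exactly three data rows, leaving the footnote out
    na_values=["-"],  # read '-' as missing (NaN)
).fillna(0)           # then turn the missing values into 0
print(df)
```

In the same way, `nrows=16` in the Virginia read is what stops the index at `15`.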

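```python
# Skip the 11 rows above the header, read 16 data rows (index 0-15),
# treat '-' as missing, then fill the missing values with 0
df2 = pd.read_excel("VA.xlsm", sheet_name="SR004", skiprows=11, nrows=16, na_values=['-']).fillna(0)
df2
```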

| |Forest type group|Total|National Forest|Other federal|State and local|Private|
|---|---|---|---|---|---|---|
|0|Total|16025876|1688425.0|518217.0|657963.0|13161271|
|1|White / red / jack pine group|171292|33764.0|2534.0|0.0|134995|
|2|Spruce / fir group|7735|0.0|0.0|6188.0|1547|
|3|Longleaf / slash pine group|10293|0.0|0.0|0.0|10293|
|4|Loblolly / shortleaf pine group|3038306|63540.0|79536.0|89038.0|2806193|
|5|Other eastern softwoods group|75076|0.0|0.0|5876.0|69201|
|6|Exotic softwoods group|4157|0.0|0.0|0.0|4157|
|7|Oak / pine group|1649711|140950.0|58413.0|53515.0|1396832|
|8|Oak / hickory group|9755134|1375367.0|314345.0|405917.0|7659506|
|9|Oak / gum / cypress group|373717|2939.0|40461.0|21746.0|308570|


### Read in the data for South Dakota

You'll have fewer rows in this dataset than for Virginia.
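
Hard-coding `nrows` per state works for two files but not for a loop over every state. One workaround, sketched under an assumption that has not been checked against the workbooks (every real table row has a number in `Total`, and whatever sits below the table does not): read generously, then cut at the first row where `Total` is not numeric. The function name and the sample rows are made up.

```python
import pandas as pd


def trim_to_table(df, value_col="Total"):
    """Keep rows up to (not including) the first one whose `value_col` is not numeric."""
    is_number = pd.to_numeric(df[value_col], errors="coerce").notna().to_numpy()
    if is_number.all():
        return df
    # argmin on a boolean array gives the position of the first False
    return df.iloc[: is_number.argmin()]


# Made-up rows: three table rows followed by a footnote with an empty Total
sample = pd.DataFrame({
    "Forest type group": ["Total", "Spruce / fir group", "Nonstocked", "Source: made-up note"],
    "Total": [100, 30, 5, None],
})
print(trim_to_table(sample))
```

With the real files this would follow a read such as `pd.read_excel("SD.xlsm", sheet_name="SR004", skiprows=11, nrows=40, na_values=['-'])`, where `nrows=40` is an arbitrary upper bound. It has to run before `.fillna(0)`; otherwise a missing `Total` becomes the number 0 and the cut never happens.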

```python
# Same layout as Virginia, but South Dakota's table is shorter, so read 14 rows
df3 = pd.read_excel("SD.xlsm", sheet_name="SR004", skiprows=11, nrows=14, na_values=['-']).fillna(0)
df3
```

| |Forest type group|Total|National Forest|Other federal|State and local|Private|
|---|---|---|---|---|---|---|
|0|Total|1908467|995389.0|54186.0|94640.0|764252|
|1|White / red / jack pine group|6098|0.0|0.0|0.0|6098|
|2|Spruce / fir group|86176|61067.0|17348.0|0.0|7761|
|3|Other eastern softwoods group|49053|0.0|6331.0|0.0|42722|
|4|Pinyon / juniper group|80044|16627.0|5920.0|0.0|57497|
|5|Ponderosa pine group|1027239|698946.0|4748.0|63442.0|260104|
|6|Oak / pine group|9775|0.0|0.0|4340.0|5435|
|7|Oak / hickory group|164851|24441.0|0.0|4435.0|135976|
|8|Elm / ash / cottonwood group|131272|0.0|0.0|13818.0|117455|
|9|Maple / beech / birch group|8537|0.0|0.0|0.0|8537|


# Calculations `1 point`

## What percent of forested land is a "National Forest" in South Dakota vs Virginia?

You can do this calculation manually. Pay special attention to column names.
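
Both questions divide one cell of the `Total` row by that row's `Total` column, so a small helper can put the four numbers side by side. A sketch: the frames below hold only the `Total` row copied from the two tables above, standing in for `df2` and `df3`, and `ownership_share` is a hypothetical name.

```python
import pandas as pd


def ownership_share(df, column):
    """Percent of all forest land (row 0, the Total row) that falls under `column`."""
    return round(df.loc[0, column] / df.loc[0, "Total"] * 100, 1)


# Only the Total rows, copied from the Virginia and South Dakota tables
va = pd.DataFrame({"Total": [16025876], "National Forest": [1688425.0], "Private": [13161271]})
sd = pd.DataFrame({"Total": [1908467], "National Forest": [995389.0], "Private": [764252]})

# Outer dict keys become columns (states), inner keys become rows (ownership types)
summary = pd.DataFrame({
    state: {col: ownership_share(df, col) for col in ["National Forest", "Private"]}
    for state, df in {"VA": va, "SD": sd}.items()
})
print(summary)
```

The values agree with the cells that follow: 10.5 and 82.1 for Virginia, 52.2 and 40.0 for South Dakota.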

```python
# Virginia
(df2['National Forest'][0] / df2['Total'][0] * 100).round(1)
```

10.5

```python
# South Dakota
(df3['National Forest'][0] / df3['Total'][0] * 100).round(1)
```

52.2

## What percent of forested land is privately owned in SD vs VA?

```python
# Virginia
(df2['Private'][0] / df2['Total'][0] * 100).round(1)
```

82.1

```python
# South Dakota
(df3['Private'][0] / df3['Total'][0] * 100).round(1)
```

40.0

## Do the calculation for private ownership of all forests in South Dakota using only one line, and without typing the actual numbers `1 point`

Tip: `df.loc[0]` will be your friend
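
With `df.loc[0]` the overall rate is a single expression over the `Total` row, and dividing whole columns gives every forest type group at once without a loop. A sketch on a two-row stand-in named `sd` (both rows copied from the South Dakota table). The builtin `round()` is used because a row pulled out of a mixed-type frame may hold plain Python numbers, which have no `.round()` method.

```python
import pandas as pd

# Stand-in for df3: two rows copied from the South Dakota table
sd = pd.DataFrame({
    "Forest type group": ["Total", "Spruce / fir group"],
    "Total": [1908467, 86176],
    "Private": [764252, 7761],
})

# One line for all forest land: row 0 is the Total row
overall = round(sd.loc[0]["Private"] / sd.loc[0]["Total"] * 100, 1)
print(overall)

# Every forest type group at once: element-wise column division, no loop
by_group = (sd["Private"] / sd["Total"] * 100).round(1)
print(by_group.tolist())
```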

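```python
# Private share of each forest type group (row 0 is the Total row, so start at 1).
# Note: range(1, 13) stops at row 12, although df3 was read with nrows=14 (index up to 13).
for p in range(1, 13):
    print(df3['Forest type group'][p])
    print((df3['Private'][p] / df3['Total'][p] * 100).round(1))
    print("--------")
```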

```
White / red / jack pine group
100.0
--------
Spruce / fir group
9.0
--------
Other eastern softwoods group
87.1
--------
Pinyon / juniper group
71.8
--------
Ponderosa pine group
25.3
--------
Oak / pine group
55.6
--------
Oak / hickory group
82.5
--------
Elm / ash / cottonwood group
89.5
--------
Maple / beech / birch group
100.0
--------
Aspen / birch group
6.0
--------
Other hardwoods group
37.6
--------
Exotic hardwoods group
100.0
--------
```


## Using the files you downloaded, calculate the private ownership rate for all forested land in each state `3 points`

> Tip: Use a for loop
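
Printing inside the loop makes the states hard to compare. A sketch that instead collects the rates into a `pandas` Series sorted from most to least private, skipping files that cannot be read. The `reader` argument is a hypothetical hook so the loop can be tried on made-up rows without any workbook; by default it uses the same `read_excel` call as the cell that follows.

```python
import pandas as pd


def read_total_row(abbr):
    """Return the Total row (row 0) of sheet SR004 in <abbr>.xlsm."""
    df = pd.read_excel(f"{abbr}.xlsm", sheet_name="SR004", skiprows=11, nrows=1,
                       na_values=["-"]).fillna(0)
    return df.loc[0]


def private_rates(abbrs, reader=read_total_row):
    """Private share (%) of all forest land per state, highest first."""
    rates = {}
    for abbr in abbrs:
        try:
            row = reader(abbr)
        except Exception as e:  # e.g. a missing file or a file that is not a workbook
            print(f"{abbr}: skipped ({type(e).__name__})")
            continue
        rates[abbr] = round(row["Private"] / row["Total"] * 100, 1)
    return pd.Series(rates, name="Private (%)").sort_values(ascending=False)
```

For the real files, `private_rates(abbreveations)` would run the same reads as the loop below.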

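```python
for abbr in abbreveations:
    print(abbr)
    try:
        # Only the first data row (the Total row) is needed for the overall rate
        df4 = pd.read_excel(f"{abbr}.xlsm", sheet_name="SR004", skiprows=11, nrows=1, na_values=['-']).fillna(0)
        print((df4['Private'][0] / df4['Total'][0] * 100).round(1))
    except Exception:
        # e.g. the file is missing or is not a readable workbook (see DC below)
        print('Transcription Error')
    finally:
        print("--------")
```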

```
AK
14.7
--------
AL
93.2
--------
AR
80.3
--------
AS
14.3
--------
AZ
38.5
--------
CA
39.5
--------
CO
23.4
--------
CT
71.3
--------
DE
78.8
--------
DC
Transcription Error
--------
FL
63.0
--------
FM
48.8
--------
GA
88.9
--------
GU
39.9
--------
HI
39.5
--------
ID
13.5
--------
IL
82.7
--------
IN
83.2
--------
IA
85.2
--------
KS
92.8
--------
KY
88.4
--------
LA
86.3
--------
ME
92.0
--------
MD
73.0
--------
MA
64.1
--------
MI
61.5
--------
MN
44.8
--------
MS
88.6
--------
MO
81.8
--------
MT
25.8
--------
NE
88.2
--------
NV
3.5
--------
NH
72.6
--------
NJ
46.3
--------
NM
43.7
--------
NY
73.5
--------
NC
82.6
--------
ND
71.8
--------
OH
84.6
--------
OK
87.6
--------
OR
35.8
--------
PA
69.2
--------
PR
83.0
--------
PW
9.4
--------
RI
69.9
--------
SC
87.0
--------
SD
40.0
--------
TN
83.2
--------
TX
93.4
--------
UT
14.7
--------
VT
78.8
--------
VA
82.1
--------
VI
74.9
--------
WA
41.9
--------
WV
86.5
--------
WI
69.7
--------
WY
14.0
--------
```
