# Forest Soil Characteristics Dataset

About this dataset: From Kaggle (https://www.kaggle.com/datasets/shubhamgupta012/forest-soil-characteristics-dataset)

This dataset is a collection of forest soil measurements taken at different sites, each identified by a unique site code. For every soil core it records identifiers (house, replicate, and core), the sampled depth range, current and previous land use, house and lawn attributes, and a set of physical, chemical, and microbial measurements such as bulk density, carbon and nitrogen content, texture, microbial biomass, respiration, and net nitrogen mineralization and nitrification. The full column list is given under "Columns" below.

This dataset serves as a valuable resource for researchers, ecologists, and environmental scientists interested in studying forest soil characteristics and their impact on ecosystem dynamics. It can be used for various purposes, such as analyzing nutrient cycling, evaluating soil quality, and understanding the effects of land use changes on soil properties. The dataset provides a rich source of information that can contribute to a better understanding of forest ecosystems and support evidence-based decision-making in forestry and land management practices.

Columns:
- Site (site code)
- HouseID
- REP# (replicate number)
- CoreID (core identification)
- Depth (depth range)
- LU_Current (current land use)
- LU_Previous (previous land use)
- Yr_Built (year built)
- Lawn Age
- CoarseVeg (coarse vegetation)
- StructDen (structural density)
- BD (bulk density)
- N_Perc (nitrogen percentage)
- C_Perc (carbon percentage)
- C_N (carbon to nitrogen ratio)
- N_gm2 (nitrogen content in grams per square meter)
- C_gm2 (carbon content in grams per square meter)
- Sand_Perc (percentage of sand)
- Clay_Perc (percentage of clay)
- Silt_Perc (percentage of silt)
- MB Carbon (microbial biomass carbon)
- Respiration
- Initial NO3 (+NO2) (initial nitrate and nitrite content)
- Initial NH4 (initial ammonium content)
- MBN (microbial biomass nitrogen)
- Net N Min (net nitrogen mineralization)
- Net Nitr (net nitrification)

Sources: 
- https://www.soilquality.org.au/factsheets/microbial-biomass-carbon-nsw

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [12]:
# Forward slash avoids the invalid "\F" escape sequence and also works on Windows
soil_df = pd.read_csv("datasets/Forest Soil Characteristics.csv")
pd.set_option("display.max_columns", 999)

In [37]:
soil_df.tail()

Unnamed: 0,Site,HouseID,REP#,CoreID,Depth,LU_Current,LU_Previous,Yr_Built,Lawn Age,CoarseVeg,StructDen,BD,N_Perc,C_Perc,C_N,N_gm2,C_gm2,Sand_Perc,Clay_Perc,Silt_Perc,MB Carbon,Respiration,Initial NO3 (+NO2),Initial NH4,MBN,Net N Min,Net Nitr
323,House30,30.0,2,30.2,10to30,Residential,Forest,1952.0,55.0,4+,1,1.245,0.071,1.967,27.698,176.83,4897.833,28.375,28.75,42.875,155.68,4.005,8.548,1.002,13.376,0.306,0.374
324,House30,30.0,1,30.1,30to70,Residential,Forest,1952.0,55.0,4+,1,1.297,0.04,0.532,13.465,204.868,2758.625,37.719,25.013,37.269,88.59,3.17,1.142,0.571,1.641,-0.029,-0.001
325,House30,30.0,2,30.2,30to70,Residential,Forest,1952.0,55.0,4+,1,1.42,0.035,0.427,12.067,201.191,2427.722,33.875,29.5,36.625,113.83,2.449,1.64,0.264,2.428,0.017,0.017
326,House30,30.0,1,30.1,70to100,Residential,Forest,1952.0,55.0,4+,1,1.662,0.047,0.579,12.201,236.756,2888.764,32.324,20.766,46.91,149.03,3.608,1.084,0.568,2.464,0.005,0.033
327,House30,30.0,2,30.2,70to100,Residential,Forest,1952.0,55.0,4+,1,0.935,0.049,0.585,11.872,138.267,1641.546,33.342,21.261,45.398,94.6,2.716,1.356,0.261,1.176,0.074,0.074


In [15]:
soil_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 328 entries, 0 to 327
Data columns (total 27 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Site                328 non-null    object 
 1   HouseID             264 non-null    float64
 2   REP#                328 non-null    int64  
 3   CoreID              328 non-null    object 
 4   Depth               328 non-null    object 
 5   LU_Current          328 non-null    object 
 6   LU_Previous         264 non-null    object 
 7   Yr_Built            264 non-null    float64
 8   Lawn Age            264 non-null    float64
 9   CoarseVeg           264 non-null    object 
 10  StructDen           264 non-null    object 
 11  BD                  322 non-null    float64
 12  N_Perc              320 non-null    float64
 13  C_Perc              320 non-null    float64
 14  C_N                 291 non-null    float64
 15  N_gm2               321 non-null    float64
 16  C_gm2   

In [21]:
soil_df.isnull().mean()*100

Site                   0.000000
HouseID               19.512195
REP#                   0.000000
CoreID                 0.000000
Depth                  0.000000
LU_Current             0.000000
LU_Previous           19.512195
Yr_Built              19.512195
Lawn Age              19.512195
CoarseVeg             19.512195
StructDen             19.512195
BD                     1.829268
N_Perc                 2.439024
C_Perc                 2.439024
C_N                   11.280488
N_gm2                  2.134146
C_gm2                  2.134146
Sand_Perc             14.024390
Clay_Perc             14.024390
Silt_Perc             14.024390
MB Carbon              2.439024
Respiration            2.439024
Initial NO3 (+NO2)     2.439024
Initial NH4            2.439024
MBN                    2.743902
Net N Min              2.743902
Net Nitr               2.743902
dtype: float64
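
The house-related columns (HouseID, LU_Previous, Yr_Built, Lawn Age, CoarseVeg, StructDen) all show the same 19.51% missing share, which is 64 of 328 rows. One plausible reading is that these are the forest rows, but the output above does not confirm it. A minimal sketch of how that could be checked is below; the helper `missing_share_by_group` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def missing_share_by_group(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Percentage of missing values per column, split by the values of group_col."""
    value_cols = df.columns.drop(group_col)
    # isna() gives booleans; the group-wise mean of booleans is the missing fraction
    return df[value_cols].isna().groupby(df[group_col]).mean() * 100


# Toy frame with made-up values, only to show the shape of the result
toy = pd.DataFrame({
    "LU_Current": ["Forest", "Forest", "Residential", "Residential"],
    "Yr_Built": [None, None, 1952.0, 1989.0],
    "BD": [1.1, None, 1.2, 1.3],
})
shares = missing_share_by_group(toy, "LU_Current")
print(shares)
```

On the real data the call would be `missing_share_by_group(soil_df, "LU_Current")`. A value of 100 in the Forest row for the house-related columns would support the reading above.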

In [23]:
soil_df.shape

(328, 27)

In [31]:
print(f"Previous Land Use: {soil_df['LU_Previous'].unique()}")
print(f"Current Land Use: {soil_df['LU_Current'].unique()}")

Previous Land Use: [nan 'Ag' 'Forest']
Current Land Use: ['Forest' 'Residential']
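
The two `unique()` calls show which categories exist but not how they combine. A small sketch for counting rows per (previous, current) land-use pair follows; the function name `land_use_transitions`, the "(missing)" label, and the toy rows are assumptions made for illustration.

```python
import pandas as pd


def land_use_transitions(df: pd.DataFrame) -> pd.Series:
    """Count rows for each (LU_Previous, LU_Current) pair, keeping missing values visible."""
    # Label NaN explicitly so those rows are not silently dropped by groupby
    pairs = df[["LU_Previous", "LU_Current"]].fillna("(missing)")
    return pairs.groupby(["LU_Previous", "LU_Current"]).size()


# Toy rows with made-up values, only to show the shape of the result
toy = pd.DataFrame({
    "LU_Previous": [None, None, "Ag", "Forest"],
    "LU_Current": ["Forest", "Forest", "Residential", "Residential"],
})
counts = land_use_transitions(toy)
print(counts)
```

On the real data this would be `land_use_transitions(soil_df)`.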


In [24]:
soil_df.describe()

Unnamed: 0,HouseID,REP#,Yr_Built,Lawn Age,BD,N_Perc,C_Perc,C_N,N_gm2,C_gm2,Sand_Perc,Clay_Perc,Silt_Perc,Respiration,Initial NO3 (+NO2),Initial NH4,MBN,Net N Min,Net Nitr
count,264.0,328.0,264.0,264.0,322.0,320.0,320.0,291.0,321.0,321.0,282.0,282.0,282.0,320.0,320.0,320.0,319.0,319.0,319.0
mean,16.621212,1.615854,1972.181818,34.818182,1.102003,0.078947,1.059263,12.112962,111.340785,1404.591776,43.04733,23.550691,33.403415,5.606775,5.033394,2.6695,28.089273,0.240232,0.204881
std,9.139003,0.604766,18.069498,18.069498,0.211488,0.078247,1.160559,4.780619,91.622289,1422.834436,15.995809,9.994355,10.839772,4.729636,8.04006,4.392406,38.545303,0.709831,0.474282
min,1.0,1.0,1920.0,5.0,0.493,0.0,0.0,4.844,0.0,0.0,7.125,1.251,5.853,0.925,0.227,0.201,0.488,-0.592,-0.575
25%,9.0,1.0,1963.0,18.0,0.95025,0.021,0.192,9.1865,0.0,282.192,32.341,17.248,24.82175,2.82375,0.26575,0.58675,1.8635,-0.0215,0.0
50%,17.0,2.0,1968.0,39.0,1.1345,0.0455,0.5225,11.432,116.667,1171.99,41.522,22.642,34.155,3.6875,1.3795,1.083,8.474,0.043,0.026
75%,24.0,2.0,1989.0,44.0,1.253,0.128,1.76575,13.287,176.739,2160.853,52.40925,30.56225,40.5825,5.84625,6.8695,2.592,38.4465,0.301,0.2015
max,32.0,3.0,2002.0,87.0,1.718,0.333,4.919,41.996,508.428,11306.278,89.489,54.041,66.1,30.72,59.75,34.928,174.977,9.419,4.907


- Yr_Built ranges from 1920 to 2002, mean 1972
- Lawn Age ranges from 5 to 87. In the summary above, Yr_Built plus Lawn Age comes to 2007 at the minimum, maximum, and mean, so the unit is years
- Nitrification is the microbial conversion of ammonium to nitrate. Nitrate is mobile in soil, so it leaches easily and can contaminate groundwater
- Mineralization is the process by which organic nitrogen is converted to plant-available inorganic forms. It is regarded as a potential indicator of how the soil responds to biological change
- Previous Land Use: Agriculture, Forest, or missing
- Current Land Use: Forest, Residential
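
Some columns look derived from others. The sketch below checks three such relationships on the five rows printed by `soil_df.tail()` earlier. The relationships are inferences from the printed values, not something the dataset description states: Lawn Age counted back from a fixed year, C_N as C_Perc divided by N_Perc, and the three texture percentages summing to 100.

```python
import pandas as pd

# The five rows shown by soil_df.tail(), restricted to the columns being checked
tail = pd.DataFrame({
    "Yr_Built": [1952.0] * 5,
    "Lawn Age": [55.0] * 5,
    "N_Perc": [0.071, 0.040, 0.035, 0.047, 0.049],
    "C_Perc": [1.967, 0.532, 0.427, 0.579, 0.585],
    "C_N": [27.698, 13.465, 12.067, 12.201, 11.872],
    "Sand_Perc": [28.375, 37.719, 33.875, 32.324, 33.342],
    "Clay_Perc": [28.75, 25.013, 29.5, 20.766, 21.261],
    "Silt_Perc": [42.875, 37.269, 36.625, 46.91, 45.398],
})

# Assumed relationship 1: Yr_Built + Lawn Age is a constant reference year
reference_year = tail["Yr_Built"] + tail["Lawn Age"]

# Assumed relationship 2: C_N is C_Perc / N_Perc (up to rounding of the inputs)
cn_rel_error = (tail["C_Perc"] / tail["N_Perc"] - tail["C_N"]).abs() / tail["C_N"]

# Assumed relationship 3: sand, clay, and silt percentages sum to 100
texture_total = tail[["Sand_Perc", "Clay_Perc", "Silt_Perc"]].sum(axis=1)

print(reference_year.unique())
print(cn_rel_error.max())
print(texture_total.round(2).unique())
```

Running the same three expressions on `soil_df` would show whether the pattern holds across the whole file. Rows where it does not are candidates for closer inspection.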

In [36]:
soil_df["Yr_Built"].unique()

array([  nan, 1989., 1980., 1978., 2002., 1963., 1998., 1962., 1920.,
       1965., 1953., 1990., 1994., 1997., 2001., 1969., 1987., 1960.,
       1959., 1964., 1967., 1948., 1968., 1952.])
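
Yr_Built is stored as float64 only because the column contains missing values. If whole-number years are preferred, pandas' nullable `Int64` dtype keeps the gaps while dropping the trailing `.0`. A minimal sketch on a few of the values printed above (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# A few values taken from the unique() output above, including the missing one
yr_built = pd.Series([np.nan, 1989.0, 1980.0, 1952.0], name="Yr_Built")

# Nullable integer dtype: whole-number floats convert cleanly, NaN becomes <NA>
yr_built_int = yr_built.astype("Int64")
print(yr_built_int)
```

On the full frame the equivalent would be `soil_df["Yr_Built"] = soil_df["Yr_Built"].astype("Int64")`.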