# Wrangling Soil Test Data from the University of Kentucky's Soil Lab

Use Microsoft Access to add the FIPS code (with a query that matches each record to its county by county name) and export the data to a CSV text file named soildata_fips.txt.
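The FIPS step was done in Microsoft Access. The sketch below shows a roughly equivalent join in pandas. It is not the author's method. The toy rows and the two-entry `fips_lookup` table are hypothetical stand-ins, and the third county name is misspelled on purpose to show how unmatched names surface. In a real run, the lookup would come from a complete list of Kentucky county FIPS codes.

```python
import pandas as pd

# Toy rows standing in for the lab export: county names, no FIPS codes yet
tests = pd.DataFrame({
    "COUNTY": ["WOODFORD", "ADAIR", "WODFORD"],  # last name is a deliberate typo
    "PH": ["5.0", "5.9", "6.8"],
})

# Hypothetical lookup table: county name -> Kentucky county FIPS code
fips_lookup = pd.DataFrame({
    "COUNTY": ["ADAIR", "WOODFORD"],
    "FIPS_NO": [1, 239],
})

# Left join on county name; validate raises if a county is listed twice in the lookup
soil_fips = tests.merge(fips_lookup, on="COUNTY", how="left", validate="many_to_one")

# Rows whose county name found no match get NaN and should be checked by hand
unmatched = soil_fips[soil_fips["FIPS_NO"].isna()]
print(soil_fips)
print("unmatched counties:", unmatched["COUNTY"].unique().tolist())
```

Because one row has no match, `FIPS_NO` comes back as float with a NaN. Matching on names works only when the spellings agree exactly, so it is worth checking the unmatched list before exporting.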

#### Import Python libraries

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path

#### Set the path to the data file

In [5]:
filePath = Path('data')
file_soil = filePath.joinpath('soildata_fips.txt')
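`joinpath` and the `/` operator on a `pathlib.Path` build the same path. The sketch below shows both forms and adds an optional early check that the export is actually there. The check and its message are illustrative additions, not part of the original notebook.

```python
from pathlib import Path

file_path = Path("data")

# joinpath and the / operator produce the same relative path
file_soil = file_path.joinpath("soildata_fips.txt")
same_path = file_path / "soildata_fips.txt"

# Optional: report a missing input file clearly before pandas tries to read it
if not file_soil.exists():
    print(f"Missing input file: {file_soil.resolve()}")
```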

#### Read data into pandas

In [7]:
soil = pd.read_csv(file_soil, dtype='str')
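Passing `dtype='str'` stops pandas from guessing a type for each column. Values such as `239.0` therefore arrive exactly as written, and each column can be converted deliberately later. The small sketch below uses `io.StringIO` in place of the file. Its two rows are copied from the `tail()` output further down, with a reduced set of columns.

```python
import io
import pandas as pd

# Two rows in the same layout as soildata_fips.txt (subset of columns)
sample = io.StringIO(
    "FIPS_NO,YEAR,COUNTY,PH,ACRES\n"
    "239.0,2019.0,WOODFORD,5.0,1.0\n"
    "239.0,2019.0,WOODFORD,5.3,\n"
)

raw = pd.read_csv(sample, dtype="str")
print(raw.dtypes)              # every column holds text, none was parsed as a number
print(raw.loc[0, "FIPS_NO"])   # '239.0', kept exactly as written

# Empty fields still come back as missing (NaN), not as empty strings
print(raw["ACRES"].isna().tolist())
```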

#### Check that the file was read into memory

In [8]:
soil.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1190126 entries, 0 to 1190125
Data columns (total 14 columns):
 #   Column   Non-Null Count    Dtype 
---  ------   --------------    ----- 
 0   FIPS_NO  1190126 non-null  object
 1   YEAR     1190126 non-null  object
 2   FM       1190052 non-null  object
 3   COUNTY   1190126 non-null  object
 4   AREA     1190126 non-null  object
 5   PH       1187607 non-null  object
 6   BUPH     1056246 non-null  object
 7   P        1187473 non-null  object
 8   K        1187494 non-null  object
 9   CA       969266 non-null   object
 10  MG       969725 non-null   object
 11  ZN       967041 non-null   object
 12  ACRES    525128 non-null   object
 13  CROP     1183431 non-null  object
dtypes: object(14)
memory usage: 127.1+ MB

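The Non-Null Count column shows where data is missing. CA, MG, and ZN each have roughly 969,000 of 1,190,126 values. ACRES is missing in more than half the rows (1,190,126 − 525,128 = 664,998). The sketch below gets the same information as counts and percentages. It runs on a hypothetical four-row toy frame, but the same calls work on `soil`.

```python
import pandas as pd

# Toy frame standing in for `soil`; None marks a missing value
toy = pd.DataFrame({
    "PH": ["5.0", "5.9", None, "5.3"],
    "ACRES": ["1.0", None, None, "1.5"],
})

missing = toy.isna().sum()                 # missing values per column
missing_pct = toy.isna().mean().mul(100)   # share of rows missing, in percent
summary = pd.DataFrame({"missing": missing, "pct": missing_pct.round(1)})
print(summary.sort_values("missing", ascending=False))
```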

In [17]:
soil.tail()

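|         | FIPS_NO | YEAR   | FM | COUNTY   | AREA      | PH  | BUPH | P    | K     | CA      | MG    | ZN  | ACRES | CROP               |
|---------|---------|--------|----|----------|-----------|-----|------|------|-------|---------|-------|-----|-------|--------------------|
| 1190121 | 239.0   | 2019.0 | A  | WOODFORD | Bluegrass | 5.0 | 6.3  | 62.0 | 319.0 | 1489.0  | 223.0 | 3.5 | 1.0   | Wildlife Food Plot |
| 1190122 | 239.0   | 2019.0 | A  | WOODFORD | Bluegrass | 5.9 | 6.7  | 46.0 | 257.0 | 5247.0  | 268.0 | 2.1 | 2.0   | Wildlife Food Plot |
| 1190123 | 239.0   | 2019.0 | A  | WOODFORD | Bluegrass | 6.8 | 7.0  | 75.0 | 243.0 | 12047.0 | 281.0 | 1.2 | 2.0   | Wildlife Food Plot |
| 1190124 | 239.0   | 2019.0 | A  | WOODFORD | Bluegrass | 5.3 | 6.6  | 60.0 | 407.0 | 3304.0  | 396.0 | 2.8 |       | Wildlife Food Plot |
| 1190125 | 239.0   | 2019.0 | A  | WOODFORD | Bluegrass | 5.0 | 6.3  | 59.0 | 377.0 | 4341.0  | 349.0 | 2.0 | 1.5   | Wildlife Food Plot |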


#### Need to convert FIPS_NO and YEAR to integers, and PH, BUPH, P, K, and ACRES to floats.

In [19]:
df = soil.copy()

In [29]:
# parse the text columns as numbers (FIPS_NO and YEAR are narrowed to int32 later)
float_cols = ['FIPS_NO', 'YEAR', 'PH', 'BUPH', 'P', 'K', 'ACRES']
df[float_cols] = df[float_cols].astype('float')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1190126 entries, 0 to 1190125
Data columns (total 14 columns):
 #   Column   Non-Null Count    Dtype  
---  ------   --------------    -----  
 0   FIPS_NO  1190126 non-null  float64
 1   YEAR     1190126 non-null  float64
 2   FM       1190052 non-null  object 
 3   COUNTY   1190126 non-null  object 
 4   AREA     1190126 non-null  object 
 5   PH       1187607 non-null  float64
 6   BUPH     1056246 non-null  float64
 7   P        1187473 non-null  float64
 8   K        1187494 non-null  float64
 9   CA       969266 non-null   object 
 10  MG       969725 non-null   object 
 11  ZN       967041 non-null   object 
 12  ACRES    525128 non-null   float64
 13  CROP     1183431 non-null  object 
dtypes: float64(7), object(7)
memory usage: 127.1+ MB

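`astype('float')` raises a `ValueError` if any cell contains text that is not a number. A hedged alternative is `pd.to_numeric(..., errors="coerce")`, which converts what it can and turns the rest into NaN. You can then count the lost entries before deciding what to do with them. This excerpt does not show whether the lab file contains such entries. The `"n/a"` value below is made up for illustration.

```python
import pandas as pd

# Hypothetical column with one non-numeric entry mixed in
ph_text = pd.Series(["5.0", "5.9", "n/a", None])

try:
    ph_text.astype("float")
except ValueError as err:
    print("astype failed:", err)

# errors="coerce" converts what it can and marks the rest as NaN
ph = pd.to_numeric(ph_text, errors="coerce")
newly_missing = ph.isna() & ph_text.notna()   # values lost in the conversion
print(ph.tolist())
print("unparseable entries:", int(newly_missing.sum()))
```

To convert several columns at once, `df[cols].apply(pd.to_numeric, errors="coerce")` applies the same conversion column by column.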

#### FIPS_NO and YEAR first had to be converted to float, because text such as '239.0' cannot be cast directly to int32. Now convert them to int32.

In [36]:
df.FIPS_NO = df.FIPS_NO.astype('int32')
df.YEAR = df.YEAR.astype('int32')

In [38]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1190126 entries, 0 to 1190125
Data columns (total 14 columns):
 #   Column   Non-Null Count    Dtype  
---  ------   --------------    -----  
 0   FIPS_NO  1190126 non-null  int32  
 1   YEAR     1190126 non-null  int32  
 2   FM       1190052 non-null  object 
 3   COUNTY   1190126 non-null  object 
 4   AREA     1190126 non-null  object 
 5   PH       1187607 non-null  float64
 6   BUPH     1056246 non-null  float64
 7   P        1187473 non-null  float64
 8   K        1187494 non-null  float64
 9   CA       969266 non-null   object 
 10  MG       969725 non-null   object 
 11  ZN       967041 non-null   object 
 12  ACRES    525128 non-null   float64
 13  CROP     1183431 non-null  object 
dtypes: float64(5), int32(2), object(7)
memory usage: 118.0+ MB

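The two-step cast is needed because the file stores these codes as text with a `.0` suffix. Going from float64 (8 bytes) to int32 (4 bytes) saves 4 bytes × 2 columns × 1,190,126 rows = 9,521,008 bytes, about 9.1 MB. That matches the drop from 127.1+ MB to 118.0+ MB in the `info()` output. The sketch below reproduces the cast on a few sample values. Both real columns are fully non-null, so plain `int32` works. The final lines show a hypothetical column with a gap, which would need pandas' nullable `Int32` type instead.

```python
import pandas as pd

codes = pd.Series(["239.0", "239.0", "1.0"])

# Casting the text directly to an integer type fails on the ".0" suffix
try:
    codes.astype("int32")
except (ValueError, TypeError) as err:
    print("direct cast failed:", err)

# Parse as float first, then narrow to int32 (safe only for whole numbers)
as_float = codes.astype("float")
assert (as_float % 1 == 0).all(), "non-integer codes would be truncated"
as_int = as_float.astype("int32")
print(as_int.dtype, as_int.tolist())

# Hypothetical column with a gap: plain int32 cannot hold NaN, nullable "Int32" can
years = pd.Series(["2019.0", None]).astype("float").astype("Int32")
print(years.tolist())
```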

#### Drop the CA, MG, and ZN columns
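The cell that performs this drop is not part of this excerpt. Below is a minimal sketch of one way to do it with `DataFrame.drop`, on a hypothetical two-row toy frame. Its CA, MG, and ZN values are copied from the `tail()` output and are still text, since those columns were not converted in the earlier steps.

```python
import pandas as pd

# Toy frame standing in for `df`
df = pd.DataFrame({
    "FIPS_NO": [239, 239],
    "PH": [5.0, 5.9],
    "CA": ["1489.0", "5247.0"],
    "MG": ["223.0", "268.0"],
    "ZN": ["3.5", "2.1"],
})

# Remove the three columns; the default errors="raise" flags a misspelled name
df = df.drop(columns=["CA", "MG", "ZN"])
print(df.columns.tolist())
```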