# *Field* Operations

- [**Getting Rid of Non-Numeric Field Junk: Dollar Signs, Commas, etc.**](#Getting-Rid-of-Non-Numeric-Field-Junk)    
   

- [**Changing Data Type**](#Changing-Data-Type)  
    - string to int  
    - string to float  
    - string to date  
   
   
- [**Stripping Extra Spaces**](#Stripping-Extra-Spaces)  


- [**Splitting Columns**](#Splitting-Columns)  


In [None]:
import pandas as pd
import os
import pickle 

# Getting Rid of Non-Numeric Field Junk

In [None]:
# Hardcode a dataframe
df_Shoes = pd.DataFrame(
    [
    [1,'2017-01-01','Boots','7 (US)', '$25,000'],
    [2,'2017-01-01','Boots','4 (US)', '$35,000'],
    [3,'2017-01-01','Boots','6 (US)', '$21,000'],
    ],
    columns=['OrderID','Order Date','Product','Size','Price'])

df_Shoes

In [None]:
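# Get rid of non-numeric junk in the Price field
# regex=False matches '$' and ',' literally ('$' is an end-of-string anchor when read as a regex)
df_Shoes['Price'] = df_Shoes['Price'].str.replace('$', '', regex=False)
df_Shoes['Price'] = df_Shoes['Price'].str.replace(',', '', regex=False)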

df_Shoes['Price'].head()
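
As an alternative to chaining one `replace` call per character, a single regular expression can drop everything that is not a digit or a decimal point. The following is a minimal sketch on hypothetical sample values (the `prices` series is made up for illustration and is not the dataframe above):

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate the pattern
prices = pd.Series(['$25,000', '$35,000.50', 'USD 1,200'])

# [^0-9.] matches every character that is not a digit or a decimal point
cleaned = prices.str.replace(r'[^0-9.]', '', regex=True)

print(cleaned.tolist())
```

Note that this pattern also removes a leading minus sign, so it is only a reasonable choice when the field cannot hold negative amounts.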

In [None]:
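# Get rid of non-numeric junk in the Size field
# regex=False matches the text literally (a lone '(' or ')' is not a valid regex)
df_Shoes['Size'] = df_Shoes['Size'].str.replace('(', '', regex=False)
df_Shoes['Size'] = df_Shoes['Size'].str.replace(')', '', regex=False)
df_Shoes['Size'] = df_Shoes['Size'].str.replace('US', '', regex=False)
# Remove the space left behind between the number and the old '(US)' suffix
df_Shoes['Size'] = df_Shoes['Size'].str.strip()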

df_Shoes['Size'].head()
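
Instead of removing each unwanted piece, it is sometimes simpler to pull out the part you want to keep. This sketch uses `str.extract` on hypothetical size strings (the `sizes` series and the half-size value are assumptions added for illustration):

```python
import pandas as pd

# Hypothetical sample values in the same style as the Size column
sizes = pd.Series(['7 (US)', '4 (US)', '10.5 (US)'])

# Capture the first run of digits, optionally followed by a decimal part.
# expand=False returns a Series because the pattern has a single capture group.
size_numbers = sizes.str.extract(r'(\d+(?:\.\d+)?)', expand=False)

print(size_numbers.tolist())
```

The result is still a column of strings; converting it to a numeric type is a separate step.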

# Changing Data Type

In [None]:
# Check data types
# Note that Price and Size have dtype object, which usually means pandas is storing them as strings
df_Shoes.dtypes

## Changing a string to an int

In [None]:
# Change the data type of Size from string to int
df_Shoes['Size'] = pd.to_numeric(df_Shoes['Size']).astype(int)

df_Shoes.dtypes

## Changing a string to a float

In [None]:
# Change the data type of Price to float
# Convert it to numeric and then specify float 
df_Shoes['Price'] = pd.to_numeric(df_Shoes['Price']).astype(float)
df_Shoes.dtypes
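
The conversions above assume every value is already a clean number. When some values might still be unparseable, `pd.to_numeric` accepts `errors='coerce'`, which turns them into `NaN` instead of raising. A minimal sketch on a hypothetical series (the `'n/a'` entry is made up for illustration):

```python
import pandas as pd

# Hypothetical values: two clean numbers and one entry that cannot be parsed
raw = pd.Series(['25000', '35000', 'n/a'])

# errors='coerce' replaces unparseable entries with NaN instead of raising
numbers = pd.to_numeric(raw, errors='coerce')

# A plain int column cannot hold NaN; the nullable 'Int64' dtype can
as_int = numbers.astype('Int64')

print(numbers.isna().tolist())
print(as_int.dtype)
```

Checking `isna()` afterwards is a quick way to find the rows that still need cleaning.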

## Changing a string to a date

In [None]:
# Change Order Date from string to date
df_Shoes['Order Date'] = pd.to_datetime(df_Shoes['Order Date'])

df_Shoes.dtypes

In [None]:
df_Shoes.head()
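
Dates written as `2017-01-01` are unambiguous, but a value such as `06-05-2010` could be read as June 5 or as May 6. Passing an explicit `format` to `pd.to_datetime` removes the guesswork. This sketch assumes the values are month-first; that ordering is an assumption for illustration, not something the data confirms:

```python
import pandas as pd

# Hypothetical date strings, assumed here to be month-day-year
dates = pd.Series(['06-05-2010', '05-01-2015', '03-01-2017'])

# %m-%d-%Y tells pandas exactly how to read each part of the string
parsed = pd.to_datetime(dates, format='%m-%d-%Y')

print(parsed.dt.month.tolist())
print(parsed.dt.day.tolist())
```

If the source were day-first instead, the format string would be `'%d-%m-%Y'`.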

# Stripping Extra Spaces

In [None]:
# Hardcode a dataframe 
df_employees = pd.DataFrame(
    [
    [1699,'  Robinett, David  ','  david22@adventure-works.com  ', '(827) 525-0100', '06-05-2010', '$80,950'], 
    [1700,'  Robinson, Rebecca  ','  rebecca5 @adventure-works.com  ', '(829) 525-0101', '05-01-2015', '$70,950'],
    [1701,'     Robinson, Dorothy    ','  dorothy3@adventure-works.com    ', '(828) 555-0102', '03-01-2017', '$50,00'],
    ], 
    columns=['BusinessEntityID', 'EmployeeName','EmailAddress', 'PhoneNumber', 'StartDate', 'CurrentSalary'])
  
df_employees

In [None]:
# Show the 'before'
df_employees['EmployeeName']

In [None]:
# Strip extra spaces from the EmployeeName column
df_employees['EmployeeName'] = df_employees['EmployeeName'].str.strip()

In [None]:
# Show the 'after'
df_employees['EmployeeName']
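
When several text columns carry stray whitespace, stripping them one at a time gets repetitive. The sketch below loops over every string column of a small hypothetical dataframe (the `df` here is a cut-down stand-in, not the full `df_employees`):

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical cut-down dataframe with padded text columns and one numeric column
df = pd.DataFrame({
    'BusinessEntityID': [1699, 1700],
    'EmployeeName': ['  Robinett, David  ', '  Robinson, Rebecca  '],
    'EmailAddress': ['  david22@adventure-works.com  ', '  rebecca5 @adventure-works.com  '],
})

# Strip leading and trailing whitespace from every column that holds strings
for col in df.columns:
    if is_string_dtype(df[col]):
        df[col] = df[col].str.strip()

print(df['EmployeeName'].tolist())
```

`str.strip()` only touches the ends of each value, so the space inside `rebecca5 @adventure-works.com` is still there afterwards and would need a separate `replace`.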

# Splitting Columns  
The steps involved are:  
1. Split the column. The two new columns will be in a new temporary dataframe.  
2. Give names to the two new columns.  
3. Create a new dataframe by concatenating the temporary dataframe to the existing dataframe **horizontally** (along the column axis).  
  1. This means that instead of adding it 'below' the existing dataframe, we add it to the right side of the existing dataframe.  
  2. The number of rows should not change, but the new dataframe will have two more columns than the existing one.
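
To make the two concatenation directions concrete, this sketch compares the resulting shapes on two tiny hypothetical dataframes (`left` and `right` are made up for illustration):

```python
import pandas as pd

# Two hypothetical dataframes with the same number of rows
left = pd.DataFrame({'A': [1, 2, 3]})
right = pd.DataFrame({'B': [4, 5, 6], 'C': [7, 8, 9]})

# axis='columns' places the frames side by side: same rows, more columns
side_by_side = pd.concat([left, right], axis='columns')

# axis='index' stacks one frame below the other: more rows, gaps filled with NaN
stacked = pd.concat([left, right], axis='index')

print(side_by_side.shape)  # (3, 3)
print(stacked.shape)       # (6, 3)
```

Side-by-side concatenation aligns rows on the index, so it works cleanly here because both frames share the default index 0, 1, 2.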

In [None]:
# Display the column names
df_employees.columns

### 1. Split the EmployeeName column into a temp dataframe

In [None]:
# Note: n=1 splits only on the first occurrence of the comma
df_temp = df_employees['EmployeeName'].str.split(',', n=1, expand=True) 
df_temp.head()

### 2. Give names to the two columns in the temp dataframe

In [None]:
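df_temp.columns = ['LastName', 'FirstName']
# The split keeps the space that followed the comma, so strip it from FirstName
df_temp['FirstName'] = df_temp['FirstName'].str.strip()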
df_temp.head()

### 3. Concatenate the new dataframe with the existing dataframe

In [None]:
df_employees2 = pd.concat([df_employees, df_temp], axis='columns')
df_employees2.head()
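
The three steps can also be collapsed by assigning the split result straight onto new columns, which avoids the temporary dataframe and the `concat`. A minimal sketch under the assumption that every name contains exactly one comma (the `df` below is a hypothetical stand-in for `df_employees`):

```python
import pandas as pd

# Hypothetical stand-in for the employee dataframe, already stripped of outer spaces
df = pd.DataFrame({'EmployeeName': ['Robinett, David', 'Robinson, Rebecca']})

# Split once on the first comma; expand=True returns one column per piece
parts = df['EmployeeName'].str.split(',', n=1, expand=True)

# Assign each piece to a named column, removing the space left after the comma
df['LastName'] = parts[0].str.strip()
df['FirstName'] = parts[1].str.strip()

print(df[['LastName', 'FirstName']])
```

The trade-off is that this modifies the existing dataframe in place, whereas the `concat` approach leaves the original untouched and returns a new one.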