# Replace, Fill, Error
Copyright (c) Microsoft Corporation. All rights reserved.<br>
Licensed under the MIT License.

You can use the methods in this notebook to change values in your dataset.

* replace - replaces a value with another value. You can also use it to replace null with a value, or a value with null.
* error - replaces a value with an error.
* fill_nulls - fills all nulls in a column with a given value.
* fill_errors - fills all errors in a column with a given value.

## Setup

In [1]:
!pip install azureml-dataprep



In [2]:
import azureml.dataprep as dprep

In [3]:
dataflow = dprep.read_csv(path='https://dpreptestfiles.blob.core.windows.net/testfiles/BostonWeather.csv')
dataflow.head(1)

Unnamed: 0,DATE,REPORTTPYE,HOURLYDRYBULBTEMPF,HOURLYRelativeHumidity,HOURLYWindSpeed
0,1/1/2015 0:54,FM-15,22,50,10


In [4]:
dataflow = dataflow.to_datetime('DATE', ['%m/%d/%Y %H:%M'])
dataflow = dataflow.to_number(['HOURLYDRYBULBTEMPF', 'HOURLYRelativeHumidity', 'HOURLYWindSpeed'])
dataflow.head(1)

Unnamed: 0,DATE,REPORTTPYE,HOURLYDRYBULBTEMPF,HOURLYRelativeHumidity,HOURLYWindSpeed
0,2015-01-01 00:54:00,FM-15,22.0,50.0,10.0
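
To make the format string concrete, here is a small standard-library sketch of what the two conversions do to the values in the first row. It uses `datetime.strptime` and `float` purely as stand-ins; how `to_datetime` and `to_number` parse values internally is not shown in this notebook.

```python
from datetime import datetime

# Raw string values from the first row of the CSV preview above.
raw_date = '1/1/2015 0:54'
raw_numbers = ['22', '50', '10']

# Same format string that is passed to to_datetime in the cell above.
parsed_date = datetime.strptime(raw_date, '%m/%d/%Y %H:%M')

# Stand-in for to_number: turn numeric strings into floats.
parsed_numbers = [float(value) for value in raw_numbers]

print(parsed_date)     # 2015-01-01 00:54:00
print(parsed_numbers)  # [22.0, 50.0, 10.0]
```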


## Replace

### String
In this example, we replace a string value with another string value.

In [5]:
dataflow = dataflow.replace('REPORTTPYE', 'FM-15', 'FM-XX')
h = dataflow.head(10)
h

Unnamed: 0,DATE,REPORTTPYE,HOURLYDRYBULBTEMPF,HOURLYRelativeHumidity,HOURLYWindSpeed
0,2015-01-01 00:54:00,FM-XX,22.0,50.0,10.0
1,2015-01-01 01:00:00,FM-12,22.0,50.0,10.0
2,2015-01-01 01:54:00,FM-XX,22.0,50.0,10.0
3,2015-01-01 02:54:00,FM-XX,22.0,50.0,11.0
4,2015-01-01 03:54:00,FM-XX,24.0,46.0,13.0
5,2015-01-01 04:00:00,FM-12,24.0,46.0,13.0
6,2015-01-01 04:54:00,FM-XX,22.0,52.0,15.0
7,2015-01-01 05:54:00,FM-XX,23.0,48.0,17.0
8,2015-01-01 06:54:00,FM-XX,23.0,50.0,14.0
9,2015-01-01 07:00:00,FM-12,23.0,50.0,14.0


In this example, we use replace to remove a string value from the column by replacing it with null. Note that pandas displays null values as None.

In [6]:
dataflow = dataflow.replace('REPORTTPYE', 'FM-12', None)
h = dataflow.head(10)
h

Unnamed: 0,DATE,REPORTTPYE,HOURLYDRYBULBTEMPF,HOURLYRelativeHumidity,HOURLYWindSpeed
0,2015-01-01 00:54:00,FM-XX,22.0,50.0,10.0
1,2015-01-01 01:00:00,,22.0,50.0,10.0
2,2015-01-01 01:54:00,FM-XX,22.0,50.0,10.0
3,2015-01-01 02:54:00,FM-XX,22.0,50.0,11.0
4,2015-01-01 03:54:00,FM-XX,24.0,46.0,13.0
5,2015-01-01 04:00:00,,24.0,46.0,13.0
6,2015-01-01 04:54:00,FM-XX,22.0,52.0,15.0
7,2015-01-01 05:54:00,FM-XX,23.0,48.0,17.0
8,2015-01-01 06:54:00,FM-XX,23.0,50.0,14.0
9,2015-01-01 07:00:00,,23.0,50.0,14.0
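
The two string replacements above can be modelled in plain Python. This is only a sketch of the behaviour visible in the outputs (a cell changes only when its whole value equals the search value), not the library's implementation; `replace_value` is a hypothetical helper.

```python
def replace_value(values, find, replace_with):
    """Return a copy of values where cells equal to find become replace_with."""
    return [replace_with if value == find else value for value in values]

# REPORTTPYE values for the first 10 rows, as loaded from the CSV.
report_type = ['FM-15', 'FM-12', 'FM-15', 'FM-15', 'FM-15',
               'FM-12', 'FM-15', 'FM-15', 'FM-15', 'FM-12']

step1 = replace_value(report_type, 'FM-15', 'FM-XX')  # string -> string
step2 = replace_value(step1, 'FM-12', None)           # string -> null

print(step2)
```

The input list is left untouched, which mirrors how each cell in this notebook assigns the result of `replace` back to `dataflow`.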


### Numeric
In this example, we replace a numeric value with another numeric value.

In [7]:
dataflow = dataflow.replace('HOURLYRelativeHumidity', 52, 999)
h = dataflow.head(10)
h

Unnamed: 0,DATE,REPORTTPYE,HOURLYDRYBULBTEMPF,HOURLYRelativeHumidity,HOURLYWindSpeed
0,2015-01-01 00:54:00,FM-XX,22.0,50.0,10.0
1,2015-01-01 01:00:00,,22.0,50.0,10.0
2,2015-01-01 01:54:00,FM-XX,22.0,50.0,10.0
3,2015-01-01 02:54:00,FM-XX,22.0,50.0,11.0
4,2015-01-01 03:54:00,FM-XX,24.0,46.0,13.0
5,2015-01-01 04:00:00,,24.0,46.0,13.0
6,2015-01-01 04:54:00,FM-XX,22.0,999.0,15.0
7,2015-01-01 05:54:00,FM-XX,23.0,48.0,17.0
8,2015-01-01 06:54:00,FM-XX,23.0,50.0,14.0
9,2015-01-01 07:00:00,,23.0,50.0,14.0


### Date
In this final example, we use replace to swap an existing date in the data for a new one.

In [8]:
from datetime import datetime, timezone
# Arguments: column name, value to find, replacement value.
dataflow = dataflow.replace('DATE',
                            datetime(2015, 1, 1, 1, 54, tzinfo=timezone.utc),
                            datetime(2018, 7, 4, 0, 0, tzinfo=timezone.utc))
h = dataflow.head(10)
h

Unnamed: 0,DATE,REPORTTPYE,HOURLYDRYBULBTEMPF,HOURLYRelativeHumidity,HOURLYWindSpeed
0,2015-01-01 00:54:00,FM-XX,22.0,50.0,10.0
1,2015-01-01 01:00:00,,22.0,50.0,10.0
2,2018-07-04 00:00:00,FM-XX,22.0,50.0,10.0
3,2015-01-01 02:54:00,FM-XX,22.0,50.0,11.0
4,2015-01-01 03:54:00,FM-XX,24.0,46.0,13.0
5,2015-01-01 04:00:00,,24.0,46.0,13.0
6,2015-01-01 04:54:00,FM-XX,22.0,999.0,15.0
7,2015-01-01 05:54:00,FM-XX,23.0,48.0,17.0
8,2015-01-01 06:54:00,FM-XX,23.0,50.0,14.0
9,2015-01-01 07:00:00,,23.0,50.0,14.0
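
The search and replacement values above are timezone-aware. This notebook does not say whether `replace` requires that, so treat the following as background on Python's own rules rather than a statement about the library: a naive `datetime` and an aware one never compare equal, even when their clock fields match.

```python
from datetime import datetime, timezone

naive = datetime(2015, 1, 1, 1, 54)
aware = datetime(2015, 1, 1, 1, 54, tzinfo=timezone.utc)

# Equality between naive and aware datetimes is always False in Python 3.
print(naive == aware)  # False

# Attaching UTC to the naive value makes the two comparable and equal.
print(naive.replace(tzinfo=timezone.utc) == aware)  # True
```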


## Error

The `error` method lets you create error values. Pass it the value to find, along with the error code to use in any errors it creates.

In [9]:
dataflow = dataflow.error('HOURLYRelativeHumidity', 46, 'Invalid value')
h = dataflow.head(10)
h

Unnamed: 0,DATE,REPORTTPYE,HOURLYDRYBULBTEMPF,HOURLYRelativeHumidity,HOURLYWindSpeed
0,2015-01-01 00:54:00,FM-XX,22.0,50,10.0
1,2015-01-01 01:00:00,,22.0,50,10.0
2,2018-07-04 00:00:00,FM-XX,22.0,50,10.0
3,2015-01-01 02:54:00,FM-XX,22.0,50,11.0
4,2015-01-01 03:54:00,FM-XX,24.0,"azureml.dataprep.native.DataPrepError(""'Invali...",13.0
5,2015-01-01 04:00:00,,24.0,"azureml.dataprep.native.DataPrepError(""'Invali...",13.0
6,2015-01-01 04:54:00,FM-XX,22.0,999,15.0
7,2015-01-01 05:54:00,FM-XX,23.0,48,17.0
8,2015-01-01 06:54:00,FM-XX,23.0,50,14.0
9,2015-01-01 07:00:00,,23.0,50,14.0


## Fill Nulls

Use the `fill_nulls` method to replace all null values in a column with another value. This is similar to pandas' `fillna()` method.

In [10]:
dataflow = dataflow.fill_nulls('REPORTTPYE', 'N/A')
h = dataflow.head(10)
h

Unnamed: 0,DATE,REPORTTPYE,HOURLYDRYBULBTEMPF,HOURLYRelativeHumidity,HOURLYWindSpeed
0,2015-01-01 00:54:00,FM-XX,22.0,50,10.0
1,2015-01-01 01:00:00,N/A,22.0,50,10.0
2,2018-07-04 00:00:00,FM-XX,22.0,50,10.0
3,2015-01-01 02:54:00,FM-XX,22.0,50,11.0
4,2015-01-01 03:54:00,FM-XX,24.0,"azureml.dataprep.native.DataPrepError(""'Invali...",13.0
5,2015-01-01 04:00:00,N/A,24.0,"azureml.dataprep.native.DataPrepError(""'Invali...",13.0
6,2015-01-01 04:54:00,FM-XX,22.0,999,15.0
7,2015-01-01 05:54:00,FM-XX,23.0,48,17.0
8,2015-01-01 06:54:00,FM-XX,23.0,50,14.0
9,2015-01-01 07:00:00,N/A,23.0,50,14.0


## Fill Errors

Use the `fill_errors` method to replace all error values in columns with another value.

In [11]:
dataflow = dataflow.fill_errors('HOURLYRelativeHumidity', -1)
h = dataflow.head(10)
h

Unnamed: 0,DATE,REPORTTPYE,HOURLYDRYBULBTEMPF,HOURLYRelativeHumidity,HOURLYWindSpeed
0,2015-01-01 00:54:00,FM-XX,22.0,50.0,10.0
1,2015-01-01 01:00:00,N/A,22.0,50.0,10.0
2,2018-07-04 00:00:00,FM-XX,22.0,50.0,10.0
3,2015-01-01 02:54:00,FM-XX,22.0,50.0,11.0
4,2015-01-01 03:54:00,FM-XX,24.0,-1.0,13.0
5,2015-01-01 04:00:00,N/A,24.0,-1.0,13.0
6,2015-01-01 04:54:00,FM-XX,22.0,999.0,15.0
7,2015-01-01 05:54:00,FM-XX,23.0,48.0,17.0
8,2015-01-01 06:54:00,FM-XX,23.0,50.0,14.0
9,2015-01-01 07:00:00,N/A,23.0,50.0,14.0
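
The error and fill_errors steps form a round trip: matching values become errors, and errors are later replaced by a default. A plain-Python sketch of that flow on the humidity column follows. `ErrorValue`, `mark_error`, and `fill_error_values` are hypothetical names used for illustration; they are not part of azureml.dataprep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorValue:
    """Hypothetical stand-in for an error cell: an error code plus the original value."""
    error_code: str
    original_value: object


def mark_error(values, find, error_code):
    """Turn every cell equal to find into an ErrorValue."""
    return [ErrorValue(error_code, v) if v == find else v for v in values]


def fill_error_values(values, fill_with):
    """Replace every ErrorValue cell with fill_with."""
    return [fill_with if isinstance(v, ErrorValue) else v for v in values]


# HOURLYRelativeHumidity for the first 10 rows, before the error step.
humidity = [50.0, 50.0, 50.0, 50.0, 46.0, 46.0, 999.0, 48.0, 50.0, 50.0]

with_errors = mark_error(humidity, 46, 'Invalid value')
filled = fill_error_values(with_errors, -1)

print(filled)
```

Keeping the original value inside the error object is a design choice of this sketch; it makes it possible to inspect what triggered the error before filling it.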
