# Pandas update column conditionally

* [How to Replace Values in Column Based on Condition in Pandas?](https://www.geeksforgeeks.org/how-to-replace-values-in-column-based-on-condition-in-pandas/)


In [1]:
import numpy as np
import pandas as pd

Update the **supplier code** from the **supplier** wherever the code is NaN.

In [2]:
df = pd.read_json("../data/recovery.json")
df

Unnamed: 0,facility,timeStart,processTime,supplier,suppliedM3,recoveredM3,date,timeEnd,supplierCode
0,Bundaberg,9/1/22 8:16 AM,4:05,Mary,5.09,4.13,NaT,,
1,Newcastle,8:29:00 AM,,,2.00,1.55,2022-09-01,9:07:00 AM,har
2,Newcastle,9:27:00 AM,,,6.80,4.15,2022-09-01,11:28:00 AM,dic
3,Newcastle,11:38:00 AM,,,1.95,1.55,2022-09-01,12:21:00 PM,har
4,Bundaberg,9/1/22 12:34 PM,1:50,Mary Therese,3.78,2.56,NaT,,
...,...,...,...,...,...,...,...,...,...
227,Newcastle,11:40:00 AM,,,3.70,2.35,2022-09-30,12:41:00 PM,tom
228,Newcastle,12:52:00 PM,,,6.35,4.55,2022-09-30,2:36:00 PM,dic
229,Bundaberg,9/30/22 1:48 PM,3:40,Mary Therese,4.53,2.73,NaT,,
230,Newcastle,3:02:00 PM,,,2.00,1.45,2022-09-30,3:42:00 PM,har


In [3]:
df[df['supplierCode'].isnull()]

Unnamed: 0,facility,timeStart,processTime,supplier,suppliedM3,recoveredM3,date,timeEnd,supplierCode
0,Bundaberg,9/1/22 8:16 AM,4:05,Mary,5.09,4.13,NaT,,
4,Bundaberg,9/1/22 12:34 PM,1:50,Mary Therese,3.78,2.56,NaT,,
7,Bundaberg,9/1/22 2:48 PM,1:20,Mary Therese,3.55,2.59,NaT,,
9,Bundaberg,9/2/22 8:27 AM,1:25,Mary,5.02,4.13,NaT,,
10,Bundaberg,9/2/22 10:17 AM,4:00,Mary,4.95,3.58,NaT,,
...,...,...,...,...,...,...,...,...,...
218,Bundaberg,9/29/22 10:53 AM,2:00,Mary Anne,2.80,2.31,NaT,,
221,Bundaberg,9/29/22 1:19 PM,3:05,Mary,5.00,3.31,NaT,,
223,Bundaberg,9/30/22 8:21 AM,2:25,Mary,4.94,3.77,NaT,,
226,Bundaberg,9/30/22 11:07 AM,2:10,Mary,5.03,3.47,NaT,,
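
The rows listed above are the ones to update. As a minimal sketch of the task, a boolean mask with `.loc` assigns a value only where `supplierCode` is missing. The `toy` frame below is made-up sample data, not the real `recovery.json`, and the fixed code `"mar"` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the frame above; not the real recovery.json.
toy = pd.DataFrame({
    "supplier": ["Mary", np.nan, "Mary Therese", np.nan],
    "supplierCode": [np.nan, "har", np.nan, "dic"],
})

# True where the code is missing and a supplier name is available.
mask = toy["supplierCode"].isna() & toy["supplier"].notna()

# Assign only on the masked rows; all other rows keep their existing code.
toy.loc[mask, "supplierCode"] = "mar"
print(toy)
```

Rows outside the mask are left untouched, which is what makes this form convenient for a conditional update.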


---
# pandas.DataFrame.apply

* [pandas.DataFrame.apply](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html)
* [How to apply a function to two columns of Pandas dataframe](https://stackoverflow.com/a/13337376/4281353)

In [4]:
def get_code_from_supplier(supplier: str):
    """Map a supplier name to its code (case-insensitive); NaN if unknown."""
    supplier_to_code = {
        "mary therese": "mar",
        "mary": "mar"
    }
    return supplier_to_code.get(supplier.lower(), np.nan)


def f(row):
    # print(f"f(row): type {type(row)} value {row}")
    # print(f"supplier: {row['supplier']} code: {row['supplierCode']}")
    # pd.notna is False for both NaN and None, so rows without a supplier fall through
    if pd.notna(row['supplier']):
        return get_code_from_supplier(row['supplier'])
    else:
        # No supplier name: keep the existing code
        return row['supplierCode']
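
A note on testing for missing values inside a row function: a tuple membership test such as `x in (np.nan, None)` finds NaN only when `x` is the very same object as `np.nan`, because NaN never compares equal to itself. `pd.isna` does not depend on object identity. The sketch below shows the difference; the variable names are for illustration only.

```python
import numpy as np
import pandas as pd

other_nan = float("nan")  # a NaN that is not the np.nan object itself

# Membership checks identity first, then equality; NaN != NaN, so only the
# identical object is found.
same_object_found = np.nan in (np.nan, None)
other_nan_found = other_nan in (np.nan, None)

# pd.isna treats every NaN, and None, as missing.
isna_results = [pd.isna(np.nan), pd.isna(other_nan), pd.isna(None), pd.isna("Mary")]
print(same_object_found, other_nan_found, isna_results)
```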

In [5]:
df['supplierCode'] = df.apply(func=f, axis=1)
df

Unnamed: 0,facility,timeStart,processTime,supplier,suppliedM3,recoveredM3,date,timeEnd,supplierCode
0,Bundaberg,9/1/22 8:16 AM,4:05,Mary,5.09,4.13,NaT,,mar
1,Newcastle,8:29:00 AM,,,2.00,1.55,2022-09-01,9:07:00 AM,har
2,Newcastle,9:27:00 AM,,,6.80,4.15,2022-09-01,11:28:00 AM,dic
3,Newcastle,11:38:00 AM,,,1.95,1.55,2022-09-01,12:21:00 PM,har
4,Bundaberg,9/1/22 12:34 PM,1:50,Mary Therese,3.78,2.56,NaT,,mar
...,...,...,...,...,...,...,...,...,...
227,Newcastle,11:40:00 AM,,,3.70,2.35,2022-09-30,12:41:00 PM,tom
228,Newcastle,12:52:00 PM,,,6.35,4.55,2022-09-30,2:36:00 PM,dic
229,Bundaberg,9/30/22 1:48 PM,3:40,Mary Therese,4.53,2.73,NaT,,mar
230,Newcastle,3:02:00 PM,,,2.00,1.45,2022-09-30,3:42:00 PM,har


---
# np.where()

`np.where` follows the same logic as SQL `CASE WHEN`.

* [numpy.where(condition, [x, y, ])](https://numpy.org/doc/stable/reference/generated/numpy.where.html)
* [Conditionally fill column values based on another columns value in pandas](https://stackoverflow.com/a/10726275/4281353)

## Note

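A per-row function cannot be called inside `np.where`: its arguments are evaluated once on the whole Series, so any function used there receives the entire Series rather than a single row's value.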
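
To illustrate the note: the arguments of `np.where` are whole arrays that are built before `np.where` runs, so a per-value lookup has to be applied separately first, for example with `Series.map`. The sketch below uses a made-up `toy` frame and a hypothetical `supplier_to_code` mapping.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; not the real recovery.json.
toy = pd.DataFrame({
    "supplier": ["Mary", None, "Mary Therese", "Mary Anne"],
    "supplierCode": [None, "har", None, None],
})
supplier_to_code = {"mary": "mar", "mary therese": "mar"}

# Apply the lookup per value first; unknown or missing suppliers become NaN.
mapped = toy["supplier"].str.lower().map(supplier_to_code)

# np.where then picks element-wise between two already computed arrays.
toy["supplierCode"] = np.where(mapped.notna(), mapped, toy["supplierCode"])
print(toy)
```

The unknown supplier "Mary Anne" has no entry in the mapping, so its row keeps its original (missing) code.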

In [6]:
df = pd.read_json("../data/recovery.json")

In [7]:
df["supplierCode"] = np.where(
    df["supplier"] == "Mary",               # CASE WHEN supplier is Mary:
    "mar",                                  #   return supplierCode mar
    np.where(                               # ELSE
        df['supplier'] == "Mary Therese",   #    WHEN supplier is Mary Therese:
        "mar",                              #        return supplierCode mar
        df['supplierCode']                  #    ELSE: return supplierCode as is
    )
)

In [8]:
df

Unnamed: 0,facility,timeStart,processTime,supplier,suppliedM3,recoveredM3,date,timeEnd,supplierCode
0,Bundaberg,9/1/22 8:16 AM,4:05,Mary,5.09,4.13,NaT,,mar
1,Newcastle,8:29:00 AM,,,2.00,1.55,2022-09-01,9:07:00 AM,har
2,Newcastle,9:27:00 AM,,,6.80,4.15,2022-09-01,11:28:00 AM,dic
3,Newcastle,11:38:00 AM,,,1.95,1.55,2022-09-01,12:21:00 PM,har
4,Bundaberg,9/1/22 12:34 PM,1:50,Mary Therese,3.78,2.56,NaT,,mar
...,...,...,...,...,...,...,...,...,...
227,Newcastle,11:40:00 AM,,,3.70,2.35,2022-09-30,12:41:00 PM,tom
228,Newcastle,12:52:00 PM,,,6.35,4.55,2022-09-30,2:36:00 PM,dic
229,Bundaberg,9/30/22 1:48 PM,3:40,Mary Therese,4.53,2.73,NaT,,mar
230,Newcastle,3:02:00 PM,,,2.00,1.45,2022-09-30,3:42:00 PM,har
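
When the number of `WHEN` branches grows, nested `np.where` calls become hard to read. One possible alternative, not used in the notebook above, is `np.select`, which takes a list of conditions and a matching list of choices. The sketch below runs on made-up data; the `toy` frame and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; not the real recovery.json.
toy = pd.DataFrame({
    "supplier": ["Mary", None, "Mary Therese", None],
    "supplierCode": [None, "har", None, "dic"],
})

conditions = [
    toy["supplier"] == "Mary",          # WHEN supplier is Mary
    toy["supplier"] == "Mary Therese",  # WHEN supplier is Mary Therese
]
choices = ["mar", "mar"]                # one result per condition, in order

# The first matching condition wins; rows matching none take the default,
# which here is the existing supplierCode (the ELSE branch).
toy["supplierCode"] = np.select(conditions, choices, default=toy["supplierCode"])
print(toy)
```

Each condition lines up with one choice, so adding a branch means adding one entry to each list rather than another level of nesting.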
