# Update dataframe

## Updating copy or original

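**DO NOT EVER update through chained dictionary-style indexing** (`df[...][...] = value`).

Use ```.loc``` or ```.iloc``` to select the rows and columns to update in a single indexing operation, so that the assignment is applied to the original dataframe and not to a copy.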

* [Returning a view versus a copy](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy)

In [3]:
import numpy as np
import pandas as pd

In [53]:
# From a dict of lists
df = pd.DataFrame(
    data={
        'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],
        'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],
        'open': [426.23, 427.81, 42.3, 101.65, 35.53]
    },
)
df

| | ticker | date | open |
|---|---|---|---|
| 0 | AAPL | 2015-12-30 | 426.23 |
| 1 | AAPL | 2015-12-31 | 427.81 |
| 2 | MSFT | 2015-12-30 | 42.3 |
| 3 | IBM | 2015-12-30 | 101.65 |
| 4 | YHOO | 2015-12-30 | 35.53 |


# Update copy by dictionary indexing

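Chained dictionary indexing such as ```df[mask]['open'] = 0``` first selects rows with ```df[mask]```, which returns a copy, and then assigns to that copy. The update is therefore not reflected in the original df, and pandas emits a ```SettingWithCopyWarning```.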

In [50]:
df[df['ticker']  == 'AAPL']['open'] = 0
# No update to the df
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[df['ticker']  == 'AAPL']['open'] = 0


| | ticker | date | open |
|---|---|---|---|
| 0 | AAPL | 2015-12-30 | 426.23 |
| 1 | AAPL | 2015-12-31 | 427.81 |
| 2 | MSFT | 2015-12-30 | 42.3 |
| 3 | IBM | 2015-12-30 | 101.65 |
| 4 | YHOO | 2015-12-30 | 35.53 |
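
To see why the original is left untouched, the chained assignment can be split into its two steps. This is only a minimal sketch on a smaller, hypothetical frame (the `subset` name is introduced here for illustration). The explicit `.copy()` is there to make the copy visible and to keep pandas from warning.

```python
import pandas as pd

df = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT'],
    'open': [426.23, 427.81, 42.3],
})

# Step 1: boolean-mask selection with [] builds a new DataFrame.
# The explicit .copy() only documents that this is a separate object.
subset = df[df['ticker'] == 'AAPL'].copy()

# Step 2: the assignment lands on that separate object only.
subset['open'] = 0

print(subset['open'].tolist())  # [0, 0]
print(df['open'].tolist())      # [426.23, 427.81, 42.3] (original unchanged)
```

In the chained form the intermediate object from step 1 is never bound to a name, so the update is discarded as soon as the statement finishes.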


# Update original with ```.loc, .iloc```

* [pandas.DataFrame.loc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html?highlight=loc)

> Allowed inputs are:
> ### Scalar indexing
> * A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).  
> 
> ### Vector indexing (1D array)
> * A list or array of labels, e.g. ['a', 'b', 'c'].
> * A slice object with labels, e.g. 'a':'f'.
> * A boolean array of the same length as the axis being sliced, e.g. [True, False, True].
> * An alignable boolean **Series**. The index of the key will be aligned before masking.
> * An alignable Index. The Index of the returned selection will be the input.
> * A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above)
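
The input kinds listed above can be tried side by side. The following is a rough sketch on a hypothetical frame with string labels `'a'` to `'d'` as its index (not the `df` used elsewhere in this notebook), so that labels cannot be confused with positions.

```python
import pandas as pd

df = pd.DataFrame(
    {'open': [426.23, 427.81, 42.3, 101.65]},
    index=['a', 'b', 'c', 'd'],   # hypothetical labels for illustration
)

single = df.loc['a', 'open']                              # single label -> scalar
by_list = df.loc[['a', 'c'], 'open']                      # list of labels
by_slice = df.loc['a':'c', 'open']                        # label slice, both ends included
by_bool = df.loc[[True, False, True, False], 'open']      # boolean list, same length as index
by_series = df.loc[df['open'] > 100, 'open']              # alignable boolean Series
by_callable = df.loc[lambda d: d['open'] < 100, 'open']   # callable returning a boolean Series

print(single)                     # 426.23
print(by_slice.index.tolist())    # ['a', 'b', 'c']
print(by_series.index.tolist())   # ['a', 'b', 'd']
print(by_callable.index.tolist()) # ['c']
```

Note that the label slice `'a':'c'` includes `'c'`, unlike a positional Python slice.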

## Boolean array for conditional selection

The **conditional** input to ```.loc``` must be either a **scalar** or **1D array-like**.

```df.loc[:, ['ticker']] == 'AAPL'``` returns a DataFrame, which is neither a scalar nor 1D array-like.

In [57]:
print(f"{type(df.loc[:, ['ticker']] == 'AAPL')}")
df.loc[:, ['ticker']] == 'AAPL'

<class 'pandas.core.frame.DataFrame'>


| | ticker |
|---|---|
| 0 | True |
| 1 | True |
| 2 | False |
| 3 | False |
| 4 | False |


Instead, provide ```df.loc[:, 'ticker'] == 'AAPL'```, which is a Series.

In [51]:
print(f"{type(df.loc[:, 'ticker'] == 'AAPL')}\n")
df.loc[:, 'ticker'] == 'AAPL'

<class 'pandas.core.series.Series'>



0     True
1     True
2    False
3    False
4    False
Name: ticker, dtype: bool
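
If a single-column boolean DataFrame is already at hand, one way to get the 1D mask is to squeeze it along the column axis. This is only a sketch on a smaller, hypothetical frame; the names `mask_2d`, `mask_1d` and `squeezed` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT'],
    'open': [426.23, 427.81, 42.3],
})

mask_2d = df.loc[:, ['ticker']] == 'AAPL'   # DataFrame with one column
mask_1d = df.loc[:, 'ticker'] == 'AAPL'     # Series

# squeeze(axis='columns') collapses the single column into a Series.
squeezed = mask_2d.squeeze(axis='columns')

print(mask_2d.shape, mask_1d.shape)   # (3, 1) (3,)
print(squeezed.equals(mask_1d))       # True
```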

Stick to 1D array-like expressions for indexing. **DO NOT use dictionary indexing** and avoid scalar indexing.

In [60]:
df.loc[
    df.loc[:, 'ticker'] == 'AAPL',    # Row    indexing as 1D-array-like
    ['open']                          # Column indexing as 1D-array-like
] = 0
df

| | ticker | date | open |
|---|---|---|---|
| 0 | AAPL | 2015-12-30 | 0.0 |
| 1 | AAPL | 2015-12-31 | 0.0 |
| 2 | MSFT | 2015-12-30 | 42.3 |
| 3 | IBM | 2015-12-30 | 101.65 |
| 4 | YHOO | 2015-12-30 | 35.53 |
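
The same update can presumably be written with ```.iloc```, which takes integer positions instead of labels. The sketch below is one possible way to do it, not the only one: it converts the boolean mask to row positions with `np.flatnonzero` and looks up the column position with `df.columns.get_loc`. The names `rows` and `col` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],
    'open': [426.23, 427.81, 42.3, 101.65, 35.53],
})

# Integer positions of the rows where the condition holds.
rows = np.flatnonzero((df['ticker'] == 'AAPL').to_numpy())

# Integer position of the 'open' column.
col = df.columns.get_loc('open')

# Row and column indexers are both 1D array-like, as with .loc above.
df.iloc[rows, [col]] = 0

print(rows.tolist())         # [0, 1]
print(df['open'].tolist())   # [0.0, 0.0, 42.3, 101.65, 35.53]
```

With the default integer index, labels and positions coincide, so ```.loc``` and ```.iloc``` address the same rows here. That is not the case once the index has been filtered, sorted, or replaced.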
