# **Pandas Testing**

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('stock_exchange_data/indexData.csv')

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.Index

In [None]:
df['Index']

In [None]:
df['Index'][0]

## iloc

- Index-based Selection

    Both loc and iloc are row-first, column-second. This is the opposite of plain bracket indexing such as df['Index'][0], which is column-first, row-second.

    This means that it's marginally easier to retrieve rows, and marginally harder to retrieve columns.
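A minimal sketch of the two orderings, assuming a small made-up frame (the `toy` name and its values are illustrative, not taken from the stock data):

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the indexing order.
toy = pd.DataFrame({'Index': ['NYA', 'IXIC', 'HSI'],
                    'Open': [528.69, 100.0, 2568.3]})

# iloc takes the row position first, then the column position.
by_position = toy.iloc[1, 0]   # row 1, column 0

# Plain bracket indexing selects the column first, then the row.
by_column = toy['Index'][1]    # column 'Index', then row label 1

print(by_position, by_column)
```

Both expressions reach the same cell; only the order of the row and column selectors differs.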

In [None]:
df.iloc[0] # retrieve the row at index 0

In [None]:
df.iloc[:, 1] # retrieve all rows (all indexes), at column 1

In [None]:
df.iloc[:3, 1] # retrieve first 3 rows, at column 1

In [None]:
df.iloc[[1, 2, 3, 4], 1] # retrieve the rows at positions 1, 2, 3, and 4, at column 1

In [None]:
df.tail()

In [None]:
df.iloc[-5:] # retrieve the last 5 rows, equivalent to tail()

## loc

- Label-based selection

    The second paradigm for attribute selection is the one followed by the loc operator: label-based selection. In this paradigm, it's the data index value, not its position, which matters.

In [None]:
df.loc[:5, 'Date'] # loc slices include the end label: rows labelled 0 through 5 (6 rows), Date column only

In [None]:
df.loc[-5:, ['Date', 'Open', 'Volume']] # -5 is read as a label, not a position; with the default 0..n-1 index this returns every row, keeping only Date, Open, and Volume
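To see why the label, not the position, is what matters, here is a small sketch. It assumes a hypothetical frame whose index labels are 10, 20, 30, 40 rather than the default 0, 1, 2, 3:

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions.
toy = pd.DataFrame({'Open': [10.0, 11.0, 12.0, 13.0]},
                   index=[10, 20, 30, 40])

first_by_position = toy.iloc[0, 0]    # first row, whatever its label is
row_by_label = toy.loc[20, 'Open']    # the row whose label is 20

# Label slices include both endpoints: labels 20, 30, and 40.
label_slice = toy.loc[20:40, 'Open']

print(first_by_position, row_by_label, len(label_slice))
```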

## Choosing between loc and iloc

    When choosing or transitioning between loc and iloc, there is one "gotcha" worth keeping in mind, which is that the two methods use slightly different indexing schemes.

    iloc uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. So 0:10 will select entries 0,...,9. loc, meanwhile, indexes inclusively. So 0:10 will select entries 0,...,10.

    Why the change? Remember that loc can index any stdlib type: strings, for example. If we have a DataFrame with index values Apples, ..., Potatoes, ..., and we want to select "all the alphabetical fruit choices between Apples and Potatoes", then it's a lot more convenient to index df.loc['Apples':'Potatoes'] than it is to index something like df.loc['Apples':'Potatoet'] (t coming after s in the alphabet).
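A rough sketch of that string-label case, assuming a hypothetical `produce` frame with a sorted string index:

```python
import pandas as pd

# Hypothetical produce table with a sorted string index.
produce = pd.DataFrame(
    {'Count': [3, 5, 2, 7, 4]},
    index=['Apples', 'Bananas', 'Oranges', 'Potatoes', 'Tomatoes'],
)

# Both endpoints are included, so 'Potatoes' is part of the result.
selected = produce.loc['Apples':'Potatoes']

print(list(selected.index))
```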

    This is particularly confusing when the DataFrame index is a simple numerical list, e.g. 0,...,1000. In this case df.iloc[0:1000] will return 1000 entries, while df.loc[0:1000] returns 1001 of them! To get 1000 elements using loc, you will need to go one lower and ask for df.loc[0:999].

    Otherwise, the semantics of using loc are the same as those for iloc.
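The off-by-one difference described above can be checked directly. This sketch assumes a hypothetical `numbers` frame with the default integer index 0 through 1000:

```python
import pandas as pd

# Hypothetical frame with the default integer index 0, 1, ..., 1000.
numbers = pd.DataFrame({'value': range(1001)})

by_position = numbers.iloc[0:1000]   # end excluded: positions 0..999
by_label = numbers.loc[0:1000]       # end included: labels 0..1000
trimmed = numbers.loc[0:999]         # one lower, to match the iloc result

print(len(by_position), len(by_label), len(trimmed))
```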

In [None]:
sf = pd.DataFrame({
    'Stock': ['Apple', 'Google', 'OpenAI', 'Celsius', 'TIKTAK', 'META', 'HNG', 'XKJ'],
    'Price': [100, 320, 482, 883, 432, 399, 732, None],
    'Location': ['USA', 'CAN', 'CHIL', 'UAE', 'KOR', 'SEB', 'CHI', 'JAP'],
    'ETF': ['Y', 'Y', 'N', 'N', 'Y', 'Y', 'Y', 'N'],
    'Valid': [None, None, None, None, None, None, None, None],
})

In [None]:
sf.set_index('Stock', inplace=True)

In [None]:
sf.head()

In [None]:
sf.loc['Google':'TIKTAK', ['Location', 'ETF']]

## Conditional Selection

In [None]:
sf.Price > 400

In [None]:
sf.loc[sf['Price'] > 400, :]

In [None]:
sf.loc[(sf['Price'] > 300) & (sf['ETF'] == 'Y'), ['Location', 'Price', 'ETF']]

In [None]:
sf['Price'] > 399

In [None]:
sf[sf['Price'] > 399]

In [None]:
sf.loc[sf['Location'].isin(['CHI', 'JAP', 'USA'])]

In [None]:
sf.loc[sf.Price.notnull()]
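Two details of the cells above are easy to miss: a missing price compares as False, and each condition combined with `&` needs its own parentheses. A minimal sketch, assuming a cut-down three-row stand-in for `sf` (the `small` name is illustrative):

```python
import pandas as pd

# A cut-down, hypothetical stand-in for sf (three rows, for brevity).
small = pd.DataFrame(
    {'Price': [100, 482, None], 'ETF': ['Y', 'N', 'N']},
    index=['Apple', 'OpenAI', 'XKJ'],
)

# The missing price becomes NaN, and NaN > 400 evaluates to False.
mask = small['Price'] > 400
print(mask.tolist())

# & binds tighter than > and ==, so each condition is parenthesized.
expensive_non_etf = small.loc[(small['Price'] > 400) & (small['ETF'] == 'N')]
print(list(expensive_non_etf.index))
```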

## Assign

In [None]:
sf['Valid'] = 67

In [None]:
print(sf.to_string())

In [None]:
sf.iloc[2:7, 0] # a single column position returns a Series: the Price column for rows 2-6

In [None]:
sf.iloc[2:7, :1] # a column slice returns a DataFrame, even when it covers only one column

In [None]:
sf.loc['OpenAI':'HNG', 'Valid'] = 9999

In [None]:
print(sf.to_string())
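The two assignment styles used above can be sketched on a smaller frame. The `small` name and its four rows are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical four-row stand-in for sf, with only the Valid column.
small = pd.DataFrame(
    {'Valid': [None, None, None, None]},
    index=['Google', 'OpenAI', 'Celsius', 'TIKTAK'],
)

# A scalar assigned to a column is broadcast to every row.
small['Valid'] = 67

# A loc label slice includes both ends, so OpenAI and Celsius are updated.
small.loc['OpenAI':'Celsius', 'Valid'] = 9999

print(small['Valid'].tolist())
```

Rows outside the label slice keep the value from the earlier broadcast assignment.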