# Pandas fundamentals

In Jupyter, press `SHIFT+TAB` with the cursor inside a function call to see help on that function.

## Filtering and ordering

Filtering, or conditional selection, evaluates a condition against every record to produce a Series of True/False values. Indexing the DataFrame with that boolean Series keeps only the rows where the condition is True.

Examples:

`df[df['column_name'] <= 100]`

`df.loc[df.column_name <= 100]`
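
To make the two steps visible, here is a minimal sketch that builds the boolean mask first and then applies it. The DataFrame contents are made up for illustration; only the column name `column_name` comes from the examples above.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({"column_name": [50, 100, 150, 200]})

# Step 1: the comparison alone yields a boolean Series (the mask)
mask = df["column_name"] <= 100

# Step 2: indexing with the mask keeps only the rows where it is True
filtered = df[mask]

# df.loc[mask] selects the same rows
filtered_loc = df.loc[mask]

print(mask)
print(filtered)
```

Naming the mask is optional, but it can help when the same condition is reused or when debugging an unexpected result.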


#### Combining conditions using AND / OR logic

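Use the ampersand (`&`) to combine two conditions with AND logic, or the pipe (`|`) to combine them with OR logic. Wrap each condition in parentheses, because `&` and `|` bind more tightly than comparison operators such as `>` and `==`.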

Example:

`df.loc[(df.year>2016) & (df.brand == 'Honda')]`

`df.loc[(df.year>2016) | (df.brand == 'Honda')]`
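
The following sketch runs both expressions on a small, made-up table so the difference between AND and OR is easy to see. The rows are hypothetical; the column names `year` and `brand` follow the examples above.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({
    "brand": ["Honda", "Honda", "Toyota", "Nissan"],
    "year": [2015, 2018, 2019, 2014],
})

# AND: a row must satisfy both conditions
both = df.loc[(df.year > 2016) & (df.brand == "Honda")]

# OR: a row must satisfy at least one condition
either = df.loc[(df.year > 2016) | (df.brand == "Honda")]

print(both)
print(either)
```

With this data, the AND filter keeps only the 2018 Honda, while the OR filter keeps every Honda plus every car newer than 2016.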


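#### Using `isin` to check if the data is in a list of values

`isin` returns True for each row whose value matches any entry in the given list, so it can replace a chain of `==` conditions joined with `|`: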

`df.loc[df.brand.isin(['Honda','Nissan','Toyota'])]`
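
A small sketch of `isin` on made-up data. It also shows the `~` operator, which inverts a boolean Series and is included here as an assumption about what readers often need next (selecting rows that are not in the list).

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({"brand": ["Honda", "Ford", "Toyota", "BMW"]})

wanted = ["Honda", "Nissan", "Toyota"]

# Rows whose brand appears in the list
in_list = df.loc[df.brand.isin(wanted)]

# ~ inverts the mask: rows whose brand does NOT appear in the list
not_in_list = df.loc[~df.brand.isin(wanted)]

print(in_list)
print(not_in_list)
```

Values in the list that never occur in the column (here `'Nissan'`) are simply ignored.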


#### Using `isnull()` and `notnull()` to find empty and non-empty data

`df.loc[df.price.isnull()]`

`df.loc[df.price.notnull()]`
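
As an illustration, the sketch below splits a made-up table into rows with and without a price. The data is hypothetical; a `None` in a numeric column is stored by pandas as `NaN`, which `isnull()` reports as missing.

```python
import pandas as pd

# Hypothetical data, used only for illustration; None becomes NaN
df = pd.DataFrame({
    "brand": ["Honda", "Toyota", "Nissan"],
    "price": [20000.0, None, 18000.0],
})

# Rows where price is missing
missing = df.loc[df.price.isnull()]

# Rows where price is present
present = df.loc[df.price.notnull()]

print(missing)
print(present)
```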


In [None]:
# FILTERING

# Filter by column value
df[df['column_name'] <= 100]

# Filter by checking if column values are in the list
specific_countries = ['Bangladesh', 'Brazil']
df[df['Country'].isin(specific_countries)]

# Filter by checking if column values contain a string fragment
df[df['Country'].str.contains('United')]

# Set the DataFrame index using existing column
df2 = df.set_index('Country')

# Filter by column names. axis=1 means the search runs over column names
df2.filter(items=['Continent','CCA3'], axis=1)

# Filter by column names using 'like'
df2.filter(like='Pop', axis=1) # all columns whose names contain 'Pop'

# Filter by row names (index labels). axis=0 means the search runs over row names
df2.filter(items=['Zimbabwe'], axis=0)

# Filter by row names using 'like'
df2.filter(like='United', axis=0) # all rows whose index labels contain 'United'

# Select a single row by its index label
df2.loc['Zambia']

# Select a single row by its integer position (0-based)
df2.iloc[51]


# SORTING / ORDERING

# Sort by one column, descending
df[df['Rank'] < 10].sort_values(by='Rank', ascending=False)

# Sort by several columns, all ascending
df[df['Rank'] < 10].sort_values(by=['Continent','Country'], ascending=True)

# Sort by several columns with a separate ascending flag for each column
df[df['Rank'] < 10].sort_values(by=['Continent','Country'], ascending=[True,False])
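
The sorting calls above can be tried on a small stand-in table. This is a sketch only: the rows and rank values below are invented for illustration and are not taken from the dataset used in this notebook.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({
    "Country": ["India", "China", "Brazil", "Nigeria"],
    "Continent": ["Asia", "Asia", "South America", "Africa"],
    "Rank": [1, 2, 7, 6],
})

# One column, descending
by_rank_desc = df.sort_values(by="Rank", ascending=False)

# Two columns: Continent ascending, then Country descending within each continent
mixed = df.sort_values(by=["Continent", "Country"], ascending=[True, False])

print(by_rank_desc)
print(mixed)
```

`sort_values` returns a new DataFrame and leaves `df` in its original order, so the result has to be assigned to a variable to be kept.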

## Indexing

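An Index is the object that stores the axis labels of a pandas Series or DataFrame. Row labels live in `df.index`, and label-based lookups such as `loc` use them to find rows.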
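
A minimal sketch of what the index looks like before and after a column is promoted to it. The table is made up for illustration; the column name `Country` matches the examples in this notebook.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({
    "Country": ["Angola", "Brazil"],
    "Population": [35, 215],
})

# Without a custom index, pandas assigns a default RangeIndex (0, 1, ...)
print(df.index)

# set_index returns a new DataFrame whose row labels are the Country values
indexed = df.set_index("Country")
print(indexed.index)

# Rows can now be looked up by label
brazil_population = indexed.loc["Brazil", "Population"]
print(brazil_population)
```

Note that `df` itself is unchanged here: `Country` is still a regular column in `df` and only `indexed` uses it as the index.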

In [None]:
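# Specifying a custom index

# Set the DataFrame index using an existing column.
# set_index returns a new DataFrame, so assign the result to keep the change
df = df.set_index('Country')

# Alternative: modify df in place with inplace=True instead of reassigning it.
# Use only one of the two forms: once 'Country' is the index it is no longer a column,
# so calling set_index('Country') a second time raises a KeyError.
# df.set_index('Country', inplace=True)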

# Alternative way to set custom index while reading file
df = pd.read_csv("world_population.csv", index_col = "Country")

# Load a dataset that has a Date column, use that column as the index, and parse the index values as dates
pd.read_csv("path_to_dataset_file", index_col="Date", parse_dates=True)

# Reset the index. inplace=True means modifying existing dataframe instead of creating a new one
df.reset_index(inplace=True)

# Setting multi-index. In the example below, two columns 'Continent' and 'Country' will be the index
df.set_index(['Continent', 'Country'], inplace=True)

# Sort by index. sort_index returns a new DataFrame; assign the result to keep it
df.sort_index()
df.sort_index(ascending=False) # in descending order
df.sort_index(ascending=[False, True]) # one sort order per index level (multi-index only)

# Accessing elements using loc and iloc in a multi-indexed DataFrame
df.loc[('Africa', 'Angola')] # row with Continent 'Africa' and Country 'Angola'; the tuple avoids ambiguity with a (row, column) lookup
df.iloc[0] # iloc always selects by integer position, regardless of the index labels
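
The multi-index steps above can be tried end to end on a small stand-in table. This is a sketch under assumptions: the rows and population figures are invented for illustration, and the index is sorted first so that label lookups behave predictably.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({
    "Continent": ["Africa", "Africa", "Asia"],
    "Country": ["Zambia", "Angola", "Japan"],
    "Population": [20, 35, 125],
})

# Two-level index (Continent, Country), sorted by both levels
multi = df.set_index(["Continent", "Country"]).sort_index()

# Full key as a tuple: returns one row as a Series
angola = multi.loc[("Africa", "Angola")]

# Partial key (first level only): returns all rows for that continent
africa = multi.loc["Africa"]

# iloc ignores the labels and selects by position in the current row order
first_row = multi.iloc[0]

print(angola)
print(africa)
print(first_row)
```

After sorting, the first row by position is Angola, so `multi.iloc[0]` and `multi.loc[("Africa", "Angola")]` return the same data here. That equality depends on the row order and is not guaranteed in general.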
