# Pandas Methods

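Consider the BRICS DataFrame below. It is built from a dictionary in which each key becomes a column name and each list supplies that column's values. The row labels are then replaced with two-letter codes.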

In [None]:
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.5, 17.1, 3.3, 9.6, 1.2],
    "population": [211, 144, 1380, 1393, 58]
}

brics = pd.DataFrame(data)
brics.index = ["BR", "RU", "IN", "CH", "SA"]
print(brics)

The following methods and attributes are commonly used to inspect a DataFrame.

Display the first few rows (five by default) with `head()`:

In [None]:
print(brics.head())
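
Since the BRICS table has only five rows, `head()` with its default shows all of them. As a minimal sketch, the cell below rebuilds the same table so it runs on its own, then passes an explicit row count to `head()` and to its counterpart `tail()`. The variable names `first_two` and `last_two` are only illustrative.

```python
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.5, 17.1, 3.3, 9.6, 1.2],
    "population": [211, 144, 1380, 1393, 58],
}
brics = pd.DataFrame(data, index=["BR", "RU", "IN", "CH", "SA"])

# head(n) keeps the first n rows, tail(n) keeps the last n rows
first_two = brics.head(2)  # rows BR and RU
last_two = brics.tail(2)   # rows CH and SA

print(first_two)
print(last_two)
```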

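Summarize column names, data types, and non-null counts. `info()` prints its report directly and returns `None`, so call it without `print()` to avoid a stray `None` in the output:

In [None]:
brics.info()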

Return the number of rows and columns as a tuple:  

In [None]:
print(brics.shape)

Compute summary statistics for numerical columns:  

In [None]:
print(brics.describe())

Retrieve column names and row labels: 

In [None]:
print(brics.columns)
print(brics.index)

Access the data as a 2D NumPy array:  

In [None]:
print(brics.values)
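
The pandas documentation recommends `to_numpy()` over `.values` for this conversion. The sketch below, which rebuilds the same table so it is self-contained, shows one practical consequence of converting: a NumPy array holds a single dtype, so a table that mixes text and numbers becomes an `object` array, while a numeric-only selection stays numeric. The names `mixed` and `numeric` are illustrative.

```python
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.5, 17.1, 3.3, 9.6, 1.2],
    "population": [211, 144, 1380, 1393, 58],
}
brics = pd.DataFrame(data, index=["BR", "RU", "IN", "CH", "SA"])

# Text and numeric columns together: NumPy falls back to dtype object
mixed = brics.to_numpy()
print(mixed.dtype)

# Float and integer columns only: the common dtype is float64
numeric = brics[["area", "population"]].to_numpy()
print(numeric.dtype, numeric.shape)
```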

# Sorting 

**Sorting by a single column**  
Sort rows by values in a column. For example, sort by `area`:  

In [None]:
print(brics.sort_values("area"))
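
Note that `sort_values` returns a new, sorted DataFrame and leaves the original untouched. A minimal sketch, rebuilding the same table so it runs independently (`by_area` is an illustrative name):

```python
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.5, 17.1, 3.3, 9.6, 1.2],
    "population": [211, 144, 1380, 1393, 58],
}
brics = pd.DataFrame(data, index=["BR", "RU", "IN", "CH", "SA"])

# The sorted result is a separate object; assign it to keep it
by_area = brics.sort_values("area")

print(list(by_area.index))  # ['SA', 'IN', 'BR', 'CH', 'RU']
print(list(brics.index))    # ['BR', 'RU', 'IN', 'CH', 'SA'] (original order kept)
```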

**Sorting in descending order**  
Set `ascending=False` to reverse the order:  

In [None]:
print(brics.sort_values("area", ascending=False))

**Sorting by multiple columns**  
Pass a list of column names to `sort_values`. For example, sort by `population` and then `area`:  

In [None]:
print(brics.sort_values(["population", "area"]))

**Sorting in different directions**  
Pass a list of booleans to `ascending`, one per column and in the same order as the column list:  

In [None]:
print(brics.sort_values(["population", "area"], ascending=[True, False]))
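
In the BRICS table every `population` value is unique, so the second sort column never gets a chance to break a tie. The sketch below uses a small hypothetical table (`toy`, with made-up `group` and `score` columns that are not part of the BRICS data) where the first column does contain ties, which makes the effect of the second column and of per-column directions visible.

```python
import pandas as pd

# Hypothetical toy data with repeated values in "group"
toy = pd.DataFrame(
    {"group": [2, 1, 2, 1], "score": [10, 40, 5, 30]},
    index=["a", "b", "c", "d"],
)

# Ties in "group" are ordered by "score", both ascending
both_up = toy.sort_values(["group", "score"])
print(list(both_up.index))  # ['d', 'b', 'c', 'a']

# "group" ascending, "score" descending within each group
mixed_dirs = toy.sort_values(["group", "score"], ascending=[True, False])
print(list(mixed_dirs.index))  # ['b', 'd', 'a', 'c']
```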

# Subsetting  

**Selecting one column**  
Use square brackets with the column name:  

In [None]:
print(brics["country"])

**Selecting multiple columns**  
Use double square brackets: the outer pair does the subsetting, and the inner pair is a Python list of column names:  

In [None]:
print(brics[["country", "population"]])
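
The two bracket forms also differ in what they return. As a short sketch (rebuilding the same table so the cell is self-contained), a single column name gives a Series, while a list of names gives a DataFrame, even when the list has only one entry. The names `as_series` and `as_frame` are illustrative.

```python
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.5, 17.1, 3.3, 9.6, 1.2],
    "population": [211, 144, 1380, 1393, 58],
}
brics = pd.DataFrame(data, index=["BR", "RU", "IN", "CH", "SA"])

as_series = brics["country"]   # one-dimensional Series
as_frame = brics[["country"]]  # two-dimensional DataFrame with one column

print(type(as_series).__name__, as_series.shape)  # Series (5,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (5, 1)
```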

**Filtering rows based on a condition**  
Subset rows where `population` is greater than 200:  

In [None]:
print(brics[brics["population"] > 200])

**Filtering rows based on text data**  
Subset rows where the `country` is `"China"`:  

In [None]:
print(brics[brics["country"] == "China"])

**Filtering rows based on multiple conditions**  
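Combine conditions with the element-wise operators `&` (and), `|` (or), and `~` (not), wrapping each condition in parentheses. Python's `and`, `or`, and `not` keywords do not work on a Series:  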

In [None]:
print(brics[(brics["population"] > 200) & (brics["area"] < 10)])
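
As a companion to the `&` example, the sketch below shows an "or" filter with `|`, and what happens if the Python keyword `and` is used by mistake. It rebuilds the same table so it runs on its own. The thresholds and the names `either` and `raised` are chosen only for illustration.

```python
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.5, 17.1, 3.3, 9.6, 1.2],
    "population": [211, 144, 1380, 1393, 58],
}
brics = pd.DataFrame(data, index=["BR", "RU", "IN", "CH", "SA"])

# Keep rows that satisfy at least one of the two conditions
either = brics[(brics["area"] > 9) | (brics["population"] > 1000)]
print(either)

# The keyword `and` asks for a single True/False from a whole Series,
# which pandas refuses with a ValueError
raised = False
try:
    brics[(brics["population"] > 200) and (brics["area"] < 10)]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```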

**Using `.isin()` for multiple values**  
Subset rows where the `country` is either `"Brazil"` or `"Russia"`:  

In [84]:
print(brics[brics["country"].isin(["Brazil", "Russia"])])

   country   capital  area  population
BR  Brazil  Brasília   8.5         211
RU  Russia    Moscow  17.1         144
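
An `.isin()` mask can be inverted with `~` to keep every row whose value is not in the list. A minimal sketch, again rebuilding the table so it stands alone (`others` is an illustrative name):

```python
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.5, 17.1, 3.3, 9.6, 1.2],
    "population": [211, 144, 1380, 1393, 58],
}
brics = pd.DataFrame(data, index=["BR", "RU", "IN", "CH", "SA"])

# ~ flips the boolean mask, so listed countries are excluded
others = brics[~brics["country"].isin(["Brazil", "Russia"])]
print(others)
```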
