
# Pandas Methods

Consider the BRICS DataFrame below. It is built from a dictionary: each key becomes a column name and each list becomes that column's values.

In [50]:
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.5, 17.1, 3.3, 9.6, 1.2],
    "population": [211, 144, 1380, 1393, 58]
}

brics = pd.DataFrame(data)
brics.index = ["BR", "RU", "IN", "CH", "SA"]
print(brics)

         country    capital  area  population
BR        Brazil   Brasília   8.5         211
RU        Russia     Moscow  17.1         144
IN         India  New Delhi   3.3        1380
CH         China    Beijing   9.6        1393
SA  South Africa   Pretoria   1.2          58


To inspect a DataFrame, you can use a few common methods and attributes.

Display the first few rows of the DataFrame using `head()`, which returns five rows by default:

In [51]:
print(brics.head())

         country    capital  area  population
BR        Brazil   Brasília   8.5         211
RU        Russia     Moscow  17.1         144
IN         India  New Delhi   3.3        1380
CH         China    Beijing   9.6        1393
SA  South Africa   Pretoria   1.2          58
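
Because `head()` defaults to five rows, the call above prints the entire five-row DataFrame. The sketch below passes an explicit row count and also uses the companion `tail()` method. The DataFrame is rebuilt so the snippet runs on its own, and the names `first_two` and `last_two` are illustrative only.

```python
import pandas as pd

# Same BRICS data as above, rebuilt so the snippet is self-contained.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.5, 17.1, 3.3, 9.6, 1.2],
        "population": [211, 144, 1380, 1393, 58],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

first_two = brics.head(2)  # rows BR and RU
last_two = brics.tail(2)   # rows CH and SA

print(first_two)
print(last_two)
```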


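Summarize column names, data types, and missing values. Note that `info()` prints its report directly and returns `None`, which is why `None` appears at the end of the output when the call is wrapped in `print()`: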

In [52]:
print(brics.info())

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, BR to SA
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   country     5 non-null      object 
 1   capital     5 non-null      object 
 2   area        5 non-null      float64
 3   population  5 non-null      int64  
dtypes: float64(1), int64(1), object(2)
memory usage: 200.0+ bytes
None


Return the number of rows and columns as a tuple:  

In [53]:
print(brics.shape)

(5, 4)
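
Since `shape` is a plain tuple, it can be unpacked into separate variables. A small sketch, with the DataFrame rebuilt so it runs on its own (`n_rows` and `n_cols` are illustrative names):

```python
import pandas as pd

# Same BRICS data as above, rebuilt so the snippet is self-contained.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.5, 17.1, 3.3, 9.6, 1.2],
        "population": [211, 144, 1380, 1393, 58],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Unpack the (rows, columns) tuple.
n_rows, n_cols = brics.shape
print(n_rows, n_cols)

# len() on a DataFrame gives the number of rows.
print(len(brics) == n_rows)
```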


Compute summary statistics for numerical columns:  

In [54]:
print(brics.describe())

            area   population
count   5.000000     5.000000
mean    7.940000   637.200000
std     6.203467   686.176144
min     1.200000    58.000000
25%     3.300000   144.000000
50%     8.500000   211.000000
75%     9.600000  1380.000000
max    17.100000  1393.000000
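
By default `describe()` skips the text columns. The sketch below shows two related options: computing individual statistics on a single column, and passing `include="all"` so that text columns are summarized too. The DataFrame is rebuilt so the snippet runs on its own, and the variable names are illustrative.

```python
import pandas as pd

# Same BRICS data as above, rebuilt so the snippet is self-contained.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.5, 17.1, 3.3, 9.6, 1.2],
        "population": [211, 144, 1380, 1393, 58],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Individual statistics on one column.
area_mean = brics["area"].mean()
area_median = brics["area"].median()
print(area_mean, area_median)

# Several statistics for several columns at once.
extremes = brics[["area", "population"]].agg(["min", "max"])
print(extremes)

# Summarize text columns as well (adds rows such as "unique" and "top").
full_summary = brics.describe(include="all")
print(full_summary)
```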


Retrieve column names and row labels: 

In [55]:
print(brics.columns)
print(brics.index)

Index(['country', 'capital', 'area', 'population'], dtype='object')
Index(['BR', 'RU', 'IN', 'CH', 'SA'], dtype='object')


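Access the data as a 2D NumPy array. Because the columns mix strings and numbers, the resulting array has the generic `object` dtype. The pandas documentation recommends `to_numpy()` over `.values` for new code: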

In [56]:
print(brics.values)

[['Brazil' 'Brasília' 8.5 211]
 ['Russia' 'Moscow' 17.1 144]
 ['India' 'New Delhi' 3.3 1380]
 ['China' 'Beijing' 9.6 1393]
 ['South Africa' 'Pretoria' 1.2 58]]
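
A short sketch of the `to_numpy()` method, which also returns a NumPy array. It shows how the dtype depends on which columns are included. The DataFrame is rebuilt so the snippet runs on its own, and `mixed` and `numeric` are illustrative names.

```python
import pandas as pd

# Same BRICS data as above, rebuilt so the snippet is self-contained.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.5, 17.1, 3.3, 9.6, 1.2],
        "population": [211, 144, 1380, 1393, 58],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# All columns: strings and numbers force a generic object array.
mixed = brics.to_numpy()
print(mixed.dtype)

# Numeric columns only: float and int columns are upcast to a common float64.
numeric = brics[["area", "population"]].to_numpy()
print(numeric.dtype, numeric.shape)
```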


# Sorting 

**Sorting by a single column**  
Sort rows by values in a column. For example, sort by `area`:  

In [57]:
print(brics.sort_values("area"))

         country    capital  area  population
SA  South Africa   Pretoria   1.2          58
IN         India  New Delhi   3.3        1380
BR        Brazil   Brasília   8.5         211
CH         China    Beijing   9.6        1393
RU        Russia     Moscow  17.1         144
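
`sort_values` returns a new, sorted DataFrame and leaves the original untouched, so the result has to be assigned to a name if it is needed later. A minimal sketch, with the DataFrame rebuilt so it runs on its own (`by_area` is an illustrative name):

```python
import pandas as pd

# Same BRICS data as above, rebuilt so the snippet is self-contained.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.5, 17.1, 3.3, 9.6, 1.2],
        "population": [211, 144, 1380, 1393, 58],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Keep the sorted copy under its own name.
by_area = brics.sort_values("area")

print(list(by_area.index))  # sorted order
print(list(brics.index))    # original order is unchanged
```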


**Sorting in descending order**  
Set `ascending=False` to reverse the order:  

In [58]:
print(brics.sort_values("area", ascending=False))

         country    capital  area  population
RU        Russia     Moscow  17.1         144
CH         China    Beijing   9.6        1393
BR        Brazil   Brasília   8.5         211
IN         India  New Delhi   3.3        1380
SA  South Africa   Pretoria   1.2          58


**Sorting by multiple columns**  
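Pass a list of column names to `sort_values`. Rows are ordered by the first column, and each later column only breaks ties in the ones before it. For example, sort by `population` and then `area`: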

In [59]:
print(brics.sort_values(["population", "area"]))

         country    capital  area  population
SA  South Africa   Pretoria   1.2          58
RU        Russia     Moscow  17.1         144
BR        Brazil   Brasília   8.5         211
IN         India  New Delhi   3.3        1380
CH         China    Beijing   9.6        1393


**Sorting in different directions**  
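Pass a list of booleans to `ascending`, one per sort column. Here `population` is sorted in ascending order and `area` in descending order. No two countries share a population, so the second key is never used and the output matches the previous example: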

In [60]:
print(brics.sort_values(["population", "area"], ascending=[True, False]))

         country    capital  area  population
SA  South Africa   Pretoria   1.2          58
RU        Russia     Moscow  17.1         144
BR        Brazil   Brasília   8.5         211
IN         India  New Delhi   3.3        1380
CH         China    Beijing   9.6        1393
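
The BRICS data has no tied populations, so the effect of the second sort key is invisible there. The sketch below uses a small hypothetical DataFrame (the `team`, `wins`, and `points` columns are made up for illustration) in which ties do occur:

```python
import pandas as pd

# Hypothetical data with tied values in the first sort column.
df = pd.DataFrame(
    {
        "team": ["a", "b", "c", "d"],
        "wins": [3, 1, 3, 1],
        "points": [10, 7, 12, 9],
    }
)

# wins ascending; within equal wins, points descending.
sorted_df = df.sort_values(["wins", "points"], ascending=[True, False])
print(sorted_df)
```

Teams `d` and `b` both have one win, so `d` comes first because it has more points. The same rule places `c` ahead of `a`.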


# Subsetting  

**Selecting one column**  
Use square brackets with the column name:  

In [61]:
print(brics["country"])

BR          Brazil
RU          Russia
IN           India
CH           China
SA    South Africa
Name: country, dtype: object


**Selecting multiple columns**  
Use double square brackets: the outer pair indexes the DataFrame and the inner pair is a list of column names:

In [62]:
print(brics[["country", "population"]])

         country  population
BR        Brazil         211
RU        Russia         144
IN         India        1380
CH         China        1393
SA  South Africa          58
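
The number of brackets also determines the type of the result. As a brief sketch (DataFrame rebuilt so the snippet runs on its own, variable names illustrative), single brackets give a Series and double brackets give a DataFrame, even for one column:

```python
import pandas as pd

# Same BRICS data as above, rebuilt so the snippet is self-contained.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.5, 17.1, 3.3, 9.6, 1.2],
        "population": [211, 144, 1380, 1393, 58],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

as_series = brics["country"]    # one-dimensional Series
as_frame = brics[["country"]]   # DataFrame with a single column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```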


**Filtering rows based on a condition**  
Subset rows where `population` is greater than 200:  

In [63]:
print(brics[brics["population"] > 200])

   country    capital  area  population
BR  Brazil   Brasília   8.5         211
IN   India  New Delhi   3.3        1380
CH   China    Beijing   9.6        1393


**Filtering rows based on text data**  
Subset rows where the `country` is `"China"`:  

In [64]:
print(brics[brics["country"] == "China"])

   country  capital  area  population
CH   China  Beijing   9.6        1393


**Filtering rows based on multiple conditions**  
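Combine conditions with the element-wise operators `&` (and), `|` (or), and `~` (not), wrapping each condition in parentheses. In this example every country with a population above 200 also has an area below 10, so the result matches the `population > 200` filter above: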

In [65]:
print(brics[(brics["population"] > 200) & (brics["area"] < 10)])

   country    capital  area  population
BR  Brazil   Brasília   8.5         211
IN   India  New Delhi   3.3        1380
CH   China    Beijing   9.6        1393
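
The other element-wise operators work the same way. This sketch uses `|` for "or" and `~` for "not" on the same data. The DataFrame is rebuilt so the snippet runs on its own, and the thresholds and variable names are chosen only for illustration.

```python
import pandas as pd

# Same BRICS data as above, rebuilt so the snippet is self-contained.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.5, 17.1, 3.3, 9.6, 1.2],
        "population": [211, 144, 1380, 1393, 58],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Either condition may hold: very populous OR very large.
big = brics[(brics["population"] > 1000) | (brics["area"] > 10)]
print(big)

# Negate a condition: population NOT greater than 200.
small_pop = brics[~(brics["population"] > 200)]
print(small_pop)
```

The parentheses matter because `&`, `|`, and `~` bind more tightly than comparison operators such as `>`.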


**Using `.isin()` for multiple values**  
Subset rows where the `country` is either `"Brazil"` or `"Russia"`:  

In [66]:
print(brics[brics["country"].isin(["Brazil", "Russia"])])

   country   capital  area  population
BR  Brazil  Brasília   8.5         211
RU  Russia    Moscow  17.1         144
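
An `.isin()` mask can be inverted with `~` to exclude values, and the same method is available on the row labels. A minimal sketch, with the DataFrame rebuilt so it runs on its own (`others` and `by_label` are illustrative names):

```python
import pandas as pd

# Same BRICS data as above, rebuilt so the snippet is self-contained.
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasília", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.5, 17.1, 3.3, 9.6, 1.2],
        "population": [211, 144, 1380, 1393, 58],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Every row whose country is NOT Brazil or Russia.
others = brics[~brics["country"].isin(["Brazil", "Russia"])]
print(others)

# Filter on the row labels instead of a column.
by_label = brics[brics.index.isin(["BR", "SA"])]
print(by_label)
```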
