# Intro to Pandas

DataFrames provide numerous attributes and methods for data manipulation and analysis, including:

* `shape`: Returns the dimensions of the DataFrame as a `(rows, columns)` tuple.
* `info()`: Prints a summary of the DataFrame, including data types and non-null counts.
* `describe()`: Generates summary statistics for numerical columns.
* `head()`, `tail()`: Return the first or last n rows of the DataFrame (5 by default).
* `mean()`, `sum()`, `min()`, `max()`: Calculate summary statistics for columns.
* `sort_values()`: Sorts the DataFrame by one or more columns.
* `groupby()`: Groups rows by the values in one or more columns for aggregation.
* `fillna()`, `drop()`, `rename()`: Fill in missing values, drop rows or columns, or rename row or column labels.
* `apply()`: Applies a function along an axis of the DataFrame, that is, to each column or each row.
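
A few of the methods above can be tried on a tiny DataFrame. This is a minimal sketch: `df_demo` and its values are made up for illustration and do not come from the files used later in this notebook.

```python
import pandas as pd

# Hypothetical sample data, not from the notebook's files
df_demo = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "points": [10.0, None, 7.0, 3.0],
})

print(df_demo.shape)  # (rows, columns)

# Replace the missing value in "points" with 0 before aggregating
df_filled = df_demo.fillna({"points": 0.0})

# Total points per team
team_totals = df_filled.groupby("team")["points"].sum()

# Sort rows by points, highest first
df_sorted = df_filled.sort_values("points", ascending=False)

# With the default axis=0, apply() passes each column (a Series) to the function
points_range = df_filled[["points"]].apply(lambda col: col.max() - col.min())
```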

In [22]:
import pandas as pd

## Load Data to DataFrame

### CSVs

In [23]:
csv_path = "countries.csv"
df_csv = pd.read_csv(csv_path)

df_csv

,country,year,population
0,Afghanistan,1952,8425333
1,Afghanistan,1957,9240934
2,Afghanistan,1962,10267083
3,Afghanistan,1967,11537966
4,Afghanistan,1972,13079460
...,...,...,...
1699,Zimbabwe,1987,9216418
1700,Zimbabwe,1992,10704340
1701,Zimbabwe,1997,11404948
1702,Zimbabwe,2002,11926563
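
`read_csv` also accepts a file-like object, which is handy for experimenting without a file on disk. The sketch below assumes a small in-memory sample (three rows copied from the outputs in this notebook) rather than the real `countries.csv`; `usecols` is an optional parameter that limits which columns are parsed.

```python
import io

import pandas as pd

# A few rows in the same column layout as countries.csv (sample only)
csv_text = """country,year,population
Afghanistan,1952,8425333
Afghanistan,1957,9240934
Zimbabwe,2007,12311143
"""

# read_csv accepts a path or any file-like object
df_sample = pd.read_csv(io.StringIO(csv_text))

# usecols restricts parsing to the listed columns
df_subset = pd.read_csv(io.StringIO(csv_text), usecols=["country", "population"])
```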


### Excel Spreadsheets

In [24]:
xlsx_path = "products.xlsx"

df_xlsx = pd.read_excel(xlsx_path)

df_xlsx

,product_id,price,merchant_id,brand,name
0,AVphzgbJLJeJML43fA0o,104.99,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
1,AVpgMuGwLJeJML43KY_c,69.00,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
2,AVpe9FXeLJeJML43zHrq,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
3,AVpfVJXu1cnluZ0-iwTT,290.99,1001,DreamWave,DreamWave - Tremor Portable Bluetooth Speaker ...
4,AVphUeKeilAPnD_x3-Be,244.01,1004,Yamaha,NS-SP1800BL 5.1-Channel Home Theater System (B...
...,...,...,...,...,...
1240,AVphFybdLJeJML43Wnza,64.95,1110,JBL,"JBL - 6 x 9"" 3-Way Car Speakers with Polypropy..."
1241,AVpe_qIa1cnluZ0-bjrN,871.06,1002,HP,HP - ProBook 14 Laptop - Intel Core i5 - 4GB M...
1242,AVphibxI1cnluZ0-DpxG,74.95,1238,Magellan,Magellan Roadmate 5322-LM 5 Touchscreen Portab...
1243,AVpgrtW3ilAPnD_xv67M,294.35,1239,Pyle Pro,PMX840BT Bluetooth 8-Channel 800-Watt Powered ...


### Dictionaries

In [25]:
nhl_players = {
    "player_name": ["Gabriel Landeskog", "Philipp Grubauer", "Tyler Seguin", "Mikko Rantanen"],
    "player_team": ["Colorado Avalanche", "Seattle Kraken", "Dallas Stars", "Colorado Avalanche"],
    "player_position": ["Left Wing", "Goaltender", "Center", "Right Wing"],
}

df_nhl_players = pd.DataFrame(nhl_players)

df_nhl_players

,player_name,player_team,player_position
0,Gabriel Landeskog,Colorado Avalanche,Left Wing
1,Philipp Grubauer,Seattle Kraken,Goaltender
2,Tyler Seguin,Dallas Stars,Center
3,Mikko Rantanen,Colorado Avalanche,Right Wing
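
The dictionary above maps each column name to a list of values. A DataFrame can also be built from a list of dictionaries, one per row. A minimal sketch, where `records` and `df_from_records` are names chosen here for illustration:

```python
import pandas as pd

# One dictionary per row; the keys become column names
records = [
    {"player_name": "Gabriel Landeskog", "player_position": "Left Wing"},
    {"player_name": "Philipp Grubauer", "player_position": "Goaltender"},
]

df_from_records = pd.DataFrame(records)
```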


## See First 5 Rows of DataFrame

In [26]:
df_csv.head()

,country,year,population
0,Afghanistan,1952,8425333
1,Afghanistan,1957,9240934
2,Afghanistan,1962,10267083
3,Afghanistan,1967,11537966
4,Afghanistan,1972,13079460


## See Last 5 Rows of DataFrame

In [42]:
df_csv.tail()

,country,year,population
1699,Zimbabwe,1987,9216418
1700,Zimbabwe,1992,10704340
1701,Zimbabwe,1997,11404948
1702,Zimbabwe,2002,11926563
1703,Zimbabwe,2007,12311143
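
Both `head()` and `tail()` take an optional row count. A short sketch on a made-up ten-row DataFrame (`df_numbers` is illustrative only):

```python
import pandas as pd

# Hypothetical DataFrame with the values 0..9 in one column
df_numbers = pd.DataFrame({"n": range(10)})

first_three = df_numbers.head(3)  # rows 0, 1, 2
last_two = df_numbers.tail(2)     # rows 8, 9
```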


## Create New DataFrame from a Single Column

In [27]:
df_country_names = df_csv[["country"]]

df_country_names

,country
0,Afghanistan
1,Afghanistan
2,Afghanistan
3,Afghanistan
4,Afghanistan
...,...
1699,Zimbabwe
1700,Zimbabwe
1701,Zimbabwe
1702,Zimbabwe
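
The double brackets matter here: a list of column names returns a DataFrame, while a single name returns a Series. A small sketch with made-up data (`df_small` is not one of the notebook's DataFrames):

```python
import pandas as pd

# Hypothetical two-row sample
df_small = pd.DataFrame({"country": ["Afghanistan", "Zimbabwe"], "year": [1952, 2007]})

as_frame = df_small[["country"]]  # list of names -> DataFrame
as_series = df_small["country"]   # single name  -> Series

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
```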


### From Multiple Columns

In [28]:
df_product_info_simple = df_xlsx[["name", "price"]]

df_product_info_simple

,name,price
0,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...,104.99
1,Boytone - 2500W 2.1-Ch. Home Theater System - ...,69.00
2,DENAQ - AC Adapter for TOSHIBA SATELLITE,23.99
3,DreamWave - Tremor Portable Bluetooth Speaker ...,290.99
4,NS-SP1800BL 5.1-Channel Home Theater System (B...,244.01
...,...,...
1240,"JBL - 6 x 9"" 3-Way Car Speakers with Polypropy...",64.95
1241,HP - ProBook 14 Laptop - Intel Core i5 - 4GB M...,871.06
1242,Magellan Roadmate 5322-LM 5 Touchscreen Portab...,74.95
1243,PMX840BT Bluetooth 8-Channel 800-Watt Powered ...,294.35


## Accessing DataFrame Cells

### With `iloc`

*locate a cell by __integer position__ (zero-based)*

`df.iloc[x, y]` where:
* x - row position
* y - column position

In [29]:
df_csv.iloc[0, 0]

'Afghanistan'

In [30]:
df_nhl_players.iloc[1, 2]

'Goaltender'

### With `loc`

*locate a cell by __row label & column name__*

`df.loc[x, y]` where:
* x - row label (the value in the index)
* y - column name

In [31]:
df_csv.loc[0, "country"]

'Afghanistan'

In [32]:
df_nhl_players.loc[1, "player_position"]

'Goaltender'
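
With the default index (0, 1, 2, ...), row labels and row positions happen to coincide, so `loc` and `iloc` look interchangeable. The difference shows up once the index holds other values. A minimal sketch with a made-up DataFrame (`df_scores` and its index values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data with a non-default integer index
df_scores = pd.DataFrame(
    {"name": ["a", "b", "c"], "score": [1, 2, 3]},
    index=[10, 20, 30],
)

by_position = df_scores.iloc[0, 1]     # first row, second column
by_label = df_scores.loc[10, "score"]  # row labelled 10, column "score"
```

Here `df_scores.loc[0, "score"]` would raise a `KeyError`, because no row is labelled `0`.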

## Change DataFrame Index

In [33]:
# copy() so that changing the index does not also modify df_nhl_players
df_players_new = df_nhl_players.copy()

df_players_new.index = ["row1", "row2", "row3", "row4"]

df_players_new

,player_name,player_team,player_position
row1,Gabriel Landeskog,Colorado Avalanche,Left Wing
row2,Philipp Grubauer,Seattle Kraken,Goaltender
row3,Tyler Seguin,Dallas Stars,Center
row4,Mikko Rantanen,Colorado Avalanche,Right Wing
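
Plain assignment binds a second name to the same DataFrame rather than creating a new one, so an index change made through either name affects both. The sketch below contrasts that with `copy()`; the variable names are illustrative only.

```python
import pandas as pd

df_original = pd.DataFrame({"player_name": ["Tyler Seguin", "Mikko Rantanen"]})

df_alias = df_original        # same object under a second name
df_copy = df_original.copy()  # independent DataFrame

# Only the copy receives the new index
df_copy.index = ["row1", "row2"]

print(df_alias is df_original)  # True
print(list(df_original.index))  # [0, 1]
print(list(df_copy.index))      # ['row1', 'row2']
```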


### Using `loc` with new index

In [34]:
df_players_new.loc["row3", "player_name"]

'Tyler Seguin'

## Create New DataFrame with Slicing

### Using `iloc`

In [35]:
# 0:2 -- First two rows
# 0:3 -- First three columns

df_products_new = df_xlsx.iloc[0:2, 0:3]

df_products_new

,product_id,price,merchant_id
0,AVphzgbJLJeJML43fA0o,104.99,1001
1,AVpgMuGwLJeJML43KY_c,69.0,1002


### Using `loc`

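__NOTE: `loc` slices by label and includes both endpoints, so `0:2` returns rows 0, 1, and 2. A column slice selects every column between the two names, so the columns you want need to be in consecutive positions.__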

In [36]:
df_products_new2 = df_xlsx.loc[0:2, "merchant_id":"name"]

df_products_new2

,merchant_id,brand,name
0,1001,Sanus,Sanus VLF410B1 10-Inch Super Slim Full-Motion ...
1,1002,Boytone,Boytone - 2500W 2.1-Ch. Home Theater System - ...
2,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
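
The two slicing styles differ in how they treat the end of the range: `iloc` excludes the stop position, while `loc` includes the stop label. A side-by-side sketch on a made-up DataFrame (`df_letters` is hypothetical):

```python
import pandas as pd

# Hypothetical 4x3 DataFrame with the default 0..3 index
df_letters = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [5, 6, 7, 8],
    "c": [9, 10, 11, 12],
})

by_position = df_letters.iloc[0:2, 0:2]  # rows 0-1, columns a-b (stop excluded)
by_label = df_letters.loc[0:2, "a":"b"]  # rows 0-2, columns a-b (stop included)

print(by_position.shape)  # (2, 2)
print(by_label.shape)     # (3, 2)
```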


## Get Unique Column Values

In [37]:
df_csv["country"].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina',
       'Australia', 'Austria', 'Bahrain', 'Bangladesh', 'Belgium',
       'Benin', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon',
       'Canada', 'Central African Republic', 'Chad', 'Chile', 'China',
       'Colombia', 'Comoros', 'Congo, Dem. Rep.', 'Congo, Rep.',
       'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Czech Republic',
       'Denmark', 'Djibouti', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Ethiopia',
       'Finland', 'France', 'Gabon', 'Gambia', 'Germany', 'Ghana',
       'Greece', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Haiti',
       'Honduras', 'Hong Kong, China', 'Hungary', 'Iceland', 'India',
       'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy',
       'Jamaica', 'Japan', 'Jordan', 'Kenya', 'Korea, Dem. Rep.',
       'Korea, Rep.', 'Kuwait', 'Leba
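
Two related Series methods are often useful alongside `unique()`: `nunique()` counts the distinct values and `value_counts()` counts how often each one occurs. A short sketch on a made-up Series (not the notebook's `country` column):

```python
import pandas as pd

# Hypothetical column of repeated country names
countries = pd.Series(["Chad", "Chile", "Chad", "China", "Chad"])

unique_values = countries.unique()  # distinct values, in order of first appearance
n_unique = countries.nunique()      # number of distinct values
counts = countries.value_counts()   # occurrences per value, most frequent first
```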

## Create New DataFrame from Filtering

In [40]:
df_cheap_products = df_xlsx[df_xlsx["price"] < 50]

df_cheap_products

,product_id,price,merchant_id,brand,name
2,AVpe9FXeLJeJML43zHrq,23.99,1001,DENAQ,DENAQ - AC Adapter for TOSHIBA SATELLITE
16,AVpgnnzJilAPnD_xu_7Q,22.99,1001,Samsung,Samsung Universal 3100mAh Portable External Ba...
17,AVpfyGvu1cnluZ0-rMw9,49.00,1005,MEE audio,Air-Fi Runaway AF32 Stereo Bluetooth Wireless ...
19,AVpe8ZRY1cnluZ0-aY4H,21.99,1001,Peerless,Peerless - Round Ceiling Plate for Most Peerle...
20,AVpfLsb-ilAPnD_xWtDE,46.69,1002,Kenwood,Kenwood KFC-1653MRW 6.5 2-way Marine Speakers ...
...,...,...,...,...,...
1224,AVpe4HP5ilAPnD_xP1pH,43.95,1005,Fujifilm,instax mini Rainbow Instant Film (10 Exposures)
1228,AVpfjEjFilAPnD_xdvZk,25.59,1111,Bower,Bower - Electret Condenser Microphone
1234,AV1YFH27vKc47QAVgp0J,41.99,1042,Logitech,Logitech K800 Wireless Illuminated Keyboard ...
1235,AV1YFuFbvKc47QAVgqCk,45.16,1002,LG,LG - Super-Multi 24x External USB 2.0 Double-L...
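
Filters can be combined. Each condition goes in its own parentheses, joined with `&` (and) or `|` (or); `isin()` tests membership in a list of values. A minimal sketch on made-up data (`df_items` and its prices are hypothetical, not rows from `products.xlsx`):

```python
import pandas as pd

# Hypothetical product sample
df_items = pd.DataFrame({
    "brand": ["JBL", "HP", "LG", "JBL"],
    "price": [64.95, 871.06, 45.16, 30.00],
})

# Both conditions must hold; the parentheses are required around each one
cheap_jbl = df_items[(df_items["price"] < 50) & (df_items["brand"] == "JBL")]

# Rows whose brand is any of the listed values
jbl_or_lg = df_items[df_items["brand"].isin(["JBL", "LG"])]
```

Note that the filtered DataFrames keep the original row labels, which is why the index in the output above jumps from 2 to 16.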


## Export DataFrame To CSV

In [41]:
df_cheap_products.to_csv("cheap_products.csv")
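
By default `to_csv()` writes the index as an unnamed first column. Reading such a file back with `read_csv()` then yields an extra column called `Unnamed: 0`. Passing `index=False` leaves the index out. The sketch below uses a made-up DataFrame and omits the path so that `to_csv()` returns the CSV text instead of writing a file:

```python
import io

import pandas as pd

# Hypothetical filtered result that kept its original row labels
df_out = pd.DataFrame({"name": ["x", "y"], "price": [1.5, 2.5]}, index=[2, 16])

with_index = df_out.to_csv()                # no path given -> returns a string
without_index = df_out.to_csv(index=False)  # index column left out

# Round trip: the saved index comes back as a regular column
df_back = pd.read_csv(io.StringIO(with_index))
print(list(df_back.columns))  # ['Unnamed: 0', 'name', 'price']
```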