# 🐼 Pandas Part 1

---

### 🏷️ How Labels Work in a DataFrame
Before diving into DataFrames, it helps to understand how pandas uses *labels* to organize and access data.

- **Row labels** 
  - Row labels come from the DataFrame’s index, which defaults to an integer range (0, 1, 2, ...).
  - You can replace the default index with custom row labels (such as strings or dates) to better identify rows.
  
- **Column labels** 
  - Column labels are simply the column names (usually strings).
  - They are set when you define a DataFrame or load data from a CSV or another file.
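
A minimal sketch of the two kinds of labels, assuming a tiny two-row DataFrame built only for illustration (the name `df` and its contents are not part of the notebook's data):

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical, only for this sketch)
df = pd.DataFrame({"country": ["Brazil", "Russia"], "capital": ["Brasilia", "Moscow"]})

row_labels = list(df.index)       # default integer row labels
column_labels = list(df.columns)  # column names
print(row_labels)
print(column_labels)

# Replace the default row labels with custom strings
df.index = ["BR", "RU"]
custom_row_labels = list(df.index)
print(custom_row_labels)
```

`df.index` holds the row labels and `df.columns` holds the column labels; both can be inspected at any time.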

### ✨ Creating DataFrames in pandas

- DataFrames can be built manually from a dictionary or loaded from files such as CSV or Parquet.

#### 🧮 Create DataFrame from Dictionary 
- Use `pd.DataFrame()` and pass a dictionary:
    - Keys = column names
    - Values = column data as a `list` (all lists must have the same length)
- Pandas creates a default row index (0, 1, 2, ...), but you can set custom labels for easier row selection.

- To set your own row labels, assign a Python list of labels to the DataFrame’s `index` attribute. The list needs one label per row.
- Ex: `brics.index = ["BR", "RU", "IN"]`

 ▶️ Run cell below for example

In [24]:
import pandas as pd

# Named `data` rather than `dict` so the built-in dict type is not shadowed
data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    "population": [200.4, 143.5, 1252, 1357, 52.98],
}
brics = pd.DataFrame(data)
# Replace the default integer index with country codes, one per row
brics.index = ["BR", "RU", "IN", "CH", "SA"]
print(brics)

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
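
As an alternative to assigning the `index` attribute afterwards, the row labels can also be passed when the DataFrame is constructed. The sketch below assumes a three-row subset of the data; the names `brics_small` and `mismatch_error` are hypothetical and used only for illustration. It also shows what happens when the number of labels does not match the number of rows.

```python
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India"],
    "capital": ["Brasilia", "Moscow", "New Delhi"],
}

# Pass the row labels at construction time via the `index` argument
brics_small = pd.DataFrame(data, index=["BR", "RU", "IN"])
print(brics_small)

# The number of labels must match the number of rows;
# assigning too few labels raises a ValueError
mismatch_error = None
try:
    brics_small.index = ["BR", "RU"]
except ValueError as err:
    mismatch_error = str(err)
    print("ValueError:", mismatch_error)
```

Both approaches should give the same result; setting the labels at construction simply saves a line.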


### 🗂️ Create DataFrame from CSV
- Use `pd.read_csv("filename.csv")` to load data from a CSV file into a DataFrame.
- By default, pandas does not use any CSV column as row labels; it creates a numeric index (`0, 1, 2, ...`) unless you specify otherwise.

**If the first column contains row labels**: 
- Tell pandas to use the first column as the index by setting `index_col=0`.
- Important: using the row labels as the index makes it easy to access rows, columns, or single elements by label.

**If the first column does NOT contain row labels**: 
- No need to set `index_col`; pandas will create a numeric index automatically.

**Example: Read CSV with index_col = 0** 
- Use `pd.read_csv("brics.csv", index_col=0)` to treat the first column in brics.csv as row labels.

▶️ Run the cell below

In [28]:
import pandas as pd 
df_with_index = pd.read_csv("brics.csv", index_col=0)
print(df_with_index)

     country    capital
code                   
BR    Brazil   Brasilia
RU    Russia     Moscow
IN    India   New Delhi


**Example: Read CSV without `index_col`**

- Use `pd.read_csv("brics.csv")` without specifying the index.  
- In this example, the first column (`code`) already contains row labels.  
- But since we *don’t* tell pandas to use it, pandas assigns a new integer-based index automatically.  
- As a result, the DataFrame is indexed by integers, *not* by the row labels (`code`), so you cannot look rows up by their code with `loc` (discussed further below).

▶️ Run the cell below

In [33]:
df_no_index = pd.read_csv("brics.csv") 
print(df_no_index)

  code country    capital
0   BR  Brazil   Brasilia
1   RU  Russia     Moscow
2   IN  India   New Delhi


### ✅ Summary
- `pd.read_csv()` creates a numeric index by default.
- ✅ If your CSV has no row labels, keep the numeric index and access rows by position (e.g., `iloc`).
- ✅ If your CSV does have row labels, set `index_col=0` so you can access rows by label (e.g., `loc`).
- ❌ If you forget `index_col=0` in the second case, pandas treats the row labels as an ordinary data column and still creates a numeric index. You then cannot pass a row label such as `"BR"` to `loc` to access specific data.
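
The difference can be checked without a file on disk. The sketch below uses an in-memory string as a stand-in for `brics.csv` (its exact contents are an assumption based on the output shown above), and the names `with_index`, `no_index`, and `fixed` are hypothetical. It also shows one possible way to recover when `index_col` was forgotten, using `set_index`.

```python
import io
import pandas as pd

# In-memory stand-in for brics.csv (assumed layout: a 'code' column first)
csv_text = (
    "code,country,capital\n"
    "BR,Brazil,Brasilia\n"
    "RU,Russia,Moscow\n"
    "IN,India,New Delhi\n"
)

with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
no_index = pd.read_csv(io.StringIO(csv_text))

print(with_index.index.tolist())  # row labels taken from the 'code' column
print(no_index.index.tolist())    # default integer index

# If index_col was forgotten, set_index can promote the column afterwards
fixed = no_index.set_index("code")
print(fixed.loc["RU", "capital"])
```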




---


### 🔎 **Accessing Columns & Rows in DataFrames**

This section shows how to access specific columns and rows within a DataFrame.

##### 🔹 Select a specific column 
- Use square brackets with the column name → `brics["country"]`
- Returns a Series containing data from that column.
- A `Series` is the foundational building block of a DataFrame: each column is a Series that shares the DataFrame’s row index.



**Example: Select 'country' column only** 


▶️ Run the cell below

In [31]:
brics["country"]

BR          Brazil
RU          Russia
IN           India
CH           China
SA    South Africa
Name: country, dtype: object
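
To confirm that single brackets return a Series, the type of the result can be inspected. The sketch below assumes a small two-row DataFrame (the name `brics_demo` is hypothetical) and, for contrast, also selects with a list of column names, which returns a DataFrame instead.

```python
import pandas as pd

# Hypothetical two-row DataFrame, only for this sketch
brics_demo = pd.DataFrame(
    {"country": ["Brazil", "Russia"], "capital": ["Brasilia", "Moscow"]},
    index=["BR", "RU"],
)

single = brics_demo["country"]    # column name in brackets -> Series
double = brics_demo[["country"]]  # list of names in brackets -> DataFrame

print(type(single).__name__)
print(type(double).__name__)
```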

##### 🔹 Select specific rows 
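- `iloc[]`: Access rows by integer position. Use it when you know or care about a row's position (e.g., `brics.iloc[0]` for the first row).
- `loc[]`: Access rows by label. Use it to access data through a descriptive row label (e.g., `brics.loc["BR"]`, where the labels match the `code` column in brics.csv).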
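
A minimal sketch of both accessors, assuming a three-row DataFrame with country codes as row labels (the name `brics_demo` and the variables below are hypothetical):

```python
import pandas as pd

# Hypothetical three-row DataFrame with country codes as row labels
brics_demo = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India"],
        "capital": ["Brasilia", "Moscow", "New Delhi"],
    },
    index=["BR", "RU", "IN"],
)

first_by_position = brics_demo.iloc[0]  # first row, whatever its label is
russia_by_label = brics_demo.loc["RU"]  # row whose label is "RU"

print(first_by_position["country"])
print(russia_by_label["capital"])

# A list of labels selects several rows and returns a DataFrame
subset = brics_demo.loc[["BR", "IN"]]
print(subset)
```

Selecting one row returns a Series whose labels are the column names; selecting with a list returns a DataFrame.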
