### **Reading in Files**

In [1]:
import pandas as pd 

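- Load a file with the matching reader, `pd.read_<format>(r"file_path")`, e.g. `pd.read_csv()`, `pd.read_json()`, `pd.read_excel()`
- The `r` prefix makes the path a Python raw string, so backslashes in Windows paths are kept literally instead of being read as escape sequences (this is a Python feature, not a pandas one)
- Assign the result to a variable (conventionally `df`) to keep the returned DataFrame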
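
A quick illustration of what the `r` prefix changes, using a made-up path rather than one of the files loaded below. Without it, `\n` and `\t` inside the string become a newline and a tab, so the path pandas receives is no longer the one you typed.

```python
# Hypothetical Windows path written two ways
plain = "D:\new_folder\table.csv"  # "\n" becomes a newline, "\t" becomes a tab
raw = r"D:\new_folder\table.csv"   # raw string: every backslash is kept as typed

print(repr(plain))  # shows the hidden \n and \t characters
print(repr(raw))    # shows doubled backslashes, i.e. literal "\" characters
```

Forward slashes (`"D:/new_folder/table.csv"`) are another common way to avoid the problem on Windows.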

In [None]:
# Importing CSV Files:
df = pd.read_csv(r"D:\New Desktop\LEARNINGS\Pandas Source Files\countries of the world.csv")
df

|  | Country | Region |
|---|---|---|
| 0 | Afghanistan | ASIA (EX. NEAR EAST) |
| 1 | Albania | EASTERN EUROPE |
| 2 | Algeria | NORTHERN AFRICA |
| 3 | American Samoa | OCEANIA |
| 4 | Andorra | WESTERN EUROPE |
| ... | ... | ... |
| 222 | West Bank | NEAR EAST |
| 223 | Western Sahara | NORTHERN AFRICA |
| 224 | Yemen | NEAR EAST |
| 225 | Zambia | SUB-SAHARAN AFRICA |


#### **Common Parameters in `pandas.read_csv()` (most widely used)**

| Parameter         | Description |
|-------------------|-------------|
| `sep` or `delimiter` | Define custom separator (e.g., `','`, `'\t'`, `';'`) |
| `header` | Row number to use as header (`0` for first row, `None` if no header) |
| `names` | Provide custom column names |
| `index_col` | Set which column to use as index |
| `usecols` | Load only specific columns (by name or index) |
| `dtype` | Force specific data types (e.g., `{'age': int}`) |
| `skiprows` | Skip the first N rows (int) or specific row numbers (list) |
| `nrows` | Read only the first N rows |
| `na_values` | Specify custom missing value representations (e.g., `['NA', 'null', '-']`) |
| `encoding` | Handle file encoding (e.g., `'utf-8'`, `'ISO-8859-1'`) |
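| `parse_dates` | Parse the listed columns as datetime (e.g., `parse_dates=['date']`) |
| `error_bad_lines` *(deprecated in 1.3, removed in 2.0)* | Skipped malformed lines in older versions; use `on_bad_lines='skip'` (or `'warn'` / `'error'`) instead |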
| `engine` | Set parsing engine (`'c'` is fast, `'python'` is flexible) |
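
Several of these parameters combined in one call. The data is made up and held in memory with `io.StringIO`, which `read_csv()` accepts in place of a file path; the column names (`id`, `name`, `age`, `city`, `joined`) are hypothetical.

```python
from io import StringIO

import pandas as pd

# Made-up semicolon-separated export: a junk first line and "-" for missing values
raw = """exported by some tool
id;name;age;city;joined
1;Ana;34;Oslo;2021-03-01
2;Ben;-;Lima;2022-07-15
3;Cai;29;Pune;2023-01-20
"""

df = pd.read_csv(
    StringIO(raw),                            # file-like object instead of a path
    sep=";",                                  # custom separator
    skiprows=1,                               # skip the junk first line
    usecols=["id", "name", "age", "joined"],  # drop the "city" column
    index_col="id",                           # use "id" as the row index
    na_values=["-"],                          # treat "-" as missing (NaN)
    parse_dates=["joined"],                   # parse as datetime64
    nrows=2,                                  # read only the first 2 data rows
)

print(df)
print(df.dtypes)
```

Because `age` contains a missing value, pandas stores it as `float64` rather than `int64`.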


In [13]:
# Importing Text Files:
df = pd.read_table(r"D:\New Desktop\LEARNINGS\Pandas Source Files\countries of the world.txt")
df

|  | Country | Region |
|---|---|---|
| 0 | Afghanistan | ASIA (EX. NEAR EAST) |
| 1 | Albania | EASTERN EUROPE |
| 2 | Algeria | NORTHERN AFRICA |
| 3 | American Samoa | OCEANIA |
| 4 | Andorra | WESTERN EUROPE |
| ... | ... | ... |
| 222 | West Bank | NEAR EAST |
| 223 | Western Sahara | NORTHERN AFRICA |
| 224 | Yemen | NEAR EAST |
| 225 | Zambia | SUB-SAHARAN AFRICA |
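
`pd.read_table()` is essentially `read_csv()` with the separator defaulting to a tab (`"\t"`), so the `.txt` file above presumably uses tabs between fields (an assumption based on it splitting into columns). A small sketch with in-memory data taken from the rows shown:

```python
from io import StringIO

import pandas as pd

tab_text = "Country\tRegion\nAlbania\tEASTERN EUROPE\nAlgeria\tNORTHERN AFRICA\n"

a = pd.read_table(StringIO(tab_text))          # tab is the default separator
b = pd.read_csv(StringIO(tab_text), sep="\t")  # same result with read_csv

# A comma-separated file read with read_table is NOT split: one column "a,b"
c = pd.read_table(StringIO("a,b\n1,2\n"))
print(c.columns.tolist())
```

If a text file uses another delimiter, pass it explicitly with `sep=`.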


In [14]:
# Importing JSON Files
df = pd.read_json(r"D:\New Desktop\LEARNINGS\Pandas Source Files\json_sample.json")
df

|  | 12 Strong | A Fantastic Woman (Una Mujer Fantástica) | All The Money In The World | Bilal: A New Breed Of Hero | Call Me By Your Name | Darkest Hour | Den Of Thieves | Ferdinand | Fifty Shades Freed | Film Stars Don'T Die In Liverpool | ... | The 15:17 To Paris | The Commuter | The Disaster Artist | The Greatest Showman | The Insult (L'Insulte) | The Post | The Shape Of Water | Three Billboards Outside Ebbing, Missouri | Till The End Of The World | Winchester |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'Genre': 'Action', 'Gross': '$453,173', 'IMDB... | {'popcornscore': 83, 'rating': 'R', 'tomatosco... | {'popcornscore': 71, 'rating': 'R', 'tomatosco... | {'popcornscore': 91, 'rating': 'PG13', 'tomato... | {'popcornscore': 87, 'rating': 'R', 'tomatosco... | {'popcornscore': 84, 'rating': 'PG13', 'tomato... | {'Genre': 'Action', 'Gross': '$491,898', 'IMDB... | {'popcornscore': 49, 'rating': 'PG', 'tomatosc... | {'Genre': 'Drama', 'Gross': 'unknown', 'IMDB M... | {'popcornscore': 69, 'rating': 'R', 'tomatosco... | ... | {'Genre': 'Drama', 'Gross': 'unknown', 'IMDB M... | {'popcornscore': 48, 'rating': 'PG13', 'tomato... | {'popcornscore': 89, 'rating': 'R', 'tomatosco... | {'Genre': 'Biography', 'Gross': '$627,248', 'I... | {'popcornscore': 86, 'rating': 'R', 'tomatosco... | {'Genre': 'Biography', 'Gross': '$463,228', 'I... | {'Genre': 'Adventure', 'Gross': '$448,287', 'I... | {'popcornscore': 87, 'rating': 'R', 'tomatosco... | {'popcornscore': -1, 'rating': 'NR', 'tomatosc... | {'Genre': 'Biography', 'Gross': '$696,786', 'I... |
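
The output above has one row (index `0`) with a film title for each column and a nested object, kept as a Python dict, in every cell. A made-up, trimmed-down sample with the same shape shows how that happens and one way to unpack it into one row per title. The exact layout of `json_sample.json` is an assumption inferred from the output. Wrapping the text in `StringIO` avoids the deprecation warning pandas 2.1+ gives for literal JSON strings.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: an array holding one object keyed by title
raw = (
    '[{"Movie A": {"Genre": "Action", "rating": "R"},'
    ' "Movie B": {"Genre": "Drama", "rating": "PG"}}]'
)

df = pd.read_json(StringIO(raw))  # one row, one column per title, dicts as values
print(df.shape)

# Build a new frame from the row-0 dicts, labelled by title
movies = pd.DataFrame(df.loc[0].tolist(), index=df.columns)
print(movies)
```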


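- When reading an Excel workbook, `pd.read_excel()` loads only the first sheet by default (`sheet_name=0`).
- To import a specific sheet, pass its name or zero-based position, e.g. `sheet_name="Sheet1"`; `sheet_name=None` returns a dict of all sheets keyed by sheet name.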

In [15]:
# Importing Excel Files:
df = pd.read_excel(r"D:\New Desktop\LEARNINGS\Pandas Source Files\world_population_excel_workbook.xlsx")
df


|  | Rank | CCA3 | Country | Capital | Continent | 2022 Population | 2020 Population | 2015 Population | 2010 Population | 2000 Population | 1990 Population | 1980 Population | 1970 Population | Area (km²) | Density (per km²) | Growth Rate | World Population Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36 | AFG | Afghanistan | Kabul | Asia | 41128771 | 38972230 | 33753499 | 28189672 | 19542982 | 10694796 | 12486631 | 10752971 | 652230 | 63.0587 | 1.0257 | 0.52 |
| 1 | 138 | ALB | Albania | Tirana | Europe | 2842321 | 2866849 | 2882481 | 2913399 | 3182021 | 3295066 | 2941651 | 2324731 | 28748 | 98.8702 | 0.9957 | 0.04 |
| 2 | 34 | DZA | Algeria | Algiers | Africa | 44903225 | 43451666 | 39543154 | 35856344 | 30774621 | 25518074 | 18739378 | 13795915 | 2381741 | 18.8531 | 1.0164 | 0.56 |
| 3 | 213 | ASM | American Samoa | Pago Pago | Oceania | 44273 | 46189 | 51368 | 54849 | 58230 | 47818 | 32886 | 27075 | 199 | 222.4774 | 0.9831 | 0.00 |
| 4 | 203 | AND | Andorra | Andorra la Vella | Europe | 79824 | 77700 | 71746 | 71519 | 66097 | 53569 | 35611 | 19860 | 468 | 170.5641 | 1.0100 | 0.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 229 | 226 | WLF | Wallis and Futuna | Mata-Utu | Oceania | 11572 | 11655 | 12182 | 13142 | 14723 | 13454 | 11315 | 9377 | 142 | 81.4930 | 0.9953 | 0.00 |
| 230 | 172 | ESH | Western Sahara | El Aaiún | Africa | 575986 | 556048 | 491824 | 413296 | 270375 | 178529 | 116775 | 76371 | 266000 | 2.1654 | 1.0184 | 0.01 |
| 231 | 46 | YEM | Yemen | Sanaa | Asia | 33696614 | 32284046 | 28516545 | 24743946 | 18628700 | 13375121 | 9204938 | 6843607 | 527968 | 63.8232 | 1.0217 | 0.42 |
| 232 | 63 | ZMB | Zambia | Lusaka | Africa | 20017675 | 18927715 | 16248230 | 13792086 | 9891136 | 7686401 | 5720438 | 4281671 | 752612 | 26.5976 | 1.0280 | 0.25 |


In [21]:
# Importing Excel Sheet from Workbook:
df_sheet = pd.read_excel(r"D:\New Desktop\LEARNINGS\Pandas Source Files\world_population_excel_workbook.xlsx", sheet_name = "Sheet1")
df_sheet

|  | Rank | CCA3 | Country | Capital |
|---|---|---|---|---|
| 0 | 36 | AFG | Afghanistan | Kabul |
| 1 | 138 | ALB | Albania | Tirana |
| 2 | 34 | DZA | Algeria | Algiers |
| 3 | 213 | ASM | American Samoa | Pago Pago |
| 4 | 203 | AND | Andorra | Andorra la Vella |
| ... | ... | ... | ... | ... |
| 229 | 226 | WLF | Wallis and Futuna | Mata-Utu |
| 230 | 172 | ESH | Western Sahara | El Aaiún |
| 231 | 46 | YEM | Yemen | Sanaa |
| 232 | 63 | ZMB | Zambia | Lusaka |


##### **What is `pd.set_option()` in Pandas?**

`pd.set_option()` is used to **configure global settings** in Pandas, such as how data is displayed or how certain operations behave.
##### ***Common Use Cases***:

| Use Case | Code |
|----------|------|
| Show all columns | `pd.set_option('display.max_columns', None)` |
| Show all rows | `pd.set_option('display.max_rows', None)` |
| Set float precision | `pd.set_option('display.precision', 3)` |
| Expand column width | `pd.set_option('display.max_colwidth', None)` |
| Print wide DataFrames without wrapping columns onto extra blocks | `pd.set_option('display.expand_frame_repr', False)` |
| Reset to default | `pd.reset_option('display.max_columns')` |



In [None]:
pd.set_option('display.max_rows', None)   # show every row in subsequent output
pd.reset_option('display.max_rows')       # restore the default row limit
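
For a temporary change, `pd.option_context()` applies settings only inside a `with` block and restores the previous values when the block exits, so no manual `reset_option()` call is needed. A minimal sketch:

```python
import pandas as pd

before = pd.get_option("display.max_rows")

# Options are given as alternating name, value pairs
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    inside = pd.get_option("display.max_rows")  # None: no row limit in here

after = pd.get_option("display.max_rows")  # back to the earlier value
print(before, inside, after)
```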

**Getting Information about a DataFrame**
- Use `data_frame.info()` to print the index range, column names, non-null counts, dtypes and memory usage

In [22]:
df_sheet.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 234 entries, 0 to 233
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Rank     234 non-null    int64 
 1   CCA3     234 non-null    object
 2   Country  234 non-null    object
 3   Capital  234 non-null    object
dtypes: int64(1), object(3)
memory usage: 7.4+ KB


**Finding the Shape of a DataFrame**
- Use `data_frame.shape`, an attribute (no parentheses) that returns a `(rows, columns)` tuple

In [23]:
df_sheet.shape

(234, 4)
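
Since `shape` is a tuple, it can be unpacked directly, and it relates to `len()` and `.size` as below. The tiny frame reuses three rows shown earlier.

```python
import pandas as pd

df = pd.DataFrame({"Rank": [36, 138, 34], "CCA3": ["AFG", "ALB", "DZA"]})

rows, cols = df.shape  # (number of rows, number of columns)
print(rows, cols)      # 3 2
print(len(df))         # number of rows, same as df.shape[0]
print(df.size)         # total cells: rows * cols
```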

**Printing Desired Rows**
- Use `data_frame.head(n)` for the first `n` rows and `data_frame.tail(n)` for the last `n` rows (`n` defaults to 5)

In [None]:
# From top
df_sheet.head(10)

|  | Rank | CCA3 | Country | Capital |
|---|---|---|---|---|
| 0 | 36 | AFG | Afghanistan | Kabul |
| 1 | 138 | ALB | Albania | Tirana |
| 2 | 34 | DZA | Algeria | Algiers |
| 3 | 213 | ASM | American Samoa | Pago Pago |
| 4 | 203 | AND | Andorra | Andorra la Vella |
| 5 | 42 | AGO | Angola | Luanda |
| 6 | 224 | AIA | Anguilla | The Valley |
| 7 | 201 | ATG | Antigua and Barbuda | Saint John’s |
| 8 | 33 | ARG | Argentina | Buenos Aires |
| 9 | 140 | ARM | Armenia | Yerevan |


In [25]:
# From bottom
df_sheet.tail(10)

|  | Rank | CCA3 | Country | Capital |
|---|---|---|---|---|
| 224 | 43 | UZB | Uzbekistan | Tashkent |
| 225 | 181 | VUT | Vanuatu | Port-Vila |
| 226 | 234 | VAT | Vatican City | Vatican City |
| 227 | 51 | VEN | Venezuela | Caracas |
| 228 | 16 | VNM | Vietnam | Hanoi |
| 229 | 226 | WLF | Wallis and Futuna | Mata-Utu |
| 230 | 172 | ESH | Western Sahara | El Aaiún |
| 231 | 46 | YEM | Yemen | Sanaa |
| 232 | 63 | ZMB | Zambia | Lusaka |
| 233 | 74 | ZWE | Zimbabwe | Harare |
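
`head()` and `tail()` also accept a negative `n`, which drops rows from the opposite end instead. A small sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})  # values 0..9

first5 = df.head()           # default n=5: rows 0-4
last3 = df.tail(3)           # rows 7, 8, 9
all_but_last2 = df.head(-2)  # every row except the last 2 (rows 0-7)
all_but_first2 = df.tail(-2) # every row except the first 2 (rows 2-9)

print(last3["n"].tolist())   # [7, 8, 9]
```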


##### **`df.loc` vs `df.iloc`**

Both are used to **access rows and columns**, but they work differently:

##### `df.loc[]` → **Label-based indexing**
- Uses **row/column labels** (names).
- Slicing is **inclusive** of both the start and end labels.
- Supports **conditional filtering**.

##### Examples:
```python
df.loc[2]                      # Row with label 2
df.loc[2:4]                    # Rows with labels 2, 3, 4
df.loc[:, 'Name']              # All rows, 'Name' column
df.loc[df['Age'] > 25]         # Filter rows where Age > 25
df.loc[0:3, ['Name', 'Age']]   # Rows 0–3, columns 'Name' and 'Age'
```
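
The `loc` examples above assume a frame with `Name` and `Age` columns; here is a runnable version on made-up data.

```python
import pandas as pd

# Hypothetical data with the Name/Age columns used above
df = pd.DataFrame(
    {"Name": ["Ana", "Ben", "Cai", "Dev", "Eli"], "Age": [22, 31, 27, 19, 40]}
)

by_label = df.loc[2:4]                  # labels 2, 3 and 4: end label included
older = df.loc[df["Age"] > 25, "Name"]  # boolean mask for rows, label for column
block = df.loc[0:3, ["Name", "Age"]]    # labels 0-3 inclusive, two columns

print(len(by_label))   # 3
print(older.tolist())  # ['Ben', 'Cai', 'Eli']
```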

##### `df.iloc[]` → **Integer position-based indexing**

- Uses **zero-based integer positions** to select rows and columns.
- Slicing is **exclusive** on the end (`[start:end)`), just like regular Python lists.
- Does **not** use labels or names.

##### Examples:
```python
df.iloc[2]                    # Selects the 3rd row (position 2)
df.iloc[2:5]                  # Selects rows at positions 2, 3, 4
df.iloc[:, 0]                 # All rows, first column
df.iloc[1:4, [0, 2]]          # Rows 1 to 3, columns at positions 0 and 2
```
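
The `iloc` examples above, run on a made-up three-column frame so that column position 2 exists. The last line shows the same numbers giving different row counts under `loc` and `iloc`.

```python
import pandas as pd

# Hypothetical data; three columns so position 2 is valid
df = pd.DataFrame(
    {
        "Name": ["Ana", "Ben", "Cai", "Dev", "Eli"],
        "Age": [22, 31, 27, 19, 40],
        "City": ["Oslo", "Lima", "Pune", "Kyiv", "Nice"],
    }
)

by_position = df.iloc[2:4]    # positions 2 and 3 only: end excluded
first_col = df.iloc[:, 0]     # every row of the first column ("Name")
block = df.iloc[1:4, [0, 2]]  # positions 1-3, columns Name and City

# loc includes the end label, iloc excludes the end position
print(len(df.loc[2:4]), len(df.iloc[2:4]))  # 3 2
```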


In [26]:
df_sheet.loc[134]

Rank          217
CCA3          MCO
Country    Monaco
Capital    Monaco
Name: 134, dtype: object

In [28]:
df_sheet.iloc[134]

Rank          217
CCA3          MCO
Country    Monaco
Capital    Monaco
Name: 134, dtype: object

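> Both calls return the same row because `df_sheet` has the default `RangeIndex` (0, 1, 2, …), so label 134 and position 134 point to the same row: Monaco, whose `Rank` value is 217. Once the rows are reordered or the index is changed, `loc` and `iloc` can return different rows.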
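
A small sketch of `loc` and `iloc` diverging, using four rows from `df_sheet`'s output. Sorting by `Rank` moves rows around while each row keeps its original label.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Rank": [36, 138, 34, 213],
        "Country": ["Afghanistan", "Albania", "Algeria", "American Samoa"],
    }
)

by_rank = df.sort_values("Rank")  # labels travel with their rows

print(by_rank.index.tolist())       # [2, 0, 1, 3]
print(by_rank.loc[0, "Country"])    # label 0 -> Afghanistan
print(by_rank.iloc[0]["Country"])   # position 0 -> Algeria (lowest Rank, 34)
```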