In [63]:
print("hello World")

hello World


In [64]:
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
print(df)

   0  1
0  1  2
1  3  4
2  5  6
3  7  8


# Understanding Pandas DataFrame Creation

## Introduction
The Python script above creates a **Pandas DataFrame** from a list of lists and prints it.

## Explanation
- `import pandas as pd`: Imports the Pandas library, which is widely used for data manipulation, under the conventional alias `pd`.
- `pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])`: Creates a DataFrame, a tabular data structure.
- Each inner list represents a row in the DataFrame.
- By default, Pandas labels both the rows and the columns with 0-based integers.

## Logic Flow
- Import the pandas library.
- Create a DataFrame from a nested list (each sub-list becomes a row).
- Print the formatted DataFrame to the console.

## Concepts Covered
- DataFrame: A two-dimensional labeled data structure.
- Indexing: Default row indexing starts at 0.
- Column Naming: If omitted, column names are numbered starting from 0.
- Printing a DataFrame: Outputs a clean, tabular structure.
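
The default labels can be checked directly. A minimal sketch that reuses the same nested list; the variable name `df_default` is only illustrative:

```python
import pandas as pd

df_default = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])

# With no labels supplied, both axes get a 0-based RangeIndex.
print(df_default.index)          # RangeIndex(start=0, stop=4, step=1)
print(df_default.columns)        # RangeIndex(start=0, stop=2, step=1)
print(list(df_default.columns))  # [0, 1]
```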

## Behind-the-Scenes Operations
- Memory Allocation: Pandas allocates memory for the values, the row index, and the column labels.
- Data Conversion: The list of lists is converted into NumPy-backed arrays. Every value here is an integer, so each column gets an integer dtype.
- Index & Column Assignment: If no column names are provided, Pandas auto-assigns numeric column labels.
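
One way to peek at the NumPy side of this is `DataFrame.to_numpy()` together with `dtypes`. This is only a sketch of how to inspect the result, not a description of pandas internals; `df_default` and `values` are illustrative names:

```python
import numpy as np
import pandas as pd

df_default = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])

# to_numpy() exposes the values as a plain NumPy array.
values = df_default.to_numpy()
print(type(values))
print(values.shape)  # (4, 2)

# dtypes reports the type pandas inferred for each column.
print(df_default.dtypes)
```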



In [65]:
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]], columns=['a', 'b', 'c'])
print(df)

   a  b  c
0  1  2  3
1  3  4  5
2  5  6  7
3  7  8  9


# **Pandas DataFrame - Simple Explanation**

## **What is a DataFrame?**
- A **DataFrame** is like a **table**.
- It can hold data loaded from different sources such as **CSV files**, **JSON data**, or **Python dictionaries**.
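
As an example of a non-list source, a dictionary can be passed straight to `pd.DataFrame`. A minimal sketch; the dictionary `data` is made up to mirror the table used in this notebook, with each key becoming a column:

```python
import pandas as pd

# Hypothetical dictionary: keys become column names, lists become column values.
data = {"a": [1, 3, 5, 7], "b": [2, 4, 6, 8], "c": [3, 5, 7, 9]}

df_from_dict = pd.DataFrame(data)
print(df_from_dict)
```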

## **Basic Structure**
- We store a DataFrame in a variable (e.g., `df`).
- `pd` is the conventional **short alias** for the `pandas` library.
- `DataFrame` has **capital D and F** because it's a class in Pandas.

## **Important Rules**
- **The number of column names must match** the number of values in each row.
- Example: If each sublist has **3 values**, the DataFrame should have **3 columns**.
- If they don’t match, pandas raises an error: a **ValueError** in recent versions (older versions raised an **AssertionError**).
- **The number of rows does not have to match the number of columns.** For example, a DataFrame can have 10 rows and 2 columns, but each row must then contain exactly 2 values, one per column.
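
The mismatch rule can be tried out safely inside a `try` block. A minimal sketch; the exact exception class depends on the pandas version, so it catches both `ValueError` and `AssertionError` rather than assuming one:

```python
import pandas as pd

rows = [[1, 2, 3], [3, 4, 5]]  # three values per row

try:
    # Only two column names for three values per row: this should fail.
    pd.DataFrame(rows, columns=["a", "b"])
    error_name = None
except (ValueError, AssertionError) as exc:
    error_name = type(exc).__name__
    print(f"{error_name}: {exc}")
```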

## **Key Takeaway**
To use a DataFrame, follow these steps:
1. **Import Pandas** (`import pandas as pd`).
2. **Create DataFrame** using `pd.DataFrame(...)`.
3. **Ensure correct column-to-value mapping**.



In [66]:
df.head()

   a  b  c
0  1  2  3
1  3  4  5
2  5  6  7
3  7  8  9


In [67]:
df.head(2)

   a  b  c
0  1  2  3
1  3  4  5


In [68]:
df.tail(2)

   a  b  c
2  5  6  7
3  7  8  9


In [69]:
df.index

RangeIndex(start=0, stop=4, step=1)

In [70]:
df = pd.DataFrame([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]], columns=['a', 'b', 'c'], index=['x', 'y', 'z', 'w'])
print(df)

   a  b  c
x  1  2  3
y  3  4  5
z  5  6  7
w  7  8  9


In [71]:
df.index

Index(['x', 'y', 'z', 'w'], dtype='object')

In [72]:
df.info

<bound method DataFrame.info of    a  b  c
x  1  2  3
y  3  4  5
z  5  6  7
w  7  8  9>

In [73]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, x to w
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       4 non-null      int64
 1   b       4 non-null      int64
 2   c       4 non-null      int64
dtypes: int64(3)
memory usage: 128.0+ bytes
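
The two cells above differ only by the parentheses. Without them, `df.info` just references the method, so the notebook shows its bound-method repr; with them, the method runs and prints the summary. A small sketch of the difference, using a shortened copy of the table:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5]], columns=["a", "b", "c"])

# Without parentheses we only get the method object itself.
print(callable(df.info))  # True

# With parentheses the summary is printed, and the call itself returns None.
result = df.info()
print(result)  # None
```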


In [74]:
df.describe()

              a         b         c
count  4.000000  4.000000  4.000000
mean   4.000000  5.000000  6.000000
std    2.581989  2.581989  2.581989
min    1.000000  2.000000  3.000000
25%    2.500000  3.500000  4.500000
50%    4.000000  5.000000  6.000000
75%    5.500000  6.500000  7.500000
max    7.000000  8.000000  9.000000


In [75]:
df.shape

(4, 3)

`df.shape` gives the shape of the DataFrame as a `(rows, columns)` tuple.
Here it has 4 rows and 3 columns.
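
Because `shape` is a plain tuple, it can be unpacked into separate row and column counts. A minimal sketch with the same table; `n_rows` and `n_cols` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]],
    columns=["a", "b", "c"],
    index=["x", "y", "z", "w"],
)

# shape is (number_of_rows, number_of_columns)
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 4 3

# len() on a DataFrame gives the row count only.
print(len(df))  # 4
```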

In [76]:
df.nunique()

a    4
b    4
c    4
dtype: int64

Here `a`, `b`, and `c` are the column names, and each column has 4 unique values.

In [77]:
df['a'].nunique()

4

Here we asked for the number of unique values in column `a` only.
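
To see the values themselves rather than just the count, `unique()` can be used alongside `nunique()`. A minimal sketch; the Series `with_repeats` is a hypothetical example added only to show the count dropping when a value repeats:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]],
    columns=["a", "b", "c"],
    index=["x", "y", "z", "w"],
)

# unique() lists the distinct values; nunique() only counts them.
distinct_a = df["a"].unique()
print(distinct_a)         # [1 3 5 7]
print(df["a"].nunique())  # 4

# Hypothetical data with one repeated value: 4 entries, 3 distinct values.
with_repeats = pd.Series([1, 1, 3, 7])
print(with_repeats.nunique())  # 3
```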

In [78]:
df.size

12

# Understanding `df.size` in Pandas

## 📌 Overview
In **Pandas**, `df.size` returns the total number of elements in a **DataFrame** or **Series**. It is calculated as:
- **DataFrame:** `size = number_of_rows × number_of_columns`
- **Series:** `size = number_of_rows`
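
Both formulas can be verified on the table from this notebook. A minimal sketch that rebuilds the same 4 × 3 DataFrame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]],
    columns=["a", "b", "c"],
    index=["x", "y", "z", "w"],
)

# DataFrame: rows × columns
print(df.size)                      # 12
print(df.shape[0] * df.shape[1])    # 12

# Series (a single column): number of rows
print(df["a"].size)                 # 4
```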

## 🛠️ Behind the Scenes
When you access `df.size`, the Pandas **DataFrame object** derives the result from its `shape` attribute:
```python
df.shape[0] * df.shape[1]  # Equivalent computation for df.size