# Introduction to DataFrames in Pandas

```python
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
```

## Creating DataFrames:

**DataFrames** can be created in several ways with **pandas**. Two common approaches use **lists** and **dictionaries**:


#### Using List

```python
# using lists
student_data = [
    [100, 80, 10],
    [90, 70, 7],
    [120, 100, 14],
    [80, 50, 2],
]
pd.DataFrame(student_data, columns=['iq', 'marks', 'package'])
```

|   | iq | marks | package |
|---|----|-------|---------|
| 0 | 100 | 80 | 10 |
| 1 | 90 | 70 | 7 |
| 2 | 120 | 100 | 14 |
| 3 | 80 | 50 | 2 |


#### Using Dictionary

```python
# using dictionary
student_dict = {
    'name': ['nitish', 'ankit', 'rupesh', 'rishabh', 'amit', 'ankita'],
    'iq': [100, 90, 120, 80, 0, 0],
    'marks': [80, 70, 100, 50, 0, 0],
    'package': [10, 7, 14, 2, 0, 0],
}
students = pd.DataFrame(student_dict)
students
```

|   | name | iq | marks | package |
|---|------|----|-------|---------|
| 0 | nitish | 100 | 80 | 10 |
| 1 | ankit | 90 | 70 | 7 |
| 2 | rupesh | 120 | 100 | 14 |
| 3 | rishabh | 80 | 50 | 2 |
| 4 | amit | 0 | 0 | 0 |
| 5 | ankita | 0 | 0 | 0 |


#### 📄 Reading Data from CSV in Pandas

You can create **DataFrames** by reading data from **CSV files** using the `pd.read_csv()` function.
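Below is a minimal sketch of `pd.read_csv()`. To run without a file on disk, it reads from an in-memory buffer (`io.StringIO`). In practice you would pass a path such as `'data.csv'`. The CSV contents are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """name,iq,marks,package
nitish,100,80,10
ankit,90,70,7
"""

# read_csv accepts a file path or any file-like object;
# the first line is used as the column header by default
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```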

## Attributes of DataFrames:

DataFrames have several attributes that describe their structure and content, including `shape`, `dtypes`, `index`, `columns`, and `values`.
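This short sketch shows these common attributes on a small hypothetical DataFrame built from part of the student data. They are attributes rather than methods, so you access them without parentheses.

```python
import pandas as pd

# Hypothetical subset of the student data
students = pd.DataFrame({
    'name': ['nitish', 'ankit', 'rupesh'],
    'iq': [100, 90, 120],
    'marks': [80, 70, 100],
})

print(students.shape)    # (rows, columns) tuple
print(students.dtypes)   # data type of each column
print(students.index)    # row labels
print(students.columns)  # column labels
print(students.values)   # underlying data as a NumPy array
```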

## 👁️ Viewing and Exploring Data in Pandas

### 🔹 Viewing Data

Use the following methods to **view data** in a DataFrame:

```python
df.head()      # View the first 5 rows
df.tail()      # View the last 5 rows
df.sample(n=3) # View 3 random rows
```
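Here is a small sketch of these methods on a hypothetical 10-row DataFrame. Both `head()` and `tail()` take a row count `n`, which defaults to 5. Passing `random_state` to `sample()` makes the random draw reproducible.

```python
import pandas as pd

# Hypothetical 10-row DataFrame
df = pd.DataFrame({'x': range(10), 'y': [v * v for v in range(10)]})

first = df.head()                          # rows 0-4 (default n=5)
last_two = df.tail(2)                      # rows 8-9
picked = df.sample(n=3, random_state=42)   # 3 random rows, repeatable via random_state
```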

---

### 🧾 Information About the DataFrame

Use `.info()` to get metadata about the DataFrame:

```python
df.info()
```

This provides:
- Column names and data types
- Non-null counts
- Memory usage
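`info()` prints its summary and returns `None`. The sketch below uses a hypothetical DataFrame with one missing value to show the non-null count. It also shows how to capture the summary as text with the `buf` parameter.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical DataFrame; the NaN makes 'marks' a float64 column
df = pd.DataFrame({
    'name': ['nitish', 'ankit', 'rupesh'],
    'marks': [80, np.nan, 100],
})

df.info()  # prints the summary to stdout

# To keep the summary as a string, write it into a buffer instead
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
```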

---

### ❓ Checking for Missing Data

Use `.isnull()` to check for **NaN values**:

```python
df.isnull()
df.isnull().sum()  # Count missing values per column
```
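This sketch uses a hypothetical DataFrame. Both `np.nan` and `None` count as missing. Chaining a second `.sum()` gives the total across all columns.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one missing value per column
df = pd.DataFrame({
    'iq': [100, np.nan, 120],
    'marks': [80, 70, None],   # None is also treated as missing
})

mask = df.isnull()                        # True where a value is missing
missing_per_column = df.isnull().sum()    # missing count per column
total_missing = int(df.isnull().sum().sum())
```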

---

### 🔁 Detecting Duplicated Rows

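Use `.duplicated()` to find duplicate rows. By default, a row is flagged `True` only if it repeats an earlier row, so the first occurrence is not counted:

```python
df.duplicated()
df.duplicated().sum()  # Count rows that repeat an earlier row
```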
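The sketch below runs on hypothetical data. It shows the default behaviour, the `subset` parameter (compare only some columns), and `keep=False` (flag every copy, including the first).

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0 exactly; row 3 repeats only the name
df = pd.DataFrame({
    'name': ['nitish', 'ankit', 'nitish', 'ankit', 'amit'],
    'iq':   [100, 90, 100, 95, 0],
})

flags = df.duplicated()                       # compares entire rows
n_dupes = int(df.duplicated().sum())
name_dupes = df.duplicated(subset=['name'])   # compare only the 'name' column
all_copies = df.duplicated(keep=False)        # mark every occurrence of a duplicate
```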

---

### 🏷️ Renaming Columns

Use `.rename()` to rename columns:

```python
df.rename(columns={'old_name': 'new_name'}, inplace=True)
```

- `inplace=True` modifies the DataFrame directly and returns `None`.
- Without `inplace` (the default is `False`), the method returns a renamed copy and leaves the original unchanged.
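This small sketch on a hypothetical DataFrame contrasts the two behaviours.

```python
import pandas as pd

# Hypothetical DataFrame
df = pd.DataFrame({'iq': [100, 90], 'marks': [80, 70]})

# Without inplace: returns a renamed copy, df is unchanged
renamed = df.rename(columns={'marks': 'score'})

# With inplace=True: modifies df itself and returns None
result = df.rename(columns={'iq': 'IQ'}, inplace=True)
```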

---


# Pandas DataFrame Operations


## 📝 Notes on the Code for Pandas DataFrame Operations

### 1. Creating DataFrames

The code demonstrates several fundamental methods to create **Pandas DataFrames**:

#### 📌 From Lists
You can create a DataFrame from a list of lists, where each inner list represents a row.

```python
import pandas as pd

data = [[1, 'Alice'], [2, 'Bob']]
df = pd.DataFrame(data, columns=['ID', 'Name'])
```

#### 📌 From Dictionaries
You can create a DataFrame from a dictionary where the keys become the column names.

```python
data = {
    'ID': [1, 2],
    'Name': ['Alice', 'Bob']
}
df = pd.DataFrame(data)
```

#### 📌 Reading from a CSV File
DataFrames can also be created by reading data from a CSV file:

```python
df = pd.read_csv('data.csv')
```

---



```python
# using dicts
student_dict = {
    'name': ['nitish', 'ankit', 'rupesh', 'rishabh', 'amit', 'ankita'],
    'iq': [100, 90, 120, 80, 0, 0],
    'marks': [80, 70, 100, 50, 0, 0],
    'package': [10, 7, 14, 2, 0, 0],
}
students = pd.DataFrame(student_dict)
students.set_index('name', inplace=True)
students
```

| name | iq | marks | package |
|------|----|-------|---------|
| nitish | 100 | 80 | 10 |
| ankit | 90 | 70 | 7 |
| rupesh | 120 | 100 | 14 |
| rishabh | 80 | 50 | 2 |
| amit | 0 | 0 | 0 |
| ankita | 0 | 0 | 0 |
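With `name` set as the index, you can look up rows by student name using `.loc[]`. This self-contained sketch reuses the same data. It calls `set_index` without `inplace`, so the original `students` keeps its `name` column.

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['nitish', 'ankit', 'rupesh', 'rishabh', 'amit', 'ankita'],
    'iq': [100, 90, 120, 80, 0, 0],
    'marks': [80, 70, 100, 50, 0, 0],
    'package': [10, 7, 14, 2, 0, 0],
})

# set_index without inplace returns a new DataFrame indexed by 'name'
by_name = students.set_index('name')

row = by_name.loc['rupesh']             # one row, returned as a Series
marks = by_name.loc['ankit', 'marks']   # a single value
```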


## 📌 Selecting Data from a DataFrame in Pandas

### ✅ Selecting Columns
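The examples below assume a `movies` DataFrame (for example, one loaded with `pd.read_csv()`) that has columns such as `title_x` and `genre`. Select a single column by passing its name in square brackets: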

```python
movies['title_x']  # Selects the 'title_x' column from the 'movies' DataFrame
```

You can also select multiple columns by passing a list of column names:

```python
movies[['title_x', 'genre']]
```
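The two forms return different types. A single name gives a `Series`, and a list of names gives a `DataFrame`. The sketch uses a small hypothetical stand-in for `movies`.

```python
import pandas as pd

# Hypothetical stand-in for the movies DataFrame
movies = pd.DataFrame({
    'title_x': ['Movie A', 'Movie B', 'Movie C'],
    'genre': ['Drama', 'Comedy', 'Action'],
    'year': [2001, 2005, 2010],
})

one = movies['title_x']              # single name -> Series
two = movies[['title_x', 'genre']]   # list of names -> DataFrame
```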

---

### ✅ Selecting Rows

#### Using `iloc` (Integer-based indexing)
Select rows by their **position**:

```python
movies.iloc[0]        # First row
movies.iloc[0:5]      # First 5 rows
```
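`iloc` ignores the index labels entirely, which matters when the labels are not 0, 1, 2, and so on. The sketch below is hypothetical. Its index starts at 100, yet `iloc[0]` still returns the first row.

```python
import pandas as pd

# Hypothetical DataFrame whose index labels start at 100
movies = pd.DataFrame(
    {'title_x': ['A', 'B', 'C', 'D']},
    index=[100, 101, 102, 103],
)

first = movies.iloc[0]         # position 0, whatever its label is
first_two = movies.iloc[0:2]   # positions 0 and 1; stop position 2 is excluded
```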

#### Using `loc` (Label-based indexing)
Select rows by their **index label**. Unlike Python slices and `iloc`, a `loc` slice includes the end label:

```python
movies.loc[100]       # Row with index label 100
movies.loc[100:105]   # Rows with labels 100 through 105 (inclusive)
```
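This sketch on hypothetical data makes the inclusive/exclusive difference concrete. It compares a `loc` slice by label with an `iloc` slice by position.

```python
import pandas as pd

# Hypothetical DataFrame with labels 100-104
movies = pd.DataFrame(
    {'title_x': ['A', 'B', 'C', 'D', 'E']},
    index=[100, 101, 102, 103, 104],
)

by_label = movies.loc[100:102]   # labels 100, 101 and 102 (end included)
by_position = movies.iloc[0:2]   # positions 0 and 1 (end excluded)
```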

---

### ✅ Selecting Both Rows and Columns

You can use `.iloc[]` or `.loc[]` with row and column selection:

```python
# Using iloc: rows at positions 0 to 2, columns at positions 0 and 1 (end positions excluded)
movies.iloc[0:3, 0:2]

# Using loc: rows with index labels 100 through 102 (inclusive), columns 'title_x' and 'genre'
movies.loc[100:102, ['title_x', 'genre']]
```
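On a hypothetical `movies` DataFrame whose first three labels are 100 to 102 and whose first two columns are `title_x` and `genre`, both selections pick out the same cells. `loc` also accepts a single row label and column name to fetch one value.

```python
import pandas as pd

# Hypothetical stand-in for the movies DataFrame
movies = pd.DataFrame(
    {
        'title_x': ['A', 'B', 'C', 'D'],
        'genre': ['Drama', 'Comedy', 'Action', 'Drama'],
        'year': [2001, 2005, 2010, 2015],
    },
    index=[100, 101, 102, 103],
)

pos = movies.iloc[0:3, 0:2]                       # first 3 rows, first 2 columns
lab = movies.loc[100:102, ['title_x', 'genre']]   # same cells, selected by label
value = movies.loc[101, 'year']                   # a single cell
```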

---

### 📌 Summary

These are **fundamental operations** when working with Pandas DataFrames. They are useful for:
- Data manipulation
- Exploratory data analysis
- Extracting specific rows/columns
- Filtering and slicing data effectively
