# Creating DataFrames in Pandas

The **DataFrame** is the **core data structure** in Pandas; you'll use it in nearly every data science project.  
It represents **tabular data** (like an Excel sheet or SQL table) with rows and columns.

---

## 🧱 1. Creating a DataFrame from Python Lists

You can create a DataFrame directly from a list of lists.

```python
import pandas as pd

data = [
    ["Alice", 25],
    ["Bob", 30],
    ["Charlie", 35]
]

df = pd.DataFrame(data, columns=["Name", "Age"])
print(df)
```
**Output:**
```
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
```

Each inner list becomes a row, and `columns` defines the column names.
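As a quick check of that row mapping, the sketch below rebuilds the same frame and also passes the optional `index` argument so each row gets a label. The labels `"a"`, `"b"`, `"c"` are arbitrary choices for illustration.

```python
import pandas as pd

data = [
    ["Alice", 25],
    ["Bob", 30],
    ["Charlie", 35],
]

# `index` supplies one label per inner list; "a", "b", "c" are illustrative labels
df = pd.DataFrame(data, columns=["Name", "Age"], index=["a", "b", "c"])

print(df.shape)             # (3, 2): three inner lists -> three rows
print(df.loc["b", "Name"])  # Bob: the second inner list became row "b"
```

Without `index`, the rows are labelled `0`, `1`, `2` as in the output above.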

---

## 🧾 2. Creating a DataFrame from a Dictionary of Lists

This is the **most common and readable** way to create a DataFrame.

```python
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35]
}

df = pd.DataFrame(data)
print(df)
```
**Output:**
```
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
```

👉 Each **key** becomes a column, and each **list** provides the column data.
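Because each list fills one column, all lists must have the same length. The sketch below uses a deliberately mismatched, hypothetical dictionary to show that pandas rejects it with a `ValueError` rather than padding the shorter column.

```python
import pandas as pd

# Hypothetical input: "Age" has one value fewer than "Name"
bad = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30],
}

error_message = None
try:
    pd.DataFrame(bad)
except ValueError as err:
    error_message = str(err)

print(error_message)  # All arrays must be of the same length
```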

---

## 🔢 3. Creating a DataFrame from NumPy Arrays

You can convert a NumPy array into a DataFrame.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(arr, columns=["A", "B"])
print(df)
```
**Output:**
```
   A  B
0  1  2
1  3  4
```

⚠️ Provide **column names** explicitly; otherwise Pandas labels the columns with default integers (`0`, `1`, ...).
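A minimal sketch of what happens without `columns`: the labels default to integers, and you can still name the columns afterwards by assigning a list to `df.columns`.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])

# No `columns` argument: pandas labels the columns 0, 1, ...
df = pd.DataFrame(arr)
default_labels = df.columns.tolist()
print(default_labels)        # [0, 1]

# Names can be assigned after construction (one per column)
df.columns = ["A", "B"]
print(df.columns.tolist())   # ['A', 'B']
```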

---

## 📂 4. Creating a DataFrame from CSV Files

You can load CSV files directly using:

```python
df = pd.read_csv("data.csv")
```

### Useful options include:
- `sep` → delimiter character (e.g., `","` for comma, `"\t"` for tab)
- `header` → row number to use as column names (`None` if the file has no header row)
- `names` → custom column names
- `index_col` → column to use as the row index
- `usecols` → subset of columns to read
- `nrows` → number of rows to read

**Example:**
```python
df = pd.read_csv("data.csv", usecols=["Name", "Age"])
```
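To try several of these options together without a file on disk, the sketch below feeds `read_csv` an in-memory, tab-separated stand-in for `data.csv` via `io.StringIO`. The column names and values are illustrative only.

```python
import io

import pandas as pd

# Stand-in for "data.csv": a small tab-separated file held in memory
raw = io.StringIO(
    "Name\tAge\tCity\n"
    "Alice\t25\tNew York\n"
    "Bob\t30\tLos Angeles\n"
    "Charlie\t35\tDelhi\n"
)

df = pd.read_csv(
    raw,
    sep="\t",                 # tab-delimited instead of comma
    usecols=["Name", "Age"],  # keep only these columns
    index_col="Name",         # use "Name" as the row index
    nrows=2,                  # read only the first two data rows
)
print(df)
```

The same keyword arguments work unchanged when the first argument is a file path such as `"data.csv"`.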

---

## 📊 5. Creating a DataFrame from Excel Files

You can read Excel files using:

```python
df = pd.read_excel("data.xlsx")
```

📦 Reading Excel files requires an extra engine library. Install `openpyxl` for `.xlsx` files:

```bash
pip install openpyxl
```
or `xlrd` for legacy `.xls` files:
```bash
pip install xlrd
```

---

## 🌐 6. Creating a DataFrame from JSON

```python
df = pd.read_json("data.json")
```
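You can also read JSON directly from a **URL** or from a **string**. In recent pandas versions, passing a literal JSON string is deprecated, so wrap it in `io.StringIO` first.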
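A minimal sketch of reading JSON from a string: the text below is an illustrative array of records (one object per row), wrapped in `io.StringIO` so pandas treats it as file contents.

```python
import io

import pandas as pd

# Illustrative JSON text: an array of records, one object per row
json_text = (
    '[{"fruit": "Apple", "size": "Large", "color": "Red"},'
    ' {"fruit": "Banana", "size": "Medium", "color": "Yellow"}]'
)

# StringIO makes the string behave like an open file
df = pd.read_json(io.StringIO(json_text))
print(df)
```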

---

## 🧠 7. Creating a DataFrame from SQL Databases

You can read SQL query results directly into Pandas.

```python
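import sqlite3
import pandas as pd

conn = sqlite3.connect("mydb.sqlite")          # path to an existing SQLite database file
df = pd.read_sql("SELECT * FROM users", conn)  # query results become a DataFrame
conn.close()                                   # release the connection when done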
```

This allows direct integration with relational databases.
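The sketch below makes that runnable without a database file: an in-memory SQLite database (`":memory:"`) stands in for `mydb.sqlite`, and the `users` table with its `name`/`age` columns is hypothetical. It also passes the filter value through `params` instead of formatting it into the SQL string.

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical "users" table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Alice", 25), ("Bob", 30), ("Charlie", 35)],
)
conn.commit()

# "?" is sqlite3's placeholder; the value is supplied separately via `params`
df = pd.read_sql(
    "SELECT name, age FROM users WHERE age > ? ORDER BY age",
    conn,
    params=(26,),
)
conn.close()
print(df)
```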

---

## 🌍 8. Creating a DataFrame from the Web

You can even load datasets directly from online sources.

```python
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"
df = pd.read_csv(url)
```

This is especially useful for fetching datasets for experiments and EDA.

---

## 🔍 EDA (Exploratory Data Analysis)

Once your DataFrame is ready, it's time to **explore your data**.  
EDA helps you understand structure, spot patterns, detect missing data, and prepare for modeling.

### Key Methods for EDA:
```python
df.head()       # First 5 rows
df.tail()       # Last 5 rows
df.info()       # Column dtypes, non-null counts, memory usage
df.describe()   # Summary statistics for numeric columns
df.columns      # Index of column names
df.shape        # (rows, columns) tuple
```

**Example:**
```python
print(df.head())
df.info()            # info() prints its report itself and returns None
print(df.describe())
```

🧠 **EDA is essential**: it helps you understand the dataset **before** applying transformations or models.
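The sketch below runs a few of these checks on a tiny, made-up frame with one missing value. It adds `isna().sum()`, which is not in the list above, because detecting missing data is one of the stated EDA goals; note how `describe()` simply leaves the `NaN` out of its statistics.

```python
import numpy as np
import pandas as pd

# Small illustrative dataset with one missing Age value
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, np.nan, 35],
})

print(df.shape)             # (3, 2)
print(df.columns.tolist())  # ['Name', 'Age']
print(df.isna().sum())      # missing values per column: Name 0, Age 1

summary = df.describe()     # numeric columns only; NaN is excluded from the stats
print(summary)              # Age count is 2, not 3
```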

---

## ✅ Summary

- You can create DataFrames from **lists, dictionaries, NumPy arrays, CSV, Excel, JSON, web, and SQL sources**.  
- Use `.head()`, `.info()`, `.describe()` to quickly **explore and summarize** datasets.  
- The DataFrame is the **foundation of data analysis** in Pandas ‚Äî master it early!


---

## Practice Notebook

```python
import numpy as np
import pandas as pd
```

#### Using a List of Lists

```python
data = [['Farhan', 76], ['Sam', 55]]              # Basic list of lists

pd.DataFrame(data, columns=['Names', 'Marks'])    # Manual column names
```
**Output:**

|   | Names  | Marks |
|---|--------|-------|
| 0 | Farhan | 76    |
| 1 | Sam    | 55    |


#### Using a Dictionary

```python
data1 = {
    "name": ['Farhan', 'Sam'],
    "marks": [33, 22]
}

pd.DataFrame(data1)
```
**Output:**

|   | name   | marks |
|---|--------|-------|
| 0 | Farhan | 33    |
| 1 | Sam    | 22    |


#### Using NumPy

```python
arr = np.array([[1, 2], [5, 6]])

df = pd.DataFrame(arr, columns=['A', 'B'])
df
```
**Output:**

|   | A | B |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 5 | 6 |


#### Read Excel Files

```python
dff = pd.read_excel('data.xlsx')
dff
```
**Output:**

|   | Name    | School | Marks |
|---|---------|--------|-------|
| 0 | Farhan  | aa     | 45    |
| 1 | Zeeshan | ff     | 76    |
| 2 | Samiya  | as     | 45    |
| 3 | Fam     | dwd    | 54    |


#### Read CSV Files

```python
dfff = pd.read_csv('data.csv')
```

#### Read JSON Files

```python
pd.read_json('data.json')    # Ensure your JSON data is enclosed in [] (a list of records)
```
**Output:**

|   | fruit | size  | color |
|---|-------|-------|-------|
| 0 | Apple | Large | Red   |
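The `[]` requirement comes from how `read_json` builds a DataFrame by default: a bare JSON object whose values are all scalars gives pandas no row index, so it raises a `ValueError`, while the same object inside a list is read as one record (one row). A minimal sketch, with the record inlined as a string in place of `data.json`:

```python
import io

import pandas as pd

record = '{"fruit": "Apple", "size": "Large", "color": "Red"}'

# A bare object of scalar values: pandas cannot infer a row index
error_message = None
try:
    pd.read_json(io.StringIO(record))
except ValueError as err:
    error_message = str(err)
print(error_message)

# Enclosed in [...], the object becomes a one-row list of records
df = pd.read_json(io.StringIO("[" + record + "]"))
print(df)
```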


#### From the Web (Example: CSV from URL)

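```python
df2 = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv")
df2
```
**Output:**

|     | total_bill | tip  | sex    | smoker | day | time   | size |
|-----|------------|------|--------|--------|-----|--------|------|
| 0   | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1   | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2   | 21.01      | 3.50 | Male   | No     | Sun | Dinner | 3    |
| 3   | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4   | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |
| ... | ...        | ...  | ...    | ...    | ... | ...    | ...  |
| 239 | 29.03      | 5.92 | Male   | No     | Sat | Dinner | 3    |
| 240 | 27.18      | 2.00 | Female | Yes    | Sat | Dinner | 2    |
| 241 | 22.67      | 2.00 | Male   | Yes    | Sat | Dinner | 2    |
| 242 | 17.82      | 1.75 | Male   | No     | Sat | Dinner | 2    |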


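```python
df2.head()
```
**Output:**

|   | total_bill | tip  | sex    | smoker | day | time   | size |
|---|------------|------|--------|--------|-----|--------|------|
| 0 | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1 | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2 | 21.01      | 3.5  | Male   | No     | Sun | Dinner | 3    |
| 3 | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4 | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |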


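```python
df2.tail()
```
**Output:**

|     | total_bill | tip  | sex    | smoker | day  | time   | size |
|-----|------------|------|--------|--------|------|--------|------|
| 239 | 29.03      | 5.92 | Male   | No     | Sat  | Dinner | 3    |
| 240 | 27.18      | 2.0  | Female | Yes    | Sat  | Dinner | 2    |
| 241 | 22.67      | 2.0  | Male   | Yes    | Sat  | Dinner | 2    |
| 242 | 17.82      | 1.75 | Male   | No     | Sat  | Dinner | 2    |
| 243 | 18.78      | 3.0  | Female | No     | Thur | Dinner | 2    |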


```python
df2.info()
```
**Output:**
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   total_bill  244 non-null    float64
 1   tip         244 non-null    float64
 2   sex         244 non-null    object 
 3   smoker      244 non-null    object 
 4   day         244 non-null    object 
 5   time        244 non-null    object 
 6   size        244 non-null    int64  
dtypes: float64(2), int64(1), object(4)
memory usage: 13.5+ KB
```


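```python
df2.describe()
```
**Output:**

|       | total_bill | tip      | size     |
|-------|------------|----------|----------|
| count | 244.0      | 244.0    | 244.0    |
| mean  | 19.785943  | 2.998279 | 2.569672 |
| std   | 8.902412   | 1.383638 | 0.9511   |
| min   | 3.07       | 1.0      | 1.0      |
| 25%   | 13.3475    | 2.0      | 2.0      |
| 50%   | 17.795     | 2.9      | 2.0      |
| 75%   | 24.1275    | 3.5625   | 3.0      |
| max   | 50.81      | 10.0     | 6.0      |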


```python
df2.columns
```
**Output:**
```
Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')
```

```python
df2.shape
```
**Output:**
```
(244, 7)
```

```python
import pandas as pd
```

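```python
data = ...  # the original contents of this cell were not preserved in the export
```
**Output:**

|   | Name    | Age  | City        | Gender | Email             | Join Date  |
|---|---------|------|-------------|--------|-------------------|------------|
| 0 | Alice   | 25.0 | New York    | F      | alice@example.com | 1/5/2021   |
| 1 | Charlie |      | Delhi       | M      | charlie@example   | 20-07-2021 |
| 2 | Bob     | 30.0 | Los Angeles | M      | bob@example.com   | 15-06-2020 |
| 3 | Charlie |      | Delhi       | M      | charlie@example   | 20-07-2021 |
| 4 | David   | 22.0 | Mumbai      | M      | david@example.com | 12/11/2019 |
| 5 |         | 28.0 | Delhi       | F      | eve@domain.com    |            |
| 6 | Alice   | 25.0 | New York    | F      | alice@example.com | 1/5/2021   |
| 7 | Alice   | 25.0 | New York    | F      | alice@example.com | 1/5/2021   |
| 8 | Charlie |      | Delhi       | M      | charlie@example   | 20-07-2021 |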
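The code that produced this table was not preserved, so the dictionary below is a hypothetical reconstruction typed in from the displayed output (blank cells entered as `None`); it is not the author's original cell. It also shows two quick checks suited to messy data like this: counting missing values and counting exact duplicate rows.

```python
import pandas as pd

# Hypothetical reconstruction from the displayed table; None marks blank cells
data = {
    "Name": ["Alice", "Charlie", "Bob", "Charlie", "David", None, "Alice", "Alice", "Charlie"],
    "Age": [25, None, 30, None, 22, 28, 25, 25, None],
    "City": ["New York", "Delhi", "Los Angeles", "Delhi", "Mumbai", "Delhi",
             "New York", "New York", "Delhi"],
    "Gender": ["F", "M", "M", "M", "M", "F", "F", "F", "M"],
    "Email": ["alice@example.com", "charlie@example", "bob@example.com", "charlie@example",
              "david@example.com", "eve@domain.com", "alice@example.com", "alice@example.com",
              "charlie@example"],
    "Join Date": ["1/5/2021", "20-07-2021", "15-06-2020", "20-07-2021", "12/11/2019", None,
                  "1/5/2021", "1/5/2021", "20-07-2021"],
}

df = pd.DataFrame(data)

print(df.shape)               # (9, 6)
print(df["Age"].dtype)        # float64: None becomes NaN, so the integers are upcast
print(df.isna().sum())        # missing values per column
print(df.duplicated().sum())  # rows that exactly repeat an earlier row
```

The float `Age` column explains why the table shows `25.0` rather than `25`.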


```python
data = [
    ['Farhan', 21],
    ['Samiya', 22]
]
dff = pd.DataFrame(data, columns=['Name', 'Age'])
dff
```
**Output:**

|   | Name   | Age |
|---|--------|-----|
| 0 | Farhan | 21  |
| 1 | Samiya | 22  |


```python
import numpy as np

arr = np.array([[1, 2], [2, 1]])

df1 = pd.DataFrame(arr, columns=["a", "v"])
df1
```
**Output:**

|   | a | v |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 2 | 1 |


```python
dff.describe()
```
**Output:**

|       | Age      |
|-------|----------|
| count | 2.0      |
| mean  | 21.5     |
| std   | 0.707107 |
| min   | 21.0     |
| 25%   | 21.25    |
| 50%   | 21.5     |
| 75%   | 21.75    |
| max   | 22.0     |
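The `std` value of `0.707107` is the *sample* standard deviation (`ddof=1`), which pandas uses by default. With ages 21 and 22 the mean is 21.5, the squared deviations sum to 0.25 + 0.25 = 0.5, and dividing by n − 1 = 1 gives a variance of 0.5, whose square root is about 0.707. NumPy's `np.std` defaults to the population formula (`ddof=0`) and gives 0.5 instead. A small sketch to confirm:

```python
import numpy as np
import pandas as pd

ages = pd.Series([21, 22])

# pandas (Series.std and describe) divides by n - 1 by default: ddof=1
sample_std = ages.std()
print(sample_std)                       # about 0.707107, i.e. sqrt(0.5)

# NumPy's np.std divides by n by default: ddof=0
population_std = np.std(ages.to_numpy())
print(population_std)                   # 0.5

# Passing ddof=1 to NumPy reproduces the pandas value
print(np.std(ages.to_numpy(), ddof=1))
```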
