# Pandas DataFrame

A Pandas DataFrame is a two-dimensional, size-mutable tabular data structure with labeled axes (rows and columns). It is potentially heterogeneous: each column can hold a different data type. You can think of it as a spreadsheet, a SQL table, or a dictionary of Series objects that share one index. DataFrames are among the most commonly used data structures in data analysis and manipulation.

## Key Features of Pandas DataFrame

1. **Creation**: DataFrames can be created from various data structures like lists, dictionaries, and NumPy arrays.
2. **Indexing and Selection**: DataFrames support both label-based indexing (`.loc`) and integer-position-based indexing (`.iloc`).
3. **Data Alignment**: Arithmetic operations align data on row and column labels automatically.
4. **Missing Data Handling**: Functions to handle missing data (NaN).
5. **Data Manipulation**: Functions for merging, joining, reshaping, and pivoting.
6. **Aggregation and Grouping**: Functions for grouping data and performing aggregate operations.
7. **Input/Output**: Functions to read from and write to various file formats (CSV, Excel, SQL, etc.).
8. **Statistical Functions**: Built-in functions for descriptive statistics and other statistical operations.
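
Feature 3 in the list above (data alignment) has no example later in this document, so here is a minimal sketch of it. It uses two Series rather than full DataFrames to keep the output short; the names `s1` and `s2` and their values are illustrative assumptions.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition matches values by index label, not by position
total = s1 + s2
print(total)

# Labels present in only one operand give NaN; fill_value treats the missing side as 0
total_filled = s1.add(s2, fill_value=0)
print(total_filled)
```

Only the labels `b` and `c` exist in both Series, so the plain sum is NaN for `a` and `d`.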

## Examples

### 1. Creating a DataFrame

In [None]:
import pandas as pd

# From a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
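
The feature list also mentions lists and NumPy arrays as sources. The sketch below shows one way each might look; the variable names (`records`, `df_records`, `array`, `df_array`) and the column names `x` and `y` are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a list of records (one dict per row); the keys become column names
records = [
    {'Name': 'Alice', 'Age': 25},
    {'Name': 'Bob', 'Age': 30},
]
df_records = pd.DataFrame(records)

# From a 2-D NumPy array; column names are supplied explicitly
array = np.arange(6).reshape(3, 2)
df_array = pd.DataFrame(array, columns=['x', 'y'])

print(df_records)
print(df_array)
```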


### 2. Indexing and Selection

In [None]:
# Selecting a column
print(df['Name'])

# Selecting multiple columns
print(df[['Name', 'City']])

# Selecting a row by index label (the default index labels are 0, 1, 2)
print(df.loc[0])

# Selecting a row by integer position (position 1 is the second row)
print(df.iloc[1])
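
With the default integer index, labels and positions happen to coincide, which can hide the difference between `.loc` and `.iloc`. The following sketch uses a string index to separate the two and adds a boolean-mask selection; the `people` DataFrame is a hypothetical example, not data defined elsewhere in this document.

```python
import pandas as pd

people = pd.DataFrame(
    {'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago']},
    index=['Alice', 'Bob', 'Charlie'],
)

# .loc takes index and column labels; .iloc takes integer positions
by_label = people.loc['Bob', 'Age']
by_position = people.iloc[1, 0]
print(by_label, by_position)

# Boolean mask: keep only the rows where Age is greater than 28
older = people[people['Age'] > 28]
print(older)
```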

### 3. Handling Missing Data

In [None]:
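# Creating a DataFrame with missing values (None is stored as NaN).
# A separate name keeps the Name/Age/City DataFrame `df` available for later sections.
data_missing = {
    'A': [1, 2, None],
    'B': [None, 2, 3]
}
df_missing = pd.DataFrame(data_missing)

# Filling missing values with 0 (returns a new DataFrame)
df_filled = df_missing.fillna(0)
print(df_filled)

# Dropping every row that contains at least one missing value
df_dropped = df_missing.dropna()
print(df_dropped)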
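
Before filling or dropping, it often helps to see how much data is missing. The sketch below counts missing values per column and then fills each gap with the column mean instead of a constant; `df_na` is an illustrative name holding the same values as the example above.

```python
import pandas as pd

df_na = pd.DataFrame({'A': [1, 2, None], 'B': [None, 2, 3]})

# Count missing values per column
missing_counts = df_na.isna().sum()
print(missing_counts)

# Fill each column's gaps with that column's own mean
df_mean_filled = df_na.fillna(df_na.mean())
print(df_mean_filled)
```

Whether a mean, a constant, or dropping the row is appropriate depends on the data; this only illustrates the mechanics.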

### 4. Data Manipulation

In [None]:
# Adding a new column
df['Country'] = ['USA', 'USA', 'USA']
print(df)

# Merging DataFrames
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})
merged_df = pd.merge(df1, df2, on='key', how='inner')
print(merged_df)
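
The inner merge above keeps only keys found in both frames. As a sketch of two related operations, the block below performs an outer merge (keeping all keys) and then a simple pivot, since reshaping is listed as a feature but not shown. The `long_df` data and the suffix names are assumptions chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})

# Outer merge keeps keys from both sides. suffixes renames the clashing
# 'value' columns, and indicator=True adds a '_merge' column that records
# whether each row came from the left frame, the right frame, or both.
outer = pd.merge(df1, df2, on='key', how='outer',
                 suffixes=('_left', '_right'), indicator=True)
print(outer)

# Reshaping: pivot long-format rows into one column per 'metric' value
long_df = pd.DataFrame({
    'city': ['NY', 'NY', 'LA', 'LA'],
    'metric': ['high', 'low', 'high', 'low'],
    'temp': [30, 20, 35, 25],
})
wide = long_df.pivot(index='city', columns='metric', values='temp')
print(wide)
```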

### 5. Aggregation and Grouping

In [None]:
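# Rebuild the section 1 data so this cell does not depend on earlier reassignments of df
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35],
                   'City': ['New York', 'Los Angeles', 'Chicago']})

# Mean age per city; selecting 'Age' keeps the text column 'Name' out of the mean
grouped = df.groupby('City')['Age'].mean()
print(grouped)

# Aggregating data: several statistics for one column
agg = df.agg({'Age': ['mean', 'min', 'max']})
print(agg)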
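
In the example above every city appears once, so each group contains a single row. The sketch below uses a hypothetical `sales` table with repeated cities to show grouping that actually combines rows, using named aggregation so that each output column gets an explicit name.

```python
import pandas as pd

sales = pd.DataFrame({
    'City': ['New York', 'Chicago', 'New York', 'Chicago', 'Chicago'],
    'Amount': [100, 50, 200, 70, 30],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
summary = sales.groupby('City').agg(
    total=('Amount', 'sum'),
    average=('Amount', 'mean'),
    orders=('Amount', 'count'),
)
print(summary)
```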

### 6. Input/Output

In [None]:
# Reading from a CSV file (assumes data.csv exists in the working directory)
df_csv = pd.read_csv('data.csv')

# Writing to a CSV file
df.to_csv('output.csv', index=False)
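
The calls above need files on disk. For a self-contained check of the same round trip, one option is an in-memory text buffer, as sketched below; `df_out`, `buffer`, and `df_back` are illustrative names.

```python
import io

import pandas as pd

df_out = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write CSV text to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)

# Rewind the buffer and read the same text back into a new DataFrame
buffer.seek(0)
df_back = pd.read_csv(buffer)
print(df_back)
```

Passing `index=False` keeps the row index out of the CSV, so reading it back does not produce an extra unnamed column.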

### 7. Statistical Functions

In [None]:
# Descriptive statistics
print(df.describe())

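# Correlation matrix; numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.x
print(df.corr(numeric_only=True))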
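
A correlation matrix is more informative with at least two numeric columns. The following sketch uses a hypothetical `measurements` table whose two numeric columns are perfectly linearly related, plus one text column to show how it is excluded.

```python
import pandas as pd

measurements = pd.DataFrame({
    'height_cm': [150, 160, 170, 180],
    'weight_kg': [50, 60, 70, 80],
    'label': ['a', 'b', 'c', 'd'],
})

# describe() summarizes only the numeric columns by default
stats = measurements.describe()
print(stats)

# numeric_only=True leaves the text column 'label' out of the correlation
corr = measurements.corr(numeric_only=True)
print(corr)
```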

Together, these features cover most everyday data analysis tasks with a Pandas DataFrame: creating and selecting data, handling missing values, combining and grouping tables, reading and writing files, and computing summary statistics.