# Pandas: Overview and Examples

`pandas` is a Python library for data manipulation and analysis, with basic plotting built on Matplotlib. Its two core data structures are the one-dimensional `Series` and the two-dimensional `DataFrame`.

## Key Features
- Data alignment and handling of missing data.
- Label-based slicing, indexing, and subsetting.
- File I/O (CSV, Excel, JSON, etc.).
- Data wrangling and analysis tools.
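Data alignment, the first feature listed above, can be illustrated with a minimal sketch. The Series names and values here are made up for illustration; the point is that arithmetic matches on index labels rather than positions.

```python
import pandas as pd

# Two Series whose indices only partly overlap (illustrative values).
q1 = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
q2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition aligns on labels, not positions; a label present in only
# one operand produces NaN in the result.
total = q1 + q2
print(total)

# fill_value treats a missing label as 0 instead of producing NaN.
total_filled = q1.add(q2, fill_value=0)
print(total_filled)
```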


## Creating Data Structures

### Series
A one-dimensional array-like object with labeled indices.

### DataFrame
A two-dimensional tabular structure with labeled rows and columns.


In [3]:
# `pip install` only installs packages; `import ... as` binds the pd and np aliases
import pandas as pd
import numpy as np

# Creating a Series
data = [10, 20, 30, 40]
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print("Series:\n", series)

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}
df = pd.DataFrame(data)
print("\nDataFrame:\n", df)


## Basic Operations

### Accessing Columns
Access a column with dictionary-style brackets (`df['Name']`) or, when the label is a valid Python identifier, as an attribute (`df.Name`).
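A small sketch of both access styles, using a hypothetical `people` frame. Bracket access is the safer default: attribute access only works when the label is a valid identifier that does not clash with an existing DataFrame attribute or method.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

by_key = people['Name']   # dictionary-style; works for any column label
by_attr = people.Name     # attribute-style; label must be a valid identifier
print(by_key.equals(by_attr))

# A list of labels selects several columns and returns a DataFrame.
subset = people[['Name', 'Age']]

# New columns must be created with bracket assignment; attribute-style
# assignment (people.Senior = ...) does not create a column.
people['Senior'] = people['Age'] >= 30
```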

### Adding/Removing Columns
Assign to a new label to add a column, and use `drop` to remove one.

### Accessing Rows
Use `.iloc` for positional indexing and `.loc` for label-based indexing.
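The difference between the two indexers shows up most clearly when slicing. The following sketch uses a hypothetical `staff` frame with string row labels.

```python
import pandas as pd

staff = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['Emp1', 'Emp2', 'Emp3'],
)

# .iloc slices by position and excludes the end position, like a list.
first_two_by_position = staff.iloc[0:2]

# .loc slices by label and includes the end label.
first_two_by_label = staff.loc['Emp1':'Emp2']

# Both indexers accept a column selector as the second argument.
age_of_emp2 = staff.loc['Emp2', 'Age']
same_value = staff.iloc[1, 1]
```

Both slices return the same two rows, even though the positional slice stops before position 2 while the label slice includes `'Emp2'`.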


In [None]:
# Accessing a column
print("Accessing 'Name' column:\n", df['Name'])

# Adding a new column
df['Bonus'] = df['Salary'] * 0.1
print("\nDataFrame with 'Bonus':\n", df)

# Removing a column
df = df.drop('Bonus', axis=1)
print("\nDataFrame after dropping 'Bonus':\n", df)

# Accessing rows
df.index = ['Emp1', 'Emp2', 'Emp3']
print("\nAccessing first row (positional):\n", df.iloc[0])
print("\nAccessing first row (label-based):\n", df.loc['Emp1'])


## File Operations

### CSV and Excel
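Use `to_csv` and `read_csv` for CSV files, and `to_excel` and `read_excel` for Excel files. Excel support relies on an optional engine such as `openpyxl`.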
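One detail worth checking in a round trip is what happens to the row labels. The sketch below is an assumption-laden illustration: it writes to an in-memory `io.StringIO` buffer instead of a file on disk, and the `staff` frame is hypothetical.

```python
import io
import pandas as pd

staff = pd.DataFrame(
    {'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]},
    index=['Emp1', 'Emp2'],
)

# With index=False the row labels are not written, so they are lost
# and the frame read back gets a default integer index.
buffer = io.StringIO()
staff.to_csv(buffer, index=False)
buffer.seek(0)
without_index = pd.read_csv(buffer)

# Writing the index and reading it back with index_col=0 preserves it.
buffer = io.StringIO()
staff.to_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)

print(without_index.index.tolist())
print(with_index.index.tolist())
```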


In [None]:
# Writing to a CSV file
df.to_csv('output.csv', index=False)
print("DataFrame written to 'output.csv'.")

# Reading from a CSV file
df_from_csv = pd.read_csv('output.csv')
print("\nDataFrame read from CSV:\n", df_from_csv)

# Writing to an Excel file (requires an Excel engine such as openpyxl)
df.to_excel('output.xlsx', index=False)
print("DataFrame written to 'output.xlsx'.")

# Reading from an Excel file
df_from_excel = pd.read_excel('output.xlsx')
print("\nDataFrame read from Excel:\n", df_from_excel)


## Data Manipulation

### Handling Missing Data
Missing values appear as `NaN`. Fill them with `fillna` or drop them with `dropna`.
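A minimal sketch comparing the two strategies side by side, on a hypothetical `scores` frame. Which one is appropriate depends on the data; this only shows the mechanics.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [90.0, np.nan, 70.0],
})

# Count the missing values in each column.
missing_per_column = scores.isna().sum()

# Drop the rows where Score is missing.
dropped = scores.dropna(subset=['Score'])

# Fill missing scores with the column mean (mean() skips NaN by default).
filled = scores.fillna({'Score': scores['Score'].mean()})
```

Neither call modifies `scores`; each returns a new DataFrame.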

### Filtering and Sorting
Filter rows based on conditions and sort by columns.
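Conditions can be combined, and sorting can use more than one column. A short sketch with a hypothetical `staff` frame:

```python
import pandas as pd

staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 30],
    'Salary': [50000, 60000, 70000, 65000],
})

# Combine conditions with & (and) or | (or). Each condition needs its own
# parentheses because & and | bind more tightly than comparisons.
mask = (staff['Age'] >= 30) & (staff['Salary'] < 70000)
selected = staff[mask]

# Sort by Age ascending, breaking ties by Salary descending.
ordered = staff.sort_values(by=['Age', 'Salary'], ascending=[True, False])
```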


In [None]:
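# Adding a NaN value. The row labels are now 'Emp1'..'Emp3', so index by label;
# df.loc[1, ...] would append a new row labelled 1 instead.
df['Salary'] = df['Salary'].astype(float)  # NaN needs a float column
df.loc['Emp2', 'Salary'] = np.nan
print("\nDataFrame with NaN:\n", df)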

# Filling missing values
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())
print("\nDataFrame after filling NaN:\n", df)

# Filtering rows
filtered_df = df[df['Age'] > 30]
print("\nFiltered DataFrame (Age > 30):\n", filtered_df)

# Sorting rows
sorted_df = df.sort_values(by='Age', ascending=False)
print("\nDataFrame sorted by Age:\n", sorted_df)


## Grouping and Aggregation

### Grouping Data
Group by a column to analyze subsets of data.

### Aggregation
Apply aggregation functions like `mean` or `sum`.
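Grouping is most informative when a key repeats across rows. The sketch below assumes a hypothetical `Dept` column (not part of the frame used elsewhere in this notebook) and computes several aggregates in one call with `agg`.

```python
import pandas as pd

staff = pd.DataFrame({
    'Dept': ['Eng', 'Eng', 'Ops', 'Ops', 'Ops'],
    'Salary': [50000, 70000, 40000, 45000, 65000],
})

# One row per department, one column per aggregation function.
summary = staff.groupby('Dept')['Salary'].agg(['mean', 'sum', 'count'])
print(summary)
```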


In [None]:
# Grouping data
grouped = df.groupby('Age')
for age, group in grouped:
    print(f"\nGroup for Age {age}:\n", group)

# Aggregation
print("\nAverage Salary by Age:\n", df.groupby('Age')['Salary'].mean())
print("\nTotal Salary by Age:\n", df.groupby('Age')['Salary'].sum())


## Merging and Joining

### Merge
`pd.merge` combines two DataFrames on a shared key column.
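When the keys do not match exactly, the `how` parameter decides which rows survive. A brief sketch with hypothetical `names` and `salaries` frames:

```python
import pandas as pd

names = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
salaries = pd.DataFrame({'ID': [2, 3, 4], 'Salary': [60000, 70000, 80000]})

# Default (how='inner'): only keys present in both frames.
inner = pd.merge(names, salaries, on='ID')

# how='left': every key from names; unmatched salaries become NaN.
left = pd.merge(names, salaries, on='ID', how='left')

# how='outer': keys from either frame.
outer = pd.merge(names, salaries, on='ID', how='outer')
```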

### Join
Combine datasets on their indices.
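Unlike `pd.merge`, `DataFrame.join` keeps every row of the calling frame by default. The sketch below uses illustrative frames whose indices only partly overlap.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'C': [5, 6]}, index=['a', 'b'])

# Default how='left': all labels from left; 'c' has no match, so C is NaN.
left_joined = left.join(right)

# how='inner': only labels present in both indices.
inner_joined = left.join(right, how='inner')
```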


In [None]:
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Salary': [50000, 60000]})

# Merging
merged_df = pd.merge(df1, df2, on='ID')
print("\nMerged DataFrame:\n", merged_df)

# Joining
left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['a', 'b'])
right = pd.DataFrame({'C': [5, 6]}, index=['a', 'b'])
joined_df = left.join(right)
print("\nJoined DataFrame:\n", joined_df)


## Visualization

### Plotting
Visualize data with the `plot` method on a `Series` or `DataFrame`, which draws with Matplotlib.


In [None]:
import matplotlib.pyplot as plt

# Bar chart
df['Salary'].plot(kind='bar', title='Salary Distribution')
plt.show()

# Line chart
df['Age'].plot(kind='line', title='Age Distribution')
plt.show()
