# Pandas Fundamentals for Data Processing

This notebook demonstrates essential Pandas operations for data manipulation and analysis. We'll cover basic data structures, common operations, and best practices.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np

## Basic Data Structures

### Series
A Series is a one-dimensional labeled array that can hold data of any type (integer, float, string, Python objects, etc.).

In [None]:
# Creating Series
s = pd.Series([1, 2, 3, 4, 5])
print("Basic Series:")
print(s)

# Series with custom labels
s_with_labels = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
print("\nSeries with custom labels:")
print(s_with_labels)

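# Accessing elements
print("\nAccessing elements:")
# s[0] looks up the label 0, which matches the first position only because
# of the default integer index; iloc is explicitly positional
print(f"By position (s.iloc[0]): {s.iloc[0]}")
print(f"By label (s_with_labels['a']): {s_with_labels['a']}")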
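
The difference between label-based and position-based access only becomes visible when the index is not the default 0, 1, 2, .... The following is a minimal sketch of that case; `s_int` and its index values are hypothetical and chosen purely for illustration.

```python
import pandas as pd

# Hypothetical Series whose integer labels do not match the positions
s_int = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = s_int.loc[0]      # the element labeled 0
by_position = s_int.iloc[0]  # the first element, whatever its label

print(f"s_int.loc[0]:  {by_label}")
print(f"s_int.iloc[0]: {by_position}")
```

Using `.loc` and `.iloc` explicitly avoids any ambiguity about which lookup is intended.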

### DataFrame
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. Think of it as a spreadsheet or SQL table.

In [None]:
# Creating DataFrame from dictionary
data = {
    'name': ['John', 'Anna', 'Peter'],
    'age': [28, 22, 35],
    'city': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)
print("DataFrame from dictionary:")
print(df)

# Creating DataFrame from list of dictionaries
records = [
    {'name': 'John', 'age': 28},
    {'name': 'Anna', 'age': 22}
]
df_records = pd.DataFrame(records)
print("\nDataFrame from list of dictionaries:")
print(df_records)
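
When the dictionaries in a list do not all share the same keys, the resulting DataFrame uses the union of the keys as columns and marks absent values as missing. A small sketch, assuming a hypothetical `records_uneven` list in which the second record has no `city`:

```python
import pandas as pd

# Hypothetical records: the second one has no 'city' key
records_uneven = [
    {'name': 'John', 'age': 28, 'city': 'New York'},
    {'name': 'Anna', 'age': 22},
]
df_uneven = pd.DataFrame(records_uneven)

print(df_uneven)
print("\nColumn types:")
print(df_uneven.dtypes)

# isna() flags the cell that had no value in the source record
print("\nMissing values per column:")
print(df_uneven.isna().sum())
```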

## Basic Operations

### Viewing Data
Let's explore different ways to inspect our DataFrame.

In [None]:
# Create a larger dataset for demonstration
df = pd.DataFrame({
    'name': ['John', 'Anna', 'Peter', 'Linda', 'Bob'],
    'age': [28, 22, 35, 25, 30],
    'city': ['New York', 'Paris', 'London', 'Berlin', 'Tokyo'],
    'salary': [50000, 60000, 75000, 65000, 70000]
})

# Basic information about DataFrame
print("DataFrame Info:")
df.info()

print("\nStatistical Summary:")
print(df.describe())

print("\nFirst 3 rows:")
print(df.head(3))

print("\nColumn names:")
print(df.columns)
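
A few more inspection tools are often useful alongside the ones above. This sketch uses a small hypothetical `df_demo` so that it runs on its own; `shape`, `dtypes`, `tail`, and `describe(include='all')` are standard pandas members.

```python
import pandas as pd

# Hypothetical sample data, a subset of the columns used in this notebook
df_demo = pd.DataFrame({
    'name': ['John', 'Anna', 'Peter'],
    'age': [28, 22, 35],
    'salary': [50000, 60000, 75000],
})

print("Shape (rows, columns):", df_demo.shape)

print("\nColumn types:")
print(df_demo.dtypes)

print("\nLast 2 rows:")
print(df_demo.tail(2))

# describe() summarizes numeric columns by default;
# include='all' adds the non-numeric columns as well
print("\nSummary of all columns:")
print(df_demo.describe(include='all'))
```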

### Selecting Data
Different methods to access and filter data in DataFrames.

In [None]:
# Selecting columns
print("Single column:")
print(df['name'])

print("\nMultiple columns:")
print(df[['name', 'age']])

# Filtering rows
print("\nPeople older than 25:")
print(df[df['age'] > 25])

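# Using loc and iloc
# loc slices by label and includes the end label, so 0:2 returns rows 0, 1, and 2
print("\nUsing loc (label-based):")
print(df.loc[0:2, 'name':'age'])

# iloc slices by position and excludes the end position, so 0:2 returns rows 0 and 1
print("\nUsing iloc (position-based):")
print(df.iloc[0:2, 0:2])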
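
Row filters can combine several conditions. The sketch below assumes a hypothetical `df_demo` with the same names, ages, and cities as the DataFrame above. Each condition is wrapped in parentheses and joined with `&` (and) or `|` (or), because Python's `and`/`or` keywords do not work element-wise on Series.

```python
import pandas as pd

# Hypothetical copy of the sample data so the cell runs on its own
df_demo = pd.DataFrame({
    'name': ['John', 'Anna', 'Peter', 'Linda', 'Bob'],
    'age': [28, 22, 35, 25, 30],
    'city': ['New York', 'Paris', 'London', 'Berlin', 'Tokyo'],
})

# Both conditions must hold: older than 24 AND living in one of the listed cities
in_listed_cities = df_demo['city'].isin(['London', 'Berlin', 'Paris'])
older_in_listed = df_demo[(df_demo['age'] > 24) & in_listed_cities]
print("Older than 24 and in London, Berlin, or Paris:")
print(older_in_listed)

# Either condition may hold: younger than 25 OR living in Tokyo
young_or_tokyo = df_demo[(df_demo['age'] < 25) | (df_demo['city'] == 'Tokyo')]
print("\nYounger than 25 or in Tokyo:")
print(young_or_tokyo)
```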

### Data Manipulation
Adding, removing, and modifying data in DataFrames.

In [None]:
# Adding a new column
df['bonus'] = df['salary'] * 0.1
print("DataFrame with bonus column:")
print(df)

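# Adding a new row
# DataFrame.append was removed in pandas 2.0, so build a one-row DataFrame and use pd.concat
new_row = pd.DataFrame([{'name': 'Mary', 'age': 31, 'city': 'Chicago', 'salary': 80000, 'bonus': 8000}])
df = pd.concat([df, new_row], ignore_index=True)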
print("\nDataFrame with new row:")
print(df)

# Removing a column
df = df.drop(columns='bonus')
print("\nDataFrame after removing bonus column:")
print(df)
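
Two related patterns are worth knowing: appending several rows at once with `pd.concat`, and deriving a column with `assign`, which returns a new DataFrame instead of modifying the original. This is a minimal sketch; `staff` and `new_hires` are hypothetical names and values.

```python
import pandas as pd

# Hypothetical existing data and a batch of new rows
staff = pd.DataFrame({
    'name': ['John', 'Anna'],
    'salary': [50000, 60000],
})
new_hires = pd.DataFrame({
    'name': ['Mary', 'Omar'],
    'salary': [80000, 72000],
})

# ignore_index=True renumbers the rows 0..n-1 in the result
combined = pd.concat([staff, new_hires], ignore_index=True)

# assign returns a copy with the extra column; combined itself is unchanged
with_bonus = combined.assign(bonus=combined['salary'] * 0.1)

print(combined)
print()
print(with_bonus)
```

Collecting rows and concatenating once tends to be cheaper than growing a DataFrame one row at a time, since each concatenation copies the data.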

### Grouping and Aggregation
Analyzing data by groups and performing aggregate operations.

In [None]:
# Basic grouping
# Each city appears only once in this sample, so every group holds a single row
print("Average salary by city:")
print(df.groupby('city')['salary'].mean())

# Multiple aggregations
print("\nMultiple aggregations by city:")
print(df.groupby('city').agg({
    'salary': ['mean', 'sum'],
    'age': 'max'
}))
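
Passing a dictionary with a list of functions to `agg` produces two-level column labels. Named aggregation is an alternative that yields flat, descriptive column names. The sketch below assumes a hypothetical `df_demo` in which one city repeats, so that the groups contain more than one row.

```python
import pandas as pd

# Hypothetical data with a repeated city so grouping has a visible effect
df_demo = pd.DataFrame({
    'city': ['Paris', 'Paris', 'London'],
    'age': [22, 31, 35],
    'salary': [60000, 80000, 75000],
})

# Named aggregation: output_column=(input_column, function)
summary = df_demo.groupby('city').agg(
    mean_salary=('salary', 'mean'),
    total_salary=('salary', 'sum'),
    max_age=('age', 'max'),
)
print(summary)
```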

### Merging DataFrames
Combining multiple DataFrames using different join operations.

In [None]:
# Create two DataFrames
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['John', 'Anna', 'Peter', 'Linda']
})

df2 = pd.DataFrame({
    'id': [1, 2, 3, 5],
    'salary': [50000, 60000, 75000, 65000]
})

# Inner join
print("Inner join:")
print(pd.merge(df1, df2, on='id'))

# Left join
print("\nLeft join:")
print(pd.merge(df1, df2, on='id', how='left'))

# Outer join
print("\nOuter join:")
print(pd.merge(df1, df2, on='id', how='outer'))
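
To see which side each row of a join came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. A short sketch, using hypothetical `names_df` and `salaries_df` that mirror the two DataFrames above:

```python
import pandas as pd

# Hypothetical copies of the two tables so the cell runs on its own
names_df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['John', 'Anna', 'Peter', 'Linda']
})
salaries_df = pd.DataFrame({
    'id': [1, 2, 3, 5],
    'salary': [50000, 60000, 75000, 65000]
})

# indicator=True records the origin of every row in a '_merge' column
outer = pd.merge(names_df, salaries_df, on='id', how='outer', indicator=True)
print("Outer join with indicator:")
print(outer)

# Rows present in only one of the two tables
unmatched = outer[outer['_merge'] != 'both']
print("\nRows without a match:")
print(unmatched)
```

This is a convenient way to audit a join before deciding whether unmatched rows should be dropped or filled.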