# Pandas Cheat Sheet | Data Manipulation and Analysis

## Table of Contents

1. [Importing Pandas](#1)
2. [Data Structures](#2)
   * [2.1. Series](#2.1)
   * [2.2. DataFrame](#2.2)
3. [Data Manipulation](#3)
   * [3.1. Reading and Writing Data](#3.1)
   * [3.2. Data Exploration](#3.2)
   * [3.3. Data Cleaning](#3.3)
   * [3.4. Data Selection](#3.4)
   * [3.5. Data Filtering](#3.5)
   * [3.6. Data Aggregation](#3.6)
   * [3.7. Data Visualization](#3.7)
4. [DataFrame Operations](#4)
   * [4.1. Merge and Join](#4.1)
   * [4.2. Reshaping](#4.2)
   * [4.3. Pivoting](#4.3)
5. [Time Series](#5)
6. [Advanced Topics](#6)
   * [6.1. Applying Functions](#6.1)
   * [6.2. Custom Functions](#6.2)
   * [6.3. Combining DataFrames](#6.3)

# Introduction to Pandas
Pandas is a popular data manipulation library in Python. It provides data structures and functions for working with structured data, making it an essential tool for data analysis and manipulation. 

<a id = "1"></a>
# 1. Importing Pandas
Before you can start using Pandas, you need to import it into your Python environment. This can be done with a simple `import` statement.

In [1]:
import pandas as pd

<a id = "2"></a>
# 2. Data Structures
Pandas primarily deals with two main data structures: Series and DataFrame.

<a id = "2.1"></a>
### 2.1. Series
A `Series` is a one-dimensional, labeled, array-like object that can hold values of any data type. It is comparable to a single column in a spreadsheet. You can create a `Series` from a list, a NumPy array or a dictionary.

In [2]:
import pandas as pd

data = [1, 2, 3, 4, 5]
s = pd.Series(data)
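
The list example above covers one of the three input types mentioned. The following is a minimal sketch of the other two; the variable names `s_dict` and `s_arr` and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a dictionary: the keys become the index labels
s_dict = pd.Series({'a': 10, 'b': 20, 'c': 30})

# From a NumPy array, with an explicit index
s_arr = pd.Series(np.array([1.5, 2.5, 3.5]), index=['x', 'y', 'z'])

print(s_dict['b'])     # access by label
print(s_arr.iloc[0])   # access by position
```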

<a id = "2.2"></a>
### 2.2. DataFrame
A `DataFrame` is a two-dimensional table, similar to a spreadsheet. It consists of rows and columns, with each column being a `Series`. You can create a `DataFrame` from various data sources including `dictionaries`, `lists` or external files like CSV.

In [3]:
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
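
Besides a dictionary of columns, a `DataFrame` can also be built row by row. The sketch below assumes two illustrative inputs (a list of dictionaries and a list of lists); the variable names are hypothetical.

```python
import pandas as pd

# From a list of dictionaries: each dictionary is one row
rows = [
    {'Name': 'Alice', 'Age': 25},
    {'Name': 'Bob', 'Age': 30},
]
df_from_records = pd.DataFrame(rows)

# From a list of lists: column names must be supplied separately
df_from_lists = pd.DataFrame([['Alice', 25], ['Bob', 30]], columns=['Name', 'Age'])

# Selecting one column returns a Series
age_column = df_from_records['Age']
print(type(age_column).__name__)
```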

<a id = "3"></a>
# 3. Data Manipulation
Data manipulation is a crucial aspect of working with Pandas. It includes reading and writing data, data exploration, cleaning, selection, filtering, aggregation and visualization.

<a id = "3.1"></a>
### 3.1. Reading and Writing Data
Pandas can read and write data from various sources like CSV, Excel, SQL databases and more.

In [None]:
df = pd.read_csv('data.csv')   # Reading data from a CSV file
df.to_csv('new_data.csv', index=False)  # Writing data to a CSV file
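
To try the CSV round trip without touching the file system, an in-memory buffer can stand in for the file. This is a small sketch with made-up data; `csv_text` and `csv_out` are illustrative names.

```python
import io
import pandas as pd

csv_text = "Name,Age\nAlice,25\nBob,30\n"

# read_csv accepts file paths as well as file-like objects
df = pd.read_csv(io.StringIO(csv_text))

# to_csv returns the CSV text as a string when no path is given
csv_out = df.to_csv(index=False)
print(csv_out)
```

Passing `index=False` keeps the row index out of the output, so reading the file back does not produce an extra unnamed column.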

<a id = "3.2"></a>
### 3.2. Data Exploration
Data exploration helps you get an overview of your dataset. Commonly used methods include `head()`, `tail()`, `describe()` and `info()`.

In [None]:
df.head()   # Display the first 5 rows of a DataFrame
df.tail()  # Display the last 5 rows of a DataFrame
df.describe() # Generate summary statistics of the numeric columns of a DataFrame
df.info()  # Display information about the DataFrame
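
As a quick illustration of what these calls return, here is a sketch on a small made-up DataFrame. The sample data and variable names are assumptions for the example only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 40],
})

first_two = df.head(2)                 # head() and tail() accept a row count
summary = df.describe()                # count, mean, std, min, quartiles, max
mean_age = summary.loc['mean', 'Age']  # the summary is itself a DataFrame
n_rows, n_cols = df.shape              # shape is an attribute, not a method

print(mean_age, n_rows, n_cols)
```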

<a id = "3.3"></a>
### 3.3. Data Cleaning
Data cleaning involves handling missing values, duplicate data, and other issues. `dropna()`, `fillna()` and `drop_duplicates()` are frequently used functions in data cleaning. Each returns a new object by default, so assign the result if you want to keep it.

In [None]:
df.dropna()   # Remove rows with missing values

df.fillna(0)  # Fill missing values with a specified value
df['Age'] = df['Age'].fillna(df['Age'].mean())  # Replace missing values with the mean of the column (assign back instead of using inplace=True on a selected column)

df.drop_duplicates()  # Remove duplicate rows
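
The cleaning calls above can be combined on a small example. This is a sketch with made-up data, showing that each call returns a new object and leaves the original untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Bob', 'Charlie'],
    'Age': [25.0, 30.0, 30.0, np.nan],
})

# Remove the repeated 'Bob' row
deduped = df.drop_duplicates()

# Fill the missing age with the mean of the remaining ages
filled = deduped.assign(Age=deduped['Age'].fillna(deduped['Age'].mean()))

# Alternatively, drop rows that still contain missing values
dropped = deduped.dropna()

print(filled)
print(len(df), len(deduped), len(dropped))
```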

<a id = "3.4"></a>
### 3.4. Data Selection
You can select specific rows and columns from a `DataFrame` by column name, by label (`loc`) or by integer position (`iloc`).

In [None]:
df['Column']  # Select a single column
df[['Column1', 'Column2']]  # Select multiple columns
df.iloc[row_index, col_index]  # Select by integer position (row, column)

df[df['Column'] > 5]  # Select rows based on a condition
df[(df['Column1'] > 5) & (df['Column2'] < 10)]  # Multiple conditions

df.loc['Label']  # Label-based selection
df.iloc[3]  # Integer-based selection

# More advanced selection
df.loc['Label', 'Column']
df.iloc[2:4, 0:2]
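
The difference between `loc` and `iloc` is easiest to see on a DataFrame whose index is not the default 0, 1, 2. The following sketch uses made-up labels and data.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 30, 35], 'City': ['Paris', 'Oslo', 'Rome']},
    index=['alice', 'bob', 'charlie'],
)

by_label = df.loc['bob', 'Age']       # row label, column label
by_position = df.iloc[1, 0]           # row position, column position

label_slice = df.loc['alice':'bob']   # label slices include the end label
position_slice = df.iloc[0:1]         # position slices exclude the end position

print(by_label, by_position, len(label_slice), len(position_slice))
```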

<a id = "3.5"></a>
### 3.5. Data Filtering
Filtering allows you to create subsets of data based on conditions.

In [None]:
# Filter data based on multiple conditions
df[(df['Age'] > 30) & (df['Gender'] == 'Female')]
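
Conditions can also be combined with `|` (or), negated with `~`, or expressed with `isin()`. The sketch below uses made-up data; note that each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `==`.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 32, 35, 41],
    'Gender': ['Female', 'Male', 'Male', 'Female'],
})

older_women = df[(df['Age'] > 30) & (df['Gender'] == 'Female')]   # both conditions
young_or_male = df[(df['Age'] < 30) | (df['Gender'] == 'Male')]   # either condition
not_male = df[~(df['Gender'] == 'Male')]                          # negation
selected = df[df['Name'].isin(['Alice', 'Bob'])]                  # membership test

print(older_women['Name'].tolist())
```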

<a id = "3.6"></a>
### 3.6. Data Aggregation
Aggregation summarizes data. Typically you split the rows into groups with `groupby()` and then reduce each group with a function such as `sum()` or `mean()`.

In [None]:
df.groupby('City')['Sales'].mean()     # Group data by a column and calculate the mean of another column
df.groupby('Category')['Sales'].sum()  # Group data by a column and calculate the sum

grouped = df.groupby('Category')  # Store the GroupBy object to reuse it for several aggregations
grouped['Sales'].sum()
grouped['Sales'].mean()
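
Several aggregations can be computed in one pass with `agg()`. This is a minimal sketch on made-up sales data; the column names mirror the ones used above.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Paris', 'Paris', 'Oslo', 'Oslo'],
    'Sales': [100, 200, 50, 150],
})

grouped = df.groupby('City')            # group keys are sorted by default
mean_sales = grouped['Sales'].mean()    # Series indexed by City
stats = grouped['Sales'].agg(['sum', 'mean', 'count'])  # one column per function

print(stats)
```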

<a id = "3.7"></a>
### 3.7. Data Visualization
Pandas integrates with Matplotlib for data visualization.

In [None]:
import matplotlib.pyplot as plt


df['Column'].plot(kind='bar')  # Basic plot

df['Sales'].plot(kind='bar')   # Create a bar chart
plt.show()

df.plot(x='X', y='Y', title='Custom Plot', color='red')  # Customized plot

<a id = "4"></a>
# 4. DataFrame Operations

<a id = "4.1"></a>
### 4.1. Merge and Join
Merging and joining DataFrames is crucial for combining data from multiple sources.

In [None]:
pd.merge(df1, df2, on='common_column')  # Merge two DataFrames on a common column
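
By default `pd.merge` performs an inner join. The `how` parameter selects other join types, and `indicator=True` adds a `_merge` column showing where each row came from. The sketch below uses two made-up tables, `customers` and `orders`.

```python
import pandas as pd

customers = pd.DataFrame({'customer_id': [1, 2, 3],
                          'name': ['Alice', 'Bob', 'Charlie']})
orders = pd.DataFrame({'customer_id': [1, 1, 3, 4],
                       'amount': [20, 35, 15, 50]})

# Inner join (default): only customer_ids present in both tables
inner = pd.merge(customers, orders, on='customer_id')

# Left join: keep every customer, with NaN where there is no order
left = pd.merge(customers, orders, on='customer_id', how='left')

# Outer join: keep everything, and record the source of each row
outer = pd.merge(customers, orders, on='customer_id', how='outer', indicator=True)

print(len(inner), len(left), len(outer))
```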

<a id = "4.2"></a>
### 4.2. Reshaping
Reshaping changes the layout of a DataFrame without changing its data. Common operations are pivoting, melting and stacking.

In [None]:
pd.pivot_table(df, values='Value', index='Index', columns='Column')    # Pivot a DataFrame, aggregating duplicate entries (mean by default)
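
Melting and stacking go in the opposite direction, from wide to long. The following sketch assumes a small made-up table of scores; the column names are illustrative.

```python
import pandas as pd

wide = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Math': [90, 80],
    'Art': [70, 60],
})

# melt: one row per (Name, Subject) pair
long_df = wide.melt(id_vars='Name', var_name='Subject', value_name='Score')

# stack: move the column labels into an inner index level
stacked = wide.set_index('Name').stack()

print(long_df)
print(stacked.loc[('Alice', 'Math')])
```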

<a id = "4.3"></a>
### 4.3. Pivoting
Pivoting is used to transform data from long to wide format.

In [None]:
df.pivot(index='Date', columns='Variable', values='Value')   # Pivot a long-format DataFrame to wide-format
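
A worked example makes the long-to-wide transformation concrete. The sketch below uses made-up measurements and also shows a common pitfall: `pivot` raises a `ValueError` when an index/column pair occurs more than once, whereas `pivot_table` aggregates the duplicates.

```python
import pandas as pd

long_df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'Variable': ['temp', 'rain', 'temp', 'rain'],
    'Value': [5.0, 1.2, 7.0, 0.0],
})

# One row per Date, one column per Variable
wide_df = long_df.pivot(index='Date', columns='Variable', values='Value')

# Add a duplicate (Date, Variable) pair to show the difference
dupes = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
try:
    dupes.pivot(index='Date', columns='Variable', values='Value')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table averages the duplicated entries instead of failing
averaged = dupes.pivot_table(index='Date', columns='Variable', values='Value')

print(wide_df)
```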

<a id = "5"></a>
# 5. Time Series
Pandas provides tools for working with time series data.

In [None]:
df['Date'] = pd.to_datetime(df['Date'])   # Convert a column of date strings to datetime values
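
Once a column holds datetime values, it can serve as the index for date-based slicing and resampling. This is a minimal sketch on made-up daily sales; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-02', '2024-01-08', '2024-01-09'],
    'Sales': [10, 20, 30, 40],
})
df['Date'] = pd.to_datetime(df['Date'])

ts = df.set_index('Date')                          # DatetimeIndex enables date slicing
first_week = ts.loc['2024-01-01':'2024-01-07']     # slice with date strings
weekly_total = ts['Sales'].resample('W').sum()     # weekly bins ending on Sunday
day_names = df['Date'].dt.day_name()               # datetime accessors via .dt

print(weekly_total)
```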

<a id = "6"></a>
# 6. Advanced Topics

<a id = "6.1"></a>
### 6.1. Applying Functions
Apply custom functions to your data.

In [None]:
df['Age'] = df['Age'].apply(lambda x: x + 1)  # Apply a custom function to a column
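
For simple arithmetic, a vectorized expression gives the same result as `apply()` without a Python-level loop over the values. The sketch below compares the two on made-up data and shows `apply()` with an ordinary function.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

with_apply = df['Age'].apply(lambda x: x + 1)   # calls the lambda once per value
vectorized = df['Age'] + 1                      # same result, computed on the whole column

name_lengths = df['Name'].apply(len)            # apply() accepts any callable
upper_names = df['Name'].str.upper()            # string methods via the .str accessor

print(with_apply.tolist(), vectorized.tolist())
```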

<a id = "6.2"></a>
### 6.2. Custom Functions
Define custom functions for more complex data manipulation.

In [None]:
def categorize_age(age):     # Define a custom function
    if age < 30:
        return 'Young'
    else:
        return 'Old'

df['Age_Category'] = df['Age'].apply(categorize_age)   # Apply the custom function and store the result in a new column
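
A custom function can also receive a whole row when `apply()` is called on the DataFrame with `axis=1`. The sketch below uses made-up data and a hypothetical helper, `describe_person`. It also shows `pd.cut` as one possible built-in alternative for the same age binning.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

def describe_person(row):
    """Hypothetical helper: build a label from several columns of one row."""
    category = 'Young' if row['Age'] < 30 else 'Old'
    return f"{row['Name']} ({category})"

# axis=1 passes each row to the function as a Series
df['Label'] = df.apply(describe_person, axis=1)

# Binning with pd.cut: [0, 30) -> 'Young', [30, 200) -> 'Old'
# (the upper bound of 200 is an arbitrary assumption for this example)
df['Age_Category'] = pd.cut(df['Age'], bins=[0, 30, 200], right=False,
                            labels=['Young', 'Old'])

print(df)
```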

<a id = "6.3"></a>
### 6.3. Combining DataFrames
Combine DataFrames horizontally or vertically.

In [None]:
new_df = pd.concat([df1, df2], axis=0)   # Concatenate two DataFrames vertically
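
The example above covers the vertical case. The sketch below adds the horizontal case (`axis=1`) and shows `ignore_index=True`, which renumbers the rows after a vertical concatenation. All three DataFrames are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})
df3 = pd.DataFrame({'City': ['Paris', 'Oslo']})

stacked = pd.concat([df1, df2], axis=0)                        # keeps original index: 0, 1, 0
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)  # new index: 0, 1, 2
side_by_side = pd.concat([df1, df3], axis=1)                   # columns joined, rows aligned on index

print(side_by_side)
```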