This repository serves as an introduction to Pandas, which is a powerful data manipulation and analysis library for Python. It provides data structures like DataFrame and Series, designed to handle and manipulate large datasets efficiently. Pandas is widely used in data science and machine learning for tasks like data cleaning, exploration, and visualization.
This repository contains solutions to various LeetCode problems, where we use Pandas to manipulate and solve problems more effectively, especially those dealing with large datasets and structured data.
- Introduction
- Installation
- Pandas Basics
- Pandas Cheat Sheet
- LeetCode Problems
- Problem 1: Problem Title
- Problem 2: Problem Title
- Useful Links
Pandas is an open-source data manipulation and analysis library for Python. It offers data structures like Series (for one-dimensional data) and DataFrame (for two-dimensional data) that make it easy to manipulate, analyze, and visualize data in various formats (CSV, Excel, SQL databases, JSON, etc.).
- Data Structures: Pandas primarily offers two data structures:
  - Series: A one-dimensional labeled array, similar to a list or array.
  - DataFrame: A two-dimensional labeled data structure, like a table or spreadsheet, where each column can be a different type.
- Data Alignment: Automatically aligns data based on labels or indexes during operations.
- Handling Missing Data: Pandas provides methods to handle missing data in datasets.
- Data Aggregation: Allows you to group data and perform operations like sum, mean, count, etc.
- Merging and Joining: Easily combine different datasets using joins or merges.
- Filtering and Sorting: Provides rich functionality for filtering and sorting data based on multiple conditions.
- Efficiency: Pandas operations are vectorized and built on NumPy arrays, so column-wise computations on large datasets are typically much faster than equivalent loops over plain Python lists or dictionaries.
- Easy to Use: With just a few lines of code, you can manipulate and analyze data easily.
- Integration: It integrates well with other libraries such as NumPy, Matplotlib, and Scikit-learn, making it ideal for data science and machine learning workflows.
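The label-based alignment and missing-data behavior described above can be sketched with two small Series. The data and variable names here are illustrative assumptions, not taken from the repository's solutions.

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative data).
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Addition matches values by label, not by position.
# A label present in only one Series produces NaN in the result.
total = a + b

# fill_value treats a missing label as 0 instead of producing NaN.
total_filled = a.add(b, fill_value=0)

print(total)
print(total_filled)
```

Only the labels "y" and "z" exist in both Series, so only those two get a numeric sum in `total`; `total_filled` has a value for every label.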
To get started with Pandas, install the library using the following command:
pip install pandas
This also installs NumPy, which Pandas relies on for numerical computations, so a separate installation is normally unnecessary. To install it explicitly:
pip install numpy
Here are some fundamental Pandas operations you'll use to solve problems in this repository.
import pandas as pd
A DataFrame is the core data structure in Pandas, and it can be created from various data sources like lists, dictionaries, or external files.
# Creating a DataFrame from a dictionary
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 24, 35, 32]}
df = pd.DataFrame(data)
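The other sources mentioned above (lists and external files) follow the same pattern. The sketch below is a minimal illustration; the sample values are made up, and `io.StringIO` stands in for a CSV file on disk so the example runs without any external file.

```python
import io

import pandas as pd

# From a list of rows, with explicit column names.
rows = [["John", 28], ["Anna", 24]]
df_rows = pd.DataFrame(rows, columns=["Name", "Age"])

# From a list of dictionaries (one dictionary per row).
records = [{"Name": "Peter", "Age": 35}, {"Name": "Linda", "Age": 32}]
df_records = pd.DataFrame(records)

# From CSV text. pd.read_csv also accepts a file path;
# the in-memory buffer is used here only to keep the example self-contained.
csv_text = "Name,Age\nJohn,28\nAnna,24\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_rows)
print(df_records)
print(df_csv)
```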
- View the first few rows:
print(df.head())
- Get DataFrame info:
df.info() # Prints the summary itself and returns None, so no print() is needed
- Descriptive statistics:
print(df.describe())
- Select a single column:
df['Name']
- Select multiple columns:
df[['Name', 'Age']]
- Row selection by position:
df.iloc[0] # Selects the first row
- Handling missing values:
df.isna().sum() # Check for missing values
df.dropna() # Drop rows with missing values
df.fillna(value=0) # Replace NaN with a specific value
- Group by a column:
df.groupby('Age').mean(numeric_only=True) # Group by 'Age' and average the numeric columns; numeric_only=True skips text columns such as 'Name'
- Merge two DataFrames:
df1.merge(df2, on='column_name')
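The missing-value, grouping, and merge operations above can be tried together on a small dataset. This is a sketch under stated assumptions: the `employees` and `departments` tables, their column names, and their values are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
employees = pd.DataFrame({
    "name": ["John", "Anna", "Peter", "Linda"],
    "dept_id": [1, 1, 2, 2],
    "salary": [50000.0, np.nan, 70000.0, 65000.0],
})
departments = pd.DataFrame({
    "dept_id": [1, 2],
    "dept_name": ["Sales", "Engineering"],
})

# Count missing values per column.
missing = employees.isna().sum()

# fillna returns a new DataFrame; the original is left unchanged.
cleaned = employees.fillna({"salary": 0.0})

# Mean salary per department. By default, mean() skips NaN values.
mean_salary = employees.groupby("dept_id")["salary"].mean()

# merge performs an inner join on the shared key column by default.
merged = employees.merge(departments, on="dept_id")

print(missing)
print(mean_salary)
print(merged[["name", "dept_name"]])
```

Note that `dropna()` and `fillna()` do not modify the DataFrame in place unless asked to, so the result has to be assigned to a variable to be kept.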
---
This repository includes solutions to various LeetCode problems using Pandas. Below are some examples:
Problem 1: Problem Title
- Description: [Brief description of the problem]
- Solution:
# Pandas solution code
Problem 2: Problem Title
- Description: [Brief description of the problem]
- Solution:
# Pandas solution code
Feel free to explore the other problems and their solutions in the solutions/ folder.
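For orientation, LeetCode's Pandas problems typically ask for a function that takes one or more DataFrames and returns a DataFrame. The sketch below shows that general shape only. The function name `filter_by_min_age`, its `min_age` parameter, and the sample table are hypothetical and do not correspond to any specific problem or solution in this repository.

```python
import pandas as pd


def filter_by_min_age(people: pd.DataFrame, min_age: int = 30) -> pd.DataFrame:
    """Return Name and Age for rows with Age >= min_age, oldest first."""
    # Boolean mask: True for rows that satisfy the condition.
    mask = people["Age"] >= min_age
    # .loc selects the matching rows and the requested columns in one step.
    result = people.loc[mask, ["Name", "Age"]]
    # Sort descending by age and renumber the rows from 0.
    return result.sort_values("Age", ascending=False).reset_index(drop=True)


# Hypothetical input, reusing the sample data from the basics section.
people = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 35, 32],
})
print(filter_by_min_age(people))
```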