# Introduction to Pandas DataFrames

![Panda hugging a python (UNIGIS, 2024)](/Pandas.webp)

[Pandas](https://pandas.pydata.org/) is a powerful and versatile library for Python, designed primarily for data manipulation and analysis. To quote from Nvidia’s website:

> Pandas is the most popular software library for data manipulation and data analysis for the Python programming language. ([www.nvidia.com](https://www.nvidia.com/en-us/glossary/pandas-python/))

Here is a non-exhaustive list of key functionality provided by Pandas:

1. **Data Structures**
    - *Series*: One-dimensional labeled array capable of holding data of any type.
    - *DataFrame*: Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
2. **Data Manipulation**
    - *Data Selection and Indexing*: Access data via labels, indices, or boolean masks (`.loc`, `.iloc`, `.at`, `.iat`).
    - *Filtering*: Filter data based on conditions or queries.
    - *Sorting*: Sort data by labels or values.
    - *Handling Missing Data*: Identify, fill, or drop missing values (`isnull`, `dropna`, `fillna`).
3. **Data Cleaning**
    - *Dropping Duplicates*: Remove duplicate rows or columns.
    - *Replacing Values*: Replace specific values in the DataFrame.
    - *String Operations*: Perform operations on string data, like splitting, replacing, and pattern matching (`str.split`, `str.replace`).
4. **Aggregation and Grouping**
    - *Group By*: Split data into groups based on criteria, and perform aggregate functions like sum, mean, or custom operations.
    - *Pivot Tables*: Create a pivot table to summarize data.
5. **Merging and Joining**
    - *Concatenation*: Combine multiple DataFrames along a particular axis.
    - *Merging*: Combine DataFrames on shared keys, similar to SQL joins (`merge`, `join`).
6. **Time Series**
    - *Datetime Conversion*: Convert date and time data to a datetime object.
    - *Resampling*: Aggregate data over a time period.
    - *Time-based Indexing*: Access and manipulate time-series data easily with date indexing.
7. **Statistical and Mathematical Operations**
    - *Descriptive Statistics*: Compute summary statistics for DataFrame columns.
    - *Correlation/Covariance*: Calculate the pairwise correlation or covariance between columns.
    - *Cumulative Operations*: Perform cumulative operations on data.
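
A few of the operations listed above can be sketched in a handful of lines. The data, the column names (`city`, `units`, `region`), and the variable names below are invented purely for illustration; the sketch only touches on missing data, filtering, sorting, grouping, and merging.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
sales = pd.DataFrame({
    "city": ["Vienna", "Salzburg", "Vienna", "Graz", "Salzburg"],
    "units": [10, 4, None, 7, 6],
})

# Handling missing data: replace the missing value with 0.
sales["units"] = sales["units"].fillna(0)

# Filtering: keep only rows where more than 5 units were sold.
large_orders = sales[sales["units"] > 5]

# Sorting: order rows by the number of units, largest first.
sorted_sales = sales.sort_values("units", ascending=False)

# Group By: total units per city.
units_per_city = sales.groupby("city")["units"].sum()

# Merging: attach a region to each row, similar to a SQL left join.
regions = pd.DataFrame({
    "city": ["Vienna", "Salzburg", "Graz"],
    "region": ["East", "West", "South"],
})
merged = sales.merge(regions, on="city", how="left")

print(units_per_city)
```

Each step returns a new object rather than changing `sales` in place, which is why the results are assigned to separate names.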

At the heart of Pandas lies the DataFrame, a two-dimensional labeled data structure with columns of potentially different types, similar to a table in a relational database or an Excel spreadsheet. Understanding DataFrames is crucial for anyone looking to perform data analysis in Python.

## **What is a DataFrame?**

A DataFrame is a table-like structure in Pandas that consists of rows and columns, where each column can hold a different data type (e.g., integers, floats, strings). You can think of it as a collection of Series objects that share the same row index, where each Series is a single column of data. DataFrames provide an efficient way to store and manipulate large datasets in memory.
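
The relationship between a DataFrame and its columns can be checked directly. The following is a minimal sketch with made-up values; the column names `Name`, `Age`, and `Height` are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical example data, chosen only for illustration.
df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "Height": [1.68, 1.82],
})

# Selecting a single column returns a Series.
age = df["Age"]
print(type(age))

# Each column carries its own data type.
print(df.dtypes)

# The column shares its row index with the DataFrame it came from.
print(age.index.equals(df.index))
```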

## **Creating a DataFrame**

There are several ways to create a DataFrame in Pandas. Three of the most common are:

1. From a Dictionary
2. From a List of Lists
3. From a CSV File

Below we take a look at the first two approaches.

## **Creating a DataFrame from a Dictionary**

To use Pandas, we first have to import it. This is done with the command `import pandas as pd`, which introduces the alias `pd` for Pandas. The following code then creates a DataFrame with three columns (`Name`, `Age`, and `City`) and three rows. Each dictionary key becomes a column name, and each list holds the values of that column.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)
```

```
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
```


## **Creating a DataFrame from a List of Lists**

Here, we create the DataFrame from a list of lists, where each inner list holds the values for one row. Note that we explicitly specify the column names when creating the DataFrame, because a list of lists carries no column labels of its own.

```python
data = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago']
]

df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
```
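
The third approach listed earlier, reading from a CSV file, follows the same pattern. The sketch below assumes the same three people as above and, to stay self-contained, reads the CSV text from an in-memory string via `io.StringIO` rather than from a file on disk; the names `csv_text` and `df_csv` are illustrative.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file on disk.
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# pd.read_csv accepts a file path or a file-like object.
# io.StringIO wraps the string so it can be read like a file.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```

By default, `read_csv` takes the column names from the first line of the input. With a real file, the call would look like `pd.read_csv("people.csv")`, where `people.csv` is a hypothetical file name.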

## **Accessing Data in a DataFrame**

Once you have a DataFrame, you can access its data in various ways:

- By column name:

```python
print(df['Name'])
```

- By row position, using `.iloc`:

```python
print(df.iloc[0])
```

- By row label and column name, using `.loc`:

```python
print(df.loc[0, 'Name'])
```
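
With the default index (0, 1, 2), row positions and row labels coincide, so the difference between `.iloc` and `.loc` is easy to miss. The following sketch rebuilds the same DataFrame with string labels `"a"`, `"b"`, `"c"` (an assumption made only for this illustration) to make the distinction visible, and also shows selection with a boolean mask.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    },
    # Hypothetical string labels, used here instead of the default 0, 1, 2.
    index=["a", "b", "c"],
)

# .iloc selects by integer position: the first row, whatever its label.
first_row = df.iloc[0]

# .loc selects by label: the row labeled "b" and the column "Name".
name_b = df.loc["b", "Name"]

# A boolean mask keeps only the rows where the condition is True.
over_28 = df[df["Age"] > 28]

print(first_row["Name"], name_b, list(over_28.index))
```

In this labeled DataFrame, `df.loc[0, "Name"]` would raise a `KeyError`, because no row carries the label `0`.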

## **Conclusion**

Pandas DataFrames are a fundamental tool in the data analysis toolkit for Python users. They provide a powerful way to organize, manipulate, and analyze data efficiently. Whether you’re working with small datasets or handling large-scale data, mastering DataFrames will allow you to tackle a wide range of data-related tasks with ease.

In this introduction, we’ve covered the basics of what a DataFrame is, how to create one, and how to access columns, rows, and individual cells in a DataFrame. As you continue to explore Pandas, you’ll discover many more features and capabilities that make DataFrames an indispensable part of Python programming. The table below contains a number of resources related to Pandas.

| **Resource**                                                                                      | **Description**                                     |
| :------------------------------------------------------------------------------------------------ | :-------------------------------------------------- |
| [Pandas Documentation](https://pandas.pydata.org/)                                                | Official documentation for Pandas.                  |
| [Python for Data Analysis](https://www.oreilly.com/library/view/python-for-data/9781491957653/)   | Comprehensive guide by Pandas creator, Wes McKinney.|
| [Real Python: The Pandas DataFrame](https://realpython.com/pandas-dataframe/)                     | Tutorials on using Pandas for data analysis.        |
| [Kaggle Pandas](https://www.kaggle.com/learn/pandas)                                              | Free introductory course on Pandas by Kaggle.       |
| [DataCamp Pandas Tutorial](https://www.datacamp.com/tutorial/pandas-tutorial-dataframe-python)    | Detailed tutorials and exercises on Pandas.         |