# **Introduction to Pandas DataFrames**

<img src="Pandas.webp" alt="Pandas image" width="750" height="auto">

<a href="https://pandas.pydata.org/#italic" style="color: rgb(0,90,250); text-decoration: underline;text-decoration-style: dotted;">Pandas</a> is a powerful and versatile library for Python, designed primarily for data manipulation and  
analysis. To quote from Nvidia’s website:

> Pandas is the most popular software library for data manipulation and data analysis for the Python  
programming language. <a href="https://www.nvidia.com/en-us/glossary/pandas-python" style="color: rgb(0,90,250); text-decoration: underline;text-decoration-style: dotted;">www.nvidia.com</a>

Here is an (incomplete) list of key functionalities provided by Pandas:

>#### 1. **Data Structures**
>>    &#9900; *Series*: One-dimensional labeled array capable of holding data of any type.   
>>    &#9900; *DataFrame*: Two-dimensional, size-mutable, potentially heterogeneous tabular data structure     
>>        with labeled axes (rows and columns).
>#### 2. **Data Manipulation**  
>>    &#9900; *Data Selection and Indexing*: Access data via labels, indices, or boolean masks (<code><span style='color:purple'>.loc, .iloc, .at, .iat</span></code>).      
>>    &#9900; *Filtering*: Filter data based on conditions or queries.  
>>    &#9900; *Sorting*: Sort data by labels or values.      
>>    &#9900; *Handling Missing Data*: Identify, fill, or drop missing values (<code><span style='color:purple'>isnull, dropna, fillna</span></code>). 
>#### 3. **Data Cleaning**  
>>    &#9900; *Dropping Duplicates*: Remove duplicate rows or columns.  
>>    &#9900; *Replacing Values*: Replace specific values in the DataFrame.  
>>    &#9900; *String Operations*: Perform operations on string data, like splitting, replacing, and pattern  
>>      matching (<code><span style='color:purple'>str.split, str.replace</span></code>).  
>#### 4. **Aggregation and Grouping** 
>>    &#9900; *Group By*: Split data into groups based on criteria, and perform aggregate functions like sum,  
>>      mean, or custom operations.  
>>    &#9900; *Pivot Tables*: Create a pivot table to summarize data.  
>#### 5. **Merging and Joining**  
>>    &#9900; *Concatenation*: Combine multiple DataFrames along a particular axis.  
>>    &#9900; *Merging*: Merge DataFrames similar to SQL joins (<code><span style='color:purple'>merge, join</span></code>).  
>#### 6. **Time Series**  
>>    &#9900; *Datetime Conversion*: Convert date and time data to a datetime object.  
>>    &#9900; *Resampling*: Aggregate data over a time period. 
>>    &#9900; *Time-based Indexing*: Access and manipulate time-series data easily with date indexing.  
>#### 7. **Statistical and Mathematical Operations**    
>>    &#9900; *Descriptive Statistics*: Compute summary statistics for DataFrame columns.  
>>    &#9900; *Correlation/Covariance*: Calculate the pairwise correlation or covariance between columns.  
>>    &#9900; *Cumulative Operations*: Perform cumulative operations on data.  

At the heart of Pandas lies the DataFrame, a two-dimensional labeled data structure with columns of   
potentially different types, similar to a table in a relational database or an Excel spreadsheet.   
Understanding DataFrames is crucial for anyone looking to perform data analysis in Python.
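
A few of the functionalities listed above can be sketched on a small table. The data, the column names (`region`, `units`, `manager`), and the variable names are made up for illustration; this is a minimal sketch, not a complete tour of the API.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "units": [10, None, 7, 3],  # None becomes a missing value (NaN)
})

# Selection: label-based (.loc) and position-based (.iloc) access.
first_region = sales.loc[0, "region"]
last_units = sales.iloc[-1, 1]

# Missing data: replace the missing 'units' value with 0.
filled = sales.fillna({"units": 0})

# Grouping: total units per region.
totals = filled.groupby("region")["units"].sum()

# Merging: attach a manager to each row, similar to a SQL left join.
managers = pd.DataFrame({
    "region": ["North", "South"],
    "manager": ["Ana", "Ben"],
})
merged = filled.merge(managers, on="region", how="left")

print(totals)
print(merged)
```

Each step returns a new object rather than modifying `sales`, so the original table stays available for comparison.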

# What is a DataFrame?

A DataFrame is a table-like structure in Pandas that consists of rows and columns, where each column can  
hold a different data type (e.g., integers, floats, strings). You can think of it as a collection of Series  
objects that share a common index, where each Series is a single column of data. DataFrames provide a  
highly efficient way to store and manipulate large datasets in memory.
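
To make the "collection of Series" picture concrete, here is a small sketch. The column names and values are hypothetical, chosen only to give each column a different data type.

```python
import pandas as pd

# Hypothetical columns, one per data type mentioned above.
df = pd.DataFrame({
    "count": [1, 2, 3],          # integers
    "price": [0.5, 1.25, 2.0],   # floats
    "label": ["a", "b", "c"],    # strings
})

# Each column carries its own data type.
print(df.dtypes)

# Selecting a single column returns a Series.
column = df["price"]
print(type(column))

# The column shares its row labels (the index) with the DataFrame.
print(column.index.equals(df.index))
```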

# Creating a DataFrame

There are several ways to create a DataFrame in Pandas. Three of the most common are:

>    1. From a Dictionary
>    2. From a List of Lists
>    3. From a CSV File

Below we take a look at the first two approaches.
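
For completeness, the third approach can be sketched as follows. To keep the example self-contained, the CSV content is held in an in-memory string (`csv_text`, with made-up rows) instead of a file on disk; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file on disk.
csv_text = "Name,Age,City\nAlice,30,Paris\nBob,25,Berlin\n"

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv)
```

The first line of the CSV is used as the column names by default.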

## Creating DataFrame from a Dictionary
The following code creates a DataFrame with three columns (‘Name’, ‘Age’, and ‘City’) and three rows  
corresponding to the data provided in the dictionary. Before we can use Pandas, we have to import it.  
This is done with the command <code><span style='color:purple'>import pandas as pd</span></code>, which introduces the conventional alias <code><span style='color:purple'>pd</span></code> for Pandas.