# Pandas in Python

**Pandas** is an open-source library in Python that provides data structures and data analysis tools for handling and manipulating structured data. It is widely used in data analysis, machine learning, data visualization, and more. Pandas is built on top of the **NumPy** library and offers two primary data structures: **Series** and **DataFrame**.

### Why do we use Pandas?

- **Data Manipulation:** Pandas makes it easy to clean, transform, and manipulate data. It offers functionalities like merging, grouping, reshaping, and aggregating data.
- **Efficient Data Handling:** Pandas works on data held in memory and applies selection, filtering, and transformation as vectorized operations over whole columns, which is typically much faster than looping over rows in Python.
- **Data Analysis:** Pandas provides functions for statistical analysis, making it a crucial tool for data science and analysis.
- **Time Series Data:** It has built-in support for working with time series data, making it suitable for financial data analysis, forecasting, and more.
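
The points above can be sketched in a few lines. This is a minimal illustration, not a recommended workflow; the `sales` table, its column names, and its values are invented for the example.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration only.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-08", "2024-01-09"]),
    "region": ["north", "south", "north", "south"],
    "amount": [100, 150, 200, 50],
})

# Filtering: keep rows whose amount is at least 100.
large = sales[sales["amount"] >= 100]

# Grouping and aggregating: total amount per region.
totals = sales.groupby("region")["amount"].sum()

# Time series: weekly totals, using the date column as the index.
weekly = sales.set_index("date")["amount"].resample("W").sum()

print(large)
print(totals)
print(weekly)
```

Each step returns a new object and leaves `sales` unchanged, which makes it easy to inspect intermediate results in a notebook.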
  
### Series

A **Series** is a one-dimensional labeled array capable of holding any data type (integer, float, string, Python objects, etc.). It resembles a list or a NumPy array, but every element also has an index label, so values can be looked up and aligned by label as well as by position.

#### Key Features of Series:
- A single column of data.
- Contains an **index** (labels) for each element in the series.
- Can hold data of any type (integers, strings, floats, etc.).
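
A short sketch of these features, assuming a made-up Series of temperature readings (the name `temps`, the labels, and the values are illustrative only):

```python
import pandas as pd

# Hypothetical temperature readings with string labels as the index.
temps = pd.Series([21.5, 19.0, 23.2], index=["mon", "tue", "wed"], name="temp_c")

print(temps["tue"])          # access by label
print(temps.iloc[0])         # access by position
print(temps.index.tolist())  # the index labels
print(temps.dtype)           # one dtype describes every value in the Series

# Mixed Python objects are allowed, but the dtype falls back to the generic `object`.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```

The last two lines show how "any data type" and "a single dtype" fit together: a Series always reports one dtype, and mixed values are stored under `object`.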

### Difference Between Series and DataFrame

Pandas provides two main data structures for handling data: **Series** and **DataFrame**. A Series represents a single labeled column, while a DataFrame represents a table made of such columns. The comparison below summarizes the differences:

| **Feature**          | **Series**                                           | **DataFrame**                                         |
|----------------------|------------------------------------------------------|------------------------------------------------------|
| **Dimensionality**    | One-dimensional (1D) array-like structure            | Two-dimensional (2D) table-like structure            |
| **Structure**         | A single column of data with an associated index.    | Multiple columns of data, with each column potentially having a different data type. |
| **Data Types**        | Has one dtype for the whole Series (e.g., `int64`, `float64`); mixed values are stored under the generic `object` dtype. | Can hold a different dtype in each column (e.g., integers, floats, strings). |
| **Indexing**          | Accessed via an index (labels for each element).     | Accessed via both row and column labels. Columns can be accessed by column name. |
| **Use Case**          | Ideal for handling one-dimensional data (e.g., list of numbers, strings). | Ideal for handling two-dimensional, tabular data (e.g., spreadsheets, database tables). |
| **Operations**        | Supports vectorized operations like filtering and transforming data. | Supports more complex operations like merging, reshaping, and aggregating data. |
| **Example**           | `pd.Series([1, 2, 3, 4, 5])` prints the values 1 to 5 against the default index 0 to 4, with `dtype: int64`. | `pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})` prints two rows (index 0 and 1) under the columns `Name` and `Age`. |
| **Data Modification** | Supports operations like slicing, filtering, and transformation. | Supports similar operations plus merging, joining, and handling hierarchical data. |
| **Memory Efficiency** | Lower overhead, since it stores a single column and its index. | Higher overhead, since it stores several columns plus row and column labels; memory use grows with the number of columns. |
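
To make the DataFrame column of the comparison concrete, here is a minimal sketch that reuses the `Name`/`Age` data from the table. It only illustrates per-column dtypes and label-based access.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

print(df)
print(df.dtypes)           # each column keeps its own dtype
print(df["Age"].tolist())  # column access by name
print(df.loc[0, "Name"])   # single value by row label and column label
print(df.shape)            # (number of rows, number of columns)
```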

### Summary:
- **Series**: A single column of data with an index, best suited for one-dimensional data.
- **DataFrame**: A table-like structure with multiple columns and rows, best suited for two-dimensional data and complex data analysis.

Series and DataFrame are the core pandas data structures for working with structured data in Python. A Series holds one-dimensional data, while a DataFrame holds two-dimensional, tabular data.
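
The two structures are closely related, which the following sketch tries to show: selecting one column of a DataFrame yields a Series, and a Series can be turned back into a one-column DataFrame. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

ages = df["Age"]                    # one column -> Series
print(type(ages).__name__)
print(ages.index.equals(df.index))  # the column shares the table's row index

ages_frame = ages.to_frame()        # Series -> one-column DataFrame
print(type(ages_frame).__name__)
print(ages_frame.columns.tolist())
```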

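**Example:** creating a Series from a list

```python
import pandas as pd

# Build a Series from a list; pandas assigns the default index 0 to 4.
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print(series)
```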

### Installing and Running Pandas

##### Installation

Run `pip install pandas` in a notebook cell or a terminal. In a notebook, you may need to restart the kernel before the newly installed package can be used.



##### Install a Specific Version

Pin the version with `==`, for example `pip install pandas==1.0.4`.

##### Upgrading pandas

Run `pip install pandas --upgrade`. In a notebook, you may need to restart the kernel to use the updated package.


###### Note
- To change a cell to Markdown, press `Esc` and then `M`; to switch back to a code cell, press `Esc` and then `Y`.
- To insert a cell above the current one, press `Esc` and then `A`; to insert one below, press `Esc` and then `B`.

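#### Using pandas

##### Importing

Import pandas under its conventional alias with `import pandas as pd`; the import itself produces no output. To check which version is installed, evaluate `pd.__version__`. In this notebook it returned `'2.2.3'`.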
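
If a notebook depends on a particular release, the version string can also be checked in code. The sketch below is one possible approach; the threshold `MIN_VERSION` is a hypothetical value chosen for the example, not a requirement stated anywhere in this document.

```python
import pandas as pd

# Hypothetical minimum version, chosen only for this example.
MIN_VERSION = (2, 0)

# Compare the major and minor components of the version string numerically.
installed = tuple(int(part) for part in pd.__version__.split(".")[:2])

if installed >= MIN_VERSION:
    print(f"pandas {pd.__version__} meets the minimum {MIN_VERSION}")
else:
    print(f"pandas {pd.__version__} is older than {MIN_VERSION}; consider upgrading")
```

Comparing tuples of integers avoids the pitfalls of comparing version strings directly, where `"10"` would sort before `"9"`.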