# Introduction

Data manipulation is an essential skill in data science: it lets practitioners transform, clean, and reshape data for analysis and modeling. Pandas, a widely used Python library, is central to this process. This document covers why data manipulation matters, the role Pandas plays, and how to create and explore a Pandas Series.

## Importance of Data Manipulation in Data Science and Analysis

Data manipulation involves changing the structure and format of data to make it useful for analysis. It is crucial because:

- Data Quality: Raw data is often messy, incomplete, or inaccurate. Data manipulation helps clean and validate it.
- Data Reshaping: Data may not always be in the ideal format for analysis. Data manipulation allows you to reshape it for tasks like merging, aggregating, or pivoting.
- Feature Engineering: Creating new features from existing data is vital for improving machine learning models.
- Exploration and Visualization: Preparing data for exploratory data analysis (EDA) helps visualize patterns and relationships.

Without proper data manipulation, even advanced algorithms and techniques can't provide reliable results.

## Pandas: A Library for Handling Structured Data in Python

Pandas is a powerful Python library for data manipulation and analysis. It offers flexible and efficient tools for working with structured data. Key features include:

- DataFrames and Series: Pandas uses DataFrames (2D tables) and Series (1D arrays) to represent data, similar to tables in databases or Excel.
- Comprehensive Data Operations: Pandas allows you to filter, sort, group, merge, concatenate, pivot, and reshape data easily.
- Handling Missing Data: Pandas provides methods to detect and handle missing or null values.
- Integration with Other Libraries: Pandas works well with other Python libraries like NumPy, SciPy, and matplotlib.
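
To make the features listed above concrete, here is a minimal sketch that detects and fills missing values, then filters, sorts, and groups a small table. The `region` and `units` columns and their values are hypothetical, and filling with 0 is only one of several possible strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; column names and values are made up for illustration.
df = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "units": [10, np.nan, 30, 20],
})

# Detect missing values: one boolean per row.
missing_mask = df["units"].isna()

# Handle them; filling with 0 is an assumption, not a general recommendation.
df["units"] = df["units"].fillna(0)

# Filter rows, sort the result, and aggregate by group.
large_orders = df[df["units"] >= 20].sort_values("units", ascending=False)
units_by_region = df.groupby("region")["units"].sum()

print(large_orders)
print(units_by_region)
```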

## Creating a Pandas Series

We can create a Pandas Series from lists, arrays, and dictionaries. The process is straightforward, and we can customize the index as needed.

### From Lists

```python
import pandas as pd

# Create a Series from a list
data = [5, 10, 15, 20]
s = pd.Series(data, index=["A", "B", "C", "D"])

print("Series from list:")
print(s)
```

Here, a Series is created from a list with custom index labels.
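
As a small sketch of what customizing the index changes, the following compares the default integer index with custom labels. The variable names `s_default` and `s_labeled` are introduced here for illustration only.

```python
import pandas as pd

data = [5, 10, 15, 20]

# Without an index argument, pandas assigns integer labels 0..n-1.
s_default = pd.Series(data)

# With custom labels, a value can be looked up by label or by position.
s_labeled = pd.Series(data, index=["A", "B", "C", "D"])

print(s_default.index.tolist())  # [0, 1, 2, 3]
print(s_labeled["C"])            # label-based lookup
print(s_labeled.iloc[2])         # position-based lookup of the same element
```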

### From Arrays

```python
import numpy as np
import pandas as pd

# Create a Series from a NumPy array
array_data = np.array([2.5, 3.5, 4.5, 5.5])
s_from_array = pd.Series(array_data, index=["X", "Y", "Z", "W"])

print("Series from array:")
print(s_from_array)
```

In this example, a Series is created from a NumPy array, demonstrating integration with other libraries.
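
The integration works in both directions. The sketch below, which uses illustrative variable names, shows that the Series keeps the array's dtype, that a NumPy function applied to a Series returns a Series with the same index, and that `to_numpy()` converts back to a plain array.

```python
import numpy as np
import pandas as pd

array_data = np.array([2.5, 3.5, 4.5, 5.5])
s_from_array = pd.Series(array_data, index=["X", "Y", "Z", "W"])

# The Series keeps the array's floating-point dtype.
print(s_from_array.dtype)

# NumPy universal functions accept a Series and preserve its index labels.
doubled = np.multiply(s_from_array, 2)
print(doubled)

# Convert back to a plain NumPy array when a library expects one.
back_to_array = s_from_array.to_numpy()
print(type(back_to_array))
```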

### From Dictionaries

```python
import pandas as pd

# Create a Series from a dictionary
dict_data = {"Red": 1, "Green": 2, "Blue": 3}
s_from_dict = pd.Series(dict_data)

print("Series from dictionary:")
print(s_from_dict)
```

Creating a Series from a dictionary uses the dictionary keys as the index, making it easy to convert dictionary data to a Pandas structure.
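
The index can still be customized when the data comes from a dictionary. In this sketch, the `index` argument selects and orders keys; the label `"Yellow"` is a hypothetical key that is absent from the dictionary, so its value becomes `NaN` and the Series is stored as floating point.

```python
import pandas as pd

dict_data = {"Red": 1, "Green": 2, "Blue": 3}

# Only the listed labels are kept, in the order given.
# "Yellow" has no matching key, so pandas fills it with NaN.
s_subset = pd.Series(dict_data, index=["Blue", "Red", "Yellow"])

print(s_subset)
print(s_subset.isna().tolist())
```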

## Exploring Series Attributes and Methods

A Pandas Series has several useful attributes and methods for data manipulation, supporting operations on individual elements or on the entire Series.

### Common Attributes

- `s.index`: Returns the index labels of the Series.
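- `s.values`: Returns the underlying values in the Series, as a NumPy array for standard numeric dtypes. The pandas documentation recommends `s.to_numpy()` when a NumPy array is specifically required.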
- `s.dtype`: Provides the data type of the Series elements.
- `s.size`: Returns the number of elements in the Series.

#### Example: Exploring Series attributes

```python
print("Index:", s.index)
print("Values:", s.values)
print("Data type:", s.dtype)
print("Size:", s.size)
```

These attributes give you a quick overview of the Series, helping you understand its structure and basic characteristics.

### Common Methods

Pandas Series methods cover selection, sorting, summary statistics, string handling, and element-wise transformation.

- `s.head(n)`: Returns the first n elements of the Series.
- `s.tail(n)`: Returns the last n elements.
- `s.sort_values()`: Sorts the Series by its values.
- `s.mean()`, `s.median()`, `s.std()`: Compute common statistics.
- `s.str`: An accessor (not a method itself) that exposes string manipulation methods when the Series contains string data.
- `s.apply(func)`: Applies a function to each element in the Series.

#### Example: Using Series methods

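```python
# sep="\n" starts each Series on its own line so the printed output stays aligned
print("First two elements:", s.head(2), sep="\n")
print("Last two elements:", s.tail(2), sep="\n")
print("Sorted values:", s.sort_values(), sep="\n")
print("Mean of the Series:", s.mean())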
```
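
The `s.str` and `s.apply(func)` entries under Common Methods need string data to be meaningful, so here is a separate minimal sketch. The `colors` Series and its values are hypothetical and chosen only to show inconsistent casing and stray whitespace.

```python
import pandas as pd

# Hypothetical color names with inconsistent casing and whitespace.
colors = pd.Series([" red", "GREEN ", "Blue"])

# The .str accessor applies string methods element-wise.
cleaned = colors.str.strip().str.lower()

# apply() calls a Python function on each element; here the built-in len.
lengths = cleaned.apply(len)

print(cleaned.tolist())  # ['red', 'green', 'blue']
print(lengths.tolist())  # [3, 5, 4]
```

For simple cases like this one, `cleaned.str.len()` would give the same lengths; `apply` is more useful when no built-in vectorized method exists.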