# **Pandas: A Comprehensive Guide**

## **Introduction**

Pandas is a powerful Python library for data manipulation and analysis. It provides flexible data structures such as `Series` and `DataFrame` to work efficiently with structured data.

### **Installation**

To install Pandas, use:

```bash
pip install pandas
```

Then import it in Python:

```python
import pandas as pd
```

### **Core Data Structures**

#### **Series**

A `Series` is a one-dimensional labeled array capable of holding any data type.

- ```python
    data = [10, 20, 30, 40]
    s = pd.Series(data, index=['A','B','C','D'])
    print(s)
    ```

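- The data for a `Series` can be a list, tuple, NumPy array, dictionary, or scalar. When a dictionary is passed, its keys become the index labels. A string is treated as a single scalar value rather than as a sequence of characters, and sets are rejected because they are unordered.
- The index can be any list-like collection of labels, such as a list, tuple, NumPy array, or `range`. When the data is a list, tuple, or array, the index must have the same length as the data.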

- ```python
    data = (10, 20, 30, 40)
    s = pd.Series(data, index=['A','B','C','D'])
    print(s)
    ```

- ```python
    import pandas as pd
    import numpy as np

    data = (1,2,3,4,5)
    index = np.array(['A','B','C','D','E'])
    s = pd.Series(data, index)
    print(s)
    ```
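
The rules above for `Series` data and index types can be checked with a short sketch. The variable names here are illustrative only, and the `TypeError` for sets reflects the behavior of current pandas releases as far as I am aware, so treat it as an assumption if you are on an older version.

```python
import pandas as pd

# Dictionary data: the keys become the index labels.
from_dict = pd.Series({'A': 10, 'B': 20, 'C': 30})

# A range object used as the index.
from_range = pd.Series([10, 20, 30], index=range(1, 4))

# A string is treated as one scalar value, not as three characters.
from_string = pd.Series('abc')

# Sets are unordered, so pandas refuses to build a Series from one.
try:
    pd.Series({10, 20, 30})
    set_error = None
except TypeError as exc:
    set_error = exc

print(from_dict)
print(from_range)
print(len(from_string))
print(type(set_error).__name__)
```

If you need to build a `Series` from a set, converting it first with `sorted(...)` or `list(...)` gives pandas an ordered sequence to work with.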

#### **DataFrame**

A `DataFrame` is a two-dimensional labeled data structure, similar to a table in SQL or a sheet in Excel.

- Creating a Pandas DataFrame from a **list of dictionaries**:
    - Each dictionary represents a row, and the keys become the column labels.

- ```python
    s = pd.DataFrame([{'Reptiles':'scales', 'Birds':'feathers'}, {'Reptiles':'regenerate tail', 'Birds':'does not regenerate'}])
    print(s)
    ```

- Creating a Pandas DataFrame from a **dictionary of lists**:
    - Each key in the dictionary represents a column label, and the values are lists of column data.

- ```python
    s = pd.DataFrame({'Reptiles':['scales', 'regenerate tail'], 'Birds':['feathers', 'does not regenerate']})
    print(s)
    ```

- Creating a Pandas DataFrame from a **list of lists (or tuples)**:
    - Each inner list or tuple represents a row of data, and the optional `columns` argument specifies the column names.

- ```python
    data = [['a',2],['b',3],['c',4]]

    s = pd.DataFrame(data, columns=['alphabet','numbers'])
    print(s)
    ```

- Creating a Pandas DataFrame from a **single dictionary**:
    - This is the same dictionary-of-lists form shown above: the dictionary, with column names as keys and lists of values, is assigned to a variable first and then passed to `pd.DataFrame`.

- ```python
    data = {'alphabet':['a','b','c','d'], 'number':[1,2,3,4]}
    s = pd.DataFrame(data)
    print(s)
    ```

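- Creating a Pandas DataFrame from a **NumPy array**:
    - You can create a DataFrame by passing a NumPy array and defining column labels. A NumPy array holds a single data type, so mixing strings and integers as in the example below stores every value, including the numbers, as a string.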

- ```python
    import numpy as np
    import pandas as pd

    data = np.array([['a',1],['b',2]])
    s = pd.DataFrame(data, columns=['alphabet', 'number'])
    print(s)
    ```

- Creating a Pandas DataFrame from **CSV/Excel files**
    - You can read data from external files (e.g., CSV, Excel) into a DataFrame using Pandas' built-in functions.

- ```python
    data = pd.read_csv('data.csv')
    ```
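
As a rough check on the construction methods above, the sketch below builds the same table row by row (list of dictionaries) and column by column (dictionary of lists) and compares the results. It also shows what happens when a row dictionary omits a key. The names `df_rows`, `df_cols`, and `df_missing` are made up for this illustration.

```python
import pandas as pd

# Row-oriented: each dictionary is one row.
df_rows = pd.DataFrame([
    {'Reptiles': 'scales', 'Birds': 'feathers'},
    {'Reptiles': 'regenerate tail', 'Birds': 'does not regenerate'},
])

# Column-oriented: each key is one column.
df_cols = pd.DataFrame({
    'Reptiles': ['scales', 'regenerate tail'],
    'Birds': ['feathers', 'does not regenerate'],
})

# Both approaches describe the same table.
same_table = df_rows.equals(df_cols)
print(same_table)

# A key missing from one row dictionary is filled with NaN.
df_missing = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3}])
print(df_missing)
```

The row-oriented form tends to suit records arriving one at a time (for example, parsed JSON objects), while the column-oriented form is convenient when each column is already available as a list.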


### **Reading and Writing Data**

- Reading CSV Files
    ```python
    df = pd.read_csv('data.csv')
    print(df.head())
    ```

- Writing to CSV Files
    ```python
    df.to_csv('data.csv', index=False)
    ```
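
To see what `index=False` changes, here is a minimal round-trip sketch. It writes to an in-memory `io.StringIO` buffer instead of a file on disk so that it runs without a `data.csv` being present; the sample data and variable names are assumptions made for the illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({'alphabet': ['a', 'b', 'c'], 'number': [1, 2, 3]})

# Write without the index, then read the text back in.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(list(restored.columns))

# With the default index=True, the row labels are written as an
# extra unnamed first column, which read_csv then picks up.
buffer_with_index = io.StringIO()
df.to_csv(buffer_with_index)
buffer_with_index.seek(0)
restored_with_index = pd.read_csv(buffer_with_index)
print(list(restored_with_index.columns))
```

Passing `index=False` is usually what you want when the index is just the default row numbering; keep the index when it carries meaningful labels.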

### **Basic Data Exploration**

#### **Viewing Data**

```python
print(df.head())      # First 5 rows
print(df.tail())      # Last 5 rows
df.info()             # Summary of the dataset (prints directly and returns None)
print(df.describe())  # Descriptive statistics for the numeric columns
```
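
The exploration calls can be tried on a small in-memory table. This is only a sketch: the `name` and `score` columns are invented sample data, not taken from any file referenced in this guide.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cid', 'Dee', 'Eve', 'Fay'],
    'score': [10, 20, 30, 40, 50, 60],
})

# head() and tail() default to 5 rows; pass a number to change that.
first_rows = df.head()
last_two = df.tail(2)

# describe() summarizes numeric columns by default.
summary = df.describe()

print(first_rows)
print(last_two)
print(summary)

# info() prints column names, non-null counts, and dtypes itself.
df.info()
```

Because `describe()` returns a DataFrame, individual statistics can be read from it directly, for example `summary.loc['mean', 'score']`.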
