# Introduction to pandas

pandas is a powerful and flexible open-source data analysis and manipulation library for Python. It provides data structures for efficiently storing large datasets and tools for reshaping, aggregating, and analyzing the data.

## Key Features of pandas
- Data structures for 1D (`Series`) and 2D (`DataFrame`) data.
- Tools for reading and writing data from various formats (CSV, Excel, SQL databases, etc.).
- Data alignment and missing data handling.
- Reshaping and pivoting datasets.
- Label-based slicing, indexing, and subsetting.
- Grouping and aggregation.
- High-performance merging and joining of data.
- Time series functionality.

In this notebook, we'll explore the basic functionalities of pandas, along with sample code snippets and exercises to reinforce the concepts.

## 1. pandas Data Structures

### A. Series

A `Series` is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). It is akin to a single column in a spreadsheet or a named vector in R. The basic way to create a `Series` is `pd.Series(data, index=index)`, where `data` can be a list, a NumPy array, a dictionary, or a scalar value, and `index` is a list of axis labels. If `index` is omitted, pandas assigns the default integer labels `0` to `n - 1`.
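
As a rough illustration of how `data` can take different forms, the sketch below builds a `Series` from a list, a dictionary, and a scalar. The variable names and values are made up for this example.

```python
import pandas as pd

# From a list: no index is given, so pandas uses the default labels 0, 1, 2
from_list = pd.Series([1.5, 2.5, 3.5])

# From a dictionary: the keys become the index labels
from_dict = pd.Series({'x': 10, 'y': 20, 'z': 30})

# From a scalar: the value is repeated once for every label in the index
from_scalar = pd.Series(7, index=['a', 'b', 'c'])

print(from_list.index.tolist())
print(from_dict['y'])
print(from_scalar.tolist())
```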

### B. DataFrame

A `DataFrame` is a two-dimensional labeled data structure whose columns can hold different types. You can think of it as a spreadsheet, a SQL table, or a dictionary of `Series` objects that share the same index. DataFrames can be created in various ways, one of the most common being from a dictionary of equal-length lists or NumPy arrays.
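
To make the "dictionary of `Series`" view concrete, here is a minimal sketch that builds a `DataFrame` from two `Series` with partly different labels. The names `ages`, `cities`, and `people` are hypothetical. pandas aligns the columns on their index labels and fills positions with no value as missing (`NaN`).

```python
import pandas as pd

# Two Series with overlapping, but not identical, index labels
ages = pd.Series([25, 30], index=['alice', 'bob'])
cities = pd.Series(['Paris', 'Lima', 'Oslo'], index=['alice', 'bob', 'carol'])

# Each Series becomes a column; rows are aligned by label
people = pd.DataFrame({'age': ages, 'city': cities})

# 'carol' has a city but no age, so her age is marked as missing
print(people)
print(people['age'].isna().tolist())
```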

In [None]:
# Importing pandas library
import pandas as pd

# Sample Code: Series
series_data = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
series_data

In [None]:
# Sample Code: DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Occupation': ['Engineer', 'Doctor', 'Teacher', 'Lawyer']
}
df = pd.DataFrame(data)
df

### Exercise 1: Series and DataFrame

1. Create a `Series` object with the values `[10, 20, 30, 40, 50]` and indices as the first five letters of the alphabet.
2. Create a `DataFrame` from the dictionary:
   ```python
   data = {
       'Country': ['USA', 'Canada', 'Mexico', 'Brazil'],
       'Capital': ['Washington D.C.', 'Ottawa', 'Mexico City', 'Brasilia'],
       'Population': [328, 37, 126, 211]  # in millions
   }
   ```

### Answer Key for Exercise 1: Series and DataFrame

1. Creating a `Series` object:
   ```python
   series_obj = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
   series_obj
   ```

2. Creating a `DataFrame` from the dictionary:
   ```python
   data_dict = {
       'Country': ['USA', 'Canada', 'Mexico', 'Brazil'],
       'Capital': ['Washington D.C.', 'Ottawa', 'Mexico City', 'Brasilia'],
       'Population': [328, 37, 126, 211]
   }
   df_data = pd.DataFrame(data_dict)
   df_data
   ```

## 2. Basic Operations with DataFrames

### A. Indexing and Selecting Data

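pandas supports both label-based indexing (with `.loc`) and position-based indexing (with `.iloc`). When slicing by label with `.loc`, both the start and the stop bound are included, unlike ordinary Python slices and `.iloc`, which exclude the stop. Integers are valid labels, but with `.loc` they refer to the label and not the position.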
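
The difference between labels and positions is easiest to see on an index made of integers that do not start at 0. The following sketch uses `.loc` for label-based access and `.iloc` for position-based access; the `Series` and its values are invented for illustration.

```python
import pandas as pd

# Integer labels that deliberately differ from the positions 0..3
s = pd.Series(['a', 'b', 'c', 'd'], index=[10, 20, 30, 40])

# Label-based slice: both the start label (10) and the stop label (30) are included
label_slice = s.loc[10:30]

# Position-based slice: the stop position (2) is excluded, as in plain Python
position_slice = s.iloc[0:2]

# The integer 20 is treated as a label by .loc; .iloc[2] means "third element"
by_label = s.loc[20]
by_position = s.iloc[2]

print(label_slice.tolist())
print(position_slice.tolist())
print(by_label, by_position)
```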

### B. Filtering Data

You can use boolean indexing to filter data in a DataFrame. This involves using a condition to return rows that meet certain criteria.
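
Under the hood, a condition such as `people['age'] > 25` produces a boolean `Series` (a mask), and indexing with it keeps the rows where the mask is `True`. The sketch below, using a made-up `people` table, also shows how conditions might be combined with `&` (and); each condition needs its own parentheses because `&` binds more tightly than the comparison operators.

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cy', 'Di'],
    'age': [22, 35, 41, 29],
    'city': ['Oslo', 'Lima', 'Oslo', 'Lima'],
})

# The condition evaluates to one True/False value per row
mask = people['age'] > 25

# Indexing with the mask keeps only the rows where it is True
over_25 = people[mask]

# Combine conditions with & (and), | (or), ~ (not); wrap each in parentheses
oslo_over_25 = people[(people['age'] > 25) & (people['city'] == 'Oslo')]

print(mask.tolist())
print(over_25['name'].tolist())
print(oslo_over_25['name'].tolist())
```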

### C. Adding and Dropping Columns

Columns can be added to a DataFrame by assigning values to a new column name, and they can be removed with the `drop` method. By default, `drop` returns a new DataFrame and leaves the original unchanged.
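
A new column does not have to be a literal list; it can also be computed from existing columns. The sketch below, with a hypothetical `prices` table, adds a derived column and then checks that `drop` leaves the original DataFrame untouched.

```python
import pandas as pd

prices = pd.DataFrame({
    'item': ['pen', 'book'],
    'price': [2.0, 12.0],
    'qty': [10, 3],
})

# Add a column computed element-wise from two existing columns
prices['total'] = prices['price'] * prices['qty']

# drop returns a new DataFrame; 'prices' itself still has the 'qty' column
without_qty = prices.drop(columns=['qty'])

print(prices.columns.tolist())
print(without_qty.columns.tolist())
```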

In [None]:
# Sample Code: Indexing and Selecting Data

# Selecting the 'Name' column
name_column = df['Name']

# Selecting the first three rows
first_three_rows = df.iloc[:3]

name_column, first_three_rows

(0      Alice
 1        Bob
 2    Charlie
 3      David
 Name: Name, dtype: object,
       Name  Age Occupation
 0    Alice   25   Engineer
 1      Bob   30     Doctor
 2  Charlie   35    Teacher)

In [None]:
# Sample Code: Filtering Data

# Filtering rows where Age is greater than 30
filtered_data = df[df['Age'] > 30]

# Sample Code: Adding and Dropping Columns

# Adding a new column 'Salary'
df['Salary'] = [70000, 80000, 75000, 85000]

# Dropping the 'Salary' column
df_dropped = df.drop(columns=['Salary'])

filtered_data, df, df_dropped

(      Name  Age Occupation
 2  Charlie   35    Teacher
 3    David   40     Lawyer,
       Name  Age Occupation  Salary
 0    Alice   25   Engineer   70000
 1      Bob   30     Doctor   80000
 2  Charlie   35    Teacher   75000
 3    David   40     Lawyer   85000,
       Name  Age Occupation
 0    Alice   25   Engineer
 1      Bob   30     Doctor
 2  Charlie   35    Teacher
 3    David   40     Lawyer)