# Pandas Series Fundamentals

## Overview
A **Series** is a one-dimensional labeled array that can hold data of any type (integers, strings, floats, Python objects, etc.). It is one of the two primary data structures in pandas, alongside the two-dimensional DataFrame.

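### Key Characteristics:
- One-dimensional array
- Labeled index
- Homogeneous data type (a single dtype per Series; mixed values fall back to the generic `object` dtype)
- Size-immutable: values can be changed in place, but methods that change the length, such as `.drop()`, return a new Series instead of modifying the original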
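A quick way to see the single-dtype rule and the in-place vs. new-object distinction is to let pandas infer dtypes for a few small lists. This is a minimal sketch; the variable names are illustrative only.

```python
import pandas as pd

# pandas infers ONE dtype for the whole Series
ints = pd.Series([1, 2, 3])          # all integers -> int64
floats = pd.Series([1, 2.5, 3])      # int + float -> upcast to float64
mixed = pd.Series([1, "two", 3.0])   # incompatible types -> object

print(ints.dtype, floats.dtype, mixed.dtype)  # int64 float64 object

# Values can be changed in place...
ints.iloc[0] = 100

# ...but .drop() returns a new, shorter Series and leaves the original alone
shorter = ints.drop(0)
print(len(ints), len(shorter))  # 3 2
```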

```python
import pandas as pd
import numpy as np
```

## 1. Creating Series from Lists

### Basic Syntax:
```python
pd.Series(data, index=index_labels)
```

`data` holds the values; `index` is optional, and when it is omitted pandas assigns a default integer index running from `0` to `n - 1`.

```python
# Example 1: Basic Series from list
data_list = [10, 20, 30, 40, 50]
series1 = pd.Series(data_list)
print("Series from list (default index):")
print(series1)
print()

# Example 2: Series with custom index
custom_index = ['a', 'b', 'c', 'd', 'e']
series2 = pd.Series(data_list, index=custom_index)
print("Series from list (custom index):")
print(series2)
```

```
Series from list (default index):
0    10
1    20
2    30
3    40
4    50
dtype: int64

Series from list (custom index):
a    10
b    20
c    30
d    40
e    50
dtype: int64
```
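Besides `data` and `index`, the `pd.Series` constructor also accepts `dtype` (to force an element type instead of letting pandas infer one) and `name`, and a scalar `data` value is repeated once per index label. A minimal sketch; the labels and values are made up for illustration.

```python
import pandas as pd

# dtype forces the element type; name labels the Series itself
prices = pd.Series([3, 4, 5], index=["x", "y", "z"], dtype="float64", name="price")
print(prices.dtype, prices.name)  # float64 price

# A scalar is broadcast: one copy per index label
flags = pd.Series(0, index=["a", "b", "c"])
print(flags.tolist())  # [0, 0, 0]
```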


## 2. Creating Series from NumPy Arrays

### Advantages:
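- Efficient memory usage: the values stay in a single typed array (e.g., `float64`)
- Fast, vectorized operations
- Direct compatibility with NumPy functions (ufuncs such as `np.sqrt` accept a Series and return a Series)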

```python
# Example 1: Series from NumPy array
np_array = np.array([1.5, 2.5, 3.5, 4.5])
series3 = pd.Series(np_array, index=['w', 'x', 'y', 'z'])
print("Series from NumPy array:")
print(series3)
print()

# Example 2: Series from an array-creation function (np.zeros)
zeros_array = np.zeros(5)
series4 = pd.Series(zeros_array, index=list('ABCDE'))
print("Series from np.zeros():")
print(series4)
```

```
Series from NumPy array:
w    1.5
x    2.5
y    3.5
z    4.5
dtype: float64

Series from np.zeros():
A    0.0
B    0.0
C    0.0
D    0.0
E    0.0
dtype: float64
```
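The NumPy compatibility point can be checked directly: a NumPy ufunc applied to a Series returns a Series with the same index, and `.to_numpy()` converts back to a plain array. A small sketch using `np.linspace` as another array-creation function; the index letters are arbitrary.

```python
import numpy as np
import pandas as pd

# np.linspace(0, 1, 5): five evenly spaced floats from 0 to 1
s = pd.Series(np.linspace(0, 1, 5), index=list("abcde"))
print(s.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# A NumPy ufunc on a Series returns a Series that keeps the index
squared = np.square(s)
print(type(squared).__name__, squared.index.tolist())  # Series ['a', 'b', 'c', 'd', 'e']

# .to_numpy() goes back to a plain ndarray (labels are dropped)
arr = s.to_numpy()
print(type(arr).__name__)  # ndarray
```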


## 3. Creating Series from Dictionaries

### Important:
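- Dictionary keys become the Series index
- Dictionary values become the Series data
- Key order is preserved (Python 3.7+)
- Passing `index` explicitly keeps only the listed keys, in that order; labels not found in the dictionary become NaN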

```python
# Example 1: Basic dictionary to Series
data_dict = {'Jan': 100, 'Feb': 150, 'Mar': 200, 'Apr': 175}
series5 = pd.Series(data_dict)
print("Series from dictionary:")
print(series5)
print()

# Example 2: Dictionary with an explicit index (selects keys; missing labels become NaN)
series6 = pd.Series(data_dict, index=['Jan', 'Mar', 'May'])
print("Series with selected/partial index:")
print(series6)
print("Note: 'May' is not in the dict, so it gets NaN (and the values are upcast to float64)")
```

```
Series from dictionary:
Jan    100
Feb    150
Mar    200
Apr    175
dtype: int64

Series with selected/partial index:
Jan    100.0
Mar    200.0
May      NaN
dtype: float64
Note: 'May' is not in the dict, so it gets NaN (and the values are upcast to float64)
```
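Two related behaviors are worth seeing side by side: `index` reorders as well as selects, and the float upcast can be avoided with pandas' nullable integer dtype `"Int64"` (capital I), which marks missing entries as `<NA>`. The `"Int64"` dtype is swapped in here as an alternative to the default shown above; it is a sketch, not the only option.

```python
import pandas as pd

data_dict = {"Jan": 100, "Feb": 150, "Mar": 200, "Apr": 175}

# index both selects and reorders the dictionary keys
reordered = pd.Series(data_dict, index=["Mar", "Jan"])
print(reordered.tolist())  # [200, 100]

# Nullable integer dtype: values stay integers, the missing label is <NA>
partial = pd.Series(data_dict, index=["Jan", "Mar", "May"], dtype="Int64")
print(partial.dtype)            # Int64
print(partial.isna().tolist())  # [False, False, True]
```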


## 4. Series Attributes

### Commonly Used Attributes:
- **.index**: Returns the index object
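- **.values**: Returns the data as a NumPy array for standard dtypes (pandas recommends `.to_numpy()` or `.array` in new code)
- **.dtype**: Returns the data type
- **.shape**: Returns a tuple of dimensions, always `(n,)` for a Series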
- **.size**: Returns number of elements
- **.name**: Name of the Series
- **.index.name**: Name of the index

```python
# Create a sample series
sample_series = pd.Series(
    [88, 92, 79, 85, 95],
    index=['Alice', 'Bob', 'Charlie', 'Diana', 'Evan'],
    name='Math_Scores'
)
sample_series.index.name = 'Students'

print("Sample Series:")
print(sample_series)
print("\n--- Series Attributes ---")
print(f"Index: {sample_series.index}")
print(f"Values: {sample_series.values}")
print(f"Data type: {sample_series.dtype}")
print(f"Shape: {sample_series.shape}")
print(f"Size: {sample_series.size}")
print(f"Series name: {sample_series.name}")
print(f"Index name: {sample_series.index.name}")
```

```
Sample Series:
Students
Alice      88
Bob        92
Charlie    79
Diana      85
Evan       95
Name: Math_Scores, dtype: int64

--- Series Attributes ---
Index: Index(['Alice', 'Bob', 'Charlie', 'Diana', 'Evan'], dtype='object', name='Students')
Values: [88 92 79 85 95]
Data type: int64
Shape: (5,)
Size: 5
Series name: Math_Scores
Index name: Students
```
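A few attribute-adjacent methods round this out: `.to_numpy()` (the accessor pandas recommends over `.values`), `.rename()` with a scalar to change the Series name without touching the original, and the relationship between `.size`, `len()`, and `.shape`. A minimal sketch on a shortened copy of the scores data.

```python
import pandas as pd

scores = pd.Series([88, 92, 79], index=["Alice", "Bob", "Charlie"], name="Math_Scores")

# .to_numpy() returns a NumPy array of the values
arr = scores.to_numpy()
print(type(arr).__name__, arr.tolist())  # ndarray [88, 92, 79]

# .rename() with a scalar returns a new Series with a different name
renamed = scores.rename("Physics_Scores")
print(scores.name, renamed.name)  # Math_Scores Physics_Scores

# For a Series, size equals len() and shape is a 1-tuple
print(scores.size == len(scores), scores.shape)  # True (3,)
```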


## 5. Indexing and Slicing Series

### Label-based Indexing:
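- Use index labels to access values, similar to dictionary key access
- `.loc` makes label-based access explicit; label slices such as `.loc['Mon':'Thu']` include both endpoints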

```python
# Create a series for demonstration
temps = pd.Series(
    [72, 68, 75, 79, 74, 70],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
)
print("Temperature Series:")
print(temps)
print()

# Single element access
print(f"Wednesday temperature: {temps['Wed']}")
print(f"Friday temperature: {temps['Fri']}")

# Multiple elements access
print(f"\nMidweek temperatures:\n{temps[['Tue', 'Wed', 'Thu']]}")

# Using .loc for explicit label-based indexing
print(f"\nUsing .loc['Mon':'Thu']:\n{temps.loc['Mon':'Thu']}")
```

```
Temperature Series:
Mon    72
Tue    68
Wed    75
Thu    79
Fri    74
Sat    70
dtype: int64

Wednesday temperature: 75
Friday temperature: 74

Midweek temperatures:
Tue    68
Wed    75
Thu    79
dtype: int64

Using .loc['Mon':'Thu']:
Mon    72
Tue    68
Wed    75
Thu    79
dtype: int64
```
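Two label-lookup details tend to surprise newcomers: label slices are inclusive, and a missing label raises `KeyError` unless you use `.get()`. Also, the `in` operator on a Series checks the index labels, not the values. A small sketch on the same temperature data; `"Sun"` is used only as an example of a label that does not exist.

```python
import pandas as pd

temps = pd.Series(
    [72, 68, 75, 79, 74, 70],
    index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
)

# A label slice includes BOTH endpoints: Mon, Tue, Wed
by_label = temps.loc["Mon":"Wed"]
print(by_label.index.tolist())  # ['Mon', 'Tue', 'Wed']

# temps["Sun"] would raise KeyError; .get() returns a default instead
print(temps.get("Sun", default=None))  # None
print(temps.get("Wed"))                # 75

# `in` tests the index labels, not the values
print("Wed" in temps, 75 in temps)     # True False
```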


## 6. Position-based Indexing

### Using .iloc:
- Integer position based (0-based indexing)
- Similar to list/array indexing: slices exclude the stop position, and negative positions count from the end

```python
# Same temperature series
print("Temperature Series:")
print(temps)
print()

# Single element by position
print(f"First element (position 0): {temps.iloc[0]}")
print(f"Third element (position 2): {temps.iloc[2]}")

# Slicing by position (the stop position is excluded)
print(f"\nFirst 3 days:\n{temps.iloc[:3]}")
print(f"\nLast 2 days:\n{temps.iloc[-2:]}")
print(f"\nMiddle days (positions 2-4):\n{temps.iloc[2:5]}")

# Multiple non-consecutive positions
print(f"\nDays at positions 0, 2, 4:\n{temps.iloc[[0, 2, 4]]}")
```

```
Temperature Series:
Mon    72
Tue    68
Wed    75
Thu    79
Fri    74
Sat    70
dtype: int64

First element (position 0): 72
Third element (position 2): 75

First 3 days:
Mon    72
Tue    68
Wed    75
dtype: int64

Last 2 days:
Fri    74
Sat    70
dtype: int64

Middle days (positions 2-4):
Wed    75
Thu    79
Fri    74
dtype: int64

Days at positions 0, 2, 4:
Mon    72
Wed    75
Fri    74
dtype: int64
```
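Position-based access follows Python's own slicing rules, including how it treats out-of-range positions: a slice past the end is clipped silently, while a single out-of-range position raises `IndexError`. A minimal sketch; position `10` and the slice stop `100` are arbitrary out-of-range values.

```python
import pandas as pd

temps = pd.Series(
    [72, 68, 75, 79, 74, 70],
    index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
)

# A position slice excludes the stop position: 0:2 -> positions 0 and 1
first_two = temps.iloc[0:2]
print(first_two.index.tolist())  # ['Mon', 'Tue']

# Out-of-range slice bounds are clipped (only positions 4 and 5 exist)
tail = temps.iloc[4:100]
print(tail.index.tolist())  # ['Fri', 'Sat']

# An out-of-range single position raises IndexError
try:
    temps.iloc[10]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```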


## 7. Boolean Indexing

### Concept:
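- Filter data using boolean conditions
- A comparison such as `sales > 2000` returns a boolean Series (a mask) with the same index as the original
- Indexing with the mask keeps only the `True` entries; combine conditions with `&`, `|`, and `~`, wrapping each comparison in parentheses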

```python
# Create a sales series
sales = pd.Series(
    [1500, 2200, 1800, 3100, 1950, 2800],
    index=['Store_A', 'Store_B', 'Store_C', 'Store_D', 'Store_E', 'Store_F']
)
print("Sales Data:")
print(sales)
print()

# Boolean condition
high_sales = sales > 2000
print("Boolean mask (sales > 2000):")
print(high_sales)
print()

# Filter using boolean mask
print("Stores with sales > 2000:")
print(sales[high_sales])
print()

# Direct conditional filtering
print("Stores with sales between 1800 and 2500:")
print(sales[(sales >= 1800) & (sales <= 2500)])
```

```
Sales Data:
Store_A    1500
Store_B    2200
Store_C    1800
Store_D    3100
Store_E    1950
Store_F    2800
dtype: int64

Boolean mask (sales > 2000):
Store_A    False
Store_B     True
Store_C    False
Store_D     True
Store_E    False
Store_F     True
dtype: bool

Stores with sales > 2000:
Store_B    2200
Store_D    3100
Store_F    2800
dtype: int64

Stores with sales between 1800 and 2500:
Store_B    2200
Store_C    1800
Store_E    1950
dtype: int64
```
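The same filters can be written a few other ways. `Series.between()` is inclusive on both ends by default, so it matches the `>= 1800` / `<= 2500` condition above; `|` and `~` give OR and NOT; and summing a mask counts its `True` values. A short sketch on the same sales data; the thresholds `1600` and `3000` are arbitrary.

```python
import pandas as pd

sales = pd.Series(
    [1500, 2200, 1800, 3100, 1950, 2800],
    index=["Store_A", "Store_B", "Store_C", "Store_D", "Store_E", "Store_F"],
)

# between() is inclusive on both ends by default
mid_range = sales[sales.between(1800, 2500)]
print(mid_range.index.tolist())  # ['Store_B', 'Store_C', 'Store_E']

# | is element-wise OR; each comparison needs its own parentheses
extremes = sales[(sales < 1600) | (sales > 3000)]
print(extremes.index.tolist())  # ['Store_A', 'Store_D']

# ~ negates a mask
not_high = sales[~(sales > 2000)]
print(not_high.index.tolist())  # ['Store_A', 'Store_C', 'Store_E']

# True counts as 1, so sum() counts matches; any() checks for at least one
print((sales > 2000).sum(), (sales > 2000).any())  # 3 True
```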


## 8. Basic Operations on Series

### Arithmetic Operations:
- Element-wise operations
- Broadcasting with scalars
- Alignment by index for Series-Series operations

```python
# Create two series
series_a = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])
series_b = pd.Series([5, 15, 25, 35], index=['A', 'B', 'C', 'D'])

print("Series A:")
print(series_a)
print("\nSeries B:")
print(series_b)

# Arithmetic operations
print("\n--- Arithmetic Operations ---")
print(f"Addition:\n{series_a + series_b}")
print(f"\nSubtraction:\n{series_a - series_b}")
print(f"\nMultiplication:\n{series_a * series_b}")
print(f"\nDivision:\n{series_a / series_b}")

# Operations with scalar
print("\n--- Operations with Scalar ---")
print(f"Series A + 100:\n{series_a + 100}")
print(f"\nSeries B * 2:\n{series_b * 2}")

# Mathematical functions
print("\n--- Mathematical Functions ---")
print(f"Square root of Series A:\n{np.sqrt(series_a)}")
print(f"\nLogarithm of Series B:\n{np.log(series_b)}")
```

```
Series A:
A    10
B    20
C    30
D    40
dtype: int64

Series B:
A     5
B    15
C    25
D    35
dtype: int64

--- Arithmetic Operations ---
Addition:
A    15
B    35
C    55
D    75
dtype: int64

Subtraction:
A    5
B    5
C    5
D    5
dtype: int64

Multiplication:
A      50
B     300
C     750
D    1400
dtype: int64

Division:
A    2.000000
B    1.333333
C    1.200000
D    1.142857
dtype: float64

--- Operations with Scalar ---
Series A + 100:
A    110
B    120
C    130
D    140
dtype: int64

Series B * 2:
A    10
B    30
C    50
D    70
dtype: int64

--- Mathematical Functions ---
Square root of Series A:
A    3.162278
B    4.472136
C    5.477226
D    6.324555
dtype: float64

Logarithm of Series B:
A    1.609438
B    2.708050
C    3.218876
D    3.555348
dtype: float64
```
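Each arithmetic operator also has a method form (`.add()`, `.sub()`, `.mul()`, `.div()`), which gives the same result but accepts extra arguments; floor division, modulo, and comparisons work element-wise in the same way. A minimal sketch reusing the two Series from above.

```python
import pandas as pd

series_a = pd.Series([10, 20, 30, 40], index=["A", "B", "C", "D"])
series_b = pd.Series([5, 15, 25, 35], index=["A", "B", "C", "D"])

# Method form and operator give identical results
print(series_a.add(series_b).equals(series_a + series_b))  # True

# Floor division and modulo are element-wise too
print((series_a // 3).tolist())  # [3, 6, 10, 13]
print((series_a % 3).tolist())   # [1, 2, 0, 1]

# Comparisons return a boolean Series; .all() reduces it to one bool
print((series_a > series_b).all())  # True
```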


## 9. Index Alignment in Operations

### Important Behavior:
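- When operating on two Series, pandas aligns data by index label, not by position
- The result index is the union of both indexes; labels present in only one Series produce NaN, which also upcasts integer results to float64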

```python
# Series with different indices
series_x = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series_y = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

print("Series X:")
print(series_x)
print("\nSeries Y:")
print(series_y)

# Operation with index alignment
print("\n--- Addition with Index Alignment ---")
result = series_x + series_y
print(f"X + Y:\n{result}")
print("\nNote: Indices 'A' and 'D' result in NaN (no matching index)")
```

```
Series X:
A    1
B    2
C    3
dtype: int64

Series Y:
B    4
C    5
D    6
dtype: int64

--- Addition with Index Alignment ---
X + Y:
A    NaN
B    6.0
C    8.0
D    NaN
dtype: float64

Note: Indices 'A' and 'D' result in NaN (no matching index)
```
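If NaN is not what you want for labels that exist on only one side, the method form's `fill_value` argument substitutes a value for the missing side before the operation. To combine by position instead of by label, one option (shown here as an illustration, not the only approach) is to drop both indexes with `reset_index(drop=True)` first.

```python
import pandas as pd

series_x = pd.Series([1, 2, 3], index=["A", "B", "C"])
series_y = pd.Series([4, 5, 6], index=["B", "C", "D"])

# The result index is the union of both indexes
print((series_x + series_y).index.tolist())  # ['A', 'B', 'C', 'D']

# fill_value=0 treats a label missing on ONE side as 0 before adding
filled = series_x.add(series_y, fill_value=0)
print(filled.tolist())  # [1.0, 6.0, 8.0, 6.0]

# Positional combination: discard the labels so both use 0..n-1
positional = series_x.reset_index(drop=True) + series_y.reset_index(drop=True)
print(positional.tolist())  # [5, 7, 9]
```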


## 10. Useful Series Methods

### Common Methods:
- Statistical methods: .mean(), .sum(), .std(), .min(), .max() (these skip NaN by default; .std() is the sample standard deviation, ddof=1)
- Data handling: .isna(), .fillna(), .dropna()
- Information: .head(), .tail(), .describe()

```python
# Create a series with some missing values
data_with_nan = pd.Series([10, np.nan, 30, 40, np.nan, 60])
print("Original Series (with NaN):")
print(data_with_nan)
print()

# Statistical methods
print("--- Statistical Methods ---")
print(f"Mean: {data_with_nan.mean():.2f}")
print(f"Sum: {data_with_nan.sum():.2f}")
print(f"Standard deviation: {data_with_nan.std():.2f}")
print(f"Minimum: {data_with_nan.min()}")
print(f"Maximum: {data_with_nan.max()}")
print()

# Handling missing values
print("--- Handling Missing Values ---")
print(f"Is NaN?:\n{data_with_nan.isna()}")
print(f"\nFill NaN with 0:\n{data_with_nan.fillna(0)}")
print(f"\nDrop NaN values:\n{data_with_nan.dropna()}")
print()

# Information methods
print("--- Information Methods ---")
print("First 3 values:", data_with_nan.head(3).tolist())
print("Last 3 values:", data_with_nan.tail(3).tolist())
print("\nSummary statistics:")
print(data_with_nan.describe())
```

```
Original Series (with NaN):
0    10.0
1     NaN
2    30.0
3    40.0
4     NaN
5    60.0
dtype: float64

--- Statistical Methods ---
Mean: 35.00
Sum: 140.00
Standard deviation: 20.82
Minimum: 10.0
Maximum: 60.0

--- Handling Missing Values ---
Is NaN?:
0    False
1     True
2    False
3    False
4     True
5    False
dtype: bool

Fill NaN with 0:
0    10.0
1     0.0
2    30.0
3    40.0
4     0.0
5    60.0
dtype: float64

Drop NaN values:
0    10.0
2    30.0
3    40.0
5    60.0
dtype: float64

--- Information Methods ---
First 3 values: [10.0, nan, 30.0]
Last 3 values: [40.0, nan, 60.0]

Summary statistics:
count     4.00000
mean     35.00000
std      20.81666
min      10.00000
25%      25.00000
50%      35.00000
75%      45.00000
max      60.00000
dtype: float64
```
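The NaN handling above has a few knobs worth knowing: `skipna=False` makes a reduction return NaN if any value is missing, `.count()` counts only non-missing values while `.size` counts everything, and `.fillna()` accepts a computed value such as the mean. Filling with the mean is shown purely as an illustration, not as a recommended imputation strategy.

```python
import numpy as np
import pandas as pd

data_with_nan = pd.Series([10, np.nan, 30, 40, np.nan, 60])

# Reductions skip NaN by default; skipna=False propagates it
print(data_with_nan.mean())              # 35.0
print(data_with_nan.mean(skipna=False))  # nan

# count() ignores NaN, size does not
print(data_with_nan.count(), data_with_nan.size)  # 4 6

# Fill gaps with the mean of the observed values
filled = data_with_nan.fillna(data_with_nan.mean())
print(filled.tolist())  # [10.0, 35.0, 30.0, 40.0, 35.0, 60.0]
```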


# Summary: Pandas Series Basics

## Key Points:

### 1. **Creating Series**
- From lists: `pd.Series([1, 2, 3])`
- From arrays: `pd.Series(np.array([1, 2, 3]))`
- From dictionaries: `pd.Series({'a': 1, 'b': 2})`

### 2. **Series Attributes**
- `.index`: Get index labels
- `.values` (or the recommended `.to_numpy()`): Get data as a NumPy array
- `.dtype`: Get data type
- `.shape`: Get dimensions (n,)
- `.size`: Get number of elements

### 3. **Indexing & Slicing**
- Label-based: `series['label']` or `series.loc['label']`
- Position-based: `series.iloc[position]`
- Boolean indexing: `series[series > value]`

### 4. **Basic Operations**
- Arithmetic: `+`, `-`, `*`, `/` work element-wise
- Index alignment in Series-Series operations
- Methods: `.mean()`, `.sum()`, `.fillna()`, etc.

## Best Practices:
1. Always be aware of your Series index
2. Use `.loc` for label-based and `.iloc` for position-based indexing
3. Check for NaN values after operations with misaligned indices
4. Prefer vectorized operations (e.g., `series * 1.1`) over Python loops for performance
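To make the vectorization advice concrete, the sketch below computes the same result twice: once with a Python list comprehension that rebuilds the Series element by element, and once with a single vectorized expression. The item names and prices are made up; the point is that the vectorized form is shorter, keeps the index automatically, and runs in compiled code.

```python
import pandas as pd

prices = pd.Series([19.99, 5.49, 3.75], index=["book", "pen", "tea"])

# Loop version: Python-level iteration, index must be reattached by hand
looped = pd.Series([p * 1.1 for p in prices], index=prices.index)

# Vectorized version: one expression, index preserved automatically
vectorized = prices * 1.1

print(looped.equals(vectorized))  # True
```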

# Practice Exercises

## Exercise 1: Create and Manipulate Series
1. Create a Series from the list `[5, 10, 15, 20, 25]` with index `['A', 'B', 'C', 'D', 'E']`
2. Extract values at positions 1, 3, and 4 using `.iloc`
3. Extract values with labels 'B', 'D', and 'E' using `.loc`

## Exercise 2: Operations
1. Create two Series: `s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])` and `s2 = pd.Series([4, 5, 6], index=['b', 'c', 'd'])`
2. Add them together and observe the index alignment
3. Fill NaN values with 0 in the result

## Exercise 3: Boolean Indexing
1. Create a Series of 10 random numbers between 1 and 100
2. Find all values greater than 50
3. Calculate the mean of values greater than 50


## Common Pitfalls to Avoid:
1. Confusing `.loc` (label-based) with `.iloc` (position-based)
2. Forgetting that operations align by index, not position
3. Not handling NaN values after operations
4. Assuming index is always sequential integers
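Pitfalls 1, 2, and 4 often show up together when the index is made of integers that are not `0..n-1` in order. In that case label `0` and position `0` point at different elements, and arithmetic pairs values by label rather than by row order. A minimal sketch with made-up data.

```python
import pandas as pd

# An integer index that is NOT 0..n-1 in order
s = pd.Series(["first", "second", "third"], index=[2, 1, 0])
print(s.loc[0])   # third  (label 0)
print(s.iloc[0])  # first  (position 0)

# Arithmetic aligns on labels, not positions
t = pd.Series([10, 20, 30], index=[0, 1, 2])
u = pd.Series([1, 2, 3], index=[2, 1, 0])
print((t + u).sort_index().tolist())  # [13, 22, 31]
```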

