# <font color="#418FDE" size="6.5" uppercase>**Series and DataFrames**</font>

>Last update: 20251225.
    
By the end of this lecture, you will be able to:
- Create Series and DataFrame objects from Python data structures and external files. 
- Explain how indexes and labels work in Series and DataFrames. 
- Perform basic inspection and selection operations on DataFrames using idiomatic Pandas syntax. 


## **1. Pandas Series Basics**

### **1.1. Building Series Objects**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas 2.3.1 A-Z/Module_01/Lecture_C/image_01_01.jpg?v=1766640262" width="250">



>* Series are labeled one-dimensional data containers
>* They add structure, labels, and useful behaviors

>* Lists or tuples become ordered Series objects
>* Dictionaries create labeled Series from key-value pairs

>* Create Series from files and query results
>* Use Series as simple step toward complex tables



In [None]:
#@title Python Code - Building Series Objects

# Demonstrate building simple Series objects from basic Python data structures.
# Show Series from list, tuple, and dictionary with clear printed results.
# Help beginners see how raw values become labeled Series objects.

import pandas as pd

# Create a Series from a simple Python list of daily high temperatures in Fahrenheit.
# Pandas automatically assigns integer index labels starting from zero for this Series.
# This mirrors a basic array but adds helpful labeled behavior for later analysis.
temps_fahrenheit_list = [72, 75, 71, 69]
series_from_list = pd.Series(temps_fahrenheit_list)

# Create a Series from a tuple representing weekly store sales in dollars.
# Tuples behave like lists here, becoming a one-dimensional labeled data container.
# This allows quick inspection and later mathematical operations on the sales values.
weekly_sales_tuple = (250.0, 310.5, 289.0)
series_from_tuple = pd.Series(weekly_sales_tuple)

# Create a Series from a dictionary mapping city names to population counts.
# Dictionary keys become index labels, and values become the Series data values.
# This structure is powerful because labels carry real world meaning for each value.
city_populations_dict = {"Dallas": 1300000, "Austin": 975000, "Houston": 2300000}
series_from_dict = pd.Series(city_populations_dict)

# Print each Series with a short label so beginners see clear differences.
# The printed output shows indexes, values, and the underlying data type information.
print("Series from list of daily high temperatures:")
print(series_from_list)

print("\nSeries from tuple of weekly store sales:")
print(series_from_tuple)

print("\nSeries from dictionary of city populations:")
print(series_from_dict)
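
The bullets above also mention building a Series from files. A minimal sketch, assuming a small in-memory CSV: the `csv_text` contents and the variable names are made up for illustration and stand in for a real file path.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-column CSV text standing in for a real file on disk.
csv_text = """city,population
Dallas,1300000
Austin,975000
Houston,2300000
"""

# Read the CSV with the city column as the index, then pick the single
# remaining column; selecting one column from a DataFrame yields a Series.
population_df = pd.read_csv(StringIO(csv_text), index_col="city")
population_series = population_df["population"]

print(type(population_series).__name__)
print(population_series)
```

Selecting one column from a loaded table is only one possible route to a Series, but it keeps the city labels attached to the values.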



### **1.2. Index and Values**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas 2.3.1 A-Z/Module_01/Lecture_C/image_01_02.jpg?v=1766640280" width="250">



>* Series has values plus an index of labels
>* Flexible labels make Series meaningful, self-describing data

>* Pandas auto-creates integer indexes if none given
>* Custom indexes keep labels matched through operations

>* Indexes enable label-based selection and lookup
>* Indexes drive automatic alignment and preserve context



In [None]:
#@title Python Code - Index and Values

# Show how Series index and values work together.
# Compare default integer index with custom label index.
# Demonstrate selecting values using labels and positions.

import pandas as pd

# Create a Series with default integer index from a simple list.
temperatures_fahrenheit = pd.Series([68, 70, 72, 75])

# Create a Series with custom index labels representing days of week.
temperatures_labeled = pd.Series([68, 70, 72, 75], index=["Mon", "Tue", "Wed", "Thu"])

# Print both Series to compare index and values side by side.
print("Default index Series with values:")
print(temperatures_fahrenheit)

# Show labeled Series where index carries meaningful day information.
print("\nLabeled index Series with values:")
print(temperatures_labeled)

# Access a value by label, demonstrating label based selection behavior.
print("\nTemperature on Wed using label:", temperatures_labeled["Wed"])

# Access a value by position, demonstrating integer position based selection.
print("Temperature at position zero using iloc:", temperatures_labeled.iloc[0])
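
The bullets above say that indexes drive automatic alignment. A minimal sketch of that behavior, assuming two made-up Series (`morning` and `afternoon` are hypothetical names and values) whose labels are ordered differently and only partly overlap:

```python
import pandas as pd

# Hypothetical readings; labels are in different orders and only partly overlap.
morning = pd.Series([60, 62, 65], index=["Mon", "Tue", "Wed"])
afternoon = pd.Series([75, 70, 72], index=["Wed", "Mon", "Thu"])

# Arithmetic matches values by label, not by position.
# Labels present in only one Series produce NaN in the result.
daily_range = afternoon - morning

print(daily_range)
```

Because matching happens on labels, `Mon` is paired with `Mon` even though it sits at different positions in the two Series.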



### **1.3. Series Data Types**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas 2.3.1 A-Z/Module_01/Lecture_C/image_01_03.jpg?v=1766640301" width="250">



>* Each Series has one underlying data type
>* Type affects speed, memory, and missing values

>* Pandas adds powerful datetime and categorical types
>* Right data types make analysis easier and meaningful

>* Pandas guesses dtypes, but guesses can mislead
>* Regularly verify and fix dtypes to match meaning



In [None]:
#@title Python Code - Series Data Types

# Demonstrate how Series data types are inferred and changed.
# Show numeric, string, and categorical Series type behaviors.
# Highlight why choosing correct data types really matters.

import pandas as pd

# Create a temperature Series with floats and inspect its data type.
temps_fahrenheit = pd.Series([72.5, 68.0, 75.2, 70.0])
print("Temperature Series values:")
print(temps_fahrenheit)
print("Temperature Series dtype:", temps_fahrenheit.dtype)

# Create a postal code Series that should stay as strings, not integers.
postal_codes = pd.Series(["02115", "30301", "94105", "10001"])
print("\nPostal code Series values:")
print(postal_codes)
print("Postal code Series dtype:", postal_codes.dtype)

# Create a survey response Series and convert it to ordered categorical type.
survey_raw = pd.Series(["Agree", "Strongly agree", "Disagree", "Agree"])
order = ["Disagree", "Agree", "Strongly agree"]
survey_cat = pd.Categorical(survey_raw, categories=order, ordered=True)

# Wrap categorical data inside a Series and inspect its data type.
survey_series = pd.Series(survey_cat)
print("\nSurvey response Series values:")
print(survey_series)
print("Survey response Series dtype:", survey_series.dtype)
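
The bullets above warn that inferred dtypes can mislead and should be verified and fixed. A minimal sketch, assuming two made-up text columns (`raw_amounts` and `raw_dates` are hypothetical) as they might arrive from a file:

```python
import pandas as pd

# Hypothetical raw columns; every value is text, including the numbers.
raw_amounts = pd.Series(["10.5", "20", "n/a", "7.25"])
raw_dates = pd.Series(["2024-01-01", "2024-01-02", "2024-01-03"])

# Inspect the inferred dtype before converting.
print("Raw amounts dtype:", raw_amounts.dtype)

# errors="coerce" turns unparseable entries into NaN instead of raising.
amounts = pd.to_numeric(raw_amounts, errors="coerce")
dates = pd.to_datetime(raw_dates)

print("Converted amounts dtype:", amounts.dtype)
print("Converted dates dtype:", dates.dtype)
print("Unparseable amounts:", amounts.isna().sum())
```

Whether to coerce bad values to `NaN` or let the conversion raise is a judgment call that depends on how the data will be used.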



## **2. DataFrame Structure Overview**

### **2.1. Building DataFrames**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas 2.3.1 A-Z/Module_01/Lecture_C/image_02_01.jpg?v=1766640321" width="250">



>* DataFrames are labeled grids of rows and columns
>* Pandas adds a default index you can customize

>* Build DataFrames from dicts mapping columns to values
>* Index aligns rows, exposing length mismatches clearly

>* Choose meaningful row labels instead of defaults
>* Good indexes simplify selection, joining, and analysis



In [None]:
#@title Python Code - Building DataFrames

# Demonstrate building simple DataFrames from dictionaries with default and custom indexes.
# Show how column labels and row indexes give structure and meaning to data.
# Compare default integer index with a custom index using employee identifiers.

import pandas as pd

# Create a DataFrame using a dictionary with column labels as keys.
employee_data = {"name": ["Alice", "Bob", "Cara"], "department": ["Sales", "HR", "IT"], "salary_usd": [60000, 55000, 70000]}

df_default_index = pd.DataFrame(employee_data)

# Display the DataFrame with the automatically created default integer index.
print("DataFrame with default integer index:")
print(df_default_index)

# Create a list of custom row labels representing employee ID codes.
custom_ids = ["EMP100", "EMP101", "EMP102"]

df_custom_index = pd.DataFrame(employee_data, index=custom_ids)

# Display the DataFrame using the custom employee ID index labels.
print("\nDataFrame with custom employee ID index:")
print(df_custom_index)

# Access a single row using the custom index label to show practical benefits.
print("\nRow selected using custom index label EMP101:")
print(df_custom_index.loc["EMP101"])
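
The bullets above note that the index aligns rows and exposes length mismatches. A minimal sketch of both cases, assuming made-up employee data (`names`, `salaries`, and `mismatch_error` are hypothetical names used only here):

```python
import pandas as pd

# Plain lists of different lengths cannot form a rectangular table.
try:
    pd.DataFrame({"name": ["Alice", "Bob", "Cara"], "salary_usd": [60000, 55000]})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)

# Series are aligned by index label instead; a missing label becomes NaN.
names = pd.Series(["Alice", "Bob", "Cara"], index=["EMP100", "EMP101", "EMP102"])
salaries = pd.Series([60000, 55000], index=["EMP100", "EMP101"])
df_aligned = pd.DataFrame({"name": names, "salary_usd": salaries})

print(df_aligned)
```

With lists the mismatch fails loudly, while with Series the gap shows up as a missing value on the row labeled `EMP102`.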



### **2.2. Importing CSV DataFrames**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas 2.3.1 A-Z/Module_01/Lecture_C/image_02_02.jpg?v=1766640342" width="250">



>* CSV import sets DataFrame columns and index
>* Import choices affect clarity of later analysis

>* Use meaningful CSV columns as DataFrame index
>* Index choice reflects how you identify records

>* Handle missing headers, duplicates, and messy lines
>* Use composite indexes to reflect real structure



In [None]:
#@title Python Code - Importing CSV DataFrames

# Demonstrate importing CSV data into DataFrames with different index choices.
# Show default integer index and custom index using a meaningful column.
# Help beginners see how labels affect selection and inspection.

import pandas as pd
from io import StringIO

# Create a small CSV text block representing simple daily sales data.
csv_text = """date,store_id,sales_dollars
2024-01-01,Store_A,1500
2024-01-02,Store_A,1750
2024-01-01,Store_B,1600
"""

# Use StringIO to simulate a CSV file object for pandas read_csv function.
csv_file_like = StringIO(csv_text)

# Import CSV with default integer index, letting pandas assign row labels automatically.
df_default_index = pd.read_csv(csv_file_like)

# Display the DataFrame with default index to observe automatically created row labels.
print("Default index DataFrame:")
print(df_default_index)

# Recreate file like object because previous read_csv consumed the original StringIO buffer.
csv_file_like = StringIO(csv_text)

# Import CSV using the date column as index; without parse_dates the labels stay plain strings.
df_date_index = pd.read_csv(csv_file_like, index_col="date")

# Display the DataFrame with date index to compare label based structure and navigation.
print("\nDate index DataFrame:")
print(df_date_index)

# Select by date label; 2024-01-01 appears twice in the index, so loc returns both matching rows.
print("\nSales for 2024-01-01:")
print(df_date_index.loc["2024-01-01"])
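
The bullets above mention missing headers and composite indexes. A minimal sketch, assuming the same kind of sales rows but without a header line (`csv_no_header` and the column names passed to `names` are assumptions for illustration):

```python
from io import StringIO

import pandas as pd

# Hypothetical sales rows as CSV text with no header line.
csv_no_header = """2024-01-01,Store_A,1500
2024-01-02,Store_A,1750
2024-01-01,Store_B,1600
"""

# header=None says the first line is data; names supplies the column labels.
# A list passed to index_col builds a two-level (composite) index.
df_multi = pd.read_csv(
    StringIO(csv_no_header),
    header=None,
    names=["date", "store_id", "sales_dollars"],
    index_col=["date", "store_id"],
).sort_index()

print(df_multi)
print("Index is unique:", df_multi.index.is_unique)

# A (date, store) pair now identifies exactly one row.
print(df_multi.loc[("2024-01-01", "Store_B"), "sales_dollars"])
```

Neither the date nor the store is unique on its own in this data, so combining them is one way to give each row an unambiguous label.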



### **2.3. Inspecting DataFrame Structure**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas 2.3.1 A-Z/Module_01/Lecture_C/image_02_03.jpg?v=1766640362" width="250">



>* Understand DataFrame rows, columns, indexes, and labels
>* Confirm index meaning, column meanings, and table size

>* Indexes can be simple counters or meaningful labels
>* Index uniqueness, order, hierarchy affect selection, alignment

>* Check column names are clear and consistent
>* Understand labels to avoid mistakes in analysis



In [None]:
#@title Python Code - Inspecting DataFrame Structure

# Demonstrate inspecting DataFrame structure and understanding index and column labels.
# Show how default integer index differs from a custom labeled index.
# Print small summaries to keep output readable and beginner friendly.

import pandas as pd

# Create a simple DataFrame with default integer index.
patients_default_index = pd.DataFrame(
    {
        "age_years": [25, 40, 60],
        "diagnosis": ["flu", "sprain", "asthma"],
        "attending_physician": ["Smith", "Jones", "Brown"],
    }
)

# Inspect structure using shape, index, and columns attributes.
print("Default index shape:", patients_default_index.shape)
print("Default index labels:", patients_default_index.index)
print("Default column labels:", list(patients_default_index.columns))

# Create a DataFrame with a meaningful visit_id index label.
patients_labeled_index = patients_default_index.copy()
patients_labeled_index.index = ["visit_001", "visit_002", "visit_003"]
patients_labeled_index.index.name = "visit_id"

# Inspect the structure again to see the labeled index.
print("\nLabeled index shape:", patients_labeled_index.shape)
print("Labeled index labels:", patients_labeled_index.index)
print("Labeled column labels:", list(patients_labeled_index.columns))

# Show how a labeled index lets loc select a row by its visit ID.
print("\nSelect by labeled index:")
print(patients_labeled_index.loc["visit_002", ["age_years", "diagnosis"]])
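
The bullets above point out that index uniqueness, order, and hierarchy affect selection. A minimal sketch of a few quick checks, assuming a small made-up table (`visits` is a hypothetical name):

```python
import pandas as pd

# Hypothetical visit table with a named, labeled index.
visits = pd.DataFrame(
    {"age_years": [25, 40, 60], "diagnosis": ["flu", "sprain", "asthma"]},
    index=pd.Index(["visit_001", "visit_002", "visit_003"], name="visit_id"),
)

# Column dtypes show how each column is stored.
print(visits.dtypes)

# Index properties that influence how selection behaves.
print("Index name:", visits.index.name)
print("Unique labels:", visits.index.is_unique)
print("Sorted ascending:", visits.index.is_monotonic_increasing)
print("Index levels:", visits.index.nlevels)

# info() prints a compact summary of index, columns, dtypes, and non-null counts.
visits.info()
```

Running checks like these right after loading data is a cheap way to catch surprises before any selection logic depends on the index.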



## **3. DataFrame Indexing Basics**

### **3.1. Label Based Selection**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas 2.3.1 A-Z/Module_01/Lecture_C/image_03_01.jpg?v=1766640399" width="250">



>* Use index and column labels to select
>* Makes code clearer, matching real-world descriptions

>* Select single or multiple labels, rows, columns
>* Use label ranges to slice ordered indexes

>* Know how missing and duplicate labels behave
>* Combine labels, filters, and indexes for clarity



In [None]:
#@title Python Code - Label Based Selection

# Demonstrate basic label based selection using a small sales DataFrame.
# Show how to select rows and columns using index and column labels.
# Keep the example simple, readable, and beginner friendly for new learners.

import pandas as pd

# Create a small DataFrame with dates as index labels.
data = {"Electronics": [120, 150, 90], "Clothing": [80, 60, 75]}
index_labels = ["2024-03-01", "2024-03-02", "2024-03-03"]

sales_df = pd.DataFrame(data=data, index=index_labels)

# Display the full DataFrame to understand its structure.
print("Full sales DataFrame with date index labels:")
print(sales_df)

# Select a single row using its index label with loc accessor.
print("\nSales for specific date label 2024-03-02:")
print(sales_df.loc["2024-03-02"])

# Select a single column using its column label with loc accessor.
print("\nAll electronics sales using column label Electronics:")
print(sales_df.loc[:, "Electronics"])

# Select a label range of rows for one column; loc label slices include both endpoints.
print("\nElectronics sales for first two days using label range:")
print(sales_df.loc["2024-03-01":"2024-03-02", "Electronics"])
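
The bullets above ask you to know how missing and duplicate labels behave. A minimal sketch, assuming a made-up sales table in which one date label is repeated (`sales`, `missing_raised`, and `duplicate_rows` are hypothetical names):

```python
import pandas as pd

# Hypothetical table where the label 2024-03-02 appears twice.
sales = pd.DataFrame(
    {"Electronics": [120, 150, 90]},
    index=["2024-03-01", "2024-03-02", "2024-03-02"],
)

# A label that is absent from the index raises KeyError.
try:
    sales.loc["2024-03-09"]
    missing_raised = False
except KeyError:
    missing_raised = True
print("Missing label raised KeyError:", missing_raised)

# A unique label returns one row as a Series.
single_row = sales.loc["2024-03-01"]
print(type(single_row).__name__)

# A duplicated label returns every matching row as a DataFrame.
duplicate_rows = sales.loc["2024-03-02"]
print(duplicate_rows)
```

The return type changing with label uniqueness is a common source of confusion, which is one reason to check `index.is_unique` early.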




### **3.2. Position Based Indexing**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas 2.3.1 A-Z/Module_01/Lecture_C/image_03_02.jpg?v=1766640418" width="250">



>* Select rows and columns by integer positions
>* Useful for quick slices, independent of labels

>* Use integer positions to select flexible data slices
>* Remember zero-based positions change after sorting or filtering

>* Combine position indexing with filters for precise subsets
>* Treat DataFrame as ordered grid, independent of labels



In [None]:
#@title Python Code - Position Based Indexing

# Demonstrate basic DataFrame position based indexing with simple integer locations.
# Show how to select rows and columns using iloc with integer positions.
# Compare full DataFrame view with several iloc selections for clear understanding.

import pandas as pd

# Create a small DataFrame representing daily temperatures in five cities.
data = {
    "City": ["Boston", "Dallas", "Denver", "Miami", "Seattle"],
    "High_F": [75, 88, 70, 90, 68],
    "Low_F": [55, 70, 48, 78, 50],
}

# Build the DataFrame and display it for reference.
df = pd.DataFrame(data)
print("Full DataFrame with default integer index:")
print(df)

# Use iloc to select the first two rows and all columns by position.
print("\nFirst two rows using iloc row positions:")
print(df.iloc[0:2, :])

# Use iloc to select specific row and column positions for one value.
print("\nSingle value at row zero column one using iloc:")
print(df.iloc[0, 1])

# Use iloc to select a rectangular block; slice stop positions (4 and 3 here) are excluded.
print("\nRows one to three and columns one to two using iloc:")
print(df.iloc[1:4, 1:3])

# Use iloc to select non contiguous rows and columns by integer lists.
print("\nNon contiguous rows and columns using iloc lists:")
print(df.iloc[[0, 3], [0, 2]])
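
The bullets above warn that positions change after sorting or filtering. A minimal sketch of the difference between position and label after a sort, assuming a made-up three-city table (`sorted_df`, `first_by_position`, and `row_labeled_zero` are hypothetical names):

```python
import pandas as pd

# Hypothetical table with the default integer index 0, 1, 2.
df = pd.DataFrame({"City": ["Boston", "Dallas", "Denver"], "High_F": [75, 88, 70]})

# Sorting reorders the rows but keeps each row's original index label.
sorted_df = df.sort_values("High_F", ascending=False)
print(sorted_df)

# iloc[0] is the first row in the current order; loc[0] is the row labeled 0.
first_by_position = sorted_df["City"].iloc[0]
row_labeled_zero = sorted_df.loc[0, "City"]

print("First row by position:", first_by_position)
print("Row with label 0:", row_labeled_zero)
```

With a default integer index the two accessors look interchangeable until the row order changes, so it helps to choose one deliberately.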



### **3.3. Selecting DataFrame Columns**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Pandas 2.3.1 A-Z/Module_01/Lecture_C/image_03_03.jpg?v=1766640437" width="250">



>* Select specific columns to focus analysis
>* Fewer columns reduce clutter and simplify exploration

>* Columns are labeled; select them by name
>* Single column returns Series; multiple return DataFrame

>* Combine column selection with row filtering or slicing
>* Use staged queries to isolate relevant variables



In [None]:
#@title Python Code - Selecting DataFrame Columns

# Demonstrate selecting DataFrame columns using simple sales data.
# Show difference between single column and multiple column selection.
# Keep output small, clear, and beginner friendly.

import pandas as pd

# Create a small DataFrame representing simple store sales data.
data = {
    "product": ["Tshirt", "Jeans", "Hat", "Shoes"],
    "price_usd": [15.0, 40.0, 12.0, 60.0],
    "quantity": [3, 1, 4, 2],
}

sales_df = pd.DataFrame(data)

# Show the full DataFrame to understand available columns.
print("Full sales DataFrame:")
print(sales_df)

# Select a single column, which returns a Series object.
price_series = sales_df["price_usd"]
print("\nSingle column as Series:")
print(price_series)

# Select multiple columns, which returns a smaller DataFrame.
product_price_df = sales_df[["product", "price_usd"]]
print("\nTwo selected columns as DataFrame:")
print(product_price_df)
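
The bullets above suggest combining column selection with row filtering. A minimal sketch, assuming the same kind of store sales data and an arbitrary price threshold of 14 dollars chosen only for illustration (`expensive` is a hypothetical name):

```python
import pandas as pd

# Hypothetical store sales data.
sales_df = pd.DataFrame(
    {
        "product": ["Tshirt", "Jeans", "Hat", "Shoes"],
        "price_usd": [15.0, 40.0, 12.0, 60.0],
        "quantity": [3, 1, 4, 2],
    }
)

# The boolean mask picks rows; the list of labels picks columns.
# Doing both in one loc call returns a smaller DataFrame.
expensive = sales_df.loc[sales_df["price_usd"] > 14, ["product", "price_usd"]]

print(expensive)
```

Passing the row filter and the column list to a single `loc` call is one common pattern; selecting columns first and filtering afterwards gives the same rows here.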



# <font color="#418FDE" size="6.5" uppercase>**Series and DataFrames**</font>


In this lecture, you learned to:
- Create Series and DataFrame objects from Python data structures and external files. 
- Explain how indexes and labels work in Series and DataFrames. 
- Perform basic inspection and selection operations on DataFrames using idiomatic Pandas syntax. 

In the next module (Module 2), we will cover 'Data Ingestion'.