# Python Fundamentals I: Data Types, Operators, and Collections

This notebook accompanies the lecture slides for Week 2. It contains all the code examples and practice problems from the presentation. You can run the code cells to see the output and work through the exercises to solidify your understanding.

---

# 1. Basic Data Types

## 1.1 Number

Python supports three main numeric types:
- `int`: Integer values (e.g., `5`, `-100`)
- `float`: Floating-point values (e.g., `3.14`, `-0.5`)
- `complex`: Complex numbers (e.g., `1+2j`)

In [53]:
# We can use the type() function to check the data type of a value.
print(type(5))       
print(type(5.0))     
print(type(1+2j))

<class 'int'>
<class 'float'>
<class 'complex'>


## 1.2 Boolean

The boolean data type, `bool`, represents one of two values: `True` or `False`. It's crucial for logical operations and control flow.

In a boolean context, certain values are considered "falsy":
- The number `0` (integer and float)
- An empty string `""`
- An empty collection (like a list `[]` or dictionary `{}`)
- The special value `None`

Almost everything else is considered "truthy".

In [54]:
# Booleans are often the result of comparisons
print(f"5 > 3 is {5 > 3}")

# Checking the boolean value of "falsy" items
print(f"The boolean value of 0 is {bool(0)}")

# Checking the boolean value of "truthy" items
print(f"The boolean value of 'hi' is {bool('hi')}")

5 > 3 is True
The boolean value of 0 is False
The boolean value of 'hi' is True
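
Truthiness is most often used to test whether a collection is empty without calling `len()`. The sketch below illustrates that idiom; the helper name `describe` is hypothetical and exists only for this example.

```python
# Hypothetical helper: relies on truthiness instead of an explicit len() check.
def describe(items):
    if not items:              # True for [], "", {}, 0, None, ...
        return "empty"
    return "has content"

# Every value in this list is falsy.
falsy_values = [0, 0.0, "", [], {}, set(), None]
all_falsy = all(not bool(v) for v in falsy_values)

print(describe([]))    # empty
print(describe([0]))   # has content: a list holding one falsy item is not empty
print(all_falsy)       # True
```

Note that a container is judged by whether it has any items, not by what those items are.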


### Practice: Basic Data Types

**Challenge**: Predict the output and data type for each of the following code blocks before running the cell.

In [55]:
# Prediction 1
a = 15
print(f"Value of a: {a}")
print(f"Type of a: {type(a)}")

# Prediction 2
# Note: True division (/) of two ints always produces a float in Python 3, even when the result is a whole number.
b = a / 2
print(f"\nValue of b: {b}")
print(f"Type of b: {type(b)}")

# Prediction 3
# A string with a space is not empty, so it's truthy.
c = " "
print(f"\nValue of c: '{c}'")
print(f"Boolean value of c: {bool(c)}")

# Prediction 4
# 0.0 is a numeric zero, so it's falsy.
d = 0.0
print(f"\nValue of d: {d}")
print(f"Boolean value of d: {bool(d)}")

Value of a: 15
Type of a: <class 'int'>

Value of b: 7.5
Type of b: <class 'float'>

Value of c: ' '
Boolean value of c: True

Value of d: 0.0
Boolean value of d: False


---

# 2. Operators

Operators are special symbols that perform operations on values (operands).

## 2.1 Arithmetic Operators

Used for mathematical calculations.

In [56]:
a = 10
b = 3

print(f"{a} + {b}  = {a + b}   (Addition)")
print(f"{a} - {b}  = {a - b}   (Subtraction)")
print(f"{a} * {b}  = {a * b}   (Multiplication)")
print(f"{a} / {b}  = {a / b} (Float Division)")
print(f"{a} // {b} = {a // b}  (Floor Division - discards remainder)")
print(f"{a} % {b}  = {a % b}   (Modulus - returns remainder)")
print(f"{a} ** {b} = {a ** b} (Exponentiation)")

10 + 3  = 13   (Addition)
10 - 3  = 7   (Subtraction)
10 * 3  = 30   (Multiplication)
10 / 3  = 3.3333333333333335 (Float Division)
10 // 3 = 3  (Floor Division - discards remainder)
10 % 3  = 1   (Modulus - returns remainder)
10 ** 3 = 1000 (Exponentiation)
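
The phrase "discards remainder" holds for positive operands. With a negative operand, floor division rounds toward negative infinity rather than toward zero, and the modulus takes the sign of the divisor. A small sketch of that behavior:

```python
# Floor division rounds DOWN (toward negative infinity), not toward zero.
pos_quotient = 7 // 2      # 3
neg_quotient = -7 // 2     # -4, because -3.5 rounds down to -4
neg_remainder = -7 % 2     # 1, the result takes the sign of the divisor

# divmod() returns the quotient and remainder together.
q, r = divmod(-7, 2)
identity_holds = (q * 2 + r == -7)   # quotient * divisor + remainder == dividend

print(pos_quotient, neg_quotient, neg_remainder)
print(q, r, identity_holds)
```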


## 2.2 Relational Operators

Used to compare values. For built-in types, they return a boolean (`True` or `False`).

In [57]:
a = 5
b = 10

print(f"{a} == {b} is {a == b}")
print(f"{a} != {b} is {a != b}")
print(f"{a} < {b} is {a < b}")
print(f"{a} > {b} is {a > b}")
print(f"{a} <= {b} is {a <= b}")
print(f"{a} >= {b} is {a >= b}")

# Python allows for chained comparisons
print(f"Is 15 < 25 < 30? {15 < 25 < 30}")

5 == 10 is False
5 != 10 is True
5 < 10 is True
5 > 10 is False
5 <= 10 is True
5 >= 10 is False
Is 15 < 25 < 30? True


## 2.3 Logical Operators

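Used to combine boolean expressions. `and` and `or` short-circuit: evaluation stops as soon as the result is known, and the value returned is one of the operands.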

In [58]:
x = 4
print(f"x = {x}")

# and: True only if both conditions are true
print(f"(x < 6) and (x > 2) is {(x < 6) and (x > 2)}")

# or: True if at least one condition is true
print(f"(x > 10) or (x % 2 == 0) is {(x > 10) or (x % 2 == 0)}")

# not: Inverts the boolean value
print(f"not (x < 5) is {not (x < 5)}")

x = 4
(x < 6) and (x > 2) is True
(x > 10) or (x % 2 == 0) is True
not (x < 5) is False
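
When the operands are not booleans, `and` and `or` return one of the operands themselves, and the right-hand side is skipped whenever the left-hand side already decides the result. The variable names below are illustrative only.

```python
# `or` returns the first truthy operand (or the last operand if none is truthy).
name = ""
display_name = name or "Anonymous"   # "" is falsy, so the right operand is used

# `and` returns the first falsy operand (or the last operand if all are truthy).
last_truthy = 3 and 5                # both truthy, so the result is 5

# Short-circuiting: the division is never attempted because the left side is False.
denominator = 0
is_large = denominator != 0 and 10 / denominator > 1

print(display_name)   # Anonymous
print(last_truthy)    # 5
print(is_large)       # False
```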


## 2.4 Bitwise Operators

Bitwise operators act on numbers at the binary (bit) level. To understand them, we can use the `bin()` function to see the binary representation of a number.

In [59]:
a = 6  # Binary: 0b110
b = 8  # Binary: 0b1000

print(f"a = {a}, binary = {bin(a)}")
print(f"b = {b}, binary = {bin(b)}")

# Bitwise OR (|): Sets each bit to 1 if at least one of the two bits is 1
#   0110 (6)
# | 1000 (8)
# ------
# = 1110 (14)
result_or = a | b
print(f"\n{a} | {b} = {result_or} (binary: {bin(result_or)})")

# Bitwise AND (&): Sets each bit to 1 only if both bits are 1
#   0110 (6)
# & 1000 (8)
# ------
# = 0000 (0)
result_and = a & b
print(f"{a} & {b} = {result_and} (binary: {bin(result_and)})")

# Bitwise XOR (^): Sets each bit to 1 only if the two bits are different
result_xor = a ^ b
print(f"{a} ^ {b} = {result_xor} (binary: {bin(result_xor)})")

# Bitwise Right Shift (>>): Shifts bits to the right, discarding the low-order bits
val = 20  # Binary: 0b10100
shift_right = val >> 2  # 0b101 -> 5
print(f"\n{val} ({bin(val)}) >> 2 = {shift_right} ({bin(shift_right)})")

# Bitwise Left Shift (<<): Shifts bits to the left, padding with zeros
shift_left = val << 2 # 1010000 -> 80
print(f"{val} ({bin(val)}) << 2 = {shift_left} ({bin(shift_left)})")

a = 6, binary = 0b110
b = 8, binary = 0b1000

6 | 8 = 14 (binary: 0b1110)
6 & 8 = 0 (binary: 0b0)
6 ^ 8 = 14 (binary: 0b1110)

20 (0b10100) >> 2 = 5 (0b101)
20 (0b10100) << 2 = 80 (0b1010000)
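
Two further points are worth a quick sketch: the bitwise NOT operator `~`, which the cell above does not use, and the common trick of using `&` as a mask. For Python integers, `~n` equals `-(n + 1)`.

```python
a = 6

# Bitwise NOT (~): for Python ints, ~n is always -(n + 1).
not_a = ~a                       # -7

# Masking: & with 0b111 keeps only the lowest three bits.
low_bits = 0b101101 & 0b111      # 0b101 -> 5

# Shifting by one position doubles or halves (floor) the value.
doubled = a << 1                 # 12
halved = a >> 1                  # 3

print(not_a, low_bits, doubled, halved)
```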


## 2.5 Assignment Operators

Used to assign values to variables, often as a shortcut for another operation.

In [60]:
a = 3
print(f"Initial value of a: {a}")

# a += 2 is the same as a = a + 2
a += 2
print(f"After 'a += 2', a is: {a}")

# a *= 4 is the same as a = a * 4
a *= 4
print(f"After 'a *= 4', a is: {a}")

Initial value of a: 3
After 'a += 2', a is: 5
After 'a *= 4', a is: 20


### Practice: Assignment Operators

**Challenge**: What's the final output of the following code? Trace the value of `a` carefully.

In [61]:
a = 1
a += 2  # a becomes 1 + 2 = 3
a * 4   # This line calculates 3 * 4 = 12, but doesn't assign it back to 'a'!
a - 1   # This line calculates 3 - 1 = 2, but also doesn't assign it.
a /= 2  # This is a = a / 2, so a becomes 3 / 2 = 1.5
print(a)

# The key takeaway is that an operation like `a * 4` doesn't change `a` unless you use an assignment operator like `a *= 4` or `a = a * 4`.

1.5


## 2.6 Identity and Membership Operators

- **Identity (`is`, `is not`)**: Check if two variables refer to the *exact same object* in memory.
- **Membership (`in`, `not in`)**: Check if a value exists within a sequence (like a list or string).

In [62]:
# Identity Operators
a = [1, 2] # a is a list
b = a      # b now points to the *same* list as a
c = [1, 2] # c is a *new* list that happens to have the same content

print("--- Identity Operators ---")
print(f"a is b: {a is b}")   # True, they are the same object
print(f"a == b: {a == b}")   # True, their contents are equal
print(f"a is c: {a is c}")   # False, they are different objects in memory
print(f"a == c: {a == c}")   # True, their contents are equal

# Membership Operators
nums = [1, 2, 3, 4, 5]
print("\n--- Membership Operators ---")
print(f"Is 3 in nums? {3 in nums}")
print(f"Is 10 not in nums? {10 not in nums}")

--- Identity Operators ---
a is b: True
a == b: True
a is c: False
a == c: True

--- Membership Operators ---
Is 3 in nums? True
Is 10 not in nums? True
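
The most common everyday use of `is` is comparing against `None`. The sketch below assumes a hypothetical function `find_first_even` that returns `None` when nothing is found; it shows why `is None` is safer than a plain truthiness test.

```python
# Hypothetical search function: returns None when no even number exists.
def find_first_even(values):
    for v in values:
        if v % 2 == 0:
            return v
    return None

missing = find_first_even([1, 3, 5])   # None: nothing found
zero = find_first_even([0, 1])         # 0: found, but 0 is falsy

print(missing is None)     # True
print(zero is None)        # False: a real result, even though bool(0) is False
print(bool(zero))          # False: a truthiness test would wrongly treat it as "not found"
```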


## 2.7 Operator Precedence

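Operator precedence determines the order in which the operations in an expression are performed. Parentheses `()` override the default order: a parenthesized sub-expression is evaluated as a unit before the operators around it are applied.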

In [63]:
# Multiplication/Division happens before Addition/Subtraction
print(f"10 - 4 / 2 = {10 - 4 / 2}") # 4/2 is calculated first
print(f"(10 - 4) / 2 = {(10 - 4) / 2}") # Parentheses force subtraction first

print(f"2 + 3 * 4 = {2 + 3 * 4}") # 3*4 is calculated first
print(f"(2 + 3) * 4 = {(2 + 3) * 4}") # Parentheses force addition first

10 - 4 / 2 = 8.0
(10 - 4) / 2 = 3.0
2 + 3 * 4 = 14
(2 + 3) * 4 = 20
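
A few precedence rules regularly surprise newcomers: `**` binds more tightly than a unary minus on its left, `**` groups from right to left, and comparisons are evaluated before `not`, `and`, and `or`. A short sketch:

```python
neg_square = -2 ** 2          # -4: evaluated as -(2 ** 2)
paren_square = (-2) ** 2      # 4: parentheses apply the minus first
tower = 2 ** 3 ** 2           # 512: evaluated as 2 ** (3 ** 2)
negated = not 1 == 2          # True: evaluated as not (1 == 2)
mixed = 1 + 2 < 4 and 3 > 2   # True: arithmetic, then comparisons, then `and`

print(neg_square, paren_square, tower, negated, mixed)
```

When in doubt, adding parentheses costs nothing and makes the intent explicit.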


### Practice: Operators

**Challenge**: What is the final value of `result`? Trace the code step-by-step.

In [64]:
x = 10
y = 4
z = 2

# Step 1
# Precedence: y * z is calculated first (4 * 2 = 8).
# Then x += 8 means x = x + 8, so x becomes 10 + 8 = 18.
x += y * z
print(f"After Step 1, x is: {x}")

# Step 2
# y = x % 9 means y = 18 % 9.
# 18 divided by 9 has a remainder of 0.
# So y is now 0.
y = x % 9
print(f"After Step 2, y is: {y}")

# Step 3
# (x > 15) is (18 > 15), which is True.
# (y == 0) is (0 == 0), which is True.
# result = True and True, which is True.
result = (x > 15) and (y == 0)

print(f"\nThe final result is: {result}")

After Step 1, x is: 18
After Step 2, y is: 0

The final result is: True


---

# 3. Collections

Collections (or data structures) are used to store and organize groups of related data.

## 3.1 Mutable vs. Immutable

- A **mutable** object can be changed after it is created (e.g., `list`, `dict`, `set`).
- An **immutable** object cannot be changed after it is created (e.g., `int`, `float`, `str`, `tuple`).
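
The practical consequence of this distinction shows up when two names refer to the same object. The following sketch contrasts a list (mutable) with a string (immutable); the variable names are illustrative only.

```python
# Mutable: both names refer to ONE list, so a change through either is visible to both.
nums = [1, 2, 3]
alias = nums
alias.append(4)
print(nums)            # [1, 2, 3, 4]
print(alias is nums)   # True

# Immutable: += cannot change the string, so it builds a new one and rebinds `other`.
text = "abc"
other = text
other += "d"
print(text)            # abc
print(other)           # abcd
```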

## 3.2 List

An ordered, mutable collection of items. Defined with square brackets `[]`.

In [65]:
nums = [1, 2, 3]
print(f"nums is a {type(nums)}")

# Lists can contain items of different data types
mix_list = [1, "two", 3.0, False]
print(f"A mixed-type list: {mix_list}")

nums is a <class 'list'>
A mixed-type list: [1, 'two', 3.0, False]


### List Operators

In [66]:
my_list = ['P', 'Y', 'T', 'H', 'O', 'N']

# Indexing: Access elements by position (0-based)
print(f"Index 0: {my_list[0]}")   # Positive indexing from the start
print(f"Index -1: {my_list[-1]}") # Negative indexing from the end

# Slicing: Extract a sub-list [start:stop:step]
# The 'stop' index is not included in the result.
print(f"Slice [1:4]: {my_list[1:4]}") # Elements at index 1, 2, 3
print(f"Slice from start to 3 [:3]: {my_list[:3]}")
print(f"Slice from 2 to end [2:]: {my_list[2:]}")
print(f"Slice with step 2 [::2]: {my_list[::2]}") # Every second element

# Concatenation (+)
list1 = [1, 2]
list2 = [3, 4]
print(f"Concatenation: {list1 + list2}")

# Repetition (*)
print(f"Repetition: {list1 * 3}")

# Membership (in)
print(f"Is 'H' in my_list? {'H' in my_list}")

Index 0: P
Index -1: N
Slice [1:4]: ['Y', 'T', 'H']
Slice from start to 3 [:3]: ['P', 'Y', 'T']
Slice from 2 to end [2:]: ['T', 'H', 'O', 'N']
Slice with step 2 [::2]: ['P', 'T', 'O']
Concatenation: [1, 2, 3, 4]
Repetition: [1, 2, 1, 2, 1, 2]
Is 'H' in my_list? True
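
Slicing has a few more behaviors worth knowing: a negative step walks backwards, negative indices work inside slices, and out-of-range bounds are clamped instead of raising an error. A brief sketch using the same `my_list`:

```python
my_list = ['P', 'Y', 'T', 'H', 'O', 'N']

reversed_copy = my_list[::-1]    # step of -1 walks the list backwards
last_two = my_list[-2:]          # negative start counts from the end
shallow_copy = my_list[:]        # a new list with the same items
clamped = my_list[4:100]         # stop beyond the end is clamped, no IndexError

print(reversed_copy)                 # ['N', 'O', 'H', 'T', 'Y', 'P']
print(last_two)                      # ['O', 'N']
print(shallow_copy is my_list)       # False: it is a separate object
print(clamped)                       # ['O', 'N']
```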


### List Methods

Methods are functions that belong to an object. Since lists are mutable, many list methods modify the list in place and return `None` rather than a new list.

In [67]:
fruits = ['apple', 'banana', 'cherry']
print(f"Original list: {fruits}")

# append(): Adds an element to the end
fruits.append('orange')
print(f"After append('orange'): {fruits}")

# insert(): Inserts an element at a specific position
fruits.insert(1, 'blueberry')
print(f"After insert(1, 'blueberry'): {fruits}")

# remove(): Removes the first occurrence of a value
fruits.remove('cherry')
print(f"After remove('cherry'): {fruits}")

# pop(): Removes and returns the element at an index (default is the last)
removed_fruit = fruits.pop()
print(f"Popped item: {removed_fruit}")
print(f"List after pop(): {fruits}")

# sort(): Sorts the list in-place
fruits.sort()
print(f"After sort(): {fruits}")

# reverse(): Reverses the list in-place
fruits.reverse()
print(f"After reverse(): {fruits}")

Original list: ['apple', 'banana', 'cherry']
After append('orange'): ['apple', 'banana', 'cherry', 'orange']
After insert(1, 'blueberry'): ['apple', 'blueberry', 'banana', 'cherry', 'orange']
After remove('cherry'): ['apple', 'blueberry', 'banana', 'orange']
Popped item: orange
List after pop(): ['apple', 'blueberry', 'banana']
After sort(): ['apple', 'banana', 'blueberry']
After reverse(): ['blueberry', 'banana', 'apple']
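
A frequent mistake is assigning the result of an in-place method back to a variable. The sketch below shows what `sort()` returns and how the built-in `sorted()` can be used when a new list is wanted; the sample data is made up for illustration.

```python
nums = [3, 1, 2]
result = nums.sort()     # sorts in place and returns None
print(result)            # None
print(nums)              # [1, 2, 3]

# sorted() leaves the original alone and accepts the same key/reverse options.
words = ['pear', 'fig', 'banana']
by_length = sorted(words, key=len)
descending = sorted(nums, reverse=True)
print(by_length)         # ['fig', 'pear', 'banana']
print(descending)        # [3, 2, 1]
print(words)             # ['pear', 'fig', 'banana'] (unchanged)
```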


## 3.3 Tuple

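An ordered, **immutable** collection of items. Usually written with parentheses `()`, although it is the commas that create the tuple: a one-item tuple needs a trailing comma, as in `(5,)`.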

In [68]:
# Tuples support the same operators as lists (indexing, slicing, etc.)
letters = ('a', 'b', 'c', 'b', 'd')
print(f"letters is a {type(letters)}")
print(f"Slice [1:4]: {letters[1:4]}")

# But they are immutable, so you cannot change them
try:
    letters[0] = 'z'
except TypeError as e:
    print(f"Error: {e}")

# Tuples have fewer methods because they can't be modified.
# count(): Returns the number of times a value appears.
print(f"The letter 'b' appears {letters.count('b')} times.")

# index(): Returns the index of the first occurrence of a value.
print(f"The first index of 'c' is {letters.index('c')}.")

letters is a <class 'tuple'>
Slice [1:4]: ('b', 'c', 'b')
Error: 'tuple' object does not support item assignment
The letter 'b' appears 2 times.
The first index of 'c' is 2.
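
Two tuple idioms come up constantly in practice: the trailing comma for one-item tuples, and unpacking a tuple into separate variables. A minimal sketch:

```python
not_a_tuple = (5)      # just the int 5 in parentheses
single = (5,)          # the trailing comma makes it a tuple
no_parens = 1, 2, 3    # parentheses are optional here

print(type(not_a_tuple))   # <class 'int'>
print(type(single))        # <class 'tuple'>

# Unpacking assigns each item to its own variable.
x, y, z = no_parens

# Swapping two variables uses a temporary tuple behind the scenes.
x, y = y, x
print(x, y, z)             # 2 1 3
```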


## 3.4 String

A sequence of characters, which is also **immutable**. Defined with single `'`, double `"`, or triple `'''`/`"""` quotes.

In [69]:
s1 = "Hello, World!"

# Strings also support sequence operators
print(f"Index 1: {s1[1]}")      # 'e'
print(f"Slice [7:12]: {s1[7:12]}")    # 'World'
print(f"Concatenation: {'Hello' + ' Python'}")
print(f"Repetition: {'Go' * 3}")

Index 1: e
Slice [7:12]: World
Concatenation: Hello Python
Repetition: GoGoGo


### String Methods

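String methods are very powerful. Since strings are immutable, no method changes the original string: methods that transform text return a **new** string, while others return a different type (for example, `split()` returns a list).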

In [70]:
text = "  Python Programming is Fun!  "
print(f"Original: '{text}'")

print(f"lower(): '{text.lower()}'")
print(f"upper(): '{text.upper()}'")
print(f"strip(): '{text.strip()}'") # Removes leading/trailing whitespace
print(f"replace('Fun', 'Awesome'): '{text.replace('Fun', 'Awesome')}'")

# split() creates a list of strings
words = text.strip().split(' ')
print(f"split(' '): {words}")

# join() combines a list of strings into one string
joined_string = "--".join(words)
print(f"'--'.join(words): '{joined_string}'")

Original: '  Python Programming is Fun!  '
lower(): '  python programming is fun!  '
upper(): '  PYTHON PROGRAMMING IS FUN!  '
strip(): 'Python Programming is Fun!'
replace('Fun', 'Awesome'): '  Python Programming is Awesome!  '
split(' '): ['Python', 'Programming', 'is', 'Fun!']
'--'.join(words): 'Python--Programming--is--Fun!'
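
The cell above calls `split(' ')`, which works because the words are separated by exactly one space. When the spacing is irregular, `split()` with no argument is usually the safer choice, as this sketch with a made-up string shows:

```python
spaced = "Python  is fun"     # two spaces between the first two words

on_space = spaced.split(' ')  # splits at EVERY single space
on_whitespace = spaced.split()  # splits on runs of any whitespace

print(on_space)        # ['Python', '', 'is', 'fun']
print(on_whitespace)   # ['Python', 'is', 'fun']
```

With no argument, `split()` also ignores leading and trailing whitespace, so a separate `strip()` is not needed.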


### String Formatting

In [71]:
language = "Python"
year = 1991

# 1. Using the .format() method
format_str = "{} was developed in {}.".format(language, year)
print(f".format() method: {format_str}")

# 2. Using the % operator (older style)
percent_str = "%s was developed in %d." % (language, year)
print(f"% operator: {percent_str}")

# 3. Using f-strings (modern and preferred)
f_string = f"{language} was developed in {year}."
print(f"f-string: {f_string}")

.format() method: Python was developed in 1991.
% operator: Python was developed in 1991.
f-string: Python was developed in 1991.
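
Part of what makes f-strings convenient is the format specifier that can follow a colon inside the braces. The values below are made up for illustration; the specifiers themselves are standard.

```python
price = 1234.5
ratio = 0.256
name = "Kim"

two_decimals = f"{price:.2f}"     # fixed two decimal places
with_commas = f"{price:,.2f}"     # thousands separator
as_percent = f"{ratio:.1%}"       # multiply by 100 and append %
right_aligned = f"{name:>6}"      # pad on the left to width 6
zero_padded = f"{42:05d}"         # pad with zeros to width 5

print(two_decimals)    # 1234.50
print(with_commas)     # 1,234.50
print(as_percent)      # 25.6%
print(right_aligned)   # '   Kim' (three leading spaces)
print(zero_padded)     # 00042
```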


## 3.5 Set

An **unordered** collection of **unique** items. Defined with curly braces `{}`.

In [72]:
# Duplicates are automatically removed
my_set = {1, 2, 5, 4, 2, 1}
print(f"my_set is a {type(my_set)}")
print(f"Set with duplicates removed: {my_set}")

# Note: {} creates an empty DICTIONARY. To create an empty set, use set().
empty_set = set()
print(f"empty_set is a {type(empty_set)}")

my_set is a <class 'set'>
Set with duplicates removed: {1, 2, 4, 5}
empty_set is a <class 'set'>


### Set Operations

Sets support the standard mathematical set operations: union, intersection, difference, and symmetric difference.

In [73]:
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print(f"A = {A}")
print(f"B = {B}")

# Union (|): All unique elements from both sets
print(f"Union (A | B): {A | B}")

# Intersection (&): Elements that are in both sets
print(f"Intersection (A & B): {A & B}")

# Difference (-): Elements in A but not in B
print(f"Difference (A - B): {A - B}")

# Symmetric Difference (^): Elements in either A or B, but not both
print(f"Symmetric Difference (A ^ B): {A ^ B}")

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
Union (A | B): {1, 2, 3, 4, 5, 6}
Intersection (A & B): {3, 4}
Difference (A - B): {1, 2}
Symmetric Difference (A ^ B): {1, 2, 5, 6}
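
Each operator also has a method form, and sets offer tests such as subset and disjointness. One difference worth noting: the methods accept any iterable, whereas the operators require both sides to be sets. A short sketch:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

same_union = A.union(B) == (A | B)     # method and operator agree
from_list = A.union([7, 8])            # the method accepts a list; A | [7, 8] would raise TypeError
is_subset = {1, 2} <= A                # every item of {1, 2} is in A
no_overlap = A.isdisjoint({9, 10})     # no items in common

# Sets are mutable: add() inserts, discard() removes without error if absent.
A.add(5)
A.discard(100)

print(same_union, is_subset, no_overlap)   # True True True
print(from_list)
print(A)
```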


## 3.6 Dictionary

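A mutable collection of **key-value pairs**. Keys must be unique and hashable, which in practice means immutable types such as `str`, `int`, or `tuple`. Defined with `{}`. Since Python 3.7, dictionaries preserve insertion order.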

In [74]:
person_age = {"John": 21, "Kim": 35, "Alex": 26}
print(f"person_age is a {type(person_age)}")
print(f"Dictionary: {person_age}")

# Accessing values using keys
print(f"Kim's age is {person_age['Kim']}")

# Use .get() to safely access keys that may not exist
print(f"Steve's age is {person_age.get('Steve')}")
print(f"Steve's age (with default) is {person_age.get('Steve', 'Unknown')}")

# Adding or updating entries
person_age['Steve'] = 32 # Adding a new key-value pair
person_age['Kim'] = 36   # Updating an existing key
print(f"Updated dictionary: {person_age}")

# Removing entries
del person_age['John']
print(f"After deleting John: {person_age}")

person_age is a <class 'dict'>
Dictionary: {'John': 21, 'Kim': 35, 'Alex': 26}
Kim's age is 35
Steve's age is None
Steve's age (with default) is Unknown
Updated dictionary: {'John': 21, 'Kim': 36, 'Alex': 26, 'Steve': 32}
After deleting John: {'Kim': 36, 'Alex': 26, 'Steve': 32}
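
Beyond single-key access, dictionaries are usually processed by iterating over their items. The sketch below continues with the same sample data and shows membership testing, iteration, and `pop()` as an alternative to `del`.

```python
person_age = {"Kim": 36, "Alex": 26, "Steve": 32}

# Membership tests look at KEYS, not values.
has_kim = "Kim" in person_age
has_36 = 36 in person_age

# items() yields (key, value) pairs in insertion order.
for name, age in person_age.items():
    print(f"{name} is {age}")

# max() over a dict iterates its keys; key= ranks them by their value.
oldest = max(person_age, key=person_age.get)

# pop() removes a key and returns its value; a default avoids KeyError.
removed = person_age.pop("Alex")
missing = person_age.pop("Nobody", None)

print(has_kim, has_36, oldest, removed, missing)
```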


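## 3.7 General Collection Functions

These built-in functions work on most collection types, although `sum()` requires numeric items and `max()`/`min()` require items that can be compared with each other.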

In [75]:
data = [5, 2, 8, 1, 8]

print(f"len(data) = {len(data)}")   # Length
print(f"max(data) = {max(data)}")   # Maximum value
print(f"min(data) = {min(data)}")   # Minimum value
print(f"sum(data) = {sum(data)}")   # Sum of all values

# sorted() returns a new sorted list, it does not modify the original
sorted_data = sorted(data)
print(f"sorted(data) = {sorted_data}")
print(f"Original data is unchanged: {data}")

len(data) = 5
max(data) = 8
min(data) = 1
sum(data) = 24
sorted(data) = [1, 2, 5, 8, 8]
Original data is unchanged: [5, 2, 8, 1, 8]
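
The cell above uses a list, but the same functions accept strings, sets, and dictionaries. The sample values in this sketch are made up; note that a dictionary is treated as a collection of its keys.

```python
word = "python"
scores = {"Kim": 36, "Alex": 26}
unique = {3, 1, 2}

print(len(word))              # 6
print(max(word))              # y (characters compare alphabetically here)
print(sorted(word))           # ['h', 'n', 'o', 'p', 't', 'y']

print(len(scores))            # 2 (number of keys)
print(max(scores))            # Kim (largest KEY, not largest value)
print(sum(scores.values()))   # 62

print(sorted(unique))         # [1, 2, 3]; sorted() always returns a list
```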


### Practice: Collections (NLP Challenge)

**Challenge**: Analyze the `document` string to understand its vocabulary and word frequencies.

1.  **Tokenize**: Split the `document` string into a list of individual words (tokens). Convert all words to lowercase.
2.  **Vocabulary**: Create a set of unique words from the token list to find the vocabulary size.
3.  **Frequencies**: Create a dictionary to store the count of how many times each unique word appears.

<span style="color:red">Please submit your code to [this url](https://docs.qq.com/form/page/DV0pmWUdyc0R4a3J3), which will serve as a part of your attendance record</span>.

In [86]:
document = "The quick brown fox jumps over the lazy dog The dog was happy"
words = document.lower().split()
# 2. Vocabulary: collect the unique tokens in a set
voc: set = set()
freq: dict = {}
for word in words:
    voc.add(word)
print(f"size of vocabulary : {len(voc)}, vocabulary : {voc}")
# 3. Frequencies: start every count at 0, then tally each token
for word in voc:
    freq[word] = 0
for word in words:
    freq[word] += 1
print(f"frequencies: {freq}")

size of vocabulary : 10, vocabulary : {'dog', 'happy', 'quick', 'brown', 'fox', 'over', 'was', 'lazy', 'the', 'jumps'}
frequencies: {'dog': 2, 'happy': 1, 'quick': 1, 'brown': 1, 'fox': 1, 'over': 1, 'was': 1, 'lazy': 1, 'the': 3, 'jumps': 1}
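
As an alternative to the manual loops, the standard library's `collections.Counter` performs the counting step directly. This is a sketch of a different approach, not a replacement for the solution above; the name `tokens` is chosen here for illustration.

```python
from collections import Counter

document = "The quick brown fox jumps over the lazy dog The dog was happy"

tokens = document.lower().split()   # 1. Tokenize
vocabulary = set(tokens)            # 2. Vocabulary
freq = Counter(tokens)              # 3. Frequencies (a dict subclass)

print(len(vocabulary))         # 10
print(freq["the"])             # 3
print(freq.most_common(2))     # [('the', 3), ('dog', 2)]
```

`Counter` returns `0` for words it has never seen, so no initialization loop is needed.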


---

# 4. Type Conversion

The process of converting an object from one data type to another.

## 4.1 Implicit vs. Explicit Conversion

In [76]:
# Implicit Conversion (Coercion)
# Python automatically converts int to float to avoid data loss.
implicit_result = 2 + 4.5
print(f"Implicit conversion: 2 + 4.5 = {implicit_result} (type: {type(implicit_result)})")

# Explicit Conversion (Casting)
# We manually convert types using functions like int(), float(), str(), etc.
int_val = int(3.9) # The decimal part is truncated, not rounded
str_val = str(123)
print(f"Explicit conversion: int(3.9) = {int_val} (type: {type(int_val)})")
print(f"Explicit conversion: str(123) = '{str_val}' (type: {type(str_val)})")

# Converting a list to other collection types
my_list = [1, 2, 2, 3]
my_tuple = tuple(my_list)
my_set = set(my_list) # Duplicates are removed

print(f"\nList {my_list} converted to tuple: {my_tuple}")
print(f"List {my_list} converted to set: {my_set}")

Implicit conversion: 2 + 4.5 = 6.5 (type: <class 'float'>)
Explicit conversion: int(3.9) = 3 (type: <class 'int'>)
Explicit conversion: str(123) = '123' (type: <class 'str'>)

List [1, 2, 2, 3] converted to tuple: (1, 2, 2, 3)
List [1, 2, 2, 3] converted to set: {1, 2, 3}
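
Explicit conversion has a few edge cases that are easy to get wrong: `int()` truncates toward zero, `round()` rounds halves to the nearest even number, and `int()` rejects strings that contain a decimal point. A short sketch:

```python
truncated_pos = int(3.9)      # 3
truncated_neg = int(-3.9)     # -3: truncation toward zero, not floor
rounded = round(3.9)          # 4
half_even = round(2.5)        # 2: halves round to the nearest even integer

# int() cannot parse a decimal string directly.
try:
    int("3.9")
    parse_failed = False
except ValueError:
    parse_failed = True

# Converting through float() first works.
via_float = int(float("3.9"))   # 3

print(truncated_pos, truncated_neg, rounded, half_even)
print(parse_failed, via_float)
```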


### Practice: Type Conversion

**Challenge**: You receive a list of product prices as strings. Calculate the total cost.

- Convert each price to a number.
- Sum the numbers to find the total.
- Format the output as a user-friendly string.

In [77]:
prices_str = ["29.99", "15.50", "100.00", "7.25"]
total = 0

# Your conversion and calculation code here...
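
One possible solution is sketched below, using only constructs covered in this notebook. The output wording ("Total cost: ...") is an assumption, since the exercise does not prescribe a format.

```python
prices_str = ["29.99", "15.50", "100.00", "7.25"]
total = 0.0

# Convert each price string to a float and add it to the running total.
for price in prices_str:
    total += float(price)

# Format with two decimal places for a user-friendly result.
message = f"Total cost: {total:.2f}"
print(message)   # Total cost: 152.74
```

The same total can be computed in one step with `sum(float(p) for p in prices_str)`.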

---

# Recap: Connecting the Concepts

- **Basic Data Types**: The fundamental building blocks (integers, floats, complex numbers, booleans).
- **Operators**: The "verbs" that allow us to manipulate and compare data.
- **Collections**: Essential for organizing and storing groups of data (lists, tuples, strings, sets, dictionaries).
- **Type Conversion**: Needed whenever data arrives as one type but must be used as another, such as numeric strings that have to be summed.

These concepts form the bedrock of Python programming, enabling you to build increasingly complex and powerful applications.