## Module 2: Thinking Like a Data Scientist
> This module helps you organize and work with data like a pro. You’ll master lists, tuples, dictionaries, and sets to store and retrieve data efficiently. Then, you’ll learn to write functions so you’re not rewriting the same code over and over—because real data scientists keep it clean and reusable!
### Day 5 - Dictionaries & Sets: When Indexes Aren’t Enough!
----

##### Overview:

You’re now officially best friends with lists and tuples, but sometimes a simple list isn’t enough. Imagine you’re working on a project and need to store information like this:

- Name: "Ada Lovelace"
- Field: "Mathematics, Computer Science"
- Contributions: "First Programmer"

Using a list to store this information feels a bit awkward, doesn’t it? How do you know which value is the "name" and which is the "field"? This is exactly where **dictionaries** come in—they organize data in a way that makes sense.

And then there’s their quieter cousin, **sets**, which exist to keep things unique—a highly underrated trait when dealing with messy data.


#### 1. What Are Dictionaries?

A **dictionary** stores data as key-value pairs. Think of a real-life dictionary, where each word is a key and its definition is the value:

- **Key**: Something unique, like "Name."
- **Value**: The data associated with that key, like "Ada Lovelace."

- Example usage in data science: storing student IDs (keys) mapped to names (values), or feature names (keys) mapped to data lists (values).
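To make those two use cases concrete, here is a minimal sketch. The IDs, names, and numbers are hypothetical, invented purely for illustration.

```python
# Hypothetical sample data, made up for this illustration
student_names = {101: "Ada", 102: "Grace", 103: "Alan"}  # ID -> name

feature_columns = {  # feature name -> list of values
    "age": [34, 28, 45],
    "income": [52.0, 61.5, 48.3],
}

# Look up a name by student ID
print(student_names[102])

# Pull out one feature's values by its name
ages = feature_columns["age"]
print(ages)
```

Keys do not have to be strings: the student IDs here are integers.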


**Creating Dictionaries**

In [1]:
# Example: A Dictionary of Ada Lovelace
person_info = {
    "Name": "Ada Lovelace",
    "Field": "Mathematics, Computer Science",
    "Contributions": "First Programmer"
}

print(person_info)


{'Name': 'Ada Lovelace', 'Field': 'Mathematics, Computer Science', 'Contributions': 'First Programmer'}


**Accessing Values in a Dictionary**
- Use a dictionary's key to get its value.

In [2]:
# Access a value by its key
print(person_info["Name"])  # Outputs: Ada Lovelace
print(person_info["Field"])  # Outputs: Mathematics, Computer Science

Ada Lovelace
Mathematics, Computer Science
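Square-bracket access raises a `KeyError` when the key is missing. If you are not sure a key exists, the built-in `.get()` method is a gentler option. A small sketch, using a fresh copy of the dictionary so it runs on its own:

```python
person_info = {
    "Name": "Ada Lovelace",
    "Field": "Mathematics, Computer Science",
    "Contributions": "First Programmer",
}

# .get() returns None instead of raising KeyError when the key is missing
born = person_info.get("Born")
print(born)  # None

# A second argument supplies a fallback value of your choice
born_or_default = person_info.get("Born", "Unknown")
print(born_or_default)  # Unknown
```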


**Listing All the Keys/Values**

In [3]:
print(person_info.keys())

dict_keys(['Name', 'Field', 'Contributions'])


In [4]:
print(person_info.values())

dict_values(['Ada Lovelace', 'Mathematics, Computer Science', 'First Programmer'])
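The `dict_keys` and `dict_values` objects printed above are views, not lists, so they cannot be indexed directly. If you need a regular list, one option is to wrap the view in `list()`, as in this short sketch:

```python
person_info = {
    "Name": "Ada Lovelace",
    "Field": "Mathematics, Computer Science",
    "Contributions": "First Programmer",
}

# Convert the view to a list so it can be indexed
key_list = list(person_info.keys())
print(key_list[0])  # Name

# len() works directly on the dictionary and counts its key-value pairs
pair_count = len(person_info)
print(pair_count)  # 3
```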


#### 2. Adding, Updating, and Removing Key-Value Pairs

Dictionaries are mutable—you can add, modify, and remove items.

**Adding a New Key-Value Pair**
- Simply assign a value to a new key:

In [5]:
person_info["Born"] = 1815
print(person_info)


{'Name': 'Ada Lovelace', 'Field': 'Mathematics, Computer Science', 'Contributions': 'First Programmer', 'Born': 1815}


**Updating an Existing Key’s Value**
- Just reassign the value:

In [6]:
person_info["Field"] = "Mathematics & Computing"
print(person_info["Field"]) 


Mathematics & Computing


In [7]:
print(person_info)

{'Name': 'Ada Lovelace', 'Field': 'Mathematics & Computing', 'Contributions': 'First Programmer', 'Born': 1815}


**Removing a Key-Value Pair**
- Use the `del` keyword:

In [8]:
del person_info["Born"]
print(person_info)

{'Name': 'Ada Lovelace', 'Field': 'Mathematics & Computing', 'Contributions': 'First Programmer'}
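If you want to keep the value you are removing, `.pop()` is an alternative to `del`: it removes the key and returns its value. The sketch below uses a shortened copy of the dictionary so it runs on its own.

```python
person_info = {"Name": "Ada Lovelace", "Born": 1815}

# .pop() removes the key and hands back its value
birth_year = person_info.pop("Born")
print(birth_year)   # 1815
print(person_info)  # {'Name': 'Ada Lovelace'}

# With a default, .pop() does not raise KeyError for a missing key
missing = person_info.pop("Died", None)
print(missing)  # None
```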


**Checking If a Key Exists**
- Want to verify that a key exists before accessing it? Use the `in` keyword:

In [9]:
if "Name" in person_info:
    print("Yes, there's a Name key!")

Yes, there's a Name key!


#### 3. Looping Through a Dictionary

You’ll often need to loop through dictionaries to process the keys, values, or both.

**Iterating Through Keys**

In [10]:
for key in person_info: 
    print(key)

Name
Field
Contributions


**Iterating Through Values**

In [11]:
for value in person_info.values():
    print(value)


Ada Lovelace
Mathematics & Computing
First Programmer


**Iterating Through Keys and Values**

In [12]:
for key, value in person_info.items():
    print(f"{key}: {value}")



Name: Ada Lovelace
Field: Mathematics & Computing
Contributions: First Programmer
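Loops and dictionaries combine nicely for counting, a common first step when exploring a column of data. Here is a minimal sketch with hypothetical survey responses (the data is invented for illustration). It relies on `.get(key, 0)`, which returns 0 when the key has not been seen yet.

```python
# Hypothetical survey responses, made up for this example
responses = ["yes", "no", "yes", "maybe", "yes", "no"]

counts = {}
for answer in responses:
    # Start from 0 the first time an answer appears, then add 1
    counts[answer] = counts.get(answer, 0) + 1

# Loop through keys and values together to report the totals
for answer, total in counts.items():
    print(f"{answer}: {total}")
```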


#### 4. What Are Sets?

A **set** is like a list but with two special properties:
- **No Duplicate Entries:** Sets automatically remove duplicates for you.
- **Unordered:** Sets don’t guarantee a specific order of elements.


- Imagine you need to quickly clean up duplicate entries in a dataset column—sets are your new best friend!

**Creating a Set**

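Sets are written with curly braces, but unlike dictionaries they contain individual elements rather than key-value pairs. Note that empty braces `{}` create an empty dictionary, so use `set()` when you need an empty set.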

In [13]:
unique_numbers = {1, 2, 3, 4, 4, 5}
print(unique_numbers) 



{1, 2, 3, 4, 5}
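One gotcha worth seeing in code: empty curly braces give you a dictionary, not a set. A quick sketch to check the types yourself:

```python
empty_dict = {}     # empty braces create a dictionary
empty_set = set()   # use set() for an empty set

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>

# An empty set can then be filled with .add()
empty_set.add("first item")
print(empty_set)  # {'first item'}
```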


**Adding Elements to a Set**

- Use `.add()` to add an element:

In [14]:
unique_numbers.add(7)
print(unique_numbers) 

{1, 2, 3, 4, 5, 7}


**Removing Elements**
- Use `.remove(item)` to delete a specific item; it raises a `KeyError` if the item is not in the set.


In [15]:
unique_numbers.remove(4)
print(unique_numbers)  

{1, 2, 3, 5, 7}
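When you are not sure the item is present, `.discard()` is a safer choice than `.remove()`, because it does nothing if the item is missing. A small self-contained sketch:

```python
unique_numbers = {1, 2, 3, 5, 7}

# .discard() silently ignores items that are not in the set
unique_numbers.discard(10)
print(unique_numbers)

# .remove() raises KeyError for a missing item, so we catch it here
try:
    unique_numbers.remove(10)
except KeyError:
    print("10 is not in the set")
```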


**Checking Membership**
- Use the `in` keyword to check if an element exists

In [16]:
if 3 in unique_numbers:
    print("3 is in the set!")


3 is in the set!


**Set Operations**
- Sets excel at comparing data. You can find intersections, unions, and differences between two sets:

In [17]:
set_a = {1, 2, 3}
set_b = {3, 4, 5}

# Union: Combine all elements from both sets (no duplicates)
print(set_a | set_b)  

# Intersection: Keep only elements found in BOTH sets
print(set_a & set_b)  

# Difference: Elements in set_a but not set_b
print(set_a - set_b)  


{1, 2, 3, 4, 5}
{3}
{1, 2}
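A few more comparisons come in handy beyond the three above. This sketch shows symmetric difference, a subset check, and a disjoint check, all built into Python sets:

```python
set_a = {1, 2, 3}
set_b = {3, 4, 5}

# Symmetric difference: elements in exactly one of the two sets
either_not_both = set_a ^ set_b
print(either_not_both)  # {1, 2, 4, 5}

# Subset check: is every element of {1, 2} also in set_a?
is_subset = {1, 2} <= set_a
print(is_subset)  # True

# Disjoint check: do the two sets share no elements at all?
no_overlap = set_a.isdisjoint({7, 8})
print(no_overlap)  # True
```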


#### 5. Dictionaries vs. Sets in Data Science

In Data Science workflows:

- **Dictionaries** are great for mappings, such as column name descriptions or metadata (e.g., Feature: Description).
- **Sets** are useful for deduplication and quick comparisons when you care only about unique values.

For example:

- **Dictionary use case:** Mapping features to their meanings

In [18]:
feature_descriptions = {
    "Age": "The age of the customer in years",
    "Income": "Annual income of the customer in thousands",
    "Churn": "Whether the customer left the service",
}
print(feature_descriptions)

{'Age': 'The age of the customer in years', 'Income': 'Annual income of the customer in thousands', 'Churn': 'Whether the customer left the service'}


- **Set use case:** Cleaning a list of cities

In [19]:
messy_cities = ["New York", "Berlin", "New York", "Paris", "Berlin"]
unique_cities = set(messy_cities)
print(unique_cities)  


{'Paris', 'New York', 'Berlin'}


- Using a set to remove duplicates is a neat trick, since sets automatically discard repeated entries from a list or any other iterable. But remember: sets don't preserve order, so the items may come back in a different order than the original.

In [20]:
my_list = [3, 1, 2, 3, 4, 2, 1]
unique_items = set(my_list)
print("Unique items:", unique_items)  # Outputs in arbitrary order, like {1, 2, 3, 4}

Unique items: {1, 2, 3, 4}
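If the original order matters, one common alternative is `dict.fromkeys()`. Dictionary keys are unique and, in Python 3.7 and later, keep their insertion order, so this removes duplicates while preserving first appearances. A minimal sketch:

```python
my_list = [3, 1, 2, 3, 4, 2, 1]

# Keys are unique and keep insertion order (Python 3.7+)
ordered_unique = list(dict.fromkeys(my_list))
print(ordered_unique)  # [3, 1, 2, 4]
```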


**Quick Tips: When to Use Dictionaries and Sets**
- Dictionaries: Use when you need to associate one piece of data with another (key-value mapping).
- Sets: Use when you need unique items or want to perform mathematical operations like union or intersection.
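The two structures also work well together. As a sketch, suppose you want to check which dataset columns lack a description. The `dataset_columns` list below is hypothetical, invented for this example.

```python
feature_descriptions = {
    "Age": "The age of the customer in years",
    "Income": "Annual income of the customer in thousands",
    "Churn": "Whether the customer left the service",
}

# Hypothetical column names from a dataset
dataset_columns = ["Age", "Income", "Region"]

documented = set(feature_descriptions)  # a set of the dictionary's keys
actual = set(dataset_columns)

# Columns in the data that have no description yet
undocumented = actual - documented
print(undocumented)  # {'Region'}

# Described features that are absent from the data
missing_from_data = documented - actual
print(missing_from_data)  # {'Churn'}
```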

----
#### Quick Exercises
1. Create a dictionary to store some metadata about a dataset:
    - Total rows, total columns, and the type of analysis done (e.g., regression or classification).
    - Add a new key-value pair to store the dataset source (e.g., 'CSV file', 'database', or a URL).
    - Update the analysis type to "clustering."


2. Use a set to remove duplicates from the following list:
    - `sample_data = [1, 2, 3, 1, 4, 2, 5, 3]`

3. Write a dictionary that maps some feature names to their descriptions (e.g., "Age" → "Customer Age in Years"). Loop through and print all the features with their descriptions.

4. Create two sets of numbers (set_a = {10, 20, 30}, set_b = {20, 30, 40}), and find:

    - Their union
    - Their intersection
    - The numbers in set_a but not set_b



**Please Note:** The solutions to the exercises above will appear at the end of the next session's notebook (Day 6: Functions).

----

### Day 4 Exercise Solution

1. Create a list of your favorite data science buzzwords (like "AI," "Deep Learning," etc.).
- Add one using .append().
- Remove the second one.
- Replace the third with another buzzword.


In [21]:
# Create a list of your favorite data science buzzwords (like "AI," "Deep Learning," etc.)
buzzwords = ["AI", "Deep Learning", "Machine Learning", "Big Data", "AI"]
print("Buzzwords:", buzzwords) 


Buzzwords: ['AI', 'Deep Learning', 'Machine Learning', 'Big Data', 'AI']


In [22]:
# Add one using .append().
buzzwords.append("Neural Networks")
print("After append:", buzzwords)  

After append: ['AI', 'Deep Learning', 'Machine Learning', 'Big Data', 'AI', 'Neural Networks']


In [23]:
# Remove the second one.
buzzwords.remove("Deep Learning")
print("After remove:", buzzwords)

After remove: ['AI', 'Machine Learning', 'Big Data', 'AI', 'Neural Networks']
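Note that `.remove()` deletes by value and only removes the first match. Since the exercise asks for the second item by position, an index-based removal is another reasonable reading. A sketch of that alternative, starting from the original list:

```python
buzzwords = ["AI", "Deep Learning", "Machine Learning", "Big Data", "AI"]

# Remove by position: index 1 is the second item
removed = buzzwords.pop(1)
print(removed)    # Deep Learning
print(buzzwords)  # ['AI', 'Machine Learning', 'Big Data', 'AI']

# Remove by value: only the FIRST "AI" goes, the later one stays
buzzwords.remove("AI")
print(buzzwords)  # ['Machine Learning', 'Big Data', 'AI']
```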


In [24]:
# Replace the third with another buzzword.
buzzwords[2] = "Data Mining"
print("After replace:", buzzwords)

After replace: ['AI', 'Machine Learning', 'Data Mining', 'AI', 'Neural Networks']


2. If `fruits` is the list `['apple', 'banana', 'dates', 'cherry', 'dragonfruit']`, what is the difference between the two statements below?

    `del fruits[2]`

    `fruits.pop(2)`

Both `del fruits[2]` and `fruits.pop(2)` remove the element at index 2 ('dates') from the list, but there’s a key difference in how they behave:


1. `del fruits[2]`
- Removes the element at index 2 without returning it.

- The item is deleted permanently, and you cannot retrieve its value.


In [25]:
fruits = ['apple', 'banana', 'dates', 'cherry', 'dragonfruit']
del fruits[2]  # Removes 'dates'
print(fruits)  # Output: ['apple', 'banana', 'cherry', 'dragonfruit']


['apple', 'banana', 'cherry', 'dragonfruit']


In [26]:
# removed_item = del fruits[2]  # SyntaxError: invalid syntax

2. `fruits.pop(2)`
- Removes the element at index 2 and returns it.

- You can store the removed value for later use.

In [27]:
fruits = ['apple', 'banana', 'dates', 'cherry', 'dragonfruit']
removed_item = fruits.pop(2)  # Removes 'dates' and stores it
print(fruits)  # Output: ['apple', 'banana', 'cherry', 'dragonfruit']
print(removed_item)  # Output: 'dates'


['apple', 'banana', 'cherry', 'dragonfruit']
dates


3. Write a program to print only the even numbers from this list:
- `numbers = [11, 22, 33, 44, 55, 66, 77, 88]`


In [28]:
# Write a program to print only the even numbers from this list: 'numbers = [11, 22, 33, 44, 55, 66, 77, 88]'
numbers = [11, 22, 33, 44, 55, 66, 77, 88]

# Loop through the list and print only even numbers
for num in numbers:
    if num % 2 == 0:  # Check if the number is even
        print(num)




22
44
66
88
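The same filter can also be written as a list comprehension, which collects the even numbers into a new list instead of printing them one by one. This is an optional alternative sketch, not a required part of the solution:

```python
numbers = [11, 22, 33, 44, 55, 66, 77, 88]

# Keep only the numbers that divide evenly by 2
even_numbers = [num for num in numbers if num % 2 == 0]
print(even_numbers)  # [22, 44, 66, 88]
```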


4. Create a tuple of three immutable constants (e.g., Pi, e, and the speed of light).

- Try accessing each item.
- Then try to change one. (Spoiler: Python won't let you!)

In [29]:
# Tuple of three immutable constants
constants = (3.14159, 2.71828, 299792458)  # (Pi, e, speed of light in m/s)

In [30]:
print("Pi:", constants[0])
print("Euler's number (e):", constants[1])
print("Speed of light (m/s):", constants[2])

Pi: 3.14159
Euler's number (e): 2.71828
Speed of light (m/s): 299792458


In [31]:
constants[0] = 3.14  # TypeError: 'tuple' object does not support item assignment

TypeError: 'tuple' object does not support item assignment
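If you want to see the error without stopping your notebook, you can wrap the assignment in `try`/`except`. And when you do need a changed value, the usual approach is to build a new tuple. A minimal sketch of both ideas (the name `updated_constants` is just for illustration):

```python
constants = (3.14159, 2.71828, 299792458)  # (Pi, e, speed of light in m/s)

# Catch the TypeError so the program keeps running
try:
    constants[0] = 3.14
except TypeError as error:
    print("Cannot modify a tuple:", error)

# Build a new tuple instead of changing the old one
updated_constants = (3.14,) + constants[1:]
print(updated_constants)  # (3.14, 2.71828, 299792458)
print(constants)          # the original is unchanged
```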

5. Create a list named features, `features = ["age", "income", "education", "gender", "city"]`. Use slicing to grab:
- The first three words from this list.
- The last two words from this list.

In [32]:
features = ["age", "income", "education", "gender", "city"]
print("Features:", features)

Features: ['age', 'income', 'education', 'gender', 'city']


In [33]:
# First three words
first_three = features[:3]  # Slicing from start to index 3 (not included)

# Last two words
last_two = features[-2:]  # Slicing from the second last item to the end

# Print results
print("First three words:", first_three)
print("Last two words:", last_two)

First three words: ['age', 'income', 'education']
Last two words: ['gender', 'city']


# HAPPY LEARNING