![alt text](<../images/just enough.png>)

# Just Enough Python for AI/Data Science
## Module 2: Thinking Like a Data Scientist
> This module helps you organize and work with data like a pro. You’ll master lists, tuples, dictionaries, and sets to store and retrieve data efficiently. Then, you’ll learn to write functions so you’re not rewriting the same code over and over—because real data scientists keep it clean and reusable!
### Day 5 - Dictionaries & Sets: When Indexes Aren’t Enough!
----

##### Overview:

You’re now officially best friends with lists and tuples, but sometimes a simple list isn’t enough. Imagine you’re working on a project and need to store information like this:

- Name: "Ada Lovelace"
- Field: "Mathematics, Computer Science"
- Contributions: "First Programmer"

Using a list to store this information feels a bit awkward, doesn’t it? How do you know which value is the "name" and which is the "field"? This is exactly where **dictionaries** come in—they organize data in a way that makes sense.

And then there’s their quieter cousin, **sets**, which exist to keep things unique—a highly underrated trait when dealing with messy data.


#### 1. What Are Dictionaries?

A **dictionary** is all about creating key-value pairs. Think of it as a real-life dictionary where words are keys, and their definitions are values:

- **Key**: Something unique, like "Name."
- **Value**: The data associated with that key, like "Ada Lovelace."

- Example usage in data science: storing student IDs (keys) mapped to names (values), or feature names (keys) mapped to data lists (values).


**Creating Dictionaries**

In [1]:
# Example: A Dictionary of Ada Lovelace
person_info = {
    "Name": "Ada Lovelace",
    "Field": "Mathematics, Computer Science",
    "Contributions": "First Programmer"
}

print(person_info)


{'Name': 'Ada Lovelace', 'Field': 'Mathematics, Computer Science', 'Contributions': 'First Programmer'}


**Accessing Values in a Dictionary**
- Use a dictionary's key to get its value.

In [2]:
# Access a value by its key
print(person_info["Name"])  # Outputs: Ada Lovelace
print(person_info["Field"])  # Outputs: Mathematics, Computer Science

Ada Lovelace
Mathematics, Computer Science
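
Square-bracket access raises a `KeyError` when the key is missing. As a minimal sketch, the built-in `dict.get()` method returns `None` (or a fallback you supply) instead. The `"Died"` key below is a hypothetical missing key, used only for illustration.

```python
person_info = {
    "Name": "Ada Lovelace",
    "Field": "Mathematics, Computer Science",
    "Contributions": "First Programmer",
}

# .get() returns the value when the key exists
name = person_info.get("Name")

# "Died" is not in the dictionary (hypothetical key for illustration)
missing = person_info.get("Died")              # None, and no error is raised
fallback = person_info.get("Died", "Unknown")  # a custom default instead of None

print(name)
print(missing)
print(fallback)
```

Reach for `.get()` when a missing key is a normal situation, and keep square brackets when a missing key should stop the program.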


**Listing all the keys/values**

In [10]:
print(person_info.keys())

dict_keys(['Name', 'Field', 'Contributions'])


In [11]:
print(person_info.values())

dict_values(['Ada Lovelace', 'Mathematics, Computer Science', 'First Programmer'])


#### 2. Adding, Updating, and Removing Key-Value Pairs

Dictionaries are mutable—you can add, modify, and remove items.

**Adding a New Key-Value Pair**
- Simply assign a value to a new key:

In [3]:
person_info["Born"] = 1815
print(person_info)


{'Name': 'Ada Lovelace', 'Field': 'Mathematics, Computer Science', 'Contributions': 'First Programmer', 'Born': 1815}


**Updating an Existing Key’s Value**
- Just reassign the value:

In [4]:
person_info["Field"] = "Mathematics & Computing"
print(person_info["Field"]) 


Mathematics & Computing


**Removing a Key-Value Pair**
- Use the `del` keyword:

In [5]:
del person_info["Born"]
print(person_info)

{'Name': 'Ada Lovelace', 'Field': 'Mathematics & Computing', 'Contributions': 'First Programmer'}
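
`del` removes a pair without giving the value back. If you also want the removed value, the built-in `dict.pop()` method is one option. The sketch below uses a shortened copy of the dictionary, and `"Nickname"` is a hypothetical key that does not exist.

```python
person_info = {"Name": "Ada Lovelace", "Born": 1815}

# pop() removes the key and returns its value
born = person_info.pop("Born")

# A second argument is returned when the key is absent, avoiding a KeyError
nickname = person_info.pop("Nickname", None)  # "Nickname" is a hypothetical key

print(born)
print(nickname)
print(person_info)
```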


**Checking If a Key Exists**
- Want to verify that a key exists before accessing it? Use the `in` keyword:

In [6]:
if "Name" in person_info:
    print("Yes, there's a Name key!")

Yes, there's a Name key!


#### 3. Looping Through a Dictionary

You’ll often need to loop through dictionaries to process the keys, values, or both.

**Iterating Through Keys**

In [7]:
for key in person_info:
    print(key)

Name
Field
Contributions


**Iterating Through Values**

In [8]:
for value in person_info.values():
    print(value)


Ada Lovelace
Mathematics & Computing
First Programmer


**Iterating Through Keys and Values**

In [9]:
for key, value in person_info.items():
    print(f"{key}: {value}")



Name: Ada Lovelace
Field: Mathematics & Computing
Contributions: First Programmer
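
The same `.items()` loop works when the values are lists of data, as in the feature-name example mentioned earlier. This is a small sketch with made-up numbers; the `features` and `averages` names are assumptions for illustration.

```python
# Hypothetical feature data: feature name -> list of observed values
features = {
    "Age": [20, 30, 40],
    "Income": [40, 55, 70],
}

# Build a new dictionary holding the average of each feature
averages = {}
for name, values in features.items():
    averages[name] = sum(values) / len(values)

print(averages)
```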


#### 4. What Are Sets?

A **set** is like a list but with two special properties:
- **No Duplicate Entries:** Sets automatically remove duplicates for you.
- **Unordered:** Sets don’t guarantee a specific order of elements.


- Imagine you need to quickly clean up duplicate entries in a dataset column—sets are your new best friend!

**Creating a Set**

Sets are defined with curly braces `{}`, but unlike dictionaries, they contain individual elements (no key-value pairs).

In [12]:
unique_numbers = {1, 2, 3, 4, 4, 5}
print(unique_numbers) 



{1, 2, 3, 4, 5}
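
One detail worth checking for yourself: empty curly braces create a dictionary, not a set. A short sketch of the difference:

```python
empty_dict = {}     # curly braces alone create an empty dictionary
empty_set = set()   # use set() to create an empty set

print(type(empty_dict))
print(type(empty_set))
```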


**Adding Elements to a Set**

- Use `.add()` to add an element:

In [13]:
unique_numbers.add(6)
print(unique_numbers) 

{1, 2, 3, 4, 5, 6}


**Removing Elements**
- Use `.remove(item)` to delete specific items


In [14]:
unique_numbers.remove(4)
print(unique_numbers)  

{1, 2, 3, 5, 6}
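
`.remove()` raises a `KeyError` if the item is not in the set. When you are not sure the item is there, the built-in `.discard()` method is a quieter alternative. The value `42` below is just an example of an element that is absent.

```python
unique_numbers = {1, 2, 3, 5, 6}

# discard() removes the element if present and does nothing otherwise
unique_numbers.discard(42)

# remove() raises KeyError when the element is missing
try:
    unique_numbers.remove(42)
    remove_failed = False
except KeyError:
    remove_failed = True

print(unique_numbers)
print(remove_failed)
```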


**Checking Membership**
- Use the `in` keyword to check if an element exists

In [15]:
if 3 in unique_numbers:
    print("3 is in the set!")


3 is in the set!


**Set Operations**
- Sets excel at comparing data. You can find intersections, unions, and differences between two sets:

In [16]:
set_a = {1, 2, 3}
set_b = {3, 4, 5}

# Union: Combine all elements from both sets (no duplicates)
print(set_a | set_b)  

# Intersection: Keep only elements found in BOTH sets
print(set_a & set_b)  

# Difference: Elements in set_a but not set_b
print(set_a - set_b)  


{1, 2, 3, 4, 5}
{3}
{1, 2}
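
The same operations are also available as methods, which some people find easier to read. Here is a sketch applied to a typical data question: do two datasets have the same columns? The column names are hypothetical.

```python
# Hypothetical column names from two datasets
train_columns = {"Age", "Income", "Churn"}
test_columns = {"Age", "Income", "Region"}

# Method forms of the operators shown above
shared = train_columns.intersection(test_columns)       # same as train_columns & test_columns
only_in_train = train_columns.difference(test_columns)  # same as train_columns - test_columns

# Symmetric difference: elements found in exactly one of the two sets
in_exactly_one = train_columns ^ test_columns

# sorted() gives a predictable display order, since sets are unordered
print(sorted(shared))
print(sorted(only_in_train))
print(sorted(in_exactly_one))
```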


#### 5. Dictionaries vs. Sets in Data Science

In Data Science workflows:

- **Dictionaries** are great for mappings, such as column name descriptions or metadata (e.g., Feature: Description).
- **Sets** are useful for deduplication and quick comparisons when you care only about unique values.

For example:

- **Dictionary use case:** Mapping features to their meanings

In [17]:
feature_descriptions = {
    "Age": "The age of the customer in years",
    "Income": "Annual income of the customer in thousands",
    "Churn": "Whether the customer left the service",
}
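
One way such a mapping might be used is to print a description for each column and flag any column that has none. In this sketch, `dataset_columns` and the `"Region"` column are assumptions added for illustration.

```python
feature_descriptions = {
    "Age": "The age of the customer in years",
    "Income": "Annual income of the customer in thousands",
    "Churn": "Whether the customer left the service",
}

# Hypothetical list of columns found in a dataset
dataset_columns = ["Age", "Income", "Churn", "Region"]

undocumented = []
for column in dataset_columns:
    if column in feature_descriptions:
        print(f"{column}: {feature_descriptions[column]}")
    else:
        # Remember columns that have no description yet
        undocumented.append(column)

print("Missing descriptions:", undocumented)
```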


- **Set use case:** Cleaning a list of cities

In [None]:
messy_cities = ["New York", "Berlin", "New York", "Paris", "Berlin"]
unique_cities = set(messy_cities)
print(unique_cities)  


- Using a set to remove duplicates from a list (or any other iterable) is a neat trick, because repeated entries are discarded automatically. But remember: sets don’t preserve the original order of the items.

In [18]:
my_list = [3, 1, 2, 3, 4, 2, 1]
unique_items = set(my_list)
print("Unique items:", unique_items)  # Outputs in arbitrary order, like {1, 2, 3, 4}

Unique items: {1, 2, 3, 4}
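
If the original order matters, one common alternative is `dict.fromkeys()`, which relies on dictionaries keeping insertion order (guaranteed since Python 3.7). A minimal sketch:

```python
my_list = [3, 1, 2, 3, 4, 2, 1]

# dict.fromkeys() keeps the first occurrence of each item, in original order
unique_in_order = list(dict.fromkeys(my_list))

print(unique_in_order)  # [3, 1, 2, 4]
```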


**Quick Tips: When to Use Dictionaries and Sets**
- Dictionaries: Use when you need to associate one piece of data with another (key-value mapping).
- Sets: Use when you need unique items or want to perform mathematical operations like union or intersection.

----
#### Quick Exercises
1. Create a dictionary to store some metadata about a dataset:
- Total rows, total columns, and the type of analysis done (e.g., regression or classification).
- Add a new key to include the dataset source.
- Update the analysis type to "clustering."

2. Use a set to remove duplicates from the following list:
- `sample_data = [1, 2, 3, 1, 4, 2, 5, 3]`

3. Write a dictionary that maps some feature names to their descriptions (e.g., "Age" → "Customer Age in Years"). Loop through and print all the features with their descriptions.

4. Create two sets of numbers (set_a = {10, 20, 30}, set_b = {20, 30, 40}), and find:

- Their union
- Their intersection
- The numbers in set_a but not set_b



**Please Note:** The solutions to the above questions will be provided at the end of the next module's (Module 6) notebook.

----

### Module 2 Exercise Solution

1.  Write a small program that uses conditionals to check if a user’s input age is old enough to vote (voting age = 18 years).

In [3]:
# Write a small program that uses conditionals to check if a user’s input age is old enough to vote (voting age = 18 years).

age = int(input("Enter your age: "))
# age = 35

if age >= 18:
    print("You are old enough to vote!")
else:
    print("You are not old enough to vote!")


You are old enough to vote!


2.  Loop through a list of random numbers and print only the even ones.

In [1]:
# Loop through a list of random numbers and print only the even ones.
random_numbers = [3, 1, 6, 7, 8, 2, 4, 5]

for number in random_numbers:
    if number % 2 == 0:
        print(number)



6
8
2
4


3.  Create a while loop that prints numbers from 1 to 10. Make sure you do not enter an infinite loop.

In [5]:
# Create a while loop that prints numbers from 1 to 10
i = 1
while i <= 10:
    print(i)
    i += 1


1
2
3
4
5
6
7
8
9
10


4.  Loop through this list of strings and print only the strings longer than 5 characters: 
     words = ["Python", "AI", "Machine", "Science", "Wow"]

In [6]:
# Loop through this list of strings and print only the strings longer than 5 characters

words = ["Python", "AI", "Machine", "Science", "Wow"]
for word in words:
    if len(word) > 5:
        print(word)
        

Python
Machine
Science


# HAPPY LEARNING