<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Python_Data_Analytics_Course/blob/main/1_Basics/13_Loops.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Loops

## Overview

* `while` loops
* `for` loops

## Importance

Loops are crucial for automating repetitive tasks, like processing data in pandas or creating multiple plots in Matplotlib.

A `for` loop iterates over an iterable object, such as a list, running its body once for each item.

In [1]:
numbers = [1, 2, 3, 4, 5]
for number in numbers:
    print(number)

1
2
3
4
5


A `while` loop repeats an action until a condition is no longer true.

In [2]:
count = 1
while count <= 5:
    print(count)
    count += 1

1
2
3
4
5


## While

### Notes

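* A `while` loop executes a set of statements as long as its condition is true.
* Remember to update the variable the condition depends on (e.g., increment a counter like `count`), or else the condition never becomes false and the loop runs forever.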
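A minimal sketch of the point above, using a hypothetical `countdown` variable: the loop body changes the value the condition checks, so the condition eventually becomes false and the loop stops.

```python
# Count down from 3; the body must change `countdown`,
# otherwise `countdown > 0` would stay true forever
countdown = 3
while countdown > 0:
    print(countdown)
    countdown -= 1  # update the variable the condition depends on

print('Done, countdown is now', countdown)
```

This prints `3`, `2`, `1`, and then `Done, countdown is now 0`.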

### Examples

This example prints the years of experience required for each of the 3 data science jobs.

* Initializes variables to track the total number of job positions (`total_positions`), and the current position being checked (`position_index`).
* Uses a `while` loop to iterate through each job position's minimum experience requirement (`position_experience_requirements`) as long as `position_index` is less than `total_positions`.
* Within the loop, it prints the number of years of experience that the current position requires (`position_experience_requirements[position_index]`).
* Increments (increases) `position_index` by 1 in each loop iteration to move to the next job position. This makes sure the loop terminates after checking all positions.

In [3]:
# Total number of job positions to check
total_positions = 3
position_index = 0

# Minimum years of experience required for each data science job position
position_experience_requirements = [1, 3, 2]

In [4]:
while position_index < total_positions:
    print('Position requires', position_experience_requirements[position_index], 'years of experience.')
    position_index += 1

Position requires 1 years of experience.
Position requires 3 years of experience.
Position requires 2 years of experience.


### Practical Example

Here's a more complicated example of how you can use `while` loops.

We'll add a new variable, `user_years_of_experience`.

* Initializes variables to track the user's experience (`user_years_of_experience`), the total number of job positions (`total_positions`), and the current position being checked (`position_index`).
* Uses a `while` loop to iterate through each job position's minimum experience requirement (`position_experience_requirements`) as long as `position_index` is less than `total_positions`.
* Within the loop, compares `user_years_of_experience` to the requirement for the current position (`required_years`): prints 'Qualified' if the user meets or exceeds the requirement, otherwise prints 'Not qualified'.
* Increments (increases) `position_index` by 1 in each loop iteration to move to the next job position. This makes sure the loop terminates after checking all positions.

In [5]:
# User's years of experience
user_years_of_experience = 2

user_years_of_experience

2

🪲 **Debugging**

**These are intentional mistakes**

This is used to demonstrate debugging.

Error 1:
- Forgot a `:` after the if statement. 

```python
if user_years_of_experience >= required_years
```

Error 2:
- Forgot to indent the statement after the `else` keyword.

```python
else:
print('Not qualified')
```

Steps to Debug:

1. Look at the actual error message. Can you tell what the problem is?
2. If not, look it up:
   1. Ask a chatbot like ChatGPT or Claude.
   2. Search for it on Google.

In [6]:
while position_index < total_positions:
    required_years = position_experience_requirements[position_index]
    if user_years_of_experience >= required_years
        print('Qualified')
    else:
    print('Not qualified')
    position_index += 1

SyntaxError: invalid syntax (126596802.py, line 3)

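This is the corrected code ✅. It prints nothing here because the earlier `while` loop already advanced `position_index` to 3, so `position_index < total_positions` is false from the start (the cell with the `SyntaxError` never ran, so it didn't change anything). To see the results, re-run the cell that sets `position_index = 0` before running this cell.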

In [7]:
while position_index < total_positions:
    required_years = position_experience_requirements[position_index]
    if user_years_of_experience >= required_years:
        print('Qualified')
    else:
        print('Not qualified')
    position_index += 1
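Here's a self-contained sketch of the same corrected loop with the index reset first. It re-creates the variables from the cells above so it runs on its own.

```python
# Re-create the variables from the earlier cells
user_years_of_experience = 2
position_experience_requirements = [1, 3, 2]
total_positions = 3

# Reset the index so the loop starts at the first position again
position_index = 0

while position_index < total_positions:
    required_years = position_experience_requirements[position_index]
    if user_years_of_experience >= required_years:
        print('Qualified')
    else:
        print('Not qualified')
    position_index += 1
```

With the index reset, this prints `Qualified`, `Not qualified`, `Qualified`.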

## For

### Notes

* A `for` loop iterates over an iterable (e.g., a list, tuple, dictionary, set, or string).
* It executes a set of statements once for each item in that iterable.
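A small sketch showing the same `for` syntax on a string and on a dictionary (the `skill_levels` dictionary is a made-up example). Looping over a dictionary directly gives its keys, while `.items()` gives key-value pairs.

```python
# Looping over a string yields one character at a time
for letter in 'SQL':
    print(letter)

# Hypothetical dictionary mapping skills to years of experience
skill_levels = {'Python': 2, 'SQL': 3}

# Looping over a dictionary yields its keys
for skill in skill_levels:
    print(skill)

# .items() yields (key, value) pairs, unpacked into two variables
for skill, years in skill_levels.items():
    print(f'{skill}: {years} years')
```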

### Example

This loop prints the years of experience required for each position.

We're using the same variable as before: `position_experience_requirements`.

* A `for` loop iterates (goes) through each item in the `position_experience_requirements` list.
* Inside the loop it prints the years of experience required for the position.

In [8]:
for x in position_experience_requirements:
    print(f'Position requires {x} years of experience.')

Position requires 1 years of experience.
Position requires 3 years of experience.
Position requires 2 years of experience.


### Enumerate 

#### Notes

* Primarily used in loops, especially `for` loops.
* Useful when you need the index of each item in an iterable, for operations like:
    * updating elements based on their position
    * comparing items at different positions
    * displaying the position along with the item
* Syntax: `enumerate(iterable, start=0)`
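To see exactly what `enumerate` produces, you can convert it to a list. In this quick sketch (the `tools` list is a made-up example), each item becomes an `(index, item)` tuple, and `start` only changes where the numbering begins.

```python
tools = ['Python', 'SQL', 'Excel']  # hypothetical example list

# Default start=0
print(list(enumerate(tools)))
# [(0, 'Python'), (1, 'SQL'), (2, 'Excel')]

# start=1 numbers the items from 1 instead
print(list(enumerate(tools, start=1)))
# [(1, 'Python'), (2, 'SQL'), (3, 'Excel')]
```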

#### Example

Instead of simply iterating over the list with `x in list` to get the years of experience required for each position, we can use `enumerate` to get both the index (which represents the position number) and the years of experience from `position_experience_requirements`. 

We'll also use f-strings to format our string.

In [9]:
# Enhanced example using enumerate to include the position index
for index, years in enumerate(position_experience_requirements, start=1):
    print(f'Position {index} requires {years} years of experience.')

Position 1 requires 1 years of experience.
Position 2 requires 3 years of experience.
Position 3 requires 2 years of experience.


### Pass

#### Notes

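* `pass` is a placeholder statement that does nothing. Use it when a `for` loop (or any other block) must have a body but you have nothing to put there yet; leaving the body empty is a syntax error.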

#### Example

Below, we replace the body of our earlier `for` loop with `pass`. The loop goes through every item but does nothing, so there's no output.

In [10]:
for x in position_experience_requirements:
    pass
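`pass` is also handy as a placeholder in one branch of an `if` statement. In this sketch (reusing the same values as above), nothing happens for positions the user qualifies for, and a message is printed for the rest.

```python
position_experience_requirements = [1, 3, 2]
user_years_of_experience = 2

for required_experience in position_experience_requirements:
    if user_years_of_experience >= required_experience:
        pass  # placeholder: nothing to do yet for positions the user qualifies for
    else:
        print(f'Needs {required_experience} years; not qualified yet.')
```

Only the second position triggers the `else` branch, so this prints `Needs 3 years; not qualified yet.`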

### Nested Loops

You can also have a loop inside another loop. Below we're going through two lists.

In [11]:
roles = ['Data Scientist', 'Machine Learning Engineer']
skills = ['Python', 'SQL', 'Machine Learning']

for role in roles:
    print(f'For the role of {role}, you need experience in:')
    for skill in skills:
        print(f'  - {skill}')

For the role of Data Scientist, you need experience in:
  - Python
  - SQL
  - Machine Learning
For the role of Machine Learning Engineer, you need experience in:
  - Python
  - SQL
  - Machine Learning
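Keep in mind that the inner loop runs completely for every pass of the outer loop, so its body runs (number of roles) × (number of skills) times. A sketch that collects each combination into a list to make that visible:

```python
roles = ['Data Scientist', 'Machine Learning Engineer']
skills = ['Python', 'SQL', 'Machine Learning']

pairs = []
for role in roles:
    for skill in skills:
        # The inner body runs once per (role, skill) combination
        pairs.append((role, skill))

print(len(pairs))  # 6, i.e. 2 roles x 3 skills
print(pairs[0])    # ('Data Scientist', 'Python')
```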


### Practical Example

Here's a more complicated example of how you can use `for` loops.

This loop checks the user's qualification for each position based on their years of experience and prints the result.

* It reuses the `position_experience_requirements` list, whose integers are the minimum years of experience required for each data science job position.
* It reuses `user_years_of_experience`, which is set to 2.
* A `for` loop iterates (goes) through each item in `position_experience_requirements`, storing it in `required_experience`.
* Inside the loop, an `if` statement checks whether `user_years_of_experience` is greater than or equal to `required_experience`.
* If the condition is met, it prints 'Qualified'; otherwise, it prints 'Not qualified'.

In [12]:
# For loop to check qualification for each job
for required_experience in position_experience_requirements:
    if user_years_of_experience >= required_experience:
        print('Qualified')
    else:
        print('Not qualified')

Qualified
Not qualified
Qualified
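The qualification check can be combined with `enumerate` from earlier so each line says which position it refers to. A sketch reusing the same values:

```python
position_experience_requirements = [1, 3, 2]
user_years_of_experience = 2

# Number the positions from 1 while checking each requirement
for position, required_experience in enumerate(position_experience_requirements, start=1):
    if user_years_of_experience >= required_experience:
        print(f'Position {position}: Qualified')
    else:
        print(f'Position {position}: Not qualified')
```

This prints `Position 1: Qualified`, `Position 2: Not qualified`, and `Position 3: Qualified`.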


### Break

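The `break` statement stops a loop before it has gone through all of the items. Here, `break` exits the loop as soon as the condition is met (i.e., the user's experience is greater than or equal to the required experience). Since the user qualifies for the very first position (2 is at least 1), the loop exits on its first iteration and nothing is printed.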

In [13]:
# For loop to check qualification for each job
for required_experience in position_experience_requirements:
    if user_years_of_experience >= required_experience:
        break
    else:
        print('Not qualified')
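A more typical use of `break` is to stop searching once you've found what you're looking for. This sketch finds the first position (numbered from 1) that a candidate qualifies for; the `requirements` list and `candidate_experience` value are made-up so the match isn't at the very start.

```python
requirements = [5, 3, 2]   # hypothetical minimum years per position
candidate_experience = 3   # hypothetical candidate

first_match = None  # stays None if no position matches
for position, required in enumerate(requirements, start=1):
    print(f'Checking position {position}...')
    if candidate_experience >= required:
        first_match = position
        break  # stop looping; position 3 is never checked

print('First qualifying position:', first_match)
```

This prints `Checking position 1...`, `Checking position 2...`, and then `First qualifying position: 2`.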

What if you don't have a specific list to loop through, and instead want to repeat something a specific number of times? Then you'd use `range()`.

### Example #2

Problem: Make a list of job titles containing "Data Analyst".

The following code cell only imports the data.

It uses concepts we haven't covered yet, so just ignore it for now.

In [17]:
from datasets import load_dataset

# Load the dataset
dataset = load_dataset('lukebarousse/data_jobs')
df = dataset['train'].to_pandas()

# Create a list of job titles from the dataset
job_list = df['job_title'].tolist()

# Remove any non-string values from the list
job_list = [job for job in job_list if isinstance(job, str)]

# Display the first 10 job titles
job_list[:10]

['Data Analytics',
 'Data Scientist Intern',
 'Manager, Data Analytics',
 'Data Engineer',
 'Technical Data Analyst',
 'Data Engineer',
 'Research Data Scientist - Now Hiring',
 'Diversity and Inclusion Workforce Data Scientist with Security...',
 'Data Scientist',
 'Big Data Senior Work']

In [18]:
# Create an empty list to store the job titles that contain "Data Analyst"
analyst_list = []

# Loop through the job titles and add any that contain "Data Analyst" to the analyst_list
for job in job_list:
    if "Data Analyst" in job:
        analyst_list.append(job)

# Display the job titles that contain "Data Analyst"
analyst_list[:10]

['Technical Data Analyst',
 'Sr. Data Analyst - Full-time / Part-time',
 'Data Analyst',
 'Data Analyst',
 'Data Analyst Junior settore Logistica',
 'Senior Data Analyst - Now Hiring',
 'Health Technology Data Analyst',
 'Data Analyst',
 'Love Excel? Junior Data Analyst for Real Estate',
 'Data Analyst']
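Note that `in` is case-sensitive, so a title written as "data analyst" wouldn't be matched above. A sketch on a small hand-written sample (not taken from the dataset) that lowercases each title before checking:

```python
# Small hypothetical sample of job titles
sample_titles = ['Technical Data Analyst', 'data analyst (remote)', 'Data Engineer']

matches = []
for title in sample_titles:
    # Compare in lowercase so capitalization doesn't matter
    if 'data analyst' in title.lower():
        matches.append(title)

print(matches)  # ['Technical Data Analyst', 'data analyst (remote)']
```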

## `range()` function

### Notes

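* `range()` generates a sequence of numbers; paired with a `for` loop, it lets you run the loop a specific number of times.
* By default, the sequence starts at 0 and increases by 1, stopping just before the number you specify (the stop value itself is excluded).
* Syntax: `range(stop)` or `range(start, stop, step)`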
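Besides `stop` (and optionally `start`), `range()` accepts a third argument, `step`, which sets how much each number increases by. A quick sketch, using `list()` only to display the numbers:

```python
# range(start, stop, step): stop is always excluded
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]

# A negative step counts down
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]
```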

### Examples

First we'll print numbers from 0 to 3 since `range(4)` goes from 0 to 4 (not including 4).

In [14]:
for x in range(4):
    print(x)

0
1
2
3


You can also specify the starting value. This starts at 1 and goes up to 5 (6 itself is excluded).

In [15]:
for x in range(1,6):
    print(x)

1
2
3
4
5
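`range()` can also reproduce the index-based `while` loop from earlier: `range(len(...))` produces each valid index of the list, so there's no counter to increment by hand. A sketch:

```python
position_experience_requirements = [1, 3, 2]

# range(len(...)) yields 0, 1, 2: one index per list item
for position_index in range(len(position_experience_requirements)):
    print('Position requires', position_experience_requirements[position_index], 'years of experience.')
```

This prints the same three lines as the `while` version. When you need both the index and the item, `enumerate()` is usually the more readable choice.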
