<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Python_Data_Analytics_Course/blob/main/1_Basics/14_List_Comprehensions.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# List Comprehensions

## Notes

* A way to create a new list (with shorter syntax) based on the values of an existing list.

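The same syntax is not limited to lists:
- `set` comprehension
- `dictionary` comprehension
- generator expression (Python has no `tuple` comprehension; parentheses produce a generator, which can be passed to `tuple()`)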
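
As a minimal sketch of the non-list forms, the snippet below builds a set, a dictionary, and a tuple from a small list. The `years` list and the other variable names are hypothetical and only serve as illustration.

```python
# Hypothetical sample data, used only for illustration
years = [1, 2, 2, 3]

# Set comprehension: curly braces, duplicates are dropped
unique_years = {y for y in years}

# Dictionary comprehension: key: value pairs inside curly braces
years_to_months = {y: y * 12 for y in years}

# Parentheses create a generator expression, not a tuple;
# wrap it in tuple() to materialize the values
years_tuple = tuple(y for y in years)

print(unique_years)      # {1, 2, 3}
print(years_to_months)   # {1: 12, 2: 24, 3: 36}
print(years_tuple)       # (1, 2, 2, 3)
```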

## Importance

List comprehensions provide a concise way to create lists. They are useful for manipulating and filtering data, including when working with pandas.

In [1]:
# Creating a list of numbers from 0 to 9
numbers = [x for x in range(10)]
numbers

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
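
The expression before `for` does not have to be the bare variable. As a small sketch (the `squares` name is just an illustration), each value can be transformed while the new list is built:

```python
# Square each number from 0 to 9 while building the list
squares = [x * x for x in range(10)]

print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```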

## Example # 1

We're going to modify the example we used for our `for` loop. Instead of printing the whole statement "Position requires X years of experience", we are just going to print the experience required. This is a simplified version of our earlier code.

In [2]:
# Minimum experience required for job positions
position_experience_requirements = [1, 2, 3]

# Iterate over each experience requirement in the list of job positions
for x in position_experience_requirements:
    print(x)

1
2
3


- The code defines `position_experience_requirements` as a list of integers representing the minimum years of experience required for various job positions.
- The `for` loop goes through each item (`x`) in `position_experience_requirements` and prints it.

Now let's use a list comprehension to shorten this.

In [3]:
# Create a list of experience requirements
experience = [x for x in position_experience_requirements]

# The result is a new list with the same values
experience

[1, 2, 3]

This is pretty basic. So let's make it a bit more useful. I'm going to add in a variable that stores the user's years of experience.

In [5]:
user_experience = 2
user_experience

2

Now, we are adding an if condition to our list comprehension. This condition checks if the user's experience (`user_experience`) is greater than or equal to each item (`x`) in the `position_experience_requirements` list.

```python
if user_experience >= x
```

It keeps only the requirements that are less than or equal to the user's experience, i.e. the positions the user qualifies for.

In [12]:
# Create a list of job positions for which the user is qualified
qualified_positions = [x for x in position_experience_requirements if user_experience >= x]

qualified_positions

[1, 2]
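
To see what the `if` clause is doing, it can help to write the same filter as a regular loop. This is only a sketch: the values are redefined so the snippet runs on its own, and the `qualified_loop` and `qualified_comp` names are made up for the comparison.

```python
# Same values as in the example, redefined so this runs standalone
position_experience_requirements = [1, 2, 3]
user_experience = 2

# Loop form: start with an empty list and append the matching items
qualified_loop = []
for x in position_experience_requirements:
    if user_experience >= x:
        qualified_loop.append(x)

# Comprehension form: the same filter in a single line
qualified_comp = [x for x in position_experience_requirements if user_experience >= x]

print(qualified_loop == qualified_comp)  # True
```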

## Example # 2

This first code block extracts the data we need for this exercise; we'll dive into this later in the course.

For now, just understand that I'm extracting the list of job titles (`job_list`) from our dataset.

In [6]:
from datasets import load_dataset

# Load the dataset
dataset = load_dataset('lukebarousse/data_jobs')
df = dataset['train'].to_pandas()

# Create a list of job titles from the dataset
job_list = df['job_title'].tolist()

# Remove any non-string values from the list
job_list = [job for job in job_list if isinstance(job, str)]

Let's turn our previous `for` loop into a list comprehension!

In [13]:
# previous for loop
analyst_list = []

for job in job_list:
  if "Data Analyst" in job:
    analyst_list.append(job)

# show first 10 values
analyst_list[:10]

['Technical Data Analyst',
 'Sr. Data Analyst - Full-time / Part-time',
 'Data Analyst',
 'Data Analyst',
 'Data Analyst Junior settore Logistica',
 'Senior Data Analyst - Now Hiring',
 'Health Technology Data Analyst',
 'Data Analyst',
 'Love Excel? Junior Data Analyst for Real Estate',
 'Data Analyst']

However, that was 4 lines of code!

With a list comprehension we can do it in only 1.

In [14]:
analyst_list = [job for job in job_list if "Data Analyst" in job]

# show first 10 values
analyst_list[:10]

['Technical Data Analyst',
 'Sr. Data Analyst - Full-time / Part-time',
 'Data Analyst',
 'Data Analyst',
 'Data Analyst Junior settore Logistica',
 'Senior Data Analyst - Now Hiring',
 'Health Technology Data Analyst',
 'Data Analyst',
 'Love Excel? Junior Data Analyst for Real Estate',
 'Data Analyst']

In [18]:
print("Job list is:     ", len(job_list), "jobs")
print("Analyst list is: ", len(analyst_list), "jobs")

Job list is:      787685 jobs
Analyst list is:  163124 jobs
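
One thing to keep in mind: the `in` check on strings is case-sensitive, so a title written as "data analyst" would not be counted by the filter used here. A minimal sketch of a case-insensitive variant, using a few hypothetical titles rather than the real dataset:

```python
# Hypothetical sample titles, not taken from the dataset
sample_jobs = ["Senior Data Analyst", "data analyst (remote)", "Data Engineer"]

# Case-sensitive match, as in the example above
exact_matches = [job for job in sample_jobs if "Data Analyst" in job]

# Case-insensitive match: lowercase the title before comparing
loose_matches = [job for job in sample_jobs if "data analyst" in job.lower()]

print(exact_matches)  # ['Senior Data Analyst']
print(loose_matches)  # ['Senior Data Analyst', 'data analyst (remote)']
```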
