# Week 4: Loops, Functions, Dictionaries and List Comprehension

## Load the example data

In [189]:
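# Import the os module to inspect and set the working directory
import os
print(os.getcwd())  # show the current working directory
# Change to the course folder (replace this path with the one on your machine)
os.chdir('/Users/jancg/Library/CloudStorage/OneDrive-StellenboschUniversity/3_LE/3_Courses/AE_772_892')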

In [190]:
import pandas as pd
# Load the example DataFrame
df = pd.read_csv('data/raw/week4_example_dataset.csv')
print(df)

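# The 'Age' column is already numeric in this dataset. If a numeric column is
# read in as text with embedded spaces, strip the spaces and convert it to float:
# df['Age'] = df['Age'].str.replace(' ', '').astype(float)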


      Name  Age Occupation
0    Alice   25   Engineer
1      Bob   30     Doctor
2  Charlie   35     Artist
3    David   40     Lawyer
4      Eva   45  Scientist


### What Are Loops, and How Do They Work?

A `for` loop is a control flow statement that executes a block of code once for each element of an iterable. The basic structure is shown below. Note the indentation, which marks the code that belongs to the loop:

```python
for element in iterable:
    # code to execute
```

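#### Example 1: Basic For Loop

In these examples, we use basic `for` loops. The first cell loops over `range()`, which generates a sequence of integers: `range(5)` runs from 0 to 4, `range(1, 5)` from 1 to 4, and `range(0, 10, 2)` from 0 to 8 in steps of 2. The second cell iterates over the 'Name' column of our DataFrame and prints a greeting for each name.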

In [191]:
for i in range(5):
    print(f"These are numbers in a range, {i}")
print("")
for i in range(1,5):
    print(f"These are numbers in a range, {i}")
print("")
for i in range(0,10,2):
    print(f"These are numbers in a range, {i}")

These are numbers in a range, 0
These are numbers in a range, 1
These are numbers in a range, 2
These are numbers in a range, 3
These are numbers in a range, 4

These are numbers in a range, 1
These are numbers in a range, 2
These are numbers in a range, 3
These are numbers in a range, 4

These are numbers in a range, 0
These are numbers in a range, 2
These are numbers in a range, 4
These are numbers in a range, 6
These are numbers in a range, 8


In [192]:
# Example of a basic for loop to print names from the dataset
for name in df['Name']:
    print(f"Hello, {name}!")

Hello, Alice!
Hello, Bob!
Hello, Charlie!
Hello, David!
Hello, Eva!


#### Example 2: Nested For Loop

In this example, we use a nested `for` loop: the outer loop iterates through names, and the inner loop iterates through each character of the current name. This demonstrates how loops can be nested within each other for more complex operations.

In [193]:
# Example of a nested for loop to print each character of each name
for name in df['Name']:
    print(f"Name: {name}")
    for char in name:
        print(f"  Character: {char}")

Name: Alice
  Character: A
  Character: l
  Character: i
  Character: c
  Character: e
Name: Bob
  Character: B
  Character: o
  Character: b
Name: Charlie
  Character: C
  Character: h
  Character: a
  Character: r
  Character: l
  Character: i
  Character: e
Name: David
  Character: D
  Character: a
  Character: v
  Character: i
  Character: d
Name: Eva
  Character: E
  Character: v
  Character: a
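
Two built-in helpers often make loops like the ones above easier to write: `enumerate` yields a counter alongside each element, and `zip` steps through two sequences together. The sketch below is only an illustration. It uses small hypothetical lists (`names`, `ages`) that mirror the example data rather than the DataFrame itself.

```python
# Hypothetical lists mirroring the 'Name' and 'Age' columns of the example data
names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 35]

# enumerate() yields (position, element) pairs, so no manual counter is needed
numbered = []
for position, name in enumerate(names, start=1):
    numbered.append(f"{position}. {name}")
print(numbered)

# zip() pairs up the elements of two sequences, one pair per iteration
descriptions = []
for name, age in zip(names, ages):
    descriptions.append(f"{name} is {age} years old")
print(descriptions)
```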


## Introduction to Functions

### Basics of Functions

Functions in Python are a way to encapsulate a block of code so that it can be reused and organized more effectively. They serve as "mini-programs" within a larger program, performing specific tasks. Functions are crucial in making code more modular, maintainable, and testable.

### Anatomy of a Function
A function in Python is defined using the `def` keyword, followed by the function name, parentheses, and a colon. The code block inside the function is indented. The function may take zero or more parameters as input and optionally return an output.

Here is a simple function definition - note the indentation:

```python
def greet(name):
    return f"Hello, {name}!"
```

- `def` starts the function definition.
- `greet` is the function name.
- `name` is a parameter that the function accepts.
- The `return` statement specifies what the function outputs.

### Calling a Function
To execute a function, you "call" it by using its name followed by parentheses, inside which you can place any arguments that the function expects:

```python
result = greet("Alice")
```

In this example, the `greet` function is called with the argument "Alice", and the returned string "Hello, Alice!" is stored in the variable `result`.

### Parameters and Arguments
- Parameters are variables listed inside the parentheses in the function definition.
- Arguments are values that are sent to the function when it is called.

### Function Scope
Variables defined within a function are local to that function and cannot be accessed outside of it. However, functions can access variables defined in their containing scope.
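
A minimal sketch of these scope rules, using hypothetical names (`scale`, `scaled_total`) chosen only for illustration:

```python
scale = 10  # defined in the containing (module-level) scope

def scaled_total(values):
    # 'total' is local: it exists only while the function runs
    total = sum(values)
    # 'scale' is not defined inside the function, so Python looks it up
    # in the containing scope
    return total * scale

result = scaled_total([1, 2, 3])
print(result)

# Accessing the local variable outside the function raises a NameError
try:
    print(total)
    local_visible = True
except NameError:
    local_visible = False
print(local_visible)
```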

### Types of Functions
1. **Built-in Functions**: Python comes with many built-in functions like `print()`, `len()`, `type()`, etc.
2. **User-defined Functions**: Functions defined by the users themselves.
3. **Anonymous Functions**: Also known as lambda functions, they are defined using the `lambda` keyword.
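
The three kinds of function listed above can be compared side by side. The names `double` and `triple` are hypothetical and serve only as an illustration:

```python
# 1. Built-in function: available without any import
n_items = len([10, 20, 30])

# 2. User-defined function: created with the def keyword
def double(x):
    """Return twice the input value."""
    return 2 * x

# 3. Anonymous (lambda) function: a single expression, here bound to a name
triple = lambda x: 3 * x

print(n_items, double(4), triple(4))
```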
  
### Function with Multiple Parameters and Default Values
You can define a function with multiple parameters and even set default values for them:

```python
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

greet("Alice")        # Returns "Hello, Alice!"
greet("Alice", "Hi")  # Returns "Hi, Alice!"
```

Here, `greeting` has a default value of "Hello".

### Returning Multiple Values
A function can return multiple values as a tuple or another collection type:

```python
def min_max(numbers):
    return min(numbers), max(numbers)

min_max([1, 2, 3, 4, 5]) # Returns (1, 5)
```
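
When a function returns several values, the caller can unpack them directly into separate variables. A short sketch that reuses `min_max` from above (the variable names `lowest`, `highest` and `both` are illustrative):

```python
def min_max(numbers):
    # The comma builds a tuple, so both values are returned together
    return min(numbers), max(numbers)

# Tuple unpacking assigns each returned value to its own variable
lowest, highest = min_max([1, 2, 3, 4, 5])
print(lowest, highest)

# Without unpacking, the result is a single tuple
both = min_max([1, 2, 3, 4, 5])
print(type(both).__name__, both)
```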

### Docstrings
It's a good practice to include a documentation string (docstring) to describe what the function does:

```python
def greet(name):
    """This function greets the person passed in as a parameter."""
    return f"Hello, {name}!"

greet("Alice")  # Returns "Hello, Alice!"
```
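
A docstring is more than a comment: Python stores it on the function object, where tools can read it. The sketch below shows one way to inspect it. In a notebook, `help(greet)` or `greet?` displays the same text.

```python
def greet(name):
    """Return a greeting for the person passed in as a parameter."""
    return f"Hello, {name}!"

# The docstring is stored in the function's __doc__ attribute
doc_text = greet.__doc__
print(doc_text)
```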

Functions can also return objects created inside the function, such as variables, lists, dictionaries, or DataFrames. Let's apply a function to the example data:

In [194]:
def filter_by_age(df, var, min_age, max_age):
    """Return the rows of df where column var lies between min_age and max_age (inclusive)."""
    df_filter = df[(df[var] >= min_age) & (df[var] <= max_age)]
    return df_filter

# Apply the function to the example dataset
filter_by_age(df, 'Age', 30, 40)


      Name  Age Occupation
1      Bob   30     Doctor
2  Charlie   35     Artist
3    David   40     Lawyer


You can pass the arguments by position, as above, but it's safer to pass them by keyword, like this:

In [195]:
filter_by_age(df, var='Age', min_age=30, max_age=40)

      Name  Age Occupation
1      Bob   30     Doctor
2  Charlie   35     Artist
3    David   40     Lawyer
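
Why keyword arguments are safer can be seen in a small sketch. The function `describe` is hypothetical and not part of the course code. With keywords the order of the arguments no longer matters, whereas swapped positional arguments silently produce a wrong result.

```python
def describe(name, age, occupation="unknown"):
    return f"{name} ({age}), {occupation}"

# Positional: values are matched to parameters by order
by_position = describe("Bob", 30, "Doctor")

# Keyword: values are matched by name, so the order no longer matters
by_keyword = describe(occupation="Doctor", age=30, name="Bob")

# Swapping positional arguments raises no error but gives a wrong result
swapped = describe(30, "Bob", "Doctor")

print(by_position)
print(by_keyword)
print(swapped)
```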


## Introduction to Lambda Functions

### Basics of Lambda Functions

Lambda functions are small anonymous functions defined with the `lambda` keyword. They can take any number of arguments but consist of a single expression, whose value is returned automatically. They are often used for quick, inline operations on DataFrames, typically together with the `apply` method.

The `apply` method comes from the `pandas` library. It applies a function along an axis of a DataFrame (to each column or to each row) or to each element of a Series. It is highly flexible and can be used for a variety of data transformation tasks.

### Basic Syntax

```python
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
```

- `func`: The function to apply.
- `axis`: Axis along which the function is applied. `0` for applying function to each column, and `1` for applying function to each row.
- `raw`: Determines if the function should receive ndarray objects instead of Series.
- `result_type`: Controls the type of output (`expand`, `reduce`, `broadcast`, or `None`). It only has an effect when `axis=1`.
- `args`: Tuple, positional arguments to pass to the function.
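
The effect of `axis` and `args` is easiest to see on a small table. The following is a sketch on hypothetical toy data (`scores`, `add_bonus`), not the course dataset:

```python
import pandas as pd

# Hypothetical toy data, not the course dataset
scores = pd.DataFrame({"test1": [50, 70, 90], "test2": [60, 80, 100]})

# axis=0 (the default): the function receives each column as a Series
column_ranges = scores.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function receives each row as a Series
row_means = scores.apply(lambda row: row.mean(), axis=1)

# args passes extra positional arguments on to the function
def add_bonus(col, bonus):
    return col + bonus

with_bonus = scores.apply(add_bonus, args=(5,))

print(column_ranges)
print(row_means)
print(with_bonus)
```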

### Use-cases

1. **Data Transformation**: Apply complex calculations to data.
2. **Aggregation**: Aggregate data according to some criteria.
3. **Cleaning**: Apply cleaning operations to data, like filling `NaN` values or converting types.

### Limitations

- **Performance**: Applying operations row-wise can be computationally expensive.
- **Debugging**: Errors can be harder to debug because you're applying a function in a single line of code.

### Alternatives

- Vectorized operations: Faster but not always possible.
- `applymap()` for element-wise operations on a DataFrame (renamed to `DataFrame.map()` in pandas 2.1, where `applymap()` is deprecated).
- `map()` for element-wise operations on a Series.
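
As a rough comparison of these options, the sketch below squares a hypothetical Series of ages in three ways. All three give the same values. The vectorized form is usually the fastest because it avoids calling a Python function once per element.

```python
import pandas as pd

# Hypothetical toy Series, not the course dataset
ages = pd.Series([25, 30, 35])

# Element by element with apply and a lambda
squared_apply = ages.apply(lambda x: x ** 2)

# Vectorized: the operation is applied to the whole Series at once
squared_vectorized = ages ** 2

# map() with a function gives the same element-wise result on a Series
squared_map = ages.map(lambda x: x ** 2)

print(squared_vectorized.tolist())
```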

### How to Use Them

In [196]:
# Example 1: Using lambda to square the 'Age' column
df['Age_squared'] = df['Age'].apply(lambda x: x ** 2)
print(df)
print("")
# Example 2: Using lambda with `filter` to get the ages that are greater than 30
older_than_30 = list(filter(lambda x: x > 30, df['Age']))
print(older_than_30)
print("")
# Example 3: Using lambda to create a new column with length of names
df['Name_length'] = df['Name'].apply(lambda x: len(x))
print(df)

      Name  Age Occupation  Age_squared
0    Alice   25   Engineer          625
1      Bob   30     Doctor          900
2  Charlie   35     Artist         1225
3    David   40     Lawyer         1600
4      Eva   45  Scientist         2025

[35, 40, 45]

      Name  Age Occupation  Age_squared  Name_length
0    Alice   25   Engineer          625            5
1      Bob   30     Doctor          900            3
2  Charlie   35     Artist         1225            7
3    David   40     Lawyer         1600            5
4      Eva   45  Scientist         2025            3


## Introduction to Dictionaries

### Basics of Dictionaries in the Context of DataFrames

Dictionaries in Python are a collection of key-value pairs enclosed in curly braces (`{}`). In the context of DataFrames, which are a core part of the `pandas` library, dictionaries are frequently used for various tasks like renaming columns, mapping values, aggregating data, and more.

#### Syntax of Dictionaries

The basic syntax for a dictionary is:

```python
my_dict = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
```

Here, `'key1', 'key2', 'key3'` are the keys, and `'value1', 'value2', 'value3'` are their corresponding values. Keys must be unique and hashable (in practice, immutable types such as strings, numbers, or tuples), while values can be of any data type.
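
Before using dictionaries with DataFrames, it helps to see the basic operations on their own. A minimal sketch with a hypothetical dictionary of names and ages:

```python
# Hypothetical dictionary mapping names to ages
ages = {"Alice": 25, "Bob": 30}

# Look up a value by its key
alice_age = ages["Alice"]

# Add a new pair, or overwrite an existing one, by assignment
ages["Charlie"] = 35

# .get() returns a default instead of raising KeyError for a missing key
david_age = ages.get("David", "not recorded")

# .items() yields (key, value) pairs, which is convenient in a for loop
summary = []
for name, age in ages.items():
    summary.append(f"{name}: {age}")

print(alice_age, david_age)
print(summary)
```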

#### Renaming Columns

You can use dictionaries to rename the columns of a DataFrame. The keys in the dictionary are the current column names, and the corresponding values are the new names you want to set.

In [197]:
# Create a DataFrame
df3 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Rename columns
rename_dict = {'A': 'X', 'B': 'Y'}
df3.rename(columns=rename_dict, inplace=True)

print(df3)

   X  Y
0  1  4
1  2  5
2  3  6




#### Mapping Values

You can use a dictionary to map existing values in a column to new values. This is particularly useful for categorical data.


In [198]:
# Create a DataFrame
df2 = pd.DataFrame({'Grade': ['A', 'B', 'C', 'D']})

# Create mapping dictionary
grade_mapping = {'A': 4, 'B': 3, 'C': 2, 'D': 1}

# Map values
df2['Numeric_Grade'] = df2['Grade'].map(grade_mapping)

print(df2)

  Grade  Numeric_Grade
0     A              4
1     B              3
2     C              2
3     D              1
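
One thing to watch for when mapping: values that have no matching key in the dictionary become `NaN`. The sketch below uses a hypothetical Series in which the grade 'E' is deliberately absent from the mapping, and shows one possible way to handle it.

```python
import pandas as pd

# Hypothetical grades; 'E' is deliberately missing from the mapping
grades = pd.Series(["A", "B", "E"])
grade_mapping = {"A": 4, "B": 3, "C": 2, "D": 1}

# Values without a matching key are mapped to NaN
mapped = grades.map(grade_mapping)
print(mapped)

# One option: fill the unmapped entries with a chosen default
mapped_filled = mapped.fillna(0)
print(mapped_filled)
```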


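#### Aggregating Data

You can use a dictionary to specify which aggregation functions to apply to each column when grouping.

In [199]: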
# Create a DataFrame
df0 = pd.DataFrame({
    'Category': ['Fruit', 'Vegetable', 'Fruit', 'Vegetable'],
    'Item': ['Apple', 'Carrot', 'Banana', 'Broccoli'],
    'Price': [1, 0.8, 0.5, 1.2]
})

# Aggregation rules
agg_rules = {
    'Price': ['mean', 'sum']
}

# Group and aggregate
grouped_df = df0.groupby('Category').agg(agg_rules)

print(grouped_df)

          Price     
           mean  sum
Category            
Fruit      0.75  1.5
Vegetable  1.00  2.0


In this example, the dictionary `agg_rules` specifies that we want to find the mean and sum of the 'Price' for each 'Category'.

#### Replacing Values

You can use dictionaries to replace values in a DataFrame with other values.

In [200]:
# Create a DataFrame
df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})

# Replacement dictionary
replace_dict = {1: 'one', 2: 'two', 'a': 'alpha', 'b': 'beta'}

# Replace values
df1.replace(replace_dict, inplace=True)

print(df1)

  col1   col2
0  one  alpha
1  two   beta
2    3      c




Using dictionaries in the context of DataFrames allows for efficient and readable code. Whether you're renaming columns, mapping values, or aggregating data, dictionaries provide a versatile way to manage your DataFrame transformations.

### Creating and Manipulating Dictionaries

In [201]:
# Example 1: Renaming columns using a dictionary
df = df.rename(columns={'Name': 'Full Name', 'Age': 'Age in Years'})
print(df)

  Full Name  Age in Years Occupation  Age_squared  Name_length
0     Alice            25   Engineer          625            5
1       Bob            30     Doctor          900            3
2   Charlie            35     Artist         1225            7
3     David            40     Lawyer         1600            5
4       Eva            45  Scientist         2025            3


In [202]:
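# Example 2: Replacing values using a dictionary
# Assign the result back to the column; calling replace(..., inplace=True) on a
# column selection triggers a warning in recent pandas versions
df['Occupation'] = df['Occupation'].replace({'Engineer': 'Mechanical Engineer', 'Doctor': 'Physician'})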
print(df)

  Full Name  Age in Years           Occupation  Age_squared  Name_length
0     Alice            25  Mechanical Engineer          625            5
1       Bob            30            Physician          900            3
2   Charlie            35               Artist         1225            7
3     David            40               Lawyer         1600            5
4       Eva            45            Scientist         2025            3


In [203]:
# Example 3: Aggregating using a dictionary
agg_rules = {'Age in Years': ['sum','mean']}
result = df.agg(agg_rules)
print(result)

      Age in Years
sum          175.0
mean          35.0


## Introduction to List Comprehension

### Basics of List Comprehension

List comprehension is a Python feature that provides a concise and readable way to create lists. It's often used in data manipulation tasks, including when working with pandas DataFrames. A list comprehension creates a new list by applying an expression to each element of an existing collection, optionally keeping only the elements that satisfy a condition.

#### Basic Syntax

The basic syntax of a list comprehension looks like this:

```python
[expression for item in iterable]
```

- `expression`: The value you want to include in the new list.
- `item`: A variable that takes the value of each element in the iterable.
- `iterable`: The collection you are iterating over.
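
A list comprehension is shorthand for a loop that appends to a list. The sketch below builds the same list both ways (the variable names are illustrative):

```python
# Version 1: an explicit for loop that appends to a list
squares_loop = []
for x in range(5):
    squares_loop.append(x ** 2)

# Version 2: the equivalent list comprehension
squares_comp = [x ** 2 for x in range(5)]

print(squares_loop)
print(squares_comp)
```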

#### Simple Example

For example, to create a list of the squares of numbers from 0 to 9, you can use:

In [204]:
squares = [x**2 for x in range(10)]
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

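#### Conditional Statements

Adding an `if` clause at the end, as in `[expression for item in iterable if condition]`, keeps only the items for which the condition is true.

In [205]:
squares_above_5 = [x**2 for x in range(10) if x > 5]
squares_above_5

[36, 49, 64, 81]

Here we calculate the squares of only those numbers in the range that are greater than 5.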
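
An `if` can appear in two different places in a list comprehension, with different effects. The following sketch, using illustrative variable names, contrasts the two:

```python
numbers = range(6)

# Filtering: 'if' at the end drops elements that fail the condition
even_squares = [x ** 2 for x in numbers if x % 2 == 0]

# Conditional expression: 'if ... else' at the front keeps every element
# but chooses between two values
labels = ["even" if x % 2 == 0 else "odd" for x in numbers]

print(even_squares)
print(labels)
```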

### When and How to Use It

In [206]:
# Example 1: Creating a new list of names in uppercase
upper_names = [name.upper() for name in df['Full Name']]
print(upper_names)

['ALICE', 'BOB', 'CHARLIE', 'DAVID', 'EVA']


In [207]:
# Example 2: Creating a new DataFrame column using list comprehension
df['Is_Elderly'] = ['Yes' if age > 40 else 'No' for age in df['Age in Years']]
print(df)

  Full Name  Age in Years           Occupation  Age_squared  Name_length  \
0     Alice            25  Mechanical Engineer          625            5   
1       Bob            30            Physician          900            3   
2   Charlie            35               Artist         1225            7   
3     David            40               Lawyer         1600            5   
4       Eva            45            Scientist         2025            3   

  Is_Elderly  
0         No  
1         No  
2         No  
3         No  
4        Yes  


In [208]:
# Example 3: Using list comprehension with multiple conditions
df['Life_Stage'] = ['Young' if age < 30 else 'Middle-aged' if age < 50 else 'Old' for age in df['Age in Years']]
print(df)

  Full Name  Age in Years           Occupation  Age_squared  Name_length  \
0     Alice            25  Mechanical Engineer          625            5   
1       Bob            30            Physician          900            3   
2   Charlie            35               Artist         1225            7   
3     David            40               Lawyer         1600            5   
4       Eva            45            Scientist         2025            3   

  Is_Elderly   Life_Stage  
0         No        Young  
1         No  Middle-aged  
2         No  Middle-aged  
3         No  Middle-aged  
4        Yes  Middle-aged  
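
Chained conditional expressions become hard to read as the number of categories grows. One possible alternative, sketched here with a hypothetical helper `life_stage` and a plain list of ages standing in for the DataFrame column, is to move the logic into a function and call it from the comprehension:

```python
def life_stage(age):
    """Classify an age using the same cut-offs as the example above."""
    if age < 30:
        return "Young"
    if age < 50:
        return "Middle-aged"
    return "Old"

# Hypothetical ages standing in for the 'Age in Years' column
ages = [25, 30, 45, 60]
stages = [life_stage(age) for age in ages]
print(stages)
```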


## Merging and Joining Data

### Basics of Joining Data

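Pandas provides various ways to combine DataFrames, including `.merge()` for database-style joins. The `how` argument selects the join type: an inner join keeps only the keys found in both DataFrames, a left join keeps every row of the left DataFrame, and an outer join keeps the keys found in either DataFrame. Values that have no match are filled with `NaN`.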

In [209]:
df

  Full Name  Age in Years           Occupation  Age_squared  Name_length  \
0     Alice            25  Mechanical Engineer          625            5   
1       Bob            30            Physician          900            3   
2   Charlie            35               Artist         1225            7   
3     David            40               Lawyer         1600            5   
4       Eva            45            Scientist         2025            3   

  Is_Elderly   Life_Stage  
0         No        Young  
1         No  Middle-aged  
2         No  Middle-aged  
3         No  Middle-aged  
4        Yes  Middle-aged  


In [210]:
# Example 1: Creating another DataFrame to join with the original
data2 = {'Full Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]}
df_add = pd.DataFrame(data2)

# Merging the two DataFrames on 'Full Name'
merged_df = pd.merge(df, df_add, on='Full Name',how = 'inner')
print(merged_df)

  Full Name  Age in Years           Occupation  Age_squared  Name_length  \
0     Alice            25  Mechanical Engineer          625            5   
1       Bob            30            Physician          900            3   

  Is_Elderly   Life_Stage  Salary  
0         No        Young   50000  
1         No  Middle-aged   60000  


In [211]:
# Example 2: Left join
left_joined_df = pd.merge(df, df_add, on='Full Name', how='left')
print(left_joined_df)

  Full Name  Age in Years           Occupation  Age_squared  Name_length  \
0     Alice            25  Mechanical Engineer          625            5   
1       Bob            30            Physician          900            3   
2   Charlie            35               Artist         1225            7   
3     David            40               Lawyer         1600            5   
4       Eva            45            Scientist         2025            3   

  Is_Elderly   Life_Stage   Salary  
0         No        Young  50000.0  
1         No  Middle-aged  60000.0  
2         No  Middle-aged      NaN  
3         No  Middle-aged      NaN  
4        Yes  Middle-aged      NaN  


In [212]:
# Example 3: Outer join (same result as the left join here, because every
# name in df_add also appears in df)
outer_joined_df = pd.merge(df, df_add, on='Full Name', how='outer')
print(outer_joined_df)

  Full Name  Age in Years           Occupation  Age_squared  Name_length  \
0     Alice            25  Mechanical Engineer          625            5   
1       Bob            30            Physician          900            3   
2   Charlie            35               Artist         1225            7   
3     David            40               Lawyer         1600            5   
4       Eva            45            Scientist         2025            3   

  Is_Elderly   Life_Stage   Salary  
0         No        Young  50000.0  
1         No  Middle-aged  60000.0  
2         No  Middle-aged      NaN  
3         No  Middle-aged      NaN  
4        Yes  Middle-aged      NaN  
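
The difference between the join types only shows up when each table contains keys that the other lacks. The sketch below uses two hypothetical toy tables (`people`, `salaries`) in which 'Charlie' appears only on the left and 'Frank' only on the right. The `indicator=True` option of `pd.merge` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Hypothetical toy tables: 'Charlie' is only in the left table,
# 'Frank' is only in the right table
people = pd.DataFrame({"Full Name": ["Alice", "Bob", "Charlie"],
                       "Age": [25, 30, 35]})
salaries = pd.DataFrame({"Full Name": ["Alice", "Bob", "Frank"],
                         "Salary": [50000, 60000, 70000]})

# Inner join: only the names present in both tables
inner = pd.merge(people, salaries, on="Full Name", how="inner")

# Left join: every row of the left table
left = pd.merge(people, salaries, on="Full Name", how="left")

# Outer join: names from either table; indicator=True adds a '_merge' column
outer = pd.merge(people, salaries, on="Full Name", how="outer", indicator=True)

print(len(inner), len(left), len(outer))
print(outer)
```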


# Tutorial 4

First
- Use your `setup.ipynb` to pull the latest version of the class repo
- Then create a notebook for this tutorial and rename it to \<your_name>\<Lecture_4_Tutorial>
- Share with me: jan5020@gmail.com

Then do the following
- import the SA_maize data and call it df_mz
- import the SA_soybean data and call it df_sb

Now do the following
- Using loops 
    - use a loop to add "mz_" to all the column names in df_mz except for the first column
    - use a loop to add "sb_" to all the column names in df_sb except for the first column
- Using functions
    - create a function to remove the spaces in the numerical columns and convert them to floats
    - Apply it to the df_mz and df_sb
- Using lambda functions
    - calculate the average maize and soybean yield using lambda functions
- Using list comprehension
    - Use list comprehension to check if the white maize price is higher than the yellow maize price and create a new column called white_higher. Use boolean values for this column.
- Using dictionaries
    - create a dictionary to rename the columns in df_mz and df_sb as follows:
        - mz_area_planted: mz_arp
        - mz_production: mz_prod
        - sb_area_planted: sb_arp
        - sb_production: sb_prod
- Merging two dataframes
    - Using the prod_year column as the key, join the two dataframes together with an outer join and call it df
- Plotting
    - Plot the average maize and soybean yield using a line plot


