# 12. List Comprehension

List comprehension lets us create a new list from the values of an existing iterable.

What is an iterable?
- An iterable is any object that can return its members one at a time, e.g. lists, tuples, strings, sets, and dictionaries.

- A simple way to think of an iterable is as an object that can be used in a `for` loop.
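
As a small illustration of the two points above, the sketch below loops over a few built-in iterables and then pulls members out one at a time with `iter()` and `next()`. The variable names (`samples`, `iterator`) are chosen only for this example.

```python
# Each of these built-in types is iterable, so each works in a `for` loop.
samples = [[1, 2], (3, 4), "hi", {"a": 1}]

for sample in samples:
    # list(...) consumes the iterable one member at a time
    # (iterating over a dictionary yields its keys)
    print(type(sample).__name__, "->", list(sample))

# A `for` loop asks the iterable for an iterator and then
# calls next() on it until the members run out.
iterator = iter([10, 20])
first = next(iterator)
second = next(iterator)
print(first, second)  # 10 20
```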

The general format for a list comprehension is:
``` python
new_list = [expression for item in iterable]
```

Let's see an example. Suppose we have a list `numbers = [1, 2, 3]` and we want to create a new list that contains the square of each element of `numbers`.

How should we approach this?

In [None]:
# General approach

numbers = [1, 2, 3]
squared_numbers = []

for num in numbers:
  sq_num = num ** 2
  squared_numbers.append(sq_num)

print(squared_numbers)

[1, 4, 9]


We can observe that this approach takes several steps and lines of code.

We can shorten these steps using a list comprehension.

```python
new_list = [new_item for item in old_list]
```

In [None]:
# List Comprehension Approach

numbers = [1, 2, 3]

squared_numbers = [num ** 2 for num in numbers]

print(squared_numbers)

[1, 4, 9]
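
Since a comprehension accepts any iterable, not only a list, the same pattern works on strings and `range` objects. A minimal sketch (the names `upper_letters` and `cubes` are illustrative):

```python
# Comprehension over a string: one item per character
upper_letters = [ch.upper() for ch in "abc"]

# Comprehension over a range object
cubes = [n ** 3 for n in range(4)]

print(upper_letters)  # ['A', 'B', 'C']
print(cubes)          # [0, 1, 8, 27]
```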


### Conditional List Comprehension
We can also add an `if` condition to a list comprehension to filter the items.

The format for conditional list comprehension is:
```python
new_list = [expression for item in iterable if condition]
```

Say we have a list `numbers = [1, 2, 3, 4]`. We want to create a new list that contains the squares of only the even elements of `numbers`.

In [None]:
numbers = [1, 2, 3, 4]

# Apply conditional list comprehension for square of even numbers in `numbers` list
even_square_numbers = [num ** 2 for num in numbers if (num % 2) == 0]

print(even_square_numbers)

[4, 16]


### QUESTION
You have a list of integers. Create a new list that contains only the multiples of 3 from the original list using list comprehension.

In [None]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# code here

# 13. Functions in Python

* Functions in Python are reusable blocks of code designed to perform a specific task.

* They help in organizing code, reducing redundancy, and improving readability.

* Return:

    * Functions can return a value using the return statement.
    
    * A function without a return statement returns None by default.
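
The two points about `return` can be checked with a short sketch. The function names `greet` and `make_greeting` are made up for this example.

```python
def greet(name):
    # No return statement, so the call evaluates to None
    print(f"Hello, {name}!")

def make_greeting(name):
    # An explicit return hands a value back to the caller
    return f"Hello, {name}!"

result_without_return = greet("Asha")
result_with_return = make_greeting("Asha")

print(result_without_return)  # None
print(result_with_return)     # Hello, Asha!
```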

Syntax:
```python
# Standard function definition
def my_function(param1: int, param2: str) -> bool:
    """
    This is a docstring.

    Args:
        param1 (int): The first parameter.
        param2 (str): The second parameter.

    Returns:
        bool: The return value. True for success, False otherwise.
    """
    pass

# Function call
my_function(4, "A")
```

In [None]:
# Example function to add two numbers

def add_two_numbers(x, y):
    return x + y

# Positional function call
sum1 = add_two_numbers(10, 20)
print(sum1)

# Keyword argument function call
sum2 = add_two_numbers(y = 12, x = 231)
print(sum2)

30
243


## Returning Multiple Values in Python

### Using Tuples

In Python, you can return multiple values from a function by returning them as a tuple. A tuple is an ordered, immutable collection.

In [None]:
def test2():
    return 'abc', 100, [0, 1, 2]

result = test2()
print(result)
print(type(result), "\n")

# unpacking
a,b,c = test2()
print(a)
print(b)
print(c)

('abc', 100, [0, 1, 2])
<class 'tuple'> 

abc
100
[0, 1, 2]


### Return a list from a function
By wrapping the values in `[]`, you can return a list instead of a tuple.

In [None]:
def test_list():
    return ['abc', 100]

result = test_list()
print(result)
print(type(result))

['abc', 100]
<class 'list'>


### QUESTION
Write a function named `split_name` that takes a full name and returns the first and last names.

HINT: use the `str.split()` method to split the name

In [None]:
def split_name(fullname):
  '''split first and last name'''
  # code here
  return '''return first and last name'''

first, last = split_name('Tech Axis')
print(first)
print(last)

### Topics you can explore
There is more to explore in this topic. Some topics include:
1. Higher-Order Functions
2. Decorators
3. Lambda Functions

# 15. Exception Handling

### Errors and Exceptions

Generally, there are two types of errors in Python: `syntax errors` and `exceptions`.

- Syntax Errors:
  - Errors in the syntax of the code, which are detected by the Python interpreter before the program is run.

  - Also known as parsing errors.

- Exceptions:
  - Even if the code is syntactically correct, errors may occur while the program is being executed.

  - Errors detected during execution are called exceptions.

  - These can be handled by the programmer to prevent the program from crashing.

  - Examples include `ZeroDivisionError`, `TypeError`, `KeyError`, `NameError`, `IndexError`, etc.

In [None]:
# Syntax Error
if True
  print("Hello")


# Note: output will be SyntaxError

In [None]:
# Exceptions
result = 10 / 0
print(result)


# Note: output will be ZeroDivisionError

### Handling Exceptions
The `try` and `except` blocks are used to handle exceptions.

- The `try` block contains code that might raise an exception.

- The `except` block runs if the type of the exception raised in the `try` block matches the exception named after the `except` keyword.
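
To illustrate the matching rule, here is a minimal sketch with two `except` clauses. The helper `get_ratio` is hypothetical; an exception type that matches neither clause is not handled and propagates to the caller.

```python
def get_ratio(values, index, divisor):
    try:
        return values[index] / divisor
    except ZeroDivisionError:
        return "cannot divide by zero"
    except IndexError:
        return "index out of range"

print(get_ratio([10, 20], 0, 2))  # 5.0
print(get_ratio([10, 20], 0, 0))  # cannot divide by zero
print(get_ratio([10, 20], 5, 2))  # index out of range

# A TypeError matches neither clause, so it escapes get_ratio
try:
    get_ratio([10, 20], "0", 2)
except TypeError as error:
    print("not handled inside get_ratio:", type(error).__name__)
```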

In [None]:
def divide(x, y):
  try:
      result = x / y                  # This will raise a ZeroDivisionError if y = 0
      return result

  except ZeroDivisionError:
      print("Cannot divide by zero!") # Code that runs if a ZeroDivisionError exception occurs

# function call
divide(5, 1)

5.0
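
It is worth checking the failing case too. In the sketch below (a condensed copy of `divide`), the `except` branch prints a message but has no `return`, so the call evaluates to `None`.

```python
def divide(x, y):
    try:
        return x / y
    except ZeroDivisionError:
        print("Cannot divide by zero!")
        # No return here, so the function falls through and returns None

ok = divide(5, 1)
failed = divide(5, 0)   # prints "Cannot divide by zero!"

print(ok)      # 5.0
print(failed)  # None
```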

What if no exception is raised in the `try` block? For this case, the `else` keyword is used.

- The `else` block runs only if no exception was raised in the `try` block.

What if we want some code to run whether or not an exception is raised? For this case, the `finally` keyword is used.

- The `finally` block always runs, regardless of whether an exception was raised or not.

In [None]:
try:
  result = 10 / 2
  print(result)

except ZeroDivisionError:
  print("Cannot divide by zero!")

else:
  # Code that runs if no exceptions were raised in the try block
  print("Division was successful.")

finally:
  # Code that always runs, whether an exception is raised or not
  print("This will always be printed.")


5.0
Division was successful.
This will always be printed.
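
For comparison, here is a sketch of the failing path, where the division raises an exception. The list `steps` is only a device for recording which blocks ran.

```python
steps = []

try:
    result = 10 / 0
    steps.append("try finished")   # skipped: the line above raises
except ZeroDivisionError:
    steps.append("except ran")
else:
    steps.append("else ran")       # skipped: an exception was raised
finally:
    steps.append("finally ran")    # runs in every case

print(steps)  # ['except ran', 'finally ran']
```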


# 16. Pandas and NumPy Introduction

### NumPy

- NumPy (Numerical Python) is a library used for fast and efficient computations of arrays.

- It provides support for large multidimensional array objects and various tools to work with these arrays.

- NumPy arrays make it easy to perform advanced mathematical and other operations on large amounts of data.

### Why is a NumPy array preferable to a list?

- NumPy arrays are more efficient than Python lists for numerical operations because they store elements of a single data type in a fixed-size, contiguous block of memory.

- NumPy operations are implemented in C, leading to faster execution.

- NumPy provides many built-in functions for array manipulation, making it easier to perform complex operations.

- Element-wise operations can be performed without explicit loops.
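
A small sketch of the last point, assuming made-up `prices` and `quantities` data: arithmetic on arrays applies to every element, while the list version needs a loop or comprehension.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Arithmetic applies to each pair of elements; no explicit loop is needed
totals = prices * quantities
discounted = prices - 5

# The plain-list equivalent needs a loop or a comprehension
totals_list = [p * q for p, q in zip([10.0, 20.0, 30.0], [1, 2, 3])]

print(totals.tolist())      # [10.0, 40.0, 90.0]
print(discounted.tolist())  # [5.0, 15.0, 25.0]
print(totals_list)          # [10.0, 40.0, 90.0]
```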

##### Creating Arrays

We can create NumPy arrays from sequences such as lists and tuples (including nested ones). Lists are the most common choice.

In [None]:
# Import numpy module
import numpy as np    # np as alias

# Creating a NumPy array using list
arr = np.array([1, 2, 3, 4, 5])
print("Array:", arr)

Array: [1 2 3 4 5]


To create an array filled with zeros, the `np.zeros` function can be used.

In [None]:
# Creates a 1D array containing all zeroes
zeros_vector = np.zeros(3)
print(f"Zero Vector: {zeros_vector}")
# Creates a 2D array containing all zeroes
zeros_array = np.zeros((3, 4))
print(f"Zero 2D Matrix:\n {zeros_array}")

Zero Vector: [0. 0. 0.]
Zero 2D Matrix:
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


### Activity

Create a vector filled with ones and a 3x3 matrix whose elements are all ones.


Hint:
- Use the `np.ones` function

In [None]:
# Creating identity matrix
identity_array = np.identity(3)
identity_array

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

NumPy provides different functions to create random arrays.

In [None]:
# Creates a 3x3 array of random floats drawn uniformly from [0, 1)
random_array = np.random.rand(3, 3)
random_array

array([[0.84909396, 0.34682015, 0.77012702],
       [0.29245703, 0.62021521, 0.59197935],
       [0.80884906, 0.52623864, 0.05346917]])
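
Random output changes on every run. When repeatable results are needed, one option is NumPy's `Generator` API with a fixed seed. The sketch below assumes a seed of 42, chosen arbitrarily.

```python
import numpy as np

# A Generator created with a fixed seed yields the same values on every run
rng = np.random.default_rng(seed=42)

uniform_values = rng.random((2, 3))       # floats in [0, 1)
dice_rolls = rng.integers(1, 7, size=5)   # integers from 1 to 6 (7 is excluded)

# A second Generator with the same seed reproduces the first draw
repeat_values = np.random.default_rng(seed=42).random((2, 3))

print(uniform_values.shape)                       # (2, 3)
print(np.array_equal(uniform_values, repeat_values))  # True
```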

### Properties of NumPy Array object

- Size: Total number of elements in the array.

- Shape: Tuple containing the length of the array along each dimension (number of rows and columns for a 2D array).

- ndim: Number of dimensions (axes) of the array.

In [None]:
# Create 2D array
array1 = np.array([[1, 2, 3], [4, 5, 6]])
print(array1)

# Check size of an array
print("Size of Array: ", array1.size)

# Check shape of array
print("Shape of Array: ", array1.shape)

# Check dimension of array
print("Dimension of Array: ", array1.ndim)

[[1 2 3]
 [4 5 6]]
Size of Array:  6
Shape of Array:  (2, 3)
Dimension of Array:  2
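
The same three properties apply to arrays with any number of dimensions. A brief sketch with a 1D and a 3D array (names are illustrative):

```python
import numpy as np

vector = np.array([1, 2, 3])
# np.zeros((2, 3, 4)) builds 2 blocks, each with 3 rows and 4 columns
array3d = np.zeros((2, 3, 4))

print(vector.shape)   # (3,)
print(array3d.ndim)   # 3
print(array3d.shape)  # (2, 3, 4)
print(array3d.size)   # 24
```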


#### Activity

Create a 2D NumPy array from lists, and explore basic properties of arrays.

### Array Manipulation

We often need to reshape an array to a new shape without changing the data.

In [None]:
# Create 2D array
array1 = np.array([[1, 2, 3], [4, 5, 6]])
print(array1)

# Check shape of array
print("Shape of Array: ", array1.shape)

# Reshape array to 3x2
array1 = array1.reshape(3, 2)
print(array1)
print("Shape of Array: ", array1.shape)

[[1 2 3]
 [4 5 6]]
Shape of Array:  (2, 3)
[[1 2]
 [3 4]
 [5 6]]
Shape of Array:  (3, 2)
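
Two details of `reshape` are worth a quick sketch: passing `-1` lets NumPy infer one dimension, and a shape that does not hold exactly the same number of elements raises a `ValueError`.

```python
import numpy as np

values = np.arange(6)   # six elements: 0 through 5

# -1 asks NumPy to infer that dimension from the total size
two_rows = values.reshape(2, -1)
print(two_rows.shape)   # (2, 3)

# 4 x 2 would need 8 elements, so this reshape fails
error_message = None
try:
    values.reshape(4, 2)
except ValueError as error:
    error_message = str(error)
    print("reshape failed:", error_message)
```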


We can obtain the transpose of an array using the `T` attribute.

In [None]:
# Create 2D array
array1 = np.array([[1, 2, 3], [4, 5, 6]])
print(array1)
# Check shape of array
print("Shape of Array: ", array1.shape)

# Transpose
array1 = array1.T
print(array1)
# Check shape of array
print("Shape of Array: ", array1.shape)

[[1 2 3]
 [4 5 6]]
Shape of Array:  (2, 3)
[[1 4]
 [2 5]
 [3 6]]
Shape of Array:  (3, 2)


#### Activity

Create a 1D array and reshape it into a 2D array.

#### Matrix Multiplication

We can use `np.matmul` for matrix multiplication; the `@` operator is shorthand for it.

In [None]:
import numpy as np

# Create two 2D matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication
C = np.matmul(A, B)

print("Matrix A:\n", A)
print("Matrix B:\n", B)
print("Matrix C (A @ B):\n", C)


Matrix A:
 [[1 2]
 [3 4]]
Matrix B:
 [[5 6]
 [7 8]]
Matrix C (A @ B):
 [[19 22]
 [43 50]]
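
A common source of confusion is the difference between `@` and `*`. The sketch below reuses the same two matrices to show that `*` multiplies matching positions only.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

matrix_product = A @ B        # same result as np.matmul(A, B)
elementwise_product = A * B   # multiplies matching positions only

print(matrix_product.tolist())       # [[19, 22], [43, 50]]
print(elementwise_product.tolist())  # [[5, 12], [21, 32]]
```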


We can observe that NumPy provides a huge variety of built-in functions, making it easier to perform complex operations.

#### Activity

Create two 2D NumPy arrays and perform a matrix multiplication between them.

#### Performance of NumPy array vs List

In [None]:
import time

# Performance comparison
large_list = list(range(1_000_000))
large_array = np.array(large_list)

start_time = time.time()
list_result = [x * 2 for x in large_list]
print("List operation time:", time.time() - start_time)

start_time = time.time()
array_result = large_array * 2
print("NumPy array operation time:", time.time() - start_time)


List operation time: 0.36269235610961914
NumPy array operation time: 0.009519100189208984


We can observe that the list operation is much slower than the NumPy array operation: roughly 38 times slower in this run (about 0.363 s vs. 0.0095 s). Exact timings vary from machine to machine.

### Pandas

- Pandas is a powerful library for data manipulation and analysis.

- It provides data structures like `DataFrame` and `Series` that are built on top of NumPy arrays.

- A `DataFrame` is a table with rows and columns.

- A `Series` is a one-dimensional sequence of values. Each column of a `DataFrame` is a `Series`.
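
The relationship between the two structures can be sketched as follows. The data and the names `ages` and `people` are made up for illustration.

```python
import pandas as pd

ages = pd.Series([23, 30, 29], name="Age")
people = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [23, 30, 29]})

# Selecting one column of a DataFrame gives back a Series
age_column = people["Age"]
print(type(age_column).__name__)             # Series
print(age_column.tolist() == ages.tolist())  # True

# The values of a Series can be exposed as a NumPy array
print(type(age_column.to_numpy()).__name__)  # ndarray
```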

### Why use Pandas?

- Pandas can process data with different data types.

- It provides powerful tools for data cleaning, filtering, and transformation.

- It offers numerous functions for statistical analysis and data aggregation.

- It integrates well with other data science libraries like Matplotlib and Seaborn for visualization, and SciPy for scientific computation.



#### Basic Operations in Pandas

In [None]:
# Importing pandas module
import pandas as pd

In [None]:
# Create a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Nicole', 'John', 'James'],
    'Age': [23, 30, 29, 40, 21, 33, 44],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Texas', 'Ohio', 'Vegas']
}

df = pd.DataFrame(data)

# Display the DataFrame
df

      Name  Age         City
0    Alice   23     New York
1      Bob   30  Los Angeles
2  Charlie   29      Chicago
3    David   40      Houston
4   Nicole   21        Texas
5     John   33         Ohio
6    James   44        Vegas


In [None]:
# Display the shape of the DataFrame
df.shape

(7, 3)

In [None]:
# Display the first 5 rows of the DataFrame
df.head()

      Name  Age         City
0    Alice   23     New York
1      Bob   30  Los Angeles
2  Charlie   29      Chicago
3    David   40      Houston
4   Nicole   21        Texas


In [None]:
# Display the last 5 rows of the DataFrame
df.tail()


      Name  Age     City
2  Charlie   29  Chicago
3    David   40  Houston
4   Nicole   21    Texas
5     John   33     Ohio
6    James   44    Vegas


In [None]:
# Display DataFrame information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    7 non-null      object
 1   Age     7 non-null      int64 
 2   City    7 non-null      object
dtypes: int64(1), object(2)
memory usage: 296.0+ bytes


In [None]:
# Display summary statistics
df.describe()

             Age
count   7.000000
mean   31.428571
std     8.383658
min    21.000000
25%    26.000000
50%    30.000000
75%    36.500000
max    44.000000


#### Activity

Create a DataFrame using a dictionary and explore its contents using the inspection methods above (`df.info()`, `df.shape`, `df.head()`, `df.describe()`, etc.).

In [None]:
data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Hannah', 'Ian', 'Jack'],
    'Age': [24, 27, 22, 32, 29, 31, 28, 24, 26, 30],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio', 'San Diego', 'Dallas', 'San Jose'],
    'Salary': [70000, 80000, 65000, 72000, 69000, 75000, 68000, 73000, 71000, 77000]
}

# Create a DataFrame from the dictionary
df_data = ...

##### Selecting Data

In [None]:
# Select a column
df['Name']

0      Alice
1        Bob
2    Charlie
3      David
4     Nicole
5       John
6      James
Name: Name, dtype: object

In [None]:
print(type(df['Name']))

<class 'pandas.core.series.Series'>


In [None]:
# Select multiple columns
df[['Name', 'City']]

      Name         City
0    Alice     New York
1      Bob  Los Angeles
2  Charlie      Chicago
3    David      Houston
4   Nicole        Texas
5     John         Ohio
6    James        Vegas


In [None]:
print(type(df[['Name', 'City']]))

<class 'pandas.core.frame.DataFrame'>


In [None]:
# Select a row by its index label
df.loc[0]


Name       Alice
Age           23
City    New York
Name: 0, dtype: object
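
`loc` selects by index label, while `iloc` selects by integer position. With the default index the two look identical, so the sketch below uses custom labels (`"a"`, `"b"`, `"c"`, chosen only for illustration) to make the difference visible.

```python
import pandas as pd

df_people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [23, 30, 29]},
    index=["a", "b", "c"],   # custom labels, assumed for this example
)

by_label = df_people.loc["b"]     # row whose index label is "b"
by_position = df_people.iloc[1]   # second row, counting from 0

print(by_label["Name"])     # Bob
print(by_position["Name"])  # Bob
```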

##### Filtering Data

In [None]:
# Filter rows where Age > 30
df[df['Age'] > 30]

    Name  Age     City
3  David   40  Houston
5   John   33     Ohio
6  James   44    Vegas
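
Filters can also combine several conditions. A minimal sketch on a small made-up DataFrame: each condition goes in parentheses, with `&` for "and" and `|` for "or".

```python
import pandas as pd

df_people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [23, 30, 29, 40],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

# Rows where Age is between 25 and 35 (inclusive)
in_range = df_people[(df_people["Age"] >= 25) & (df_people["Age"] <= 35)]

# Rows where Age is under 25 or City is Houston
young_or_houston = df_people[(df_people["Age"] < 25) | (df_people["City"] == "Houston")]

print(in_range["Name"].tolist())          # ['Bob', 'Charlie']
print(young_or_houston["Name"].tolist())  # ['Alice', 'David']
```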


#### Activity

Filter the DataFrame from the previous activity for rows where the salary is greater than 70000.

In [None]:
data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Hannah', 'Ian', 'Jack'],
    'Age': [24, 27, 22, 32, 29, 31, 28, 24, 26, 30],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio', 'San Diego', 'Dallas', 'San Jose'],
    'Salary': [70000, 80000, 65000, 72000, 69000, 75000, 68000, 73000, 71000, 77000]
}

# Create a DataFrame from the dictionary
df_data = pd.DataFrame(data_dict)

# Filter data


##### Add new columns

In [None]:
# Add a new column from a list of values (one value per row)
df['Salary'] = [70000, 80000, 90000, 100000, 500000, 250000, 85000]
df

      Name  Age         City  Salary
0    Alice   23     New York   70000
1      Bob   30  Los Angeles   80000
2  Charlie   29      Chicago   90000
3    David   40      Houston  100000
4   Nicole   21        Texas  500000
5     John   33         Ohio  250000
6    James   44        Vegas   85000


##### Renaming Columns

In [None]:
# Rename columns
df.rename(columns={'Name': 'Full Name', 'City': 'Location'}, inplace=True)
df

   Full Name  Age     Location  Salary
0      Alice   23     New York   70000
1        Bob   30  Los Angeles   80000
2    Charlie   29      Chicago   90000
3      David   40      Houston  100000
4     Nicole   21        Texas  500000
5       John   33         Ohio  250000
6      James   44        Vegas   85000


##### Dropping Columns

In [None]:
# Drop a column
df.drop(columns=['Salary'], inplace=True)
df


   Full Name  Age     Location
0      Alice   23     New York
1        Bob   30  Los Angeles
2    Charlie   29      Chicago
3      David   40      Houston
4     Nicole   21        Texas
5       John   33         Ohio
6      James   44        Vegas
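
The `rename` and `drop` calls above pass `inplace=True`, which modifies `df` directly. Without it, these methods return a new DataFrame and leave the original unchanged, as this sketch on a small made-up DataFrame shows.

```python
import pandas as pd

df_people = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [23, 30],
    "Salary": [70000, 80000],
})

# Without inplace=True, drop() returns a new DataFrame
without_salary = df_people.drop(columns=["Salary"])

print(df_people.columns.tolist())       # ['Name', 'Age', 'Salary']
print(without_salary.columns.tolist())  # ['Name', 'Age']
```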


##### Sorting Data

In [None]:
# Sort by Age in ascending order
df.sort_values(by='Age')

   Full Name  Age     Location
4     Nicole   21        Texas
0      Alice   23     New York
2    Charlie   29      Chicago
1        Bob   30  Los Angeles
5       John   33         Ohio
3      David   40      Houston
6      James   44        Vegas


In [None]:
# Sort by Age in descending order
df.sort_values(by='Age', ascending=False)


   Full Name  Age     Location
6      James   44        Vegas
3      David   40      Houston
5       John   33         Ohio
1        Bob   30  Los Angeles
2    Charlie   29      Chicago
0      Alice   23     New York
4     Nicole   21        Texas


#### Activity

Sort the DataFrame from the previous activity by salary (both ascending and descending).

In [None]:
data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Hannah', 'Ian', 'Jack'],
    'Age': [24, 27, 22, 32, 29, 31, 28, 24, 26, 30],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio', 'San Diego', 'Dallas', 'San Jose'],
    'Salary': [70000, 80000, 65000, 72000, 69000, 75000, 68000, 73000, 71000, 77000]
}

# Create a DataFrame from the dictionary
df_data = pd.DataFrame(data_dict)

# Sort data


#### Pandas vs. Plain Python Approach

In [None]:
# Plain Python Approach
# Create a dataset
sales_data = [
    {'Product': 'A', 'Sales': 100},
    {'Product': 'B', 'Sales': 150},
    {'Product': 'C', 'Sales': 200},
    {'Product': 'D', 'Sales': 250},
    {'Product': 'E', 'Sales': 300},
]
print(sales_data)

# Compute the total sales
total_sales = sum(item['Sales'] for item in sales_data)
print(f"Total Sales: {total_sales}")

# Compute the average sales
average_sales = total_sales / len(sales_data)
print(f"Average Sales: {average_sales}")


[{'Product': 'A', 'Sales': 100}, {'Product': 'B', 'Sales': 150}, {'Product': 'C', 'Sales': 200}, {'Product': 'D', 'Sales': 250}, {'Product': 'E', 'Sales': 300}]
Total Sales: 1000
Average Sales: 200.0


In [None]:
# Data transformation
for item in sales_data:
    item['Sales'] = round(int(item['Sales']) * 1.10, 2)
print(sales_data)

[{'Product': 'A', 'Sales': 110.0}, {'Product': 'B', 'Sales': 165.0}, {'Product': 'C', 'Sales': 220.0}, {'Product': 'D', 'Sales': 275.0}, {'Product': 'E', 'Sales': 330.0}]


In [None]:
# Pandas Approach
import pandas as pd

# Create a dataset using a Pandas DataFrame
sales_data = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Sales': [100, 150, 200, 250, 300]
})

# Compute the total sales
total_sales = sales_data['Sales'].sum()
print(f"Total Sales: {total_sales}")

# Compute the average sales
average_sales = sales_data['Sales'].mean()
print(f"Average Sales: {average_sales}")

sales_data

Total Sales: 1000
Average Sales: 200.0


  Product  Sales
0       A    100
1       B    150
2       C    200
3       D    250
4       E    300


In [None]:
# Data Transformation
sales_data['Sales'] = sales_data['Sales'] * 1.10
sales_data

  Product  Sales
0       A  110.0
1       B  165.0
2       C  220.0
3       D  275.0
4       E  330.0


Python also provides a built-in `csv` module to work with CSV files.


In [None]:
# We need to use csv module to read and write CSV files
import csv

# Sample data
csv_data = [
    ['Name', 'Age', 'City'],
    ['Ram', '36', 'Bhaktapur'],
    ['Sita', '25', 'Kathmandu'],
    ['Laxman', '35', 'Butwal']
]

# Writing to a CSV file (newline='' prevents extra blank rows on Windows)
with open('data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(csv_data)


In [None]:
# Read the CSV file and display its contents
with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file, delimiter=",")
    for row in reader:
        print(row)

['Name', 'Age', 'City']
['Ram', '36', 'Bhaktapur']
['Sita', '25', 'Kathmandu']
['Laxman', '35', 'Butwal']
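
Note that `csv.reader` returns every value as a string. The sketch below uses `csv.DictReader`, which keys each row by the header names, and converts the ages by hand. An in-memory `io.StringIO` stands in for a file on disk so the example needs no real file.

```python
import csv
import io

# In-memory text used in place of a real CSV file (an assumption for this sketch)
csv_text = "Name,Age,City\nRam,36,Bhaktapur\nSita,25,Kathmandu\n"

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

print(rows[0]["Name"])   # Ram

# Values come back as strings, so convert before doing arithmetic
ages = [int(row["Age"]) for row in rows]
average_age = sum(ages) / len(ages)
print(average_age)       # 30.5
```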


In [None]:
import pandas as pd

# Load data from a CSV file
df = pd.read_csv('data.csv')
df

     Name  Age       City
0     Ram   36  Bhaktapur
1    Sita   25  Kathmandu
2  Laxman   35     Butwal
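
Unlike the `csv` module, `pd.read_csv` infers column types while loading, so numeric columns are ready for arithmetic. A minimal sketch, again using `io.StringIO` in place of a real file:

```python
import io
import pandas as pd

# In-memory text used in place of data.csv (an assumption for this sketch)
csv_text = "Name,Age,City\nRam,36,Bhaktapur\nSita,25,Kathmandu\nLaxman,35,Butwal\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Age was parsed as an integer column, not as strings
print(df_csv["Age"].dtype)   # int64
print(df_csv["Age"].mean())  # 32.0
```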


In [None]:
df["Gender"] = ["M", "F", "M"]
df

Unnamed: 0,Name,Age,City,Gender
0,Ram,36,Bhaktapur,M
1,Sita,25,Kathmandu,F
2,Laxman,35,Butwal,M


In [None]:
# Save data to a CSV File
df.to_csv("new_data.csv", index=False)

- We can observe that Pandas simplifies data loading and saving operations.

- It provides powerful tools for data filtering, transforming, cleaning, aggregating, merging and so on.