# Intro to Python

Click the little play button in the cell below!

In [None]:
print("Hello world!")

Congratulations! You just ran your first Python program!

Python is a high-level, interpreted programming language known for its simplicity, readability, and versatility. As an interpreted language, Python code is run by an interpreter rather than compiled ahead of time into machine code: the standard implementation (CPython) first compiles the source into bytecode, which is then executed by the Python Virtual Machine (R and MATLAB are also interpreted languages). This process allows for flexibility and ease of development. It was created by [Guido van Rossum](https://pythoninstitute.org/about-python#:~:text=Python%20was%20created%20by%20Guido,called%20Monty%20Python's%20Flying%20Circus.) and first released in 1991. Python emphasizes code readability with its clear and concise syntax, which resembles natural language, making it accessible to beginners and experienced programmers alike.

Python's popularity stems from several key factors:

* **Ease of Learning and Use:** Python's simple syntax and readability make it easy to learn, understand, and write code. Its beginner-friendly nature attracts a wide range of developers, from novices to seasoned professionals.

* **Versatility:** Python is a multipurpose language suitable for various applications, including web development, data analysis, artificial intelligence, machine learning, scientific computing, automation, and more. Its extensive standard library and vast ecosystem of third-party packages contribute to its versatility.

* **Community and Ecosystem:** Python boasts a large and active community of developers worldwide. This vibrant community contributes to the development of libraries, frameworks, and tools, enriching the Python ecosystem. The availability of numerous resources, forums, and tutorials facilitates learning and collaboration.

* **Scalability and Performance:** While Python is often criticized for its performance compared to lower-level languages like C or C++, its performance has improved significantly over the years. Additionally, Python's simplicity allows for rapid development, making it *well-suited for prototyping and iterating on ideas quickly*.

* **Cross-Platform Compatibility:** Python is platform-independent, meaning code written in Python can run on various operating systems, including Windows, macOS, and Linux, with minimal modifications.

Overall, Python's combination of simplicity, versatility, community support, and scalability has contributed to its widespread adoption and enduring popularity among developers across different industries and domains.



## Using Python and Jupyter Notebooks


For this demo we use Google Colab and Jupyter Notebooks.

Jupyter Notebook is an open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text, making it popular among researchers, educators, and data scientists.

Some key features of Jupyter Notebooks include:

* **Interactive Computing:** Jupyter Notebook provides an interactive computing environment where you can write and execute code in cells, allowing for rapid experimentation and iterative development.

* **Rich Output:** Jupyter Notebook supports rich output formats, including HTML, images, videos, and interactive widgets, enabling the creation of dynamic and visually appealing documents.

* **Markdown Support:** In addition to code cells, Jupyter Notebook allows you to include markdown cells for writing formatted text, equations, and documentation alongside your code. For example:

$$
\int_{a}^{b} x^2 \, dx
$$
(A nice Markdown cheat sheet from Kaggle can be found [here](https://www.kaggle.com/code/cuecacuela/the-ultimate-markdown-cheat-sheet).)

* **Data Visualization:** Jupyter Notebook integrates with libraries such as Matplotlib, Seaborn, and Plotly for creating interactive visualizations directly within the notebook.

* **Collaboration and Sharing:** Jupyter Notebook documents can be easily shared with others via email, Dropbox, GitHub, or the Jupyter Notebook Viewer, facilitating collaboration and reproducibility of research findings.

**Google Colab** (Colaboratory) is a cloud-based Jupyter Notebook environment provided by Google, which allows you to write, execute, and share Python code directly in your web browser. It offers free access to GPU and TPU resources, making it particularly useful for training machine learning models and conducting data analysis tasks that require significant computational power. For more resources on using Google Colab and Markdown, check out this great [cheat sheet](https://colab.research.google.com/github/Tanu-N-Prabhu/Python/blob/master/Cheat_sheet_for_Google_Colab.ipynb) by Tanu-N-Prabhu.

If you don't want to use Google Colab, you can also [install Jupyter](https://jupyter.org/install) locally. Here is a little [cheat sheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Jupyter_Notebook_Cheat_Sheet.pdf) on using Jupyter locally.

## Python's Basic Syntax

* **Comments**:

Single-line comments start with a `#`.
Python has no dedicated multi-line comment syntax; a triple-quoted string (`'''` or `"""`) that is not assigned to anything is often used for this purpose.

* **Indentation**:

Python uses indentation to define blocks of code (e.g., loops, conditional statements, functions).
Indentation is typically four spaces. Tabs also work, but tabs and spaces should not be mixed within the same block.

* **Variables and Data Types**:

  * Variables are containers for storing data.
  * Python is dynamically typed, meaning you don't need to declare the data type explicitly.
  * Common data types include integers (`int`), floats (`float`), strings (`str`), booleans (`bool`), lists (`list`), dictionaries (`dict`), and sets (`set`). *Use the function* `type( )` *to check the data type of an object.*
  
* **Basic Arithmetic Operations**:

  * Addition (`+`), subtraction (`-`), multiplication (`*`), division (`/`), modulus (`%`), exponentiation (`**`), and floor division (`//`).
* **Strings**:

Strings are sequences of characters enclosed in single (`'`) or double (`"`) quotes.
Strings support concatenation with `+`, repetition with `*`, and interpolation with f-strings (formatted strings). For example,

```python
x = 2
print(f"x = {x}")
```
outputs
```
x = 2
```
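
The points in the list above can be tried out in one short sketch. The variable names (`values`, `floor_result`, `true_result`) are only illustrative and do not appear elsewhere in this notebook.

```python
# This is a single-line comment.

values = [42, 3.14, "hello", True, [1, 2], {"a": 1}, {1, 2}]

for value in values:
    # The indented lines form the body of the loop.
    print(type(value).__name__, "->", value)

# Floor division (//) discards the fractional part; true division (/) keeps it.
floor_result = 7 // 2
true_result = 7 / 2
print(floor_result, true_result)
```

Because Python is dynamically typed, the same name `value` holds an `int`, a `float`, a `str`, and so on in turn, and `type()` reports whatever it currently refers to.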




In [None]:
# Basic Syntax
# Variable assignment
x = 3
y = 10

# Arithmetic operations
sum_result = x + y
difference_result = x - y
product_result = x * y
quotient_result = x / y
modulo_result = y%x
exponentiation_result = x**y
# Print results
print(f"Sum: {sum_result}")
print(f"Difference: {difference_result}")
print(f"Product: {product_result}")
print(f"Quotient: {quotient_result}")
print(f"Modulus: {modulo_result}")
print(f"Exponentiation: {exponentiation_result}")


**Exercise.**

Compute $$
36\  \text{mod}\  5^2
$$

In [None]:
# Write code here.

In [None]:
# @title Solution
ans = 36 % (5**2)
print(f"answer = {ans}")


## Data Structures
In Python, data structures are specialized formats for organizing, storing, and managing data efficiently. They provide a way to store and manipulate collections of data, and Python offers several built-in data structures, each with its own characteristics and use cases. Understanding when and how to use each data structure is essential for writing efficient and effective Python code.

**Lists**:

* Lists are *ordered collections of items*, represented by square brackets `[ ]`.
* Lists are mutable, meaning they can be modified after creation by adding, removing, or changing elements.
* Lists can contain elements of different data types, including integers, floats, strings, and even other lists.
* Elements in a list are accessed using zero-based indexing, and negative indices can be used to access elements from the end of the list.


In [None]:
#Example use of lists

# Creating a list
fruits = ["apple", "banana", "cherry", "date"]
print(fruits)

In [None]:
# Accessing elements
print("First:", fruits[0])
print("Last:", fruits[-1])

In [None]:
# Modifying elements
fruits[1] = "orange"
print(fruits)

In [None]:
# Adding elements
fruits.append("grape")
print(fruits)

In [None]:
# Removing elements
del fruits[2]
print(fruits)

In [None]:
# Iterating over elements
print("Items in list:")
for fruit in fruits:
    print(fruit)

In [None]:
# Length of the list
print("Length of list:", len(fruits))


**Sets**:

* Sets are *unordered collections of unique elements*, represented by curly braces `{ }`.
* Sets are mutable.
* Sets support mathematical set operations such as union, intersection, difference, and symmetric difference.
* Elements in a set are not indexed, so you cannot access elements by index or slice a set.
* Sets are commonly used for membership testing, removing duplicates from lists, and performing set operations.

In [None]:
# Example use of sets

# Creating a set
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

print(set1)
print(set2)

In [None]:
# Adding elements to a set
set1.add(6)

# Removing elements from a set
set2.remove(8)




In [None]:
# Membership testing
# Is 8 still in set2?
print("Is 8 in set2?", 8 in set2)

# Is 3 in set1?
print("Is 3 in set1?", 3 in set1)


In [None]:
# Set operations
union_of_sets = set1.union(set2)
intersection_of_sets = set1.intersection(set2)
difference_of_sets = set1.difference(set2)
print("Union of set1 and set2:", union_of_sets )
print("Intersection of set1 and set2:", intersection_of_sets )
print("Difference of set1 and set2:", difference_of_sets  )
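
The symmetric difference mentioned in the list of set operations is not shown in the cells above. A minimal sketch follows, using two small illustrative sets `a` and `b` (not the `set1` and `set2` from the cells above); it also shows the operator shorthands `|`, `&`, `-`, and `^`.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

# Elements that are in exactly one of the two sets
sym_diff = a.symmetric_difference(b)
print("Symmetric difference:", sym_diff)

# Each method has an operator shorthand
print("Union:", a | b)
print("Intersection:", a & b)
print("Difference:", a - b)
print("Symmetric difference:", a ^ b)
```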

**Exercise**:
Remove duplicates from the list `[1, 2, 3, 4, 1, 2, 5]`

In [None]:
# Write your code here

In [None]:
# @title Solution

# Removing duplicates from a list
list_with_duplicates = [1, 2, 3, 4, 1, 2, 5]
set_of_unique = set(list_with_duplicates)
list_of_unique = list(set_of_unique)
print(list_of_unique)

# Another possible solution
temp_set = set()
for item in list_with_duplicates:
  temp_set.add(item)

list_of_unique = list(temp_set)
print(list_of_unique)
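
Since sets are unordered, converting through a set does not promise that the result keeps the order of first appearance. If that order matters, one possible alternative (a sketch, not the only approach) relies on dictionaries preserving insertion order, which holds in Python 3.7 and later. The name `ordered_unique` is illustrative.

```python
list_with_duplicates = [3, 1, 3, 2, 1, 5]

# dict.fromkeys keeps one key per distinct item, in order of first appearance
ordered_unique = list(dict.fromkeys(list_with_duplicates))
print(ordered_unique)
```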

**Strings**:

* Strings are sequences of characters, enclosed in single (`'`) or double (`"`) quotes.
* They are immutable, meaning you cannot change them after creation.
* Strings support various operations like concatenation, slicing, and formatting.
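
Immutability can be seen directly: assigning to a position in a string raises a `TypeError`, so "changing" a string really means building a new one. A minimal sketch, with illustrative names `word` and `new_word`:

```python
word = "hello"

try:
    word[0] = "J"  # strings do not support item assignment
except TypeError as error:
    print("Cannot modify in place:", error)

# Build a new string instead; the original is untouched
new_word = "J" + word[1:]
print(word, new_word)
```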

In [None]:
# Example use of strings

# Define a string
greeting = "Hello, World!"

# Accessing characters in a string
print("First character:", greeting[0])
print("Last character:", greeting[-1])



In [None]:
# Slicing strings
print("Substring:", greeting[7:12])

In [None]:

# String concatenation
name = "Alice"
message = "Welcome, " + name + "!"
print(message)

In [None]:
# String formatting with f-strings
age = 25
formatted_message = f"Hello, {name}! You are {age} years old."
print(formatted_message)



In [None]:
# String methods
print("Uppercase:", greeting.upper())
print("Lowercase:", greeting.lower())
print("Length:", len(greeting))
print("Split:", greeting.split(","))

In [None]:
# Checking substrings
print("Contains 'World'?", "World" in greeting)


In [None]:
# String repetition
stars = "*" * 10
print(stars)


In [None]:
# Removing whitespace
text_with_spaces = "  Hello, World!  "
print("Stripped:", text_with_spaces.strip())


In [None]:
# Finding substring
index = greeting.find("World")
print("Index of 'World':", index)



**Dictionaries**:

* Dictionaries are collections of key-value pairs, represented by curly braces `{ }`. Since Python 3.7 they preserve the order in which keys were inserted.
* Each key is separated from its value by a colon (`:`), and key-value pairs are separated from one another by commas.
* Keys in a dictionary must be unique and immutable (such as strings, numbers, or tuples), while values can be of any data type and mutable.
* Dictionaries are mutable, meaning you can add, remove, or modify key-value pairs after creation.
* Elements in a dictionary are accessed using keys rather than indices.
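
The rules about keys can be checked with a short sketch. The dictionaries `grid` and `scores` are made up for illustration.

```python
# Tuples are immutable, so they can be used as keys
grid = {(0, 0): "origin", (1, 2): "point"}
print(grid[(1, 2)])

# Lists are mutable, so using one as a key raises a TypeError
try:
    bad = {[0, 0]: "origin"}
except TypeError as error:
    print("Lists cannot be keys:", error)

# Keys are unique: a repeated key keeps only the last value
scores = {"alice": 1, "alice": 2}
print(scores)
```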




In [None]:
# Example of a dictionary

# Creating a dictionary to represent information about a person
person = {
    "name": "John Doe",
    "age": 30,
    "city": "New Orleans",
    "email": "john.doe@example.com"
}


In [None]:

# Accessing elements of the dictionary
print("Name:", person["name"])
print("Age:", person["age"])
print("City:", person["city"])
print("Email:", person["email"])


In [None]:
# Modifying elements of the dictionary
person["age"] = 35
person["city"] = "San Francisco"
print(person)

In [None]:
# Adding a new key-value pair to the dictionary
person["phone"] = "123-456-7890"
print(person)


In [None]:
# Deleting a key-value pair from the dictionary
del person["email"]
print(person)

In [None]:
# Iterating over keys of the dictionary
print("Keys:")
for key in person.keys():
    print(key)

# Alternatively, we don't need to specify .keys()
print("\nKeys:")
for key in person:
    print(key)

In [None]:
# Iterating over values of the dictionary
print("Values:")
for value in person.values():
    print(value)

In [None]:
# Iterating over key-value pairs of the dictionary
print("Key-Value Pairs:")
for key, value in person.items():
    print(key, ":", value)


## Control Flow


Control flow refers to the order in which the individual statements and instructions in a program are executed. In Python, control flow is managed through conditional statements (`if`, `elif`, `else`) and loops (`for`, `while`).

**Conditional Statements:**


* `if`: Executes a block of code if a specified condition is true.
* `elif`: Stands for "else if". Allows you to check multiple conditions after the initial "if" statement.
* `else`: Executes a block of code if none of the preceding conditions are true.

**Loops:**

* `for`: Iterates over a sequence (such as a list or range) and executes a block of code for each item in the sequence.
* `while`: Executes a block of code repeatedly as long as a specified condition is true. The loop terminates when the condition becomes false.

**Control Flow Keywords:**

* `break`: Exits the loop prematurely, skipping the remaining iterations.
* `continue`: Skips the current iteration of a loop and proceeds to the next iteration.
* `pass`: Acts as a placeholder and does nothing when executed. Used to avoid syntax errors in empty code blocks.


Control flow allows you to make decisions and repeat actions based on conditions, enabling your programs to respond dynamically to different scenarios and inputs. Understanding and mastering control flow is essential for writing flexible and efficient Python code.

In [None]:
# Control flow example

# Conditional statement (if-elif-else)
num = 10
if num > 0:
    print("Number is positive")
elif num == 0:
    print("Number is zero")
else:
    print("Number is negative")

In [None]:

# Loop example (for loop)
fruits = ["apple", "banana", "cherry"]
print("Fruits:")
for fruit in fruits:
    print(fruit)

In [None]:

# Loop example (while loop)
count = 1
print("Counting from 1 to 5:")
while count <= 5:
    print(count)
    count += 1  # This is equivalent to count = count + 1

In [None]:

# Control flow keywords (break and continue)
print("Skipping even numbers and stopping at 7:")
for i in range(1, 11):
    if i % 2 == 0:
        continue  # Skip even numbers
    elif i == 7:
        print(i)
        break  # Stop at 7
    print(i)
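
The `pass` keyword from the list of control flow keywords is not used in the cells above. A minimal sketch of its two most common uses follows; the names `todo_later` and `checked` are illustrative.

```python
def todo_later():
    # The body is intentionally empty for now; pass keeps the syntax valid
    pass

checked = []
for n in range(3):
    if n == 1:
        pass  # nothing to do for 1 yet
    else:
        checked.append(n)

print(todo_later())  # a function without a return statement returns None
print(checked)
```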

*Note. A `range` object is distinct from a `list` object. A range object represents an immutable sequence of numbers generated dynamically based on start, stop, and step parameters. Consequently, range objects are memory-efficient because they only store the start, stop, and step parameters, rather than the entire sequence of numbers. You can iterate over a range object using a for loop or convert it to a list or tuple if you need to access all the elements.*
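
The memory claim in the note can be checked with `sys.getsizeof`. This is a rough sketch: the exact byte counts depend on the Python implementation and version, and `getsizeof` reports only the size of the container itself. The names `big_range` and `big_list` are illustrative.

```python
import sys

big_range = range(1_000_000)
big_list = list(big_range)

print("range size in bytes:", sys.getsizeof(big_range))
print("list size in bytes: ", sys.getsizeof(big_list))

# A range still supports len() and indexing without being expanded
print(len(big_range), big_range[10], big_range[-1])
```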

In [None]:
# Create a range object
my_range = range(1, 10, 2)
print(my_range)

In [None]:
# It is common to iterate over a range object
for num in my_range:
    print(num)

Observe that `range(1,10,2)` excludes $10$. We can think of `range` as taking values in a half-open interval. That is, `range(a,b)` takes the integer values in $[a,b)$.

In [None]:
# Convert the range object to a list
list_range = list(my_range)
print(list_range)


#### List comprehension

List comprehension in Python is a concise way to create lists. It allows you to generate a new list by applying an expression to each item in an existing iterable (like a list or range), optionally filtering items with a condition. They have the form

```python
[expression for item in iterable if condition]
```
* expression: The value or computation you want to include in the new list.
* item: The current item from the iterable.
* iterable: The collection of items you're iterating over.
* condition (optional): A filter that decides which items to include in the new list.

List comprehension provides concise code equivalent to the following:
```python
my_list =[]
for item in iterable:
  if condition:
    my_list.append(expression)
```

Let's take a look at some examples. First we create a list of squares of numbers in two ways.

In [None]:
squares = []
for x in range(10):
  squares.append(x**2)
print(squares)

We can reduce the above code to one line using list comprehension.

In [None]:
# Example list comprehension

squares = [x**2 for x in range(10)]
print(squares)

**Exercise:**
Using list comprehension, construct a list of the even numbers up to and including 10.

In [None]:
# Write your code here

In [None]:
# @title Solution
even_numbers = [x for x in range(11) if x % 2 == 0]
print(even_numbers)


## Putting it all together: Functions
Functions in programming are blocks of code that perform a specific task or computation. They are used to *organize code into reusable modules*, making it easier to manage and maintain.

In [None]:
# Example function
def mystery(x, y):
    return (x + y) * (x + y)
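
A function does nothing until it is called. The sketch below repeats the definition so that it runs on its own, then calls it with positional and with keyword arguments; the names `result` and `same_result` are illustrative.

```python
def mystery(x, y):
    return (x + y) * (x + y)

# Call with positional arguments
result = mystery(2, 3)
print(result)

# Call with keyword arguments; the order no longer matters
same_result = mystery(y=3, x=2)
print(same_result)
```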

**Exercise**: Write a function that computes $n!$
Remember that $n!=n(n-1)!$ and $0!=1$.

In [None]:
def factorial(n):
  # Write your code here
  pass

In [None]:
# @title Solution

# Recursive way
def factorial(n):
  if n==0:
    return 1
  else:
    return n*factorial(n-1)
print(factorial(6))

# Iterative way
def factorial(n):
  value = 1
  for x in range(1, n + 1):
    value = x * value
  return value

print(factorial(6))

**Exercise**:
Write a function that computes the $n$th [fibonacci number](https://en.wikipedia.org/wiki/Fibonacci_sequence).
$$
a_n = a_{n-1} + a_{n-2}
$$
with $a_0 = 0$ and $a_1=1$.

In [None]:
def fibonacci(n):
  # Write your code here
  pass

In [None]:
# @title Solution

# Recursive way
def fibonacci(n):
  if n <=1:
    return n
  else:
    return fibonacci(n-1) + fibonacci(n-2)

print(fibonacci(6))

# Dynamic programming way
def fibonacci(n):
  arr = [0, 1]
  for i in range(2, n + 1):
    arr.append(arr[i-1] + arr[i-2])
  return arr[n]

print(fibonacci(6))
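
The plain recursive version recomputes the same subproblems many times, which becomes slow as `n` grows. One possible middle ground (a sketch, not part of the solution above) keeps the recursive form but caches results with the standard-library decorator `functools.lru_cache`. The name `fibonacci_cached` is illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci_cached(n):
    # Same recursion as before; each value of n is computed only once
    if n <= 1:
        return n
    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)

print(fibonacci_cached(30))
```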




**Exercise**: Write a function that takes in two DNA sequences of equal length and computes the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between the two. Recall that the Hamming distance is the minimum number of substitutions required to change one string into the other, that is, the number of positions at which the two strings differ.

In [None]:
# Hint: The zip() function is helpful

def Hamming(str_1,str_2):
  for x,y in zip(str_1,str_2):
    # Write your code here
    print(x,y)

dna_seq1 = "ATCGATCGATCG"
dna_seq2 = "ATAGATCGATCG"

Hamming(dna_seq1,dna_seq2)


In [None]:
# @title Solution

def Hamming(str_1,str_2):
  diff_count = 0
  for x,y in zip(str_1,str_2):
    if x != y:
      diff_count+=1
  return(diff_count)

dna_seq1 = "ATCGATCGATCG"
dna_seq2 = "ATAGATCGATCG"

Hamming(dna_seq1,dna_seq2)
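
Note that `zip` stops at the end of the shorter input, so a loop over `zip` silently ignores any extra characters when the sequences differ in length. A more defensive variant might check the lengths first. This is a sketch; the name `hamming_checked` and the choice to raise `ValueError` are assumptions, not part of the exercise.

```python
def hamming_checked(str_1, str_2):
    # The Hamming distance is only defined for equal-length sequences
    if len(str_1) != len(str_2):
        raise ValueError("sequences must have the same length")
    # Each comparison is True (1) or False (0), so sum() counts mismatches
    return sum(x != y for x, y in zip(str_1, str_2))

print(hamming_checked("ATCGATCGATCG", "ATAGATCGATCG"))
```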


**Exercise**:

You are given a list of words. Write a function `word_indices` that takes a list of words and returns a dictionary where the keys are the words themselves, and the values are lists containing the indices (positions) of occurrences of each word in the list.

Input:
```python
words = ["apple", "banana", "apple", "orange", "banana", "apple"]
```
Output:
```python
{
    "apple": [0, 2, 5],
    "banana": [1, 4],
    "orange": [3]
}
```

*Hint: The `enumerate()` function is helpful. You need to add new keys to the dictionary and also update the list stored under an existing key; try `.append()`.*


In [None]:

def word_indices(list_of_words):
  indices = {}
  for idx, word in enumerate(list_of_words):
    # Write code here
    print(idx, word)

words = ["apple", "banana", "apple", "orange", "banana", "apple"]

word_indices(words)


In [None]:
# @title Solution

def word_indices(list_of_words):
  indices = {}
  for idx, word in enumerate(list_of_words):
    if word in indices:
      # Word seen before: record another position
      indices[word].append(idx)
    else:
      # First occurrence: start a new list
      indices[word] = [idx]
  return indices

words = ["apple", "banana", "apple", "orange", "banana", "apple"]

word_indices(words)
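
The "check whether the key exists, otherwise create it" pattern is common enough that the standard library offers `collections.defaultdict` for it. The sketch below is one alternative way to write the same function; the name `word_indices_defaultdict` is illustrative.

```python
from collections import defaultdict

def word_indices_defaultdict(list_of_words):
    # Missing keys are created automatically with an empty list
    indices = defaultdict(list)
    for idx, word in enumerate(list_of_words):
        indices[word].append(idx)
    # Convert back to a plain dictionary before returning
    return dict(indices)

words = ["apple", "banana", "apple", "orange", "banana", "apple"]
print(word_indices_defaultdict(words))
```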


**Exercise**: Write a function that checks if a number is prime or not.


In [None]:
def is_prime(n):
  # Write your code here
  pass

In [None]:
# @title Solution

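def is_prime(n):
  # 0, 1, and negative numbers are not prime
  if n <= 1:
    return False
  # Any divisor between 2 and n-1 means n is not prime
  for x in range(2, n):
    if n % x == 0:
      return False
  return True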



Using this function and list comprehension create a function that returns a list of primes up to `n`.

In [None]:
# Write your code here

In [None]:
# @title Solution

def primes(n):
  return [ x for x in range(n+1) if is_prime(x)]

print(primes(15))
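
Trial division does not need to test every candidate below `n`: if `n` has a divisor larger than $\sqrt{n}$, it also has one smaller than $\sqrt{n}$. A sketch of this optimization follows, assuming Python 3.8 or later for `math.isqrt`; the names `is_prime_sqrt` and `small_primes` are illustrative.

```python
import math

def is_prime_sqrt(n):
    if n < 2:
        return False
    # Only test divisors up to the integer square root of n
    for x in range(2, math.isqrt(n) + 1):
        if n % x == 0:
            return False
    return True

small_primes = [x for x in range(16) if is_prime_sqrt(x)]
print(small_primes)
```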


## Classes

A class in Python is a blueprint for creating objects with similar properties and behaviors. Think, *custom data structure.* It serves as a template that defines the structure and behavior of objects, including attributes (data) and methods (functions). Objects created from a class are instances of that class and possess the attributes and behaviors defined by the class.

Let's create a class that represents complex numbers.

*Note: Python already has a built-in `complex` type, and the standard-library module [cmath](https://docs.python.org/3/library/cmath.html) provides mathematical functions for complex numbers.*

In [None]:
import math

class ComplexNumber:
    def __init__(self, real, imag):
        self.real = real
        self.imag = imag

    def __add__(self, other):
        return ComplexNumber(self.real + other.real, self.imag + other.imag)

    def __sub__(self, other):
        return ComplexNumber(self.real - other.real, self.imag - other.imag)

    def __mul__(self, other):
        real_part = self.real * other.real - self.imag * other.imag
        imag_part = self.real * other.imag + self.imag * other.real
        return ComplexNumber(real_part, imag_part)

    def __truediv__(self, other):
        denom = other.real**2 + other.imag**2
        real_part = (self.real * other.real + self.imag * other.imag) / denom
        imag_part = (self.imag * other.real - self.real * other.imag) / denom
        return ComplexNumber(real_part, imag_part)

    def norm(self):
        return math.sqrt(self.real**2 + self.imag**2)

    def __str__(self):
        if self.imag >= 0:
            return f"{self.real} + {self.imag}i"
        else:
            return f"{self.real} - {abs(self.imag)}i"


* Class Definition: We define a class named `ComplexNumber` to represent complex numbers with real and imaginary parts.
* Initializer: The `__init__` method initializes the real and imaginary parts of the complex number.
* Operator Overloading: We define methods `__add__`, `__sub__`, `__mul__`, and `__truediv__` to overload the `+`, `-`, `*`, and `/` operators for complex number arithmetic.
* Norm: The `norm` method calculates the norm (magnitude) of the complex number using the Euclidean distance formula.
* String Representation: The `__str__` method provides a human-readable string representation of the complex number.


In [None]:

# Create instances of ComplexNumber
c1 = ComplexNumber(3, 2)
c2 = ComplexNumber(1, 7)

print("c1:", c1)
print("c2:", c2)
print("Norm of c1:", c1.norm())
print("Norm of c2:", c2.norm())




In [None]:

# Perform operations
sum_result = c1 + c2
diff_result = c1 - c2
product_result = c1 * c2
quotient_result = c1 / c2

# Print results
print("Sum:", sum_result)
print("Difference:", diff_result)
print("Product:", product_result)
print("Quotient:", quotient_result)
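
As a sanity check on the arithmetic, the same operations can be run with Python's built-in `complex` type, which writes the imaginary unit as `j`. This is a minimal sketch using the values 3 + 2i and 1 + 7i from above; the names `z1` and `z2` are illustrative.

```python
z1 = complex(3, 2)
z2 = complex(1, 7)

print("Sum:", z1 + z2)
print("Difference:", z1 - z2)
print("Product:", z1 * z2)
print("Quotient:", z1 / z2)

# abs() of a complex number is its norm (magnitude)
print("Norm of z1:", abs(z1))
```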

## Packages

One of the main benefits of Python is its rich ecosystem of open-source packages. Importing packages allows you to extend the functionality of your programs with code written by others, so you can build sophisticated, feature-rich applications quickly by leveraging existing code, specialized functionality, and the collective expertise of the Python community. It's a fundamental aspect of Python programming that enables you to achieve more with less effort. Many packages provide specialized functionality that would be challenging or time-consuming to implement on your own.

There are many great Python packages out there. To name a few:
* [`numpy`](https://numpy.org/doc/stable/) -  is a powerful library for numerical computing in Python, providing support for large, multi-dimensional arrays and matrices, along with a wide range of mathematical functions to operate on these arrays efficiently. (Think MATLAB.)
* [`matplotlib`](https://matplotlib.org/stable/users/index) -  is a comprehensive library for creating static, animated, and interactive visualizations, offering a wide variety of plotting functions and customization options to visualize data in various formats and styles. (Again, think MATLAB. It's called "*mat*plotlib"!)
* [`sklearn`](https://scikit-learn.org/stable/user_guide.html) - is a versatile machine learning library in Python, offering simple and efficient tools for data mining and data analysis, including classification, regression, clustering, dimensionality reduction, model selection, and preprocessing techniques.
* [`pandas`](https://pandas.pydata.org/docs/user_guide/10min.html) - is a powerful data manipulation and analysis library providing high-performance, easy-to-use data structures and functions for working with structured data.
* [`tensorflow`](https://www.tensorflow.org/learn) - is an open-source deep learning framework developed by Google, designed for building and training neural networks and other machine learning models efficiently, offering support for both CPU and GPU computation, distributed computing, and production deployment.

It's impossible to know everything, so your best friend when coding is [Google](https://www.google.com) (and sometimes an [LLM](https://en.wikipedia.org/wiki/Large_language_model)). However, if you really want to know what is going on (and you should!) **always read the documentation** when using packages.



### Numpy

Let's jump into using numpy.

Run the following code cell to import the `numpy` module.

In [None]:
import numpy as np

#### Vector and matrix operations

We can think of an `np.array` object as either a vector or a matrix, depending on its shape.

Call `np.array` to create a NumPy array with your own values.

In [None]:
x = np.array([1,-2])
print(x)

With `x = np.array([1,-2])` we can think of $x = \begin{bmatrix}1 \\ -2  \end{bmatrix} \in \mathbb{R}^2$. We can check the dimensions with `x.shape`.

In [None]:
print(f"Dimension: {x.shape}")

We can also create matrices by passing a list of lists into `np.array()`.
For example, with
`A = np.array([[a,b],[c,d]])` we can think of $A = \begin{bmatrix}a&b \\ c&d  \end{bmatrix} \in \mathbb{R}^{2\times 2}$.

In [None]:
A = np.array([[1, 1], [0, 1]])
print(A)
print(f"Dimension: {A.shape}")

We can compute $Ax=b\in\mathbb{R}^2$ using `np.matmul(A,x)` or `A.dot(x)`.

In [None]:
b=np.matmul(A,x)
print(b)
print(f"Dimension: {b.shape}")
print(np.matmul(A,x) == A.dot(x))

Given $A\in \mathbb{R}^{2\times 2}$ and $b\in \mathbb{R}^2$ we can solve $Ax=b$ using `np.linalg.solve(A,b)`.

In [None]:
A = np.array([[1, 2], [3, 5]])
b = np.array([1, 2])
x = np.linalg.solve(A, b)
print(x)
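
It is good practice to check a computed solution by substituting it back into the equation. The sketch below repeats the system from above and compares $Ax$ with $b$ using `np.allclose`, which tolerates small floating-point rounding errors. The `@` operator is shorthand for matrix multiplication.

```python
import numpy as np

A = np.array([[1, 2], [3, 5]])
b = np.array([1, 2])
x = np.linalg.solve(A, b)

# Substitute the solution back into the system
print(A @ x)
print(np.allclose(A @ x, b))
```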

Compute eigenvalues and eigenvectors of a matrix `A` using `np.linalg.eig(A)`.

In [None]:
A = np.array([[1, 0], [0, -2]])
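eigen_info = np.linalg.eig(A)

# Check what type np.linalg.eig returns
print(type(eigen_info))

# Unpack the result: eigenvalues first, then eigenvectors
values, vectors = eigen_info

print(values)

# Eigenvectors are stored as columns, so the first one is vectors[:, 0]
print(vectors[:, 0])

# A times the first eigenvector should equal the first eigenvalue times it
print(A.dot(vectors[:, 0]))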
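
For a diagonal matrix the rows and columns of the eigenvector array happen to coincide, so it is worth checking the convention on a non-diagonal example. The sketch below uses an illustrative symmetric matrix (not one from this notebook) and verifies $Av = \lambda v$ for each column `vectors[:, i]`.

```python
import numpy as np

A = np.array([[2, 1], [1, 2]])
values, vectors = np.linalg.eig(A)
print(values)

checks = []
for i in range(len(values)):
    v = vectors[:, i]  # the i-th eigenvector is the i-th column
    checks.append(np.allclose(A @ v, values[i] * v))

print(checks)
```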

We can compute the matrix product $AB$ with `np.matmul(A,B)` or `A.dot(B)`.

In [None]:
# Example matrix multiplication
A = np.array([[1, 1], [0, 1]])

B = np.array([[0,1],[1,0]])

AB = np.matmul(A,B)
print(AB)

print(AB==A.dot(B))

We can obtain $A^T$ with `A.transpose()` (or the shorthand `A.T`).

In [None]:
print(A.transpose())

We can add two vectors of the same size with (`+`), and we can also scale vectors with `*`.

In [None]:
x = np.array([1,2,-1,3])
y = np.array([1,0,3,-2])

#Add two vectors
vector_sum = x + y
print(vector_sum)

#Scale vector
scaled_vector= 2*x
print(scaled_vector)

*Note. The* (`*`) *and* (`+`) *operators in NumPy perform a type of operation called [broadcasting](https://developers.google.com/machine-learning/glossary/#broadcasting). You can think of broadcasting as an entrywise operation between two arrays of compatible shapes. Broadcasting makes the operation possible by virtually expanding the smaller array, replicating its values, until its shape matches that of the larger one.* For example:

In [None]:
A = np.array([[0,0,0,0],[1,1,1,1]])
x = np.array([1,2,-1,3])

print(A + 2)

print(A*x)
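
The "virtual expansion" can be made explicit with `np.broadcast_to`, which shows the array that broadcasting conceptually builds. The sketch below also shows what happens when shapes are not compatible; the names `x_expanded` and `y` are illustrative.

```python
import numpy as np

A = np.array([[0, 0, 0, 0], [1, 1, 1, 1]])
x = np.array([1, 2, -1, 3])

# x has shape (4,); broadcasting treats it as if it were repeated for each row of A
x_expanded = np.broadcast_to(x, A.shape)
print(x_expanded)
print(np.array_equal(A * x, A * x_expanded))

# A length-3 vector cannot be stretched to match 4 columns
y = np.array([1, 2, 3])
try:
    A + y
except ValueError as error:
    print("Cannot broadcast:", error)
```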

#### Populating generic and random arrays

To populate an array with all zeros, call `np.zeros`. To populate an array with all ones, call `np.ones`.

In [None]:
zeros = np.zeros(5)
print(zeros)

ones = np.ones(5)
print(ones)

To populate an array as a sequence of numbers from `a` to `b` (excluding `b`), call `np.arange(a,b)`. This is just like `range` in vanilla Python, but with NumPy.

In [None]:
sequence = np.arange(3,10)
print(sequence)

To populate an array with `n` equidistant values ranging from `a` to `b` use `np.linspace(a,b,n)`.

In [None]:
sequence = np.linspace(-1,1,10)
print(sequence)

We can also populate arrays with random values.

Suppose we wanted to generate the random vector $\mathbf{X}= \begin{bmatrix}X_1\\ \vdots \\ X_{10} \end{bmatrix}$ where $X_i \stackrel{i.i.d}\sim \text{Unif}[0,1]$. Then we can use `np.random.random(size = (10,))`

In [None]:
X = np.random.random(size = (10,))
print(X)

**Exercise**:
Generate a matrix $A\in \mathbb{R}^{2\times 4}$ where $A_{i,j}\stackrel{i.i.d}\sim \text{Unif}(5,10)$.

*Hint: If*  $X\sim\text{Unif}(a,b)$, *then* $\frac{X-a}{b-a}\sim\text{Unif}(0,1).$

In [None]:
# Write your code here

In [None]:
# @title Solution
U = np.random.random(size = (2,4))* 5 + 5
print(U)
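
NumPy also provides a generator-based interface that samples from $\text{Unif}(a,b)$ directly, without rescaling by hand. The sketch below is an alternative to the solution above, not a replacement for it; the seed value `0` is an arbitrary choice made only so the output is reproducible.

```python
import numpy as np

# Create a random number generator with a fixed (arbitrary) seed
rng = np.random.default_rng(seed=0)

# Draw a 2 x 4 matrix with entries uniform on [5, 10)
U = rng.uniform(low=5, high=10, size=(2, 4))
print(U)
```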

#### Slicing Arrays

Slicing multidimensional NumPy arrays allows you to access specific elements or ranges along different axes of the array.

*For more information on slicing ndarrays check [here](https://numpy.org/doc/stable/user/basics.indexing.html).*



In [None]:
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
print(arr)

In [None]:
element = arr[1,1]
print(element)

In [None]:
# Select the first row
row_1 = arr[0, :]
print(row_1)


In [None]:
# Select the second column
col_2 = arr[:, 1]
print(col_2)

In [None]:
# Select a subarray
sub_arr = arr[0:2, 0:2]
print(sub_arr)

We can also pass a boolean mask to select elements.

In [None]:
# Select elements greater than 5
bool_idx = arr > 5
print(bool_idx)


In [None]:
# Use boolean indexing to get the values greater than 5
values = arr[bool_idx]
print(values)

**Exercise:**

You are given a square 2D NumPy array `arr` of size $n \times n$, representing a matrix of integers. Your task is to implement a function `diagonal_difference(arr)` that calculates the absolute difference between the sums of its two diagonals.

For example:

input:
```python
arr = np.array([[11, 2, 4],
                [4, 5, 6],
                [10, 8, -12]])
```
output:
```python
15
```

In [None]:
def diagonal_difference(arr):
  #write your code here
  return abs_diff

In [None]:
# Test the function
arr = np.array([[11, 2, 4],
                [4, 5, 6],
                [10, 8, -12]])

result = diagonal_difference(arr)
print(result)


In [None]:
# @title Solution

def diagonal_difference(arr):
  n = len(arr)
  diag_1 = np.sum([ arr[i,i] for i in range(n)])
  diag_2 = np.sum([ arr[i,n-i-1] for i in range(n)])
  abs_diff= abs(diag_1-diag_2)
  return abs_diff

arr = np.array([[11, 2, 4],
                [4, 5, 6],
                [10, 8, -12]])

result = diagonal_difference(arr)
print(result)
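
The same result can be obtained without an explicit loop. The sketch below is one possible vectorized variant (the function name `diagonal_difference_trace` is made up for this example): `np.trace` sums the main diagonal, and `np.fliplr` mirrors the columns so that the anti-diagonal becomes the main diagonal.

```python
import numpy as np

def diagonal_difference_trace(arr):
    # np.trace sums the main diagonal; flipping left-right turns the
    # anti-diagonal into the main diagonal of the flipped array
    main_sum = np.trace(arr)
    anti_sum = np.trace(np.fliplr(arr))
    return abs(main_sum - anti_sum)

arr = np.array([[11, 2, 4],
                [4, 5, 6],
                [10, 8, -12]])

print(diagonal_difference_trace(arr))  # main: 11+5-12 = 4, anti: 4+5+10 = 19 -> 15
```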

### Matplotlib

Let's jump into using Matplotlib. We will use [Pyplot](https://matplotlib.org/stable/tutorials/pyplot.html), a collection of functions that effectively makes Matplotlib work like MATLAB.

Run the following code cell to import the `matplotlib.pyplot` module.

In [None]:
import matplotlib.pyplot as plt

Let's plot a simple histogram of samples drawn from a normal distribution with mean $3$ and variance $2$.

In [None]:
#Generate a sample
sample = np.random.normal(loc = 3,scale = np.sqrt(2),size=100)
print(sample)
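
Note that the `scale` argument is the standard deviation, not the variance, which is why the cell above passes `np.sqrt(2)`. A quick sketch to check this empirically, using a larger sample and an arbitrary seed (both are choices made for this illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# `scale` is the standard deviation, so variance 2 means scale = sqrt(2)
big_sample = rng.normal(loc=3, scale=np.sqrt(2), size=100_000)

sample_mean = big_sample.mean()
sample_var = big_sample.var(ddof=1)  # unbiased sample variance
print(sample_mean, sample_var)
```

The printed values should be close to 3 and 2 respectively.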

In [None]:
plt.hist(sample,density=True)
plt.title('My Sample')
plt.show()

We can now overlay the density function of the normal distribution with mean $\mu$ and variance $\sigma^2$ given by
$$
f(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}
$$

In [None]:
# Generate data to evaluate the density
import numpy as np
X = np.linspace(-5, 10, 1000)

# Define density function
def normal_density(x,sigma,mu):
   return np.exp(-(x-mu)**2/(2*sigma**2))/(np.sqrt(2*np.pi)*sigma)

y = normal_density(X,np.sqrt(2),3)
# Equivalently y = np.array([normal_density(x,np.sqrt(2),3) for x in X ])

# Plot the density function
plt.plot(X, y,color = 'orange', label="Normal distribution")

# Plot the histogram of samples
plt.hist(sample,density=True, label="Samples")

# Add labels and legend
plt.xlabel("x")
plt.ylabel("Density")
plt.legend()

# Show the plot
plt.show()
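
As a quick sanity check on the density function defined above, we can verify numerically that it integrates to roughly 1 over the plotted range and that it peaks near $\mu$. This is only a sketch using a simple Riemann sum; the names `total_mass` and `peak_location` are illustrative.

```python
import numpy as np

def normal_density(x, sigma, mu):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

X = np.linspace(-5, 10, 1000)
y = normal_density(X, np.sqrt(2), 3)

# Riemann-sum approximation of the integral of the density over [-5, 10]
dx = X[1] - X[0]
total_mass = np.sum(y) * dx
print(total_mass)

# The density is largest at x = mu, so the grid point with the largest
# value should be close to 3
peak_location = X[np.argmax(y)]
print(peak_location)
```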


**Exercise:**

Plot a histogram of the eigenvalues of a large ($500 \times 500$) Wigner matrix and overlay the density of the semi-circle distribution. *For more information, see Anderson et al., An Introduction to Random Matrices, Cambridge University Press, page 6.*

*Hint: Use `np.linalg.eig` and `plt.hist`*.

In [None]:
# Use this to plot the semi-circle density
x = np.linspace(-2,2,1000)
y = np.sqrt(4-x**2)/2/np.pi

n=500
A = np.random.randn(n,n)
W = 1/np.sqrt(n)*((1/np.sqrt(2))*(A+A.T) - np.sqrt(2)*np.diag(np.diag(A)))

# Write your code here

In [None]:
# @title Solution

x = np.linspace(-2,2,1000)
y = np.sqrt(4-x**2)/2/np.pi


n=500

A = np.random.randn(n,n)
W = 1/np.sqrt(n)*((1/np.sqrt(2))*(A+A.T) - np.sqrt(2)*np.diag(np.diag(A)))

eigen_vals = np.linalg.eigvals(W)

plt.hist(eigen_vals,30, range=[-2.5, 2.5],density=True)
plt.plot(x,y)
plt.show()
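
Because $W$ is symmetric by construction, its eigenvalues are real. The sketch below checks this on a smaller matrix and uses `np.linalg.eigvalsh`, which is designed for symmetric (Hermitian) matrices and always returns real eigenvalues in ascending order. The seed, the size `n = 200`, and the name `fraction_inside` are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
n = 200  # smaller than in the exercise to keep the check quick

A = rng.standard_normal((n, n))
W = 1/np.sqrt(n) * ((1/np.sqrt(2)) * (A + A.T) - np.sqrt(2) * np.diag(np.diag(A)))

# W is symmetric by construction, so its eigenvalues are real
is_symmetric = np.allclose(W, W.T)

# eigvalsh assumes a symmetric (Hermitian) input and returns real
# eigenvalues sorted in ascending order
eigen_vals = np.linalg.eigvalsh(W)

# Fraction of eigenvalues inside the support [-2, 2] of the semi-circle law
fraction_inside = np.mean(np.abs(eigen_vals) <= 2)
print(is_symmetric, eigen_vals.min(), eigen_vals.max(), fraction_inside)
```

Almost all eigenvalues should fall inside $[-2, 2]$, which matches the range used for the semi-circle density in the solution.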

### Scikit learn

Scikit-learn offers a plethora of machine learning and statistics functionality. The typical workflow consists of constructing a model and calling its `fit` function. Let's look at a few examples.



#### Linear Regression

Suppose we are given observations $X_i$ and $Y_i$, where
$$
Y_i = X_i\beta_1 + \beta_0 + \varepsilon_i
$$

and the $\varepsilon_i$ are independent, mean-zero normal random variables. We would like to estimate $\beta_1$ and $\beta_0$. This is done by computing the $\beta_1$ and $\beta_0$ that minimize the sum of squared errors,
$$
SSE:=\sum \big(Y_i - (X_i\beta_1 + \beta_0)\big)^2.
$$

*For more information, see [linear regression](https://en.wikipedia.org/wiki/Linear_regression).*

In [None]:
# Generate some random data
X = np.linspace(0,10,100)
y = 2 * X + 3 + np.random.randn(100)

In [None]:
from sklearn.linear_model import LinearRegression

# Create the linear regression model
model = LinearRegression()


# Fit the model to the data

# scikit-learn expects X to be a 2D array of shape (n_samples, n_features),
# so we first reshape the 1D array into a single column; otherwise we get an error.
X = X.reshape(-1,1)
model.fit(X, y)

print(f"Coef estimate: {model.coef_}")
print(f"Intercept estimate: {model.intercept_}")
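
As a cross-check on the fit above, the coefficients minimizing the sum of squared errors can also be computed directly with NumPy's least-squares solver. This is a sketch on freshly generated data from the same model (slope 2, intercept 3); the seed and the names `design`, `beta_1`, `beta_0`, and `sse` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# Synthetic data from the same model as above: y = 2x + 3 + noise
x = np.linspace(0, 10, 100)
y = 2 * x + 3 + rng.standard_normal(100)

# Design matrix with a column of ones so that the intercept is estimated too
design = np.column_stack([x, np.ones_like(x)])

# lstsq returns the coefficients that minimize the sum of squared errors
(beta_1, beta_0), *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta_1, beta_0)

# Sum of squared errors at the fitted coefficients
sse = np.sum((y - (x * beta_1 + beta_0))**2)
print(sse)
```

The estimates should land close to the true values 2 and 3, up to noise in the data.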


**Exercise:** Use `matplotlib` to plot the synthetic data `X` and `y` and the regression line.

*Hint: try `plt.scatter` and `plt.plot`*

In [None]:
# Write your code here

In [None]:
# @title Solution

y_predict = X*model.coef_[0] + model.intercept_
plt.scatter(X,y)
plt.plot(X,y_predict,color="orange")
plt.show()


#### k-means
k-means is a simple and popular algorithm for clustering convex data sets.
The algorithm is an iterative process and works as follows:
```
input: k, data, max_num_of_iterations
c_1,...,c_k = k randomly chosen data points
i = 0
while i < max_num_of_iterations:
  C_1,...,C_k = data points closest to c_1,...,c_k
  c_1,...,c_k = mean(C_1),...,mean(C_k)
  i = i+1
return  C_1,...,C_k, c_1,...,c_k
```

First, let's simulate and plot some synthetic data.

In [None]:
# Generate synthetic data
X_1 = np.random.multivariate_normal([-1, 2], [[1, 0.5], [0.5, 1]], size=200)
X_2 = np.random.multivariate_normal([2, -1], [[1, 0.5], [0.5, 1]], size=200)
data = np.concatenate((X_1,X_2))

# Plot data
plt.scatter(data[:, 0], data[:, 1])
plt.show()


Let's now use sklearn and apply k-means to cluster this data.

In [None]:
# Apply k-means clustering
from sklearn.cluster import KMeans

# Create the k-means Model
kmeans = KMeans(n_clusters=2,n_init = 'auto')

# Fit the Model
kmeans.fit(data)

# Look at the cluster centers obtained
print(kmeans.cluster_centers_[0])
print(kmeans.cluster_centers_[1])

Let's plot the obtained clusters.

In [None]:
import matplotlib.pyplot as plt

# Plot the data points
plt.scatter(data[:, 0], data[:, 1], c=kmeans.labels_)

# Plot the cluster centers
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], marker='x', color='red')

# Add labels and title
plt.xlabel("X_1")
plt.ylabel("X_2")
plt.title("Scatter plot of X_1 and X_2 with cluster labels")

# Show the plot
plt.show()


**Exercise:** Using the previous data, let's try to implement k-means from scratch using only `numpy`.

*Hint: `np.argmin` may be useful.*

In [None]:
def k_means(k,data,max_iterations=10):
  # Pick k distinct data points as the initial centers
  centroid = data[np.random.choice(len(data), k, replace=False), :]
  i = 0
  while i < max_iterations:
    # Write your code below
    cluster = []
    centroid = []
    i+=1

  return centroid


In [None]:
# @title Solution
def k_means(k,data,max_iterations=10):
  # Pick k distinct data points as the initial centers
  # (replace=False avoids duplicate centers, which would leave a cluster empty)
  centroid = data[np.random.choice(len(data), k, replace=False), :]
  i = 0
  while i < max_iterations:
    # Construct a list that keeps track of the closest centroid to each point
    closest = []
    for point in data:
      distances = [np.sum((center - point)**2) for center in centroid]
      closest_centroid = np.argmin(distances)
      closest.append(closest_centroid)
    closest=np.array(closest)
    # Compute new centroids based on the closest
    for j in range(k):
      new_centroid = np.mean(data[closest==j ],axis=0)
      centroid[j] = new_centroid
    i+=1
  return centroid



# More compactly, using list comprehensions

def k_means(k,data,max_iterations=10):
  centroid = data[np.random.choice(len(data), k, replace=False), :]
  i = 0
  while i < max_iterations:
    closest = np.array([np.argmin([np.sum((center - point)**2) for center in centroid]) for point in data])
    centroid = np.array([ np.mean(data[closest==j ],axis=0) for j in range(k)])
    i+=1
  return centroid

k_means(2,data,max_iterations=15)
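
The inner loops over points and centers can also be replaced by NumPy broadcasting. The sketch below shows a single assign-and-update step written that way; the helper name `assign_and_update` and the tiny hand-made data set are made up for this illustration, and the code assumes every cluster ends up with at least one point.

```python
import numpy as np

def assign_and_update(data, centroid):
    # Pairwise squared distances via broadcasting:
    # (n, 1, d) - (1, k, d) -> (n, k, d), then sum over the last axis
    diffs = data[:, np.newaxis, :] - centroid[np.newaxis, :, :]
    sq_dists = np.sum(diffs**2, axis=2)

    # Index of the nearest centroid for every point
    closest = np.argmin(sq_dists, axis=1)

    # New centroid = mean of the points assigned to it
    # (assumes every cluster has at least one point)
    k = len(centroid)
    new_centroid = np.array([data[closest == j].mean(axis=0) for j in range(k)])
    return closest, new_centroid

# Tiny hand-made data set with two obvious groups
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
start = np.array([[1.0, 1.0], [9.0, 9.0]])

labels, centers = assign_and_update(points, start)
print(labels)
print(centers)
```

Repeating this step until the labels stop changing (or a maximum number of iterations is reached) gives the same algorithm as the loop-based versions above.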

### Keras/TensorFlow

[TensorFlow](https://www.tensorflow.org/) is an open-source machine learning framework developed by Google. It's one of the most popular tools for building and deploying machine learning models, particularly deep learning models. TensorFlow provides a comprehensive ecosystem of tools, libraries, and community resources that make it easier for developers and researchers to create and deploy machine learning applications. [Keras](https://keras.io/) is a high-level neural network API (Application Programming Interface) that is integrated into TensorFlow (and can also run on top of JAX and PyTorch), allowing users to work with Keras's high-level API while leveraging TensorFlow's scalability, performance, and ecosystem.

Let's build a simple neural network that classifies handwritten digits.

In [None]:
# Import the MNIST handwritten digit data from the Keras datasets
# The data comes already split into training and test sets

from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Size of training images
print(train_images.shape)

# Size of training labels
print(train_labels.shape)

# Size of test images
print(test_images.shape)

# Size of test labels
print(test_labels.shape)


In [None]:
# Let's look at the data

import matplotlib.pyplot as plt

digit = train_images[4]
print(f'Image shape: {digit.shape}')

plt.imshow(digit, cmap=plt.cm.binary)
plt.show()
print(f' Label: {train_labels[4]}')

Building a model requires at least three things:
1. **Optimizer** - the mechanism through which the model updates itself based on the training data.
2. **Loss function** - measures the model's performance on the training data and serves as feedback to the optimizer.
3. **Metrics** - monitor performance during training and testing.
      * Classification accuracy = fraction of images correctly classified
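
To make the loss and the metric concrete, here is a small sketch that computes a sparse categorical cross-entropy and the classification accuracy by hand with NumPy. The probabilities and labels are made-up numbers for illustration, and the code follows the textbook definitions; Keras's own implementation may differ in numerical details.

```python
import numpy as np

# Hypothetical predicted class probabilities for 3 samples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3]])
labels = np.array([0, 1, 2])  # true class of each sample

# Sparse categorical cross-entropy: mean negative log-probability
# that the model assigns to the true class
true_class_probs = probs[np.arange(len(labels)), labels]
loss = -np.mean(np.log(true_class_probs))

# Accuracy: fraction of samples whose most likely class is the true one
accuracy = np.mean(probs.argmax(axis=1) == labels)

print(loss, accuracy)
```

The third sample is misclassified (its most likely class is 1, not 2), so the accuracy is 2/3, and that sample also contributes the largest term to the loss.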


In [None]:
from tensorflow import keras
from tensorflow.keras import layers

# Define Model
# Two dense layers
model = keras.Sequential([layers.Dense(512,activation="relu"),layers.Dense(10,activation="softmax")])
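
To see what these two layers compute, here is a NumPy sketch of the forward pass: a dense layer with ReLU followed by a dense layer with softmax. The weights are random rather than trained, the input size 784 assumes flattened $28 \times 28$ images, and all names (`W1`, `b1`, `relu`, `softmax`, and so on) are illustrative rather than part of the Keras API.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed; weights are random, not trained

def relu(z):
    return np.maximum(z, 0)

def softmax(z):
    # Subtract the row-wise max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Shapes matching the model: 784 inputs -> 512 hidden units -> 10 classes
W1, b1 = rng.normal(scale=0.01, size=(784, 512)), np.zeros(512)
W2, b2 = rng.normal(scale=0.01, size=(512, 10)), np.zeros(10)

batch = rng.random((4, 784))        # 4 fake flattened images with values in [0, 1)
hidden = relu(batch @ W1 + b1)      # shape (4, 512), non-negative
probs = softmax(hidden @ W2 + b2)   # shape (4, 10), each row sums to 1

print(hidden.shape, probs.shape, probs.sum(axis=1))
```

Each row of `probs` is a probability distribution over the 10 digit classes, which is why the last layer uses softmax.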

In [None]:
model.compile(optimizer="rmsprop",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])

In [None]:
train_images = train_images.reshape((60000, 28 * 28)) # Flatten each 2D image into a vector
train_images = train_images.astype("float32") / 255 # Rescale pixel values to [0, 1]
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255
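
The same preprocessing can be tried on a tiny made-up array to see what it does to the shape and the value range. In this sketch the fake images and the seed are assumptions for illustration, and `-1` is used in `reshape` so that the number of images does not have to be hard-coded.

```python
import numpy as np

# Two fake 28x28 grayscale "images" with integer pixel values in [0, 255]
rng = np.random.default_rng(0)  # arbitrary seed
images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

# -1 lets NumPy infer the number of rows, so the sample count is not hard-coded
flat = images.reshape((-1, 28 * 28))

# Convert to float before dividing so that the values land in [0, 1]
scaled = flat.astype("float32") / 255

print(flat.shape, scaled.dtype, scaled.min(), scaled.max())
```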

In [None]:
history = model.fit(train_images, train_labels, epochs=5, batch_size=128)
# Store accuracy and loss while training
acc = history.history['accuracy']
loss = history.history['loss']

Let's plot the accuracy and loss during training to get a sense of what happened.

In [None]:
epochs = range(1, len(acc) + 1)

# Plotting training accuracy
plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.title('Training accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()


In [None]:

# Plotting training loss
plt.plot(epochs, loss, 'b', label='Training loss')
plt.title('Training loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()



Let's test the model on the first 10 images of the test set.

In [None]:
test_digits = test_images[0:10] # Select first 10 images
predictions = model.predict(test_digits) # Make prediction

In [None]:
# Look at the prediction for the first image
print(predictions[0]) # Predicted probability of the image belonging to each class
prediction = predictions[0].argmax() # Most likely class
print(prediction)
print(predictions[0][7]) # Predicted probability that the image belongs to class 7

Evaluate the model on the test set

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"test_acc: {test_acc: .2f}")