
# Python Introduction
Greeshma Mandala

## Getting Started

* Colab - get notebook from gitmystuff DTSC5502 repository
* Save a Copy in Drive
* Remove "Copy of" from the file name
* Edit your name
* Clean up Colab Notebooks folder
* Submit shared link

#### Code Along

In [None]:
# code along

## What is Python?

* **Interpreted**: Python code is run by an interpreter, a program that directly executes instructions written in a programming or scripting language without first compiling them into a standalone machine-language program. (CPython, the standard interpreter, compiles source code to an intermediate bytecode that its virtual machine then executes.) Compiled languages such as Go and C++ are translated to machine code before they run.
* **Object Oriented**: a programming paradigm based on the concept of "objects", which can contain data and code: data in the form of fields (attributes), and code in the form of procedures (methods). A common feature of objects is that procedures are attached to them and can access and modify the object's data fields.
* **Object**: An object is simply a collection of data (variables) and methods (functions) that act on those data. A class is the blueprint for creating such objects. We can think of a class as a sketch (prototype) of a house: it contains all the details about the floors, doors, windows, etc., and each house built from that sketch is an object (an instance of the class).
* **High-Level Language**: A high-level language (HLL) is a programming language such as C, FORTRAN, or Pascal that enables a programmer to write programs that are more or less independent of a particular type of computer. Such languages are considered high-level because they are closer to human languages and further from machine languages. Python, C#, C++, PHP, and Java are high-level languages; assembly language and machine code are low-level.
* **Dynamic Semantics**: objects are created, and bound to names, at run time, and a name is not tied to a single type. If we set `a = 2` and then `a = 'hello'`, the name `a` is rebound to the new string object as soon as the second line executes; the integer object itself is not modified. In a statically typed language, assigning a string to a variable declared as an integer would be rejected.
* **Built-In Data Structures**: Organizing, managing, and storing data is important because it enables easier access and efficient modification. Data structures let you store collections of data, relate them, and perform operations on them. Python has built-in support for data structures, including lists, dictionaries, tuples, and sets.
* **Dynamic Typing**: types belong to objects (values), not to variable names, and the interpreter checks them at run time. A variable's type is simply the type of the object it currently refers to. Programs written in dynamically typed languages are more flexible, but a type error is only reported when the offending line actually runs, so code that contains such an error can still start running.


In [None]:
# # dynamic typing example

# a = 'Hello World!'
# print(a, type(a), hex(id(a)))
# a = 42
# print(a, type(a), hex(id(a)))
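
The cell above shows a name being rebound to objects of different types. A second consequence of dynamic typing, sketched below with a hypothetical `add_one` function, is that type errors surface only when the offending line runs: Python accepts the function definition without checking what `value` will be.

```python
def add_one(value):
    # Python checks nothing about `value` when this function is defined;
    # the + below is only checked each time the function is called.
    return value + 1


print(add_one(41))    # 42
print(add_one(2.5))   # 3.5: the same code works for any type that supports + 1

try:
    add_one('41')     # str + int is a type error...
except TypeError as err:
    print('TypeError at run time:', err)  # ...but it is only detected now, as the line runs
```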

# Make It Stick Dynamic Typing

**Performance Trade-Off**

* **Static Typing**: Because types are checked at compile time, the program knows exactly what kind of data is in each variable before it runs. This allows the compiler to create highly optimized machine code that can execute very efficiently. There is no need for extra checks during runtime, which saves processing time.
* **Dynamic Typing**: With dynamic typing, the program must perform type checks on the fly, as the code is being executed. This adds a small overhead to every operation, because the interpreter has to look up the type of each object at that moment before it can decide what, for example, `+` means. This continuous checking at run time is one of the main reasons dynamically typed languages like Python are generally slower than their statically typed counterparts.

While this performance difference exists, it's often not a major issue for most applications. The speed and flexibility of dynamic languages like Python make them ideal for many tasks, such as scripting, web development, and data analysis, where development speed is more important than raw execution speed. For applications where performance is critical (like system software, game engines, or high-frequency trading), a statically typed language is usually the better choice.

# Make It Stick Numerical Interning

Numerical interning is an optimization in CPython (the standard Python interpreter) where small integers are created once and shared: every variable that refers to one of these values points to the same object in memory. This conserves memory and speeds up operations involving these common numbers.

CPython pre-allocates, or "interns", the integers from -5 to 256. When your code produces an integer in this range, Python normally doesn't create a new object; it reuses the existing one. This is why two separate variables assigned the same integer in this range show identical memory addresses, while integers outside the range are normally created as new objects on demand. Because this is an implementation detail, always use `==` to compare numeric values; `is` (identity) is only useful for exploring this behavior.

In [None]:
# # low value integers are pre-allocated (interning)

# a = 10
# print('a', a, hex(id(a)))
# b = 10
# print('b', b, hex(id(b)))

In [None]:
# # standard integer objects created on demand in memory (non-interned)

# a = 1000
# print('a', a, hex(id(a)))
# b = 1000
# print('b', b, hex(id(b)))
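
One caveat when experimenting: if two equal literals such as `1000` appear in the same compiled block (for example, a whole script run with `python`), the CPython compiler may store them as one shared constant, so their ids can match even outside the cached range. The sketch below builds the values at run time with `int()` to sidestep that. It relies on CPython behavior; other interpreters may differ.

```python
# Build each value at run time with int() so the compiler cannot merge
# two identical literals into one shared constant.
small_a = int('256')
small_b = int('256')
large_a = int('257')
large_b = int('257')

# == compares values; `is` compares object identity (same object in memory)
print(small_a == small_b, small_a is small_b)  # equal, and the same cached object in CPython
print(large_a == large_b, large_a is large_b)  # equal, but normally two separate objects
print(hex(id(large_a)), hex(id(large_b)))
```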

## Data Structures

**Primitive Types**

* Integers
* Floats
* Strings
* Booleans

**Non-Primitive Types (Collections)**

* Lists
* Tuples
* Sets
* Dictionaries

## Numbers

Integral (whole numbers)

* Integers
* Booleans

Non-Integral

* Floats
* Complex
* Decimals
* Fractions
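
A quick sketch of the number types listed above. `Decimal` and `Fraction` come from the standard-library `decimal` and `fractions` modules; the specific values are just illustrations.

```python
from decimal import Decimal
from fractions import Fraction

# Integral: bool is a subclass of int, so True behaves like 1 in arithmetic
print(isinstance(True, int), True + True)   # True 2

# Non-integral types
f = 0.1 + 0.2                          # binary floating point: not exactly 0.3
d = Decimal('0.1') + Decimal('0.2')    # exact decimal arithmetic (build from strings)
q = Fraction(1, 3) + Fraction(1, 6)    # exact rational arithmetic
z = complex(3, 4)                      # the complex number 3+4j

print(f)        # 0.30000000000000004
print(d)        # 0.3
print(q)        # 1/2
print(abs(z))   # 5.0 (magnitude of 3+4j)
```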

## Zero Indexing

In a zero-indexed system, the first element of a sequence (like a list, tuple, or string) is located at index 0, the second at index 1, and so on, so the last element of a sequence of length n is at index n - 1. This is a fundamental concept in many programming languages, including Python, C++, and Java.
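
A minimal sketch of zero-based indexing on a list and a string; the `fruits` list is just example data. It also shows Python's negative indices, which count back from the end.

```python
fruits = ['apples', 'bananas', 'oranges']

print(fruits[0])     # apples  (first element is at index 0)
print(fruits[2])     # oranges (last of 3 elements is at index len - 1 = 2)
print(fruits[-1])    # oranges (negative indices count back from the end)
print('Hello'[1])    # e       (strings and tuples are zero-indexed too)

try:
    fruits[3]        # one past the end
except IndexError as err:
    print('IndexError:', err)
```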

## Interval Notation

The notation tells you whether the endpoints of the range are included or excluded.

A square bracket [ or ] means the endpoint is included in the interval.

A parenthesis ( or ) means the endpoint is excluded from the interval.

So, let's break down the notation [0, 5):

The square bracket [ next to 0 means that 0 is included in the range.

The parenthesis ) next to 5 means that 5 is excluded from the range.

Therefore, [0, 5) represents 0 and every real number up to, but not including, 5. Writing [0, 1, 2, 3, 4) would be incorrect, because interval notation describes a continuous range of numbers, not a discrete list of integers. Python's `range(0, 5)` borrows the same half-open convention but produces only the integers 0, 1, 2, 3, and 4.

In [None]:
# for number in range(0, 5):
#     print(number)
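
The half-open [start, stop) convention is used throughout Python. This sketch shows two of its practical benefits: the length is simply `stop - start`, and adjacent ranges join without gaps or overlaps. The split point `middle = 2` is arbitrary.

```python
start, stop = 0, 5
numbers = list(range(start, stop))
print(numbers)                       # [0, 1, 2, 3, 4]: 0 included, 5 excluded

# The length of a half-open range is just stop - start
print(len(numbers) == stop - start)  # True

# Two adjacent half-open ranges cover [0, 5) with no gap and no overlap
middle = 2
print(list(range(start, middle)) + list(range(middle, stop)) == numbers)  # True

# Slicing follows the same [start, stop) convention
print('abcdefg'[0:5])                # abcde
```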

In [None]:
# # dynamic typing

# a = 42
# print(a, type(a))
# a = True
# print(a, type(a))
# a = 3.141
# print(a, type(a))
# a = 1/6
# print(a, type(a))

In [None]:
# # https://docs.python.org/3/library/fractions.html
# from fractions import Fraction

# a = 1/6
# print(a, type(a))
# a = Fraction(a).limit_denominator(1000)
# print(a, type(a))

# Make It Stick Collections

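A *collection* is a container that groups multiple items into a single object. Python's built-in collection types are `list`, `tuple`, `set`, `frozenset`, and `dict` (strings also behave as sequences of characters). Separately, the `collections` module in Python's standard library provides specialized container data types, such as `Counter`, `defaultdict`, `deque`, and `namedtuple`, that offer more efficient or convenient alternatives to the built-in `list`, `dict`, and `tuple` for common tasks.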
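
A short tour of four `collections` module types, using made-up fruit data. Each is a real standard-library class; the variable names are only for illustration.

```python
from collections import Counter, defaultdict, deque, namedtuple

# Counter: a dict subclass that counts hashable items
counts = Counter(['apples', 'oranges', 'apples'])
print(counts['apples'])  # 2

# defaultdict: creates a default value (here an empty list) for missing keys
by_letter = defaultdict(list)
for fruit in ['apples', 'avocados', 'bananas']:
    by_letter[fruit[0]].append(fruit)
print(dict(by_letter))   # {'a': ['apples', 'avocados'], 'b': ['bananas']}

# namedtuple: an immutable tuple whose fields can also be read by name
Point = namedtuple('Point', ['x', 'y'])
p = Point(2, 3)
print(p.x, p[1])         # 2 3

# deque: a double-ended queue with fast appends and pops at both ends
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.pop()
print(list(queue))       # [0, 1, 2]
```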

## Collections (Lists, Dictionaries, Tuples, and Sets)

Sequences

* Mutable: Lists
* Immutable: Tuples and Strings

Sets

* Mutable: Sets
* Immutable: Frozen Sets

Mappings

* Dictionaries

### Lists

In [None]:
# # list
# my_list = ['apples', 'bananas', 'oranges']
# my_list

In [None]:
# # multiple lines
# my_list = ['apples', # comment
#     'bananas',  # comment
#     'oranges',  # comment
#     ]
# my_list

In [None]:
# # mixed datatypes
# my_mixed_list = ['hello', 42, 13.1]
# my_mixed_list

#### Code Along

In [None]:
# mutable list code along

### Tuples

In [None]:
# # tuple
# my_tuple = ('apples', 'carambola', 'oranges')
# my_tuple

#### Code Along

In [None]:
# immutable tuple code along


In [None]:
# # immutable string
# s = 'Hello World!'
# print(s, type(s))
# s[0] = 'J'

### Sets

In [None]:
# # set
# my_set = {'apples', 'carambola', 'oranges', 'apples'}
# my_set

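Unlike lists, sets are unordered, so they don't support indexing: there is no `my_set[0]` to reassign. To change a value, remove the old item and then add the new one. Because sets have no order, the items may also be displayed in a different order than you typed them.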

In [None]:
# # add and remove from set but can't change a value
# my_set.add('nuts')
# my_set
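
The remove-then-add pattern described above can be sketched as follows. Replacing `'carambola'` with `'kiwis'` is just an example; `sorted()` is used only to get a predictable display order.

```python
my_set = {'apples', 'carambola', 'oranges', 'apples'}
print(len(my_set))           # 3: the duplicate 'apples' is stored only once

try:
    my_set[0]                # sets are unordered, so there are no positions to index
except TypeError as err:
    print('TypeError:', err)

# "Change" carambola to kiwis: remove the old item, then add the new one
my_set.remove('carambola')   # raises KeyError if the item is not in the set
my_set.add('kiwis')
my_set.discard('grapes')     # like remove(), but does nothing if the item is missing
print(sorted(my_set))        # ['apples', 'kiwis', 'oranges']
```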

#### Code Along

In [None]:
# code along

### Dictionaries

In [None]:
# # dictionary
# my_dict = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
# my_dict

In [None]:
# # change values by key
# my_dict['key1'] = 'valueX'
# my_dict

In [None]:
# # adding to a dictionary
# my_dict['key4'] = 'value4'
# my_dict

# Make It Stick Callables

A Python **callable** is any object that can be called, much like a function. If an object is callable, you can use the `()` operator on it. The built-in `callable()` function is a simple way to check if an object can be called.

**Types of Callables**

The most common types of callables in Python include:

* **Functions**: Both built-in functions (like `len()`) and user-defined functions are callable.
* **Methods**: Methods are functions that belong to a class. For example, `my_list.append` (the `append` method bound to a particular list) is callable.
* **Classes**: When you "call" a class (e.g., `list()`), Python creates a new instance (via `__new__`) and then initializes it by running `__init__`, returning the new object.
* **Class instances with a `__call__` method**: If a class defines a special method called `__call__`, instances of that class can be called like functions. This allows objects to behave as functions.
* **Lambda functions**: These are small, anonymous functions defined with the `lambda` keyword and are, by nature, callable.

The concept of a callable is a fundamental part of Python's object-oriented nature, as it allows for a flexible and unified way to interact with different types of objects.
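
The sketch below runs the built-in `callable()` over one example of each kind listed above. `Greeter` and `shout` are hypothetical names made up for this illustration.

```python
class Greeter:
    """A hypothetical class whose instances are callable because it defines __call__."""

    def __init__(self, greeting):
        self.greeting = greeting

    def __call__(self, name):
        return f'{self.greeting} {name}'


def shout(text):
    return text.upper()


hello = Greeter('Hello')        # calling the class creates an instance
square = lambda x: x * x        # lambdas are callable too
my_list = []

candidates = {
    'built-in function len': len,
    'user-defined function shout': shout,
    'bound method my_list.append': my_list.append,
    'class Greeter': Greeter,
    'Greeter instance hello': hello,
    'lambda square': square,
    'integer 42': 42,
}
for label, obj in candidates.items():
    print(f'{label}: {callable(obj)}')   # True for everything except the integer

print(hello('Yoshi'))           # Hello Yoshi
```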

### Callables

* User-Defined Functions
* Classes
* Built-in Functions (e.g. len(), abs(), range(), etc.)
* Built-in Methods (e.g. my_list.append(x), my_list.extend(other_list), etc.)

#### Code Along

In [None]:
# # user defined function
# def my_funct(name):
#     return f'Hello {name}'

# # code along

#### Code Along

In [None]:
# # class
# class InfoKart:

#     def __init__(self, name1, name2):
#         self.name1 = name1
#         self.name2 = name2

#     def on_your_mark(self):
#         return f'Drivers! {self.name1} and {self.name2}. On your mark...'

# race = InfoKart('Toadette', 'Yoshi')
# # code along

# Make It Stick Built-ins

Python **built-ins** are fundamental functions, types, and constants that are always available for use in any Python program without needing to be imported. They form the core of the language, providing essential, commonly-used functionality.

**Key Categories of Built-ins**

* **Functions**: These are pre-defined functions you can call directly, such as `print()` for displaying output, `len()` for getting the length of a sequence, and `range()` for generating a sequence of numbers.
* **Types**: These are the basic data structures of Python, including numeric types like `int` and `float`, sequences like `list` and `str`, and mappings like `dict`.
* **Constants**: These are special, unchanging values like `True`, `False`, and `None`.

**Why They Exist**

Built-ins make Python a highly productive language by providing a ready-to-use set of tools for common tasks. They are defined in the `builtins` module, which Python searches automatically whenever a name is not found in your own code's scopes. This convenience allows developers to start writing code immediately without a lot of boilerplate imports.

### Built-ins

Python built-ins are a set of fundamental functions, types, and exceptions that are always available for use without needing to be explicitly imported. They are the core components of the Python language, providing essential functionality for a wide range of tasks.

**Key Categories of Built-ins**

* **Built-in Functions:** These are pre-defined functions you can call directly. Some common examples include:
    * `print()`: Displays output to the console.
    * `len()`: Returns the number of items in an object.
    * `range()`: Generates a sequence of numbers.
    * `type()`: Returns the type of an object.
    * `str()`, `int()`, `float()`: Convert values to string, integer, or floating-point numbers.

* **Built-in Constants:** These are special values that are always available:
    * `True` and `False`: Boolean values.
    * `None`: Represents the absence of a value.
    * `__debug__`: A constant that is `True` by default and `False` if the Python interpreter is started with the `-O` (optimize) option.

* **Built-in Types:** These are the fundamental data structures that form the basis of most Python programs:
    * **Numeric Types:** `int` (integers), `float` (floating-point numbers), `complex` (complex numbers).
    * **Sequence Types:** `list` (mutable sequences), `tuple` (immutable sequences), `str` (strings).
    * **Mapping Type:** `dict` (dictionaries).
    * **Set Types:** `set` (mutable sets), `frozenset` (immutable sets).
    * **Boolean Type:** `bool`.

* **Built-in Exceptions:** These are errors that can be raised during program execution. They are part of a hierarchy of exception classes, and handling them is crucial for writing robust code. Some examples include:
    * `TypeError`: Raised when an operation is performed on an object of an inappropriate type.
    * `ValueError`: Raised when a function receives an argument of the correct type but an inappropriate value.
    * `NameError`: Raised when a local or global name is not found.
    * `ZeroDivisionError`: Raised when division or modulo by zero occurs.

**The `__builtins__` Module**

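The built-ins are defined in the `builtins` module; each module also has a `__builtins__` name that refers to it (or to its dictionary). Built-ins are not copied into your script's global namespace. Instead, when Python looks up a name, it checks the local, enclosing, and global scopes first and falls back to the built-ins last, which is why you don't need to import them. The same lookup order means that if you define your own variable or function with a built-in's name (for example, `list = [1, 2]`), your version shadows the built-in; this is generally bad practice because it leads to confusion and bugs.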

The availability of built-ins makes Python a highly productive language, allowing developers to immediately start writing code without worrying about importing basic utilities.
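
A small sketch of the lookup order described above: shadowing `len` with a global variable, reaching the original through the `builtins` module, and restoring it with `del`. Run it at the top level of a notebook or script.

```python
import builtins

# len is not a global variable: Python finds it in the builtins module
print(builtins.len is len)      # True

# Shadowing: a global variable named len now hides the built-in function
len = 'oops'
try:
    len([1, 2, 3])
except TypeError as err:
    print('TypeError:', err)    # 'str' object is not callable

# The original function is still reachable through the builtins module
print(builtins.len([1, 2, 3]))  # 3

# Deleting the global name makes the built-in visible again
del len
print(len([1, 2, 3]))           # 3
```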

In [None]:
# # built-in function
# a = range(1, 6)
# print(a, len(a)) # [1, 6): the numbers 1 through 5, so len(a) is 5

In [None]:
# # built-in methods
# a = [1, 2]
# b = [3, 4]
# t = (a, b)
# print(a, hex(id(a)))
# print(b, hex(id(b)))
# print(t, hex(id(t)))

# print(a[0], hex(id(a[0])))
# a[0] = 5
# print(a[0], hex(id(a[0])), hex(id(a)))

# a.append(3)
# b.extend(a)
# print(a, hex(id(a)))
# print(b, hex(id(b)))
# print(t, hex(id(t)))

### Len and Range

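* `len()` is short for length: it returns the number of items in a sequence or other container
* `range(start, stop, step)` generates a sequence of integers from `start` up to, but not including, `stop` (the half-open interval [start, stop)), moving by `step` each time; `start` defaults to 0 and `step` to 1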

In [None]:
# # len
# a = range(1, 6)
# print(a, len(a)) # [1, 6): the numbers 1 through 5, so len(a) is 5
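
A few more `range` and `len` cases, sketched with arbitrary values, covering a step larger than 1, a negative step, and the default arguments.

```python
r = range(1, 10, 3)
print(list(r), len(r))          # [1, 4, 7] 3: stop (10) is never included
print(list(range(5, 0, -2)))    # [5, 3, 1]: a negative step counts down, still excluding stop
print(list(range(3)))           # [0, 1, 2]: start defaults to 0 and step to 1
print(len('Hello'), len({'a': 1, 'b': 2}))  # 5 2: len() works on any sized container
```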

## For and Comprehensions

### For

#### Code Along

In [None]:
# range code along

In [None]:
# # range(start, stop, step)
# for i in range(1, 6, 2):
#     print(i, end=', ')

In [None]:
# # reverse order
# for i in range(6, 0, -1):
#     print(i, end=', ')

### Comprehensions

#### Code Along

In [None]:
# list comprehension code along


In [None]:
# # dictionary comprehension
# {str(i): i**2 for i in range(6)}

In [None]:
# # for loop list
# my_kart = ['Baby Daisy', 'Baby Luigi', 'Baby Mario']
# for name in my_kart:
#     print(name)

In [None]:
# # nested for loop

# my_kart = [['Baby Daisy', 'Baby Luigi', 'Baby Mario'], ['Birdo', 'Bowser', 'Donkey Kong'], ['Princess Peach', 'Isabelle', 'Koopa Troopa']]
# for team in my_kart:
#     for name in team:
#         print(team, name)

In [None]:
# # nested comprehension: same loop order as the nested for loop above, builds a flat list of names
# [name for team in my_kart for name in team]

#### Code Along

In [None]:
# # enumerate
# my_kart = ['Baby Daisy', 'Baby Luigi', 'Baby Mario']
# # code along

## If Elif Else

In [None]:
# # if else
# my_kart = ['Baby Daisy', 'Baby Luigi', 'Baby Mario']
# for i, name in enumerate(my_kart):
#     if name == 'Baby Luigi':
#         print(i, f'{name} is here')
#     else:
#         print('Baby Luigi\'s not at this index')

In [None]:
# # if elif else
# my_kart = ['Baby Daisy', 'Baby Luigi', 'Baby Mario']
# for i, name in enumerate(my_kart):
#     if name == 'Baby Daisy':
#         print(i, f'{name} is here')
#     elif name == 'Baby Luigi':
#         print(i, f'{name} is here')
#     else:
#         print(i, 'Baby Mario could be here')

## Errors

https://docs.python.org/3/library/exceptions.html

In [None]:
# # zero division error
# a = 10
# b = 0
# a / b

In [None]:
# # handle ZeroDivisionError with a try-except block
# a = 10
# b = 0
# try:
#     a / b
# except ZeroDivisionError as e:
#     print('Error: Division by zero:', e)

In [None]:
# # type error
# a = 10
# b = '1'
# a + b

## Try Except Break Continue

In [None]:
# # zero division exception
# a = 10
# b = 0
# try:
#     a / b
# except ZeroDivisionError:
#     print('Ooops! Division by 0 not allowed.')

In [None]:
# # type error and finally
# a = 10
# b = '1'
# try:
#     a + b
# except TypeError:
#     print('Ooops! Adding a string and number causes problems')
# finally:
#     print('But, I can still do things')

In [None]:
# # operators https://www.tutorialspoint.com/python/python_basic_operators.htm
# a = 0
# b = 3
# while a < 3:
#     a += 1
#     b -= 1
#     print(a, b)

# print('All Done')

In [None]:
# # break
# a = 0
# b = 3
# while a < 4:
#     a += 1
#     b -= 1
#     print(a, b)
#     try:
#         a / b
#     except ZeroDivisionError:
#         print('Ooops!')
#         break
#     print('Still in while loop')

# print('All Done')

In [None]:
# # continue
# a = 0
# b = 3
# while a < 4:
#     a += 1
#     b -= 1
#     print(a, b)
#     try:
#         a / b
#     except ZeroDivisionError:
#         print('Ooops!')
#         continue
#     print('Still in while loop')

# print('All Done')

## Numpy

* Scalars
* Vectors
* Matrices

In [None]:
# # list to array
# import numpy as np

# my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# print(type(my_list))
# my_array = np.array(my_list)
# print(type(my_array)) # n-dimensional array

In [None]:
# # view list
# my_list

In [None]:
# # view array
# my_array

In [None]:
# # 3 x 3 array
# my_kart = [['Baby Daisy', 'Baby Luigi', 'Baby Mario'], ['Birdo', 'Bowser', 'Donkey Kong'], ['Princess Peach', 'Isabelle', 'Koopa Troopa']]
# my_kart_array = np.array(my_kart)
# my_kart_array

In [None]:
# # np.zeros
# np.zeros((3, 3))

In [None]:
# # arrays from arange and reshape
# import numpy as np

# my_ndarray = np.arange(1, 5).reshape(2, 2)
# my_ndarray

In [None]:
# # scalar multiplication
# my_ndarray * 2

In [None]:
# # scalar division
# my_ndarray / 2

In [None]:
# # element wise multiplication
# my_ndarray * my_ndarray

In [None]:
# # dot product
# import numpy as np

# array1 = np.array([1, 2, 3, 4])
# array2 = np.array([5, 6, 7, 8])

# # Calculate the dot product
# dot_product_result = np.dot(array1, array2)

# print(f"Array 1: {array1}")
# print(f"Array 2: {array2}")
# print(f"The dot product of the two arrays is: {dot_product_result}")

In [None]:
# # matrix product of my_ndarray with itself (prints both operands first)
# print([list(a) for a in my_ndarray], end='')
# print()
# print([list(a) for a in my_ndarray], end='')
# np.dot(my_ndarray, my_ndarray)
# # [1 * 1 + 2 * 3, 1 * 2 + 2 * 4], [3 * 1 + 4 * 3, 3 * 2 + 4 * 4]

The `np.dot(my_ndarray, my_ndarray)` command performs **matrix multiplication**, multiplying the NumPy array `my_ndarray` by itself. For this operation to be valid, `my_ndarray` must be a **square matrix** (e.g., 2x2, 3x3, etc.), because the number of columns in the first matrix must match the number of rows in the second.

The result is a new matrix where each element is the **dot product** of a row from the first matrix and a column from the second.

### Matrix Multiplication Example

Let's assume `my_ndarray` is a 2x2 matrix:

$$\text{my\_ndarray} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

The calculation of `np.dot(my_ndarray, my_ndarray)` (or `my_ndarray @ my_ndarray`) results in a new matrix:

$$\text{Result} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \times \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} (a \cdot a + b \cdot c) & (a \cdot b + b \cdot d) \\ (c \cdot a + d \cdot c) & (c \cdot b + d \cdot d) \end{bmatrix}$$

Let's use a numerical example with a specific matrix:

$$\text{my\_ndarray} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$

The result is calculated element by element:

* **Top-left element:** (Row 1 of first matrix) $\cdot$ (Column 1 of second matrix)
    $$(1 \cdot 1) + (2 \cdot 3) = 1 + 6 = 7$$

* **Top-right element:** (Row 1 of first matrix) $\cdot$ (Column 2 of second matrix)
    $$(1 \cdot 2) + (2 \cdot 4) = 2 + 8 = 10$$

* **Bottom-left element:** (Row 2 of first matrix) $\cdot$ (Column 1 of second matrix)
    $$(3 \cdot 1) + (4 \cdot 3) = 3 + 12 = 15$$

* **Bottom-right element:** (Row 2 of first matrix) $\cdot$ (Column 2 of second matrix)
    $$(3 \cdot 2) + (4 \cdot 4) = 6 + 16 = 22$$

The final result is the matrix:

$$\begin{bmatrix} 7 & 10 \\ 15 & 22 \end{bmatrix}$$

The `print` statements in the cell above (`[list(a) for a in my_ndarray]`) simply display the contents of the NumPy array as a list of lists before the multiplication; they do not perform any mathematical operation.
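
The worked example above can be checked directly in NumPy. This sketch uses the same 2x2 matrix, contrasts matrix multiplication (`@` or `np.dot`) with element-wise `*`, and rebuilds one entry as a row-by-column dot product.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

product = m @ m            # matrix multiplication; same as np.dot(m, m) for 2-D arrays
elementwise = m * m        # element-by-element product, NOT matrix multiplication

# Rebuild the top-left entry by hand: row 0 of m dotted with column 0 of m
top_left = np.dot(m[0, :], m[:, 0])

print(product.tolist())      # [[7, 10], [15, 22]]
print(elementwise.tolist())  # [[1, 4], [9, 16]]
print(top_left)              # 7
```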

In [None]:
# # https://nbviewer.org/github/jmportilla/Udemy-notes/blob/master/Lec%209%20-Indexing%20Arrays.ipynb
# import numpy as np

# my_ndarray = np.arange(1, 17).reshape(4, 4)
# my_ndarray

In [None]:
# my_ndarray[0][0]

A two-dimensional NumPy array is structured like a grid or matrix, with rows and columns. To access a specific element, you provide its location using two indices: the **row index** and the **column index**.

  * The first index, `[0]`, refers to the **row**. Python uses **zero-based indexing**, so `0` corresponds to the first row.
  * The second index, `[0]`, refers to the **column**. Again, `0` corresponds to the first column.

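So, `my_ndarray[0][0]` gets the element at the intersection of the **first row** and the **first column**. Strictly speaking, it does this in two steps: `my_ndarray[0]` returns the first row as a 1-D array, and the second `[0]` picks that row's first element. The NumPy form `my_ndarray[0, 0]` gives the same result in a single indexing operation and is the idiomatic way to index multidimensional arrays.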

### Example with `my_ndarray`

Given `my_ndarray = np.arange(1, 17).reshape(4, 4)`, the array looks like this:

```
[[ 1  2  3  4]   <- row 0
 [ 5  6  7  8]   <- row 1
 [ 9 10 11 12]   <- row 2
 [13 14 15 16]]  <- row 3
```

In this matrix, the element at `my_ndarray[0][0]` is the number `1`.
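
This sketch compares chained indexing with tuple indexing on the same 4x4 array. The difference matters most with slices: `[:, 0]` selects a column, but `[:][0]` selects a row.

```python
import numpy as np

my_ndarray = np.arange(1, 17).reshape(4, 4)

print(my_ndarray[0][0])       # 1: chained indexing (row 0, then item 0 of that row)
print(my_ndarray[0, 0])       # 1: one indexing step with a (row, column) pair
print(my_ndarray[:, 0])       # first column: 1, 5, 9, 13
print(my_ndarray[:][0])       # first ROW, not a column: [:] selects all rows, then [0] takes row 0
print(my_ndarray[1:3, 2:4])   # rows 1-2 and columns 2-3 (stop indices excluded): 7, 8 / 11, 12
```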

In [None]:
# # row
# my_ndarray[0]

In [None]:
# my_ndarray[:,0]

In [None]:
# my_ndarray[0:1, 0:1]

In [None]:
# my_ndarray[:1,:1]

In [None]:
# my_ndarray[1,1]

In [None]:
# my_ndarray

In [None]:
# # ndarrays are not callable, so this raises TypeError
# my_ndarray()

In [None]:
# my_ndarray[:1,2:4]

In [None]:
# my_ndarray[:2,2:4]

In [None]:
# my_ndarray[2:4,1:3]

In [None]:
# # random integers in [0, 10) arranged as a 3 x 3 array
# import numpy as np

# np.random.randint(10, size=9).reshape(3, 3)

In [None]:
# # some descriptive statistics
# import numpy as np

# stats_array = np.random.randint(10, size=9).reshape(3, 3)
# print(stats_array)
# print(stats_array.sum())
# print(stats_array.mean())
# print(stats_array.var())
# print(stats_array.std())

## Pandas

In [None]:
# # pandas read_csv
# import pandas as pd

# df = pd.read_csv('https://raw.githubusercontent.com/gitmystuff/Datasets/refs/heads/main/iris.csv')
# df.head()

In [None]:
# # feature columns only (drop the species label) as a NumPy array
# arr_test = df.drop('species', axis=1).to_numpy()
# arr_test

In [None]:
# # get a column by name
# df['sepal_width'].head()

In [None]:
# # get two columns by column name
# df[['sepal_width', 'petal_length']].head()

In [None]:
# # get two columns by position
# df[df.columns[1:3]].head()

In [None]:
# # loc selects by row label and column name; label slices include the stop label (rows 1, 2 and 3)
# df.loc[1:3, ['sepal_width', 'petal_length']].head()

In [None]:
# # rows labeled 1 through 3 and every column from sepal_width through petal_width (both ends included)
# df.loc[1:3, 'sepal_width': 'petal_width'].head()

In [None]:
# # select the row labeled 2 (the 3rd row with the default index)
# df.loc[2, :]

In [None]:
# # iloc selects by integer position; slices exclude the stop position (rows 1-2, columns 1-2)
# df.iloc[1:3, 1:3]

In [None]:
# # using just an index
# df.iloc[1]

In [None]:
# # at: a single value by row label and column name
# df.at[1, 'sepal_width']

In [None]:
# # iat: a single value by integer row and column position
# df.iat[1, 1]
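
The label-versus-position difference between `loc` and `iloc` is easiest to see on a tiny frame. The `df_small` values below were typed in by hand for illustration; they are not loaded from the iris CSV.

```python
import pandas as pd

# A tiny hand-made frame with the default integer index 0..4
df_small = pd.DataFrame({
    'sepal_width': [3.5, 3.0, 3.2, 3.1, 3.6],
    'petal_length': [1.4, 1.4, 1.3, 1.5, 1.4],
})

by_label = df_small.loc[1:3, 'sepal_width']   # labels 1, 2 AND 3: stop label included
by_position = df_small.iloc[1:3, 0]           # positions 1 and 2: stop position excluded

print(by_label.tolist())     # [3.0, 3.2, 3.1]
print(by_position.tolist())  # [3.0, 3.2]

# Scalar access: at uses labels, iat uses integer positions
print(df_small.at[1, 'sepal_width'], df_small.iat[1, 0])  # 3.0 3.0
```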

In [None]:
# # filter rows by a single condition
# df[df['sepal_width'] < 3].head()

In [None]:
# # how much did we filter
# print(df.shape)
# print(df[df['sepal_width'] < 3].shape)

In [None]:
# # multiple columns using or
# df[(df['sepal_width'] < 3) | (df['petal_length'] > 4)].shape

In [None]:
# # multiple columns using and
# df[(df['sepal_width'] < 3) & (df['petal_length'] > 4)].shape

In [None]:
# # query
# df.query('`sepal_width` < 3').shape

In [None]:
# # query and
# df.query('`sepal_width` < 3 & `petal_length` > 4').shape

## Matplotlib and Seaborn

In [None]:
# import matplotlib.pyplot as plt

# # Sample data
# x = [1, 2, 3, 4, 5]
# y = [2, 4, 1, 3, 5]

# # Create a line plot
# plt.plot(x, y)

# # Add labels and title
# plt.xlabel("X-axis")
# plt.ylabel("Y-axis")
# plt.title("Simple Line Plot")

# # Display the plot
# plt.show()

**Explanation:**

* `import matplotlib.pyplot as plt`: Imports the Matplotlib library's `pyplot` module, which provides functions for creating plots.
* `x` and `y`: These lists hold the data points for the plot.
* `plt.plot(x, y)`: Creates a line plot using the data in `x` and `y`.
* `plt.xlabel()`, `plt.ylabel()`, `plt.title()`: Add labels to the x-axis, y-axis, and the plot title.
* `plt.show()`: Displays the generated plot.

**Additional Matplotlib Examples:**

* **Scatter plot:**

In [None]:
# plt.scatter(x, y)
# plt.xlabel("X-axis")
# plt.ylabel("Y-axis")
# plt.title("Simple Scatter Plot")
# plt.show()

* **Bar chart:**

In [None]:
# categories = ['A', 'B', 'C', 'D']
# values = [10, 15, 5, 20]
# plt.bar(categories, values)
# plt.xlabel("Categories")
# plt.ylabel("Values")
# plt.title("Simple Bar Chart")
# plt.show()

* **Histogram:**

In [None]:
# data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
# plt.hist(data)
# plt.xlabel("Value")
# plt.ylabel("Frequency")
# plt.title("Simple Histogram")
# plt.show()

**Introduction to Seaborn**

In [None]:
# import seaborn as sns
# import matplotlib.pyplot as plt

# # Load a sample dataset
# data = sns.load_dataset('iris')

# # Create a scatter plot with colors based on species
# sns.scatterplot(x='sepal_length', y='sepal_width', hue='species', data=data)

# # Add title
# plt.title("Scatter Plot with Seaborn")

# # Display the plot
# plt.show()

**Explanation:**

* `import seaborn as sns`: Imports the Seaborn library.
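* `data = sns.load_dataset('iris')`: Loads Seaborn's example 'iris' dataset as a pandas DataFrame. The example datasets are downloaded from Seaborn's online data repository, so this needs an internet connection the first time it runs.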
* `sns.scatterplot(...)`: Creates a scatter plot using Seaborn.
    * `x` and `y`: Specify the columns for the x and y axes.
    * `hue`: Specifies a third column to color the points based on categories.
    * `data`: Specifies the dataset to use.
* `plt.title()`: Adds a title to the plot.
* `plt.show()`: Displays the generated plot.

**Additional Seaborn Examples:**

* **Histogram with density plot:**

In [None]:
# sns.histplot(data['sepal_length'], kde=True)
# plt.title("Histogram with Density Plot")
# plt.show()

* **Box plot:**

In [None]:
# sns.boxplot(x='species', y='sepal_length', data=data)
# plt.title("Box Plot")
# plt.show()

* **Violin plot:**

In [None]:
# sns.violinplot(x='species', y='sepal_length', data=data)
# plt.title("Violin Plot")
# plt.show()

## Sklearn

### Linear Regression

In [None]:
# import pandas as pd
# from sklearn.linear_model import LinearRegression
# from sklearn.model_selection import train_test_split
# from sklearn.metrics import mean_squared_error, r2_score

# # Load the diabetes dataset (or any dataset with numeric features and a target variable)
# from sklearn.datasets import load_diabetes
# diabetes = load_diabetes()
# data = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
# data['target'] = diabetes.target

# # Split data into features (X) and target (y)
# X = data.drop('target', axis=1)
# y = data['target']

# # Split data into training and testing sets
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# # Create a linear regression model
# model = LinearRegression()

# # Train the model
# model.fit(X_train, y_train)

# # Make predictions on the test set
# y_pred = model.predict(X_test)

# # Evaluate the model
# mse = mean_squared_error(y_test, y_pred)
# r2 = r2_score(y_test, y_pred)

# print(f"Mean Squared Error: {mse}")
# print(f"R-squared: {r2}")

**Explanation:**

* **Import necessary libraries:** `pandas` for data manipulation, `sklearn.linear_model` for the Linear Regression model, `train_test_split` to split the data, and `mean_squared_error` and `r2_score` for model evaluation.
* **Load the dataset:** Use `load_diabetes()` to get a sample dataset, or replace it with your own data.
* **Prepare the data:** Separate features (X) and the target variable (y).
* **Split data:** Divide the data into training and testing sets using `train_test_split`.
* **Create and train the model:** Initialize a `LinearRegression` model and train it using the training data.
* **Make predictions:** Use the trained model to predict the target variable on the test set.
* **Evaluate the model:** Calculate Mean Squared Error and R-squared to assess the model's performance.

### Logistic Regression

In [None]:
# import pandas as pd
# from sklearn.linear_model import LogisticRegression
# from sklearn.model_selection import train_test_split
# from sklearn.metrics import accuracy_score, confusion_matrix

# # Load the breast cancer dataset (or any dataset with numeric features and a binary target variable)
# from sklearn.datasets import load_breast_cancer
# breast_cancer = load_breast_cancer()
# data = pd.DataFrame(breast_cancer.data, columns=breast_cancer.feature_names)
# data['target'] = breast_cancer.target

# # Split data into features (X) and target (y)
# X = data.drop('target', axis=1)
# y = data['target']

# # Split data into training and testing sets
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# # Create a logistic regression model
# model = LogisticRegression()

# # Train the model
# model.fit(X_train, y_train)

# # Make predictions on the test set
# y_pred = model.predict(X_test)

# # Evaluate the model
# accuracy = accuracy_score(y_test, y_pred)
# conf_matrix = confusion_matrix(y_test, y_pred)

# print(f"Accuracy: {accuracy}")
# print(f"Confusion Matrix:\n {conf_matrix}")

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Running the cell above may produce a `ConvergenceWarning`. It indicates that the `lbfgs` solver, which is the default for `LogisticRegression` in scikit-learn, did not converge within the maximum allowed number of iterations (`max_iter`). It's not a syntax error, and the model is still fitted, but its coefficients may not be fully optimized.

There are two common ways to address this, as the warning itself suggests:

1.  **Increase the number of iterations:** You can explicitly set a higher `max_iter` value when creating your `LogisticRegression` model. The default is 100. Increasing this gives the solver more attempts to converge.
2.  **Scale your data:** Logistic regression and its solvers work better when the features are on a similar scale. The breast cancer dataset has features with a wide range of values, which can make it difficult for the solver to converge. Scaling the data (e.g., using `StandardScaler`) can significantly improve convergence.

**Solution 1: Increase `max_iter`**

You can pass the `max_iter` parameter when you instantiate the `LogisticRegression` model. A value like `1000` is a good starting point.

```python
# Create a logistic regression model with more iterations
model = LogisticRegression(max_iter=1000)

# Train the model
model.fit(X_train, y_train)
```

By increasing `max_iter`, you're allowing the `lbfgs` solver to run for more steps in its optimization process, which can lead to convergence and remove the warning.

**Solution 2: Scale the Data**

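This is generally the more robust solution for this type of problem. When features are measured on very different scales, the optimization problem is poorly conditioned and gradient-based solvers such as `lbfgs` need many more iterations to converge. Scaling puts all features on a comparable range, which usually lets the solver converge well within the default iteration limit. We'll use `StandardScaler` from scikit-learn to transform the data.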
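`StandardScaler` subtracts each column's mean and divides by its standard deviation, `z = (x - mean) / std`. The sketch below uses a made-up two-column array whose columns differ in scale by a factor of 200 and checks the scaler against the formula computed by hand. Note that scikit-learn uses the population standard deviation (`ddof=0`), which is also NumPy's default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: the second column is 200 times larger than the first
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

scaler = StandardScaler().fit(X)   # learns per-column mean_ and scale_
X_scaled = scaler.transform(X)

# Same transformation by hand (population std, ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled)
print("Column means learned by the scaler:", scaler.mean_)
```

After scaling, both columns become identical, so neither one dominates the optimization simply because of its units.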

#### Code Along

In [None]:
# StandardScaler code along
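Here is a minimal sketch of one way to complete the code-along. It assumes the breast cancer data and the 80/20 split with `random_state=42` used above. It wraps `StandardScaler` and `LogisticRegression` in a `make_pipeline` pipeline (a design choice, not the only option) so that the scaler's mean and standard deviation are learned from the training set only and then reused, unchanged, on the test set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Features as a DataFrame (X) and the binary target as a Series (y)
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler is fit on X_train only; predict() reuses its mean/std on X_test
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n {conf_matrix}")
print("Solver iterations:", model[-1].n_iter_)
```

Scaling `X_train` and `X_test` manually with `fit_transform` and `transform` works too. The pipeline simply makes it harder to fit the scaler on test data by accident.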

**Explanation:**

* **Import necessary libraries:** Similar to linear regression, but import `LogisticRegression` for the model and `accuracy_score` and `confusion_matrix` for evaluation.
* **Load the dataset:** Use `load_breast_cancer()` or your own data with a binary target variable.
* **Prepare and split the data:** Similar to linear regression.
* **Create and train the model:** Initialize a `LogisticRegression` model and train it on the training data.
* **Make predictions:** Predict the target variable (classes) on the test set.
* **Evaluate the model:** Calculate accuracy and generate a confusion matrix to assess the model's performance in classification.
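The layout of the confusion matrix is easy to misread. In scikit-learn, rows are the true classes and columns are the predicted classes, both in sorted label order. Here is a tiny, hand-checkable sketch with made-up labels:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration (0 = negative class, 1 = positive class)
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 0]
#  [1 2]]

# For binary labels, ravel() unpacks the cells in this order
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")

# Accuracy = (TN + TP) / total = 4 / 5
print(f"Accuracy: {accuracy_score(y_true, y_pred)}")
```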

**Key Points for Teaching:**

* **Explain the concepts:** Briefly explain the theory behind linear and logistic regression.
* **Dataset selection:** Choose datasets that are easy to understand and relevant to your students.
* **Feature importance:** Discuss the importance of feature selection and engineering.
* **Model evaluation:** Explain different evaluation metrics and their significance.
* **Hands-on practice:** Encourage students to experiment with different datasets and model parameters.

Remember to tailor the code and explanations to your students' level of understanding and the specific goals of your class. Good luck!

# Statsmodels and SciPy

**Statsmodels**

Statsmodels is a powerful library for statistical modeling, providing a wide range of statistical tests, models, and diagnostic tools. Here are a couple of examples:

**1. Ordinary Least Squares (OLS) Regression**

In [None]:
# import statsmodels.api as sm
# import pandas as pd
# from sklearn.datasets import load_diabetes

# # Load the diabetes dataset
# diabetes = load_diabetes()
# data = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
# data['target'] = diabetes.target  # Target variable is a quantitative measure of disease progression one year after baseline

# # Define the dependent and independent variables
# X = data[['bmi', 's5']]  # Body mass index and a blood serum measurement
# y = data['target']

# # Add a constant term to the independent variables
# X = sm.add_constant(X)

# # Create and fit the OLS model
# model = sm.OLS(y, X)
# results = model.fit()

# # Print the regression results
# print(results.summary())

This code demonstrates how to perform OLS regression using `statsmodels`. It loads the diabetes dataset, selects two predictor variables (`bmi` and `s5`), adds a constant (intercept) term with `sm.add_constant`, and fits the model. The `results.summary()` method prints a comprehensive report with statistical details, including coefficients, standard errors, R-squared, p-values, and more.

**2. Analysis of Variance (ANOVA)**

In [None]:
# import statsmodels.api as sm
# import statsmodels.formula.api as smf
# import pandas as pd

# # Create a sample dataset
# data = {'group': ['A', 'A', 'B', 'B', 'C', 'C'],
#         'value': [10, 12, 15, 18, 20, 22]}
# df = pd.DataFrame(data)

# # Fit the ANOVA model
# model = smf.ols('value ~ group', data=df)
# results = model.fit()

# # Compute the ANOVA table (Type II sums of squares)
# anova_table = sm.stats.anova_lm(results, typ=2)

# # Print the ANOVA table
# print(anova_table)

This example shows how to conduct a one-way ANOVA using `statsmodels`. It creates a small sample dataset of groups and their corresponding values. The `ols` function from `statsmodels.formula.api` specifies the model with an R-style formula (`value ~ group`), and `anova_lm` then builds the ANOVA table with the sums of squares, degrees of freedom, F-statistic, and p-value. The `typ=2` argument requests Type II sums of squares.

**SciPy**

SciPy is a library for scientific computing that builds on NumPy and provides a wide range of algorithms and functions for various scientific tasks. Here's an example:

**1. T-test**

In [None]:
# from scipy import stats
# import numpy as np

# # Generate two sample datasets
# group1 = np.random.normal(loc=10, scale=2, size=20)
# group2 = np.random.normal(loc=12, scale=2, size=20)

# # Perform independent samples t-test
# t_stat, p_value = stats.ttest_ind(group1, group2)

# print(f"T-statistic: {t_stat}")
# print(f"P-value: {p_value}")

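This code demonstrates how to perform an independent samples t-test using `scipy.stats`. It generates two random samples and uses the `ttest_ind` function to calculate the t-statistic and the two-sided p-value. The test checks whether the means of two independent groups differ significantly. By default, `ttest_ind` assumes both groups have equal variances (Student's t-test). Because the samples are drawn without a fixed random seed, the exact values change on every run.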
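To connect `ttest_ind` to the formula it implements, the sketch below recomputes the pooled-variance (Student's) t statistic by hand, `t = (mean1 - mean2) / sqrt(sp^2 * (1/n1 + 1/n2))`, along with its two-sided p-value from the t distribution with `n1 + n2 - 2` degrees of freedom. It seeds NumPy's random generator so the results are reproducible; the cell above does not do this. It also shows the `equal_var=False` option, which runs Welch's t-test instead.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # seeded for reproducibility
group1 = rng.normal(loc=10, scale=2, size=20)
group2 = rng.normal(loc=12, scale=2, size=20)

n1, n2 = len(group1), len(group2)

# Pooled variance: weighted average of the two sample variances (ddof=1)
sp2 = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (group1.mean() - group2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-sided p-value: probability mass in both tails beyond |t|
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

t_stat, p_value = stats.ttest_ind(group1, group2)
print(f"By hand: t = {t_manual:.4f}, p = {p_manual:.4g}")
print(f"SciPy:   t = {t_stat:.4f}, p = {p_value:.4g}")

# Welch's t-test drops the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)
print(f"Welch:   t = {t_welch:.4f}, p = {p_welch:.4g}")
```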

**Key Points for Teaching:**

* **Explain the purpose:** Clearly explain the purpose and applications of each library.
* **Focus on practical examples:** Use simple, relatable examples to demonstrate the functionality.
* **Connect to statistical concepts:** Emphasize the connection between the code and the underlying statistical concepts.
* **Encourage exploration:** Encourage students to explore the documentation and experiment with different functions and datasets.

By introducing `statsmodels` and `scipy` with these examples, you can provide your students with a solid foundation in statistical analysis and scientific computing in Python. Remember to adapt the complexity and examples to your students' level and the specific goals of your class.
