# Python for Data Science & Machine Learning - A Beginner's Guide
This notebook provides a foundational introduction to Python and its key libraries for data science and machine learning. We'll cover basic Python syntax and data structures, then dive into NumPy for numerical computation, SciPy for scientific computing, scikit-learn for machine learning algorithms, and pandas for data manipulation.

## Section 1: Python Basics

**Variables and Data Types**

In [None]:
x = 5  # Integer
y = 3.14  # Float
name = "Alice"  # String
is_valid = True  # Boolean

print(x)
print(y)
print(name)
print(is_valid)

**Check data types**

In [None]:
print(type(x))
print(type(y))
print(type(name))
print(type(is_valid))

**Operators**

In [None]:
a = 10
b = 3

print(a + b)  # Addition
print(a - b)  # Subtraction
print(a * b)  # Multiplication
print(a / b)  # Division (always returns a float: 3.333...)
print(a // b) # Floor division (rounds down: 3)
print(a % b)  # Modulus (remainder: 1)
print(a ** b) # Exponentiation
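
Floor division and modulus are two halves of the same operation: for integers `a` and `b` (with `b` not zero), Python guarantees `a == (a // b) * b + a % b`. Because `//` rounds toward negative infinity, results with negative numbers can be surprising. A quick check:

```python
a, b = 10, 3

# Floor division and modulus always satisfy this identity in Python
quotient, remainder = a // b, a % b
print(quotient, remainder)              # 3 1
print(quotient * b + remainder == a)    # True

# // rounds toward negative infinity, so -10 // 3 is -4, not -3
print(-10 // 3, -10 % 3)                # -4 2

# divmod() returns both results at once
print(divmod(a, b))                     # (3, 1)
```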

**Control Flow (if, elif, else)**

In [None]:
age = 20
if age >= 65:
    print("You are a senior.")
elif age >= 18:
    print("You are an adult.")
else:
    print("You are a minor.")

**Loops (for, while)**

In [None]:
# For loop
for i in range(5):  # range(5) generates numbers 0, 1, 2, 3, 4
    print(i)

# While loop
count = 0
while count < 3:
    print(count)
    count += 1
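
A very common loop pattern is keeping a running total. The sketch below computes the sum 1 + 2 + ... + 10 with both kinds of loop. A `for` loop is usually the natural choice when you know how many iterations you need, while a `while` loop fits cases where you stop when a condition changes:

```python
# Sum the integers 1..10 with a for loop
total_for = 0
for n in range(1, 11):      # range(1, 11) yields 1, 2, ..., 10
    total_for += n

# The same sum with a while loop: here we manage the counter ourselves
total_while = 0
n = 1
while n <= 10:
    total_while += n
    n += 1

print(total_for, total_while)   # 55 55
print(sum(range(1, 11)))        # 55, using the built-in sum()
```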


**Data Structures**

*Lists (mutable, ordered)*

In [None]:
my_list = [1, 2, 3, "apple", "banana"]
print(my_list[0])  # Accessing elements (indexing starts at 0)
my_list.append("orange")  # Adding an element
print(my_list)
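
A few more list operations you will use constantly are `len()`, negative indexing, slicing, and in-place assignment. This sketch re-creates `my_list` so it runs on its own:

```python
my_list = [1, 2, 3, "apple", "banana"]

print(len(my_list))    # 5: number of elements
print(my_list[-1])     # banana: negative indices count from the end
print(my_list[1:3])    # [2, 3]: slice from index 1 up to (not including) 3
print(my_list[:2])     # [1, 2]: an omitted start means "from the beginning"

my_list[0] = 100       # Lists are mutable: elements can be replaced in place
print(my_list)         # [100, 2, 3, 'apple', 'banana']
```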

*Tuples (immutable)*

In [None]:
my_tuple = (1, 2, 3)
print(my_tuple[1])
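
Immutability means that trying to change an element raises a `TypeError`. Tuples are still useful for grouping related values, for example to unpack them into separate variables:

```python
my_tuple = (1, 2, 3)

try:
    my_tuple[0] = 10           # Tuples cannot be modified after creation
    assignment_failed = False
except TypeError as err:
    assignment_failed = True
    print("Error:", err)

# Unpacking assigns each element of the tuple to its own variable
x, y, z = my_tuple
print(x, y, z)                 # 1 2 3
```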

*Dictionaries (key-value pairs)*

In [None]:
my_dict = {"name": "Bob", "age": 30}
print(my_dict["name"])
my_dict["city"] = "New York"
print(my_dict)
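
Looking up a missing key with square brackets raises a `KeyError`. The `.get()` method lets you supply a fallback value instead, and `.items()` lets you loop over keys and values together:

```python
my_dict = {"name": "Bob", "age": 30, "city": "New York"}

# my_dict["country"] would raise KeyError; .get() returns a default instead
print(my_dict.get("country", "unknown"))   # unknown
print("age" in my_dict)                    # True: `in` checks the keys

# Loop over key-value pairs (dictionaries keep insertion order)
for key, value in my_dict.items():
    print(f"{key}: {value}")
```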

*Sets (unordered collection of unique elements)*

In [None]:
my_set = {1, 2, 2, 3}  # Duplicate 2 is automatically removed
print(my_set)
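
Sets support the usual mathematical set operations, and converting a list to a set is a quick way to drop duplicates (at the cost of losing the original order):

```python
a_set = {1, 2, 3}
b_set = {3, 4}

union = a_set | b_set          # {1, 2, 3, 4}: elements in either set
intersection = a_set & b_set   # {3}: elements in both sets
difference = a_set - b_set     # {1, 2}: elements in a_set but not in b_set
print(union, intersection, difference)

# Removing duplicates from a list
unique_fruits = set(["apple", "banana", "apple"])
print(len(unique_fruits))      # 2
```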

## Section 2: NumPy - Numerical Python

**Importing NumPy**

In [None]:
import numpy as np

**Creating NumPy Arrays**

In [None]:
my_array = np.array([1, 2, 3, 4, 5])
print(my_array)
print(type(my_array))

**Multi-dimensional arrays**

In [None]:
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix)
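
Every NumPy array carries metadata describing its layout, and checking `shape` is often the first step when debugging array code. The exact integer `dtype` printed depends on your platform:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(matrix.shape)   # (3, 3): rows, columns
print(matrix.ndim)    # 2: number of dimensions (axes)
print(matrix.size)    # 9: total number of elements
print(matrix.dtype)   # an integer type such as int64 (platform-dependent)
```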

**Creating arrays with specific values**

In [None]:
zeros_array = np.zeros((2, 3))  # 2x3 array filled with zeros
ones_array = np.ones((3, 2))   # 3x2 array filled with ones
range_array = np.arange(0, 10, 2) # Array from 0 to 10 (exclusive) with step 2
print(zeros_array)
print(ones_array)
print(range_array)
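
Two details worth knowing here: `np.zeros` and `np.ones` create floating-point arrays by default, and `np.arange` takes a step size while `np.linspace` takes a number of points. The sketch below also uses `np.full`, which does not appear in the cell above, to fill an array with an arbitrary value:

```python
import numpy as np

print(np.zeros((2, 3)).dtype)          # float64: the default element type

# arange: start, stop, step; the stop value is excluded
step_based = np.arange(0, 1, 0.25)     # 0, 0.25, 0.5, 0.75
# linspace: start, stop, number of points; the stop value is included by default
count_based = np.linspace(0, 1, 5)     # 0, 0.25, 0.5, 0.75, 1.0
print(step_based)
print(count_based)

sevens = np.full((2, 2), 7)            # a 2x2 array filled with 7
print(sevens)
```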

**Array Operations**

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)  # Element-wise addition
print(a * b)  # Element-wise multiplication
print(a.dot(b)) # Dot product
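
The dot product is simply the sum of the element-wise products, which you can verify directly. The `@` operator is an equivalent shorthand, and on 2-D arrays it performs matrix multiplication. The 2x2 matrix `m` and vector `v` below are illustrative values:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# 1*4 + 2*5 + 3*6 = 32
dot_manual = np.sum(a * b)
print(dot_manual, a.dot(b), a @ b)   # 32 32 32

# Arithmetic with a scalar is applied to every element ("broadcasting")
print(a * 10)                        # [10 20 30]

# Matrix-vector multiplication with @
m = np.array([[1, 0], [0, 2]])
v = np.array([3, 4])
print(m @ v)                         # [3 8]
```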

**Array Indexing and Slicing**

In [None]:
print(my_array[0])  # Accessing the first element
print(my_array[1:4])  # Slicing (elements from index 1 to 3)
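
For multi-dimensional arrays you index with one position per axis, separated by commas, and `:` means "everything along this axis". Boolean masks are another common way to select elements:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(matrix[1, 2])    # 6: row 1, column 2
print(matrix[:, 0])    # [1 4 7]: every row, column 0
print(matrix[0, :])    # [1 2 3]: row 0, every column

# A boolean mask keeps only the elements where the condition is True
my_array = np.array([1, 2, 3, 4, 5])
greater_than_two = my_array[my_array > 2]
print(greater_than_two)   # [3 4 5]
```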

**Array Reshaping**

In [None]:
reshaped_array = my_array.reshape((5, 1)) # Reshape to a 5x1 column vector
print(reshaped_array)
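
When reshaping, you can pass `-1` for one dimension and NumPy will infer it from the total size. The new shape must contain exactly as many elements as the original, otherwise NumPy raises a `ValueError`. This sketch uses a 6-element array (not `my_array`) so that a 2x3 shape is possible:

```python
import numpy as np

six = np.array([1, 2, 3, 4, 5, 6])

wide = six.reshape((2, -1))     # -1 is inferred as 3
column = six.reshape((-1, 1))   # a 6x1 column vector
print(wide.shape, column.shape) # (2, 3) (6, 1)

# 4 * 2 = 8 elements cannot hold 6 values
try:
    six.reshape((4, 2))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```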

## Section 3: SciPy - Scientific Computing

**Importing SciPy**

In [None]:
import scipy as sp
from scipy import optimize

**Optimization**

In [None]:
# Example: Finding the minimum of a function
def f(x):
    return x**2 + 5*np.sin(x)

result = optimize.minimize(f, x0=0) # x0 is the initial guess
print(result)
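
The printed result object has many fields; the most useful are usually `x` (where the minimum is), `fun` (the function value there), and `success`. As a sanity check, the derivative f'(x) = 2x + 5cos(x) should be close to zero at the minimum found. This version indexes `x[0]` explicitly because `minimize` passes the variable to the function as a 1-D array:

```python
import numpy as np
from scipy import optimize

def f(x):
    # x arrives as a 1-D array with one element
    return x[0]**2 + 5 * np.sin(x[0])

result = optimize.minimize(f, x0=[0.0])   # x0 is the initial guess

print(result.success)   # whether the optimizer reports convergence
print(result.x)         # location of the minimum (an array)
print(result.fun)       # function value at that location

# At a minimum the derivative should be approximately zero
x_min = result.x[0]
print(2 * x_min + 5 * np.cos(x_min))
```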


**Statistics**

In [None]:
from scipy import stats

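rng = np.random.default_rng(42)  # Seeded generator so the results are reproducible
data = rng.normal(loc=0, scale=1, size=1000)  # 1000 samples from a standard normal distribution
mean = stats.tmean(data)    # Trimmed mean; with no limits given it equals the ordinary mean
std_dev = stats.tstd(data)  # Trimmed sample standard deviation (ddof=1); with no limits, the ordinary sample std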
print(f"Mean: {mean}")
print(f"Standard Deviation: {std_dev}")
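
`tmean` and `tstd` are *trimmed* statistics: the optional `limits` argument discards values outside a range before computing the result, which makes them less sensitive to outliers. A small illustration with a made-up outlier:

```python
from scipy import stats

values = [1, 2, 3, 100]   # 100 is an outlier

# Without limits, tmean is the ordinary mean: (1 + 2 + 3 + 100) / 4
full_mean = stats.tmean(values)
print(full_mean)                                   # 26.5

# With limits=(0, 10), values outside [0, 10] are ignored
trimmed_mean = stats.tmean(values, limits=(0, 10))
trimmed_std = stats.tstd(values, limits=(0, 10))   # sample std of [1, 2, 3]
print(trimmed_mean, trimmed_std)                   # 2.0 1.0
```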

**Integration**

In [None]:
from scipy.integrate import quad

# Integrate x^2 from 0 to 1
result, error = quad(lambda x: x**2, 0, 1)
print(f"Integral: {result}")
print(f"Error: {error}")
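
Since the exact value of this integral is 1/3, you can compare it with the result from `quad`. `quad` also accepts infinite limits; for example, the integral of exp(-x^2) over the whole real line equals sqrt(pi):

```python
import numpy as np
from scipy.integrate import quad

result, error = quad(lambda x: x**2, 0, 1)
print(abs(result - 1/3))   # essentially zero (floating-point rounding only)

# np.inf can be used as an integration limit
gauss, _ = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(gauss, np.sqrt(np.pi))
```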

## Section 4: scikit-learn - Machine Learning

<span style="color:red">Note:</span> We will cover linear regression in more detail in class. It is okay if you do not fully understand the following code yet.

**Importing scikit-learn**

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

**Data Preparation (Example)**

In [None]:
# Generate some sample data
X = np.array([[1], [2], [3], [4], [5]])  # Feature matrix: 5 samples, 1 feature (scikit-learn expects 2-D input)
y = np.array([2, 4, 5, 4, 5])  # Target values, one per sample

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
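
With only 5 samples, `test_size=0.2` leaves a single test point, so the error computed later is based on one prediction and is very noisy; real datasets are much larger. Checking the array shapes confirms the split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 20% of 5 samples is 1 sample for testing; the other 4 are for training
print(X_train.shape, X_test.shape)   # (4, 1) (1, 1)
print(y_train.shape, y_test.shape)   # (4,) (1,)
```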


**Model Training**

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)
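
After fitting, a `LinearRegression` model stores the learned slope in `coef_` and the intercept in `intercept_`. For illustration, this sketch fits on all five sample points rather than the training split, so the numbers are easy to verify by hand (slope 0.6, intercept 2.2):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Fit on all five points (not the training split) for an easy hand check
model = LinearRegression()
model.fit(X, y)

# The fitted line is y = coef_ * x + intercept_
print(model.coef_)        # one slope per feature, approximately [0.6]
print(model.intercept_)   # approximately 2.2

prediction = model.predict(np.array([[6]]))
print(prediction)         # 0.6 * 6 + 2.2, approximately [5.8]
```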

**Making Predictions**

In [None]:
y_pred = model.predict(X_test)
print(f"Predictions: {y_pred}")

**Evaluating the Model**

In [None]:
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
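
Mean squared error is the average of the squared differences between the true and predicted values. Computing it by hand with NumPy for two hypothetical predictions shows that it matches `mean_squared_error`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0])
y_hat = np.array([2.5, 5.5])   # hypothetical predictions

# ((3 - 2.5)^2 + (5 - 5.5)^2) / 2 = (0.25 + 0.25) / 2 = 0.25
manual_mse = np.mean((y_true - y_hat) ** 2)
library_mse = mean_squared_error(y_true, y_hat)
print(manual_mse, library_mse)   # 0.25 0.25
```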

## Section 5: Data Manipulation with Pandas

**Import pandas**

In [None]:
import pandas as pd

**Creating a DataFrame**

In [None]:
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print(df)

**Reading Data from a CSV file**

In [None]:
df = pd.read_csv('../datasets/iris.csv', header=None)
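
With `header=None`, pandas treats the first line as data and numbers the columns 0, 1, 2, and so on. The sketch below reads two made-up rows from an in-memory string (via `io.StringIO`) instead of the real file, and shows the `names=` parameter, which sets column labels at load time:

```python
import io
import pandas as pd

# Two hypothetical rows standing in for a CSV file without a header line
csv_text = "5.1,3.5,1.4,0.2\n6.2,2.9,4.3,1.3\n"

df_numbered = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(df_numbered.columns))   # [0, 1, 2, 3]

# names= assigns column labels while reading
df_named = pd.read_csv(io.StringIO(csv_text), header=None,
                       names=["sepal_length", "sepal_width", "petal_length", "petal_width"])
print(df_named)
```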

**Data Exploration**

In [None]:
print(df.head())      # First 5 rows
print(df.tail())      # Last 5 rows
df.info()             # Column data types and non-null counts (prints directly and returns None)
print(df.describe())  # Summary statistics for numeric columns

**Data Selection**

In [None]:
print(df.loc[0])  # Select the row with index label 0 (the first row here; df.iloc[0] selects by position)
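
Besides `loc`, you will often select columns by name, select rows by position with `iloc`, and filter rows with a boolean condition. This sketch uses the small people DataFrame from earlier, re-created under the name `people`:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                       'Age': [25, 30, 28],
                       'City': ['New York', 'London', 'Paris']})

print(people['Age'])              # one column (a Series)
print(people[['Name', 'City']])   # several columns (a DataFrame)
print(people.iloc[0])             # first row, selected by position

# Rows where the condition is True, restricted to the 'Name' column
older = people.loc[people['Age'] > 26, 'Name']
print(older)                      # Bob and Charlie
```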

**Add column names**

In [None]:
df.columns = [
    'sepal_length',   # cm
    'sepal_width',    # cm
    'petal_length',   # cm
    'petal_width',    # cm
    'setosa',         # 1 if I. setosa, else 0
    'versicolor',     # 1 if I. versicolor, else 0
    'virginica'       # 1 if I. virginica, else 0
]
print(df.head())
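
Because the species is stored as three 0/1 columns, it is sometimes convenient to recover a single label per row. Assuming each row has exactly one 1 (as the column comments describe), `idxmax(axis=1)` returns the name of that column. The three rows here are hypothetical:

```python
import pandas as pd

# Hypothetical rows using the one-hot species columns named above
onehot = pd.DataFrame({'setosa':     [1, 0, 0],
                       'versicolor': [0, 1, 0],
                       'virginica':  [0, 0, 1]})

# For each row, idxmax(axis=1) gives the column label holding the largest value
species = onehot.idxmax(axis=1)
print(species.tolist())   # ['setosa', 'versicolor', 'virginica']
```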

**Data Cleaning (Handling Missing Values)**

In [None]:
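print(df.isna().sum())  # Count missing values in each column
# df = df.dropna()  # Remove rows with missing values (returns a new DataFrame)
# df = df.fillna(0) # Fill missing values with 0 (returns a new DataFrame)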
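
To see what these methods do, here they are applied to a tiny hypothetical DataFrame with one missing value. Both `dropna` and `fillna` return a new DataFrame and leave the original unchanged unless you reassign it:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column 'a'
frame = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]})

print(frame.isna().sum())     # missing values per column: a -> 1, b -> 0

dropped = frame.dropna()      # new DataFrame without the row containing NaN
filled = frame.fillna(0)      # new DataFrame with NaN replaced by 0
print(dropped.shape)          # (2, 2)
print(filled['a'].tolist())   # [1.0, 0.0, 3.0]
print(frame.isna().sum().sum())   # 1: the original frame is unchanged
```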