<a href="https://colab.research.google.com/github/ro-witthawin/Basic-Python-for-DS-AI/blob/main/Basic_Python_for_Data_Science_%26_AI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🐍 Basic Python for Data Science & AI
Welcome to this hands-on Colab session!

In this notebook, we’ll learn the essential steps in a typical data science workflow using **Python**, **pandas**, **matplotlib**, **seaborn**, and **scikit-learn**.

We'll work with a real dataset, the **Daily Fish Log**, which contains fishing records including date, species, catch weight, and price per kg.

---

### 🧭 What You Will Learn
- How to load and explore data with `pandas`
- Data cleaning and preparation
- Visualizing trends and patterns
- Feature engineering techniques
- Basic statistical analysis
- Building simple machine learning models
- Exporting results

---

> 🔧 No prior machine learning experience needed — just some Python basics!

# 🧠 Logic & Boolean

In this notebook, you will learn the foundations of logic and boolean expressions in Python.

📌 Topics:
- Boolean values
- Comparison operators
- Logical operators
- If statements
- Practice exercises

In [None]:
# Boolean Values
print(True)
print(False)

# Type of boolean
print(type(True))
print(type(False))

## 🔍 Comparison Operators

| Operator | Description         | Example        |
|----------|---------------------|----------------|
| ==       | Equal to            | 3 == 3 → True  |
| !=       | Not equal to        | 3 != 4 → True  |
| >        | Greater than        | 5 > 2 → True   |
| <        | Less than           | 1 < 3 → True   |
| >=       | Greater or equal    | 5 >= 5 → True  |
| <=       | Less or equal       | 3 <= 4 → True  |

In [None]:
# Comparison Examples
print(5 == 5)
print(7 != 2)
print(3 < 10)
print(6 >= 9)

## 🔗 Logical Operators

| Operator | Description                     | Example                 |
|----------|---------------------------------|-------------------------|
| and      | True if both are true           | True and False → False  |
| or       | True if at least one is true    | False or True → True    |
| not      | Inverts the result              | not True → False        |

In [None]:
# Logical Operator Examples
print(True and True)
print(True and False)
print(False or True)
print(not False)
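To see every case at once, the sketch below builds a truth table for `and`, `or`, and `not` by looping over both boolean values. It uses `itertools.product` from the standard library to generate all input pairs; the helper name `truth_table` is just for illustration.

```python
from itertools import product

def truth_table():
    """Return (a, b, a and b, a or b, not a) for every pair of booleans."""
    rows = []
    for a, b in product([True, False], repeat=2):
        rows.append((a, b, a and b, a or b, not a))
    return rows

# !s formats the booleans as text so they line up in columns
for a, b, a_and_b, a_or_b, not_a in truth_table():
    print(f"a={a!s:5} b={b!s:5} | and={a_and_b!s:5} or={a_or_b!s:5} not a={not_a}")
```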

## 🔄 Conditional Statements (if, elif, else)

In [None]:
x = 15

if x > 10:
    print("x is greater than 10")
elif x == 10:
    print("x is exactly 10")
else:
    print("x is less than 10")

## 📝 Practice Exercise 1

Write a program that checks whether a number is positive, negative, or zero.

In [None]:
number = float(input("Enter a number: "))

# Example solution (try writing your own first)
if number > 0:
    print("Positive number")
elif number < 0:
    print("Negative number")
else:
    print("Zero")

## 🧪 Practice Exercise 2

Check if a student has passed. A student passes if their score is 50 or above.

In [None]:
score = int(input("Enter your score: "))

# Example solution (try writing your own first)
if score >= 50:
    print("Passed")
else:
    print("Failed")


✅ Great job! You’ve now learned the basics of logic and boolean operations in Python.

Keep practicing to master control flow and decision making.

# 🔢 Integers, Floats & Math Operations

In this lesson, you’ll learn how Python handles numbers — both integers and floating-point numbers — and how to perform basic arithmetic operations.

📌 Topics:
- Integer and Float types
- Basic arithmetic operations
- Type conversion
- Math library (import math)
- Practice problems

In [None]:
# Integer and Float Examples
print(10)           # Integer
print(3.14)         # Float
print(type(10))     # <class 'int'>
print(type(3.14))   # <class 'float'>

## ➕ Arithmetic Operators

| Operator | Description      | Example         |
|----------|------------------|-----------------|
| +        | Addition         | 2 + 3 → 5       |
| -        | Subtraction      | 5 - 2 → 3       |
| *        | Multiplication   | 4 * 2 → 8       |
| /        | Division         | 10 / 2 → 5.0    |
| //       | Floor Division   | 10 // 3 → 3     |
| %        | Modulus          | 10 % 3 → 1      |
| **       | Exponentiation   | 2 ** 3 → 8      |

In [None]:
# Arithmetic Examples
a = 10
b = 3

print("Addition:", a + b)
print("Subtraction:", a - b)
print("Multiplication:", a * b)
print("Division:", a / b)
print("Floor Division:", a // b)
print("Modulus:", a % b)
print("Exponentiation:", a ** b)
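Floor division and modulus are linked: for integers `a` and `b` (with `b != 0`), `a == b * (a // b) + a % b`. The built-in `divmod()` returns both parts at once. Because `//` rounds toward negative infinity, results with negative numbers can be surprising:

```python
a, b = 10, 3
q, r = divmod(a, b)          # same as (a // b, a % b)
print(q, r)                  # 3 1
print(a == b * q + r)        # True

# // floors toward negative infinity, and % takes the sign of the divisor
print(-7 // 2)               # -4 (not -3)
print(-7 % 2)                # 1
print(7 % -2)                # -1
```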

## 🔁 Type Conversion

Sometimes, you need to convert between integers and floats:
- `int()` converts to integer (drops the decimal part, truncating toward zero)
- `float()` converts to float

In [None]:
x = 5.67
y = 10

print(int(x))    # Convert float to int
print(float(y))  # Convert int to float
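Because `int()` truncates toward zero rather than rounding, it helps to compare it with `round()` and `math.floor()`, especially for negative numbers:

```python
import math

x = 5.67
print(int(x))          # 5   (drops the decimal part)
print(int(-x))         # -5  (truncates toward zero)
print(math.floor(-x))  # -6  (always rounds down)
print(round(x))        # 6   (rounds to the nearest integer)
print(round(2.5))      # 2   (exact ties go to the even neighbour)
print(int("42") + 1)   # 43  (strings of digits convert too)
```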

## 🧮 The `math` Module

Python’s `math` module provides advanced mathematical functions like square root, power, pi, etc.

In [None]:
import math

print("Square root of 16:", math.sqrt(16))
print("Power (2^5):", math.pow(2, 5))
print("Pi value:", math.pi)
print("Sine of 90 degrees:", math.sin(math.radians(90)))
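A few behaviours worth knowing: `math.sqrt()` and `math.pow()` always return floats, while `**` keeps integers as integers. Floating-point arithmetic is also approximate, so compare floats with `math.isclose()` rather than `==`:

```python
import math

print(math.pow(2, 5))   # 32.0 (float)
print(2 ** 5)           # 32   (int)
print(math.sqrt(16))    # 4.0

total = 0.1 + 0.2
print(total == 0.3)              # False: 0.1 + 0.2 is 0.30000000000000004
print(math.isclose(total, 0.3))  # True
```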

## 📝 Practice Exercise 1

Write a program that takes 2 numbers and prints the result of:
- Addition
- Subtraction
- Multiplication
- Division

In [None]:
x = float(input("Enter first number: "))
y = float(input("Enter second number: "))

print("Addition:", x + y)
print("Subtraction:", x - y)
print("Multiplication:", x * y)
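# Guard against division by zero, which would raise ZeroDivisionError
if y != 0:
    print("Division:", x / y)
else:
    print("Division: undefined (cannot divide by zero)")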

## 🧪 Practice Exercise 2

Calculate the area of a circle from a radius entered by the user. Use the formula:

$$\text{Area} = \pi \times r^2$$

In [None]:
import math  # makes this cell runnable on its own

radius = float(input("Enter radius of the circle: "))
area = math.pi * radius ** 2
print("Area of the circle:", area)

✅ That’s it! You now understand numbers and math operations in Python.

Keep practicing to build strong numerical thinking in Python!

# 🔁 For Loops

In this notebook, you'll learn how to use `for` loops in Python to repeat actions, iterate over sequences, and write efficient code.

📌 Topics:
- Syntax of `for` loops
- Using `range()`
- Iterating over strings and lists
- Nested loops
- Practice exercises


In [None]:
# Basic for loop using range
for i in range(5):
    print("Iteration:", i)

## 🔢 `range()` Function

The `range()` function is commonly used with `for` loops.

- `range(n)` → 0 to n-1  
- `range(start, stop)` → from start to stop-1  
- `range(start, stop, step)` → with custom increment

In [None]:
# Various usages of range
print("range(5):")
for i in range(5):
    print(i)

print("\nrange(2, 7):")
for i in range(2, 7):
    print(i)

print("\nrange(10, 2, -2):")
for i in range(10, 2, -2):
    print(i)

## 📜 Iterating Over Strings and Lists

In [None]:
# Looping through a string
for letter in "Hello":
    print(letter)

# Looping through a list
fruits = ["apple", "banana", "mango"]
for fruit in fruits:
    print(fruit)

## 🔁 Nested Loops

You can use loops inside loops.

In [None]:
# Multiplication table
for i in range(1, 4):
    for j in range(1, 4):
        print(f"{i} x {j} = {i * j}")
    print("---")
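The inner loop runs completely for each pass of the outer loop, so two loops over `range(1, 4)` execute the inner body 3 × 3 = 9 times. Collecting the products into a list of rows makes this structure visible:

```python
table = []
for i in range(1, 4):
    row = []
    for j in range(1, 4):
        row.append(i * j)   # inner loop fills one row
    table.append(row)       # outer loop collects the rows

for row in table:
    print(row)
# [1, 2, 3]
# [2, 4, 6]
# [3, 6, 9]
```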


## 📝 Practice Exercise 1

Print all even numbers from 1 to 20.


In [None]:
for i in range(1, 21):
    if i % 2 == 0:
        print(i)
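An alternative is to let `range()` do the filtering with its `step` argument: starting at 2 and stepping by 2 visits only even numbers, so no `if` check is needed.

```python
# start=2, stop=21 (exclusive), step=2
for i in range(2, 21, 2):
    print(i)

evens = list(range(2, 21, 2))
print(evens)  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```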

## 🧪 Practice Exercise 2

Write a program that prints each item in a list of numbers and its square.

In [None]:
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    print(f"{num} squared is {num ** 2}")


✅ Great! You've now learned how to use `for` loops effectively in Python.

Keep practicing to build fluency with loops and iteration!

# 🧮 Functions

In this notebook, you'll learn how to define and use **functions** in Python to write reusable, modular code.

📌 Topics:
- What is a function?
- Defining a function using `def`
- Parameters and arguments
- Return values
- Default arguments
- Practice exercises

## 🧱 What is a Function?

A **function** is a block of code that runs only when it is called.  
Functions help organize code and avoid repetition.

In [None]:
# Basic function definition and call
def greet():
    print("Hello, welcome to Python 101!")

greet()

## 🔢 Parameters and Arguments

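You can pass data into functions using **parameters**, the names listed in the `def` line. The values you supply when you call the function are called **arguments**.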

In [None]:
def greet(name):
    print("Hello,", name)

greet("Alice")
greet("Bob")
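Arguments can also be passed by name (keyword arguments), which makes calls with several parameters easier to read and lets you give them in any order. The function `describe_catch` below is a hypothetical example:

```python
def describe_catch(species, weight_kg):
    """Return a short description of one catch (hypothetical helper)."""
    return f"{species}: {weight_kg} kg"

print(describe_catch("Tuna", 12.5))                    # positional arguments
print(describe_catch(weight_kg=12.5, species="Tuna"))  # keyword arguments, any order
```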

## 🔁 Return Values

Use the `return` keyword to get a result from a function.

In [None]:
def add(a, b):
    return a + b

result = add(10, 20)
print("Sum:", result)

## ⚙️ Default Arguments

You can provide default values to parameters.

In [None]:
def greet(name="Guest"):
    print("Hello,", name)

greet("Charlie")
greet()
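One caution: default values are evaluated once, when the function is defined. A mutable default such as a list is therefore shared between calls. The usual fix is to default to `None` and create a fresh list inside the function (`add_item` is a hypothetical example):

```python
def add_item(item, items=None):
    """Append item to items, creating a new list when none is given."""
    if items is None:   # a fresh list on every call, not one shared list
        items = []
    items.append(item)
    return items

print(add_item("apple"))           # ['apple']
print(add_item("banana"))          # ['banana'], not ['apple', 'banana']
print(add_item("kiwi", ["pear"]))  # ['pear', 'kiwi']
```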

## 📦 Multiple Return Values

Functions can return multiple values: Python packs them into a tuple, which you can unpack into separate variables.

In [None]:
def stats(numbers):
    total = sum(numbers)
    count = len(numbers)
    average = total / count
    return total, average

my_nums = [10, 20, 30]
t, avg = stats(my_nums)
print("Total:", t)
print("Average:", avg)


## 📝 Practice Exercise 1

Write a function `is_even(number)` that returns True if the number is even.

In [None]:
def is_even(number):
    return number % 2 == 0

print(is_even(4))   # True
print(is_even(7))   # False


## 🧪 Practice Exercise 2

Write a function `circle_area(radius)` that returns the area of a circle using the formula:

$$\text{Area} = \pi r^2$$

In [None]:
import math

def circle_area(radius):
    return math.pi * radius ** 2

print("Area:", circle_area(5))


✅ Great work! You’ve learned how to define, call, and use functions in Python.

Functions are essential for clean, reusable, and scalable code.

# Introduction to Pandas
In this notebook, we will learn how to use Pandas for data analysis.  
We'll use the dataset `daily_fish_log_csv.csv` and explore common data analysis tasks.


In [None]:
import pandas as pd
import numpy as np

## 1. Load Data
We will load the fish log data from the CSV file using `pd.read_csv()`.

In [None]:
df = pd.read_csv("daily_fish_log_csv.csv")
df.head()
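If you don't have the file yet, you can try `pd.read_csv()` on a small in-memory CSV. The rows below are made-up toy values that only assume the column names used later in this section (`Date`, `Species`, `Weight_kg`, `Price_per_kg`); the `parse_dates` argument converts the date column to datetimes while reading.

```python
import io
import pandas as pd

# Hypothetical toy rows, for illustration only (not the real dataset)
csv_text = """Date,Species,Weight_kg,Price_per_kg
2024-01-15,Tuna,12.5,8.0
2024-01-15,Mackerel,3.0,4.5
2024-01-16,Tuna,10.0,8.0
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(sample.dtypes)  # Date is datetime64, the weights and prices are floats
print(sample)
```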

## 📁 Optional: Access Data from Google Drive or Upload File

You can either:
- 🔓 Mount your Google Drive and access a file from a folder, or
- 📤 Upload the dataset directly from your computer

### 📌 Option 1: Mount Google Drive and Load from a Folder

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Example path: adjust to match your folder
file_path = '/content/drive/MyDrive/fish_data/daily_fish_log_csv.csv'

import pandas as pd
df = pd.read_csv(file_path)
df.head()

### 📌 Option 2: Upload CSV File Manually

In [None]:
from google.colab import files
uploaded = files.upload()

# Automatically loads the first uploaded file
import io
df = pd.read_csv(io.BytesIO(next(iter(uploaded.values()))))
df.head()

## 2. Explore Dataset
Let's check the basic information and summary statistics of our dataset.

In [None]:
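# Check the structure of the data (info() prints its report directly)
df.info()

# Summary statistics (display() is needed because a cell only shows its last expression)
display(df.describe())

# Unique species
df['Species'].unique()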

## 3. Data Cleaning
We will check for missing values and duplicates.

In [None]:
# Check for missing values (print, since a cell only shows its last expression)
print(df.isnull().sum())

# Check for duplicates
print("Duplicate rows:", df.duplicated().sum())
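Once you know how many missing values and duplicates there are, a common next step is to remove them. Here is a minimal sketch on made-up toy data; whether to drop rows or fill them (for example with `fillna()`) depends on your dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one duplicate row and one missing weight
messy = pd.DataFrame({
    "Species": ["Tuna", "Tuna", "Mackerel", "Snapper"],
    "Weight_kg": [12.5, 12.5, np.nan, 4.0],
})

print(messy.isnull().sum())      # missing values per column
print(messy.duplicated().sum())  # 1 (the second Tuna row)

cleaned = (
    messy.drop_duplicates()              # keep the first copy of each row
         .dropna(subset=["Weight_kg"])   # drop rows with no weight
         .reset_index(drop=True)         # renumber rows 0, 1, ...
)
print(cleaned)
```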

## 4. Data Analysis
Let's analyze:
- Total fish weight by species
- Average price per species
- Daily total revenue

In [None]:
# Total weight per species
total_weight = df.groupby('Species')['Weight_kg'].sum()

# Average price per species
avg_price = df.groupby('Species')['Price_per_kg'].mean()

# Daily total revenue
df['Revenue'] = df['Weight_kg'] * df['Price_per_kg']
daily_revenue = df.groupby('Date')['Revenue'].sum()

# Show each result as its own table
display(total_weight, avg_price, daily_revenue)
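The separate `groupby` calls above can also be combined into one table with named aggregation (`agg` with `new_name=(column, function)` pairs). A sketch on made-up toy rows that reuse the same column names:

```python
import pandas as pd

# Hypothetical toy rows, for illustration only
sales = pd.DataFrame({
    "Date": ["2024-01-15", "2024-01-15", "2024-01-16"],
    "Species": ["Tuna", "Mackerel", "Tuna"],
    "Weight_kg": [12.5, 3.0, 10.0],
    "Price_per_kg": [8.0, 4.5, 8.0],
})
sales["Revenue"] = sales["Weight_kg"] * sales["Price_per_kg"]

# One groupby, several named summaries per species
by_species = sales.groupby("Species").agg(
    total_weight=("Weight_kg", "sum"),
    avg_price=("Price_per_kg", "mean"),
    revenue=("Revenue", "sum"),
)
print(by_species)
```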

## 5. Data Visualization
We will plot:
- Total weight by species
- Daily revenue trends

In [None]:
import matplotlib.pyplot as plt

# Total weight per species
total_weight.plot(kind='bar', title='Total Weight by Species', ylabel='Weight (kg)')
plt.show()

# Daily revenue trend
daily_revenue.plot(kind='line', marker='o', title='Daily Revenue Trend', ylabel='Revenue')
plt.show()

## ✅ Feature Engineering
### 📅 Extracting Date Features
Dates can hold rich temporal information for modeling. We’ll extract features such as:

- Day of week (Monday–Sunday)
- Month
- Quarter
- Season (manually mapped)
- Year

In [None]:
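# The earlier sections use 'Date', 'Species', 'Weight_kg'; the cells below use
# lowercase names. This rename assumes they are the same columns; any name that
# is not present is simply ignored.
df = df.rename(columns={'Date': 'date', 'Species': 'species', 'Weight_kg': 'weight'})

df['date'] = pd.to_datetime(df['date'], errors='coerce')

# Extract date-based features
df['day_of_week'] = df['date'].dt.day_name()
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
df['quarter'] = df['date'].dt.quarter

# Map seasons (assuming northern hemisphere, meteorological seasons)
def map_season(month):
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    elif month in [9, 10, 11]:
        return 'Autumn'
    else:
        return 'Unknown'  # month is NaN when the date could not be parsed

df['season'] = df['month'].apply(map_season)

df[['date', 'day_of_week', 'month', 'year', 'quarter', 'season']].head()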
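As an alternative to `apply(map_season)`, a dictionary lookup with `Series.map()` does the same mapping, and any missing month (for example from a date that failed to parse and became `NaT`) stays `NaN` instead of being assigned a season. A sketch on hypothetical toy dates:

```python
import pandas as pd

# Hypothetical toy dates; "not a date" becomes NaT because of errors="coerce"
dates = pd.DataFrame({"date": ["2024-01-15", "2024-07-04", "not a date"]})
dates["date"] = pd.to_datetime(dates["date"], errors="coerce")

dates["day_of_week"] = dates["date"].dt.day_name()
dates["month"] = dates["date"].dt.month
dates["quarter"] = dates["date"].dt.quarter

# Dictionary lookup: months that are not keys (including NaN) map to NaN
month_to_season = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "Spring", 4: "Spring", 5: "Spring",
                   6: "Summer", 7: "Summer", 8: "Summer",
                   9: "Autumn", 10: "Autumn", 11: "Autumn"}
dates["season"] = dates["month"].map(month_to_season)
print(dates)
```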

## 🧩 Encoding Categorical Variables
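Machine learning models require numeric input, so we convert text-based categorical data into numbers using encoding techniques:

- Label Encoding: maps each category to a single integer (0, 1, 2, …). It is compact, but the integers imply an order that nominal categories such as species do not have, and scikit-learn's `LabelEncoder` assigns the codes alphabetically. Tree-based models usually cope with this better than linear models.
- One-Hot Encoding: creates a binary column for each category, so no order is implied.

We’ll encode:

- `species` (fish type) and `season` with label encoding
- `day_of_week` with one-hot encoding

Other object columns (e.g. location, method), if your dataset has them, can be encoded the same way.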

In [None]:
from sklearn.preprocessing import LabelEncoder

# Label encode species and season
le_species = LabelEncoder()
df['species_encoded'] = le_species.fit_transform(df['species'])

le_season = LabelEncoder()
df['season_encoded'] = le_season.fit_transform(df['season'])

# One-hot encode day_of_week
df = pd.get_dummies(df, columns=['day_of_week'], prefix='dow')

df[['species', 'species_encoded', 'season', 'season_encoded'] + [col for col in df.columns if col.startswith('dow_')]].head()
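`LabelEncoder` assigns integers in alphabetical order, so the codes for `season` do not follow the calendar. If you want an explicit order, scikit-learn's `OrdinalEncoder` accepts a `categories` list. A small comparison on the four season names:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

seasons = pd.DataFrame({"season": ["Winter", "Spring", "Summer", "Autumn"]})

# LabelEncoder sorts the categories alphabetically
le = LabelEncoder()
le.fit(seasons["season"])
print(list(le.classes_))                # ['Autumn', 'Spring', 'Summer', 'Winter']
print(le.transform(seasons["season"]))  # [3 1 2 0]

# OrdinalEncoder uses the order you give it (it expects 2-D input, hence [["season"]])
oe = OrdinalEncoder(categories=[["Winter", "Spring", "Summer", "Autumn"]])
print(oe.fit_transform(seasons[["season"]]).ravel())  # [0. 1. 2. 3.]
```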

### 🧠 Why Feature Engineering Matters
Feature engineering can significantly improve model performance by:

- Providing additional signal to the model
- Making temporal patterns more explicit
- Helping models generalize to unseen data

# 📊 Basic Statistical Analysis
### 📌 Overview
Statistical analysis provides insight into how the data is distributed and how variables relate to one another. In this section, we will:

- Calculate the mean, median, and mode of catch weight.
- Use the Interquartile Range (IQR) to detect outliers.
- Explore correlations between numeric features.
- Visualize relationships with pairplots.

## 📈 Mean, Median, Mode of Catch Weight
We'll examine the central tendency of the fish catch weight to understand typical values.


In [None]:
# Replace 'weight' with the correct column name if needed
print("Mean Weight:", df['weight'].mean())
print("Median Weight:", df['weight'].median())
print("Mode Weight:", df['weight'].mode().values)

## 🚨 Detecting Outliers with IQR
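Outliers are extreme values that may distort analysis. The IQR method uses the interquartile range, IQR = Q3 − Q1, and flags any value that falls outside these bounds:

$$\text{lower} = Q_1 - 1.5 \times \text{IQR}, \qquad \text{upper} = Q_3 + 1.5 \times \text{IQR}$$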


In [None]:
# Calculate Q1 (25th percentile) and Q3 (75th percentile)
Q1 = df['weight'].quantile(0.25)
Q3 = df['weight'].quantile(0.75)
IQR = Q3 - Q1

# Define outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Filter outliers
outliers = df[(df['weight'] < lower_bound) | (df['weight'] > upper_bound)]

print("Number of Outliers:", len(outliers))
outliers[['date', 'species', 'weight']]
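To check the rule on numbers small enough to verify by hand, here is the same calculation on a hypothetical toy series. pandas' `quantile()` interpolates linearly by default, so quartiles can fall between data points:

```python
import pandas as pd

# Hypothetical toy weights with one extreme value
weights = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1 = weights.quantile(0.25)   # 3.25 (linear interpolation, pandas default)
q3 = weights.quantile(0.75)   # 7.75
iqr = q3 - q1                 # 4.5
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # -3.5, 14.5

print(weights[(weights < lower) | (weights > upper)].tolist())  # [100]
```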

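## 🔗 Correlation Matrix
A correlation matrix shows how strongly each pair of numeric variables is linearly related. `df.corr()` uses Pearson correlation by default, which ranges from -1 (perfect negative relationship) to +1 (perfect positive relationship), with 0 meaning no linear relationship.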

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Compute correlation
corr = df.corr(numeric_only=True)

# Heatmap
plt.figure(figsize=(8, 5))
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Correlation Matrix")
plt.show()
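Pearson correlation only measures *linear* relationships. A toy check of the two extremes, using made-up columns where `y` rises exactly with `x` and `z` falls exactly as `x` rises:

```python
import pandas as pd

# Hypothetical toy columns, for illustration only
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                    "y": [2, 4, 6, 8, 10],
                    "z": [5, 4, 3, 2, 1]})

toy_corr = toy.corr()      # Pearson by default
print(toy_corr.round(2))   # x-y close to +1, x-z close to -1
```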

## 🧮 Pairplot for Visual Relationship
A pairplot draws a scatter plot for every pair of selected columns, with each column's distribution on the diagonal, which can reveal clusters or trends.

In [None]:
# Select numeric columns or subset
numeric_cols = ['weight', 'month', 'season_encoded', 'species_encoded']

# Plot pairplot
sns.pairplot(df[numeric_cols])

# 🤖 Intro to Machine Learning (Scikit-learn)
### 🎯 Goal
Use basic ML models to predict fish catch weight from engineered features such as species, month, and season.

## 🧪 Step 1: Prepare Features and Target
We will use:

- Features: `species_encoded`, `month`, `season_encoded` (species and season encoded as integers)

- Target: `weight`

In [None]:
from sklearn.model_selection import train_test_split

# Define features and target
features = ['species_encoded', 'month', 'season_encoded']
target = 'weight'

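# Drop rows with missing values, which LinearRegression cannot handle
# (e.g. months from dates that failed to parse)
model_df = df[features + [target]].dropna()
X = model_df[features]
y = model_df[target]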

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


## 📈 Step 2: Train Regression Models
We’ll train and compare 3 basic models:

- `LinearRegression`

- `DecisionTreeRegressor`

- `RandomForestRegressor`

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Initialize models
lr = LinearRegression()
dt = DecisionTreeRegressor(random_state=42)
rf = RandomForestRegressor(random_state=42)

# Train each model
lr.fit(X_train, y_train)
dt.fit(X_train, y_train)
rf.fit(X_train, y_train)

## 📊 Step 3: Evaluate the Models
We’ll use:

- Mean Absolute Error (MAE)

- Mean Squared Error (MSE)

- R² Score

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

models = {'Linear Regression': lr, 'Decision Tree': dt, 'Random Forest': rf}

for name, model in models.items():
    y_pred = model.predict(X_test)
    print(f"--- {name} ---")
    print("MAE:", mean_absolute_error(y_test, y_pred))
    print("MSE:", mean_squared_error(y_test, y_pred))
    print("R² Score:", r2_score(y_test, y_pred))
    print()
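To make the three scores concrete, the sketch below computes them by hand with NumPy on made-up values and checks them against scikit-learn. MAE is the average absolute error, MSE is the average squared error (so large misses count more), and R² compares the model's squared error with that of always predicting the mean (1 is perfect, 0 is no better than the mean).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical toy values, for illustration only
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.5, 5.0, 3.0, 8.0])

errors = y_true - y_hat
mae = np.mean(np.abs(errors))   # average size of the error
mse = np.mean(errors ** 2)      # squaring penalises large misses
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print("MAE:", mae, "sklearn:", mean_absolute_error(y_true, y_hat))
print("MSE:", mse, "sklearn:", mean_squared_error(y_true, y_hat))
print("R²: ", r2, "sklearn:", r2_score(y_true, y_hat))
```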

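## 🧪 (Optional) Step 4: Classification (e.g., Predict Fish Species)
Because species is a categorical label, the task can also be framed as classification: predicting which species was caught from the month and season. Note that `season_encoded` is derived from `month`, so it adds little new information here.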

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# (Optional) Classification: Predict species from season and month
X_cls = df[['month', 'season_encoded']]
y_cls = df['species_encoded']

X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(X_cls, y_cls, test_size=0.2, random_state=42)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train_c, y_train_c)
y_pred_c = clf.predict(X_test_c)

print("Classification Accuracy:", accuracy_score(y_test_c, y_pred_c))
print(classification_report(y_test_c, y_pred_c))

# 📤 Exporting Results

## 📝 Save Cleaned Data to CSV
After data cleaning and feature engineering, we can save the processed dataset for future use.

In [None]:
# Save cleaned dataset to CSV
df.to_csv("cleaned_fish_data.csv", index=False)
print("Cleaned data saved as cleaned_fish_data.csv")
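CSV stores everything as text, so column types are not saved with the data. When you reload the file, pass `parse_dates` to turn date columns back into datetimes. A small round-trip check on a toy frame (the file name `toy_export.csv` is just an example):

```python
import pandas as pd

# Hypothetical toy frame with a datetime column
out = pd.DataFrame({"date": pd.to_datetime(["2024-01-15", "2024-01-16"]),
                    "weight": [12.5, 10.0]})
out.to_csv("toy_export.csv", index=False)

plain = pd.read_csv("toy_export.csv")
print(plain.dtypes)   # date comes back as text, not datetime

back = pd.read_csv("toy_export.csv", parse_dates=["date"])
print(back.dtypes)    # date is datetime64 again
```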

## 🖼 Export Plots
Matplotlib figures can be saved directly to files (PNG, JPG, etc.) using `plt.savefig()`.

In [None]:
# Example: Save a correlation heatmap as PNG
plt.figure(figsize=(8, 5))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Correlation Matrix")

plt.savefig("correlation_heatmap.png", dpi=300, bbox_inches='tight')
print("Correlation heatmap saved as correlation_heatmap.png")
plt.close()
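If you also want to display a figure, call `savefig()` before `plt.show()`: after `show()` the figure is typically closed, so a later `plt.savefig()` can write a blank image. Keeping a handle to the figure (`fig, ax = plt.subplots()`) and calling `fig.savefig()` avoids depending on which figure is "current". A sketch with hypothetical toy values and an example file name:

```python
import os
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(["Tuna", "Mackerel"], [22.5, 3.0])   # hypothetical toy values
ax.set_title("Total Weight by Species")
ax.set_ylabel("Weight (kg)")

# Save through the figure object, before any plt.show()
fig.savefig("weight_by_species.png", dpi=150, bbox_inches="tight")
plt.close(fig)  # free the memory once the file is written

print("Saved:", os.path.exists("weight_by_species.png"))
```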

## 📊 Export Model Predictions
We can save predictions from trained models (e.g., RandomForestRegressor) along with the test set.

In [None]:
# Generate predictions
y_pred_rf = rf.predict(X_test)

# Create a DataFrame with actual and predicted values
results = X_test.copy()
results['Actual_Weight'] = y_test.values
results['Predicted_Weight'] = y_pred_rf

# Save results to CSV
results.to_csv("model_predictions.csv", index=False)
print("Model predictions saved as model_predictions.csv")

results.head()

# 🎉 Conclusion

Congratulations! You’ve completed the **Basic Python for Data Science & AI** notebook. Here's what you've accomplished:

✅ Loaded and explored a real-world dataset  
✅ Cleaned, transformed, and visualized the data  
✅ Engineered new features from dates and categories  
✅ Performed basic statistical analysis and identified outliers  
✅ Built simple ML models to predict fish catch weight  
✅ Exported results and plots for reporting

---

### 🧠 Key Takeaways
- Data understanding is critical before modeling
- Feature engineering enhances model performance
- Start simple, then iterate with more complexity
- Visualization reveals hidden patterns
- Exporting your work helps you share insights

---

### 🚀 Next Steps
- Try adding new features (e.g., time of day, location zones)
- Experiment with more ML models or tuning hyperparameters
- Share your insights using dashboards or reports

Happy Learning! 🎓🐟