### **Introduction to `schedule` in Python**

The `schedule` library is a lightweight, easy-to-use Python package for running functions at specified intervals. It suits periodic jobs such as web scraping, sending reminders, or other repetitive automation that should run inside a long-lived Python process.

Jobs can run at fixed intervals (every few seconds, minutes, hours, or days), at a specific time of day, or on specific weekdays.

---

### **Basic Concepts in `schedule`**

- **Job**: A function or task that you want to run on a scheduled basis.
- **Interval**: The frequency at which the task should run (e.g., every 5 seconds, once a day, etc.).
- **Scheduler**: The object that manages and executes scheduled jobs.

---

### **Installing `schedule`**

To use the `schedule` library, you need to install it first. You can do so using pip:

```bash
pip install schedule
```

---

### **How It Works**

You define a function (the job) and use `schedule.every()` to specify when it should run. `schedule.run_pending()` then runs only the jobs that are due at the moment it is called; it does not wait or block. To keep jobs running, call it repeatedly in a loop, pausing briefly between checks.

Here’s a basic structure:

```python
import schedule
import time

def job():
    print("This is a scheduled job.")

# Schedule job to run every minute
schedule.every(1).minute.do(job)

# Loop to keep the script running and check for scheduled jobs
while True:
    schedule.run_pending()  # Execute the jobs that are due
    time.sleep(1)  # Wait for 1 second before checking again
```

### **Common Scheduling Intervals**
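- `seconds`, `minutes`, `hours`, `days`, `weeks`: run at a fixed interval, e.g. `every(5).seconds` or `every(10).minutes`.
- `at(time_string)`: pin the run time within the unit. It only applies to daily, hourly, minutely, or weekday (e.g. `every().monday`) jobs, and the format depends on the unit: `"HH:MM"` or `"HH:MM:SS"` (24-hour clock) for days and weekdays, `"MM:SS"` or `":MM"` for hours, and `":SS"` for minutes.
- `day.at(time_string)`: run once a day at the same time, e.g. `every().day.at("09:00")` for 9:00 AM.
- `to(latest)`: pick a random interval between two bounds, e.g. `every(5).to(10).seconds`.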

### **Key Features of `schedule`**
1. **Interval-based Scheduling**:
   ```python
   schedule.every(10).seconds.do(job)  # Every 10 seconds
   schedule.every().hour.do(job)  # Every hour
   schedule.every().day.at("10:30").do(job)  # Every day at 10:30 AM
   ```
   
2. **Running Multiple Jobs**:
   You can schedule multiple tasks at different intervals, like this:
   ```python
   schedule.every(1).minute.do(job1)
   schedule.every(2).hours.do(job2)
   ```

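3. **Job Cancellation**:
   You can cancel a scheduled job by passing the `Job` object returned by `.do()` to `schedule.cancel_job(job)`. A job can also cancel itself by returning `schedule.CancelJob`, and `schedule.clear()` removes all jobs, or only those carrying a given tag when called as `schedule.clear("tag")`.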

4. **Job at Specific Times**:
   You can schedule jobs to run at specific times of the day:
   ```python
   schedule.every().day.at("14:00").do(job)  # Every day at 2:00 PM
   ```

---

### **Example: Simple Job Every Second**

```python
import schedule
import time

def job():
    print("This message prints every second.")

# Schedule job to run every second
schedule.every(1).seconds.do(job)

while True:
    schedule.run_pending()  # Run scheduled jobs
    time.sleep(1)  # Check for jobs every 1 second
```

---

### **Conclusion**

The `schedule` library is a good choice for simple, periodic tasks within Python applications. It provides a readable, Pythonic interface for defining recurring tasks without complex configuration or external tools. Keep in mind that jobs run inside your Python process and are not persisted, so they run only while the script's loop is running. Within that limit, it works well for automating web scraping, periodic checks, and similar tasks.

---

Please run each script in a separate file. When you are finished, copy your scripts into the cells below for grading.

```python
"""
Objective: Run a job every n seconds
"""

import schedule
import time

# TODO: Create a job function that prints a message
# TODO: Create a schedule that runs every 3 seconds

def job():
    print('job executed')

schedule.every(3).seconds.do(job)

# Check for due jobs once per second until interrupted
while True:
    schedule.run_pending()
    time.sleep(1)
```

```text
job executed
job executed
job executed
job executed
job executed
job executed

KeyboardInterrupt:
```

```python
"""
Objective: Run a job every minute at a specific second
"""
# TODO: Create a job function that prints a message
# TODO: Create a schedule for every minute at the 23rd second
# TODO: Apply a loop to execute pending jobs

import schedule
import time
import datetime

schedule.clear()  # clear jobs left over from previous cells

print(f"count of existing jobs {len(schedule.get_jobs())}\n\n")

def job():
    now = datetime.datetime.now()
    print(f'executed job at {now}')

# For minutely jobs, .at() takes ":SS" (the second within each minute)
schedule.every().minute.at(':23').do(job)

print(f"existing jobs : {len(schedule.get_jobs())}")
print(f"Jobs : {schedule.get_jobs()}")

# Checking only every 10 seconds means a job can start up to 10 seconds late
while True:
    schedule.run_pending()
    time.sleep(10)
```

```text
count of existing jobs 0


existing jobs : 1
Jobs : [Every 1 minute at 00:00:23 do job() (last run: [never], next run: 2025-03-27 17:31:23)]
executed job at 2025-03-27 17:31:23.000014
executed job at 2025-03-27 17:32:23.000010
executed job at 2025-03-27 17:33:23.000012
executed job at 2025-03-27 17:34:23.000022
executed job at 2025-03-27 17:35:23.000020
executed job at 2025-03-27 17:36:23.000016
executed job at 2025-03-27 17:37:23.000011
executed job at 2025-03-27 17:38:23.000010
executed job at 2025-03-27 17:39:23.000011
executed job at 2025-03-27 17:40:23.000015
executed job at 2025-03-27 17:41:23.000012
executed job at 2025-03-27 17:42:23.000015
executed job at 2025-03-27 17:43:23.000016
executed job at 2025-03-27 17:44:23.000010
executed job at 2025-03-27 17:45:23.000011
executed job at 2025-03-27 17:46:23.000017
executed job at 2025-03-27 17:47:23.000013
executed job at 2025-03-27 17:48:23.000012
executed job at 2025-03-27 17:49:23.000017
executed job at 2025-03-27 17:50:23.000016

KeyboardInterrupt:
```

```python
"""
Objective: Run a job at a specific time
"""
# TODO: Create a job function that prints a message
# TODO: Create a daily schedule at a time 1 or 2 minutes after your current time
# TODO: Apply a loop to execute pending jobs

import schedule
import time
import datetime

schedule.clear()  # clear jobs left over from previous cells

print(f"count of existing jobs {len(schedule.get_jobs())}\n\n")

def every_day_at_minute():
    print(f'called on {datetime.datetime.now()}')

# Note: this runs at a random interval of 1 to 2 minutes, not at a fixed time each day
schedule.every(1).to(2).minutes.do(every_day_at_minute)

print(f"existing jobs : {len(schedule.get_jobs())}")
print(f"Jobs : {schedule.get_jobs()}")

while True:
    schedule.run_pending()
    time.sleep(10)
```

```text
count of existing jobs 0


existing jobs : 1
Jobs : [Every 1 to 2 minute do every_day_at_minute() (last run: [never], next run: 2025-03-27 18:59:45)]
called on 2025-03-27 18:59:45.381367
called on 2025-03-27 19:00:45.381496
called on 2025-03-27 19:01:45.381626

KeyboardInterrupt:
```

```python
"""
Objective: Cancel a job
"""
# TODO: Create a job function that prints a message
# TODO: Create a schedule that runs every second
# TODO: Cancel the job and check whether it still runs

import schedule
import datetime
import time

schedule.clear()  # clear jobs left over from previous cells

print(f"count of existing jobs {len(schedule.get_jobs())}\n\n")

def every_seconds_job():
    print(f"job called on{datetime.datetime.now()}")

# Keep the Job returned by .do() so it can be cancelled later
job = schedule.every(1).seconds.do(every_seconds_job)

print(f"existing jobs : {len(schedule.get_jobs())}")
print(f"Jobs : {schedule.get_jobs()}")

n = 1
while True:
    schedule.run_pending()
    time.sleep(5)  # checking every 5 s, so the every-second job runs about every 5 s

    if n >= 5:
        # Remove the job; calling cancel_job again on later iterations is harmless
        schedule.cancel_job(job)

    print(f"existing jobs : {len(schedule.get_jobs())}")
    print(f"Jobs : {schedule.get_jobs()}")
    n += 1
```

```text
count of existing jobs 0


existing jobs : 1
Jobs : [Every 1 second do every_seconds_job() (last run: [never], next run: 2025-03-27 19:06:11)]
existing jobs : 1
Jobs : [Every 1 second do every_seconds_job() (last run: [never], next run: 2025-03-27 19:06:11)]
job called on2025-03-27 19:06:15.202112
existing jobs : 1
Jobs : [Every 1 second do every_seconds_job() (last run: 2025-03-27 19:06:15, next run: 2025-03-27 19:06:16)]
job called on2025-03-27 19:06:20.202628
existing jobs : 1
Jobs : [Every 1 second do every_seconds_job() (last run: 2025-03-27 19:06:20, next run: 2025-03-27 19:06:21)]
job called on2025-03-27 19:06:25.202990
existing jobs : 1
Jobs : [Every 1 second do every_seconds_job() (last run: 2025-03-27 19:06:25, next run: 2025-03-27 19:06:26)]
job called on2025-03-27 19:06:30.203376
existing jobs : 0
Jobs : []

KeyboardInterrupt:
```

```python
"""
Objective: Schedule 2 jobs at once
"""
# TODO: Create 2 job functions that print a message
# TODO: Schedule the first job to run only once
# TODO: Schedule the second job to run every 3 seconds

import schedule
import datetime
import time

schedule.clear()  # clear jobs left over from previous cells

print(f"count of existing jobs {len(schedule.get_jobs())}\n\n")

def first_job():
    print(f"First Job: {datetime.datetime.now()}")
    return schedule.CancelJob  # returning CancelJob removes this job after its first run


def second_job():
    print(f"Second Job : {datetime.datetime.now()}")

schedule.every(1).seconds.do(first_job)
schedule.every(3).seconds.do(second_job)


print(f"existing jobs : {len(schedule.get_jobs())}")
print(f"Jobs : {schedule.get_jobs()}")

while True:
    schedule.run_pending()
    time.sleep(1)
```

```text
count of existing jobs 0


existing jobs : 2
Jobs : [Every 1 second do first_job() (last run: [never], next run: 2025-03-27 19:12:11), Every 3 seconds do second_job() (last run: [never], next run: 2025-03-27 19:12:13)]
First Job: 2025-03-27 19:12:11.657744
Second Job : 2025-03-27 19:12:13.658180
Second Job : 2025-03-27 19:12:16.658803
Second Job : 2025-03-27 19:12:19.659386
Second Job : 2025-03-27 19:12:22.659894
Second Job : 2025-03-27 19:12:25.660383
Second Job : 2025-03-27 19:12:28.660974

KeyboardInterrupt:
```

```python
"""
Objective: Run all jobs once before their first scheduled time
"""
# TODO: Create 2 job functions
# TODO: Create a schedule of every 4 seconds for the first job
# TODO: Create a schedule of every second for the second job
# TODO: Run all jobs once when the code first executes

import schedule
import datetime
import time


schedule.clear()  # clear jobs left over from previous cells

print(f"count of existing jobs {len(schedule.get_jobs())}\n\n")

def job1():
    print(f"job1 => {datetime.datetime.now()}")

def job2():
    print(f"job2 => {datetime.datetime.now()}")


schedule.every(4).seconds.do(job1)
schedule.every(1).seconds.do(job2)


print(f"existing jobs : {len(schedule.get_jobs())}")
print(f"Jobs : {schedule.get_jobs()}")

# run_all() runs every job immediately, then reschedules each one from now
schedule.run_all()

while True:
    schedule.run_pending()
    time.sleep(1)
```

```text
count of existing jobs 0


existing jobs : 2
Jobs : [Every 4 seconds do job1() (last run: [never], next run: 2025-03-27 19:20:17), Every 1 second do job2() (last run: [never], next run: 2025-03-27 19:20:14)]
job1 => 2025-03-27 19:20:13.613950
job2 => 2025-03-27 19:20:13.614385
job2 => 2025-03-27 19:20:14.615007
job2 => 2025-03-27 19:20:15.615328
job2 => 2025-03-27 19:20:16.615581
job1 => 2025-03-27 19:20:17.615894
job2 => 2025-03-27 19:20:17.616092
job2 => 2025-03-27 19:20:18.616288
job2 => 2025-03-27 19:20:19.616619
job2 => 2025-03-27 19:20:20.616904
job1 => 2025-03-27 19:20:21.617190
job2 => 2025-03-27 19:20:21.617362
job2 => 2025-03-27 19:20:22.617529
job2 => 2025-03-27 19:20:23.618006
job2 => 2025-03-27 19:20:24.618374
job1 => 2025-03-27 19:20:25.618935
job2 => 2025-03-27 19:20:25.619044
job2 => 2025-03-27 19:20:26.619207
job2 => 2025-03-27 19:20:27.619504
job2 => 2025-03-27 19:20:28.619706
job1 => 2025-03-27 19:20:29.619933
job2 => 2025-03-27 19:20:29.620059
job2 => 2025-03-27 19:

KeyboardInterrupt:
```

```python
"""
Objective: Run a job at random intervals
"""
# TODO: Create a job function that prints a message
# TODO: Create a schedule for every 5 to 10 seconds
# TODO: Cancel the job and check whether it still runs
import schedule
import time
import datetime

schedule.clear()  # clear jobs left over from previous cells
print(f"count of existing jobs {len(schedule.get_jobs())}\n\n")

def job1():
    print(f"job1 => {datetime.datetime.now()}")


# .to(10) picks a random interval between 5 and 10 seconds each time the job is rescheduled
job = schedule.every(5).to(10).seconds.do(job1)

print(f"existing jobs : {len(schedule.get_jobs())}")
print(f"Jobs : {schedule.get_jobs()}")

n = 0

while True:
    schedule.run_pending()
    time.sleep(1)

    if n >= 15:  # after about 15 checks, cancel the job
        schedule.cancel_job(job)

    n += 1

    print(f"existing jobs : {len(schedule.get_jobs())}")
    print(f"Jobs : {schedule.get_jobs()}")
```

```text
count of existing jobs 0


existing jobs : 1
Jobs : [Every 5 to 10 seconds do job1() (last run: [never], next run: 2025-03-27 19:25:59)]
existing jobs : 1
Jobs : [Every 5 to 10 seconds do job1() (last run: [never], next run: 2025-03-27 19:25:59)]
existing jobs : 1
Jobs : [Every 5 to 10 seconds do job1() (last run: [never], next run: 2025-03-27 19:25:59)]
existing jobs : 1
Jobs : [Every 5 to 10 seconds do job1() (last run: [never], next run: 2025-03-27 19:25:59)]
existing jobs : 1
Jobs : [Every 5 to 10 seconds do job1() (last run: [never], next run: 2025-03-27 19:25:59)]
existing jobs : 1
Jobs : [Every 5 to 10 seconds do job1() (last run: [never], next run: 2025-03-27 19:25:59)]
existing jobs : 1
Jobs : [Every 5 to 10 seconds do job1() (last run: [never], next run: 2025-03-27 19:25:59)]
existing jobs : 1
Jobs : [Every 5 to 10 seconds do job1() (last run: [never], next run: 2025-03-27 19:25:59)]
existing jobs : 1
Jobs : [Every 5 to 10 seconds do job1() (last run: [never], next run: 2025-

KeyboardInterrupt:
```

```python
"""
Objective: Use a decorator to schedule a job
"""

from schedule import every, repeat, run_pending, clear, get_jobs
import time
import datetime

clear()  # clear jobs left over from previous cells
print(f"count of existing jobs {len(get_jobs())}\n\n")

# TODO: Create a job function that prints a message
# @repeat registers the function with the default scheduler as soon as it is defined
@repeat(every(2).seconds)
def my_job():
    print(f"Hello at {datetime.datetime.now()}")


print(f"existing jobs : {len(get_jobs())}")
print(f"Jobs : {get_jobs()}")

while True:
    run_pending()
    time.sleep(1)
```

```text
count of existing jobs 0


existing jobs : 1
Jobs : [Every 2 seconds do my_job() (last run: [never], next run: 2025-03-27 19:31:12)]
Hello at 2025-03-27 19:31:12.667185
Hello at 2025-03-27 19:31:14.667715
Hello at 2025-03-27 19:31:16.668171
Hello at 2025-03-27 19:31:18.668620
Hello at 2025-03-27 19:31:20.669072
Hello at 2025-03-27 19:31:22.669529
Hello at 2025-03-27 19:31:24.670002
Hello at 2025-03-27 19:31:26.670442
Hello at 2025-03-27 19:31:28.670865
Hello at 2025-03-27 19:31:30.671347
Hello at 2025-03-27 19:31:32.671863
Hello at 2025-03-27 19:31:34.672360
Hello at 2025-03-27 19:31:36.672793
Hello at 2025-03-27 19:31:38.673287
Hello at 2025-03-27 19:31:40.673765
Hello at 2025-03-27 19:31:42.674244
Hello at 2025-03-27 19:31:44.674736
Hello at 2025-03-27 19:31:46.675197
Hello at 2025-03-27 19:31:48.675680
Hello at 2025-03-27 19:31:50.676119
Hello at 2025-03-27 19:31:52.676617
Hello at 2025-03-27 19:31:54.677119
Hello at 2025-03-27 19:31:56.677586
Hello at 2025-03-27 19:31:58.678030
Hel

KeyboardInterrupt:
```

#### What is a Decorator?
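A decorator in Python is a function that takes another function and returns a function, usually a new wrapper that extends or alters the original's behavior without changing its code. Writing `@decorator` above `def func(): ...` is shorthand for `func = decorator(func)`. Decorators are a clean, reusable way to add behavior such as timing, logging, or, as with `@repeat` above, registering a function with a scheduler.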

```python
import functools
import time

# Define a decorator that measures the execution time of a function
def timer_decorator(func):
    """
    Measure how long the decorated function takes to run.
    """
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()  # Record the start time (high-resolution clock)
        result = func(*args, **kwargs)  # Call the original function
        end_time = time.perf_counter()  # Record the end time

        elapsed_time = end_time - start_time  # Calculate the time difference
        print(f"{func.__name__} took {elapsed_time:.0f} seconds to execute.")

        return result  # Return the result of the function

    return wrapper

# Example functions that use the timer decorator
@timer_decorator
def add_numbers(a, b):
    time.sleep(2)  # Simulate a time-consuming task
    return a + b

@timer_decorator
def multiply_numbers(a, b):
    time.sleep(2)
    return a * b

# Calling the functions
result = add_numbers(5, 7)
print(f"Result: {result}")

multiply_result = multiply_numbers(5, 7)
print(f"Result: {multiply_result}")
```

```text
add_numbers took 2 seconds to execute.
Result: 12
multiply_numbers took 2 seconds to execute.
Result: 35
```


```python
"""
Objective: Run a scraping job with scheduling
"""
# TODO: Create a script to extract headlines from a news website
# TODO: Add a timestamp for when the data was extracted
# TODO: Add scheduling to run it every hour
# TODO: Make sure to avoid duplicate data
# TODO: Append new data to the previous data instead of overwriting it, in CSV format

import requests
from schedule import every, repeat, run_pending, clear, get_jobs, run_all
import time
import datetime
import pandas as pd
from bs4 import BeautifulSoup as bs

WEB_URL = "https://jateng.tribunnews.com/"
CSV_FILEPATH = "tribun_jateng_headlines.csv"


clear()  # clear jobs left over from previous cells

def fetch_page(url: str) -> str:
    """Return the page HTML, or an empty string if the request fails."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
    }
    try:
        res = requests.get(url=url, headers=headers, timeout=5)
        res.raise_for_status()
        return res.text
    except requests.exceptions.Timeout as e:
        print(f"error timeout : {e}")
    except requests.exceptions.RequestException as e:
        print(f"error : {e}")

    return ""

def extract_data(html):
    """Return the title of the main headline, or None if it is not found."""
    soup = bs(html, "html.parser")
    headline1 = soup.select_one("#headline_1_jateng")

    if headline1:
        return headline1.get("title", "").strip()

    return None

def read_csv(filepath):
    """Load saved headlines, creating the CSV with a header row if it is missing or empty."""
    try:
        df = pd.read_csv(filepath)
        if df.empty:
            raise FileNotFoundError
    except (FileNotFoundError, pd.errors.EmptyDataError):
        # "Waktu" is Indonesian for "time": the extraction timestamp
        df = pd.DataFrame(columns=["No", "Headline", "Waktu"])
        df.to_csv(filepath, index=False)

    return df

def save_data(filepath, data, existing_data):
    """Append one headline row unless it is empty or already saved."""
    if not data:
        return  # nothing extracted (request failed or selector not found)
    if data in existing_data["Headline"].values:
        return  # duplicate headline, skip it

    new_data = pd.DataFrame([{
        "No": len(existing_data["No"]),
        "Headline": data,
        "Waktu": f"{datetime.datetime.now()}"
    }])

    # mode='a' appends to the file; header=False avoids repeating the header row
    new_data.to_csv(filepath, mode='a', header=False, index=False)

def time_elapsed_tracker(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        print(f"function {func.__name__} called at {datetime.datetime.now()}")
        result = func(*args, **kwargs)
        end = time.time()
        print(f"function {func.__name__} finished at {datetime.datetime.now()}")
        time_elapsed = end - start
        print(f"running function {func.__name__} took time {time_elapsed:.0f} seconds")
        return result
    return wrapper

# For hourly jobs, .at("MM:SS") sets the minute and second, so "00:00" runs on the hour
@repeat(every(1).hour.at("00:00"))
@time_elapsed_tracker
def scrape_news_web_job():
    html = fetch_page(WEB_URL)
    data = extract_data(html=html)

    existing_data = read_csv(filepath=CSV_FILEPATH)

    save_data(CSV_FILEPATH, data, existing_data)


def main():
    print(f"existing jobs len {len(get_jobs())} jobs : {get_jobs()}")
    run_all()  # run once immediately instead of waiting for the next full hour
    while True:
        run_pending()  # use the imported function; the `schedule` module itself is not imported here
        time.sleep(1)

if __name__ == "__main__":
    main()
```

```text
existing jobs len 1 jobs : [Every 1 hour at 00:00:00 do wrapper() (last run: [never], next run: 2025-03-27 20:00:00)]
function scrape_news_web_job called at 2025-03-27 19:41:31.685787
function scrape_news_web_job finished at 2025-03-27 19:41:32.510265
running function scrape_news_web_job took time 1 seconds

KeyboardInterrupt:
```
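
In the output above the job is listed as `do wrapper()` rather than `do scrape_news_web_job()`: `schedule` displays the registered function's `__name__`, and `time_elapsed_tracker` returns an inner function named `wrapper`. The standard-library decorator `functools.wraps` copies the original name and docstring onto the wrapper. A minimal stdlib-only comparison, where `plain_tracker` and `wrapped_tracker` are hypothetical names:

```python
import functools

def plain_tracker(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper  # __name__ stays "wrapper"

def wrapped_tracker(func):
    @functools.wraps(func)  # copies __name__, __doc__, __module__, ... from func
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain_tracker
def job_a():
    """Scrape something."""

@wrapped_tracker
def job_b():
    """Scrape something."""

print(job_a.__name__, job_b.__name__)  # wrapper job_b
```

Adding `@functools.wraps(func)` inside `time_elapsed_tracker` should therefore make the job list show `scrape_news_web_job()`.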

### **Reflection**
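When using `schedule`, the terminal running the script must stay open, because jobs run only while the script's loop is running and are not remembered between runs. What problems might occur?

- The terminal closes unexpectedly, stopping the script.
- The system reboots or crashes.
- The process is terminated (for example, killed by the user or the operating system).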

### **Exploration**
Find a way to run the script in the background, or use a cron job instead.