<a href="https://colab.research.google.com/github/subho99/Computational-Data-Science/blob/main/SubhajitBasistha_M3_AST_01_Monitoring_Resources_Psutil_C.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Certification Program in Computational Data Science
## A program by IISc and TalentSprint
### Assignment 1: Monitoring Resources Using Psutil

## Learning Objectives

At the end of the experiment, you will be able to:
 
- Understand what monitoring your device means
- Explore various functions of `psutil` package
- Explore `multiprocessing` package
- Evaluate the advantage of parallelism using Psutil

### What does monitoring your device mean?

Monitoring means keeping track of the various resources in a system and their utilization.

These resources include:
- CPU
- GPU 
- Memory (RAM, Swap space, and Hard disk space)
- Disks 
- Network 
- Sensors

### Why do we want to monitor various resources?

1. Monitoring and measuring help us understand resource allocation better.
2. Regular monitoring lets us evaluate the performance of critical system resources.
3. It helps identify the process that is using the most resources.
4. It helps evaluate whether the current system's resources are sufficient to execute a particular task.
5. It reduces the escalation of issues.

### Psutil

Psutil (process and system utilities) is a cross-platform Python library for retrieving information on running processes and system utilization.
It is used for system monitoring, profiling, limiting process resources, and managing running processes.

Click [here](https://pypi.org/project/psutil/) to proceed to the Psutil project page on PyPI, which links to the official documentation.

Let us explore various functions of Psutil.

### Setup Steps:

In [41]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "2236624" #@param {type:"string"}

In [2]:
#@title Please enter your password (your registered phone number) to continue: { run: "auto", display-mode: "form" }
password = "8240187807" #@param {type:"string"}

In [42]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()
  
notebook= "M3_AST_01_Monitoring_Resources_Psutil_C" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")  
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")
    
    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:        
        print(r["err"])
        return None        
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None
    
    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getComments() and getMentorSupport():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional, 
              "concepts" : Concepts, "record_id" : submission_id, 
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}
      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:        
        print(r["err"])
        return None   
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://cds.iisc.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else:
      # Feedback is incomplete: keep the existing submission id
      return submission_id
    

def getAdditional():
  try:
    if not Additional: 
      raise NameError
    else:
      return Additional  
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None
  
def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None
  
  
# def getWalkthrough():
#   try:
#     if not Walkthrough:
#       raise NameError
#     else:
#       return Walkthrough
#   except NameError:
#     print ("Please answer Walkthrough Question")
#     return None
  
def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None
  

def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError 
    else: 
      return Answer
  except NameError:
    print ("Please answer Question")
    return None
  

def getId():
  try: 
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup 
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup() 
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


### Import required packages

In [4]:
# Importing libraries
import psutil
import platform

### System profile




Here we will explore the Psutil functions that help us learn about the system.

Profile your system to find the system name, the OS version, whether the architecture is 64-bit or 32-bit, the number of physical and logical cores, and the maximum and minimum CPU frequency.

In [5]:
#Windows or Linux
uname = platform.uname()
print(f"System: {uname.system}")  

System: Linux


In [6]:
# System name
print(f"Node Name: {uname.node}") 

Node Name: 01589b08f2b6


In [7]:
# OS release version like  10(Windows) or 5.4.0-72-generic(linux)
print(f"Release: {uname.release}") 

Release: 5.10.147+


In [8]:
print(f"Version: {uname.version}")

Version: #1 SMP Sat Dec 10 16:00:40 UTC 2022


In [9]:
# machine can be AMD64 or x86-64
print(f"Machine: {uname.machine}")  

Machine: x86_64


In [10]:
#  Intel64 Family 6 or x86_64
print(f"Processor: {uname.processor}") 

Processor: x86_64


In [11]:
#Number of physical cores
print("Physical cores:", psutil.cpu_count(logical=False))

Physical cores: 1


In [12]:
print("Total cores:", psutil.cpu_count(logical=True))

Total cores: 2
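
The profile described at the start of this section also mentions the CPU frequency, which can be read with `psutil.cpu_freq()`. The following is a minimal sketch, assuming the platform exposes frequency information; on some containers and virtual machines the call may return `None` or fail, so the sketch falls back to a notice. `describe_cpu_freq` is a hypothetical helper name, not part of Psutil.

```python
import psutil

def describe_cpu_freq():
    """Return a readable summary of the CPU frequency, or a notice if unavailable."""
    try:
        freq = psutil.cpu_freq()  # named tuple (current, min, max), in MHz
    except (AttributeError, NotImplementedError, OSError, RuntimeError):
        freq = None  # assumption: treat any platform failure as "not available"
    if freq is None:
        return "CPU frequency: not available on this system"
    return (f"CPU frequency: current={freq.current:.2f} MHz, "
            f"min={freq.min:.2f} MHz, max={freq.max:.2f} MHz")

print(describe_cpu_freq())
```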


`psutil.cpu_times()` returns the system CPU times as a named tuple. Each field is the number of seconds the CPU has spent in the given mode:

* user – time spent by normal processes executing in user mode.
* system – time spent by processes executing in kernel mode.
* idle – time when the system was idle.
* nice – time spent by niced (prioritized) processes executing in user mode.
* iowait – time spent waiting for I/O to complete. This is not accounted for in the idle time counter.
* irq – time spent servicing hardware interrupts.
* softirq – time spent servicing software interrupts.
* steal – time spent by other operating systems running in a virtualized environment.
* guest – time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.
* guest_nice – time spent running a niced guest.

In [13]:
print(psutil.cpu_times())

scputimes(user=55.64, nice=0.0, system=28.54, idle=326.86, iowait=22.67, irq=0.0, softirq=1.62, steal=0.31, guest=0.0, guest_nice=0.0)


`psutil.cpu_percent()` returns the current system-wide CPU utilization as a percentage. It is recommended to pass a time interval (in seconds) over which the CPU usage is averaged. Without an interval, the function returns immediately and compares against the previous call, so the first value is meaningless and later values can vary widely.

In [14]:
print(psutil.cpu_percent(1))

3.5
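
The same function can report utilization per logical core through its `percpu` parameter. A short sketch, assuming a half-second sampling window is acceptable:

```python
import psutil

# Sample every logical core over the same 0.5-second window.
per_core = psutil.cpu_percent(interval=0.5, percpu=True)
for index, usage in enumerate(per_core):
    print(f"Core {index}: {usage}%")
```

Comparing the per-core values can show whether the load is spread across cores or concentrated on one of them.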


`psutil.cpu_stats()` returns various CPU statistics as a named tuple:

* ctx_switches – number of context switches since boot.
* interrupts – number of interrupts since boot.
* soft_interrupts – number of software interrupts since boot.
* syscalls – number of system calls since boot. Always set to 0 on Linux.

In [15]:
print("CPU Statistics", psutil.cpu_stats())

CPU Statistics scpustats(ctx_switches=806527, interrupts=453445, soft_interrupts=421224, syscalls=0)


In [16]:
print(psutil.boot_time())

1681549940.0


`psutil.boot_time()` returns the system boot time, expressed in seconds since the epoch.
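
A raw epoch timestamp is hard to read. The sketch below converts it to a local date and time and derives the uptime; the output format is an arbitrary choice made for illustration.

```python
import datetime
import psutil

# Convert the epoch timestamp to a local datetime.
boot = datetime.datetime.fromtimestamp(psutil.boot_time())
uptime = datetime.datetime.now() - boot

print("Boot time:", boot.strftime("%Y-%m-%d %H:%M:%S"))
print("Uptime:", str(uptime).split(".")[0])  # drop the microseconds
```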

### Monitoring and Limiting Memory

Virtual memory is the combination of RAM and disk space that the running processes use, while swap space is the portion of virtual memory on the hard disk that is used when the RAM is full.

Despite its name, `psutil.virtual_memory()` reports statistics about physical memory (RAM) as a named tuple, with all sizes in bytes:

* total – total physical memory excluding swap.
* available – the memory that can be given instantly to processes without the system going into swap.
* percent – the percentage usage, calculated as (total – available) / total * 100.
* used – memory used.
* free – memory that is not used at all and is readily available.
* active – memory currently in use or very recently used.
* inactive – memory that is marked as not used.
* buffers – cache for things like file system metadata.
* cached – cache for various things.
* shared – memory that may be accessed simultaneously by multiple processes.

In [17]:
print(psutil.virtual_memory())

svmem(total=13616332800, available=12586950656, percent=7.6, used=722268160, free=8426934272, active=597020672, inactive=4342464512, buffers=335396864, cached=4131733504, shared=1437696, slab=176406528)
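
The `percent` field is documented as (total – available) / total * 100. With the values printed above, (13616332800 – 12586950656) / 13616332800 * 100 rounds to 7.6, which matches the reported figure. A small sketch that repeats the calculation on a live snapshot:

```python
import psutil

vm = psutil.virtual_memory()

# Recompute the usage percentage from the same snapshot.
computed = (vm.total - vm.available) / vm.total * 100
print(f"Reported: {vm.percent}%  Recomputed: {computed:.1f}%")
```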


`psutil.swap_memory()` returns swap memory statistics as a named tuple:

* total – total swap memory in bytes
* used – used swap memory in bytes
* free – free swap memory in bytes
* percent – the percentage usage, calculated as used / total * 100
* sin – the number of bytes the system has swapped in from disk (cumulative)
* sout – the number of bytes the system has swapped out to disk (cumulative)

In [18]:
print(psutil.swap_memory())

sswap(total=0, used=0, free=0, percent=0.0, sin=0, sout=0)


In [19]:
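def get_size(num_bytes, suffix="B"):
    """
    Scale a byte count to a readable string in KB, MB, GB, TB or PB
    """
    factor = 1024
    for unit in ["", "K", "M", "G", "T", "P"]:
        if num_bytes < factor:
            return f"{num_bytes:.2f}{unit}{suffix}"
        num_bytes /= factor
    # Values of 1024 PB or more are reported in exabytes instead of returning None
    return f"{num_bytes:.2f}E{suffix}"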

In [20]:
print("Virtual memory")
svmem = psutil.virtual_memory()
print(f"Total: {get_size(svmem.total)}")
print(f"Available: {get_size(svmem.available)}")
print(f"Used: {get_size(svmem.used)}")
print(f"Percentage: {svmem.percent}%")


Virtual memory
Total: 12.68GB
Available: 11.72GB
Used: 689.19MB
Percentage: 7.6%


In [21]:
# get the swap memory details (if exists)
swap = psutil.swap_memory()
print("SWAP memory")
print(f"Total: {get_size(swap.total)}")
print(f"Free: {get_size(swap.free)}")
print(f"Used: {get_size(swap.used)}")
print(f"Percentage: {swap.percent}%")

SWAP memory
Total: 0.00B
Free: 0.00B
Used: 0.00B
Percentage: 0.0%


### Monitoring and Limiting Hard Disk Space

`psutil.disk_partitions()` returns all mounted disk partitions as a list of named tuples that include the device, mount point, and filesystem type.

In [22]:
print(psutil.disk_partitions())

[sdiskpart(device='/dev/root', mountpoint='/usr/sbin/docker-init', fstype='ext2', opts='ro,relatime', maxfile=255, maxpath=4096), sdiskpart(device='/dev/sda1', mountpoint='/etc/resolv.conf', fstype='ext4', opts='rw,nosuid,nodev,relatime,commit=30', maxfile=255, maxpath=4096), sdiskpart(device='/dev/sda1', mountpoint='/etc/hostname', fstype='ext4', opts='rw,nosuid,nodev,relatime,commit=30', maxfile=255, maxpath=4096), sdiskpart(device='/dev/sda1', mountpoint='/etc/hosts', fstype='ext4', opts='rw,nosuid,nodev,relatime,commit=30', maxfile=255, maxpath=4096)]


`psutil.disk_usage(path)` returns disk usage statistics for the partition that contains the given path as a named tuple. Total, used, and free space are expressed in bytes, along with the percentage usage.

In [23]:
print(psutil.disk_usage('/'))

sdiskusage(total=115658190848, used=24879468544, free=90761945088, percent=21.5)


In [24]:
print( "Hard Disk Information")
print("Partitions and Usage:")
# get all disk partitions on the device
partitions = psutil.disk_partitions()
for partition in partitions:
    print("Device:",partition.device)
    print("Partition Mountpoint: ",partition.mountpoint)
    print("Partition File system type",partition.fstype)
    try:
        partition_usage = psutil.disk_usage(partition.mountpoint)
    except PermissionError:
        continue
    print("Total Size: ", get_size(partition_usage.total))
    print("Used Space: ", get_size(partition_usage.used))
    print("Free hard disk Space", get_size(partition_usage.free))
    print("Hard disk Used Percentage: ", partition_usage.percent, "%")
    if(partition_usage.percent >82):
        print("Disk space nearing full")

Hard Disk Information
Partitions and Usage:
Device: /dev/root
Partition Mountpoint:  /usr/sbin/docker-init
Partition File system type ext2
Total Size:  1.91GB
Used Space:  1.09GB
Free hard disk Space 840.96MB
Hard disk Used Percentage:  57.0 %
Device: /dev/sda1
Partition Mountpoint:  /etc/resolv.conf
Partition File system type ext4
Total Size:  69.65GB
Used Space:  42.02GB
Free hard disk Space 27.62GB
Hard disk Used Percentage:  60.3 %
Device: /dev/sda1
Partition Mountpoint:  /etc/hostname
Partition File system type ext4
Total Size:  69.65GB
Used Space:  42.02GB
Free hard disk Space 27.62GB
Hard disk Used Percentage:  60.3 %
Device: /dev/sda1
Partition Mountpoint:  /etc/hosts
Partition File system type ext4
Total Size:  69.65GB
Used Space:  42.02GB
Free hard disk Space 27.62GB
Hard disk Used Percentage:  60.3 %
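
In the output above, `/dev/sda1` is reported three times because it is mounted at three different paths. The following is a minimal sketch of one way to report each device only once, keeping the first mount point seen; `usage_by_device` is a hypothetical helper name, and keeping the first mount point is an assumption made for illustration.

```python
import psutil

def usage_by_device():
    """Map each device to (mountpoint, usage), keeping the first mountpoint seen."""
    seen = {}
    for part in psutil.disk_partitions():
        if part.device in seen:
            continue  # same device mounted at another path
        try:
            usage = psutil.disk_usage(part.mountpoint)
        except OSError:
            continue  # unreadable mountpoint (PermissionError is an OSError)
        seen[part.device] = (part.mountpoint, usage)
    return seen

for device, (mountpoint, usage) in usage_by_device().items():
    print(f"{device} ({mountpoint}): {usage.percent}% used")
```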


### Monitoring and Limiting Network Usage

All network protocols are associated with a specific address family. An address family provides services like packet fragmentation and reassembly, routing, addressing, and transporting. The address family provides interprocess communication between processes that run on the same system or different systems.

An address family normally comprises several protocols, one per socket type.

Different network address families and their purpose:
* AF_INET: IPv4 Internet protocols
* AF_INET6: IPv6 Internet protocols
* AF_NETLINK: Kernel user interface device
* AF_PACKET: Low-level packet interface

`psutil.net_if_addrs()` returns the addresses associated with each network interface (NIC) as a dictionary that maps interface names to lists of named tuples with the following fields:

* family – the address family: AF_INET, AF_INET6, or `psutil.AF_LINK` (a MAC address; this is AF_PACKET on Linux)
* address – the primary NIC address
* netmask – the netmask address
* broadcast – the broadcast address
* ptp – "point to point"; the destination address on a point-to-point interface

In [25]:
print(psutil.net_if_addrs())

{'lo': [snicaddr(family=<AddressFamily.AF_INET: 2>, address='127.0.0.1', netmask='255.0.0.0', broadcast=None, ptp=None), snicaddr(family=<AddressFamily.AF_PACKET: 17>, address='00:00:00:00:00:00', netmask=None, broadcast=None, ptp=None)], 'eth0': [snicaddr(family=<AddressFamily.AF_INET: 2>, address='172.28.0.12', netmask='255.255.0.0', broadcast='172.28.255.255', ptp=None), snicaddr(family=<AddressFamily.AF_PACKET: 17>, address='02:42:ac:1c:00:0c', netmask=None, broadcast='ff:ff:ff:ff:ff:ff', ptp=None)]}


In [26]:
import socket

print("Network Information")
# get all network interfaces (virtual and physical)
if_addrs = psutil.net_if_addrs()
for interface_name, interface_addresses in if_addrs.items():
    for address in interface_addresses:
        print(" Interface: ", interface_name)
        # Compare enum members directly; their string form differs across Python versions
        if address.family == socket.AF_INET:
            print("  IP Address: ", address.address)
            print("  Netmask: ", address.netmask)
            print("  Broadcast IPv4: ",address.broadcast)
        elif address.family == psutil.AF_LINK:  # MAC address (AF_PACKET on Linux)
            print(f"  MAC Address: {address.address}")
            print(f"  Netmask: {address.netmask}")
            print(f"  Broadcast MAC: {address.broadcast}")
        elif address.family == socket.AF_INET6:
            print("  IP Address: ", address.address)
            print("  Netmask: ", address.netmask)
            print("  Broadcast IPv6: ",address.broadcast)

Network Information
 Interface:  lo
  IP Address:  127.0.0.1
  Netmask:  255.0.0.0
  Broadcast IPv4:  None
 Interface:  lo
  MAC Address: 00:00:00:00:00:00
  Netmask: None
  Broadcast MAC: None
 Interface:  eth0
  IP Address:  172.28.0.12
  Netmask:  255.255.0.0
  Broadcast IPv4:  172.28.255.255
 Interface:  eth0
  MAC Address: 02:42:ac:1c:00:0c
  Netmask: None
  Broadcast MAC: ff:ff:ff:ff:ff:ff


`psutil.net_io_counters()` returns system-wide network I/O statistics as a named tuple, such as bytes sent, bytes received, and the number of incoming or outgoing packets that were dropped.

* bytes_sent – number of bytes sent
* bytes_recv – number of bytes received
* packets_sent – number of packets sent
* packets_recv – number of packets received
* errin – total number of errors while receiving
* errout – total number of errors while sending
* dropin – total number of incoming packets which were dropped
* dropout – total number of outgoing packets which were dropped

In [27]:
print(psutil.net_io_counters())

snetio(bytes_sent=1248786, bytes_recv=1454455, packets_sent=4204, packets_recv=4474, errin=0, errout=0, dropin=0, dropout=0)


In [28]:
net_io = psutil.net_io_counters()
print("Total Bytes Sent: ", get_size(net_io.bytes_sent))
print("Total Bytes Received: ", get_size(net_io.bytes_recv))
print("Total incoming packets dropped:", net_io.dropin)
print("Total outgoing packets dropped: ", net_io.dropout)
print("Total outgoing errors: ", net_io.errout)
print("Total incoming errors:", net_io.errin)

Total Bytes Sent:  1.23MB
Total Bytes Received:  1.43MB
Total outgoing packets dropped:  0
Total incoming packets dropped: 0
Total outgoing errors:  0
Total incoming errors: 0
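
The counters above are cumulative totals. To estimate the current throughput, the counters can be sampled twice and the difference divided by the elapsed time. A minimal sketch under that assumption; `network_rate` is a hypothetical helper name, and the one-second interval is an arbitrary choice.

```python
import time
import psutil

def network_rate(interval=1.0):
    """Return (bytes_sent_per_s, bytes_recv_per_s) averaged over `interval` seconds."""
    before = psutil.net_io_counters()
    time.sleep(interval)
    after = psutil.net_io_counters()
    if before is None or after is None:
        return 0.0, 0.0  # no network interfaces were found
    sent = (after.bytes_sent - before.bytes_sent) / interval
    recv = (after.bytes_recv - before.bytes_recv) / interval
    return sent, recv

sent_rate, recv_rate = network_rate(1.0)
print(f"Upload: {sent_rate:.0f} B/s, Download: {recv_rate:.0f} B/s")
```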


`psutil.net_connections()` returns the system-wide socket connections as a list of named tuples with the following fields:

* fd – the socket file descriptor.
* family – the socket family, either AF_INET, AF_INET6 or AF_UNIX.
* type – the socket type, either SOCK_STREAM, SOCK_DGRAM or SOCK_SEQPACKET.
* laddr – the local address as a (ip, port) named tuple
* raddr – the remote address as a (ip, port) named tuple
* status – represents the status of a TCP connection.
* pid – the PID of the process which opened the socket, if retrievable, else None.

In [29]:
print(psutil.net_connections())

[sconn(fd=-1, family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_DGRAM: 2>, laddr=addr(ip='127.0.0.11', port=39842), raddr=(), status='NONE', pid=None), sconn(fd=8, family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, laddr=addr(ip='172.28.0.12', port=43432), raddr=addr(ip='172.28.0.12', port=9000), status='ESTABLISHED', pid=33), sconn(fd=-1, family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, laddr=addr(ip='172.28.0.12', port=6000), raddr=addr(ip='172.28.0.12', port=49030), status='TIME_WAIT', pid=None), sconn(fd=3, family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, laddr=addr(ip='172.28.0.12', port=6000), raddr=(), status='LISTEN', pid=33), sconn(fd=-1, family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, laddr=addr(ip='172.28.0.12', port=6000), raddr=addr(ip='172.28.0.12', port=43914), status='TIME_WAIT', pid=None), sconn(fd=10, family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, laddr=addr(ip=
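
The raw list is long and hard to scan. The following sketch summarizes it by counting connections per status with `collections.Counter`. `connection_summary` is a hypothetical helper name, and restricting the query to `kind="inet"` (IPv4 and IPv6 sockets) is an assumption made to keep the output short.

```python
from collections import Counter
import psutil

def connection_summary():
    """Count system-wide inet connections by status (e.g. ESTABLISHED, LISTEN)."""
    try:
        connections = psutil.net_connections(kind="inet")
    except (psutil.AccessDenied, PermissionError):
        # Some platforms require elevated privileges for this call
        return Counter()
    return Counter(conn.status for conn in connections)

for status, count in connection_summary().most_common():
    print(f"{status}: {count}")
```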

### Exploring Multiprocessing

One of a developer's key objectives is to make code run faster. However, many tasks take a long time to process, even on fast computers with several cores. This is partly because of the Python GIL (Global Interpreter Lock), which allows only one thread at a time to control the Python interpreter, so simply executing a function or method never uses the full power of the machine.

### Comparison between synchronous and multiprocessing code.

The test in this case is a simple HTTP GET request to the public API at https://httpbin.org. It offers many services, but the one used here is `uuid`, which returns a unique 36-character string every time you make a GET request.

Data can be fetched from the Internet in various ways. To keep things simple, the `requests` module is used here.

## Synchronous code

Synchronous code is a sequence of tasks executed one after the other. Before task 3 starts, the machine has to wait for task 2 to finish, which in turn has to wait for task 1. This is not an efficient way of doing things, but sometimes there is no other way. Synchronous code is easy to understand and easy to write.

In [30]:
import time
import requests

URL = 'https://httpbin.org/uuid'

t0 = time.time()

for _ in range(300):
    response = requests.get(URL)

print(f'It took {time.time() - t0:.2f} s to finish the request')

It took 51.16 s to finish the request


The code sends 300 GET requests to the API one after another, each waiting for the previous response before starting. Before the first request, the variable `t0` is set to the start time. After the `for` loop exits, the time taken to finish all the requests is displayed.
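
One way to see why the sequential loop is slow is to compare the wall-clock time with the CPU time the process actually consumed: a task that mostly waits, as a network request does, uses very little CPU. The following is a minimal sketch using `psutil.Process().cpu_times()`. It uses `time.sleep` as a stand-in for a blocking request so that it runs without network access, and `measure` is a hypothetical helper name.

```python
import time
import psutil

def measure(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    proc = psutil.Process()  # the current Python process
    cpu_before = proc.cpu_times()
    t0 = time.perf_counter()
    result = func(*args)
    wall = time.perf_counter() - t0
    cpu_after = proc.cpu_times()
    # CPU time spent in user mode plus kernel mode during the call
    cpu = ((cpu_after.user - cpu_before.user)
           + (cpu_after.system - cpu_before.system))
    return result, wall, cpu

# time.sleep stands in for a request that spends its time waiting
_, wall, cpu = measure(time.sleep, 0.2)
print(f"Wall time: {wall:.2f} s, CPU time: {cpu:.2f} s")
```

A large gap between the two numbers suggests that the task is waiting rather than computing, which is the kind of work that benefits from running several requests at once.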

## Multiprocessing code

Multiprocessing code can leverage modern CPUs by [spawning processes](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods). If the CPU has 8 cores, 8 different processes can be spawned to attempt 8 different tasks simultaneously. The `ProcessPoolExecutor` class from the `concurrent.futures` module is used here. It is an `Executor` subclass that executes calls asynchronously using a pool of processes. One benefit is that `executor.map` returns an iterator over the results instead of a list, so the results can be consumed by looping over it or by calling `next()` on it.

In [31]:
import time
import requests
from concurrent.futures import ProcessPoolExecutor

def fetch(url):
    response = requests.get(url)
    return response.json()

if __name__ == '__main__':
    
    URL = 'https://httpbin.org/uuid'
    t0 = time.time()
    
    with ProcessPoolExecutor() as executor:
        results = executor.map(fetch, [URL] * 300)  # iterator over the 300 responses
   
    print(f'It took {time.time() - t0:.2f} secs to finish multiprocessing')

It took 23.81 secs to finish multiprocessing


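As seen from the results above, a few extra lines of code give a clear improvement: the multiprocessing version finished in 23.81 s, compared with 51.16 s for the synchronous version, which is roughly twice as fast on this machine with 2 logical cores.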

### Please answer the questions below to complete the experiment:

In [32]:
#@title Which of the following describes spawning? { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method" #@param ["", "The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method", "The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process", "A server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process"]

In [33]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good, But Not Challenging for me" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [34]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "It was a good exercise" #@param {type:"string"}


In [35]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["","Yes", "No"]


In [36]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [37]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [43]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 4146
Date of submission:  15 Apr 2023
Time of submission:  15:00:04
View your submissions: https://cds.iisc.talentsprint.com/notebook_submissions
