#### HTML-Based Method



In this method, we will scrape the data directly from the HTML structure of the webpage.

1. Inspect the Web Page: Use your browser's developer tools to inspect the structure of the webpage and identify the HTML elements that contain the information you need. You'll typically look for elements such as `div`, `table`, or `ul` whose classes or IDs indicate where the employee information is located.

2. HTML Parsing: You can use a library like BeautifulSoup in Python to parse the HTML and extract the required information. Here's a sample Python script to get you started:

In [11]:
import csv

import requests
from bs4 import BeautifulSoup

url = "https://www.svnit.ac.in/web/department/computer/faculty.php"
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early on a 4xx/5xx response
soup = BeautifulSoup(response.text, "html.parser")


def text_of(parent, tag, class_name):
    """Return the stripped text of the first matching child, or "" if absent."""
    element = parent.find(tag, class_=class_name)
    return element.get_text(strip=True) if element else ""


# Find the HTML elements that contain employee information.
# NOTE: the class names used here are illustrative; adjust them to match the
# markup you found while inspecting the page.
faculty_list = soup.find_all("div", class_="faculty-list")

# Create a CSV file to save the data
with open("faculty_data_html.csv", "w", newline="", encoding="utf-8") as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerow(["Name", "Office Email", "Personal Email", "Phone Number", "Designation", "Highest Degree", "Research Area", "Webpage Link"])

    for faculty in faculty_list:
        # Extract the information for each faculty member and write it to the CSV file.
        # Missing elements become empty strings instead of raising AttributeError.
        heading = faculty.find("h4")
        name = heading.get_text(strip=True) if heading else ""
        office_email = text_of(faculty, "a", "contact-fac-email")
        personal_email = text_of(faculty, "a", "contact-per-email")
        phone_number = text_of(faculty, "a", "contact-phone")
        designation = text_of(faculty, "p", "contact-desig")
        highest_degree = text_of(faculty, "p", "contact-degree")
        research_area = text_of(faculty, "p", "contact-area")
        link = faculty.find("a", class_="contact-web")
        webpage_link = link.get("href", "") if link else ""

        csv_writer.writerow([name, office_email, personal_email, phone_number, designation, highest_degree, research_area, webpage_link])

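The class names in the script above (`faculty-list`, `contact-fac-email`, and so on) only work if the page actually uses them. As a complement to inspecting the page by hand, the following minimal sketch counts which classes appear on `div`, `table`, and `ul` elements, so repeated containers stand out. The sample markup and the helper name `count_container_classes` are hypothetical and only serve the illustration; on a real page you would pass `response.text` instead.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a faculty page; the real page may differ.
SAMPLE_HTML = """
<div class="page-header"><h1>Faculty</h1></div>
<div class="faculty-card"><h4>Dr. A. Example</h4></div>
<div class="faculty-card"><h4>Dr. B. Example</h4></div>
<ul class="nav"><li>Home</li></ul>
"""


def count_container_classes(html, tags=("div", "table", "ul")):
    """Count how often each (tag, class) pair appears in the document."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for element in soup.find_all(list(tags)):
        # BeautifulSoup returns the class attribute as a list of names.
        for class_name in element.get("class", []):
            counts[(element.name, class_name)] += 1
    return counts


counts = count_container_classes(SAMPLE_HTML)
for (tag, class_name), n in counts.most_common():
    print(f"{tag}.{class_name}: {n}")
```

A class that repeats roughly once per faculty member is a reasonable candidate for the container to pass to `find_all`.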


#### API-Based Method

If the website provides an API to fetch the data, it's a more reliable and structured way to extract information. However, not all websites offer APIs.

1. API Inspection: Check if the website provides an API to access the faculty data. You can do this by inspecting the network traffic in your browser's developer tools when the page loads or by looking for an API endpoint in the webpage source code.

2. API Request: Use a programming language like Python to send an HTTP GET request to the API endpoint and retrieve the data in JSON format.

3. Data Extraction and CSV Writing: Extract the required information from the JSON response and save it to a CSV file, similar to the HTML-based method.
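For step 1, one option besides watching network traffic is to scan the page source for quoted strings that look like endpoint URLs. The sketch below is a rough heuristic, not a reliable detector: the sample source, the helper name `find_candidate_endpoints`, and the choice of `/api/` and `.json` as hints are all assumptions made for illustration.

```python
import re

# Hypothetical page source; the URLs are placeholders, not real endpoints.
SAMPLE_SOURCE = """
<script>
  fetch("/api/faculty?dept=computer").then(r => r.json());
  const logo = "/images/logo.png";
  const feed = "https://www.example.com/data/staff.json";
</script>
"""

# Heuristic: quoted strings that contain "/api/" or end in ".json".
ENDPOINT_PATTERN = re.compile(r"""["']([^"']*(?:/api/[^"']*|\.json))["']""")


def find_candidate_endpoints(source):
    """Return quoted strings in the page source that look like API URLs."""
    return sorted(set(ENDPOINT_PATTERN.findall(source)))


for candidate in find_candidate_endpoints(SAMPLE_SOURCE):
    print(candidate)
```

Any candidate found this way still needs to be checked by hand, for example by opening it in the browser and confirming that it returns JSON.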

Here's a template for making an API request in Python (assuming there's an API):

In [12]:
import requests
import csv

api_url = "https://www.example.com/api/faculty"

response = requests.get(api_url)

if response.status_code == 200:
    try:
        data = response.json()
        
        # Check if the JSON response is a list (assumes data is an array of faculty)
        if isinstance(data, list):
            with open("faculty_data_api.csv", "w", newline="") as csvfile:
                csv_writer = csv.writer(csvfile)
                csv_writer.writerow(["Name", "Office Email", "Personal Email", "Phone Number", "Designation", "Highest Degree", "Research Area", "Webpage Link"])
                
                for faculty in data:
                    name = faculty.get("name", "")
                    office_email = faculty.get("office_email", "")
                    personal_email = faculty.get("personal_email", "")
                    phone_number = faculty.get("phone_number", "")
                    designation = faculty.get("designation", "")
                    highest_degree = faculty.get("highest_degree", "")
                    research_area = faculty.get("research_area", "")
                    webpage_link = faculty.get("webpage_link", "")
                    
                    csv_writer.writerow([name, office_email, personal_email, phone_number, designation, highest_degree, research_area, webpage_link])
        else:
            print("API response is not in the expected format (list of faculty).")
    except ValueError:
        print("API response is not valid JSON.")
else:
    print("API request failed with status code:", response.status_code)


API request failed with status code: 404
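The 404 is expected here, because `https://www.example.com/api/faculty` is only a placeholder URL. To try the extraction step without a live endpoint, the sketch below runs the same JSON-to-CSV conversion on an in-memory payload. The payload, the key names, and the helper `records_to_csv` are hypothetical; a real API may use different keys or nest the list inside another object.

```python
import csv
import io
import json

# Hypothetical payload; a real API may use different keys or nesting.
SAMPLE_JSON = """
[
  {"name": "Dr. A. Example", "office_email": "a@example.com", "designation": "Professor"},
  {"name": "Dr. B. Example", "phone_number": "000-000-0000"}
]
"""

# Maps the assumed JSON keys to the CSV column headers used in this notebook.
COLUMNS = {
    "name": "Name",
    "office_email": "Office Email",
    "personal_email": "Personal Email",
    "phone_number": "Phone Number",
    "designation": "Designation",
    "highest_degree": "Highest Degree",
    "research_area": "Research Area",
    "webpage_link": "Webpage Link",
}


def records_to_csv(records, out):
    """Write a list of dicts to `out` as CSV, leaving missing keys blank."""
    writer = csv.DictWriter(out, fieldnames=list(COLUMNS.values()))
    writer.writeheader()
    for record in records:
        writer.writerow({header: record.get(key, "") for key, header in COLUMNS.items()})


buffer = io.StringIO()
records_to_csv(json.loads(SAMPLE_JSON), buffer)
print(buffer.getvalue())
```

Writing to an `io.StringIO` buffer keeps the example free of side effects; passing an open file object instead produces the same CSV on disk.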


In [13]:
import requests 

# Making a GET request 
r = requests.get('https://www.svnit.ac.in/web/department/computer/faculty.php') 
# success code - 200 
print(r) 
# print content of request 
print(r.content)


<Response [200]>
b'<!DOCTYPE html>\n<html lang="en-US">\n<style type="text/css">\n\ta.btn-orange, .btn-orange:active, .btn-orange:focus, .btn-blue, .btn-blue:active, .btn-blue:focus, button, input[type="button"], input[type="submit"], .woocommerce button.button, .woocommerce input.button, .i-email-subscribe, .footer-widget .wpcf7-form .wpcf7-submit, .navbar li.pull-right a.woo-menu-cart span, #wp-submit\n\t\tbackground-color:  #2ab7ed99;\n\t}\n \n</style>\n<!-- <style type="text/css">\n    .faculty{\n        width: 95%;\n        margin-left: 5%;\n        margin-right: 5%;\n        padding-left: 5%;\n        padding-right: 15%;\n\n    }\n</style> -->\n<!DOCTYPE html>\r\n<html lang="en-US">\r\n\r\n\r\n<meta http-equiv="content-type" content="text/html;charset=UTF-8" />\r\n<head>\r\n<base >\r\n<meta charset="UTF-8">\r\n<meta name="viewport" content="width=device-width, initial-scale=1">\r\n\r\n<link rel="icon"  type="image/png" href="//www.svnit.ac.in/images/logo.png">\r\n<title>SVNIT, Su

In [14]:
# print request object 
print(r.url) 
    
# print status code 
print(r.status_code)

https://www.svnit.ac.in/web/department/computer/faculty.php
200


In [15]:

import requests 
from bs4 import BeautifulSoup 
  
# check status code for response received 
# success code - 200 
print(r) 
  
# Parsing the HTML 
soup = BeautifulSoup(r.content, 'html.parser') 
print(soup.prettify()) 

<Response [200]>
<!DOCTYPE html>
<html lang="en-US">
 <style type="text/css">
  a.btn-orange, .btn-orange:active, .btn-orange:focus, .btn-blue, .btn-blue:active, .btn-blue:focus, button, input[type="button"], input[type="submit"], .woocommerce button.button, .woocommerce input.button, .i-email-subscribe, .footer-widget .wpcf7-form .wpcf7-submit, .navbar li.pull-right a.woo-menu-cart span, #wp-submit
		background-color:  #2ab7ed99;
	}
 </style>
 <!-- <style type="text/css">
    .faculty{
        width: 95%;
        margin-left: 5%;
        margin-right: 5%;
        padding-left: 5%;
        padding-right: 15%;

    }
</style> -->
 <!DOCTYPE html>
 <html lang="en-US">
  <meta content="text/html;charset=utf-8" http-equiv="content-type"/>
  <head>
   <base/>
   <meta charset="utf-8"/>
   <meta content="width=device-width, initial-scale=1" name="viewport"/>
   <link href="//www.svnit.ac.in/images/logo.png" rel="icon" type="image/png"/>
   <title>
    SVNIT, Surat
   </title>
   <title>
    

In [None]:
# Lists to collect one value per faculty member
name = []
email = []
designation = []
mob = []

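A minimal sketch of how parallel lists like these might be filled, assuming each faculty member sits in its own container element. The markup, the class names (`faculty-card`, `email`, `designation`, `phone`), and the helper `text_or_blank` are hypothetical stand-ins for whatever the inspected page actually uses.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; tag and class names are placeholders.
SAMPLE_HTML = """
<div class="faculty-card">
  <h4>Dr. A. Example</h4>
  <a class="email" href="mailto:a@example.com">a@example.com</a>
  <p class="designation">Professor</p>
  <a class="phone">000-000-0000</a>
</div>
<div class="faculty-card">
  <h4>Dr. B. Example</h4>
  <p class="designation">Assistant Professor</p>
</div>
"""


def text_or_blank(parent, tag, class_name=None):
    """Return the stripped text of the first matching child, or ''."""
    if class_name is None:
        element = parent.find(tag)
    else:
        element = parent.find(tag, class_=class_name)
    return element.get_text(strip=True) if element else ""


names, emails, designations, phones = [], [], [], []

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for card in soup.find_all("div", class_="faculty-card"):
    # Append to every list on each pass so the lists stay aligned by index.
    names.append(text_or_blank(card, "h4"))
    emails.append(text_or_blank(card, "a", "email"))
    designations.append(text_or_blank(card, "p", "designation"))
    phones.append(text_or_blank(card, "a", "phone"))

for row in zip(names, emails, designations, phones):
    print(row)
```

Appending a blank for missing fields matters with parallel lists: if one list is skipped for a faculty member, every later row shifts out of alignment.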