# Web Scraping Code
### Assignment 2 | CIS 3389
Program Authored by Rowan Rollman

In [1]:
```
from bs4 import BeautifulSoup
import pandas as pd
import requests
```

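In [2]:
```
# Download the directory page; the timeout stops a stalled request from hanging forever,
# and raise_for_status() raises an error on a 4xx/5xx response instead of parsing an error page
page = requests.get('https://www.math.txst.edu/directory.html', timeout=30)
page.raise_for_status()
soup = BeautifulSoup(page.content, 'html.parser')
```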

The next code cell works as follows:
* Find every element with the class "info-container". Each one holds the information for a single faculty member.
* For each info-container, do the following:
    * Look up the person's name, position, email, office, phone number, and pronouns.
    * If the tag for a field was found (`find()` did not return `None`), do the following:
        * Get its text (without the HTML tags) and strip it, which removes leading and trailing whitespace such as newline characters.
        * Add this string to the appropriate list.
    * If the tag was not found (`find()` returned `None`), append 'N/A' to the appropriate list instead. This ensures that all the lists stay the same length.
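The steps above can be sketched as a small helper that runs against an inline HTML snippet instead of the live page. The markup here is a hypothetical stand-in that only reuses this notebook's class names (`info-container`, `listitem-title`, `listitem-email`); the real directory page may be structured differently, and `field_text` and `SAMPLE_HTML` are illustrative names, not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names mirror the real directory page
SAMPLE_HTML = """
<div class="info-container">
  <span class="listitem-title">Dr. Ada Example</span>
  <span class="listitem-email">Email: ada@example.edu</span>
</div>
<div class="info-container">
  <span class="listitem-title">Sam Sample</span>
</div>
"""

def field_text(container, class_name, label=""):
    """Return the stripped text of the first element with class_name, or 'N/A'."""
    tag = container.find(class_=class_name)
    if tag is None:
        # Field missing: use a placeholder so every list keeps the same length
        return "N/A"
    text = tag.get_text()  # text only, no HTML tags
    if label:
        text = text.replace(label, "")  # drop a label such as "Email: "
    return text.strip()

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
names, emails = [], []
for c in soup.find_all(class_="info-container"):
    names.append(field_text(c, "listitem-title"))
    emails.append(field_text(c, "listitem-email", label="Email: "))

print(names)   # ['Dr. Ada Example', 'Sam Sample']
print(emails)  # ['ada@example.edu', 'N/A']
```

Folding the None check into one helper keeps each field to a single line; the notebook's longer, explicit version does the same work field by field.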

The email, office, and phone number sections also do some string manipulation. The text of each of these tags includes a label, so it shows up as "Email: ...", "Office: ...", or "Phone: ...", and we only want the value after the label.

First, we call `get_text()` on the tag so that we work with its text rather than the HTML tags. (`get_text()` already returns a `str`, so the `str()` wrapper is redundant but harmless.)
```
temp = str(temp.get_text())
```
Then, we replace the label we don't want with an empty string.
```
temp = temp.replace('Email: ','')
```
Finally, we strip the string, which removes any leading or trailing whitespace such as newline characters, and append it to the `emails` list.
```
emails.append(temp.strip())
```
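One caveat with `replace('Email: ', '')` is that it removes the label wherever it appears in the text, not only at the start. On Python 3.9 or later, `str.removeprefix` removes it only when the string begins with it. A minimal comparison using made-up strings (not scraped data):

```python
raw = "  Email: jdoe@example.edu\n"

# Notebook approach: remove every occurrence of the label, then strip whitespace
via_replace = raw.replace("Email: ", "").strip()

# Alternative (Python 3.9+): strip first so the label really is a prefix, then drop it
via_prefix = raw.strip().removeprefix("Email: ")

print(via_replace)  # jdoe@example.edu
print(via_prefix)   # jdoe@example.edu

# The two differ only when the label text also appears later in the value
odd = "Email: Email: list@example.edu"
print(odd.replace("Email: ", ""))   # list@example.edu
print(odd.removeprefix("Email: "))  # Email: list@example.edu
```

For this directory both give the same result, so `replace` is fine; `removeprefix` just states the intent more precisely.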

In [3]:
```
# Each "info-container" element holds one faculty member's details
container = soup.find_all(class_='info-container')
names = []
positions = []
emails = []
office = []
phone = []
pronouns = []
for c in container:
    # This gets the person's name
    temp = c.find(class_='listitem-title')
    if temp is not None:
        names.append(temp.get_text().strip())
    else:
        names.append('N/A')
    # This gets the person's position
    temp = c.find(class_='listitem-position')
    if temp is not None:
        positions.append(temp.get_text().strip())
    else:
        positions.append('N/A')
    # This gets the person's email (dropping the "Email: " label)
    temp = c.find(class_='listitem-email')
    if temp is not None:
        temp = str(temp.get_text())
        temp = temp.replace('Email: ', '')
        emails.append(temp.strip())
    else:
        emails.append('N/A')
    # This gets the person's office (dropping the "Office: " label)
    temp = c.find(class_='listitem-office')
    if temp is not None:
        temp = str(temp.get_text())
        temp = temp.replace('Office: ', '')
        office.append(temp.strip())
    else:
        office.append('N/A')
    # This gets the person's phone (dropping the "Phone: " label)
    temp = c.find(class_='listitem-phone')
    if temp is not None:
        temp = str(temp.get_text())
        temp = temp.replace('Phone: ', '')
        phone.append(temp.strip())
    else:
        phone.append('N/A')
    # This gets the person's pronouns, if they are listed
    temp = c.find(class_='listitem-pronouns')
    if temp is not None:
        pronouns.append(temp.get_text().strip())
    else:
        pronouns.append('N/A')
```

In [4]:
```
# Build a dictionary 'data': each key is a column name and each value is one of the lists from earlier
data = {'Names': names, 'Pronouns': pronouns, 'Position': positions,
        'Email': emails, 'Phone Number': phone, 'Office': office}
```
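The 'N/A' placeholders matter at this step: building a `pd.DataFrame` from a dictionary of lists requires every list to have the same length. A small sketch with made-up values shows what happens when one list comes up short:

```python
import pandas as pd

sample_names = ["A. Person", "B. Person"]
emails_ok = ["a@example.edu", "N/A"]  # placeholder keeps the lengths equal
emails_short = ["a@example.edu"]      # a skipped entry leaves this list one shorter

df = pd.DataFrame({"Names": sample_names, "Email": emails_ok})
print(df.shape)  # (2, 2)

raised = False
try:
    pd.DataFrame({"Names": sample_names, "Email": emails_short})
except ValueError as err:
    # pandas refuses to build columns of unequal length
    raised = True
    print("ValueError:", err)
```

Even when the frame could be built, a missing entry would shift every later value into the wrong row, so appending 'N/A' keeps each row aligned with the right person.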


In [5]:
```
# Convert the dictionary to a DataFrame and display it (a notebook shows only the first and last rows of a long frame)
d = pd.DataFrame(data)
d
```

| | Names | Pronouns | Position | Email | Phone Number | Office |
|---|---|---|---|---|---|---|
| 0 | Michael Q Abili | He/Him/His/Himself | Lecturer | m.abili@txstate.edu | (512) 245-2551 | MCS 462 |
| 1 | Carlos I Acevedo | | | cia24@txstate.edu | | |
| 2 | Dr. Connor T Ahlbach | | Lecturer | c_a518@txstate.edu | (512) 245-4021 | ELTB B312 |
| 3 | Dr. Weam Al-Tameemi | | Lecturer | wma22@txstate.edu | (512) 245-6179 | DERR 211 |
| 4 | Jenna R Ashby | | | j_a802@txstate.edu | | |
| ... | ... | ... | ... | ... | ... | ... |
| 128 | Elizabeth Wrightsman | She/Her/Hers/Herself | | ewrightsman@txstate.edu | | Derrick |
| 129 | Dr. Yong Yang | | Associate Professor | yang@txstate.edu | (512) 245-3742 | MCS 461 |
| 130 | Dr. Mohammad Zarrin | | Lecturer | m.zarrin@txstate.edu | (512) 245-2551 | MCS 462 |
| 131 | Dr. Qiang Zhao | | Associate Professor | qiang.zhao@txstate.edu | (512) 245-3737 | MCS 481 |


In [6]:
```
# Save the data to a CSV file (by default the DataFrame index is written as the first column)
d.to_csv('WebScrapingData_group11.csv')
```
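If the CSV is later loaded back with `pd.read_csv`, two pandas defaults are worth knowing: `to_csv` writes the index as an unlabeled first column, which `read_csv` reports as `Unnamed: 0` unless `index_col=0` is passed, and `read_csv` treats the string `N/A` as missing (NaN) unless `keep_default_na=False` is passed. A round-trip sketch with a made-up frame and a temporary file (not the scraped data):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Names": ["A. Person", "B. Person"],
                   "Pronouns": ["N/A", "She/Her"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    df.to_csv(path)  # index written as an unnamed first column

    # Default read: the index becomes 'Unnamed: 0' and 'N/A' becomes NaN
    default = pd.read_csv(path)
    print(list(default.columns))                # ['Unnamed: 0', 'Names', 'Pronouns']
    print(default["Pronouns"].isna().tolist())  # [True, False]

    # Use the first column as the index and keep 'N/A' as a literal string
    restored = pd.read_csv(path, index_col=0, keep_default_na=False)
    print(list(restored.columns))           # ['Names', 'Pronouns']
    print(restored["Pronouns"].tolist())    # ['N/A', 'She/Her']
```

Alternatively, writing with `to_csv(..., index=False)` leaves the row numbers out of the file entirely.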