<a href="https://colab.research.google.com/github/jhayesn13/Test/blob/main/Dictionary_Working_chatgpt_web_crawler.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

#Create webcrawler class
class WebCrawler:
   def __init__(self, start_url, visiting_strategy='preorder'):
       self.start_url = start_url
       self.visiting_strategy = visiting_strategy.lower()
       self.visited_urls = set()
       self.corpus = {}
       self.main_domain = urlparse(start_url).netloc

   def crawl(self, url, depth=0):
       if depth > 10:  # Limiting depth to avoid potential infinite loops
           print(f"Reached maximum depth for {url}")
           return

       if url not in self.visited_urls and self.is_same_domain(url):
           print(f"Visiting: {url}")
           self.visited_urls.add(url)
           try:
               response = requests.get(url)
               soup = BeautifulSoup(response.content, 'html.parser')
               title = soup.title.string.strip() if soup.title else 'Untitled'
               text_content = self.extract_text_content(soup)
               self.corpus[title] = text_content
               print(f"Text Content: {text_content[:100]}...")  # Output a snippet of text

               if self.visiting_strategy == 'preorder':
                   links = self.extract_links(soup)
                   for link in links:
                       self.crawl(link, depth + 1)

               # Additional visiting strategies (inorder, postorder) can be implemented here

           except Exception as e:
               print(f"Error crawling {url}: {e}")

   def extract_text_content(self, soup):
       # Extract text content only from the body of the HTML
       text_content = ' '.join([p.get_text(separator=' ', strip=True) for p in soup.body.find_all('p')])
       return text_content

   def extract_links(self, soup):
       # Extract all links from the page
       links = [link.get('href') for link in soup.find_all('a', href=True)]
       # Filter internal links only
       links = [urljoin(self.start_url, link) for link in links if link.startswith(('http', 'https'))]
       # Exclude PDF links
       links = [link for link in links if not link.endswith('.pdf')]
       # Filter out external links
       links = [link for link in links if self.is_same_domain(link)]
       # Exclude links with 'resources' in the URL
       links = [link for link in links if 'resources' not in link.lower()]
       return links

   def is_same_domain(self, url):
       return urlparse(url).netloc == self.main_domain

   def start_crawling(self):
       self.crawl(self.start_url)
       self.save_corpus()

   def save_corpus(self):
       # Save the extracted corpus to text files
       for title, content in self.corpus.items():
           with open(f"{title}.txt", 'w', encoding='utf-8') as file:
               file.write(content)

if __name__ == "__main__":
   # Get the starting URL from the user
   start_url = input("Enter the website's URL: ")

   # Instantiate the WebCrawler with the provided URL and visiting strategy
   crawler = WebCrawler(start_url=start_url, visiting_strategy='preorder')

   # Start crawling
   crawler.start_crawling()

Enter the website's URL: https://www.stjohns.edu/
Visiting: https://www.stjohns.edu/
Text Content: See how your journey aligns with what drives you. Hannah M. Queens, NY Maria Orlando, FL Jenna Charl...
Visiting: https://www.stjohns.edu/life-st-johns/career-services
Text Content: University Career Services is committed to ensuring student and alumni success. Our dedicated team o...
Visiting: https://www.stjohns.edu/life-st-johns/career-services/leadership-development
Text Content: Developing leadership skills and giving back to the community are essential components to the educat...
Visiting: https://www.stjohns.edu/uis
Text Content: HELP | EXIT  You can set up a tuition payment plan safely and securely online via your UIS account. ...
Visiting: https://www.stjohns.edu/about/leadership-and-administration/office-president/presidents-society
Text Content: Founded in 1968, the President’s Society honors those students who combine scholarship, integrity, m...
Visiting: https://www.stjohns.



Error crawling https://www.stjohns.edu/files/phd-curriculum-and-instruction-essay-prompt-instructions: 'NoneType' object has no attribute 'find_all'
Visiting: https://www.stjohns.edu/law/academics
Text Content: J.D. Programs LL.M. Programs Clinics Centers Co-Curricular Programs Study Abroad Course Catalog Acad...
Visiting: https://www.stjohns.edu/law/apply
Text Content: We seek to identify and admit a diverse group of talented J.D. students who will succeed at St. John...
Visiting: https://www.stjohns.edu/law/give
Text Content: Please note that this form is optimized for use with the current version of the Chrome browser. If y...
Visiting: https://www.stjohns.edu/law/law-career-development
Text Content: Current Students Alumni Employers Externship Program Career Development Team Graduate Employment Dat...
Reached maximum depth for https://www.stjohns.edu/law/apply
Reached maximum depth for https://www.stjohns.edu/law/give
Reached maximum depth for https://www.stjohns.edu/law/law-career



Error crawling https://www.stjohns.edu/sites/default/files/2020-03/M1-12853%20MA%20Chinese%20%26%20EAS.PDF: 'NoneType' object has no attribute 'find_all'
Visiting: https://www.stjohns.edu/media/64841
Text Content: 8000 Utopia Parkway Queens NY 11439 718-990-2000 St. John’s University does not discriminate on the ...
Visiting: https://www.stjohns.edu/academics/schools/college-professional-studies
Text Content: The Lesley H. and William L. Collins College of Professional Studies is a launchpad for innovators, ...
Visiting: https://www.stjohns.edu/academics/faculty/wendell-cruz
Text Content: 8000 Utopia Parkway Queens NY 11439 718-990-2000 St. John’s University does not discriminate on the ...
Visiting: https://www.stjohns.edu/events?cals=3508970&start=2023-10-22&end=2023-11-22
Text Content: Please consult this calendar frequently as events are subject to change.  Loading upcoming events......
Visiting: https://www.stjohns.edu/about/news/all-news?school=31
Text Content: The competition wa



Error crawling https://www.stjohns.edu/sites/default/files/2023-10/School%20of%20Education%20Visiting%20Scholar%20Review%20Checklist.docx: 'NoneType' object has no attribute 'find_all'
Visiting: https://www.stjohns.edu/academics/global-programs/office-international-education-inbound-programs/english-language-institute
Text Content: Join us on campus in New York City (NYC) for our innovative program that connects the classroom to t...
Visiting: https://www.stjohns.edu/academics/global-programs/english-language-and-american-culture-programs/educationusa-academy-and-academy-connects
Text Content: St. John's University is proud to offer two programs for secondary school students interested in lea...
Visiting: https://www.stjohns.edu/academics/global-programs/english-language-and-american-culture-programs/educationusa-academy-academy-connects#EdUSA
Text Content: St. John's University is proud to offer two programs for secondary school students interested in lea...
Visiting: https://www.stjo



Error crawling https://www.stjohns.edu/sites/default/files/2022-04/SJU%202022%20Instructions%20for%20Completing%20Summer%20Program%20Form.docx: 'NoneType' object has no attribute 'find_all'
Visiting: https://www.stjohns.edu/sites/default/files/2022-04/SJU%202022%20Camp%20Emergency%20Contact%20%26%20Consent%20to%20Treat%20Form.docx




Error crawling https://www.stjohns.edu/sites/default/files/2022-04/SJU%202022%20Camp%20Emergency%20Contact%20%26%20Consent%20to%20Treat%20Form.docx: 'NoneType' object has no attribute 'find_all'
Visiting: https://www.stjohns.edu/sites/default/files/2022-04/SJU%202022%20Meningitis%20Vaccination%20Parent%20Information%20%26%20Response%20Form.docx




Error crawling https://www.stjohns.edu/sites/default/files/2022-04/SJU%202022%20Meningitis%20Vaccination%20Parent%20Information%20%26%20Response%20Form.docx: 'NoneType' object has no attribute 'find_all'
Visiting: https://www.stjohns.edu/life-st-johns?utm_source=TD%20Student%20Life&utm_medium=website&utm_campaign=2022%20Admissions%20Changes&utm_id=UGAdmissions
Text Content: New York is just one of the remarkable places the St. John’s community calls home.  With a lively Re...
Visiting: https://www.stjohns.edu/life-st-johns/residence-life?utm_source=TD%20Residence%20Life&utm_medium=website&utm_campaign=2022%20Admissions%20Changes&utm_id=UGAdmissions
Text Content: Residence Life aims to develop a residential community that supports and enhances the academic missi...
Visiting: https://www.stjohns.edu/life-st-johns/public-safety#contact
Text Content: St. John’s University Department of Public Safety provides safety and security services to our commu...
Visiting: https://www.stjohns.edu
Tex



Error crawling https://www.stjohns.edu/files/2023-annual-security-and-fire-safety-report: 'NoneType' object has no attribute 'find_all'
Visiting: https://www.stjohns.edu/life-st-johns/public-safety
Text Content: St. John’s University Department of Public Safety provides safety and security services to our commu...
Visiting: https://www.stjohns.edu/life-st-johns/student-services/career-services/employers
Text Content: Our Mission: Preparing and empowering all students for their career journey, creating connections wi...
Visiting: https://www.stjohns.edu/life-st-johns/student-services/career-services
Text Content: University Career Services is committed to ensuring student and alumni success. Our dedicated team o...
Visiting: https://www.stjohns.edu/life-st-johns/student-services/bookstore
Text Content: Show your St. John's pride all year long with the University Bookstore's authentic assortment of col...
Visiting: https://www.stjohns.edu/about/leadership-and-administration/administrativ

FileNotFoundError: [Errno 2] No such file or directory: "Bachelor of Science / Juris Doctor | St. John's University.txt"

In [None]:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
import threading

# Create webcrawler class
class WebCrawler:
    def __init__(self, start_url, visiting_strategy='preorder'):
        self.start_url = start_url
        self.visiting_strategy = visiting_strategy.lower()
        self.visited_urls = set()
        self.corpus = {}
        self.main_domain = urlparse(start_url).netloc
        self.lock = threading.Lock()  # Lock for thread-safe access to shared data

    def crawl(self, url):
        if url not in self.visited_urls and self.is_same_domain(url):
            print(f"Visiting: {url}")
            self.visited_urls.add(url)
            try:
                response = requests.get(url)
                soup = BeautifulSoup(response.content, 'html.parser')
                title = soup.title.string.strip() if soup.title else 'Untitled'
                text_content = self.extract_text_content(soup)

                with self.lock:  # Thread-safe update of shared data
                    self.corpus[title] = text_content

                print(f"Text Content: {text_content[:100]}...")  # Output a snippet of text

                if self.visiting_strategy == 'preorder':
                    links = self.extract_links(soup)
                    threads = []
                    for link in links:
                        thread = threading.Thread(target=self.crawl, args=(link,))
                        threads.append(thread)
                        thread.start()

                    # Wait for all threads to complete
                    for thread in threads:
                        thread.join()

                # Additional visiting strategies (inorder, postorder) can be implemented here

            except Exception as e:
                print(f"Error crawling {url}: {e}")

    def extract_text_content(self, soup):
        # Extract text content only from the body of the HTML
        text_content = ' '.join([p.get_text(separator=' ', strip=True) for p in soup.body.find_all('p')])
        return text_content

    def extract_links(self, soup):
        # Extract all links from the page
        links = [link.get('href') for link in soup.find_all('a', href=True)]
        # Filter internal links only
        links = [urljoin(self.start_url, link) for link in links if link.startswith(('http', 'https'))]
        # Exclude PDF links
        links = [link for link in links if not link.endswith('.pdf')]
        # Filter out external links
        links = [link for link in links if self.is_same_domain(link)]
        # Exclude links with 'resources' in the URL
        links = [link for link in links if 'resources' not in link.lower()]
        return links

    def is_same_domain(self, url):
        return urlparse(url).netloc == self.main_domain

    def start_crawling(self):
        self.crawl(self.start_url)
        self.save_corpus()

    def save_corpus(self):
        # Save the extracted corpus to text files
        for title, content in self.corpus.items():
            with open(f"{title}.txt", 'w', encoding='utf-8') as file:
                file.write(content)

if __name__ == "__main__":
    # Get the starting URL from the user
    start_url = input("Enter the website's URL: ")

    # Instantiate the WebCrawler with the provided URL and visiting strategy
    crawler = WebCrawler(start_url=start_url, visiting_strategy='preorder')

    # Start crawling
    crawler.start_crawling()


Enter the website's URL: https://www.stjohns.edu/
Visiting: https://www.stjohns.edu/
Text Content: See how your journey aligns with what drives you. Hannah M. Queens, NY Maria Orlando, FL Jenna Charl...
Visiting: https://www.stjohns.edu/life-st-johns/career-services
Visiting: https://www.stjohns.edu/about/leadership-and-administration/office-president/presidents-society
Visiting: https://www.stjohns.edu/who-we-are/faith-and-mission/campus-ministry/opportunities/plunge-programVisiting: https://www.stjohns.edu/who-we-are/campus-sustainability

Visiting: https://www.stjohns.edu/academics/programs?level%5B151%5D=151Visiting: https://www.stjohns.edu/admission/graduate-admission

Text Content: The graduate schools at St. John’s University in New York City offer more than 60 graduate degree an...
Text Content: Sustainability is a long-term responsibility to meet the needs of the present without compromising t...
Visiting: https://www.stjohns.edu/admission/apply
Visiting: https://www.stjohns.e



Visiting: https://www.stjohns.edu/node/1826?school=21
Visiting: https://www.stjohns.edu/academics/programs/accounting-master-business-administration
Visiting: http://www.stjohns.edu/admission-aid/tuition-and-financial-aid/undergraduate-aid

Visiting: https://www.stjohns.edu/law/law-career-development/externship-program
Text Content: View the current and upcoming Academic Calendar for the School of Law. PDF of Fall Semester 2023 Cal...Error crawling https://www.stjohns.edu/files/phd-curriculum-and-instruction-essay-prompt-instructions: 'NoneType' object has no attribute 'find_all'
Visiting: https://www.stjohns.edu/node/1826?degree=2031&school=21

Visiting: https://www.stjohns.edu/law/law-career-development/career-development-team
Text Content: Secure Your Spot at St. John's University Congratulations on your decision to enroll at St. John's U...
Visiting: https://www.stjohns.edu/law/law-career-development/graduate-employment-data
Text Content: Law school is an investment and you want to



Text Content: St. John's University is proud to offer two programs for secondary school students interested in lea...Error crawling https://www.stjohns.edu/files/january-2020-review-business: 'NoneType' object has no attribute 'find_all'

Text Content: Stay Connected Spend the Fall with us! The St. John’s story is the story of New York, and our alumni...Text Content: Welcome to St. John’s University Conference Services! Conference Services coordinates the space requ...

Text Content: Not all roads lead to riches... ...some lead to Johnny Thunderbird with an unfortunate message. Thin...
Visiting: https://www.stjohns.edu/node/132741?degree=716
Visiting: https://www.stjohns.edu/node/132741?degree=2836
Text Content: Congratulations Class of 2024! Commencement is a very special time at St. John's University that we ...
Visiting: https://www.stjohns.edu/node/30431#flyers
Text Content: 8000 Utopia Parkway Queens NY 11439 718-990-2000 St. John’s University does not discriminate on the ...
Text



Error crawling https://www.stjohns.edu/sites/default/files/2023-10/School%20of%20Education%20Visiting%20Scholar%20Review%20Checklist.docx: 'NoneType' object has no attribute 'find_all'
Visiting: https://www.stjohns.edu/node/132741?degree=3851
Text Content: Elda Tsou joined the faculty at St. John’s University in the fall of 2007. She offers undergraduate ...Text Content: Red Storm Sports, Clubs, and Intramurals for All A founding member of the BIG EAST, St. John’s has a...

Visiting: https://www.stjohns.edu/academics/faculty/syed-ahmad-chan-bukhari
Text Content: Each semester, students enrolled in the Global Microloan Program will update this site with their we...
Visiting: https://www.stjohns.edu/sites/default/files/2022-04/SJU%202022%20Instructions%20for%20Completing%20Summer%20Program%20Form.docx
Text Content: As the global community becomes increasingly interdependent, we are faced with new political, cultur...
Text Content: The Academic Center for Equity and Inclusion (or “Academi



Visiting: https://www.stjohns.edu/academics/faculty/cynthia-chambers
Visiting: https://www.stjohns.edu/admission/other-programs/visiting-students?utm_source=Academic+Info+Accordion+Visiting&utm_medium=website&utm_campaign=2022+Admissions+Changes&utm_id=UGAdmissions
Error crawling https://www.stjohns.edu/sites/default/files/2022-04/SJU%202022%20Instructions%20for%20Completing%20Summer%20Program%20Form.docx: 'NoneType' object has no attribute 'find_all'
Visiting: https://www.stjohns.edu/about/news/2017-10-11/10-under-10-young-alumni-make-difference




Error crawling https://www.stjohns.edu/sites/default/files/2022-04/SJU%202022%20Camp%20Emergency%20Contact%20%26%20Consent%20to%20Treat%20Form.docx: 'NoneType' object has no attribute 'find_all'
Text Content: Environmental Studies; Homeland Security 8000 Utopia Parkway Queens NY 11439 718-990-2000 St. John’s...
Visiting: https://www.stjohns.edu/admission-aid/undergraduate-admission/test-optional
Text Content: St. John's students benefit from New York City’s culturally rich environment, exciting entertainment...
Visiting: https://www.stjohns.edu/academics/programs/public-history-master-arts
Visiting: https://www.stjohns.edu/academics/programs/library-and-information-science-master-science
Text Content: A dedication to diversity, equity, and inclusion is at the heart of our mission. As a Catholic and V...
Text Content: First-year applicants to St. John’s University have the option to submit a test-optional application...Visiting: https://www.stjohns.edu/elevate-your-earning-potential/mas



Text Content: Whether you want to learn more about our beautiful residential campus, our 100+ exciting majors, or ...Text Content: In the Department of Philosophy at St. John’s University, we feel the study of philosophy is central...Text Content: The Founders Society consists of alumni and friends who have left an indelible mark on the St. John’...

Visiting: https://www.stjohns.edu/academics/faculty/arnold-felberbaum
Visiting: https://www.stjohns.edu/academics/programs/social-justice-information-professions-advanced-certificate





Error crawling https://www.stjohns.edu/sites/default/files/2022-04/SJU%202022%20Meningitis%20Vaccination%20Parent%20Information%20%26%20Response%20Form.docx: 'NoneType' object has no attribute 'find_all'
Text Content: Our world-class scholar-teachers are one reason students from around the globe select St. John’s to ...
Text Content: Our world-class scholar-teachers are one reason students from around the globe select St. John’s to ...
Visiting: https://www.stjohns.edu/about/leadership-and-administration/administrative-offices/office-provost/division-student-affairs
Text Content: ...
Text Content: Networking Is Vital to Television Success, Says Alumna After Annette Lellis-DeFonzo ’92SVC arrived a...Visiting: https://www.stjohns.edu/admission/request-info?utm_source=Sidebar%20Info&utm_medium=website&utm_campaign=2022%20Admissions%20Changes&utm_id=UGAdmissions

Text Content: On-Campus Training Prepares Alumna for Television Career As a child, Stefany Steinman-Nonnenmacher ’...
Error craw



Text Content: How can $750 bring you the world? Introducing St. John's Global Passport Program! At St. John's Univ...
Visiting: http://www.stjohns.edu/studyabroad
Text Content: There are a number of ways to support the University and make a difference in the lives of our stude...
Visiting: https://www.stjohns.edu/about/news/2021-01-22/cyber-security-program-host-talk-robots
Text Content: Faculty from the Cyber Security Systems program in The Lesley H. and William L. Collins College of P...
Error crawling https://www.stjohns.edu/sites/default/files/uploads/2022_Clare_Boothe_Luce_Summer_Research_Scholarship_application%20%285%29.docx: 'NoneType' object has no attribute 'find_all'
Text Content: Produced by:             Kathleen Smith, Class of 2022, MS International Hospitality Management When...
Text Content: Not all roads lead to riches... ...some lead to Johnny Thunderbird with an unfortunate message. Thin...
Visiting: https://www.stjohns.edu/academics/schools-and-colleges/st-johns-col



Error crawling https://www.stjohns.edu/files/2023-annual-security-and-fire-safety-report: 'NoneType' object has no attribute 'find_all'




Error crawling https://www.stjohns.edu/sites/default/files/2020-01/53R%20Brief%20%28Revised%29.PDF: 'NoneType' object has no attribute 'find_all'


FileNotFoundError: [Errno 2] No such file or directory: "Graduate/Law Aid | St. John's University.txt"

In [None]:
#Working Code

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
import threading
import re

# Create webcrawler class
class WebCrawler:
    def __init__(self, start_url, visiting_strategy='preorder'):
        self.start_url = start_url
        self.visiting_strategy = visiting_strategy.lower()
        self.visited_urls = set()
        self.corpus = {}
        self.main_domain = urlparse(start_url).netloc
        self.lock = threading.Lock()  # Lock for thread-safe access to shared data

    def crawl(self, url):
        if url not in self.visited_urls and self.is_same_domain(url):
            print(f"Visiting: {url}")
            self.visited_urls.add(url)
            try:
                response = requests.get(url)
                soup = BeautifulSoup(response.content, 'html.parser')
                title = soup.title.string.strip() if soup.title else 'Untitled'
                text_content = self.extract_text_content(soup)

                with self.lock:  # Thread-safe update of shared data
                    self.corpus[url] = text_content

                print(f"Text Content: {text_content[:100]}...")  # Output a snippet of text

                if self.visiting_strategy == 'preorder':
                    links = self.extract_links(soup)
                    threads = []
                    for link in links:
                        thread = threading.Thread(target=self.crawl, args=(link,))
                        threads.append(thread)
                        thread.start()

                    # Wait for all threads to complete
                    for thread in threads:
                        thread.join()

                # Additional visiting strategies (inorder, postorder) can be implemented here

            except Exception as e:
                print(f"Error crawling {url}: {e}")

    def extract_text_content(self, soup):
        # Extract text content only from the body of the HTML
        text_content = ' '.join([p.get_text(separator=' ', strip=True) for p in soup.body.find_all('p')])
        return text_content

    def extract_links(self, soup):
        # Extract all links from the page
        links = [link.get('href') for link in soup.find_all('a', href=True)]
        # Filter internal links only
        links = [urljoin(self.start_url, link) for link in links if link.startswith(('http', 'https'))]
        # Exclude PDF links
        links = [link for link in links if not link.endswith('.pdf')]
        # Filter out external links
        links = [link for link in links if self.is_same_domain(link)]
        # Exclude links with 'resources' in the URL
        links = [link for link in links if 'resources' not in link.lower()]
        return links

    def is_same_domain(self, url):
        return urlparse(url).netloc == self.main_domain

    def start_crawling(self):
        self.crawl(self.start_url)

    def get_crawled_data(self):
        return self.corpus

if __name__ == "__main__":
    # Get the starting URL from the user
    start_url = input("Enter the website's URL: ")

    # Instantiate the WebCrawler with the provided URL and visiting strategy
    crawler = WebCrawler(start_url=start_url, visiting_strategy='preorder')

    # Start crawling
    crawler.start_crawling()

    # Get the crawled data
    crawled_data = crawler.get_crawled_data()

    # Print the crawled data
    for url, content in crawled_data.items():
        print(f"URL: {url}")
        print(f"Content: {content[:100]}...")  # Print a snippet of content


Enter the website's URL: https://www.stjohns.edu/
Visiting: https://www.stjohns.edu/
Text Content: See how your journey aligns with what drives you. Hannah M. Queens, NY Maria Orlando, FL Jenna Charl...
Visiting: https://www.stjohns.edu/life-st-johns/career-services
Visiting: https://www.stjohns.edu/about/leadership-and-administration/office-president/presidents-society
Visiting: https://www.stjohns.edu/who-we-are/faith-and-mission/campus-ministry/opportunities/plunge-programVisiting: https://www.stjohns.edu/who-we-are/campus-sustainability

Visiting: https://www.stjohns.edu/academics/programs?level%5B151%5D=151
Visiting: https://www.stjohns.edu/admission/graduate-admission
Text Content: Sustainability is a long-term responsibility to meet the needs of the present without compromising t...
Text Content: The graduate schools at St. John’s University in New York City offer more than 60 graduate degree an...
Visiting: https://www.stjohns.edu/admission/apply
Visiting: https://www.stjohns.e



Visiting: https://www.stjohns.edu/node/1826?degree=2031&school=21
Visiting: https://www.stjohns.edu/academics/programs/accounting-master-business-administration
Text Content: Located in Queens, New York, one of the most diverse places in the world and one of the five borough...
Text Content: You can create your own career path by taking advantage of the Law School's resources. Since not all...Visiting: https://www.stjohns.edu/academics/programs?degree=2811&school=21Visiting: https://www.stjohns.edu/academics/faculty/victoria-l-shoaf-phd

Visiting: https://www.stjohns.edu/academics/programs/educational-leadership-master-business-administrationVisiting: https://www.stjohns.edu/news-media/news/2022-06-23/st-johns-law-launches-house-defense-and-advocacy-clinic


Text Content: Student and graduate employment and internship postings are immediately posted for students and grad...Visiting: https://www.stjohns.edu/academics/schools/peter-j-tobin-college-business/maurice-r-greenberg-school-risk



Text Content: Welcome to the Health Data Science Research Lab, located at Collins College of Professional Studies ...
Text Content: St. John’s offers a free online application for all 100+ undergraduate programs, and graduate applic...
Visiting: https://www.stjohns.edu/node/30431#flyers
Text Content: First-year applicants to St. John’s University have the option to submit a test-optional application...
Visiting: https://www.stjohns.edu/node/1411?type=71
Error crawling https://www.stjohns.edu/sites/default/files/2023-10/School%20of%20Education%20Visiting%20Scholar%20Review%20Checklist.docx: 'NoneType' object has no attribute 'find_all'
Text Content: Sustainability is a growing field in both the public and private sectors. The Master of Arts in Envi...
Text Content:  The Collins College of Professional Studies offers fast track programs (pathways/dual degrees) that...
Text Content: 8000 Utopia Parkway Queens NY 11439 718-990-2000 St. John’s University does not discriminate on the ...Visi



Visiting: https://www.stjohns.edu/academics/faculty/syed-ahmad-chan-bukhari
Visiting: https://www.stjohns.edu/academics/commencement#2024
Visiting: https://www.stjohns.edu/alumni-friends/ways-contribute/founders-society
Error crawling https://www.stjohns.edu/sites/default/files/2020-03/M1-12853%20MA%20Chinese%20%26%20EAS.PDF: 'NoneType' object has no attribute 'find_all'
Text Content: Bukhari, S. A., Mart\'\inez-Romero, Marcos, , O’Connor, M. J., Egyedi, A. L., Willrett, D., Graybeal...
Visiting: https://www.stjohns.edu/academics/faculty/muhammed-bilah
Text Content: The Television, Film and Radio Center is a broadcast-quality HD production and post-production facil...
Text Content: Congratulations Class of 2024! Commencement is a very special time at St. John's University that we ...
Text Content: The Founders Society consists of alumni and friends who have left an indelible mark on the St. John’...
Visiting: https://www.stjohns.edu/academics/faculty/enju-wang
Text Content: Environment



Text Content: At St. John's, your journey is as spiritual as it is intellectual.  Our passion for creative teachin...
Error crawling https://www.stjohns.edu/sites/default/files/2022-04/SJU%202022%20Instructions%20for%20Completing%20Summer%20Program%20Form.docx: 'NoneType' object has no attribute 'find_all'
Text Content: Caroline Fuchs is the University Librarian, Dean of Libraries and Professor. She holds an MLS, an M....
Visiting: https://www.stjohns.edu/academics/schools/college-professional-studies/innovation-and-entrepreneurship/design-factory
Visiting: https://www.stjohns.edu/academics/programs/integrated-advertising-communication-master-science
Visiting: https://www.stjohns.edu/about/campuses-and-locations/manhattan-campus?utm_source=Sidebar%20Manhattan&utm_medium=website&utm_campaign=2022%20Admissions%20Changes&utm_id=UGAdmissions
Text Content: New Design Factory is Open for Innovation Pitch Johnny Competition Showcases Student Entrepreneurs A...
Visiting: https://www.stjohns.ed



Error crawling https://www.stjohns.edu/sites/default/files/2022-04/SJU%202022%20Camp%20Emergency%20Contact%20%26%20Consent%20to%20Treat%20Form.docx: 'NoneType' object has no attribute 'find_all'
Text Content: The St. John’s University Bachelor of Fine Arts (B.F.A.) degree in Graphic Design is a 132-credit pr...
Text Content: Joel Ndimkora ’12C, ’22CCPS, Joel Ndimkora ’12C, ’22CCPS, who graduated with a Bachelor of Arts degr...
Text Content: Produced by:             Office of Marketing and Communications As a Catholic and Vincentian Univers...
Visiting: https://www.stjohns.edu/about/campuses-and-locations/rome-campus?utm_source=Sidebar%20Rome&utm_medium=website&utm_campaign=2022%20Admissions%20Changes&utm_id=UGAdmissions
Visiting: https://www.stjohns.edu/about/news/success-stories/salvatore-valentinetti
Visiting: https://www.stjohns.edu/academics/programs/museum-administration-master-arts
Visiting: https://www.stjohns.edu/academics/faculty/ann-jusino
Text Content: 8000 Utopia Parkway Qu



Visiting: https://www.stjohns.edu/about/news/success-stories?type=71
Visiting: https://www.stjohns.edu/academics/programs/library-and-information-science-master-science
Error crawling https://www.stjohns.edu/sites/default/files/2022-04/SJU%202022%20Meningitis%20Vaccination%20Parent%20Information%20%26%20Response%20Form.docx: 'NoneType' object has no attribute 'find_all'
Text Content: Practitioners of Public History document, preserve, and interpret the past. Earn a master's degree i...
Visiting: https://www.stjohns.edu/academics/faculty/anthony-todman
Visiting: https://www.stjohns.edu/academics/faculty/benjamin-turner




Text Content: The Lesley H. and William L. Collins College of Professional Studies (CCPS) centers and laboratories...
Visiting: https://www.stjohns.edu/academics/faculty/john-m-otero
Error crawling https://www.stjohns.edu/files/january-2020-review-business: 'NoneType' object has no attribute 'find_all'
Text Content: St. John’s 30-credit master’s degree in Cyber and Information Security accelerates your career and t...
Visiting: https://www.stjohns.edu/academics/faculty/anil-chacko
Text Content: The program provides extensive training in research methods and specialized areas of psychological s...
Text Content: Earn an ALA-accredited master's degree in Library and Information Science online! Earn an ALA-accred...
Text Content: Communication Arts; Journalism; Mass Communications; Criminal Justice; Multicultural Studies 8000 Ut...
Visiting: https://www.stjohns.edu/academics/programs/health-and-human-services-bachelor-science
Text Content: 8000 Utopia Parkway Queens NY 11439 718-990-2000 S



Text Content: Producer | Senior Marketing & Content Executive | Consultant | Board Member | Advisor A dynamic, pro...
Text Content: Accelerate your path to law by pursuing a combined B.A./J.D. program. Lawyers advise and represent i...
Text Content: Our world-class scholar-teachers are one reason students from around the globe select St. John’s to ...
Text Content: 8000 Utopia Parkway Queens NY 11439 718-990-2000 St. John’s University does not discriminate on the ...Text Content: I was born in the Dominican Republic. My family immigrated to Queens, NY, in the mid 1960s and I hav...
Text Content: Our world-class scholar-teachers are one reason students from around the globe select St. John’s to ...Error crawling https://www.stjohns.edu/sites/default/files/uploads/2022_Clare_Boothe_Luce_Summer_Research_Scholarship_application%20%285%29.docx: 'NoneType' object has no attribute 'find_all'


Text Content: “Education is one thing no one can take away from you.” This quote by Elin Nordegren f



Error crawling https://www.stjohns.edu/files/2023-annual-security-and-fire-safety-report: 'NoneType' object has no attribute 'find_all'




Error crawling https://www.stjohns.edu/sites/default/files/2020-01/53R%20Brief%20%28Revised%29.PDF: 'NoneType' object has no attribute 'find_all'
URL: https://www.stjohns.edu/
Content: See how your journey aligns with what drives you. Hannah M. Queens, NY Maria Orlando, FL Jenna Charl...
URL: https://www.stjohns.edu/who-we-are/campus-sustainability
Content: Sustainability is a long-term responsibility to meet the needs of the present without compromising t...
URL: https://www.stjohns.edu/admission/graduate-admission
Content: The graduate schools at St. John’s University in New York City offer more than 60 graduate degree an...
URL: https://www.stjohns.edu/who-we-are/faith-and-mission/campus-ministry/opportunities/plunge-program
Content: Plunges, or service immersion, are weeklong experiences where students are given the opportunity to ...
URL: https://www.stjohns.edu/life-st-johns/career-services
Content: University Career Services is committed to ensuring student and alumni success. O

In [None]:
# Assuming 'crawled_data' is the dictionary containing the crawled data
desired_url = 'https://www.stjohns.edu/academics/programs/clinical-laboratory-sciences-bachelor-science'

# Check if the URL is present in the crawled data
if desired_url in crawled_data:
    content = crawled_data[desired_url]
    print("Content for URL:")
    print(content)
else:
    print("URL not found in crawled data.")


Content for URL:
70% of medical decisions depend on laboratory test results performed by clinical laboratory scientists. -CDC The clinical laboratory sciences program is an accredited undergraduate degree achieving a Bachelor of Sciences that leads to national board certification, from the American Society for Clinical Pathology (ASCP), and NYS licensure. Medical Laboratory Scientists (MLS), also known as Clinical Laboratory Technologists, are nationally certified healthcare professionals who aid in the detection, diagnosis, and treatment of disease. An undergraduate degree in clinical laboratory sciences involves an extensive theoretical and experiential knowledge base through clinical internships at affiliated hospitals and reference laboratories. The Clinical Laboratory Sciences program within the College of Pharmacy and Health Sciences is committed to a student-centered inclusive pedagogical model that supports student success, personal, and professional development. Our graduates 