mezeru/Intelligent-Study-Material-Download-Program

A Simple Web Scraping and Text Identification Program (Python 3.7.6) Documentation

Introduction

This document explains how to use "A Simple Web Scraping and Text Identification Program". The program is written in Python 3.7.6 and offers the following features:

  • Downloading all downloadable files and links from a website.
  • Segregating different types of files.
  • Displaying the top five PDF files relevant to a phrase entered by the user.
  • Displaying phrases related to the one entered by the user.
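
The first two features can be illustrated with a minimal sketch that uses only the standard library. It is not the program's actual code: the names `LinkCollector` and `group_links_by_extension` are hypothetical, and the sketch assumes the page's HTML has already been fetched. It collects the links on a page, resolves them against the page URL, and groups them by file extension.

```python
from collections import defaultdict
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def group_links_by_extension(html, base_url):
    """Return {extension: [absolute URLs]} for the links found in html."""
    collector = LinkCollector()
    collector.feed(html)
    groups = defaultdict(list)
    for href in collector.links:
        # Resolve relative links against the address of the page.
        absolute = urljoin(base_url, href)
        # Use the lower-cased suffix of the URL path as the file type.
        suffix = PurePosixPath(urlparse(absolute).path).suffix.lower()
        groups[suffix or "(no extension)"].append(absolute)
    return dict(groups)
```

Calling `group_links_by_extension(page_html, page_url)` returns a dictionary whose keys are extensions such as `.pdf`, which is one simple way to segregate files before downloading them.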

How to Use

Follow the steps below to utilize the program effectively:

  • Enter the address of the website or URL: When prompted, provide the address of the website you want to scrape or the URL of the file you wish to download. Ensure that the URL is valid and accessible.
  • Enter the Phrase You Want to Search: Enter the phrase you want to search for in the downloaded content. This phrase will be used to identify relevant PDF files and display related phrases.
  • Initiate the Process: Once the website address or URL and the search phrase are entered, the program will start processing the data. It will inform you if the URL is invalid or if there are issues with your internet connection.
  • HTTP Status Response Code: The program will display the HTTP status response code received from the website. This code indicates the success or failure of the HTTP request made by the program.
  • Display Found Files: While searching for files, the program will display the types of files it has found. It will segregate different types of files based on their extensions.
  • Specify the Data Folder: The program will prompt you to name the folder where the downloaded data will be stored. Enter a suitable name for the folder.
  • Check for PDF Availability: The program will check if PDF files are available among the downloaded files.
  • Display Top 5 Relevant PDF Files: If PDF files are found, the program will parse through them and display the top five PDF files relevant to the entered search phrase. The relevance is determined based on the content of the PDF files.
  • Display Similar Phrases: The program will display phrases similar to the one entered by the user. These phrases are extracted from the downloaded content and can provide further insights or related information.
  • Process Completion and Data Storage: Once the process completes, the downloaded data is saved in the folder you named earlier, which is located in the same directory as the program file.
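
The steps above do not say how relevance or phrase similarity is computed, so the following is only a sketch of one plausible approach, not the program's actual method. It assumes the text of each PDF has already been extracted into a string (PDF parsing itself needs a third-party library and is omitted). Relevance is scored by counting occurrences of the query words, and similar phrases are found with `difflib.get_close_matches` from the standard library. The function names `tokenize`, `rank_documents` and `similar_phrases` are hypothetical.

```python
import re
from collections import Counter
from difflib import get_close_matches


def tokenize(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(texts, phrase, top_n=5):
    """Rank documents by how often the words of phrase occur in them.

    texts maps a file name to text already extracted from that file.
    Documents that contain none of the words are left out.
    """
    query = set(tokenize(phrase))
    scores = {}
    for name, text in texts.items():
        counts = Counter(tokenize(text))
        score = sum(counts[word] for word in query)
        if score > 0:
            scores[name] = score
    # Highest score first; ties are broken alphabetically by file name.
    return sorted(scores, key=lambda n: (-scores[n], n))[:top_n]


def similar_phrases(texts, phrase, limit=5):
    """Return word sequences from the texts that closely resemble phrase."""
    size = max(len(tokenize(phrase)), 1)
    candidates = set()
    for text in texts.values():
        words = tokenize(text)
        # Slide a window with as many words as the phrase over the text.
        for i in range(len(words) - size + 1):
            candidates.add(" ".join(words[i:i + size]))
    normalized = " ".join(tokenize(phrase))
    candidates.discard(normalized)  # do not report the phrase itself
    return get_close_matches(normalized, sorted(candidates), n=limit, cutoff=0.6)
```

With a dictionary such as `{"a.pdf": "...", "b.pdf": "..."}`, `rank_documents(texts, "binary search")` returns at most five file names, most relevant first, and `similar_phrases(texts, "binary search")` returns nearby two-word phrases found in the same texts. Raw word counts favor long documents; a real implementation might normalize by document length or use TF-IDF instead.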

Notes

  • Make sure you have a stable internet connection and sufficient disk space to store the downloaded data.
  • Please read the Requirement.txt file and install any required modules.

Feel free to explore "A Simple Web Scraping and Text Identification Program" and use it to extract useful information and identify relevant content from websites or downloadable files.
