Summarization Website

This repository contains the code for a summarization website that can generate summaries from either YouTube videos or PDF files. The backend is built using Flask, while the frontend is developed with React. The project utilizes several machine learning models and libraries to perform tasks such as transcription, translation, and summarization.

Features

Summarize YouTube videos by extracting the transcript and generating a summary.
Summarize PDF files by extracting text and generating a summary.
Seamless integration between the backend and frontend.
Provides a clean and user-friendly interface.

Setup and Installation

Backend

Clone the repository:

git clone https://github.com/your-username/summarization-website.git
cd summarization-website

Create a requirements.txt file with the following content:

Flask==2.0.1
flask-cors==3.0.10
torch==2.0.1
transformers==4.28.1
yt-dlp==2021.12.1
youtube-transcript-api==0.4.4
faster-whisper==0.2.3
pymupdf==1.19.6
pytesseract==0.3.8
Pillow==9.0.1

Install the dependencies:
```
pip install -r requirements.txt
```
Run the Flask application:
```
flask --app server run
```

Frontend

Navigate to the frontend directory:
```
cd frontend
```
Install the dependencies:
```
npm install
```
Start the React application:
```
npm run dev
```

Usage

Access the website: Open your browser and go to http://localhost:3000 to access the summarization website.
Summarize a YouTube video:
- Enter the YouTube video link in the provided input field.
- Click the "Generate" button to fetch the transcript and generate a summary.
Summarize a PDF file:
- Upload a PDF file using the file input field.
- Click the "Generate" button to extract text from the PDF and generate a summary.

API Endpoints

`POST /generate-summary`

Generates a summary from a YouTube video.

Request Body:

{
  "video_link": "https://www.youtube.com/watch?v=your_video_id"
}

Response:

{
  "summary": "Summary of the video."
}

`POST /upload-pdf`

Generates a summary from a PDF file.

Request:
- Multipart/form-data with the PDF file.
Response:
```
{
  "summary": "Summary of the PDF."
}
```

Frontend Components

App.js: Main component that handles the state and logic for generating summaries.
PDFInput.js: Component for uploading PDF files.
Link.js: Component for inputting YouTube video links.
Summary.js: Component for displaying the generated summary.
EmbedVideo.js: Component for embedding the YouTube video.
Spinner.js: Component for displaying a loading spinner.

Libraries Used

Backend

Flask: Web framework for building the backend.
Flask-CORS: Extension for handling Cross-Origin Resource Sharing (CORS).
Torch: Machine learning library for deep learning models.
Transformers: Library for natural language processing models.
yt-dlp: YouTube video downloader.
youtube-transcript-api: API for fetching YouTube video transcripts.
faster-whisper: Library for faster Whisper model inference.
PyMuPDF: Library for PDF processing.
pytesseract: OCR tool for extracting text from images.
Pillow: Imaging library for Python.

Frontend

React: JavaScript library for building user interfaces.
React-DOM: Package for working with the DOM in React applications.
Tailwind CSS: Utility first CSS framework
React Spinner: Package for adding a spinner

Methodology

The methodology for generating summaries from YouTube videos or PDF documents involves several steps, as illustrated in the following flowchart:

Workflow:

User Interface for Input:
- The user provides a YouTube link or uploads a PDF document through the user interface.
YouTube Video Processing:
- If a YouTube link is provided, the system checks if a transcript is available:
  - Yes: The transcript is cleaned up using spaCy.
  - No: The audio is downloaded and transcribed using Whisper.
PDF Document Processing:
- If a PDF document is uploaded, the system checks if the text is selectable:
  - Yes: Text is extracted directly from the PDF.
  - No: The PDF is scanned and text is extracted using OCR.
Text Processing:
- The extracted text (from YouTube or PDF) is chunked into smaller parts for easier processing.
Summarization:
- Each chunk is summarized individually using the summarization model.
Summary Cleanup:
- The individual summaries are concatenated and cleaned up to form the final summary.
Output:
- The final generated summary is outputted to the user.

This structured approach ensures that the system can handle different types of input and generate coherent summaries effectively.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
client		client
server		server
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Summarization Website

Table of Contents

Features

Setup and Installation

Backend

Frontend

Usage

API Endpoints

`POST /generate-summary`

`POST /upload-pdf`

Frontend Components

Libraries Used

Backend

Frontend

Methodology

Workflow:

Screenshots

Home Screen

Summary from YT Link

Summary for PDF

About

Releases

Packages

Languages

karthiikJR/video-pdf-summarization

Folders and files

Latest commit

History

Repository files navigation

Summarization Website

Table of Contents

Features

Setup and Installation

Backend

Frontend

Usage

API Endpoints

POST /generate-summary

POST /upload-pdf

Frontend Components

Libraries Used

Backend

Frontend

Methodology

Workflow:

Screenshots

Home Screen

Summary from YT Link

Summary for PDF

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

`POST /generate-summary`

`POST /upload-pdf`

Packages